Documentation:Toolkit for the Digitization of First Nations Knowledge/SECTION C: Digitization Best Practices and Guides/C2: Standards

From UBC Wiki

The following tables summarize industry standards that should be followed for all digitization projects wherever possible. Adhering to these accepted standards helps to:

  • Ensure that the digital files created through digitization are of high quality and meet national and international standards
  • Maintain the integrity and longevity of the digital files for long-term digital preservation

These standards are subject to change as technology and practice evolve. Furthermore, each digitization project is unique in its setting and goals. The ultimate objective is to have a preservation master copy that faithfully reproduces the original and from which additional copies can be made.

Manuscripts and printed text

| Specification | Preservation and Access Master | Print Access | Screen Access | Thumbnail |
| --- | --- | --- | --- | --- |
| File format | TIFF and TXT or PDF/A with OCR | JPEG, PNG or PDF with OCR | JPEG, PNG or PDF with OCR | JPEG or PNG |
| Resolution | 300-600 dpi | 150-300 dpi | 150 dpi | 150 dpi |
| Bit depth | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale |
| Dimensions | 3000-6000 pixels across the long edge | 3000 pixels across the long edge | 800 pixels across the long edge | 200 pixels across the long edge |
| Compression | Uncompressed | Lossless compression | Lossless compression | Lossless compression |

Photographs

| Specification | Preservation and Access Master | Print Access | Screen Access | Thumbnail |
| --- | --- | --- | --- | --- |
| File format | TIFF | JPEG or PNG | JPEG or PNG | JPEG or PNG |
| Resolution | 300-600 dpi | 150-300 dpi | 150 dpi | 150 dpi |
| Bit depth | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale |
| Dimensions | 3000-6000 pixels across the long edge | 3000 pixels across the long edge | 800 pixels across the long edge | 200 pixels across the long edge |
| Compression | Uncompressed | Lossless compression | Lossless compression | Lossless compression |
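
The long-edge targets in the table above can be turned into concrete derivative sizes by scaling the master proportionally. The following Python sketch is illustrative only: the function name `scale_to_long_edge` and the 6000 x 4000 pixel master are assumptions made for the example, not part of the toolkit.

```python
from typing import Tuple


def scale_to_long_edge(width_px: int, height_px: int, target_long_edge: int) -> Tuple[int, int]:
    """Scale dimensions proportionally so the longer side equals target_long_edge."""
    long_edge = max(width_px, height_px)
    new_width = round(width_px * target_long_edge / long_edge)
    new_height = round(height_px * target_long_edge / long_edge)
    return new_width, new_height


# Hypothetical 6000 x 4000 pixel preservation master (assumed for illustration).
master = (6000, 4000)

# Long-edge targets taken from the table: 3000, 800 and 200 pixels.
for label, target in [("Print Access", 3000), ("Screen Access", 800), ("Thumbnail", 200)]:
    print(label, scale_to_long_edge(master[0], master[1], target))
```

Scaling from the long edge keeps the aspect ratio of the original, so a portrait and a landscape scan of the same item produce derivatives of matching size.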

Film, negatives, and slides

| Specification | Preservation and Access Master | Print Access | Screen Access | Thumbnail |
| --- | --- | --- | --- | --- |
| File format | TIFF | JPEG or PNG | JPEG or PNG | JPEG or PNG |
| Resolution | 800-1200 dpi | 150-300 dpi | 150 dpi | 150 dpi |
| Bit depth | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale |
| Dimensions | 4000-6000 pixels across the long edge | 3000 pixels across the long edge | 800 pixels across the long edge | 200 pixels across the long edge |
| Compression | Uncompressed | Lossless compression | Lossless compression | Lossless compression |

Graphic art

| Specification | Preservation and Access Master | Print Access | Screen Access | Thumbnail |
| --- | --- | --- | --- | --- |
| File format | TIFF | JPEG or PNG | JPEG or PNG | JPEG or PNG |
| Resolution | 600-800 dpi | 150-600 dpi | 150 dpi | 150 dpi |
| Bit depth | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale |
| Dimensions | 6000-8000 pixels across the long edge | 6000 pixels across the long edge | 800 pixels across the long edge | 200 pixels across the long edge |
| Compression | Uncompressed | Lossless compression | Lossless compression | Lossless compression |

Maps

| Specification | Preservation and Access Master | Print Access | Screen Access | Thumbnail |
| --- | --- | --- | --- | --- |
| File format | TIFF | JPEG or PNG | JPEG or PNG | JPEG or PNG |
| Resolution | Less than 36 inches on the long edge: 600 dpi. Greater than 36 inches on the long edge: 300-400 dpi | Less than 36 inches on the long edge: 300 dpi. Greater than 36 inches on the long edge: 150 dpi | 150 dpi | 150 dpi |
| Bit depth | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale | 24-bit RGB colour or 8-bit grayscale |
| Dimensions | 6000-8000 pixels across the long edge | 6000 pixels across the long edge | 1078 pixels across the long edge | 200 pixels across the long edge |
| Compression | Uncompressed | Lossless compression | Lossless compression | Lossless compression |
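
The size-dependent resolution rule for maps can be expressed as a small lookup. This is a minimal Python sketch: the function names are hypothetical, and because the table does not say how to treat a map whose long edge is exactly 36 inches, the sketch assumes such a map is handled as large format.

```python
from typing import Tuple

LARGE_FORMAT_THRESHOLD_IN = 36  # long-edge threshold from the maps table


def map_master_dpi(long_edge_in: float) -> Tuple[int, int]:
    """Return the (minimum, maximum) dpi for the Preservation and Access Master."""
    if long_edge_in < LARGE_FORMAT_THRESHOLD_IN:
        return (600, 600)
    # Assumption: exactly 36 inches is treated like the "greater than" case.
    return (300, 400)


def map_print_dpi(long_edge_in: float) -> int:
    """Return the dpi for the Print Access copy."""
    if long_edge_in < LARGE_FORMAT_THRESHOLD_IN:
        return 300
    # Assumption: exactly 36 inches is treated like the "greater than" case.
    return 150


print(map_master_dpi(24), map_print_dpi(24))
print(map_master_dpi(48), map_print_dpi(48))
```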

Audio recordings

| Specification | Preservation and Access Master | Screen Access |
| --- | --- | --- |
| File format | WAV, BWF or AIF (Apple) | MP3 |
| Sample rate | Spoken language: 44.1 kHz. Music and ambient sounds: 96 kHz | 44.1 kHz |
| Bit depth | 24 bit | 16 bit |
| Comments | Highest recommended current quality; standard for DVD/HD audio; requires conversion to 16 bit and 44.1 kHz for most consumer audio devices | Lowest frequency range acceptable; maximizes storage space; may not provide sufficient quality for future formats |

Video recordings

Preservation and Access Master[1]

| Setting | Value |
| --- | --- |
| File format | QuickTime .mov |
| Codec | UYVY |
| Bit depth | 10 bit |
| Frame size width | 720 pixels |
| Frame size height | 576 pixels |
| Frame rate | 25 frames per second |
| Frame type | Progressive |
| Frame aspect ratio | 4:3 |
| Pixel aspect ratio | 1:1 |
| Colour space | YCrCb |
| Chroma sub sampling | 4:2:2 |
| Audio component | Uncompressed stereo audio |
| Compressor | Uncompressed PCM |
| Bit depth | 16 bit / 24 bit |
| Sample rate | 48 kHz |
| Number of channels | 2 |
| Audio interleave | 1 sec |
| File size | 93 GB/hour (approx.) |

Screen Access[2]

| Setting | Value |
| --- | --- |
| File format | .mov |
| Codec | QuickTime H.264 |
| Frame size width | 640 |
| Frame size height | 360 |
| Pixel aspect ratio | Square |
| Frame rate | 23.976 |
| Field output | Progressive |
| Pixel depth | 24 |
| Spatial quality | 75 |
| Min. spatial quality | 25 |
| Key frame interval | 30 |
| Temporal quality | 50 |
| Min. temporal quality | 25 |
| Average data rate | 1.331 Mbps |
| Maximum data rate | 1.331 Mbps |
| Audio encoder | AAC, Stereo (L R), 48.000 kHz |
| File size | 599.04 MB/hour of source |
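
The file sizes quoted for video can be checked against the other settings. The Python sketch below is a rough estimate under stated assumptions: 10-bit 4:2:2 video is taken to average 20 bits per pixel (one luma sample per pixel plus two chroma samples shared between every two pixels), and container overhead, codec padding and the audio track are ignored. The function names are hypothetical.

```python
def uncompressed_video_bytes_per_hour(width_px: int, height_px: int,
                                      fps: int, bits_per_pixel: int) -> int:
    """Estimate one hour of uncompressed video, picture data only."""
    bits_per_second = width_px * height_px * fps * bits_per_pixel
    return bits_per_second * 3600 // 8


def bytes_per_hour_at_bitrate(kilobits_per_second: int) -> int:
    """Estimate one hour of video encoded at a constant data rate."""
    return kilobits_per_second * 1000 * 3600 // 8


# Preservation and Access Master: 720 x 576, 25 fps, 10-bit 4:2:2 (assumed 20 bits/pixel).
master = uncompressed_video_bytes_per_hour(720, 576, 25, 20)
print(master / 1e9)  # 93.312 (GB per hour)

# Screen Access: 1.331 Mbps average data rate, written as 1331 kbps.
screen = bytes_per_hour_at_bitrate(1331)
print(screen / 1e6)  # 598.95 (MB per hour)
```

The first result agrees with the approximate 93 GB/hour in the table. The second comes out slightly below the quoted 599.04 MB/hour, which would be consistent with the data rate having been rounded to three decimal places, although the source does not say so.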

Glossary

dpi stands for dots per inch, a measurement of resolution for a digitized document (the higher the dpi, the more detail is captured from the original). The scanner's dpi setting, multiplied by the physical dimensions of the original in inches, determines the pixel dimensions of the scanned image.
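
The relationship between dpi and pixel size is simple arithmetic: pixels equal inches multiplied by dpi. The Python sketch below illustrates it; the function names and the 8 x 10 inch original are assumptions chosen for the example.

```python
from typing import Tuple


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Return (width_px, height_px) for an original scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_for_long_edge(width_in: float, height_in: float, target_long_edge_px: int) -> float:
    """Return the dpi at which the longer side reaches target_long_edge_px."""
    return target_long_edge_px / max(width_in, height_in)


# Hypothetical 8 x 10 inch photograph scanned at 600 dpi.
print(pixel_dimensions(8, 10, 600))    # (4800, 6000)

# dpi needed for the same photograph to reach 3000 pixels on the long edge.
print(dpi_for_long_edge(8, 10, 3000))  # 300.0
```

Working backwards from a target long-edge dimension is useful for small originals, which may need a higher dpi than the table minimum to reach the recommended pixel dimensions.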

8-bit refers to a method of storing image information in a computer’s memory or in an image file, such that each pixel is represented by one 8-bit byte (256 possible values).

grayscale refers to an image in which the value of each pixel is a single sample composed exclusively of shades of grey.

JPEG stands for Joint Photographic Experts Group and refers to a type of graphics file format commonly used for images, photographs, etc.

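PDF stands for Portable Document Format, a file format created by Adobe and since published as an ISO standard (ISO 32000). PDF/A is the variant of PDF intended for long-term archiving.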

OCR stands for Optical Character Recognition. It is the electronic translation of scanned text into machine-encoded text. OCR makes it possible to edit the text, search for a word or phrase, etc.

24-bit RGB refers to 24 bits per pixel in which three 8-bit integers between 0 and 255 represent red, green and blue intensities.
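
Bit depth also determines how much storage an uncompressed master needs: each pixel takes 3 bytes at 24-bit RGB and 1 byte at 8-bit grayscale. The Python sketch below estimates the pixel data only and ignores file headers and metadata. The function name and the 6000 x 4000 pixel image are assumptions for illustration.

```python
def uncompressed_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Estimate the pixel data size of an uncompressed image, excluding headers."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical 6000 x 4000 pixel scan.
colour = uncompressed_image_bytes(6000, 4000, 24)
gray = uncompressed_image_bytes(6000, 4000, 8)

print(colour / 1e6)  # 72.0 (MB, 24-bit RGB)
print(gray / 1e6)    # 24.0 (MB, 8-bit grayscale)
```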

TIFF stands for Tagged Image File Format and refers to a type of file format for storing images.

96 kHz 24-bit refers to the sample rate and bit depth for audio. It means that a 24-bit sample is taken 96,000 times per second.
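
Sample rate and bit depth together set the storage needed for uncompressed audio such as WAV. The Python sketch below estimates the audio data only, without file headers. The function name is hypothetical, and the channel counts (stereo for music, mono for speech) are assumptions, since the audio table does not specify them.

```python
def uncompressed_audio_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Estimate one hour of uncompressed PCM audio, excluding file headers."""
    bytes_per_second = sample_rate_hz * bit_depth * channels // 8
    return bytes_per_second * 3600


# Music and ambient sounds: 96 kHz, 24 bit, assumed stereo.
music = uncompressed_audio_bytes_per_hour(96_000, 24, 2)
print(music / 1e9)   # 2.0736 (GB per hour)

# Spoken language: 44.1 kHz, 24 bit, assumed mono.
speech = uncompressed_audio_bytes_per_hour(44_100, 24, 1)
print(speech / 1e6)  # 476.28 (MB per hour)
```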

BWF stands for Broadcast Wave Format. It is a standard used by the broadcast industry whereby metadata can be added to Wave files.

WAV stands for Waveform Audio File Format. It is an audio file format standard for storing an audio bitstream.

MP3 is a digital audio encoding format using a form of lossy data compression.

References

  1. To be a true preservation master, video should be uncompressed. However, in a raw, uncompressed state, 1 minute of video uses up to 1 GB of storage. Another file format (codec with wrapper) suitable for preservation is JPEG 2000 with the MXF wrapper. JPEG 2000 offers lossless compression and reduces the file size by a ratio of 3:1.
  2. These standards are applicable to born digital video as well. Born digital video captured at these standards can then serve as the Preservation and Access Masters.