Library:Research Data/Format

From UBC Wiki

A file format is a way of encoding information within a computer file so that an application can recognize and access it. The format is usually indicated by the file name extension (generally a full stop followed by three or four letters), which lets the computer recognize, for example, that a document contains text or that a file should be processed as a video. The choice of format also matters because it can affect whether the file contents remain accessible after long-term storage.
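As a rough illustration of how software maps an extension to a kind of content, the sketch below uses Python's standard mimetypes module. The function name describe and the sample file names are hypothetical, and the guess is based on the file name alone, not on the file contents.

```python
import mimetypes

def describe(filename):
    """Guess the media type from the file name extension alone."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for the type when the extension is not recognized
    return media_type or "unknown"

# Hypothetical file names, used only for illustration
for name in ["notes.txt", "figure.png", "report.pdf"]:
    print(name, "->", describe(name))
```

Because only the name is inspected, a renamed or mislabeled file would be misidentified, which is one reason to keep extensions accurate.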

Considerations

Proprietary and non-proprietary (open) formats

Proprietary formats are restricted by software patents, unpublished format specifications, or built-in encryption that prevents open use by the public. As a result, using a proprietary format typically requires specific software from a single vendor. In contrast, an open format is a file format whose specification is freely available for everyone to use. Because the specifications are published, open-source developers can write software that reads the format even if a particular vendor no longer supports it. This reduces the risk that technological developments make files in that format unreadable.


Industry format adoption

In some cases, an industry may treat specific file formats as a de facto standard even if the formats are proprietary and rely on expensive software. In those cases, it may be more convenient to use the same proprietary file format.


Technical dependencies

Technical dependency is the degree to which a particular format relies on particular hardware, operating systems, or software, and how that reliance might affect future use of the media. Using non-proprietary file formats may decrease the risk of technical obsolescence by reducing the dependency on the underlying technology.


File quality and file size

Each type of content, such as text, images, or sound, has many file formats available. File quality, meaning how faithfully the format represents the given item’s characteristics, is a large part of the file format decision. Formats that preserve higher resolution or fidelity generally produce larger files than lower-quality formats. The trade-off for higher quality is therefore more storage space and less convenience when disseminating the file to others.
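The size side of this trade-off can be estimated with simple arithmetic. The sketch below computes approximate sizes for uncompressed image and audio data; the function names and default values (24 bits per pixel, 44,100 samples per second, 16 bits per sample, two channels) are illustrative assumptions, and real files also carry headers and padding that are ignored here.

```python
def uncompressed_image_bytes(width, height, bits_per_pixel=24):
    """Approximate size of raw pixel data, ignoring headers and row padding."""
    return width * height * bits_per_pixel // 8

def uncompressed_audio_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=2):
    """Approximate size of raw PCM audio, ignoring container overhead."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

# A 4000 x 3000 pixel image at 24 bits per pixel
print(uncompressed_image_bytes(4000, 3000))  # 36000000 bytes, about 36 MB

# One minute of 16-bit stereo audio at 44.1 kHz
print(uncompressed_audio_bytes(60))  # 10584000 bytes, about 10.6 MB
```

Compressed formats such as JPG or MP3 store the same content in far less space by discarding some detail, while formats such as PNG or FLAC compress without loss but usually save less space.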


Recommended File Formats

  • Databases: XML, CSV
  • E-Books: EPUB
  • Images: JPG, PNG, PDF, TIFF, BMP
  • Sound: MP3, FLAC
  • Spreadsheets: CSV
  • Text: TXT, CSV, PDF/A, ASCII, UTF-8
  • Video: MPG, MOV, AVI
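
One way to apply the list above is to check file names against it before depositing data. The following is a minimal sketch, not an official tool: the mapping from format names to extensions is an assumption (for example, PDF/A files normally use the .pdf extension, so the extension alone cannot confirm PDF/A compliance), and ASCII and UTF-8 are omitted because they are text encodings rather than extensions. The names RECOMMENDED and recommended_categories are hypothetical.

```python
from pathlib import Path

# Assumed extension mapping for the recommended formats listed above
RECOMMENDED = {
    "Databases": {".xml", ".csv"},
    "E-Books": {".epub"},
    "Images": {".jpg", ".png", ".pdf", ".tiff", ".bmp"},
    "Sound": {".mp3", ".flac"},
    "Spreadsheets": {".csv"},
    "Text": {".txt", ".csv", ".pdf"},
    "Video": {".mpg", ".mov", ".avi"},
}

def recommended_categories(filename):
    """Return the categories whose recommended formats match the extension."""
    ext = Path(filename).suffix.lower()
    return sorted(cat for cat, exts in RECOMMENDED.items() if ext in exts)

print(recommended_categories("survey_results.CSV"))
print(recommended_categories("thesis.docx"))
```

An empty result means the extension is not on the list, which may be a prompt to export a copy in one of the recommended formats.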

For more information:

Library of Congress. (2013, July 24). Sustainability of Digital Formats: Planning for Library of Congress Collections. Retrieved from http://www.digitalpreservation.gov/formats/

Virginia Tech. (2013, August 27). Recommended File Formats. Retrieved from http://etd.vt.edu/howto/accept.html