This document identifies the file formats that Library and Archives Canada (LAC) will be supporting within the Trusted Digital Repository (TDR). The formats are identified as either “recommended” or “acceptable for transfer”.
“Recommended” formats are those that LAC believes will be sustainable over a long period of time. Formats considered “acceptable for transfer” are those that LAC considers most representative of the formats in widespread use in the collections that it will be preserving in the TDR (e.g., the formats most commonly used in digital publications and Government of Canada (GoC) electronic records).
The list of file formats to be supported will evolve over time, particularly as new formats are introduced or older formats become obsolete. For any given collection submitted for preservation within LAC’s TDR, file formats that are neither “recommended” nor “acceptable for transfer” will be evaluated on the basis of their content: where the content is deemed of preservation value, it will be normalized/migrated to a “recommended” preservation format1.
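The handling rule described above (and in footnote 1) can be sketched in a few lines of Python. This is only an illustrative sketch, not LAC's implementation: the names `Action`, `RECOMMENDED`, `ACCEPTABLE_FOR_TRANSFER` and `triage` are hypothetical, the three recommended formats are a small sample taken from Table 4, and the "acceptable for transfer" entry is a placeholder rather than a real format.

```python
from enum import Enum


class Action(Enum):
    """Possible handling of a submitted file, per the rule sketched above."""
    KEEP_IN_RECOMMENDED_FORMAT = "already in a recommended preservation format"
    NORMALIZE_AUTOMATICALLY = "normalize/migrate automatically to a recommended format"
    EVALUATE_CASE_BY_CASE = "evaluate the content on an individual case basis"


# Hypothetical, illustrative lists only; the authoritative lists are in Table 3.
RECOMMENDED = {"PDF/A", "TIFF", "BWF"}
ACCEPTABLE_FOR_TRANSFER = {"EXAMPLE-TRANSFER-FORMAT"}  # placeholder, not a real format


def triage(format_name: str) -> Action:
    """Return the handling that applies to a file of the given format."""
    if format_name in RECOMMENDED:
        return Action.KEEP_IN_RECOMMENDED_FORMAT
    if format_name in ACCEPTABLE_FOR_TRANSFER:
        return Action.NORMALIZE_AUTOMATICALLY
    # Any other format: content is assessed for preservation value first.
    return Action.EVALUATE_CASE_BY_CASE


if __name__ == "__main__":
    for name in ("TIFF", "EXAMPLE-TRANSFER-FORMAT", "SOME-OTHER-FORMAT"):
        print(f"{name}: {triage(name).value}")
```

The sketch deliberately ignores exceptions such as the geospatial case in footnote 11, where some "acceptable for transfer" formats are preserved as is for the time being.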
1.2.1 Preserving digital information
Canadians have been generating digital information for decades. Our books, music, movies and the records of our private and public organizations are increasingly being created in digital formats. The preservation of this digital information is a problem that touches all sectors – academic, government, private and non-profit – and ultimately all Canadians.
By its very nature, digital information is fragile. Digital bits can be preserved, but our ability to use the information is at risk if the computer hardware and software needed to interpret/render the information are no longer available, or the format specifications are not accessible (e.g., the format is proprietary, is subject to intellectual property rights, or the specifications are no longer available). Preserving digital information is complicated. It involves the active commitment of organizations, the development of appropriate policies and plans, and the implementation of sound practices. It requires all organizations with an interest in preserving digital information to share expertise, advice and best practices.
Among these best practices, the identification and use of appropriate file formats is critical for preserving digital information. Due to a mix of technical and practical issues, certain file formats are more suitable for digital preservation. This document identifies and describes digital formats which LAC is recommending for long-term preservation and access to digital information.
These recommendations are contextualized within LAC’s Digital Preservation Policy2 and the development of LAC’s TDR. The TDR is LAC’s digital preservation infrastructure supporting secure acquisition, storage, management and continuing access to Canada’s digital memory.
1.2.2 Digital content preservation strategy
LAC has adopted the following strategy for preserving digital content:
LAC has developed these guidelines for a broad audience including the public, academic and private sectors. Whether it is a government department producing a budget or a citizen self-publishing, this document is intended to provide guidance on which digital file formats are most suitable for preservation and long-term access.
These guidelines also serve as the policy foundation for LAC’s Local Digital Format Registry (LDFR), the underpinning set of guidelines for file format normalization/migration services within LAC’s TDR.
These guidelines and recommendations are concerned with media-independent content; that is, digital content that is managed as file types and is not inextricably linked to a physical storage medium (in contrast to videotape, which depends on both the physical carrier and the playback equipment). These guidelines do not address recommendations for physical preservation media4.
The file formats covered in this document have been clustered into the following content types:
This document consists of file format recommendations based on LAC’s experience in collecting and preserving digital content as well as international best practices.
1.5 Summary of recommendations
1.5.1 Definition of file formats
Generally speaking, file formats are specific patterns or structures which organize and define data. Some formats contain only one ‘stream’ of uncompressed data, others may contain codecs to encode and compress the data5, and others still may support several ‘streams’ of media.
In addition to file formats, there are also ‘container’ or ‘encapsulating’ formats. These formats can contain and support various types or layers of audio, video, still imagery, and their associated metadata. Each of these formats may be handled by different programs, processes, or hardware; but for the multimedia data stream to be interpreted properly, the information must be encapsulated together. The Library of Congress defines three types of container formats:
For further information on formats, see the working definition6 on the Library of Congress Web site on Sustainability of Digital Formats.
There are thousands of file types now in existence: LAC’s guidelines specify only the file formats that will be supported in the TDR. For a more complete registry please refer to PRONOM7, the Unified Digital Format Registry8 or the Library of Congress Web site on Sustainability of Digital Formats9.
1.5.2 Evaluating the sustainability of file formats
In developing these guidelines, LAC has attempted to balance the requirements for quality, stability, potential longevity and industry acceptance. Where possible, preference has been given to non-proprietary national and international standards or, where no non-proprietary standard is available, to de facto standard industry formats. De facto standard formats are widely used and recognized formats that have become industry standards because of their ubiquitous use and support, and not because they have been formally approved by a standards organization. LAC has also reserved the right to select formats that it believes will become more widely adopted by the preservation community in the near future (e.g., SIARD).
Based on a review of criteria published by Library of Congress, the National Archives (UK), and the National Library of the Netherlands10, Library and Archives Canada has established the following criteria for evaluating file formats for long-term preservation and access.
Table 1, below, summarizes the evaluation scheme used, whereas Table 2, following, provides a definition for each evaluation criterion along with the rating to be assigned based on the degree to which the criterion has been met.
| Rating symbol | Description |
|---|---|
| √ | Evaluation criterion fully met |
| √$ | Evaluation criterion fully met, however a cost is associated with meeting the criterion (e.g., to acquire the specification) |
| * | Evaluation criterion partially met |
| x | Evaluation criterion not met |
| √/x | Evaluation criterion met in one sector (e.g., for Government of Canada content) but not met in another sector (e.g., for non-government / commercial content) |
| √/* | Evaluation criterion met in one sector (e.g., for Government of Canada content) but not met / partially met in another sector (e.g., for non-government / commercial content) |
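For readers who want to record these ratings in software, the symbols in Table 1 map naturally onto an enumeration. The following is a minimal sketch under that assumption; the names `Rating` and `parse_rating` are hypothetical and are not part of LAC's guidelines or tooling.

```python
from enum import Enum


class Rating(Enum):
    """Rating symbols from Table 1 (member names are illustrative)."""
    FULLY_MET = "√"
    FULLY_MET_AT_COST = "√$"
    PARTIALLY_MET = "*"
    NOT_MET = "x"
    MET_IN_ONE_SECTOR_NOT_IN_ANOTHER = "√/x"
    MET_IN_ONE_SECTOR_PARTIALLY_IN_ANOTHER = "√/*"


def parse_rating(symbol: str) -> Rating:
    """Convert a symbol as printed in the tables into a Rating.

    Raises ValueError for any symbol that Table 1 does not define.
    """
    return Rating(symbol.strip())


if __name__ == "__main__":
    print(parse_rating("√$").name)
```

Keeping the symbol as the enum value means a table cell can be validated on input, and an unknown symbol fails loudly instead of being silently accepted.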
1.5.3 File format recommendations
Table 3, following, summarizes the file formats that LAC recommends for the preservation of and long-term access to digital content, and also identifies the file formats that are acceptable for the transfer of digital content to LAC.
Please note that there is no implied migration path from the “acceptable for transfer” formats to the “recommended” preservation formats. The selection of a preservation format will be based on the degree to which the significant properties of the source format (and of individual instances of the format) are retained in the target preservation format, and on the relative importance (or weighting) of specific properties.
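One way to picture this selection is as a weighted score over significant properties. The guidelines do not prescribe a formula, so the sketch below is purely an assumption for illustration: the function `retention_score`, the property names, the weights and the retention values are all hypothetical.

```python
from typing import Mapping


def retention_score(weights: Mapping[str, float],
                    retained: Mapping[str, float]) -> float:
    """Weighted share of significant properties retained by a target format.

    weights  -- relative importance of each significant property (> 0 overall)
    retained -- degree (0.0 to 1.0) to which the target format keeps each
                property; properties missing from this mapping count as 0.0
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one property must have a positive weight")
    for prop, degree in retained.items():
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"retention for {prop!r} must be between 0 and 1")
    kept = sum(w * retained.get(prop, 0.0) for prop, w in weights.items())
    return kept / total_weight


if __name__ == "__main__":
    # Hypothetical example: textual content matters three times as much as layout.
    weights = {"textual content": 3.0, "page layout": 1.0}
    candidate = {"textual content": 1.0, "page layout": 0.5}
    print(retention_score(weights, candidate))  # (3*1.0 + 1*0.5) / 4 = 0.875
```

Comparing this score across candidate target formats would give one possible ranking; in practice the choice would also depend on the individual instances of the source format, as noted above.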
Table 4 summarizes the ratings of LAC’s recommended file formats against the criteria identified in Section 1.5.2, whereas Appendix A – Recommended Preservation Format Evaluation provides detailed rating information. Please note that there is no implied order of preference / precedence in the list of formats.
Appendix B – Applying the Guidelines to LAC Preservation Policies graphically demonstrates the mapping of the recommended preservation formats to LAC’s preservation strategy (outlined in Section 1.2.2).
| Criterion | Evaluation Basis | Rating |
|---|---|---|
| Openness/Transparency | Specifications available from one or more of the following: a) Open membership organization (such as the W3C (World Wide Web Consortium), the OMG (Object Management Group)) | √ Evaluation criterion fully met |
| | Specifications available only at cost | √$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
| | Specifications potentially available from multiple sources (could not be confirmed) | * Evaluation criterion partially met |
| | Specifications only available from / under the control of a single vendor or small group of vendors | x Evaluation criterion not met |
| Adoption as a preservation standard | The majority of the organizations investigated use/are planning to use the format as a preservation standard (50% or more of the organizations) | √ Evaluation criterion fully met |
| | Some of the organizations investigated use/are planning to use the format as a preservation standard (less than 50% of the organizations) | * Evaluation criterion partially met |
| | None of the organizations investigated use/are planning to use the format as a preservation standard | x Evaluation criterion not met |
| Stability/Compatibility: a) degree of forward/backward compatibility | A format is backward compatible if it provides all of the functionality of a previous release or version of the format. A format is forward compatible if it has the ability to gracefully accept content intended for later versions of the format (that is, software designed to interpret / render a prior version of a format can also interpret / render the current version of the format). Forward/backward compatibility: | |
| | a) High compatibility: A format is both forward and backward compatible | √ Evaluation criterion fully met |
| | b) Medium compatibility: A format is backward compatible only | * Evaluation criterion partially met |
| | c) Low compatibility: A format is neither forward nor backward compatible | x Evaluation criterion not met |
| Stability/Compatibility: b) degree of protection against file corruption | Corruption protection: Resilience to random bit-level/byte-level changes in content | |
| | a) High resilience: Changes have little or no impact to renderability/interpretability / uses methods for detecting/recovering from changes | √ Evaluation criterion fully met |
| | b) Medium resilience: Changes affect renderability but not interpretability / some ability to recover from changes | * Evaluation criterion partially met |
| | c) Low resilience: Any change affects the ability to interpret and render the format | x Evaluation criterion not met |
| Stability/Compatibility: c) frequency of version releases | Format stability demonstrated by the number of version releases and/or extensions; format’s use in derivatives and/or industry-specific applications | |
| | High format stability | √ Evaluation criterion fully met |
| | Medium format stability | * Evaluation criterion partially met |
| | Low format stability | x Evaluation criterion not met |
| Dependencies/Interoperability | Low dependency / High interoperability | √ Evaluation criterion fully met |
| | Low dependency / Low interoperability | * Evaluation criterion partially met |
| | High dependency / Low interoperability | x Evaluation criterion not met |
| Standardization | Format follows a formal process enacted by any of the following: a) Open membership organization (such as the W3C (World Wide Web Consortium), the OMG (Object Management Group)) | √ Evaluation criterion fully met |
| | Format is subject to documented processes implemented by a single vendor or small group of vendors or no documented process | x Evaluation criterion not met |
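Two of the criteria in Table 2 are stated precisely enough to be expressed as small functions: the 50% threshold for adoption and the forward/backward compatibility levels. The sketch below is illustrative only; the function names are hypothetical, and the forward-only case is rejected because Table 2 does not assign it a rating.

```python
def adoption_rating(organizations_using: int, organizations_investigated: int) -> str:
    """Rate adoption as a preservation standard using the Table 2 thresholds."""
    if organizations_investigated <= 0:
        raise ValueError("at least one organization must have been investigated")
    if not 0 <= organizations_using <= organizations_investigated:
        raise ValueError("organizations_using must be between 0 and the total")
    if organizations_using == 0:
        return "x"   # none of the organizations use or plan to use the format
    if 2 * organizations_using >= organizations_investigated:
        return "√"   # 50% or more of the organizations
    return "*"       # some, but less than 50%


def compatibility_rating(backward_compatible: bool, forward_compatible: bool) -> str:
    """Rate forward/backward compatibility using the Table 2 levels."""
    if backward_compatible and forward_compatible:
        return "√"   # high compatibility
    if backward_compatible:
        return "*"   # medium compatibility: backward compatible only
    if not forward_compatible:
        return "x"   # low compatibility: neither forward nor backward
    # Forward compatible only: Table 2 defines no rating for this case.
    raise ValueError("forward-only compatibility is not rated in Table 2")


if __name__ == "__main__":
    print(adoption_rating(5, 10), adoption_rating(4, 10), adoption_rating(0, 10))
    print(compatibility_rating(True, True), compatibility_rating(True, False))
```

The integer comparison `2 * organizations_using >= organizations_investigated` avoids floating-point rounding at exactly 50%.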
| Content Type | Recommended | Acceptable for transfer |
|---|---|---|
| Text | | |
| Audio | | |
| Digital Video | | |
| Still Images | | |
| Web Archiving | | |
| Structured Data – Databases | | |
| Structured Data – Statistical and Qualitative Analysis | | |
| Structured Data – Scientific | | |
| Geospatial11 | | |
| Computer Aided Design – Technical Drawing | | |
| Computer Aided Design – CASE | | |
| Source Code and Scripts | | |
| Content Type | Format | Openness / Transparency | Adoption | Stability / Compatibility: Forward/Backward Compatibility | Stability / Compatibility: Corruption Protection | Stability / Compatibility: Release Stability | Dependencies / Interoperability | Standardization |
|---|---|---|---|---|---|---|---|---|
Text |
EPUB (underlying standard for eBooks) |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|
Extensible Markup Language (XML) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Extensible HyperText Markup Language (XHTML) |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
HyperText Markup Language (HTML) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Multipurpose Internet Mail Extensions (MIME) |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Open Document Format (ODF) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
PDF for long-term preservation: PDF-Archive (PDF/A) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Rich Text Format (RTF) |
x Evaluation criterion not met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
x Evaluation criterion not met |
x Evaluation criterion not met |
||
Standard Generalized Markup Language (SGML) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Text (TXT) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Audio |
Broadcast Wave Format (BWF) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|
Digital Video |
JPEG 2000 MXF (MOTION JPEG 2000) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
Still Images |
Joint Photographic Experts Group (JPEG) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Joint Photographic Experts Group JPEG2000 (JP2) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|||
Portable Network Graphics (PNG) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|||
Tagged Image File Format (TIFF) |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|||
TIFF - GeoTIFF |
√ Evaluation criterion fully met |
x Evaluation criterion not met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Structured Data - Database |
Software Independent Archiving of Relational Databases (SIARD) |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
||
Delimited Flat File with Data Description |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Structured Data - Statistical and Qualitative Analysis Data |
Data Documentation Initiative (DDI) Version 3.0 |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Data Exchange and Conversion Utilities and Tools (DExT) |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
|||
Statistical Data and Metadata Exchange (SDMX) |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Delimited Flat File with Variable Description |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
* Evaluation criterion partially met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Structured Data - Scientific Data |
Not applicable at this time |
|||||||
Geospatial Data |
ISO 19115 Geographic Information – Metadata (NAP – Metadata) (North American Profile) |
√$ Evaluation criterion fully met, however a cost is associated with meeting the criterion |
√ Evaluation criterion fully met GoC /n.a. |
√ Evaluation criterion fully met |
||||
Computer-Aided Design (CAD) – Technical Drawings |
Drawing Interchange File Format (DXF) |
x Evaluation criterion not met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
* Evaluation criterion partially met |
|||
Computer-Aided Design (CAD) – CASE |
XML Metadata Interchange (XMI) |
√ Evaluation criterion fully met |
x Evaluation criterion not met |
* Evaluation criterion partially met |
√ Evaluation criterion fully met |
√ Evaluation criterion fully met |
||
Source Code and Scripts |
Not applicable at this time |
|||||||
1 Note: Within the TDR, automatic normalization will be performed on the “acceptable for transfer” formats identified in the guidelines (conversion or migration to a “recommended” format): all other formats will be addressed on an individual case basis. Should the format prove to be a commonly used format, automated normalization/migration will be considered for future submissions.
2 www.collectionscanada.gc.ca/digital-initiatives/012018-2000.01-e.html
3 A service copy may be created as part of the acceptance/approval process or may be produced dynamically.
4 A policy addressing storage media for use in preservation is currently under development.
5 Please see Appendix C: Concepts and Definitions - Codecs.
6 www.digitalpreservation.gov/formats/intro/format_eval_rel.shtml#what
7 www.nationalarchives.gov.uk/pronom/
9 www.digitalpreservation.gov/formats/content/content_categories.shtml
10 See Gillesse et al. 2008; Rauch, Carl et al. "File-Formats for Preservation: Evaluating the Long-Term Stability of File-Formats." Proceedings ELPUB2007 Conference on Electronic Publishing: Vienna, Austria, 2007. http://elpub.scix.net/data/works/att/122_elpub2007.content.pdf; National Archives (UK). "Selecting File Formats for Long-Term Preservation." (2003). www.nationalarchives.gov.uk/documents/selecting_file_formats.rtf; Library of Congress. "Sustainability of Digital Formats: Planning for Library of Congress Collections." (2007). www.digitalpreservation.gov/formats/sustain/sustain.shtml.
11 For geospatial information, the “acceptable for transfer” formats with asterisks will be preserved as is (not migrated) until such time as the adoption rate of the Treasury Board Secretariat (TBS) standard (identifying ISO 19115) and the availability of tools supporting the standard are more fully understood (an exception to the preservation strategy for the near future).