File Formats

Raw

The value of raw files in the process of Cultural Heritage Digitization is enormous. As described in the Standard Raw Digitization Workflow, it can significantly increase productivity compared to direct-to-TIFF workflows. However, the raw itself is not generally considered to be a Preservation Digital Object. No two raw processing programs will render a particular raw file identically, and even different versions/generations of the same raw processing program will render a particular file with subtle, but non-trivial, differences.

There is often value in maintaining a short- to mid-term archive of the raw files used to create the PDOs in a collection. For instance, between Capture One v3, v4, v6, v7, and v8, there were significant and continuous improvements in the amount of real detail that could be extracted from a given raw file. Those improvements came from better algorithms in the software, and they yielded tangible increases in numerically measured sampling efficiency and perceived visual fidelity. These benefits could only be seen when reprocessing the original raw file, so institutions that only had processed derivatives like TIFFs missed out on this gain in quality.

Given these improvements in raw processing, it makes sense to retain raw files for months or even years. However, a raw file does not stand alone: specific software is required to convert it into a useful image, and the same raw file opened in different (or future) software will render differently. Over decades and centuries, therefore, the raw file should not be considered a preservation object.

There is also some value in keeping the raw files in the short and mid term as a low-storage-cost emergency backup of the PDO TIFF/JPG2000 files.

“In 2014, six months into a massively expanded digitization effort, we had a significant failure in our primary storage system. Since the expanded digitization program was just ramping up and our off-site emergency backup solution had not yet been finalized, some portion of our work was lost. Fortunately our IT department had provided us with separate storage for a raw file archive; I insisted we carry over the film-era practice of keeping your negatives. Reprocessing these raw files to TIFF meant we did not lose any work. It really saved our bacon.”

– Brad Flowers, Dallas Museum of Art

DNG

Despite misconceptions to the contrary, DNG is NOT a preservation format. It is simply a wrapper for raw files, one that conforms to guidelines provided, and constantly revised, by Adobe. Just as a “normal” raw file should not be considered a PDO, neither should a DNG. With either a “normal” raw file or a DNG-wrapped raw file, the final image depends on both the file and the software (and version thereof) used to open it. A DNG wrapper improves the likelihood that a file can easily be “opened” but does not guarantee that the image it contains will look the same as when it was created.

TIFF

The gold standard of preservation image formats, the Tagged Image File Format or TIFF (aka .TIF or .tif) file is notable for its exceptionally wide adoption, fully documented and transparent format, versatility, and simplicity. It seems likely that commercially available solutions will be available indefinitely to read TIFFs, but with digital technologies, the commonplace can become obscure history with startling speed.

The simplicity of the format itself distinguishes TIFF from other formats more than any other attribute. A programmer with limited experience could create a fully-featured TIFF reader using nothing but the 121-page document outlining its specification; assuming the collection is held as standard TIFFs with LZW compression, this might take an entry-level programmer one day of work to complete. Thus this format can be easily integrated into future systems on new platforms and provide a consistent rendition of the underlying image well into the unknown future.
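To give a sense of that simplicity, here is a minimal sketch in Python of the first step any TIFF reader performs: checking the 8-byte header and listing the entries of the first image file directory (IFD). The byte layout follows the TIFF 6.0 specification; the function name `read_first_ifd` is illustrative, and the sketch decodes only single SHORT and LONG values stored inline. It is a starting point under those assumptions, not a complete reader.

```python
import struct

# Field types from the TIFF 6.0 specification that this sketch decodes.
SHORT, LONG = 3, 4


def read_first_ifd(data):
    """Parse the TIFF header and return {tag: value} for the first IFD.

    Only single SHORT and LONG values stored inline are decoded; every
    other entry maps to None. The function name is illustrative.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian byte order
    else:
        raise ValueError("not a TIFF: unknown byte-order mark")

    # Bytes 2-3 hold the magic number 42; bytes 4-7 hold the first IFD offset.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: magic number is not 42")

    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if count == 1 and field_type == SHORT:
            (value,) = struct.unpack(endian + "H", value_field[:2])
        elif count == 1 and field_type == LONG:
            (value,) = struct.unpack(endian + "I", value_field)
        else:
            value = None  # stored at an offset, or a type not decoded here
        tags[tag] = value
    return tags
```

In the returned dictionary, tag 256 is ImageWidth, tag 257 is ImageLength, and tag 259 is Compression (where 5 indicates LZW). A full reader would go on to locate the image strips and decompress them, which is where most of the remaining work lies.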

JPG2000

Years ago, the JPG2000 format was created to be the successor to TIFF for preservation applications. It provides for several major variations, including lossless and lossy wavelet compression. However, widespread adoption of JPG2000 has not materialized, and the slow adoption rate warrants concern about its utility as a long-term archival format. It is more commonly used as a service/presentation format than as the format for a PDO.

Additional Formats and Considerations: LOC

The website run by the Library of Congress, www.digitalpreservation.gov, lists seven primary factors used to determine the suitability of a file format for preservation: Disclosure, Adoption, Transparency, Self-documentation, External Dependencies, Impact of Patents, and Technical Protection Mechanisms. Reports are provided for many image formats under these considerations (e.g. TIFF Report).
