Thursday, April 2, 2015

Digitizing Genealogy -- Understanding and Using Scanned File Formats

When a scanning device or a camera creates a digitized file, that file must be stored on the device in some kind of file format. The most common is the JPEG format. There are, however, dozens of different image file formats. The most common file types include the following:

  • TIFF -- Preferred archival format
  • JPEG -- A lossy file format
  • PNG -- Lossless but compressed
  • GIF -- A compressed file format, not preferred by archivists
  • RAW -- Used by higher end digital cameras
  • BMP -- Microsoft proprietary format
  • PSD -- Used by Photoshop
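
If you are not sure which of these formats a file is really in (the file extension can be wrong), most of them can be recognized by the first few bytes of the file. The following is only a rough sketch in Python, not a complete tool: the names `SIGNATURES`, `sniff_format` and `sniff_file` are made up for this illustration, and RAW is left out because each camera maker uses its own layout (several RAW formats are built on TIFF and would be reported as TIFF here).

```python
from typing import Optional

# Well-known leading byte signatures for the formats listed above.
# RAW is omitted: it varies by camera manufacturer.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"BM", "BMP"),
    (b"8BPS", "PSD"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and guess its image format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


# Example with an in-memory header instead of a real file.
print(sniff_format(b"\xff\xd8\xff\xe0"))
```

To check a real scan, you would call `sniff_file` with the path to the file. A result of `None` simply means the sketch does not recognize the file, not that the file is damaged.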

At this point the post could become highly technical. But the issue for genealogists is simple: when they have a choice, they should save their image files in the most widely accepted archival file format available. Unfortunately, many scanning devices and most digital cameras default to the JPEG format.

My preferred source for information about digital preservation is the Library of Congress. The main reference is a section of the website entitled "Sustainability of Digital Formats: Planning for Library of Congress Collections." I will refer you to the document itself, which is somewhat technical, but the summary of the summary is that TIFF images are preferred over any other type of file format. Most of the other file formats listed above are acceptable, but not preferred. The two file formats the Library of Congress deems unacceptable are the RAW format and the Photoshop file format PSD.

The issue here is sustainability. Will the file format still be in use in the future, and will new computers and devices be able to open images stored in it? The important terms here are "lossy" and "lossless." A lossy file format, such as JPEG, discards some image information (quality, fine detail, etc.) every time the file is saved, so the losses accumulate as the image is edited and re-saved. Lossless file formats, such as TIFF, do not lose information. Compression also involves a trade-off between file size and quality. Lossy compression, which shrinks a file so that it takes less room on a hard disk or other device, necessarily reduces image quality. Lossless compression avoids this loss at the cost of larger files; the JPEG 2000 file format, for example, offers a lossless mode, although you may have some trouble finding a program that supports it.
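
For those who like to see the difference for themselves, here is a small sketch in Python. It is a toy model, not real JPEG or TIFF code: the "image" is just one row of 256 brightness values, lossless compression is stood in for by Python's `zlib` module, and the function `lossy_round_trip` is a made-up simplification that throws away fine brightness detail.

```python
import zlib


def lossless_round_trip(pixels: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a lossless method."""
    return zlib.decompress(zlib.compress(pixels))


def lossy_round_trip(pixels: bytes, step: int = 16) -> bytes:
    """Toy lossy scheme: round each value down to a multiple of `step`."""
    return bytes((value // step) * step for value in pixels)


# Hypothetical one-row grayscale "scan": a smooth gradient of 256 values.
original = bytes(range(256))

restored = lossless_round_trip(original)
degraded = lossy_round_trip(original)

print(restored == original)   # True: every value came back unchanged
print(degraded == original)   # False: detail has been discarded
print(len(set(original)), len(set(degraded)))   # 256 16

# The degraded row compresses smaller, which is the trade-off described above.
print(len(zlib.compress(original)), len(zlib.compress(degraded)))
```

The lossless copy is identical to the original, while the lossy copy keeps only 16 of the 256 brightness levels. The lossy copy takes less space once compressed, but the discarded detail cannot be recovered from it.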

What this boils down to for genealogists is that we need to use the most common and most highly supported file formats available to us and we also need to be aware of the file formats supported by the programs we use to store our data.

There are dozens (hundreds?) of image programs available today. Many of these programs, such as the high-end Adobe Photoshop and the lower-end Photoshop Elements, can save files in a variety of file formats. Many other programs will convert images from one format to another. Just remember, you can't get blood out of a turnip. If your image was created in a lossy file format such as JPEG, subsequently saving it as a TIFF file may keep it from degrading further, but it will not restore what was already lost. The quality of the image is determined at the time the scanner or camera creates it, and changing the file format will not improve that quality. Converting may or may not help with preservation, depending on the circumstances, but the initial file creation is what is important.
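
The turnip rule can also be shown with a short Python sketch. Again, this is a toy model under stated assumptions rather than real image code: `lossy_save` is a made-up stand-in for saving as JPEG, `lossless_save` and `lossless_load` use `zlib` as a stand-in for a lossless format such as TIFF, and the names `jpeg_like` and `tiff_like` are labels for this illustration only.

```python
import zlib


def lossy_save(pixels: bytes, step: int = 32) -> bytes:
    """Toy stand-in for a lossy save: round values down to multiples of `step`."""
    return bytes((value // step) * step for value in pixels)


def lossless_save(pixels: bytes) -> bytes:
    """Toy stand-in for a lossless save: zlib keeps every value."""
    return zlib.compress(pixels)


def lossless_load(blob: bytes) -> bytes:
    """Read back data written by lossless_save."""
    return zlib.decompress(blob)


def max_error(a: bytes, b: bytes) -> int:
    """Largest difference between corresponding brightness values."""
    return max(abs(x - y) for x, y in zip(a, b))


# What the scanner "saw": a hypothetical row of brightness values.
original = bytes(range(0, 256, 5))

# The device saved it in a lossy format first...
jpeg_like = lossy_save(original)

# ...and later we converted that file to a lossless format.
tiff_like = lossless_load(lossless_save(jpeg_like))

print(tiff_like == jpeg_like)   # True: the conversion lost nothing further
print(max_error(original, jpeg_like) == max_error(original, tiff_like))   # True
print(tiff_like == original)    # False: the original detail did not come back
```

The converted copy is exactly as far from the original as the lossy copy was. The conversion protects the image from further loss, but it cannot put back what the first save removed.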

