Some people eat, sleep and chew gum, I do genealogy and write...

Saturday, May 29, 2010

File formats for saving "original" photos -- Part Four

Choosing an image file format for storing original photographs and scans raises a number of fundamental issues, the most important being the survivability of the format in the long term. The most recent issue of the Family Tree magazine has an article on Endangered Sources by Lisa A. Alzo. The article identifies various categories of public and private records that could be lost over time. One obvious solution, and one that is becoming overwhelmingly popular, is to digitize the paper records. But digital records face the same risk of loss when their formats become obsolete. This can happen either because the file format is no longer supported and newer software does not recognize it, or, more obviously, through hardware obsolescence. For example, try to find a machine to read recording tape on reels, i.e. a reel-to-reel tape recorder. They may still be available, but you would likely have to buy a used machine categorized under vintage electronics.

How long will RAW, JPEG and TIFF remain viable storage formats? There are a fairly large number of image file formats. Here is a selective list of the most popular ones:

TIFF -- originally Tagged Image File Format, but more recently the expanded name has dropped out of use. Originated by the Aldus Corporation, the developer of the PageMaker program, the format is now controlled by Adobe Systems. The TIFF format is supported by many image-manipulation applications, by publishing and page layout applications, and by scanning, faxing, word processing, optical character recognition and other applications.

PNG -- originally Portable Network Graphics. PNG was developed to replace GIF files and is a bitmapped image format that uses lossless data compression. PNG compares unfavorably with JPEG for photographic storage, however. Quoting from Wikipedia:
JPEG (Joint Photographic Experts Group) can produce a smaller file than PNG for photographic (and photo-like) images, since JPEG uses a lossy encoding method specifically designed for photographic image data, which is typically dominated by soft, low-contrast transitions, and an amount of noise or similar irregular structures. Using PNG instead of a high-quality JPEG for such images would result in a large increase in filesize (often 5–10 times) with negligible gain in quality.
PNG is a better choice than JPEG for storing images that contain text, line art, or other images with sharp transitions. Where an image contains both sharp transitions and photographic parts a choice must be made between the large but sharp PNG and a small JPEG with artifacts around sharp transitions. JPEG also does not support transparency.
JPEG is a worse choice for storing images that require further editing as it suffers from generation loss, whereas lossless formats do not. Since PNG's extreme inefficiency in compressing photographs makes it not useful for saving temporary photographs that require successive editing, the usual choice is a loss-less compression format designed for photographic images, such as lossless JPEG 2000, or Adobe DNG (Digital negative). When the photograph is ready to be distributed, it can then be saved as a JPEG, and this limits the information loss to just one generation. Furthermore, PNG does not provide a standard means of embedding Exif image data from sources such as digital cameras, which makes it problematic for use amongst photographers, especially professionals. TIFF, JPEG 2000, and DNG do support such meta data.
GIF -- Graphics Interchange Format. Introduced by CompuServe in 1987, it is restricted to 256 colors per image and is used almost exclusively for images on the Web. GIF files are most appropriate for sharp-edged line art with a limited number of colors.

JPEG -- See the discussion above under PNG. JPEG stands for Joint Photographic Experts Group. JPEG is best used for photographic reproductions. Its biggest drawback is that it is a lossy file format, and repeated editing and re-saving will degrade a JPEG file.

RAW -- not an acronym. A RAW file is not directly usable as an image, but it contains all of the information from the camera and can be used to create an image with a program called a RAW converter. There is no single RAW format; every camera or other device has its own specifications.

BMP -- also called the DIB (device-independent bitmap) file format, BMP is a bitmapped image file format. Quoting from Microsoft support:
A device-independent bitmap (DIB) is a format used to define device-independent bitmaps in various color resolutions. The main purpose of DIBs is to allow bitmaps to be moved from one device to another (hence, the device-independent part of the name). A DIB is an external format, in contrast to a device-dependent bitmap, which appears in the system as a bitmap object (created by an application using CreateBitmap(), CreateCompatibleBitmap(), CreateBitmapIndirect(), or CreateDIBitmap()). A DIB is normally transported in metafiles (usually using the StretchDIBits() function), BMP files, and the Clipboard (CF_DIB data format).
PSD, PSP etc. -- PSD files are a proprietary format created by Adobe Photoshop. PSP is another proprietary format, created by Corel Paint Shop Pro. Each proprietary format has its limitations in the programs that can recognize the file.
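
If you inherit a folder of old image files with missing or misleading extensions, the first few bytes of each file usually reveal which of the formats above it really is. Here is a minimal sketch in Python, offered as an illustration rather than a complete tool. The names sniff_format and sniff_file are my own, and the table only covers the formats in this list. Many camera RAW files are built on the TIFF structure, so expect some of them to be reported as TIFF.

```python
# Leading "magic" bytes for the formats discussed in the list above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"8BPS", "PSD"),
    (b"BM", "BMP"),
]


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the start of the file at `path` and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


if __name__ == "__main__":
    # Hand-made headers, so the example runs without any image files.
    print(sniff_format(b"\xff\xd8\xff\xe0" + b"\x00" * 12))
    print(sniff_format(b"GIF89a" + b"\x00" * 10))
    print(sniff_format(b"plain text, not an image"))
```

A check like this only tells you what the file claims to be. It does not prove that the rest of the file is intact or that any current program can still open it.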

OK, so which ones to use? If you are not concerned about survivability and only need an optimized image, for example for a web page, you can use JPEG, PNG or GIF files. JPEG files are lossy, so they are not the best choice for archiving images. TIFF files are normally saved without lossy compression and are relatively large, but file size is rapidly becoming a non-issue except for web usage. It makes a lot of sense to save your files in their native RAW format, but there is still a possibility that the file format used by your camera will change or disappear.
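
To put a number on "relatively large," the size of an image stored with no compression at all is simply width times height times the number of color channels times the bytes per channel. The short Python sketch below does that arithmetic. The 4000 x 3000 pixel dimensions are a made-up example rather than a measurement of any particular scanner or camera, and a real TIFF adds a small amount of header and tag data on top of this figure.

```python
def uncompressed_bytes(width, height, channels=3, bits_per_channel=8):
    """Raw pixel data size in bytes, ignoring headers and metadata."""
    total_bits = width * height * channels * bits_per_channel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 12-megapixel color image at two common bit depths.
rgb_8bit = uncompressed_bytes(4000, 3000)
rgb_16bit = uncompressed_bytes(4000, 3000, bits_per_channel=16)

print(f"8 bits per channel:  {rgb_8bit:,} bytes")   # 36,000,000 bytes
print(f"16 bits per channel: {rgb_16bit:,} bytes")  # 72,000,000 bytes
```

At roughly 36 to 72 megabytes each, a thousand such images would take up somewhere between 36 and 72 gigabytes.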

One possible alternative is to use Adobe's publicly documented DNG file format. I will consider this option in my next installment.
