Some people eat, sleep and chew gum, I do genealogy and write...

Monday, November 29, 2010

DPI, PPI and Megapixels for genealogists: Round One

Almost every advertisement for a digital product today will have some reference to the device's resolution. In the digital world, resolution has become like miles per gallon, torque or horsepower in automobile ads. It has become almost routine for the new model of any digital camera to claim a higher megapixel count than the previous one. The real question is how much of this is advertising hype, and how much of it should I, as a genealogist, care about?

The first principle to explore is that no copy can exceed the quality of the original. Creating a digital image involves electronic sensors, usually arranged in an array, that measure the intensity of the light reflected from the source object or document and convert that continuous analog signal (i.e., the light rays coming from the object) into a discrete digital signal (i.e., individual pixels). Each sensor element essentially counts the photons that strike it and converts that count into a luminance intensity. Translated, this means that the light hitting the sensor is broken into tiny dots (pixels), each of which can vary in intensity. In contrast to photographic film, in which the light-sensitive elements are microscopic grains, the sensors used for digitization have a fixed, finite size. Progress in electronics has continued to make these elements smaller and smaller, increasing the resolution of the device. More pixel density generally means greater detail, or resolution, in the resulting image.
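
To make the idea of "breaking light into tiny dots" concrete, here is a minimal sketch in Python. It assumes a hypothetical brightness function standing in for the document (position in inches in, a value from 0.0 to 1.0 out), takes one point sample at the centre of each pixel, and rounds each sample to one of 256 levels. A real sensor averages the light over each element's whole area rather than sampling a single point, so treat this only as an illustration of sampling plus quantization.

```python
def digitize(brightness, length_in, dpi, levels=256):
    """Sample a continuous brightness profile and quantize each sample.

    brightness: hypothetical function mapping a position in inches to a
    value in [0.0, 1.0]. Returns one integer per pixel, 0 to levels - 1.
    """
    n_pixels = round(length_in * dpi)               # number of discrete samples
    pixels = []
    for i in range(n_pixels):
        x = (i + 0.5) / dpi                         # centre of pixel i, in inches
        value = min(max(brightness(x), 0.0), 1.0)   # clamp to the valid range
        pixels.append(round(value * (levels - 1)))  # discrete intensity level
    return pixels


# A sharp dark-to-light edge halfway across a tenth of an inch, at 100 dpi:
edge = lambda x: 0.0 if x < 0.05 else 1.0
print(digitize(edge, 0.1, 100))
```

However smooth or sharp the original edge is, the result is just ten numbers; any detail finer than one pixel is simply not recorded.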

However, there is a physical limit to how much additional detail will convey more information. Shannon's sampling theorem (also called the Nyquist-Shannon sampling theorem, among other names, and usually applied to analog signal transmission) essentially says, in this context, that the distance between samples should be less than half the size of the smallest interesting detail in the image. Translated once again into English, this means there is no point in capturing more significant detail than actually exists in the source. If you are digitizing an old deed, for example, there is no point in creating an image that records information down to the molecular level. Even though physical reality has no apparent lower limit of detail, at some point capturing more of that detail adds nothing to the information conveyed.
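
The arithmetic behind the sampling theorem can be sketched in a few lines of Python. For illustration only, this assumes the "smallest interesting detail" is one cycle of a regular pattern of lines (a dark line plus the gap next to it). If samples are spaced more than half a cycle apart, the pattern does not simply blur; it turns into a false, coarser pattern, a well-known effect called aliasing. The function names are hypothetical.

```python
import math


def min_samples_per_inch(smallest_detail_in):
    """Sampling density needed so sample spacing is under half the smallest detail."""
    # spacing < smallest_detail_in / 2  is the same as  density > 2 / smallest_detail_in
    return 2.0 / smallest_detail_in


def apparent_frequency(pattern_cycles_per_in, samples_per_in):
    """Frequency a regular line pattern appears to have after sampling."""
    k = round(pattern_cycles_per_in / samples_per_in)  # nearest multiple of the sampling rate
    return abs(pattern_cycles_per_in - k * samples_per_in)


def sample_pattern(cycles_per_in, samples_per_in, count):
    """Brightness of a cosine line pattern at evenly spaced sample points."""
    return [math.cos(2 * math.pi * cycles_per_in * n / samples_per_in)
            for n in range(count)]


print(min_samples_per_inch(1 / 150))   # detail of 1/150 inch: about 300 samples per inch
print(apparent_frequency(150, 400))    # fine enough: the pattern survives at 150
print(apparent_frequency(300, 400))    # too coarse: 300 cycles/inch masquerades as 100
```

Sampled at 400 per inch, a pattern of 300 cycles per inch produces exactly the same sample values as one of 100 cycles per inch, so the scan cannot tell them apart. Sampling well above twice the pattern's frequency removes that ambiguity, but adds nothing further once the finest real detail is captured.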

If you examine a film photograph at high magnification, you will always see a limit to the resolution. Likewise, no digitization can go beyond the detail present in the original. (I am not talking about special filters or light sources that can bring out more detail than is visible under ordinary light; that is another topic.) So the question is: what is adequate for normal archival reproduction of photographs and documents?

Back to pixels per inch (PPI) and dots per inch (DPI). Archivists do not use these terms at all. They are more concerned with whether the thinnest line segment in the document is adequately represented in the digitized image. This measure of detail is called pixels per line segment. Whatever the device setting, the pixel count should be high enough that the thinnest line is covered by at least two pixels. The optimal resolution of the digitizing device is therefore determined by the document, not by an abstract number calculated from the device. So, the way to determine the optimal resolution is to review the target documents and scan at a resolution that adequately preserves the detail of the thinnest line segment present. Similar issues arise in scanning continuous-tone originals, such as film photographs, but the considerations become even more technical.
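
The "at least two pixels across the thinnest line" rule turns into simple arithmetic once you have measured the thinnest line on your document. Here is a minimal sketch in Python; the 0.2 mm and 0.1 mm line widths and the list of available scanner settings are hypothetical examples, not measurements, and this is not the specialized software archivists use.

```python
MM_PER_INCH = 25.4


def pixels_across_line(line_width_mm, dpi):
    """How many pixels span a line of the given width at a given scan resolution."""
    return line_width_mm / MM_PER_INCH * dpi


def lowest_adequate_dpi(line_width_mm, candidate_dpis, min_pixels=2):
    """Lowest available setting that puts at least min_pixels across the thinnest line."""
    for dpi in sorted(candidate_dpis):
        if pixels_across_line(line_width_mm, dpi) >= min_pixels:
            return dpi
    return None  # none of the available settings is fine enough


settings = [200, 300, 400, 600]            # hypothetical scanner settings
print(lowest_adequate_dpi(0.2, settings))  # a 0.2 mm pen stroke -> 300
print(lowest_adequate_dpi(0.1, settings))  # a 0.1 mm hairline -> 600
```

The answer depends entirely on the thinnest line in the document, which is the point: the document, not the device's advertised number, sets the resolution.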

In practice, the optimal resolution for a document is found by scanning at various resolutions and then magnifying the digital image on a computer screen until its individual pixels can be seen in an area containing a thin line segment. At high magnification, it becomes apparent whether or not the thin line segment is adequately represented by pixels. (In real life, archivists who are concerned with this issue have special software to help them make this determination.) In this regard, the higher the resolution (i.e., more pixels or dots per inch), the greater the chance that enough detail from the original will be preserved. But what may not be obvious is the relationship of Shannon's theorem to this issue: more detail does not necessarily convey more information. Not every document needs to be digitized at the maximum theoretical limit, and at some point file size becomes an increasingly important factor.
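
Why file size matters becomes clear with a quick calculation. This Python sketch assumes an uncompressed image with 3 bytes per pixel (24-bit color) and a letter-size 8.5 x 11 inch page; compressed formats such as JPEG will be much smaller, so read the numbers as an upper-bound illustration.

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Approximate uncompressed size of a scan (3 bytes per pixel = 24-bit color)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (300, 600, 1200):
    mb = uncompressed_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi:>5} dpi: {mb:,.1f} MB")
```

Because resolution applies in both directions, doubling the dpi quadruples the number of pixels and the file size: roughly 25 MB at 300 dpi becomes roughly 101 MB at 600 dpi, whether or not the original had any extra detail to give.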

So why do I care? I scan thousands of documents, and I would like to know that my work is not being wasted. From a practical standpoint, I usually scan or digitize at a device setting of between 300 and 400, usually labeled as dpi or ppi. (Actually, these are two completely different measurements, but that is again another post.) Some documents may lose a slight amount of detail, but on the whole, the losses are not significant. Higher resolution settings on a digital device generally do not capture more detail; they simply increase the size of the image. Usually, 300 dpi is sufficient for a normal 4 x 6 inch photo reproduced at its original size. If you wish to enlarge the image, you will have to scan the photo at a higher resolution, such as 400, 600 or even 800 dpi or more. But always remember, you can't plumb a dry well. No scan or digital photo is going to be any better than the original.

Next post: to edit or not to edit photographs, that is the question.
