Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, September 10, 2013

Initial thoughts on the digitizing stage of preserving your genealogical documents

I need to start by defining a few terms. Scanning is the process of using a light source to digitize an object or document. Technically, the digitization part is done by a computer breaking down the light reflected from the surface of the object or document being scanned into a series of signals that are stored as bits of information by the computer. The imaging software can then reassemble a copy of the original as an image. In a digital camera, the same operation is performed by the camera's sensor. The computer in the camera then interprets the different signals as a stream of numbers, 1s and 0s, that can be assembled by the imaging program into a digital image. The sensor in a digital camera and the sensor in a scanner both perform the same function that the drum in a photocopying machine or the film in a camera performs. The sensors break the image into bits of information that can be manipulated and stored by the computer for later viewing.

In practice, however, the terms scanning and digitizing have come to mean much the same thing.

In the last few years, I have scanned well over 100,000 documents. I started scanning years ago by testing every setting and varying the settings to compare the final product. I would scan the same photograph at 100 dpi (dots per inch), 200 dpi, 300 dpi and so forth with different scanners to see the effects of changing the settings. I soon discovered an important fact about scanning: there is an optimal resolution that cannot be improved upon merely by increasing the dpi of the scan. This seemed counter-intuitive, but an examination of the product, under magnification, proved over and over again that increasing the dpi past a certain point did not result in higher quality images. The image files merely got larger with no more detail. I also discovered that there are physical limits to the resolution of all lenses and scanning devices because of the size of light waves.
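To see why the files "merely got larger," it helps to work out the arithmetic. The following is a minimal Python sketch, not a description of any particular scanner: the function name scan_size, the 4 x 6 inch photo, and the assumption of uncompressed 24-bit color (3 bytes per pixel) are all mine, chosen only for illustration.

```python
def scan_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (pixels wide, pixels high, uncompressed bytes) for a scan.

    Assumes uncompressed 24-bit color by default (3 bytes per pixel);
    real files are usually smaller because formats such as JPEG compress.
    """
    px_wide = round(width_in * dpi)
    px_high = round(height_in * dpi)
    return px_wide, px_high, px_wide * px_high * bytes_per_pixel


# Hypothetical example: a 4 x 6 inch photograph at several settings.
for dpi in (100, 200, 300, 600, 1200):
    wide, high, size = scan_size(4, 6, dpi)
    print(f"{dpi:>5} dpi: {wide} x {high} pixels, {size / 1_000_000:.1f} MB uncompressed")
```

Every time the dpi doubles, the pixel count and the uncompressed size quadruple. The arithmetic says nothing about whether the extra pixels hold extra detail, which is exactly what the side-by-side comparisons under magnification were testing.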

After a while, I began an intensive study of how scanning devices work. In every case there turns out to be a limit to the resolution of any combination of scanner and lenses. See, for example, The Diffraction Barrier in Optical Microscopy. The limit of resolution is caused, ultimately, by the interference between different light waves. In practice, scanning at resolutions that approach the diffraction barrier can create an illusion of banding and movement that we refer to as a moire effect.
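For readers who want the formula behind that limit, the form usually quoted in optical microscopy is the Abbe limit. I am supplying it here as background; the numbers in the worked example (green light at 550 nm and a numerical aperture of 1.4, typical of a high-end microscope objective) are illustrative assumptions and not measurements from any scanner.

```latex
% d  = smallest resolvable distance between two points
% \lambda = wavelength of the light
% NA = numerical aperture of the lens
d = \frac{\lambda}{2\,\mathrm{NA}}
\qquad\text{for example}\qquad
d = \frac{550\ \text{nm}}{2 \times 1.4} \approx 196\ \text{nm}
```

Because d grows as the numerical aperture shrinks, a lens with a much smaller aperture than a microscope objective resolves correspondingly coarser detail, no matter how many dots per inch the software reports.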

Almost all of the scanners available today have an optical resolution that exceeds the physical limitations of scanning or digitizing. As a matter of note, if any scanning device fails to specify its optical resolution, you can assume that the resolution is not very good in comparison to other scanning devices available. One complicating factor is that cameras are seldom rated in terms of the dpi they produce. I have only been able to evaluate the resolution of a digital camera for use as a scanning device by trial and error.
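A rough starting point for that trial and error is to divide the number of pixels the camera records across the frame by the number of inches of document that fill the frame. The sketch below is only that, a first estimate: the function name effective_ppi and the 6000-pixel sensor width are hypothetical, and the result is pixel density, not true optical resolution, which also depends on the lens, focus, and lighting.

```python
def effective_ppi(pixels_across, inches_across):
    """Estimate pixels per inch when a document fills the camera frame.

    pixels_across: sensor pixels along one edge of the image
    inches_across: inches of the document covered along that same edge
    """
    if inches_across <= 0:
        raise ValueError("inches_across must be positive")
    return pixels_across / inches_across


# Hypothetical camera recording 6000 pixels across the width of the frame.
letter_page = effective_ppi(6000, 8.5)   # frame filled by an 8.5 inch wide page
small_photo = effective_ppi(6000, 4.0)   # frame filled by a 4 inch wide photo
print(f"Letter page: about {letter_page:.0f} ppi")
print(f"Small photo: about {small_photo:.0f} ppi")
```

The same camera gives a very different figure depending on how much of the frame the original fills, which is one reason cameras are not sold with a single dpi rating.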

All of this brings us to the state of the art today. Various types of scanning devices have been and are being commonly sold, with some targeted at genealogists. In the past, I have written about the various types of scanners and how to compare and buy a scanner. In this post, I am focusing on the process of scanning as part of an overall effort to preserve paper-based documents and photographs. So I will also include cameras as an option for digitization.

Some scanning devices are sold on the basis of convenience and portability. There is always a trade-off between convenience and quality. The smaller hand-held or portable devices must reduce the scanning area to achieve portability. To scan even an ordinary 8.5 x 11 inch document, they use software to "stitch" the separate parts of the page together. This makes scanning even a few documents very slow.

I would suggest that any claimed optical resolution higher than 1200 dpi is not accurately reported. Any higher resolution claimed is likely interpolated resolution, meaning it is a way of using software routines to make an image appear sharper without actually acquiring any more detail. Even a very low quality scan will appear sharp if the image size is kept very small. Magnifying the image in image editing software reveals the difference between the different levels of dpi. I will illustrate this in a subsequent post.
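The difference between optical and interpolated resolution can be shown with a toy example. This is a minimal sketch under my own assumptions, not how any scanner's firmware actually works: one row of the original is modeled as a list of brightness values, each sensor cell averages the values it covers, and the "interpolated" output is produced by simple linear interpolation. The function names are hypothetical.

```python
def optical_scan(scene, cell):
    """Average each group of `cell` values, like one sensor cell covering that width."""
    return [sum(scene[i:i + cell]) / cell for i in range(0, len(scene), cell)]


def interpolate(samples, factor):
    """Linear interpolation: produce `factor` output values per input sample."""
    out = []
    for i, value in enumerate(samples):
        nxt = samples[min(i + 1, len(samples) - 1)]  # repeat the last sample at the edge
        for step in range(factor):
            out.append(value + (nxt - value) * step / factor)
    return out


scene = [0, 255] * 8               # fine alternating black and white lines
low_res = optical_scan(scene, 2)   # each sensor cell spans one black and one white line
upscaled = interpolate(low_res, 2) # "double the dpi" in software

print("original :", scene)
print("scanned  :", low_res)
print("upscaled :", upscaled)
```

The upscaled row has as many values as the original, but every one of them is the same mid-gray. The fine lines were lost when the sensor averaged them, and no amount of software interpolation brings them back.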

There is no free lunch in scanning. You can speed up the process by using a sheet-fed scanner, but the trade-off is that many of the documents will be crooked or of low quality. If you want a good archival version of a document, you need to carefully scan each document on a good quality flatbed scanner, keeping in mind that the need for quality in scanning a photograph is vastly different from that in scanning a sheet of text.


  1. I see. So there is no shortcut or fast way to scan quality texts or images then. It should be done singly and slowly to avoid having low quality scans.
