Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, August 9, 2009

Standards for Genealogical Scanning


In a recent article published in the Netherlands, The Current State-of-art in Newspaper Digitization, A Market Perspective, by Edwin Klijn in the D-Lib Magazine, the author summarizes the current standards for professional scanning. Since so much of the source material for genealogical research is being scanned and put online, I thought it important that individuals who are scanning for their own research know of these international standards. Quoting from the article:

Most companies use specialized equipment for scanning from microfilm and paper originals. Sometimes this is commercially available hardware such as standard A0 or A1 flatbed scanners. Some companies use custom-made large-format scanners purposely built to digitize newspapers. To create master images the consensus approach is to scan at 300ppi. The preferred format is uncompressed lossless TIFF, although some respondents also suggest using JPEG (quality 10) or JPEG2000. Scanning from the originals is generally acknowledged to produce higher quality master images. There is some disagreement amongst the survey respondents as to whether one should scan in colour or greyscale. Scanning in colour produces a master that is closer to the original newspaper (more 'authentic') than greyscale. Also, according to some respondents colour images may lead to better OCR results, or at least provide better 'raw materials' to improve the OCR in due course. Choosing the appropriate format is also closely related to the issue of storage. A master image in TIFF format requires approximately twice as much storage space as a JPEG2000 (lossless) image and ten times as much as a JPEG (quality 10) image requires.

Frequently applied image enhancement technologies include tools for deskewing, despeckling, rotation, cropping, noise removal, balancing white backgrounds and image splitting. These tools are often used in semi-automated processes, with manual correction performed at the end. Some companies optimize images in order to improve OCR results. In their workflow they clearly distinguish between images produced for viewing and images that are specifically prepared for OCR processing. In this context the alternative of so-called hybrid PDFs is suggested. These PDFs embed different quality levels within a single file, e.g. one image optimized for the plain text and delivered as a bitonal image, and another image for the illustrations on the page, delivered in greyscale.

As the derivative for web delivery, most respondents recommend JPEG, mainly because of its efficient compression rate and zooming potential. Three respondents mention the JPEG2000 format as a suitable derivative. ISO-standard JPEG2000 is considered to be an efficient compression format because it produces relatively small files. One large digitization company strongly advises against using JPEG and – to a lesser degree – JPEG2000. It argues that in the case of bitonal and greyscale images, such as those with line-art drawings, JPEG compression can lead to low-quality images. According to this respondent, PNG is preferable to JPEG because it is presently more widely supported than the promising – but not yet generally accepted – JPEG2000. This view is supported by another respondent who believes that PNG provides the optimum compression for B&W and text 'images'. Two other respondents suggest PDF as an alternative format for derivatives. Since the majority of all users are familiar with PDF files, delivering newspaper pages or articles in PDF is a common feature of most newspaper web delivery systems.

This corresponds with my own experience in scanning over the past ten or fifteen years. I would add, however, that delivery systems in PDF format will not be as useful to genealogists until the lineage-linked database programs start supporting the inclusion of PDF files.
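To put the storage figures from the article in perspective, here is a rough Python sketch that estimates how big a single master scan might be. The function names are my own, the page is assumed to be scanned in 24-bit colour (or 8-bit greyscale), and the "half" and "one tenth" factors are just the approximate ratios quoted above, not measurements.

```python
MM_PER_INCH = 25.4

def uncompressed_bytes(width_in, height_in, ppi=300, channels=3, bits_per_channel=8):
    """Approximate size in bytes of an uncompressed scan (image data only, no file headers)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * channels * bits_per_channel // 8

def estimate_storage(width_mm, height_mm, ppi=300, colour=True):
    """Rough per-page storage for each format, using the ratios quoted in the article."""
    channels = 3 if colour else 1  # assumption: 24-bit colour or 8-bit greyscale
    tiff = uncompressed_bytes(width_mm / MM_PER_INCH, height_mm / MM_PER_INCH, ppi, channels)
    return {
        "tiff": tiff,
        "jpeg2000_lossless": tiff // 2,  # "approximately twice as much" per the article
        "jpeg_quality_10": tiff // 10,   # "ten times as much" per the article
    }

if __name__ == "__main__":
    # An A1 sheet measures 594 x 841 mm.
    for fmt, size in estimate_storage(594, 841).items():
        print(f"{fmt}: about {size / 1_000_000:.0f} MB")
```

For a colour A1 page at 300ppi the uncompressed figure comes out on the order of 200 MB, which makes it easy to see why the choice of format and the question of storage are so closely tied together.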

Digitized records in the National Library of Australia


Digitized records are obviously not confined to the United States of America. I have a particular interest in Australia, since two of my ancestral families lived in Australia and I still have relatives there including some who moved there more recently. Thanks to my niece in Australia, I looked at the Website for The National Library of Australia.

The National Library has some very impressive resources, including the Australasian Genealogical Computer Index. This CD based database contains indexes to 3.9 million records drawn from the collections of 39 family history societies and institutions throughout Australia and New Zealand and is available to researchers in this fully searchable format for the first time. 19th & 20th century data drawn from cemetery records, shipping arrivals, newspaper entries, council rate books and war memorials are among the records included. Each entry shows name, date and type of event and the source from which it is drawn. Although the Library offers this resource only on site, it is also available from the Society of Australian Genealogists.

The list of additional resources at the National Library and available from the Library online is far too long to include in a blog post; there are 178 databases listed on the Website. Another example is the index to Cemetery Transcriptions, which is also available from the Australian Institute of Genealogical Studies, Inc.

The Library also sponsors the Australian Newspapers Digitisation Program (yes, it is spelled that way) which contains more than 1.9 million pages.

Saturday, August 8, 2009

Take the blind search engine test for genealogy

Some time ago I wrote about the new Microsoft search engine, Bing. At the time, I did a specific search on the name of one of my great-grandfathers. I knew there were a number of specific online sources, including an archive at Northern Arizona University and some books that included his name. Google found all of the pertinent sources in the first search. Bing did not find any of the sources and was very disappointing. Well, it is time to re-evaluate the searches. This time I used an online tool that does a simultaneous search in three search engines: Google, Bing and Yahoo. The Website is called BlindSearch and you can directly compare your own searches.

I decided to try the same search I made earlier, that is, searching for "Henry Martin Tanner." All three responses to the search had many useful references. After a few minutes' consideration I decided on one of the lists. It turned out to be Google, the winner for me, once again. But I have to admit that both of the other searches had a lot of the relevant documents right up there in the ranking.

I understand that Microsoft, the producer of the Bing search engine, purchased the search technology from Yahoo recently. It would now be understandable why those two search engines would produce similar results. My conclusion always has been, use all the tools you have available. If one search engine does not produce results, try another. No one solution fits all on the Internet.

For genealogy, let the computer do what computers do

Old habits die hard. I still find that many people, including myself, fail to use the full power of the computer for genealogy, simply because we cannot think like computers. Now, don't get me wrong: although we often talk about computers thinking and reasoning, they are still rather stupid machines that do only what we tell them to do. So, how do I fail to use the full power of my computer? The answer is a little bit complex. We, myself and many others, still hang on to paper-based thinking and habits.

Here is one example. I still have a pile of handwritten notes sitting next to my computer, probably an inch or so of paper, that I will likely never look at. Each time I go to the Family History Library, I make notes of my sources and what I search, only to return home and add the notes to the pile by my computer. I always remind myself to transcribe my notes into the computer and add them to the individuals concerned in my program of the day (I use a lot of different programs, like PAF, Legacy, RootsMagic, Ancestral Quest, etc.). But the notes never seem to make it from the paper to the computer, and the next time I go to the library I am reminded of my errors and omissions.

I could solve the problem by using my laptop to advantage and actually storing my notes right on the computer in the note sections of the individuals I am researching. So why don't I change? I have a lot of excuses, but it comes down to habit. I am used to carrying around yellow pads, pens and pencils, but I am still not used to opening up my computer and using its power to organize and store my information.

Another example. I scan thousands of documents. Whenever I mention this fact to anyone connected with genealogy, they always ask me, how do you organize all the information so you can find anything? Here, I am slightly ahead of the pack. I don't organize anything. I put it in a huge pile and let the computer find what I want. In the case of the scanned images, I use Picasa from Google to "organize" my scans, photos and images. All I have to do is label each image (which I am behind in doing) and let the computer "find" any image or set of images I need for any purpose. Now, if I could just find something that would do the same thing for my handwritten notes...
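For anyone curious about what "label it and let the computer find it" amounts to, here is a tiny Python sketch of the idea. It is not how Picasa works internally; the function names, file names and labels are all made up for illustration.

```python
from collections import defaultdict

def build_label_index(labels_by_file):
    """Map each label (lower-cased) to the set of files that carry it."""
    index = defaultdict(set)
    for filename, labels in labels_by_file.items():
        for label in labels:
            index[label.lower()].add(filename)
    return index

def find_images(index, *labels):
    """Return, in sorted order, the files that carry every one of the given labels."""
    if not labels:
        return []
    matches = [index.get(label.lower(), set()) for label in labels]
    return sorted(set.intersection(*matches))

if __name__ == "__main__":
    # Hypothetical scans, each with whatever labels I got around to adding.
    scans = {
        "scan_0001.tif": ["Tanner", "census"],
        "scan_0002.tif": ["Tanner", "letter"],
        "scan_0003.tif": ["census"],
    }
    index = build_label_index(scans)
    print(find_images(index, "tanner", "census"))
```

The files themselves stay in one big pile; only the labels are indexed, so a search is just a matter of looking up each label and keeping the files common to all of them.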

From a different perspective, I see people almost every day who are fighting the technology. They "hate" computers and can't seem to find anything on the computer or the network, including the latest copy of their own data files. It may be that some people just cannot learn to use computers, but from my own personal experience I believe it is a matter of priorities. Genealogists do not learn to use computers because they do not wish to do so. Using a computer is not a question of age or physical ability; I have seen people in their 90s use a computer like a professional. I have also seen people who are severely disabled use a computer with one finger or even their arms or legs. Most of the people who complain the loudest about computers will not take the time to learn how to use them, nor do they have any interest in doing so.

What do computers do well? Quickly organize (read this as search and find) huge piles and stacks of information. The Internet is now the largest and most disorganized pile of information in the world, but by using computers to do what they already do well, this pile of unorganized information is highly useful and highly accessible. Computers aren't toys or games; they are tools, and very powerful tools at that.

More later on this subject.

Friday, August 7, 2009

A platinum find -- Historic Newspapers Online


For the past few weeks I have been looking for more digitized images of genealogical source material online. I felt like I had hit the jackpot when I discovered Historical Newspapers Online, a comprehensive and current database of Websites with digitized newspaper images. The site is maintained by the University of Pennsylvania Libraries and it is worth exploring by any genealogist. The extensive list of newspaper sites is part of a larger section of history databases available through the same University of Pennsylvania Website.

Some of the Websites are commercial and require a subscription, others are freely available. Some of the files are limited to PDF images, others have JPEGs. Through this site, I found ICON: International Coalition on Newspapers with another comprehensive list of international newspaper digitization projects with links organized by country. There was also an indirect link to the Library of Congress' Chronicling America, another major collection of digitized images.

Thursday, August 6, 2009

FamilyInsight now with Ordinance Tracker

Ohana Software's FamilyInsight program has a major upgrade. Below is a list of the major features of the program showing the new enhancements. However, the program on my computer does not show that any upgrades are available, and in searching the Ohana Website, I cannot find the upgrade. I assume that the upgrade has yet to be formally released, because usually the program loads upgrades automatically. This program works both as a PAF add-on tool and as a standalone program.

PAF Add-on Functionality
Shows in the tools menu of your PAF 5 program.
Just a Click and your file is loaded into FamilyInsight from PAF5

Usability
Will open files directly without going through other programs.
Opens GEDCOM files created by other genealogy software programs as well as PAF 5 files and PAF backup files
MAC and Windows versions available
Easily edit records and add events and other information directly in FamilyInsight.
Easily edit or add sources and notes.
Save file as a PAF file, PAF backup, and GEDCOMs of various types.

Guide Me
Click the "Guide Me" Button to see hints and guidance as you use the program.
Online Training Videos
Webinars

NEW Helper Feature
Sign on to help someone else on new FamilySearch

NEW Reserve Ordinances
Reserve names for ordinances directly from FamilyInsight

NEW Ordinance Tracker
Easily view all the names you have reserved for ordinances listed in FamilySearch
Not necessary to have a Family History database to view your reserved list
Print Family Ordinance Requests
Assign and unassign names to the temple
Unreserve names
Family Ordinance Request to Reprint cards

Synchronize your file with new.familysearch.org
Search for matches to records in your file directly from FamilyInsight
Update ordinance information into your file
Add people and information you choose to your file from new.familysearch.org with just a few clicks
Add new information and people to the FamilySearch family tree directly from your file.
Synchronize your data with FamilySearch and add the person id to your file
Mark multiple records as matches and combine them in FamilySearch family tree using FamilyInsight
Add notes to new.familysearch.org or to your file
Easily see which records in your file need ordinances
View a relative's full information when you compare individual records, so that you can make more accurate decisions.
Icons clearly show which records are linked and the status of temple ordinances.
Sort by these icons as well as other columns
Color coded links show when information has changed in your file or in FamilySearch family tree for each record.
Unlink a record if it has been previously matched with the wrong record on FamilySearch.
Handles multiple parental relationships easily to accommodate adoptive and other relationships.

Edit Summary Person Information for an Individual on new.familysearch.org
Set the name and basic event information that you think should be shown on the summary page for an individual.
This is the information that will be printed on temple cards

Separate records that have been incorrectly combined in FamilySearch family tree
2009 FamilySearch Software award winning feature
Separate multiple records with one click
Separate by attribute such as a name or birth date
Separate into multiple records that are combined into new people when you complete the separate.

Easily Merge duplicate records in your file
Finds more matches using the unique Insight matching algorithm
Matches listed according to their match probability. Highest probability at the top.
Mark records as not a match and they will not be shown in the list in the future unless you choose to see them.
Mark records to Research to see if they are matches

Compare 2 Different files and update between them
Update just the information and people you want
Add new people from one file to another
No longer a need to import messy GEDCOM files into your data to get the information you want
If you have multiple files you can compare and get all the information in one file.

IGI Search
Search older online IGI
Update ordinance information from the online IGI directly into your file.
The first and always the best in IGI Searching.
Search Filters for LDS ordinances and other missing data

Edit Places
2009 FamilySearch Software award winning feature
Automatic places name suggestions from FamilySearch
Correct multiple instances of a place name with one click
Drag and drop correction
Mark Places as valid and keep what is in your file but choose a standardized place to search on FamilySearch family tree.

Edit RINs and Pedigrees
Trim files to direct lines including children and grandchildren and spouses of a particular relative
Save trimmed files as new files and keep the old file intact.
Change RIN numbers in your file to what you want.
Delete unwanted extra unlinked pedigrees with a click of the button.
View all the hidden pedigrees in your file
Compact your file to take less memory on your computer.

File Protection features
Automatically Archives your file
Repairs your file and gives you an easily understandable report.
Safely shows and stores all foreign language characters in names, places and elsewhere.

Multi Language
German
Portuguese
French
More languages coming
If you are interested in joining our translation team please let us know

Sort and Find
Sort columns many different ways - record numbers, surnames, given names, % match, status, icons.
Sort places by largest to smallest, smallest to largest, or by the number of instances in your file.
Search and Find feature to find by name parts, RIN or pedigree.

Automatically checks for updates
Downloads the update and installs from the program.
Checks for the latest version so you are always up to date.

Wednesday, August 5, 2009

Legacy Family Tree demos New FamilySearch Integration

In an announcement about their Legacy Family Tree software made after the BYU Conference on Family History and Genealogy in Provo, Utah, Millennia Software stated:
The highlight of the conference was our Thursday evening class where, for the first time ever, we demonstrated our highly-anticipated FamilySearch integration software. There were so many of you attending the class (over 300) that we had to move everyone to the big auditorium (thanks to Marlo of Heritage Collector Suite for switching rooms with us!). Although the software was not yet complete, I must admit it was fun to hear the applause and cheering as we showed how the software works. I'm really excited about it and cannot wait to start using it with my Legacy Family Tree database. We announced that within the next two weeks we would begin its certification process with FamilySearch and have the free update available to you before the end of the year (of course we hope it's much sooner than that but we're learning not to announce "soon" anymore :)).
Since I am a longtime Legacy user, I am also looking forward to New FamilySearch integration. Let's hope it is sooner rather than later.