Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, December 10, 2013

What is Digital Preservation and why do I care?

From time to time, the online genealogical community produces a wave of posts about backing up files. But we all need to take one further step, and that is to be concerned about digital preservation. Simply put, we need to be sure that the backup files on our flash drives, external hard drives and online cloud storage don't become unreadable because their formats have become obsolete. There are multiple pathways to the perdition of losing files through changing file formats and operating systems, and any one of those paths could render your carefully backed-up files unreadable. This issue is a sort of background problem that I regularly face, both with my own systems and with those of the people I talk to in classes and at conferences.

I believe I have previously told the story of my own challenges in keeping a computer-based journal for the past thirty years. The major part of the challenge happened just a few years ago, right around this time of year, when I discovered that my old Microsoft Word files were not recognized by any of the programs I had on my computer. Fortunately, I found that OpenOffice (presently I use LibreOffice) was able to read the old files, and I spent a week or two converting files and verifying that the newly created files were complete and readable. In the short time since that happened, I have already gone through another major ordeal retrieving files from a crashed older iMac.

How many of you have updated your operating system recently? Can you count how many updates there have been since you last opened and verified some of your older files? Are you even aware that this is a problem? Here are some interesting websites in that regard:

I am not particularly picking on Microsoft; here is the same information for Apple's OS X systems:

Let me propound another hypothetical situation. Most of us are mature adults, and for some of us time passes rather quickly. Let's go back just five short years to about 2008. What computers were we using at that time? What operating system? Without digging into the history and reconstructing events, how many of us can even remember? How about ten years ago, in 2003? Remember this event and the year: the first iPad was released on 3 April 2010. Now what do you think about computers and operating systems? I am already hearing long, sad stories from tablet computer users about being unable to retrieve data and about apps failing on older tablets.

Now, if for some reason, you can easily answer my two questions above because you are still using the same computer and the same operating system, you are living a much more dangerous data life than I could possibly endure. But think about this, there are a huge number of genealogists who are still depending entirely on a program that was discontinued in 2002! That is Personal Ancestral File or PAF. How many of us still have PAF on our current computer? How long will any computer be able to read those files?

Obviously, I am writing about two separate issues: changing hardware technology (i.e. new computers) and changing operating systems and application programs. Both of these areas are of concern in digital preservation circles. Leading that effort are the Library of Congress and its National Digital Information Infrastructure and Preservation Program.

The process of keeping our older files and their formats compatible with present technology is called "file migration." We are coming to the end of another year (2013, assuming I can still read a calendar), and it is time to think about a lot of things, but as genealogists it is time to think about data migration and preservation. Before we run out and buy that next new computer, maybe we should make sure that all the work we have done in past years is still readable and secure. Just a thought.
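For readers comfortable with a little scripting, one way to start that check is to take an inventory of which file formats are actually sitting in a backup folder. The Python sketch below is only an illustration and not a tool described in this post: it counts files by extension and flags a hypothetical watch list of older formats (here `.doc` for older Word files, `.wpd` for WordPerfect, and `.paf` for Personal Ancestral File databases). Counting extensions does not prove a file is readable; it just tells you which files to open and verify by hand, or to migrate.

```python
import sys
from collections import Counter
from pathlib import Path

# Hypothetical watch list of "at risk" formats; adjust it for your own files.
AT_RISK = {".doc", ".wpd", ".paf"}


def inventory(root):
    """Count every file under root (including subfolders) by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def at_risk(counts, watch=AT_RISK):
    """Return only the extensions that appear on the watch list, with their counts."""
    return {ext: n for ext, n in counts.items() if ext in watch}


if __name__ == "__main__":
    # Usage: python inventory.py /path/to/backup  (defaults to the current folder)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = inventory(root)
    for ext, n in counts.most_common():
        print(f"{ext:10} {n}")
    print("Formats to open and verify:", at_risk(counts))
```

Lowercasing the extension means `FAMILY.PAF` and `family.paf` are counted together, which matters for files copied over from older systems.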


  1. Longevity of our data is something we should all be worried about. In principle, if your data representation is described by a true standard (through some body such as FHISO), and that specification made freely available, then it will always be possible to generate a piece of software that can understand your data. Unfortunately, some vendors still do not publicise their proprietary formats, and very often we’re left with a proprietary database schema that is useless without the vendor’s own software. This is one of the reasons I questioned the need for a disk-based database at all since it’s opaque, not portable, and prone to corruptions.

    1. Good points, especially if the data, even with standards, is stored on obsolete media such as the store of old Iomega disks I have.

  2. I knew there was a reason I did not throw out all those paper copies. But when you only have about 800 people in your tree, it isn't that much.