Saturday, July 28, 2012

What exactly is data obsolescence?

How can my data become obsolete? History is history. My ancestors are my ancestors and none of that is going to change. When we talk about data obsolescence, we are referring to the more recent phenomenon of changing computer file formats, which renders older formats unreadable by more recent computer programs. We are also talking about computers and other hardware going out of date, rendering the programs and files stored on those computers and other storage devices unreadable. We are not worrying about our information going out of date.

The first problem is hardware obsolescence. This happens over time as new computing devices are invented and developed. My newer iMac, for example, will not run any of the programs I used on my old Apple II Plus computer. The reason for this is not very simple. A computer, in whatever form, is really a number of different electronic circuits strung together in a highly sophisticated array of switches. The type of programs or files created by a computer depends on both the hardware and the software. Over the years, the central processing unit (CPU) at the heart of the computer has undergone tremendous changes. Programs written for a particular computer are coded for that computer's CPU.

Each type of CPU needs an operating system written for it. As the computer chips change over time, the operating systems used by the computer to run the software also change. In addition, each time there is a change in the CPU and the operating system, there need to be changes in the programs so that they work with the newer computer and operating system. Sometimes the companies that manufacture computers make incremental changes to both computers and operating systems that don't immediately affect whether or not a particular program will work with the newer computer and operating system. Eventually, and almost inevitably, the cumulative changes make older programs obsolete.

Let me give an example of recording information on paper. The type of paper used is analogous to the file format used to store information on a computer system. I could use a paper with a high acid content, such as newsprint, and after a very few years (or even days) the paper will start to yellow and eventually it will become brittle and disintegrate. Uncared for, the information stored on newsprint will be lost. Although data held in digital format may not disintegrate like newsprint, it can still age: if the format of the file becomes obsolete, the information can no longer be retrieved. In the case of newsprint, the changes are natural processes. In the case of computers, the changes come about as a result of technological developments.

Although hardware changes may render a specific device or storage medium unreadable, such as the quickly vanishing ability most computers have to read floppy disks, the real issue is the format of the files on whatever media. If I store my information on a 3.5-inch floppy disk, it is not impossible to retrieve the data. The difficulty is finding a device that will read the information from the floppy disk and transfer it to a newer device. With floppy disks, reading the information is difficult and it may take some time to find someone with a functional floppy disk drive, but floppy disks are not quite totally extinct. On the other hand, some file formats have already become entirely unreadable even if the file can be physically transferred to a newer medium or device.

The challenge of old file formats is formidable and twofold. First, you have to find a device that will recognize the storage medium, and second, you have to find a program that will recognize the file format. There are hundreds (if not thousands) of unique file formats and very few of them are cross-compatible with other formats. As I have mentioned a few times previously, I almost lost my entire journal because most of the data was stored in a program called MacWrite. If you are interested, here is a forum discussion about trying to recapture information from MacWrite.
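To make the second half of that challenge concrete, here is a minimal sketch of how a program can guess a file's format from its first few bytes rather than from its name. The byte signatures listed are the well-known ones for zip, PDF, PNG, JPEG and GIF, plus the "0 HEAD" line that opens a GEDCOM file. The function names `sniff_format` and `sniff_file` are my own for illustration, and the list is nowhere near complete. Older word processor formats such as MacWrite are deliberately left out because I do not have a reliable signature for them.

```python
# Leading bytes ("magic numbers") for a few well-known formats.
# This table is illustrative, not exhaustive.
SIGNATURES = [
    (b"PK\x03\x04", "zip archive"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # GEDCOM is plain text whose first record is "0 HEAD",
    # sometimes preceded by a UTF-8 byte order mark.
    text = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    if text.startswith(b"0 HEAD"):
        return "GEDCOM text"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the start of the file at `path` and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A result of "unknown" is the interesting case: the file copied over to the new device just fine, but nothing on hand recognizes what is inside it.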

I have mentioned Personal Ancestral File (PAF) a number of times as a potential candidate for file obsolescence. One aspect of the program that is already a problem is the "backup" function. PAF had a file option to create a "backup" file, which was essentially a compressed file format. Many PAF users religiously backed up their files using this file compression. In later versions, the program used a variety of the Windows zip file format (WinZip), but earlier versions spread larger files over several floppy disks. So far, I have been successful in restoring most of these older files, but there may soon come a day when either the media or the file formats will become so obsolete that they cannot be restored.
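For the later, zip-style backups, a quick check with Python's standard `zipfile` module can tell you whether a backup file still opens and whether its contents pass the archive's own integrity check. This is only a sketch under a big assumption: that the backup really is an ordinary zip archive. It will not help with the older backups that were split across several floppy disks, and the function name `inspect_backup` is hypothetical.

```python
import zipfile


def inspect_backup(path: str):
    """Return the member names if `path` is a readable zip archive, else None.

    Assumption: the backup is a standard zip file. Multi-disk backups
    from older program versions are not handled here.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the
        # first corrupt one, or None if all checksums match.
        if archive.testzip() is not None:
            return None
        return archive.namelist()
```

If this returns a list of names, the backup can at least be unpacked with ordinary tools today, which is a good moment to restore it and save the result in a current format.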

So what is the solution? Migration. What this means is simple: copy the information to newer file formats and files before the older files become obsolete. Yes, this takes time, effort and money. It is like going to the dentist or having an annual physical checkup. You may put it off forever, but you will ultimately pay the price of your neglect.
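A reasonable first step in any migration is simply finding out what you have. The sketch below walks a folder, counts files by extension, and picks out the ones on a list you supply of formats you consider at risk. The function names and the example extensions are my own placeholders, not an authoritative list of obsolete formats. You would need to decide for yourself which extensions belong on it.

```python
import os
from collections import Counter


def inventory_extensions(root: str) -> Counter:
    """Count the files under `root` by lowercased file extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            counts[ext or "(none)"] += 1
    return counts


def migration_candidates(counts: Counter, at_risk: set) -> dict:
    """Keep only the extensions that the caller has flagged as at risk."""
    return {ext: n for ext, n in counts.items() if ext in at_risk}


if __name__ == "__main__":
    # Placeholder list: adjust to the formats you actually worry about.
    at_risk = {".paf", ".bak"}
    found = migration_candidates(inventory_extensions("."), at_risk)
    for ext, n in sorted(found.items()):
        print(f"{n} file(s) with extension {ext} should be migrated")
```

Running this once a year against your genealogy folder is the file-format equivalent of the annual checkup.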

What do I do with old PAF files? PAF files for Macintosh are very difficult to recover. Presently, almost all of the current programs for Mac or PC will read GEDCOM files from PAF. Some of the programs will also read PAF files. I am not aware of any current genealogy program that will restore a PAF backup file except Ancestral Quest. (Let me know if you have something else that will work.)

More on migration in the future.
