Some people eat, sleep and chew gum; I do genealogy and write...

Monday, July 2, 2012

Handling Oodles of Data

I got two items today that spoke to the issue of digital preservation by the semi-truck load. The first was a blog post from FamilySearch that stated, in part:
Volume of data: the digital pipeline in Family Search is generating somewhere in the range of 15 terabytes of images, or one million to three million pages digitally every business day of the year. The software to handle this volume did not exist when we started digital preservation. We push many of our vendors to come up with new technology to meet our needs as we stretch their capabilities and often break their products. We are also writing our own magnetic tape storage software because no products exist on the market that can handle preservation storage volume of this magnitude.
You really need to read the rest of this amazing summary of the challenges of producing and maintaining such huge amounts of data. But be aware that each of us faces exactly the same challenges, although on a much smaller scale.

At the same time, I found this from the Library of Congress:
While some of you may have stopped purchasing CDs for your own listening pleasure, the Library of Congress continues to collect them in huge quantities.
For example, according to Rene Sayles, a Library Technician in the Geography and Maps division, over 700 new CDs come in every month from just one source: the National Geospatial-Intelligence Agency (and especially its Controlled Image Base series).
In addition to this cascade of new items, the Library has droves of valuable CD-only historic material already in its possession, with even more material stored on really endangered storage media such as 5¼ or 3½ inch floppy discs, zip discs, mini DVDs, digital audio tape, digital linear tapes and many more. A rough estimate is that the Library has more than 300 terabytes of data stored on these devices, with the potential for it all to be locked up when players for them no longer exist.
How many of you out there still have a hoard of floppy disks and piles of CDs? How much longer do you think your computer system and programs will support those data storage media?

I have been able to recover the data from a stray 3.5-inch floppy in the past few months, but I really don't know how much longer I will be able to do that. I got rid of my floppy disks some time ago, but I do have a bunch of Zip disks that are probably no longer readable. Who knows what secrets are now lost to mankind?

The key here is data migration. You need to keep moving your data to newer storage devices and keep converting it into formats that current programs can still open. Easier said than done, of course. Look at TechTips for some articles on the subject.
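To make the first half of that advice concrete, here is a minimal sketch in Python of one migration step: copying everything off an old medium onto a newer drive and confirming that each copy is bit-for-bit identical by comparing SHA-256 checksums. The function names (`sha256_of`, `migrate`) and the checksum approach are my own illustrative assumptions, not anything FamilySearch or the Library of Congress describes. It only moves the bits; converting old file formats so newer programs can read them is a separate job.

```python
from __future__ import annotations

import hashlib
import shutil
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destination: Path) -> dict[str, str]:
    """Copy every file under `source` into `destination`, keeping the folder
    layout, and verify each copy against the original's checksum.

    Returns a manifest mapping each relative path to its SHA-256 digest.
    Raises RuntimeError if any copy does not match its original.
    """
    manifest: dict[str, str] = {}
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # directories are recreated as needed below
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        original_hash = sha256_of(src_file)
        shutil.copy2(src_file, dst_file)  # copy2 also keeps file timestamps
        if sha256_of(dst_file) != original_hash:
            raise RuntimeError(f"Checksum mismatch after copying {rel}")
        manifest[rel.as_posix()] = original_hash
    return manifest


if __name__ == "__main__":
    # Hypothetical usage: python migrate.py /media/old_zip_disk /media/new_drive/zip_copy
    result = migrate(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"Copied and verified {len(result)} files")
```

Saving the returned manifest alongside the copies would let you re-run the checksums years later and catch files that have quietly degraded before you migrate them again.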

1 comment:

  1. I have a LaCie USB Zip-disk drive. It doesn't read the programmed material, just the raw data, but I can interpret lots of missing data from this drive.