Some people eat, sleep and chew gum, I do genealogy and write...

Friday, August 5, 2011

Now the photo (document, etc.) is digitized, then what?

Digital preservation has primarily focused on the issue of digitizing existing documents. Some of us have ended up with a hard disk full of digitized files, including books, old photos, documents, and just about anything else we could scan or photograph. So now what do we do?

The first concern, of course, is to back up the files, preferably at some off-site location, so they are protected from loss. But that is just the first page of the first chapter of the preservation story. Backing up files is like buying food for storage: you can store it for only so long before it goes bad and has to be thrown out. Some of us also maintain the actual physical documents and photos, and their conservation is an entirely different book, not just a different story. Fortunately, there is an abundance of online help for the amateur or home archivist. You may want to start with the Digital Preservation website of the Library of Congress. This site covers the whole gamut of digital preservation: audio, video, photographs, documents, websites, everything.

One of the concerns is the issue of file formats. For example, suppose you digitized an old photograph and now find out that the file format you used is no longer supported. What do you do? Is the file basically lost? Obviously this subject is huge. There are hundreds, if not thousands, of online documents on the subject, but unless you start reading and studying, and move past the basic digitizing step of the process to file preservation, you are wasting your time. Your concern should be in direct proportion to the amount of data you want to preserve. Here are the questions as posed by the Library of Congress:
  • When seeking to acquire a body of digital content with the intention of sustaining it for the long term, which formats are preferred or acceptable and why?
  • Which digital formats must be fully supported by systems, automated tools, or workflow associated with the digital content life cycle processes under discussion at the Library, i.e., support for receiving and validating digital content (in the Get process), selecting digital content (in the Select process), preparing digital content for responsible long-term custody (in the Prepare/Assemble process), and establishing strategies for preservation (in the Sustain process)? 
  • Given content in a particular format, does the Library already have a commitment to support content in this digital format? If so, are there more specific technical requirements that apply? What associated metadata of a technical nature is essential? Does LC have an existing workflow process appropriate for receiving and validating digital content in this format? Or are software tools for format validation and metadata extraction available for building a workflow process?
  • If a particular digital format is not already categorized as preferred or acceptable for a particular category or subcategory of material, what information or assistance is available to develop a recommendation that a format should be supported or that a process be developed for reformatting to a supported format?
If you read these items carefully, you will see a lot of buzzwords: formats, digital content, workflow, life cycle, long-term custody, metadata, reformatting and so forth. The discussion of digital preservation uses each of these terms in a very specific way. LC (Library of Congress) defines a format as follows:
This Web site defines formats as packages of information that can be stored as data files or sent via network as data streams (aka bitstreams, byte streams). For reference, the working definition from the proposed Global Registry of Digital Formats is "A format is a fixed, byte-serialized encoding of an information model."
You may recognize some of the file formats discussed by LC. These include WAVE, PDF, MP3, TIFF, and JPEG, among others.
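
To make the "byte-serialized encoding" idea a little more concrete, here is a minimal Python sketch that guesses a file's format from its first few bytes rather than from its name. The function names `sniff_format` and `sniff_file` and the short signature table are my own illustration, not anything published by LC, and the table covers only the five formats named above in their most common forms.

```python
def sniff_format(header: bytes) -> str:
    """Guess a format from the opening bytes of a file.

    Illustrative only: real format identification tools know
    far more signatures and variants than these five.
    """
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"%PDF"):
        return "PDF"
    # TIFF begins with a byte-order mark: "II" (little-endian)
    # or "MM" (big-endian), followed by the number 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # WAVE is a RIFF container whose form type is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"
    # Assumption: only MP3 files that start with an ID3v2 tag
    # are recognized; untagged MP3 files will report "unknown".
    if header.startswith(b"ID3"):
        return "MP3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(12))
```

Calling `sniff_file` on a scanned photo saved as TIFF should report "TIFF" even if someone has renamed the file, which is the point: the format lives in the bytes, not in the file name.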

Where do we get started? Do we really need to know all this stuff about file formats? Neither question has a simple answer. Digital preservation is quite a complex issue, and it is made more complex because the whole subject is constantly changing and being re-evaluated. The encouraging point here is that huge, complex organizations, like the Library of Congress, are actively reviewing and researching the issues.

Cutting through all the jargon, the issue is whether or not any particular digital file will survive in any particular format. If you save your journal in Microsoft Word 2011, how long will that particular file format be supported before you have to migrate the file to a newer or different format?
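
One practical way to start thinking about that question is simply to find out which formats are sitting on your disk, and how many files of each. The sketch below is only a rough starting point under my own assumptions: the function name `inventory_formats` is hypothetical, and it trusts the file extension, which is not always an accurate guide to the real format.

```python
from collections import Counter
from pathlib import Path


def inventory_formats(root: str) -> Counter:
    """Count the files under `root` by lowercase file extension.

    Assumption: the extension reflects the format. Files with no
    extension are grouped under "(no extension)".
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

For example, `inventory_formats("scans").most_common()` would list the extensions found in a folder named `scans` (a made-up folder name), most numerous first. If one format accounts for most of your files, that is the one whose long-term support you should look into first.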

If you have gotten this far, you probably are wondering if there is a simple solution. The answer is no. But there is hope. I will keep at this subject for a while so we can all get some idea of what we need to do to preserve our files.
