Some people eat, sleep and chew gum, I do genealogy and write...

Friday, November 4, 2011

Taming the genealogy paper monster

This is not really a series because I keep getting distracted, but I have been discussing one of the genealogical facts of life: paper. Here is the problem with paper from the perspective of a modern hypertext world: it is extremely difficult to cross-reference without making the problem worse. Let me illustrate what I am talking about.

Let's suppose you find the will of an ancestor that names the deceased and several other family members. If you are maintaining a purely paper-based system, how do you record the fact that the will contains several pertinent names? One way would be to put the will in a surname folder that includes everything relevant to that surname. Great, but what if this will is simply one of ten thousand documents dealing with that particular family? Having a single surname folder for each family is not a scalable system. OK, so you have your primary folder for the surname and then break the files down into types of documents. Have a look at the Family History Library Catalog and do a place search. You will start to see how many different categories you could possibly have under each surname. But you still have not solved the problem of cross-referencing the names and topics in the will. If I am looking in the file for a spouse of one of the children named in the will, how do I know that the will exists? Do I make a copy of the will and include the copy in every potential folder? If the will mentions transferring a parcel of land, do I put another copy of the will into a folder for real estate? If I put all of the wills in a will folder, how do I know when to look in the will folder?

Of course, you can rely on your memory to remember that you have the will, but your memory might start failing at around 10,000 documents or so. Every one of these paths to organization leads to having an additional folder, an additional index or additional copies of each document. Do I really want ten duplicate copies of the will? From my own experience with tens of thousands of documents and having worked in libraries for years, I recognize the limitations of any paper-based system, even a system of cross-referencing. When I worked as a bibliographer in the University of Utah Library, I would commonly find closely related books that had been classified under entirely different subject headings and so were in different locations in the library. In one case, biographical material about an individual was in three different locations within the library and there was absolutely no indication in the card catalog that the three different locations existed. I only found the locations by actually walking up and down the shelves in the library.

To begin to solve this problem, we have to focus on the smallest classifiable unit in our organizational system. We cannot impose a meta-system that has no direct relationship to the content of that smallest classifiable unit. For example, I cannot rely on color coding if the color coding has no real relationship to the content of the underlying documents. In genealogy, the smallest classifiable unit of organization is the individual document or photograph. A document might be a book or a single piece of paper. A photograph is always a single image, regardless of the number of people in the photograph. The document (in this case I am including photos) must be identifiable. This means that the document has to have an identity with some kind of distinctive content.

Let's say I have a picture of a tree in a forest. No people. Just a tree. Do I know why the photo is in my collection? Is the tree significant in some way? Do I have a date and a place the photo was taken? Is the tree photo part of a series of photos that tell a story, such as a vacation trip or a survey of a work area? Focusing on the significance of each document is important in determining how or where the document relates to the overall history of the family. Any system of organizing files and documents must take into account the cross-classification needs of each smallest classifiable unit. How does the unit fit into the overall picture?

No such classification system can be static. I may not have any idea why the family took the picture of the tree, but upon further research I may find out the reason, which may or may not turn out to be a valuable and interesting part of the family's history.

Once I have focused on the smallest classifiable units in my collection, I need a way to record those units so that I can connect them to every other relevant unit. Going back to the will that names several family members, I need a way to connect the will to every family member mentioned and to any other individuals who may be affected by the will. But at the same time, I want to avoid making multiple copies of the document. I only want one copy. We have reached the one-to-many or many-to-one relationship problem.
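For readers who think in code, here is one rough way to picture the one-to-many problem. It is only an illustrative sketch, not a description of any particular genealogy program, and every name in it (Document, Catalog, the ID numbers, the Doe family, the box and folder locations) is made up for the example. The idea is that each document is recorded exactly once, and a separate index points from every person or topic back to that single record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """One physical item (a will, a photo), recorded exactly once."""
    doc_id: str
    description: str
    location: str  # where the single paper copy is filed (hypothetical)


@dataclass
class Catalog:
    """Hypothetical catalog: one record per document, many tags per record."""
    documents: dict = field(default_factory=dict)
    # tag (a person's name or a topic) -> set of document IDs
    index: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, doc, tags):
        # Store the document once, then point every tag at its ID.
        self.documents[doc.doc_id] = doc
        for tag in tags:
            self.index[tag.lower()].add(doc.doc_id)

    def find(self, tag):
        # Look up a name or topic and return the matching documents.
        ids = sorted(self.index.get(tag.lower(), set()))
        return [self.documents[i] for i in ids]


catalog = Catalog()
catalog.add(
    Document("D0001", "Will of John Doe", "Box 3, folder 12"),
    ["John Doe", "Mary Doe", "Samuel Doe", "wills", "real estate"],
)
catalog.add(
    Document("D0002", "Family photograph", "Album 1, page 4"),
    ["Mary Doe", "photographs"],
)

# The will is found under a person and under a topic, yet exists only once.
mary_docs = [d.doc_id for d in catalog.find("Mary Doe")]
land_docs = [d.doc_id for d in catalog.find("real estate")]
print(mary_docs)
print(land_docs)
```

Looking up "Mary Doe" or "real estate" both lead back to the same single record of the will, along with a note of where the one paper copy is filed, so no duplicate copies are needed.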

The answer to this problem is metadata. Tune in to find out why.


  1. I'm waiting for your next blog on metadata - so far my messy system is more-or-less working, but ... the piles are breeding in corners.

  2. I wish I could confine my piles to corners. They are strewn from an upstairs room that is being used for junk storage (I am downstairs), to my bedroom where they rely on an available (clear of other junk) space on the dresser or the floor beside my bed.

    I recently went through one family folder that I had inherited from my dad. In it were eight copies of the same exact family group sheet. I had transferred the info to digital files at some point, and so was able to throw out the eight pages, making sure that there were no notes on the back, etc.

    Now, I have the rest of my life, however long that is, to do the same to a few hundred other folders.