Some people eat, sleep and chew gum, I do genealogy and write...

Monday, June 10, 2013

File Naming Conventions for Genealogists

My rule is to let the computer do what computers do well and let me do what I do well. Most organization schemes intrude on the computer's realm and try to make the organizer feel useful. So the question is: what do computers do well? The answer is so simple as to be obvious: they organize huge amounts of data. The key is telling the dumb computer what you want. Pretty colors, folders, lists, charts and all are for our own benefit, not the computer's.

I could spend all day looking through piles of paper and still not find what I am looking for. The answer is to let the computer have a go at searching. But to search, the computer has to have something to search for. For example, there are elaborate techniques for searching online, which I hardly ever use. My experience is that I can find anything I am looking for, usually in less than two minutes. If I don't find something, I am looking for the wrong thing. Sometimes it takes a while to get educated about what something is called so I can find it. This is not magic; it is experience. Do a few thousand searches and you will see what I mean. My grandson is practicing the piano right now, repeating the same piece over and over, then the same section over and over, faster and faster. That is how you learn to search. When you start to think like a computer or a computer programmer, you find stuff.

Now, that said, how do we organize our piles of genealogy? Well, the computer has to have access to all the stuff, and that takes digitizing and labeling, i.e., file names and metadata. Simple is good. Complex is not good. Dates are really good. Names are OK. Labels attached to or embedded in the file itself are best.

This is why I start with naming files. In the past, most file names were next to useless because they were so short (old DOS names were limited to eight characters plus a three-character extension). But today on almost any computer system, you can usually have up to 255 characters without a problem. That is longer than a 140-character Tweet. Here is my suggestion for the form of a file name:

Date in year-month-day form (space) name of person or whatever (space) incremental number

So, for example, if I had just digitized a photo of one of my ancestors, the name of the file might look like this:

2013-06-10 George Jarvis on porch in St George Utah abt 1900 001.tif

Of course, you leave the extension (.doc, .tif, .jpg, etc.) strictly alone. You could leave out the "on" and "in" if you like. Do not use characters such as the slash (/) or the colon (:), or any other similar characters, because some programs treat them as significant and they might cause an error. Windows, for example, will not accept any of these characters in a file name: \ / : * ? " < > |. Don't get confused by paths and all that. Telling the computer where to find the file is up to the programmers. That is why you have a computer operating system.
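For anyone who wants to automate the naming, here is a minimal Python sketch of the convention above. The function name `make_file_name`, the three-digit zero-padded number, and the choice to turn forbidden characters into spaces are assumptions of this sketch, not features of any particular program.

```python
from datetime import date

# Characters Windows refuses in file names; "/" and ":" also cause trouble
# on Mac and Linux. Replacing them with spaces is an assumption of this sketch.
FORBIDDEN = set('\\/:*?"<>|')
# Usual per-name limit; some file systems count bytes rather than characters.
MAX_NAME_LENGTH = 255


def make_file_name(day: date, description: str, number: int, extension: str) -> str:
    """Build 'YYYY-MM-DD description NNN.ext' (hypothetical helper)."""
    # Turn forbidden characters into spaces, then collapse runs of spaces.
    cleaned = "".join(" " if ch in FORBIDDEN else ch for ch in description)
    cleaned = " ".join(cleaned.split())
    # Leave the extension alone; only make sure it starts with a dot.
    ext = extension if extension.startswith(".") else "." + extension
    # isoformat() gives the year-month-day form, e.g. 2013-06-10.
    name = f"{day.isoformat()} {cleaned} {number:03d}{ext}"
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name is {len(name)} characters; limit is {MAX_NAME_LENGTH}")
    return name
```

Calling `make_file_name(date(2013, 6, 10), "George Jarvis on porch in St George Utah abt 1900", 1, ".tif")` produces the example name above. Because every name starts with a year-month-day date, sorting a folder by name also sorts it by date, which a month-first date like 06-10-2013 would not do.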

I put all my documents into one huge folder on my hard drive, or on an external hard drive if I run out of room on my computer (which I do regularly). I am currently using 3-terabyte external drives to back up my data files. One of the major issues with backing up files is having them scattered all over the computer. You are then forced to back up the entire hard drive to make sure you have all the files, and if you want to move them to a new computer, it becomes a problem. So everything is in one huge clump and can be searched and moved if needed. I moved my files just last week, and it took about six hours to move one folder of about 63,000 files.
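With everything in one folder, finding files by name takes only a few lines. This is a minimal Python sketch, assuming a single folder with no subfolders; the function names `name_matches` and `find_files` are hypothetical.

```python
from pathlib import Path


def name_matches(name: str, words) -> bool:
    """True if every search word appears in the file name, ignoring case."""
    lowered = name.lower()
    return all(word.lower() in lowered for word in words)


def find_files(folder: str, *words: str) -> list[str]:
    """List file names in one folder (not its subfolders) containing every word."""
    hits = [
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and name_matches(path.name, words)
    ]
    # Names that start with a year-month-day date come back in date order.
    return sorted(hits)
```

A call such as `find_files("/Volumes/External/Genealogy", "Jarvis", "porch")` (the path is only an example) would list every matching photo and document in date order.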

Now, metadata is another way to differentiate files. You add keywords and descriptions to each file so the computer can find what you are looking for. Most of today's operating systems will search the contents of text documents, so adding metadata to text files may not be so crucial, but adding metadata to images is. I will come back to a discussion of metadata in another post.

What about folders and such? You can use as many as you like, but the more folders and subfolders you create, the harder you make life for yourself, and it makes no difference to the computer at all. You might as well put everything in one huge folder. Although there is an analogy between the computer desktop and drawers and files, it is for our benefit, not the computer's.


3 comments:

  1. I also deal with thousands of files. Because I mainly use UNIX (Macs) or UNIX-like (Linux) systems I prefer to replace any spaces in file names with underscores. This makes dealing with files and paths to files much easier from the terminal.

  2. Not true, James. I developed a filing system where I can pull all the records for any person plus their descendants, any number of generations, in a matter of seconds. You cannot do that with all your files in one folder.

  3. I've been a Windows user forever and I like using folders and sub-folders. I've used the same naming conventions for many years but haven't settled on one for genealogical records. I'm finding the most critical step is to slow down and file/name as I find information instead of getting caught up in the thrill of the search!

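For readers who, like the first commenter, prefer underscores to spaces for work in a terminal, the renaming can be scripted. This is a minimal Python sketch; the function names are hypothetical, and it defaults to a dry run so nothing is renamed until the planned changes have been checked.

```python
from pathlib import Path


def underscore_name(name: str) -> str:
    """Replace each run of whitespace in a file name with one underscore."""
    return "_".join(name.split())


def rename_with_underscores(folder: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and optionally perform) space-to-underscore renames in one folder.

    With dry_run=True nothing on disk changes; the (old, new) pairs are
    only returned so they can be reviewed first.
    """
    changes = []
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        new_name = underscore_name(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            # Skip rather than overwrite a file that already has that name.
            continue
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

Run it once with the default `dry_run=True`, look over the returned list, then run it again with `dry_run=False`.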