Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, December 4, 2011

Genealogist drowning in data?

A recent short video from The Economist started me thinking: is there a point at which the amount of data exceeds anyone's ability to review, much less digest, it? Right now, if I concentrated on one of my ancestors, I could practically spend full time just looking for additional information online. Do you think you can just do a search and have it covered? Even a site with a huge database is really just small fry compared to the total number of records going onto the Internet almost daily.

A question was asked some years ago by Michael Lesk at the University of Arizona in his article "How much Information is there in the World?": "Are we better off having all possible information and giving it the most sketchy consideration, or having less information but trying to analyze it better?" From the genealogical viewpoint, how can we ever know that we have searched all of the possible sources of information? Fifty or 100 years ago, genealogical research was very limited. If you want to get an idea of how limited, you can read Jacobus, Donald Lines. Genealogy As Pastime and Profession. Baltimore: Genealogical Pub. Co, 1968. Jacobus' description of doing research seems positively antiquated. It is also apparent from his descriptions that the genealogists of his time missed a tremendous amount of information that was locked up and inaccessible at the time.

We can easily become so fascinated with every new thing that we move like children in a toy store, running from one discovery to the next without ever discovering the potential of each new revelation. The only way to overcome data shock is to concentrate on specific tasks and research goals. Going back to Lesk's article, he said, "Two years ago I heard Ted Nelson at a conference suggest that we should keep the entire record of everyone's life; all the home snapshots, videos and the like. Some six-year-old, he said, is going to grow up to be President; and then the historians will wish we knew absolutely everything about his or her life. The only way to do this is to save everything about everyone's life. I laughed, but it's indeed possible."

Even if it were possible to record every single event in a person's life, is it ever desirable to do so? Do we really need or want that kind of information? Who would want to read all of that information if you wrote it down? My great-grandfather left a one-page handwritten biography of his life, and I doubt that more than a handful of his descendants have ever read even that one page. I have been keeping a personal journal since 1975 and have a hard time believing that anyone will ever take the time to read the entire document. What do you think are the chances that anyone will read all of the other documents and histories I have accumulated?

Now, is this simply pessimism? Do we do genealogy for others? Would I change what I am doing even if I did believe that no one would ever read or look at whatever I had done? No, not for a minute.

It seems tautological to say that the explosion of data sources and the amount of readily available genealogical data call for a completely different way of looking at genealogical research, but it is still true. I read a very well done genealogical case study recently and immediately noted that the author was apparently unaware of several other sources of information that were now available online. When does our research begin if we have to review all of the possible sources of information available about our ancestors? I recently wrote a post referring to the National Archives' initiative for putting information online. How do we really know whether or not there is some other document out there we need to review? I was speaking with one Family History Consultant who has lengthy experience in doing research in American Indian genealogy. Within a few minutes, I had mentioned several online resources with which she was entirely unfamiliar. How do we keep up our expertise when whole new fields of information regularly become available?

It would be nice to have some answers. Maybe as I continue to think about the challenges of too much information, I can share some of my insights.


  1. Too much data? No such thing! I am writing a biography of my great grandfather, a member of an immigrant community that was little studied and even less written about.

    Not if we are sincere in our efforts to document the lives of our ancestors.

    Just my two cents, plain.

  2. That was a fascinating (and frightening) video.

    It's not only genealogists who are drowning in data, but everybody. And it isn't slowing down. According to Eric Schmidt, every two days we're creating more data than was created from the beginning of time until 2002. And he said that over a year ago (which is pretty much just saying the same thing as that video).

    Yet still for any given search, most people trust Google to return the best result within the top few choices.

    I think, more and more, we will come to rely on people who do the vigilant work of discovering, sifting, and sorting for us, compiling this into information, and making it available to us in a much more consumable fashion. These are digital curators, who compile all the best information on a particular topic. There's a growing need for people to fill this role.

    Hopefully, someone will step up and become the curator of American Indian genealogy resources and information. If that person can do a good job, she'll earn the trust of an audience who will rely on her for that valuable service.