
Friday, December 2, 2011

Searching 40 TB of Records on an iPad

The tag line from a recent release of the National Archives reads, "Imagine that you want to find electronic records related to a particular geographic location in a very large collection (40 TB and about 70 million files) of archival electronic records. Wouldn’t it be cool if you could pick up an iPad, have a map pop up on the screen, run your finger over the area on the map you were interested in, and have a list of relevant record collections show up on the screen next to the map? Wouldn’t it be really cool if you could then drill down through that list and see metadata about records in each collection?"

The article reports on results presented at the 2011 Large Data Analysis and Visualization (LDAV) symposium in Providence, RI. NARA's partners have developed a set of online search tools that do the following:
  • Searches across a large collection
  • Identifies files in the collection that contain geospatial information (i.e., GIS datasets)
  • Identifies applications that can open the identified files
  • Opens the files
  • Extracts metadata from the files
  • Determines the geographic coverage of the files
  • Adds each file's metadata, along with the range of latitude and longitude it covers, to an index
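The last two steps above, recording each file's geographic coverage in an index and then answering a map query against it, can be illustrated with a small sketch. The article does not describe NARA's implementation, so this is only a minimal Python sketch under stated assumptions: each file's coverage is reduced to a latitude/longitude bounding box, and the area traced on the map is simplified to a rectangle. The class names (`BoundingBox`, `IndexEntry`, `GeoIndex`) and the sample collections and file paths are hypothetical. An index of some 70 million files would likely need a dedicated spatial index (an R-tree, for example) rather than the simple scan used here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """Geographic coverage as a latitude/longitude rectangle, in degrees."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two rectangles overlap unless one lies entirely beyond the other
        # on either axis. Touching edges count as overlap. This simple test
        # ignores boxes that cross the 180th meridian.
        return not (
            self.max_lat < other.min_lat
            or other.max_lat < self.min_lat
            or self.max_lon < other.min_lon
            or other.max_lon < self.min_lon
        )


@dataclass
class IndexEntry:
    collection: str        # hypothetical record-collection name
    path: str              # hypothetical file path within the collection
    coverage: BoundingBox  # area the file covers
    metadata: dict = field(default_factory=dict)  # extracted metadata


class GeoIndex:
    """Toy index: a linear scan stands in for a real spatial index."""

    def __init__(self) -> None:
        self.entries: list[IndexEntry] = []

    def add(self, entry: IndexEntry) -> None:
        # Store a file's metadata together with its geographic coverage.
        self.entries.append(entry)

    def collections_in(self, area: BoundingBox) -> dict[str, list[IndexEntry]]:
        # Return matching files grouped by collection, so a user can see
        # the list of collections first and then drill down into files.
        hits: dict[str, list[IndexEntry]] = {}
        for entry in self.entries:
            if entry.coverage.intersects(area):
                hits.setdefault(entry.collection, []).append(entry)
        return hits


if __name__ == "__main__":
    # Invented sample entries for illustration only.
    index = GeoIndex()
    index.add(IndexEntry("Ohio county surveys", "surveys/franklin.shp",
                         BoundingBox(39.8, -83.3, 40.2, -82.7),
                         {"format": "Shapefile"}))
    index.add(IndexEntry("Pacific coast charts", "charts/oregon.tif",
                         BoundingBox(42.0, -124.6, 46.3, -123.5),
                         {"format": "GeoTIFF"}))

    # The area "traced" on the map, here roughly central Ohio.
    selection = BoundingBox(39.5, -84.0, 40.5, -82.0)
    for name, files in index.collections_in(selection).items():
        print(name)
        for f in files:
            print("   ", f.path, f.metadata)
```

Grouping the hits by collection mirrors the interface the release describes: a list of relevant collections next to the map, with per-file metadata available when you drill down.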
The development of these types of tools highlights the need for genealogical researchers to become more proficient in a number of areas.

  1. Genealogists need to be thoroughly familiar with operating computers on a number of different devices, and to recognize that touch screens may well become the default way of operating most types of information systems.
  2. Genealogists need to understand maps and be more consistent in recording the location of events. The days when you could write "Mary, born abt 1800 in Ohio" are long gone. The NARA project underscores that nearly all record collections are organized on a geographical basis.
  3. Genealogists who resist computerizing their data will fall even further behind in doing any kind of extensive research.
  4. There is a firm commitment by large record repositories to make more of their collections available electronically. 

My observation is that a huge number of today's casual researchers will likely be left out of this whole system.
