Some people eat, sleep and chew gum, I do genealogy and write...
Saturday, November 26, 2016
Handling Massive Data: The Genealogical Challenge
Recently, we were challenged by two major electronic events. The first was that my wife's computer, an older Apple iMac, finally got so bogged down as to be almost unusable. What happens with computers is that as they are used, the hard drives fill up with unusable junk, sometimes bits and pieces of programs and files that slow down the operation of the computer. Also, the operating system and other programs on the computer are constantly being upgraded, or at least they should be. The upgrades often come in response to the development of faster, more powerful computers. Eventually, the new operating system is slowed down by the older computer chips. All of this probably goes unnoticed by the average user, especially those who are still operating an old PC with Windows Vista or some such.
When you are working for extended periods of time on a computer, this creeping slowdown finally reaches a breaking point. Some people simply stop using their computer. We don't have that option. We have learned to go out and buy a new computer and start the process all over again. For a while, the new computer solves the slowdown problem, and transferring all the data addresses the fragmented program problems. The newer operating system then takes advantage of the new processor, everything about the computer speeds up, and life is good.
One issue that continues to plague genealogists who use computers extensively is the vast amount of data that accumulates from scanned documents, photos, notes, and a myriad of other stuff that goes along with research. Over time, I have had to move to larger and larger capacity storage devices, and migrating the data from one device to another is a constant battle. Fortunately, the price of new storage continues to drop. The latest hard drives are 8 terabytes and cost only $179.
The second challenge was that the 12-volt battery in our Prius V died. When that happened in the past with regular non-hybrid cars, all you had to do was jump the battery and charge it for a while, or get a replacement. When the Prius battery dies, the car goes dead. You can't even open the rear hatch to get to the battery, which is in the back of the car. Well, it all turned out to be easily fixed. In fact, it was the easiest battery change I have ever done in my life, and I have changed out a lot of batteries over the years. How did I know what to do with the dead battery? I looked on YouTube.com, of course, and watched a couple of videos on how to open the back hatch and change out the battery.
Back to the data movement. As we transitioned to a new computer, we realized that we were still scanning documents and that the accumulation had now reached monumental proportions. We decided we needed to make sure all of the documents were on both my computer's hard drives and my wife's. So I ordered a new 8 TB hard drive and began the process of transferring the data: over 700,000 files. That process is now in its second day and will likely take more than two days to complete.
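For anyone who wants to confirm that a transfer like this really did put every file in both places, here is a minimal Python sketch. It is only an illustration, not the tool we used: the function names are made up for this example, and it compares file names and folder positions only, not file contents.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, each relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from(source_root, target_root):
    """List files that exist under source_root but not under target_root.

    Only names and folder positions are compared, so a file that was
    renamed or moved on one drive will show up as missing.
    """
    return sorted(relative_files(source_root) - relative_files(target_root))
```

Calling `missing_from` with the old drive's folder first and the new drive's folder second would list anything that did not make it across. The folder paths themselves depend on how your own drives are mounted.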
So what do we do with all that data? The scanned images need to be identified. The photos need to be tagged and uploaded to the appropriate people in the FamilySearch.org Family Tree, and the duplicates need to be deleted. All that has to take place between two separate computers. The first step is consolidating all of the data on my wife's computer with that on mine onto one hard drive. Then, using that hard drive, we will create a "working file" for all the documents we are currently working on identifying. As we finish with each one, we will move the completed files to a "completed file folder." The huge amount of working space on the 8 TB drives makes this all possible. We will likely incorporate online storage in the process also. What we feel is appropriate will be added to the Family Tree.
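Finding duplicates among hundreds of thousands of files is not something to do by eye. As a rough sketch of one common approach (an assumption on my part, not a description of any particular program), the Python below groups files by size first and then compares SHA-256 checksums only for files whose sizes match. The function names are hypothetical, and the script only reports duplicates; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    # Files of different sizes cannot be identical, so group by size first
    # and avoid reading files that have no possible twin.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Checksum only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each group returned is a set of files with the same contents, whatever their names. Which copy to keep is still a decision for a person, since the file name or folder often carries information the image itself does not.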
This whole process runs in the background of all the other activities we are involved in from day to day. All of our changes are constantly being backed up to other hard drives or to cloud storage programs on the internet.