
Monday, September 8, 2014

Intelligent Indexing

One of the projects being developed by the Family History Technology Lab at Brigham Young University (BYU) is "Intelligent Indexing Research with Downloads." This project addresses two of the most sought-after goals in genealogy: handwriting recognition and automatic data acquisition. It is a mystery to me why FamilySearch.org does not appear to be working on these two issues when it comes to Indexing. Maybe they are? But the larger companies in the private sector seem to guard their internal developments rather closely.

The program is described as follows:
The Intelligent Indexing project's aim is to improve human indexing of records by leveraging technology that will make indexing faster and easier. The project currently involves an intuitive interface that reduces context switching for the user and a novel handwritten word recognition algorithm to reduce the amount of work done by the user. The handwriting recognition algorithm is able to identify words that look similar, allowing the user to index a word once and have their response filled into any other occurrences in a document. 
This project is providing steps towards better human guided automated indexing which will greatly enhance the amount of work any one indexer can do. 
The current stage of the project is in need of testers, so if you would like to contribute, please use one of the links below to download the use-study program. 
Current Research: 
Owing to the scarcity of repeated names in a single document, the current word matching algorithm is ineffective when working with names. We are currently working on techniques of automatically separating handwritten words and letters and on performing single handwritten character recognition. These should allow us to populate helpful suggestion lists when a user is indexing names. 
We also hope to develop the project into a collaborative indexing tool where multiple users will be able to index portions of a document from their handheld devices.
I have seen some similar ideas from Mocavo.com, but this BYU project carries the implementation of the automatic handwriting recognition and data acquisition much further. Here is a demo of the program:



Implementing a program like this would take indexing to a whole new level of efficiency. Presently, the indexer is forced to move across the fields linearly, even if the fields are very repetitious. I would handle this in a way that quickly identified all the repetitious information and then let the indexer focus on the variables. The time savings from not having to enter the same information into the same field every time would be significant.
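To make the "index a word once and fill in the other occurrences" idea concrete, here is a minimal sketch in Python. It is not the BYU lab's algorithm, which the project description does not spell out. It assumes each handwritten word image has already been reduced to a small numeric feature vector, and it treats two words as "looking similar" when their vectors are close together. The function name `propagate_label`, the toy vectors, and the `threshold` value are all my own illustrative assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def propagate_label(features, labeled_index, label, threshold=0.5):
    """Copy `label` to every word whose feature vector lies within
    `threshold` of the word the indexer actually typed in.

    `features` is a list of equal-length numeric tuples, one per word image.
    This is a hypothetical stand-in for a real word-matching step.
    """
    anchor = features[labeled_index]
    return {
        i: label
        for i, vec in enumerate(features)
        if dist(anchor, vec) <= threshold
    }


# Toy feature vectors standing in for word images on one page (made-up values).
words = [
    (0.10, 0.90),  # word 0: the indexer types "Farmer" here
    (0.12, 0.88),  # word 1: looks almost the same
    (0.80, 0.20),  # word 2: looks quite different
    (0.11, 0.91),  # word 3: looks almost the same
]

filled = propagate_label(words, labeled_index=0, label="Farmer")
print(filled)
```

In this toy example, words 1 and 3 pick up the label typed for word 0, while word 2 is left for the indexer to handle. A real system would need a far better similarity measure, and the indexer would still have to confirm the suggested matches.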

Here is another very short demo of the handwriting recognition function:



I am aware that the FamilySearch Indexing program is in the process of changing from a locally installed program to a completely online one. Maybe some of these types of features will be incorporated?
