
Thursday, February 2, 2012

historical-data.org at RootsTech

A Google initiative has been proposed to create a standard way for content providers to exchange common metadata between web applications. The primary focus of the proposal is the interaction between search engines and genealogical databases. For example, the proposal is concerned with identifying record entries so that searches can find matching data, i.e., a name in a name field is recognized as a name. The website for the initiative is historical-data.org.

Representatives from Google were at RootsTech to discuss the initiative. The website is fairly technical and uses the term "schema." A schema is defined as a representation of a plan or theory in the form of an outline or model. The description of the initiative is as follows:
"This site defines a collection of schemas (applied in the form of HTML tags) that webmasters can use to markup their historical and genealogical information in a consistent way.

"Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure."
Because it is essentially an HTML system, this initiative is intended to apply only to online applications and databases.
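
To make the idea of recovering structured data from a marked-up page more concrete, here is a minimal Python sketch. It assumes the page uses the standard HTML microdata attributes (itemscope, itemtype, itemprop). The itemtype URL and the property names (name, birthDate, birthPlace) are hypothetical placeholders and are not taken from the historical-data.org schemas.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The itemtype URL and the property names
# are assumptions for illustration, not the initiative's actual schema.
SAMPLE_HTML = """
<div itemscope itemtype="http://example.org/HistoricalPerson">
  <span itemprop="name">John Doe</span> was born on
  <span itemprop="birthDate">1850-03-14</span> in
  <span itemprop="birthPlace">Springfield</span>.
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect the text of every element that carries an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # itemprop currently being read, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if prop:
            self._current = prop
            self.record[prop] = ""

    def handle_data(self, data):
        # Text outside a marked element is ignored.
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes marked elements contain no nested tags.
        self._current = None


def extract_record(html):
    """Return a dict mapping each itemprop name to its text content."""
    parser = ItempropCollector()
    parser.feed(html)
    return {key: value.strip() for key, value in parser.record.items()}


print(extract_record(SAMPLE_HTML))
```

Without the itemprop attributes, the same sentence would just be text, and a program would have to guess which words are the name, the date, and the place.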


Normally a search engine, such as Google, will view genealogical data as simply text. By using tags or markers, those building data entry forms can mark the acquired data in a way that lets search engines recognize what kind of data each entry contains.
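
As a rough illustration of what marking the data could look like, the following Python sketch turns a record from a data entry form into HTML with microdata attributes. The function name render_person, the itemtype URL, and the property names are all hypothetical; only the itemscope, itemtype, and itemprop attributes themselves come from the HTML microdata specification.

```python
from html import escape


def render_person(record, itemtype="http://example.org/HistoricalPerson"):
    """Render a dict of field name -> value as an HTML microdata item.

    The default itemtype URL is a placeholder, not a real schema address.
    """
    spans = [
        f'  <span itemprop="{escape(prop)}">{escape(value)}</span>'
        for prop, value in record.items()
    ]
    return "\n".join(
        [f'<div itemscope itemtype="{escape(itemtype)}">', *spans, "</div>"]
    )


# Example: values as they might arrive from a data entry form.
print(render_person({"name": "John Doe", "birthDate": "1850-03-14"}))
```

Each value is escaped before it is placed in the page, so a name that contains characters such as < or & still produces valid HTML.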


The website goes on to say,
"The scope of this project is limited to defining the schemas for historical and genealogical microdata that is designed primarily to enable search engines, web crawlers, browsers, and other tools for enabling users to find and understand the historical data contained on an HTML page. As such, the schemas defined by this project are designed to be lightweight and generic and are not designed to support all of the intricacies of a rich research model that supports in-depth semantic analysis and sophisticated bibliographic accountability."
A very interesting proposal and one worth watching.


1 comment:

  1. I totally agree with the comments about Jay Verkler. Does anyone know what his plans are for the future as Dennis Brimhall takes the reins at FamilySearch?
