Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, March 3, 2019

Far-reaching Changes from MyHeritage’s The Theory of Family Relativity and AutoClusters: Part One


This is part one of a multi-part series about MyHeritage's Theory of Family Relativity. 

Background

To appreciate the technological innovations introduced by MyHeritage.com at the RootsTech 2019 Conference in Salt Lake City, Utah, it is necessary to have an understanding of the need for these innovations.

To begin, we need to remember that before the technological innovations presently available, genealogical researchers were faced with contacting relatives by letters, physically visiting record repositories, and manually searching through books, documents and other records to find their ancestral family members. The time and expense of such searches around the world could be enormous.

With the advent of the internet, digitization, and the development of genealogical database programs with family tree hosting, such as MyHeritage.com, FamilySearch.org, Ancestry.com, and Findmypast.com, which have accumulated billions of digitized and indexed historical records, basic genealogical research has been transformed. However, the holdings of these large websites and many other smaller ones with digital records must still be searched using the search engine capabilities of each website.

To assist these user-initiated searches, each of these large websites has developed automated systems of record hints to aid researchers in compiling their family history by matching suggested historical sources that may assist the researcher in obtaining more information about individuals and families. Record hints are generated by complex programs that search the indexed historical records on the website for matches to the individual and family information in the various family trees supplied by the users.

Meanwhile, for reasons completely unrelated to genealogical or historical research, scientists all over the world have developed other complex methods of identifying individuals, collectively called DNA testing. By extending the original DNA testing methods and combining them with large genealogical databases, scientists have now made DNA testing produce information that can assist in genealogical research. Rather than turn this present article into a long discussion of the origins of DNA testing and the various DNA tests that have become available, it is sufficient to note that there are books and websites now dedicated to explaining the complexities of DNA research and testing. A relatively short time ago, some DNA testing companies began investigating the use of DNA test results to help identify ancestral lines for genealogists. This line of research was based on the use of DNA tests to identify individual fathers in paternity lawsuits. Genealogical DNA testing rapidly became an adjunct to both traditional and online genealogical research.

Any current discussion of DNA testing seems to disintegrate rapidly into detailed recitals of the complexities of chromosomes and the other components of cells. I compare this fascination with the minutiae of DNA to the early days of automobiles, when the details of engine performance and the workings of manifolds, fuel injection, and other such mechanical complexities were set forth in consumer ads. Today’s automobile ads focus on comfort, luxury, and accessories. The intricacies of the engine and drive train are left to those who are interested in automobiles as automobiles. I suggest that the focus on DNA testing will also shift to the results and uses of the test rather than sounding like a beginning course in human biology.

Granted, some knowledge of the details of DNA helps in understanding the results of DNA tests, but the real issue for genealogists who are not also biologists is the matches made with the websites’ family trees. The point of taking a genealogical DNA test is to find matches that give insight into familial relationships. The concern is that the matches are based on accurate DNA testing techniques. However, DNA matches by themselves do not solve genealogical brick walls. In fact, without the support of robust, well-documented family trees, the DNA test does not tell you much of anything outside a small circle of very close and known-to-you relatives.

For example, some of the most publicized results from taking and submitting a DNA test are those that turn up a close relative: mother, father, sister, brother. The reality before MyHeritage was that finding a distant cousin, say a fifth cousin three times removed, was not helpful because the number of possible common ancestors was too large to hazard a guess as to the path of a relationship through a common ancestor. Additionally, the most common DNA test, the autosomal DNA test, is only accurate back about six generations because the amount of the original ancestor’s DNA transmitted to a child decreases by half in each successive generation. If we start with 100 percent, each child receives only 50% of each parent’s DNA. If you do the math, you will easily see that the expected percentage of DNA transmitted is about 1.6% at six generations and drops below 1% at seven. This percentage is likely within the possible margin of error of the DNA companies. As a side note, the idea of assigning ethnicities to the DNA test results is no more than a novelty. Much larger data sets, i.e. more people contributing to the target ethnicity, will have to be obtained, and the geographic target areas will have to be much more closely defined, i.e. smaller.
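The halving arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch of the simple averaging model the paragraph describes; the function name expected_share is made up for illustration, and real inheritance varies around these averages because of how DNA is shuffled between generations.

```python
def expected_share(generations: int) -> float:
    """Average fraction of autosomal DNA inherited from one ancestor
    who lived the given number of generations back (simple halving model)."""
    return 0.5 ** generations


if __name__ == "__main__":
    # Print the expected share for one through eight generations back.
    for n in range(1, 9):
        print(f"{n} generations back: {expected_share(n):.2%}")
```

Under this model the share is still a little above 1% at six generations and falls below 1% at the seventh.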

Genealogists have evolved some tactics for attempting to establish relationships to a possible common ancestor. There are two types of DNA tests that are more stable than autosomal DNA testing. They are the Y-DNA test for the direct paternal lines and mitochondrial DNA tests for the direct maternal lines. There are some other DNA tests, but they are not generally used for genealogical research. Using a combination of all three tests, it is possible to establish relationships using some ancestral lines.

The Holy Grail of DNA testing is the ability to somehow match two separate DNA test subjects and calculate or otherwise determine a common ancestor and a relationship path. Now we come to MyHeritage’s The Theory of Family Relativity.

The Theory of Family Relativity

The idea behind MyHeritage’s The Theory of Family Relativity (hereinafter “the Theory”) is that if you combine the relationships shown in family trees with the relationships shown in DNA test results and validly selected historical sources, you will be able to determine the most reasonable path of relationship between the two DNA matching individuals. This process assumes that the DNA test results are accurate and fall within acceptable lines. The problem is that the calculations necessary to determine those relationships with any degree of accuracy require massive computational ability. Only recently has such massive computational ability been practical or reasonably available. The top of the line desktop computers today can be networked to operate as one massive supercomputer, so the cost of constructing a supercomputer for research is far less than the huge cost just a few years ago. Of course, if you were serious about using a present-day supercomputer, you could buy one for a fraction of the cost of a much less capable machine just a few years ago. Some supercomputers start at about the cost of a mid-range car.

Another issue with developing such a huge database is the need of having access to one or more massive online family trees and millions of individual family trees. It would also be necessary to have a huge number of individual DNA testing subjects and a robust set of original historical genealogical data sources. All of these necessary elements are abundantly present for MyHeritage to begin the process of programming such a database.

Here is the description from MyHeritage of the new program from the Press Release:


TEL AVIV, Israel & LEHI, Utah--(BUSINESS WIRE)--MyHeritage, the leading global service for family history and DNA testing, revealed today its latest innovation in genetic genealogy — the Theory of Family Relativity™. This technology offers users, for the first time ever, theories that utilize nearly 10 billion historical records and family tree profiles to explain DNA connections. Until now, family history enthusiasts used two distinct domains for making discoveries: the paper-trail world of records and trees, and the biological world of DNA connections. Now, MyHeritage has combined these two domains and integrated them seamlessly.

The Theory of Family Relativity™ is based on a big data graph that connects billions of data points drawn from thousands of databases on MyHeritage, in real time. Every node on this graph represents a person, and every edge depicts a blood relationship between two individuals that is described in a family tree or a historical record; or a match between two tree profiles that are likely to be the same person; or two records that are likely to be about the same person. These connections between people and records are established by MyHeritage’s industry-leading matching technologies. MyHeritage engineers and algorithm experts led by the company’s CTO, Sagi Bashari, developed a unique approach that allows the big data graph to instantly compute all paths between millions of blood relatives. The Theory of Family Relativity™ draws upon this resource to construct the most plausible theories explaining how pairs of people linked by a DNA Match on MyHeritage are related, using family trees and historical records.

Previously, users who took a DNA test looking to find relatives were faced with puzzling lists of thousands of distant relatives, without many clues explaining the DNA connections. Now, for a growing percentage of these DNA Matches, theories are provided by MyHeritage that explain the precise relationship paths using trees and records. In these theories, not only does genealogy illuminate DNA connections, but DNA also helps separate fact from fiction in the genealogy and shows which tree and record connections appear to be correct.

This technology uses millions of family trees on MyHeritage, as well as the World Family Tree on Geni, which is replicated daily to MyHeritage, and the single family tree of FamilySearch, which is also replicated daily to MyHeritage under license. This combination results in the most comprehensive family tree traversal available today. Additionally, the technology utilizes billions of historical records on MyHeritage, including all census records, as well as the MyHeritage Record Detective™ technology that indicates whenever two records are about the same person. For example: a theory that explains a DNA Match between two users can begin in the family tree of the first user, traverse through a series of matching trees into a census record, continue to a household relative, who then matches into another tree, until the path completes with the family tree of the second user. MyHeritage displays the complete path of every theory, and explains every step along the way, allowing the user to verify its accuracy. Each theory is presented with a confidence level that is based on the confidence of the matches used to construct it.

“Our new technology is a game changer in its scope and power and is a tribute to our passion for developing the best genetic genealogy tools for our users,” said Gilad Japhet, Founder, and CEO of MyHeritage. “Using genealogy to explain DNA Matches, and using DNA to validate genealogy matches, combines the best of both worlds. We expect this technology to help people make new discoveries in their family history. With every day that goes by, this technology grows even more powerful as more tree profiles, historical records, and DNA kits are added to our global database.”

The Theory of Family Relativity™ feature is included for free with all Premium, PremiumPlus, and Complete subscriptions on MyHeritage. Individuals who upload their raw DNA data from other testing services to MyHeritage who do not have a subscription can pay a one-time fee of $29 per DNA kit to unlock the Theory of Family Relativity™ and the full range of advanced DNA features offered by MyHeritage.
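To make the idea in the press release more concrete, here is a minimal sketch in Python of a graph whose nodes are people or records and whose edges are relationships or matches, each carrying a confidence. MyHeritage has not published how it builds its graph or scores a theory, so everything here is an assumption for illustration: the names in GRAPH, the confidence numbers, the choice to multiply the confidences along a path, and the use of a standard shortest-path search (Dijkstra's algorithm on the negative logarithm of each confidence) to find the most plausible path.

```python
import heapq
import math

# Hypothetical toy graph. Each edge is (neighbor, kind of link, confidence).
# Edges are listed in one direction only to keep the example short.
GRAPH = {
    "user_a": [("a_grandmother", "tree relationship", 1.0)],
    "a_grandmother": [
        ("census_person", "record match", 0.9),
        ("b_tree_profile", "tree match", 0.3),  # weaker, more direct link
    ],
    "census_person": [("household_relative", "same household", 1.0)],
    "household_relative": [("b_tree_profile", "tree match", 0.8)],
    "b_tree_profile": [("user_b", "tree relationship", 1.0)],
}


def most_plausible_path(graph, start, goal):
    """Return (path, confidence) for the path with the highest product of
    edge confidences, or (None, 0.0) if the goal cannot be reached."""
    # Minimizing the sum of -log(confidence) maximizes the product.
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, math.exp(-cost)
        if cost > best.get(node, math.inf):
            continue  # a cheaper route to this node was already handled
        for neighbor, _kind, conf in graph.get(node, []):
            new_cost = cost - math.log(conf)  # conf must be in (0, 1]
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None, 0.0


if __name__ == "__main__":
    path, confidence = most_plausible_path(GRAPH, "user_a", "user_b")
    print(" -> ".join(path))
    print(f"confidence: {confidence:.2f}")
```

In this toy data the longer route through the census record wins, because its combined confidence is higher than that of the shorter route through the weak tree match. A real system would also need to handle far larger graphs and links that run in both directions.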

2 comments:

  1. This is quite a fascinating addition to the DNA matching on My Heritage.

    I manage my mother's DNA test in my account on My Heritage. There has been a DNA match which is supposed to be relatively close to her but the individual has a MyHeritage tree of two people. I haven't bothered trying to use just her name to try and figure out the relationship and have not bothered to contact her directly yet.

    The new Theory of Family Relativity screen shows her as matching with a 30% confidence to a probable copy of her in someone else's tree - it's marked Private with just her last name there. Her grandmother matches with 100% confidence with a woman in someone else's tree. The grandfather of that grandmother matches with 99% confidence to my mother's great-great-grandfather in my mother's tree. Taking this stitched together combination of four trees shows this previously undocumentable match to be a 3rd cousin based on the paper trail.

    The DNA prediction was that they would be between 1st cousins twice removed and 2nd cousins twice removed. 3rd cousin is right in the middle of that range.

    Basically this removes all the work of determining probable earliest common ancestors, if sufficient trees exist on My Heritage, and just leaves the work of deciding if this connection is real or not.

    Replies
    1. This is exactly what MyHeritage is trying to do. I think it is great.
