Thursday, March 14, 2019

Far-reaching Changes from MyHeritage’s The Theory of Family Relativity and AutoClusters: Part Three


The AutoCluster Tool from MyHeritage.com

Quoting from a blog post from MyHeritage.com entitled, “Introducing AutoClusters for DNA Matches,”
AutoClusters organize your MyHeritage DNA Matches into shared match clusters that likely descended from common ancestors. By grouping together DNA Matches who likely belong to the same branch and have a common ancestor, AutoClusters can be very helpful in shedding light on the relationship paths that connect you and your matches. By reviewing family trees of clustered matches, users can piece together the entire branch. Clusters are color-coded for convenience and are presented in a powerful visual chart, as well as in list format. 
This new tool was developed in collaboration with Evert-Jan Blom of GeneticAffairs.com, based on technology that he created, further enhanced by the MyHeritage team. Our enhancements include better clustering of endogamous populations (people who lived in isolated communities with a high rate of intermarriages, such as Ashkenazi Jews and Acadians), and automatic threshold selection for optimal clustering so that users need not experiment with any parameters.
The concept of using “clusters” has long been a staple of genealogical research. The basic idea is to identify something that a group of items or people have in common and then build on that relationship to identify the group. Here is the formal definition of cluster analysis from Wikipedia: Cluster analysis.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
There is no one way to define and discover clusters. Genealogists can use DNA, as is done here in the MyHeritage app, or religious affiliation, ethnic origin, occupation, or any combination of other ways of identifying groups. What is unique about the method demonstrated by MyHeritage is its application to a relatively large number of DNA test results, which are then cross-related to each other to form clusters shown graphically. But at this point, the work for the genealogical researcher has just begun.
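
To make the general idea of clustering shared matches concrete, here is a minimal sketch in Python. It is not MyHeritage's or Genetic Affairs' algorithm, which the quoted post does not describe in detail. It simply assumes that two matches belong to the same cluster whenever a chain of shared matches connects them. The function name and the sample names are hypothetical and are used only for illustration.

```python
from collections import defaultdict


def cluster_matches(shared_pairs):
    """Group DNA matches into clusters.

    Assumption for illustration only: a cluster is every match that can be
    reached from another through a chain of shared-match pairs (a connected
    component). Real tools use more refined methods and thresholds.
    """
    # Build a lookup of who shares a match with whom.
    neighbors = defaultdict(set)
    for a, b in shared_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    clusters = []
    seen = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Walk outward from this match to collect its whole cluster.
        component = set()
        stack = [start]
        while stack:
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(neighbors[person] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical shared-match pairs, not real test results.
shared_pairs = [("Ann", "Bob"), ("Bob", "Cara"), ("Dev", "Eli")]
clusters = cluster_matches(shared_pairs)
for number, members in enumerate(clusters, start=1):
    print(f"Cluster {number}: {', '.join(members)}")
```

In this made-up example, Ann, Bob, and Cara fall into one cluster and Dev and Eli into another, which is roughly what the color-coded blocks in an AutoClusters chart represent.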

One immediately evident use for these graphically constructed clusters is to indicate people who may be candidates for applying The Theory of Family Relativity to discover the identity of the person who may be the common ancestor of the interrelated people in each cluster. Of course, the possibility of deriving the identity of the common ancestor depends on whether or not the target ancestor has already been identified by competent and well-documented research in a sufficient number of family trees. To do so accurately may involve an extension of both the AutoClusters and the supporting data for The Theory of Family Relativity.

With the numbers of individuals included in the larger clusters, it may be a rather long task to identify each member of the cluster and map them into a single family tree that may then show the direction the research should take to identify the common ancestor. Of course, a large enough documented family tree would immeasurably accelerate the process.
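
One way to picture this step is a simple tally of which ancestors appear in the family trees of a cluster's members. The sketch below is only an illustration under stated assumptions: each member's tree has already been reduced to a list of ancestor names, the names are spelled consistently, and all names and the function `candidate_ancestors` are hypothetical. It is not how The Theory of Family Relativity works internally.

```python
from collections import Counter


def candidate_ancestors(trees):
    """Count how many cluster members list each ancestor in their tree.

    `trees` maps a member's name to the ancestor names found in that
    member's tree (a hypothetical, simplified representation).
    """
    counts = Counter()
    for ancestors in trees.values():
        # Use a set so an ancestor is counted once per tree.
        counts.update(set(ancestors))
    # Most frequently shared ancestors come first.
    return counts.most_common()


# Hypothetical trees for three members of one cluster.
trees = {
    "Ann": ["Ancestor A", "Ancestor B"],
    "Bob": ["Ancestor A"],
    "Cara": ["Ancestor A", "Ancestor B", "Ancestor C"],
}
candidates = candidate_ancestors(trees)
for name, tree_count in candidates:
    print(f"{name}: appears in {tree_count} of {len(trees)} trees")
```

An ancestor who appears in most or all of the trees is only a candidate for the common ancestor. The result is no better than the research behind the trees themselves.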

At the very least, these clusters from MyHeritage will assist traditional genealogical research in ways that can only be imagined at present.
Here are the other installments of this series:

Part One: https://genealogysstar.blogspot.com/2019/03/far-reaching-changes-from-myheritages.html
Part Two: https://genealogysstar.blogspot.com/2019/03/far-reaching-changes-from-myheritages_7.html
