So much has happened in genetics in the last 10 years! I will keep this original document up on my website a while longer for historical interest. It is the first public analysis of DNA I gave that was relevant to ancestry. For analyses all grown up, see the Publications section of my main page.

My published research discovered new branches of mitochondrial DNA haplogroup T2e, including the important Ottoman Sephardic and Mexican branch labeled “T2e1a1a” and the Jewish branch “T2e1b” with an interesting non-synonymous mutation in a highly conserved region. The branches can be seen beginning with build 16 of Phylotree.org.

---------------------------------------------------------------------------------------------------------------

 

Mitochondrial DNA and genealogy

Felice Bedford, University of Arizona, 2005, 2006, 2007

Sections of document: Introduction, Analysis of my mitochondrial DNA sequence, comparing my mtDNA to others, full sequence mtDNA, some things to think about, mitochondrial DNA databases for comparison, references, appendix (random things about DNA, random things about mitochondria)

----------------------------------------------------------------------------------------------------------------


Introduction:

Mitochondrial DNA has been used to trace "deep maternal ancestry". Deep maternal ancestry refers to your mother's mother's mother's mother's mother etc. repeated at least 500 times (and often many more). This is further back than people have written records, oral histories, last names, and the religions they identify with. It is possible to uncover deep maternal ancestry because of a unique characteristic of mitochondrial DNA. Most of your DNA is inherited such that half comes from your mother and half from your father. However, mitochondrial DNA comes only from your mother.

 

Analysis of my mitochondrial DNA shows the following mutations:

HVR1 Mutations: 16114T, 16126C, 16153A, 16192T, 16221T, 16294T, 16519C

HVR2 Mutations: 73G, 150T, 263G, 309.1C, 315.1C

 

As is conventional, all mutations are with respect to the Cambridge Reference Sequence (CRS). The CRS is a commonly found European sequence. Thus, if you had that sequence - the CRS - exactly, by convention you would be said to have "no mutations". Differences from the CRS are shown as mutations at particular locations. This is true even if the rest of the world (besides those having the "standard sequence" as defined by convention) shared those same "mutations".

Locations between 16,001 and 16,540 (HVR1) and between 61 and 570 (HVR2) were tested by familytreedna.com. The mutations shown above are all the mutations found in those ranges.
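For anyone who wants to work with results like these on a computer, here is a minimal sketch in Python of how the labels might be read. It assumes the usual reading of the notation (a number followed by the base found there, with ".1" marking a base inserted after that position). The names parse_mutation and region_of are my own inventions for illustration, not part of any testing company's software.

```python
import re

# Hypothetical helper: a label is a position, an optional ".n" insertion
# index, and the observed base, e.g. "16126C" or "309.1C".
MUTATION_PATTERN = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT])$")


def parse_mutation(label):
    """Return (position, insertion_index, base) for a mutation label.

    insertion_index is 0 for a substitution and 1, 2, ... for a base
    inserted after the given CRS position.
    """
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    position, insertion, base = match.groups()
    return int(position), int(insertion or 0), base


# Tested ranges as reported in the text (both ends inclusive).
HVR1_RANGE = range(16001, 16541)
HVR2_RANGE = range(61, 571)


def region_of(label):
    """Say which tested region a mutation label falls in."""
    position, _, _ = parse_mutation(label)
    if position in HVR1_RANGE:
        return "HVR1"
    if position in HVR2_RANGE:
        return "HVR2"
    return "outside tested ranges"


my_mutations = ["16114T", "16126C", "16153A", "16192T", "16221T", "16294T",
                "16519C", "73G", "150T", "263G", "309.1C", "315.1C"]
for label in my_mutations:
    print(label, parse_mutation(label), region_of(label))
```

Keeping the insertion index separate from the position makes it easier to compare results from labs that write insertions differently.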

Familytreedna.com has classified the pattern of mutations above as haplogroup "T5". "T" refers to what many people (who are Sykes' groupies) refer to as clan mother Tara. Tara is believed to have lived around 17,000 years ago in Northern Italy, although not everyone agrees with this estimate. Tara's people would have come from the Near East, and her descendants spread all over Europe. Tara's ancestral mother, like that of all the 7 (or 8) European "Daughters of Eve" (see Sykes, 2001), would have been one woman who left Africa about 100,000 years ago. Nearly 10% of Europeans can trace their maternal ancestry to Tara. According to data provided by Vincent McCauley, and used in Richards et al. (2000), the criteria for haplogroup T are the mutations at 16126 and 16294. (All criteria refer to the part of the mitochondrial DNA known as HVR1, "Hyper Variable Region 1", also known as Control Region 1; HVR2, Hyper Variable Region 2, is less systematic and fewer databases exist for it. When it's clear HVR1 is being used, the 16 is usually dropped; e.g. 126 refers to position 16,126.) At position 126, normally there would be a T, and at position 294, normally there would be a C. Mutations (of the substitution variety) usually go from C to T, T to C, G to A, or A to G.
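The shorthand and the four usual substitutions can be written down as a small Python sketch. The function names are hypothetical, and the only reference bases used are the two stated above (T at 126, C at 294).

```python
# The four "usual" substitutions named in the text.
USUAL_SUBSTITUTIONS = {("C", "T"), ("T", "C"), ("G", "A"), ("A", "G")}


def is_usual_substitution(reference_base, observed_base):
    """True if the change is one of the four common substitutions."""
    return (reference_base, observed_base) in USUAL_SUBSTITUTIONS


def full_position(short_position):
    """Expand HVR1 shorthand (e.g. 126) to the full position (16126)."""
    return 16000 + short_position


# Reference bases as stated in the text; observed bases from my results.
changes = {126: ("T", "C"), 294: ("C", "T")}
for short, (reference, observed) in changes.items():
    kind = "usual" if is_usual_substitution(reference, observed) else "unusual"
    print(full_position(short), f"{reference} to {observed}", kind)
```

Both of the haplogroup T mutations come out as the usual kind of substitution.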

T5 is one of the lineages that descended from T/Tara, and went a separate way from other subdivisions of T. Richards et al. use the additional criterion of a mutation at 153 in HVR1 (i.e. mutations at 126, 153, 294) to classify a sequence as T5. According to familytreedna.com, T5 was a woman who would have lived closer to 10,000 years ago and therefore likely participated in the Neolithic expansion with the arrival of agriculture (Tara descended from hunter/gatherers, like 85% of European ancestors). Note, however, that this may be based on the estimate that a mutation occurs on average once every 10,000 years, which may be wrong in many situations. Other researchers do not agree with the Richards et al. subdivisions of Haplogroup T. Most agree there is a clear T1 descendant and pattern, but do not agree on all of the others. T2 seems to have emerged as a clear lineage as well. One source of confusion for other lineages of T is over sites 292 and 296. I believe undifferentiated T's are usually labeled as T*, so in some people's notation, my pattern is T*. To add to the confusion, some people who were originally classified as T5 are being reclassified as T2, the latter of which usually have a mutation in the control region that T5s do not. The reclassification is based on testing the entire mtDNA rather than just the control regions (see below). T5s appear to have many coding region mutations in common with T2. Few T5s have been fully sequenced in the published literature as of yet, leading to possibly false overclassification into T2. There's a nice article that appeared in 2006 showing a network of all Ts (control regions only) that were on the mitosearch database. It's by a mathematician who became interested because he is in T haplogroup as well.
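The HVR1 criteria described above amount to checking whether one set of sites contains another. Here is a rough Python sketch under that assumption. It knows only the T and T5 motifs mentioned in the text (in shorthand, so 126 means 16126) and is in no way how familytreedna.com or Richards et al. actually assign haplogroups; the classify function is my own invention.

```python
# Motifs as described in the text, in HVR1 shorthand.
T_MOTIF = {126, 294}
T5_MOTIF = {126, 153, 294}


def classify(hvr1_sites):
    """Very rough label based only on the two motifs above."""
    sites = set(hvr1_sites)
    if T5_MOTIF <= sites:      # every T5 site is present
        return "T5"
    if T_MOTIF <= sites:       # T sites present, no T5 marker at 153
        return "T*"
    return "not T by these criteria"


my_sites = [114, 126, 153, 192, 221, 294, 519]
print(classify(my_sites))
```

Because T5 is tested first, a sequence with all three sites is never reported as plain T*. Subdivisions such as T1 and T2 are deliberately left out, since their criteria are not given here.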

In addition to having the 3 defining mutations for T5, my mitochondrial DNA has 4 additional mutations in Control Region 1 (HVR1). Taking each of my additional mutations in turn, consider first the one at site 519. The mutation at 519 is pretty common in many lineages, both in and outside T. It is therefore not really useful for identifying origins. In addition, many research studies and publicly available mitochondrial databases do not look above 365. Therefore, if comparing mutation patterns to others in databases, one usually needs to ignore 519.
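Dropping the sites that other studies did not test is easy to do mechanically. A minimal Python sketch, assuming the cut-off of 365 mentioned above (the function name and the default are mine, and a given database may use a different limit):

```python
def comparable_sites(hvr1_sites, upper_limit=365):
    """Keep only the HVR1 shorthand sites at or below the upper limit."""
    return sorted(site for site in set(hvr1_sites) if site <= upper_limit)


my_sites = [114, 126, 153, 192, 221, 294, 519]
print(comparable_sites(my_sites))
```

With the default limit, 519 drops out and the other six sites remain.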

I suspect that my mutation at 221 may be an artifact. For one thing, it is just about unheard of in Haplogroup T, and is quite unusual elsewhere. For another, it was one of the 6 locations listed in a paper that describes a study with numerous phantom mutations at those 6 locations (Bandelt et al., 2002), which otherwise are atypical. Phantom mutations are artifacts that arise from sequencing error. I do not mean to suggest that those 6 locations are more prone to sequencing error, just that when one sees a mutation at a location that doesn't make any sense, and 221 can be one of these, then sequencing error becomes a possible explanation. When DNA is sequenced for testing, it is copied many times. Much like the process of copying that produces real mutations, the testing itself can produce mutations that are not present in the original DNA. (Hmm, doesn't this suggest that the very same regions subject to mutation naturally would be changed artificially? Probably only in a wrong-headed model of why natural mutations cluster in the first place.) I'm not sure how to pursue this possibility, except to resequence my original DNA, or perhaps test a relative who should have the identical mitochondrial DNA (or if it's a reading error, not a sequencing error, it should be visible on the trace). (If anyone out there knows my brother, ask him to send in the kit that I sent him more than a year ago…)

That leaves 2 more mutations in Control Region 1 to be explained, at 114 and at 192. They usually are not found in "T"/Tara Haplotypes.

 

Comparing my mtDNA to others:

Mitosearch.org is a publicly available database which has the results for more than 37,000 individuals (in January 2007), at least for Control Region 1. In addition, a researcher was kind enough to check another database of 70,000 mitochondrial samples (thanks Valery!) from both published and unpublished studies. His database also includes the FBI database, which is available to the public. Out of these 100,000+ samples, my sequence (i.e. 114, 126, 153, 192, and 294) is a match to only 4 other people. This is comparing only up to site 385 so that the data are comparable to a maximum number of samples. It also ignores my mutation at 221. If I include the mutation at 221, then I have 0 matches. I believe it's unusual to have so few matches, though I think the more samples they collect, the more they are seeing relatively unique patterns. If one had the CRS exactly, in just the mitosearch database alone (currently 37,169 individuals) one would have matches to 864 other people.
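To make the matching rule concrete, here is a small Python sketch of an exact-match search that compares only up to site 385 and can optionally ignore a suspect site such as 221. The records in toy_database are entirely made up for illustration; they are not taken from mitosearch, the FBI database, or anywhere else, and the function names are hypothetical.

```python
def normalise(sites, upper_limit=385, ignore=frozenset()):
    """Reduce a list of HVR1 shorthand sites to the comparable set."""
    return frozenset(s for s in sites if s <= upper_limit and s not in ignore)


def exact_matches(query, database, **options):
    """Names of records whose comparable sites equal the query's."""
    target = normalise(query, **options)
    return [name for name, sites in database.items()
            if normalise(sites, **options) == target]


# Entirely made-up records, for illustration only.
toy_database = {
    "sample_a": [114, 126, 153, 192, 294],
    "sample_b": [114, 126, 153, 192, 294, 519],
    "sample_c": [126, 294],
    "sample_d": [126, 153, 294],
}

my_sites = [114, 126, 153, 192, 221, 294, 519]
print(exact_matches(my_sites, toy_database, ignore=frozenset({221})))
print(exact_matches(my_sites, toy_database))

# Share of the mitosearch database that matches the CRS exactly,
# using the two counts quoted in the text.
crs_share = 864 / 37169
print(f"{crs_share:.1%}")
```

In this toy example, ignoring 221 gives two matches and including it gives none, which mirrors the 4 versus 0 result described above. The last line works out to roughly 2.3% of the database matching the CRS exactly.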

One of my matches is from central Portugal. His/her mutations can be seen in the appendix of the paper by Pereira et al. (2000). This individual also is an exact match to mine for Control Region 2. (Control Region 2 doesn't at first appear to be an exact match, but different notations are used by the researchers and by many testing labs.) A second individual is on the mitosearch.org database. The oldest known maternal ancestor in that line is named Jesusa Gonzalez. She was born in a U.S. town near the Mexican border (Roma, Texas) and married a man born in Mexico. Testing on Control Region 2 was not done. The third match is from the FBI database. The individual is listed as "Hispanic" and no other information is available. The fourth match contains one extra mutation that I do not have, but has those critical unusual ones at 114 and 192 in addition to the standard T5 mutations. That person is from Brazil and was a research subject in a paper by Alves-Silva et al. (2000).
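A near match like the fourth one can be described by listing the sites where two sequences differ. A minimal Python sketch follows; the extra site 189 in the second sequence is invented purely for illustration and is not the actual extra mutation in the Brazilian sample.

```python
def differences(sites_a, sites_b):
    """Sites present in one sequence but not the other."""
    return sorted(set(sites_a) ^ set(sites_b))


mine = [114, 126, 153, 192, 294]
# Hypothetical near match: site 189 is a made-up example.
near_match = [114, 126, 153, 189, 192, 294]

extra = differences(mine, near_match)
print(extra, "->", len(extra), "difference(s)")
```

A result with one entry means the two sequences are one mutation apart, whichever of the two carries it.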

All 4 of my matches are to those of Hispanic origin. That fits my Sephardic heritage, though it sure raises interesting questions about the time-line. It is unclear when those atypical mutations arose. If one figures my maternal ancestors were in Spain or Portugal from around 2100 to 500 years ago, did the mutations develop during that time? Or did they develop before that? At least I know they could not have developed after.

Full-sequence mtDNA

It is now becoming increasingly common to test every single base pair of the entire mitochondrial genome, not just the 2 control regions. It has become faster and cheaper to do so. This allows greater precision in categorization, as well as localizing origins to more specific geographic regions and giving better estimates of the nearest common ancestor shared with matches. I have not yet had my entire mtDNA analyzed.

Some things to think about:

Databases for comparing your mitochondrial DNA to other people include:

Note first that when looking at any database for matches, including appendices of papers, check which actual locations they tested since it's not always identical across labs and papers.

www.mitosearch.org (International registry begun in Oct. 2004, sponsored by Family Tree DNA. Anyone can post their results and/or search for specific sequences. You can also contact anyone on the database.)

www.oxfordancestors.com (You have to hunt through the links. The list is sponsored by oxford ancestors. Anyone can search, but only members can contact someone from the list. In 2005, there seemed to be about 2000 European samples, but it is hard to tell. They claimed that 10,000 samples from research will be made available.)

http://www.fbi.gov/hq/lab/fsc/backissu/april2002/miller1.htm (database sponsored by the FBI. Their sources include data from research subjects at Berkeley and gathered from multiple sources. Database has to first be downloaded before it can be searched)

Also check out www.mitomap.org for tons of Mito DNA information, especially disease info. (Mito deletions and other mutations are often associated with disease, although usually from the coding region rather than the control region)

http://www.bloodoftheisles.net/ The main source of the data is the Oxford Genetic Atlas project, in which Bryan Sykes examined 10,000 samples from Britain; it formed the basis of his history book, Blood of the Isles. Click along the top where it says "Results".

http://www.smgf.org/ The Sorenson database from the Sorenson Molecular Genealogy Foundation. Has 19,000 mtDNA records that can be searched publicly on-line. They also have paper and pen genealogy family trees that go along with the DNA. Be warned that if you decide to submit a sample, they own your data and you cannot remove your record.

http://www.mitomap.org/ One of my favorite sites. It does not have a database of mtDNA like the others, but has a wealth of information. For instance, it has a table with every mutation along with any reported disease association.

Send feedback on this page to bedford@u.arizona.edu

References (abbreviated)

Alves-Silva, J. et al. (2000). The Ancestry of Brazilian mtDNA Lineages. Am J Hum Genet, 67, 444-461.

Bandelt et al. (2002). The fingerprint of phantom mutations in mitochondrial DNA data. Am J Hum Genet, 71, 1150-1160.

Pereira, L. et al. (2000). Diversity of mtDNA lineages in Portugal: not a genetic edge of European variation. Ann Hum Genet, 64, 491-506.

Sykes, B. (2001). The Seven Daughters of Eve (popular book)

Appendix:

Some random things about DNA:

Each human adult has more than 5 trillion (!) cells. Each cell has both nuclear and mitochondrial DNA. DNA consists of strings of 4 different molecules called "bases", abbreviated "T", "A", "C" and "G" for their longer chemical names. Each sequence of 3 bases is a code for producing one of 20 amino acids, such as lysine. (What about the rest of the 64 possible permutations of 4 bases taken 3 at a time? I guess 4 bases taken 3 at a time is the minimum one could do to create all 20 amino acids. Quick calculations suggest 2 bases in a sequence of 3 isn't enough, 3 bases wouldn't allow the pairing of the bases, and 4 bases in a sequence of 2 isn't enough.) E.g. a sequence "AAA" codes for one amino acid and "ATT" might code for another. If you string together many amino acids, you create a protein. A gene is defined by the sequence of bases (the sequence of triplets) that it takes to create 1 protein. Different proteins require not only different amino acid sequences, but also different numbers of amino acids depending on complexity; thus different genes are different lengths. It's these genes that code for everything that you are, though often it's many genes that are responsible for a trait.
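The quick calculations in the parenthetical above are just powers: the number of distinct "words" is the number of different bases raised to the word length. A short Python check (the function name is mine):

```python
AMINO_ACIDS = 20


def code_words(number_of_bases, word_length):
    """How many distinct sequences of the given length are possible."""
    return number_of_bases ** word_length


for bases, length in [(2, 3), (4, 2), (4, 3)]:
    total = code_words(bases, length)
    verdict = "enough" if total >= AMINO_ACIDS else "not enough"
    print(f"{bases} bases, words of {length}: {total} combinations, {verdict}")
```

That gives 8, 16 and 64 combinations respectively, so only 4 bases taken 3 at a time covers all 20 amino acids.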

Each sequence has basically a mirror sequence to which it's attached. T is always bound to A, and C is always bound to G, so a sequence of CCT would be bound to a sequence GGA, for example. In nuclear DNA these twist around and form the famous double helix that looks like a spiral staircase. In nuclear DNA, the different genes are themselves in a sequence on chromosomes. A gene's location on the chromosome is known as its locus. There are 23 pairs of chromosomes, half copied from the father, half from the mother.
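The pairing rule is simple enough to write as a lookup table. A minimal Python sketch, with names of my own choosing; it pairs the bases position by position, as in the CCT example, and ignores the question of which direction each strand is read in.

```python
# Each base and the base it binds to.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_strand(sequence):
    """Return the bases bound to the given sequence, position by position."""
    return "".join(PAIRS[base] for base in sequence)


print(paired_strand("CCT"))
```

Applying the function twice returns the original sequence, since each base's partner pairs back to it.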

 Some random things about mitochondria:

Complete genetic inheritance through the mother is in contrast to ordinary (nuclear) DNA, which is inherited in equal amounts from one's mother and father. (Sperm cells from the father do have mitochondria, but only enough to get the sperm swimming up to the egg; they are then jettisoned after a sperm enters the egg. The egg cells from the mother are larger than sperm, and contain the mitochondria and other stuff the embryo needs.) Mitochondria provide the energy for the cell; cells requiring lots of energy, like muscles and nerves, have lots of mitochondria. The DNA inside the mitochondria is fairly short, only around 16,000 bases in length, much shorter than nuclear DNA. In addition, it forms a circle, unlike nuclear DNA, but like bacterial DNA. One hypothesis has been that at least a million years ago parasites lived inside our cells in a symbiotic relationship, and these are now the mitochondria.

 

 
