In genetics, a centimorgan (abbreviated cM) or map unit (m.u.) is a unit for measuring genetic linkage. It is defined as the distance between chromosome positions (also termed loci or markers) for which the expected average number of intervening chromosomal crossovers in a single generation is 0.01. It is often used to infer distance along a chromosome. However, it is not a true physical distance.
The number of base pairs to which it corresponds varies widely across the genome (different regions of a chromosome have different propensities towards crossover) and it also depends on whether the meiosis in which the crossing-over takes place is a part of oogenesis (formation of female gametes) or spermatogenesis (formation of male gametes).
In humans one centimorgan corresponds to about 1 Mb (1,000,000 base pairs or nucleotides) on average.[1] [2] The relationship is only rough, as the physical chromosomal distance corresponding to one centimorgan varies from place to place in the genome, and also varies between males and females since recombination during gamete formation in females is significantly more frequent than in males. Kong et al. calculated that the female genome is 4460 cM long, while the male genome is only 2590 cM long.[3]
In contrast, in Plasmodium falciparum one centimorgan corresponds to about 15 kb; markers separated by 15,000 nucleotides have an expected rate of chromosomal crossovers of 0.01 per generation.
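As a rough illustration of these averages, the following sketch converts a physical distance into an approximate genetic distance. The function name `approx_centimorgans` and the lookup table are hypothetical, and the figures are the genome-wide averages quoted above, so the result says nothing about any particular region, where local recombination rates can differ greatly.

```python
# Average physical span of one centimorgan, in base pairs, as quoted in the text.
# These are genome-wide averages only; local rates vary widely.
BP_PER_CM = {
    "human": 1_000_000,               # about 1 Mb per cM
    "plasmodium_falciparum": 15_000,  # about 15 kb per cM
}


def approx_centimorgans(base_pairs: float, species: str) -> float:
    """Rough genetic distance (cM) for a physical distance, using the average rate."""
    return base_pairs / BP_PER_CM[species]


# The same 3 Mb stretch is a far longer genetic distance in P. falciparum.
print(approx_centimorgans(3_000_000, "human"))                  # 3.0
print(approx_centimorgans(3_000_000, "plasmodium_falciparum"))  # 200.0
```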
Note that non-syntenic genes (genes residing on different chromosomes) are inherently unlinked, and cM distances are not applicable to them.
Because genetic recombination between two markers is detected only if there is an odd number of chromosomal crossovers between them, the distance in centimorgans does not correspond exactly to the probability of genetic recombination. Under the Haldane mapping function, devised by J. B. S. Haldane, the number of chromosomal crossovers follows a Poisson distribution with mean d/100,[4] so a genetic distance of d centimorgans leads to an odd number of chromosomal crossovers, and hence a detectable genetic recombination, with probability
$$
P(\text{recombination} \mid \text{linkage of } d \text{ cM})
= \sum_{k=0}^{\infty} P(2k+1 \text{ crossovers} \mid \text{linkage of } d \text{ cM})
= \sum_{k=0}^{\infty} e^{-d/100} \frac{(d/100)^{2k+1}}{(2k+1)!}
= e^{-d/100} \sinh(d/100)
= \frac{1 - e^{-2d/100}}{2}.
$$
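The closed form can be checked numerically against the Poisson sum over odd crossover counts. The sketch below is illustrative only: the function names are hypothetical, and the series is truncated at 30 terms, which is an arbitrary choice that is more than enough for the distances shown.

```python
import math


def recombination_probability(d_cm: float) -> float:
    """Closed form under the Haldane mapping function: (1 - exp(-2d/100)) / 2."""
    return (1.0 - math.exp(-2.0 * d_cm / 100.0)) / 2.0


def recombination_probability_series(d_cm: float, terms: int = 30) -> float:
    """Sum the Poisson probabilities of 1, 3, 5, ... crossovers with mean d/100."""
    mean = d_cm / 100.0
    return sum(
        math.exp(-mean) * mean ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )


# Compare the two forms at a few distances (in cM).
for d in (1, 10, 50, 200):
    closed = recombination_probability(d)
    series = recombination_probability_series(d)
    print(f"{d:>4} cM  closed={closed:.6f}  series={series:.6f}")
```

For small distances the probability is close to d/100, while for large distances it approaches but never reaches 0.5, which is the value expected for unlinked markers.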
The formula can be inverted, giving the distance in centimorgans as a function of the recombination probability:
$$
d = 50 \ln\left(\frac{1}{1 - 2\,P(\text{recombination})}\right).
$$
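A minimal sketch of the inversion, assuming the Haldane mapping function applies. The function names are hypothetical, and the range check reflects the fact that the formula is only defined for recombination probabilities below 0.5.

```python
import math


def haldane_distance_cm(r: float) -> float:
    """Map distance in cM for a recombination probability r, with 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination probability must satisfy 0 <= r < 0.5")
    return 50.0 * math.log(1.0 / (1.0 - 2.0 * r))


def recombination_probability(d_cm: float) -> float:
    """Forward direction, (1 - exp(-2d/100)) / 2, used here for a round-trip check."""
    return (1.0 - math.exp(-2.0 * d_cm / 100.0)) / 2.0


# Round trip: probability -> distance -> probability.
for r in (0.01, 0.10, 0.25, 0.45):
    d = haldane_distance_cm(r)
    print(f"r={r:.2f}  d={d:.3f} cM  back={recombination_probability(d):.4f}")
```

A recombination probability of 0.01 maps to slightly more than 1 cM (about 1.01 cM), because a small fraction of meioses carry two crossovers between the markers and so go undetected.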
Genealogists often use "shared centimorgans" as a proxy for the reciprocal of distance in a family tree, with the shared amount halving at each generational step. So if two individuals differ on average by:
etc. The margin of error increases with each step, so that beyond about 4 steps the ranges overlap to such an extent as to make it difficult to establish how many steps are involved, and beyond about 7 steps any relationship at all is tenuous.
The self/twin figure of 7050 cM corresponds to the sum of the cM lengths of human DNA for males and females.
When multiple genetic lines are inherited, they combine as root-sum-of-squares, so that full siblings share around 2493 cM or √2 times as much as half siblings.
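The halving and root-sum-of-squares rules described here can be sketched as follows. This is an illustration of the rule of thumb as stated, not a genealogical tool. The function names are hypothetical, and counting half siblings as two steps apart is an assumption inferred from the full-sibling figure above.

```python
import math

TOTAL_CM = 7050.0  # self/twin figure from the text (4460 cM + 2590 cM)


def shared_cm_single_line(steps: int) -> float:
    """Expected shared cM along one line of descent, halving at each step."""
    return TOTAL_CM / 2 ** steps


def shared_cm_combined(steps_per_line: list[int]) -> float:
    """Combine several lines of descent as a root-sum-of-squares."""
    return math.sqrt(sum(shared_cm_single_line(s) ** 2 for s in steps_per_line))


# Assumption: half siblings are two steps apart (child -> shared parent -> child).
half_siblings = shared_cm_single_line(2)
# Full siblings are related through two such lines, one per parent.
full_siblings = shared_cm_combined([2, 2])

print(f"half siblings: {half_siblings:.1f} cM")
print(f"full siblings: {full_siblings:.1f} cM")
print(f"ratio: {full_siblings / half_siblings:.4f}")
```

Under these assumptions the full-sibling value rounds to the 2493 cM quoted above, and the ratio to the half-sibling value is √2.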
Because some recombinations result in unviable gametes, or offspring that cannot themselves reproduce, the observed genetic distances in families tend to be lower (shared cM higher) than predicted by models based purely on physical recombination rates.
The centimorgan was named in honor of geneticist Thomas Hunt Morgan by J. B. S. Haldane.[5] However, its parent unit, the morgan, is rarely used today.