The human immunodeficiency virus... is one of the fastest evolving entities known.[2]
HIV shows stronger positive selection [having more beneficial mutations] than any other organism studied so far.[3]
HIV evolves extremely rapidly, exhibiting the highest recorded biological mutation rate currently known to science.[4]
Since first entering humans about 100 years ago, HIV has undergone more mutations and more replications than the fewer than 10^20 mammals that would have lived during a 200-million-year evolutionary timeline.[23][24] Natural selection is much stronger in an RNA virus like HIV than in mammals with large genomes.[3a][21][22] Even so, HIV has evolved very little in terms of new or modified function. This strongly suggests that evolution would not have had enough opportunity to create the vast and largely different information in the genomes of all mammals, where there was far less opportunity for evolution to act.
During the 100 years of HIV evolution there have been:
- About 14,000 replication-cycle "generations" of HIV (still far fewer than the total number of mammal generations).
- A total of about 6×10^22 new HIV virus particles produced.
- About 10^22 total HIV mutations among all those new virus particles.
- About 5,000 or fewer constructive mutations fixed within the various HIV subtypes.
The rest of this article quantifies HIV population sizes, mutations, and useful evolution.
HIV came from SIV (simian immunodeficiency virus) in chimpanzees, which in turn came from SIV in monkeys:
- SIV is a retrovirus that infects monkeys and apes, with different SIV variants infecting each species. In some African monkeys, SIV is not known to cause any harmful effects.[5a]
- At an unknown time in the past, two different forms of SIV entered chimpanzees and recombined into a new strain[5b] that was sometimes deadly.[5c]
- Sometime "around the 1920s,"[5] SIV first entered humans, becoming HIV.
HIV is a modified form of SIV that infects humans. It occurs in two main types:
HIV-1 "is most closely related to SIVcpz,"[5] the form of SIV found in some chimpanzees. HIV-1 is categorized into "groups M, N and O which represent separate transfers from chimpanzees,"[3] although "one or two of those transmissions may have been via gorillas."[5]
HIV-1 group M (for "major") accounts for the "vast majority (perhaps 98%) of HIV infections worldwide,"[5] while all other HIV types are mostly or entirely restricted to West Africa.[3] Molecular clocks suggest that HIV-1 group M first originated "around the 1920s,"[5] and that the other groups do not "appear older than HIV-1 group M."[5]
Various studies estimate that about 10^10 to 10^12 HIV viruses exist in an infected person:
| Source | Estimate |
|---|---|
| Haase et al., 1996[6] | "the FDC-associated [follicular dendritic cell] pool of HIV RNA would be about 10^11 copies in a 70-kg HIV infected individual." Follicular dendritic cells are major reservoirs for HIV-1 within the lymph nodes. |
| Perelson et al., 1996[10] | "The estimated average total HIV-1 production was 10.3x10^9 virions per day." This is about 1.03×10^10. The same study reports that "the average HIV-1 generation time--defined as the time from release of a virion until it infects another cell and causes the release of a new generation of viral particles--is 2.6 days." |
| Brown, 1997[8] | "HIV infections are initiated from a small inoculum and increase very rapidly to ≈10^10 in the first stages of infection, so a considerable reduction in Ne [effective population size] would be expected to be due to this expansion." |
| Rambaut et al., 2004[3] | "[HIV] has a viral generation time of ~2.5 days and produces ~10^10 to 10^12 new virions each day." Perelson, 1996 is cited for this estimate. |
| Coffin et al., 2013[7] | "In the case of HIV-1 infection, perhaps 10^11 virions are produced daily; the number of cells infected in the same time span is... unlikely to exceed 10^9." The authors add that "this result is consistent with the high natural turnover rate of activated effector memory helper T cells, the primary target for HIV-1 infection, on the order of 10^10 cells per day, of which only a small fraction are infected after the initial primary infection phase." |
Rather than counting and extrapolating, some studies (not included above) measure HIV genetic diversity and use it to estimate effective population sizes in the range of 450 to 10^5 HIV virions per person.[12] These estimates "are many orders of magnitude lower than the census size--a result that has surprised and perplexed many in the HIV-1 community."[11]
However, models estimating effective population size "are heavily influenced by variations in allele frequency" and should "be taken as a lower bound," with the true values "likely to be much higher."[9] This is because low genetic diversity in HIV produces lower effective population size estimates, even if the real population size is much larger.
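The link between diversity and effective population size can be illustrated with the standard neutral relation for a haploid population, π ≈ 2·Ne·μ, rearranged to Ne ≈ π / (2μ). This is a minimal sketch, not the method used by the studies cited above. It assumes neutrality and an equilibrium population, which are exactly the conditions that rapid expansion and strong selection violate. The diversity values are hypothetical and chosen only to show scale. The mutation rate is the 2×10^-5 per site per replication cycle quoted later in this article.

```python
# Minimal sketch: neutral, equilibrium Wright-Fisher approximation for a haploid
# population, where expected nucleotide diversity pi ≈ 2 * Ne * mu.
# The pi values below are hypothetical and only illustrate scale.

MU_PER_SITE = 2e-5  # mutations per site per replication cycle (Zanini et al. 2017)


def effective_size_from_diversity(pi: float, mu: float = MU_PER_SITE) -> float:
    """Return Ne ≈ pi / (2 * mu); low diversity maps directly to a low Ne."""
    if pi < 0 or mu <= 0:
        raise ValueError("pi must be non-negative and mu must be positive")
    return pi / (2 * mu)


for pi in (0.001, 0.01, 0.05):  # hypothetical per-site diversity values
    ne = effective_size_from_diversity(pi)
    print(f"pi = {pi}: Ne ≈ {ne:,.0f}")  # compare with census sizes of ~10^10
```

Under these assumptions, even 5% per-site diversity corresponds to an Ne of only about 1,250. That is many orders of magnitude below the census counts in the table above.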
Why would HIV have low genetic diversity? In each HIV infection, the virus starts from only a small number of particles and then expands to nearly a trillion viruses, and rapidly expanding populations have lower genetic diversity. HIV is also subject to strong selection, which removes variants from a population.
Therefore, since the effective population size is misleading, and since observation trumps such models, the observed population size estimates in the table above are more reliable.
HIV reproduces about once every 2.6 days.[10] Over the roughly 100 years since HIV first entered humans, that amounts to about 14,000 "generations." Counting only the last 40 years (1977-2017), when HIV population sizes were significant, gives about 5,600 "generations."
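The generation counts are straightforward arithmetic. Here is a minimal sketch, assuming a constant 2.6-day generation time and 365.25-day years (both simplifications):

```python
# Sketch of the generation-count arithmetic, assuming a constant
# 2.6-day generation time (Perelson et al.) and 365.25-day years.
DAYS_PER_YEAR = 365.25
GENERATION_TIME_DAYS = 2.6


def hiv_generations(years: float, generation_time_days: float = GENERATION_TIME_DAYS) -> float:
    """Number of replication-cycle generations that fit into `years`."""
    return years * DAYS_PER_YEAR / generation_time_days


print(f"100 years: {hiv_generations(100):,.0f} generations")  # 14,048
print(f"40 years:  {hiv_generations(40):,.0f} generations")   # 5,619
```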
Sadly, there are "an estimated 42 million people carrying the [HIV] virus at present."[3] Multiplying this by 10^11 HIV virions per person gives about 4×10^18 HIV virions existing at any given time.
Suppose "perhaps 10^11 virions are produced daily"[7] in each infected person. Then 14,600 days (40 years) times 10^11 virions per person per day times 4.2×10^7 people with HIV gives a total of 6.13×10^22 HIV virions that have ever existed in humans. That gives us:
- 14,000 reproduction-cycle "generations" of HIV over the last 100 years.
- 4×10^18 total HIV virions existing in all humans at any given time.
- 6×10^22 total HIV virions existing in all humans over the last 100 years.
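The totals above can be reproduced with a few lines of arithmetic. This sketch keeps the article's simplifying assumption that 42 million people carried HIV throughout the whole 40-year span, which overstates the early years. The 10^20 mammal figure is the upper bound cited at the top of the article.

```python
# Sketch of the population totals, using the article's simplifying assumption
# that 42 million people carried HIV for the whole 40-year period.
PEOPLE_WITH_HIV = 4.2e7            # "an estimated 42 million people" [3]
VIRIONS_PER_PERSON = 1e11          # standing virions per infected person
VIRIONS_PER_PERSON_PER_DAY = 1e11  # "perhaps 10^11 virions are produced daily" [7]
DAYS = 14_600                      # roughly 40 years
MAMMALS_EVER = 1e20                # upper bound on mammals over 200 million years [23][24]

standing_population = PEOPLE_WITH_HIV * VIRIONS_PER_PERSON
cumulative_virions = DAYS * VIRIONS_PER_PERSON_PER_DAY * PEOPLE_WITH_HIV

print(f"standing population: {standing_population:.1e}")  # 4.2e+18
print(f"cumulative virions:  {cumulative_virions:.2e}")   # 6.13e+22
print(f"ratio to mammals:    {cumulative_virions / MAMMALS_EVER:,.0f}x")
```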
However, only one virion in 100 or fewer goes on to infect another cell: "perhaps 10^11 virions are produced daily,"[7] although "the number of cells infected in the same time span... is unlikely to exceed 10^9."[7]
HIV has "about 2×10^-5 mutations per site per replication cycle."[13] An earlier study "reported a much higher mutation rate," but it "focused on integrated provirus and might not reflect the mutational frequency in the circulating HIV-1 virions."[13] The HIV-1 genome is 9181 nucleotides, so that works out to:
- HIV has about 0.18 mutations per replication (9181 × 2×10^-5), or about one mutation every 5.4 replications.
- HIV genomes have had a total of about 10^22 mutations since first entering humans about 100 years ago.
This means that during the last 100 years, a point mutation has occurred at every single letter of HIV's 9181-nucleotide genome about 1.1×10^18 times (10^22 / 9181). Each nucleotide can mutate to any of three other letters, so every possible point mutation in HIV's genome has been tried about 3.7×10^17 times (1.1×10^18 / 3). A given genome carries only about 0.18 mutations per replication, so any specific point mutation appears about once per 1.5×10^5 replications (9181 × 3 / 0.18). Every possible combination of two point mutations has therefore been tried about 2.5×10^12 times (3.7×10^17 / 1.5×10^5).
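Here is a sketch of the per-replication and cumulative mutation counts. It uses the article's simplification that each virion represents one replication cycle, together with the in vivo rate of 2×10^-5 mutations per site:

```python
# Sketch of the per-replication and cumulative mutation counts, assuming every
# virion counts as one replication and the in vivo rate of 2e-5 per site [13].
MU_PER_SITE = 2e-5
GENOME_LENGTH = 9181
TOTAL_VIRIONS = 6.13e22

mutations_per_replication = MU_PER_SITE * GENOME_LENGTH
replications_per_mutation = 1 / mutations_per_replication
total_mutations = TOTAL_VIRIONS * mutations_per_replication

print(f"{mutations_per_replication:.3f} mutations per replication")        # 0.184
print(f"one mutation every {replications_per_mutation:.1f} replications")  # 5.4
print(f"{total_mutations:.1e} total mutations")                            # 1.1e+22
```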
For comparison, the common bacterium E. coli has a genome of about 5.4 million nucleotides and about one mutation every 1000 replications.[19] Therefore it would take a population of about 16.2 billion E. coli (5.4 million × 1000 × 3) to produce each possible point mutation about once. Trying out every possible combination of two mutations would take a population of about 2.6×10^20 E. coli (16.2 billion squared).
For humans, the numbers are similarly large. We have a 3-billion-nucleotide haploid genome and about 33 mutations per haploid genome per generation. Thus it would take about 273 million human reproductions (3 billion × 3 / 33) to test every possible single-nucleotide mutation. Trying every possible combination of two mutations would take about 7.4×10^16 human reproductions (273 million squared). The table below extrapolates further for combinations of 3, 4, and 5 mutations:
| Every combination of | Times found by HIV | E. coli needed | Humans needed |
|---|---|---|---|
In other words, it would take about 6.9×10^40 E. coli (16.2 billion to the fourth power) for each specific combination of 4 mutations in its genome to be expected to appear once. This is more than the total of 10^40 cellular organisms estimated to have existed on a 4-billion-year-old Earth.[23][24] Yet in the last 100 years, HIV has tried out every possible combination of 4 mutations in its own (much smaller) genome. It has done so on the order of 100 times each (about 6×10^22 replications divided by (9181 × 3 / 0.18)^4).
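The E. coli figures above follow a simple rule. Let N1 be the number of replications needed for one specific point mutation to appear. A specific set of k simultaneous point mutations then needs about N1^k replications. This sketch applies that same rule to all three genomes. It assumes that mutations are independent and uniformly spread over sites, and that every HIV virion counts as one replication. Under those assumptions, HIV has produced each specific pair roughly 2.7×10^12 times and each specific set of four roughly 120 times. For sets of five, the expected count drops below one.

```python
# Sketch of the "N1 to the k" extrapolation. Assumption: a specific set of k
# point mutations must arise together in one replication, so the replications
# needed grow as N1**k, with N1 = 3 * genome_length / mutations_per_genome.
HIV_TOTAL_REPLICATIONS = 6.13e22  # every virion treated as one replication

GENOMES = {
    # name: (genome length in nucleotides, mutations per genome per replication)
    "HIV": (9181, 9181 * 2e-5),
    "E. coli": (5.4e6, 1e-3),
    "human": (3e9, 33),
}


def replications_per_specific_mutation(length: float, mutations_per_genome: float) -> float:
    """N1: replications needed, on average, for one specific point mutation."""
    return 3 * length / mutations_per_genome


def replications_needed(name: str, k: int) -> float:
    """Replications needed for one specific combination of k point mutations."""
    return replications_per_specific_mutation(*GENOMES[name]) ** k


def times_found_by_hiv(k: int) -> float:
    """Expected number of times each specific k-mutation set arose in HIV."""
    return HIV_TOTAL_REPLICATIONS / replications_needed("HIV", k)


for k in range(1, 6):
    print(
        f"k={k}: HIV ~{times_found_by_hiv(k):.1e} times, "
        f"E. coli needs ~{replications_needed('E. coli', k):.1e}, "
        f"humans need ~{replications_needed('human', k):.1e}"
    )
```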
What's the purpose of this comparison? HIV may uncover evolutionary gains that require 3, 4, or even 5 simultaneous, specific mutations. When it does, we should not expect other organisms to evolve such specific mutations at all, even if there are thousands of possible ways to evolve through such paths of simultaneous mutations.
The chart below (from [3], with annotations) shows all strains of HIV circulating within humans (red lines) and their inferred origins from monkey and ape SIV (gray lines). Longer vertical lines indicate more fixed mutations. Blue numbers indicate the total number of mutations fixed during the time represented by the red vertical lines. The sum of all blue numbers indicates that about 5,160 mutations have become fixed among the various HIV subtypes since first entering humans.
Confounding these estimates, most evolution occurs within a single host and is then lost: "HIV evolves extremely rapidly within individuals, viral evolution is somewhat slower on a population level."[4] There is nonetheless "extensive viral diversity both within and between hosts."[4]
The estimate of 5,160 fixed mutations comes from summing the fixed mutations in each group, which makes it somewhat arbitrary. We would get a smaller number if we randomly picked a single HIV-1 group M virus and counted the mutations in its lineage since it first entered humans. We would get a much larger number if we counted all mutations among every person with HIV: many times the ~9181 nucleotides in an average HIV-1 genome, because identical mutations would be counted many times over.
Likewise if we wanted to compare the number of mutations separating humans and mice, we would compare the average human genome to the average mouse genome, but we wouldn't include the billions of unique mutations present in small numbers within both human and mouse populations.
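The effect of these counting choices can be shown with a toy tree. The numbers below are hypothetical and not taken from the HIV chart. Summing the fixed mutations on every branch, as with the blue numbers, gives a larger total than following any single lineage from root to tip.

```python
# Hypothetical toy tree (not HIV data): each branch is labeled with the number
# of mutations fixed along it, loosely analogous to the chart's blue numbers.
TREE = {
    "root": [("A", 100), ("B", 80)],
    "A": [("A1", 40), ("A2", 60)],
    "B": [("B1", 50), ("B2", 30)],
}


def total_tree_length(node: str = "root") -> int:
    """Sum of fixed mutations over every branch below `node`."""
    return sum(length + total_tree_length(child) for child, length in TREE.get(node, []))


def root_to_tip(tip: str, node: str = "root", so_far: int = 0):
    """Fixed mutations along the single lineage from the root to `tip`, or None."""
    for child, length in TREE.get(node, []):
        if child == tip:
            return so_far + length
        found = root_to_tip(tip, child, so_far + length)
        if found is not None:
            return found
    return None


print(total_tree_length())  # 360: summing every branch
print(root_to_tip("A2"))    # 160: one lineage
print(root_to_tip("B2"))    # 110: another lineage
```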
The chart above omits HIV-1 group M subtypes H through K, which would increase the number of fixed mutations beyond 5,160. HIV shows "stronger positive selection than any other organism studied so far,"[3] and most mutations in HIV's env gene (one of its roughly 10 genes) "confer a selective advantage."[3] Even so, it's unlikely that all 5,160 mutations are positively selected and constructive. So we will use this upper-bound estimate:
Since first entering humans in the 1920s, about 5,000 or fewer constructive mutations became fixed among the various HIV subtypes.
Tetherin is a protein used by mammalian cells to build tethers "between virus envelopes and the cytoplasmic membrane of the cell, preventing the release of those viruses."[5]
Some strains of SIV have a Vpu gene that produces a protein that counteracts tetherin. For example, "Vpu protein of SIVgsn has been shown to counteract greater spot-nosed monkey tetherin."[5] However, the strain of SIV that infects chimpanzees uses Vpu only for "anti-CD4 activity"[5] and does not use it against tetherin. CD4 is a protein on cell surfaces that SIV and HIV use to enter T-cells.
During the process of entering humans, HIV-1 groups M and N each independently regained Vpu's anti-tetherin ability: "the vpu gene did not diverge to the extent that the activity could not be rescued."[5] "When SIVcpz crossed the species barrier to infect humans... Vpu subsequently (re)gained its tetherin-antagonizing function."[15] Along the way, "HIV-1 group N Vpu has lost [its] anti-CD4 activity,"[5] although this ability was retained in HIV-1 group M.
However, "it is likely that SIV jumped into humans many times"[5] before it led to the modern AIDS epidemic. These species-crossing attempts may have occurred far in the past, so it's not possible to estimate how many viral replications occurred before SIV found these mutations.
The differences in the Vpu protein of HIV/SIV in various species.[16] Each letter represents an amino acid. "Hydrophobic TM [transmembrane] domain," "α-helix," and "β-turn" are different regions of the protein.
- HIV-1 = humans
- SIVcpz = chimpanzees
- SIVgor = gorillas
- SIVmon = mona monkeys
- SIVgsn = greater spot-nosed monkeys
- SIVmus = mustached monkeys
Some evolutionists point to HIV-1's anti-tetherin activity as proof that evolution can create features requiring a large number of simultaneous mutations in many organisms. This is a non sequitur for two reasons. First, as noted above, HIV has had the mutational resources to explore most possible combinations of four mutations. Exploration this vast can only happen in RNA viruses like HIV, since they have huge populations, very small genomes, and very high mutation rates. Larger and more complex organisms would never be able to explore that many combinations even if given trillions of years. Irreducible complexity (as defined by its originator, Michael Behe[25]) is a function of population size: larger populations can overcome more simultaneous mutations.
Second, HIV-1's anti-tetherin ability appears to have been acquired gradually, one improving mutation at a time, rather than through multiple simultaneous mutations.
For HIV-1 group M's Vpu protein to counteract tetherin, at least three amino acids are needed in its transmembrane region: "three amino acid positions, A14, W22, and, to a lesser extent, A18, are required for tetherin antagonism."[17] Another research group reached similar conclusions by splicing this region and an adjacent one from HIV-1 into a Vpu gene from chimpanzee SIV: "SIVcpz Vpu was able to completely rescue the tetherin restriction phenotype when it encoded both regions 1-8 and 14-22 from HIV-1."[18] The researchers saw that "chimeras within each region yielded intermediate phenotypes, suggesting that these regions harbored several minor determinants."[18]
We can work out a mutational path by which those three amino acids could have changed to produce anti-tetherin activity in HIV-1 group M. The diagram above (under Hydrophobic TM domain) shows that HIV-1 group M's Vpu protein differs from the SIV's at amino acid positions 14 and 18. The W at position 22 was already present in chimpanzee SIVcpzPtt, the SIV lineage that most closely resembles human HIV-1 groups M and N. The necessary mutations:
- Position 14: Chimpanzee SIVcpzPtt has amino acid G (glycine) at position 14. A codon table shows that changing G (glycine) to A (alanine) at this position requires changing only one nucleotide.
- SIVcpzPtt MB66 has a V (valine) at position 14. Changing it to the A (alanine) of HIV-1 group M would also require only one nucleotide change.
- SIVcpzPtt MT145 and GAB1 have an I and an L at position 14. Changing either to an A would require two nucleotide changes.
- Position 18: Most chimpanzee SIVcpzPtt strains have amino acid E (glutamic acid) at position 18. Changing that to the A (alanine) of HIV-1 group M would take only one mutation.
- SIVcpzPtt MT145 and GAB1 have an L at position 18, which would require two nucleotide changes to become an A.
- Position 22: This position is already a W in all SIVcpzPtt.
- Positions 1-8 (uncertain): Possible additional unknown mutations at these positions may confer additional anti-tetherin activity, since this region "harbored several minor determinants."[18] Because chimeras within this region "yielded intermediate phenotypes,"[18] there is no evidence that any of these mutations would need to happen simultaneously.
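The one- versus two-nucleotide claims in the list above can be checked mechanically. The sketch below builds the standard genetic code in DNA letters, as in a printed codon table; HIV's RNA uses U in place of T. It then finds the minimum number of nucleotide changes between any codon of one amino acid and any codon of another. This is a lower bound. The actual number depends on which codons the SIV sequences really use, and this article does not list them.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... so each letter below is one codon's amino
# acid; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_nucleotide_changes(source: str, target: str) -> int:
    """Fewest point mutations turning some codon of `source` into one of `target`."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in codons_for(source)
        for c2 in codons_for(target)
    )


# (position, amino acid in SIVcpzPtt) pairs from the list above; the target is A.
for position, source in [(14, "G"), (14, "V"), (14, "I"), (14, "L"), (18, "E"), (18, "L")]:
    print(f"position {position}: {source} -> A needs {min_nucleotide_changes(source, 'A')}")
```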
Therefore, HIV-1 group M's anti-tetherin activity could have begun with a single mutation changing the amino acid at position 14, followed by a second mutation at position 18 that enhanced the anti-tetherin activity.
Of course, many other mutations also occurred that don't affect Vpu's anti-tetherin ability. Note that this is a new mutational pathway to anti-tetherin ability, not simply a reversion to the amino acids found in monkey SIVs.
Unfortunately, I haven't found conclusive data on whether the anti-tetherin activity of HIV-1 group N evolved gradually. A 2012 study found that "four TMD amino acid substitutions (E15A, V19A, I25L and V26L) were sufficient to render the SIVcpz Vpu active against human tetherin."[28] Figure 5A in their paper (reproduced below) shows human HIV-1 group N in yellow and SIV in blue. It also shows how the researchers mixed and matched the amino acids to get different results:
- Changing the amino acids at positions 25-26 alone is marked as making no difference. However, the gray triangle above the "25" in the second part of figure 5B (not reproduced here) might indicate a very slight improvement.
- Changing the amino acids at 15 and 25-26 gives low anti-tetherin activity.
- Changing the amino acids at 15, 19, and 25-26 gives moderate anti-tetherin activity.
- Additionally changing 16-18 gives high anti-tetherin activity.
Unfortunately, the researchers didn't test position 15 by itself, positions 25 and 26 separately, or positions 16-18 separately, so we don't know whether these changes could have produced stepwise gains.
Each row is a list of amino acids in part of the Vpu protein from a strain of HIV, SIV, or an artificially created hybrid. Yellow sequences are from human HIV-1 group N, blue from chimpanzee SIV. In the Tetherin Release column, - indicates no activity, (+) low activity, + moderate activity, and ++ high activity.
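The untested intermediates can be made concrete by enumerating every on/off combination of the substituted positions and marking which ones the text reports. This sketch treats positions 16-18 as a single group, because the text only describes them being changed together. It also assumes that the unmodified SIVcpz Vpu is inactive against human tetherin, as implied by the quoted finding that the substitutions "render the SIVcpz Vpu active." It is a bookkeeping aid and adds no data beyond what is listed above.

```python
from itertools import combinations

# Substitution groups from the 2012 study [28]; "16-18" is treated as one group
# because the text only reports those positions being changed together.
GROUPS = ("15", "16-18", "19", "25", "26")

# Combinations described in the text, with their reported tetherin-release activity.
TESTED = {
    frozenset(): "-",  # unmodified SIVcpz Vpu (assumed inactive, see lead-in)
    frozenset({"25", "26"}): "-",
    frozenset({"15", "25", "26"}): "(+)",
    frozenset({"15", "19", "25", "26"}): "+",
    frozenset({"15", "16-18", "19", "25", "26"}): "++",
}


def all_combinations(groups):
    """Yield every subset of the substitution groups, from none to all."""
    for size in range(len(groups) + 1):
        for combo in combinations(groups, size):
            yield frozenset(combo)


untested = [c for c in all_combinations(GROUPS) if c not in TESTED]
untested_singles = sorted(next(iter(c)) for c in untested if len(c) == 1)

print(f"{len(untested)} of {2 ** len(GROUPS)} combinations untested")  # 27 of 32
print("untested single-group changes:", untested_singles)
```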
The human leukocyte antigen (HLA) system is a set of genes that encode cell-surface proteins, which present viral proteins to white blood cells. A 2011 study followed HIV evolution within a Chinese sub-population over 10 years. It examined the RNA letter variations seen in the HIV gag, reverse transcriptase, integrase, and nef genes, and found that "24%-56% were sites of HLA-associated selection."[26] In simple terms, these mutations helped HIV survive, likely by changing the structure of HIV proteins enough that the immune system no longer recognized them.[27a]
However, some of these mutations end up harming HIV in other ways, and those often revert when infecting a new host with a different immune system:
The long-term stability of T-cell escape mutants depends on the fitness cost incurred by the virus; variants with a high-fitness cost tend to revert to the original sequence after transmission to a host without the selecting HLA allele, unless an appropriate compensatory mutation is also present. Other variants revert only slowly, if at all...[26]
This section is incomplete, although there are many other gains that could be documented here.
Dubrow, Aaron. "Computing a Cure for HIV: 9 Ways Supercomputers Help Scientists Understand and Treat the Virus." Huffington Post. 2014. The header image here is a modified version of the image in this article.
The authors write: "HIV shows stronger positive selection than any other organism studied so far." HIV is an example of how natural selection is strongest on small, simpler genomes.
"More than 40 species of African monkeys are infected with their own, species-specific, SIV and in at least some host species, the infection seems non-pathogenic."
"Chimpanzees acquired from monkeys two distinct forms of SIVs that recombined to produce a virus with a unique genome structure."
"We have found that SIV infection causes CD4+ T-cell depletion and increases mortality in wild chimpanzees, and so the origin of AIDS is more ancient than the origin of HIV-1."
Haase, A. T. et al. "Quantitative image analysis of HIV-1 infection in lymphoid tissue." Science. 1996. Middle of page 7.
Coffin, John et al. "HIV Pathogenesis: Dynamics and Genetics of Viral Populations and Infected Cells." Cold Spring Harb Perspect Med. 2013.
Brown, Andrew J. Leigh. "Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population." PNAS. 1997.
Maldarelli, Frank et al. "HIV Populations Are Large and Accumulate High Genetic Diversity in a Nonlinear Fashion." J. Virology. 2013.
Perelson, Alan S. et al. "HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time." Science. 1996.
Althaus, Christian L. et al. "Stochastic Interplay between Mutation and Recombination during the Acquisition of Drug Resistance Mutations in Human Immunodeficiency Virus Type 1." J. Virol. 2005. See table 1 for a list of seven previous estimates of HIV-1 per-human effective population size. Estimates range from 450 to 10^5.
Zanini, Fabio et al. "In vivo mutation rates and the landscape of fitness costs of HIV-1." Virus Evolution. 2017.
Snoeck, Joke et al. "Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints." Retrovirology. 2011.
Sauter, Daniel et al. "The evolution of pandemic and non-pandemic HIV-1 strains has been driven by Tetherin antagonism." Cell Host Microbe. 2010.
Vigan, Raphaël et al. "Determinants of Tetherin Antagonism in the Transmembrane Domain of the Human Immunodeficiency Virus Type 1 Vpu Protein." Journal of Virology. 2010.
Lee, Heewook et al. "Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing." PNAS. 2012. The authors estimate "2.2x10^-10 mutations per nucleotide per generation or 1.0x10^-3 mutations per genome per generation."
Roach, Jared C. et al. "Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing." Science. 2010. The authors "estimated a human intergeneration mutation rate of ~1.1x10^-8 per position per haploid genome." This rate times the 3 billion nucleotides in a human haploid genome is 33. The whole (diploid) genome mutation rate would be 66.
Lynch, Michael. "The Origins of Eukaryotic Gene Structure." Mol Biol Evol. 2006. Lynch writes: "the efficiency of natural selection declines dramatically between prokaryotes, unicellular eukaryotes, and multicellular eukaryotes" and "all lines of evidence point to the fact that the efficiency of selection is greatly reduced in eukaryotes to a degree that depends on organism size." Lynch attributes this to more complex organisms typically having 1. smaller population sizes, 2. "decreases in the intensity of recombination," and 3. lower mutation rates per nucleotide.
Sanford, John et al. "Mendel's Accountant: A biologically realistic forward-time population genetics program." Scalable Computing. 2007. The authors explain: "each nucleotide in a smaller genome on average plays a greater relative role in the organism’s fitness."
Behe, Michael J. "The Edge of Evolution." 2007. Page 60. Behe writes: "Recall that the odds against getting two necessary, independent mutations are the multiplied odds for getting each mutation individually. What if a problem arose during the course of life on earth that required a cluster of mutations that was twice as complex as a CCC? (Let’s call it a double CCC.) For example, what if instead of the several amino acid changes needed for chloroquine resistance in malaria, twice that number were needed? In that case the odds would be that for a CCC times itself. Instead of 10^20 cells to solve the evolutionary problem, we would need 10^40 cells." A "CCC" is Behe's own term: "chloroquine-complexity cluster." It's what he calls the two simultaneous mutations needed for P. falciparum (the parasite that causes malaria) to evolve resistance to the drug chloroquine.
Dong et al. "Extensive HLA-driven viral diversity following a narrow-source HIV-1 outbreak in rural China." Blood. 2011.
Guha et al. "Innate Immune Evasion Strategies by Human Immunodeficiency Virus Type 1." ISRN AIDS. 2013.
"The HIV-1 RNA genome can be mutated randomly which helps the virus to evade immune recognition by the host."
Sauter, Daniel. "Human Tetherin Exerts Strong Selection Pressure on the HIV-1 Group N Vpu Protein." PLoS Pathogens. 2012.