The C-Value Paradox
Some species have dozens or hundreds of times more DNA than other species of similar complexity. Because of this, some have argued that most DNA in complex organisms must be unused junk, and that these organisms are therefore not designed. There are several good reasons to think otherwise:
- Some of the largest reported genome sizes don't report haploid genome size, or contain contamination.
- Genome size is roughly correlated with the number of cell types.
- Genome size correlates with cell size.
- Genome size differences may represent tradeoffs between different forms of data storage.
- In some cases, organisms with excessively large genomes may actually have large amounts of junk DNA, created by runaway transposon duplication.
- But in humans (the most-studied complex animal) there is good evidence that most DNA is functional. Therefore most mammal DNA is likely functional since mammals are similar in complexity and mammals all have about the same amount of DNA.[^gregory-2001]
A C-value is the weight (in picograms) of all DNA in a haploid genome. That is, it counts the DNA in one copy of each unique chromosome, since many organisms carry two or more copies of each chromosome. Some organisms of similar complexity have very different C-values. It was therefore reasoned that most DNA in organisms with very large genomes must be junk, since other organisms of similar complexity get by on so much less. Francis Crick and Leslie Orgel were among the first to make this argument, in 1980:
We also have to account for the vast amount of DNA found in certain species, such as lilies and salamanders, which may amount to as much as 20 times that found in the human genome. It seems totally implausible that the number of radically different genes needed in a salamander is 20 times that in a man.[^crick-1980]
The argument was made more recently by T. Ryan Gregory and Alexander Palazzo in 2014:
Genome size varies enormously among species: at least 7,000-fold among animals and 350-fold even within vertebrates... a human genome contains eight times more DNA than that of a pufferfish but is 40 times smaller than that of a lungfish. Third, organisms that have very large genomes are not few in number or outliers—for example, of the >200 salamander genomes analyzed thus far, all are between four and 35 times larger than the human genome. Fourth, even closely related species with very similar biological properties and the same ploidy level can differ significantly in genome size... the notion that the majority of eukaryotic noncoding DNA is functional is very difficult to reconcile with the massive diversity in genome size observed among species, including among some closely related taxa.[^gregory-2014]
Because onions can have six times more DNA than humans, the C-value paradox has colloquially been referred to as the "onion test" (a term coined by Gregory in 2007). However, there are five reasons I don't think this can be used to argue that most DNA in most eukaryotes is junk:
1. Polyploidy and/or contamination
Some of the largest reported genome sizes are not haploid genome sizes, or include DNA from contamination. Taft et al. reported in 2007:
...the C-value for the salamander genus Ambystoma has been taken to indicate a genome size 10-25 times larger than other vertebrates although polyploidy is known in Ambystomatidae, and a recent genetic map suggests the salamander genome may not be greatly dissimilar in size to other vertebrate genomes. The lungfish also has a high C-value, superficially suggesting that its genome is over an order of magnitude larger than primates, but is again known to be polyploid. Other groups of organisms that exhibit a wide range of C-values, such as crustaceans and insects, are also frequently polyploid. There may also be significant measurement errors stemming from different experimental methodologies, interfering compounds and physiological states. For example, there are widely differing estimates of the DNA content of the lungfish Protopterus aethiopicus, with measurements ranging from 40 to 130pg...
...amoebae are often cited as the most dramatic example of the lack of correlation between genome size and biological complexity. There are many problems with this conclusion, including a likely variation in ploidy... as well as the presence of significant amounts of contaminating DNA from their prey. The amoeba genome is probably smaller than 20 pg, far less than the 700 pg commonly cited.[^taft-2007]
Likewise, from Liu et al., 2013:
...our analysis is focused on haploid genome composition, thus removing the confounding factor of ploidy or the contaminating DNA of prey, which are likely to be the primary cause of the large genome sizes attributed to lungfish and amoeba, respectively.[^lui-2013]
But what good is having higher levels of ploidy? Luca Comai (2005) offers three advantages:
the advantages of polyploidy are caused by the ability to make better use of heterozygosity, the buffering effect of gene redundancy on mutations and, in certain cases the facilitation of reproduction through self-fertilization or asexual means.[^comai-2005]
Likewise, a study in 2004 observed that plant "tetraploids generally grew at higher altitudes than the diploids."[^suda-2004] Perhaps, due to a shorter growing season, these plants need to get all their transcription done a little faster than their diploid cousins. However, higher levels of ploidy also come with disadvantages, which makes this a tradeoff:
the disrupting effects of nuclear and cell enlargement, the propensity of polyploid mitosis and meiosis to produce aneuploid cells and the epigenetic instability that results in transgressive (non-additive) gene regulation.[^comai-2005]
However, misreporting and polyploidy certainly don't explain all variations in C-value among organisms.
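The ploidy correction discussed in this section comes down to simple arithmetic. The following is a minimal sketch, assuming the common convention that the monoploid genome size (often written 1Cx) is the DNA content of an unreplicated somatic nucleus (2C) divided by the ploidy level. The function name and the sample numbers are hypothetical and are not measurements from the cited papers.

```python
def monoploid_genome_size_pg(nuclear_dna_2c_pg: float, ploidy_level: int) -> float:
    """Return the DNA mass (pg) of a single basic chromosome set.

    nuclear_dna_2c_pg: DNA content of an unreplicated somatic nucleus, in pg.
    ploidy_level: number of chromosome sets in that nucleus (2 = diploid).
    """
    if ploidy_level < 1:
        raise ValueError("ploidy_level must be at least 1")
    if nuclear_dna_2c_pg < 0:
        raise ValueError("DNA content cannot be negative")
    return nuclear_dna_2c_pg / ploidy_level


# Hypothetical illustration: two nuclei whose raw DNA content differs
# two-fold, but only because one of them is tetraploid.
diploid = monoploid_genome_size_pg(8.0, ploidy_level=2)
tetraploid = monoploid_genome_size_pg(16.0, ploidy_level=4)

print(f"diploid:    {diploid} pg per chromosome set")
print(f"tetraploid: {tetraploid} pg per chromosome set")
```

In this made-up case the two-fold difference in raw DNA content disappears once ploidy is accounted for, which is the kind of confounding the quoted authors describe.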
2. Genome size roughly correlates with number of cell types
This figure from Liu et al., 2013[^lui-2013] shows the ratio of noncoding (NC) DNA to total genome size (TG) in various organisms (A) and the percentage of non-coding DNA versus the number of cell types (B).
In the figures above, viridiplantae are algae and plants, while metazoa are animals, with vertebrates being within the deuterostomia. It makes sense that more complex organisms (measured by number of cell types) would require larger genomes.
This figure from the same paper[^lui-2013] also shows the distribution, although note that the top axis is not linear:
However, there are some notable outliers, as can be seen in this image. Note that the bottom axis is logarithmic. 1 pg (picogram - one trillionth of a gram) is about 1 billion letters of DNA:[^memim-2017]
Genome researcher John Mattick explains that, rather than in protein-coding genes, larger genomes seem to have more complexity in their non-coding DNA:
It was originally assumed that as complexity increased there would be more and more such genes - before the genome was sequenced there was speculation that humans might have a hundred thousand or more, and it was a huge shock that it's much less, and doesn't scale with complexity. But there are very large numbers of long non-coding RNAs, so this is where the real genetic scaling has occurred.[^mattick-2010]
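The picogram figures used in this section can be converted to base pairs with a single multiplication. This is a minimal sketch, assuming the widely used approximation that 1 pg of double-stranded DNA is about 0.978 billion base pairs, which agrees with the "about 1 billion letters" figure given above. The constant and function names are illustrative.

```python
# Approximate conversion factor; it assumes an average base-pair mass,
# so results are estimates rather than exact counts.
BASE_PAIRS_PER_PICOGRAM = 0.978e9


def picograms_to_base_pairs(mass_pg: float) -> float:
    """Estimate the number of base pairs in a given mass of DNA (pg)."""
    return mass_pg * BASE_PAIRS_PER_PICOGRAM


def base_pairs_to_picograms(base_pairs: float) -> float:
    """Estimate the mass (pg) of a given number of base pairs."""
    return base_pairs / BASE_PAIRS_PER_PICOGRAM


# The 20 pg and 700 pg amoeba estimates quoted earlier, expressed in base pairs.
for mass_pg in (1.0, 20.0, 700.0):
    print(f"{mass_pg:6.1f} pg is roughly {picograms_to_base_pairs(mass_pg):.3e} bp")
```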
3. Genome size correlates with cell size
The figure below shows a correlation between genome size and cell size in various taxa, composited from Beaulieu et al. 2008, figure 3[^beaulieu-2008] and Gregory 2001, figures 1 and 3.[^gregory-2001] "Angiosperms" includes all flowering plants, including flowering trees. Reptiles, birds, and mammals show a weaker correlation because their cell sizes vary less than the other groups.
Cavalier-Smith and Beaton, 1999 explain that more DNA is needed in larger cells to account for greater rates of production:
The situation is like that of a car factory aiming for a steady output of cars: engines, wheels and doors must be made at the same rate; if overall output is to be increased the number of each must be increased by the same proportion. Moreover, if each robot, machine tool, and operative is already working at maximal rates, one can increase output only by increasing the number of assembly lines. As these take up space the factory also has to be larger. In a cell the nucleus is the production line for RNA molecules. To produce more per cell cycle one must have more copies of RNA polymerases and more copies of spliceosomes and other processing machinery, e.g. mRNA capping machinery; both of these take up space, as do nascent RNAs as well as those being processed and in transit towards the nuclear pores. Thus nuclei have to be larger in larger cells.[^cavalier-smith-1999]
However, production rate is only one factor and overall the explanation is not that simple, since cells with larger genomes don't always have more copies of protein-coding genes.[^gregory-2001]
4. Genome size versus efficiency tradeoffs
Among our own designs it's common to see tradeoffs between size, speed, and other limiting factors. A familiar example is image compression, where encoding and decoding speed also come into play. The PNG image format offers a reduced file size without loss of image quality, but most computers are not fast enough to record video in real time with every frame saved as a PNG file.
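The size versus speed tradeoff can be tried directly with Python's standard `zlib` module, which implements DEFLATE, the same lossless algorithm family PNG uses internally. This is only a sketch: the sample data is a made-up repetitive byte string, not an image, and the timings will vary from machine to machine.

```python
import time
import zlib
from typing import Tuple


def compress_at_level(data: bytes, level: int) -> Tuple[int, float]:
    """Compress data at a zlib level; return (compressed size, seconds taken)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless: decompressing must give back the original bytes exactly.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip failed")
    return len(compressed), elapsed


# Hypothetical sample data with plenty of repetition.
sample = (b"ACGTTGCA" * 64 + b"GATTACA" * 32) * 200

# Level 0 stores without compressing, 1 favors speed, 9 favors size.
results = {level: compress_at_level(sample, level) for level in (0, 1, 9)}

print(f"original size: {len(sample)} bytes")
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Higher levels generally spend more time searching for repeats in exchange for a smaller output, while level 0 does almost no work and produces output slightly larger than the input.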
The following table shows file sizes for an 8 megapixel photo of diced onion, using different image encoding formats:
The PC and Xbox One game Titanfall provides another size versus speed tradeoff. The PC enthusiast website Tom's Hardware explains:
Want to know why the just-released Titanfall shooter is such a hefty install on the PC? Blame it on the lower-end machines. The Xbox One version of Titanfall is a mere 17 GB, but the PC version eats up around 48 GB of hard drive space, 35 GB of which is all uncompressed audio so that lower-end machines aren't bogged down with decompressing audio.[^parish-2014]
An opposite extreme can be seen in the PC game .kkrieger,[^kkrieger] a first-person shooter with detailed graphics and sounds that requires only 96 KB of disk space. That is 500,000 times smaller than Titanfall, and a shocking three times smaller than the onion.jpg above. The .kkrieger developers generated all sound effects, music, textures, and 3D models procedurally, from code rather than stored files, the tradeoff being a lack of nuanced control over artistic assets.
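The 500,000-fold figure can be checked with quick arithmetic. This sketch assumes the 48 GB and 96 KB sizes quoted above and shows the result under both decimal and binary unit conventions; the variable names are illustrative.

```python
# Sizes as quoted in the text.
TITANFALL_PC_GB = 48
KKRIEGER_KB = 96

# Decimal units: 1 GB = 10**9 bytes, 1 KB = 10**3 bytes.
decimal_ratio = (TITANFALL_PC_GB * 10**9) // (KKRIEGER_KB * 10**3)

# Binary units: 1 GiB = 2**30 bytes, 1 KiB = 2**10 bytes.
binary_ratio = (TITANFALL_PC_GB * 2**30) // (KKRIEGER_KB * 2**10)

print(f"decimal units: {decimal_ratio:,}x")
print(f"binary units:  {binary_ratio:,}x")
```

Either way the ratio is about half a million, so the round number in the text holds regardless of which unit convention the sizes were reported in.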
We may see a similar size versus fidelity tradeoff in genomes via alternative splicing. One ENCODE researcher explained:
Organism introduce genetic variation in different ways. For instance, in Drosophila, a 100 kilobase gene (DSCAM) encode thousands of different proteins through a complex alternate splicing mechanism. One could envision copying each of these transcripts - without the alternate splicing - into the genome thus increasing the size of the genome by 10 million bases, or roughly 10%, but not changing the complexity at all.[^everywhere-2013]
This is similar to how our own compression algorithms operate: frequently used sequences are stored only once and re-referenced, rather than the same information being stored multiple times. We can imagine cases where a genome holds many copies of a sequence, each slightly different and optimal for its use case, while in other genomes those copies are condensed into a smaller number of sequences assembled through alternative splicing. Many specialized sequences become a few common sequences, much like the fidelity versus size tradeoff we see in JPEG images. In plants, smaller genome sizes can also offer the benefit of faster replication: "there is a striking connection between DNA content per cell and the minimum generation time of the plant."[^crick-1980]
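The storage tradeoff described in this section can be made concrete with a toy model. This is a sketch, assuming a gene built by choosing one exon from each of several clusters; the exon strings are hypothetical placeholders, not real sequence. The Dscam cluster sizes (12, 48, 33, and 2 alternatives) are the commonly cited counts and do not come from the comment quoted above.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def count_isoforms(clusters: Sequence[Sequence[str]]) -> int:
    """Number of transcripts when one exon is chosen from every cluster."""
    return prod(len(cluster) for cluster in clusters)


def compact_size(clusters: Sequence[Sequence[str]]) -> int:
    """Bases stored when each alternative exon is kept only once."""
    return sum(len(exon) for cluster in clusters for exon in cluster)


def expanded_size(clusters: Sequence[Sequence[str]]) -> int:
    """Bases stored when every spliced transcript is written out in full."""
    return sum(len("".join(choice)) for choice in product(*clusters))


# Hypothetical toy gene: three clusters of three-base "exons".
toy_clusters: List[List[str]] = [
    ["AAA", "AAC", "AAG"],
    ["CCA", "CCG"],
    ["GGT", "GGA", "GGC", "GGG"],
]

print("isoforms:", count_isoforms(toy_clusters))
print("compact storage:", compact_size(toy_clusters), "bases")
print("expanded storage:", expanded_size(toy_clusters), "bases")

# Commonly cited alternative-exon counts for Drosophila Dscam.
DSCAM_CLUSTER_SIZES = (12, 48, 33, 2)
print("Dscam combinations:", prod(DSCAM_CLUSTER_SIZES))
```

The compact form grows with the sum of the cluster sizes, while the expanded form grows with their product, which is why writing out every transcript would inflate a genome without adding any new information.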
5. Runaway Transposon Duplication
Transposable elements (also known as transposons or TEs) are stretches of DNA that can copy and move (transpose) themselves to new locations in a genome. Each copying event increases their number, and copying can happen relatively quickly. In some taxonomically restricted cases of large genomes, the excess may actually be due to runaway transposon duplication and would therefore be true junk DNA. This may be the case with onions of the genus Allium, which range in genome size from "7 pg to 31.5 pg".[^gregory-2007] It may also be that organisms with similar genomes (such as the various species of salamanders) are prone to the same factors leading to runaway transposon duplication.
However, cases like onions cannot be used to argue that most DNA in most eukaryotes is junk. Mammals, for example, all have close to the same genome size.
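The runaway duplication described in this section can be illustrated with a toy growth model. This is a deliberately simplified sketch: it assumes every existing copy produces new copies at a fixed rate per generation and ignores deletion, silencing, and selection. All parameter names and values are hypothetical and are not estimates for any real species.

```python
from typing import Tuple


def genome_size_after_bursts(
    base_genome_bp: float,
    te_length_bp: float,
    initial_copies: float,
    copy_rate: float,
    generations: int,
) -> Tuple[float, float]:
    """Return (total genome size in bp, expected TE copy number).

    copy_rate is the expected number of new copies made per existing
    copy per generation, so the copy number grows geometrically.
    """
    copies = float(initial_copies)
    for _ in range(generations):
        copies += copies * copy_rate
    return base_genome_bp + copies * te_length_bp, copies


# Hypothetical illustration: one 100 bp element in a 1,000 bp genome,
# doubling every generation for three generations.
size_bp, copies = genome_size_after_bursts(
    base_genome_bp=1_000,
    te_length_bp=100,
    initial_copies=1,
    copy_rate=1.0,
    generations=3,
)
print(f"{copies:.0f} copies, genome size {size_bp:.0f} bp")
```

Because the growth is geometric, even a modest copy rate can come to dominate genome size over enough generations, which is the sense in which the duplication is "runaway".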
- [^crick-1980]: Crick, Francis and Leslie Orgel. "Selfish DNA: The Ultimate Parasite." Nature. 1980.
- [^gregory-2014]: Gregory, T. Ryan and Alexander Palazzo. "The Case for Junk DNA." PLOS Genetics. 2014.
- [^taft-2007]: Taft, Ryan J. et al. "The relationship between non-protein-coding DNA and eukaryotic complexity." BioEssays. 2007.
- [^lui-2013]: Liu, Guosheng et al. "A meta-analysis of the genomic and transcriptomic composition of complex life." Cell Cycle. 2013.
- [^comai-2005]: Comai, Luca. "The advantages and disadvantages of being polyploid." Nature Reviews Genetics. 2005.
- [^suda-2004]: Suda, Jan et al. "Cytotype Distribution in Empetrum (Ericaceae) at Various Spatial Scales in the Czech Republic." Folia Geobotanica. 2004.
- [^memim-2017]: Memim Encyclopedia. "C-Value." 2017.
- [^cavalier-smith-1999]: Cavalier-Smith, T. et al. "The skeletal function of non-genic nuclear DNA: new evidence from ancient cell chimeras." In "Structural Biology and Functional Genomics." Springer. 1999.
- [^beaulieu-2008]: Beaulieu, Jeremy M. et al. "Genome size is a strong predictor of cell size and stomatal density in angiosperms." New Phytologist. 2008.
- [^gregory-2001]: Gregory, T. Ryan. "The Bigger the C-Value, the Larger the Cell." Blood Cells, Molecules, and Diseases. 2001.
- [^everywhere-2013]: Reddit user west_of_everywhere. "Comment on How come there's a Amoeba with 200 times larger gene set than humans?" Reddit. 2013.
- [^gregory-2007]: Gregory, T. Ryan. "The onion test." Genomicron Blog. 2007.
- [^mattick-2010]: Mattick, John. "Video Q&A: Non-coding RNAs and eukaryotic evolution - a personal view." BMC Biology. 2010.
- [^parish-2014]: Parrish, Kevin. "Why Titanfall's Install Requires 48 GB: Uncompressed Audio." Tom's Hardware. 2014.
- [^kkrieger]: .kkrieger official website. 2012.