Notes to Genomics and Postgenomics
1. DNA methylation refers to the process in which a methyl group (-CH3) is covalently added to another molecule (in this case DNA). The importance of this process will be discussed further below.
2. A minor problem that we shall not discuss is that some viruses do not contain DNA at all, but use RNA to serve their basic genetic functions. This is a fairly technical problem and not too hard to resolve with a qualification to the definition. We should note, however, that it is not a problem at all for the account of the genome that we favour (below).
3. Though perhaps this is not an indefensible move. A more radical way to approach this issue is to abandon the idea of ‘the’ genome of an organism and treat both the organism and its genome as complex symbiotic wholes, “holobionts”, polygenomic entities with multiple and diverse genomes (Dupré 2010; reprinted as Dupré 2012: ch. 7).
4. Non-coding DNA is defined as DNA that does not code for proteins. It is estimated that only about 1.5% of human chromosomal DNA codes for proteins (International Human Genome Sequencing Consortium 2001, 2004). The ENCODE project (see the Supplement) and other recent research on DNA transcription have shown that a large part of chromosomal DNA (up to 74%) is transcribed into RNA. The majority of this RNA does not code for proteins, which is why it is also called non-coding RNA (ncRNA), but a substantial though debated proportion of it is known to serve some function. For an overview of types of ncRNA see Wright and Bruford 2011. Some estimate that there could be as many genes for non-coding RNA as for coding RNA, or more, raising the gene count from 20,000 to at least 40,000 (Harrow et al. 2012; Clark et al. 2013).
5. Keller (2011) identifies four different definitions of the term: ‘genome-1’ is defined as the set of genes, ‘genome-2’ as the set of chromosomes in a cell. ‘Genome-3’ is defined as the organism’s DNA, whereas ‘genome-4’ refers to all the genetic material of an organism.
6. For further discussion of the broader processual view of biology of which this view of the genome is a vital part, see Dupré 2012; Nicholson and Dupré forthcoming.
7. We shall not address here the fascinating history of sequencing technologies, from the laborious manual techniques of the 1970s and 1980s to the machines that today can provide gigabytes of sequence data in a few hours. Note that accuracy is as crucial as speed, and that in actual sequencing practice a large number of runs is required to reduce errors.
8. These discussions about the different modes of the experimental sciences (exploratory vs. hypothesis- or theory-driven experimentation) also became an important topic in philosophy of science (see, e.g., Steinle 1997; Burian 1997; Franklin 2005; Elliott 2007; O’Malley 2007; Waters 2007a; Karaca 2013). Philosophers of science also tackled the important issue of how these different modes of research are integrated with each other (Burian 2007; O’Malley et al. 2010; O’Malley & Soyer 2012).
9. Fittingly, the first genome of a single person to be published was that of James D. Watson himself, whose sequence was released online in 2007 and published in an academic journal in 2008 (Wheeler et al. 2008). The online release of Watson’s genome was quickly followed by the publication of Craig Venter’s genome (Levy et al. 2007).
10. By enlarging the range of sequenced genomes the HGP also had a crucial impact on what is now called ‘comparative genomics’. The power of this discipline lies in the insights it can give us into the relations between different organisms or species at the genetic level (Touchman 2010). The comparison can take place at different resolutions (overall genome size, number of chromosomes, genes or, most importantly perhaps, nucleotide-by-nucleotide alignments of different sequences). The field has been of crucial importance to phylogenetics and has also attracted the attention of philosophers (see, e.g., Moss 2006; Piotrowska 2009; Perini 2011). Although space restrictions prevented us from including a section dedicated to comparative genomics, we discuss crucial aspects of this practice throughout this entry, as its tools and methods play a key role in many of the projects we touch upon (such as the HapMap discussed in Section 3.1 or the ENCODE project discussed in the Supplement).
11. Note that we can find the term ‘junk DNA’ earlier, for instance in a paper by Ehret and De Haller (1963), where the authors mention that
[w]hile current evidence makes plausible the idea that all genetic material is DNA (with the possible exception of RNA viruses), it does not follow that all DNA is competent genetic material (viz. “junk” DNA) […]. (Ehret & De Haller 1963: 39)
See Graur 2013 for more on the history of the term ‘junk DNA’.
12. Note how assumption (a) neglects the key examples that gave rise to the C-value paradox.
13. As Evelyn Fox Keller (2011) points out, this concept of junk DNA became a staple of genomic thinking in the 1980s, in particular through the publication of two papers (Doolittle & Sapienza 1980; Orgel & Crick 1980) that linked Richard Dawkins’ (1976) concept of selfish DNA to Ohno’s notion of junk DNA.
14. The official NCBI website containing the HapMap resource data was decommissioned in June 2016 due to a security issue that was uncovered in a computer security audit (see the link to ‘Decomissioned NCBI HapMap Resource’ in the Other Internet Resources section). The NCBI justified the decision to decommission the resource by pointing to a decline in usage over recent years. This has sparked a conversation amongst scientists who believe in the continuing importance of the HapMap resource. Parts of these discussions can be found on Twitter under the hashtag #saveHapMap.
15. The term ‘allele’ is often used to refer to different variants of a gene, but the term is now used more liberally and can simply refer to variants of any locus on chromosomal DNA. See, for instance, the links to ‘talking glossary: allele’ or ‘scitable: allele’ in the Other Internet Resources section.
16. Note that 1% is an arbitrary value. Some authors, for instance, set the threshold at 5%.
17. Meiosis is the special type of cell division that gives rise to an organism’s reproductive cells (gametes). During meiosis homologous chromosomes (i.e., the maternal and paternal version of each chromosome) become fragmented and are then rejoined in a process called ‘crossing-over’. This process leads to the (homologous) recombination of parts of the two chromosomes. If a crossing-over happens between two different loci on a chromosome, the alleles at these two loci become separated. The closer the loci are to each other, the smaller the likelihood of a crossing-over event taking place between them, and hence the higher the level of association between the two alleles.
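The relation between distance and association described in this note can be illustrated with a textbook population-genetics idealisation that the note itself does not state: under random mating, the association D between the alleles at two loci is expected to shrink by a factor of (1 − r) per generation, where r is the probability of recombination between the loci. The following minimal sketch assumes that model; the function name, the starting value of D and the recombination fractions are hypothetical and chosen only for illustration.

```python
def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """Expected association D between two loci after some generations.

    Assumes the idealised random-mating model D_t = D_0 * (1 - r)**t,
    where r is the recombination fraction between the two loci.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - r) ** generations


# Hypothetical illustration: two pairs of loci start with the same association,
# but one pair is close together (small r) and the other far apart (large r).
close_loci = ld_after_generations(d0=0.25, r=0.001, generations=100)
distant_loci = ld_after_generations(d0=0.25, r=0.1, generations=100)

print(f"close loci:   D = {close_loci:.4f}")
print(f"distant loci: D = {distant_loci:.6f}")
```

Under these assumptions the closely spaced pair retains most of its initial association after 100 generations, whereas the association between the distant pair has almost vanished.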
18. For a short history of the CD/CV hypothesis see Box 2 in Visscher et al. 2012.
19. Francis Collins was quoted as saying that the HapMap will be “the single most important genomic resource for understanding human disease, after the sequence” (Couzin 2002).
20. Hall (2010) also cites other prominent researchers, such as Walter Bodmer or David Goldstein, who were critical of the HapMap project.
21. As US President Bill Clinton put it on June 26, 2000, at the presentation of the draft genome at the White House:
With this profound new knowledge, humankind is on the verge of gaining immense, new power to heal. Genome science will have a real impact on all our lives—and even more, on the lives of our children. It will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases. (White House 2000)
22. For an overview and discussion of the literature on race (and ethnic categories) as socially and historically constructed see Fujimura et al. 2014: 209. Lisa Gannett claims that the concept of race has never really been abandoned in biology but was rather transformed from a typological into a population-based understanding of race (Gannett 2001). For more on the history of the race concept see Stolley (1999), Marks (2008), Yudell (2011), and the SEP entry on race.
23. The anomaly consists of the discrepancy between the proportion of living microbes that can be detected in the original sample using a microscope and the proportion of microbes of that same sample that then grow on an agar plate, the key means of culturing microbes.
24. This step is usually performed in the bacterium E. coli. See Streit and Schmitz 2004, National Research Council 2007, and Ekkers et al. 2012 for a discussion of the methodological difficulties such an approach can encounter.
Notes to the Supplement
S1. Interestingly, the term ‘junk DNA’ does not appear in any of the main consortium publications and there are no claims made about the ‘death’ of the junk DNA concept (see ENCODE Project Consortium 2007, 2012). But as Germain et al. (2014) point out, we find the term in commentaries such as Ecker et al. 2012 and Pennisi 2012, with Pennisi in particular claiming that the ENCODE project has written a ‘eulogy’ for junk DNA. It is also to these editorials (and to articles in newspapers such as the New York Times) that Doolittle (2013) turned in his critique of the ENCODE project and its claims about the demise of the junk DNA concept. But see Eddy (2013) for the claim that some of the ENCODE leaders themselves spun the project towards the textbooks-are-wrong narrative in their attempts to popularize the project.
S2. It should be noted that the ENCODE researchers had already pointed this out in the publication reporting on the pilot phase of the project (ENCODE Project Consortium 2007). The fact that their assays picked up so much more biochemical activity than would be expected from looking only at conserved regions of the genome was something they repeatedly highlighted and also tried to explain (ENCODE Project Consortium 2007).