Supplement to Genomics and Postgenomics
The ENCODE Project and the ENCODE Controversy
The ENCyclopedia Of DNA Elements (ENCODE) project was an international research effort funded by the National Human Genome Research Institute (NHGRI) that aimed to identify all functional elements (FEs) in the human genome (ENCODE Project Consortium 2004). FEs include, for instance, protein-coding regions, regulatory elements such as promoters, silencers, or enhancers, and sequences that are important for chromosomal structure. The project, which began in 2003 and included 442 researchers during its main production phase, came to a conclusion in 2012 with the publication of 30 papers across several journals (ENCODE Project Consortium 2012; Pennisi 2012). Like the HapMap project, ENCODE was presented as the logical next step after the sequencing of genomic DNA, since tackling the interpretation of the sequences was now seen as the top priority (ENCODE Project Consortium 2004).
The ENCODE project incited a heated debate in academic journals, the blogosphere, and the national and international press. The crucial claim that drew much ire was the project's conclusion that 80.4% of human genomic DNA has a 'biochemical function' (ENCODE Project Consortium 2012). To understand the strong reaction this statement provoked we have to return to the C-value paradox and the concept of 'junk DNA' (see Section 2.3 of the main text). In the context of the ENCODE controversy this debate was linked with the question of how to define a 'functional element' and how scientists ascribe functions in biological systems. What the ENCODE research implied, at least in the eyes of some commentators, was that the idea of junk DNA had been proven wrong, because almost all of our DNA turned out to be functional. This led to claims that textbooks would have to be re-written, as they still describe the genome as mainly composed of junk.[S1] Defenders of the old view countered that the ENCODE researchers had set the bar far too low when ascribing functions to elements of biological systems.
The Methodology of the ENCODE Project
The ENCODE project used a range of experimental assays to analyse what its researchers referred to as 'sites of biochemical activity' (for an overview of the ENCODE output see Qu & Fang 2013). These are sites at which some modification can be identified (for instance methylation) or to which an activity (such as the transcription of DNA into RNA) can be ascribed. Such modifications or activities were taken as strong indications that the identified regions of genomic DNA play a functional role in human cells.
As an example of how this approach worked, ENCODE researchers wanted to find out how much of the genomic DNA is involved in the regulation of gene expression. Researchers postulate that a key hallmark of all regulatory DNA elements is their accessibility, which makes sense as the regulatory and transcriptional machinery need access to these DNA sites. ENCODE used this feature of regulatory DNA to map (putative) regulatory elements in the human genome. One way to do so is to perform what is called a 'DNase I hypersensitivity assay'. DNase I is an enzyme that cuts DNA, and this cutting works better when the template DNA is accessible, meaning that highly accessible regions are more sensitive to DNase I activity. The behaviour of the genome in the DNase I hypersensitivity assay can therefore be used to learn indirectly about its structure, from which researchers then infer the presence of a functional element (in this case a regulatory sequence). This is just one of about 24 different types of assays that ENCODE researchers used to gain insight into the number and distribution of functional elements in the human genome (for a discussion of the different experimental approaches used in ENCODE see Kellis et al. 2014).
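To make the inference pattern concrete, here is a minimal sketch of how hypersensitive regions might be flagged from per-base cleavage counts. This is not ENCODE's actual pipeline; the data layout, window size, and threshold are illustrative assumptions only:

```python
# Illustrative sketch only: flag genomic windows whose DNase I cut counts
# greatly exceed the background level, and treat them as putative
# regulatory elements. Window size and threshold are arbitrary choices.

def flag_hypersensitive_sites(cut_counts, window=200, fold_threshold=5.0):
    """cut_counts: per-base-pair DNase I cleavage counts along a region."""
    n = len(cut_counts)
    background = sum(cut_counts) / n  # mean cuts per base pair overall
    putative_elements = []
    for start in range(0, n - window + 1, window):
        mean_in_window = sum(cut_counts[start:start + window]) / window
        if mean_in_window >= fold_threshold * background:
            # High accessibility is taken as an indirect sign of a
            # regulatory element (the inference step described above).
            putative_elements.append((start, start + window))
    return putative_elements

# Toy usage: a mostly inaccessible stretch with one accessible region.
counts = [1] * 1000 + [30] * 200 + [1] * 1000
print(flag_hypersensitive_sites(counts))  # [(1000, 1200)]
```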
What is interesting about most of these assays is that they look at a proxy for function: if a stretch of DNA is hypersensitive to DNase I, it is automatically defined as functional. Another example is transcription itself: if a DNA sequence shows up in RNA sequencing, it has been transcribed into RNA by the enzyme RNA polymerase, and this activity, in the eyes of the ENCODE researchers at least, makes the DNA element in question a functional element of the genome.
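The permissiveness of such a criterion can be spelled out explicitly. The toy sketch below (the assay list and data structure are hypothetical, chosen only for illustration) labels a region 'functional' as soon as any single assay registers activity there:

```python
# Illustrative sketch: a region counts as a 'functional element' if *any*
# biochemical signal is observed there, mirroring the permissive
# union-of-assays criterion attributed to ENCODE in the text.

from dataclasses import dataclass

@dataclass
class Region:
    name: str
    transcribed: bool            # detected in RNA sequencing
    dnase_hypersensitive: bool   # accessible in the DNase I assay
    methylated: bool             # carries a chemical modification

def is_functional(region: Region) -> bool:
    # One positive signal from any assay suffices under this criterion,
    # regardless of why the signal is there (e.g., transcriptional noise).
    return (region.transcribed
            or region.dnase_hypersensitive
            or region.methylated)

regions = [
    Region("putative enhancer", False, True, False),
    Region("noisily transcribed repeat", True, False, False),
    Region("silent spacer", False, False, False),
]
for r in regions:
    print(r.name, "->", is_functional(r))
# Only the fully silent region fails the test; any stray signal passes.
```

Because the criterion is a disjunction over assays, adding further assay types can only increase the fraction of the genome that qualifies, which is one way of seeing how a figure as high as 80.4% could arise.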
But such a broad approach to identifying functional elements is highly problematic, as a transcription event or hypersensitivity can arise for many different reasons (for instance as a result of transcriptional noise). This is exactly what some critics of ENCODE homed in on, pointing out that merely demonstrating the existence of a structure (such as methylation) or a process (such as transcription) is not by itself enough to establish the functional significance of these biochemical features (Doolittle 2013; Eddy 2012; Graur et al. 2013; Niu & Jiang 2013).
Whilst this is surely a valid point that applies to a large part of the research done within ENCODE, not all studies performed as part of the project relied on such proxies. An example is Whitfield et al. (2012), who did not just look at specific modifications or the behaviour of DNA in particular assays but mutated specific sites to check whether interference with these sites had an effect on gene expression.
The above argument about how we learn about functional elements presumes that we already understand what it means to be 'functional'. But it is by no means clear how biological function should be defined, and there are competing accounts of what makes an entity functional. These discussions about the concept of biological function were central to the dispute surrounding the ENCODE project.
The ENCODE Controversy
In the early critiques by Doolittle (2013) and Graur et al. (2013) especially, the distinction between the 'selected effect' (SE) function and the 'causal role' (CR) function of an entity or process figured prominently. The ENCODE project, so the critics charged, simply ignored key work by philosophers and theoretical biologists on this topic, thereby making a complete muddle of what is being said when the term 'function' is used. With more conceptual clarity, they argued, the claim that 80% of our DNA is functional would not be tenable and the established notion of 'junk DNA' would be saved.
The definition of function and functional analysis in biology deserves an SEP entry of its own. Here we limit ourselves to a few comments that relate to the ENCODE controversy specifically. The key point is that SE functions are expected to go hand in hand with sequence conservation. The SE account aims to answer the question of why an element is there: a functional element according to this definition is an entity whose presence has a positive effect on the survival or reproduction of the organism, meaning that the entity has been selected for (Millikan 1984, 1989a; Neander 1991; Griffiths 1992, 1993; Godfrey-Smith 1994). If a gene has been selected for, it is expected that its sequence will be conserved: mutations within it will be selected against and will therefore be less frequent than in sequences that are not being maintained by selection. History matters for this account, which is why it is sometimes referred to as the etiological account of function; it goes back to a paper by Wright (1976) (but see Millikan 1989a on how the etiological account relates to Wright's original account).
CR functions, on the other hand, do not depend on the history of the system. What the CR account answers are 'how' questions about the capacities of a system (Millikan 1989b). Only the here and now matters for the CR account: functional analysis consists in analysing a system with capacity C into sub-capacities that are attributed to elements of the system and that contribute to C (Cummins 1975). The CR account is in an important sense more liberal than the etiological account: according to it, anything can be deemed a functional element as long as it is part of a system and plays some causal role contributing to a system capacity we happen to be interested in.
Graur et al. (2013) claim that ENCODE worked with the CR account but that this was a mistake, as biologists actually work with the SE account; the same claim can be found in Doolittle (2013). They acknowledge that biologists might study CR functions (for instance when doing deletion experiments) but hold that even when scientists do so, they take these causal roles simply as indicative of SE function, which is the 'true' function of a biological element (Doolittle 2013; Graur et al. 2013). It is with this focus on SE functions that these critics bring us back to the C-value paradox and the strong case one can make for the importance of the junk DNA concept.
The deep problem is that there is simply not enough conserved DNA in humans to match the high percentage of functional DNA the ENCODE project came up with. If we accept current estimates that between 5 and 10% of the human genome is conserved (Lindblad-Toh et al. 2011; Ward & Kellis 2012), then there is a glaring mismatch between the amount of sequence under evolutionary constraint and what the ENCODE consortium calls 'functional'. Graur et al. (2013) call this the 'ENCODE incongruity'.[S2]
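Put as back-of-the-envelope arithmetic (taking the cited figures at face value, simply to make the gap vivid): if $f_{\text{ENCODE}} = 0.804$ of the genome is called functional and at most $f_{\text{conserved}} = 0.10$ is under evolutionary constraint, then the fraction that is 'functional' yet unconserved is at least

$$
f_{\text{ENCODE}} - f_{\text{conserved}} \;\geq\; 0.804 - 0.10 \;=\; 0.704,
$$

so even on the generous 10% estimate, roughly 70% of the genome would count as functional by ENCODE's criterion while showing no detectable trace of selection.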
As already pointed out above, this critique rests on a claim about which account of function scientists actually use. This appears to be taken as an empirical claim, though perhaps what matters more is how scientists ought to understand functional language, which in turn is likely to depend on what their aims are. Either way, this is an important point, because once we think in terms of SE functions, DNA conservation immediately becomes salient. If, however, it turns out that scientists don't (or shouldn't) use the SE account (as is claimed, for instance, by Elliott et al. 2014; Germain et al. 2014; Amundson & Lauder 1994; Griffiths 1994, 2006), this critique loses much of its force, as the ENCODE incongruity is then no longer a problem.
This is exactly the point that a more recently published critique of the critics picks up on. Germain et al. (2014) claim that the critics of ENCODE simply misunderstood the nature of the project, as they did not take into account that ENCODE was part of a biomedical discovery process. As such, the project was concerned with finding elements of the human genome that might engage in relevant biochemical processes. What makes a sequence or activity relevant in the biomedical context is not whether it is conserved but whether its absence or presence has a potential effect on activities or entities that matter to biomedical research. The CR account, Germain and co-workers claim, is therefore the right account to use in this context, and the 'ENCODE incongruity' ceases to be a relevant issue.
In all of this the ENCODE researchers themselves did not stay silent. It is interesting to note that in a reply to their critics, key ENCODE members toned down their claims about the percentage of functional elements present in the human genome: the 80.4% figure is not mentioned again (Kellis et al. 2014). In fact, no numbers are mentioned in this paper at all, and the authors remark that in their opinion creating an open access resource (i.e., the ENCODE library) is "far more important than any interim estimate of the fraction of the human genome that is functional" (2014: 6136). The authors also point out that in their eyes all experimental and theoretical approaches to functional ascription have their limitations and that no account or assay will get it right on its own, which is why they advocate both methodological and theoretical pluralism, again defusing many of the stronger claims made earlier on both sides of the dispute.
The issue Germain et al. (2014) raise concerns the type of scientific project ENCODE is. As in the context of the HapMap project, we here encounter the long-standing dispute about the value of hypothesis-free or exploratory research (see Section 3.1.3 of the main text). Eddy (2013), for instance, laments that the project was originally a mapping project but was then retrospectively spun as one that aimed at testing a hypothesis. Graur et al. (2013) likewise argue that ENCODE overstepped the remit of a big science project (which, they claim, is simply to provide data) and ventured into 'small science' territory by trying to deliver an interpretation of that data. In contrast to the criticisms the HGP originally encountered, these modern-day critics no longer have a problem with the idea of a descriptive mapping project; their worry is rather that the project was sold as something it isn't.