In 1996, it was reported that a repeat length polymorphism in the promoter region of the human serotonin transporter gene (SLC6A4; also known as 5-HTT) regulates gene expression in vitro. Furthermore, individuals carrying one or two copies of the relatively low-expressing short (S) allele of the serotonin transporter linked polymorphic region (5-HTTLPR) exhibit elevated neuroticism, a personality trait involved in the propensity to depression (1). In 2002, it was reported that S-carriers exhibit elevated amygdala reactivity to threatening stimuli, as assessed by functional MRI (2). In 2003, it was reported that S-carriers exhibit elevated depressive symptoms, diagnosable depression, and suicidality after experiencing stressful life events and childhood maltreatment (3).
This article takes an inclusive approach to the literature on 5-HTT and stress sensitivity, as opposed to an exclusive focus on papers attempting to approximate the methods of the initial report of a 5-HTTLPR GxE interaction (3). An inclusive review is essential once it is understood that the hypothesis of interest is that variation in 5-HTT influences reactivity to environmental stress exposure, and thereby brings about risk for depression. Accordingly, in many studies testing the 5-HTT stress-sensitivity hypothesis, the outcome is not depression per se. Rather, inferential advantages are gained by studying intermediate phenotypes on the causal pathway from stress to depression that are considered to index stress sensitivity (e.g., stress hormones, amygdala reactivity). Likewise, stress is not narrowly construed as a count of stressful life events. Other stressors are examined in the field and in the laboratory, whenever doing so augments scientific inference (e.g., hurricane exposure rules out gene-environment correlation because victims' genes could not evoke this life event; officially recorded child abuse rules out recall bias; experimental stress induction allows titration of stress dosage). Because the outcome is not restricted to human depression, important information comes from studies of 5-HTT and stress sensitivity in animals (e.g., genetically modified mice, rhesus macaques carrying an orthologous 5-HTTLPR variant).
It is evident from research conducted with multiple species and from research using both observational and experimental methods that variation in 5-HTT modifies organisms' stress responses to their environments (Figure 1). Complementary experimental and observational research designs are integral to testing not only the 5-HTT stress-sensitivity hypothesis, but all GxE hypotheses (4, 5). Experiments with humans, nonhuman primates, and rodents elucidate biological mechanisms behind the hypothesis and also validate findings from human observational studies by using designs with stronger internal validity (e.g., by random assignment to stress conditions). Observational studies use designs with stronger external validity (e.g., by studying real-world stressors), estimate the effect size of the 5-HTTLPR GxE interaction in the human population, and allow researchers to study clinical depression as the outcome.
Role of 5-HTT Variation in Stress Sensitivity as Underscored by the Coherence of Findings From Hypothesis-Driven Studies in Multiple Species Employing Multiple Methodologies
Human observational studies
The initial GxE effect (3) did not have an overwhelmingly impressive p value, but it was robust, having been 1) discovered in an epidemiologically sound longitudinal cohort study; 2) tested in a straightforward and transparent analysis; 3) reproduced across two stressors, child maltreatment and adult stressful life events; and 4) reproduced across four depression phenotypes. How has this hypothesis fared in observational studies since it was initially tested?
Table 1 and Table 2 list all human observational studies up to summer 2009 that tested the hypothesis that the 5-HTTLPR moderates the effect of stress on depression phenotypes. Three observations emerge from the tables. First, multiple studies have reported that S-carriage moderates the influence of stress on depression. Whether or not the initial finding can be replicated has been answered in the affirmative. Second, positive findings have emerged from a variety of observational research designs used to test the hypothesis, including phenotype case-only designs, case-control designs, cross-sectional designs, longitudinal designs, and exposure designs. This suggests the finding is "sturdy," in the sense that its signal can be detected despite noise from varying research settings, sample characteristics, and study designs (6). Third, there have also been quite a few negative findings. The degree to which negative findings call the original result into question depends on whether differences in study designs are systematically related to differences in study findings. If failures to replicate are characterized by systematically different subject populations or systematically weaker methodologies, their challenge to the original result is greatly diminished.
Table 1. Human Observational Studies Testing the Hypothesis That the 5-HTTLPR Moderates the Effect of Stress on Depression Phenotypes in Studies of Specific Stressors
Table 2. Human Observational Studies Testing the Hypothesis That the 5-HTTLPR Moderates the Effect of Stress on Depression Phenotypes in Studies of Stressful/Adverse Life Events
We considered factors that might covary with positive versus negative findings, including subjects' sex, age, and nationality, and features of phenotype measurement, but these did not covary systematically with findings. However, positive and negative findings did closely track variation in methodological features related to the quality of environmental exposure measurement. Concerns have been expressed about standards of stress assessment in tests of this hypothesis (7, 8). We call attention to three issues.
First, almost all nonreplications rely on brief self-report measures of stress, whereas studies using objective indicators or face-to-face interviews to assess stress exposure yield positive replications ("Stress Assessment" column in Tables 1 and 2). Face-to-face interviewers can clarify the meaning of a reported life event and enhance memory for life events by probing and by using techniques such as life event calendars, as did the initial study (3). In contrast, it is known that self-report event checklists gather idiosyncratic and inaccurate information (9, 10).
Second, studies of specific stressors consistently yield positive findings. Why are these studies so consistent? One possibility is that their focus on a specific, homogeneous, developmentally relevant, and clearly operationalized depression-inducing event decreased between-subject heterogeneity in the exposure and enhanced internal validity of the study design. Table 1 groups studies of two specific stressors that are established causes of depression: childhood maltreatment and medical illness. Nine studies report on depression that follows childhood experiences associated with maltreatment and victimization. Although exposure measurement is not uniform, the studies are united by focusing on threatening events in which physical, sexual, or relational harm was carried out or intended. Virtually all of these studies focus on children, adolescents, and young adults. All of them show that S-carriage moderates the association between child maltreatment and depression. Another nine studies report on depression following medical illness. Virtually all focus on middle-aged and elderly participants. Studies of patients suffering hip fractures, strokes, Parkinson's disease, heart disease, and chronic-disease load show that S-carriage moderates the association between medical illness and depression.
Third, whereas studies of specific stressors consistently generate positive findings, studies of stressful/adverse life events yield mixed results (Table 2). This inconsistency could result from the highly variable measurement of stressful life events (7, 11). The pool of studies exemplifies five difficulties in stress measurement: 1) Stress measures are sometimes noncomparable and fall prey to the fallacy that because measures have the same name they measure the same construct (12). For example, some studies count death of a spouse as a stressor, whereas others count being the child of a father in an unskilled job as a stressor. Some studies count stress events, others model event severity. Some stressors are chronic, others acute. Some studies define a "stressor" by its level of distress, others do not. Some studies examine events that happened to the proband, others examine events among the proband's friends and relatives. 2) Some studies assess stress through currently depressed individuals' self-reports, which are biased by mood-congruent memory revision and thus overcount events (13). Moreover, humans seek explanations, a phenomenon termed "effort after meaning," which leads respondents who have been depressed to misattribute their illness to a life event. Some studies assess events through long-term retrospective reports (sometimes over decades), which are flawed by forgetting and undercount life events, particularly among respondents who lack depression. In addition, respondents often overcount trivial and undercount severe events (9). These cognitive processes (mood-congruent memory revision, effort after meaning, and retrospective forgetting) working together can artifactually influence a study's association between life events and depression. Thus, a correlation between life events and depression does not indicate validity, contrary to claims (14). 
3) Some studies test the connection between stress and depression contemporaneously, others across years or decades. 4) Some studies are unable to rule out reverse causation, in which depression precipitates stressful events; for example, one study measured depression over the respondents' lifetime, but ascertained life stress during only the past year (15). 5) Most studies do not consider variation in participants' depression history, despite evidence that stress is more relevant for initial than recurrent depression episodes.
In the first decade of research about the 5-HTTLPR GxE interaction, scientists have frequently taken advantage of existing data sets, quickly adding genotype data to studies that had previously measured depression and life events for other purposes. Not all of these studies' designs and measures are well-suited to testing the GxE hypothesis. Covariation between poor measurement quality and negative findings was observed early on (16) and has been confirmed with the increasing number of published GxE studies (17). Notably, many of the largest studies in Table 1 and Table 2 were obliged to collect brief retrospective self-reports of stress through telephone interviews or postal questionnaires in order to contain data collection costs. Thus, unfortunately, large sample size tends to coincide with poor measurement quality, and meta-analyses that give larger samples greater weight in estimating an effect across studies further compound this problem. There is hope that a new generation of cohort studies purpose-built for testing GxE interactions will improve replicability, but these must correct the problems of exposure measurement discussed in the previous paragraph, lest they merely repeat the problems on a far larger scale.
Most observational GxE research on 5-HTT in humans has focused on depression. However, additional evidence links the 5-HTTLPR to a broader range of stress-reactive phenotypes, including PTSD (18), posttrauma suicide attempt (19), aggressive reactions to a cold-pressor test (20), stress-linked alcohol consumption (21, 22) and substance use (23, 24), stress-related sleep disturbance (25), and even premature ejaculation (26). Research on quantitative endophenotypes shows that S-carriers with high levels of childhood maltreatment and adversity exhibit enhanced anxiety sensitivity (27) and a bias toward perceiving and expecting negative outcomes (28). Moreover, S-carrying children who are raised by unresponsive or nonsupportive mothers exhibit poor self-regulation of negative affect (29–32), which predicts a variety of adult psychiatric disorders (33). Finally, research that monitors affective experiences on a daily basis shows that S-carriers experience anxious mood on days with more intense stressors (34) and larger increases in negative affect while trying to quit smoking (35). To claim that these diverse outcomes are heterotypic manifestations of a unifying genetic vulnerability to stress reflected in the 5-HTTLPR S allele requires a theory that specifies the unifying mechanism. The leading theory (1, 36) is that the 5-HTTLPR is a genetic substrate for a latent personality trait, termed negative affectivity or neuroticism. Negative affectivity prospectively predicts risk for all stress-related psychiatric disorders (37). In theory, 5-HTTLPR S-carriers are characterized by the stable trait of negative affectivity that is converted to psychopathology only under conditions of stress, just as glass is always characterized by the trait of brittleness but shatters only when a stone is thrown.
Negative affectivity represents the potential for excitability of anxiety and fear neural circuits, and is characterized by an attentional bias toward negatively valenced information and a cognitive sensitivity to perceive threat (38). This trait is operationalized in all experimental tests of the 5-HTTLPR GxE hypothesis, reviewed next.
Experimental neuroscience studies
In 2002, a synergy emerged between research in human affective neuroscience and genetic research into the 5-HTTLPR. Specifically, noninvasive functional MRI (fMRI), which assays information processing within distinct neuronal circuits, revealed relatively exaggerated threat-related amygdala reactivity in carriers of the 5-HTTLPR S allele (2). This initial finding has since been replicated in independent samples of both healthy volunteers and psychiatric patients, using a multitude of threatening stimuli and neuroimaging modalities (39–49). This effect on the magnitude of amygdala reactivity has recently been extended, with S-carriers also exhibiting a relatively faster response than L-allele homozygotes (50). Consistent with the heightened sensitivity to environmental threat documented in S-carriers, recent work suggests that the effects of the S allele on amygdala function may be unique to stimulus-provoked amygdala reactivity and not elevated baseline levels of activation (51–53).
The bias in threat-related amygdala reactivity associated with the 5-HTTLPR S allele is positioned to drive the polymorphism's associations with altered mood and affective disorders, especially in interaction with exposure to environmental stressors and trauma. Evidence from animal and human studies demonstrates that the amygdala mediates both physiological (e.g., autonomic reactivity) and behavioral (e.g., reallocation of attentional resources) effects that allow an individual to respond to environmental and social challenges (54). Neuroimaging studies have reported positive correlations between indices of anxiety and amygdala reactivity to affective stimuli (especially threatening stimuli) (55). Such findings demonstrate that variability in the magnitude of threat-related amygdala reactivity predicts individual differences in sensitivity to environmental threat and stress.
Human neuroimaging research suggests that relatively increased amygdala reactivity associated with the 5-HTTLPR S allele is likely to reflect both the functional and structural architecture of a distributed network of brain regions. Research suggests that this network communicates information about the environment to the amygdala and relays signals between the amygdala and regulatory circuits in the medial prefrontal cortex. This putative mechanism is further underscored by the significant role serotonin signaling plays in the general development and function of this extended neural network (56). The S allele has been associated with altered functional coupling (as indexed by correlated fMRI signal strength) between the amygdala and regions of the medial prefrontal cortex (40, 57). These medial prefrontal regions integrate amygdala-mediated arousal and downregulate amygdala reactivity. Medial prefrontal regions are also involved in the extinction of conditioned fear responses, which are dependent on amygdala circuitry.
The pattern of 5-HTTLPR-associated differences in the functional dynamics of the amygdala and medial prefrontal cortex is echoed in structural measures within this same network. Specifically, the S allele has been associated with relatively decreased gray matter volume in the amygdala and medial prefrontal cortex (42, 57). The S allele has also been associated with alterations in the microstructure of the uncinate fasciculus, the white matter fiber bundle providing the majority of connections between the amygdala and medial prefrontal cortex (58). Individual differences in uncinate fasciculus microstructure correlate with trait anxiety (59). In addition, postmortem tissue analyses have associated the 5-HTTLPR S allele with relative enlargement of the pulvinar, which relays visual information to subcortical and higher cortical brain regions (60). Consistent with this, as well as with amygdala-mediated behavioral arousal, numerous studies have reported increased cortical activity in response to experimental provocation in S-carriers (61–66).
In addition, a growing group of studies has begun to document effects of the 5-HTTLPR on intermediate behavioral and physiological processes that map onto these alterations in brain structure and function. The S allele is associated with increased acquisition of conditioned fear responses (67), increased auditory startle response (68, 69), and greater sympathetic reactivity when simply observing another person receiving shock (70). Moreover, the 5-HTTLPR S allele has been associated with increased HPA axis reactivity to aversive or threatening stimuli in a number of studies (71–74). The S allele typically has no impact on baseline levels of HPA function in these studies, underscoring its documented effect on threat-related amygdala reactivity. In addition, the S allele has been linked with difficulty disengaging from, or preferential attention toward, threat-related stimuli (75–79), a more negative information-processing bias (80), emotion-induced retrograde amnesia (81), sensitivity to financial loss (70, 82), and even social blushing (83). Although this literature is not without inconsistencies (e.g., some reported associations are sex-specific and others have not replicated), it does suggest that the effects of the 5-HTTLPR S allele on the brain's neural circuitry for responding to environmental threat and stress translate to biases in both behavioral and physiological processes which may, in turn, shape individual risk for depression upon exposure to acute trauma or chronic stressors (Figure 2). Multiple components of this ongoing research were highlighted in one report of increases in threat-related amygdala and medial prefrontal cortical activation as well as heart rate and startle amplitude in 5-HTTLPR S-carriers who also exhibited a self-reported sensitivity to perceived danger in the environment (28).
Figure 2. How the 5-HTTLPR Affects Neural Circuitry for Responding to Environmental Threat and Stress
a Implicated in humans and nonhuman primates.
b Implicated in humans, nonhuman primates, and rodents.
Rhesus monkeys have an orthologue of the human 5-HTTLPR, making them an excellent model species for GxE studies. Like the human variant, the rhesus S allele is associated with decreased transcriptional efficiency in vitro (84). The modulating influence of the polymorphism on early life stress has been tested by separating infant rhesus monkeys from their mothers and rearing them with other infants (a long-established model of early life adversity in this species). During initial episodes of separation, monkeys carrying the rh5-HTTLPR S allele exhibit fewer of the "protest" and self-directed behaviors that are considered active coping responses to this stressor (85). Instead, separated S-allele monkeys display greater anxiety, agitation, stereotypies, and an exaggerated HPA axis response (85, 86).
The modulating influence of the rh5-HTTLPR on separation in infancy persists into later life, manifesting, for example, as higher ACTH responses to stress in S-carrier monkeys than in LL homozygotes (87). It is important to underscore that these long-lasting phenotypic effects of the S allele occur only in monkeys exposed to the early life stress of maternal separation, echoing the GxE interaction observed in relation to human depression.
Another major parallel between the human and monkey data has been the finding that, as in humans, the stress-related S allele phenotype in monkeys is related to an intermediate neural phenotype characterized by abnormal corticolimbic structure and function. For example, the S allele in monkeys also has been mapped onto reduced gray matter volumes in the amygdala, medial prefrontal and orbitofrontal cortex, and pulvinar (88). Moreover, monkeys with the S allele exhibit greater metabolic activity than LL homozygotes in the amygdala and its networked cortical regions, including orbitofrontal cortex, in response to the stress of relocation (89). Given the importance of the orbitofrontal cortex in social behavior, abnormalities in this region might also account for the finding that S-carriers engage in less eye gaze with high status conspecifics and are more risk-averse in their presence (90). An intriguing development is recent data from S-carrier monkeys (88) and 5-HTT mutant mice (91) demonstrating that reversal learning, a measure of cognitive flexibility subserved by the orbitofrontal cortex (92), is enhanced as a function of relative 5-HTT gene deficiency. This may reflect increased sensitivity to negative environmental stimuli, although further work will be needed to substantiate this. Notwithstanding, these data indicate that altered 5-HTT gene function may influence multiple higher behaviors, as would be predicted if it affects a core corticolimbic neural circuitry.
Studies involving genetically engineered 5-HTT mutations in rodents
Research using rodents allows for experimental control over genetic background and the environment to a degree that is neither practically nor ethically feasible in human or even nonhuman primate studies (93). Although there is functional gene variation in the murine 5-HTT (slc6a4) (94), there is no rodent orthologue of the 5-HTTLPR. As an alternative approach, mice and rats have been genetically engineered with loss-of-function mutations in the 5-HTT gene. Studying the consequences of these mutations for behavior and brain function has greatly complemented the work on the 5-HTTLPR in primates and provided some key insights into the mechanisms that mediate the influence of 5-HTT on negative affect and stress reactivity (56, 95).
Mice in which the 5-HTT has been functionally excised either by targeted mutation or chemical mutagenesis exhibit heightened anxiety-like behavior, impaired fear extinction, and exaggerated HPA-axis responses to acute stress. While it is far less common to engineer mutant rats than mice, a 5-HTT-null mutant rat has been generated and also shows increased anxiety-like behavior (96). Furthermore, providing an interesting counterpoint to these "knockout" mutants, mice with transgenic overexpression of the 5-HTT actually exhibit decreased anxiety-like behavior (97). The consistency of these findings across models, laboratories, and species is rarely seen in the field of rodent behavioral genetics and illustrates the strong penetrance of the mutation's effects.
The "depression-related" consequences of 5-HTT knockout in rodents are, at first blush, less consistent than the anxiety-related consequences, in that they are seen in some of the standard rodent assays for this behavior but not others. This variability may, however, be a legitimate reflection of differences in the level of stress evoked under varying test conditions. In support of this hypothesis, following repeated exposure to stress (e.g., forced swimming, tail suspension), 5-HTT knockout mice develop a depression-related "despair-like" phenotype that is not seen with single exposure (97). The parallels with the primate data showing that the S-allele influence on depression is contingent upon repeated stress exposure are clear.
Much of our understanding of the functional role of the 5-HTT as a master modulator of the 5-HT system has been built upon work in rodents (98). As such, researchers have a ready platform and toolset from which to perform certain neural and molecular analyses in 5-HTT mutant rodents (e.g., in vivo measurement of brain 5-HT availability) that cannot be employed in humans. One of the key themes to emerge from this work is that the neural consequences of 5-HTT gene mutation extend well beyond the 5-HTT and its role as a regulator of 5-HT availability. 5-HTT null mutation leads to alterations throughout the 5-HT system that include changes in 5-HT receptor binding and 5-HT synthesis (56, 95). At the systems level, 5-HTT knockout mice exhibit an abnormally high density of excitatory dendritic spines on amygdala neurons and an increase in dendritic arborization of prefrontal cortex neurons (56). The implication here is that the influence of 5-HTT variation may not be limited to effects on 5-HT availability or even on the 5-HT system. Recently, this implication was confirmed in a rhesus macaque model (88), in which the rh5-HTTLPR S allele affected behavior and brain morphology but not 5-HTT (99) or 5-HT1A concentrations in vivo. Similar complexities in the likely molecular consequences of the 5-HTTLPR have been documented in humans (100–105). Collectively, mouse, monkey, and human findings suggest that 5-HTTLPR's behavioral effects on stress-reactivity may be most consistently rooted in neural development.
An intriguing line of enquiry in this context has centered on the hypothesis that 5-HTT variation may in part modulate the capacity to cope with stress by shaping the early life development of corticolimbic circuitry (56). In fact, the importance of the 5-HT system in neurodevelopment has long been recognized, and the 5-HTT is known to be critical for the formation of cortical systems in particular (106, 107). Pharmacological inhibition of the 5-HTT during early life mimics the anxiety-like phenotype of 5-HTT knockout (108). Moreover, poor maternal care produces heightened anxiety-like behavior in mice with a partial (heterozygous) 5-HTT null mutation, which are phenotypically normal under conditions of good maternal care (109).
These findings raise the question of whether the effects of 5-HTT knockout are developmentally driven. It has been hypothesized that the 5-HTTLPR GxE interaction observed in relation to adult stressful life events should selectively affect people already "primed" by childhood adversity (8). This opens up some very interesting avenues for future animal studies. For example, would 5-HTT loss restricted to early life development be sufficient to increase anxiety and impair stress-coping? If so, is there a critical window and what is the corresponding ontogenic period in humans? Researchers could then elucidate the key neural and molecular changes underlying these effects. This could, in turn, "square the circle" by nominating mechanisms to target with novel therapeutic approaches in humans.
In the previous section, we reviewed evidence about the 5-HTT stress-sensitivity hypothesis. Lessons learned in this research apply broadly to all GxE research. In this section, we draw on these lessons to dispel some misconceptions and offer some constructive recommendations.
GxE hypotheses can be tested with large and small samples
Statistical power is critical for theory-free, exploratory scans for GxE interactions (110). This realization has prompted the creation of large case-control consortia and massive biobanks. A question that puzzles many readers is how to reconcile the obvious benefits of huge samples with evidence that GxE interactions have been reported in many small-sample studies of 5-HTT and stress sensitivity, particularly in studies comparing stress-exposed versus matched nonexposed groups (e.g., abused children) and in experimental studies of humans and animals. There are statistical reasons for this.
The problem has to do with the approach to testing interactions (111): If the product term (i.e., the interaction term in multiple regression) is calculated from two normally distributed symmetrical variables, it has restricted variance but is uncorrelated with the first-order predictors (Figure 3, top row). However, a product term of two categorical variables (e.g., minor allele frequency [MAF] of 25% and rate of exposure [Pexp] of 25%) is significantly correlated with the first-order predictors (Figure 3, middle row). Such is the case in practically all observational GxE studies of psychiatric phenotypes. As a result, the residual variance of the product term after factoring out first-order predictors—and the corresponding power to detect interactions—declines rapidly with minor allele frequencies and rates of exposure departing from 50%. The full power for testing interactions between categorical variables is only preserved in the optimal case where minor allele frequency and exposure rate equal 50% (bottom of Figure 3). An implication of this insight is that hypothesis-driven GxE studies that recruit participants on the basis of their genotype and their environmental exposure (e.g., experimental GxE studies with balanced cell sizes) are better powered to test for genetically moderated exposure effects than are observational field studies, which must make do with unequal-sized groups since these occur in nature.
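The statistical argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis, that makes two simplifying assumptions: genotype is coded as a binary risk-group indicator (so `p_g` is the proportion of the sample in the risk group rather than an allele frequency), and genotype and exposure are independent. The function name `product_term_residual_variance` is hypothetical. It regresses the product term G*E on G and E over the four cells of the 2x2 design, weighted by cell probability, and returns the variance that remains.

```python
import numpy as np

def product_term_residual_variance(p_g, p_e):
    """Variance of G*E left after regressing it on G and E.

    Assumes binary G and E that are independent, with P(G=1)=p_g and
    P(E=1)=p_e, both strictly between 0 and 1 (illustrative coding only).
    """
    # The four cells of the 2x2 design and their population probabilities.
    g = np.array([0.0, 0.0, 1.0, 1.0])
    e = np.array([0.0, 1.0, 0.0, 1.0])
    w = np.array([(1 - p_g) * (1 - p_e), (1 - p_g) * p_e,
                  p_g * (1 - p_e), p_g * p_e])
    ge = g * e

    # Weighted least squares of G*E on an intercept, G, and E.
    X = np.column_stack([np.ones(4), g, e])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ ge)
    resid = ge - X @ beta

    # The weighted residuals have mean zero, so this is their variance.
    return float(np.sum(w * resid ** 2))

if __name__ == "__main__":
    balanced = product_term_residual_variance(0.5, 0.5)
    for rate in (0.5, 0.25, 0.1):
        v = product_term_residual_variance(rate, rate)
        print(f"rate={rate:.2f}  residual variance={v:.4f}  "
              f"relative to balanced={v / balanced:.2f}")
```

Under these assumptions the residual variance works out to p_g(1 - p_g) * p_e(1 - p_e), which is largest when both rates equal 50%. With both rates at 25%, about 56% of the balanced-design residual variance remains, and the loss becomes steeper as the rates fall further.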
Figure 3. How the Power to Detect GxE Interactions Depends on the Distributions of the Genotypes and Exposures in the Sampleᵃ
a The two rows of graphs demonstrate a key difference between interactions involving normally distributed continuous variables (top row) and those involving asymmetrically distributed categorical ones (middle row). If the product term A*B (i.e., the term that represents interaction in a multiple regression) is calculated from two normally distributed symmetrical variables A and B, it has a restricted variance (leptokurtic distribution) but is uncorrelated with the first-order predictors (i.e., the correlations between A and A*B [rA, A*B] and between B and A*B [rB, A*B] are zero). However, the product term G*E that represents two categorical variables (G: genotype with a minor allele frequency [MAF] of 25%; and E: categorical exposure in the population [PEXP] of 25%) is strongly correlated with the first-order predictors (i.e., the correlations between G and G*E [rG,G*E] and between E and G*E [rE,G*E] are substantial). As a result, the residual variance of the product term (bottom of figure) after factoring out first-order predictors, and the power to detect interactions, declines rapidly as the rates of exposure and minor allele frequency depart from 50%. The full power for testing interactions between categorical variables is only preserved in the special case of minor allele frequency equal to 50% and exposure rate of 50% (the top segment in red). "Density" reflects the proportion of individuals falling within each narrow band of values of the variable on the x axis.
GxE research can be carried out before as well as after replicated gene discovery
Some researchers claim that GxE studies should only be carried out if there exists a genotype-to-phenotype main effect, but this claim is statistically unwarranted (112). Such a strategy also precludes identification of environmentally dependent genetic effects that are small in absolute size or are contingent on relatively uncommon environmental factors (Figure 4). Moreover, genotype-phenotype association studies may not replicate if GxE interactions are operating and research samples differ on environmental risk exposure. Waiting for genomewide association studies (GWAS) to throw up candidate genes may be ill-advised because GxE interactions may conceal good candidates from GWAS. Inconsistent genotype-phenotype associations have inspired successful searches for GxE interactions in different fields of medicine, from asthma (113) to cardiovascular disease (114). Inconsistent associations between the 5-HTTLPR S allele and depression (115–117) prompted us to consider a GxE interaction in our initial studies of the 5-HTTLPR and depression.
Figure 4.How the Frequency of an Environmental Exposure in a Sample Influences the Ability to Detect Genetic Effects and GxE Interactionsa
a Panel A shows the influence of environmental exposure frequency on the ability to identify genetic effects, in two genotypes of equal prevalence. Genotype A shows no phenotypic response to the environmental exposure. Genotype B shows a response to the environmental exposure. What would happen if the association between genotype and phenotype were studied without knowledge of the environmental exposure and its frequency (shown from 10% to 90%)? A sample having many exposed subjects will report a genetic effect on the phenotype, whereas a sample having few exposed subjects will not, and if exposure is not ascertained, the source of nonreplication will remain a mystery. Panel B shows the influence of the rate of environmental exposure on statistical power to detect GxE interactions and main effects of genes. Each point is based on 10,000 simulations of samples of 1000 drawn from a population with equal distributions of two genotypes, with a continuous outcome generated as a moderately strong GxE (i.e., the difference in the environment-phenotype correlation between genetic strata = 0.3), and no main effect. In samples with exposure frequency close to 0, there is no detectable interaction or main effect. For exposure frequency below 50%, there is greater power to detect a GxE interaction (blue line) than to detect a main effect of genes (red line). With rates of exposure exceeding 50%, the power of detecting a direct effect of genes (red line) increases above that of detecting an interaction, even though interaction is the data-generating mechanism. The probability of detecting a spurious main effect of genes (or environments) remains at the 2.5% chance level across the range of exposure frequency if the interaction term is retained in the equation.
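The logic of Panel B can be reproduced in miniature. The following is a rough sketch under stated assumptions, not the simulation behind the figure. The outcome is generated as y = beta x G x E plus standard normal noise, so that only one genotype responds to exposure. The "gene-only" test compares genotype groups while ignoring exposure. Both tests use a two-sided normal approximation at the 5% level. The effect size `beta`, the sample size, the number of simulations, and all function names are hypothetical and chosen to keep the run short.

```python
import random
from math import sqrt

Z_CRIT = 1.96  # assumed two-sided 5% threshold (normal approximation)


def mean_var(ys):
    """Sample mean and unbiased sample variance."""
    n = len(ys)
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / (n - 1)


def simulate_once(rng, n, p_exposed, beta):
    """Draw one sample and return (z_gene_only, z_interaction)."""
    cells = {(g, e): [] for g in (0, 1) for e in (0, 1)}
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0        # two equally common genotypes
        e = 1 if rng.random() < p_exposed else 0  # exposure independent of genotype
        cells[(g, e)].append(beta * g * e + rng.gauss(0.0, 1.0))
    if any(len(ys) < 2 for ys in cells.values()):
        return 0.0, 0.0  # too sparse to test; counted as a non-detection

    # Gene-only test: genotype groups compared with exposure ignored.
    m0, v0 = mean_var(cells[(0, 0)] + cells[(0, 1)])
    m1, v1 = mean_var(cells[(1, 0)] + cells[(1, 1)])
    n0 = len(cells[(0, 0)]) + len(cells[(0, 1)])
    n1 = len(cells[(1, 0)]) + len(cells[(1, 1)])
    z_gene = (m1 - m0) / sqrt(v0 / n0 + v1 / n1)

    # Interaction test: difference between genotypes in the exposure effect.
    stats = {k: mean_var(v) for k, v in cells.items()}
    diff_in_diff = ((stats[(1, 1)][0] - stats[(1, 0)][0])
                    - (stats[(0, 1)][0] - stats[(0, 0)][0]))
    se = sqrt(sum(stats[k][1] / len(cells[k]) for k in cells))
    return z_gene, diff_in_diff / se


def estimate_power(p_exposed, n=400, beta=0.5, n_sims=300, seed=1):
    """Share of simulated samples in which each test is significant."""
    rng = random.Random(seed)
    hits_gene = hits_gxe = 0
    for _ in range(n_sims):
        z_gene, z_gxe = simulate_once(rng, n, p_exposed, beta)
        hits_gene += abs(z_gene) > Z_CRIT
        hits_gxe += abs(z_gxe) > Z_CRIT
    return hits_gene / n_sims, hits_gxe / n_sims


for rate in (0.1, 0.3, 0.5, 0.7, 0.9):
    power_gene, power_gxe = estimate_power(rate)
    print(f"exposure={rate:.1f}  gene-only={power_gene:.2f}  GxE={power_gxe:.2f}")
```

In this simplified setup the gene-only test statistic grows roughly in proportion to the exposure rate q, whereas the interaction statistic grows with the square root of q(1 - q), so the two are expected to cross near an exposure rate of 50%, as the caption describes.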
The Psychiatric GWAS Consortium (118) recommends conducting GxE studies only after convincing genotype-phenotype associations have been identified by 1) finding the disease susceptibility gene by conducting a GWAS, then 2) identifying the functional consequences of the putative causal variant, and only then 3) testing interactions between the variant and environmental factors. This strategy is presumed to offer a foolproof approach to detecting replicable GxE interactions. However, research in obesity illustrates that this strategy may not work. FTO was found to be a susceptibility gene through GWAS (119), and FTO's functional consequences were identified (120–123). GxE research then documented that an active lifestyle mitigates obesity risk from FTO (124–127). However, this GxE interaction has not universally replicated (128, 129), in part because of cross-study differences in the quality of physical activity measurement. The moral is that a robust genotype-phenotype association cannot guarantee a robust GxE finding, because the study of GxE interactions additionally requires appropriate, high-quality exposure measurement.
GxE research is a helpful tool for gene discovery
Although most GxE research uses candidate genes, environmental exposures can also be used to discover novel loci. Indeed, one possible reason for the paucity of susceptibility genes in psychiatry is that gene-discovery studies have been searching for genetic effects on disease rather than for genetic effects on vulnerability to environmental causes of disease (130). Whereas in genetic association studies a candidate gene is a gene suspected of being involved in a trait or disease (either because its protein product is relevant or because it has been uncovered in the course of association or linkage analysis of the phenotype), in GxE research a candidate gene is one plausibly related to the organism's reactivity to the environmental risk or pathogen (131). The idea that genes may moderate the effect of environmental risk has direct implications for hypothesis-driven selection of novel candidate genes. For example, genes associated with the physiological response to psychological stress, particularly in the HPA axis, are natural candidates for GxE research on stress and depression (132). Genes regulated by hypoxia are candidates for GxE research on obstetric complications and schizophrenia (133). Genes involved in biosynthesis of fatty acids are candidates for GxE research on nutrition and brain development (134). Genes involved in lead absorption are relevant for research on attention deficits and hyperactivity (135). Genes involved in ototoxicity are relevant for research on learning difficulties (136).
Research on "candidate environmental risks" can be combined with theory-free genetics to discover novel loci in two ways. One way is to turn GWAS into Gene-Environment-Wide Interaction Studies (137). Theoretically, the ability to measure GxE interactions should sharpen measurement of gene-disease associations in subsets of the population and even potentially increase statistical power to detect such associations (137). This will become increasingly possible as researchers seek to integrate genome-wide information with information about environmental exposures gathered in the context of epidemiological studies. But sample sizes will become prohibitive when testing gene-environment-wide interactions because 1) more tests are involved, 2) tests for interactions have less power compared to tests for main effects, and 3) environmental exposures introduce additional measurement error. If genetic epidemiologists embrace purely agnostic, theory-free approaches and data-mining tools in studying GxE interactions, the "fishing expedition" may net little. The new generation of purpose-built Gene-Environment-Wide Interaction studies may be an improvement over opportunistic studies published in these early years of GxE research, but even these will fall short unless they attend to the measurement of environmental exposures. An alternative is to pursue study designs that use confirmed environmental effects on disease. Such "exposed-only designs" will test genome-wide associations comparing equally exposed individuals who do versus do not develop a disease in order to discover novel susceptibility loci. Examples of this design can be seen in research on infectious disease, whose starting point is pathogen exposure (138). The environmental risks (i.e., pathogens) for many psychiatric conditions are well established, if not always well measured. 
As such, the strong prior probabilities for environmental risks can be harnessed in psychiatry to design genome-wide studies focused on identifying genetic differences in responses to well-defined environmental risks. This approach to gene discovery will involve entirely different designs and sampling frames than currently used in case-control studies and biobanks.
A second way in which environmental exposures can be used to discover novel loci is to study gene expression (mRNA levels) as a quantitative phenotype, although attention needs to be paid to tissue informativeness (139). Gene expression profiling offers a powerful tool for identifying genomic responses to the environment by investigating responses to specific, well-operationalized, and reliably measured pathogens and stressors, including exposures to social adversities (140). By assessing genotype effects on gene expression levels (141), polymorphisms in environmentally responsive genes may be identified and then used to study why some people become ill when challenged by the environment and others do not. Incorporating environmental genomics into psychiatry may facilitate identifying susceptibility factors in environmentally induced psychiatric conditions.
Construct validation is a useful way to evaluate GxE research
There are two distinct cultures vying to evaluate the worth of the 5-HTTLPR GxE findings: a purely statistical (theory-free) approach that relies wholly on meta-analysis (142, 143) versus a construct-validity (theory-guided) approach that looks for a nomological network of convergent evidence (this article). The statistical approach is essential for confirming direct genotype-phenotype association discoveries. This approach is driven by the imperative to avoid false positives when evaluating associations sifted from huge amounts of data in theory-free, genome-wide testing with nil prior probability of gene-disease association (144). Naturally, the statistical approach prizes exact replication. In the statistical approach, replication attempts' elements should match the original report's elements, including sample, phenotype, polymorphism, genetic model, and direction of effect. Larger samples are given greater weight in statistical evaluation, because with all other study elements held equal, power is decisive (145).
It is our contention that the purely statistical approach is neither sufficient nor necessary for evaluating research into GxE hypotheses involving candidate genes. In such GxE research, the prior probability of association is far from nil, thus mitigating the risk of false positives. For example, the 5-HTTLPR stress-sensitivity hypothesis was informed by knowledge about the serotonin system's role in depression and the transporter gene's function (1), by inconsistent associations between the 5-HTTLPR and depression suggesting environmental moderation might be operating (146), by evidence that stress causes depression (147), and by initial reports that 5-HTT variation influenced stress reactivity (2, 148, 149). In GxE research, replication attempts' elements need not match those of the original report. GxE research involves not only polymorphism and phenotype, but another element: the environment. Whereas genetic measurements are standard and unchanging across time and across studies and phenotypic measurements can also be standardized to a high degree, environmental exposure measurements vary markedly across studies (150). Two kinds of heterogeneity should be distinguished: heterogeneity in the types of stress exposure versus heterogeneity in the quality of exposure measurements. Regarding exposure types, stressful experiences come in many forms (Table 1 and Table 2) and studies of the 5-HTTLPR GxE have rightly gone beyond the original report to incorporate them. This environmental measurement heterogeneity has implications for matching the genetic model across studies, because the "correct" genetic model could vary depending on severity of the environmental exposure or other factors such as developmental stage and course of illness (e.g., first-onset versus recurrent depression). By insisting that all results must conform to one genetic model, the meta-analysis approach conceals potentially informative patterns, if they exist.
Regarding measurement quality, in GxE research it is folly to give greater weight to larger samples, because many large samples are afflicted by poor exposure measurement. Overall, heterogeneity in both the type of stress exposures and in the quality of exposure measurements renders the studies in Table 1 and Table 2 inappropriate for drawing one simple conclusion about statistical replication (145).
Meta-analysis can be a useful tool for interpreting multiple tests of a GxE hypothesis, when best practice is followed. Meta-analyses should tabulate the universe of publications testing the GxE interaction, explaining in a transparent way why each was analyzed or omitted. The subsample analyzed should represent the distribution of positive and negative results in the literature. Metaregression should be undertaken to evaluate methodological sources of variation among findings. Methodological evaluation should be guided by long-established cautions. For example, large samples often suffer poor measurement quality, and large exposure-to-outcome correlations often signal measurement bias, not validity. It should be appreciated that when the sample of studies is small, a statistical test for heterogeneity is underpowered and its nonsignificance does not contraindicate metaregression. If methodological heterogeneity is ruled out, metaregression should investigate substantive sources of variation among findings (e.g., sex, age, exposure severity), and if these are uncovered, variation in genetic model should be considered in relation to the substantive findings. Meta-analyses of the 5-HTTLPR GxE hypothesis have been reported (142, 143), but did not follow best practice (17, 151–155).
In any case, whether GxE studies can meet prerequisite standards for statistical meta-analysis is immaterial, since replicating a theory-free association is not the goal. The goal is to evaluate the construct validity of a theory-guided hypothesis (156). In contrast to the statistical approach, the construct validation approach prizes design heterogeneity (although it requires high-quality samples and measures) (157). Construct validation seeks "sturdy" findings (6), defined as results that emerge repeatedly despite variation in sample characteristics, phenotype measurement, and environmental exposure, and that are validated across human epidemiology, experimental neuroscience, and animal models. We have attempted to show that this is the case with evidence for the 5-HTT stress-sensitivity hypothesis.
Public understanding of genetic science
One of GxE research's important contributions is often overlooked by scientists: teaching the falsehood of genetic (and environmental) determinism (158). For over a century the public has been fed a diet of determinism, beginning with early 20th-century eugenics policies to correct all human flaws by culling the breeding stock. Mid-century opinions swung back toward naive environmental determinism, exemplified by B.F. Skinner's 1948 Walden Two. In the late 20th century, public opinion was compelled toward genetic determinism again when high heritability estimates were taken to imply that nongenetic factors have little importance for mental health and behavior. Discoveries of single mutations causing rare disorders strengthened the public's belief that knowing one's genetic makeup is tantamount to knowing one's future. Deterministic beliefs, environmental or genetic, are dangerous. Determinism encourages policies that violate human rights (at worst) and waste resources on ill-conceived mental health improvement programs (at best). Media coverage of this century's new findings of gene-environment interaction (and environmental effects on gene expression) is persuading the public to embrace a more realistic, nuanced understanding of the causes of behavior, in which some genes' effects depend on lifestyle choices that are often under human control. That understanding will be the best defense against misuse of genetic information. Interdependence between life stress and the 5-HTTLPR leads this shift in understanding, because stress and depression touch almost everyone.