Autism is characterized by (A) qualitative impairment in social interaction, (B) qualitative impairment in communication, and (C) repetitive behaviors and restricted interests (DSM-IV). As might be expected for a disorder with qualitative impairments in multiple domains, autism is extremely heterogeneous. In addition to variability in the triad of core symptoms, there is also heterogeneity in co-occurring medical conditions among children with autism. Adding further to the heterogeneity, autism belongs to a broader diagnostic category called autism spectrum disorder (ASD). ASD includes autism, Asperger's syndrome, Rett syndrome, and pervasive developmental disorder not otherwise specified, a category that has evolved to encompass a wide range of social and emotional abnormalities with varying levels of cognitive functioning. [For a comprehensive review of the diagnostic criteria for ASD, see the recent FOCUS review of Westphal and Volkmar (1).]
Despite its widely varying clinical presentation, ASD appears to be highly heritable. The time-honored way to determine the degree to which genetics contributes to risk of a particular disorder is to compare the concordance rate of the disorder in monozygotic twins and dizygotic twins. Twins are assumed to share most of the environmental factors that might contribute to disorder risk. However, monozygotic twins are genetically 100% identical, whereas dizygotic twins share only as much genetic information as typical siblings. Therefore, if the concordance rate (the percentage of twins in which both individuals are affected by the disorder) is higher in monozygotic twins than in dizygotic twins, then there is good evidence that genetic factors contribute to the disorder. For ASD, six such studies have been reported, and each of them provides strong evidence that ASD is highly heritable (2—7). In the first five studies, there were relatively few twin pairs (21—50 pairs) available for study, but there was general agreement that concordance rates were 80%—95% in monozygotic twins and 0%—30% in dizygotic twins (2—6). The most recent report included 277 twin pairs, and found concordance rates of 88% in monozygotic twins and 31% in dizygotic twins (7). These concordance rates indicate that genetic factors have a larger impact on ASD than on any other psychiatric disorder.
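The concordance comparison described above can be made concrete with a minimal sketch. It computes pairwise concordance (concordant pairs divided by all pairs with at least one affected twin). The pair counts are hypothetical placeholders chosen only to fall within the ranges quoted above; they are not data from any of the cited studies, and the function name is illustrative.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Fraction of twin pairs (with at least one affected twin) in which both twins are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("at least one twin pair is required")
    return concordant_pairs / total


# Hypothetical counts, for illustration only (not taken from references 2-7).
mz = pairwise_concordance(concordant_pairs=22, discordant_pairs=3)
dz = pairwise_concordance(concordant_pairs=6, discordant_pairs=14)

print(f"Monozygotic concordance: {mz:.0%}")
print(f"Dizygotic concordance:   {dz:.0%}")
print(f"MZ/DZ concordance ratio: {mz / dz:.1f}")
```

A monozygotic concordance well above the dizygotic concordance, as in this toy example, is the pattern taken as evidence for a genetic contribution.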
Despite the high heritability of ASD, the utility of clinical testing is limited at the current time (8–11). Rare syndromic disorders with high rates of co-occurring ASD account for ∼10% of individuals with ASD (8, 12). A high-resolution karyotype may detect duplication of chromosome 15q11–13, which is strongly implicated in ASD risk (evidence discussed below). However, the chromosome 15q11–13 duplication is observed in only ∼1% of individuals with ASD, a rate that may not justify broad clinical testing. Similarly, clinical fragile X testing will identify the ∼1% of individuals in whom ASD is due to co-occurring fragile X syndrome (13, 14). Among the disorders within ASD, only Rett syndrome is understood at a genetic level that justifies clinical testing, focused on female patients with ASD and intellectual disability (15, 16). However, even the single-gene Rett syndrome, which is caused by various mutations of the MECP2 gene, can be genetically complex (17). Recent advances in ASD genetics suggest that routine testing may soon be justified in a larger subset of individuals with ASD (9). This review summarizes that recent evidence.
Given the highly heritable nature of ASD, it is not surprising that a great deal of effort has been exerted to determine its genetic causes. Genetic linkage studies have proven to be powerful tools to find rare genetic mutations with high impact in disorders. Linkage studies require genotyping of families with multiple affected individuals to identify genetic markers that are "identical by descent" and correlated with diagnosis of the disorder. The first genome-wide ASD linkage study indicated the most significant signal on chromosome 7q (18). The majority of subsequent genome-wide linkage studies also indicated an ASD linkage peak on chromosome 7q, although linkage evidence also emerged for a number of other genomic regions, including chromosomes 2q, 3p, 15q, and 17q (19—27). Collectively, the results of the genome-wide linkage studies suggested that there are multiple genetic loci that contribute to ASD risk. The linkage studies effectively narrowed the search for ASD candidate genes to a few chromosomal regions. However, each of the linkage peaks was broad enough to span dozens of genes. Therefore, candidate gene association studies were necessary to pinpoint the genes that contributed to the linkage peaks.
Candidate gene association studies for ASD have been conducted on at least 196 different genes, with at least one report of positive association for 106 different genes (28). However, the majority (83 of 106) of these candidate genes have been positively associated in only a single study (28). Because even well-designed single candidate gene studies can generate spurious results, single gene association studies should be considered "tentative knowledge" (29). Genes should be considered candidates to contribute to disorder risk only after replication of the genetic association in multiple samples (29). Using these criteria, five genes stand out as candidates for ASD risk because positive association has been reported in four or more samples (28). Each of these genes maps to a chromosomal region that was implicated in previous genome-wide linkage studies: OXTR on chromosome 3p; MET and EN2 on chromosome 7q; GABRB3 on chromosome 15q; and SLC6A4 on chromosome 17q. It is not yet clear whether these results represent a confirmation of the genome-wide linkage results or a bias in the selection of candidate genes. The preponderance of evidence in favor of a linkage peak on chromosome 7q led to intense investigation of candidate genes in this region (30—38).
Technological advances in genetics have been applied to ASD at a remarkable pace. The first genome-wide linkage study of ASD, published in 1998, examined 354 microsatellite markers (18). Assays that genotype individual single nucleotide polymorphism (SNP) markers in hundreds of samples (39) were applied widely in ASD candidate gene studies by 2004 (40). Most recently, genome-wide association study (GWAS) platforms that genotype 10,000 or 500,000 SNPs have been applied to ASD (41, 42). The latest technologies for copy number variations (deletions or duplications of DNA segments) have also been applied to ASD (43—47). All of these genome-wide technologies offer hope of identifying genetic variants that contribute to ASD susceptibility.
The other paradigm shift in ASD genetics has been the move toward increasingly large sample sizes. Early reports of linkage and association for ASD included fewer than 200 families. Reports from other complex genetic disorders indicated that success depended on much larger sample sizes; reproducible genome-wide genetic associations were found in samples of thousands of patients (48, 49). Appropriately, the latest ASD genetics studies have included more than a thousand families (41, 42).
Applications of these genome-wide techniques to large sample sizes have begun to define chromosomal regions most likely to harbor genetic variants that contribute to ASD susceptibility. The genome-wide techniques have not pointed to the same chromosomal region (Figure 1). Initial attempts to replicate reports of genome-wide significant linkage, genome-wide significant association, and association of specific copy number variants (CNVs) with ASD risk have failed. However, these powerful genome-wide approaches have provided support for two previously identified ASD candidate genes and identified, for the first time, genetic loci with genome-wide significance that deserve further investigation.
Linkage analysis is a powerful technique to identify chromosomal regions that contribute to disease risk. Because the goal is to identify chromosomal regions that are coinherited with disease, linkage analysis requires families with multiple affected individuals. Ten ASD linkage analyses were published between 1998 and 2007 (18—27). Although a majority of studies identified linkage on chromosome 7q21—35, suggestive evidence for linkage has been described on every chromosome (50).
Two recent publications provided more robust evidence for linkage peaks in ASD. The Autism Genome Project Consortium performed linkage analysis on 10,000 SNP markers genotyped in 1,181 families (41). With this powerful combination of marker density and sample size, this group was able to identify a single linkage peak at chromosome 11p12—13 that exceeds the threshold for genome-wide suggestive linkage, particularly in families that have at least one female with ASD (41). Only modest support was observed for the previously identified linkage peaks on chromosomes 2q and 7q; support for a linkage peak on chromosome 17q was lacking (41). The results of this study suggest that rare genetic variants on chromosome 11p12—13 may contribute to ASD risk.
The most recent attempt at linkage analysis in ASD used technology that genotyped 500,000 SNPs in 878 families (42). Because marker densities that are too high can create statistical problems in linkage analyses, the authors pruned the number of markers used in their analyses to 16,311 highly polymorphic, high-quality SNPs (42). They found significant genome-wide linkage on chromosome 20p13 and suggestive evidence for linkage on chromosome 6q27 (42). These linkage data provided no evidence in support of the previously identified linkage peaks on chromosomes 2q, 7q, and 17q (42). Neither chromosome 20p13 nor chromosome 6q27 was suggestive for linkage in the Autism Genome Project Consortium study (41). Similarly, Weiss et al. (42) did not find evidence to support the Autism Genome Project Consortium linkage peak on chromosome 11p12—13 (41). Thus, the two large, recent linkage scans identified linkage signals different from each other and linkage signals different from those reported previously, confirming the heterogeneity of ASD. Resequencing and candidate gene association studies will be required to identify the functional variants contributing to each of the linkage signals.
The advancement of technology platforms that genotype 10,000, 500,000, or even 1 million SNPs allows the study of genetic association on a genome-wide scale, instead of studying individual candidate genes. This new technology provides a powerful approach for identifying common genetic variants that contribute to ASD risk.
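Testing this many markers at once requires a correspondingly strict per-marker significance threshold. One common convention, assumed here for illustration and not taken from the studies discussed in this review, is a Bonferroni correction that divides the desired family-wise error rate by the number of markers tested. A minimal sketch for the platform sizes mentioned above:

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Platform sizes named in the text; alpha = 0.05 is an assumed convention.
platform_sizes = [10_000, 500_000, 1_000_000]
thresholds = {n: bonferroni_threshold(n) for n in platform_sizes}

for n, threshold in thresholds.items():
    print(f"{n:>9,} SNPs -> per-marker p-value threshold {threshold:.1e}")
```

Because neighboring SNPs are often correlated, this correction is conservative, and individual studies may justify a different threshold.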
The first published GWAS of ASD found evidence of a genome-wide significant association signal on chromosome 5p14 (51), outside any previously reported linkage peak for ASD. The significant association signal lies between the genes encoding cadherin 9 (CDH9) and cadherin 10 (CDH10), both of which encode cell adhesion molecules that are expressed in the developing brain (51). The genetic evidence is compelling: the original genome-wide significant association signal in 780 families was replicated in a large case-control sample. Further, the association peaked at a single SNP (rs4307059) that was flanked by several other markers that also showed evidence of association (51), suggesting that the association at rs4307059 was not due to a technical artifact. Because brain expression levels of CDH9 and CDH10 do not correlate with genotype at rs4307059 (51), the functional variant is yet to be identified.
The second published GWAS failed to replicate the association of rs4307059 on chromosome 5p14 (42). Instead, it identified genome-wide significant association of markers ∼16 million base pairs away, on chromosome 5p15 (42). Here, three markers displayed genome-wide significant association near the gene encoding semaphorin 5A (SEMA5A). Like the cadherins, semaphorin 5A is a cell adhesion molecule with a defined role in brain development. Further, an extensive study of postmortem brains revealed a significant 20% decrease in SEMA5A transcript in individuals with ASD compared with control subjects (42). It is not yet clear whether the identified genetic variants correlate with SEMA5A expression levels. Moreover, the identified risk allele for ASD near the SEMA5A gene is present at a frequency of >95% (42), suggesting that direct resequencing will be required to identify a less common, functional risk allele.
Collectively, the two reported GWASs of ASD point to regions of chromosome 5p that may include common genetic variants that contribute to ASD risk. These studies provide strong statistical evidence for a potential functional variant on chromosome 5p, a region that was not revealed in any linkage study of ASD. Additional studies will be required to determine the functional genetic variants.
Most cells of the human body contain 23 pairs of chromosomes. However, humans can survive with regions of chromosomes that are present in only one copy ("deletion") or in three or more copies ("duplication"). Deletions and duplications are collectively known as CNVs. CNVs are more common than was appreciated only 10 years ago, and technology continues to improve to identify increasingly smaller regions of CNVs. CNVs do not always cause disease. In fact, healthy control individuals carry an average of 11 CNVs, suggesting that even multiple CNVs are not pathological (52). Nevertheless, increasing evidence suggests that CNVs are a significant source of genetic variation and that specific CNVs may contribute to ASD risk.
There are multiple technologies used for detection of CNVs. [For a recent review of these CNV technologies and how they have been applied to ASD and schizophrenia, see Merikangas et al. (47).] There are two types of CNVs: inherited and de novo. Inherited CNVs, as the name implies, are passed from one generation to the next. In contrast, de novo CNVs are not inherited from a parent but are instead mutations that are first observed in the patient. De novo CNVs are rare and more likely to be pathological than inherited CNVs.
Ten studies have reported analysis of CNVs in ASD. Although improved technology may provide improved resolution of smaller CNVs, three apparently definitive conclusions can be drawn from these studies. First, the results of several studies establish that genome-wide inherited CNVs are as common in individuals with ASD as in control individuals or unaffected siblings, indicating that there is no increased burden of CNVs in ASD (41, 44, 46). Second, genome-wide de novo CNVs appear at a frequency of 7%—10% in simplex families (those with a single individual with ASD), 2%—3% in multiplex families (those with multiple individuals with ASD), and ∼1% of control individuals (43, 44). The higher rate of de novo CNVs in simplex families compared with that in multiplex families suggests that de novo events may contribute to ASD risk in this subset of families. Third, there does not appear to be a single CNV that accounts for a large fraction of cases of ASD or is diagnostic for ASD. However, CNV analysis could provide important clues about what genes and biological pathways contribute to ASD risk. In the data presented to date, it appears that CNVs of three chromosomal regions may be enriched in ASD: chromosome 2p16, chromosome 15q11—13, and chromosome 16p11.2.
De novo CNV of chromosome 2p16 was first described in a single family (41). Because of a lack of control samples in this study, it was difficult to determine whether any of these de novo events indicated an increased frequency of ASD (41). However, the de novo CNV of chromosome 2p16 involved deletion of the gene encoding the synaptic protein neurexin 1 (NRXN1). Two subsequent reports found de novo deletion of NRXN1 in ∼0.5% of individuals with ASD and in no control individuals: in 9 of 1,771 (0.5%) individuals with ASD but in none of 2,989 control individuals (53), and in 10 of 2,195 (0.5%) individuals with ASD but in none of 2,519 control individuals (46). However, the opposite has also been reported: an absence of de novo deletion of NRXN1 in 2,252 individuals with ASD or developmental delay, but the presence of the NRXN1 deletion in 5 of 2,814 (0.2%) control individuals (45). Therefore, deletion of NRXN1 may contribute to ASD risk in a small fraction of patients, but it is not diagnostic for ASD.
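The carrier counts quoted above can be compared with a two-sided Fisher's exact test on each 2x2 table (carriers and non-carriers among cases and controls). The sketch below is an illustrative re-analysis, not the method used in the cited reports; the function name and table layout are assumptions, and only the counts come from the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# (case carriers, case total, control carriers, control total) from the text.
nrxn1_reports = {
    "ref 53": (9, 1771, 0, 2989),
    "ref 46": (10, 2195, 0, 2519),
    "ref 45": (0, 2252, 5, 2814),
}

p_values = {}
for label, (case_c, case_n, ctrl_c, ctrl_n) in nrxn1_reports.items():
    p_values[label] = fisher_exact_two_sided(
        case_c, case_n - case_c, ctrl_c, ctrl_n - ctrl_c
    )
    print(f"{label}: p = {p_values[label]:.2g}")
```

With events this rare, an exact test is preferable to a chi-square approximation, and the calculation shows how few carriers underlie each of the conflicting reports.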
De novo CNVs of chromosome 16p11.2 were also originally described in a single family (43). A subsequent study revealed that a de novo CNV of chromosome 16p11.2 was present in 3 of 427 (∼1%) individuals with ASD (44). Simultaneously, a much larger study confirmed that a de novo CNV of chromosome 16p11.2 was present at a rate of ∼1% (15 of 1,740) in individuals with ASD and only 0.05% (12 of 23,502) of control individuals (45). The region of consistent chromosome 16p11.2 CNV includes ∼25 genes (44, 45). However, a recent report indicated a control sample frequency for chromosome 16p11.2 CNV of ∼0.2% (4 of 2,519) (46), confirming that the CNV of chromosome 16p11.2 is not diagnostic for ASD. Further, recent data also implicate CNV of the same chromosome 16p11.2 region in mental retardation and/or multiple congenital anomalies (54), suggesting that CNV of chromosome 16p11.2 may not be specific to ASD.
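To put the chromosome 16p11.2 counts on a common scale, the sketch below computes carrier frequencies for each reported sample and an odds ratio for the one study quoted above that gives both case and control counts (45). The helper names are illustrative, and the odds ratio is a simple unadjusted estimate rather than a figure reported by the cited authors.

```python
def carrier_frequency(carriers: int, total: int) -> float:
    """Proportion of individuals in a sample who carry the CNV."""
    if total <= 0:
        raise ValueError("total must be positive")
    return carriers / total


def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Unadjusted odds ratio of carrying the CNV in cases versus controls."""
    case_non = case_total - case_carriers
    control_non = control_total - control_carriers
    if case_non == 0 or control_carriers == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_carriers * control_non) / (case_non * control_carriers)


# Counts quoted in the text: (carriers, sample size).
samples_16p11_2 = {
    "ASD, ref 44": (3, 427),
    "ASD, ref 45": (15, 1740),
    "controls, ref 45": (12, 23502),
    "controls, ref 46": (4, 2519),
}

for label, (carriers, total) in samples_16p11_2.items():
    print(f"{label}: {carrier_frequency(carriers, total):.2%}")

or_ref45 = odds_ratio(15, 1740, 12, 23502)
print(f"Odds ratio within ref 45: {or_ref45:.1f}")
```

The higher control frequency in the later report (46) would reduce such an estimate, consistent with the point that this CNV is enriched in ASD but not diagnostic for it.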
Duplication of chromosome 15q11—13 in individuals with ASD was first described nearly 20 years ago by cytogenetic analysis (55). This chromosomal region was implicated in multiple genome-wide linkage scans for ASD, and inheritance of a duplication of chromosome 15q11—13 from the mother (but not from the father) is associated with ASD (56). Deletion of chromosome 15q11—13 results in Prader-Willi syndrome (when inherited from the father) and Angelman syndrome (when inherited from the mother). The ASD candidate region includes eight genes (57) and may contribute to ASD risk via multiple genetic and epigenetic mechanisms (58). Each of the recent CNV studies of ASD has provided strong support for contribution of chromosome 15q11—13 duplication to ASD risk (41, 43—46, 53). The rate of chromosome 15q11—13 duplication (inherited or de novo) is ∼1% in ASD. One de novo duplication included only two genes, ATP10A and GABRB3, narrowing the search for ASD candidate genes (45). A subsequent report indicates association of a rare functional variant of GABRB3 with ASD risk (59), providing evidence that this gene may contribute to the often replicated association of chromosome 15q11—13 with ASD risk.
One caveat to note about the CNV analyses to date is that most have used the Autism Genetics Resource Exchange (AGRE) collection, which comprises ∼95% multiplex families. Given the higher rate of de novo CNVs in simplex families, it is possible that important de novo CNVs that contribute to ASD risk are yet to be reported. Collectively, CNV studies of ASD have identified rare CNVs targeting several chromosomes other than chromosomes 2p16, 16p11.2, and 15q11—13 (41, 43—46, 53). Each of these preliminary observations awaits replication. It appears clear that chromosome 15q11—13 duplication, either de novo or inherited from the mother, contributes to ASD risk.
The strongest ASD candidate genes are those for which convergent evidence exists. Ideally, linkage analysis would implicate the chromosomal region, candidate gene association studies would describe replicated association of alleles, GWAS studies would implicate the gene, rare CNVs of the gene would be identified, and there would be some functional evidence that the gene is involved in ASD risk. No candidate gene meets all of these criteria. However, there are five genes that have convergent evidence for contributing to ASD risk. The evidence for each is detailed below and summarized in Table 1.
The MET gene encodes the MET receptor tyrosine kinase, a key regulator of neuronal migration and synapse formation in the brain. Most linkage studies implicate the chromosome 7q31 region in which the MET gene lies. The results of all genetic association studies reported to date indicate positive association of MET gene variants. The MET promoter variant rs1858830 C allele was associated in five independent samples (37, 60, 61). Another study did not precisely replicate association of the MET rs1858830 C allele but instead found positive association of another genetic variant in MET that is likely to regulate its expression in two independent samples (62). The MET rs1858830 C allele is functional: it decreases transcription twofold due to altered binding of transcription factor complexes (37). Expression of MET protein is decreased twofold in postmortem brains of individuals with ASD (63). Further, the association of the MET rs1858830 C allele is enriched in individuals with co-occurring ASD and gastrointestinal conditions, indicating that MET may contribute to a subset of cases of ASD (64). Rare CNVs of the region including the MET gene have been reported (44). GWAS platforms lack markers near the ASD-associated MET promoter variant, and thus association of MET is not tested in GWAS studies (42). Therefore, overwhelming convergent evidence points toward association of common genetic variants in the MET gene that regulate its expression and contribute to ASD risk in a subset of patients with co-occurring gastrointestinal conditions.
The GABRB3 gene encodes the GABA-A receptor β3 subunit protein, a critical component of inhibitory signaling in the brain. Some, but not all, linkage studies of ASD implicated the chromosome 15q11–13 region in ASD. The results of many candidate gene association studies of GABRB3 were inconclusive. Association of the GABRB3 marker 155CA-2 was observed in two samples (65, 66) but not replicated in three other samples (67–69). Similarly, a positive association of the GABRB3 marker rs2081648 was observed in a study of 104 Korean trios, but this result was not replicated in 470 Caucasian families (70, 71). However, there is now overwhelming evidence in favor of chromosome 15q11–13 duplication contributing to ASD risk in ∼1% of families (41, 43–46, 53). Further, a functional rare variant, also present in ∼1% of families, is associated with ASD risk (59). The ASD-associated mutant form of the protein causes decreased expression of the receptor on the cell surface and reduced whole cell current (59). Therefore, despite the absence of a clear association of common genetic variants, accumulating evidence indicates that rare variation of the GABRB3 gene contributes to a subset of cases of ASD.
The EN2 gene encodes ENGRAILED 2, a transcription factor involved in cerebellar development. Most linkage studies implicated the chromosome 7q36 location of the EN2 gene. Association of two alleles, the rs1861972 A allele and rs1861973 C allele, in the only intron of the EN2 gene was first observed in two samples (36) and then was replicated in four additional samples (72—74). A study of 210 Chinese Han families did not precisely replicate the association of rs1861972 and rs1861973 but instead found association of another SNP in the EN2 intron, rs3824068 (75). Another association analysis in a Chinese Han sample found association of the opposite alleles of rs1861972 and rs1861973 (76). Together, these results implicated the EN2 gene in ASD risk, but suggested that the functional variant had not been identified. However, recent functional studies indicate that the alleles of rs1861972 and rs1861973 bind different transcription factor complexes and cause a modest (∼20%) but significant change in transcriptional efficiency (77). Although recent CNV and GWAS data do not implicate EN2 and evidence of altered EN2 expression in ASD is lacking, association of common variants in EN2 in eight samples suggests that this gene contributes to at least a subset of cases of ASD.
The SLC6A4 gene encodes the serotonin transporter, a critical regulator of the neurotransmitter serotonin in both the brain and peripheral tissues. Because increased platelet serotonin is one of the few biomarkers that identify a subset of patients with ASD, the serotonin transporter is a plausible biological candidate for ASD risk. Multiple linkage studies found evidence for linkage of the chromosome 17q11.1–12 region at which the SLC6A4 gene resides. One of the first candidate gene association studies in ASD used microsatellite markers to describe association of the short allele of a SLC6A4 promoter variant (78). The short allele of the SLC6A4 promoter variant is functional: it decreases transcription efficiency, resulting in decreased gene expression and serotonin uptake activity (79). However, more than a dozen genetic association studies in ASD have yielded mixed results: some reports found significant association of the short allele, some reports described significant association of the long allele, and some reports found no association of either allele (80). In fact, a recent meta-analysis of 14 studies indicates no overall association of the SLC6A4 promoter variant (81). Further, neither CNV nor GWAS studies implicate the SLC6A4 gene in ASD risk. However, rare functional variants may contribute to ASD risk, especially in a subset of patients with rigid-compulsive behaviors (82).
The OXTR gene encodes the oxytocin receptor, a known modulator of social behavior (83). Intranasal oxytocin administration was shown to improve the ability of individuals with ASD to recognize emotions (84), emphasizing the biological plausibility of OXTR contribution to ASD risk. One genome-wide linkage analysis highlighted the chromosome 3p24–26 region of OXTR (24). A study of 195 Chinese families indicated association of two markers in OXTR: rs2254298 A allele and rs53576 A allele (85). Association of the rs2254298 A allele was replicated in a Japanese case-control sample (86), and association of the opposite allele was identified in a study of 57 Caucasian families (87). Two additional studies failed to replicate association of rs2254298 with ASD (88, 89), but association of nearby markers in the OXTR gene has also been described (88, 90). In addition, a recent report described deletion of the region including OXTR and four neighboring genes in 1 of 119 families, altered methylation of the OXTR promoter in individuals with ASD, and decreased expression of OXTR in postmortem brains of individuals with ASD (91). Therefore, there may be multiple modes of disrupting OXTR that result in decreased oxytocin receptor and an increased risk for ASD.
Among the candidate genes with the most evidence in favor of contribution to ASD risk, there is little overlap with recent genome-wide linkage, GWAS, and CNV results (Figure 1). Association studies of recently implicated genetic regions should provide additional strong ASD candidate genes. One promising addition to the list is CNTNAP2, a gene that contributes to quantitative language impairments (92, 93) and may also contribute to ASD risk (38, 94). For all the candidate genes, it will be important to determine the biological pathways by which functional genetic variants contribute to ASD risk.
Future technological advances will provide additional information about the genetic basis of ASD risk. For example, the application of next-generation sequencing techniques to ASD has not been reported. Next-generation sequencing has its limitations (95) but has the potential to provide important genome-wide information about common variants that are not tested on GWAS platforms, rare mutations, and rare CNVs that are enriched in individuals with ASD (96). Similarly, improved resolution of CNV platforms promises to identify increasingly smaller CNVs that may contribute to ASD risk. The application of these CNV platforms to simplex families, in which the burden of de novo CNV appears to be higher, may provide important insights into the genetic basis of ASD risk. Once these approaches identify candidate genes, the challenge then becomes translation of the genetic findings to determine the biological impact of the ASD-associated variants on brain development (12).
Beyond technological advances in genetics, there are two additional issues that must be addressed to understand the genetic basis of ASD risk. First, the observation that ASD affects four times as many males as females must be explained. There must be a genetic basis to this distinction, but no empirical data described to date explain it. Second, the absence of a single strong genetic signal indicates that multiple genes contribute to ASD risk, but there is debate about how these various genetic variants converge to result in ASD. One possibility is that there are genes that contribute independently to deficits in social interaction, language, and behavioral flexibility, and that ASD results from the convergence of multiple genetic risk factors for each of these independent domains. Alternatively, ASD may arise from a common biological mechanism (and thus the three ASD domains are inseparable), with the genetic complexity arising from disruption of multiple genes encoding proteins within the biological pathway (97). It will be important to make this distinction because it influences future biological approaches to understanding ASD. If ASD is the convergence of multiple genetic risk factors for each phenotypic domain, then it will be important to understand the biological interaction of proteins encoded by genes that influence the social domain (e.g., OXTR) and the language domain (e.g., CNTNAP2). On the other hand, if ASD results from disruptions of a common biological pathway, then it will be important to understand the multiple ways in which that pathway can be disrupted [e.g., the multiple genes encoding proteins of the MET signaling pathway (60)]. Network analysis of ASD candidate genes will provide important insights (98).
In summary, the recent application of powerful genome-wide technologies to large samples has implicated novel genetic regions in ASD risk. These findings also provide additional support for the candidate genes MET and GABRB3 (Figure 1). The available genetic evidence provides an opportunity to understand the biological basis of ASD. The challenges are now to identify functional variants, describe the contributions of the risk alleles to altered brain development, and determine effective treatment strategies based on biologically sound hypotheses.