On page 213 of this issue, Cronin et al1 report the findings of a replication study of the top 27 associated single nucleotide polymorphisms (SNPs) identified by combining results from several whole genome association studies (WGASs) in amyotrophic lateral sclerosis (ALS). Although the results of the study are largely negative, the insight gained from their interpretation is timely, as researchers are weighing the strengths and weaknesses of WGASs in ALS and deciding which genes should be followed up from a functional biology perspective. Although the cause of this deadly disease is not yet fully understood, genetic studies have so far led the way in identifying pathways that may improve its understanding, and the hope is that genes identified in WGASs will eventually help in this regard.

ALS, also known as Lou Gehrig's disease, is a motor neuron disease characterized by rapidly progressive paralysis leading to death due to respiratory failure, typically within 3–5 years of symptom onset. It is the most common adult-onset motor neuron disease, with an incidence of 2.1 per 100 000 person-years.2 Familial ALS is responsible for 1–5% of all ALS cases, depending on the population studied, and it is caused by highly penetrant genes that are most often inherited in an autosomal dominant manner.3 The greatest contribution toward an understanding of ALS thus far has come from the discovery of mutations in the superoxide dismutase 1 (SOD1) gene on chromosome 21q22.11, which account for 10–20% of autosomal dominant familial ALS cases. Besides linkage analysis, which allowed the mapping of SOD1 in familial ALS, other approaches have been used to identify genes involved in non-familial or ‘sporadic’ ALS. These are mainly candidate gene association studies and WGASs.

Candidate gene association studies have been performed to identify genes based on a priori hypotheses.3 The genes identified pertain to broad categories such as hypoxia and oxidative stress (vascular endothelial growth factor, paraoxonase); cytoskeletal structure (neurofilament heavy chain subunit, dynactin); motor neuron survival (survival motor neuron, ciliary neurotrophic factor, leukemia-inhibitory factor) and neurodegenerative disorders (hemochromatosis, apolipoprotein E4 allele, TARDBP).

As genotyping on a large scale has become affordable, WGASs have enabled the hunt for genes involved in complex diseases without a priori hypotheses about the function of the gene.4 WGASs have been successful in identifying genetic factors that underlie common diseases such as diabetes5 and breast cancer.6 The variants identified in these studies, however, have odds ratios not much above 1, which means that their discovery necessitates the compilation of very large sample collections numbering in the thousands or even tens of thousands.
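
The link between modest odds ratios and very large sample collections can be illustrated with a rough power calculation. The sketch below uses the standard normal-approximation formula for comparing two allele frequencies; it is not the method of any study discussed here. The control allele frequency (0.3), the significance level (5e-8, a commonly used genome-wide threshold), the 80% power target and the function names are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency
    and the allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def subjects_per_group(control_freq: float, odds_ratio: float,
                       alpha: float, power: float) -> int:
    """Approximate number of cases (with an equal number of controls)
    needed for a two-sided allelic test of two proportions.

    Each subject contributes two alleles, so the allele count from the
    two-proportion formula is halved. All inputs are illustrative.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


# Hypothetical scenario: common variant, modest versus large effect.
for hypothetical_or in (1.2, 2.0):
    n = subjects_per_group(0.3, hypothetical_or, alpha=5e-8, power=0.8)
    print(f"OR {hypothetical_or}: about {n} cases and {n} controls")
```

Under these assumed inputs, an odds ratio of 1.2 calls for several thousand cases and as many controls, whereas an odds ratio of 2.0 would need only a few hundred of each.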

The first reported WGAS in ALS was performed in a small cohort of 276 American sporadic cases and 275 neurologically normal American control samples.7 It identified 34 SNPs that were significantly associated with increased risk of developing ALS, though none of these SNPs exceeded the Bonferroni threshold for multiple testing. Shortly afterwards, another group8 performed a WGAS using 766 955 SNPs in 386 white patients with sporadic ALS and 542 neurologically normal white controls, and replicated the findings in two stages: one of 744 cases and 750 controls, and a follow-up of 135 cases and 275 controls. Their most significant association with disease was found for an SNP near an uncharacterized gene known as FLJ10986, which was expressed in the spinal cord and cerebrospinal fluid of patients and controls. Another group, from the Netherlands, performed a WGAS in sporadic ALS using an initial sample of 461 cases and 450 controls, and replicated their findings in an independent sample of 876 cases and 906 controls from the Netherlands, Belgium and Sweden; they found ITPR2 to be significantly associated with ALS.9 This same group then extended their initial analysis to include additional samples and found that variants and haplotype blocks surrounding the DPP6 gene were strongly associated with ALS susceptibility in 1767 cases and 1916 controls of European ancestry.10
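
To make the Bonferroni threshold mentioned above concrete, here is a minimal sketch of the correction. The family-wise significance level of 0.05 is an assumption (the text does not state which level the studies used), and the function names are illustrative; only the SNP count of 766 955 comes from the text.

```python
def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return family_alpha / n_tests


def passes_bonferroni(p_value: float, n_tests: int,
                      family_alpha: float = 0.05) -> bool:
    """True if a single-SNP p-value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(n_tests, family_alpha)


# SNP count taken from the second WGAS described above; alpha is assumed.
threshold = bonferroni_threshold(766_955)
print(f"Per-SNP threshold: {threshold:.2e}")
```

With these inputs the per-SNP threshold is about 6.5 × 10^-8, which shows why an SNP can look nominally significant and still fall well short of the corrected threshold.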

In their initial WGAS of sporadic ALS, Cronin et al11 used a more homogeneous population by conducting their study in 221 cases and 211 controls from Ireland, identifying 35 potentially associated loci. They then used for replication a joint analysis of genome-wide data from the publicly available Dutch and US datasets7 totaling 958 ALS cases and 932 controls. Their strongest association was also a variant in the gene encoding DPP6, a component of type A neuronal transmembrane potassium channels. Cronin et al1 present, in the current issue of the European Journal of Human Genetics, a replication study of their initial WGA scan. They examine a total of 27 SNPs that had the same allelic directionality and approached nominal significance in the previous US, Dutch and Irish WGA sample sets. Their replication population was a slightly expanded cohort of 91 patients and 48 controls from Ireland and an independent set of 218 patients and 356 controls from Poland. Their combined data analysis of 1267 cases and 1336 control subjects did not identify any SNPs reaching Bonferroni-corrected significance for association.

The most salient finding from the present study of Cronin et al1 is that the DPP6 allele that was over-represented in the initial WGAS is now under-represented in the Polish cohort. Such a swap of the disease-associated allele is often a strong warning sign when interpreting WGAS results. We can thus conclude with some certainty that the examined SNP itself (rs10260404) does not cause ALS. However, as was observed in the Dutch DPP6 study,10 a full linkage disequilibrium block incorporating several adjacent SNPs was associated, such that an SNP nearby and perhaps still within the DPP6 genomic interval may have an effect. This is critical, as the identification of this causal variant is the ultimate goal of this genetic study. Alternatively, as the authors point out, lack of replication can suggest that the initial result was a false positive, or that population structure may be at play. However, the main argument against the latter is that the initial studies were performed in quite diverse populations already (US, Ireland, the Netherlands, Sweden and Belgium). How specific can the Polish population be?
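
The allele swap described above can be checked mechanically by comparing the direction of the allelic odds ratio across cohorts. The sketch below is illustrative only: the allele counts are hypothetical and are not taken from Cronin et al or any other study, and the function names are assumptions.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int) -> float:
    """Odds ratio for carrying the tested allele in cases versus controls,
    computed from a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


def same_direction(or_a: float, or_b: float) -> bool:
    """True if both odds ratios lie on the same side of 1.

    An odds ratio of exactly 1 has no direction, so it counts as
    inconsistent with anything.
    """
    return math.log(or_a) * math.log(or_b) > 0


# Hypothetical allele counts, for illustration only.
discovery_or = allelic_odds_ratio(500, 1500, 400, 1600)   # allele enriched in cases
replication_or = allelic_odds_ratio(130, 306, 250, 462)   # allele depleted in cases

if not same_direction(discovery_or, replication_or):
    print("Warning: associated allele changes direction between cohorts")
```

A replication pipeline might run such a check before pooling cohorts, since combining effects that point in opposite directions can mask or distort a signal.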

At the present stage, one has to be very careful in interpreting results originating from WGASs, since these studies have limited resolution owing to small sample sizes. The reporting of negative replication results is important in analyzing and interpreting the significance of candidate genes that emerge as associated in WGASs. Public availability of WGAS genotype datasets is invaluable in increasing the sample size and thus the power to detect SNPs conferring a relatively small risk. However, this has resulted in tremendous overlap of datasets between separate, supposedly independent publications, in which a substantial portion of the signal comes from data already present in other studies. What is lacking, therefore, is truly independent validation from additional populations. While we await these results, replication of the top associated SNPs in a separate population is nonetheless relevant.