Introduction

Strabismus (misalignment of the eyes) is a common ophthalmic condition with both genetic and non-genetic risk factors contributing to its aetiology. Most strabismus is comitant (or concomitant), meaning that the angle of misalignment between the two eyes remains relatively constant regardless of the direction of gaze1. Based on the direction of deviation, concomitant strabismus can be further divided into convergent/esotropia (ET, inward eye deviation) or divergent/exotropia (XT, outward eye deviation). The prevalence of ET and XT varies across populations. For example, among European regions, ET has a prevalence of 2.17%, whereas XT has a prevalence of ~1.53%. By contrast, the African region shows a prevalence of 0.13% for ET and 0.14% for XT2. The misalignment of the visual axis due to the imbalance in extraocular muscles in strabismus leads to reduced or absent binocular vision and is often associated with amblyopia3; individuals who have amblyopia face a significantly increased risk of bilateral visual impairment over their lifetime4. The pathogenesis of strabismus is poorly understood.

Previous studies suggested that various prenatal and early-life environmental factors, such as maternal smoking during pregnancy, increase the risk of strabismus5,6,7. Genetic studies have been conducted to understand the risk loci for strabismus3,8,9. The genetic contribution to strabismus has also been assessed by genome-wide association studies (GWAS). Shaaban et al.10 conducted a GWAS for strabismus and reported significant variants rs2244352 and rs912759 located within/near WRB on chromosome 21 and ADGRL4 on chromosome 1, respectively. Plotnikov et al.11 also identified a variant, rs75078292 (P = 2.24 × 10−8), within the NPLOC4-TSPAN10-PDE6G-FAAP100 gene cluster that was strongly associated with strabismus. Obtaining a larger sample size increases the statistical power of GWAS and is expected to lead to the discovery of further strabismus-risk variants. Here, we conducted a European-ancestry GWAS meta-analysis of strabismus, combining 11 sets of summary statistics from 7 sources. The meta-analysis was based on three definitions of the strabismus phenotype: broad-sense strabismus (20,464 cases and 954,921 controls), ET (5963 cases and 588,794 controls) and XT (3998 cases and 583,468 controls). We identified 7 risk loci, 6 of which were previously unreported, that provide insights into the aetiology of strabismus.

Previous observational studies have reported an association between maternal smoking during pregnancy and strabismus in offspring5,12,13. However, conventional observational study designs cannot determine causality. In this study, we conducted Mendelian randomisation (MR) to evaluate if genetic support exists for a causal association between maternal smoking and strabismus. Further, since previous studies have reported an association between birth weight and strabismus (which may be partly mediated by maternal smoking)14, we used MR to assess support for a causal relationship between birth weight and strabismus.

Results

Meta-analysis

We calculated genetic correlations (rg) for three strabismus phenotypes (Supplementary Data S2). The analysis revealed that broad-sense strabismus showed a higher genetic overlap with ET (rg = 0.83, 95% CI: 0.70–0.96, P = 1.04 × 10−34) compared to XT (rg = 0.60, 95% CI: 0.42–0.79, P = 1.25 × 10−10). The genetic correlation of ET and XT was −0.22 (95% CI: −0.51 to 0.07, P = 0.137). We conducted GWAS meta-analyses using three strabismus definitions (Fig. 1). Using the broad non-paralytic strabismus definition, we identified 4 genome-wide significant independent variants near the NPLOC4-TSPAN10-PDE6G-FAAP100 gene cluster, COL6A1, ZNF701, and CHRNA4, respectively. Using the ET definition, we identified 4 genome-wide significant independent variants within or near UTS2, CHRNA4, DYNLRB2, and NPLOC4-TSPAN10-PDE6G-FAAP100. Using the XT definition, we identified 2 genome-wide significant independent variants near UTS2 and MAD1L1 (Table 1, Supplementary Data S4 and Fig. 2). The gene cluster NPLOC4-TSPAN10-PDE6G-FAAP100 was genome-wide significantly associated with broad-sense strabismus and ET. CHRNA4 was associated with broad-sense strabismus and ET (Fig. 3). UTS2 was associated with ET and XT. In total, across the different strabismus definitions, we identified 7 strabismus-associated loci where the peak SNP had P < 5 × 10−8. Only the NPLOC4-TSPAN10-PDE6G-FAAP100 locus reached genome-wide significance in previous strabismus GWAS11,15.
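
As a rough consistency check on the genetic correlations above, the following minimal Python sketch recovers a standard error from each reported 95% confidence interval and converts the resulting z-score into a two-sided P value. It assumes the intervals are symmetric normal-theory intervals (estimate ± 1.96 × SE); the function name is illustrative and not part of the study's pipeline. Because the published values are rounded, the sketch should only be expected to reproduce the reported P values approximately (for ET versus XT it gives a value close to the reported 0.137).

```python
import math


def rg_z_and_p(rg, ci_low, ci_high):
    """Recover SE from a symmetric 95% CI, then compute z and a two-sided P value.

    Assumes the interval was built as estimate +/- 1.96 * SE (an assumption,
    not something stated in the text).
    """
    se = (ci_high - ci_low) / (2 * 1.96)
    z = rg / se
    # Two-sided P value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Genetic correlations and 95% CIs as reported in the text.
reported = [
    ("broad-sense strabismus vs ET", 0.83, 0.70, 0.96),
    ("broad-sense strabismus vs XT", 0.60, 0.42, 0.79),
    ("ET vs XT", -0.22, -0.51, 0.07),
]

for label, rg, low, high in reported:
    se, z, p = rg_z_and_p(rg, low, high)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, P={p:.2e}")
```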

Fig. 1: Manhattan plots and QQ-plots of three strabismus phenotypes.

The red line in Manhattan plots represents the genome-wide significant threshold (P = 5 × 10−8), the green line represents the suggestive significance threshold (P = 1 × 10−5). The red line in QQ-plots represents the expected distribution of the p values, and blue/yellow/green trend represents the observed distribution. Shades represent the 95% confidence interval of the expected distribution.

Fig. 2: Locus zoom plots for strabismus, esotropia and exotropia significant loci.

Genome build for chromosome position is Homo sapiens (human) genome assembly GRCh37 (hg19), and LD (r2) is calculated from the 1000 Genomes European population. The blue line represents the recombination rate (cM/Mb). The most significant SNPs are indicated by the purple dots. The x-axis shows genes located in the genomic regions (1 Mb) and the y-axis indicates the significance of SNP associations (−log10(P)).

Fig. 3: Venn diagram of seven GWAS-significant loci.

The GWAS summary statistics of the three strabismus phenotypes shared a subset of associated loci. Genes in the blue/red/green circle represent the genetic loci associated with strabismus/esotropia/exotropia. Genetic loci in bold represent the loci also associated with myopia or refractive errors.

Table 1 Lead GWAS SNP

Of the loci identified in the three meta-analyses, MAD1L1, UTS2, NPLOC4, CHRNA4 and ZNF701 were reported to be associated with lung function or smoking16,17,18,19, while the NPLOC4-TSPAN10-PDE6G-FAAP100 and COL6A1 loci showed an association with myopia20. We looked up the pheWAS results of these lead SNPs in the GWAS Atlas21; the pheWAS results after Bonferroni correction are listed in Tables S4–S6.

Loci associated with strabismus after adjusting for refractive error

We examined whether variants identified in our broad-sense strabismus meta-analysis were also associated with refractive error in a published GWAS22. The genetic correlations of the three strabismus traits and refractive error were statistically significantly different from zero, although the magnitude was modest (rg ~ 0.1–0.2, Supplementary Data S8). As would be expected because of pleiotropy between strabismus and refractive error, a conditional analysis led to a reduction in the mean test statistic across all SNPs in the genome, although three of the four lead SNPs (rs1996371, rs6420484 and rs8108303) from the broad-sense strabismus meta-analysis remained significantly associated with strabismus after conditioning on refractive error (P < 5 × 10−8). This suggests that these variants are specific to strabismus and are not simply associated with strabismus via their association with refractive error.

We also applied the above procedures to the results of the GWAS for ET and for XT. Both XT lead SNPs and two of four ET lead SNPs (rs228636 and rs8070929) retained genome-wide significance in the conditional analysis (Supplementary Data S9).

eQTL look-up

To assess the functional relevance of the lead strabismus loci, we evaluated the eQTLs associated with 4 lead SNPs (rs2150458, rs1996371, rs6420484, and rs8070929) from the broad-sense strabismus GWAS meta-analysis. By filtering associations based on an eQTL FDR < 0.05, the 4 lead GWAS SNPs were mapped to 231 significant eQTLs. No specific tissue type dominated the eQTL associations, but blood cells (BIOSQTL) had the largest proportion of eQTLs. SNPs rs2150458, rs1996371, rs6420484, and rs8070929 were linked to 15, 75, 128 and 13 eQTLs, respectively (Supplementary Data S10).

TWAS

We conducted a cross-tissue TWAS to detect strabismus-risk genes. The tissue weights from GTEx were applied in the UTMOST framework23. The cross-tissue analysis examined 17290 genes across 44 GTEx tissues. After accounting for multiple testing (P < 0.05/17290 = 2.89 × 10−6), nine significant loci were identified in the broad-sense strabismus GWAS meta-analysis: ADAMTS7, ALYREF, C17orf70, COL6A2, CREB3L3, DCXR, OSER1, PDE6G, SLC16A3. A further eight loci were identified in the ET TWAS analysis: CALU, CREB3L3, FBXL18, KXD1, NLRP9, NPAS4, TNRC18, TSACC, and three in the XT TWAS analysis (FTSJ2, MAD1L1, RAB3A).
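
The multiple-testing threshold quoted above can be reproduced with a short Python sketch. The number of genes tested (17290) and the significance level (0.05) come from the text; the function names and the example gene P values are hypothetical and serve only to illustrate how a Bonferroni cutoff is applied.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_genes(gene_pvalues: dict, alpha: float, n_tests: int) -> list:
    """Return the genes whose P value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return sorted(gene for gene, p in gene_pvalues.items() if p < cutoff)


N_GENES_TESTED = 17290  # number of genes in the cross-tissue analysis (from the text)
threshold = bonferroni_threshold(0.05, N_GENES_TESTED)
print(f"{threshold:.2e}")  # 2.89e-06

# Hypothetical P values for placeholder genes, for illustration only.
example = {"GENE_A": 1.0e-7, "GENE_B": 4.0e-6, "GENE_C": 0.03}
print(significant_genes(example, 0.05, N_GENES_TESTED))  # ['GENE_A']
```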

Replication of previously published loci

We examined three previously published strabismus variants, rs2244352, rs912759 and rs75078292, in our meta-analysis (Table 2). The lead non-accommodative ET variant, rs2244352, identified in the ET GWAS reported by Shaaban et al.10, reached nominal significance, but was not genome-wide significantly associated with strabismus (P > 5 × 10−8) in any of our meta-analyses. The accommodative ET variant rs912759 from the same paper demonstrated no association in our meta-analyses. The locus at rs75078292 reported by Plotnikov et al.11 was identified as genome-wide significant in the broad-sense strabismus and ET meta-analysis and reached nominal significance in the XT meta-analysis.

Table 2 Replication of published variants

Association of maternal cigarette smoking with strabismus

Previous observational studies have suggested maternal smoking may be associated with strabismus risk5,6. Using a SNP, which has been shown to index maternal smoking (rs16969968)24, we performed an MR analysis to evaluate the genetically inferred causal link between smoking and three strabismus phenotypes (strabismus, ET and XT; Table 3). The MR results provided evidence to support a causal effect where maternal smoking increased the risk of strabismus (P < 0.05 for broadly defined strabismus as well as for ET and XT) (Table 3). A sensitivity analysis in which additional SNPs were used as instrumental variables for maternal smoking produced a similar result (Supplementary Results and Data S11).
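
For readers unfamiliar with single-instrument MR, the estimate is commonly obtained as a Wald ratio: the SNP-outcome effect divided by the SNP-exposure effect. The Python sketch below illustrates this under stated assumptions. It uses a first-order standard error that ignores uncertainty in the SNP-exposure effect, and all numerical inputs are hypothetical placeholders rather than values from this study; the text does not specify which estimator or software the authors used for the single-SNP analysis.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP MR estimate (Wald ratio) on the log-odds scale.

    beta_exposure: SNP effect on the exposure (e.g. a smoking measure)
    beta_outcome:  SNP effect on the outcome (log odds of strabismus)
    se_outcome:    standard error of beta_outcome
    The SE is a first-order approximation that treats beta_exposure as fixed.
    """
    if beta_exposure == 0:
        raise ValueError("beta_exposure must be non-zero")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return estimate, se, p


def to_odds_ratio(estimate, se):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - 1.96 * se),
        math.exp(estimate + 1.96 * se),
    )


# Hypothetical summary statistics, for illustration only.
estimate, se, p = wald_ratio(beta_exposure=0.10, beta_outcome=0.02, se_outcome=0.008)
odds_ratio, ci_low, ci_high = to_odds_ratio(estimate, se)
print(f"OR={odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), P={p:.3f}")
```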

Table 3 Two-sample MR to assess the effect of maternal smoking on strabismus risk

Association of birth weight with strabismus

Low birth weight has been reported as a risk factor for strabismus5,14,24. Also, a lower birth weight could potentially be a mediator in the causal pathway from maternal smoking to strabismus. Hence, we conducted a second MR analysis to investigate the genetic association between birth weight and strabismus. This MR analysis indicated that, for each 500-g increase (~1 standard deviation) in offspring birth weight, ORs for strabismus, ET, and XT risk were 1.05, 1.10, and 1.02, respectively (all P > 0.05, as presented in Table 4). To validate our MR findings, we compared them with results from a published observational study25 (Fig. 4). While our MR results did not yield strong evidence (P > 0.05) to support a causal link between birth weight and strabismus, the confidence intervals of ORs were relatively wide and overlapped with the observational results. For instance, when considering a 500 g increase in birth weight from 3500–3999 g to 4000–4499 g, the MR OR overlapped with observational results.

Fig. 4: Comparison of epidemiology and MR-based estimates of the relationship between birth weight and strabismus subtypes.

We compared the change in odds ratios (ORs) from our Mendelian randomisation (MR) analysis of birth weight on esotropia (a) and exotropia (b) with observational results published by Torp-Pedersen et al. The x-axis shows groups defined by successive 500 g increases in birth weight. The birth weight groups were labelled 3000–3499 g, 3500–3999 g and 4000–4499 g in the original paper; we re-labelled them ‘500 g increase from 2500 g’, ‘500 g increase from 3000 g’ and ‘500 g increase from 3500 g’ and present the change in OR per 500 g increase in birth weight (green points in a, yellow points in b). Error bars are the 95% confidence intervals of the change in OR. The red point represents the MR OR per 500 g increase in birth weight, and the red error bars represent its 95% confidence interval.

Table 4 Two-sample MR to assess the effect of birth weight (BW) on strabismus risk

We then assessed whether there was evidence for causality in both directions. We performed MR analyses to assess the effect of strabismus/ET/XT on birth weight using GWAS-significant SNPs identified from the current study as the instruments. We found no evidence for an effect of strabismus risk on birth weight using the IVW method (OR = 0.98 per doubling odds of strabismus; 95% CI = 0.94–1.02; P = 0.57; OR = 1.02 per doubling odds of ET; 95% CI = 1.00–1.04; P = 0.43; OR = 0.96 per doubling odds of XT; 95% CI = 0.93–1.00; P = 0.27).
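
The IVW method mentioned above combines per-SNP ratio estimates, weighting each by the inverse of its variance. The following Python sketch shows a fixed-effect version of this calculation as an illustration. The text does not state whether a fixed-effect or random-effects IVW model was used, and the input values below are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each SNP contributes a ratio estimate beta_outcome / beta_exposure with
    first-order variance (se_outcome / beta_exposure) ** 2; ratios are then
    combined with inverse-variance weights.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return estimate, se, p


# Hypothetical per-SNP summary statistics, for illustration only.
bx = [0.10, 0.20, 0.15]
by = [0.010, 0.015, 0.020]
sy = [0.010, 0.012, 0.011]
estimate, se, p = ivw_estimate(bx, by, sy)
print(f"IVW estimate={estimate:.3f}, SE={se:.3f}, P={p:.3f}")
```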

Discussion

In this study, we conducted the largest genome-wide meta-analysis of strabismus to date. We identified seven genetic variants significantly associated with strabismus using different definitions of strabismus. We performed MR using the well-established maternal smoking variant rs16969968 and showed that genetically-proxied maternal smoking increases the risk of offspring strabismus/ET; this adds genetic evidence to the existing conventional observational studies, bolstering the case for a causal relationship between maternal smoking and strabismus risk.

Strabismus is a heterogeneous condition; although some genetic factors are common across various subtypes, others are unique to specific forms26. Our results indicate that the statistical power of meta-analysis varies depending on the definition of strabismus used in the clinical data, suggesting that subtype-specific genetic factors may influence the susceptibility to different forms of strabismus. The low genetic correlation (rg = −0.22, Supplementary Data S2) between ET and XT suggests that these sub-phenotypes have different biological mechanisms. The broad-sense strabismus meta-analysis identified the same number of independent SNPs as the ET meta-analysis, despite the difference in sample size (Ncase = 20,464 for the broad-sense strabismus analysis; Ncase = 5963 for the ET meta-analysis). Future studies should endeavour to collect more detailed phenotype information to better dissect this heterogeneity, although large sample sizes will be required. Restricting the age range of cases may also help increase the accuracy of some future strabismus GWAS. In this study, we opted not to use ICD data in the UKB sample as these data were collected from older people whose ICD records did not reflect their childhood disease status.

The NPLOC4-TSPAN10-PDE6G-FAAP100 locus on chromosome 17 was associated with both broad-sense strabismus and ET. This locus was first reported in a UKB strabismus GWAS11 and replicated in the FinnGen cohort27. Our findings verified that this locus has a strong association with strabismus. However, this locus did not reach the genome-wide significance level in XT analysis, again supporting potential divergent biological mechanisms underlying ET and XT.

Genetic loci identified in previous strabismus GWAS have been associated with a diverse range of ocular phenotypes, including myopia or refractive errors10,11,28. Some of our findings may have been influenced by pleiotropy between strabismus and refractive errors in the current GWAS analyses. We were unable to include refractive error as a covariate due to the lack of access to individual-level data for the bulk of the input data. Instead, we compared the genetic correlation between our strabismus GWAS and published GWAS of refractive error22 (Supplementary Data S8), and we applied the mtCOJO approach to screen for SNPs associated with strabismus after adjusting for their effects on refractive error (Supplementary Data S9). Consistent with modest pleiotropy between strabismus and refractive error (rg ~ 0.1–0.2), we found that the number of genome-wide significant lead SNPs associated with strabismus reduced after conditioning on refractive error.

Previous observational studies have reported that maternal smoking during pregnancy has significant effects on offspring’s vision health5,6,7,29,30. MR uses genetic data to infer causality in a framework that is typically subject to different sources of confounding bias compared to observational studies and thus provides an additional source of evidence. However, the limited sample size of existing strabismus GWAS has hindered the use of MR to investigate the causal relationship between maternal smoking and strabismus. Here we report the first MR study to examine the relationship between maternal smoking and strabismus. We used a well-established maternal smoking proxy instrumental variable, rs16969968, to show that maternal smoking during pregnancy is linked to the risk of broadly defined strabismus as well as ET and XT (Table 3).

To confirm the single-SNP MR result, we used all top variants from Saunders et al.31 as instrumental variables for maternal smoking and found concordant results (Supplementary Data S11, Supplementary Results). However, the specific ORs from the multiple-SNP analysis are more difficult to interpret reliably because they are based on the simplifying assumption that offspring genotype reflects maternal genotype31. As strabismus onset (in early childhood) occurs earlier than the average age of smoking initiation (around 15 years old or later)32,33, it is very likely that the SNPs instrumenting smoking behaviour index maternal smoking rather than risk relating to the offspring’s smoking behaviour.

There are several limitations to consider in the MR analysis. First, although the SNP (rs16969968) used in our primary analysis as a genetic proxy for maternal smoking plays a well-established role in smoking behaviour, it remains possible that the SNP has pleiotropic effects on maternal traits other than smoking initiation (or pleiotropic effects on a maternal trait that is a confounder of the maternal smoking-offspring strabismus relationship); such a pleiotropic effect could have biased the direction or magnitude of the MR result. However, when we examined rs16969968 in Open Targets Genetics34, we found no evidence that this SNP affects traits other than smoking and directly related traits (such as lung cancer). Second, although we estimated the magnitude of the risk of strabismus conferred by maternal smoking, accurately estimating a specific OR using MR is difficult35,36, especially in the case of a maternal exposure’s effect on an offspring outcome. Ideally, an MR analysis would condition maternal genetic effects on the offspring genotype37, but this approach was not possible with the data available. Third, for our MR sensitivity analysis based on multiple SNP IVs, our estimates were derived assuming that the offspring genotype for each SNP was indicative of maternal smoking risk. Irrespective of the precise magnitude of the risk, our MR analyses provide an independent line of evidence for a link between maternal smoking and strabismus. In comparison to previous observational studies5,7, MR studies are subject to different sources of confounding bias. Therefore, combining past evidence from observational studies5 and the additional evidence from our MR study, there is consistent evidence of a causal relationship between maternal smoking and strabismus. Fourth, our study focused on individuals of European descent. Due to the lack of samples, we were unable to replicate our GWAS lead loci in non-European cohorts. Although maternal smoking rates differ across countries2,38, maternal smoking is associated with strabismus in conventional observational studies across various ancestry groups5,7,39. Our MR findings in Europeans provide support for a causal link between maternal smoking and strabismus. Future genetic studies should be conducted in a wider range of ancestries to expand the scope of gene mapping and MR studies.

Notably, although we inferred that the relationship between genetically-proxied maternal smoking and strabismus was causal, the actual mechanism underlying this association remains unclear. Given the well-known link between maternal smoking and birth weight, we conducted a secondary MR analysis to investigate the genetic association between birth weight and strabismus (Supplementary Results, Fig. S4); the confidence intervals on our MR estimates overlapped with those from a previous observational study of babies in the middle of the weight range (3000–4000 g)25. An advantage of observational studies is that they allow simple dissection of the effect of a particular increase in birth weight across a range of birth weights (e.g. low, medium, high), with a previous study showing that, for example, a 500 g change in birth weight was associated with strabismus among small babies (~2000 g), but that this effect was not seen in larger babies (~4000 g)14,25. However, various confounders can influence conventional observational studies, which may introduce biases and distort the observed associations. A key advantage of our MR estimates is that they are less likely than observational estimates to be affected by confounding. A disadvantage of our MR analysis is that, like most MR studies, we assume a linear relationship, which may only partially capture how birth weight affects strabismus across the full range of birth weights in the population. In the future, this could be revisited by applying MR to subsets of babies of low birth weight (e.g. <2000 g), although much larger sample sizes than we currently have would be required for adequate power.

Strengths of our study include conducting GWAS meta-analyses with multiple large cohorts for different types of strabismus. We also compared the GWAS of ET and XT and found evidence suggesting that different biological mechanisms underlie these two subtypes. Furthermore, we conducted MR and showed evidence for a causal effect of maternal smoking during pregnancy on strabismus risk in offspring, consistent with conventional observational studies.

In summary, GWAS meta-analyses of strabismus and two of its sub-phenotypes identified a total of seven genome-wide significant genetic variants, six of which were unreported findings. The identification of these genetic loci associated with strabismus susceptibility enhances our understanding of its biological mechanisms. In addition, we obtained strong genetic evidence supporting a causal link between maternal smoking and strabismus. Thus, this work augments ongoing public health efforts aimed at reducing the rate of maternal smoking.

Methods

This study complies with all relevant ethical regulations. All participants provided informed consent, and individual-level data were anonymized and analysed in accordance with the approved protocols.

Datasets

We included 7 sources with 11 sets of GWAS summary statistics for adult strabismus (Table 1). These included a GWAS based on clinical strabismus data from the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort, FinnGen, and the Estonian Biobank (EstBB), along with a GWAS based on self-reported strabismus from UK Biobank (UKB), Lifelines, the Australian Genetics of Depression Study (AGDS) and a published ET GWAS using USA/Australia samples10.

UK Biobank

The UKB is a large-scale United Kingdom biomedical database containing in-depth genetic and phenotypic data from ~500,000 participants who were between the ages of 40 and 69 years at recruitment. Approximately 488,000 participants were genotyped on high-density SNP arrays. The genotype data underwent quality control and imputation procedures as previously described (Bycroft et al.40). Approximately 96 million variants were imputed using the Haplotype Reference Consortium (HRC) and UK10K haplotype resources, and 487,409 individuals were retained after genotyping quality control. To validate the ancestral background from UKB self-reported ethnicity (Data-Field 21000), we used the k-means clustering method and clustered the top 20 principal components (PCs) into 20 clusters. The PCA clusters were compared with the self-reported ethnicity. UKB individuals whose European self-reported ethnicity was consistent with their genetic cluster were used in the GWAS (mainly white British, N = 438,637). We included 2744 self-reported strabismus cases who gave this as their ‘reason for glasses/contact lenses’ (Data-Field 6147). Controls were 306,683 participants with no self-reported strabismus, no diagnosed strabismus (ICD-10 codes H49 and H50), and no history of eye surgery or loss of vision (Data-Fields 5181, 5324, 5325, 5326, 5327 and 5328) (UKB phenotype Sep 2021 update).

We conducted a GWAS for strabismus in UKB using the software Regenie (version 2.2.4) (Mbatchou et al.41), adjusting for sex, age and the top 10 PCs. SNPs with MAF > 0.01 and imputation quality score (INFO score) > 0.8 were retained in the following analysis.
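
The variant filter described above (MAF > 0.01 and INFO score > 0.8) can be sketched in Python as follows. The record layout (dictionaries with `af` and `info` keys) and the example variants are hypothetical; real summary-statistics files use column names that differ between tools, so this is an illustration of the thresholds rather than the authors' actual pipeline.

```python
def minor_allele_frequency(allele_frequency: float) -> float:
    """Return the frequency of the less common allele."""
    return min(allele_frequency, 1.0 - allele_frequency)


def passes_qc(variant: dict, min_maf: float = 0.01, min_info: float = 0.8) -> bool:
    """Keep a variant only if MAF > min_maf and INFO score > min_info.

    `variant` is a hypothetical record with keys 'af' (effect-allele
    frequency) and 'info' (imputation quality score).
    """
    maf = minor_allele_frequency(variant["af"])
    return maf > min_maf and variant["info"] > min_info


# Hypothetical variants, for illustration only.
variants = [
    {"id": "snp_common", "af": 0.35, "info": 0.98},
    {"id": "snp_rare", "af": 0.004, "info": 0.95},
    {"id": "snp_poorly_imputed", "af": 0.20, "info": 0.55},
    {"id": "snp_high_af", "af": 0.97, "info": 0.90},  # MAF = 0.03
]

retained = [v["id"] for v in variants if passes_qc(v)]
print(retained)  # ['snp_common', 'snp_high_af']
```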

Kaiser Permanente GERA cohort

The GERA cohort contains genome-wide genotype, clinical, and demographic data of over 110,000 adult members of the Kaiser Permanente Northern California (KPNC) Medical Care Plan42. The Institutional Review Board of the Kaiser Foundation Research Institute has approved all study procedures. Patients with strabismus were diagnosed by a Kaiser Permanente ophthalmologist and were identified from clinical diagnoses captured in the KPNC electronic health records (EHR) system. These clinical diagnoses were recorded in the EHR system as International Classification of Diseases, Ninth or Tenth Revision (ICD-9 or ICD-10) codes. In GERA, strabismus cases were defined based on diagnosis codes (ICD-9: 378.0x, 378.1x, 378.31x, 378.32x, and 378.9x; or the equivalent ICD-10 codes: H50.0x, H50.1x, H50.2x, and H50.9x). After excluding subjects who had any evidence of strabismus based on ICD-10 codes (H49 and H50), our control group included all the non-cases. All controls had at least one vision exam recorded in the KPNC EHR system. In total, 5763 ‘broad’ strabismus cases (or 1582 ET cases; or 1018 XT cases) and 59,797 controls from the GERA non-Hispanic white sample were included in this study. Protocols for participant genotyping, data collection and quality control have been described in detail42. Briefly, GERA participants’ DNA samples were extracted from Oragene kits (DNA Genotek Inc., Ottawa, ON, Canada) at KPNC and genotyped at the Genomics Core Facility of UCSF. DNA samples were genotyped at over 665,000 genetic markers on four ethnic-specific Affymetrix Axiom arrays (Affymetrix, Santa Clara, CA, USA) optimised for European, Latino, East Asian, and African American individuals43. Genotype quality control (QC) procedures and imputation were conducted on an array-wise basis44. For imputation, we additionally removed variants with call rates <90% by array. Genotypes were then pre-phased with Eagle (v2.3.2)45 and imputed with Minimac3 (v2.0.1)46, using two reference panels. Variants were preferred if present in the EGA release of the HRC (N = 27,165; no indels) reference panel45, and from the 1000 Genomes Project Phase III release if not (N = 2504; including indels)47.

In GERA, GWA analyses were conducted for three strabismus phenotypes (‘broad’ strabismus, ET, and XT) using logistic regression models adjusting for age, sex, and ancestry PCs. GWASs were conducted using PLINK v1.9 (www.cog-genomics.org/plink/1.9/).

FinnGen

The FinnGen project (https://www.finngen.fi/en) is a nationwide biobank project launched in 2017. FinnGen plans to collect ~500,000 biobank samples in Finland over 6 years (~10% of the population). The variants were genotyped using the ThermoFisher Axiom custom array v2 that contains 723,376 probesets for 664,510 markers. In addition to the core GWAS markers (about 500,000), it contains about 116,000 coding variants enriched in Finland (https://www.finngen.fi/en/researchers/genotyping). Genotype imputation was conducted by using the population-specific SISu v4.2 imputation reference panel (which contains 8554 whole-genome sequencing (WGS) data of Finnish individuals). The detailed QC and imputation procedures have been described at https://finngen.gitbook.io/documentation/.

We downloaded the GWAS summary statistics from Data Freeze 8 for three strabismus definitions: broad-sense strabismus (coded as ‘Other strabismus’ by FinnGen), convergent concomitant strabismus (ET) and divergent concomitant strabismus (XT). The FinnGen GWAS of ‘Other strabismus’ (primarily comprising ICD-10 H50) included 5604 cases and 297,342 controls; the GWAS of convergent concomitant strabismus (primarily comprising ICD-10 H50.0) included 1368 cases and 297,342 controls; the GWAS of divergent concomitant strabismus (primarily comprising ICD-10 H50.1) included 1863 cases and 297,342 controls.

Estonian Biobank

The EstBB is a population-based biobank with 212,955 participants in the current data freeze (2022v1). All biobank participants have signed a broad informed consent form and information on ICD codes is obtained via regular linking with the National Health Insurance Fund and other relevant databases, with the majority of the EHR collected since 200448.

The EstBB GWAS for the ICD-10 H50* strabismus phenotype included 2818 cases and 195,861 controls; the GWAS of convergent concomitant strabismus (comprising ICD-10 H50.0) included 1057 cases and 197,622 controls; the GWAS of divergent concomitant strabismus (comprising ICD-10 H50.1) included 926 cases and 197,753 controls.

All EstBB participants have been genotyped at the Core Genotyping Lab of the Institute of Genomics, University of Tartu, using Illumina Global Screening Array v3.0_EST. Samples were genotyped and PLINK format files were created using Illumina GenomeStudio v2.0.4. Individuals were excluded from the analysis if their call rate was < 95% or if sex based on heterozygosity of the X chromosome did not match sex in phenotype data. Before imputation, variants were filtered by call rate < 95%, Hardy–Weinberg equilibrium p value < 1 × 10−4 (autosomal variants only), and minor allele frequency < 1%. Variant positions were in build 37 and all variants were changed to be from the TOP strand using GSAMD-24v1-0_20011747_A1-b37.strand.RefAlt.zip files from https://www.well.ox.ac.uk/~wrayner/strand/ webpage. Pre-phasing was performed using the Eagle v2.3 software45. The number of conditioning haplotypes Eagle2 used when phasing each sample was set to: --Kpbwt = 20,000, and imputation was performed using Beagle 5.4 (v.18May20.d20) with effective population size Neff = 20,00049. A population-specific reference panel consisting of 2297 WGS samples was used for imputation50. Based on PC analysis, samples of non-European ancestry and samples that were twins or duplicates of included samples were removed.

Association analysis in the EstBB was carried out for all variants with an INFO score >0.7 using the additive model as implemented in Scalable and Accurate Implementation of Generalized mixed model (SAIGE v1.0.7)51, with a saddle point approximation to calibrate unbalanced case-control ratios. Logistic regression was carried out with LOCO = TRUE setting and was adjusted for current age, age-squared, sex and 10 PCs as covariates, analysing only variants with a minimum minor allele count of 2.

Lifelines

Lifelines is a large, multigenerational cohort study that includes over 167,000 participants (10%) from the northern population of the Netherlands. The study included participants from three generations, who will be followed for at least 30 years, to obtain insight into healthy ageing. Detailed population characterisation was described by Scholtens et al.52 and Sijtsma et al.53.

Lifelines samples were genotyped in two separate stages. The first stage used the Illumina Cyto SNP12 v2 chip (~15,000 samples) and the second stage the Illumina Global Screening Array (GSA) chip (~35,000 samples). For the purpose of this analysis, the CytoSNP and GSA datasets were treated as separate analyses (CytoSNP samples that were duplicated or had close relatives in the GSA dataset were excluded beforehand). SNP data obtained from the array were imputed using human reference genomes, including the Genome of The Netherlands (GoNL) release 554 and the 1000 Genomes phase 1 v3 reference panels55, using Minimac (version 2012.10.3.9)56. Prior to imputation, SHAPEIT2 was employed for genotype pre-phasing57, and the Genotype Harmonizer was used to align the genotypes with the reference panels to address strand issues58. Cleaned pedigree files and in- and output files for imputation algorithms were created in PLINK59. The imputation analysis was conducted using Beagle (version 3.1.0.8)60.

Strabismus cases were participants who self-reported strabismus surgery or self-reported strabismus as the ‘reason to start using glasses/contact lenses’, and who were under the age of 8 when they started wearing glasses or contact lenses.

Australian genetics of depression cohort study

Release 11 of the Australian Genetics of Depression Study (AGDS) contains 20,689 participants who were recruited through the Australian Department of Human Services and a media campaign. Participants completed an online questionnaire that consisted of a compulsory module that assessed self-reported psychiatric disease history and other traits related to psychopathology. By September 2018, DNA samples from 15,792 participants had been collected using saliva kits. The detailed sample recruitment information has been previously described61. Genotyping of AGDS samples was performed using the Illumina Global Screening Array (GSA). Genotype data were imputed via the TOPMed Imputation Server62, and SNPs were removed for high missingness (>1%), deviation from Hardy–Weinberg equilibrium (P < 1 × 10−6) or low minor allele frequency (<1%). We also excluded individuals with missing rates >0.01 or kinship coefficients greater than 0.2.

There were 233 participants of European ancestry with self-reported strabismus and 15,117 controls without strabismus. We conducted a GWAS via Regenie (version 2.2.4) (Mbatchou et al. 2021), adjusting for sex, age and the top 20 PCs. SNPs with MAF > 0.01 and INFO score >0.8 were retained.

Previously published USA/Australia/UK GWAS

The ET GWAS conducted by Shaaban et al.10 examined a white European American discovery cohort containing non-accommodative (826 cases and 2991 controls) or accommodative (224 cases and 749 controls) ET samples. The replication cohorts involved non-accommodative (689 cases and 1448 controls) or accommodative (66 cases and 264 controls) ET samples from white European, Australian and United Kingdom populations. These four groups were included as separate cohorts in our meta-analysis.

The sample collection and GWAS analysis procedures have been described in detail in the original paper10. Briefly, 337,204 SNPs were genotyped using OmniExpress arrays and passed QC. These SNPs were imputed against the 1000 Genomes phase 1 v3 European reference panels55 using the IMPUTE2 programme (version 2)63. Phenotyping was based on participant examinations by an ophthalmologist, optometrist or orthoptist, participant questionnaires, and reviews of additional medical records.

The authors of the 2018 Shaaban study applied a mixed linear additive model rather than logistic regression. Therefore, we applied the following equation to convert the SNP effect sizes from linear-scale beta values to odds ratios:

$${{{\mathrm{OR}}}}=\frac{(k+{{{\mathrm{beta}}}})}{(1-k-{{{\mathrm{beta}}}})[k/(1-k)]}$$
(1)

where k = Ncases/Nsamples is the proportion of cases in the sample.

Phenotype definition

We considered three phenotype definitions: broad-sense strabismus (based primarily on ICD-10 code H50, non-paralytic strabismus), ET (based primarily on ICD-10 code H50.0) and XT (based primarily on ICD-10 code H50.1).

For datasets with high-quality ICD-10 data (FinnGen, GERA and Estonian Biobank), ICD-10 data were used. For UKB, self-reported phenotypes were used in preference to the ICD-10 codes, because the medical records did not cover the relevant early-life period for these participants. For FinnGen, we used ‘Other strabismus’ (primarily comprising ICD-10 H50, https://risteys.finngen.fi/endpoints/H7_STRABOTH), convergent concomitant strabismus (primarily comprising ICD-10 H50.0, https://risteys.finngen.fi/endpoints/H7_CONVERSTRAB) and divergent concomitant strabismus (primarily comprising ICD-10 H50.1, https://risteys.finngen.fi/endpoints/H7_DIVERGSTRAB). To maximise the sample size of the meta-analysis for broad-sense strabismus, the GERA, EstBB and FinnGen ICD-10-based data were combined with the other GWAS summary statistics based on self-reported strabismus data (which did not distinguish between ET and XT) from UKB (field 6147), AGDS and Lifelines. For the broad-sense strabismus and the ET analyses, we also included a published ET GWAS10, which contains four summary statistics involving accommodative ET (based on combined discovery and replication data) and non-accommodative ET (based on combined discovery and replication data). For the XT meta-analysis, we combined XT GWAS from FinnGen, GERA and the Estonian Biobank. The sample sizes of the input datasets are listed in Supplementary Data S1.

Statistical analyses

GWAS meta-analyses

We combined the 11 summary statistics using the weighted-sum scheme in METAL (release of 5th May 2020; Willer et al.64). Following the METAL documentation, we computed the effective sample size for each input dataset as Neff = 4/(1/Ncases + 1/Nctrls). Any variants in the input GWAS with INFO score < 0.3 and MAF < 0.01 were removed prior to the meta-analysis. For the requirements of the post-GWAS analyses, we calculated the effect size (logOR) and standard error using the equations described in ref. 65:

$${logOR}=\frac{Z}{\sqrt{2\times {{\mathrm{Freq}}}\times (1-{{\mathrm{Freq}}})\times ({{\mathrm{Weight}}}+{Z}^{2})}}$$
(2)
$${SE}=\frac{1}{\sqrt{2\times {{\mathrm{Freq}}}\times (1-{{\mathrm{Freq}}})\times ({{\mathrm{Weight}}}+{Z}^{2})}}$$
(3)

where Freq is the allele frequency and Weight is proportional to the square root of the effective sample size, as per the METAL documentation (http://genome.sph.umich.edu/wiki/Metal_Documentation; Willer et al.64). Given the variation in QC and imputation strategies across the input GWAS summary statistics, variants that were not present in all datasets were included without establishing a specific threshold for the number of studies. In addition, because the X chromosome was excluded from some GWAS sources, we limited our analysis to autosomes to keep the input GWAS consistent.
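The effective sample size and the conversion in Eqs. 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function names, the example Z score and the allele frequency are hypothetical, and the value passed as `weight` simply stands in for the Weight term of Eqs. 2 and 3.

```python
import math


def effective_sample_size(n_cases, n_controls):
    """Neff = 4 / (1/Ncases + 1/Nctrls), as given in the text."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def z_to_log_or(z, freq, weight):
    """Return (logOR, SE) from a Z score following Eqs. 2 and 3.

    freq   : allele frequency (Freq in the equations)
    weight : the Weight term of Eqs. 2 and 3 (illustrative input here)
    """
    denom = math.sqrt(2.0 * freq * (1.0 - freq) * (weight + z * z))
    return z / denom, 1.0 / denom


# Case/control counts of the ET meta-analysis as reported in the Introduction.
neff_et = effective_sample_size(5963, 588794)

# Hypothetical variant: Z = 5.5, allele frequency 0.30 (illustration only).
log_or, se = z_to_log_or(z=5.5, freq=0.30, weight=neff_et)
print(f"Neff = {neff_et:.0f}, logOR = {log_or:.4f}, SE = {se:.4f}")
```

By construction, logOR divided by SE recovers the input Z score, which is a convenient check when applying the conversion to a full set of summary statistics.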

Conditional analysis

We performed a conditional test using GCTA-COJO (v. 1.94.4; Yang et al.66) to identify statistically independent variants associated with strabismus (window of 1 megabase (Mb)). Some of the previously identified strabismus genes may influence strabismus via their effect on refractive error/myopia67; therefore, we applied the mtCOJO approach in GCTA (v. 1.94.4)68 to assess the effect of each SNP on strabismus, accounting for the effect of refractive error. This method performs a conditional analysis on GWAS summary statistics in which the effect of SNPs on the trait of interest (here, strabismus) is conditioned upon a covariate trait (here, refractive error). The LD reference for GCTA-COJO and mtCOJO included 4,990 randomly selected individuals of White British ancestry in UKB66.

Genetic correlation analyses

We conducted LD score regression on the GWAS summary statistics for all three strabismus meta-analyses, first to estimate the SNP-based heritability and then to calculate genetic correlations across strabismus phenotypes69,70. We used a similar procedure to estimate the genetic correlation between the three strabismus traits and refractive error22.

Post-GWAS analyses

The Open Targets Genetics platform (https://genetics.opentargets.org/) was used to annotate each independently associated strabismus variant with its nearby genes34. Since the ET GWAS identified the largest number of genome-wide significant variants, it was used as the primary analysis in the post-GWAS analysis. The effect of the lead ET meta-analysis variants on gene expression was investigated using eQTL data from GTEx (V6, V7 and V8, multi-tissue)71,72,73, BIOSQTL (blood cells)74,75, The Brain eQTL Almanac (Braineac, brain)76, CommonMind Consortium (CMC, brain)77, Database of Immune Cell Expression, Expression quantitative trait loci (eQTLs) and Epigenomics (DICE, immune cells)78, eQTLcatalogue (multi-tissue)79, eQTLGen (multi-tissue)80, EyeGEx (eye)81, PsychENCODE (brain)82 and xQTLServer (dorsolateral prefrontal cortex)83 through the FUMA platform (https://fuma.ctglab.nl/). We conducted a cross-tissue TWAS using UTMOST (v2.0)84. The UTMOST analysis performed single-tissue association tests for 44 GTEx V6 tissues. This was followed by a cross-tissue association test combining the 44 gene-trait associations through the joint generalised Berk-Jones (GBJ) test.

For the PheWAS analysis, the lead SNPs were cross-referenced against the GWAS Atlas21 (https://atlas.ctglab.nl/PheWAS). The results were filtered using a Bonferroni correction for the number of GWASs considered.
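The Bonferroni filter described above can be sketched as follows. The family-wise alpha of 0.05, the record layout, the trait names and the number of GWASs are assumptions made for illustration; the text states only that the correction was based on the number of GWASs considered.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold under a Bonferroni correction."""
    return alpha / n_tests


def filter_phewas(results, n_tests, alpha=0.05):
    """Keep PheWAS records whose P value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(n_tests, alpha)
    return [record for record in results if record["p"] < threshold]


# Hypothetical PheWAS records for one lead SNP (not results from this study).
phewas_results = [
    {"trait": "trait_A", "p": 1e-8},
    {"trait": "trait_B", "p": 1e-3},
    {"trait": "trait_C", "p": 2e-6},
]

# Hypothetical number of GWASs considered for the lead SNP.
n_gwas = 4000
significant = filter_phewas(phewas_results, n_tests=n_gwas)
print([record["trait"] for record in significant])
```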

Mendelian randomisation of maternal smoking and strabismus

We leveraged our large-scale genetic data on strabismus to test the hypothesis that maternal smoking is causally associated with offspring strabismus risk using MR. We used the well-validated smoking SNP rs1051730/rs16969968, located in the nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4, as an instrumental variable85. Each additional allele of rs16969968 in an offspring has been shown to be associated with 1.02-fold higher odds of maternal smoking24. A detailed description of this instrument is given by Yang et al.24. Here we use offspring genotype as a predictor of (i) maternal smoking in pregnancy24 and (ii) strabismus risk in the offspring (the effect size is estimated from the strabismus GWAS in this study). The Wald-ratio test from the TwoSampleMR package in R86 was used to evaluate the effect of maternal smoking on offspring strabismus risk. Given that strabismus is a binary outcome, ORs were converted (by multiplying logORs by 0.693 (ln 2) and then exponentiating) to ORs per doubling in odds, to reflect the average change in strabismus risk per doubling in the odds of maternal smoking. We also tested whether there was evidence for a causal link between smoking and strabismus using the offspring’s genome (see Supplementary Results and Supplementary Data S11).
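The Wald-ratio estimate and the per-doubling conversion described above can be illustrated with a short Python sketch. It is not the TwoSampleMR implementation: the function names are hypothetical, the SNP-outcome effect and its standard error are invented for illustration, and the standard error uses a first-order approximation that ignores uncertainty in the SNP-exposure estimate. Only the SNP-exposure effect (an OR of 1.02 per allele) is taken from the text.

```python
import math


def wald_ratio(beta_outcome, se_outcome, beta_exposure):
    """Single-instrument Wald ratio: SNP-outcome effect / SNP-exposure effect.

    The standard error is a first-order approximation that treats the
    SNP-exposure effect as known without error.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def or_per_doubling(log_or_estimate):
    """Multiply a logOR by ln(2) (about 0.693) and exponentiate."""
    return math.exp(math.log(2.0) * log_or_estimate)


# SNP-exposure effect: OR of 1.02 per allele for maternal smoking (from the text).
beta_exposure = math.log(1.02)

# Hypothetical SNP-outcome logOR and standard error (illustration only).
beta_outcome, se_outcome = 0.004, 0.010

estimate, se = wald_ratio(beta_outcome, se_outcome, beta_exposure)
or_doubling = or_per_doubling(estimate)
ci_lower = or_per_doubling(estimate - 1.96 * se)
ci_upper = or_per_doubling(estimate + 1.96 * se)
print(f"OR per doubling = {or_doubling:.2f} ({ci_lower:.2f} to {ci_upper:.2f})")
```

Because the SNP-exposure effect is small, the ratio amplifies both the estimate and its standard error, which is why single-instrument analyses of this kind tend to produce wide confidence intervals.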

Mendelian randomisation of birth weight and strabismus

To investigate whether there is genetic evidence to support a causal relationship between birth weight and strabismus, we used MR analyses with birth weight as the exposure14,24. We performed MR using SNPs associated with birth weight87, selecting SNP instruments with only foetal effects to avoid potential horizontal pleiotropic effects on strabismus through the maternal genotype (in cases where >1 SNP was chosen from a single locus, we selected only the SNP with the smallest P value so that the SNPs chosen were uncorrelated). Given that these SNPs in the maternal genome do not affect offspring birth weight, they are less likely to influence other offspring outcomes. We excluded SNPs where the structural equation model had indicated a potential issue with model convergence (Supplementary Data S6 in Warrington et al.). We conducted a two-sample MR analysis between birth weight and the three strabismus phenotypes in our analysis using the TwoSampleMR package86, and applied the MR-Egger intercept and MR-PRESSO to check for pleiotropy. We compared the MR OR with the change in OR per 500 g increase in birth weight reported in a published observational study25. Changes in ORs and their confidence intervals were derived from the ratio of ORhigh-weight to ORlow-weight.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.