Abstract
Adolescent idiopathic scoliosis (AIS) is a complex genetic disorder. This study used whole-genome sequencing (WGS) to investigate the genetic basis of AIS in 119 patients from 103 families. Our WGS analysis identified known pathogenic or protein-truncating variants in 15 probands, and other strong or moderate candidate variants in 69 additional patients. We found both coding and non-coding mutations, including structural variants. Candidate genes included known AIS genes (e.g., COL11A2, FBN1) and genes linked to other musculoskeletal disorders with scoliosis (e.g., RYR1). Association analysis confirmed four known AIS single-nucleotide polymorphisms in our cohort. Gene set enrichment analysis revealed four gene clusters related to skeletal muscle contraction, extracellular matrix, and gene expression regulation. This WGS-based approach identified clinically relevant genetic variations and biological pathways in AIS patients, offering valuable insights into its complex development.
Introduction
Adolescent idiopathic scoliosis (AIS) is the most common nondegenerative spinal abnormality, with a prevalence of 1–4%1,2,3. It is diagnosed when the three-dimensional spinal deformity has a Cobb angle of more than 10° in the coronal plane4. Severe progressive curvatures can lead to serious complications such as cardiac or respiratory compromise, back pain, and other degenerative diseases, as well as psychological issues related to cosmetic deformity, which can have a long-term impact on the patient’s quality of life5. Mild scoliosis is often managed with brace therapy combined with physiotherapy, whereas severe cases require surgical treatment.
Despite decades of research, the etiopathogenesis of AIS remains unclear. Several hypotheses regarding AIS etiology have been proposed, involving genetics, central nervous system, biomechanics, metabolic pathways, spinal cord growth, endocrine factors, sex hormones, bone metabolism, and epigenetics6,7. Epidemiological studies have identified high heritability and marked sexual dimorphism in AIS, with significant female predominance1. Multiple studies support a genetic basis for AIS: sibling recurrence risk has been reported as 17.7%, the most recent heritability estimate is 57% (95% CI: 0.29–0.86)8, and the odds of developing AIS are 1.5-fold higher for participants whose mother had scoliosis9. Nevertheless, the rarity of causative genes that have been clearly identified as ‘AIS genes’ and the lack of replication of certain variants between genetic analyses in different study cohorts suggest that AIS is a highly complex polygenic disease that results from the interaction of multiple gene loci and the environment10,11,12.
With the advancement of next-generation sequencing, molecular genetic diagnostic methods using targeted sequencing or whole-exome sequencing (WES) have been widely utilized. Most AIS genetic research has utilized targeted sequencing, WES, and single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS)13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28. However, mutations in regulatory or intronic regions, difficult-to-sequence repetitive sequences, and the presence of novel genes are challenging to uncover with targeted sequencing or WES, and complex genomic changes, including structural variations (SVs), are difficult to identify with SNP-based GWAS. Whole-genome sequencing (WGS), covering the entire genome, allows for a more comprehensive exploration of the extended genetic landscape. The clinical utilization of WGS has been limited due to high costs and excessive data; however, a decline in the costs of high-throughput sequencing and advances in bioinformatic analyses have increased the potential for WGS to be used in clinical practice29.
In this study, we comprehensively explored the genetic landscape in 103 probands with AIS utilizing WGS. For WGS analysis, we implemented a WGS analysis pipeline powered by RareVision™, consisting of automated algorithms and manual curation by specialists in genomic medicine. The primary objective of this study was to identify new candidate pathways and functional categories associated with scoliosis by analyzing a large cohort of AIS patients.
Results
Baseline characteristics
The cohort comprised 204 samples from 103 families, consisting of 103 AIS probands and 101 relatives (16 singletons, 65 mother-only duos, 8 father-only duos, and 14 trios). Among the 103 probands, 81 patients (78.6%) were female, whereas 22 (21.4%) were male. Additionally, 87 patients (84.5%) had idiopathic AIS, whereas 16 (15.5%) had familial AIS with an affected mother. Therefore, the analysis of candidate variants included 119 AIS patients (103 probands and 16 affected mothers) from 103 families.
All probands were diagnosed with scoliosis under the age of 18, by a Cobb angle measurement exceeding 10°. Detailed measurements were available for a subset of de-identified patients (n = 62, Supplementary Data 2), showing an average of 21.6° (1.45°–53.1°) proximal, 40.2° (5.26°–100°) thoracic, and 32.3° (2.19°–85.4°) lumbar Cobb angle at the time of their latest evaluation.
Our cohort was recruited at 13 sites across the continental US and Hawaii. Race and ethnicity data were not available for the participants. We used principal component analysis (PCA) to explore the genetic ancestry of the families in the cohort. The majority of samples in the cohort clustered adjacent to European and Admixed American samples from the 1000 Genomes project (Fig. 1).
Principal component analysis (PCA) results of the SHC AIS cohort (represented in pink) combined with 1000GP. a PC1 vs. PC2 of Parents (N = 101). b PC1 vs. PC2 of Probands (N = 103). c PC1 vs. PC3 of Parents. d PC1 vs. PC3 of Probands. Explained variance for parents: PC1 = 33.1%, PC2 = 25.6%, PC3 = 12.4%. Explained variance for probands: PC1 = 33.1%, PC2 = 25.6%, PC3 = 12.7%. 1000GP populations (N = 3202): AFR, African (N = 893); AMR, Admixed American (N = 490); EAS, East Asian (N = 585); EUR, European (N = 633); SAS, South Asian (N = 601).
Variant filtration and gene ontology analysis
The average genomic coverage was 40.7×, and the average duplication rate was 7.9%. After filtering for minor allele frequency and predicted effect (see Methods), we identified 2463 rare variants classified as pathogenic (P), likely pathogenic (LP), or of uncertain significance (VUS) in 2122 genes (Supplementary Data 1). Missense variants constituted the majority of filtered variants (79.4%), followed by frameshift (8.9%), stop gain, stop loss, or start loss (7.2%), splicing/regulatory site (2.6%), and structural/copy number variants (1.8%).
To understand the biological functions of identified genes that were overrepresented with damaging variants within the AIS families, we performed an over-representation analysis based on the filtered gene list (Supplementary Data 1). The most overrepresented GO Cellular Component categories (Table 1) were “laminin-11 complex”, “laminin-10 complex”, “t-UTP complex” (p = 1.16 × 10⁻³, 9.51 fold enrichment), and “inner dynein arm” (p = 6.99 × 10⁻⁵, 7.93 fold enrichment). The most overrepresented GO Molecular Function (Table 2) categories were “alpha-1,4-glucosidase activity” (p = 1.28 × 10⁻⁵, 9.51 fold enrichment), followed by “serotonin:sodium:chloride symporter activity”, “inositol 1,4,5-trisphosphate-gated calcium channel activity”, and “carnitine O-octanoyltransferase activity” (p = 1.16 × 10⁻³, 9.51 fold enrichment). Figure 2 displays a visual representation of Enrichr and the top ten clusters in terms of p-value, which were linked to binding activities. The top enriched GO Biological Process (Fig. 2) terms were “supramolecular fiber organization” (p = 9.35 × 10⁻¹²) and “extracellular matrix organization” (p = 1.42 × 10⁻¹⁰). Furthermore, the top enriched KEGG term was “ECM-receptor interaction” (p = 3.02 × 10⁻⁹).
Top 10 enriched categories identified via Enrichr. Analysis was performed using the list of 2123 genes found in Supplementary Data 1. Biological process GO terms are represented in pink, Kyoto Encyclopedia of Genes and Genomes (KEGG) terms in gray.
Candidate variants
We then performed variant curation for each pedigree, prioritizing variants based on predicted effect, family segregation, and association between the affected gene and AIS (see Methods for details). Based on the strength of the highest-priority variant(s) (as described in Table 3), we assigned each case to one of four groups: solved, near-solved pending missing parent genotype, case with strong candidate variant, and case with moderate candidate variant. All remaining cases were considered unsolved. Out of 103 cases, 15 were solved or near-solved, 21 had at least one strong candidate, 48 had at least one moderate candidate, and 19 were unsolved.
Detailed curation results and candidate variant information for the 84 positive cases are shown in Supplementary Data 3. In total, 130 candidate variants were identified, all of which were heterozygous. These included 10 de novo variants, 4 compound heterozygous variants, 17 variants inherited from an affected parent, 8 variants inherited from a parent with unknown affection, and 92 variants with unknown inheritance due to unavailable parental genotypes. Candidate variants included point mutations in coding and regulatory regions, as well as structural variants, showing the utility of WGS.
Candidate variants were identified within 116 genes/loci (Table 4), of which 103 were unique to one family, nine (FBN2, HSPG2, MYH2, NSD2, RBM8A, ROR2, SACS, SYNE1, TRIO) were shared between two families, and one (RYR1) was shared between four families. Candidate genes included known AIS genes COL11A2, FBN1, FBN2, HSPG2, and KIF7, as well as genes associated with other musculoskeletal and developmental syndromes where scoliosis is a known phenotype, such as RYR1 and MYH2.
To understand the connections between the candidate genes, we created an interaction map using the STRING database30. We saw that 96 out of 116 candidate genes had at least one interactor (Supplementary Fig. 1). Furthermore, we identified 4 interconnected clusters that included the majority of the candidate genes (Supplementary Figs. 2, 3). Cluster 1 included extracellular matrix (ECM) proteins, such as COL11A2, FBN1, HSPG2, and LAMA2. Cluster 2 included proteins involved in muscle contraction, such as ACTN2, MYBPC1, MYH2, and SCN4A. Cluster 3 included Joubert syndrome genes, several of which are involved in ciliogenesis, including ATP7B, CPLANE1, SPATA5, and TMEM67. Cluster 4 included DNA-interacting proteins, including gene expression regulators, such as ARID1B, DNMT3A, MED13L, and TCF12.
Allelic test of association for SNPs
We attempted to compare SNPs between our patient cohort and a general population cohort of similar ethnic background to determine their association with AIS disease. Pérez-Machado et al.7 proposed SNPs with the most statistically significant associations with AIS using 10 previous studies and 1387 published associations reported up to 2019. We performed allelic association tests in our patient cohort for the 15 SNPs proposed by Pérez-Machado et al. Table 5 lists the genes and SNPs of these results and shows a significant association of the genetic variant rs12946942, located between SOX9 and KCNJ2, with AIS. A significant association of loci (rs10756785 and rs3904778) of the BNC2 gene and locus (rs7633294) of MAGI1 with AIS was also found.
Representative cases
We present two representative cases in which WGS identified the causative gene of AIS. Patient 14-0356 was sequenced as a trio with unaffected parents and carried a ~3.0 Mbp deletion encompassing four low-copy repeats (LCR) A to D in the chromosome 22q11.2 region (Fig. 3). Comparison of coverage of the 22q11 region in this family showed that this variant was a de novo deletion that occurred only in the proband. 22q11.2 proximal deletions are associated with DiGeorge syndrome (MIM 188400), which presents with several physiological and anatomical phenotypes, including scoliosis31. The TBX1 gene is located at region 22q11.2, and haploinsufficiency of TBX1 has been identified as the driver of most of the physical malformations in patients with DiGeorge syndrome32.
a Schematic overview of the chromosome 22q11.2 region from the UCSC Genome Browser. The region of the 22q11.2 chromosomal deletion in this patient spans ~3 Mb and contains four low copy repeats (LCR22A-D) distributed across this region. b Coverage of chromosome 22q11.2 region in this trio family (proband in purple, mother in yellow, and father in blue). Y-axis shows read depth and X-axis shows position in Kbs. LCR22A-D regions are shown in red boxes, and the de novo deletion region in proband is shown with the bar.
Patient 14-0105 was also a trio with unaffected parents, and the genetic results are shown in Fig. 4. A frameshift variant (p.Glu1344ArgfsTer91) in the NSD2 gene on chromosome 4 (chr4:1978831:G>GC) was identified with WGS. This variant was not found in the patient’s parents, indicating that it was a de novo event. Haploinsufficiency due to de novo heterozygous frameshift and missense variants in NSD2 has been reported to cause a diverse spectrum of skeletal abnormalities, including scoliosis33.
Discussion
In the present study, we investigated the genetic landscape in 103 AIS probands utilizing WGS. To the best of our knowledge, this is the largest study to explore the comprehensive genomic landscape of an AIS cohort using WGS.
The lack of overlap in specific genes between different families suggests that AIS is a heterogeneous disorder involving variants in multiple genes, and that a combination of genetic/epigenetic and environmental factors may be involved in the expression of different phenotypes. GO enrichment analysis revealed that the genes identified in this AIS cohort are mainly involved in “extracellular matrix organization”. Allelic association tests showed that SNP risk alleles located between SOX9 and KCNJ2, within BNC2, and near MAGI1 were significantly associated with AIS. A majority of the candidate genes in our list were associated with other complex syndromes with severe phenotypes, including but not limited to scoliosis. This could indicate that patients diagnosed with AIS may be presenting with milder cases of more severe syndromes.
Genes related to ECM organization were the most enriched category in our dataset. The ECM is a complex and dynamic component of all tissues, comprising components such as fibrillar proteins, glycosaminoglycans, proteoglycans, minerals, and related proteins34. ECM-related candidate genes we identified in our patient cohort include HSPG2, FBN1, and FBN2, which have also been previously associated with AIS35,36, as well as several collagen genes COL5A1, COL5A2, COL6A3, and COL11A2. Previous research has suggested that the main pathogenesis of AIS is asymmetrical bone growth37, and histological studies have shown decreased chondrogenesis, disorganized columnation, and premature cessation of growth in the cartilaginous growth plate of the vertebral body38. It has been proposed that growth plate chondrocytes indirectly regulate cellular activities, such as shape and volume preferential changes, through alterations in ECM synthesis and degradation39. These previous studies in individuals with AIS, coupled with our genetic findings, suggest that ECM organization genes may play an important role in the development of abnormal spinal curvature.
We also identified several candidate mutations in genes related to muscle contraction, including MYH2, RYR1, MYBPC1, ACTN2, and TPM3. There have been various studies on the asymmetry of the paraspinal muscles in patients with AIS, and recent literature has reported dominance of electromyography activity on the convex side of the scoliotic curve40,41,42. The genetic findings from our group and others support a functional role for muscle contraction in AIS etiology. Although the pathogenesis of AIS is still not fully understood, we speculate that the asymmetry of the activity of structures involved in paraspinal muscle contraction as well as the ECM comprising the growth plate of the vertebral body are intricately connected within musculoskeletal tissues and are involved in the biomechanical system, which is one of the important etiologies of AIS.
Most previous research on the genetic basis of AIS has consisted of genome-wide association studies, which identified a number of SNPs associated with AIS. Although we had a small cohort, we were able to leverage the ethnic background of our participants to replicate the association of 4 known AIS SNPs. The rs12946942 locus at 17q24.3 is correlated with curve severity43. It is located between SOX9, a pivotal transcription factor involved in skeletal development and ECM remodeling44,45, and KCNJ2, which encodes a potassium channel involved in muscle contraction and is associated with skeletal malformations including progressive scoliosis46,47. The rs7633294 locus at 3p14.1 is located near MAGI1, which encodes a scaffolding molecule important for the stabilization of cadherin-mediated cell interactions47,48. The loci rs10756785 and rs3904778 are located within BNC2, which encodes a highly conserved zinc-finger protein involved in ECM regulation49,50,51,52. These findings further support that the regulation of ECM structure and muscle contraction through transcriptional mechanisms plays a significant role in AIS. Given the intricacy of these systems, identifying functional regulatory variants is crucial to understanding the precise mechanisms that lead to disordered phenotypes. Continued efforts in utilizing WGS combined with regulatory variant annotation have been successful in identifying candidate causative variants in other complex disorders like cardiomyopathy and autism spectrum disorder53,54, and similar approaches should be employed to better understand the etiology of AIS.
Of the 130 candidate variants we identified, 20 were located within non-coding regions, including 10 structural variants, 6 splice site variants, and 4 regulatory region variants. The identification of intronic variants and SVs requires sequencing of non-coding regions of the genome. Panel-based sequencing and WES are limited in their ability to detect such variants, particularly those with breakpoints in intronic or intergenic regions, whereas WGS can overcome this limitation by covering all exonic and intronic regions of the genome55,56. We believe that genetic approaches using WGS in these populations will help us understand the different phenotypes of patients in clinical practice and characterize the genetic components of AIS pathophysiology. It is noteworthy that a variety of syndromic genes were identified in our patient cohort, suggesting that AIS may be a milder presentation of more severe syndromes. While many studies have reported an important role for genetics in the initiation of AIS, environmental factors and epigenetic influences are also deeply involved in its progression7,57.
This study had some limitations. First, the study population was predominantly of European and Admixed American ancestry. To capture the complete genetic landscape of AIS, WGS studies of larger cohorts including individuals with diverse ancestry will be required. Second, the majority of our identified variants are heterozygous, and without functional testing, it is unclear whether these variants have a true dominant-negative effect on the resulting protein. Additional functional studies of specific variants will be warranted to demonstrate causality. Third, clinical phenotype data, such as Cobb angle and location of the curve, are lacking for some patients. Further studies to determine the association between clinical manifestations and genotype will provide a deeper understanding of the role of genetic mechanisms underlying AIS. Despite these limitations, to the best of our knowledge the present study is the largest investigation of an AIS cohort using WGS.
In conclusion, this study investigated the genetic landscape in the largest AIS cohort utilizing WGS. We observed an overrepresentation of variants in ECM organization and muscle contraction categories, with few specific genes shared across families. Overall, this study revealed that the genetic etiology of AIS is highly polygenic and that WGS has a greater potential to detect a wider spectrum of genetic variants, such as SVs and intronic variants. Further research with clinical phenotypes and functional studies of genetic variants is required to draw definitive conclusions regarding the genetic mechanisms underlying AIS pathogenesis.
Methods
Study subjects
This is a retrospective analysis of patients with AIS at the Research for Precision Medicine study at Shriners Hospitals for Children (SHC) – The Genome Institute. The study was approved by the Institutional Review Board of SHC (IRB approval number: 9-21-2022) and adhered to the tenets of the Declaration of Helsinki. Patients were recruited from 2019 to 2022 at 13 SHC locations, including 12 locations in the continental US and one in Hawaii. All patients included in this study were under 18 years old at the time of their first visit at SHC. Clinical diagnosis of scoliosis was made based on a measured Cobb angle exceeding 10°. All patients underwent an orthopedic examination to confirm the idiopathic nature of scoliosis. Patients with congenital scoliosis and possible conditions that cause neuromuscular scoliosis, including cerebral palsy, Duchenne muscular dystrophy, myelomeningocele, spinal muscular atrophy, or Friedreich’s ataxia, were excluded from the study. The radiographic diagnosis of scoliosis relied on the presence of a lateral curve to the spine greater than 10° with vertebral rotation58. The demographic data and clinical phenotypes were retrieved from the electronic medical records. A pedigree investigation was conducted for all patients, and the principle of performing genetic testing for all available parents was followed. Genome sequencing was performed as singleton, duo, and trio sequencing, depending on the availability of biological parental samples. All individuals provided informed consent prior to the genetic analyses.
Whole-genome sequencing and bioinformatics analysis
To obtain genomic DNA, peripheral blood samples were collected from probands with or without their parents. Shriners Genomics Institute performed the entire genome sequencing process, and analysis and interpretation were performed using the RareVision™ system (Inocras Inc., San Diego, CA, USA). Genomic DNA was extracted from blood samples using the Allprep DNA/RNA kits (Qiagen, Venlo, Netherlands). DNA libraries were prepared using TruSeq DNA PCR-Free Library Prep Kits (Illumina, San Diego, CA, USA) and sequenced on the Illumina NovaSeq6000 platform (Illumina) with an average depth of coverage of 30×. The obtained genome sequences were aligned to the human reference genome (GRCh38) using the BWA-MEM algorithm. PCR duplicates were removed using SAMBLASTER. The initial mutation calling for base substitutions and short indels was performed using HaplotypeCaller2 and Strelka2, respectively. Structural variations were identified using Manta. Variants were annotated using the Ensembl Variant Effect Predictor. Annotations included population allele frequencies from 1000 Genomes Project (1000GP)59, The Genome Aggregation Database (gnomAD)60, and The Exome Aggregation Consortium (ExAC)61, and predictions of variant deleteriousness from BayesDel, DEOGEN2, FATHMM, LIST-S2, LRT, M-CAP, MetaLR, MetaRNN, MetaSVM, MutationAssessor, MutationTaster, PolyPhen-2, PROVEAN, PrimateAI, and SIFT4G.
Pathogenicity was defined based on the clinical information and genetic variant information of the test subjects, following the guidelines published by the American College of Medical Genetics and Genomics (ACMG)62, ClinVar annotations, and functional prediction scores. Variants were reported as (1) pathogenic (P), (2) likely pathogenic (LP), (3) of uncertain significance (VUS), or (4) benign/likely benign (B/LB).
Variant filtration
To focus on rare deleterious variants, we first excluded variants with minor allele frequency (MAF) > 0.1% in public databases, including any available subpopulations. We also excluded any variants found in any unaffected person in our cohort.
Pathogenicity was defined using clinical information and genetic variant information of the test subjects, based on VEP and ClinVar annotations. Structural variants affecting gene integrity and variants with “HIGH” VEP impact rating were grouped as protein-truncating variants (PTVs). Following the guidelines published by the American College of Medical Genetics and Genomics (ACMG)62, each variant was classified under pathogenic/likely pathogenic (P/LP) or benign/likely benign (B/LB). To be stringent, variants with contradictory criteria were classified under B/LB. All remaining variants were classified as of uncertain significance (VUS). The following variants were excluded from further analysis: All B/LB variants, in-frame insertions and deletions, variants with VEP impact ratings “LOW” or “MODIFIER”, and variants not found in all affected family members (if available).
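The exclusion rules above can be expressed as a single predicate. The following Python is a minimal sketch, assuming each variant has already been annotated; the `Variant` record and its field names are hypothetical and are not taken from the RareVision™ pipeline, and special handling of structural or regulatory variants is not modeled.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.001  # 0.1%, the threshold stated in the text


@dataclass
class Variant:
    # All field names are illustrative assumptions.
    max_pop_maf: float           # highest MAF across public databases and subpopulations
    seen_in_unaffected: bool     # observed in any unaffected person in the cohort
    acmg_class: str              # "P/LP", "VUS", or "B/LB"
    vep_impact: str              # "HIGH", "MODERATE", "LOW", or "MODIFIER"
    consequence: str             # e.g. "missense_variant", "inframe_deletion"
    in_all_affected: bool = True  # present in every affected relative with data


def passes_filter(v: Variant) -> bool:
    """Return True if the variant survives the exclusion rules described above."""
    if v.max_pop_maf > MAF_CUTOFF:
        return False  # too common in a public database
    if v.seen_in_unaffected:
        return False  # found in an unaffected cohort member
    if v.acmg_class == "B/LB":
        return False  # benign or likely benign
    if v.consequence in {"inframe_insertion", "inframe_deletion"}:
        return False  # in-frame indels are excluded
    if v.vep_impact in {"LOW", "MODIFIER"}:
        return False  # low predicted impact
    return v.in_all_affected  # must segregate with all affected family members
```

For instance, a rare VUS missense variant with `max_pop_maf=0.0002` passes, while the same variant at `max_pop_maf=0.01` does not.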
Candidate variant prioritization
To prioritize candidate variants, we considered (A) predicted effect of the mutation, (B) previously reported associations between the gene and AIS, and (C) familial segregation.
For (A) we used the VEP annotations and ACMG classifications to prioritize variants in the following order: (1) P/LP variants, followed by VUS with (2) strong, (3) moderate, or (4) low evidence to support a deleterious effect, and (5) variants with no evidence of deleterious effect. For (2), we considered copy number variants, protein-truncating variants (PTVs), and variants with ClinVar pathogenic/likely pathogenic classification. PTVs included variants with “HIGH” VEP impact rating and structural variants affecting gene integrity. For (3) and (4), we considered VUS missense variants with pathogenic classifications by at least 3 computational prediction tools. If 2/3 or more of the available predictions were pathogenic, this was considered moderate evidence. If between 1/3 and 2/3 of the available predictions were pathogenic, this was considered low evidence. All remaining variants were grouped under (5). We created a filtered variant file including all variants that fit under criteria 1–4 (Supplementary Data 1), and all further prioritization was done using this table.
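The ordering of tiers (1) to (5) can be illustrated with a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, "at least 3 tools" is read as at least three pathogenic calls, and both fraction boundaries (1/3 and 2/3) are treated as inclusive, which the text does not specify.

```python
from fractions import Fraction


def evidence_tier(acmg_class, is_ptv_or_cnv, clinvar_plp, n_pathogenic, n_available):
    """Map a variant to tiers 1-5 following criterion (A).

    n_pathogenic: number of prediction tools calling the variant pathogenic
    n_available:  number of tools that returned any prediction
    """
    if acmg_class == "P/LP":
        return 1  # pathogenic or likely pathogenic by ACMG
    if is_ptv_or_cnv or clinvar_plp:
        return 2  # VUS with strong evidence
    if n_available > 0 and n_pathogenic >= 3:
        frac = Fraction(n_pathogenic, n_available)  # exact arithmetic at the boundaries
        if frac >= Fraction(2, 3):
            return 3  # VUS with moderate evidence
        if frac >= Fraction(1, 3):
            return 4  # VUS with low evidence
    return 5  # no evidence of deleterious effect


# A missense VUS called pathogenic by 10 of 15 tools lands in tier 3.
print(evidence_tier("VUS", False, False, 10, 15))
```

Using `Fraction` instead of floating-point division avoids rounding surprises when a variant sits exactly on the 1/3 or 2/3 boundary.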
For (B), we employed a filter using the Human Phenotype Ontology (HPO) term “Scoliosis” and conducted a gene-based review in OMIM and PubMed. Genes were sorted into 5 categories and were prioritized based on the following order: (1) Known AIS genes, (2) autosomal dominant (AD) syndromic genes, (3) autosomal recessive (AR) syndromic genes, (4) AIS-associated genes, and (5) AIS-unassociated genes. Known AIS genes include 13 genes with strong evidence for AIS association in previous studies (ADGRG6/GPR12663, AKAP264, BNC249, CHD765,66, COL11A167, COL11A268, FAT369, FBN135, FBN235, HSPG236, KIF770, LBX171, and POC572). AD/AR syndromic genes are associated with an OMIM-listed genetic syndrome where scoliosis is a reported symptom or common comorbidity. AIS-associated genes include genes with previous evidence for association with AIS, other skeletal abnormalities, or other known AIS genes. All remaining genes, for which we have not found any relevant evidence, were considered AIS-unassociated.
For (C) familial segregation, variants were categorized based on family type and mode of inheritance. Our cohort included trios, duos, and singletons. Note that the phenotype information was unavailable for some participating parents. These cases were evaluated based on the number of parents with known phenotypes. The following modes of inheritance were considered: De novo variants were absent in both unaffected parents and heterozygous in the proband. Inherited homozygous variants were heterozygous in both unaffected parents and homozygous in the proband. Inherited heterozygous variants were absent in the unaffected parent and heterozygous in the affected parent and the proband. Compound heterozygous variants were defined as two different inherited heterozygous or de novo variants present in the same gene in trans. For unknown parental genotypes and phenotypes, all possible options were considered, and the candidate variants were evaluated for each combination to select the strongest option. Variants that did not fit the criteria for de novo, heterozygous, homozygous, or compound heterozygous based on available parental genotypes and phenotypes were considered non-segregating.
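The single-variant inheritance definitions above can be sketched for the simplest situation, a complete trio with known parental phenotypes. The encoding (genotypes as alternate-allele counts 0, 1, or 2) and the function name are assumptions made for illustration; compound heterozygosity, which requires phasing two variants in the same gene, and the enumeration of options for missing parents are deliberately left out.

```python
def segregation(proband_gt, mother_gt, father_gt, mother_affected, father_affected):
    """Classify one variant in a complete trio.

    Genotypes are alternate-allele counts: 0 = absent, 1 = heterozygous, 2 = homozygous.
    """
    parents = [(mother_gt, mother_affected), (father_gt, father_affected)]
    both_unaffected = not mother_affected and not father_affected

    # De novo: heterozygous in the proband, absent in both unaffected parents.
    if proband_gt == 1 and both_unaffected and mother_gt == 0 and father_gt == 0:
        return "de novo"

    # Inherited homozygous: both unaffected parents are heterozygous carriers.
    if proband_gt == 2 and both_unaffected and mother_gt == 1 and father_gt == 1:
        return "inherited homozygous"

    # Inherited heterozygous: carried by the affected parent, absent in the unaffected one.
    if proband_gt == 1:
        for (gt_a, aff_a), (gt_b, aff_b) in (parents, parents[::-1]):
            if gt_a == 1 and aff_a and gt_b == 0 and not aff_b:
                return "inherited heterozygous"

    return "non-segregating"
```

A heterozygous variant transmitted by an unaffected parent, for example `segregation(1, 1, 0, False, False)`, falls through to "non-segregating" under these rules.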
Based on these criteria, the candidate genes were sorted into categories as described in Table 3 as follows: 1) solved case, 2) nearly solved case, requires missing parent genotype, 3) case with strong candidate variant, 4) case with moderate candidate variant.
Principal component analysis
Principal component analysis (PCA) was carried out in PLINK version 1.90b6.1173,74 using Phase 3 1000GP data59. The same analysis was carried out for both probands and their parents separately. VCF files were converted into PLINK format. Autosomal variants were filtered for Hardy–Weinberg equilibrium (p < 0.001), MAF > 5%, and maximum missing genotype rate of 25%. AIS cohort and 1000GP data were then merged and pruned to remove variants with MAF < 10%, missing genotype rate greater than 5%, and pruned for linkage disequilibrium (LD) using PLINK --indep-pairwise 50 5 0.15. Triallelic and palindromic variants were also removed. PCA was run in PLINK using the --pca flag, and the first two principal components were plotted in R.
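The filtering, merging, pruning, and PCA steps described above map naturally onto a sequence of PLINK 1.9 invocations. The Python below only assembles the command lines and does not run PLINK. The file prefixes (`ais_cohort`, `kgp_phase3`, and the derived output names) are hypothetical, the mapping of each stated threshold to a flag is our reading of the text rather than the authors' actual commands, and the removal of triallelic and palindromic variants is omitted.

```python
import shlex


def plink_pca_commands(cohort="ais_cohort", reference="kgp_phase3"):
    """Return PLINK 1.9 command lines (as argument lists) for the PCA workflow."""
    # Step 1: per-dataset QC on autosomal variants (HWE, MAF > 5%, missingness <= 25%).
    qc = ["plink", "--bfile", cohort, "--autosome",
          "--hwe", "0.001", "--maf", "0.05", "--geno", "0.25",
          "--make-bed", "--out", f"{cohort}_qc"]
    # Step 2: merge the cohort with the 1000GP reference panel.
    merge = ["plink", "--bfile", f"{cohort}_qc", "--bmerge", reference,
             "--make-bed", "--out", "merged"]
    # Step 3: stricter MAF and missingness filters, then LD pruning.
    prune = ["plink", "--bfile", "merged", "--maf", "0.10", "--geno", "0.05",
             "--indep-pairwise", "50", "5", "0.15", "--out", "merged_prune"]
    # Step 4: PCA restricted to the pruned-in variant set.
    pca = ["plink", "--bfile", "merged", "--extract", "merged_prune.prune.in",
           "--pca", "--out", "merged_pca"]
    return [qc, merge, prune, pca]


if __name__ == "__main__":
    for cmd in plink_pca_commands():
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.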
Gene ontology analysis
Genes that include P/LP, strong VUS, moderate VUS, or low VUS were selected for the gene ontology analysis (see Supplementary Data 1). We performed a molecular pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG)75 to determine how candidate genes are networked and enriched within the pathways. Gene Ontology (GO)76,77 terms derived from cellular component and molecular function were also annotated to explore protein-protein interactions. GO enrichment analysis by PANTHER is accessible at https://geneontology.org/. The significance of an enrichment result was set at false discovery rate (FDR) < 0.05 for all functional annotations. The same input gene lists as used for GO enrichment analysis were used in Enrichr78, accessible at https://maayanlab.cloud/enrichr-kg. We report results from the 2021 GO term Biological Process and 2021 KEGG Human pathways.
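The fold enrichment and p-value reported for each term in an over-representation analysis can be illustrated with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. This is a generic sketch and not the PANTHER or Enrichr implementation; the function names are hypothetical, the counts in the usage example are made up, and the FDR correction applied in the study is not shown.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed fraction of term genes in the input list over the background fraction.

    k: input genes annotated to the term      n: size of the input gene list
    K: background genes annotated to the term N: size of the background
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N, K of them annotated."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: 2 of 3 input genes carry a term held by 4 of 10 background genes.
print(fold_enrichment(2, 3, 4, 10))
print(hypergeom_upper_tail(2, 3, 4, 10))
```

Exact integer binomials are used so the tail probability stays accurate even for very small p-values, at the cost of speed on large backgrounds.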
SNP association analysis
The population of each proband was inferred by k-means clustering (KNN). Briefly, KNN was run on the first 3 principal components from the PCA output, using 1000GP data as the training set, and assuming k = 5 for the 5 superpopulations in 1000GP (African or AFR, Admixed American or AMR, East Asian or EAS, European or EUR, and South Asian or SAS). For both probands and 1000GP samples, only individuals with AMR or EUR backgrounds were included in the association analysis, as other population groups were rare in our cohort. An allelic test of association was performed for 15 SNPs previously associated with AIS7 using PLINK version 1.90b6.
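An allelic test of association compares allele counts, not genotype counts, between cases and controls in a 2 × 2 table. The sketch below implements the textbook 1-degree-of-freedom chi-square version with an odds ratio; whether this matches the exact PLINK option used in the study is an assumption, and the allele counts in the usage example are invented.

```python
from math import erfc, sqrt


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (chi-square, p-value, odds ratio) for a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# 50 cases (100 alleles) and 50 controls (100 alleles), hypothetical counts.
chi2, p, odds = allelic_test(30, 70, 20, 80)
print(f"chi2={chi2:.3f} p={p:.3f} OR={odds:.3f}")
```

Each diploid individual contributes two alleles, so the table totals are twice the number of cases and controls; the function assumes all four cells are non-zero.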
Data availability
The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to the ethical and private nature of the data.
Code availability
The RareVision™ algorithm and associated in-house scripts are proprietary to Inocras Inc.
References
Rogala, E. J., Drummond, D. S. & Gurr, J. Scoliosis: incidence and natural history. A prospective epidemiological study. J. Bone Jt. Surg. 60, 173–176 (1978).
Asher, M. A. & Burton, D. C. Adolescent idiopathic scoliosis: natural history and long term treatment effects. Scoliosis 1, 2 (2006).
Soucacos, P. N., Soucacos, P. K., Zacharis, K. C., Beris, A. E. & Xenakis, T. A. School-screening for scoliosis. A prospective epidemiological study in northwestern and central Greece. J. Bone Jt. Surg. 79, 1498–1503 (1997).
Weinstein, S. L., Dolan, L. A., Cheng, J. C., Danielsson, A. & Morcuende, J. A. Adolescent idiopathic scoliosis. Lancet 371, 1527–1537 (2008).
Dunn, J. et al. Screening for adolescent idiopathic scoliosis: evidence report and systematic review for the US preventive services task force. JAMA J. Am. Med. Assoc. 319, 173 (2018).
Barton, C. B. & Weinstein, S. L. Adolescent idiopathic scoliosis: natural history. In Pathogenesis of Idiopathic Scoliosis, 27–50 (Springer, 2018). https://doi.org/10.1007/978-4-431-56541-3_2.
Pérez-Machado, G. et al. From genetics to epigenetics to unravel the etiology of adolescent idiopathic scoliosis. Bone 140, 115563 (2020).
Cheng, T., Einarsdottir, E., Kere, J. & Gerdhem, P. Idiopathic scoliosis: a systematic review and meta-analysis of heritability. EFORT Open Rev. 7, 414–421 (2022).
Watanabe, K. et al. Physical activities and lifestyle factors related to adolescent idiopathic scoliosis. J. Bone Jt. Surg. 99, 284–294 (2017).
Terhune, E. A. et al. Whole exome sequencing of 23 multigeneration idiopathic scoliosis families reveals enrichments in cytoskeletal variants, suggests highly polygenic disease. Genes 12, 922 (2021).
Xia, C. et al. Rare variant of HSPG2 is not involved in the development of adolescent idiopathic scoliosis: evidence from a large-scale replication study. BMC Musculoskelet. Disord. 20, 24 (2019).
Takahashi, Y. et al. Replication study of the association between adolescent idiopathic scoliosis and two estrogen receptor genes. J. Orthop. Res. 29, 834–837 (2011).
Sharma, S. et al. Genome-wide association studies of adolescent idiopathic scoliosis suggest candidate susceptibility genes. Hum. Mol. Genet. 20, 1456–1466 (2011).
Sharma, S. et al. A PAX1 enhancer locus is associated with susceptibility to idiopathic scoliosis in females. Nat. Commun. 6, 6452 (2015).
Zhu, Z. et al. Genome-wide association study identifies new susceptibility loci for adolescent idiopathic scoliosis in Chinese girls. Nat. Commun. 6, 8355 (2015).
Zhu, Z. et al. Genome-wide association study identifies novel susceptible loci and highlights Wnt/beta-catenin pathway in the development of adolescent idiopathic scoliosis. Hum. Mol. Genet. 26, 1577–1583 (2017).
Takahashi, Y. et al. A genome-wide association study identifies common variants near LBX1 associated with adolescent idiopathic scoliosis. Nat. Genet. 43, 1237–1240 (2011).
Liu, S. et al. Genetic polymorphism of LBX1 is associated with adolescent idiopathic scoliosis in Northern Chinese Han population. Spine 42, 1125–1129 (2017).
Liu, G. et al. Genetic polymorphisms of GPR126 are functionally associated with PUMC classifications of adolescent idiopathic scoliosis in a Northern Han population. J. Cell. Mol. Med. 22, 1964–1971 (2018).
Liu, G. et al. Genetic polymorphisms of PAX1 are functionally associated with different PUMC types of adolescent idiopathic scoliosis in a northern Chinese Han population. Gene 688, 215–220 (2019).
Man, G. C.-W. et al. Replication study for the association of GWAS-associated Loci with adolescent idiopathic scoliosis susceptibility and curve progression in a Chinese population. Spine 44, 464–471 (2019).
Qin, X. et al. Genetic variant of GPR126 gene is functionally associated with adolescent idiopathic scoliosis in Chinese population. Spine 42, E1098–E1103 (2017).
Xu, L. et al. Genetic variant of PAX1 gene is functionally associated with adolescent idiopathic scoliosis in Chinese population. Spine 43, 492–496 (2018).
Xu, L. et al. Genetic variant of BNC2 gene is functionally associated with adolescent idiopathic scoliosis in Chinese population. Mol. Genet. Genomics 292, 789–794 (2017).
Wu, Z. et al. Genetic variants of CHD7 are associated with adolescent idiopathic scoliosis. Spine 46, E618–E624 (2021).
Li, Y. et al. Genetic variant of TBX1 gene is functionally associated with adolescent idiopathic scoliosis in Chinese population. Spine 46, 17–21 (2021).
Grauers, A. et al. Candidate gene analysis and exome sequencing confirm LBX1 as a susceptibility gene for idiopathic scoliosis. Spine J. 15, 2239–2246 (2015).
Nada, D., Julien, C., Samuels, M. E. & Moreau, A. A replication study for association of LBX1 locus with adolescent idiopathic scoliosis in French–Canadian population. Spine 43, 172–178 (2018).
Pennisi, E. Upstart DNA sequencers could be a ‘game changer’. Science 376, 1257–1258 (2022).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Morava, E., Lacassie, Y., King, A., Illes, T. & Marble, M. Scoliosis in velo-cardio-facial syndrome. J. Pediatr. Orthop. 22, 780–783 (2002).
Du, Q., de la Morena, M. T. & van Oers, N. S. C. The genetics and epigenetics of 22q11.2 deletion syndrome. Front. Genet. 10, 1365 (2020).
Wiel, L. C., Bruno, I., Barbi, E. & Sirchia, F. From Wolf-Hirschhorn syndrome to NSD2 haploinsufficiency: a shifting paradigm through the description of a new case and a review of the literature. Ital. J. Pediatr. 48, 72 (2022).
Lamandé, S. R. & Bateman, J. F. Genetic disorders of the extracellular matrix. Anat. Rec. 303, 1527–1542 (2020).
Buchan, J. G. et al. Rare variants in FBN1 and FBN2 are associated with severe adolescent idiopathic scoliosis. Hum. Mol. Genet. 23, 5271–5282 (2014).
Baschal, E. E. et al. Exome sequencing identifies a rare HSPG2 variant associated with familial idiopathic scoliosis. G3 Genes Genomes Genet. 5, 167–174 (2015).
Michelsson, J.-E. The development of spinal deformity in experimental scoliosis. Acta Orthop. Scand. 36, 3–91 (1965).
Stilwell, D. L. Structural deformities of vertebrae: bone adaptation and modeling in experimental scoliosis and kyphosis. J. Bone Jt. Surg. 44, 611–634 (1962).
Wang, S. et al. Expression of Runx2 and type X collagen in vertebral growth plate of patients with adolescent idiopathic scoliosis. Connect. Tissue Res. 51, 188–196 (2010).
Cheung, J. et al. The relation between electromyography and growth velocity of the spine in the evaluation of curve progression in idiopathic scoliosis. Spine 29, 1011–1016 (2004).
Farahpour, N., Ghasemi, S., Allard, P. & Saba, M. S. Electromyographic responses of erector spinae and lower limb’s muscles to dynamic postural perturbations in patients with adolescent idiopathic scoliosis. J. Electromyogr. Kinesiol. 24, 645–651 (2014).
Stetkarova, I. et al. Electrophysiological and histological changes of paraspinal muscles in adolescent idiopathic scoliosis. Eur. Spine J. 25, 3146–3153 (2016).
Miyake, A. et al. Identification of a susceptibility locus for severe adolescent idiopathic scoliosis on chromosome 17q24.3. PLoS One 8, e72802 (2013).
Tsingas, M. et al. Sox9 deletion causes severe intervertebral disc degeneration characterized by apoptosis, matrix remodeling, and compartment-specific transcriptomic changes. Matrix Biol. 94, 110–133 (2020).
Güven, A. et al. Extracellular matrix-inducing Sox9 promotes both basal progenitor proliferation and gliogenesis in developing neocortex. Elife 9, 49808 (2020).
Villa, M. Genes associated with adolescent idiopathic scoliosis: a review. Hereditary Genet. 04, 146 (2015).
Blyth, M., Huang, S., Maloney, V., Crolla, J. A. & Temple, I. K. A 2.3 Mb deletion of 17q24.2–q24.3 associated with ‘Carney Complex plus’. Eur. J. Med. Genet. 51, 672–678 (2008).
Mizuhara, E. et al. MAGI1 recruits Dll1 to cadherin-based adherens junctions and stabilizes it on the cell surface. J. Biol. Chem. 280, 26499–26507 (2005).
Ogura, Y. et al. A functional SNP in BNC2 is associated with adolescent idiopathic scoliosis. Am. J. Hum. Genet. 97, 337–342 (2015).
Khanshour, A. M. et al. Genome-wide meta-analysis and replication studies in multiple ethnicities identify novel adolescent idiopathic scoliosis susceptibility loci. Hum. Mol. Genet. 27, 3986–3998 (2018).
Bobowski-Gerard, M. et al. Functional genomics uncovers the transcription factor BNC2 as required for myofibroblastic activation in fibrosis. Nat. Commun. 13, 5324 (2022).
Orang, A. et al. Basonuclin-2 regulates extracellular matrix production and degradation. Life Sci. Alliance 6, e202301984 (2023).
Tuncay, I. O. et al. Analysis of recent shared ancestry in a familial cohort identifies coding and noncoding autism spectrum disorder variants. NPJ Genom. Med. 7, 13 (2022).
Lesurf, R. et al. Whole genome sequencing delineates regulatory, copy number, and cryptic splice variants in early onset cardiomyopathy. NPJ Genom. Med. 7, 18 (2022).
Meynert, A. M., Ansari, M., FitzPatrick, D. R. & Taylor, M. S. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinforma. 15, 247 (2014).
Belkadi, A. et al. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc. Natl. Acad. Sci. 112, 5473–5478 (2015).
Goldberg, C. J., Dowling, F. E. & Fogarty, E. E. Adolescent idiopathic scoliosis: is rising growth rate the triggering factor in progression? Eur. Spine J. 2, 29–36 (1993).
Horne, J. P., Flannery, R. & Usman, S. Adolescent idiopathic scoliosis: diagnosis and management. Am. Fam. Phys. 89, 193–198 (2014).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Karczewski, K. J. et al. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res. 45, D840–D845 (2017).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Kou, I. et al. Genetic variants in GPR126 are associated with adolescent idiopathic scoliosis. Nat. Genet. 45, 676–679 (2013).
Li, W. et al. AKAP2 identified as a novel gene mutated in a Chinese family with adolescent idiopathic scoliosis. J. Med. Genet. 53, 488–493 (2016).
Gao, X. et al. CHD7 gene polymorphisms are associated with susceptibility to idiopathic scoliosis. Am. J. Hum. Genet. 80, 957–965 (2007).
Tilley, M. K. et al. CHD7 gene polymorphisms and familial idiopathic scoliosis. Spine 38, E1432–E1436 (2013).
Yu, H. et al. Association of genetic variation in COL11A1 with adolescent idiopathic scoliosis. Elife 12, 89762v3 (2024).
Rebello, D. et al. COL11A2 as a candidate gene for vertebral malformations and congenital scoliosis. Hum. Mol. Genet. 32, 2913–2928 (2023).
Nada, D. et al. Identification of FAT3 as a new candidate gene for adolescent idiopathic scoliosis. Sci. Rep. 12, 12298 (2022).
Terhune, E. A. et al. Mutations in KIF7 implicated in idiopathic scoliosis in humans and axial curvatures in zebrafish. Hum. Mutat. 42, 392–407 (2021).
Jennings, W. et al. Paraspinal muscle ladybird homeobox 1 (LBX1) in adolescent idiopathic scoliosis: a cross-sectional study. Spine J. 19, 1911–1916 (2019).
Hassan, A. et al. Adolescent idiopathic scoliosis associated POC5 mutation impairs cell cycle, cilia length and centrosome protein interactions. PLoS One 14, e0213269 (2019).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Kanehisa, M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Carbon, S. et al. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Acknowledgements
This study was supported by a research grant of the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (Information and Communication Technology, NRF-2021R1F1A1045417). The sponsor or funding organization had no role in the design or conduct of this research.
Author information
Contributions
I.O.T. and E.K.L. contributed equally to this work and share first authorship. S.L. and K.S. contributed equally to this work. Concept and design: I.O.T., E.K.L., S.L., K.S.; acquisition, analysis, and interpretation of data: I.O.T., E.K.L., A.G.; drafting of the manuscript: I.O.T., E.K.L.; critical revision of the manuscript for important intellectual content: S.L., K.S.; statistical analysis: I.O.T., E.K.L.; funding acquisition: E.K.L., S.L.; administrative, technical, and material support: Y.L., D.J., J-Y.K., W.L., A.G.; supervision: S.L., K.S. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tuncay, I.O., Lee, E.K., Gustafson, A. et al. Whole genome sequencing in adolescent idiopathic scoliosis cohort implicates multiple biological pathways. npj Genom. Med. 10, 67 (2025). https://doi.org/10.1038/s41525-025-00520-5
DOI: https://doi.org/10.1038/s41525-025-00520-5