Introduction

Alport syndrome (AS) is a monogenic kidney disease caused by defects in the COL4A3, COL4A4 and COL4A5 genes, which encode the α3, α4 and α5 chains of collagen IV, respectively1. Autosomal dominant (AD) Alport syndrome is caused by a heterozygous variant in either COL4A3 or COL4A4, while autosomal recessive (AR) disease is caused by two pathogenic variants (in trans) in either COL4A3 or COL4A4. X-linked (XL) Alport syndrome occurs if there is a defect in the COL4A5 gene. Digenic Alport syndrome is due to defects in two of the three genes (COL4A3, COL4A4, COL4A5), either in cis or in trans1.

The α3, α4 and α5 chains of collagen IV intertwine to form triple helices that assemble into a network in the basement membranes of the glomeruli, cochlea, retina and lens2. Genetic variants resulting in amino acid changes that distort this trimerization can affect the assembly of collagen IV networks, leading to abnormal basement membranes. Glycine, being the smallest amino acid and occurring at the first position of the Gly-Xaa-Yaa repeats in the intermediate domains, allows for tight packing of the collagen triple helix3. In Alport syndrome, replacing glycine with a larger amino acid can disturb the α3α4α5 helix formation and affect the collagen networks in the basement membranes4. This explains the possible multiorgan involvement in Alport syndrome, namely glomerular disease, sensorineural hearing loss, and lens and retinal abnormalities1.

Common kidney phenotypes in Alport syndrome include persistent glomerular hematuria, proteinuria, and progressive kidney failure5. Histological findings include focal and segmental glomerulosclerosis (FSGS) and coincidental IgA nephropathy6,7. Kidney cysts can also occur8. The trajectory of kidney disease depends on the mode of inheritance as well as epigenetic and environmental factors, modifying genes and comorbidities. Individuals with AR disease and males with XL disease have the most severe phenotypes, while those with AD condition and females with XL disease have a milder disease trajectory9,10.

Traditionally known as a rare genetic disorder, the prevalence rates of Alport syndrome were previously reported to range from one in 5000 to one in 53,00011,12,13. However, a recent study based on the Genome Aggregation Database (gnomAD) of more than 122,000 subjects showed that one in 106 (0.9%) individuals have a heterozygous variant in either the COL4A3 or COL4A4 gene that is predicted to be pathogenic and can cause AD Alport syndrome14. These variants were associated with hematuria in the United Kingdom 100,000 Genomes Project. Additionally, in that study, one in 2,320 individuals had a pathogenic variant predicted to cause XL Alport syndrome. These findings were corroborated by the subsequent Geisinger MyCode/DiscovEHR study, which was based on an unselected health system-based cohort of 174,418 participants15. In the latter study, 1 in 433 (0.23%) individuals had a heterozygous COL4A3 variant that was predicted to be pathogenic. The true population prevalence of Alport syndrome may be higher than that suggested by these two studies because individuals with diagnosed disease were not included in the cohorts. Moreover, large structural genetic rearrangements and deep intronic splicing variants could not be detected with the genomic testing techniques used14.

These population cohort studies had an under-representation of Asians. Although Asians represent nearly 60% of the global population, Asian genomes comprise only 6.6% of the gnomAD data and 3% of population health studies16,17. Furthermore, Asian-centric genomic databases such as GenomeAsia 100k were only created recently and have not been fully recruited yet18. Minor allele frequencies (MAF) of Alport gene variants are known to vary among populations. Pathogenic COL4A5 variants are more common in European or African populations, while heterozygous pathogenic COL4A3 and COL4A4 variants are at least four times more common in East Asian ancestries compared with Ashkenazim and Finns14.

Due to historical migration patterns, the genetic ancestry of Singapore’s multi-ethnic population has strong correlations with Asian populations in different geographical locations19. Hence, the study of Singapore population genomics can reflect the human genetic diversity across Asia. In this study, we aimed to estimate the population prevalence of individuals carrying pathogenic variants that can cause Alport syndrome, using a population-scale genomic database of 10,323 Singaporeans20. In addition, we aimed to stratify the population prevalence by genetically-inferred ancestries to provide insights into the ancestry-specific variant landscape of Alport syndrome in Asia19.

Results

Genetic variants

The SG10K study dataset includes 9,770 genomes after quality control. Initial analysis revealed a total of 28,968 variants in the COL4A3, COL4A4 and COL4A5 genes (9,012 for COL4A3, 9,777 for COL4A4 and 10,179 for COL4A5). Most variants were intronic (97.3% for COL4A3, 94.2% for COL4A4, and 97.8% for COL4A5) (Figure S1).

These 28,968 unique variants consisted of single nucleotide variants (SNVs), indels, and others (Table S1). The SNVs were predominantly non-synonymous missense variants, followed by synonymous changes. Notably, one SNV in COL4A3 and four SNVs in COL4A4 led to gain of premature stop codons, and one SNV in COL4A4 resulted in loss of the stop codon. Moreover, four SNVs in COL4A3 and one SNV in COL4A4 were located at canonical splice sites (+/- 1 or 2). No SNVs affecting stop codons or canonical splice sites were identified in COL4A5 (Table S1).

Indel variant types included frameshift or inframe insertions and deletions (Table S1). These include one variant in COL4A3 that resulted in a frameshift insertion, three variants in COL4A3 and four variants in COL4A4 that caused frameshift deletions, one variant in COL4A4 that resulted in an inframe insertion, and six variants that caused inframe deletions (four in COL4A3, one in COL4A4 and one in COL4A5). Two indel variants in COL4A3 gene were identified at canonical splice sites while such variants were absent in the COL4A4 and COL4A5 genes (Table S1).

All synonymous and intronic non-splicing variants were filtered out as part of the variant curation steps.

Pathogenic COL4A3, COL4A4 and COL4A5 variants

There were 44 pathogenic or likely pathogenic variants (Table 1 and S1) in all three genes. Of these, 19 (43%) were COL4A3, 22 (50%) were COL4A4 and 3 (7%) were COL4A5 (Fig. 1).

Table 1 Minor allele frequency (MAF) of pathogenic and likely pathogenic COL4A3 - COL4A5 variants in the Singapore population.
Fig. 1

COL4A3, COL4A4 and COL4A5 variants curation flowchart and pathogenic variants distribution. Whole-genome sequencing (WGS) was performed on a total of 10,323 healthy samples, of which, 9,770 genomes passed quality control. After removal of related family members up to second-degree, 9,051 individuals consisting of 61% Chinese, 21% Indian and 18% Malay remained. Variant curation revealed 44 (likely) pathogenic variants and 6 unique variants of uncertain significance with high pathogenicity scores. Among the 9,051 individuals tested, 55 individuals had heterozygous pathogenic variants in COL4A3 or COL4A4 suggesting AD Alport syndrome, while 4 individuals (3 females, 1 male) carried a single copy of pathogenic variant in COL4A5 suggesting XL Alport syndrome. Comparing across the three ethnic groups in Singapore, the prevalence of Alport syndrome pathogenic variants among the Chinese population was 2.7 times (95% CI: 1.1–6.4) higher than that in the Malay population (P = 0.027, Fisher’s exact test). There was no statistical difference in the prevalence rates between the Chinese and Indian populations, and between the Indian and Malay populations (P > 0.05).

Non-synonymous missense variants predominated, accounting for 52% (23 out of 44) of all pathogenic variants. Almost all (22 out of 23) non-synonymous missense pathogenic variants cause substitutions of glycine occurring at the first position of the Gly-Xaa-Yaa repeats in the intermediate domains. The remaining variant (COL4A3: c.4793T>G (p.Leu1598Arg)) led to the substitution of a hydrophobic leucine with a hydrophilic arginine.

All five identified SNVs that cause gain of a premature stop codon were pathogenic. Similarly, 7 out of 8 identified frameshift deletions and insertions were pathogenic. In contrast, only 2 out of 7 inframe insertions or deletions were pathogenic. Of the 11 splicing pathogenic variants, 7 occurred at canonical splice sites, 3 were non-synonymous missense variants, while the remaining one was a 9 bp deletion (Table S1).

Population prevalence of individuals who carry pathogenic variants for Alport syndrome

Variants predicted to be pathogenic in COL4A3, COL4A4 and COL4A5 genes were found in 60 out of 9,051 individuals, indicating an estimated overall population frequency of one in 150.

Of these 60 individuals, 55 (92%) had a single heterozygous pathogenic variant that can cause AD Alport syndrome. These comprised 27 individuals carrying one of 17 pathogenic variants in COL4A3 and 28 individuals carrying one of 22 heterozygous pathogenic variants in COL4A4. Hence, the estimated prevalence of individuals carrying variants that can cause AD Alport syndrome is one in 165 (55 out of 9,051 individuals) (Fig. 1), which is slightly lower than the previously reported prevalence of one in 106 based on the gnomAD database14.

We identified four individuals with pathogenic variants that can cause XL Alport syndrome (7% of 60 individuals). These four individuals, namely one hemizygous Indian male and three heterozygous females (two Chinese and one Malay), carried three distinct pathogenic variants in the COL4A5 gene. The estimated population frequency of individuals who carry variants that can cause XL Alport syndrome is therefore one in 2,262 (4 out of 9,051 individuals) (Fig. 1), similar to the previously reported prevalence of one in 2,320 (Fisher’s exact test, P = 0.7966)14.

Additionally, we identified one individual (Malay male) who carried two heterozygous variants in COL4A3, namely c.4253-4_4258delGAAGGACCAG (p.Gly1418_Pro1419del) and c.4258_4259insTTTTTTTTTT (p.Ala1420Valfs*93). Although it could not be determined whether the two variants were in cis or in trans, we presumptively classified this individual as having autosomal recessive Alport syndrome. No individuals with digenic forms of Alport syndrome were found.

Ethnic distribution of pathogenic COL4A3, COL4A4 and COL4A5 variants

When stratified by ancestry, the prevalence rates for Chinese, Indian and Malay populations were 0.808% (44 in 5,443), 0.572% (11 in 1,922) and 0.297% (5 in 1,686) respectively (Fig. 1). The prevalence rate among the Chinese population was 2.7 times (95% CI: 1.147–6.437) higher than for the Malay population (P = 0.027, Fisher’s exact test). Conversely, there was no significant difference between the Chinese and Indian populations, or between the Indian and Malay populations (P = 0.451 and 0.223, respectively, Fisher’s exact test).

It is notable that each pathogenic variant occurred exclusively within a specific ethnic group, apart from the variant COL4A4: c.1323_1340del18 (p.Pro444_Leu449delProGlyAlaProGlyLeu), which appeared twice in the Chinese population and once in the Indian population, and the variant COL4A5: c.2119G>C (p.Gly707Arg), which affected one Indian individual and one Malay individual. The two most prevalent pathogenic variants, COL4A3: c.3856G>A (p.Gly1286Arg) identified in eight individuals, and COL4A3: c.4793T>G (p.Leu1598Arg) identified in four individuals, were only found in the Chinese population.

Variants of uncertain significance with high pathogenicity scores

Five variants in COL4A3 and one variant in COL4A4 each have a total pathogenicity score of 5 (Table 2). Among these six VUS, three were missense SNVs that caused substitutions of glycine at position 1 of the Gly-Xaa-Yaa repeats while the other two were inframe deletions in COL4A3. The missense SNVs remained as VUS primarily because the in silico prediction evidence (PP3) was only of moderate strength (REVEL scores ranging from 0.773 to 0.932)52. Additional criteria will be needed to reclassify these variants53,54. For example, further genotyping or phenotyping of the proband or family members, or the emergence of newly published cases, may allow the addition of PS4, PP1, PP4 or PM5 criteria.

Table 2 Minor allele frequency (MAF) of the variants of uncertain significance with high pathogenicity scores.

Minor allele frequencies of pathogenic COL4A3, COL4A4 and COL4A5 variants

The MAF of each pathogenic variant is shown in Table 1 and S2. The commonest pathogenic variant is COL4A3: c.3856G>A (p.Gly1286Arg), which affects eight Chinese individuals and is present in 13% of the individuals carrying Alport gene alterations (8 of 60 subjects). This variant has a MAF of 0.00045 (8 out of 17,752 alleles) in our cohort, which is higher than the frequencies reported in the gnomAD database: 0.00025 (11 out of 44,882 alleles) in East Asians and 2.542 × 10⁻⁶ (3 out of 1,179,994 alleles) in non-Finnish Europeans. This variant is absent in the other ancestral groups of the SG10K project.

Discussion

We found that, in Singapore, the prevalence of individuals carrying pathogenic variants that can cause AD Alport syndrome is one in 165, which is slightly lower than the previously reported prevalence of one in 106 based on the gnomAD database14. To our knowledge, this is the first study to analyze ethnic-specific prevalence in an Asian population. Our data also suggest that, in Singapore, individuals carrying pathogenic variants for Alport syndrome are more common among the Chinese than the Malay population.

Singapore is located in the middle of Southeast Asia, and its population stems from immigrants from China, India and other parts of Asia over the past two centuries61. Because of this, Singapore’s current population of five million comprises three major ethnic groups: Chinese (74.2%), Malay (13.7%) and Indian (8.9%), which are of East Asian, Southeast Asian and South Asian ancestries respectively19. Our findings provide some insights into the variant landscape of Alport syndrome in Asia and suggest that Alport syndrome may be more prevalent among Asian populations of Chinese descent than among those of Malay descent. Further studies are needed to confirm this. Interestingly, our results contrast with the local observation that the prevalence of chronic kidney disease is higher among the Malay population compared to the other ethnic groups62. This may be explained by lifestyle and other environmental factors, gene-gene interactions or polygenic causes.

Our study was based on the SG10K_Health population-scale genomic database20, which provides a rich resource to survey the population prevalence of genetic conditions. While our predicted population prevalence appears to be comparable to other population-based studies on Alport syndrome performed in global and Caucasian populations14,15, it is notable that the variant curation criteria differ widely among these studies. This is especially pertinent for variants that result in substitution of position 1 glycine in genes that encode collagen type IV. While there have been recommendations that such variants should be considered a “hot spot” for mutations according to the ACMG/AMP criteria for pathogenicity63, these have not been clearly defined in the newer guidelines released by the Clinical Genome (ClinGen) expert panels54. The impact is not minuscule, since half of the pathogenic variants identified in our study are non-synonymous missense variants that result in substitution of position 1 glycine (22 out of 44 variants).

The criteria used in our study were more stringent than those of other similar population-based studies. Specifically, we converted the ACMG/AMP evidence into calculated odds of pathogenicity64 and determined the variant classification using a Bayesian framework65 in accordance with the ClinGen expert panel guidelines54. In contrast, Gibson et al. classified all variants that resulted in the substitution of position 1 glycine as “predicted pathogenic”14, while Solanki et al. used VarSome (www.varsome.com) to predict pathogenicity and classified variants as pathogenic if they had been reported in ClinVar as likely pathogenic or pathogenic at least once, regardless of the level of review supporting the variant classification14. Because of these differences, four variants that result in position 1 glycine substitutions in COL4A3 or COL4A4 genes remained as VUS with high pathogenicity scores in our study. Yet, they would have been categorized as “predicted pathogenic” if the criteria described by Gibson et al.14 were applied. This difference in variant curation criteria may partly explain the slightly lower prevalence in our study compared to other population-based studies, and clearly illustrates a real challenge in genomic medicine today.

A key limitation of our study is the lack of access to clinical or family details of the subjects in the SG10K database. This limitation is partly mitigated by our criteria for variant curation, which were based on publicly available data such as the gnomAD and ClinVar databases, clinical details of published cases in the literature, and variant-specific details such as REVEL scores and published functional studies. Variants were classified as pathogenic if they achieved the pathogenicity score cut-offs without phenotype or family details specific to the SG10K subject. Nevertheless, we recognize that some variants may be misclassified, especially if there is undetected evidence in the SG10K subjects that suggests a benign tendency, such as lack of segregation in affected family members. Separately, we have described the VUS with high pathogenicity scores. These may be upgraded to likely pathogenic with added evidence from the clinical or family details of the subject in SG10K. These VUS were not included in the determination of population prevalence. The subsequent Phases II and III of the National Precision Medicine initiative in Singapore will include 100,000 genomes and access to data in the electronic medical records of the study subjects66. Future analyses using this larger, more comprehensive dataset will provide better insights into disease trajectory and the population prevalence of Alport syndrome in Singapore.

We have also described the segregation of most variants in the COL4A3–COL4A5 genes into specific ethnic groups. The two most prevalent pathogenic variants, COL4A3: c.3856G>A (p.Gly1286Arg) and COL4A3: c.4793T>G (p.Leu1598Arg), were exclusively present in the Chinese population, suggesting a possible founder effect. Notably, the variant COL4A3: c.3856G>A (p.Gly1286Arg) has a higher MAF among the Chinese in the SG10K cohort compared to those in the overall gnomAD cohort and its East Asian subgroup. In addition, there was a notable absence of COL4A5: c.1871G>A (p.Gly624Asp) in our cohort even though this variant represented almost half the pathogenic COL4A5 variants in the European population14. These exemplify the need for population-specific MAF databases for better interpretation of the genetic variants.

It may be argued that the prevalence of individuals with pathogenic variants may not equate to the prevalence of the disease. This is because about 30% of individuals who harbor pathogenic variants in Alport syndrome do not manifest disease15,67, and hence the presence of pathogenic variants may not establish a diagnosis of Alport syndrome. Additionally, age-dependent penetrance may occur, with disease manifestations appearing only later in life. Furthermore, the presumably non-penetrant individual may have an increased genetic predisposition to chronic kidney disease, especially in the presence of other kidney insults like nephrotoxins, diabetes mellitus and hypertension. Health prevention strategies targeted at such an individual may therefore be justified. To this end, from a public health perspective, knowledge of the population prevalence of individuals carrying pathogenic variants of Alport syndrome can provide directions on health prevention strategies targeted at genetically at-risk individuals.

The findings of this study provide insights into the potential disease burden in Singapore, and this is important for the nationwide clinical implementation of genetic testing in nephrology services. Genetic testing in glomerular disease has been shown to have high clinical utility in a systematic analysis68. Specifically, in Alport syndrome, which is the commonest monogenic glomerular disease68,69, 70% and 90% of male patients with XL Alport syndrome develop kidney failure by age 30 and 40 respectively without treatment70.

In those with AD Alport syndrome, 29% develop chronic kidney disease and 15% reach kidney failure by 53 years47. Early and aggressive treatment with renin-angiotensin-aldosterone system blockade may delay kidney failure by two decades or more71,72. The crux is that the earlier the treatment is started, the greater the impact on kidney survival71. This partly explains why genetic testing in glomerular diseases is cost-effective73. However, while the clinical benefits are clear, genetic testing in nephrology has not been routinely performed in Singapore because of the limited availability of genetic testing, absence of clinical algorithms and lack of genetic counselling expertise. This results in lost opportunities to reduce the incidence of chronic kidney disease in Singapore, which is known to have one of the highest prevalence rates of kidney failure in the world74. A clinical implementation pilot project is underway to provide genetic services within the nephrology service75. The new understanding of the variant landscape and genetic burden of Alport syndrome from our study will direct the implementation efforts in various aspects such as resource allocation and genetic counselling.

There are several other limitations in this study. Firstly, the SG10K cohort comprises a diverse group of individuals, including those in good health and others with known medical conditions. Participants with late-stage chronic kidney disease were likely not included and this may result in an under-estimation of the true population prevalence of Alport syndrome. Secondly, although next-generation sequencing techniques can detect SNVs and most indels, they may not detect long duplications, deletions and insertions. Our analysis also focused only on exonic variants and intronic variants within 50 base pairs of the exon-intron boundaries, potentially missing deep intronic pathogenic variants, which may cause disease by creating or activating alternative splice sites76. These limitations will lead to an under-estimation of the prevalence.

In conclusion, one in 150 individuals in the Singaporean population carries pathogenic variants that can lead to Alport syndrome. The prevalence rate among the Chinese population is 2.7 times higher than that of the Malay population. Subsequent studies as part of Phase II of Singapore’s National Precision Medicine program should enhance the accuracy of our estimates.

Methods

Datasets

The SG10K Health Project20 consisted of cross-sectional population-scale whole genome sequencing of 10,323 Singaporeans. These included a mix of healthy and diseased individuals with adult-onset diseases such as diabetes mellitus and eye diseases (Table S4). They were not known to have a genetic kidney disease, but the prevalence of chronic kidney disease within this group was not known.

Ethics approval

The present study was conducted using data derived from the SG10K Health Project, established through collaboration with six local cohorts. These studies were approved by the Institutional Review Board of the respective institutions, including the National University Hospital, Singapore (CIRB/E/2019/2655 for “GUSTO”), Nanyang Technological University (IRB 2016-11-030 for “HELIOS”), National University of Singapore (CIRB 13–512 for “SEED”), SingHealth Centralised Institutional Review Board (2013/605/C for “MEC”; 2018/2717, 2018/2921, 2012/487/A, 2015/2279, 2018/2006, 2018/2594, and 2018/2570 for “PRISM”), and the National Health Group (TB-2020-001 and BTC-2020-001 for the Tan Tock Seng Hospital cohort) (Table S4). These studies were performed in accordance with the 1975 Declaration of Helsinki and its 2013 amendment. Informed consent was obtained from all participants20.

Genomic sequencing

Details of the methodology have been described previously20. In brief, whole-genome sequencing (WGS) of 15x or 30x coverage was performed on a total of 10,323 healthy samples. Following joint variant calling and quality control based on the Genome Analysis Toolkit (GATK, v4.0.6.0) best practices workflow77,78, 9,770 genomes remained. The samples were further filtered through kinship analysis, resulting in 9,051 remaining genomes inferred to be unrelated to the second degree. Females comprised 57.3% of this analysed dataset. The age of participants ranged from birth to 85 years old (median 47 years). Due to cryptic admixture, discordance between self-reported and genetically-inferred ancestry has been previously described79. Therefore, we used genetically-inferred ancestry for all our analyses. With these, 60.8% of the final cohort were Chinese, 21.4% Indian and 17.8% Malay.

Genetic variants

COL4A3-COL4A5 variants were aligned using the Genome Reference Consortium Human Build 38 (GRCh38/hg38). The MANE transcripts for the collagen type IV α3 chain, α4 chain and α5 chain are ENST00000396578.8, ENST00000396625.5 and ENST00000328300.11, respectively. Variants were classified based on their location and variant alteration type. SNV types included synonymous and non-synonymous (missense) SNVs, and SNVs that resulted in gain or loss of stop codons. Indel variant types included frameshift or inframe insertions or deletions, and those located at splice sites.

Variant curation

Variants were classified into five categories: “Benign”, “Likely Benign”, “VUS”, “Likely Pathogenic” and “Pathogenic” based on the ACMG / AMP guidelines53,63. In this study, only pathogenic and likely pathogenic variants were used to derive prevalence rates.

First, four criteria were applied to filter out variants: (1) variants with MAF > 1% in the ExAC (Exome Aggregation Consortium) and gnomAD (Genome Aggregation Database) v3.1.3 databases, (2) intronic variants that were more than 50 base pairs from the exon-intron boundaries, (3) variants classified as “Benign” and “Likely Benign” in ClinVar or InterVar and (4) variants classified as “Benign” and “Likely Benign” by the semiautomated online tool VarSome80 (www.varsome.com).
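
The four pre-filters above can be sketched as a single predicate. This is an illustration only, not the pipeline used in the study: the `Variant` record, its field names and the `passes_prefilter` helper are hypothetical. Treating criterion (1) as "MAF above 1% in either database" is also an assumption, since the text does not state how the two databases were combined.

```python
from dataclasses import dataclass
from typing import Optional

BENIGN_LABELS = {"Benign", "Likely Benign"}


@dataclass
class Variant:
    """Hypothetical per-variant annotations; field names are illustrative."""
    exac_maf: Optional[float]      # None if the variant is absent from ExAC
    gnomad_maf: Optional[float]    # None if the variant is absent from gnomAD
    intronic_distance_bp: int      # distance to nearest exon boundary; 0 if exonic
    clinvar_class: Optional[str]
    intervar_class: Optional[str]
    varsome_class: Optional[str]


def passes_prefilter(v: Variant) -> bool:
    """Return True if the variant survives all four exclusion criteria."""
    # (1) Common variants: MAF > 1% (assumed: in either database).
    if any(maf is not None and maf > 0.01 for maf in (v.exac_maf, v.gnomad_maf)):
        return False
    # (2) Intronic variants more than 50 bp from the exon-intron boundary.
    if v.intronic_distance_bp > 50:
        return False
    # (3) Benign or likely benign in ClinVar or InterVar.
    if v.clinvar_class in BENIGN_LABELS or v.intervar_class in BENIGN_LABELS:
        return False
    # (4) Benign or likely benign according to VarSome.
    if v.varsome_class in BENIGN_LABELS:
        return False
    return True
```

Variants that pass this step would then proceed to the evidence-based curation described next.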

We then applied the ACMG/AMP criteria based on population databases and in silico tools: (1) absence in population databases or occurrence at extremely low frequencies of ≤ 0.00007 for AR and ≤ 0.00002 for AD (PM2), where the strength of PM2 was not higher than “Supporting”52,81; and (2) REVEL scores for missense variants (Supporting: 0.644–0.773; Moderate: 0.773–0.932; Strong: ≥ 0.932)52 (PP3).
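
As a rough sketch of how these two thresholds could be encoded, the helpers below map a population frequency to PM2 and a REVEL score to a PP3 strength. The function names are hypothetical. Because the published REVEL ranges share their end points (0.773 appears in two ranges), assigning a boundary score to the higher tier is an assumption made here, not something stated in the text. No PM2 threshold is given for XL inheritance, so only "AR" and "AD" are supported.

```python
from typing import Optional

# PM2 frequency ceilings quoted in the text, keyed by inheritance mode.
PM2_MAX_FREQUENCY = {"AR": 0.00007, "AD": 0.00002}


def pm2_supporting(population_maf: Optional[float], inheritance: str) -> bool:
    """True if PM2 applies (at Supporting strength): absent or extremely rare."""
    if population_maf is None:  # absent from population databases
        return True
    return population_maf <= PM2_MAX_FREQUENCY[inheritance]


def pp3_strength(revel: float) -> Optional[str]:
    """Map a REVEL score to a PP3 evidence strength, or None if below range.

    Assumption: a score exactly on a shared boundary takes the higher tier.
    """
    if revel >= 0.932:
        return "Strong"
    if revel >= 0.773:
        return "Moderate"
    if revel >= 0.644:
        return "Supporting"
    return None
```

For example, `pp3_strength(0.85)` falls in the Moderate tier, which is the tier reported for the missense VUS in this study.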

According to the consensus by the ClinGen Sequence Variant Interpretation Working Group82, relevant data of the published cases carrying the same variant can be used to assign appropriate ACMG criteria. In light of this, an extensive literature search, supported by Mastermind Genomenon®, was conducted to retrieve previously reported cases related to the variants83. Information on the published cases was obtained for the following ACMG evidence: (1) typical Alport syndrome phenotypes, such as absence of immunohistochemistry staining for collagen IV α5 in kidney or skin biopsies in XL and AR disease63 (PP4); (2) co-segregation or lack of segregation with disease respectively in multiple affected family members (PP1 or BS4); (3) de novo (both maternity and paternity confirmed or unconfirmed respectively) in a patient with the disease and no family history (PS2 or PM6). The BP6 or PP5 criteria (variant being reported as benign or pathogenic respectively by reputable sources) were no longer used82.

We also reviewed the literature to identify the following evidence for each variant: (1) same amino acid change as a previously established pathogenic variant regardless of nucleotide change (PS1); (2) novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been described (PM5); (3) well-established in vitro or in vivo functional studies supportive of a damaging or benign effect respectively on the gene or gene product (PS3 or BS3). These scores can be additive across different reported families.

The strengths of each of these ACMG/AMP evidences were then transformed into scores which were calculated odds of pathogenicity64: Benign_Strong, −4; Benign_Supporting, −1; Indeterminate, 0; Pathogenic_Supporting, + 1; Pathogenic_Moderate, + 2; Pathogenic_Strong, + 4; Pathogenic_Very Strong, + 8. The added scores were used to determine the variant classification using a Bayesian framework65: >=10, Pathogenic; 6–9, Likely pathogenic; 0–5, VUS; −1 to −3, Likely benign; <= −4, Benign.

The VUS that each had a total pathogenicity score of 5 were analyzed. These VUS may potentially be “upgraded” to ‘Likely Pathogenic’ with the addition of 1 more point. This additional point may be obtained from the ACMG/AMP evidence derived from the clinical or family details specific to the subject in SG10K, such as family segregation of the variant (PP1) or phenotype specific to the genetic condition (PP4).
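
The point-based scoring described above can be illustrated with a short sketch. The point values and the cut-offs for Pathogenic, Likely pathogenic and VUS are taken from the text; the function names and the example evidence combination are hypothetical. For brevity, negative totals are returned as a single benign-side label rather than being split by the benign cut-offs listed above.

```python
from typing import Iterable

# Points per evidence strength, as listed in the text.
POINTS = {
    "Benign_Strong": -4,
    "Benign_Supporting": -1,
    "Indeterminate": 0,
    "Pathogenic_Supporting": 1,
    "Pathogenic_Moderate": 2,
    "Pathogenic_Strong": 4,
    "Pathogenic_Very Strong": 8,
}


def total_points(strengths: Iterable[str]) -> int:
    """Sum the points of all evidence items applied to one variant."""
    return sum(POINTS[s] for s in strengths)


def classify(score: int) -> str:
    """Map a total score to a classification (benign side not subdivided)."""
    if score >= 10:
        return "Pathogenic"
    if score >= 6:
        return "Likely pathogenic"
    if score >= 0:
        return "VUS"
    return "Likely benign or benign"


# Hypothetical evidence set totalling 5 points: a VUS one point short.
evidence = ["Pathogenic_Moderate", "Pathogenic_Moderate", "Pathogenic_Supporting"]
before = classify(total_points(evidence))
# One more supporting item (for example PP1 or PP4) tips it over the cut-off.
after = classify(total_points(evidence + ["Pathogenic_Supporting"]))
```

In this example `before` is "VUS" (5 points) and `after` is "Likely pathogenic" (6 points), which mirrors the one-point upgrade described for the high-scoring VUS.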

Minor allele frequency and population prevalence

Minor allele frequency (MAF) was calculated as the number of copies of the minor allele divided by the total number of alleles. For accuracy, individuals with variant loci that did not meet sequencing quality control standards were excluded. Therefore, the total number of alleles used for calculating MAF was less than the total number of alleles in the individuals recruited in the whole cohort.
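
A minimal worked example of this calculation, using the allele counts reported for COL4A3: c.3856G>A in the Results. The function name is hypothetical. The denominator of 17,752 alleles is smaller than the 2 × 9,051 = 18,102 autosomal alleles in the full cohort, consistent with the exclusion of loci that failed quality control.

```python
def minor_allele_frequency(minor_allele_count: int, total_alleles: int) -> float:
    """MAF as a proportion: minor-allele copies over all called alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return minor_allele_count / total_alleles


# Counts reported in the Results for COL4A3: c.3856G>A (p.Gly1286Arg).
maf_sg10k = minor_allele_frequency(8, 17_752)
maf_gnomad_east_asian = minor_allele_frequency(11, 44_882)

print(f"SG10K MAF: {maf_sg10k:.5f}")
print(f"gnomAD East Asian MAF: {maf_gnomad_east_asian:.5f}")
```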

Population prevalence was determined by dividing the number of individuals carrying the pathogenic variants by the total number of individuals in the cohort (n = 9,051). Individuals carrying a single heterozygous pathogenic variant in either COL4A3 or COL4A4 can develop AD Alport syndrome while individuals carrying a single pathogenic variant in COL4A5 can develop XL Alport syndrome. Gender was not considered in the determination of prevalence for XL Alport syndrome because both males and females may develop the disease, albeit with differing severity.

We also determined the number of individuals who had co-occurrence of more than one pathogenic variant in these three genes. Individuals carrying two pathogenic variants in the same COL4A3 or COL4A4 gene were assumed to have autosomal recessive Alport syndrome, even though it was not possible to determine if the variants were occurring in cis or trans.

The population prevalences were also compared between the Chinese, Indian and Malay ethnicities in Singapore.
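
The prevalence calculation can be reproduced from the carrier counts given in the Results. The sketch below is illustrative; the dictionary layout and function name are hypothetical, while the counts are those reported for each genetically-inferred ancestry.

```python
# (carriers, individuals) per genetically-inferred ancestry, from the Results.
COHORT = {
    "Chinese": (44, 5443),
    "Indian": (11, 1922),
    "Malay": (5, 1686),
}


def prevalence_percent(carriers: int, individuals: int) -> float:
    """Carriers as a percentage of the individuals in the group."""
    return 100 * carriers / individuals


by_ancestry = {
    group: round(prevalence_percent(c, n), 3) for group, (c, n) in COHORT.items()
}
total_carriers = sum(c for c, _ in COHORT.values())
total_individuals = sum(n for _, n in COHORT.values())
overall = prevalence_percent(total_carriers, total_individuals)

print(by_ancestry)
print(f"Overall: {total_carriers} of {total_individuals} ({overall:.2f}%)")
```

The three groups sum to the 60 carriers and 9,051 individuals used for the overall estimate.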

Statistical analysis

Data were analyzed using SPSS (v.29). The statistical difference between the allele frequencies in different populations was calculated using the two-sided Fisher’s exact test. A P value of less than 0.05 was considered significant.
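
The study ran its tests in SPSS. For readers who want to see the mechanics, the following is a pure-Python sketch of a two-sided Fisher's exact test, applied here to the carrier counts used for the ancestry comparison in the Results. The function name is hypothetical, and the two-sided P value is computed by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention. Values may differ slightly from those reported, depending on the convention the software applies.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    p_obs = prob(a)
    # Sum all tables at most as probable as the observed one; the small
    # tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Carriers vs non-carriers: Chinese (44 of 5,443) against Malay (5 of 1,686).
p_chinese_malay = fisher_exact_two_sided(44, 5443 - 44, 5, 1686 - 5)
# Chinese against Indian (11 of 1,922).
p_chinese_indian = fisher_exact_two_sided(44, 5443 - 44, 11, 1922 - 11)

print(f"Chinese vs Malay:  P = {p_chinese_malay:.3f}")
print(f"Chinese vs Indian: P = {p_chinese_indian:.3f}")
```

Where SciPy is available, `scipy.stats.fisher_exact` offers an equivalent test and could be used instead of this hand-written version.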