Abstract
Rare genetic diseases are responsible for a small but significant proportion of childhood morbidity and mortality. Most of these diseases have no treatment and place a heavy burden on families and on society as a whole. Carrier screening is a well-tested strategy for preventing these diseases and can reduce the incidence of autosomal recessive (AR) and X-linked (XL) conditions. Using a carrier screening panel based on next-generation sequencing, we tested 1265 individuals, including 388 couples, for 486 genes covering 623 conditions. A total of 1397 variants were found in 66.32% of the individuals, a mutation burden of 1.10 variants per person. The highest mutation burdens were found in the subgroups of participants with histories of abnormal pregnancies (1.38), health issues (1.34), and ultrasound anomalies (1.23). Among the 388 couples, 19 were found to be at high risk of having a child affected by either an AR (9 couples) or an XL (10 couples) condition. DUOX2, HBA1/2, and USH2A were the most frequently mutated genes. In this study, a gene carrier-frequency cutoff of over 1/200, as recommended by the ACMG, would include the top 20 genes and cover 37.94% of all variants identified. Ten couples at reproductive risk were followed up on their subsequent reproductive choices and interventions, which included IVF, termination of pregnancy, and continuing a pregnancy with an affected child. Given that most individuals in this population carried one or two variants, carrier screening programs appear to be a worthwhile investment as a public health tool. Patient follow-up showed that couples in China hold diverse opinions and values regarding reproductive choices.
Background
Decades of advances in medicine, technology, and the global economy have significantly alleviated the impact of communicable diseases, reduced the burden of nutritional diseases worldwide, and lowered childhood mortality1,2. Although this is cause for celebration, the shift means that non-communicable diseases, including rare genetic conditions, now account for a relatively larger share of all under-five deaths3.
Overall, rare Mendelian disorders are estimated to affect around 0.3% of pregnancies4,5, account for a quarter of all NICU admissions6, and are responsible for 20%–30% of infant and child mortality7,8. Despite annual worldwide expenditure of $125 billion on orphan drugs, the proportion of rare diseases with an approved drug indication remains disappointingly low8. Different detection methods for birth defect prevention are currently used at different stages of reproduction, and these methods are complementary rather than redundant. For example, multiple US agencies have clearly stated that carrier screening and newborn screening should not be used as substitutes for each other9.
Carrier screening at the preconception stage has been one of the most successful prevention strategies for identifying at-risk couples (ARCs), in which both partners carry pathogenic variants in the same gene, as well as female carriers of X-linked conditions. This form of carrier screening has reduced Tay-Sachs disease among Ashkenazi Jews since the 1970s10 and decreased the incidence of thalassemia worldwide11. The ACMG currently recommends a tiered screening approach based mainly on carrier frequency12. Briefly, Tier 1 covers cystic fibrosis, spinal muscular atrophy, and screening based on individual risk13,14,15; Tier 2 covers moderate or severe conditions with a carrier frequency of at least 1/100 16; Tier 3 extends the carrier-frequency threshold to 1/200 and is the default recommendation; Tier 4 is reserved for specific situations such as consanguinity or suggestive indications12. Additionally, a continually updated list of secondary-finding genes is recommended for reporting when variants are identified during exome or genome sequencing17. The ACMG recommends carrier screening ideally at the preconception stage, but all patients who are already pregnant should also be offered Tier 3 screening12. Screening before conception reduces the potential stress of receiving a positive result during pregnancy and offers more options for reproductive decision-making18.
China has seen great impetus for carrier screening in recent years. An early study screening for 11 recessive diseases in more than ten thousand people found that over a quarter of the apparently healthy sample population carried pathogenic variants19. A 201-gene panel applied to 2923 Chinese patients undergoing assisted reproductive technology (ART) found that 46.73% were carriers20. When the panel was expanded to 330 genes, 64.83% of 600 patients undergoing ART had a positive result21. The carrier rate reported in the literature has thus increased with the number of genes screened.
Detecting genetic abnormalities at the preconception or prenatal stage is not the only aspect of carrier screening practice. It also includes genetic counseling before and after screening, reproductive decision-making, appropriate medical interventions, monitoring, and all tests and counseling after the child is born22. Ethical concerns surrounding reproductive choices are of utmost importance, and one current consensus is that patients should be placed in a position to make an informed decision5,22,23. In this respect carrier screening is relatively uncontroversial, as it affords the couple more options to choose from. Meanwhile, the choices made by at-risk couples can also inform the medical community about prevailing attitudes and changing societal norms, so that practice can be adjusted accordingly.
In China, few reports have documented integrated care and management of at-risk couples throughout the whole reproductive cycle, up to newborn screening after birth. Establishing such a routine practice would greatly benefit both the couples involved and the birth defect prevention program. Today, couples in China with a positive carrier screening result have reproductive options very similar to those recommended by the ACMG12: (1) prenatal diagnosis using chorionic villus sampling or amniocentesis, (2) IVF with preimplantation genetic testing (PGT), (3) donor gametes, and (4) adoption. A high proportion of Chinese couples would opt for other alternatives when genetic defects are suspected24. Unsurprisingly, the options preferred by Chinese couples differ, likely owing to factors such as culture, economy, regulations, social mores, and education4,25.
Here we studied 1265 patients, 776 of whom were members of couples, attending a fertility center in southwestern China. Of these, 130 belonged to ethnic minorities, making this a multiethnic study population. Using a 486-gene panel, we screened for carriers of 623 conditions to survey the genetic landscape of a population with diverse ethnic backgrounds. These data provide crucial information for future expanded carrier screening (ECS) panel design in regions with multiple ethnic lineages. We also followed up on the reproductive choices of couples identified as having a high risk of affected offspring.
Materials and methods
Ethical compliance
The study was approved by the First People’s Hospital of Yunnan Province (No. 2021-28). All patients signed an informed consent form allowing the use of their peripheral blood samples for clinical research.
Peripheral blood was obtained from all participants after consent, in accordance with the Declaration of Helsinki. All clinical data of the participants were also reviewed and approved by the ethics committee of the Obstetrical Department, the First People’s Hospital of Yunnan Province.
Study design
This was a prospective study of the genetic landscape of single-gene autosomal recessive (AR) and X-linked diseases in couples and single participants, with the female participants in the first or second trimester of pregnancy. A total of 1265 participants with qualified DNA samples were included, of whom 776 formed 388 couples. An additional 488 pregnant women were tested as singles, as was one male because of concerns arising from his family history. All participants were enrolled at a single center between August 2022 and June 2023. All ECS samples were analyzed by high-throughput sequencing, and identified variants were validated using confirmatory methods such as MLPA, TR-PCR, or qPCR. Figure 1 illustrates our workflow. All participants received genetic counseling before ECS testing. After testing, a second counseling session was conducted by board-certified (FACMG) genetic specialists, who invited the participants to discuss the findings and their implications. Technical details such as disease information, treatment and prognosis, inheritance and probability, penetrance and expressivity, residual risk, and reproductive options were explained in an accessible manner. We documented the risk factors of female participants and grouped them into five categories, namely no known risk factors, advanced maternal age, previous abortion (spontaneous or otherwise), history of abnormal pregnancies, and maternal health factors, as shown in Table 1. These categories were further divided into subcategories, and a single participant could fall into multiple categories. Maternal health issues included diabetes, obesity, hypothyroidism, and other health concerns; 67 individuals reported such conditions. All male participants except the single individually tested male underwent sequential screening after their spouses.
We then calculated carrier status, mutation burden (the average number of pathogenic or likely pathogenic variants, PLPVs, per person), and the number of at-risk couples (ARCs). The ARCs were subsequently followed up for their interventions and reproductive outcomes.
Fig. 1. Workflow of the carrier screening process. A total of 1265 samples were included: 388 couples (776 samples) and 489 singles (488 females and one male). Samples were sequenced and carrier status was tallied. Couples were screened sequentially, with males sequenced only after positive findings in their female partners. At-risk couples were followed up on their subsequent reproductive choices and outcomes.
Data management, tabulation, analysis, and interpretation were performed with custom Python and R scripts.
ECS panel design
A total of 623 single-gene recessive (AR or X-linked) diseases, encompassing 486 genes, were included in this study (Table S1). Several criteria determined the selection of conditions for our ECS panel, as detailed in a previous study21. Briefly, the criteria were: (1) conformity with the guidelines or recommendations of the ACMG, the ACOG, the National Society of Genetic Counselors, the Perinatal Quality Foundation, and the Society for Maternal–Fetal Medicine9; (2) a carrier frequency (CF) > 1/500 in any ethnicity26; (3) severe newborn-, infancy-, or childhood-onset disorders with highly penetrant phenotypes; (4) high-prevalence monogenic diseases with moderate phenotypes, and disabilities that affect quality of life throughout the subject’s life, such as severe hearing loss and blindness27; and (5) genes detectable by NGS or triplet repeat primed PCR (TP-PCR, for FMR1) with a definitive detection rate, with variants that can be validated by methods such as Sanger sequencing, multiplex ligation-dependent probe amplification (MLPA), and quantitative polymerase chain reaction (qPCR).
Exon regions for each ECS gene were defined using genome annotations from the National Center for Biotechnology Information Reference Sequence (RefSeq) database, and 20 bp flanking regions were included in the panel. Selected important loci in non-coding regions curated by HGMD and ClinVar were also included.
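Defining the capture targets amounts to padding each exon by 20 bp on both sides and merging intervals that then overlap. The sketch below assumes 0-based, half-open (BED-style) coordinates and uses made-up exon positions; the function name is hypothetical and this is an illustration, not the panel-design code used in the study.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open as in BED

def pad_and_merge(exons: List[Interval], flank: int = 20) -> List[Interval]:
    """Extend each exon by `flank` bp on both sides, then merge overlapping
    or abutting intervals on the same chromosome."""
    padded = sorted(
        (chrom, max(0, start - flank), end + flank) for chrom, start, end in exons
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval on this chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical exons: the first two are 30 bp apart, so their 20 bp flanks overlap.
exons = [("chr1", 1000, 1100), ("chr1", 1130, 1200), ("chr2", 500, 600)]
print(pad_and_merge(exons))
```

Merging keeps neighbouring padded exons from being double-counted when target size or on-target coverage is computed.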
Genomic sequencing and data analysis
Peripheral blood samples from the 1265 participants were collected with informed consent in anticoagulant tubes containing ethylenediaminetetraacetic acid (EDTA) and stored at −80 ℃. Genomic DNA was purified from the frozen blood samples using the QIAamp® DNA Blood Mini kit according to the manufacturer’s protocol (Qiagen, Germantown, MD, USA). Genomic DNA (300–500 ng) was sheared, size-selected (400–600 bp), ligated to sequencing adapters, and PCR-amplified following standard library preparation. The post-PCR library was then used for target capture with the 486-gene panel, whose probes were synthesized by Integrated DNA Technologies (Coralville, IA, USA). Enriched libraries were sequenced (2 × 150 bp) on an Illumina NovaSeq (Illumina, Inc., San Diego, CA, USA). Raw base-call files were converted to FASTQ format using bcl2fastq2 conversion software v2.20 (Illumina, Inc., San Diego, CA, USA).
Sequencing reads were aligned to the human reference genome (hg19/GRCh37) using the Burrows-Wheeler Aligner (BWA), and PCR duplicates were removed using Picard v1.57 (http://picard.sourceforge.net/). The Genome Analysis Toolkit (GATK4; https://software.broadinstitute.org/gatk/) was used for variant discovery. Variant annotation and interpretation were conducted using ANNOVAR (http://www.openbioinformatics.org/annovar/). All detected variants were then manually reviewed following the ACMG/AMP guidelines for variant interpretation.
A misalignment-discriminated method was used to analyze genes with homology-related misalignment issues, including SMN1, HBA1/HBA2, GBA1, CYP21A2, CYP21A1P, VWF, DCLRE1C, ALMS1, and IDS. Gene-specific nucleotides (GSNs) are the nucleotides that differ between a gene and its pseudogene or homologous region, and the ECS panel was designed to capture them during sequencing. After regions of high homology were identified, the allele frequency of GSNs in the population was determined by querying sequencing databases using a misalignment index, the natural logarithm of the gene-to-pseudogene ratio, which describes the relationship between the target gene and its corresponding pseudogene28.
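As described, the misalignment index is the natural logarithm of a gene-to-pseudogene ratio measured at GSNs. A minimal sketch, assuming the ratio is taken between reads supporting the gene-specific and the pseudogene-specific base at each GSN and summarized by the median across sites; the exact formulation is given in ref. 28 and may differ, and the function names and read counts are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple

def misalignment_index(gene_reads: int, pseudo_reads: int) -> float:
    """Natural log of the gene-to-pseudogene read ratio at one GSN site."""
    if gene_reads <= 0 or pseudo_reads <= 0:
        raise ValueError("both read counts must be positive to take the log ratio")
    return math.log(gene_reads / pseudo_reads)

def sample_index(sites: Sequence[Tuple[int, int]]) -> float:
    """Median index across GSN sites, which damps noise at individual positions."""
    return median(misalignment_index(g, p) for g, p in sites)

# Hypothetical SMN1/SMN2 GSN read counts (gene reads, pseudogene reads).
# Equal copy numbers give an index near 0; one SMN1 copy against three SMN2
# copies gives roughly ln(1/3), about -1.10.
balanced = [(100, 100), (96, 104), (110, 90)]
reduced = [(50, 150), (48, 152), (55, 145)]
print(round(sample_index(balanced), 3), round(sample_index(reduced), 3))
```

Using a log ratio makes gains and losses symmetric around zero, which simplifies choosing thresholds for copy-number changes.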
The internally developed software CNVexon, a coverage-based CNV detection method29, was used to analyze CNVs, especially for exon-level heterozygous deletions or amplifications.
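CNVexon is in-house software whose algorithm is described in ref. 29. Purely to illustrate the coverage-based idea, the sketch below normalizes each exon's depth to the sample's total depth, divides by the median of a reference set of samples, and flags ratios near 0.5 as heterozygous deletions and near 1.5 as duplications. The cut-offs, function names, and depths are assumptions, not CNVexon's parameters.

```python
from statistics import median
from typing import Dict, List

def normalize(depths: Dict[str, float]) -> Dict[str, float]:
    """Scale exon depths so each sample sums to 1 (removes library-size effects)."""
    total = sum(depths.values())
    return {exon: d / total for exon, d in depths.items()}

def exon_ratios(sample: Dict[str, float], references: List[Dict[str, float]]) -> Dict[str, float]:
    """Normalized sample depth of each exon divided by the reference median."""
    s = normalize(sample)
    refs = [normalize(r) for r in references]
    return {exon: s[exon] / median(r[exon] for r in refs) for exon in s}

def call_cnv(ratio: float, del_cut: float = 0.7, dup_cut: float = 1.3) -> str:
    """Assumed cut-offs: a one-copy loss is expected near 0.5, a one-copy gain near 1.5."""
    if ratio < del_cut:
        return "het_deletion"
    if ratio > dup_cut:
        return "duplication"
    return "normal"

# Hypothetical read depths for four exons of one gene.
refs = [
    {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100},
    {"ex1": 210, "ex2": 190, "ex3": 200, "ex4": 200},
    {"ex1": 300, "ex2": 300, "ex3": 300, "ex4": 300},
]
sample = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 50}  # ex4 at half the expected depth
ratios = exon_ratios(sample, refs)
print({exon: call_cnv(r) for exon, r in ratios.items()})
```

With only four exons, normalizing to total depth pushes the unaffected exons up to about 1.14; in practice many exons or dedicated control regions dilute this effect.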
Variant interpretation
All identified variants were classified into five categories, “pathogenic,” “likely pathogenic,” “uncertain significance,” “likely benign,” and “benign,” according to the ACMG guidelines for the interpretation of genetic variants30. All SNVs and CNVs classified as pathogenic or likely pathogenic variants (PLPVs) were confirmed by Sanger sequencing, MLPA, or LR-PCR. Findings were reported to patients according to the ACMG practice resource12.
A total of 463 variants classified as pathogenic or likely pathogenic were excluded from the final analysis because they cause mild phenotypes or show incomplete penetrance (Table S10). For example, GJB2 c.109G > A(p.Val37Ile) was found 168 times and UGT1A1 c.1091C > T(p.Pro364Leu) 100 times. These variants were neither included in the final analysis nor reported to patients, with four exceptions: three instances of GJB2 c.109G > A(p.Val37Ile) in which the couple was at risk of having a compound heterozygous child, and one instance of CYP21A2 c.955C > T(p.Gln319*) in which the couple was at similar risk and the variant was independently verified by Sanger sequencing (Table S9).
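The reporting rule above can be expressed as a filter: keep P/LP variants, drop those on an exclusion list of mild or incompletely penetrant variants, but retain an excluded variant when the partner carries a P/LP variant in the same gene. A minimal sketch; the data structure, the two-entry exclusion set, and the HGVS strings for PAH and USH2A are hypothetical, and the extra Sanger confirmation applied in the CYP21A2 case is not modeled.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

PLP = {"pathogenic", "likely pathogenic"}

# Illustrative exclusion list using two variants named in the text;
# the study's full list (Table S10) is not reproduced here.
EXCLUDED: Set[Tuple[str, str]] = {
    ("GJB2", "c.109G>A"),
    ("UGT1A1", "c.1091C>T"),
}

@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs_c: str
    classification: str

def reportable(own: List[Variant], partner: List[Variant]) -> List[Variant]:
    """P/LP variants to report for one person; an excluded variant is kept only
    when the partner carries a P/LP variant in the same gene."""
    partner_genes = {v.gene for v in partner if v.classification in PLP}
    kept = []
    for v in own:
        if v.classification not in PLP:
            continue  # VUS, likely benign, and benign calls are not reported
        if (v.gene, v.hgvs_c) in EXCLUDED and v.gene not in partner_genes:
            continue  # mild or incompletely penetrant, and no partner risk in this gene
        kept.append(v)
    return kept

# Hypothetical screening results.
wife = [
    Variant("GJB2", "c.109G>A", "pathogenic"),
    Variant("PAH", "c.100A>G", "likely pathogenic"),
    Variant("USH2A", "c.200C>T", "uncertain significance"),
]
husband = [Variant("GJB2", "c.235delC", "pathogenic")]

print([v.gene for v in reportable(wife, [])])       # wife screened alone
print([v.gene for v in reportable(wife, husband)])  # partner also carries a GJB2 variant
```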
Sanger sequencing
Sequencing was performed in both forward and reverse directions, with each reaction containing 1.6 µl purified PCR product, 0.8 µl BigDye, and 2.0 µl M13 sequencing primer (1.6 µM).
Multiplex ligation-dependent probe amplification
MLPA was performed with 100 ng genomic DNA per reaction using the SALSA MLPA probemixes P050-C1 and P460-A1 to analyze CYP21A2, HBA1/HBA2, and SMN1/2 according to the manufacturer’s recommendations (MRC Holland, Amsterdam, The Netherlands). Quality control and data analysis were performed using Coffalyser.net software (MRC Holland, www.mlpa.com).
Long-range PCR (LR-PCR)
Each 25 µl reaction in a PCR plate contained 12.5 µl 2× LongAmp Taq Master Mix, 1 µl LD-PCR primer F (10 μM), 1 µl LD-PCR primer R (10 μM), 2.5 μl genomic DNA, and 8 µl ddH2O. The plate was placed in a thermal cycler and LR-PCR was run as follows: 94 ℃ for 1 min; 30 cycles of 94 ℃ for 10 s and 65 ℃ for 6 min; a final 65 ℃ for 6 min; then a hold at 4 ℃. Five microliters of LR-PCR product was loaded onto a 1.5% agarose gel in 1× TAE, and electrophoresis was performed at 120 V for 30 min to check for a band. If a band was present, the next step was performed.
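The per-reaction volumes add up to the stated 25 µl (12.5 + 1 + 1 + 2.5 + 8). When preparing a master mix for several samples, template DNA is usually added to each well separately and the shared components are scaled with some overage. The 10% overage and the helper below are assumptions for illustration, not part of the protocol.

```python
from typing import Dict

# Per-reaction volumes (µl) from the LR-PCR protocol above.
PER_REACTION = {
    "2x LongAmp Taq Master Mix": 12.5,
    "primer F (10 µM)": 1.0,
    "primer R (10 µM)": 1.0,
    "genomic DNA": 2.5,
    "ddH2O": 8.0,
}

def master_mix(n_reactions: int, overage: float = 0.10) -> Dict[str, float]:
    """Volumes of shared components for n reactions plus an assumed overage.
    Genomic DNA is excluded because it is added to each well individually."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(vol * factor, 2)
        for name, vol in PER_REACTION.items()
        if name != "genomic DNA"
    }

assert sum(PER_REACTION.values()) == 25.0  # matches the 25 µl reaction volume
print(master_mix(8))
```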
Triplet repeat primed PCR (TP-PCR)
The CGG repeat number in the 5´ UTR of FMR1 was analyzed using the Microreader™ FMR1 Gene Detection Kit (Microread Genetics, Beijing, China) and agarose gel electrophoresis. Capillary electrophoresis was performed on an ABI 3730 Genetic Analyzer (Applied Biosystems). The PCR conditions were: 95 °C for 5 min; 10 cycles of denaturation at 97 °C for 35 s, annealing at 65 °C for 35 s (decreasing by 1 °C with each additional cycle), and extension at 68 °C for 4 min; then 20 cycles of denaturation at 97 °C for 35 s, annealing at 60 °C for 35 s, and extension at 68 °C for 4 min (lengthened by 20 s with each additional cycle); followed by 68 °C for 10 min and a final hold at 15 °C. Results were analyzed using GeneMapper v4.0 software.
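Read literally, the program is a 10-cycle touchdown phase followed by a 20-cycle phase with a progressively lengthened extension step. The sketch below expands it into per-cycle settings so the end points can be checked: annealing descends from 65 °C to 56 °C, and, assuming the 20 s increment applies to every cycle after the first of the second phase, extension grows from 240 s to 620 s. This representation is hypothetical and is not the kit's program file.

```python
from typing import List, NamedTuple

class Cycle(NamedTuple):
    denature_c: float
    denature_s: int
    anneal_c: float
    anneal_s: int
    extend_c: float
    extend_s: int

def tp_pcr_program() -> List[Cycle]:
    """Expand the two cycling phases described above (the initial 95 °C step
    and the final holds are omitted)."""
    cycles = []
    # Phase 1: 10 touchdown cycles; annealing starts at 65 °C and drops 1 °C per cycle.
    for i in range(10):
        cycles.append(Cycle(97, 35, 65 - i, 35, 68, 240))
    # Phase 2: 20 cycles at 60 °C annealing; extension gains 20 s per additional cycle.
    for j in range(20):
        cycles.append(Cycle(97, 35, 60, 35, 68, 240 + 20 * j))
    return cycles

program = tp_pcr_program()
print(len(program), program[9].anneal_c, program[-1].extend_s)
```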
Results
Population statistics
A total of 1265 individuals were screened for carrier status of the selected conditions, comprising 776 individuals in 388 couples and 489 single individuals. Among the 489 single individuals, only one was male; he requested the test because of his family history. All other single participants were female, consistent with the sequential design of the screening. The median age of female participants was 32 years (range 18–48), apart from two sisters aged 10 and 14 who were tested with, and at the request of, their mother; the median age of male participants was 32 years (range 23–60).
Apart from six individuals whose ethnicity was unavailable for various reasons, the remaining 1261 patients all had their ethnicity registered. A small but substantial number of participants, 130 (10.3%), belonged to ethnic minorities, just over half of whom (67 individuals) were from the Yi ethnic group. The remaining participants were from the Han ethnic group (Table 1).
Sample population carrier rates and mutation burdens
Of the 1265 individuals, 839 (66.32%) returned a positive report, meaning they carried at least one PLPV. A total of 1397 PLPVs were found, an average of 1.10 per person, which is the mutation burden of the population. As shown in Table 1, females had a slightly higher carrier rate and mutation burden (68.38%; 1.14 per person) than males (61.70%; 1.03 per person). Carrier rates and mutation burdens did not differ notably between ethnic minorities and the majority Han (61.54% vs. 66.84% and 1.12 vs. 1.10 variants per person).
The female excess was more pronounced among the 388 couples: females in couples had a carrier rate of 75.00%, compared with 63.11% among single females (Table 1). This is probably an artifact of the sequential design, as a male partner was more likely to be screened if his spouse had already returned a positive result.
When participants were stratified by risk factor, the highest carrier rate (43/53, 81.13%) was found among females with a history of ultrasound anomalies, followed by females with health issues (77.61%) and those with a history of abnormal pregnancies (76.92%). The highest mutation burdens were also found in these three subgroups: 1.38, 1.34, and 1.23 in participants with histories of abnormal pregnancies, health issues, and ultrasound anomalies, respectively. Exploratory statistical analyses were conducted to identify possible directions for further investigation; only the carrier rate of females with ultrasound anomalies differed enough to warrant further confirmation (P = 0.035 compared with the routine screening group).
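Both summary statistics are simple ratios: the carrier rate is the share of individuals with at least one PLPV, and the mutation burden is the total number of PLPVs divided by the number of individuals screened. A short sketch that reproduces the cohort figures from the totals reported above; the function name and the five per-person counts in the second example are hypothetical.

```python
from typing import Sequence, Tuple

def carrier_stats(plpv_counts: Sequence[int]) -> Tuple[float, float]:
    """Return (carrier rate, mutation burden) from per-person PLPV counts."""
    n = len(plpv_counts)
    carriers = sum(1 for c in plpv_counts if c > 0)
    return carriers / n, sum(plpv_counts) / n

# Cohort totals from the text: 839 carriers and 1397 PLPVs among 1265 individuals.
print(f"carrier rate {839 / 1265:.2%}, burden {1397 / 1265:.2f}")

# Hypothetical per-person counts for five individuals.
rate, burden = carrier_stats([0, 1, 2, 1, 0])
print(rate, burden)
```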
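The text does not name the test behind P = 0.035. For a 2 × 2 comparison of carrier rates (carriers vs. non-carriers, subgroup vs. routine screening group), Fisher's exact test is a common choice. The sketch below implements its two-sided form from the hypergeometric distribution using only the standard library. It is an assumption about the method, and it is demonstrated on a textbook table rather than on study data, because the routine group's counts are not given here.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Classic "lady tasting tea" table; the exact two-sided p is 34/70.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```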
The modal mutation burden among the whole studied population was 1, ranging from no variant found to five variants per individual (Fig. 2). Specifically, 29.57% (374/1265) of the population were non-carriers, 55.89% (707/1265) had 1–2 PLPVs, and 10.51% (133/1265) had 3 or more variants.
Fig. 2. Mutation burden of the sample population. The modal number was one variant (452 individuals), followed by a negative result (374 individuals) and two variants (255 individuals). Another 99 individuals carried three variants, and the remaining 34 carried four or five variants.
Among the 1397 variants, 1287 (92.13%) were SNVs, 105 (7.52%) were CNVs, and 5 (0.36%) were short tandem repeats (STRs) of FMR1. Variants that might cause inborn errors of metabolism (IEM), including 34 of PAH, 23 of SLC22A5, 20 of MMACHC, 15 of SLC25A13, and 14 of GBA1, added up to 31% (437/1397) of all PLPVs, making metabolic disease the most common category in our sample population. Other frequently implicated categories were multisystem diseases (13%), neurological diseases (12%), endocrine diseases (12%), hematological diseases (7%), and auditory diseases (6%) (Fig. 3A).
Fig. 3. Distribution of PLPVs. (a) Variants by organ system. The PLPVs were most often linked to IEM (31%), followed by multisystem diseases (13%) and nervous and endocrine system diseases (12% each). (b) All genes with a combined PLPV carrier frequency ≥ 1/100. (c) Single variants with a carrier frequency ≥ 1/200. The most frequent variants included HBA1/HBA2 -α3.7/αα(p.?) (28 times), DUOX2 c.1588A > T(p.Lys530*) (24 times), SMN1 ex.7del(p.?) (22 times), and ATP6V0A4 c.1029 + 5G > A(p.?) (21 times).
Variants associated with neuromuscular disease included 22 of SMN1 and 17 of NEB. Variants associated with endocrine disorders were found in DUOX2 (89 variants), CYP21A2 (29 variants), and SRD5A2 (16 variants). Variants that might affect the hematological system included 46 of HBA1/2, 21 of F11, 16 of HBB, and 12 of PRF1. Despite the exclusion of a large number of controversial variants such as GJB2 c.109G > A(p.Val37Ile), variants that could cause auditory conditions remained over-represented in the cohort, including 41 of USH2A, 26 of ATP6V0A4, 23 of SLC26A4, 21 of GJB2, and 11 of MYO15A. OCA2 and TYR, both associated with oculocutaneous albinism, had 16 and 12 variants, respectively.
The most frequently mutated genes were DUOX2, HBA1/HBA2, and USH2A, with 89, 46, and 41 variants, respectively. Three genes that are technically challenging for NGS because of highly homologous pseudogenes or large deletions, HBA1/HBA2, CYP21A2, and SMN1, also showed high mutation rates, with 46, 29, and 22 variants detected after Sanger/MLPA validation. Other frequently mutated genes included PAH (34 variants), ATP7B (28 variants), and ATP6V0A4 (26 variants) (Fig. 3B).
Nineteen variants had a carrier frequency over 1/200 (Fig. 3C), including four DUOX2 variants and two HBA1/HBA2 variants. Notably, three of these high-frequency variants were CNVs: HBA1/HBA2 -α3.7/αα(p.?) and αα/–SEA(p.?), and SMN1 ex.7del(p.?). Together they accounted for 64 (61.0%) of the 105 CNV counts, which comprised 39 unique CNVs (Table S3). Three other CNVs were detected more than once: an NPHP1 whole-gene deletion (3 samples), and HBA1/HBA2 αα/-α4.2(p.?) and CYP21A2 ex.1_7del(p.?) (2 samples each).
Variant detection efficiencies and panel designs
Because variants were unevenly distributed across genes, the number of genes meeting a carrier-frequency threshold grew non-linearly as the threshold was lowered (Table 2). Only one gene, DUOX2, had a carrier frequency over 1/50 (≥ 26 variants detected). The number of genes rose to seven above 1/100 (≥ 13 variants) and to 20 above 1/200 (≥ 7 variants).
Hypothetically, selecting the 20 highest-frequency genes would have covered 37.94% of all variants detected. Coverage rose to 56.34% and 61.92% when the inclusion criteria were expanded to gene frequencies over 1/400 and 1/500, respectively (Fig. 4A). The yield per added gene dropped sharply over roughly the first 50 genes (corresponding to a frequency threshold of about 1/400), after which the cumulative yield entered a phase of slow, roughly linear increase (Fig. 4B).
Fig. 4. The relationship between gene carrier frequency, number of genes, and detection rate. (a) The number of genes added roughly doubled (7 − 1 = 6, 20 − 7 = 13, 49 − 20 = 29) each time the carrier-frequency threshold was halved, while the detection rate initially increased linearly. (b) The number of variants detected rose steeply with the first few genes and became roughly linear after more than 50 genes were included; the per-gene detection number likewise dropped sharply until about 50 genes.
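The variant counts quoted for each cut-off follow from requiring the count divided by the cohort size to exceed the threshold, i.e. count > 1265/d for a threshold of 1/d. The sketch reproduces those minimum counts; it assumes, as the text appears to, that gene carrier frequency is the number of variants detected divided by the number of individuals screened, and the function name is hypothetical.

```python
def min_count(n_individuals: int, denominator: int) -> int:
    """Smallest integer count c with c / n_individuals > 1 / denominator."""
    # c > n / d  is equivalent to  c * d > n; the smallest such integer is n // d + 1.
    return n_individuals // denominator + 1

for d in (50, 100, 200, 400, 500):
    print(f"> 1/{d}: at least {min_count(1265, d)} variants")
```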
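The coverage figures come from ranking genes by variant count and accumulating their share of all PLPVs, and the marginal yield is the count each additional gene contributes. A minimal sketch with hypothetical gene names and counts (the real per-gene counts are in Table 2), showing how the cumulative fraction flattens once the common genes are included.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

def cumulative_yield(gene_counts: Dict[str, int], total_variants: int) -> List[Tuple[str, float]]:
    """Rank genes by variant count and return the cumulative fraction of all
    variants covered after adding each gene."""
    ranked = sorted(gene_counts.items(), key=lambda kv: kv[1], reverse=True)
    running = accumulate(count for _, count in ranked)
    return [(gene, total / total_variants) for (gene, _), total in zip(ranked, running)]

# Hypothetical counts: a few common genes followed by a tail of rarer ones.
# total_variants also includes variants in genes not listed here.
counts = {"GENE_A": 40, "GENE_B": 20, "GENE_C": 10, "GENE_D": 5, "GENE_E": 3, "GENE_F": 2}
for gene, frac in cumulative_yield(counts, total_variants=100):
    print(gene, f"{frac:.0%}")
```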
At-risk couples
Nine couples (9/388, 2.32%) were ARCs for autosomal recessive conditions, carrying 18 relevant variants (Table S9); any other PLPVs they carried were present in only one partner. The only gene implicated in more than one ARC was GJB2 (2 couples), which can cause non-syndromic deafness. Another 10 females (10/876, 1.14%), four of whom were screened with their partners, carried variants associated with X-linked conditions: seven carried G6PD variants associated with favism, and one each carried variants in F8 (hemophilia A), DMD (Duchenne/Becker muscular dystrophy), and CHM (choroideremia). These couples and individuals were at significant risk of having affected offspring.
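The "significant risk" for these couples follows from Mendelian segregation: two carriers of an autosomal recessive condition have a 1 in 4 chance of an affected child in each pregnancy, and a female carrier of an X-linked recessive variant passes it to half of her sons, who are hemizygous. The sketch enumerates parental gametes to make this explicit; it assumes full penetrance and equal transmission of each allele and ignores de novo events, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Tuple

def ar_affected_risk(mother: Tuple[str, str], father: Tuple[str, str]) -> Fraction:
    """Probability of a child with two variant alleles ('v') at an autosomal locus."""
    outcomes = list(product(mother, father))  # each parent passes one allele, equally likely
    affected = sum(1 for m, f in outcomes if m == "v" and f == "v")
    return Fraction(affected, len(outcomes))

def xl_son_risk(mother: Tuple[str, str]) -> Fraction:
    """Probability that a son is hemizygous for the variant (he receives one maternal X)."""
    return Fraction(sum(1 for x in mother if x == "v"), len(mother))

carrier = ("v", "+")  # heterozygous: one variant allele, one reference allele
print(ar_affected_risk(carrier, carrier))  # 1/4 for two AR carriers
print(xl_son_risk(carrier))                # 1/2 per son of a female X-linked carrier
```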
All ARCs received genetic counseling. Ten couples continued to receive medical care at our hospital, and the follow-up of their subsequent pregnancies is summarized in Table 3. All of these couples were of Han Chinese origin, except for the wife of AR-CP16, who was from the Laku ethnic group. Couples AR-CP6 and AR-CP16 were carriers of CYP21A2 and HBA1/HBA2 variants, respectively, and amniocentesis showed that their fetuses were not carriers. Couples AR-CP4, AR-CP7, AR-CP10, and AR-CP21 were carriers of SLC22A5, PCDH15, RYR1, and CPS1 variants, respectively, and amniocentesis showed that their fetuses were heterozygous carriers and not at increased risk of the corresponding monogenic conditions. All six couples delivered an unaffected child.
Couple AR-CP19 were carriers of AGXT variants associated with hyperoxaluria (OMIM:#259900), and amniocentesis confirmed that the fetus was compound heterozygous for both variants. The female partner of couple XL-CP2 was a carrier of a DMD variant, and amniocentesis showed that the male fetus was hemizygous for the variant. Both couples decided to terminate the pregnancy after genetic counseling. XL-SF4, a female tested without her partner, was a carrier of an F8 variant and chose IVF to assist her reproductive journey.
The female partner of couple XL-CP4 was a carrier of a CHM variant, and amniocentesis showed that the male fetus was hemizygous for the variant. After comprehensive genetic counseling and careful consideration, the parents continued the pregnancy, and the male infant was delivered naturally. The infant showed characteristic features of choroideremia. The couple received genetic counseling on the details of the condition, and we continue to follow the child’s growth and development.
Individuals carrying biallelic homozygous variants
Eight apparently healthy individuals were found during screening to be homozygous for PLPVs (Table S6). Three individuals each were homozygous for GJB2 c.109G > A(p.Val37Ile) and UGT1A1 c.1091C > T(p.Pro364Leu), and two for OCA2 c.1441G > A(p.Ala481Thr). None reported any phenotype related to the corresponding disease. Although these variants are classified as pathogenic or likely pathogenic, their pathogenicity is relatively weak. They were therefore not reported in this study, either in the overall variant counts or for at-risk couples whose offspring could be homozygous; they were reported only in potential compound heterozygous cases. Another individual, homozygous for SH3TC2 c.730C > T(p.Gln244*), had telltale signs of a neuromuscular disorder.
Discussion
This is the first study to evaluate carrier status in the diverse population of Yunnan Province, China, using an expanded carrier screening panel covering 486 genes. The province’s population comprises groups with diverse ancestries that share origins and cultures with Southwest China, Southeast Asia, and the rest of China, a makeup that is in large part due to mass migration during wartime. The ethnic minority groups have maintained certain distinctive marriage practices and have kept their lineages relatively distinct genetically. Including a small but substantial number of these patients in the same cohort as the majority Han group from the same region therefore gives rare insight into the region’s genetic landscape.
The genetic landscape of the population
In this cohort of 1265 patients screened with a 486-gene panel, the mutation burden (1.10 variants per person) and carrier rate (66.32%) were comparable to those of other studies in Chinese populations. These high values may partly reflect the expanding scope of the genes screened; for example, an early multiethnic study in China that screened 11 diseases reported a carrier rate of 27.49%19. Apart from the thalassemia-associated genes, the remaining genes had rather similar carrier frequencies (Table 4). The high thalassemia burden in the earlier study was likely due to the classic balancing selection pressure exerted by malaria in that region.
The types of variation screened, such as CNVs31, may be another piece of the puzzle, but they cannot account for differences between studies with a similar design. In fact, 7.52% (105/1397) of the variants detected here were CNVs, comparable to the 4.18% reported previously in Shanghai21.
Interestingly, no perceptible difference in mutation burden was observed between the majority Han group and the minority groups using the same panel: the 130 patients from all minority groups had a mutation burden of 1.12 variants per person, compared with 1.10 in the Han group. The Yi, the largest minority group in the study, had a burden of 1.10 among 67 patients. Nor was there evidence that carrier rates differed in any particular ethnic group. These results support the use of a pan-ethnic panel in China until further evidence suggests otherwise.
Multiple hypothesis testing was performed to identify subgroups with potential for further exploration. Within this cohort, the highest carrier rates and mutation burdens were found among females with a history of ultrasound anomalies, those with their own health issues, and those with past abnormal pregnancies. Only the carrier rate of the first group differed significantly from that of the routine screening group with no registered risk factors, and further research is needed to confirm whether this difference is genuine or spurious. These pathogenic variants are not associated with embryonic lethality, nor do they have an established link to intrauterine presentations. One might speculate that the higher number of variants reflects a higher general mutation burden across the genome, which in turn might affect embryonic health in a polygenic manner.
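When several subgroups are each compared with the routine screening group, a nominal P value such as 0.035 should be read against the number of comparisons made. The text does not state which correction, if any, was applied; as one common option, the sketch below implements the Benjamini-Hochberg step-up adjustment. The four P values in the example are hypothetical.

```python
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal P values for four subgroup comparisons.
print(benjamini_hochberg([0.035, 0.20, 0.08, 0.50]))
```

With these made-up inputs the smallest P value rises above 0.05 after adjustment, which illustrates why a single nominally significant subgroup is best treated as a lead for confirmation rather than a finding.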
Genes with high mutation frequencies
Several genes with high mutation frequencies had been reported previously in Chinese populations, but there were also clear differences in the genetic landscape of this population.
Among genes associated with auditory conditions, GJB2 and SLC26A4 are well-known causes of non-syndromic hearing loss32,33, and USH2A is associated with Usher syndrome34. Despite the exclusion of some of the most common variants, these genes still accounted for a considerable number of variants. Interestingly, ATP6V0A4, which is associated with renal tubular acidosis in addition to sensorineural hearing loss35, had a relatively high frequency in this cohort but has rarely been reported in previous carrier screening studies. Conversely, GJB3 variants were less frequent in this cohort than in previous studies33.
Variants that could lead to inborn errors of metabolism (IEM) were found in PAH, ATP7B, SLC22A5, MMACHC, and SLC25A13, consistent with earlier studies in China21,36. Similarly, high-frequency variants that might cause endocrine conditions were found in DUOX2 and CYP21A2 20,37. Other previously reported high-frequency genes included HBA1/2 and SMN1 20,36,37.
Variants of SRD5A2 were found 16 times in this study, most notably c.680G > A(p.Arg227Gln), which is significantly higher than the numbers reported from other regions of China. A previous study on hypospadias found these variants to be more common in South China than in North China, especially in Guangxi province, which borders Yunnan38. Given this geographical and ethnic setting, a link to the “kwalatmala” boys of Papua New Guinea, among the first groups in whom 5α-reductase deficiency was studied39, cannot be ruled out. Further studies are needed to determine whether such a link exists.
Notably, calls of the CYP21A2 variant c.955C > T(p.Gln319*) are subject to interference from the pseudogene CYP21A1P, so the variant must be verified by Sanger sequencing before the call can be trusted. When 20 samples were verified, only three (15%) were true positives. It was therefore decided that this variant would be verified, and then reported, only in at-risk couples. The variant was flagged by the system 42 times in total (Table S10), but only one flag occurred in an at-risk couple and was subsequently verified; this was the only count included in the final analysis. This change in screening strategy substantially increased the efficiency of variant interpretation and streamlined the verification process for this locus.
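The confirmation policy described above amounts to a simple triage rule. The sketch below is a hypothetical illustration, not the study's software: the `Flag` record, its field names, and the call label `"CYP21A2:c.955C>T"` are assumptions. It selects pseudogene-prone calls for Sanger sequencing only when they would place a couple at risk, and records the 3/20 validation result as a positive predictive value.

```python
from dataclasses import dataclass

# Hypothetical label for the pseudogene-prone call discussed above.
PSEUDOGENE_PRONE_CALLS = {"CYP21A2:c.955C>T"}

# Validation result reported in the text: 3 of 20 calls confirmed by Sanger.
VALIDATION_PPV = 3 / 20


@dataclass
class Flag:
    """A variant call flagged by the analysis pipeline (illustrative fields)."""
    sample_id: str
    call: str
    couple_at_risk: bool  # partner also carries a P/LP variant in this gene


def calls_to_confirm(flags: list[Flag]) -> list[Flag]:
    """Return the pseudogene-prone flags that should go to Sanger sequencing.

    Such a call is confirmed (and reported only if confirmed) when it would
    place the couple at risk; other pseudogene-prone flags are set aside.
    Calls outside PSEUDOGENE_PRONE_CALLS follow the standard workflow and
    are not handled here.
    """
    return [f for f in flags if f.call in PSEUDOGENE_PRONE_CALLS and f.couple_at_risk]


# Example: 42 flags of the variant, one of them in an at-risk couple.
flags = [Flag(f"S{i:02d}", "CYP21A2:c.955C>T", couple_at_risk=(i == 0)) for i in range(42)]
to_confirm = calls_to_confirm(flags)
print(f"Sanger reactions: {len(to_confirm)} instead of {len(flags)}")
```

With a validation PPV of only 15%, confirming every flag would mostly generate negative Sanger results; restricting confirmation to calls that change a couple's risk status keeps the verification effort proportional to its clinical consequence.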
The cost–benefit analysis of panel size
The Pareto principle, or the proverbial "80–20 rule", states that roughly 80% of outcomes stem from 20% of causes, a pattern characteristic of power-law distributions whose values span a wide range of magnitudes. Because the allele frequencies of pathogenic variants vary greatly, one might expect their distribution across genes to follow a power law as well.
A population-specific panel design targeting the genes with the highest local mutation rates could meaningfully improve the economics of screening for both patients and wider society40. This is one of the major reasons for conducting this study, i.e., to assess the population-specific mutation profile and establish a genetic basis for designing a multi-tier system of screening panels.
In this study, 300 genes had at least one pathogenic or likely pathogenic variant in the sample population, totaling 1397 variants (Table S4). If variant counts across genes followed the Pareto principle, the top 60 genes (20% of 300) would account for roughly 80% of the variants, i.e., more than 1100. However, natural selection gradually purges deleterious mutations, especially those reaching higher frequencies, while de novo mutations accumulate within a population. In this population, the top 20% of genes covered only slightly more than 60% of all the variants detected.
The number of variants detected rose sharply with the first handful of genes, slowed after the top 20 genes, and reached a plateau once more than 50 genes were included. Interestingly, this inflection at around 20 genes coincided almost exactly with the > 1/200 frequency cutoff, agreeing neatly with the ACMG recommendation. Based on these findings, a virtual panel of the 49 most frequently mutated genes would cover 56.34% of all the variants detected in this population. If the goal is a small panel that maximizes cost–benefit efficiency, the > 1/200 cutoff, corresponding to the top 20 genes, would be a sensible choice given the per-gene detection rate.
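The arithmetic behind this comparison can be sketched as follows. The function assumes a hypothetical mapping from each gene to its number of P/LP variants (the kind of tally given in Table S4) and returns the share of all variants that fall in the top fraction of genes; the toy counts are illustrative only.

```python
def top_fraction_coverage(gene_counts: dict[str, int], fraction: float = 0.2) -> float:
    """Share of all detected variants that fall in the top `fraction` of genes.

    gene_counts maps each gene to its number of P/LP variants (hypothetical
    input format). The number of top genes is len(gene_counts) * fraction,
    rounded to the nearest integer.
    """
    counts = sorted(gene_counts.values(), reverse=True)
    k = round(len(counts) * fraction)
    return sum(counts[:k]) / sum(counts)


# Figures from the text: 300 genes, 1397 variants.
n_genes, n_variants = 300, 1397
print("top 20% of genes:", round(n_genes * 0.2))
print(f"variants expected under a strict 80-20 split: {0.8 * n_variants:.0f}")

# Toy example with illustrative counts (not study data).
toy = {"GENE_A": 50, "GENE_B": 20, "GENE_C": 10, "GENE_D": 10, "GENE_E": 10}
print(f"top-20% coverage in toy data: {top_fraction_coverage(toy):.0%}")
```

For the study's 300 genes, the top 20% is 60 genes; a strict 80–20 split would put about 1118 of the 1397 variants in those genes, whereas the observed coverage was only slightly above 60%.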
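A frequency cutoff and a cumulative coverage curve of this kind could be computed as sketched below. The sketch assumes, as the text appears to, that a gene's frequency is the proportion of screened individuals carrying a P/LP variant in that gene, and it uses the strict "> 1/200" comparison stated above; the function names and toy counts are hypothetical.

```python
from fractions import Fraction


def genes_above_cutoff(gene_counts: dict[str, int], n_screened: int,
                       cutoff: Fraction = Fraction(1, 200)) -> list[str]:
    """Genes whose carrier frequency (count / n_screened) exceeds `cutoff`,
    ordered from most to least frequently mutated.

    Exact fractions avoid floating-point error at the boundary.
    """
    ranked = sorted(gene_counts, key=gene_counts.get, reverse=True)
    return [g for g in ranked if Fraction(gene_counts[g], n_screened) > cutoff]


def cumulative_coverage(gene_counts: dict[str, int]) -> list[float]:
    """Fraction of all variants covered by the top k genes, for k = 1..N."""
    counts = sorted(gene_counts.values(), reverse=True)
    total, running, curve = sum(counts), 0, []
    for c in counts:
        running += c
        curve.append(running / total)
    return curve


# With 1265 individuals, a gene passes "> 1/200" only with at least 7 carriers
# (6/1265 is about 0.0047, 7/1265 is about 0.0055).
toy = {"GENE_A": 30, "GENE_B": 7, "GENE_C": 6}  # hypothetical counts
print(genes_above_cutoff(toy, 1265))
print([f"{x:.0%}" for x in cumulative_coverage(toy)])
```

Applied to the study's per-gene counts, the cutoff would be expected to select roughly the top 20 genes, and the coverage curve would show the flattening described above. Changing `>` to `>=` alters only genes sitting exactly on the boundary.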
At-risk couples and reproductive decision-making
Of the 388 couples sequenced together, 9 (2.32%) were at high risk of having offspring affected by AR conditions, which was not unexpected based on earlier studies of similar design21. The only recurrent risk arose from the highly prevalent GJB2 c.109G > A(p.Val37Ile), which could result in compound heterozygous offspring. Among female carriers of X-linked conditions, G6PD variants were the most common. These two genes are associated with conditions that have limited impact on individual life expectancy or reproductive success but affect quality of life, namely non-syndromic hearing loss41 and favism42.
The disparity in disease screening scope between China and Western nations, rooted in cultural preferences and historical factors, presents a distinct challenge. Unlike programs that typically prioritize diseases with severe phenotypes, expanded carrier screening in China also includes conditions of moderate impact, reflecting a culturally mediated emphasis on comprehensive genetic risk assessment. This practice adds complexity to genetic counseling, as providers must contextualize extensive risk profiles for patients. Consequently, amniocentesis has become a standard procedure for most at-risk couples to resolve uncertainty about the fetal genotype. Even among couples carrying variants associated with severe phenotypes, reproductive decisions after diagnosis are heterogeneous.
One of the most important insights gained from this study concerns the reproductive choices made by the at-risk couples (Table 3). Seven at-risk couples with autosomal recessive conditions and three couples in which the female partner carried an X-linked condition went on to take subsequent reproductive actions. Nine couples continued with natural pregnancies and then underwent amniocentesis for prenatal diagnosis.
In Couple XL-SF4, the mother carried a pathogenic variant of F8; after learning that the procedure was available at the same center, the couple underwent IVF. This suggests that offering services covering the whole reproductive cycle at a single site could help couples choose the options they prefer.
Amniocentesis results for the first six couples were encouraging for the parents. For Couple AR-CP19, amniocentesis revealed a fetus compound heterozygous for AGXT variants, and Couple XL-CP2 had a male fetus carrying a pathogenic CNV in DMD. After genetic counseling and understanding the risks associated with the variants, both couples decided to terminate the pregnancy through induced abortion. It is not clear whether the latter couple was more concerned about the less likely but severe Duchenne muscular dystrophy phenotype, which can lead to early death, or had broader reservations about any congenital condition that might affect quality of life.
Couple XL-CP4 was at risk because the mother carried a CHM variant. They conceived naturally, and subsequent amniocentesis showed an affected male fetus; ultrasound also suggested cleft lip and palate. After genetic counseling and careful consideration, the couple decided to continue the pregnancy. After birth, the child showed signs of choroideremia. However, the cleft lip and palate more likely had a separate origin than represented a hitherto undocumented rare phenotype of CHM mutation, in keeping with Hickam’s dictum43.
These couples, especially the last four (XL-SF4, AR-CP19, XL-CP2, and XL-CP4), showed nuanced attitudes toward birth defects. They embraced the premise of carrier screening, i.e., the prevention of birth defects, but also accepted some risk of conditions affecting their child's quality of life. This suggests that genetic counseling and shared decision-making between physicians and patients are indispensable for Chinese couples despite the challenges noted above.
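The "high risk" in these couples follows from standard Mendelian transmission. The sketch below enumerates parental allele combinations to give the textbook risks: 1/4 per pregnancy when both partners carry a pathogenic allele in the same AR gene, and 1/2 per son (1/4 per pregnancy, assuming a 1:1 sex ratio) when the mother carries an X-linked variant. The genotype encoding (`'N'` normal, `'P'` pathogenic) and function names are assumptions for illustration, and the sketch assumes full penetrance, which does not hold for variants such as GJB2 p.Val37Ile.

```python
from fractions import Fraction
from itertools import product


def ar_affected_risk(mother: str, father: str) -> Fraction:
    """Probability that a child inherits two pathogenic alleles (AR).

    Genotypes are two-character strings: 'N' = normal allele,
    'P' = pathogenic allele (e.g. 'NP' for a heterozygous carrier).
    Each parent transmits either allele with equal probability.
    """
    combos = list(product(mother, father))
    affected = sum(1 for m, f in combos if m == "P" and f == "P")
    return Fraction(affected, len(combos))


def xl_affected_risk(mother_x: str, father_x: str) -> tuple[Fraction, Fraction]:
    """Return (risk per son, risk per pregnancy) for an X-linked recessive trait.

    mother_x: her two X alleles, e.g. 'NP' for a carrier.
    father_x: his single X allele, 'N' or 'P'.
    A 1:1 sex ratio is assumed.
    """
    outcomes = []  # (is_son, is_affected)
    for m in mother_x:
        outcomes.append((True, m == "P"))                        # son: maternal X + paternal Y
        outcomes.append((False, m == "P" and father_x == "P"))  # daughter: needs two P alleles
    sons = [aff for is_son, aff in outcomes if is_son]
    per_son = Fraction(sum(sons), len(sons))
    per_pregnancy = Fraction(sum(aff for _, aff in outcomes), len(outcomes))
    return per_son, per_pregnancy


print("AR, two carriers:", ar_affected_risk("NP", "NP"))
print("XL, carrier mother:", xl_affected_risk("NP", "N"))
print(f"AR at-risk couples: {9 / 388:.2%}")
```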
Individuals with homozygous pathogenic variants
The original and main purpose of carrier screening was to identify carriers of recessive conditions in order to prevent birth defects, or to allow early treatment of individuals with two pathogenic alleles in the same gene. However, not all variants are equal. Some variants are not damaging enough to cause symptoms on their own in the homozygous state, or in combination with another weakly pathogenic variant. In this study, eight individuals were in this situation: three were homozygous for GJB2 c.109G > A(p.Val37Ile), three for UGT1A1 c.1091C > T(p.Pro364Leu), and two for OCA2 c.1441G > A(p.Ala481Thr)44,45,46. All appeared to be otherwise healthy. This further supported our decision to exclude most of these variants except in specific instances.
One unusual case involved an individual with a past diagnosis of an unspecified neuromuscular disorder. She had foot deformities and scoliosis and was ambulatory but had trouble maintaining balance, and she came for screening because of her own condition as well as her family history. She was found to be homozygous for SH3TC2 c.730C > T(p.Gln244*); SH3TC2 is a gene associated with Charcot-Marie-Tooth disease type 4C that has been reported in Chinese patients47. This information was subsequently forwarded to the patient and her primary care physician to aid further diagnosis. Fortunately, her partner was not a carrier, so their offspring were not at risk. This example demonstrates that carrier screening can occasionally benefit the screened individuals themselves, especially those with relevant medical or family histories, in addition to providing reproductive reassurance.
Future issues to explore
Recent years have seen a number of carrier screening studies in China, but their cohorts consisted primarily of the Han ethnic group. Officially, there are 55 minority ethnic groups, many of which have unique lineages that share very limited common ancestry with other groups. This study established a baseline genetic spectrum for Yunnan province that can be compared with other regions. However, the extent to which individual ethnic groups contribute to any differences requires further investigation.
Establishing the mutation spectrum for different ethnic groups, some of which might still practice marriage traditions that restrict unions with outsiders, is crucial for precise birth defect prevention. To explore the potential of ethnically targeted screening, future carrier screening studies should focus more on minority ethnic groups and recruit more couples from these backgrounds.
Limitations
Some limitations should be borne in mind when interpreting these results and formulating more general conclusions. Variant pathogenicity was classified according to the ACMG guidelines using the evidence available at the time. Changes in the guidelines and advances in medical genetics may alter the interpretation of some variants.
Only a small proportion of the at-risk couples were followed up on their reproductive decisions due to the constraint of the study timeframe. Therefore, their choices might not be representative of the wider population.
Finally, although many minority ethnic groups were included in the study, many had only a small number of patients, so the study could not provide a detailed picture of individual ethnicities.
Conclusions
The population of Yunnan province has a genetic profile generally similar to those of Chinese populations elsewhere. No noticeably distinct genetic features were observed in minority groups compared with the Han majority in the region. Couples notified of their carrier and at-risk status were able to weigh different options and make rational decisions based on their backgrounds and values. This confirms that shared decision-making is just as important in China as in Western societies.
Data availability
The datasets generated and analysed during the current study are available in the dbSNP repository (https://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=YNECS).
References
GBD 2019 Demographics Collaborators. Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950–2019: A comprehensive demographic analysis for the Global Burden of Disease Study 2019. Lancet 396, 1160–1203 (2020).
GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet 396, 1204–1222 (2020).
Blencowe, H. et al. Rare single gene disorders: Estimating baseline prevalence and outcomes worldwide. J. Community Genet. 9, 397–406 (2018).
Shapiro, A. J., Kroener, L. & Quinn, M. M. Expanded carrier screening for recessively inherited disorders: Economic burden and factors in decision-making when one individual in a couple is identified as a carrier. J. Assist. Reprod. Genet. 38, 957–963 (2021).
Veneruso, I., Di Resta, C., Tomaiuolo, R. & D’Argenio, V. Current updates on expanded carrier screening: New insights in the Omics Era. Medicina (Kaunas). 58, 455 (2022).
Malam, F. et al. Benchmarking outcomes in the Neonatal Intensive Care Unit: Cytogenetic and molecular diagnostic rates in a retrospective cohort. Am. J. Med. Genet. A. 173, 1839–1847 (2017).
Stevenson, D. A. & Carey, J. C. Contribution of malformations and genetic disorders to mortality in a Children’s hospital. Am. J. Med. Genet. A. 126A, 393–397 (2004).
Ferreira, C. R. The burden of rare diseases. Am. J. Med. Genet. A. 179, 885–892 (2019).
Edwards, J. G. et al. Expanded carrier screening in reproductive medicine-points to consider: A joint statement of the American College of Medical Genetics and Genomics, American College of Obstetricians and Gynecologists, National Society of Genetic Counselors, Perinatal Quality Foundation, and Society for Maternal-Fetal Medicine. Obstet. Gynecol. 125, 653–662 (2015).
Kaback, M. M. Population-based genetic screening for reproductive counseling: The Tay-Sachs disease model. Eur. J. Pediatr. 159(Suppl 3), S192-195 (2000).
Cousens, N. E., Gaff, C. L., Metcalfe, S. A. & Delatycki, M. B. Carrier screening for beta-thalassaemia: A review of international practice. Eur. J. Hum. Genet. 18, 1077–1083 (2010).
Gregg, A. R. et al. Screening for autosomal recessive and X-linked conditions during pregnancy and preconception: a practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23(10), 1793–1806 (2021).
Grody, W. W. et al. Laboratory standards and guidelines for population-based cystic fibrosis carrier screening. Genet. Med. 3, 149–154 (2001).
Prior, T. W. & Professional Practice and Guidelines Committee. Carrier screening for spinal muscular atrophy. Genet. Med. 10, 840–842 (2008).
ACOG Committee on Genetics. Committee Opinion No. 691: Carrier Screening for Genetic Conditions. Obstet. Gynecol. 129, e41–e55 (2017).
ACOG Committee on Genetics. Committee Opinion No. 690: Carrier Screening in the Age of Genomic Medicine. Obstet. Gynecol. 129, e35–e40 (2017).
Miller, D. T. et al. ACMG SF v3.2 list for reporting of secondary findings in clinical exome and genome sequencing: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 25, 100866 (2023).
Cheng, H. Y. H. et al. Expanded carrier screening in Chinese population—A survey on views and acceptance of pregnant and non-pregnant women. Front. Genet. 11, 594091 (2020).
Zhao, S. et al. Pilot study of expanded carrier screening for 11 recessive diseases in China: Results from 10,476 ethnically diverse couples. Eur. J. Hum. Genet. 27, 254–262 (2019).
Xi, Y. et al. Expanded carrier screening in Chinese patients seeking the help of assisted reproductive technology. Mol. Genet. Genomic. Med. 8, e1340 (2020).
Chen, S. C. et al. Carrier burden of over 300 diseases in Han Chinese identified by expanded carrier testing of 300 couples using assisted reproductive technology. J. Assist. Reprod. Genet. 40(9), 2157–2173 (2023).
Sagaser, K. G. et al. Expanded carrier screening for reproductive risk assessment: An evidence-based practice guideline from the National Society of Genetic Counselors. J. Genet. Couns. 32, 540–557 (2023).
Dive, L. & Newson, A. J. Reproductive carrier screening: Responding to the eugenics critique. J. Med. Ethics. 48, 1060–1067 (2022).
Xie, D. et al. Prenatal diagnosis of birth defects and termination of pregnancy in Hunan Province, China. Prenat. Diagn. 40(8), 925–930 (2020).
Shi, M. et al. Clinical implementation of expanded carrier screening in pregnant women at early gestational weeks: A Chinese cohort study. Genes (Basel) 12, 496 (2021).
Ben-Shachar, R., Svenson, A., Goldberg, J. D. & Muzzey, D. A data-driven evaluation of the size and content of expanded carrier screening panels. Genet. Med. 21, 1931–1939 (2019).
Lazarin, G. A. et al. Systematic classification of disease severity for evaluation of expanded carrier screening panels. PLoS ONE 9, e114391 (2014).
Lee, C. Y., Yen, H. Y., Zhong, A. W. & Gao, H. Resolving misalignment interference for NGS-based clinical diagnostics. Hum. Genet. 140, 477–492 (2021).
Strom, S. P. et al. A Streamlined approach to Prader-Willi and Angelman syndrome molecular diagnostics. Front. Genet. 12, 608889 (2021).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Hogan, G. J. et al. Validation of an expanded carrier screen that optimizes sensitivity via full-exon sequencing and panel-wide copy number variant identification. Clin. Chem. 64, 1063–1073 (2018).
Fu, Y. et al. Carrier frequencies of hearing loss variants in newborns of China: A meta-analysis. J. Evid. Based. Med. 12, 40–50 (2019).
Hu, H. et al. Genetic testing involving 100 common mutations for antenatal diagnosis of hereditary hearing loss in Chongqing, China. Medicine (Baltimore) 100, 25647 (2021).
Zhu, T. et al. USH2A variants in Chinese patients with Usher syndrome type II and non-syndromic retinitis pigmentosa. Br. J. Ophthalmol. 105, 694–703 (2021).
Guo, W. et al. Genotypic and phenotypic analysis in 51 Chinese patients with primary distal renal tubular acidosis. Clin. Genet. 100, 440–446 (2021).
Fang, Y. et al. Clinical application value of expanded carrier screening in the population of childbearing age. Eur. J. Med. Res. 28, 151 (2023).
Tong, K. et al. Clinical utility of medical exome sequencing: expanded carrier screening for patients seeking assisted reproductive technology in China. Front. Genet. 13, 943058 (2022).
Gui, B. et al. New insights into 5alpha-reductase type 2 deficiency based on a multi-centre study: Regional distribution and genotype-phenotype profiling of SRD5A2 in 190 Chinese patients. J. Med. Genet. 56, 685–692 (2019).
Imperato-McGinley, J. et al. A cluster of male pseudohermaphrodites with 5 alpha-reductase deficiency in Papua New Guinea. Clin. Endocrinol (Oxf) 34, 293–298 (1991).
Wang, T. et al. Economic evaluation of reproductive carrier screening for recessive genetic conditions: A systematic review. Expert Rev. Pharmacoecon. Outcomes Res. 22, 197–206 (2022).
Wang, X. et al. Children with GJB2 gene mutations have various audiological phenotypes. Biosci. Trends. 12, 419–425 (2018).
He, Y. et al. Glucose-6-phosphate dehydrogenase deficiency in the Han Chinese population: Molecular characterization and genotype-phenotype association throughout an activity distribution. Sci. Rep. 10, 17106 (2020).
Stratakis, C. A. “patients can have as many gene variants as they damn well please”: Why contemporary genetics presents us daily with a version of Hickam’s dictum. J. Clin. Endocrinol. Metab. 97, E802–E804 (2012).
Li, L. et al. The p.V37I exclusive genotype of GJB2: A genetic risk-indicator of postnatal permanent childhood hearing impairment. PLoS ONE 7, e36621 (2012).
Yang, H. et al. Clinical significance of UGT1A1 genetic analysis in Chinese neonates with severe hyperbilirubinemia. Pediatr. Neonatol. 57, 310–317 (2016).
Wang, Y., Chang, Y., Gao, M., Zang, W. & Liu, X. Genetic analysis of albinism caused by compound heterozygous mutations of the OCA2 gene in a Chinese family. Hereditas 161, 8 (2024).
Sun, B. et al. Screening for SH3TC2 variants in Charcot-Marie-Tooth disease in a cohort of Chinese patients. Acta. Neurol. Belg. 122, 1169–1175 (2022).
Acknowledgements
The authors thank the participants for their cooperation and support of this research. The authors thank Jia Jia, Mingmin Zhao, Yanjie Wang, and Nani Zhou for their contributions to the writing of this article. Additionally, the authors are grateful to Xiaoxia Zhou, Yuwen Qian, and Yanfei Ni for their help in revising the article.
Funding
This work was supported by the Project of National Natural Science Foundation of China (42167060), the Clinical Research Center for Gynecological and Obstetric Disease of Yunnan Province (2023ZJZX-FC14), the Key research and development plan of Yunnan Province (202403AC100002), and the Yunnan Province Key Clinical Specialty of Obstetrical Department (2024CKKFKT-11).
Author information
Contributions
HW and QZ are the co-first authors. GC, HW and QZ designed the research. DX, YW, JP analyzed data and provided explanations. JP and DX performed statistical analysis. XD and HW obtained financing. HW and QZ wrote the manuscript. GC and XD critically revised the manuscript to enrich the manuscript with additional knowledge. All authors read and approved the final draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The study approval was obtained from the First People’s Hospital of Yunnan Province (No. 2021-28).
Consent for publication
All participants provided informed consent.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, H., Zhao, Q., Xie, D. et al. Diagnostic yield of expanded carrier screening of a multi-ethnic population in Yunnan, China. Sci Rep 15, 23590 (2025). https://doi.org/10.1038/s41598-025-08012-3