Background

Decades of advances in medicine, technology, and the global economy writ large have significantly alleviated the impact of communicable diseases, improved the global condition for nutritional diseases, and reduced childhood mortality1,2. While definitely a cause for celebration, this shift means that non-communicable diseases, including rare genetic conditions, now account for a relatively larger share of all under-five deaths3.

Overall, rare Mendelian disorders are estimated to affect around 0.3% of pregnancies4,5, represent a quarter of all NICU admissions6, and are responsible for 20%–30% of infant and child mortality7,8. Despite an annual expenditure of $125 billion worldwide on orphan drugs, the percentage of rare diseases with an approved drug indication remains disappointingly low8. Currently, there are different detection methods for birth defect prevention at different stages of reproduction. Those methods are not redundant but complementary. For example, multiple US agencies have clearly stated that carrier screening and newborn screening should not be used as substitutions for each other9.

Carrier screening provided at the preconception stage has been one of the most successful prevention strategies for identifying at-risk couples (ARCs) who carry pathogenic variants in the same gene, as well as female carriers of X-linked conditions. Carrier screening has reduced Tay-Sachs disease in Ashkenazi Jews since the 1970s10 and decreased the incidence of thalassemia worldwide11. The ACMG currently adopts a tiered system for screening practice, based mainly on carrier frequency12. Briefly, Tier 1 focuses on cystic fibrosis, spinal muscular atrophy, and other individual-based risks13,14,15; Tier 2 screens for moderate or severe conditions with a carrier frequency no less than 1/100 16; Tier 3 expands the carrier frequencies to higher than 1/200, which is the default recommendation; Tier 4 is reserved for specific situations such as consanguinity or suggestive indications12. Additionally, there is a constantly updated gene list of secondary findings that are recommended for reporting if identified during exome or genome sequencing17. The ACMG recommends that carrier screening ideally take place at the preconception stage, but all patients who are already pregnant should also be offered Tier 3 carrier screening12. This is to reduce the potential stress those patients might experience with a positive screening result and to offer more options regarding reproductive decision-making18.

China has seen a great impetus for carrier screening in recent years. An early study screening for 11 recessive diseases in more than ten thousand people found that over a quarter of the apparently healthy sample population carried pathogenic variants19. A 201-gene panel screening a sample population of 2923 Chinese patients using ART found 46.73% were carriers20. When the genes screened increased to 330, a total of 600 patients using ART had a positive result ratio of 64.83%21. The detected carrier rate in the literature increased as the number of genes detected increased.

The detection of genetic abnormalities at the preconception or prenatal stage is not the only aspect of the carrier screening practice. It also includes genetic counseling before and after the screening, reproductive decision-making, appropriate medical interventions, monitoring, and all tests and counseling after the child has been born22. The ethical concerns surrounding reproductive choices are of utmost importance, and the current consensus is to put the patients in a position to make an informed decision5,22,23. Carrier screening is on the less controversial side in this respect, as it affords the couple more options to choose from. Meanwhile, the choices of at-risk couples may also inform the medical community about the zeitgeist and the ever-changing societal norms, so that the practice can be adjusted accordingly.

In China, few reports have documented an integration of care and management for at-risk couples throughout the whole reproductive cycle, up until the newborn screening after birth. The establishment of such a routine practice would greatly benefit the couples involved and the birth defect prevention program. As of today, couples with a positive carrier screening result in China have reproductive options very similar to those recommended by the ACMG12: (1) Prenatal diagnosis using chorionic villus sampling or amniocentesis, (2) IVF with PGT, (3) Donor gametes, (4) Adoption. A high proportion of Chinese couples would opt for other alternatives when genetic defects are suspected 24. Unsurprisingly, the preferred options are different in Chinese couples, likely due to factors such as culture, economy, regulations, social mores, education, etc.4,25.

Here we selected a sample population of 1265 patients, of whom 776 formed 388 couples, visiting a fertility center in South-western China. Among them, 130 were ethnic minorities, representing a multiethnic study population in China. Using a panel including 486 genes, we screened for carriers of 623 conditions to scan the genetic landscape of a population with a diverse ethnic background. These data provide crucial information for future ECS panel design in regions with multiple ethnic lineages. We also followed up on the reproductive choices of couples identified with a high risk of having an affected offspring.

Materials and methods

Ethical compliance

The study approval was obtained from the First People’s Hospital of Yunnan Province (No. 2021-28). All patients signed an informed consent form to allow the use of peripheral blood samples for clinical research purposes.

Peripheral blood from all participants was obtained following their consent in accordance with the Declaration of Helsinki. All clinical data of the participants were also reviewed and approved by the ethics board committee at the Obstetrical Department, the First People’s Hospital of Yunnan Province.

Study design

This was a prospective study of the genetic landscape of single-gene autosomal recessive (AR) or X-linked diseases in a group of couples and single participants who were pregnant in the first or second trimester. A total of 1265 participants with qualified DNA samples were included in this study, of whom 776 formed 388 couples. An additional 488 pregnant women were tested as singles, as well as one male who was tested because of his family history. All participants were enrolled in a single center between August 2022 and June 2023. All ECS tests were carried out using high-throughput sequencing, and the identified variants were validated using confirmatory methods such as MLPA, TR-PCR, or qPCR. Figure 1 shows our workflow. All participants underwent genetic counseling prior to ECS testing. After the test, a second counseling session was conducted by board-certified FACMG genetic specialists, in which the participants were invited to discuss the findings and their implications. Technical details such as disease information, treatment and prognosis, inheritance and probability, penetrance and expressivity, residual risk, and reproductive options were discussed in an accessible manner. We documented the risk factors for female participants and grouped them into five categories, namely no known risk factors, advanced maternal age, previous abortion histories (spontaneous or otherwise), history of abnormal pregnancies, and maternal health factors, as shown in Table 1. These categories were further divided into sub-categories. A single participant could fall under multiple categories. The maternal health factors encompassed diabetes, obesity, hypothyroidism, and other health concerns. There were 67 individuals who reported related phenotypes. All male participants, except the one individual tested on his own, underwent sequential screening following their spouses.
We then calculated the carrier status, mutation burden (average number of PLPVs per person), and number of at-risk couples (ARCs). The ARCs were subsequently followed up for their interventions and reproductive outcomes.

Fig. 1

Workflow of the carrier screening process. A total of 1265 samples were included: 388 couples (776 samples) and 489 singles (488 females and one male). The samples were sequenced and carrier status was tallied. The couples were screened sequentially, and the males were only sequenced after positive findings in the females. At-risk couples were followed up on their subsequent reproductive choices and outcomes.

Table 1 Sample population statistics and reasons for carrier screening carried out for all patients.

Data management, tabulation, analysis, and interpretation were accomplished via self‐written Python and R scripts.

ECS panel design

A total of 623 single-gene recessive (AR or X-linked) diseases, encompassing 486 genes, were included in this study (Table S1). Several aspects were considered in selecting conditions for our ECS panel, as detailed in a previous study design21. Briefly, (1) conditions had to satisfy the guidelines/recommendations of the ACMG, the ACOG, the National Society of Genetic Counselors, the Perinatal Quality Foundation, and the Society for Maternal–Fetal Medicine9; (2) conditions had to have a carrier frequency (CF) > 1/500 in any ethnicity26; (3) severe newborn, infancy, and childhood‐onset disorders with highly penetrant phenotypes; (4) high‐prevalence monogenic diseases with moderate phenotypes, and disabilities that impact the quality of life for the subject’s entire life, such as severe hearing loss and blindness27; (5) the genes had to be detectable by NGS or triplet repeat primed PCR (TP-PCR, for FMR1) with a definitive detection rate, and variants had to be verifiable by methods such as Sanger sequencing, multiplex ligation-dependent probe amplification (MLPA), and quantitative polymerase chain reaction (qPCR).

Genome annotations from the National Center for Biotechnology Information Reference Sequence database were used to define the exon regions for each ECS gene, and 20 bp flanking regions were included in our panel. Some important loci in the non-coding regions curated by HGMD and ClinVar were also included in the panel.

Genomic sequencing and data analysis

Peripheral blood samples from 1265 participants were drawn with informed consent, preserved in anticoagulant tubes containing ethylenediamine tetraacetic acid, and stored in a freezer at −80 ℃. Genomic DNA was purified from frozen blood samples using the QIAamp® DNA Blood Mini kit following the manufacturer’s protocol (Qiagen, Germantown, MD, USA). Genomic DNA (300–500 ng) was sheared, size selected (400–600 bp), ligated to sequencing adapters, and PCR amplified following standard library preparation. The post-PCR library was then used for exome capture with the 486-gene panel, which was synthesized by Integrated DNA Technologies, Coralville, IA, USA. Exome-enriched samples were sequenced (2 × 150 bp) on an Illumina Novaseq (Illumina, Inc., San Diego, CA, USA). Raw image files were processed using bcl2fastq2 conversion software v2.20 (Illumina, Inc., San Diego, CA, USA).

The sequencing reads were aligned to the human reference genome (hg19/GRCh37) using the Burrows-Wheeler alignment tool, and PCR duplicates were removed using Picard software v1.57 (http://picard.sourceforge.net/). The genomic analysis toolkit GATK4 (https://software.broadinstitute.org/gatk/) was employed for variant discovery. Variant annotation and interpretation were conducted using the ANNOVAR software (http://www.openbioinformatics.org/annovar/). All detected variants were further manually reviewed by following the ACMG/AMP guidelines for variant interpretation.

A misalignment-discriminated method was used to identify genes containing misalignment issues related to homology, including SMN1, HBA1/HBA2, GBA1, CYP21A2, CYP21A1P, VWF, DCLRE1C, ALMS1, and IDS. In detail, gene-specific nucleotides (GSNs) refer to the nucleotides that differ between a gene and its pseudogene or homologous region. The ECS panel was designed to capture them during sequencing. After identifying regions of high homology, the allele frequency of GSNs in the population was determined by querying sequencing databases using a misalignment index, which is a natural logarithm of gene-to-pseudogene ratios describing the relationship between the target gene and the corresponding pseudogene28.
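The idea of an index built from gene-to-pseudogene ratios can be illustrated with a minimal sketch. It reads the misalignment index as the natural logarithm of the ratio of reads supporting the gene-specific nucleotide to reads supporting the pseudogene nucleotide at one GSN position. That reading, the function name, and the pseudocount are assumptions made for illustration, not the published definition in reference 28.

```python
import math


def misalignment_index(gene_reads, pseudogene_reads, pseudocount=0.5):
    """Natural log of the gene-to-pseudogene read ratio at one GSN position.

    The pseudocount is a hypothetical smoothing term that keeps the ratio
    finite when one of the two counts is zero.
    """
    if gene_reads < 0 or pseudogene_reads < 0:
        raise ValueError("read counts must be non-negative")
    return math.log((gene_reads + pseudocount) / (pseudogene_reads + pseudocount))


# Equal support for both bases gives an index of 0; a loss of gene-specific
# reads pushes the index below 0.
balanced = misalignment_index(100, 100)
depleted = misalignment_index(50, 100)
```

Under this reading, values near zero indicate balanced support, while strongly negative values would flag positions where gene-specific reads are under-represented.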

The internally developed software CNVexon, a coverage-based CNV detection method29, was used to analyze CNVs, especially for exon-level heterozygous deletions or amplifications.
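As a rough illustration of what a coverage-based, exon-level CNV check does, the sketch below normalises each sample's exon depths by its total depth and compares them with the median of a set of control samples. This is not the CNVexon implementation; the function names, the input layout, and the 0.7 and 1.3 cut-offs are hypothetical.

```python
import math
from statistics import median


def exon_copy_ratios(sample_depths, control_depths):
    """Ratio of the sample's normalised exon depth to the control median.

    sample_depths: {exon_name: mean_depth}
    control_depths: list of such dicts, one per control sample
    """
    def normalise(depths):
        total = sum(depths.values())
        if total == 0:
            raise ValueError("sample has no coverage")
        return {exon: depth / total for exon, depth in depths.items()}

    sample = normalise(sample_depths)
    controls = [normalise(c) for c in control_depths]
    ratios = {}
    for exon, fraction in sample.items():
        baseline = median(c[exon] for c in controls)
        ratios[exon] = fraction / baseline if baseline > 0 else float("nan")
    return ratios


def call_cnv(ratio, loss_cutoff=0.7, gain_cutoff=1.3):
    """Classify one exon; the cut-offs are illustrative assumptions."""
    if math.isnan(ratio):
        return "no-call"
    if ratio < loss_cutoff:
        return "deletion"
    if ratio > gain_cutoff:
        return "amplification"
    return "normal"


# Toy data: exon 2 has half the expected depth in the tested sample.
controls = [{"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100} for _ in range(3)]
sample = {"ex1": 100, "ex2": 50, "ex3": 100, "ex4": 100}
calls = {exon: call_cnv(r) for exon, r in exon_copy_ratios(sample, controls).items()}
```

A heterozygous deletion is expected to give a ratio near 0.5, which is why the toy sample's second exon is flagged while the others stay within the normal band.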

Variant Interpretations

All the identified variants were classified into five categories: “pathogenic,” “likely pathogenic,” “uncertain significance,” “likely benign,” and “benign” according to the ACMG guidelines for the interpretation of genetic variants30. All the SNVs and CNVs classified as “pathogenic” or “likely pathogenic” variants (PLPVs) were confirmed through Sanger sequencing, MLPA, or LR-PCR. The reporting standard of the findings to the patients was set based on the practice resource of the ACMG12.

There were 463 variants that were considered pathogenic or likely pathogenic but were not included in the final study because they cause mild phenotypes or show incomplete penetrance (Table S10). For example, the single variant GJB2 c.109G > A(p.Val37Ile) was found 168 times, while the variant UGT1A1 c.1091C > T(p.Pro364Leu) was found 100 times. These were neither included in the final analysis nor reported to the patients, except in three instances of GJB2 c.109G > A(p.Val37Ile) where the couples were at risk of having a compound heterozygous child and one case of CYP21A2 c.955C > T(p.Gln319*) where the couple was at similar risk and the variant was independently verified through Sanger sequencing (Table S9).

Sanger sequencing

Sequencing was performed in both forward and reverse directions using 1.6 µl cleanup PCR products, 0.8 µl BigDye, and 2.0 µl M13 sequencing primers (1.6 µM) in each reaction.

Multiplex ligation-dependent probe amplification

MLPA was performed with 100 ng genomic DNA per reaction using the SALSA MLPA probe mixes P050-C1 and P460-A1 to analyze CYP21A2, HBA1/HBA2, and SMN1/2, respectively, according to the manufacturer’s recommendations (MRC Holland, Amsterdam, The Netherlands). Quality control and data analysis were conducted using the Coffalyser.net software (MRC Holland, www.mlpa.com).

Long fragment PCR technology (LR-PCR)

In PCR plates, each 25 µl reaction contained 12.5 µl 2 × LongAmp Taq Master Mix, 1 µl LD-PCR primer F (10 μM), 1 µl LD-PCR primer R (10 μM), 2.5 µl genomic DNA, and 8 µl ddH2O. The plate was placed in a thermal cycler and LR-PCR was performed as follows: 94 ℃ for 1 min; 30 cycles of 94 ℃ for 10 s and 65 ℃ for 6 min; 65 ℃ for 6 min; then held at 4 ℃. Then 5 µl of LR-PCR product was loaded onto a 1.5% agarose gel in 1 × TAE, and electrophoresis was performed at 120 V for 30 min to check for a band. If a band was present, the next step was performed.
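The reaction composition can be checked and scaled with a few lines of arithmetic. The sketch below takes the master-mix entry as 12.5 µl of 2 × mix, which is the reading under which the listed volumes sum to 25 µl. The `master_mix` helper and its 10% pipetting overage are hypothetical conveniences, not part of the protocol.

```python
# Per-reaction volumes in microlitres, as listed in the protocol text.
LR_PCR_MIX_UL = {
    "2x LongAmp Taq Master Mix": 12.5,
    "primer F (10 uM)": 1.0,
    "primer R (10 uM)": 1.0,
    "genomic DNA": 2.5,
    "ddH2O": 8.0,
}


def master_mix(n_reactions, overage=0.1):
    """Scale the shared reagents for n reactions.

    Template DNA is left out because it differs per well. The overage
    (default 10%) is an illustrative allowance for pipetting loss.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in LR_PCR_MIX_UL.items()
        if name != "genomic DNA"
    }


total_per_reaction = sum(LR_PCR_MIX_UL.values())  # 25.0
mix_for_ten = master_mix(10)
```

The 2 × master mix makes up exactly half of the 25 µl reaction, which brings it to a 1 × working concentration.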

Triplet repeat primed PCR (TP-PCR)

CGG repeat number in the 5´ UTR of FMR1 was analyzed using the Microreader™ FMR1 Gene Detection Kit (Microread Genetics, Beijing, China) and agarose gel electrophoresis. Electrophoresis was performed on an ABI 3730 Genetic Analyzer (Applied Biosystems). The PCR conditions were as follows: incubation at 95 °C for 5 min, followed by 10 cycles of denaturation at 97 °C for 35 s, annealing at 65 °C for 35 s (decreasing by 1 °C with each additional cycle), and extension at 68 °C for 4 min. Next, 20 cycles of denaturation at 97 °C for 35 s, annealing at 60 °C for 35 s, and extension at 68 °C for 4 min (extended by 20 s with each additional cycle) were followed by a hold at 68 °C for 10 min and a final hold at 15 °C. The results were analyzed using GeneMapper v.4.0 software.
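The two cycling stages change one parameter per cycle, which is easier to see when written out. The sketch below expands the stated conditions into a per-cycle schedule, assuming the first cycle of each stage runs at the stated starting values and the change applies from the second cycle onward. That interpretation, the function name, and the stage labels are assumptions.

```python
def tp_pcr_schedule():
    """Expand the two cycling stages into per-cycle settings.

    Returns tuples of (stage, cycle, denature_c, anneal_c, extension_s).
    """
    steps = []
    # Stage 1: 10 cycles, annealing starts at 65 C and drops 1 C per cycle.
    for cycle in range(1, 11):
        steps.append(("stage 1", cycle, 97, 65 - (cycle - 1), 4 * 60))
    # Stage 2: 20 cycles, extension starts at 4 min and grows 20 s per cycle.
    for cycle in range(1, 21):
        steps.append(("stage 2", cycle, 97, 60, 4 * 60 + 20 * (cycle - 1)))
    return steps


schedule = tp_pcr_schedule()
last_stage1_anneal = schedule[9][3]       # annealing temperature in cycle 10
last_stage2_extension = schedule[-1][4]   # extension time in the final cycle
```

Under this reading the annealing temperature falls from 65 °C to 56 °C across the first stage, and the extension time grows from 240 s to 620 s across the second.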

Results

Population statistics

A total of 1265 individuals were screened for carrier status of the selected conditions, including 776 coupled individuals (388 couples) and 489 single individuals. Among the 489 single individuals, only one was male, who requested the screening test because of his family history; the remaining participants were all females, in keeping with the sequential nature of the screening practice. The female median age was 32 years, ranging from 18 to 48, apart from a pair of 10-year-old and 14-year-old sisters who were tested at the request of their mother; the male median age was 32 years, ranging from 23 to 60.

Apart from six individuals whose ethnicities were unavailable for various reasons, the remaining 1261 patients all had their ethnicities registered. There was a small but substantial number of ethnic minority participants (130, 10.3%), of whom just over half (67 individuals) were from the Yi ethnic group. The rest of the participants were from the Han ethnic group (Table 1).

Sample population carrier rates and mutation burdens

Of the 1265 individuals, 839 (66.32%) returned a positive report, meaning they carried at least one PLPV. A total of 1397 PLPVs were found, averaging 1.10 PLPVs per person, which we refer to as the mutation burden of the population. As shown in Table 1, both the carrier rate and the mutation burden were higher in females (68.38% and 1.14/person) than in males (61.70% and 1.03/person). Neither measure differed notably between ethnic minorities and the majority Han (61.54% vs. 66.84% and 1.12 vs. 1.10 variants per person).
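The two summary statistics used throughout this section follow directly from per-person PLPV counts. The sketch below shows the calculation and re-derives the cohort-level figures from the totals reported above. The function name and input layout are illustrative and do not reproduce the study's own scripts.

```python
def carrier_summary(plpv_counts):
    """Return (carrier_rate, mutation_burden) from per-person PLPV counts.

    carrier_rate: fraction of people with at least one PLPV
    mutation_burden: mean number of PLPVs per person
    """
    n = len(plpv_counts)
    if n == 0:
        raise ValueError("no participants")
    carriers = sum(1 for count in plpv_counts if count > 0)
    return carriers / n, sum(plpv_counts) / n


# Cohort-level check using the totals reported in the text.
N_PARTICIPANTS = 1265
N_CARRIERS = 839
N_PLPVS = 1397

carrier_rate = N_CARRIERS / N_PARTICIPANTS
mutation_burden = N_PLPVS / N_PARTICIPANTS
print(f"{carrier_rate:.2%}, {mutation_burden:.2f}")  # 66.32%, 1.10
```

Note that the mutation burden is averaged over all participants, carriers and non-carriers alike, so it can exceed 1 even when a third of the cohort carries no PLPV.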

The gap between females and males was more pronounced among the 388 couples. The females from this group had a carrier rate of 75.00%, compared to 63.11% in single females (Table 1). This was possibly an artifact of sequential screening, as males were more likely to be screened if their spouse had returned a positive result first.

When the participants were stratified by risk factors, the highest carrier rate (43/53, 81.13%) was among the females with a history of ultrasound anomalies, more than females with health issues (77.61%) or histories of abnormal pregnancies (76.92%). The highest mutation burdens were also found in these three subgroups, with 1.38, 1.34, and 1.23 in participants with histories of abnormal pregnancies, health issues, and ultrasound anomalies, respectively. To see whether there were any potential directions for further investigation, statistical analyses were conducted and only the carrier rate of females with ultrasound anomalies showed any promise to warrant further confirmation (P = 0.035 compared to the routine screening group).

The modal mutation burden among the whole studied population was 1, ranging from no variant found to five variants per individual (Fig. 2). Specifically, 29.57% (374/1265) of the population were non-carriers, 55.89% (707/1265) had 1–2 PLPVs, and 10.51% (133/1265) had 3 or more variants.

Fig. 2

Mutation burden of the sample population. The modal and most common number was one variant, with 452 individuals, followed by people with a negative result at 374 individuals. The next most common was two variants with 255 individuals. Another 99 individuals carried three variants. The remaining 34 patients carried four or five variants.

Among the 1397 variants, 1287 (92.13%) were SNVs, 105 (7.52%) were CNVs, and 5 (0.36%) were short tandem repeats (STRs) of FMR1. Variants that might cause inborn errors of metabolism (IEM) included 34 variants of PAH, 23 of SLC22A5, 20 of MMACHC, 15 of SLC25A13, 14 of GBA1, etc., adding up to 31% (437/1397) of all PLPVs, making PLPVs that may cause metabolic diseases the most common group in our sample population. Other diseases heavily implicated in this study were multisystem diseases (13%), neurological diseases (12%), endocrine diseases (12%), hematological diseases (7%), and auditory diseases (6%) (Fig. 3A).

Fig. 3

Distribution of PLPVs. (a) All variants from different systems. The PLPVs found were linked to IEM the most at 31%, followed by multisystem at 13%, and nervous system and endocrine system both at 12%. (b) All genes with a combined PLPVs carrier frequency ≥ 1/100. (c) Single variants with a carrier frequency ≥ 1/200. The most frequently found variants included HBA1/HBA2 -α3.7/αα(p.?) 28 times, DUOX2 c.1588A > T(p.Lys530*) 24 times, SMN1 ex.7del(p.?) 22 times, and ATP6V0A4 c.1029 + 5G > A(p.?) 21 times.

Neuromuscular disease burden was contributed by 22 variants of SMN1 and 17 variants of NEB. Variants causing endocrine disorders were contributed by mutations in DUOX2 (89 variants), CYP21A2 (29 variants), and SRD5A2 (16 variants). Variants that might impact the hematological system included 46 variants of HBA1/2, 21 of F11, 16 of HBB, and 12 of PRF1. Despite the exclusion of a large number of controversial variants like GJB2 c.109G > A(p.Val37Ile), highly frequent mutations that could cause auditory conditions were still over-represented within the cohort, including 41 variants of USH2A, 26 of ATP6V0A4, 23 of SLC26A4, 21 of GJB2, and 11 of MYO15A. OCA2 and TYR, both associated with oculocutaneous albinism, had 16 and 12 variants, respectively.

The most frequently found mutated genes were DUOX2, HBA1/HBA2, and USH2A, with 89, 46, and 41 variants reported, respectively. Three genes with significance to NGS due to highly homologous pseudogenes or large deletions were present at high mutation rates, namely HBA1/HBA2, CYP21A2, and SMN1, with 46, 29, and 22 variants detected after Sanger/MLPA validation. Other highly frequent variants affected PAH (34 variants), ATP7B (28 variants), ATP6V0A4 (26 variants), etc. (Fig. 3B).

Nineteen variants were found to have a carrier frequency over 1/200 (Fig. 3C), including four variants of DUOX2 and two variants of HBA1/HBA2. It is worth noting that three of those high-frequency variants were CNVs, namely HBA1/HBA2 -α3.7/αα(p.?) and αα/–SEA(p.?), and SMN1 ex.7del(p.?). Together they accounted for 64 (61.0%) of the 105 CNV counts, which comprised 39 unique CNVs (Table S3). Three other CNVs were discovered more than once: NPHP1 whole-gene deletion was identified in three samples, and HBA1/HBA2 αα/-α4.2(p.?) and CYP21A2 ex.1_7del(p.?) were identified in two samples each.

Variant detection efficiencies and panel designs

Due to the uneven distribution of the variants, the number of genes increased non-linearly with the addition of lower and lower carrier frequency candidates (Table 2). There was a single gene, DUOX2, in our study that had a carrier frequency over 1/50 (≥ 26 variants detected). The gene number rose to seven at over 1/100 (≥ 13 variants) and 20 at over 1/200 (≥ 7 variants).
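The variant-count cut-offs quoted in parentheses follow from the cohort size. A minimal sketch, assuming that "over 1/d" means the detected count divided by the 1265 screened individuals must strictly exceed 1/d; the function name is illustrative.

```python
def min_count_over(n_individuals, denominator):
    """Smallest integer count c such that c / n_individuals > 1 / denominator."""
    return n_individuals // denominator + 1


N_INDIVIDUALS = 1265
thresholds = {d: min_count_over(N_INDIVIDUALS, d) for d in (50, 100, 200)}
print(thresholds)  # {50: 26, 100: 13, 200: 7}
```

Because counts are integers, each cut-off is the first whole number above 1265/d, for example 26 for 1265/50 = 25.3.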

Table 2 The assessment of gene detection inclusion criteria based on carrier frequencies.

Hypothetically, selecting the 20 highest-frequency genes would have covered 37.94% of all variants detected. The percentage rose to 56.34% and 61.92% when the gene inclusion criteria expanded to frequencies over 1/400 and 1/500, respectively (Fig. 4A). The yield increase per gene dropped sharply over the first 50 genes, that is, down to a frequency threshold of about 1/400, after which the yield entered a linear phase of slow increase (Fig. 4B).
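The cumulative-yield curve described here can be computed by ranking genes by variant count and accumulating their share of all PLPVs. The sketch below uses only the seven gene counts quoted in the Results text, so it covers just the top of the curve. The function name and output layout are illustrative.

```python
TOTAL_PLPVS = 1397

# Variant counts for the seven most frequently mutated genes named in the text.
GENE_COUNTS = {
    "DUOX2": 89,
    "HBA1/HBA2": 46,
    "USH2A": 41,
    "PAH": 34,
    "CYP21A2": 29,
    "ATP7B": 28,
    "ATP6V0A4": 26,
}


def cumulative_yield(gene_counts, total):
    """Rank genes by count and return (rank, gene, count, cumulative_fraction)."""
    ranked = sorted(gene_counts.items(), key=lambda item: item[1], reverse=True)
    running = 0
    curve = []
    for rank, (gene, count) in enumerate(ranked, start=1):
        running += count
        curve.append((rank, gene, count, running / total))
    return curve


curve = cumulative_yield(GENE_COUNTS, TOTAL_PLPVS)
for rank, gene, count, fraction in curve:
    print(f"{rank:2d} {gene:10s} {count:3d} {fraction:.2%}")
```

With these counts, the first gene alone contributes about 6.4% of all PLPVs and the seven genes together about 21%, which illustrates the steep initial rise followed by diminishing per-gene returns.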

Fig. 4

The relationship between gene carrier frequency, number of genes, and detection rate. (a) The number of genes added roughly doubled (7 − 1 = 6, 20 − 7 = 13, 49 − 20 = 29) each time the carrier frequency threshold halved, and the detection rate increased linearly at first. (b) The number of variants detected increased steeply with the first few genes but reached a linear phase once more than 50 genes were included, and the per-gene detection number also dropped sharply until about 50 genes.

At-risk couples

There were nine (9/388, 2.32%) ARCs carrying 18 relevant variants of autosomal recessive conditions (Table S9); any other PLPVs they carried were present in only one partner of the couple. The only recurrent gene was GJB2 (2 couples), variants of which could lead to non-syndromic deafness. Another 10 (10/876, 1.14%) females, four of whom were screened with their partners, carried variants of an X-linked condition. Seven carried G6PD variants associated with favism, and one each carried a variant of F8 (hemophilia A), DMD (Duchenne/Becker muscular dystrophy), and CHM (choroideremia). These couples or individuals were at significant risk of having an affected offspring.
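The at-risk rule applied here can be written as a short check: a couple is at risk for an autosomal recessive condition when both partners carry a PLPV in the same gene, and a female carrier of an X-linked variant is at risk regardless of her partner. The sketch below is illustrative only. The inheritance table is a small hypothetical subset, and the function and the reason strings are not from the study's scripts.

```python
# Illustrative subset of genes and inheritance modes (AR or X-linked).
INHERITANCE = {
    "GJB2": "AR",
    "AGXT": "AR",
    "PAH": "AR",
    "G6PD": "XL",
    "DMD": "XL",
}


def couple_risks(female_genes, male_genes=None, inheritance=INHERITANCE):
    """Return a sorted list of (gene, reason) for one couple or single female.

    female_genes / male_genes: genes in which each partner carries a PLPV.
    male_genes may be None when the partner was not screened.
    """
    male = set(male_genes or ())
    risks = []
    for gene in sorted(set(female_genes)):
        mode = inheritance.get(gene)
        if mode == "AR" and gene in male:
            risks.append((gene, "both partners carry an AR variant"))
        elif mode == "XL":
            risks.append((gene, "female carries an X-linked variant"))
    return risks


shared = couple_risks({"GJB2", "PAH"}, {"GJB2"})
unshared = couple_risks({"PAH"}, {"GJB2"})
x_linked = couple_risks({"DMD"})
```

A couple in which the partners carry variants in different AR genes, as in the second call, is not flagged. This matches the note above that the nine ARCs carried their other PLPVs in one partner only.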

All of the ARCs accepted genetic consultation. Ten of them continued to receive medical care and treatment at our hospital. The follow-up of their subsequent pregnancies is summarized in Table 3. All these couples were of Han Chinese origin, except for the wife of AR-CP16, who was from the Laku ethnic group. Couples AR-CP6 and AR-CP16 were carriers of CYP21A2 and HBA1/HBA2 variants, respectively, and amniocentesis showed that their fetuses were not carriers. Couples AR-CP4, AR-CP7, AR-CP10, and AR-CP21 were carriers of SLC22A5, PCDH15, RYR1, and CPS1 variants, respectively, and amniocentesis confirmed that their fetuses were heterozygous carriers not at increased risk of monogenic conditions. All six couples had a successful delivery of an unaffected child.

Table 3 Seven at-risk couples and three at-risk females with confirmed test results of their subsequent pregnancies.

Couple AR-CP19 were carriers of AGXT variants associated with hyperoxaluria (OMIM:#259900). Amniocentesis confirmed that the fetus was compound heterozygous for both variants. The female of Couple XL-CP2 was a carrier of a DMD variant, and amniocentesis confirmed that the male fetus carried the variant. These two couples decided to terminate their pregnancies after genetic counseling. The female patient XL-SF4, who was tested without a partner, was a carrier of an F8 variant, and she decided to pursue IVF.

The female of Couple XL-CP4 was a carrier of a CHM variant, and a male fetus was confirmed as a carrier after amniocentesis. After comprehensive genetic counseling and careful consideration, the parents conceived and delivered a male infant naturally. The infant showed characteristic phenotypes of choroideremia. The couple received genetic counseling on the details of the condition and we have been continuously following up with the growth of the baby.

Individuals carrying biallelic homozygous variants

Eight apparently healthy individuals were found during the screening to be homozygous for PLPVs (Table S6). Three individuals each were homozygous for GJB2 c.109G > A(p.Val37Ile) and UGT1A1 c.1091C > T(p.Pro364Leu), and two for OCA2 c.1441G > A(p.Ala481Thr). They did not report any phenotype related to the disease. Although these variants were considered pathogenic or likely pathogenic, their pathogenicity was relatively weak. They were not counted in this study, either among the overall variants or for at-risk couples whose offspring could be homozygous, and were reported only in compound heterozygous cases. Another individual, who was homozygous for SH3TC2 c.730C > T(p.Gln244*), had telltale signs of a neuromuscular disorder.

Discussion

This is the first study to evaluate the carrier status among the diverse population in the Yunnan Province of China using an expanded carrier screening panel covering 486 genes. The unique makeup of this population, which consists of groups with diverse ancestries that share origins and cultures with Southwest China, Southeast Asia, and the rest of China, is in large part due to mass migration during wartime. The ethnic minority groups have maintained certain unique marriage practices and kept their lineages relatively distinct genetically. Therefore, the inclusion of a small but substantial number of these patients in the same cohort as the majority Han group from the same region gives us a rare insight into the genetic landscape of the region.

The genetic landscape of the population

In this cohort of 1265 patients, screened with a 486-gene panel, the mutation burden (1.10 variants per person) and the carrier rate (66.32%) were comparable to those reported in other Chinese populations. On one level, this could partially be the result of the expanding scope of the genes screened. For example, an early study in China screening 11 diseases in multiethnic groups had a carrier rate of 27.49%19. Apart from the thalassemia-associated genes, the remaining genes had rather similar carrier frequencies (Table 4). The high level of thalassemia burden in the previous study was likely due to the classic balancing selection pressure from malaria in that region.

Table 4 Comparisons of carrier frequency between this population and the multiethnic population in [19].

Another piece of the puzzle might come from the types of variations screened, such as CNVs31, but this could not account for the differences between studies using a similar design. In fact, among the variants detected, 7.52% (105/1397) were copy number variations, comparable to the level of 4.18% reported previously in Shanghai21.

Interestingly, no perceptible difference was observed between the majority Han ethnic group and the minority groups in terms of mutation burden using this same panel. The 130 patients from all the minority groups had a mutation burden of 1.12 variants per person compared to the 1.10 variants per person of the Han group. The largest minority group in the study, the Yi ethnic group, had a burden of 1.10 among its 67 patients. No evidence suggested that the carrier rates differed in any particular ethnic group, either. The results seemed to support the use of a pan-ethnic panel in China until further evidence suggests otherwise.

Multiple hypothesis tests were performed to investigate whether any subgroup warranted further exploration. Within this cohort, the highest carrier rates and mutation burdens were found among females with a history of ultrasound anomalies, those with their own health issues, and those with past abnormal pregnancies. Only the carrier rate of the first group showed any statistical significance compared to the routine screening group with no registered risk factors. Further research is needed to confirm whether the difference in females with ultrasound anomalies is genuine or spurious. These pathogenic variants are not associated with embryo lethality, nor do they have an established link to intrauterine presentations. One might speculate that the higher mutation number reflected a higher general mutation burden on the genome, which in turn might have affected embryonic health in a polygenic manner.

Genes with high mutation frequencies

There were several genes with a high mutation frequency that had been previously reported in literature among Chinese populations, but there were also clear differences in the genetic landscape of this population.

For genes that are associated with auditory conditions, GJB2 and SLC26A4 are well known to cause non-syndromic hearing loss32,33, and USH2A is associated with Usher syndrome34. Despite the exclusion of some of the most common variants, these genes still accounted for a considerable number of mutations. Interestingly, ATP6V0A4, which is associated with renal tubular acidosis in addition to sensorineural hearing loss35, had a relatively high frequency in this cohort but was not commonly reported in previous carrier screening studies. On the other hand, GJB3 variants were less frequent in this cohort compared to previous studies33.

Variants that could lead to inborn errors of metabolism (IEM) were found in PAH, ATP7B, SLC22A5, MMACHC, and SLC25A13. These were consistent with earlier studies in China21,36. Similarly, high-frequency variants that might cause endocrine conditions were found in DUOX2 and CYP21A220,37. Other genes with previously reported high-frequency variants include HBA1/2 and SMN120,36,37.

Variants of SRD5A2 were found 16 times in this study, especially c.680G > A(p.Arg227Gln), a number significantly higher than that reported from other regions in China. A previous study on hypospadias found the variants more common in South China than in North China, especially in Guangxi, which borders Yunnan38. Given the geographical and ethnic context, a link to the “kwalatmala” boys of Papua New Guinea, who were among the first groups studied for 5α-reductase deficiency39, cannot be ruled out. Further studies are needed to determine if a link exists.

Notably, the variant CYP21A2 c.955C > T(p.Gln319*) is subject to interference from the pseudogene CYP21A1P. To be certain of its validity and pathogenicity, the variant needs to be verified through Sanger sequencing. After verification of 20 samples, only three (15%) returned a true positive result; it was therefore decided that this specific variant would only be verified and then reported in cases of at-risk couples. This variant was flagged by the system 42 times in total (Table S10), but only one instance was found in an at-risk couple and then verified. This was the only count included in the final analysis. This change in screening strategy significantly increased the efficiency of variant interpretation and streamlined the verification process for this locus.
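The 15% confirmation rate rests on only 20 verified samples, so its uncertainty is wide. The sketch below computes the point estimate and a 95% Wilson score interval from the two counts reported above. The choice of the Wilson interval is an assumption made for illustration and is not described in the study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts reported in the text: 3 Sanger-confirmed calls out of 20 verified.
verified = 20
confirmed = 3
confirmation_rate = confirmed / verified
low, high = wilson_interval(confirmed, verified)
print(f"confirmation rate = {confirmation_rate:.2%}")
print(f"95% Wilson interval = {low:.1%} to {high:.1%}")
```

Under this assumption the interval spans roughly 5% to 36%, which is consistent with treating unverified calls at this locus as unreliable and restricting verification to at-risk couples.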

The cost–benefit analysis of panel size

The Pareto principle, or the proverbial "80–20 rule", states that about 80% of outcomes are due to 20% of causes, a pattern that arises from power law distributions when a variable spans a wide range of magnitudes. The allele frequencies of pathogenic variants vary greatly, and thus one would expect their distribution to similarly follow a power law.

A population-specific panel design targeting the genes with the highest mutation frequencies at the local level could make meaningful changes in the economic considerations of both the patients and the wider society40. This is one of the major reasons for conducting this study, i.e., to assess the population-specific mutation profile and establish a genetic basis for designing a multi-tier system of screening panels.

There were 300 genes in this study that were found to have at least one pathogenic or likely pathogenic variant among the sample population, totaling 1397 variants (Table S4). The Pareto principle would therefore suggest that the top 60 genes (20%) would account for more than 1100 variants (about 80%) should the mutations happen randomly. However, deleterious mutations are gradually purged through natural selection, especially at the higher frequencies, while de novo mutations accumulate within a population. In this population, the top 20% of genes covered only slightly over 60% of all the variants detected.
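The arithmetic behind this comparison can be sketched as follows. The gene and variant totals are taken from the text; the per-gene counts in `toy_counts` are hypothetical, because the real per-gene data are in Table S4 and are not reproduced here.

```python
def top_share(counts, fraction=0.20):
    """Share of all variants carried by the top `fraction` of genes."""
    if not counts or sum(counts) == 0:
        raise ValueError("counts must contain at least one variant")
    ranked = sorted(counts, reverse=True)
    # round() avoids float artefacts such as 15 * 0.2 == 3.0000000000000004.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Totals reported in the text.
n_genes = 300
n_variants = 1397
top_genes = round(n_genes * 0.20)        # 60 genes
pareto_expectation = 0.80 * n_variants   # about 1118 variants

# Hypothetical per-gene variant counts for ten genes (illustration only).
toy_counts = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]
toy_share = top_share(toy_counts)        # top 2 of 10 genes: 65 / 100

print(top_genes, round(pareto_expectation), toy_share)
```

Applied to the real per-gene counts, `top_share` should return a value slightly above 0.60, the observed figure, rather than the 0.80 that a strict 80–20 split would imply.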

The number of variants detected increased sharply with the first handful of genes but slowed down after the top 20 genes, reaching a plateau once more than 50 genes were included. Interestingly, this threshold of around 20 genes fell almost precisely at the > 1/200 frequency cutoff, agreeing neatly with the ACMG recommendation. Based on the findings of this study, a virtual panel consisting of the 49 genes with the highest frequencies would cover 56.34% of all the variants detected in this population. If the goal were to design a small panel maximizing cost–benefit efficiency, the > 1/200 cutoff point of the top 20 genes would be a sensible choice given the per-gene detection rate.
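A cumulative coverage curve of this kind, and the smallest panel that reaches a given coverage target, can be computed as in the sketch below. The per-gene counts are hypothetical placeholders; only the 56.34% and 1397 figures come from the text.

```python
from itertools import accumulate


def cumulative_coverage(counts):
    """Fraction of all variants covered by the top 1, 2, ... genes."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("counts must contain at least one variant")
    return [running / total for running in accumulate(ranked)]


def smallest_panel(counts, target):
    """Smallest number of top-ranked genes whose coverage reaches `target`."""
    for size, covered in enumerate(cumulative_coverage(counts), start=1):
        if covered >= target:
            return size
    return len(counts)


# Hypothetical per-gene variant counts (illustration only).
toy_counts = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]
curve = cumulative_coverage(toy_counts)
panel_for_80 = smallest_panel(toy_counts, 0.80)   # 4 genes in this toy data

# Reported figure: a 49-gene panel covers 56.34% of 1397 variants.
variants_in_49_gene_panel = round(0.5634 * 1397)  # 787

print(curve)
print(panel_for_80, variants_in_49_gene_panel)
```

The flattening of `curve` toward its tail mirrors the plateau described above: each additional low-frequency gene adds little coverage, which is the basis for the per-gene detection rate argument.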

At-risk couples and reproductive decision-making

Of the 388 couples sequenced together, 9 (2.32%) pairs were at high risk of having offspring affected by AR conditions, which was not unexpected based on earlier, similarly designed studies21. The only recurrent risk arose from the highly prevalent GJB2 c.109G > A(p.Val37Ile), which could result in compound heterozygous offspring. Among female carriers of X-linked conditions, G6PD variants were the most common. These two genes are associated with diseases that affect quality of life but have limited impact on individual life expectancy or reproductive success, namely non-syndromic hearing loss41 and favism 42.
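The observed at-risk couple rate can be put next to what carrier frequencies alone would predict. The sketch below assumes random mating and independence between genes, both simplifications that are not claimed by the study. The multi-gene frequencies are hypothetical; only the 9/388 count and the 1/200 cutoff come from the text.

```python
import math


def at_risk_couple_rate(carrier_freqs):
    """Probability that both partners carry a variant in the same AR gene,
    for at least one gene, assuming random mating and independent genes."""
    return 1 - math.prod(1 - q * q for q in carrier_freqs)


# Observed in this study: 9 at-risk couples among 388 couples.
observed_rate = 9 / 388

# A single gene exactly at the 1/200 carrier-frequency cutoff.
cutoff = 1 / 200
single_gene_rate = at_risk_couple_rate([cutoff])   # (1/200)^2 = 1/40,000
affected_child_risk = single_gene_rate * 0.25      # 1 in 4 for AR inheritance

# Hypothetical panel of three genes with differing carrier frequencies.
toy_panel_rate = at_risk_couple_rate([1 / 20, 1 / 50, 1 / 100])

print(f"observed = {observed_rate:.2%}")
print(f"single gene at cutoff = {single_gene_rate:.6f}")
print(f"toy panel = {toy_panel_rate:.4%}")
```

Because the couple-level risk scales with the square of the carrier frequency, a few high-frequency genes dominate the at-risk couple rate, which is in line with the single recurrent GJB2 variant noted above.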

The disparity in disease screening scope between China and Western nations presents a distinct challenge, rooted in cultural preferences and historical factors. Unlike programs that typically prioritize severe-phenotype diseases, China’s expanded carrier screening also includes moderate-impact conditions, reflecting a culturally mediated emphasis on comprehensive genetic risk assessment. This practice amplifies clinical complexities in genetic counseling, as providers must contextualize extensive risk profiles for patients. Consequently, amniocentesis becomes a standard procedure for most at-risk couples to resolve uncertainty about the fetal genotype. Among carrier couples of severe phenotypic variants, reproductive decision-making after diagnosis is also heterogeneous.

One of the most important insights gained from this study concerned the reproductive choices made by the at-risk couples (Table 3). Seven at-risk couples with autosomal recessive conditions and three couples with female carriers of X-linked conditions went through subsequent reproductive actions. Nine couples continued with natural pregnancies and then used amniocentesis for prenatal diagnosis.

In Couple XL-SF4, the mother carried a pathogenic variant of F8; the couple went through IVF after learning that the procedure was available at the same center. This suggests that services covering the whole reproductive cycle at the same site could help couples choose their preferred options.

Amniocentesis results for the first six couples were reassuring for the parents. Couple AR-CP19 had a fetus that was compound heterozygous for AGXT, and Couple XL-CP2 had a male fetus with a pathogenic CNV on DMD. After genetic counseling and understanding the risks associated with the variants, both couples decided to terminate the pregnancy through induced abortion. It is not clear whether the latter couple was more concerned with the less likely but severe phenotype of Duchenne muscular dystrophy, which could lead to an early death, or had more reservations about any congenital condition that might impact quality of life.

Couple XL-CP4 was at risk because the mother was a carrier of a CHM variant. They conceived naturally, and subsequent amniocentesis showed an affected male fetus. The ultrasound result also suggested cleft lip and palate, but the couple decided to continue with the pregnancy after genetic counseling and careful consideration. After birth, the child showed signs of choroideremia. However, the cleft lip and palate were more likely of another origin than a hitherto undocumented rare phenotype of CHM mutation, as per Hickam’s dictum43.

These couples, especially the last four (Couples XL-SF4, AR-CP19, XL-CP2, and XL-CP4), demonstrated a nuanced attitude toward birth defects. They embraced the premise of carrier screening, i.e., prevention of birth defects, but also accepted certain risks of conditions impacting the quality of life of their child. This suggests that genetic counseling and shared decision-making between physicians and patients are indispensable for Chinese couples despite the aforementioned challenges.

Individuals with homozygous pathogenic variants

The original and primary purpose of carrier screening was to identify carriers of recessive conditions in order to prevent birth defects or to treat in advance individuals with two pathogenic alleles on the same gene. However, not all variants are equally damaging. Some are not damaging enough to cause symptoms on their own in homozygous form or in combination with another weakly pathogenic variant. In this study, eight individuals were found to be in such a condition: three homozygous individuals each for GJB2 c.109G > A(p.Val37Ile) and UGT1A1 c.1091C > T(p.Pro364Leu), and two for OCA2 c.1441G > A(p.Ala481Thr)44,45,46. They all appeared to be otherwise healthy. This further supported our decision in this study to exclude most of these variants except in specific instances.

There was one unusual case involving an individual with a past diagnosis of an unspecified neuromuscular disorder. This patient had foot deformities and scoliosis, and was ambulatory but had trouble maintaining balance. She came for screening due to her own condition as well as her family history. She was tested and found to be homozygous for SH3TC2 c.730C > T(p.Gln244*); SH3TC2 is associated with Charcot-Marie-Tooth disease type 4C and has been reported in Chinese patients 47. This information was subsequently forwarded to the patient and the primary care physician to aid further diagnosis. Fortunately, the partner of the patient was not a carrier and the offspring was not at risk. This example demonstrates that carrier screening can occasionally benefit the screened individuals themselves, especially those with medical or family histories, in addition to providing reassurance about reproductive outcomes.

Future issues to explore

Recent years have seen a number of carrier screening studies in China, but their cohorts were primarily made up of the Han ethnic group. Officially, there are 55 minority ethnic groups in China, and many of them have unique lineages that share very limited common ancestry with other groups. This study established a baseline genetic spectrum of Yunnan province, which can be compared to other regions. However, the extent to which the different ethnic groups contribute to the observed differences requires further investigation.

Establishing the mutation spectrum for different ethnic groups, some of which might still practice marriage traditions that restrict unions with outsiders, is crucial for precise birth defect prevention. To explore the potential of ethnically targeted screening, future carrier screening studies should focus more on minority ethnic groups and recruit more couples from these backgrounds.

Limitations

Some limitations should be borne in mind when analyzing these results and trying to formulate more general conclusions. Variant pathogenicity was classified according to the ACMG guidelines using the evidence available at the time. Changes in the guidelines and advances in medical genetics knowledge may change the interpretations of some variants.

Only a small proportion of the at-risk couples were followed up on their reproductive decisions due to the constraint of the study timeframe. Therefore, their choices might not be representative of the wider population.

Finally, many minority ethnic groups were included in the study, but most were represented by only a small number of patients. The study therefore could not give a detailed picture of individual ethnic groups.

Conclusions

The population in Yunnan province has a genetic profile that is generally similar to that of Chinese populations elsewhere. No noticeably distinct genetic features were observed in minority groups compared to the Han majority in the region. Couples notified of their carrier and at-risk status were able to weigh different options and make a rational decision based on their backgrounds and values. This confirmed that shared decision-making is just as important in China as in Western societies.