Abstract
Exome and genome sequencing (ES/GS) are routinely used for the diagnosis of genetic diseases in developed countries. However, their implementation is limited in countries from Latin America. We aimed to describe the results of GS in patients with suspected rare genetic diseases in Colombia. We studied 501 patients from 22 healthcare sites from January to December 2022. GS was performed in the index cases using dried blood spots on filtercards. Ancestry analysis was performed under iAdmix. Multiomic testing was performed when needed (biomarker, enzymatic activity, RNA-seq). All tests were performed at an accredited genetic laboratory. Ethnicity prediction data confirmed that 401 patients (80%) were mainly of Amerindian origin. A genetic diagnosis was established for 142 patients with a 28.3% diagnostic yield. The highest diagnostic yield was achieved for pathologies with a metabolic component and syndromic disorders (p < 0.001). Young children had a median of 1 year of diagnostic odyssey, while the median time for adults was significantly longer (15 years). Patients with genetic syndromes have spent more than 75% of their life without a diagnosis, while for patients with neurologic and neuromuscular diseases, the time of the diagnostic odyssey tended to decrease with age. Previous testing, specifically karyotyping or chromosomal microarray were significantly associated with a longer time to reach a definitive diagnosis (p < 0.01). Furthermore, one out of five patients that had an ES before could be diagnosed by GS. The Colombian genome project is the first Latin American study reporting the experience of systematic use of diagnostic GS in rare diseases.
Similar content being viewed by others
Introduction
Rare diseases (RD) comprise about 5000 to 8000 medical disorders, each of which affects a small number of patients, but collectively afflict millions [1]. Most of them have a pediatric onset (69.9%) and are responsible for 25% of all pediatric hospitalizations [2]. In Latin America (LATAM) about 7% of the population (~40–50 million people), are living with a rare disease, posing a major public health challenge for these countries [3]. Regardless of the geography, RD share elements such as chronicity, severity, unknown natural history, onerous management, compromised quality of life and difficulties in establishing the diagnosis. Usually, the rarity of the disease is associated with a delay or even lack of diagnosis [4].
The “diagnostic odyssey” describes the journey of the patient and family members toward finding a diagnosis, which may begin prenatally, during the neonatal period, in infancy, but also in adulthood. Shortening or ending the odyssey has important clinical, psychosocial, and economic benefits [5, 6].
Colombia is the third most populous country in LATAM with 52 million inhabitants in an area of 2,070,408 km2 [7], divided into five regions (Andean, Pacific, Caribbean, Amazonía and Orinoquía). Colombia has a RD incidence of less than 1:5000 [8]. The census of RD in Colombia was initiated in 2016, with about 12,000 new patients registered every year. It is thought that 10 to 30% of them have a genetic component [9, 10]. Since the last decade, changes in the Colombian health coverage services have allowed a wider utilization of Sanger sequencing, MLPA (multiplex ligation-dependent probe amplification), chromosomal microarrays (CMA) and Next-generation sequencing (NGS) methodologies. Furthermore, from 2015 exome sequencing (ES) was implemented as a useful tool to achieve genetic diagnoses, establish therapeutic management, and provide genetic counseling to patients and families with RD [10]. It is recognized that the diagnostic yield of ES is between 20 to 30%, being higher in critically ill patients or with multiple malformations [11, 12]. The American College of Medical Genetics and Genomics guidelines (ACMG) recommend ES as a first tier test in pediatric patients with congenital anomalies or intellectual disability [13].
Genome sequencing (GS) could offer a higher diagnostic yield [14], given its technical advantages, with even coverage which includes the non-coding regions, better detection of copy number variants (CNV), structural variants (SV) and even the detection of repeat expansions [14]. GS has been particularly useful in patients with intellectual disability and malformations (40–55%) [15]. However, there is limited data on the use of ES for the diagnosis of RD in LATAM [16,17,18,19]. Furthermore, only a few initiatives are known that applied GS such as the Rare Genomes Project [20], the Odyssey Project, the Brazilian Genomes Project (https://www.genomasraros.com/projeto/), and the 80 Plus Project. Apart from the information provided by the Rare Genomes Project, with a preliminary diagnostic yield of 35.7%, we have no further evidence of the diagnostic or clinical utility of GS in LATAM [21]. In this paper, we describe these characteristics, and, in addition, we analyzed the diagnostic performance of the test and the diagnostic odyssey in the largest Colombian cohort of patients with suspected rare diseases reported until now.
Materials and methods
Study design and settings
We evaluated 501 participants that were referred for genetic testing by specialized physicians and clinical geneticists at a healthcare provider in Colombia, from January to December of 2022. The study included 22 healthcare centers in Andina, Pacific and Caribbean regions. A retrospective analysis was made to describe GS results for the 501 patients (Supplementary Appendix 1).
Participants
All eligible patients were asked to participate in the study and informed consent for GS testing and genetic testing was given by adult patients, parents of minors, or caregivers, respectively. Inclusion criteria included all consecutive patients with a suspected RD. Exclusion criteria are defined in Supplementary Appendix 2. No follow-up of patients was done. Data regarding family history, consanguinity, disease onset, motive of referral and previous genetic testing were extracted from our database and curated individually by medical geneticists and molecular biologists.
Variables
All variables were collected from primary sources within the 22 participating healthcare centers. Four main categories were considered: demographic variables, clinical variables, genetic variables, and related to patient evolution and outcomes (treatment, genetic counseling).
Clinical data
The clinical information provided by referring clinicians was converted into Human phenotype ontology (HPO) terms by a dedicated team of scientists at the Bioscience Center, Sura Omics Science Center in SURA Colombia and CENTOGENE in Germany. During this process, relevant information was curated using HPO terms and registered in the laboratory management system (LIMS) and CENTOGENE Biodatabank. This clinical information was used to assess the relevance of the identified variants.
Genome sequencing (GS) and bioinformatic pipeline
DNA was extracted from dried blood spots on filters (CentoCard ®) using standard column-based methods. GS was performed as described before [22]. In brief, genomic DNA was cleaved enzymatically, and libraries were generated by PCR with Illumina compatible adapters. The libraries were paired end sequenced on an Illumina platform to yield an average coverage depth of ~30x. An in-house bioinformatics pipeline, including read alignment to GRCh37/hg19 genome assembly and revised Cambridge Reference Sequence (rCRS) of the Human Mitochondrial DNA (NC_012920), variant calling, annotation, and comprehensive variant filtering is applied. Copy number variation (CNV) calling is based on the DRAGEN pipeline from Illumina. All variants with minor allele frequency (MAF) of less than 1% in gnomAD database, and disease-causing variants reported in HGMD®, in ClinVar or in CentoMD® were evaluated.
Although the evaluation was focused on coding exons and flanking intronic regions, the complete gene region was interrogated for candidate variants with plausible association to the phenotype. All potential mode of inheritance were considered (autosomal recessive, autosomal dominant, X-linked dominant and recessive, mitochondrial) along with variant features such as observed zygosity, frequency in external and internal databases, phenotype of individuals with the same variant/zygosity, in silico predictions, gene-disease validity according to ClinGen guidelines, known disease mechanism, and variant classification in CENTOGENE and external databases. Mitochondrial variants are reported for heteroplasmy levels of 15% or higher. Filtering and variant evaluation is performed with a tool developed at CENTOGENE. Variants with a minimum read depth of 10 reads, and variant allele fraction (>25%) were considered, along with a general quality score >100. Details of the quality score calculation were published before [23]. Variants with allele fraction <25% might indicate mosaicism or sequencing artifacts. Visual inspection of each variant considered for reporting was done (BAM files visualized using the IGV browser) [24]. Consequently, a technical specificity, i.e., accuracy of the observed genotypes of >99.9% for all reported variants is achieved.
Biochemical testing
Biochemical testing assays from dried blood spots (DBS) were performed to clarify the effect of genetic variants with determination of the corresponding enzymatic activity and/or biomarker concentration. The enzymatic activities were determined either by fluorimetry or liquid chromatography coupled with mass spectrometry in DBS. The quantification of the biomarkers was performed in DBS using mass spectrometry (a full list of the tests can be found in [25]).
RNA-seq
A protocol was developed to perform RNA-seq using RNA extracted from filter card DBS, library preparation and bioinformatic pipeline analysis. Splicing patterns in cases were inspected using IGV interactive program and adding three independent controls. Relative gene expression was calculated using housekeeping genes and compared against controls.
Variants classification and reporting
Variants are classified into five classes (pathogenic, likely pathogenic, variants of uncertain significance (VUS), likely benign, and benign) along ACMG/AMP SVI guidelines for classification of genetic variants. All relevant variants related to the phenotype of the patient were reported [26]. In addition, upon signed consent, pathogenic and likely pathogenic variants not associated with the patient’s disease or symptoms but medically actionable (Secondary Findings) are reported in a dedicated section (73 selected genes according to ACMG guidelines) [27]. Furthermore, in the carriership section, we report pathogenic and likely pathogenic sequence variants in known genes associated with severe and early-onset autosomal recessive and X-linked disorders regardless of the current phenotype. The selection of over 2000 genes was carefully curated to reflect up-to-date scientific and medical knowledge (e.g., includes genes from the ACMG carrier screening guidelines [28].
Analytic strategy
A descriptive analysis was performed, which includes summary measures such as medians, percentiles, and proportions. Associations were statistically qualified using Fisher´s exact test or Chi-square, as appropriate. p values less than 0.05 were considered significant. The Mann–Whitney and Kruskal–Wallis U test were also used to determine the difference between groups; the Shapiro–Wilk test was used to assess the normality of the variables.
Ethnicity admixtures were predicted using the iAdmix software (*iAdmix is released under the MIT license) [29]. A model with 25 distinct ethnicities was trained using a research dataset from previous whole genome analysis at CENTOGENE. During the ethnicity prediction, we obtain a maximum likelihood estimate of the global admixture proportions using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm for each of the 25 ethnicities. The Top 3 ethnicities for every patient are used for subsequent analysis.
All the statistical analyzes were developed in the statistical software R, version 4.2.0.
Results
We included 501 index cases with suspected genetic diseases, without a conclusive clinical diagnosis. Many patients had had previous genetic testing such as karyotype (24%), ES (24%) and CMA (22%). Over half of the patients were children or preadolescents (52%), 16% were adolescents and 32% were adults. The median age at onset of symptoms was 1 year (0–12). In a small number of patients (8.9%) parental consanguinity was reported, while many of them mentioned positive family history (44%) (Table 1).
Most of the patients originated from the province of Antioquia (Andean/Northwestern Region) due to the health insurance coverage in this region (SURA health insurance company). The analysis highlighted a preponderance of Amerindian/American component close to 81% (n = 410 patients), followed by European with 16% (mainly South European, n = 305). Only 79 patients had European ethnicity as first prediction, and 16 patients African (Fig. 1).
Dendrogram representing the ancestry analysis results from the 501 patients included in this study. The results illustrate the preponderance of an Amerindian component close to 81% (410 patients), followed by South European with 16% (305 patients). The ancestry predictions are shown as number of patients, with TOP 3 predictions shown. First prediction: left side, second prediction at the middle and third prediction at the right side of the diagram.
Phenotypic analysis was conducted using HPO (Human Phenotype Ontology) terms, with an average of 5 terms per patient. To determine the most affected taxonomic group, signs and symptoms with the highest clinical impact were considered. The current cohort of patients exhibited an average of 2 affected taxonomies. CNS structural abnormalities were observed in 262 (52%) patients, cognitive involvement was detected in 192 patients (38%), facial dysmorphisms were identified in 156 patients (31%), and neuromuscular disorders were present in 125 patients (25%) (Fig. 2A).
A Diverse taxonomy groups (in percentages) among 501 patients in the Colombian rare genomes project. Taxonomy groups were formed according to the main signs and symptoms of the patients. The dotted orange line reflects the mean of the percentages. The distribution of each taxonomy was analyzed independently. Disorders of the CNS (52.3%) and of the cognitive functions (38.3%) were the most frequently detected. B Percentage of positivity by disease group. The x-axis describes the percentage of the overall diagnostic yield (dashed orange line) and of each disease group independently. The y-axis describes the disease groups. The highest diagnostic yield was achieved for metabolic disorders (64.3%), followed by syndromic disorders (43.8%).
Molecular findings
Clinically relevant variants were reported in 237 genes with 144 pathogenic/likely (P/LP) pathogenic variants. Several of these P/LP variants were repeatedly detected in genes such as CYP21A2 (5 patients), SCN1A, GBA and FKBP14 (4 patients each) (Supplementary Appendix 3). Of the variants classified as P/LP, 96% were in coding regions, whereas 4% were noncoding variants in RMRP, POLR3A, NPC1, UGT1A1, and CYP21A2. Most of the variants corresponded to SNVs (81%), followed by CNVs (12%), small indels (5%) and aneuploidies (1%) (Fig. 3A). Missense variants were the most reported type in this dataset (Fig. 3B). When comparing the molecular effect between the P/LP and VUS group, there were significant differences (p < 0.01), with more missense variants reported in the VUS group (71% vs. 36.1%) and more loss of function (LoF) variants in the P/LP group (Fig. 3B), as expected.
A This figure shows the type of variants classified as P/PL (left) and VUS (right) in percentages. B The bar graph shows the type of variants according to their molecular impact among P/LP variants (left) and VUS (right), in percentages. As expected, there were more CNVs and indels among P/LP variants than among VUS (12% and 5% vs. 6% and 1%, respectively), with a predominance of nonsense and frameshift variants in the P/LP group compared to the VUS group (20.9% and 18.1% vs. 3.3% and 5%, respectively). The VUS group was composed mainly by missense and variants with unknown effect (e.g. noncoding variants). SNV single nucleotide variant, INDEL insertion/deletion variant, CNV copy number variant.
Most of the diagnosed diseases had a mode of inheritance compatible with autosomal dominant inheritance (71, 50.4%), followed by autosomal recessive (51, 36.2%), X-linked (14, 9.9%), aneuploidies (4, 2.8%) and, with mitochondrial inheritance (1, 0.7%). A mosaic case was also detected corresponding to an 11-year-old patient with a molecular diagnosis of lissencephaly and subcortical band heterotopia (OMIM: *601545, patient GM86) with a mosaic PAFAH1B1 variant NM_000430.3:c.869_870del p. (Tyr290Phefs*5) detected in 25.8% of the reads (Fig. 4A).
A Detection of mosaic variant in a 11-year-old patient in the PAFAH1B1 gene. A pathogenic variant was detected in 25.8% of the reads (IGV browser). This result confirmed the diagnosis of lissencephaly type 1, autosomal dominant. B IGV image showing two variants in the GAA gene (in compound heterozygosity in a 14-year-old patient) The first variant is pathogenic c.1465G > A p. Asp489Asn and, the second is a variant of uncertain significance c.2799+59A > G. The transcriptomic study showed no splicing alterations between exons 18 and 19, thus, the second variant remains of uncertain significance.
In selected cases, the clinical significance of the detected variants was further investigated by additional testing using the same sample provided. In a male patient (GM214) with neurodevelopmental delay, seizures, hepatosplenomegaly and tetraparesis, we detected two heterozygous P/LP variants in NPC1. The biomarker PPCS (C24H50O7N2P—legacy name lyso SM-509) was pathologically increased, which supported the diagnosis of Nieman-Pick disease type C1 in the patient. The second patient (GM378) was a 4-year-old male, with bilateral coxa valga, encephalopathy, dysphagia, periventricular leukomalacia, dental malocclusion, developmental regression, and growth delay. We detected a hemizygous VUS in the gene IDS NM_000202.5:c.1477C > T p. (Arg493Cys). Consecutively, the enzymatic activity of iduronate sulfatase was found pathologically decreased, supporting the diagnosis of Mucopolysaccharidosis II in the patient. In a third patient (GM176), a 14-year-old male presented with weakness in lower extremities since the age of 12 years. We detected two variants in GAA, a pathogenic (NM_000152.3:c.1465G > A p. (Asp489Asn)) and VUS (NM_000152.3:c.2799+59A > G) in compound heterozygous state. Further testing detected a mildly reduced acid alpha-glucosidase activity. Screening for additional heterozygote individuals for any of the detected variants revealed that the pathogenic variant was leading to approximately a 60% reduction of the enzymatic activity, while individuals with the heterozygous intronic VUS presented normal enzyme activity. Furthermore, RNA-seq analysis excluded an abnormal splicing of GAA. Taken together, the use of multiomic analysis supported a non-causal role for the c.2799+59A > G variant, making the diagnosis of Pompe disease unlikely (Fig. 4B).
Diagnostic yield
A molecular diagnosis could be defined for 142 (28.3%) index cases. The highest diagnostic yield was achieved for pathologies with a metabolic component (n = 18, 64.3%, p < 0.001), followed by syndromic disorders (n = 56, 43.8%, p < 0.001), neurological abnormalities such as neuromuscular (n = 17, 25.4%), neurological (n = 28, 19.9%) and intellectual disability (n = 6, 13.3%). Diseases with a major cardiovascular or hemato-immunological component had lower diagnostic yields in our cohort (p < 0.05) (Fig. 2B).
A total of 119 patients (24%) from this cohort had previous negative or inconclusive ES studies and of these, 23 patients (19.3%) received a positive diagnosis with GS. Among these there were two SNVs located in noncoding regions (GM79, RMRP, associated with cartilage-hair hypoplasia and GM275, HSD17B4, associated with Perrault syndrome). There were also four large CNVs: three of them were copy-number losses (GM43, CLCNKB gene; GM146, APTX gene in homozygosis and GM143, BPTF in heterozygosis) and one case of copy-number gain affecting the DMD gene in patient GM394 (Table 2).
We analyzed features such as parental consanguinity, positive family history, age at onset of symptoms, number of HPO terms, among other features. Only the total number of taxonomies suggested a statistical relationship to the final diagnostic result, with patients with more taxonomies (3 or more) having a positive GS result (Supplementary Table 1). A multiorgan involvement, as reflected in a higher number of affected taxonomic groups per patient, might be indicative of an underlining genetic disorder. This can explain the higher diagnostic yield among patients with higher number of affected taxonomies (Supplementary Table 1).
In addition, the carriership status for P/LP variants with relevance for family planning were analyzed for all patients/families (upon consent, see Methods section). Sixty-one percent of the patients analyzed (308/501) had at least one P/LP variant in any of 162 reported genes (heterozygous carrier status). The most frequently reported variants were in the following genes: BTD (n = 33, 7%), GJB2 (n = 17, 4%), CFTR (n = 13, 3%), CBS (n = 11, 2%), BCHE (n = 10, 2%), ALG12 (n = 9, 2%, HBB (n = 8, 2%), DPYD (n = 8, 2%), POLR3A (n = 8, 2%). As for secondary findings, according to the recommended list of genes of the ACMG guidelines [30], we found that 11 patients (2%) presented relevant variants (Table 2).
Diagnostic odyssey
In 142 positive cases, we examined the time elapsed between symptom onset and genetic diagnosis. As expected, the diagnostic odyssey was longer in older patients, with a median of 1 year for young children (0–5 years) versus a median of 15 years in adolescents and adult patients (13–35- and 36–86-year groups). The most significant differences were found between extreme age groups (0–5 years and 36–86 years, p value < 0.001, Supplementary Appendix 4). Thus, to facilitate and allow further comparations among patients, we corrected for the patients’ age and normalized the odyssey time by age. Note that this is a relative measure of the interval of time that a person lived without a diagnosis (diagnostic odyssey).
Then, we observed interesting differences in the diagnostic odyssey according to the pathology and age ranges. Patients with syndromic disorders have spent more than 75% of their life without a diagnosis, while for patients with neurologic and neuromuscular diseases, the time of the odyssey tended to decrease with age (inversely related, Fig. 5).
Odyssey time intervals were also analyzed according to taxonomy groups. The diagnostic odyssey was longer for patients with taxonomy associated with face and neck abnormalities (p = 0.003) and with growth disorders (p = 0.002), while shorter time intervals were observed for patients presenting neuromuscular abnormalities (p = 0.045) (Supplementary Appendix 5).
We then analyzed the possible relationship between previous genetic testing and the odyssey time. Having karyotype or CMA testing before GS was associated with a longer time to reach a definitive diagnosis (p < 0.01). Having a previous ES did not show any significant relationship. No statistical differences in odyssey times were observed when analyzing other variables such as parental consanguinity and positive family history (p > 0.05).
Discussion
In this work we describe the results of GS applied to a clinically heterogeneous cohort of 501 patients with suspected RD from Colombia. Of note, patients with clinical entities that could be easily diagnosed by ES or gene panels were not included, given the well-known diagnostic performance of these tests in patients with hereditary cancer, easily characterized dysmorphic syndromes, among others.
The patients from this cohort were presented with 970 HPO terms, 18 taxonomies, and 196 clinically suspected diseases (ICD10) [4, 9]. A recent Latin American study of 103 Chilean patients with heterogeneous genetic phenotypes analyzed with ES showed an average of 6 compromised systems [31].
By using a dedicated ethnicity prediction tool, we show that most of the patients included in the study have an Amerindian (native American) origin which has been underrepresented in public databases and other NGS studies [14, 21]. Importantly, this is the first LATAM study reporting the experience of systematic use of diagnostic GS. Efforts to include diverse populations in genomic studies are essential for advancing precision medicine and developing interventions across different ethnic backgrounds.
In this cohort, the diagnostic yield was 28.3%, which is within the known range of 22–50% reported in other studies [13]. Importantly, a higher diagnostic yield of 64.3% was observed among patients with suspected metabolic disorders and a yield of 43.8% was observed among patients with syndromic involvement, supporting the indication of GS as a preferred testing method or first tier for these patient groups. The high diagnostic yield obtained for metabolic disorders might have two main reasons. Firstly, the lack of early diagnosis (e.g., by newborn screening) in Colombia. Currently, newborn screening in Colombia only includes congenital hypothyroidism. Further efforts are required to reduce the gap in the diagnosis of metabolic pathologies. Secondly, the application of multi-omics (e.g., biomarkers, enzymatic activity measurements), and/or RNA-seq allowed us to further understand the relevance of the detected variants, which is especially useful in metabolic disorders.
There are few GS/ES studies reported in LATAM. In 2019, a study based on 60 Mexican patients with suspected genetic diseases explored the use of GS as first-tier genetic test [32]. Another study referred the experience with ES in Brazil, which describes a diagnostic yield of 31.6% which was driven by the prenatal samples included (67% diagnostic rate), and children younger than 1 year (44% diagnostic rate) [19]. The difference in diagnostic yield with the current study could be due to the age of the patients included. Prenatal samples were not included in our study, with a median patient age of 12 years [4, 9].
Although in most cases, the search for the genetic cause of diseases is focused on the coding regions, the study of non-coding regions is becoming increasingly relevant, variants located on introns and regulatory regions could explain an important percentage of diseases [33]. For example, a recent RNA-seq study of 15 patients clinically diagnosed with Neurofibromatosis type 1 but negative NF1 testing, identified that 10 out of the 15 patients presented noncoding changes including deep intronic variants, transposon insertions causing noncanonical splicing, and a branch point variant [34].
In the current study, we identified P/LP variants affecting noncoding regions, which would be missed with ES (Table 2). The interpretation of noncoding variants remains challenging, and in many cases, complementary methods based on enzyme activity determination, biomarker testing, and/or RNA analysis are necessary for variant classification and reporting, as demonstrated in our study. We recommend the use of this multi-omics approach to understand the functional consequences of noncoding and other genomic variants with unclear effect [25, 35], detected by GS serving as a diagnostic tool for precision medicine [31,32,33,34] .
A recent meta-analysis that included 154 studies performing genetic testing in patients with epilepsy detected that GS gave the highest diagnostic yield (48%), followed by ES (24%), gene panels (19%), and CMA (9%), supporting that GS should be prioritized as testing method for specific types of disorders [36,37,38,39,40].
Traditionally, the diagnostic odyssey is measured in years, which is then compared among patients and studies. However, this “crude” measure of the diagnostic odyssey in years might not accurately reflect the impact that the disease has had on patients or allow for comparisons between subjects of different ages [41].
In the current study, we used a “standardized” odyssey which allowed comparations between patients from different age groups. We found that patients with syndromic disorders have spent more than 75% of their life without a diagnosis, while for patients with neurologic and neuromuscular diseases the diagnostic odyssey was relatively short.
We found that pre-testing, specifically having karyotyping or CMA testing was associated with a longer time to reach a definitive diagnosis (p < 0.01). This is relevant since referring clinicians should be aware of the most efficient diagnostic tests for specific diseases. While in the past karyotyping and CMA testing were the recommended first tier testing methods, currents guidelines strongly recommend that ES/GS should be considered as a first- or second-tier test for patients with congenital anomalies, developmental delay, or intellectual disability [13].
The current limitations of the use of GS in the clinical practice include the difficulties in detecting SV (that do not lead a copy number change) but that can disrupt relevant sequences (especially relevant with short reads methodologies used for genome sequencing). Recent studies have shown that SVs can not only affect gene dosage but also modulate basic mechanisms of gene regulation. SVs are still mainly interpreted using the genetic dosage approach and general awareness of 3D position effects is low, leaving many patients undiagnosed [42]. We face difficulties in understanding the clinical relevance of noncoding variants. From the tecnical point of view, the detection of repeat expansions is still challenging. Finally, the lack of knowledge on the function of many genes is affecting the understanding of the relevance of the detected variants. Further analyses focusing in negative cases with similar clinical presentations can lead to the identification of new gene-disease relationships. As example, in three patients from this cohort, we identified a novel common variant in SNUPN. This helped to establish SNUPN deficiency as the genetic etiology of a previously unrecognized subtype of muscular dystrophy [43].
Conclusion
This is the first LATAM study reporting the experience of the systematic use of diagnostic GS in patients with suspected RD. The diagnostic rate reached was close to 1 in 3.5 patients analyzed (28.3%), demonstrating the diagnostic utility of the test as reported in other populations. The diagnostic categories associated with metabolic, syndromic diseases, and neurocognitive disorders demonstrate the greatest technical robustness of GS in our cohort. The application of simultaneous multi-omics testing was valuable in assessing the clinical relevance of the detected variants. We anticipate that the current work will inspire the scientific and medical community to implement diagnostic GS as the preferred testing method in the diagnosis of RD in LATAM. Furthermore, we expect that governmental entities will recognize and accelerate the implementation of GS in the health care of patients with RD. Researchers and policymakers should continue to prioritize initiatives aimed at increasing diversity and representation in genomic databases and research studies to promote health equity and enhance our understanding of genetic contributions to disease.
References
Ferreira CR. The burden of rare diseases. Am J Med Genet A. 2019;179:885–92.
Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28:165–73.
Itzel Monserrat Vargas. Rare diseases in the Americas. 2023. pp. 1–20. https://www.wilsoncenter.org/article/infographic-rare-diseases-americas.
Giugliani R, Castillo Taucher S, Hafez S, Oliveira JB, Rico-Restrepo M, Rozenfeld P, et al. Opportunities and challenges for newborn screening and early diagnosis of rare diseases in Latin America. Front Genet. 2022;13:1053559.
Bauskis A, Strange C, Molster C, Fisher C. The diagnostic odyssey: insights from parents of children living with an undiagnosed condition. Orphanet J Rare Dis. 2022;17:233.
Sawyer SL, Hartley T, Dyment DA, Beaulieu CL, Schwartzentruber J, Smith A, et al. Utility of whole-exome sequencing for those near the end of the diagnostic odyssey: time to address gaps in care. Clin Genet. 2016;89:275–84.
National Administrative Department of Statistics of Colombia. El territorio colombiano. 2011. pp. 1–20. https://geoportal.dane.gov.co/servicios/atlas-estadistico/src/Tomo_I_Demografico/1.1.-el-territorio-colombiano.html.
Ministerio de Salud de Colombia. Ley 1392 del 2 de julio del 2010 para enfermedades huerfanas. 2010. pp. 1–30. https://www.minsalud.gov.co/sites/rid/Lists/BibliotecaDigital/RIDE/DE/DIJ/ley-1392-de-2010.pdf.
Richter T, Nestler-Parr S, Babela R, Khan ZM, Tesoro T, Molsen E, et al. Rare disease terminology and definitions—a systematic global review: report of the ISPOR Rare Disease Special Interest Group. Value Health. 2015;18:906–14.
De Castro M, Restrepo CM. Genetics and genomic medicine in colombia. Mol Genet Genomic Med. 2015;3:84–91.
Instituto Nacional de Salud de Colombia. Enfermedades huérfanas-raras. 2020. pp. 1–25. https://www.ins.gov.co/buscador-eventos/Informesdeevento/ENFERMEDADES%20HUERFANAS%20PE%20VI%202022.pdf.
Alix T, Chéry C, Josse T, Bronowicki JP, Feillet F, Guéant-Rodriguez RM, et al. Predictors of the utility of clinical exome sequencing as a first-tier genetic test in patients with Mendelian phenotypes: results from a referral center study on 603 consecutive cases. Hum Genomics. 2023;17:5.
Manickam K, McClain MR, Demmer LA, Biswas S, Kearney HM, Malinowski J, et al. Exome and genome sequencing for pediatric patients with congenital anomalies or intellectual disability: an evidence-based clinical guideline of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021;23:2029–37.
Krant ID, Medne L, Weatherly JM, Wild KT, Biswas S. Effect of whole-genome sequencing on the clinical management of acutely ill infants with suspected genetic disease: a randomized clinical trial. 2023. https://pubmed.ncbi.nlm.nih.gov/34570182/.
Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, Cipriani V, et al. 100,000 genomes pilot on rare-disease diagnosis in health care—preliminary report. N Engl J Med. 2021;385:1868–80. http://www.nejm.org/doi/10.1056/NEJMoa2035790.
Errante PR, Franco JL, Espinosa‐Rosales FJ, Sorensen R, Condino‐Neto A. Advances in primary immunodeficiency diseases in Latin America: epidemiology, research, and perspectives. Ann N Y Acad Sci. 2012;1250:62–72.
Bevilacqua JA, Guecaimburu Ehuletche MR, Perna A, Dubrovsky A, Franca MC, Vargas S, et al. The Latin American experience with a next generation sequencing genetic panel for recessive limb-girdle muscular weakness and Pompe disease. Orphanet J Rare Dis. 2020;15:11.
Córdoba M, Rodriguez-Quiroga SA, Vega PA, Salinas V, Perez-Maturo J, Amartino H, et al. Whole exome sequencing in neurogenetic odysseys: an effective, cost- and time-saving diagnostic approach. PLoS ONE. 2018;13:e0191228.
Quaio CRDC, Moreira CM, Novo‐Filho GM, Sacramento‐Bobotis PR, Groenner Penna M, Perazzio SF, et al. Diagnostic power and clinical impact of exome sequencing in a cohort of 500 patients with rare diseases. Am J Med Genet C Semin Med Genet. 2020;184:955–64.
Coelho AVC, Mascaro-Cordeiro B, Lucon DR, Nóbrega MS, Reis RS, de Alexandre RB, et al. The Brazilian Rare Genomes Project: validation of whole genome sequencing for rare diseases diagnosis. Front Mol Biosci. 2022;9:821582.
Félix TM, Fischinger Moura de Souza C, Oliveira JB, Rico-Restrepo M, Zanoteli E, Zatz M, et al. Challenges and recommendations to increasing the use of exome sequencing and whole genome sequencing for diagnosing rare diseases in Brazil: an expert perspective. Int J Equity Health. 2023;22:11.
Bertoli-Avella AM, Beetz C, Ameziane N, Rocha ME, Guatibonza P, Pereira C, et al. Successful application of genome sequencing in a diagnostic setting: 1007 index cases from a clinically heterogeneous cohort. Eur J Hum Genet. 2021;29:141–53.
Rehm HL. Peter Bauer, Ellen Karges, Gabriela Oprea and Arndt Rolfs. Genet Med. 2018;20:378–9.
Bertoli-Avella AM, Kandaswamy KK, Khan S, Ordonez-Herrera N, Tripolszki K, Beetz C, et al. Combining exome/genome sequencing with data repository analysis reveals novel gene–disease associations for a wide range of genetic disorders. Genet Med. 2021;23:1551–68.
Almeida LS, Pereira C, Aanicai R, Schröder S, Bochinski T, Kaune A, et al. An integrated multiomic approach as an excellent tool for the diagnosis of metabolic diseases: our first 3720 patients. Eur J Hum Genet. 2022;30:1029–35.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–24.
Miller DT, Lee K, Chung WK, Gordon AS, Herman GE, Klein TE, et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021;23:1381–90.
Gregg AR, Aarabi M, Klugman S, Leach NT, Bashford MT, Goldwaser T, et al. Screening for autosomal recessive and X-linked conditions during pregnancy and preconception: a practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2021;23:1793–806.
Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinforma. 2015;16:4.
Miller DT, Lee K, Abul-Husn NS, Amendola LM, Brothers K, Chung WK, et al. ACMG SF v3.1 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2022;24:1407–14.
Poli MC, Rebolledo-Jaramillo B, Lagos C, Orellana J, Moreno G, Martín LM, et al. Decoding complex inherited phenotypes in rare disorders: the DECIPHERD initiative for rare undiagnosed diseases in Chile. Eur J Hum Genet. 2024. https://doi.org/10.1038/s41431-023-01523-5.
Scocchia A, Wigby KM, Masser-Frye D, Del Campo M, Galarreta CI, Thorpe E, et al. Clinical whole genome sequencing as a first-tier test at a resource-limited dysmorphology clinic in Mexico. NPJ Genom Med. 2019;4:5.
D’haene E, Vergult S. Interpreting the impact of noncoding structural variation in neurodevelopmental disorders. Genet Med. 2021;23:34–46. https://doi.org/10.1038/s41436-020-00974-1.
Douben HCW, Nellist M, van Unen L, Elfferich P, Kasteleijn E, Hoogeveen‐Westerveld M, et al. High-yield identification of pathogenic NF1 variants by skin fibroblast transcriptome screening after apparently normal diagnostic DNA testing. Hum Mutat. 2022;43:2130–40.
Cheema H, Bertoli-Avella AM, Skrahina V, Anjum MN, Waheed N, Saeed A, et al. Genomic testing in 1019 individuals from 349 Pakistani families results in high diagnostic yield and clinical utility. NPJ Genom Med. 2020;5:44.
Belkadi A, Bolze A, Itan Y, Cobat A, Vincent QB, Antipenko A, et al. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci USA. 2015;112:5473–8.
Gross AM, Ajay SS, Rajan V, Brown C, Bluske K, Burns NJ, et al. Copy-number variants in clinical genome sequencing: deployment and interpretation for rare and undiagnosed disease. 2018. https://doi.org/10.1038/s41436-018-0295-y.
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, et al. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform. 2018;19:286–302.
Hou YCC, Yu HC, Martin R, Cirulli ET, Schenker-Ahmed NM, Hicks M, et al. Precision medicine integrating whole-genome sequencing, comprehensive metabolomics, and advanced imaging. Proc Natl Acad Sci USA. 2020;117:3053–62.
Chu X, Jaeger M, Beumer J, Bakker OB, Aguirre-Gamboa R, Oosting M, et al. Integration of metabolomics, genomics, and immune phenotypes reveals the causal roles of metabolites in disease. Genome Biol. 2021;22:198.
Tinker RJ, Fisher M, Gimeno AF, Gill K, Ivey C, Peterson JF, et al. Diagnostic delay in monogenic disease: a scoping review. Genet Med. 2024;26:101074. https://linkinghub.elsevier.com/retrieve/pii/S1098360024000078.
Spielmann M, Lupiáñez DG, Mundlos S. Structural variation in the 3D genome. Nat Rev Genet. 2018;19:453–67.
Nashabat MN, Nabavizadeh HP, Saraçoğlu B, Sarıbaş Ş, Avcı Ş, Börklü E, et al. SNUPN deficiency causes a recessive muscular dystrophy due to RNA mis-splicing and ECM dysregulation. Nat Commun. 2024;15:1758. https://www.nature.com/articles/s41467-024-45933-5.
Acknowledgements
We thank the entire work team of SURA Bioscience Center, CENTOGENE and SURA Genetic Unit for all the entire work done during the Colombian Rare Genomes Project. We also thank Dr. Harry Pachajoa, Dr. Carolina Baquero, Dr. Gustavo Giraldo, Dr. Jorge Montoya and Dr. Derly Castro for there contribution with the project.
Funding
This study was fund by Sura Colombia (EPS-Ayudas Diagnósticas SURA-Centro de Biociencias). The sponsors were partially involved in the study, they only had a roll in reviewing of the final manuscript. They had no role in the study design, data collection and analysis.
Author information
Authors and Affiliations
Contributions
HMV: conceptualization; data curation; formal analysis; supervision; project administration, supervision, validation and writing. ABA: conceptualization; data curation; formal analysis; writing. JEM: conceptualization; data curation; formal analysis; supervision; validation and writing. DSC: formal analysis, data curation, writing. LAG: funding acquisition; writing. MNV: data curation, validation and writing. JPV: formal analysis, data curation. CAB: project administration, writing. JG: conceptualization, formal editing and analysis. JM: formal analysis, data curation. PB: funding acquisition, conceptualization, review of the manuscript. CJJ: conceptualization; data curation; supervision; funding acquisition; project administration, supervision, validation and writing.
Corresponding author
Ethics declarations
Competing interests
ABA, JG, JM, and PB are employees of CENTOGENE GmbH. Other authors declare they have no conflict of interests.
Ethical approval
The current project has been conducted within a diagnostic setting and in a second step, utilized de-identified data and samples. Informed consents were obtained, the form contains a section for consent for genetic testing related to the disease (s) of the patient and consent for research and approval from Ayudas Diagnósticas SURA Ethics committee was obtain. The research conformed to the principles of the Helsinki Declaration.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Velasco, H.M., Bertoli-Avella, A., Jaramillo, C.J. et al. Facing the challenges to shorten the diagnostic odyssey: first Whole Genome Sequencing experience of a Colombian cohort with suspected rare diseases. Eur J Hum Genet 32, 1327–1337 (2024). https://doi.org/10.1038/s41431-024-01609-8
Received:
Revised:
Accepted:
Published:
Issue date:
DOI: https://doi.org/10.1038/s41431-024-01609-8