Clinical genetic variation across Hispanic populations in the Mexican Biobank

Barberena-Jonas, Carmina; Medina-Muñoz, Santiago G.; Cedillo-Castelán, Viankail; Sepúlveda-Morales, Tania; Gonzaga-Jáuregui, Claudia; García-García, Lourdes; Ioannidis, Alexander G.; Moreno-Estrada, Andrés

doi:10.1038/s41591-025-04100-z

Article
Open access
Published: 21 January 2026

Carmina Barberena-Jonas ORCID: orcid.org/0000-0001-7413-638X¹,
Santiago G. Medina-Muñoz¹,
Viankail Cedillo-Castelán^1,2,
Tania Sepúlveda-Morales³,
Claudia Gonzaga-Jáuregui³,
ENSA Genomics Consortium,
Lourdes García-García ORCID: orcid.org/0000-0001-5262-1157⁴,
Alexander G. Ioannidis ORCID: orcid.org/0000-0002-4735-7803^5,6 &
Andrés Moreno-Estrada ORCID: orcid.org/0000-0001-8329-8292¹

Nature Medicine volume 32, pages 725–735 (2026)

Abstract

Genetic testing for specific alleles is often recommended based on an individual’s ancestry. However, the frequency of pathogenic and pharmacogenomic alleles across different Hispanic groups has not been well characterized, and existing guidelines often fail to recognize the geographic and ancestral diversity within these populations. Here, analyzing data from 6,011 individuals from the nationwide Mexican Biobank, we show that Mexican individuals have striking regional differences in biomedically relevant allele frequencies, shaped not only by their overall admixture proportions but also by the local Indigenous ancestral groups contributing to their genomes (for example, Nahua in central Mexico, Zapotec in the south or Maya in the Yucatán Peninsula). We found ancestry-specific patterns with clinical implications that could not have been detected without a local ancestry-informed approach, including variants affecting fentanyl (rs2242480) and statin (rs4149056) metabolism, examples particularly relevant to the epidemiology of Hispanic populations. This analysis framework could inform genetic testing guidelines across the Americas. We are making available the results for 42,769 biomedically relevant genotyped variants through MexVar, a user-friendly platform designed to improve access to genomic data for the scientific community and support genetic analyses for populations of Mexican descent worldwide.

Main

Clinical recommendations often consider observed allele-frequency differences in specific populations, recognizing the need for tailored approaches in precision medicine^1,2. For example, individuals of Ashkenazi Jewish descent are advised to undergo genetic testing for BRCA1 and BRCA2 pathogenic variants due to a higher prevalence in this group of these variants, which substantially increase the risk for breast and ovarian cancers^3,4,5. Similarly, it is often recommended that Hispanic/Latinos be tested for specific genetic markers, such as those influencing drug metabolism⁶. In practice, the American College of Medical Genetics and Genomics (ACMG) in the United States recommends carrier screening if a clinically relevant variant reaches a frequency of 1:200 in a population⁷. However, how such a population is defined is not clear, and existing clinical guidelines often completely overlook differentiation among Hispanic/Latino populations, treating them as a single ethnicity. This is particularly relevant for Mexican individuals, whose genomes are often admixed with varying continental ancestry contributions due to historical migrations and also include contributions from different subcontinental regional ancestries, particularly distinct Indigenous American ones^8,9,10,11.

To fully understand and explore the clinical genetic consequences of this diversity, a nationwide biobank that captures the unique genetic landscape of Mexicans is essential. The Mexican Biobank (MXB) provides an invaluable resource by including genetic data from individuals across 898 localities nationwide, in both urban and rural areas¹¹. This biobank currently includes 6,011 individuals from all 32 states across Mexico (Extended Data Fig. 1), genotyped for 1.8 million variants on the Multi-Ethnic Global Array (MEGA), which provides extensive coverage of clinical variants and those associated with diseases and pharmacogenomics (PGx). The comprehensive design of the MXB enables in-depth characterization of genetic variation, making it a crucial tool for advancing precision medicine in the Mexican population (for further details on the inclusion criteria and genotyping array, see Methods).

Here we demonstrate, using the MXB, that genetic diversity within Mexicans is far more intricate than a monolithic ethnic label like ‘Hispanic’ or ‘Latino’ can capture. This complexity arises from two key factors. First, Mexicans exhibit varying proportions of Indigenous American, European and African ancestries, which substantially influence the frequency of biomedically relevant variants, such as those associated with pharmacogenetic traits. Second, there is substantial regional subcontinental differentiation within the Indigenous American source ancestries of Mexicans, stemming from pre-Columbian population genetic variation, from Arido-American groups in the north to Mesoamericans around the central plateau of Mexico City to southern Oaxaca and the Maya of the Yucatán Peninsula.

We have found that subcontinental variation among these Indigenous American subgroups contributing ancestry to admixed Mexicans can result in geographic allele frequency variation across Mexico that is as large as, or even larger than, that observed between many continental groupings worldwide, such as Europeans and East Asians.

To better explore these complexities, we developed MexVar, an interactive platform that integrates multiple datasets and provides a comprehensive tool for exploring the frequency variation of biomedically relevant variants in Mexico, accounting for geography and ancestry at fine scales. MexVar empowers researchers, clinicians and broader users by offering access to crucial genetic information that enables precision medicine strategies tailored to the diverse genetic backgrounds of Mexican and broader Hispanic/Latino populations.

Results

Biomedical variants are differentiated across geography and ancestries

First, we defined biomedically relevant variants as those established, or implicated, in disease risk, diagnosis or treatment response. To curate these, we leveraged multiple sources, including Pharmacogenomics Knowledge Base (PharmGKB), Online Mendelian Inheritance in Man (OMIM), ClinVar and the GWAS Catalog^12,13,14,15. These aggregate data from large-scale genomic investigations, clinical case reports and association studies. We required our curated variants to be present in at least one of the databases and genotyped in the MXB (Supplementary Table 1). We then calculated the allele frequency for all curated variants in the databases (Extended Data Fig. 2c). This analysis shows that large differences exist in overall curated variant numbers and frequency distributions between the databases.

Examination of average continental ancestry proportions across Mexican states (Extended Data Fig. 2a) revealed heterogeneity, with southern states, particularly Oaxaca, showing a higher proportion of Indigenous American ancestry, while northern states exhibited a lower proportion. As expected and previously shown, European ancestry displayed an inverse relationship with Indigenous ancestry (Supplementary Fig. 1)^10,11. This affects measures of genetic differentiation as shown next.

We calculated Fst values for biomedically relevant variants in a pairwise comparison between Mexican states, observing the largest genetic differentiation along a north-to-southeast gradient (Extended Data Fig. 2b). This is consistent with previous reports¹¹ using all genetic variants, regardless of functional category, meaning that the frequencies of biomedically relevant variants appear to behave similarly to variants across the entire genome. We found that the highest Fst values were observed between southern states (that is, Oaxaca and Chiapas) and northern ones (that is, Chihuahua and Sinaloa). These also show the highest and lowest average Indigenous American ancestry, respectively, suggesting that part of the observed differentiation is driven by European-Indigenous admixture clines in modern Mexico.
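As a rough illustration of a pairwise Fst computation between two states, the sketch below uses Hudson's estimator, combined across variants as a ratio of sums. The choice of estimator is an assumption made for illustration, since this section does not state which estimator was used. The function name `hudson_fst` and all input values are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst for two populations as a ratio of sums across variants.

    p1, p2: per-variant allele frequencies in each population.
    n1, n2: number of sampled chromosomes (2 x individuals) in each population.
    """
    num_total = 0.0
    den_total = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num = (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den = a * (1 - b) + b * (1 - a)
        num_total += num
        den_total += den
    return num_total / den_total if den_total > 0 else float("nan")


# Hypothetical frequencies for three variants in two states (not MXB data).
state_a = [0.10, 0.45, 0.80]
state_b = [0.30, 0.40, 0.55]
fst = hudson_fst(state_a, state_b, n1=400, n2=400)
print(f"Fst = {fst:.4f}")
```

Because of the sample-size correction, two populations with identical frequencies give a slightly negative estimate, while a fixed difference gives exactly 1.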

To explore these variants further, we defined a focused subset of clinically relevant and actionable variants by integrating PGx alleles from PharmGKB (levels 1A, 1B, 2A, 2B) and pathogenic or likely pathogenic variants from ClinVar occurring in ACMG Secondary Finding List (SF) v3.2-listed genes associated with cardiovascular, metabolic, hereditary cancer and other actionable conditions (Fig. 1).

**Fig. 1: Distribution and allele frequency of clinically relevant and actionable variants across the genome and in the MXB.**

Of the ClinVar-reported pathogenic and likely pathogenic variants in ACMG-listed genes, 1,057 sites were genotyped in the MEGA array, involving 73 of the 81 recommended genes. Among these, 99 variants in 33 genes had a nonzero minor allele frequency (MAF; Fig. 1b). Variants in GAA, MYBPC3, RYR1, KCNQ1 and BTD genes showed particularly high frequencies. The variant rs39751467 in the BTD gene, for instance, displayed a striking geographic pattern, almost exclusively present in the Yucatán Peninsula (Fig. 1c). Pathogenic BTD variants are associated with biotinidase deficiency¹⁶ (MIM, 253260), but oral biotin supplementation is possible and, when patients are identified early enough, can prevent neurological damage.

We then focused on PGx alleles, totaling 58 SNPs in 22 genes (Supplementary Table 2). We obtained allele frequencies for each state in Mexico (Supplementary Fig. 2) and, even within this single nation, observed substantial heterogeneity across states (Fig. 1d). For example, rs2242480, associated with fentanyl metabolism¹⁷, showed a clear north-to-south gradient, while rs1801133, linked to methotrexate response, had higher frequencies in the north. In the following section, we aim to demonstrate that such patterns arise from two distinct sources that together have a key role in shaping clinical variation in Mexico: admixture gradients of continental ancestry (illustrated in Fig. 2) and geographic differentiation in Indigenous American ancestry sources across Mexico (highlighted in Fig. 3).

**Fig. 2: asF of PGx variants in the MXB.**

**Fig. 3: Greater differentiation is observed across Indigenous segments than across global populations.**

To that end, we performed further analyses to dissect the allele frequencies of the clinically relevant and actionable variants into their ancestry-specific frequency components (Fig. 2 and Extended Data Fig. 3). Ancestry-specific frequencies (asF_x) are allele frequencies computed only from alleles observed in chromosomal segments inherited from the ancestry of interest. For example, an admixed individual in Mexico might have some genomic segments of a particular chromosome inherited from European ancestors and others from Indigenous American ancestors. We would include the allele on that individual’s chromosome in the Indigenous American-specific frequency estimate only if it lies within a segment that the individual inherited on that chromosome from Indigenous American ancestors (for details, see ‘Local ancestry analysis’).
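The counting rule behind an ancestry-specific frequency can be sketched in a few lines. The example assumes that phased alleles and local ancestry labels are already available for one variant. The function name, the ancestry labels and the toy data are hypothetical and do not come from the MXB pipeline.

```python
from collections import defaultdict


def ancestry_specific_frequencies(haplotypes):
    """Return {ancestry: (allele_count, segment_count, frequency)} at one variant.

    haplotypes: iterable of (allele, ancestry) pairs, one per chromosome copy,
    where allele is 1 for the allele of interest and 0 otherwise, and ancestry
    is the local ancestry label of the segment covering the variant.
    """
    counts = defaultdict(lambda: [0, 0])  # ancestry -> [allele copies, total copies]
    for allele, ancestry in haplotypes:
        counts[ancestry][0] += allele
        counts[ancestry][1] += 1
    return {anc: (alt, total, alt / total) for anc, (alt, total) in counts.items()}


# Hypothetical example: three individuals, two chromosome copies each.
haplotypes = [
    (1, "IND"), (0, "EUR"),   # individual 1
    (1, "IND"), (1, "EUR"),   # individual 2
    (0, "IND"), (0, "EUR"),   # individual 3
]
asf = ancestry_specific_frequencies(haplotypes)
for ancestry, (alt, total, freq) in sorted(asf.items()):
    print(f"asF_{ancestry}: {alt}/{total} = {freq:.2f}")
```

Each chromosome copy contributes only to the ancestry of the segment it sits in, so one admixed individual can add to two different ancestry-specific estimates at the same site.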

In our asF_x analysis, we compared the distribution of all 58 pharmacogenetic variants across Indigenous, European and African ancestry genomic segments. Of these, only 32 variants were found in all three ancestries (Supplementary Table 3). Among these, most showed marked deviations in pairwise comparisons, indicating clear enrichment toward a specific ancestry. For example, 19 variants were enriched in European ancestry segments when compared to Indigenous segments, and 20 compared to African segments (Supplementary Table 3). Among the 19 variants enriched in European segments versus Indigenous segments, 12 were either minimally represented or entirely absent in Indigenous ancestral segments of the MXB population (Fig. 2a).

We identified nine variants enriched for Indigenous ancestry, of which rs2242480-T (CYP3A4*1G; Fig. 2b) showed the highest Indigenous enrichment. Its allele frequency increases from north to south within Mexico (Fig. 2b), and the higher frequency of the T allele correlates with higher Indigenous genetic ancestry proportions (Fig. 2c and Supplementary Fig. 4). This is consistent with the previously observed European ancestry cline in Mexico and with the ancestry-specific frequency (asF_x) pattern: the effect allele of rs2242480 is less frequent in genomic segments of European ancestry (asF_Eur) than in genomic segments of Indigenous American ancestry (asF_Ind; Fig. 2d,e and Supplementary Fig. 5). Moreover, in the Indigenous segments, this variant has a consistent allele frequency of 60% across states. When looking at the African segments (asF_Afr; Supplementary Figs. 3–5), we found that the frequency of the effect allele was even higher than in Indigenous segments. However, the overall contribution of African segments to the MXB samples is considerably smaller than that of Indigenous segments. As a result, while both ancestries contribute to the overall frequency of the variant in the population of Mexico, Indigenous ancestry has a more prominent role in shaping the observed nationwide distribution, which reflects regional differences in continental ancestry proportions.
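A simple way to see why a small African component contributes little to the overall frequency, even when its ancestry-specific frequency is high, is to treat the overall frequency as an ancestry-weighted average of the asF values. The sketch below makes the simplifying assumption that ancestry proportions at the locus equal the genome-wide proportions. Only asF_Ind = 0.60 is taken from the text; every other number is a hypothetical placeholder.

```python
def expected_overall_frequency(ancestry_proportions, asf):
    """Weight each ancestry-specific frequency by that ancestry's proportion."""
    total = sum(ancestry_proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(ancestry_proportions[k] * asf[k] for k in ancestry_proportions)


# asF_Ind = 0.60 is the value reported in the text; the European and African
# values and all ancestry proportions below are hypothetical placeholders.
asf = {"IND": 0.60, "EUR": 0.25, "AFR": 0.75}
northern_state = {"IND": 0.35, "EUR": 0.61, "AFR": 0.04}
southern_state = {"IND": 0.75, "EUR": 0.22, "AFR": 0.03}

f_north = expected_overall_frequency(northern_state, asf)
f_south = expected_overall_frequency(southern_state, asf)
afr_share_south = southern_state["AFR"] * asf["AFR"] / f_south
print(f"north: {f_north:.3f}, south: {f_south:.3f}")
print(f"share of southern frequency carried on African segments: {afr_share_south:.1%}")
```

Under these placeholder values, the north-to-south increase is driven almost entirely by the change in Indigenous proportion, and the African segments carry only a few percent of the allele copies.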

Another African-enriched variant is rs1801265 (DPYD*9a), where genotypes AA + AG are associated with reduced drug toxicity when treated with capecitabine or fluorouracil in individuals with colorectal neoplasms, compared to genotype GG. This variant shows a frequency of 42% in African segments, compared to 21% and 20% in Indigenous and European segments, respectively (Supplementary Figs. 3–6 and Supplementary Table 3).

Subcontinental variation within the Indigenous American ancestry of Mexicans can be larger than between other ancestries worldwide

We then compared the distribution of ancestry-specific Fst (asFst) values across all biomedically relevant variants between Mexican states for Indigenous American and European genome segments. We identified a significant difference (P = 0.042), where Indigenous American segments exhibit higher mean Fst across states than European segments, reflecting greater geographic differentiation in allele frequencies across Mexico in this ancestry component. European chromosomal segments predominantly show low Fst values, likely due to a common Spanish origin across the nation, albeit with some extreme asFst alleles possibly arising from colonial founder effects (Fig. 3a and Supplementary Fig. 7).

To illustrate this greater frequency differentiation in Indigenous American ancestry segments, and to highlight that regional frequency differences do not stem only from differential admixture but rather from ancestral variation in the Indigenous American substratum of Mexico, we considered one of the best documented pharmacogenetic variants—rs4149056 (c.521T > C found in SLCO1B1*5 and SLCO1B1*15). This SNP has been strongly associated with the response to several statins (Supplementary Fig. 8), such as atorvastatin, pravastatin and simvastatin, in individuals with hyperlipidemia. When controlling for admixture by considering each ancestry separately, we found that although the variant is overall more frequent in European haplotypes, the effect allele frequencies in Indigenous American segments show strong geographic differentiation, being much higher in the southeast than in central-northern Mexico (17% in Yucatan versus 3% in Sonora). This extreme regional frequency variation in the Indigenous American ancestry component of Mexicans is larger than the differences in frequency for this allele between continental populations worldwide (ranging from 5% in Southeast Asians to 16% in Europeans; Fig. 3). Variation within the Indigenous component is what also drove geographic variation in the previously mentioned BTD variant (rs39751467), which is nearly exclusive to Indigenous segments from the ancestrally Mayan Yucatan.

To investigate the driver of this regional frequency difference in the Indigenous ancestry component, in Mexicans, we explored the frequencies of rs4149056 in another cohort, the Native Mexican Diversity Panel (n = 454)¹⁰. Although having a smaller sample size compared to the MXB dataset, this dataset includes a wide range of Indigenous groups with self-identified ethnicity (and >90% Indigenous ancestry). In this independent cohort, we also observed a higher allele frequency in the southeast (with Zapotec having over 22% frequency) versus the west (with Purepecha having 4% and Huichol having 0%), paralleling the regional differences we observe in the MXB (Fig. 3c and Supplementary Figs. 8 and 9). This suggests that pre-Columbian native population structure still shapes the frequencies of pharmacogenetic variants in Mexicans, resulting in substantial regional differences across the MXB.

Subcontinental variation of pharmacoalleles is not restricted to just the Indigenous American component. For SCN1A rs3812718-T, where TT carriers require higher carbamazepine doses than CC carriers, the allele distribution is heterogeneous across both asF_Ind and asF_Eur (Supplementary Fig. 10).

Factors impacting clinical genetic variation in Mexico

To characterize and quantify the underlying factors contributing to the wide frequency variation, we applied a generalized linear model (GLM) to all biomedically relevant variants passing the 5% MAF filter (n = 42,769), incorporating geographic variables (latitude and longitude) and continental genetic ancestry as predictors. The results revealed that 62% of the variants (n = 26,517) show a significant association with the Indigenous American ancestry proportion (P < 0.05; Fig. 4). Geographic variables predominantly showed an effect when combined with ancestry components (Indigenous or African), and for <6% of the variants (n = 2,465) the variation is explained solely by geography (Fig. 4a). We also identified 12% (n = 5,237) of variants where variation cannot be attributed to any of the variables we modeled, indicating a more complex scenario. To illustrate these findings, we selected four representative SNPs, each highlighting a different predictive factor from the GLM analysis (Fig. 4b–e). We then examined the estimated effect of each variable. For genetic ancestry, most variants had a negative effect within the Indigenous American component (Fig. 4f, P ≤ 0.001). This means that European ancestry tended to be associated with higher allele frequencies, suggesting a strong selection bias in the clinical variants ascertained in these databases, with most deriving from studies on European-descent individuals. We observed a similar, significant difference between positive and negative effects for African ancestry. There was no significant difference between longitude and latitude regarding positive and negative effects. To further investigate patterns of genetic differentiation, we calculated the absolute difference in allele frequency between ancestries (Supplementary Fig. 12) and expanded on these findings in Supplementary Fig. 13 to reveal genome-wide patterns of allele frequency differences, with most variants showing enrichment toward European ancestry.
We found that this pattern was consistent across databases, further supporting a discovery bias rooted in European-centric studies.
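To make the modeling idea concrete, the following is a deliberately reduced sketch: a binomial GLM with a logit link and a single predictor (Indigenous ancestry proportion), fitted by Newton's method. The model family, the link and the single-predictor form are assumptions chosen for illustration. The analysis described above also includes latitude and longitude and tests each coefficient for significance, neither of which is shown here. All data values are hypothetical.

```python
import math


def fit_binomial_glm(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton's method; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = ki - ni * p          # observed minus expected allele count
            w = ni * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical per-state data: mean Indigenous ancestry proportion, number of
# sampled chromosomes, and copies of the allele of interest.
ancestry = [0.30, 0.45, 0.60, 0.75]
chromosomes = [400, 400, 400, 400]
allele_copies = [96, 140, 188, 236]

b0, b1 = fit_binomial_glm(ancestry, allele_copies, chromosomes)
print(f"intercept = {b0:.2f}, ancestry coefficient = {b1:.2f}")
```

A positive ancestry coefficient corresponds to an allele whose frequency rises with Indigenous ancestry, and a negative one to an allele enriched in the other ancestry components. In practice a statistics library would be used to add the geographic predictors and obtain P values.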

**Fig. 4: Geographic and ancestry-related contributions to genetic variation in the MXB.**

Other clinically relevant variants did not show geographically structured patterns but segregated exclusively in genomic segments of African ancestry, which also deserves particular attention. This ancestral component is less prevalent than the Indigenous or European ancestries in the MXB, and its distribution is more uniform across Mexico. The vast majority of MXB individuals have 1–5% African ancestry nationwide (Supplementary Fig. 11). High-frequency variants in African source populations may have been inherited via founder effects during the transatlantic slave trade and maintained in the African component of admixed Mexicans. This appears to be the case for a variant in the APOL1 gene (rs73885319-G from ClinVar and GWAS Catalog), where all occurrences of the risk allele (n = 68) were observed in African haplotypes, supporting the hypothesis of an African origin of this MXB allele, which has been strongly associated with end-stage kidney disease, early-onset hypertension and cardiovascular disease¹⁸. Elevated frequencies of the APOL1 risk allele have been found in African-American, sub-Saharan African and Western African populations. Recent studies have found elevated frequencies outside Africa and the United States, including Caribbean and Central American populations with high proportions of African ancestry¹⁹. In our analysis, the allele was not observed at higher frequencies among individuals with greater African ancestry.

MexVar is an interactive database to explore biomedically relevant variants in the context of ancestry and geography

To date, there is no publicly available platform that allows for user-friendly consultation of allele frequencies at a nationwide scale in Mexico. To close this gap, we incorporated the allele frequencies estimated in the MXB (genome wide and ancestry specific) into a graphical user interface platform named MexVar (Fig. 5). This is an R-based interactive platform for end-users, allowing them to explore and visualize allele frequency data in real time, with the option of a local ancestry-informed (ancestry-specific) approach. The output includes a dynamic map of Mexico showing overall and ancestry-specific frequencies, a bar graph visualization and detailed descriptions of variant effects. Users can also download variant frequency data for further analysis. Additionally, users have the flexibility to tailor the plot esthetics, such as color schemes and titles, to suit their preferences. MexVar includes a dedicated ‘variant details’ tab with curated metadata, allowing users to access studies related to each allele. MexVar’s design allows for the integration of additional data as it becomes available, whether through increased sample size or through the inclusion of sequencing data from technologies such as whole-genome sequencing, exomes or long-read sequencing. Additionally, MexVar can incorporate newly reported variants from external databases. This scalability and adaptability make MexVar a long-term tool for genetic research and clinical applications. MexVar is available at https://morenolab.shinyapps.io/mexvar/.

**Fig. 5: MexVar is an interactive platform for exploring ancestry- and geography-informed allele frequencies in the MXB.**

Discussion

Current clinical guidelines often fail to account for the genetic diversity within Hispanic/Latino populations, considering them as a single homogeneous label. This approach overlooks substantial regional genetic differences, particularly in highly diverse countries like Mexico, where Indigenous American ancestry source populations vary widely across the country. Our study addresses this gap by leveraging MXB data to characterize regional differences in allele frequencies of biomedically relevant variants across all 32 states of Mexico. Our results revealed several patterns of differentiation, including a North-South cline that aligns with previous findings on genetic diversity across Mexico using all autosomal variants^10,11.

We found that 60% of biomedically relevant variants are influenced by continental ancestry proportions (admixture). These findings align with previous PGx studies in Latin America, where many variant frequencies correlate with continental genetic ancestry^20,21,22,23. Most of the remaining variation is attributable to geographic genetic variation. We demonstrate that this stems largely from remarkable differences in allele frequencies within the Indigenous American ancestry substratum of Mexicans, driven by pre-Columbian Indigenous population substructure. Indeed, genetic differences among Indigenous source populations from different regions, such as the Nahua, Maya or Mixtec, correlate directly with geographic frequency differences of clinically relevant variants in the MXB cohort. Even if participants may not necessarily self-identify with a particular neighboring Indigenous ancestral group, these groups have contributed to their genetic ancestry and have thus shaped their genetic risk.

These results collectively suggest that using descriptors such as Latino or Hispanic to imply a genetically homogeneous group risks overlooking important PGx differences rooted in ancestral diversity. Finally, approximately 12% of the variants did not correlate significantly with either ancestry, geography or a combination of these factors, unveiling a more complex scenario that should be explored further. Despite the current bias in characterized variants, which have mostly been ascertained in European populations, we were able to discover some variants more common in Indigenous ancestry segments than in European. These are particularly relevant for local epidemiology, as current clinical guidelines dominated by European-derived frequency data may not recommend routine testing for them. rs2242480 (CYP3A4*1G) illustrates a striking case showing a positive correlation between the effect allele (T) and Indigenous genetic ancestry. In the MXB, the T allele is highly prevalent in southern states such as Chiapas (86% carriers, 38% homozygous), compared to northern states like Baja California Sur (53% carriers, 9% homozygous). The T allele is linked to reduced fentanyl metabolism and a lower dose requirement for women in postsurgery pain management, with implications for safer opioid dosing during childbirth^24,25. While the United States faces an opioid crisis, with fentanyl as the leading cause of death among adults aged 18–45 years^26,27, Mexico deals with opioid shortages that limit treatment for patients with severe health conditions²⁸. Therefore, pharmacogenetic data and insights from our study are relevant for public health in Mexico and for Mexican-American individuals in the United States, as current standard opioid dosing guidelines may inadvertently increase the risk of adverse effects in these populations. 
Our findings point to a potential benefit of tailoring opioid prescriptions based on genetic profiles, particularly in individuals with origins from southern states such as Chiapas, Oaxaca and Yucatán. Further clinical studies are needed to evaluate the impact of pharmacogenetic testing on patient outcomes.
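The carrier and homozygote percentages quoted above can be converted into allele frequencies with a short calculation. The sketch assumes that "carriers" includes homozygotes, which the Chiapas figures require (86% plus 38% would otherwise exceed 100%). The function name is hypothetical.

```python
def allele_frequency(carrier_fraction, homozygous_fraction):
    """Allele frequency from the fractions of carriers and of homozygotes."""
    heterozygous_fraction = carrier_fraction - homozygous_fraction
    # Homozygotes carry two copies, heterozygotes one, out of two per person.
    return homozygous_fraction + heterozygous_fraction / 2


chiapas = allele_frequency(0.86, 0.38)
baja_california_sur = allele_frequency(0.53, 0.09)
print(f"Chiapas: {chiapas:.2f}, Baja California Sur: {baja_california_sur:.2f}")
```

This gives a T allele frequency of about 62% in Chiapas and about 31% in Baja California Sur, so the allele is roughly twice as common in the southern state.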

In these types of decision-making scenarios, while continental ancestry proportions can provide some guidance, they cannot alone predict the genetic risk of admixed individuals in populations with complex demographic histories. To capture the finer scales of this complexity, we demonstrate that a local ancestry analysis approach is needed so that the relationship between allele frequencies and the different ancestral components can be disentangled. By applying this approach, we were able to calculate asF nationwide for 42,769 biomedically relevant variants, revealing patterns that variation in genome-wide ancestry proportions cannot explain. This resulted in the detection of substantial local differentiation patterns, highlighting how subcontinental structure, particularly within the Indigenous American ancestry component, has a crucial role in shaping the genetic diversity of clinically relevant variants.

A major case of a differentially distributed clinical variant under an ancestry-specific lens is the SNP rs4149056 in the SLCO1B1 gene, which is strongly linked to statin metabolism in Mexican populations, with C allele carriers being intermediate or slow metabolizers of atorvastatin^29,30. Statins are widely prescribed to reduce LDL cholesterol and prevent cardiovascular disease and are being used by an estimated 7.9 million people in Mexico³¹. While effective and affordable, their metabolism is influenced by genetic variation, reflected in substantial allele frequency differences across continental and subcontinental groups³². By showing the remarkably higher values of Indigenous asF of this allele, particularly in the Yucatan and broader Mayan region, we demonstrate that the source of this geographic difference was only resolvable through local ancestry inference, which can help recover hidden substructure patterns in admixed populations. Indeed, when looking at this allele in source populations from an independent cohort of self-identified Indigenous individuals in Mexico¹⁰, we confirmed a significant differentiation in the southeast (Fig. 3c and Supplementary Fig. 9). Given the high prevalence of the C allele in the Mayan region, and the high burden of coronary heart disease as a leading cause of death in Mexico, national guidelines should consider adjusting statin dosing in these populations based on genetic testing, or prioritizing alternative statins (for example, pravastatin and fluvastatin), to reduce the risk of myopathy and rhabdomyolysis associated with long-term exposure.

On the other hand, our ancestry-specific analyses also revealed that many high-evidence pharmacogenetic variants are rare or absent in Indigenous ancestry segments, highlighting the limits of applying European-based guidelines to diverse populations like Mexico. As reported in other Indigenous American and Latin American cohorts²³, the apparent absence of known variants might reflect the presence of yet-undiscovered rare alleles associated with poor or ultrarapid drug metabolism—variants that remain uncharacterized in current PGx panels. This highlights the need to diversify PGx discovery efforts, as clinically actionable variants may differ markedly—or be entirely absent—in underrepresented populations, particularly those with substantial Indigenous or African genetic contributions.

In addition to PGx variants, we analyzed clinically actionable variants from the ACMG list, which are critical for early diagnosis, prevention and management of genetic diseases. These variants provide important opportunities for precision public health, as their detection can directly inform newborn screening and targeted interventions. For example, the pathogenic allele identified in the BTD gene associated with biotinidase deficiency was found exclusively within Indigenous ancestry segments and showed strong regional variation. Indeed, its occurrence in Indigenous ancestry segments is centered exclusively on the Yucatan, suggesting a source in the Mayans, native to that region. Its frequency in the MXB, stemming largely from the Yucatan, was approximately 0.25%, while in a separate cohort with available genomic data for Mexican individuals, the Mexico City Prospective Study (MCPS)³³—a predominantly urban cohort restricted to Mexico City—it had a sixfold lower frequency (~0.04%), also exclusively within Indigenous segments. Given that biotinidase deficiency is treatable via early biotin supplementation and is included in Mexico’s national newborn screening program³⁴, identifying subpopulations with higher carrier frequencies could support the development of targeted screening strategies to prevent irreversible neurological damage.
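For a sense of scale, the allele frequencies quoted above can be translated into expected carrier fractions. The sketch assumes Hardy-Weinberg proportions, which is a simplification for a structured, admixed population. The rounded input values are the approximate frequencies given in the text, and the function name is hypothetical.

```python
def carrier_frequency(allele_freq):
    """Expected heterozygous carrier fraction under Hardy-Weinberg proportions."""
    return 2 * allele_freq * (1 - allele_freq)


mxb_q = 0.0025    # ~0.25% in the MXB, as reported in the text
mcps_q = 0.0004   # ~0.04% in the MCPS, as reported in the text

for label, q in [("MXB", mxb_q), ("MCPS", mcps_q)]:
    carriers = carrier_frequency(q)
    print(f"{label}: expected carrier fraction {carriers:.3%}")
print(f"frequency ratio MXB/MCPS: {mxb_q / mcps_q:.2f}")
```

Under these assumptions the MXB-wide carrier fraction is close to 0.5%. If the 1:200 screening criterion mentioned in the introduction is read as a carrier frequency, which is itself an assumption, the nationwide figure sits near that mark. Because the allele is concentrated in the Yucatán, regional carrier fractions would be expected to differ from the nationwide value.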

The MXB was instrumental in uncovering these fine-scale patterns, demonstrating the importance of such resources in advancing genetic research and precision health for highly diverse and yet understudied populations. The local ancestry analytical approach allows for the identification of variants with higher frequencies in Indigenous segments and with geographic variation across different Indigenous sources, suggesting particular Indigenous groups and regions with a greater prevalence that might otherwise remain undertested. Such insights can help pinpoint target populations for follow-up studies. By compiling large-scale, diverse genetic datasets, biobanks like this help researchers to explore the genetic basis of complex diseases, uncover new therapeutic targets and enhance the accuracy of personalized medicine³⁵. Our analysis of biomedically relevant variants in the MXB underscores the complexity of genetic variation across Mexico, necessitating a case-by-case approach to fully understand these patterns. To address this diversity, we developed MexVar, a user-friendly platform that enables the real-time exploration of allele frequencies by integrating genome-wide and ancestry-specific data. This tool can help address the challenges of analyzing complex admixed genomes, removing the dependency on extensive bioinformatics infrastructure and making genetics research more accessible to nongenomics researchers and clinicians, especially in resource-limited settings.

While the MXB offers the broadest geographic coverage of any genomic resource in Mexico, it does not fully capture the genetic diversity of the entire population, and some groups—particularly smaller or more geographically restricted Indigenous communities—may be underrepresented. We acknowledge this limitation and have addressed it to the best of our ability with the existing data. Expanding biobanking and whole-genome sequencing efforts will enable the discovery of rare and population-specific variants that are currently not captured by the MEGA array, which is an affordable assay that includes clinically relevant variants.

MexVar aims to make these data accessible locally and globally, empowering precision health research in Latin America, where genetic histories are often distinct from the global average. Enhancing our understanding of the frequencies of PGx alleles and clinically actionable variants across a broader spectrum of biogeographic origins will strengthen testing panels and improve treatment outcomes in diverse populations^36,37, including, but not limited to, underrepresented ancestries in the Global South. Clinical testing recommendations based solely on the Hispanic/Latino label are insufficient for personalized medicine, as they neglect to consider this substantial regional, national and local genetic variation across Latin America. As we continue to push the boundaries of precision medicine, initiatives like the MXB and implementations such as MexVar will have a crucial role in delivering personalized and equitable healthcare solutions, ensuring that the benefits of genomics research extend to all populations.

Methods

Human participants and ethics

This work did not involve new recruitment of human research participants. Demographics of previously recruited human participants included in the MXB project are described in ref. ¹¹. Ethical approval was obtained from the Institutional Review Board of the National Institute of Public Health (INSP; approvals CI1479 and CB1470) for the genetic characterization of samples from the 2000 National Health Survey (ENSA 2000). Participants were enrolled through informed consent and extensive community engagement nationwide. National Health Surveys in Mexico have been conducted periodically since 1988, so the population is highly participative and receptive to household visits by INSP staff and fieldwork teams. Sampling and biobank maintenance were carried out by INSP, while genomic data were generated at the Cinvestav Research Center in Mexico. The data have been analyzed jointly, fostering interinstitutional collaboration and local leadership among Mexican researchers and trainees.

MXB dataset

To explore and catalog biomedically relevant variants within Mexican populations, we used the genetic information generated by the MXB Project¹¹. It currently includes data for 6,011 individuals from all 32 states across Mexico, recruited as part of the 2000 National Health Survey (ENSA 2000) and genotyped at 1.8 million variants on the MEGA array³⁹. The MEGA array provides extensive coverage of clinically relevant variants associated with disease and PGx, ensuring the representation of diverse populations. Its design incorporates over 500,000 variants linked to clinical research, drawing from major variant annotation databases, including ClinVar, OMIM, the GWAS Catalog, PharmGKB, ACMG, the Clinical Pharmacogenetics Implementation Consortium and gnomAD. This comprehensive integration offers a robust framework for identifying variants with established or potential clinical significance.

Participants in the ENSA 2000 were selected using a probabilistic, multistage, stratified and clustered sampling design to ensure national representativity across 32 states of Mexico. The survey targeted civilian, noninstitutionalized individuals and collected household, health and sociodemographic data through structured interviews conducted by trained personnel. Biological samples, including serum and buffy coats, were obtained from 43,085 individuals aged 20 years or older. This cohort has been used in various epidemiological and genetic studies, offering a comprehensive resource for assessing health determinants at both the individual and population levels. The inclusion of individuals from rural and remote areas further strengthens the dataset’s utility in investigating genetic diversity and its implications for health disparities in Mexico. For more details, see ref. ⁴⁰.

Samples for the MXB project were selected to maximize both geographic coverage and representation of Indigenous ancestries. The 6,011 genotyped samples are distributed across 898 recruitment sites throughout Mexico, ensuring an average sample size of five to ten individuals per locality, regardless of population density. Each state has, on average, 188 individuals, ranging from 86 to 309. The number of individuals per state is shown in Extended Data Fig. 1. The selection process prioritized individuals who reported speaking an Indigenous language (1,055), followed by random selection until budget limitations were reached. For further details, see ref. ¹¹.

Curation of biomedically relevant variants

We focused on known variants within our dataset that are relevant to human health. To identify these variants, we used data from the following four main databases: ClinVar, GWAS Catalog, PharmGKB and OMIM^12,13,14,15. The complete datasets were downloaded directly from their respective websites in March 2023. We parsed these databases to extract biomedically relevant variants, including their variant identifiers, chromosomal locations, genetic positions, effect directions, levels of evidence, clinical significance, drug associations, associated phenotypes and related genes. We then intersected these variants with the 1.8 million variants directly genotyped for the MXB cohort. For a subset of the analysis, and for the MexVar app, variants with a MAF of less than 5% were filtered out using PLINK⁴¹ with the MAF option set to 0.05, ensuring that our analysis focused on variants exhibiting meaningful frequency patterns within the Mexican population. After merging and filtering, we retained 42,769 variants. A detailed summary of these selected variants is provided in Supplementary Table 1.
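The intersection and frequency filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on PLINK); the variant identifiers, allele counts and function names are hypothetical and chosen only to show the logic of keeping genotyped variants that appear in a curated database and have a MAF of at least 5%.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Return the MAF given the alternate-allele count and the total observed alleles."""
    if total_alleles == 0:
        return 0.0
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def curate(database_ids, genotyped, maf_threshold=0.05):
    """Keep genotyped variants found in the curated databases that pass the MAF filter.

    database_ids: set of variant IDs extracted from the annotation databases.
    genotyped: dict mapping variant ID -> (alt_count, total_alleles).
    """
    kept = []
    for variant_id, (alt_count, total_alleles) in genotyped.items():
        if variant_id not in database_ids:
            continue  # not annotated as biomedically relevant
        if minor_allele_frequency(alt_count, total_alleles) >= maf_threshold:
            kept.append(variant_id)
    return sorted(kept)


# Toy input: hypothetical IDs and counts, not MXB values.
database_ids = {"rsA", "rsB", "rsC"}
genotyped = {
    "rsA": (1200, 12000),   # MAF 0.10, annotated -> kept
    "rsB": (120, 12000),    # MAF 0.01, annotated -> filtered out
    "rsC": (11000, 12000),  # alternate allele is the major allele, MAF ~0.083 -> kept
    "rsD": (3000, 12000),   # common but not annotated -> dropped
}
retained = curate(database_ids, genotyped)
print(retained)
```

Note that the filter uses the minor allele frequency, so a variant whose alternate allele is the major allele (as for rsC in the toy data) is still retained.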

Ancestry and population descriptors

‘Genetic ancestry’, as used in this study, is a statistical construct based on the genetic similarity that an individual shares with a given reference panel of source populations, reflecting their potential ancestors. In contrast, ‘race’ and ‘ethnicity’ are social constructs used to group people based on perceived physical, geographical, cultural or other social characteristics. In the analyses presented here, we exclusively refer to genetic ancestry as described above, except when mentioned otherwise. Notably, an individual’s assigned genetic ancestry is not equivalent to, and does not invalidate, how that individual self-identifies.

In this study, we distinguish between two levels of genetic ancestry: continental and subcontinental.

Continental ancestry refers to broad ancestral groupings based on large-scale worldwide population structure; this study includes Indigenous American, African and European components. These proportions were inferred using global reference panels and represent the primary axes of genetic variation relevant to populations across Mexico.

Subcontinental ancestry, on the other hand, captures the finer-scale structure within the continental component. While continental ancestry reflects large-scale population groupings, subcontinental ancestry refers to the regional genetic differentiation that arises from long-term demographic, cultural and geographic isolation within continental landmasses. For example, within a continental ancestry, such as Indigenous American, there exists substantial genetic heterogeneity between regional populations due to historical separation, founder effects and limited gene flow. Accounting for this substructure is essential for accurately characterizing patterns of genetic variation, since different subcontinental contributions can lead to significant differences in risk allele frequencies with medical implications, even among individuals with similar continental ancestry proportions.

GLMs

We used linear models to investigate the influence of genetic background (Indigenous American and African ancestry) and geographic factors on the variation observed in biomedically relevant variants. Before modeling, these variables were standardized in R to reduce scale-related biases. Subsequently, GLMs were used in R⁴² to analyze the associations between the standardized predictor variables and genetic variations. Coefficients and P values were calculated to determine the statistical significance of each predictor variable.
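As an illustration of the standardize-then-fit step, the following Python sketch z-scores one predictor and fits an ordinary least-squares line. It is a simplified stand-in for the R analysis: it uses a single predictor, omits P values, and all input numbers and function names are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score a list of values using the sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant predictor")
    return [(v - mu) / sd for v in values]


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical per-state values: mean Indigenous ancestry and a variant's allele frequency.
indigenous_ancestry = [0.2, 0.4, 0.6, 0.8]
allele_frequency = [0.10, 0.15, 0.20, 0.25]

z_ancestry = standardize(indigenous_ancestry)
intercept, slope = fit_simple_linear(z_ancestry, allele_frequency)
print(f"intercept={intercept:.4f}, slope per s.d. of ancestry={slope:.4f}")
```

Because the predictor is standardized, the slope is read as the change in allele frequency per standard deviation of ancestry, which makes coefficients comparable across predictors measured on different scales.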

Local ancestry analysis

Local ancestry inference

To further investigate the influence of ancestry on the incidence of biomedically relevant variants in individuals within the MXB cohort, we used the Gnomix software from ref. ⁴³ to infer local ancestry tracts. We used a k = 4 model, which assumes the presence of four distinct genetic groups. Reference populations were selected to represent the major continental genetic ancestries in Mexico¹¹—African (Afr), European (Eur), East Asian (Eas) and Indigenous American (Ind; Supplementary Fig. 14). This approach allowed us to accurately characterize the genetic contributions from these ancestries within the cohort.

Reference populations were taken from the Human Genome Diversity Project⁴⁴, the Population Architecture using Genomics and Epidemiology study⁴⁵, and individuals from the MXB. We used the same number of individuals in each reference population to mitigate potential bias towards a particular ancestry. Sixty individuals were randomly chosen for each of the four populations. For the African component, we selected individuals self-identifying as Bantu, Mandenka and Yoruba; for the European, we selected populations from Western Europe, individuals self-identifying as French, Italian and Orcadians; and for East Asian, we used a combination of individuals identifying as Han and Japanese. For the Indigenous people from the Americas, we integrated genetic information from various groups, including individuals self-identifying as Mixe, Surui, Puno, Zenu and Indigenous populations in Honduras. We also included individuals from the MXB who exhibited more than 98% Indigenous ancestry based on unsupervised ADMIXTURE analysis⁴⁶.
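The balanced sampling of reference individuals can be sketched as follows. This is a minimal Python illustration, assuming a pool of candidate sample IDs per continental ancestry; the sample IDs, pool sizes, seed and function name are hypothetical.

```python
import random


def balanced_reference_panel(candidates, per_population=60, seed=0):
    """Randomly draw the same number of reference individuals for every ancestry.

    candidates: dict mapping ancestry label -> list of candidate sample IDs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    panel = {}
    for population, sample_ids in candidates.items():
        if len(sample_ids) < per_population:
            raise ValueError(f"not enough candidates for {population}")
        panel[population] = sorted(rng.sample(sample_ids, per_population))
    return panel


# Hypothetical candidate pools of unequal size.
candidates = {
    "Afr": [f"afr_{i}" for i in range(100)],
    "Eur": [f"eur_{i}" for i in range(150)],
    "Eas": [f"eas_{i}" for i in range(80)],
    "Ind": [f"ind_{i}" for i in range(300)],
}
panel = balanced_reference_panel(candidates)
print({population: len(ids) for population, ids in panel.items()})
```

Drawing without replacement and capping every group at the same size keeps any one ancestry from dominating the training data for the classifier.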

We used PLINK⁴¹ to merge the datasets and retained only the intersecting biallelic variants, excluding triallelic variants and those with genotype missingness >5%. We ran an ADMIXTURE analysis to corroborate the homogeneity of our reference panel (Supplementary Fig. 14). We ran Gnomix with the default parameters, except that ‘inference’ was set to ‘best’ and ‘phase’ to FALSE.

Local ancestry accuracy

We used Gnomix to estimate local ancestry tracts due to its higher accuracy compared to other programs like RFMix, as stated in the main Gnomix paper⁴³. This paper also evaluates accuracy over time and reports that for ancestries traced back up to 20 generations, Gnomix maintains an accuracy of over 93% on array data. Given that our dataset primarily captures ancestry within the past 16 generations (assuming an average of 30 years per generation and admixture starting approximately 500 years ago), Gnomix is well-suited for our analysis.

We also conducted a simulation to evaluate the impact of local ancestry inference errors on allele frequency estimation. We simulated chromosomes with ancestry proportions reflecting those observed in the Mexican population (Afr = 4%, Ind = 65%, Eur = 30%, Eas = 1%). In this simulation, alleles were assigned frequencies specific to each ancestral background (for example, Ind = 0.1). To model errors in local ancestry prediction, we incorporated the confusion matrix derived from the Gnomix results. Using the predicted ancestry, we then estimated asF across a range of MAFs, using a sample size comparable to that of the MXB. Overall, in our simulations, we found that local ancestry inference errors had minimal impact on the estimation of asF (Supplementary Fig. 15). When training the model with our own data, we achieved a mean estimated accuracy of 94.97% across all chromosomes.
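The logic of this simulation can be sketched in Python at a single site. The sketch is only illustrative: the confusion matrix below is a hypothetical one with uniform off-diagonal errors rather than the matrix derived from Gnomix, the non-Indigenous allele frequencies are invented, and the function names are assumptions.

```python
import random

ANCESTRIES = ["Afr", "Ind", "Eur", "Eas"]
PROPORTIONS = {"Afr": 0.04, "Ind": 0.65, "Eur": 0.30, "Eas": 0.01}


def make_confusion(accuracy):
    """Hypothetical confusion matrix: errors spread evenly over the other ancestries."""
    off_diagonal = (1.0 - accuracy) / (len(ANCESTRIES) - 1)
    return {
        true: {pred: accuracy if pred == true else off_diagonal for pred in ANCESTRIES}
        for true in ANCESTRIES
    }


def simulate_asf(true_freqs, confusion, n_haplotypes, seed=0):
    """Estimate ancestry-specific frequencies from predicted (error-prone) ancestry calls."""
    rng = random.Random(seed)
    carriers = dict.fromkeys(ANCESTRIES, 0)
    totals = dict.fromkeys(ANCESTRIES, 0)
    weights = [PROPORTIONS[a] for a in ANCESTRIES]
    for _ in range(n_haplotypes):
        true_anc = rng.choices(ANCESTRIES, weights=weights)[0]
        allele = 1 if rng.random() < true_freqs[true_anc] else 0
        row = confusion[true_anc]
        pred_anc = rng.choices(ANCESTRIES, weights=[row[a] for a in ANCESTRIES])[0]
        totals[pred_anc] += 1      # allele is attributed to the predicted ancestry
        carriers[pred_anc] += allele
    return {a: carriers[a] / totals[a] if totals[a] else float("nan") for a in ANCESTRIES}


# Only Ind = 0.1 comes from the text; the other frequencies are invented for the example.
true_freqs = {"Afr": 0.2, "Ind": 0.1, "Eur": 0.3, "Eas": 0.2}
n_haplotypes = 2 * 6011  # two haplotypes per individual, comparable to the MXB

perfect = simulate_asf(true_freqs, make_confusion(1.0), n_haplotypes)
noisy = simulate_asf(true_freqs, make_confusion(0.95), n_haplotypes)
print(f"Ind asF with perfect calls: {perfect['Ind']:.3f}; with 95% accuracy: {noisy['Ind']:.3f}")
```

In this toy setting the bias is smallest for the most common ancestry, because misassigned haplotypes from the rarer ancestries make up only a small share of the segments called as Indigenous.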

Estimation of asF

To estimate asF, we used a customized pipeline approach (Fig. 5) that involved creating a masked Variant Call Format (VCF) file. This approach uses predicted local ancestry inferences, in which variants that do not match the specified ancestry are designated missing. Subsequently, allele frequencies are computed from these ancestry-masked VCF files using VCFtools with the --freq option⁴⁷. asF are then calculated as the ratio of alleles corresponding to a given ancestry over the total number of alleles for that ancestry, as detailed in the following formula:

$${\mathrm{asF}}_{x}=\frac{p\,\mathrm{alleles}\,\mathrm{in}\,x\,\mathrm{ancestry}}{\mathrm{total}\,\left(p+q\right)\,\mathrm{alleles}\,\mathrm{in}\,x\,\mathrm{ancestry}}$$

where x is the given ancestry.
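The masking step and the formula above can be illustrated with a minimal Python sketch for a single variant. It assumes phased 0/1 alleles with one local ancestry label per haplotype; the study itself used masked VCF files and VCFtools, and the data and function name here are hypothetical.

```python
def ancestry_specific_frequency(alleles, ancestries, target):
    """Compute asF for one variant.

    alleles: list of 0/1 values, one per haplotype (1 = allele p).
    ancestries: local ancestry label of each haplotype at this site.
    target: the ancestry x for which the frequency is computed.
    """
    # Masking: haplotypes whose local ancestry differs from the target are treated as missing.
    kept = [allele for allele, ancestry in zip(alleles, ancestries) if ancestry == target]
    if not kept:
        return None  # no haplotypes of this ancestry at the site
    return sum(kept) / len(kept)


# Toy data for eight haplotypes (four individuals).
alleles = [1, 0, 1, 1, 0, 0, 1, 0]
ancestries = ["Ind", "Ind", "Eur", "Ind", "Eur", "Ind", "Afr", "Ind"]

for x in ["Ind", "Eur", "Afr", "Eas"]:
    print(x, ancestry_specific_frequency(alleles, ancestries, x))
```

Here five haplotypes are Indigenous and two of them carry allele p, so the Indigenous asF is 2/5 = 0.4, whereas the genome-wide frequency across all eight haplotypes is 0.5.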

asFst

To quantify genetic differentiation between the states in Mexico, we calculated asFst using a customized ancestry-specific mask to filter variants for each specific ancestry. We provided VCFtools⁴⁷ with a VCF file in which alleles from other ancestries were set to missing, using the --weir-fst-pop argument, and performed this analysis separately for the European and Indigenous local ancestry masks. asFst values were calculated for all biomedically relevant variants with MAF > 0.05.
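To give a sense of how differentiation is computed from ancestry-masked frequencies, the Python sketch below uses a Hudson-style single-variant Fst estimator. This is a deliberately simpler, swapped-in estimator and not the Weir and Cockerham estimator that VCFtools computes with --weir-fst-pop; the frequencies, allele counts and function name are hypothetical.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson-style Fst for one variant between two groups.

    p1, p2: ancestry-specific allele frequencies in the two groups (for example, two states).
    n1, n2: number of non-missing alleles left in each group after ancestry masking.
    """
    if n1 < 2 or n2 < 2:
        return float("nan")  # not enough unmasked alleles to estimate
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return float("nan")  # variant is monomorphic for the same allele in both groups
    return numerator / denominator


# Hypothetical Indigenous-segment frequencies of one variant in two states.
fst = hudson_fst(p1=0.35, n1=240, p2=0.10, n2=180)
print(f"{fst:.4f}")
```

The allele counts matter because masking removes a different number of alleles in each state, so states with little of the target ancestry contribute noisier estimates.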

Clinically relevant and actionable variants

To identify variants with established clinical relevance, we defined a subset referred to as the clinically relevant and actionable variants. This subset was curated to include only variants with strong evidence for medical actionability based on current clinical guidelines and expert consensus. The following criteria were applied:

(1) PGx variants: we included variants annotated in the PharmGKB database at levels 1A, 1B, 2A and 2B, corresponding to gene–drug associations supported by clinical practice guidelines.
(2) Pathogenic variants in medically actionable genes: we identified all pathogenic and likely pathogenic variants without conflicting interpretations reported in the ClinVar database (accessed 30 April 2023) occurring in genes listed in the ACMG SF (v3.2; ref. ⁴⁸). The guideline list includes 81 genes deemed to be medically actionable, of which 28 are associated with hereditary cancer, and the remaining 53 genes are associated with cardiovascular, metabolic and other genetic conditions for which there are available medical interventions that can prevent or reduce morbidity and mortality due to these conditions.

Notably, no allele frequency threshold was applied to this subset, as many clinically relevant variants are rare but nonetheless important for diagnosis or therapeutic decisions.
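The two inclusion criteria can be expressed as a simple filter, sketched here in Python. The record fields, the example records and the two-gene stand-in for the 81-gene ACMG SF v3.2 list are hypothetical; real ClinVar and PharmGKB exports use their own column names.

```python
ACTIONABLE_PGX_LEVELS = {"1A", "1B", "2A", "2B"}
ACCEPTED_CLINVAR = {"Pathogenic", "Likely pathogenic", "Pathogenic/Likely pathogenic"}


def is_clinically_actionable(variant, acmg_genes):
    """Apply criterion (1) or (2); no allele frequency threshold is used."""
    if variant.get("pharmgkb_level") in ACTIONABLE_PGX_LEVELS:
        return True  # criterion (1): high-evidence PGx annotation
    return (
        variant.get("clinvar_significance") in ACCEPTED_CLINVAR
        and variant.get("gene") in acmg_genes
    )  # criterion (2): non-conflicting P/LP variant in an actionable gene


# Toy stand-in for the ACMG SF gene list and a few hypothetical variant records.
acmg_genes = {"BTD", "BRCA1"}
variants = [
    {"id": "var1", "pharmgkb_level": "1A"},
    {"id": "var2", "pharmgkb_level": "3"},
    {"id": "var3", "gene": "BTD", "clinvar_significance": "Pathogenic"},
    {"id": "var4", "gene": "BTD",
     "clinvar_significance": "Conflicting interpretations of pathogenicity"},
    {"id": "var5", "gene": "GENE_X", "clinvar_significance": "Likely pathogenic"},
]
actionable = [v["id"] for v in variants if is_clinically_actionable(v, acmg_genes)]
print(actionable)
```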

This curated subset was used to illustrate the ancestry-specific distribution of clinically actionable variants within the MXB cohort. Summary statistics—including the number of variants, genes and distribution across ancestries—are provided in Fig. 1 and Extended Data Fig. 3.

PharmGKB annotations

We obtained variant annotations from the PharmGKB, where each variant can be associated with multiple annotations. These annotations are categorized into six levels of evidence (4, 3, 2B, 2A, 1B and 1A; Supplementary Fig. 16), reflecting the strength of the evidence supporting the association with a particular drug response, with level 1A representing the highest evidence and level 4 the lowest. When a variant was associated with multiple drugs at different levels of evidence, we considered the drug with the highest level of evidence. Specifically, we focused on annotations with a level of evidence starting from 2B, which denotes variant-drug combinations supported by a moderate level of evidence and requires support from at least two independent publications.
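Selecting the strongest annotation per variant and applying the level 2B cutoff can be sketched as follows. This is a minimal Python illustration; the drug names and annotation records are placeholders rather than PharmGKB entries, and the function names are assumptions.

```python
# Lower rank means stronger evidence.
LEVEL_RANK = {"1A": 0, "1B": 1, "2A": 2, "2B": 3, "3": 4, "4": 5}


def best_annotation(annotations):
    """Return the annotation with the highest level of evidence for one variant."""
    return min(annotations, key=lambda annotation: LEVEL_RANK[annotation["level"]])


def passes_threshold(annotation, weakest_allowed="2B"):
    """True when the evidence level is at least as strong as the cutoff."""
    return LEVEL_RANK[annotation["level"]] <= LEVEL_RANK[weakest_allowed]


# Placeholder annotations for one variant associated with several drugs.
annotations = [
    {"drug": "drug_a", "level": "3"},
    {"drug": "drug_b", "level": "1A"},
    {"drug": "drug_c", "level": "2B"},
]
top = best_annotation(annotations)
print(top["drug"], top["level"], passes_threshold(top))
```

An explicit rank table is used because the level labels do not sort correctly as strings or numbers (for example, "2A" is stronger than "2B" but both are stronger than "3").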

For SNP rs4149056, we analyzed the top five drugs associated with it. The data for this analysis were retrieved from the PharmGKB database on 11 September 2023. For each drug, we also determined the number of studies supporting the association, providing insights into the robustness of the identified relationships (more details are available at https://www.pharmgkb.org/variant/PA166154579/variantAnnotation).

Integration of allele frequencies of the MXB on the Shiny app

In this paper, we introduce MexVar, a robust, user-friendly web application leveraging the Shiny framework in R⁴⁹. This graphical user interface serves as a dynamic platform for querying allele frequencies nationwide in Mexico. Our analysis relied on a comprehensive dataset from the MXB, which encompasses allele frequency distributions of biomedically relevant variants across all 32 states of Mexico and comprises 42,769 variants. Detailed information regarding this dataset is provided in Supplementary Table 1. Maps displayed here and in the MexVar app are generated using mxmaps³⁸.

The application allows users to view both genome-wide allele frequencies and asF, showing how each variant’s frequency differs by ancestry. In the ancestry selection module, users can choose from five options: ‘all’ (for nonancestry-specific data) and four ancestry-specific categories (European, Indigenous, African and East Asian). The methodology for the ancestry-specific analysis is described in ‘Local ancestry analysis’.

Two mandatory inputs for the MexVar application are the rsID of a given SNP and the desired genetic ancestry used to calculate allele frequencies. Additionally, users can customize the application’s esthetic elements, such as color schemes and titles, to suit their preferences. The application processes these inputs in real time to display an output that includes a dynamic map of Mexico, where each state is color-coded according to the selected SNP allele frequency. Moreover, users can browse an additional tab within the application to explore detailed information about the SNP, including its presence in the app, the originating database, associated phenotype, gene, risk allele (if mentioned in the database) and relevant publications. Users can also download the frequency table for offline analysis, facilitating deeper investigation and data exploration.

The design is adaptable and scalable, allowing for the integration of additional data as it becomes available, such as whole-genome sequencing. This inherent flexibility facilitates ongoing improvements and the expansion of analytical scope in line with evolving datasets.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Individual-level genotype data were previously generated as part of the MXB project¹¹ and are available at the European Genome-phenome Archive (EGA) through a Data Access Agreement with the Data Access Committee (EGA accession number for study: EGAS00001005797; dataset: EGAD00010002361 (Mexican_Biobank_Genotypes)). Additionally, MXB frequency data have been uploaded as a new track to the UCSC Genome Browser, and ancestry-specific frequencies are available through MexVar at https://morenolab.shinyapps.io/mexvar/.

Code availability

The code used in the analyses can be found in https://github.com/morenolab/MexVar_Paper_Code.

References

1. Khoury, M. J. et al. A collaborative translational research framework for evaluating and implementing the appropriate use of human genome sequencing to improve health. PLoS Med. 15, e1002631 (2018).
2. Xi, Q., Jin, S. & Morris, S. Economic evaluations of predictive genetic testing: a scoping review. PLoS ONE 18, e0276572 (2023).
3. Hartge, P., Struewing, J. P., Wacholder, S., Brody, L. C. & Tucker, M. A. The prevalence of common BRCA1 and BRCA2 mutations among Ashkenazi Jews. Am. J. Hum. Genet. 64, 963–970 (1999).
4. Amar, S., Lieberman, S., Bideman, A., Lahad, A. & Feinsilver, T. A new population screening program for BRCA mutations in Israel—attitudes and barriers among Ashkenazi Jewish women. J. Breast Cancer Res. 2, 4–13 (2023).
5. Manchanda, R. et al. Cost-effectiveness of population screening for BRCA mutations in Ashkenazi Jewish women compared with family history-based testing. J. Natl Cancer Inst. 107, 380 (2015).
6. Claudio-Campos, K., Duconge, J., Cadilla, C. L. & Ruaño, G. Pharmacogenetics of drug-metabolizing enzymes in US Hispanics. Drug Metab. Pers. Ther. 30, 87–105 (2015).
7. Gregg, A. R. et al. Screening for autosomal recessive and X-linked conditions during pregnancy and preconception: a practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1793–1806 (2021).
8. Medina-Muñoz, S. G. et al. Demographic modeling of admixed Latin American populations from whole genomes. Am. J. Hum. Genet. 110, 1804–1816 (2023).
9. Moreno-Estrada, A. et al. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 9, e1003925 (2013).
10. Moreno-Estrada, A. et al. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344, 1280–1285 (2014).
11. Sohail, M. et al. Mexican Biobank advances population and medical genomics of diverse ancestries. Nature 622, 775–783 (2023).
12. Hamosh, A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2004).
13. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
14. Barbarino, J. M., Whirl-Carrillo, M., Altman, R. B. & Klein, T. E. PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip. Rev. Syst. Biol. Med. 10, e1417 (2018).
15. Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
16. Hymes, J., Stanley, C. M. & Wolf, B. Mutations in BTD causing biotinidase deficiency. Hum. Mutat. 18, 375–381 (2001).
17. Yuan, J.-J. et al. CYP3A4*1G genetic polymorphism influences metabolism of fentanyl in human liver microsomes in Chinese patients. Pharmacology 96, 55–60 (2015).
18. Tzur, S. et al. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Hum. Genet. 128, 345–350 (2010).
19. Nadkarni, G. N. et al. Worldwide frequencies of APOL1 renal risk variants. N. Engl. J. Med. 379, 2571–2572 (2018).
20. Alvim, I. et al. The need to diversify genomic studies: insights from Andean highlanders and Amazonians. Cell 187, 4819–4823 (2024).
21. Suarez-Kurtz, G. DPYD genotyping panels: impact of population diversity. Clin. Transl. Sci. 17, e13805 (2024).
22. Guevara, M. et al. Afro-Latin American pharmacogenetics of CYP2D6, CYP2C9, and CYP2C19 in Dominicans: a study from the RIBEF-CEIBA consortium. Pharmaceutics 16, 1399 (2024).
23. Rodrigues-Soares, F. et al. Genomic ancestry, CYP2D6, CYP2C9, and CYP2C19 among Latin Americans. Clin. Pharmacol. Ther. 107, 257–268 (2020).
24. Chi, L. et al. Detection of cytochrome P450 3A4 gene polymorphism guides for labor analgesia with sufentanil medication. [Article in Chinese]. Beijing Da Xue Xue Bao Yi Xue Ban 47, 653–656 (2015).
25. Dong, Z.-L. et al. Effect of CYP3A4*1G on the fentanyl consumption for intravenous patient-controlled analgesia after total abdominal hysterectomy in Chinese Han population. J. Clin. Pharm. Ther. 37, 153–156 (2012).
26. Han, Y. The rising crisis of illicit fentanyl use, overdose, and potential therapeutic strategies. Transl. Psychiatry 9, 282 (2019).
27. Palamar, J. J. et al. Trends in characteristics of fentanyl-related poisonings in the United States, 2015–2021. Am. J. Drug Alcohol Abuse 48, 471–480 (2022).
28. Covarrubias-Gómez, A., Esquer-Guzmán, H. M., Carrillo-Torres, O., Carmona-Rodríguez, J. L. & Ramos-Guerrero, J. A. La crisis de opioides en México. Rev. Mex. Anestesiol. 46, 161–165 (2023).
29. Oshiro, C., Mangravite, L., Klein, T. & Altman, R. PharmGKB very important pharmacogene: SLCO1B1. Pharmacogenet. Genomics 20, 211–216 (2010).
30. León-Cachón, R. B. R. et al. The atorvastatin metabolic phenotype shift is influenced by interaction of drug-transporter polymorphisms in Mexican population: results of a randomized trial. Sci. Rep. 10, 8900 (2020).
31. Gómez-Pérez, F. J. et al. Prevention of cardiovascular disease based on lipid lowering treatment: a challenge for the Mexican health system. Salud Pública Méx. 52, S54–S62 (2010).
32. Ramsey, L. B. et al. PharmVar GeneFocus: SLCO1B1. Clin. Pharmacol. Ther. 113, 782–793 (2023).
33. Ziyatdinov, A. et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 622, 784–793 (2023).
34. Gonzaga-Jauregui, C., Moreno-Salgado, R., Tovar-Casas, J. & Navarrete-Martínez, J. I. Newborn screening in Mexico and Latin America: present and future. Rare Dis. Orphan Drug J. 3, 16 (2024).
35. Gudmundsson, S. et al. Variant interpretation using population databases: lessons from gnomAD. Hum. Mutat. 43, 1012–1030 (2022).
36. Pirmohamed, M. Pharmacogenomics: current status and future perspectives. Nat. Rev. Genet. 24, 350–362 (2023).
37. Li, B. et al. Frequencies of pharmacogenomic alleles across biogeographic groups in a large-scale biobank. Am. J. Hum. Genet. 110, 1628–1647 (2023).
38. Valle-Jones, D. Mxmaps: create maps of Mexico. GitHub https://github.com/diegovalle/mxmaps (2025).
39. Illumina and the Multi-Ethnic Genotyping Array Consortium. Infinium Multi-Ethnic Global BeadChip (Illumina, 2015); https://www.illumina.com/science/consortia/human-consortia/multi-ethnic-genotyping-consortium.html
40. Sepúlveda, J. et al. Diseño y metodología de la Encuesta Nacional de Salud 2000. Salud Pública Méx. 49, s427–s432 (2007).
41. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
42. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2023).
43. Hilmarsson, H. et al. High resolution ancestry deconvolution for next generation genomic data. Preprint at bioRxiv https://doi.org/10.1101/2021.09.19.460980 (2021).
44. Bergström, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).
45. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
46. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
47. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
48. Miller, D. T. et al. ACMG SF v3.2 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 25, 100866 (2023).
49. Chang, W. et al. shiny: web application framework for R. CRAN https://cran.r-project.org/web/packages/shiny/ (2023).


Acknowledgements

We recognize the fundamental efforts of J. Sepúlveda-Amor, former deputy secretary (vice-minister) of health, director of the National Institutes of Health of Mexico, director-general of Mexico’s National Institute of Public Health (INSP) and dean of the National School of Public Health, in the design and implementation of the 2000 National Health Survey (ENSA 2000), which laid the groundwork for the MX Biobank project and enabled the generation of the genetic data analyzed in this study. We thank the ENSA 2000 participants and the INSP staff who conducted the 2000 National Health Survey across Mexico, with support from the Secretaría de Salud (Ministry of Health). We also extend our gratitude to former and current INSP directors—J. Sepúlveda-Amor, M. Hernández Ávila, M. H. Rodríguez López, J. A. Rivera Dommarco and E. C. Lazcano Ponce—and to the former director of the Center for Research on Infectious Diseases (CISEI, INSP), C. Alpuche Aranda, and the current interim CISEI director, J. Martínez Barnetche, for their continued institutional support in this collaborative effort. We acknowledge the valuable support of J. Corona Uribe in financial administration and management, which enabled the operation of the MX Biobank project at Cinvestav. The MXB project was funded by CONACYT (grant FONCICYT/50/2016) and the Newton Fund through the UK Medical Research Council (grant MR/N028937/1) awarded to A.M.-E., as well as Mexico’s Center for Research and Advanced Studies of the National Polytechnic Institute (Cinvestav; grant UGA/DG34/24). We thank the Mexican Council for Science, Humanities and Technology (CONAHCyT) for fellowship support granted to C.B.-J. and S.G.M.-M. We also thank the Chan Zuckerberg Initiative (CZI) for a grant awarded to A.M.-E. and A.G.I. that partially supported the development of the MexVar platform (grant CZI-2024-354605). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. 
We thank C. Aguilar, T. Tusié and their teams at the National Institute of Medical Sciences and Nutrition Salvador Zubirán (INCMNSZ) for DNA extraction and biochemical profiling, and, along with A. Ochoa, for providing feedback on MexVar. We also thank M. Haeussler at UC Santa Cruz for his valuable help in uploading MXB data to the UCSC Genome Browser, M. Rentería at the Queensland Institute of Medical Research for fruitful discussions on genetic variants and C. Tamburrini at Cinvestav for her insightful comments on earlier versions of the paper.

Author information

Luis Pablo Cruz-Hervert, Carlos Magis-Rodríguez, María José Palma-Martínez & Mashaal Sohail
Present address: Universidad Nacional Autónoma de México, Mexico City, Mexico
Alicia Huerta-Chagoya
Present address: Broad Institute of MIT and Harvard, Cambridge, MA, USA
Pablo Kuri-Morales
Present address: Tecnológico de Monterrey, Monterrey, Mexico
Hortensia Moreno-Macías
Present address: Universidad Autónoma Metropolitana, Mexico City, Mexico
Jaime Sepúlveda-Amor
Present address: University of California, San Francisco, San Francisco, CA, USA
Roberto Tapia-Conyer
Present address: Fundación Carlos Slim, Mexico City, Mexico
Deceased: Luis Juárez-Figueroa, José Luis Valdespino-Gómez, Oscar Velázquez-Monroy.

Authors and Affiliations

Aging Research Center, Cinvestav Sede Sur, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, Mexico
Carmina Barberena-Jonas, Santiago G. Medina-Muñoz, Viankail Cedillo-Castelán, Consuelo Dayzú Quinto-Cortés & Andrés Moreno-Estrada
Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México, Mexico City, Mexico
Viankail Cedillo-Castelán
Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Mexico
Tania Sepúlveda-Morales & Claudia Gonzaga-Jáuregui
Instituto Nacional de Salud Pública (INSP), Cuernavaca, Mexico
Sergio Canizales-Quintero, Luis Pablo Cruz-Hervert, Guadalupe Delgado-Sánchez, Elizabeth Ferreira-Guerrero, Leticia Ferreyra-Reyes, Juan Eugenio Hernández-Avila, Luis Juárez-Figueroa, Eduardo Lazcano-Ponce, Norma Mongua-Rodríguez, Elsa Sarti-Gutiérrez, Jaime Sepúlveda-Amor, José Luis Valdespino-Gómez, Norma Téllez-Vázquez, Manuel Velázquez-Meza & Lourdes García-García
Department of Genetics, Stanford University, Stanford, CA, USA
Alexander G. Ioannidis
Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Alexander G. Ioannidis
Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
Carlos Aguilar-Salinas, Alicia Huerta-Chagoya, Hortensia Moreno-Macías, Rosario Rodríguez-Guillén, María Luisa Ordóñez-Sánchez & María Teresa Tusié-Luna
Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato, Mexico
Cecilia Gutiérrez-López, María de Jesús Ortega-Estrada, María José Palma-Martínez, Karla Sandoval & Mashaal Sohail
Secretaría de Salud, Mexico City, Mexico
Pablo Kuri-Morales, Carlos Magis-Rodríguez, Roberto Tapia-Conyer & Oscar Velázquez-Monroy
Universidad Nacional Autónoma de México, Mexico City, Mexico
María Teresa Tusié-Luna

Authors

Carmina Barberena-Jonas
Santiago G. Medina-Muñoz
Viankail Cedillo-Castelán
Tania Sepúlveda-Morales
Claudia Gonzaga-Jáuregui
Lourdes García-García
Alexander G. Ioannidis
Andrés Moreno-Estrada

Consortia

ENSA Genomics Consortium

Carlos Aguilar-Salinas, Carmina Barberena-Jonas, Sergio Canizales-Quintero, Viankail Cedillo-Castelán, Luis Pablo Cruz-Hervert, Guadalupe Delgado-Sánchez, Elizabeth Ferreira-Guerrero, Leticia Ferreyra-Reyes, Cecilia Gutiérrez-López, Juan Eugenio Hernández-Avila, Alicia Huerta-Chagoya, Luis Juárez-Figueroa, Pablo Kuri-Morales, Eduardo Lazcano-Ponce, Carlos Magis-Rodríguez, Norma Mongua-Rodríguez, Alexander G. Ioannidis, Hortensia Moreno-Macías, María de Jesús Ortega-Estrada, María José Palma-Martínez, Consuelo Dayzú Quinto-Cortés, Rosario Rodríguez-Guillén, María Luisa Ordóñez-Sánchez, Elsa Sarti-Gutiérrez, Karla Sandoval, Jaime Sepúlveda-Amor, Mashaal Sohail, Roberto Tapia-Conyer, María Teresa Tusié-Luna, José Luis Valdespino-Gómez, Norma Téllez-Vázquez, Oscar Velázquez-Monroy & Manuel Velázquez-Meza

Contributions

C.B.-J., A.G.I. and A.M.-E. conceptualized the study. A.M.-E., L.G.-G. and the ENSA Genomics Consortium acquired the data. Members of the ENSA Genomics Consortium designed and implemented the 2000 National Health Survey (ENSA 2000), maintained the biobank, and selected and processed samples. C.B.-J. and T.S.-M. curated the data. C.B.-J. and S.G.M.-M. performed the formal analyses. C.B.-J. and V.C.-C. developed the software. C.B.-J. wrote the paper with input and edits from A.M.-E., C.G.-J. and A.G.I.

Corresponding authors

Correspondence to Lourdes García-García, Alexander G. Ioannidis or Andrés Moreno-Estrada.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Mary Regina Boland, Hakon Hakonarson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Anna Maria Ranzoni, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Geographic and demographic summary of the Mexican Biobank.

(a) Number of individuals per state in the Mexican Biobank cohort. Inset maps show detailed views of central and southeastern states. (b) Age distribution. (c) Sex distribution. (d) Distribution by language spoken and (e) urbanicity.

Extended Data Fig. 2 Allele frequency landscape of clinically relevant variants in the Mexican Biobank.

(a) Geographic map of Mexico showing the sampling locations of the 6,011 individuals in the MXB. Each point represents a sample, with shape indicating urban (triangle) or rural (circle) origin. States are color-coded by average proportion of Indigenous genetic ancestry. Image in panel a is reproduced with permission from Valle-Jones D. (ref. ³⁸), used under a free use license. (b) Pairwise Fst heatmap showing genetic differentiation between Mexican states based only on clinically relevant genetic variants. (c) Distribution of minor allele frequencies (MAFs) for clinically relevant SNPs in the MXB dataset, grouped by source database (ClinVar, GWAS Catalog, PharmGKB, OMIM).

Extended Data Fig. 3 Allele frequencies of ACMG-listed pathogenic variants stratified by ancestry.

Nationwide allele frequencies of pathogenic or likely pathogenic variants reported in ACMG SF v3.2 genes, identified in the Mexican Biobank. Only variants with nonzero frequency are shown. Bars are grouped by gene, and colors indicate the proportion of each allele frequency attributed to continental ancestries, as estimated through local ancestry inference.

Supplementary information

Supplementary Information

Supplementary Figs. 1–16.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Barberena-Jonas, C., Medina-Muñoz, S.G., Cedillo-Castelán, V. et al. Clinical genetic variation across Hispanic populations in the Mexican Biobank. Nat Med 32, 725–735 (2026). https://doi.org/10.1038/s41591-025-04100-z

Received: 30 October 2024
Accepted: 31 October 2025
Published: 21 January 2026
Version of record: 21 January 2026
Issue date: February 2026
DOI: https://doi.org/10.1038/s41591-025-04100-z