Mitochondrial ancestry from complete mitogenomes highlights a lack of characterization of indigenous haplogroups in Brazilian Amazon population

de Souza, Felipe Gouvea; Matos, Gustavo Barra; Sena Santos, Camille; Souza, Tatiane Piedade; Gobbo, Angélica Rita; da Costa, Patrícia Fagundes; Salgado, Claudio Guedes; Lopes Eufraseo, Gracivane; Vitor de Souza Fernandes, André; Santos-Lobato, Bruno Lopes; Santos, Sidney; C. Cavalcante, Giovanna; M. Ribeiro dos Santos, André; de Araújo, Gilderlanio S.; Ribeiro-dos-Santos, Ândrea

doi:10.1038/s42003-025-08126-4

Article
Open access
Published: 30 May 2025

Felipe Gouvea de Souza¹,
Gustavo Barra Matos¹,
Camille Sena Santos¹,
Tatiane Piedade Souza¹,
Angélica Rita Gobbo²,
Patrícia Fagundes da Costa²,
Claudio Guedes Salgado²,
Gracivane Lopes Eufraseo³,
André Vitor de Souza Fernandes³,
Bruno Lopes Santos-Lobato³,
Sidney Santos^1,4,
Giovanna C. Cavalcante¹,
André M. Ribeiro dos Santos¹,
Gilderlanio S. de Araújo ORCID: orcid.org/0000-0001-9199-9419¹ &
…
Ândrea Ribeiro-dos-Santos^1,4

Communications Biology volume 8, Article number: 835 (2025)

Abstract

The mitochondrial genome is an essential resource for investigating the dispersion and diversification of human populations. However, the lack of genomic data from populations of non-European ancestry hinders demographic, anthropological and genetic studies. Indigenous people from the Brazilian Amazon are especially underrepresented in large genetic datasets, despite their significant contribution to the Brazilian genetic pool. We investigated 157 mitochondrial genomes from the Brazilian Amazon using bioinformatics tools for ancestry inference and phylogenetics. Results were compared with 2578 global mitogenomes to assess haplogroup frequencies and genetic distances. Thirteen macrohaplogroups were identified in the investigated samples, including Indigenous, European and African ancestries. Although Indigenous haplogroups were prominent in our sample, prediction tools assigned them with lower confidence and many samples lacked key defining variants, suggesting misassignment due to low representation in reference datasets. We also identified potential new groups sharing specific variants. Thus, underrepresentation of Indigenous haplogroups contributes to phylogenetic inconsistencies and warrants further investigation.

Introduction

Data produced by whole mitochondrial genome (mitogenome) sequencing can substantially increase the discriminatory power of mitochondrial DNA (mtDNA), revealing ancestry-specific variants¹. Due to its maternal inheritance, mtDNA can be used to obtain information about genetic ancestry through the identification of mitochondrial haplogroups, defined by the sequential accumulation of alterations inherited from a common ancestor^1,2. This information can be used to trace the phylogeographic distribution of mtDNA in different populations and to identify lineages of specific geographic regions. Further, mtDNA variation may contribute to differences in the prevalence of diseases observed among different ethnic groups^1,2.

However, current mtDNA haplogroup identification methods face certain difficulties, such as classification bias arising from reliance on consensus haplotypes of the revised Cambridge Reference Sequence (rCRS)^3,4. Additionally, admixed populations, such as those in the Americas, represent a peculiar case for genetic ancestry inference because they originated through a complex process of admixture among populations from different continents: Indigenous peoples, Europeans, Africans, and more recently East Asians^5,6.

In this context, the Brazilian population is one of the most ethnically diverse in the world, and the country is currently divided into five main macro-regions (North, Northeast, Central-West, Southeast, and South), each with a distinct history of colonization and settlement. This diversity is reflected in the genetic structure of the country^7,8. Thus, characterization of mitochondrial ancestry markers can elucidate past and current patterns of admixture, complementing prior knowledge about the biological composition of the Brazilian population^7,8,9,10.

Among the Brazilian macro-regions, the Amazonian (or North) region stands out for having a larger contribution of Indigenous mitochondrial haplogroups (A–D and X)^11,12,13 in its population compared to other regions of the country^7,14,15,16. However, there is still a lack of complete mitogenome data for this territory, which hinders demographic genetic studies and limits our understanding of individuals' health-disease status, especially for those of Indigenous ancestry.

Here, we investigated the mitochondrial haplogroups of the Amazonian region of Brazil using mitogenomes previously analyzed and published by our research group. We assessed the contribution of Indigenous lineages to the formation of this highly admixed population and evaluated the reliability of maternal ancestry predictors in defining Indigenous haplogroups.

Results and discussion

Characterization and genetic composition of the cohort

We investigated the mitogenomes of 157 admixed individuals from the North region. Although there were more males than females, both sexes presented a similar average age (Table 1). Thirteen macrohaplogroups were identified using HaploGrep3, which can be divided among those of Indigenous (macrohaplogroups A, B, C, and D), European (macrohaplogroups H, J, N, T, and U), and African (macrohaplogroups L0, L1, L2, and L3) ancestry (Fig. 1A). As expected, Indigenous haplogroups represented the majority (64.97%), followed by African (28.03%) and European haplogroups (7%). This haplotypic distribution is consistent with the literature and reflects the processes of occupation and colonization of the Brazilian territory, especially of the Amazon region^7,17,18,19,20. When sub-structuring the Indigenous macrohaplogroups, haplogroups C, A2, and B2 were the most frequent in the cohort (Fig. 1B), consistent with previous descriptions of Indigenous populations from the region^7,19,20,21. Although haplogroups A–D are also present in East Asia, a fact that supports the East Asian origin of Indigenous lineages²¹, we find it unlikely that all of the individuals carrying these haplogroups here are of direct East Asian genetic origin.
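
The ancestry proportions above can be reproduced with a short calculation. The sketch below assumes per-sample macrohaplogroup labels are available as a list; the count of 102 Indigenous samples appears in the text, whereas the counts of 44 African and 11 European samples are back-calculated from the reported percentages and are therefore assumptions, as is the arbitrary choice of one label per ancestry.

```python
from collections import Counter

# Macrohaplogroup-to-ancestry grouping as stated in the text.
ANCESTRY_BY_MACROHAPLOGROUP = {
    **dict.fromkeys(["A", "B", "C", "D"], "Indigenous"),
    **dict.fromkeys(["H", "J", "N", "T", "U"], "European"),
    **dict.fromkeys(["L0", "L1", "L2", "L3"], "African"),
}


def ancestry_percentages(macrohaplogroups):
    """Return the percentage of samples per ancestry, rounded to 2 decimals."""
    counts = Counter(ANCESTRY_BY_MACROHAPLOGROUP[m] for m in macrohaplogroups)
    total = sum(counts.values())
    return {ancestry: round(100 * n / total, 2) for ancestry, n in counts.items()}


# Hypothetical labels: 102 Indigenous is from the text; 44 and 11 are
# back-calculated assumptions, and "A", "L2", "H" are arbitrary stand-ins.
labels = ["A"] * 102 + ["L2"] * 44 + ["H"] * 11
percentages = ancestry_percentages(labels)
print(percentages)
```

With these assumed counts the Indigenous and African shares match the reported 64.97% and 28.03%, and the European share comes out close to the reported 7%.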

**Fig. 1: Frequency of mitochondrial macrohaplogroups and haplogroups in the Amazonian population.**

Table 1 Demographics of the investigated cohort

Evaluation of indigenous haplogroup identification

While evaluating the mitochondrial haplogroup assignments made by the HaploGrep3 tool, we observed that haplogroups of European and African ancestry had an average prediction confidence of around 0.95, while that of Indigenous haplogroups was around 0.89 (Fig. 2A). The lower assignment confidence of Indigenous haplogroups reflects the limited data available for this ethnic background in global public databases^22,23. Admixed Americans comprise only 3.72% of gnomAD²⁴, currently one of the largest genomic databases. The mtDNA-specialized database EMPOP²⁵ has 17% of its samples attributed to Indigenous origin and 6% to admixed Americans, among which only eight Indigenous samples are from Brazil and none are from the Amazon region. Such restrictions interfere with the correct identification of mitochondrial lineages that are still represented today in the Amazon region.

**Fig. 2: Haplogroup identification confidence using HaploGrep3.**

Further, European, African and East Asian populations from the 1000 Genomes Project (1000G) have an average prediction confidence greater than 0.95 when the same analysis is carried out, while our sample presented an average confidence of 0.9 (Fig. 2B). This result is not surprising given that the populations included in 1000G have been investigated more frequently and have more information available in public databases, unlike the Amazonian populations from Brazil²¹.

Underrepresentation of ethnically diverse populations, such as those from the Amazon, in global genetic datasets can impact ancestry inference, impair discovery of population-specific variants, hinder identification of disease causal variants, reduce accuracy of genetic risk predictions, and limit our understanding of shared genetic risk factors affecting health outcomes^22,26,27.

Distribution of indigenous mitochondrial variants

We investigated all the variants highlighted by HaploGrep3, as well as all the expected variants for the assigned haplogroup that were absent in the sequence of 102 mitogenomes of Indigenous ancestry. Most ancestry-defining haplotypes were identified. However, the classification of some individuals presented a significant margin of error, attributed to the absence of certain ancestry-defining variants (Table 2).

Table 2 Distribution of haplotype-defining variants in the Amazon population among indigenous haplogroups

Most A2, B2, and C individuals presented all their ancestry-defining variants, but some lacked one or more of the required mutations, which suggests that the missing variants may have undergone reverse selection. Eight individuals from haplogroup A2 lacked the 16111T and 1736G variants and 14 individuals did not present 8027A, all of which are haplogroup-defining variants, so it is uncertain whether they were correctly classified. In contrast, one A8 individual did not present the variant 16242T, which is essential for the characterization of this haplogroup; given the presence of 16223T, 16290T and 16319A and the absence of 16242T, this individual should have been assigned to haplogroup A^28,29.
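
The check described above amounts to a set difference between the variants expected for an assigned haplogroup and those observed in a sample. The following is a minimal sketch; the marker sets contain only the variants named in the text and are an illustrative subset, not the full PhyloTree definitions, and the sample is hypothetical.

```python
def missing_defining_variants(expected, observed):
    """Return expected haplogroup-defining variants absent from a sample, sorted."""
    return sorted(set(expected) - set(observed))


# Illustrative subsets only (assumption): just the variants named in the text.
A_MARKERS = {"16223T", "16290T", "16319A"}
A8_MARKERS = A_MARKERS | {"16242T"}

# Hypothetical sample resembling the A8 individual discussed in the text.
sample = {"16223T", "16290T", "16319A"}

print(missing_defining_variants(A8_MARKERS, sample))  # ['16242T']
print(missing_defining_variants(A_MARKERS, sample))   # []
```

A sample with no missing markers for the parent haplogroup but a missing marker for the assigned subclade is a candidate for reassignment to the parent, which is the reasoning applied to the A8 individual.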

Regarding haplogroups B2 and B4, all B2 individuals lacked the 15535T mutation, and seven lacked the 3547G mutation. Among the C individuals, five lacked the 10873C mutation, and one individual did not present the 16327T mutation, which is crucial for the characterization of this haplogroup. This suggests that this individual might be an intersection between haplogroups C and CZ^28,29. In our whole cohort, only 23 individuals had the 7028T mutation (A2 = 4, B2 = 6, C = 4, C1 = 1, D1 = 1, L1 = 1, L2 = 3, L3 = 2, and N = 1). This is curious since this variant is highly frequent in most populations, especially in Asian, African, and Latin American/admixed populations, except for the European haplogroup H^24,30.

Phylogenetic analyses

To explore the relationship between our samples, we reconstructed a phylogenetic tree using the maximum parsimony method and rooted the resulting tree on two L0 individuals (Methods; Figs. 3, S1). The resulting tree reflects the major human mitochondrial divisions presented in PhyloTree^28,31, where all L samples were outgroups and meta-haplogroups M (composed of haplogroups C and D), N (A and N), and R (H, J, T, U, and B) grouped as expected. Overall, samples clustered according to their assigned macrohaplogroups, reinforcing the robustness of this analysis.

**Fig. 3: Maximum parsimony phylogenetic tree.**

However, we observed unexpected patterns within Indigenous haplogroups. Individuals classified as C and C1 formed a single clade in which the two groups were indistinguishable from each other. Haplogroup C1 is mainly distinguished from C by the presence of 16325C, but almost all samples classified as C presented this variant. Individuals classified as D1 and D4 presented a similar pattern, all grouping under a single clade. Haplogroup D1 is distinguished from D4 by the presence of 16325C and 2092T; here, however, we found that most D1 individuals lacked 16325C while all D4 individuals presented the variant. These haplogroups are distinguished only by the presence of 2092T. Underrepresentation of samples of Indigenous ancestry in the PhyloTree construction, especially those from the Amazon region, may be impairing the HaploGrep3 classification.

Representativeness of cohort haplogroups in other populations

We constructed a PCA based on mitochondrial macrohaplogroup frequencies to explore the genetic relationship between the Amazon population (BRA) and the 1000G populations (Fig. 4A), which include African, European, East Asian, South Asian and Admixed American populations³². PC1 explained 29.3% of the dataset variation and PC2 21.3%. The PCA indicated the existence of five distinct clusters, corresponding to the global African, American, European, South Asian and East Asian populations. Notably, our Amazonian sample was in an intermediate position between all clusters, which is in agreement with the literature and reflects the historical processes of migration, admixture and colonization of the Brazilian territory^6,33,34.

**Fig. 4: Worldwide analysis of haplogroups representation.**

Next, we compared the frequency of haplogroups from our cohort to selected populations from 1000G that presented haplogroups found among our samples (Fig. 4B). The African haplogroups from our cohort (L0, L1, L2 and L3) had a relevant presence only in populations from Nigeria, Sierra Leone, Kenya, Gambia and Barbados and among Afro-Americans from the US.

Indigenous haplogroups were only prevalent among American populations that underwent admixture processes similar to those of Brazil (Fig. 4B). Haplogroup A2 was found mostly in populations from Puerto Rico, Peru and Colombia and among Mexicans from California. The Peruvian population presented a significant presence of B2, C and D1, with a distribution similar to that of our sample. Although Indigenous haplogroups are found among Admixed American populations, they remain under-represented in general, and specific haplogroups such as A8 and D4 are missing. Thus, this evaluation demonstrates the lack of Indigenous ancestry data in public datasets.

We further explored the genetic similarities and divergences between our sample and the 1000G populations by calculating their pairwise Fst genetic distance. For this analysis, we divided our cohort according to haplogroup ancestry (European: BRA-EUR, African: BRA-AFR, and Indigenous: BRA-IND). These groups clustered together and were distinct from all other populations (Figure S2A). BRA-EUR presented the largest distance to all other populations, with the highest value against IBS (Fst = 0.13). Our cohort presented the smallest distance to Afro-American populations, such as ASW and BRA-IND (Fst = 0.08). These results were reinforced when analyzing the pairwise Fst genetic distance between this study's groups and the 1000G populations grouped by geographic region into American, European, South Asian, East Asian, African and African-American (Figure S2B). Our cohort presented the largest distance to European populations (Fst = 0.10), and greater proximity to African-American (Fst = 0.08) and African populations (Fst = 0.09).

In one of the few studies about the complete mitogenome in Brazil, Avila et al.³⁵ produced similar findings when analyzing 96 individuals from the South, Southeast and Central-West regions of Brazil. They indicated that the Brazilian population had a high proximity to the African-American population, a low proximity to European populations, and an intermediate proximity to South Asian populations. These findings are supported by genetic, historical and ethnogeographic information previously reported for the Brazilian territory^5,7,33,35.

Haplogroups representation among indigenous populations

Next, we compared the frequency of Indigenous haplogroups (Fig. 5) in our admixed Amazonian cohort to 74 Indigenous mitogenomes included in the construction of PhyloTree^21,31,36 (Supplementary Table 2). Our cohort presented greater similarity in haplogroup frequencies to Amazonian and South American Indigenous populations. Amazonian Indigenous populations and our cohort presented a greater predominance of haplogroups A2, B2, and C1 than other groups, as previously described^1,11,13,18,37,38, while other Brazilian Indigenous populations lack A2 and C. Further, the specificity of haplogroup X2 to North American Indigenous populations is in agreement with the literature^21,39 and reflects the limited expansion of this group to the rest of the continent.

**Fig. 5: Relative frequency of indigenous haplogroups among our cohort and indigenous populations.**

We further explored the relationship between these samples by constructing a mutation-presence heatmap for haplogroups A to D (Supplementary Figs. 3–6). All samples presented a low haplogroup assignment confidence by HaploGrep3 (0.89 ± 0.06). In fact, samples assigned to different haplogroups presented a high genetic similarity. Two A2 samples clustered with A2+(64), despite lacking 64T, due to their shared 310C (Supplementary Fig. 3). Haplogroups B2, B2b, and B2h did not cluster together (Supplementary Fig. 4). One B2e and two B2 samples clustered due to their exclusive 8281–8289 deletion. B2 and B2*2 clustered due to three variants (103A, 14470C, and 16241G).

Within macrohaplogroup C, most C1b and C1c samples grouped according to their haplogroup assignments (Supplementary Fig. 5). Most samples assigned as C presented 16325C, which defines the C1 clade, and at least half of these also presented 493G, which defines C1b. The C1d1 cluster included two samples assigned as C, which presented the C1d-defining variants (16325C and 16051G). Another cluster of three C1c and six C samples presented all C1c-defining variants (1888A and 15930A). These examples strongly indicate that those C samples were misassigned.

Macrohaplogroup D presented the highest average assignment confidence (0.95 ± 0.02) compared to the others (Supplementary Fig. 6). All D4 samples lacked 2092T but presented 16325C, which defines haplogroup D1. They all clustered with other D1 samples and differed from them by a single mutation.

Among macrohaplogroups A, B, and C, potential new clusters can be identified (Supplementary Figs. 3–5). Three A2 samples clustered due to five exclusive variants (9096C, 10463G, 15951G, 16126C and 16259T). Three B2 samples shared four variants (2392C, 5150T, 15936G, 16178C) and another three B samples shared 16 variants (top of Supplementary Fig. 6). One C sample clustered with C1b*3 due to three exclusive variants (8383C, 10586A, and 16093C), a cluster of six C samples presented C1d-defining variants and shared 7697A and 15004T, and another three C samples shared six variants that were almost exclusive (top of Supplementary Fig. 5).
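
The notion of variants that are exclusive to a candidate cluster can be made concrete with set operations on per-sample variant profiles. The sketch below is an illustration only: the sample identifiers and profiles are hypothetical, although the five variants placed in the cluster are those reported above for the three A2 samples, and this is not the clustering procedure used to draw the supplementary heatmaps.

```python
def exclusive_shared_variants(profiles, group):
    """Variants carried by every sample in `group` and by no sample outside it."""
    if not group:
        raise ValueError("group must contain at least one sample")
    inside = [set(profiles[s]) for s in group]
    outside = [set(v) for s, v in profiles.items() if s not in group]
    shared = set.intersection(*inside)
    seen_elsewhere = set().union(*outside)
    return sorted(shared - seen_elsewhere)


# Hypothetical profiles: S1-S3 mimic the three A2 samples from the text.
cluster_variants = {"9096C", "10463G", "15951G", "16126C", "16259T"}
profiles = {
    "S1": cluster_variants | {"16223T"},
    "S2": cluster_variants | {"16223T", "310C"},
    "S3": cluster_variants | {"16223T"},
    "S4": {"16223T", "64T"},
}

print(exclusive_shared_variants(profiles, ["S1", "S2", "S3"]))
```

Here 16223T is shared by the cluster but also present in S4, so it is not reported as exclusive, and 310C is dropped because only one cluster member carries it.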

Final remarks

Representation of ancestries and haplogroups from different continents or geographic regions varies widely between genetic panels. Panels that are more representative of ethnically diverse populations lead to better genomic imputation than others⁴⁰. Previous studies have reported that under-representation of non-European populations influences the accuracy of genotype imputation^27,41. Worldwide, GWAS are mainly concentrated in the USA and European countries, which contributes to underrepresentation of worldwide genomic diversity and leads to low prediction quality among underrepresented populations^22,26,27,42. As of November 2024, the majority of genomic studies had been conducted on individuals of European ancestry, who represent 90% of all investigated individuals, while admixed individuals from Latin America accounted for only 0.34%⁴². Previous studies have also shown that Hispanic/Latino and African American genomic studies contribute a disproportionately larger number of associations than similarly sized studies of European ancestry^42,43. Therefore, increasing the representation of samples of non-European ancestry in genomic datasets is critical to discovering new variants associated with diseases and explaining complex phenotypes.

Various Indigenous haplogroups are present among the Brazilian Amazon populations, and their haplotypes are underrepresented in large genomic datasets such as 1000G and PhyloTree. This lack of data leads tools such as HaploGrep3 to assign Indigenous haplogroups with low confidence. Although there are efforts to increase the diversity of genomic information from Latin American countries, such as PoblAr⁴⁴, Urugenomes⁴⁵ and LatinGen⁴⁶, there is still a scarcity of data from Brazil and the Brazilian Amazon, especially regarding the mitochondrial genome. In this sense, this work complements efforts to improve mitochondrial reference data and increases our understanding of the phylogenetic distribution of underrepresented populations worldwide. However, due to limitations of the current study, further studies with larger cohorts should be carried out to characterize the genetic diversity of Amazonian populations.

Methodology

Sample characterization

A total of 157 mitogenomes from the Brazilian Amazon were analyzed in this study. Data were gathered from two previous studies from our research group:

De Souza et al.⁴⁷ analyzed the mitogenome of 33 individuals diagnosed with leprosy and 37 control individuals.
Matos et al.⁴⁸ analyzed the mitogenome of 45 individuals diagnosed with Parkinson disease and 42 control individuals.

DNA extraction, quantification, amplification, and sequencing are described in their respective studies. This work adhered to the Declaration of Helsinki.

To expand our analysis, we also integrated into our dataset the mitogenomes of 2504 individuals from 1000G phase 3³² and 74 Indigenous mitogenomes from various regions of the Americas^21,36. Detailed population descriptions can be found in Table S1 and Table S2, respectively.

Sequencing pre-processing and bioinformatics analysis

To assess sequencing quality, we used FastQC v0.12.1⁴⁹ and MultiQC v1.19⁵⁰ before and after data pre-processing. We used FastP (v0.23.4)⁵¹ to remove low-quality bases, sequencing adapters, and reads with less than 36 bp. After treatment and quality assessment, reads were aligned to the mtDNA reference sequence (rCRS) using the Burrows-Wheeler alignment tool (BWA) v0.7.17⁵². The resulting alignments were sorted using SAMTools v1.15.1⁵³, and duplicate reads were removed using Picard v2.27.5⁵⁴. To identify and remove samples with contamination issues, we employed Haplocheck v1.3.3⁵⁵ tolerating up to 10% of contamination and excluding samples exceeding this threshold.

Detection and analysis of variants

After pre-processing and aligning the sequences, BAM files were submitted to mutserve v2.0.1 for single nucleotide polymorphism (SNP) calling, annotation and heteroplasmy detection (https://mitoverse.readthedocs.io/mutserve/mutserve/)⁵⁶. Functional annotation of each variant, locus and mtDNA complex was then summarized through custom scripts developed with the R programming language⁵⁷ and the RStudio environment⁵⁸. To improve the reliability of the variant analyses, we required an overall average depth > 550x and heteroplasmy levels < 0.10.
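
One possible reading of the thresholds above can be sketched as a simple filter. The field names (`mean_depth`, `heteroplasmy`) and the example records are hypothetical, and the text does not state whether the cut-offs were applied per sample or per variant, so this is an illustration under those assumptions and not the study's own script.

```python
MIN_MEAN_DEPTH = 550.0    # from the text: overall average depth > 550x
MAX_HETEROPLASMY = 0.10   # from the text: heteroplasmy level < 0.10


def passes_quality_filters(record):
    """Return True if a record satisfies both thresholds (strict inequalities)."""
    return (record["mean_depth"] > MIN_MEAN_DEPTH
            and record["heteroplasmy"] < MAX_HETEROPLASMY)


# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": "v1", "mean_depth": 1200.0, "heteroplasmy": 0.02},
    {"id": "v2", "mean_depth": 480.0, "heteroplasmy": 0.01},
    {"id": "v3", "mean_depth": 900.0, "heteroplasmy": 0.35},
]

kept = [r["id"] for r in records if passes_quality_filters(r)]
print(kept)  # ['v1']
```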

Inference of mitochondrial haplogroups

Inference of mitochondrial haplogroups was conducted using HaploGrep3 v3.2.1⁵⁹ (available at https://haplogrep.i-med.ac.at/), which performs automatic assignment of mtDNA haplogroups based on PhyloTree 17—Forensic Update 1.2³¹. The inference is performed using the Kulczynski calculation between the sample and presumed haplogroup based on sets of expected and observed polymorphisms, weighted by the relative recurrence of each polymorphism in the phylogenetic tree.
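
To make the scoring idea concrete, the sketch below implements a generic weighted Kulczynski measure between the variants expected for a candidate haplogroup and those observed in a sample: the average of the matched weight divided by the expected weight and the matched weight divided by the observed weight. It is an assumption-laden illustration, not HaploGrep3's code; the function name, the default unit weights and the example variant sets are hypothetical, and the actual per-variant weights derived from PhyloTree are not reproduced.

```python
def weighted_kulczynski(expected, observed, weights=None):
    """Average of (matched / expected weight) and (matched / observed weight)."""
    weights = weights or {}
    expected, observed = set(expected), set(observed)

    def total(variants):
        # Variants without an explicit weight default to 1.0 (assumption).
        return sum(weights.get(v, 1.0) for v in variants)

    expected_w, observed_w = total(expected), total(observed)
    if expected_w == 0 or observed_w == 0:
        return 0.0
    matched_w = total(expected & observed)
    return 0.5 * (matched_w / expected_w + matched_w / observed_w)


# Hypothetical example using variants named in the text (unit weights).
a8_expected = {"16223T", "16290T", "16319A", "16242T"}
sample = {"16223T", "16290T", "16319A"}

print(weighted_kulczynski(a8_expected, sample))  # 0.875
print(weighted_kulczynski(sample, sample))       # 1.0
```

A sample missing an expected variant scores below 1 even when everything it carries is explained by the candidate haplogroup, which is one way lower confidence values can arise for incompletely characterized lineages.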

Phylogenetic analysis

Mitogenome sequences were generated using HaploGrep3 with the -write-fasta-msa option. The resulting .fasta files were aligned using MAFFT v7.526 software^60,61 and later loaded into MEGA v11.0.10 software^62,63. We opted to build a Maximum Parsimony tree with 1500 bootstrap replicates⁶⁴ and exported it in Newick format (.nwk). The resulting tree was rooted on two L0 individuals, to reflect our current understanding of human origin^13,31. This strategy was developed following the recommendations from Behar et al.⁶⁵. The resulting tree was visualized in the R programming language using the treeio⁶⁶ and ggtree⁶⁷ packages.

Worldwide mitogenome diversity

We performed a Principal Component Analysis (PCA) based on the frequency of all macrohaplogroups in the 1000G populations and this study's cohort. To calculate the pairwise Fst distance between populations, we combined our genotype data with mitochondrial data from 1000G to create a genotype matrix in the R environment using the vcfR package⁶⁸. The resulting matrix was converted to a hierfstat data frame, which was used to calculate Fst according to the Weir and Cockerham⁶⁹ equation with the hierfstat package⁷⁰. The results were visualized using the pheatmap⁷¹, factoextra⁷², FactoMineR⁷³ and plotly⁷⁴ packages.

Then, we combined data from this study with 74 Indigenous mitogenomes from various regions of the Americas^21,36. We compared the relative haplogroup frequencies among our cohort and different Indigenous populations. Next, we compared the mutation profiles of these samples and presented them using a presence heatmap. The heatmap was visualized using the ComplexHeatmap package⁷⁵.

Statistics and reproducibility

Statistical analyses and visualizations were conducted using R (v. 4.3.2). The statistical methods and tests employed for each analysis are explained in the text and figure legends, when applicable. A comprehensive description of the statistical methodology applied to the haplogroup assignment, phylogenetic and worldwide mitogenome diversity analyses is available in the “Methods” section.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Code availability

The computer code and scripts used in this study are available on Zenodo⁷⁶.

Data availability

The data analyzed here were generated in previous studies^47,48 and were deposited at the European Nucleotide Archive (ENA) under the accession numbers PRJEB59275 and PRJEB74357. Data from Matos et al.⁴⁸ are also available on Zenodo⁷⁷.

References

Wood, M. R. et al. Resolving mitochondrial haplogroups B2 and B4 with next-generation mitogenome sequencing to distinguish Native American from Asian haplotypes. Forensic Sci. Int. Genet. 43, 102143 (2019).
Mitchell, S. L. et al. Characterization of mitochondrial haplogroups in a large population-based sample from the United States. Hum. Genet. 133, 861–868 (2014).
Andrews, R. M. et al. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999).
Rubin, J. D., Vogel, N. A., Gopalakrishnan, S., Sackett, P. W. & Renaud, G. HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph. PLOS Comput. Biol. 19, e1011148 (2023).
Escher, L. M. et al. Challenges in selecting admixture models and marker sets to infer genetic ancestry in a Brazilian admixed population. Sci. Rep. 12, 21240 (2022).
Schaan, A. P. et al. New insights on intercontinental origins of paternal lineages in Northeast Brazil. BMC Evol. Biol. 20, 15 (2020).
Schaan, A. P. et al. mtDNA structure: the women who formed the Brazilian Northeast. BMC Evol. Biol. 17, 185 (2017).
Resque, R. et al. Male Lineages in Brazil: intercontinental admixture and stratification of the European Background. PLoS ONE 11, e0152573 (2016).
Hernández, C. Mitochondrial DNA in human diversity and health: from the Golden Age to the Omics Era. Genes 14, 1534 (2023).
Criollo-Rayo, A. A. et al. Native American gene continuity to the modern admixed population from the Colombian Andes: implication for biomedical, population and forensic studies. Forensic Sci. Int. Genet. 36, e1–e7 (2018).
Martínez-Cortés, G. et al. Maternal admixture and population structure in Mexican–Mestizos based on mtDNA haplogroups. Am. J. Phys. Anthropol. 151, 526–537 (2013).
Torroni, A. et al. Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics 130, 153–162 (1992).
Badano, I. et al. Mitochondrial DNA ancestry, HPV infection and the risk of cervical cancer in a multiethnic population of northeastern Argentina. PLoS ONE 13, e0190966 (2018).
Poletto, M. M., Malaghini, M., Silva, J. S., Bicalho, M. G. & Braun-Prado, K. Mitochondrial DNA control region diversity in a population from Parana state—increasing the Brazilian forensic database. Int. J. Leg. Med. 133, 347–351 (2019).
Freitas, J. M., Fassio, L. H., Braganholi, D. F. & Chemale, G. Mitochondrial DNA control region haplotypes and haplogroup diversity in a sample from Brasília, Federal District, Brazil. Forensic Sci. Int. Genet. 40, e228–e230 (2019).
Sanches, N. M., Paneto, G. G., Figueiredo, R. F., De Mello, A. O. & Cicarelli, R. M. B. Mitochondrial DNA control region diversity in a population from Espirito Santo state, Brazil. Mol. Biol. Rep. 41, 6645–6648 (2014).
Carvalho, B. M., Bortolini, M. C., Santos, S. E. B. D. & Ribeiro-dos-Santos, ÂK. C. Mitochondrial DNA mapping of social-biological interactions in Brazilian Amazonian African-descendant populations. Genet. Mol. Biol. 31, 12–22 (2008).
Pena, S. D. J. et al. The genomic ancestry of individuals from different geographical regions of Brazil is more uniform than expected. PLoS ONE 6, e17063 (2011).
dos Santos, A. K. C. R. & Zago, M. A. Diversidade de linhagens do DNA mitocondrial de Ameríndios da Amazônia:populações contemporâneas e ancestrais (1996).
Freires Junior, A. A. Diversidade das regiões hipervariáveis (HVS-I, HVS-II, HVS-III) do DNA mitocondrial em tribos indigenas da Amazônia Brasileira (Universidade Federal do Pará, 2008).
Fagundes, N. J. R. et al. Mitochondrial population genomics supports a single pre-clovis origin with a coastal route for the peopling of the Americas. Am. J. Hum. Genet. 82, 583–592 (2008).
Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022).
Aquilano, E., De La Fuente, C., Rodríguez Golpe, D., Motti, J. M. B. & Bravi, C. M. Sequencing errors in Native American mitogenomes: Impact on clade definitions, haplogroup assignation, and beyond. Mitochondrion 70, 54–58 (2023).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Parson, W. & Dür, A. EMPOP—a forensic mtDNA database. Forensic Sci. Int. Genet. 1, 88–92 (2007).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
O’Connell, J. et al. A population-specific reference panel for improved genotype imputation in African Americans. Commun. Biol. 4, 1269 (2021).
Dür, A., Huber, N. & Parson, W. Fine-tuning phylogenetic alignment and haplogrouping of mtDNA sequences. Int. J. Mol. Sci. 22, 5747 (2021).
Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Van Oven, M. & Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Pena, S. D. J., Santos, F. R. & Tarazona-Santos, E. Genetic admixture in Brazil. Am. J. Med. Genet. C Semin. Med. Genet. 184, 928–938 (2020).
Simão, F., Ferreira, A. P., De Carvalho, E. F., Parson, W. & Gusmão, L. Defining mtDNA origins and population stratification in Rio de Janeiro. Forensic Sci. Int. Genet. 34, 97–104 (2018).
Avila, E. et al. Full mtDNA genome sequencing of Brazilian admixed populations: a forensic-focused evaluation of a MPS application as an alternative to Sanger sequencing methods. Forensic Sci. Int. Genet. 42, 154–164 (2019).
Kivisild, T. et al. The role of selection in the evolution of human mitochondrial genomes. Genetics 172, 373–387 (2006).
Alves-Silva, J. et al. The ancestry of Brazilian mtDNA lineages. Am. J. Hum. Genet. 67, 444–461 (2000).
Batista Dos Santos, S. E., Rodrigues, J. D., Ribeiro-Dos-Santos, A. K. C. & Zago, M. A. Differential contribution of indigenous men and women to the formation of an urban population in the Amazon region as revealed by mtDNA and Y-DNA. Am. J. Phys. Anthropol. 109, 175–180 (1999).
Havaš Auguštin, D. et al. Refining the global phylogeny of mitochondrial N1a, X, and HV2 haplogroups based on rare mitogenomes from Croatian isolates. Genes 14, 1614 (2023).
Sengupta, D. et al. Performance and accuracy evaluation of reference panels for genotype imputation in sub-Saharan African populations. Cell Genom. 3, 100332 (2023).
Cahoon, J. L. et al. Imputation accuracy across global human populations. Am. J. Hum. Genet. 111, 979–989 (2024).
Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Dopazo, H., Llera, A. S., Berenstein, M. & González-José, R. Genomas, enfermedades y medicina de precisión: un Proyecto Nacional. Cienc. Tecnol. Política 2, 019 (2019).
Business Wire. Institut Pasteur de Montevideo, Genomic Medicine Institute-Seoul National University, and Macrogen Launch Urugenomes Project to Decode the Genome of Uruguayan Population (2014).
Giugliani, R. & Matte, U. The Latin-American network of human genetics. Public Health Genom. 7, 74–75 (2004).
De Souza, F. G. et al. Whole mitogenome sequencing uncovers a relation between mitochondrial heteroplasmy and leprosy severity. Hum. Genom. 17, 110 (2023).
Matos, G. B. et al. The mitogenome mutation repertoire affects progression of Parkinson’s Disease. Preprint at https://doi.org/10.21203/rs.3.rs-5411701/v1 (2024).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinform. (2010).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Picard Toolkit. Broad Institute, GitHub Repository (2019).
Weissensteiner, H. et al. Contamination detection in sequencing studies using the mitochondrial phylogeny. Genome Res. 31, 309–316 (2021).
Weissensteiner, H. et al. mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud. Nucleic Acids Res. 44, W64–W69 (2016).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
RStudio Team. RStudio: Integrated Development Environment for R (RStudio, PBC, 2021).
Schönherr, S., Weissensteiner, H., Kronenberg, F. & Forer, L. Haplogrep 3 - an interactive haplogroup classification and analysis platform. Nucleic Acids Res. 51, W263–W268 (2023).
Katoh, K., Rozewicki, J. & Yamada, K. D. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinform. 20, 1160–1166 (2019).
Kuraku, S., Zmasek, C. M., Nishimura, O. & Katoh, K. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Res. 41, W22–W28 (2013).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Stecher, G., Tamura, K. & Kumar, S. Molecular Evolutionary Genetics Analysis (MEGA) for macOS. Mol. Biol. Evol. 37, 1237–1239 (2020).
Felsenstein, J. Confidence limits on phylogenies: an approach using the Bootstrap. Evolution 39, 783–791 (1985).
Behar, D. M. et al. A Copernican reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012).
Yu, G. treeio. Bioconductor https://doi.org/10.18129/B9.BIOC.TREEIO (2017).
Yu, G. ggtree. Bioconductor https://doi.org/10.18129/B9.BIOC.GGTREE (2017).
Knaus, B. J. & Grünwald, N. J. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 17, 44–53 (2017).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358 (1984).
Goudet, J. Hierfstat, a package for R to compute and test hierarchical F -statistics. Mol. Ecol. Notes 5, 184–186 (2005).
Kolde, R. Package ‘pheatmap’. R package (2015).
Kassambara, A. & Mundt, F. factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.7 (2016).
Husson, F., Josse, J., Le, S. & Mazet, J. FactoMineR: multivariate exploratory data analysis and data mining. R package version 2.11 https://doi.org/10.32614/CRAN.package.FactoMineR (2006).
Sievert, C. Interactive Web-Based Data Visualization with R, Plotly, and Shiny (CRC Press, Taylor and Francis Group, 2020).
Gu, Z. ComplexHeatmap. Bioconductor https://doi.org/10.18129/B9.BIOC.COMPLEXHEATMAP (2017).
Gouvea de Souza, F., Ribeiro-dos-Santos, A. M. & Araújo, G. Mitochondrial Ancestry from Complete Mitogenomes Highlights a Lack of Characterization of Indigenous Haplogroups in Brazilian Amazon Population. Zenodo https://doi.org/10.5281/ZENODO.15120168 (2025).
Barra Matos, G. & Araújo, G. mtDNA-network data - Parkinson’s Disease: mtDNA sequencing. Zenodo https://doi.org/10.5281/ZENODO.15150211 (2025).

Acknowledgements

This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-620139/2006-4) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Biocomputacional-Protocol no. 3381/2013/CAPES; Grant-Protocol no. 88887.912165/2023-00/CAPES). A.R.S. is supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/Brazil) through a Produtividade grant (304413/2015–1) and by Pró-Reitoria de Pesquisa e Pós-Graduação da Universidade Federal do Pará (PROPESP/UFPA).

Author information

Authors and Affiliations

Laboratory of Human and Medical Genetics (LGHM), Institute of Biological Sciences, Federal University of Pará (UFPA), Belém, PA, Brazil
Felipe Gouvea de Souza, Gustavo Barra Matos, Camille Sena Santos, Tatiane Piedade Souza, Sidney Santos, Giovanna C. Cavalcante, André M. Ribeiro dos Santos, Gilderlanio S. de Araújo & Ândrea Ribeiro-dos-Santos
Laboratory of Dermato-Immunology (LDI), Institute of Biological Sciences, Federal University of Pará (UFPA), Marituba, PA, Brazil
Angélica Rita Gobbo, Patrícia Fagundes da Costa & Claudio Guedes Salgado
Laboratory of Experimental Neurology (LaNEx), Federal University of Pará (UFPA), Belém, Pará, Brazil
Gracivane Lopes Eufraseo, André Vitor de Souza Fernandes & Bruno Lopes Santos-Lobato
Oncology Research Center (NPO), Federal University of Pará (UFPA), Belém, PA, Brazil
Sidney Santos & Ândrea Ribeiro-dos-Santos

Authors

Felipe Gouvea de Souza
Gustavo Barra Matos
Camille Sena Santos
Tatiane Piedade Souza
Angélica Rita Gobbo
Patrícia Fagundes da Costa
Claudio Guedes Salgado
Gracivane Lopes Eufraseo
André Vitor de Souza Fernandes
Bruno Lopes Santos-Lobato
Sidney Santos
Giovanna C. Cavalcante
André M. Ribeiro dos Santos
Gilderlanio S. de Araújo
Ândrea Ribeiro-dos-Santos

Contributions

Â.R.S. and G.S.A. designed the study; C.G.S., A.M.R.S., C.S.S., T.P.S., A.R.G., P.F.C., G.L.E., A.V.S.F., G.C.C. and B.L.S.L. collected the data and revised the manuscript; F.G.S., G.B.M. and G.S.A. performed the analyses; C.G.S., B.L.S.L., S.S. and Â.R.S. acquired the resources; F.G.S., G.B.M. and A.M.R.S. wrote and revised the manuscript; Â.R.S., B.L.S.L., C.G.S. and G.S.A. supervised the project. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Ândrea Ribeiro-dos-Santos.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Aylin Bircan. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

de Souza, F.G., Matos, G.B., Sena Santos, C. et al. Mitochondrial ancestry from complete mitogenomes highlights a lack of characterization of indigenous haplogroups in Brazilian Amazon population. Commun Biol 8, 835 (2025). https://doi.org/10.1038/s42003-025-08126-4

Received: 01 October 2024
Accepted: 24 April 2025
Published: 30 May 2025
Version of record: 30 May 2025
DOI: https://doi.org/10.1038/s42003-025-08126-4