Abstract
Bos taurus (taurine) and Bos indicus (indicine) cattle diverged at least 150,000 years ago and, since that time, substantial genomic differences have evolved between the two lineages. During the last two millennia, genetic exchange in Africa has resulted in a complex tapestry of taurine-indicine ancestry, with most cattle populations exhibiting varying levels of admixture. Similarly, several Southern European cattle populations also show evidence for historical gene flow from indicine cattle, the highest levels of which are found in the Central Italian White breeds. Here we used two different software tools (MOSAIC and ELAI) for local ancestry inference (LAI) with genome-wide high- and low-density SNP array data sets in hybrid African and residually admixed Southern European cattle populations and obtained broadly similar results despite critical differences between the two LAI methodologies. Our analyses identified genomic regions with elevated levels of retained or introgressed ancestry from the African taurine, European taurine, and Asian indicine lineages. Functional enrichment of genes underlying these ancestry peaks highlighted biological processes relating to immunobiology and olfaction, some of which may relate to differing susceptibilities to infectious diseases, including bovine tuberculosis, East Coast fever, and tropical theileriosis. Notably, for retained African taurine ancestry in admixed trypanotolerant cattle we observed enrichment of genes associated with haemoglobin and oxygen transport. This may reflect positive selection of genomic variants that enhance control of severe anaemia, a debilitating feature of trypanosomiasis disease, which severely constrains cattle agriculture across much of sub-Saharan Africa.
Introduction
Long acknowledged in plants (Anderson 1949), gene flow and hybridisation between interfertile taxa are increasingly recognised as important evolutionary processes in animals (Hedrick 2013; Payseur and Rieseberg 2016; Taylor and Larson 2019). Genetic exchange between populations can provide an abundant source of new functional genomic variation—both adaptive and maladaptive—that can generate novel combinations of alleles at individual genes, and interacting gene loci, thereby altering gene regulatory networks, biochemical pathways, physiological outputs, and ultimately phenotypic outcomes (Arnold and Kunte 2017; Edelman and Mallet 2021; Tigano and Friesen 2016). In this respect, hybrid zones, where evolutionarily distinct but interfertile animal taxa interact to produce admixed populations, represent natural laboratories for evolutionary studies (Hewitt 1988). It has also been observed that gene flow, reticulate evolution, and admixture between distinct lineages and from wild congeners are common features of many domestic animal species, including pigs (Sus scrofa), dogs (Canis familiaris), sheep (Ovis aries), goats (Capra hircus), and chickens (Gallus gallus) (Frantz et al. 2015; Freedman and Wayne 2017; Lv et al. 2022; Pogorevc et al. 2024; Wang et al. 2020b).
Cattle were domesticated from the now extinct aurochs (Bos primigenius) (Bailey et al. 1996) and humpless Bos taurus (taurine) cattle were some of the first large ruminants to be domesticated 10–11,000 years ago in the Fertile Crescent region (Conolly et al. 2011; Larson et al. 2014; Zeder 2017). Approximately 2000 years later, humped Bos indicus (indicine or zebu) cattle were domesticated in present-day Pakistan (Utsunomiya et al. 2019) and analyses of genome-scale DNA sequence data show that the B. taurus and B. indicus lineages likely diverged 150–500 kya (Chen et al. 2018; Wang et al. 2018; Wu et al. 2018). Consequently, substantial genomic differences have evolved between the two subspecies, making hybrid cattle an excellent resource for addressing fundamental scientific questions concerning the role of gene flow, admixture, and introgression in mammalian microevolution (Bahbahani et al. 2017; Chen et al. 2018; Chen et al. 2023; Flori et al. 2014; Friedrich et al. 2023; Kim et al. 2020; Kwon et al. 2022; Mbole-Kariuki et al. 2014; McTavish and Hillis 2014; Ward et al. 2022; Wu et al. 2018).
Cattle populations from several regions around the globe exhibit evidence of B. taurus/B. indicus admixture, although gene flow and genomic introgression between the two subspecies are best understood and most extensively surveyed in Africa (Decker et al. 2014; Hanotte et al. 2002; Hanotte et al. 2000; Kim et al. 2020; MacHugh et al. 1997). Domestication and the subsequent spread and interactions of different taurine and indicine cattle populations have resulted in gradients of B. taurus and B. indicus ancestry across the African continent (Hanotte et al. 2002; Mwai et al. 2015; Verdugo et al. 2019). There are approximately 150 breeds of indigenous cattle in sub-Saharan Africa and African cattle represent a complex tapestry of African B. taurus and Asian B. indicus ancestry, with some populations also exhibiting significant non-African B. taurus genetic influence (Hanotte et al. 2002; Kim et al. 2020; MacHugh et al. 1997). The indigenous B. taurus cattle of Africa are generally adapted to humid and subhumid zones associated with sedentary subsistence farming and consequently face particular disease challenges (FAO 2015). As a result of their longer history of exposure and adaptation to high pathogen and parasite burdens on the continent, African B. taurus cattle have several advantages over B. indicus cattle in terms of disease tolerance and resistance (de Clare Bronsvoort et al. 2013; Mwai et al. 2015). Cattle that are predominantly indicine in ancestry are normally transhumant livestock adapted to the arid and semi-arid regions of the continent and are favoured by many farmers due to their larger size and higher production yields, while hybrid populations tend to inhabit environments somewhere between these extremes (FAO 2015; Mwai et al. 2015).
One particularly important disease for African cattle is African animal trypanosomiasis (AAT) or nagana, a wasting disease caused by parasitic protozoa of the genus Trypanosoma transmitted by biting insect vectors such as tsetse flies (Glossina spp.), which causes fever, severe weight loss, and anaemia (MacGregor et al. 2021; Steverding 2008). Cattle agriculture in sub-Saharan Africa is severely constrained by AAT because, even with the availability of trypanocidal drugs, the high susceptibility of many breeds to trypanosomiasis renders them unproductive in regions with significant tsetse burdens (Berthier et al. 2015; Yaro et al. 2016). However, some African B. taurus breeds have a tolerance of trypanosome infection termed “trypanotolerance”, which enables these cattle to control parasitaemia and anaemia, making them more productive than trypanosusceptible breeds in many areas of West and Central Africa (Berthier et al. 2015; Murray and Black 1985). These trypanotolerant populations, which include the longhorn N’Dama and shorthorn Baoule, Lagune, and Somba breeds, are therefore an important genetic resource as they are uniquely suited to livestock production in these areas (Berthier et al. 2015; Yaro et al. 2016).
Trypanotolerance has been shown to be a heritable multigenic trait, with variability in tolerance among individual animals within trypanotolerant populations (Hanotte et al. 2003; Kambal et al. 2023). Some African B. taurus/B. indicus hybrid cattle breeds are also known to exhibit trypanotolerance; however, trypanotolerant breeds with high levels of B. taurus ancestry have a greater capacity to control anaemia, while hybrid animals exhibit intermediate levels of control compared to trypanosusceptible B. indicus breeds (Bahbahani et al. 2018; Berthier et al. 2015). The genomic architecture of trypanotolerance in cattle remains poorly understood, although some candidate genes have been proposed, and identification of genes and genomic regulatory elements (GREs) underpinning the trait may facilitate introduction or enhancement of the trait via genome-enabled breeding or genome editing (Yaro et al. 2016).
In contrast to the complex nature of African cattle ancestry, the majority of European cattle populations are composed almost exclusively of European B. taurus ancestry; however, there are several breeds in Southern Europe that are known to exhibit modest levels of African B. taurus and/or B. indicus ancestry (da Fonseca et al. 2019; Upadhyay et al. 2019). The best characterised of these, which also have the highest levels of indicine admixture, are the group of populations known as Central Italian White cattle (Barbato et al. 2020). Compared to temperate taurine cattle, B. indicus cattle have enhanced heat and drought tolerance and introgression of genomic variants from B. indicus into Central Italian White cattle may have made these breeds better adapted to extreme summer heat events on the Italian peninsula (Hooyberghs et al. 2019).
Admixture and introgression among populations can be studied at a sub-chromosomal level using statistical methods for surveying locus-specific or local ancestry, which in contrast to global ancestry proportions, corresponds to the ancestry of specific genomic segments that consist of unbroken ancestry blocks from different donor populations (Gompert and Buerkle 2013). A range of methods for local ancestry inference (LAI) using genome-scale data have been developed (Tan and Atkinson 2023; Wu et al. 2021). Two widely used software tools are Efficient Local Ancestry Inference (ELAI) (Guan 2014) and MOSAIC Organizes Segments of Ancestry In Chromosomes (MOSAIC) (Salter-Townshend and Myers 2019). ELAI fits a two-layer hidden Markov model (HMM) that allows ancestry switching anywhere along the genome; however, it requires the donor reference populations and the approximate number of generations since the admixture occurred to be preassigned. Additionally, the donor reference populations should be as genetically similar to the original source populations as possible. MOSAIC also fits a two-layer HMM but employs a different strategy that determines how closely related each segment of chromosome in every admixed individual genome is to chromosomal segments in individual genomes from potential donor reference populations and infers a stochastic relationship between donor reference panels and mixing populations. Unlike other methods, MOSAIC does not require the donor reference populations to be direct surrogates for the original source populations and it can also infer the number of generations since the start of an admixture process. However, the MOSAIC algorithm requires phased haplotypes and a recombination rate map.
For the present study we performed a range of population genomics analyses and comparative LAI using the ELAI and MOSAIC software tools with a panel of African and European cattle breeds that exhibit varying levels of African taurine, European taurine, and Asian indicine ancestries. Two different genome-wide SNP data sets were used: a high-density SNP data set consisting of more than 600,000 SNPs and a low-density data set encompassing approximately 30,000 SNPs. These analyses allowed us to assess the ELAI and MOSAIC algorithms as tools for LAI in admixed cattle. We were also able to systematically catalogue and functionally evaluate genomic regions exhibiting evidence for elevated levels of introgressed or retained ancestry from the three cattle lineages.
Materials and methods
High-density genome-wide cattle SNP data sets
For this study new Illumina® BovineHD 777 K BeadChip SNP data sets were generated for 39 African cattle (23 Somba, 8 N’Dama and 8 Boran). The Somba breed data were obtained using DNA samples previously published as part of a microsatellite-based survey of cattle genetic diversity (Freeman et al. 2004) and were generated by Weatherbys Scientific (Naas, Ireland) using standard procedures for Illumina® SNP array genotyping. The N’Dama and Boran data were obtained using cattle DNA samples from a trypanosome challenge time-course experiment (O’Gorman et al. 2009) and were generated by Neogen Europe (Ayr, Scotland) also using standard procedures. Additional Illumina® BovineHD 777 K BeadChip data sets were obtained from published studies (Bahbahani et al. 2017; Barbato et al. 2020; Upadhyay et al. 2017; Verdugo et al. 2019; Ward et al. 2022; Wragg et al. 2022) and the Web-Interfaced next generation Database Exploration (WIDDE) repository (Sempéré et al. 2015).
The total data set consisted of high-density 777 K SNP data for 1030 cattle before filtering and 24 different populations were represented, including three European B. taurus populations (Holstein-Friesian, Angus, and Jersey); three African B. taurus populations (Muturu, Lagune, and Guinean N’Dama); three B. indicus populations (Tharparkar, Gir, and Nelore); five residually admixed European populations (Romagnola, Chianina, Marchigiana, Maremmana, and Alentejana); five trypanotolerant African hybrid populations (hybrid N’Dama, Borgou, Somba, Keteku, and Sheko) and five trypanosusceptible African hybrid populations (Ankole, Nganda, East African Shorthorn Zebu, Karamojong, and Boran). The cattle BovineHD 777 K SNP data were converted to binary PLINK files with Illumina® allele coding for the FORWARD strand as required using PLINK (v. 1.90 beta 6.25) (Chang et al. 2015) and SNPchiMp (v. 3) (Nicolazzi et al. 2015). The sample data were then merged with PLINK (v. 1.90 beta 6.25).
Figure 1 illustrates the overall study workflow, including the genome assembly updating, data preparation, and filtering steps, which are described in the following subsections and were implemented prior to the population genomics analyses. Table 1 shows the taxonomic, breed, geographical, sample number (pre- and post-SNP data filtering), and source information for the BovineHD 777 K BeadChip SNP data sets. A total of 750 individual animal BovineHD 777 K BeadChip SNP data sets were retained after filtering.
Cattle images by Tracy A. Heath, T. Michael Keesey and Steven Traver via phylopic.org.
Software resources
The following R packages were used with R (v. 4.3.0) (R Core Team 2023) to analyse and visualise the data in this study: ape (v. 5.7.1) (Paradis and Schliep 2019), dplyr (v. 1.1.2) (Wickham et al. 2023a), ggh4x (v. 0.2.4) (van den Brand 2023), ggnewscale (v. 0.4.9) (Campitelli 2023), ggplot2 (v. 3.4.2) (Wickham 2009), ggrepel (v. 0.9.3) (Slowikowski 2023), ggtext (v. 0.1.2) (Wilke and Wiernik 2022), ggtree (v. 3.9.0.1) (Yu 2022), magick (v. 2.8.1) (Ooms 2023) with ImageMagick (v. 6.9.12.96) (ImageMagick Studio LLC 2023), magrittr (v. 2.0.3) (Bache and Wickham 2022), patchwork (v. 1.1.2) (Pedersen 2023), parallel (v. 4.3.0) (R Core Team 2023), readr (v. 2.1.4) (Wickham et al. 2023b), scales (v. 1.2.1) (Wickham et al. 2023c), stringr (v. 1.5.0) (Wickham 2023), tibble (v. 3.2.1) (Müller and Wickham 2023), tidyr (v. 1.3.0) (Wickham et al. 2023d), tidytree (v. 0.4.4) (Yu 2022), and treeio (v. 1.25.2) (Wang et al. 2020a). Colours were generated from khroma (v. 1.10.0) (Frerebeau 2023) and viridis (v. 0.6.3) (Garnier et al. 2023).
Updating the bovine genome assembly
The BovineHD 777 K BeadChip SNP locations were updated from the UMD3.1 bovine genome assembly to the current assembly ARS-UCD1.2 (Rosen et al. 2020) using coordinates from the National Animal Genome Research Programme (NAGRP) Data Repository genotyping array SNP mapping to ARS-UCD1.2 resource (Schnabel 2018) and PLINK (v. 1.90 beta 6.25).
Data preparation and filtering
Generation of a low-density SNP array data set
To compare the results of analyses of both high- and low-density SNP array data sets from the same set of animals, a comparative low-density SNP array data set was produced. To do this, the high-density BovineHD 777 K SNP data set was downsampled to the subset of 46,713 SNPs in common with the Illumina® Bovine SNP50 BeadChip using PLINK (v. 1.90 beta 6.25). A list of the Bovine SNP50 BeadChip SNPs modified from the NAGRP Data Repository (Schnabel 2018) was used for this purpose.
Missing SNP removal
Individual animals that had missing SNP call rates exceeding 0.95 in the low-density data set were removed using a missing genotype filter with PLINK (v. 1.90 beta 6.25). The same set of animals was also removed from the high-density data set.
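As a rough illustration of this sample-level filter, the sketch below computes the fraction of missing genotype calls for each animal and retains animals at or below a threshold. The function name, the genotype encoding (`None` for a missing call), and the toy data are assumptions made for illustration only; the study itself applied the PLINK missing genotype filter.

```python
def filter_samples_by_missingness(genotypes, max_missing):
    """Return IDs of animals whose fraction of missing calls is <= max_missing.

    genotypes: dict mapping animal ID -> list of genotype calls,
    with None marking a missing call (hypothetical encoding).
    """
    kept = []
    for animal_id, calls in genotypes.items():
        missing_fraction = sum(call is None for call in calls) / len(calls)
        if missing_fraction <= max_missing:
            kept.append(animal_id)
    return kept


# Toy data: three animals genotyped at four SNPs (0/1/2 = alternate allele count).
toy_genotypes = {
    "animal_A": [0, 1, 2, 1],              # no missing calls
    "animal_B": [None, None, None, None],  # every call missing
    "animal_C": [0, None, None, 2],        # half of the calls missing
}

# Threshold taken from the text above; animals with missingness above it are dropped.
kept_animals = filter_samples_by_missingness(toy_genotypes, max_missing=0.95)
print(kept_animals)
```

Because the list of removed animals is derived from one data set and then applied to the other, the same function output can be reused for both the high- and low-density panels.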
Removal of duplicate samples by identity-by-state filtering
Duplicate samples present in two or more data sources were removed using PLINK (v. 1.90 beta 6.25) and a previously described identity-by-state methodology (Browett et al. 2018). The method was modified to select one from each pair of animals that had an identity-by-state value greater than or equal to 0.99. The resulting list of sample duplicates was removed from the high- and low-density data sets with PLINK (v. 1.90 beta 6.25).
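The duplicate-removal logic can be sketched as follows. This is a minimal illustration, assuming genotypes coded as alternate allele counts (0, 1, 2) and a pairwise identity-by-state similarity defined as the mean proportion of alleles shared across SNPs called in both animals; the function names and toy data are hypothetical and this is not the PLINK implementation used in the study.

```python
from itertools import combinations


def ibs_similarity(calls_a, calls_b):
    """Mean proportion of alleles shared identical-by-state over jointly called SNPs."""
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    # Two genotypes coded 0/1/2 share 2 - |a - b| alleles out of 2.
    return sum(2 - abs(a - b) for a, b in shared) / (2 * len(shared))


def find_duplicates(genotypes, threshold=0.99):
    """Return one animal from each pair with IBS >= threshold (the later one is dropped)."""
    to_remove = set()
    for id_a, id_b in combinations(genotypes, 2):
        if id_a in to_remove or id_b in to_remove:
            continue  # already resolved through an earlier pair
        if ibs_similarity(genotypes[id_a], genotypes[id_b]) >= threshold:
            to_remove.add(id_b)
    return sorted(to_remove)


toy_genotypes = {
    "sample_1": [0, 1, 2, 1],
    "sample_2": [0, 1, 2, 1],  # identical to sample_1, e.g. same animal in two sources
    "sample_3": [2, 1, 0, 1],
}
duplicates = find_duplicates(toy_genotypes)
print(duplicates)
```

Which member of a duplicate pair is kept is a design choice; here the first-listed animal is retained, which is an arbitrary assumption.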
Removal of admixed animals from the reference populations
An inbreeding analysis in PLINK (v. 1.90 beta 6.25) was used to remove animals that showed evidence for significant admixture in the reference populations (three European B. taurus populations: Holstein-Friesian, Angus, and Jersey; three African B. taurus populations: Muturu, Lagune, and Guinean N’Dama; and three B. indicus populations: Tharparkar, Gir, and Nelore). To do this, outlier animals with statistically lower inbreeding values than the rest of the population were identified via boxplots. The resulting list of animals was removed from the high- and low-density data sets. A systematic inbreeding analysis was then performed with PLINK (v. 1.90 beta 6.25) and the output was modified to identify animals with the top 25 and bottom 25 inbreeding values across the three European B. taurus populations (Holstein-Friesian, Angus, and Jersey). These samples were then removed from the high- and low-density data sets to balance the numbers of animals across the reference groups. A final inbreeding analysis of the low-density data set after the filters were applied was performed to compare the results with those of the high-density data set.
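One way to express the boxplot-based outlier step in code is shown below. The text does not state the whisker rule, so the conventional 1.5 × interquartile range lower fence is an assumption here, as are the function name and the toy inbreeding values.

```python
import statistics


def low_inbreeding_outliers(f_values, whisker=1.5):
    """Return animals whose inbreeding value lies below the lower boxplot fence.

    f_values: dict mapping animal ID -> inbreeding coefficient.
    whisker: multiple of the interquartile range (1.5 is the usual boxplot
    convention and an assumption, not a value given in the text).
    """
    q1, _median, q3 = statistics.quantiles(
        list(f_values.values()), n=4, method="inclusive"
    )
    lower_fence = q1 - whisker * (q3 - q1)
    return [animal for animal, f in f_values.items() if f < lower_fence]


# Toy reference population: one animal with a markedly lower inbreeding value,
# as might be expected for an animal carrying admixed ancestry.
toy_f = {
    "animal_A": 0.10,
    "animal_B": 0.12,
    "animal_C": 0.11,
    "animal_D": 0.13,
    "animal_E": 0.09,
    "animal_F": 0.12,
    "animal_G": -0.40,
}
outliers = low_inbreeding_outliers(toy_f)
print(outliers)
```

Only the lower tail is examined because admixed animals are expected to show excess heterozygosity and therefore unusually low inbreeding values.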
Filtering of SNPs by call rate and minor allele frequency
The high- and low-density data sets were filtered to retain autosomal SNPs with a minimum call rate of 95% and minor allele frequency (MAF) of at least 5% with PLINK (v. 1.90 beta 6.25). The methodologies used for this process have been described in a previous study (McHugo et al. 2019).
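The two SNP-level thresholds can be illustrated with a short sketch. It assumes genotypes coded as alternate allele counts with `None` for missing calls; the function name and toy SNPs are hypothetical, and the restriction to autosomes is not modelled here.

```python
def snp_passes_filters(calls, min_call_rate=0.95, min_maf=0.05):
    """Check one SNP against the call rate and minor allele frequency thresholds."""
    observed = [c for c in calls if c is not None]
    call_rate = len(observed) / len(calls)
    if call_rate < min_call_rate:
        return False
    # Alternate allele frequency from diploid genotype counts (0, 1, or 2 copies).
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf


# Toy SNPs genotyped in 20 animals.
snp_common = [0] * 10 + [1] * 6 + [2] * 4  # alternate allele frequency 0.35
snp_rare = [0] * 19 + [1]                  # alternate allele frequency 0.025
snp_sparse = [None] * 2 + [1] * 18         # call rate 0.90

results = {
    "snp_common": snp_passes_filters(snp_common),
    "snp_rare": snp_passes_filters(snp_rare),
    "snp_sparse": snp_passes_filters(snp_sparse),
}
print(results)
```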
Population genomics analyses
Principal component analysis
Principal component analysis (PCA) was performed for the high- and low-density data sets using smartpca after file conversion with convertf, both part of the EIGENSOFT package (v. 7.1.2) (Patterson et al. 2006).
Genetic structure analysis
Genetic structure analysis was performed for the high- and low-density data sets using structure_threader (v. 1.3.4) (Pina-Martins et al. 2017) with fastStructure (v. 1.0) (Raj et al. 2014). The structure analysis was carried out with the model complexity or number of populations (K) set from 2 to 25. The chooseK function was used to test the outputs to find a range of values of K that best accounted for the structure in the data (Raj et al. 2014).
Phylogenetic analysis
An additional sample of three gaur (B. gaurus) that were genotyped using the BovineHD 777 K BeadChip (Sempéré et al. 2015; Verdugo et al. 2019) were available to use as an outgroup. After the pre-processing steps described above were performed to convert the data to binary PLINK files with Illumina® allele coding for the FORWARD strand on the ARS-UCD1.2 bovine genome assembly, the B. gaurus data set was filtered with PLINK (v. 1.90 beta 6.25) to retain only SNPs present in the high-density data set. The gaur data were then merged with the high-density data set and an additional filter was applied with PLINK (v. 1.90 beta 6.25) to retain autosomal SNPs with a minimum call rate of 95% and minor allele frequency of at least 5%. A gzipped allele frequency cluster file was produced with PLINK (v. 1.90 beta 6.25) and the resulting file was converted to TreeMix format using the plink2treemix python script provided with the TreeMix software package (v. 1.13) (Pickrell and Pritchard 2012).
Phylogenetic analysis was performed for both the high- and low-density SNP data sets using TreeMix (v. 1.13) with the number of migration edges (m) set from 1 to 15 for ten iterations using windows of SNPs (k) increasing from 100 to 1000 by increments of 100. The OptM package (v. 0.1.6) (Fitak 2021) was used with R (v. 4.3.0) to calculate the mean and standard deviation (SD) across the 10 iterations for the composite likelihood (L(m)), proportion of variance explained and the second-order rate of change (Δm) across migration edges (m). The BITE R package (v. 1.2.0008) (Milanesi et al. 2017) was also used to generate a Unix shell script customised to perform 100 TreeMix bootstrap replicates for the selected numbers of migration edges (m). The results were visualised using a modified version of the script provided with TreeMix.
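The second-order rate of change statistic can be sketched as below. This assumes an Evanno-style definition, in which Δm is the mean absolute second difference of the composite likelihood across iterations divided by the standard deviation of L(m); the exact calculation performed by OptM may differ in detail, and the likelihood values are invented toy numbers.

```python
import statistics


def delta_m(likelihoods):
    """Second-order rate of change in likelihood across migration edges.

    likelihoods: dict mapping m -> list of composite likelihoods, one per
    iteration (lists must be the same length and vary within each m,
    otherwise the standard deviation is zero).
    """
    deltas = {}
    for m in sorted(likelihoods):
        if m - 1 not in likelihoods or m + 1 not in likelihoods:
            continue  # the second difference needs both neighbouring values of m
        second_diffs = [
            abs(nxt - 2 * cur + prev)
            for prev, cur, nxt in zip(
                likelihoods[m - 1], likelihoods[m], likelihoods[m + 1]
            )
        ]
        deltas[m] = statistics.mean(second_diffs) / statistics.stdev(likelihoods[m])
    return deltas


# Toy likelihoods for m = 1..4 over two iterations: the gain flattens after m = 2.
toy_likelihoods = {
    1: [100.0, 102.0],
    2: [150.0, 152.0],
    3: [160.0, 158.0],
    4: [165.0, 163.0],
}
deltas = delta_m(toy_likelihoods)
best_m = max(deltas, key=deltas.get)
print(deltas, best_m)
```

The statistic is undefined for the smallest and largest m tested, which is why the range of migration edges should extend beyond the values of interest.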
Local ancestry estimation
MOSAIC analysis
The high- and low-density SNP data sets were separated by chromosome with PLINK (v. 1.90 beta 6.25) and each chromosome was phased with SHAPEIT (v. 2.r904) (O’Connell et al. 2014). The resulting segregated chromosome SNP data files were converted to MOSAIC format using the R script provided with MOSAIC (v. 1.5.0) (Salter-Townshend and Myers 2019) and R (v. 4.3.0). Recombination rate files were prepared from a cattle recombination map (Ma et al. 2015) using an R script adapted from Ward et al. (2022) with R (v. 4.3.0). For each admixed population three-way local ancestry analysis was performed across all autosomes without FST estimation and assuming an effective population size (Ne) of 400 using MOSAIC (v. 1.5.0) with R (v. 4.3.0). The potential donor populations were the three European B. taurus populations (Holstein-Friesian, Angus, and Jersey), the three African B. taurus populations (Muturu, Lagune, and Guinean N’Dama) and the three B. indicus populations (Tharparkar, Gir, and Nelore).
ELAI analysis
The high- and low-density SNP data sets were separated and converted into BIMBAM format for each population and chromosome with PLINK (v. 1.90 beta 6.25). Local ancestry analysis was carried out for each admixed population and autosome with 30 expectation-maximization (EM) steps, 3 upper clusters, 15 lower clusters, and 200 mixing generations using ELAI (v. 1.0) (Guan 2014). The donor populations for each admixed population were selected based on the results of the MOSAIC analysis.
Local ancestry analysis comparison
The local ancestry results were extracted and mean ancestry scores across the individual admixed animals and a genome-wide z-score for each of the three ancestry components were calculated for each admixed population. Weighted mean ancestry scores and z-scores were calculated across the admixed populations within each group of residually admixed European, trypanotolerant African hybrids, and trypanosusceptible African hybrids for a subset of the admixed populations selected to only include populations with a minimum of 15 animals and a relatively stable level of admixture based on a visual examination of the structure results.
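A minimal sketch of this summary step is given below. It assumes that the weights are the numbers of animals in each population and that the z-score uses the sample standard deviation across all SNPs; neither detail is stated explicitly above, and the population names and ancestry values are toy data.

```python
import statistics


def weighted_mean_ancestry(pop_means, pop_sizes):
    """Per-SNP mean ancestry across populations, weighted by sample size (assumed)."""
    total = sum(pop_sizes.values())
    n_snps = len(next(iter(pop_means.values())))
    return [
        sum(pop_means[pop][i] * pop_sizes[pop] for pop in pop_means) / total
        for i in range(n_snps)
    ]


def genome_wide_z_scores(values):
    """Standardise per-SNP ancestry against the genome-wide mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Toy per-SNP mean ancestry for one component in two populations of a group.
pop_means = {
    "pop_1": [0.2] * 9 + [0.8],
    "pop_2": [0.2] * 9 + [0.5],
}
pop_sizes = {"pop_1": 20, "pop_2": 10}

weighted = weighted_mean_ancestry(pop_means, pop_sizes)
z_scores = genome_wide_z_scores(weighted)
# Indices of SNPs in ancestry peaks, using the z >= 2.0 threshold from the study.
peak_snps = [i for i, z in enumerate(z_scores) if z >= 2.0]
print(peak_snps)
```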
Functional enrichment of introgressed regions
Functional enrichment was performed and visualised using gprofiler2 (v. 0.2.2) (Kolberg et al. 2023) with R (v. 4.3.0). The background set was the set of genes within 1 Mb up- and downstream from a SNP in the high-density data set. The query sets were the genes within 1 Mb up- and downstream from the SNPs with a z-score ≥2.0 for each of the ancestries. An interval size range for this analysis of ±1 Mb around focal LAI segments was selected based on previous functional population genomics studies of gene flow and admixture in archaic and modern human populations (Colbran et al. 2019; McQuillan et al. 2022).
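Collecting the genes within the ±1 Mb interval around a set of SNPs amounts to a simple interval overlap test, sketched below. The gene names, coordinates, and function name are hypothetical; in the study, the resulting query and background gene sets were passed to gprofiler2.

```python
def genes_near_snps(snps, genes, flank=1_000_000):
    """Return genes overlapping a window of +/- flank bp around any SNP.

    snps: iterable of (chromosome, position) tuples.
    genes: iterable of (name, chromosome, start, end) tuples.
    """
    hits = set()
    for chrom, pos in snps:
        for name, gene_chrom, start, end in genes:
            # A gene overlaps the window if it starts before the window ends
            # and ends after the window starts.
            if gene_chrom == chrom and start <= pos + flank and end >= pos - flank:
                hits.add(name)
    return sorted(hits)


# Hypothetical annotation and one ancestry-peak SNP on chromosome 23.
toy_genes = [
    ("GENE_A", "23", 25_100_000, 25_150_000),  # 0.5 Mb downstream of the SNP
    ("GENE_B", "23", 30_000_000, 30_050_000),  # more than 1 Mb away
    ("GENE_C", "5", 24_900_000, 24_950_000),   # different chromosome
]
peak_snps = [("23", 24_600_000)]

query_genes = genes_near_snps(peak_snps, toy_genes)
print(query_genes)
```

Running the same function over every SNP in the high-density data set would give the corresponding background gene set.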
Results
High-density genome-wide cattle SNP data sets
After filtering for missing genotypes (30 samples removed), identity-by-state (Fig. S1; 194 samples removed), and inbreeding (Figs. S2, S3; 56 samples removed), there were 750 animals in the high- and low-density SNP data sets (Table 1). Filtering for autosomal SNPs with a minimum call rate of 95% and MAF of at least 5% retained 614,026 SNPs in the high-density data set with a total genotyping rate of 98.20%, and 30,706 SNPs in the low-density data set with a total genotyping rate of 95.91%.
Population genomic analyses
Principal component analysis
The first principal component (PC1) explained 53.59% of the total variation for PC1–10 in the high-density SNP data set and separated the B. taurus and B. indicus lineages (Fig. 2). The second principal component (PC2) explained a further 19.71% of the total variation for PC1–10 in the high-density SNP data set and separated the European B. taurus and African B. taurus lineages (Fig. 2). The admixed animals were dispersed among the reference populations with the residually admixed European animals clustering close to the European B. taurus group and the African hybrid animals mostly located between the African B. taurus and B. indicus groups (Fig. 2). The trypanotolerant African hybrid individuals are closest to the African B. taurus group while the trypanosusceptible African hybrid animals are closest to the B. indicus group (Fig. 2). The same pattern was observed for the low-density SNP data set (Fig. S4).
(A) Principal component analysis of the selected high-density SNP data set with cattle samples coloured according to population showing the first two principal components (PC1 and PC2), and (B) bar chart of the proportion of variance for the top ten PCs.
Genetic structure analysis
The structure results for K = 2 separate the B. taurus and B. indicus ancestries in the high-density SNP data set (Fig. 3). With K = 3, the structure results divide the European and African B. taurus and B. indicus ancestries (Fig. 3). For the high-density SNP data set the model complexity that maximizes marginal likelihood was 16 and the number of model components used to explain the structure in the data was 17 (Figs. S5, S6). For the low-density SNP data set the model complexity that maximizes marginal likelihood and the number of model components used to explain the structure in the data were both 16 (Figs. S7, S8).
Results are shown for a range of assumed values for the number of ancestral populations (K = 2–3).
Phylogenetic analysis
After the B. gaurus outgroup animals were added and filters for autosomal SNPs with a minimum call rate of 95% and MAF of at least 5% were applied there were 613,334 SNPs in the high-density SNP data set with a total genotyping rate of 99.65%, and 30,644 SNPs in the low-density SNP data set with a total genotyping rate of 99.50%. The optimum number of migration edges indicated by the first peak in Δm for both the high- and low-density SNP data sets is three while the number of migration edges required to explain 99.8% of the variation in the data is 12 and 11 for the high- and low-density SNP data sets, respectively (Figs. S9, S10). The phylogenetic analysis results clearly distinguish and group the European B. taurus, African B. taurus, and B. indicus populations into different clades with a high degree of confidence (Fig. 4). The residually admixed European populations are grouped with the European B. taurus populations with a similarly high degree of confidence, while the trypanotolerant and trypanosusceptible African hybrid populations are placed between the African B. taurus and B. indicus populations with varying degrees of confidence (Fig. 4). This pattern holds regardless of the number of migration edges or SNP data set density (Figs. S11–S15). The introduction of migration edges into the phylogenetic tree indicates admixture between the hybrid African and African B. taurus populations when m is set to 3 (Figs. 4, S14). For much higher values of m, the admixture shown includes the residually admixed European populations (Figs. S12, S15).
(A) TreeMix phylogenetic tree for the high-density SNP data set with bootstrap values and three migration edges, and (B) heatmap showing the standard error values.
Local ancestry estimation
Weighted mean local ancestry results were calculated for each of the three ancestry components (European B. taurus, African B. taurus, and B. indicus) for each of the three admixed groups (residually admixed European, trypanotolerant African hybrid, and trypanosusceptible African hybrid) using the mean results from populations with more than 15 samples (Table 1), and relatively stable hybridisation indicated by more uniform levels of admixture across all individuals of the population based on visual examination of the structure results at K = 3 (Fig. 3). The residually admixed European group included the Romagnola and Chianina populations; the trypanotolerant African hybrid group included the Borgou and Sheko populations; and the trypanosusceptible African hybrid group included the Ankole, Nganda, East African Shorthorn Zebu, Karamojong, and Boran populations. The results for the high-density SNP data set showed similar patterns using both MOSAIC and ELAI when examined visually, as did the ELAI results for the low-density SNP data set (Fig. 5). The MOSAIC results for the low-density SNP data set exhibited a noticeable smoothing across the genome for all three ancestry components in all three admixed groups (Fig. 5). This was particularly evident for the ancestry components with lower proportions, such as the African B. taurus and B. indicus components in the residually admixed European group, and the European B. taurus component in the trypanotolerant African hybrid group (Fig. 5). When individual chromosome results were examined, the high-density SNP local ancestry results for MOSAIC and ELAI and the low-density SNP ELAI results showed peaks for the various ancestry components around the major histocompatibility complex (MHC) located on BTA23, although this was not evident for the low-density MOSAIC results (Figs. S16–S27).
Correlation plots between the MOSAIC and ELAI results for each ancestry component along each chromosome indicated positive correlations for the high-density SNP results for all three admixed groups (Figs. S28–S30) while the low-density SNP results indicated much weaker or no correlations (Figs. S31–S33). To identify SNPs within the peaks of local ancestry for each ancestry component genome-wide z-scores of the weighted mean local ancestry results were used to select SNPs with z-scores ≥ 2.0 for each software and data set (Table 2). There were no SNPs that passed the z ≥ 2 threshold for the European B. taurus ancestry component in the residually admixed European group for the high-density SNP MOSAIC and ELAI results and the low-density SNP MOSAIC results, while the trypanotolerant and trypanosusceptible African hybrid groups had the lowest number of SNPs passing the z ≥ 2 threshold for the African B. taurus and B. indicus ancestry components, respectively (Table 2). Similar proportions of SNPs were found for each software and SNP data set (Table 2).
Results are presented for residually admixed European, trypanotolerant African hybrid, and trypanosusceptible African hybrid groups across all autosomes for the MOSAIC and ELAI analyses of high- and low-density SNP data sets. Each vertical line on the circular genome plots represents a SNP and is coloured according to the ancestry results.
Functional enrichment of introgressed regions
The proportions of the numbers of genes found within 1 Mb up- and downstream from each SNP with a z-score ≥2.0 are similar to those of the numbers of SNPs found for each ancestry component in the admixed groups for each software and SNP data set (Tables 2, S1). There were no European B. taurus SNPs that passed the z ≥ 2 threshold; consequently, there were no European B. taurus genes for functional enrichment in the residually admixed European group (Tables 2, S1). The top driver GO terms for the African B. taurus genes in the residually admixed European group included terms related to the MHC (GO:0042613 MHC class II protein complex) and other aspects of the immune system (GO:0019882 antigen processing and presentation, GO:0001914 regulation of T cell mediated cytotoxicity, and GO:0004930 G protein-coupled receptor activity); protein and DNA complexes and protein binding (GO:0000786 nucleosome, GO:0030527 structural constituent of chromatin, GO:0046982 protein heterodimerization activity, GO:0065004 protein-DNA complex assembly); and olfaction (GO:0004984 olfactory receptor activity and GO:0050911 detection of chemical stimulus involved in sensory perception of smell) (Fig. 6). The top B. indicus driver GO terms also included terms relating to the MHC (GO:0042613 MHC class II protein complex) and other immune terms (GO:0019882 antigen processing and presentation and GO:0002684 positive regulation of immune system process), as well as cell membrane and signalling activity (GO:0001594 trace-amine receptor activity, GO:0009897 external side of plasma membrane, and GO:0004364 glutathione transferase activity) (Fig. 6).
Each circle represents a significantly enriched GO term with the size indicating the ratio of the intersection between the term and the introgressed genes. The y-axis shows the −log10(Padj.) value and the horizontal panels and colours indicate the ancestry component. The vertical panels indicate the source of the term, and position within each panel groups terms from the same GO subtree. The top driver GO terms (up to a maximum of 10) are indicated with a black outline and label.
The trypanotolerant African hybrid group also had driver GO terms relating to the MHC (GO:0002486 antigen processing and presentation of endogenous peptide antigen via MHC class I via ER pathway, TAP-independent and GO:0002476 antigen processing and presentation of endogenous peptide antigen via MHC class Ib); other components of the immune system (GO:0007186 G protein-coupled receptor signalling pathway); and olfaction (GO:0004984 olfactory receptor activity and GO:0050911 detection of chemical stimulus involved in sensory perception of smell) among the European B. taurus terms (Fig. 6). In addition, there were also a number of terms relating to L-amino acid transmembrane transport (GO:0097638 L-arginine import across plasma membrane, GO:0000064 L-ornithine transmembrane transporter activity, GO:1903352 L-ornithine transmembrane transport, GO:1903401 L-lysine transmembrane transport, and GO:0015189 L-lysine transmembrane transporter activity) (Fig. 6). The top African B. taurus terms related to haemoglobin and oxygen binding and transport (GO:0005833 haemoglobin complex, GO:0015671 oxygen transport, GO:0019825 oxygen binding), while the top B. indicus terms related to metabolic processes (GO:0047023 androsterone dehydrogenase activity, GO:0030647 aminoglycoside antibiotic metabolic process, GO:0047086 ketosteroid monooxygenase activity, GO:0042448 progesterone metabolic process, GO:0004032 alditol:NADP+ 1-oxidoreductase activity, GO:0032052 bile acid binding, and GO:0016614 oxidoreductase activity, acting on CH-OH group of donors) (Fig. 6).
For the trypanosusceptible African hybrid group the only driver GO term for the European B. taurus ancestry component related to intracellular organelles (GO:0043229 intracellular organelle) (Fig. 6). The top African B. taurus terms included those related to the MHC (GO:0042613 MHC class II protein complex, and GO:0023026 MHC class II protein complex binding); other components of the immune system (GO:0019882 antigen processing and presentation, GO:0001914 regulation of T cell mediated cytotoxicity, and GO:0042605 peptide antigen binding); olfaction (GO:0004984 olfactory receptor activity, and GO:0050911 detection of chemical stimulus involved in sensory perception of smell); and protein-DNA complex and protein binding (GO:0030527 structural constituent of chromatin, GO:0000786 nucleosome, GO:0046982 protein heterodimerization activity) (Fig. 6). The top B. indicus terms included cell adhesion (GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules) and metal ion binding (GO:0005507 copper ion binding) (Fig. 6).
Similar GO term enrichment patterns were also observed using low-density SNP data with MOSAIC (Fig. S34), and for ELAI with both high- and low-density SNP data (Figs. S35, S36).
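The two steps that feed the enrichment analysis above, selecting SNPs with a z-score ≥2.0 and collecting genes within 1 Mb on either side of them, can be sketched as follows. This is a minimal illustration and not the code used in the study: it assumes that the z-score is computed per ancestry component from the mean local ancestry at each SNP across the genome, and the function names, the gene names, and all toy values are hypothetical.

```python
from statistics import mean, pstdev

WINDOW_BP = 1_000_000  # 1 Mb flank on each side of a peak SNP, as in the text
Z_THRESHOLD = 2.0      # genome-wide z-score cut-off, as in the text


def peak_snps(snps):
    """Return (chromosome, position) for SNPs with an ancestry z-score >= Z_THRESHOLD.

    `snps` is a list of (chromosome, position_bp, mean_local_ancestry) tuples
    for one ancestry component in one admixed group (assumed input layout).
    """
    values = [ancestry for _, _, ancestry in snps]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform ancestry: no SNP can stand out
    return [(chrom, pos) for chrom, pos, a in snps if (a - mu) / sigma >= Z_THRESHOLD]


def genes_near_peaks(peaks, genes):
    """Return sorted names of genes overlapping a +/- WINDOW_BP window around any peak.

    `genes` is a list of (name, chromosome, start_bp, end_bp) tuples.
    """
    hits = set()
    for name, g_chrom, start, end in genes:
        for p_chrom, pos in peaks:
            same_chrom = g_chrom == p_chrom
            overlaps = start <= pos + WINDOW_BP and end >= pos - WINDOW_BP
            if same_chrom and overlaps:
                hits.add(name)
                break
    return sorted(hits)


# Toy input (hypothetical values): 20 background SNPs plus one strong peak.
toy_snps = [("23", 1_000_000 + 100_000 * i, 0.10) for i in range(20)]
toy_snps.append(("23", 28_000_000, 0.60))
toy_genes = [
    ("GENE_A", "23", 27_500_000, 27_520_000),  # inside the 1 Mb window
    ("GENE_B", "23", 35_000_000, 35_010_000),  # too far from the peak
    ("GENE_C", "1", 28_000_000, 28_010_000),   # right position, wrong chromosome
]

peaks = peak_snps(toy_snps)
candidates = genes_near_peaks(peaks, toy_genes)
print(peaks, candidates)
```

The resulting gene list would then be passed to a functional enrichment tool; that step is not reproduced here.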
Discussion
The results of the population genomic analyses in admixed European and African cattle populations are consistent with previously published studies that have used modest numbers of genetic markers (e.g., microsatellites) and genome-wide SNP data (Barbato et al. 2020; Decker et al. 2014; Hanotte et al. 2002; Kim et al. 2020; MacHugh et al. 1997; Ward et al. 2022). Visualisation of PCA results by plotting PC1 and PC2 recovered the classic “Bos triangle” with the first two PCs explaining a very high proportion of the total variation for PC1–10 within the data (73.30%). PC1 and PC2 separated the reference European B. taurus, African B. taurus, and B. indicus populations with the admixed animal samples dispersed within the triangle with locations determined by three-way global admixture proportions (Fig. 2). The various admixed populations are located nearer to the reference populations with which they share the most ancestry, which is in agreement with previous studies (Bahbahani et al. 2017; Barbato et al. 2020; Upadhyay et al. 2017; Verdugo et al. 2019; Ward et al. 2022; Wragg et al. 2022). In addition, the clustering of some of the animals in the trypanotolerant African hybrid populations with the African B. taurus reference populations indicates that some of these animals have very high levels of African taurine ancestry (Fig. 2). In this regard, it is important to note that although a diverse panel of European B. taurus, African B. taurus, B. indicus, and hybrid cattle was used in the design and validation of the BovineHD 777 K BeadChip (Illumina 2015), ascertainment bias may affect the placement of admixed cattle in a PCA plot (Dokan et al. 2021; McTavish and Hillis 2015). However, genome-wide multi-locus dimension reduction tools are typically substantially less affected by ascertainment bias than analyses that rely on frequency-based statistics at individual SNP loci, such as estimation of diversity statistics (e.g., the fixation index, FST) or detection of selection signals (Albrechtsen et al. 2010; Malomane et al. 2018; Porto Neto and Barendse 2010).
The results of the genetic structure analysis for K = 2 and K = 3 mirror those of the PCA with the first major split evident for the B. taurus and B. indicus populations and the second split separating the African and European B. taurus populations (Fig. 3). The locations of animals in the admixed populations reflect global admixture proportions that are in agreement with both their positions on the PCA and previous studies (Bahbahani et al. 2017; Barbato et al. 2020; Upadhyay et al. 2017; Verdugo et al. 2019; Ward et al. 2022; Wragg et al. 2022) (Fig. 2). The number of modelled K values that best explain the variation among the 24 populations examined in the study was 16–17, indicating that some of the populations are closely related to the point that they may not be genetically distinct discrete populations (Raj et al. 2014). The genetic structure results also show the variation within the admixed populations in terms of global admixture (Fig. 3). Some populations, such as the residually admixed European and trypanosusceptible African hybrid groups, show a relatively consistent level of global admixture across each population while the trypanotolerant African hybrid group is more variable (Fig. 3). This indicates that the residually admixed European and trypanosusceptible African hybrid breeds are more long-established admixed populations (“stable crossbreds”) and that the hybridisation within the African trypanotolerant hybrid populations is more recent and dynamic. Some of the more extreme examples, such as the N’Dama hybrid, Somba, and Keteku populations, indicate that some animals are not hybrids and are instead pure African B. taurus (Fig. 3). This is also in agreement with the PCA results and is likely due to the origins of the samples from a range of studies that sampled animals from different populations that were classified as the same breed or breed subtype (Bahbahani et al. 2017; Barbato et al. 2020; Upadhyay et al. 2017; Verdugo et al. 2019; Ward et al. 2022; Wragg et al. 2022) (Fig. 2).
The results of the phylogenetic analysis are also in agreement with the PCA and genetic structure results (Fig. 4). The reference populations are unambiguously separated into the expected groupings (European B. taurus, African B. taurus, and B. indicus) with bootstrap values of 99–100, as are the residually admixed European populations (Fig. 4). The trypanotolerant and trypanosusceptible African hybrid populations are spread between the African B. taurus and B. indicus reference populations, with some hybrid branch clusters exhibiting low bootstrap values, indicating instability in the clade structure because of taurine/indicine admixture (Fig. 4). This is also where the strongest gene flow events are inferred as modelled migration edges, demonstrating the higher levels of indicine admixture in the trypanotolerant and trypanosusceptible African hybrid populations compared to the residually admixed European populations and reflecting the differences in the age of admixture due to differing management practices (Fig. 4).
The local ancestry results show similar patterns of peaks and troughs dispersed across the genome for each LAI software tool used (MOSAIC and ELAI), the admixed cattle group examined (residually admixed European, trypanotolerant African hybrid, and trypanosusceptible African hybrid), and each genome-wide SNP data set analysed (low- or high-density); the exception being the low-density SNP data set results obtained using MOSAIC (Fig. 5). This may be due to the rephasing algorithm implemented by default as part of the MOSAIC analysis, which may overcorrect for phasing errors in low-density SNP data sets (Salter-Townshend and Myers 2019). Alternatively, the smoothed results may be due to difficulty automatically estimating the age of admixture with low-density SNP data in MOSAIC (Salter-Townshend and Myers 2019). Comparing the MOSAIC high-density SNP data set results with the ELAI results using both the high- and low-density SNP data sets (Fig. 5), the similar genome-wide patterns of local ancestry obtained indicate that, despite the differences in the software used, there are robust signals of local ancestry discernible in these populations. This is evident in the marked three-way ancestry diversity around the MHC region on BTA23 seen in these results (Figs. S16–S27), particularly for the residually admixed European group using both MOSAIC and ELAI and the high-density SNP data set where there is a clear signature of elevated African taurine and indicine ancestry (Figs. S16, S22). This tendency for increased genomic introgression in the bovine MHC region is likely a consequence of the balancing selection that maintains high MHC gene polymorphism due to the key function of MHC class I and II proteins in presentation of antigenic peptides from rapidly evolving pathogens to CD4+ and CD8+ T cells and via interactions with receptors on natural killer (NK) cells (Codner et al. 2012; Ellis 2004; Ellis and Hammond 2014). Balancing selection acting on pre-existing trans-specific polymorphisms and introgressed variants would give rise to extensive polymorphism in the MHC region (Radwan et al. 2020) and this has been observed in several species (Hedrick 2013), including Homo sapiens where there is evidence that Neanderthal (Homo sapiens neanderthalensis) and Denisovan (Homo sapiens subsp. ‘Denisova’) MHC gene variants have readily introgressed into anatomically modern human populations (Liston et al. 2021; Racimo et al. 2015). However, it is also important to note that there are known difficulties in genotyping the MHC region (Dicks et al. 2021).
The correlations observed between the MOSAIC and ELAI results for the high-density SNP data sets (Figs. S28–S33) provide additional evidence that supports the visually apparent similarities in the local ancestry signals observed across the bovine genome using both software approaches. The proportions of the numbers of SNPs passing the genome-wide threshold (z-score ≥2.0) were similar for the MOSAIC and ELAI analyses using both the high- and low-density SNP data sets, which, despite the visual differences in the local ancestry results, indicates that similar numbers of ancestry peaks can be detected using the z-score approach (Table 2). The lack of SNPs passing the threshold for the European B. taurus ancestry component in the residually admixed European group is likely due to the high and relatively uniform proportion of the European B. taurus ancestry component across the genome. This would give rise to a situation such that no SNPs could pass the threshold of two standard deviations from the mean (Fig. 5). Similarly, the lower numbers of SNPs passing the threshold for the African B. taurus and B. indicus ancestry components for the trypanotolerant and trypanosusceptible African hybrid groups, respectively, are likely due to the higher proportions of the reference population ancestries to which each admixed group is most closely related (Fig. 5).
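One way to see why a dominant ancestry component can fail to produce any SNPs above the threshold is that local ancestry proportions cannot exceed 1.0: if the genome-wide mean is already high, and the variation comes mostly from occasional dips, the mean plus two standard deviations may lie above 1.0 and be unreachable. The sketch below illustrates this reading with toy values that are assumptions and are not taken from the study; the function names are hypothetical.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 2.0  # two standard deviations above the mean, as in the text


def n_passing(ancestry):
    """Count SNPs whose ancestry z-score is >= Z_THRESHOLD."""
    mu, sigma = mean(ancestry), pstdev(ancestry)
    return sum((a - mu) / sigma >= Z_THRESHOLD for a in ancestry)


def max_reachable_z(ancestry):
    """Largest z-score any SNP could reach, given that ancestry is capped at 1.0."""
    mu, sigma = mean(ancestry), pstdev(ancestry)
    return (1.0 - mu) / sigma


# Toy genome (assumed values): the major ancestry sits at 0.95 for most SNPs
# and dips to 0.60 at a few loci where a minor ancestry peaks.
toy_major = [0.95] * 95 + [0.60] * 5
toy_minor = [1.0 - a for a in toy_major]  # complementary minor ancestry

print(n_passing(toy_major))        # no SNP can pass for the major component
print(n_passing(toy_minor))        # the five dip loci pass for the minor component
print(max_reachable_z(toy_major))  # below Z_THRESHOLD even at ancestry 1.0
```

In this toy case the same loci that cannot register as peaks for the major component are readily detected as peaks for the minor component, which matches the asymmetry described above.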
The introgressed genomic regions for the three admixed population samples show several distinct patterns in terms of functional enrichment. All three admixed groups had significant driver GO terms relating to the MHC (Fig. 6), which directly encompass MHC genes (e.g., MHC class I and II) and other genes encoding proteins that interact with MHC gene products. Visual examination of the local ancestry results supports this observation as do previous LAI studies in cattle (Figs. 5, S16–S27) (Buggiotti et al. 2021; Chen et al. 2020; Guan et al. 2022; Li et al. 2023). Other immune system related driver GO terms were also found to be significant for the three admixed groups. Several of these terms contain genes that are either up- or downstream from MHC genes in biological pathways, underscoring the importance of MHC-related immunobiology in admixed cattle. More generally, it is notable that immune genes are well represented in the top functional enrichment categories for the introgressed genomic regions since there are well documented differences among European B. taurus, African B. taurus, and B. indicus cattle populations in terms of susceptibilities to various infectious diseases such as bovine tuberculosis caused by Mycobacterium bovis (Allen et al. 2010; Lee et al. 2024); East Coast fever and tropical theileriosis caused by Theileria parva and Theileria annulata, respectively (Bahbahani and Hanotte 2015); and AAT caused by Trypanosoma spp. (Yaro et al. 2016). In this regard, many of the genes highlighted by LAI through retention of taurine ancestry in the trypanotolerant African hybrid population may represent putative candidate genes underlying the multigenic trypanotolerance trait. For example, in this group, genes associated with haemoglobin, and oxygen binding and transport cellular processes were highlighted by the GO term functional enrichment for retained African B. taurus genomic ancestry (Fig. 6). This may reflect positive selection of genomic variants that enhance control of anaemia, which is understood to be a key feature of the trypanotolerance trait in cattle (Kambal et al. 2023).
Driver GO terms relating to olfaction were also significantly enriched across the three admixed cattle groups (Fig. 6). Genes related to olfaction, such as olfactory receptor (OR) genes, have been identified in previous functional population genomics analyses of admixed cattle populations with taurine and indicine ancestry. These include, for example, genes containing breed-specific missense SNPs in admixed Ethiopian cattle (Zegeye et al. 2023), genes within genomic regions with evidence for selection signatures in admixed Turkish and Chinese cattle (Demir et al. 2023; Sun et al. 2023), and genes in population-differentiated copy-number variation regions (CNVRs) in African hybrid cattle (Jang et al. 2021). This may be due to the relatively large number of OR genes dispersed across the cattle genome, which, at more than 800 functional loci, is comparable to the OR gene repertoire in the domestic dog (Canis familiaris) (Lee et al. 2013; Niimura and Nei 2007). However, recent studies have suggested that more than 500 olfactory receptors may be expressed by macrophages, immune cells involved in detection and phagocytosis of pathogens (Orecchioni et al. 2022). Macrophages are the host’s first line of defence against mycobacterial infections, with evasion and reprogramming of host macrophages being key components of host-pathogen interaction (Hall et al. 2024). In this regard, it is therefore noteworthy that sequence variation at olfactory receptor gene loci has been shown to be associated with susceptibility to M. bovis infection in cattle (Ring et al. 2019).
An alternative hypothesis for enrichment of olfaction-related genes, however, could relate to detection of odorants associated with MHC diversity and selection of mates (Santos et al. 2010; Ziegler et al. 2010), although this is unlikely to be a major factor in managed male-biased cattle husbandry systems. Similarly, cattle populations under intensive human control and management are unlikely to require a keen sense of smell to find food and avoid danger; however, introgressive natural selection is likely to be acting on olfaction-related genes in free-ranging admixed African cattle populations exposed to a wide range of environmental and predation challenges (Mwai et al. 2015).
The comparative LAI analyses we have performed using low- and high-density SNP array data sets in various groups of admixed cattle with taurine and indicine genomic ancestry provide a framework for applying LAI to much larger data sets that will encompass millions of SNPs. In addition, our results will provide a context for understanding the genomic basis of heterosis in admixed cattle, particularly as it dissipates beyond the F1 generation (Syrstad 1985). Also, identification of genomic regions that have been subject to introgressive selection will provide important information for genome-enabled breeding in admixed cattle populations, particularly in Africa (Marshall et al. 2019; Mrode et al. 2019). Finally, the methodologies that we describe here can be applied to other admixed cattle populations, for example, admixed breeds in Anatolia and the Middle East that have had much longer histories of taurine/indicine genetic exchange (Verdugo et al. 2019).
Data availability
New Illumina® BovineHD 777 K BeadChip SNP data sets generated for this study have been deposited in the Dryad data repository at https://doi.org/10.5061/dryad.w3r22810n. The computer code required to repeat and reproduce the analyses is available at https://doi.org/10.5281/zenodo.11491948.
References
Albrechtsen A, Nielsen FC, Nielsen R (2010) Ascertainment biases in SNP chips affect measures of population divergence. Mol Biol Evol 27(11):2534–2547
Allen AR, Minozzi G, Glass EJ, Skuce RA, McDowell SW, Woolliams JA et al. (2010) Bovine tuberculosis: the genetic basis of host susceptibility. Proc Biol Sci 277(1695):2737–2745
Anderson E (1949) Introgressive Hybridization. John Wiley and Sons, Inc, New York
Arnold ML, Kunte K (2017) Adaptive genetic exchange: a tangled history of admixture and evolutionary innovation. Trends Ecol Evol 32(8):601–611
Bache SM, Wickham H (2022) magrittr: A Forward-Pipe Operator for R. https://magrittr.tidyverse.org
Bahbahani H, Afana A, Wragg D (2018) Genomic signatures of adaptive introgression and environmental adaptation in the Sheko cattle of southwest Ethiopia. PLoS ONE 13(8):e0202479
Bahbahani H, Hanotte O (2015) Genetic resistance: tolerance to vector-borne diseases and the prospects and challenges of genomics. Rev Sci Tech 34(1):185–197
Bahbahani H, Tijjani A, Mukasa C, Wragg D, Almathen F, Nash O et al. (2017) Signatures of selection for environmental adaptation and zebu × taurine hybrid fitness in East African Shorthorn Zebu. Front Genet 8:68
Bailey JF, Richards MB, Macaulay VA, Colson IB, James IT, Bradley DG et al. (1996) Ancient DNA suggests a recent expansion of European cattle from a diverse wild progenitor species. Proc Biol Sci 263(1376):1467–1473
Barbato M, Hailer F, Upadhyay M, Del Corvo M, Colli L, Negrini R et al. (2020) Adaptive introgression from indicine cattle into white cattle breeds from Central Italy. Sci Rep 10(1):1279
Berthier D, Peylhard M, Dayo GK, Flori L, Sylla S, Bolly S et al. (2015) A comparison of phenotypic traits related to trypanotolerance in five West African cattle breeds highlights the value of Shorthorn taurine breeds. PLoS ONE 10(5):e0126498
Browett S, McHugo G, Richardson IW, Magee DA, Park SDE, Fahey AG et al. (2018) Genomic characterisation of the indigenous Irish Kerry cattle breed. Front Genet 9:51
Buggiotti L, Yurchenko AA, Yudin NS, Vander Jagt CJ, Vorobieva NV, Kusliy MA et al. (2021) Demographic history, adaptation, and NRAP convergent evolution at amino acid residue 100 in the world northernmost cattle from Siberia. Mol Biol Evol 38(8):3093–3110
Campitelli E (2023) ggnewscale: Multiple Fill and Colour Scales in ‘ggplot2’. https://doi.org/10.5281/zenodo.2543762
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7
Chen N, Cai Y, Chen Q, Li R, Wang K, Huang Y et al. (2018) Whole-genome resequencing reveals world-wide ancestry and adaptive introgression events of domesticated cattle in East Asia. Nat Commun 9(1):2337
Chen N, Xia X, Hanif Q, Zhang F, Dang R, Huang B et al. (2023) Global genetic diversity, introgression, and evolutionary adaptation of indicine cattle revealed by whole genome sequencing. Nat Commun 14(1):7803
Chen Q, Zhan J, Shen J, Qu K, Hanif Q, Liu J et al. (2020) Whole-genome resequencing reveals diversity, global and local ancestry proportions in Yunling cattle. J Anim Breed Genet 137(6):641–650
Codner GF, Stear MJ, Reeve R, Matthews L, Ellis SA (2012) Selective forces shaping diversity in the class I region of the major histocompatibility complex in dairy cattle. Anim Genet 43(3):239–249
Colbran LL, Gamazon ER, Zhou D, Evans P, Cox NJ, Capra JA (2019) Inferred divergent gene regulation in archaic hominins reveals potential phenotypic differences. Nat Ecol Evol 3(11):1598–1606
Conolly J, Colledge S, Dobney K, Vigne JD, Peters J, Stopp B et al. (2011) Meta-analysis of zooarchaeological data from SW Asia and SE Europe provides insight into the origins and spread of animal husbandry. J Archaeol Sci 38(3):538–545
da Fonseca RR, Ureña I, Afonso S, Pires AE, Jørsboe E, Chikhi L et al. (2019) Consequences of breed formation on patterns of genomic diversity and differentiation: the case of highly diverse peripheral Iberian cattle. BMC Genomics 20(1):334
de Clare Bronsvoort BM, Thumbi SM, Poole EJ, Kiara H, Auguet OT, Handel IG et al. (2013) Design and descriptive epidemiology of the Infectious Diseases of East African Livestock (IDEAL) project, a longitudinal calf cohort study in western Kenya. BMC Vet Res 9:171
Decker JE, McKay SD, Rolf MM, Kim J, Molina Alcala A, Sonstegard TS et al. (2014) Worldwide patterns of ancestry, divergence, and admixture in domesticated cattle. PLoS Genet 10(3):e1004254
Demir E, Moravčíková N, Kaya S, Kasarda R, Bilginer Ü, Doğru H et al. (2023) Genome-wide screening for selection signatures in native and cosmopolitan cattle breeds reared in Türkiye. Anim Genet 54(6):721–730
Dicks KL, Pemberton JM, Ballingall KT, Johnston SE (2021) MHC class IIa haplotypes derived by high-throughput SNP screening in an isolated sheep population. G3 (Bethesda) 11(10):jkab200
Dokan K, Kawamura S, Teshima KM (2021) Effects of single nucleotide polymorphism ascertainment on population structure inferences. G3 (Bethesda) 11(9):jkab128
Edelman NB, Mallet J (2021) Prevalence and adaptive impact of introgression. Annu Rev Genet 55:265–283
Ellis S (2004) The cattle major histocompatibility complex: is it unique? Vet Immunol Immunopathol 102(1-2):1–8
Ellis SA, Hammond JA (2014) The functional significance of cattle major histocompatibility complex class I genetic diversity. Annu Rev Anim Biosci 2:285–306
FAO (2015) The Second Report on the State of the World’s Animal Genetic Resources for Food and Agriculture. FAO Commission on Genetic Resources for Food and Agriculture Assessments, Rome, Italy
Fitak RR (2021) OptM: estimating the optimal number of migration edges on population trees using Treemix. Biol Methods Protoc 6(1):bpab017
Flori L, Thevenon S, Dayo GK, Senou M, Sylla S, Berthier D et al. (2014) Adaptive admixture in the West African bovine hybrid zone: insight from the Borgou population. Mol Ecol 23(13):3241–3257
Frantz LA, Schraiber JG, Madsen O, Megens HJ, Cagan A, Bosse M et al. (2015) Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat Genet 47(10):1141–1148
Freedman AH, Wayne RK (2017) Deciphering the origin of dogs: from fossils to genomes. Annu Rev Anim Biosci 5:281–307
Freeman AR, Meghen CM, MacHugh DE, Loftus RT, Achukwi MD, Bado A et al. (2004) Admixture and diversity in West African cattle populations. Mol Ecol 13(11):3477–3487
Frerebeau N (2023) khroma: Colour Schemes for Scientific Data Visualization. https://doi.org/10.5281/zenodo.1472077
Friedrich J, Bailey RI, Talenti A, Chaudhry U, Ali Q, Obishakin EF et al. (2023) Mapping restricted introgression across the genomes of admixed indigenous African cattle breeds. Genet Sel Evol 55(1):91
Garnier S, Ross N, Rudis R et al. (2023) viridis(Lite) - Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679423
Gompert Z, Buerkle CA (2013) Analyses of genetic ancestry enable key insights for molecular ecology. Mol Ecol 22(21):5278–5294
Guan X, Zhao S, Xiang W, Jin H, Chen N, Lei C et al. (2022) Genetic diversity and selective signature in Dabieshan cattle revealed by whole-genome resequencing. Biology (Basel) 11(9)
Guan Y (2014) Detecting structure of haplotypes and local ancestry. Genetics 196(3):625–642
Hall TJ, McHugo GP, Mullen MP, Ward JA, Killick KE, Browne JA et al. (2024) Integrative and comparative genomic analyses of mammalian macrophage responses to intracellular mycobacterial pathogens. Tuberculosis (Edinb) 147:102453
Hanotte O, Bradley DG, Ochieng JW, Verjee Y, Hill EW, Rege JE (2002) African pastoralism: genetic imprints of origins and migrations. Science 296(5566):336–339
Hanotte O, Ronin Y, Agaba M, Nilsson P, Gelhaus A, Horstmann R et al. (2003) Mapping of quantitative trait loci controlling trypanotolerance in a cross of tolerant West African N’Dama and susceptible East African Boran cattle. Proc Natl Acad Sci USA 100(13):7443–7448
Hanotte O, Tawah CL, Bradley DG, Okomo M, Verjee Y, Ochieng J et al. (2000) Geographic distribution and frequency of a taurine Bos taurus and an indicine Bos indicus Y specific allele amongst sub-saharan African cattle breeds. Mol Ecol 9(4):387–396
Hedrick PW (2013) Adaptive introgression in animals: examples and comparison to new mutation and standing variation as sources of adaptive variation. Mol Ecol 22(18):4606–4618
Hewitt GM (1988) Hybrid zones-natural laboratories for evolutionary studies. Trends Ecol Evol 3(7):158–167
Hooyberghs H, Berckmans J, Lefebre F, De Ridder K (2019) Heat waves and cold spells in Europe derived from climate projections. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). https://doi.org/10.24381/cds.9e7ca677
Illumina (2015) Data sheet: BovineHD Genotyping BeadChip. http://www.illumina.com/Documents/products/datasheets/datasheet_bovineHD.pdf
ImageMagick Studio LLC (2023) ImageMagick. https://imagemagick.org
Jang J, Kim K, Lee YH, Kim H (2021) Population differentiated copy number variation of Bos taurus, Bos indicus and their African hybrids. BMC Genomics 22(1):531
Kambal S, Tijjani A, Ibrahim SAE, Ahmed MKA, Mwacharo JM, Hanotte O (2023) Candidate signatures of positive selection for environmental adaptation in indigenous African cattle: A review. Anim Genet 54(6):689–708
Kim K, Kwon T, Dessie T, Yoo D, Mwai OA, Jang J et al. (2020) The mosaic genome of indigenous African cattle as a unique genetic resource for African pastoralism. Nat Genet 52(10):1099–1110
Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H (2023) g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res 51(W1):W207–W212
Kwon T, Kim K, Caetano-Anolles K, Sung S, Cho S, Jeong C et al. (2022) Mitonuclear incompatibility as a hidden driver behind the genome ancestry of African admixed cattle. BMC Biol 20(1):20
Larson G, Piperno DR, Allaby RG, Purugganan MD, Andersson L, Arroyo-Kalin M et al. (2014) Current perspectives and the future of domestication studies. Proc Natl Acad Sci USA 111(17):6139–6146
Lee K, Nguyen DT, Choi M, Cha SY, Kim JH, Dadi H et al. (2013) Analysis of cattle olfactory subgenome: the first detail study on the characteristics of the complete olfactory receptor repertoire of a ruminant. BMC Genomics 14:596
Lee S, Clementine C, Kim H (2024) Exploring the genetic factors behind the discrepancy in resistance to bovine tuberculosis between African zebu cattle and European taurine cattle. Sci Rep 14(1):2370
Li Z, He J, Yang F, Yin S, Gao Z, Chen W et al. (2023) A look under the hood of genomic-estimated breed compositions for Brangus cattle: What have we learned? Front Genet 14:1080279
Liston A, Humblet-Baron S, Duffy D, Goris A (2021) Human immune diversity: from evolution to modernity. Nat Immunol 22(12):1479–1489
Lv FH, Cao YH, Liu GJ, Luo LY, Lu R, Liu MJ et al. (2022) Whole-genome resequencing of worldwide wild and domestic sheep elucidates genetic diversity, introgression, and agronomically important loci. Mol Biol Evol 39(2):msab353
Ma L, O’Connell JR, VanRaden PM, Shen B, Padhi A, Sun C et al. (2015) Cattle sex-specific recombination and genetic control from a large pedigree analysis. PLoS Genet 11(11):e1005387
MacGregor P, Nene V, Nisbet RER (2021) Tackling protozoan parasites of cattle in sub-Saharan Africa. PLoS Pathog 17(10):e1009955
MacHugh DE, Shriver MD, Loftus RT, Cunningham P, Bradley DG (1997) Microsatellite DNA variation and the evolution, domestication and phylogeography of taurine and zebu cattle (Bos taurus and Bos indicus). Genetics 146(3):1071–1086
Malomane DK, Reimer C, Weigend S, Weigend A, Sharifi AR, Simianer H (2018) Efficiency of different strategies to mitigate ascertainment bias when using SNP panels in diversity studies. BMC Genomics 19(1):22
Marshall K, Gibson JP, Mwai O, Mwacharo JM, Haile A, Getachew T et al. (2019) Livestock genomics for developing countries - African examples in practice. Front Genet 10:297
Mbole-Kariuki MN, Sonstegard T, Orth A, Thumbi SM, Bronsvoort BM, Kiara H et al. (2014) Genome-wide analysis reveals the ancient and recent admixture history of East African Shorthorn Zebu from Western Kenya. Heredity (Edinb) 113(4):297–305
McHugo GP, Browett S, Randhawa IAS, Howard DJ, Mullen MP, Richardson IW et al. (2019) A population genomics analysis of the native Irish Galway sheep breed. Front Genet 10:927
McQuillan MA, Ranciaro A, Hansen MEB, Fan S, Beggs W, Belay G et al. (2022) Signatures of convergent evolution and natural selection at the alcohol dehydrogenase gene region are correlated with agriculture in ethnically diverse Africans. Mol Biol Evol 39(10):msac183
McTavish EJ, Hillis DM (2014) A genomic approach for distinguishing between recent and ancient admixture as applied to cattle. J Hered 105(4):445–456
McTavish EJ, Hillis DM (2015) How do SNP ascertainment schemes and population demographics affect inferences about population history? BMC Genomics 16(1):266
Milanesi M, Capomaccio S, Vajana E, Bomba L, Fernando Garcia J, Ajmone-Marsan P et al. (2017) BITE: an R package for biodiversity analyses. bioRxiv: 181610. https://www.biorxiv.org/content/10.1101/181610v1
Mrode R, Ojango JMK, Okeyo AM, Mwacharo JM (2019) Genomic selection and use of molecular tools in breeding programs for indigenous and crossbred cattle in developing countries: current status and future prospects. Front Genet 9:694
Müller K, Wickham H (2023) tibble: Simple Data Frames. https://tibble.tidyverse.org
Murray M, Black SJ (1985) African trypanosomiasis in cattle: working with nature’s solution. Vet Parasitol 18(2):167–182
Mwai O, Hanotte O, Kwon YJ, Cho S (2015) African indigenous cattle: unique genetic resources in a rapidly changing world. Asian-Australas J Anim Sci 28(7):911–921
Nicolazzi EL, Caprera A, Nazzicari N, Cozzi P, Strozzi F, Lawley C et al. (2015) SNPchiMp v.3: integrating and standardizing single nucleotide polymorphism data for livestock species. BMC Genomics 16(1):283
Niimura Y, Nei M (2007) Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS ONE 2(8):e708
O’Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M et al. (2014) A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet 10(4):e1004234
O’Gorman GM, Park SD, Hill EW, Meade KG, Coussens PM, Agaba M et al. (2009) Transcriptional profiling of cattle infected with Trypanosoma congolense highlights gene expression signatures underlying trypanotolerance and trypanosusceptibility. BMC Genomics 10:207
Ooms J (2023) magick: Advanced Graphics and Image-Processing in R. https://docs.ropensci.org/magick
Orecchioni M, Matsunami H, Ley K (2022) Olfactory receptors in macrophages and inflammation. Front Immunol 13:1029244
Paradis E, Schliep K (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35(3):526–528
Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12):e190
Payseur BA, Rieseberg LH (2016) A genomic perspective on hybridization and speciation. Mol Ecol 25(11):2337–2360
Pedersen TL (2023) patchwork: The Composer of Plots. https://patchwork.data-imaginist.com
Pickrell JK, Pritchard JK (2012) Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet 8(11):e1002967
Pina-Martins F, Silva DN, Fino J, Paulo OS (2017) Structure_threader: An improved method for automation and parallelization of programs structure, fastStructure and MavericK on multicore CPU systems. Mol Ecol Resour 17(6):e268–e274
Pogorevc N, Dotsev A, Upadhyay M, Sandoval-Castellanos E, Hannemann E, Simcic M et al. (2024) Whole-genome SNP genotyping unveils ancestral and recent introgression in wild and domestic goats. Mol Ecol 33(1):e17190
Porto Neto LR, Barendse W (2010) Effect of SNP origin on analyses of genetic diversity in cattle. Anim Prod Sci 50(8):792–800
R Core Team (2023) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria. https://www.r-project.org
Racimo F, Sankararaman S, Nielsen R, Huerta-Sanchez E (2015) Evidence for archaic adaptive introgression in humans. Nat Rev Genet 16(6):359–371
Radwan J, Babik W, Kaufman J, Lenz TL, Winternitz J (2020) Advances in the evolutionary understanding of MHC polymorphism. Trends Genet 36(4):298–311
Raj A, Stephens M, Pritchard JK (2014) fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197(2):573–589
Ring SC, Purfield DC, Good M, Breslin P, Ryan E, Blom A et al. (2019) Variance components for bovine tuberculosis infection and multi-breed genome-wide association analysis using imputed whole genome sequence data. PLoS ONE 14(2):e0212067
Rosen BD, Bickhart DM, Schnabel RD, Koren S, Elsik CG, Tseng E et al. (2020) De novo assembly of the cattle reference genome with single-molecule sequencing. Gigascience 9(3):giaa021
Salter-Townshend M, Myers S (2019) Fine-scale inference of ancestry segments without prior knowledge of admixing groups. Genetics 212(3):869–889
Santos PS, Kellermann T, Uchanska-Ziegler B, Ziegler A (2010) Genomic architecture of MHC-linked odorant receptor gene repertoires among 16 vertebrate species. Immunogenetics 62(9):569–584
Schnabel RD (2018) ARS-UCD1.2 Cow Genome Assembly: mapping of all existing variants. https://www.animalgenome.org/repository/cattle/UMC_bovine_coordinates
Sempéré G, Moazami-Goudarzi K, Eggen A, Laloë D, Gautier M, Flori L (2015) WIDDE: a Web-Interfaced next generation database for genetic diversity exploration, with a first application in cattle. BMC Genomics 16:940
Slowikowski K (2023) ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. https://ggrepel.slowkow.com
Steverding D (2008) The history of African trypanosomiasis. Parasit Vectors 1(1):3
Sun L, Qu K, Liu Y, Ma X, Chen N, Zhang J et al. (2023) Assessing genomic diversity and selective pressures in Bashan cattle by whole-genome sequencing data. Anim Biotechnol 34(4):835–846
Syrstad O (1985) Heterosis in Bos taurus × Bos indicus crosses. Livest Prod Sci 12(4):299–307
Tan T, Atkinson EG (2023) Strategies for the genomic analysis of admixed populations. Annu Rev Biomed Data Sci 6:105–127
Taylor SA, Larson EL (2019) Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat Ecol Evol 3(2):170–177
Tigano A, Friesen VL (2016) Genomics of local adaptation with gene flow. Mol Ecol 25(10):2144–2164
Upadhyay M, Bortoluzzi C, Barbato M, Ajmone-Marsan P, Colli L, Ginja C et al. (2019) Deciphering the patterns of genetic admixture and diversity in southern European cattle using genome-wide SNPs. Evol Appl 12(5):951–963
Upadhyay MR, Chen W, Lenstra JA, Goderie CR, MacHugh DE, Park SD et al. (2017) Genetic origin, admixture and population history of aurochs (Bos primigenius) and primitive European cattle. Heredity (Edinb) 118(2):169–176
Utsunomiya YT, Milanesi M, Fortes MRS, Porto‐Neto LR, Utsunomiya ATH, Silva MVGB et al. (2019) Genomic clues of the evolutionary history of Bos indicus cattle. Anim Genet 50(6):557–568
van den Brand T (2023) ggh4x: Hacks for ‘ggplot2’. https://teunbrand.github.io/ggh4x/
Verdugo MP, Mullin VE, Scheu A, Mattiangeli V, Daly KG, Maisano Delser P et al. (2019) Ancient cattle genomics, origins, and rapid turnover in the Fertile Crescent. Science 365(6449):173–176
Wang K, Lenstra JA, Liu L, Hu Q, Ma T, Qiu Q et al. (2018) Incomplete lineage sorting rather than hybridization explains the inconsistent phylogeny of the wisent. Commun Biol 1(1):169
Wang LG, Lam TT, Xu S, Dai Z, Zhou L, Feng T et al. (2020a) Treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Mol Biol Evol 37(2):599–603
Wang MS, Thakur M, Peng MS, Jiang Y, Frantz LAF, Li M et al. (2020b) 863 genomes reveal the origin and domestication of chicken. Cell Res 30(8):693–701
Ward JA, McHugo GP, Dover MJ, Hall TJ, Ng’ang’a SI, Sonstegard TS et al. (2022) Genome-wide local ancestry and evidence for mitonuclear coadaptation in African hybrid cattle populations. iScience 25(7):104672
Wickham H (2009) ggplot2: Elegant Graphics for Data Analysis. Springer: New York
Wickham H (2023) stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org
Wickham H, François R, Henry L, Müller K, Vaughan D (2023a) dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org
Wickham H, Hester J, Bryan J (2023b) readr: Read Rectangular Text Data. https://readr.tidyverse.org
Wickham H, Pedersen TL, Seidel D (2023c) scales: Scale Functions for Visualization. https://scales.r-lib.org
Wickham H, Vaughan D, Girlich M (2023d) tidyr: Tidy Messy Data. https://tidyr.tidyverse.org
Wilke CO, Wiernik BM (2022) ggtext: Improved Text Rendering Support for ‘ggplot2’. https://wilkelab.org/ggtext
Wragg D, Cook EAJ, Latre de Late P, Sitt T, Hemmink JD, Chepkwony MC et al. (2022) A locus conferring tolerance to Theileria infection in African cattle. PLoS Genet 18(4):e1010099
Wu DD, Ding XD, Wang S, Wojcik JM, Zhang Y, Tokarska M et al. (2018) Pervasive introgression facilitated domestication and adaptation in the Bos species complex. Nat Ecol Evol 2(7):1139–1145
Wu J, Liu Y, Zhao Y (2021) Systematic review on local ancestor inference from a mathematical and algorithmic perspective. Front Genet 12:639877
Yaro M, Munyard KA, Stear MJ, Groth DM (2016) Combatting African Animal Trypanosomiasis (AAT) in livestock: The potential role of trypanotolerance. Vet Parasitol 225:43–52
Yu G (2022) Data Integration, Manipulation and Visualization of Phylogenetic Trees, 1st edn. Chapman and Hall/CRC: New York
Zeder MA (2017) Out of the Fertile Crescent: The dispersal of domestic livestock through Europe and Africa. In: Petraglia M, Boivin N, Crassard R (eds) Human Dispersal and Species Movement: From Prehistory to the Present. Cambridge University Press: Cambridge, pp 261–303
Zegeye T, Belay G, Vallejo-Trujillo A, Han J, Hanotte O (2023) Genome-wide diversity and admixture of five indigenous cattle populations from the Tigray region of northern Ethiopia. Front Genet 14:1050365
Ziegler A, Santos PS, Kellermann T, Uchanska-Ziegler B (2010) Self/nonself perception, reproduction and the extended MHC. Self Nonself 1(3):176–191
Acknowledgements
We thank Morris Agaba, Olivier Hanotte, Stephen J. Kemp, John A. Browne, Daniel G. Bradley, and Stephen V. Gordon for assistance with sample resources and for useful scientific discussion. This research work was funded by Science Foundation Ireland (SFI) under Investigator Programme Awards (grant nos: SFI/01/F.1/B028 and SFI/15/IA/3154). JAW was supported by the Centre for Research Training in Genomics Data Science (grant no. SFI/18/CRT/6214).
Author information
Contributions
GPM was responsible for analysis, data curation, lab work, interpretation of results, study design, visualisation, and writing—original draft. JAW was responsible for data provision, lab work, interpretation of results, and writing—review & editing. SIN was responsible for data provision, interpretation of results, and writing—review & editing. LAF was responsible for data provision, interpretation of results, and writing—review & editing. MST was responsible for interpretation of results, software provision, and writing—review & editing. EWH was responsible for lab work, sample collection and provision, and writing—review & editing. GMO was responsible for lab work, sample collection and provision, and writing—review & editing. KGM was responsible for lab work, sample collection and provision, and writing—review & editing. TJH was responsible for guidance and writing—review & editing. DEM was responsible for data provision, funding acquisition, lab work, interpretation of results, sample collection and provision, study design, supervision, and writing—original draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Research Ethics Statement
For this study, new Illumina® BovineHD 777 K BeadChip SNP data sets were generated for 39 individuals (23 Somba, 8 N’Dama, and 8 Boran). The Somba data were generated from DNA samples previously used in published microsatellite-based surveys of cattle genetic diversity in the early 1990s, and the N’Dama and Boran data were generated from unpublished DNA samples collected during a time-course infection experiment carried out in 2003. This livestock DNA sampling work was completed prior to the requirement for Institutional Permission in Ireland, which is based on European Union Directive 2010/63/EU; however, all efforts were made to ensure ethical handling of all animal subjects.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor: Armando Caballero.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
McHugo, G.P., Ward, J.A., Ng’ang’a, S.I. et al. Genome-wide local ancestry and the functional consequences of admixture in African and European cattle populations. Heredity 134, 49–63 (2025). https://doi.org/10.1038/s41437-024-00734-w
DOI: https://doi.org/10.1038/s41437-024-00734-w








