Introduction

Horse power revolutionized human history, arming ancient societies with fast mobility, new ways to make war and farm1. As a prominent symbol of prestige, the horse also deeply impacted past social systems2 by becoming a central element in elite rituals3. While once a matter of intense scientific debates, the domestication homeland has now been identified in the steppes of the lower Don-Volga in the second half of the third millennium Before Common Era (BCE)4. This process involved close reproductive control, shortened generation times, and a severe demographic bottleneck between ~2700 BCE and ~2200 BCE5. This was followed by a rapid expansion across Eurasia ~2200 BCE, initiating the era of large-scale horse-based mobility in human history, ultimately resulting in the near-total replacement of local horse lineages4,5. As a result, all modern domestic horses living on the planet belong to the DOM2 lineage.

Earlier archaeological evidence from Botai, Kazakhstan (~3500 BCE) has suggested horse milking, harnessing and corralling, indicative of horse husbandry6. This interpretation aligns well with the genetic signatures of reproductive control found on site5. The underlying process was, however, almost completely independent genetically from DOM2 domestication, and remained geographically isolated, with Przewalski’s horses as its only surviving descendants7.

Today, the vast majority of horses belong to domestic bloodlines or their feral relatives. This stands in stark contrast to the rich diversity of wild horse populations that once roamed Eurasia at the time of domestication4,5,8. When, how, and why these wild lineages disappeared, and whether they contributed to the genetic makeup or characteristics of domesticated horses, remain open questions.

Iberia (present-day Spain and Portugal) provides a striking example of this lost diversity. Recent work has revealed the presence of a genetically-divergent, now-extinct lineage in the region—IBE—morphologically similar to other horses2. IBE carried genetic ancestry from a highly divergent, uncharacterized ghost lineage, particularly evident in its unique Y-chromosome haplogroup2. To date, only six IBE samples have been genetically characterized, spanning the Early Neolithic (Cova Fosca, ~5180 BCE)4, to the Bronze Age (El Acequión, ~1900 BCE)2. Although archaeological bone assemblages throughout the Upper Palaeolithic, Mesolithic and Neolithic periods have suggested favourable climatic conditions for horses in Iberia9,10,11, the question of whether the area was only inhabited by IBE horses remains unresolved, owing to the limited genetic data available. Key uncertainties also persist regarding (i) the timing of IBE extinction and potential local domestication, possibly as early as the Late Neolithic-Chalcolithic transition12,13,14, and; (ii) the introduction of DOM2 bloodlines in Iberia, with the earliest genetically characterized DOM2 horse dating to the Early Iron Age (Els Vilars, ~8th–7th centuries BCE)2,4. While the genetic legacy of IBE into modern domestic bloodlines appears minimal, the extent to which past breeding practices were permissive to admixture between DOM2 and local IBE horses remains largely unknown15. Additionally, Iberian horses were highly valued in Antiquity, as attested by Greek and Latin sources describing their military use until the first century BCE and their prominence in Roman chariot racing16,17. During the Iron Age, they served as riding and working animals, as a symbol of prestige with a prominent role in ritual practices, and, to a lesser extent, as a source of meat3. 
However, how Iberian bloodlines evolved in later periods remains unclear, particularly during the ~800-year-long Muslim rule of al-Andalus18, and the colonization of the New World19.

In this study, we generated genome-wide data from 87 archaeological horse remains to map the genetic diversity of horses in Iberia and across the Mediterranean during the last ~26,800 years. We identified the IBE lineage in Iberian horses from the Upper Palaeolithic, with the latest confirmed individual dating to ~571 BCE (734–408 BCE). The earliest DOM2 Iberian horse in our dataset dated to ~1213–1048 BCE, though the lineage arrived earlier, as evidenced by multiple admixture events with IBE beginning ~1842 BCE (1973–1711 BCE) and continuing until ~358 BCE (557–159 BCE). This suggests that domesticated horses were not managed in complete isolation from wild populations and that IBE likely became extinct by the last quarter of the first millennium BCE. Finally, DOM2 horses across Europe exhibited strong genetic ties to Iberian populations between ~1200 BCE and ~500 CE, including during Roman times. This underscores Iberia as a major horse-breeding centre throughout the Iron Age and Antiquity, an influence that endured until the colonization of the Americas.

Results

Genome data set

A total of 87 horse archaeological remains, including petrosal bones, teeth and postcranial elements, were extracted for DNA at the CAGT ancient DNA facilities (Toulouse, France) (Fig. 1; Supplementary Data 1). Low-depth sequencing of triple-indexed DNA libraries on the Illumina MiniSeq instrument revealed 26 females and 61 males showing endogenous DNA content compatible with relatively economical genome characterization by shotgun sequencing (median = 27.21%, range = 2.03–89.75%) (Supplementary Data 1). mapDamage220 base compositional profiles, and PMDtools21 nucleotide mis-incorporation profiles, were consistent with the molecular tools used for generating ancient DNA sequences, including USER enzymatic treatment and DNA library construction22, supporting data authenticity (Supplementary Figs. 1, 2 and 3). A total of 18.46 billion read pairs were produced on Illumina HiSeq4000 and NovaSeq 6000 instruments, of which 4.55 billion could be uniquely aligned with confidence against the horse reference genome, resulting in genome depth-of-coverage values between 0.10-fold and 4.12-fold (median = 0.97-fold; Supplementary Data 1).

Fig. 1: Geographic locations and genetic group labels associated with the samples studied.

a Worldwide distribution of the horse genome panel analyzed in this study, restricted to 347 ancient specimens (Supplementary Data 1). Those specimens sequenced for the first time or previously reported are shown with open triangles and circles, respectively. Group labels are defined and coloured according to previous studies, with short names indicated between brackets. b Zoom across Europe and the Mediterranean range. Numbers refer to individual archaeological sites, as indicated in Supplementary Data 1. Some of the numbers pointing to northeastern Iberian sites are framed to enhance visibility. c Schematic representation of the Iberia timeline cultures. Newly sequenced samples from outside Iberia are shown at the bottom, with grey bars representing underlying chronologies.

The vast majority of the newly characterized genomes originated from Iberia (N = 63) with the remaining 24 originating from specimens excavated in France (N = 16), Italy (N = 3), Morocco (N = 4) and Israel (N = 1), to map possible Iberian genetic affinities and trade networks across the Mediterranean (Fig. 1b, c). The new genomes covered the Late Pleistocene (N = 4; ~25,265–14,179 BCE) to the Modern Period (N = 1; >1656 CE), with a majority, however, belonging to the Iron Age (N = 48), Roman Republic (N = 11), and the Roman Empire/Late Antiquity time periods (N = 5). The newly characterized genomes were combined with a total of 261 ancient horse genomes previously characterized encompassing the whole range of genetic lineages found across Eurasia (Fig. 1a; Supplementary Fig. 4; Supplementary Data 1). They were supplemented with the genomes of 83 modern domestic breeds from around the world, one modern and one historical Przewalski’s horse, and two donkey outgroups, to investigate the genetic landscape of Iberian horses from before and after the Last Glacial Maximum (LGM; 26.5-20/19 kyrs cal. Before Present (BP))23, as well as their possible legacy to the present day.

IBE survived in Iberia throughout LGM to the Iron Age

Pseudo-haploidization of sequence data led to the recovery of 7.12 million autosomal transversion positions in at least 80% of specimens considered, including donkey outgroups (median missingness = 7.614%; range = 0.003–92.890%). Principal Component Analysis (PCA), excluding outgroups and divergent Late Pleistocene American horses, revealed that the majority of the newly sequenced specimens (N = 73) co-plotted with DOM2 (D2) (Fig. 2a, Supplementary Figs. 5, 6 and 7). Strikingly, the remaining 14 specimens occupied the most positive and negative scores along Principal Components 1 and 2 (PC1 and PC2), respectively, together with previously identified IBE horses2. PC2 scores discriminated mostly those lineages from the Ural Mountains (UR), Central Asia (BOTAI, BO; BORLY4, BL; NEOBOR, NB; NOVOLINKA, NV; KROKALEV, KR, and; TERSEK, TE), to northeastern Siberia (Equus lenensis, EL) (Supplementary Data 1).
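The site selection described above can be illustrated with a minimal Python sketch. It assumes that pseudo-haploidization means drawing one observed base at random per covered site, which is the usual meaning of the term but is not detailed in this section. The function names, the toy input layout and the seed are hypothetical and do not reproduce the pipeline used in the study.

```python
import random

# Purine/pyrimidine allele pairs (assumption: sites are biallelic SNPs).
TRANSVERSIONS = {frozenset("AC"), frozenset("AT"), frozenset("GC"), frozenset("GT")}

def is_transversion(ref, alt):
    """True when the two alleles form a transversion."""
    return frozenset((ref, alt)) in TRANSVERSIONS

def pseudo_haploidize(reads_at_site, rng):
    """Draw one observed base at random; None marks a site with no coverage."""
    return rng.choice(reads_at_site) if reads_at_site else None

def call_panel(pileups, alleles, min_call_rate=0.8, seed=0):
    """pileups[sample][site] lists observed bases; alleles[site] = (ref, alt).

    Returns {site: {sample: base or None}} for transversion sites called
    in at least min_call_rate of the samples (hypothetical helper).
    """
    rng = random.Random(seed)
    samples = sorted(pileups)
    kept = {}
    for site, (ref, alt) in alleles.items():
        if not is_transversion(ref, alt):
            continue  # transitions are discarded
        calls = {s: pseudo_haploidize(pileups[s].get(site, []), rng) for s in samples}
        # Bases matching neither allele are treated as missing.
        calls = {s: (b if b in (ref, alt) else None) for s, b in calls.items()}
        call_rate = sum(b is not None for b in calls.values()) / len(samples)
        if call_rate >= min_call_rate:
            kept[site] = calls
    return kept
```

Restricting the panel to transversions is a common way to limit the impact of post-mortem cytosine deamination, which mostly produces transition-type errors.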

Fig. 2: Autosomal genetic affinities.

a Principal Component Analysis (PCA) of 431 ancient and modern specimens based on unlinked autosomal transversions (MAF = 1%). The first two principal components summarize 1.67% and 1.12% of the total genetic variation, respectively. b Neighbor-Joining phylogenetic tree based on the full set of autosomal transversions. Node support was assessed from 100 bootstrap pseudo-replicates, and is indicated if greater or equal to 80%. The DOM2 clade (N = 311) was collapsed for readability (see Supplementary Fig. 8 for the full tree). Colours and symbols are according to Fig. 1a.

Autosomal Neighbor-Joining phylogenetic reconstruction confirmed that the members of the IBE cluster formed a monophyletic group, receiving maximal (100%) bootstrap support (Fig. 2b; Supplementary Fig. 8). One Middle Bronze Age mare from Moncín (MNP20_Spa_m1558), however, slightly deviated from the other PCA cluster members. It was the first to diverge within the IBE phylogenetic clade, potentially reflecting a more complex genomic make-up.

The autosomal phylogeny also confirmed that several lineages of various genetic affinities once inhabited Europe. IBE showed closest affinities to lineages of Late Pleistocene from southern France (SF), as well as from northern France and Britain (NF). Another related cluster, more divergent from IBE, was also formed by horses from the fourth to third millennium BCE of Denmark (FB), Germany (CWC), Poland (POL), and the Carpathian basin (Czechia, CZ, and; Hungary, HU) (Fig. 2b; Supplementary Fig. 8).

The mitogenome phylogeny displayed, however, a different pattern compared to the autosomal phylogeny, in line with previous reports indicating incomplete lineage sorting and limited phylogeographic resolution for mitochondrial DNA in horses24. Interestingly, the majority of specimens showing autosomal affinity to IBE, including the distinctive MNP20_Spa_m1558 specimen, grouped together within three strongly supported mitochondrial clusters, showing SF or NF members as their closest relatives (Fig. 3a, Supplementary Fig. 9). In contrast, all eight newly characterized IBE male specimens grouped together with the three previously sequenced, forming a Y chromosomal clade supported by 100% bootstrap values (Fig. 3b, Supplementary Fig. 10). The latter cluster also included a specimen from Denmark (DJM130x6_Den_m3011; FB), dated to the transition from the fourth to the third millennium BCE. This specimen was known for carrying one of the most divergent Y chromosomal haplotypes in Eurasia2,4, which is now linked to IBE.

Fig. 3: Maximum Likelihood (ML) phylogenetic relationships for uniparental markers.

a Mitochondrial DNA. b Y-chromosome. Tip labels were removed for clarity, except for outgroups and members of the IBE lineage, either isolated or grouping within larger, supported monophyletic clades. Fully annotated trees are provided in Supplementary Figs. 9 and 10. Node support was assessed from 1,000 bootstrap pseudo-replicates and ultrafast bootstrap approximation, and is indicated if greater or equal to 80%. Colours and symbols are according to Fig. 1a.

Combined, our autosomal, mitochondrial and Y-chromosomal phylogenetic analyses reveal that the divergent IBE lineage was genetically related to other wild lineages from western Europe, especially from France (Late Pleistocene). It was present in Iberia from at least ~26.8 kya (Z51_Spa_m24857)25 to ~2.5 kya (CBC11_Spa_m571) (Supplementary Data 1), which considerably extends its temporal range, previously documented from ~7.2 to ~3.9 kya2,4. The genetic continuity observed from the Late Pleistocene to the Early Iron Age also demonstrates that the IBE lineage survived during the LGM in an Ice Age refugium located south of the Pyrenean mountains, in line with climate niche modelling indicating palaeoenvironmental conditions viable for horses26.

Interestingly, IBE δ13C and δ15N isotopic profiles remained stable through time from the Late Pleistocene to ~2.5 kya, suggesting limited changes in diet and habitat after the LGM. While IBE δ13C values were within the range measured in other horse genetic lineages (Supplementary Fig. 11a), their δ15N values occupied the lower distribution range (Supplementary Fig. 11b). The δ13C and δ15N values in herbivores usually reflect those of the plants they consume, which vary with a range of natural factors and anthropogenic practices27. In a C3-dominated environment such as Iberia, plant δ13C and δ15N values will primarily vary depending on plant type (with cereals showing higher δ13C values, versus lower δ15N values for legumes), plant parts (with leaves and stems showing higher δ13C and δ15N values relative to grains), and habitat openness (with forested areas associated with lower δ13C and δ15N values)28. Climate also plays a dominant role in δ15N values, with plants growing in drier conditions exhibiting higher measurements27. Last, domesticated animals consuming protein-rich diets or plants grown on soils fertilized with manure, either intentionally or as a result of repeated pasture use, show increased δ15N values27. The consistent δ15N differences between IBE and DOM2 horses across multiple sites and time periods, combined with the similar δ13C values between lineages, suggest systematic differences in habitat use, forage type, or physiology rather than purely regional effects. The low δ15N values of IBE horses align with those of other Pleistocene horses (such as LPSFR (SF)), supporting continuity in habitat and forage. In contrast, DOM2 horses exhibit increasingly higher δ15N values and a broader δ13C range, potentially due to foraging in open grasslands and increasingly human-controlled diets or habitats. We conclude that the observed differences could result from a combination of (1) distinct habitats (wetter forests for IBE, drier grasslands for DOM2), and; (2) differing diets (lower-protein forage in natural habitats such as legumes and grass for IBE, compared to increasingly anthropogenically-impacted habitats with higher-quality cereals and grasses for DOM2).

Horse genetic affinities in Europe pre-domestication

To further explore the relationship between IBE and other wild horse populations in Europe, we applied AdmixtureBayes29 graph modelling to a selection of seven major genetic horse lineages identified in Europe to date4,5. Lineages from the Carpathian basin and Poland, which cluster together in our autosomal phylogenetic tree (Fig. 2b, Supplementary Fig. 8), and are known to share genetic affinities with the Late Neolithic CWC5 lineage from Germany, were excluded to limit model complexity and computational time, while improving MCMC chain mixing efficiency.

The posterior graph distribution was dominated by a single model, representing 56.5% of the sampled graphs (Fig. 4a), with only two additional models occurring at frequencies above 5% (7.0% and 6.4%, respectively) (Fig. 4b, c). All three models depict a complex history of lineage divergence and admixture, in which Late Pleistocene populations from northern France and the UK (LPNFR, NF) emerged as some of the earliest lineages investigated to expand and differentiate into Europe (divergence node n8). The descending populations further differentiated into distinct regional lineages, shaped by contributions from other, more divergent groups. For instance, the three models consistently support the IBE lineage tracing 1.7–6.0% of its ancestry back to a divergent population best modelled as the root and showing some of the largest branch drift estimates (admixture node A1; Fig. 4a–c). The deep branching of this contribution indicates the presence of an ancient horse lineage that had arrived in Europe much earlier, and survived in the Iberian climatic refugium. Furthermore, the Eneolithic population from Romania (ENEOROM, NR) is modelled to primarily derive from a node also forming the URAL (UR) population (n1 in the first two models, and n6 in the third). This node branches deep in the history of lineage divergence (n10), prior to the radiation leading to all other lineages from Europe, indicating an expansion into Europe separate from the one that formed NF and its descendant populations.

Fig. 4: AdmixtureBayes29 population graph modelling of seven population groups representing major genetic horse lineages in Europe.

a Population graph associated with the top posterior probability (0.565). b Population graph showing the second largest posterior probability (0.070). c Population graph showing the third largest posterior probability (0.064). No other population graphs account for more than 0.033 of the posterior distribution. Genetic drift estimates are reported along each branch, multiplied by 100, for readability, with values ≥ 10 highlighted in bold. Divergence nodes (n), exempt of admixture, are shown with ellipses coloured in dark grey, while admixture nodes (A) are shown with squares coloured in light grey, with percentages reflecting genetic contribution from the main contributing source. The fraction contributed by the second admixture source is the complement of this percentage to 100%. Branches associated with a proportionally larger admixture proportion are highlighted.

In all three models, Late Pleistocene populations from southern France (LPSFR, SF) derived most of their ancestry from the same expansion that led to the formation of NF populations (admixture node A3), with 77.6–80.9% tracing back to node n9, immediately ancestral to n8. However, the remaining fraction of the SF genetic makeup originated from a population directly related to IBE (n2). As unadmixed members of the IBE lineage were only found in Iberia, and thus likely differentiated there, this suggests that an IBE-related population expanded northward into southern France, where it formed SF by mixing with its closest local neighbours.

The three models differ in the population history underlying the formation of the NR lineage and those occupying Germany ~2700 BCE (CWC) and Denmark ~3000 BCE (FBPWC, FB). The first two models indicate that CWC- and FB-related lineages contributed to the formation of NR, located further east, likely as populations related to both lineages expanded eastward towards Romania and mixed with the divergent local lineage (admixture node A6; Fig. 4a, b). In contrast, the third model supports a reverse scenario, as an NR-related lineage (node n4) appears central in the formation of both CWC and FB lineages. This implies that a population related to the local lineage from Romania (NR) spread westward into Germany and Denmark, where it admixed with the descendants of the earlier expansion that established the NF lineage (Fig. 4c). While further genome-wide time-series are needed to fully resolve the complexity of these expansions and admixture events, the three population graphs reveal a history of significant genetic drift leading to FB and CWC populations. This aligns with early-to-mid Holocene bone assemblages, which indicate that horse populations were small and sparsely distributed across the continent during this period30.

The rise of DOM2 horses in Iberia

PCA and phylogenetic analyses identified specimen MNP20_Spa_m1558 as an outlier member of the IBE lineage. This was confirmed by qpAdm31 modelling strongly rejecting MNP20_Spa_m1558 as entirely descending from the single IBE cluster (p-values = 0; Supplementary Data 2). Furthermore, ADMIXTURE32 analyses based on the optimal K = 2 number of ancestry components identified by cross-validation indicated that the ancestry component maximized in IBE individuals (and other non-DOM2 lineages) accounted for only 79.7–80.2% of its genetic ancestry (Supplementary Fig. 12). Considering the same number of components, the Struct-f433 profile of sample MNP20_Spa_m1558 also comprised a higher fraction of the ancestry component maximized in DOM2 horses than other IBE horses (Supplementary Fig. 13). We thus explored possible contributions other than IBE, considering all other genetic lineages identified in Europe, as well as DOM2.

qpAdm modelling implementing the ‘rotating’ strategy from Harney and colleagues34, testing one possible source at a time while considering all others as outgroup (‘right’) populations, revealed DOM2 as the only compatible source (p-values > 0.01; Supplementary Data 2). Population graph modelling with OrientAGraph35, including all known possible candidate sources from Europe, also supported the presence of gene flow from the DOM2 lineage into MNP20_Spa_m1558, indicating an admixture proportion of 30.6–34.4% (Supplementary Figs. 14 and 15). These analyses align with D-statistics of the form (EA, IBE; DOM2*, MNP20_Spa_m1558), explicitly testing whether MNP20_Spa_m1558 shares more derived mutations with members of the IBE lineage relative to the earliest DOM2 members known in Europe (DOM2*). The significantly positive D-statistics support an excess of IBE ancestry in MNP20_Spa_m1558 (Fig. 5a; Supplementary Fig. 16), reaching a fraction of 58.91% (range = 57.8–60.0%), according to F4-ratios31 (Z-score = 36.83; Fig. 5c).
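The logic of a D-statistic of the form (EA, IBE; DOM2*, X) can be sketched in a few lines of Python. This is a simplified illustration that assumes per-site derived-allele frequencies are already available for the four groups and that genomic blocks have equal size. The function names are hypothetical, and the sketch is not the software used in the study.

```python
from math import sqrt

def d_statistic(freqs):
    """freqs: list of (p1, p2, p3, p4) derived-allele frequencies per site.

    D = sum[(p1 - p2)(p3 - p4)] / sum[(p1 + p2 - 2 p1 p2)(p3 + p4 - 2 p3 p4)].
    With an outgroup in position 1, D > 0 when population 4 shares more
    derived alleles with population 2 than population 3 does.
    """
    num = sum((p1 - p2) * (p3 - p4) for p1, p2, p3, p4 in freqs)
    den = sum((p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
              for p1, p2, p3, p4 in freqs)
    return num / den

def jackknife_z(freqs, n_blocks):
    """Z-score from a delete-one-block jackknife over contiguous blocks.

    Assumption: blocks have equal size; trailing sites that do not fill
    a complete block are ignored.
    """
    size = len(freqs) // n_blocks
    used = freqs[:n_blocks * size]
    estimates = []
    for b in range(n_blocks):
        kept = used[:b * size] + used[(b + 1) * size:]
        estimates.append(d_statistic(kept))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return d_statistic(used) / sqrt(var)
```

With pseudo-haploid calls, each frequency is simply 0 or 1, so a site scored (0, 1, 0, 1) adds one count in favour of shared ancestry between X and IBE, while (0, 1, 1, 0) adds one count against it.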

Fig. 5: Admixture between IBE and DOM2 horses.

a Significant D-statistics tests of the form (EA, IBE; DOM2*, X), where IBE (N = 19), X is an ancient horse (N = 16), and DOM2* define the earliest European DOM2 horses present in our genome panel (N = 8). Results are shown in red and presented as D values ± standard errors. The statistical significance was assessed following Bonferroni correction for multiple testing. b DATES36 estimates for the timeline of admixture between DOM2 and IBE. Midpoint estimates are shown in black, with confidence intervals in orange. Crossed circles indicate the radiocarbon date of the specimens analyzed, or the age inferred from archaeological context. c F4-ratio31 estimates for IBE (pink) and DOM2 (orange) genetic contributions in four successfully modelled individuals. d Geographic location of the specimens showing evidence of IBE and DOM2 admixture, with those delivering unreliable DATES36 estimates shown with smaller bold, and italicized fonts. Source data for Fig. 5a are provided as a Source Data file.

The predominance of IBE horses in Iberia until the second millennium BCE contrasts with their near disappearance from our data set during the Iron Age. Only one of the 20 IBE specimens investigated here (including MNP20_Spa_m1558, plus six previously published) was radiocarbon dated to the first millennium BCE (CBC11_Spa_m571, 734–408 BCE), in contrast to 48 DOM2 horses from Iberia (38 from this study plus 10 previously published, and associated with Iron Age and Roman Republic contexts) (Fig. 1b, c; Supplementary Data 1). This indicates a history of genetic turnover following the expansion of DOM2 horses, as shown for other wild lineages across Eurasia4,5.

Importantly, significantly positive D-statistics of the form (EA, IBE; DOM2*, X) indicated an excess of IBE ancestry in only 15 DOM2 specimens present in our data set (plus MNP20_Spa_m1558; Fig. 5a; Supplementary Fig. 16). Four of these specimens were not from Iberia, revealing indirect genetic influence of IBE outside their native range, potentially through trade and/or exchange of admixed DOM2 horses originally bred in Iberia. These included one sample from the Roman settlement of Lixus, Morocco (LIX01_Mor_329; 255–402 CE), and three samples from Iron Age settlements in Sicily (Motya, MO2_Ita_m450; 475–425 BCE), southeast Britain (Chalk Hill, Chalk3_UK_m474; 541–406 BCE), and southern France (Pech501_Fra_m200; 200 BCE). The latter was excavated at the Pech Maho oppidum in southern Gaul, approximately 130 kilometres north of the Greek colony of Emporion (Empúries, northeastern Iberia). The site has yielded material evidence of Iberian contacts between ~500 and 200 BCE (Supplementary Text). Statistical power was sufficient to estimate that the IBE lineage contributed 1.22% of the genomic makeup of the admixed specimen from Pech Maho (Z-score = 134.72; Fig. 5c), and 1.65–1.90% (Z-scores = 74.04 and 102.61) in two other specimens from Spain, including MR7_Spa_m1131 from the final Bronze Age settlement of El Morredón, which represents the earliest Iberian DOM2 horse in our data set (1213–1048 BCE). While Z-scores were also largely significant in all 12 other admixed individuals tested (69.97 ≤ Z-scores ≤ 130.57), the confidence range estimated for the IBE ancestry could include zero, or even extend into negative values. This suggests limited sensitivity and that the IBE genetic contribution to the genomes of the remaining horses with non-zero D-statistics was below the minimum successfully estimated value (i.e., 1.22%).

DATES36 analyses for estimating the admixture time between DOM2 and IBE lineages showed significant model fits for all admixed candidates (Fig. 5b), except LIX01_Mor_329 and Chalk3_UK_m474, which were thus conservatively considered false positives (Supplementary Data 3, Fig. 5d). Admixture times were distributed across most of the mid-first millennium BCE, and extended into the first half of the second millennium BCE (Fig. 5b; Supplementary Data 3). The fact that multiple confidence intervals show no overlap suggests several independent admixture events punctuating the Bronze Age and the Iron Age. These findings support husbandry practices allowing almost continuous contact with wild horses. Remarkably, the earliest admixture date was obtained for the MNP20_Spa_m1558 specimen from Moncín, Spain, and was estimated to ~1842 BCE (range = 1973–1711 BCE). This indicates that DOM2 horses had entered the region by the early second millennium BCE, only a few centuries after their initial domestication in the lower Don-Volga basin (Supplementary Data 3).

D-statistics calculations showed that none of the modern breeds investigated shared an excess of IBE ancestry (Supplementary Fig. 17). This supports previous reports of limited IBE genetic legacy into modern domestic horses2, likely due to a relatively early extinction of this lineage. The absence of measurable IBE-related genetic ancestry in the DOM2 ancient horses present in our data set after ~358 BCE (midpoint; range = 557–159 BCE), suggests an extinction time by the end of the Iron Age, and likely prior to the Roman period.

The genetic influence of Iberian horses from the Iron Age onwards

Historical sources highlight the exceptional value of Iberian bloodlines from Antiquity to the Modern Period16,17,37,38,39. To investigate the potential influence of Iberian horse breeding over time, we analyzed ancient DOM2 horses lacking IBE ancestry, covering a period from ~2125 BCE to the modern era (Supplementary Data 1). We used f3-outgroup statistics with donkeys (EA) as an outgroup (Supplementary Fig. 18), to measure genetic affinities between each individual DOM2 horse from Europe and the Mediterranean, and the remaining DOM2 specimens from other archaeological sites. Pairs of samples from the same site or separated by more than 500 years were disregarded. This filter was especially implemented (1) to avoid only capturing within site affinities, (2) to avoid biasing the analysis due to uneven sampling across sites (i.e., those sites with multiple samples would show local affinities that sites represented with a single sample could not), and; (3) to only compare those horses living relatively close in time. This approach allowed us to detect genetic affinities spanning thousands of kilometres (Supplementary Fig. 19). We then mapped the pairs showing the highest detected affinity, visually representing their connection with colour-coded links proportional to the strength of the genetic relationship (Fig. 6).
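The pair filtering and best-match selection described above can be sketched as follows. This Python example is a simplified illustration: it uses the plain product estimator of f3(outgroup; A, B) on pseudo-haploid 0/1 calls, without the bias corrections applied by dedicated software, and all names, field labels and the input layout are hypothetical.

```python
from itertools import combinations

def f3_outgroup(out, a, b):
    """Mean of (out - a) * (out - b) over sites called in all three genomes.

    Larger values indicate more genetic drift shared by a and b since
    their split from the outgroup.
    """
    terms = [(o - x) * (o - y) for o, x, y in zip(out, a, b)
             if None not in (o, x, y)]
    return sum(terms) / len(terms)

def best_partners(samples, outgroup, max_gap_years=500):
    """samples: {name: {"site": str, "year": int, "geno": [0, 1 or None]}}.

    Years BCE are encoded as negative integers (assumption).
    Returns {name: (closest partner, f3 value)} after discarding pairs
    from the same site or separated by more than max_gap_years.
    """
    best = {}
    for a, b in combinations(sorted(samples), 2):
        sa, sb = samples[a], samples[b]
        if sa["site"] == sb["site"]:
            continue  # avoid capturing within-site affinities
        if abs(sa["year"] - sb["year"]) > max_gap_years:
            continue  # compare only horses living close in time
        value = f3_outgroup(outgroup, sa["geno"], sb["geno"])
        for focal, partner in ((a, b), (b, a)):
            if focal not in best or value > best[focal][1]:
                best[focal] = (partner, value)
    return best
```

A sample with no partner passing both filters is simply absent from the result, which keeps isolated specimens from being linked to an arbitrary relative.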

Fig. 6: Map of horse genetic affinities.

a Before 1200 BCE. b 1200 BCE–500 CE. c After 500 CE. Segments highlight sample pairs associated with maximal f3-Outgroup statistics, with colours proportional to values. Pairs involving individuals from the same site or separated by >500 years were disregarded. Crossed-green circles indicate the origin of the sample tested, while purple circles reflect their closest genetic relative. Samples present in the test panel but returning lower f3-Outgroup statistics with the samples tested are shown with grey crosses. Source data are provided as a Source Data file.

Prior to ~1200 BCE, the number of unadmixed DOM2 specimens available was scarce across western Europe (Fig. 6a). However, Central European samples most often exhibited stronger genetic affinities with individuals from present-day western Russia, Kazakhstan, and Iran, suggesting that domestic horses in Europe were not yet locally bred but instead sourced from genetically similar stocks originating near the domestication centre.

In contrast, from ~1200 BCE to 500 CE, most horses analyzed—both within and outside Iberia—found their closest genetic relatives in Iberia or southeastern France (Fig. 6b). This likely reflects the emergence of locally bred bloodlines in the region, and their subsequent diffusion through trade networks across western Europe. Notably, Iberian genetic influence extended over vast distances, reaching as far north as present-day Estonia (Ridala) and Britain (Magor), as far east as Anatolia (Acemhöyük), and as far south as Morocco (Lixus) and Tunisia (Althiburos) (Fig. 6b).

Although our limited dataset was incompatible with extensive analyses beyond 500 CE, we detected a substantial expansion of Iberian genetic influence, extending to both Asia (southern Kazakhstan, Zhanaturmus, and; northeastern Iran, Belgheis), and the Americas (Kaw River and Chorillo Grande; Fig. 6c). The affinities with Asia likely reflect the shared genomic heritage of Iberian and Oriental bloodlines, previously linked to the spread of horses following the Muslim conquest, which impacted both Europe and Asia2. Meanwhile, the genetic signal in the Americas corresponds to the early introduction of domestic horses by the Spanish and Portuguese colonizers (i.e., the Conquistadors)19.

Discussion

Our work uncovers the pervasive genetic influence of Iberian horses across multiple prehistoric, protohistoric, and historical periods. From at least ~26.8 kya, Iberia played host to the highly divergent IBE lineage of wild horses, a distinct genetic stock that received genetic contribution from an archaic lineage reaching Europe much earlier, and that remained separate from most wild populations found elsewhere in Europe, including France (NF), Britain (NF), Germany (CWC), Denmark (FB), Czechia (CZ), and Romania (NR). Despite its regional uniqueness, IBE-related genetic ancestry extended beyond Iberia in the Late Pleistocene, particularly into neighboring populations in southern France (SF). However, the extent of IBE influence further afield and prior to the arrival of domestic horses, especially along the Mediterranean coasts of southern Europe, remains unclear due to the lack of post-LGM genomes from Italy and the Balkans.

The IBE lineage persisted until at least the middle of the first millennium BCE, with the most recent IBE horse in our dataset radiocarbon dated to ~571 BCE (734–408 BCE). Discovered in a pit outside the walls of the settlement at Cabezo de la Cruz (Supplementary Text), its archaeological context suggests that it was a wild horse that had been hunted. The latest detected admixture event between DOM2 and IBE horses (midpoint ~358 BCE; range = 557–159 BCE) indicates that IBE may have survived into the last quarter of the first millennium BCE. However, its survival beyond this period is unlikely, given the absence of IBE genetic ancestry in more recent archaeological remains. Rather than a purely natural extinction, the decline of the IBE lineage may also have been linked to shifts in horse exploitation practices. Fluctuations in population size due to trade networks or warfare could have been significant, especially considering the broad temporal overlap between the demise of IBE and the Punic Wars of the third and second centuries BCE, which opposed Romans and Carthaginians in the Mediterranean theatre16,17. Classical sources indicate that Hannibal mobilized Iberian and African troops before the Second Punic War, transferring over 2000 northern African cavalrymen to Iberia and later moving a significant subset to Italy40. Such large-scale mobilization may have disrupted long-standing practices of wild horse admixture in the region, ultimately contributing to the extinction of the IBE lineage. More research focusing on the second half of the first millennium BCE is needed to clarify the impact of the Punic Wars on Iberian horse breeding traditions.

Our analyses revealed that the vast majority of horses living in Europe and north Africa between 1200 BCE and 500 CE found their closest genetic relatives within Iberia (or southern France, within a short distance from Iberia). This suggests a period during which locally bred Iberian bloodlines were exported across a broad geographic range, positioning Iberia as a major horse production centre shaping the equine population structure of Europe (and north Africa, although the number of individuals analyzed from this region remains limited). This genetic evidence aligns with archaeological findings at the Els Vilars fortress (Lleida, Spain), demonstrating the ability of Iron Age communities in northeastern Iberia to breed and sacrifice horses on a large scale41,42.

More broadly, the first millennium BCE in Iberia was a time of profound transformation, shaped by both internal developments and external influences from the wider Mediterranean world43. This period saw substantial cultural shifts, military conflicts, and large-scale population movements44, culminating in the integration of Iberia into the Roman Republic, as a vital resource hub45. Achieving finer temporal and geographic resolution of this period—and its impact on horse breeding and trade—will require extensive genomic time-series data from across Europe. Mapping identity-by-descent (IBD) patterns between individuals through advanced statistical techniques, including imputation46, is likely to significantly enhance our understanding. This approach has recently provided unprecedented insights into human mobility in Central Asia47 and Europe48 but remains in its early stages for horses, due to the limited availability of global imputation reference panels49.

The period surrounding the rise and fall of the Roman Empire, which has been extensively studied at the human DNA level50,51 but remains poorly understood in horses, warrants particular attention. Despite the 18 Roman and Late Antiquity specimens analyzed in this study, large-scale patterns of horse mobility during this time remain unclear. Further research could shed light on long-distance movement patterns, potentially paralleling those identified in donkeys, where genetic evidence traces mobility between western Africa and France during the Roman period52. Similarly, applying these techniques to medieval contexts could enhance our understanding of animal exchange as subsistence economies became increasingly localized53.

Finally, as genome time-series continue to expand, the application of imputation-based approaches holds great potential for reconstructing the spread of Oriental bloodlines following the Muslim expansion2, as well as the large-scale translocation of horses across the Atlantic that played a crucial role in establishing colonial power in the Americas19.

Methods

Samples and archaeological contexts

Destructive sampling of all faunal remains in this study was authorized under the appropriate permits. Sample acronyms (where available) can be found in the Supplementary Information. Furthermore, each sample has a unique laboratory code linking to associated metadata, including the archaeological or palaeontological site of origin. We processed 87 individuals from the Iberian Peninsula, Morocco, France, Italy and Israel, spanning a temporal range from the Late Pleistocene (~24,857 BCE) to the Modern Period (~1800 CE). Our main aim was to clarify the diversity of horse lineages once populating Iberia, and their potential genetic influence in the global history of horse breeding. A detailed description of the archaeological contexts, sites, and individuals is reported in Supplementary Text, and summarized in Supplementary Data 1, together with their associated genetic groups and the various data partitions considered in the different genetic analyses.

Radiocarbon dating and isotopic analysis

A total of 29 new radiocarbon dates were obtained in this study (Supplementary Text). Dating and δ13C and δ15N isotopic measurements were carried out at the Keck Carbon Cycle AMS Laboratory, UC Irvine, following collagen extraction and ultra-filtration from approximately one gram of bone or dental material. Calibration was performed using OxCalOnline54 and the IntCal20 calibration curve55.

Ancient DNA extraction and sequencing

The laboratory work was performed at the state-of-the-art ancient DNA facilities of the Centre for Anthropobiology and Genomics of Toulouse (CAGT, University of Toulouse, France), following the procedures from Fages and colleagues2 and Librado and colleagues4 (Supplementary Text). Petrosal bones were prioritized, then teeth and postcranial elements. The overall methodology involved the powdering of a total of 160–720 mg of bone or dental material using the Mixer Mill MM200 (Retsch) Micro-dismembrator. DNA extraction was carried out to facilitate the recovery of even the shortest DNA fragments56. All but one of the DNA extracts were treated with the USER (NEB) enzymatic mix to eliminate a fraction of post-mortem DNA damage (Supplementary Data 1). Libraries were built from double-stranded DNA templates in which two internal indexes were added during adapter ligation, plus an additional external index during PCR amplification. Libraries were amplified, purified, quantified, and pooled at equimolar concentrations for low-depth sequencing on the Illumina MiniSeq instrument (paired-end mode, 2 × 80). After initial screening for library content, those libraries compatible with an economical characterization of the horse genome through shotgun sequencing were selected for deeper sequencing on the Illumina HiSeq4000 instruments from Genoscope (paired-end mode, 2 × 76; France Génomique), and the Illumina NovaSeq 6000 instruments from Novogene Europe (paired-end mode, 2 × 150).

Read processing and mapping

Demultiplexing, trimming and collapsing of the raw FASTQ read pairs were carried out using AdapterRemoval2 (v2.3.0)57 (Supplementary Text). Collapsed and un-collapsed reads were mapped against the horse reference genome EquCab358, appended with the Y-chromosome contigs from Felkel and colleagues59, and the mitochondrial reference genome (NC_001640)60. The alignments were performed using PALEOMIX bam_pipeline version 1.2.1361 and Bowtie2 (v2.3.4.1)62, with the parameters recommended by Poullet and Orlando63. For nucleotide mis-incorporation and DNA fragmentation profile analyses, we used PMDtools (v0.60)21 and mapDamage2 (v2.2.1)20, considering those alignments against chromosome 1 (Supplementary Figs. 2 and 3), or a random subset of 100,000 alignments (Supplementary Fig. 1), respectively. Overall, we generated sequence data for a total of 87 novel ancient horse specimens, with average depth-of-coverage, after removing duplicates and filtering out alignments with mapping qualities below 25, ranging from 0.10-fold to 4.12-fold (median = 0.97-fold, with 40 specimens above 1-fold average depth-of-coverage). Information about the number of independent DNA libraries, sequencing efforts, demultiplexing, endogenous content, (mito)genome coverage, and error rates (as calculated by Librado and colleagues4,5) is provided in Supplementary Data 1. The previously published sequence data of 262 ancient horses, 84 modern horses, plus two donkeys (one ancient, one modern), were processed following the same procedures to obtain a comparative genome panel including outgroups and encompassing all known horse genetic lineages (Supplementary Data 1).

SNP variation was identified using ANGSD (v0.940) (htslib 1.16, built on Dec 10 2022 14:56:47), following the pseudo-haploidization procedure from Taylor and colleagues19. This procedure disregarded low-confidence polymorphisms (-baq 0 -remove_bads 1 -uniqueOnly 1 -minMapQ 25 -minQ 30 -rmTrans 1 -rmTriallelic 1e-4 -SNP_pval 1e-16 -C 50), and sites above the 99.6% site-wise coverage threshold observed across all individuals, resulting in a total of 7,122,531 transversion sites, including outgroups.
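
The idea behind pseudo-haploidization restricted to transversions can be illustrated with a minimal Python sketch. It is not the ANGSD implementation: the function names, the input layout (a list of bases observed at one site) and the rule of keeping only bases that match the two known alleles are assumptions made for illustration.

```python
import random

# Transversions swap a purine (A, G) for a pyrimidine (C, T) or vice versa.
# Transitions (A<->G, C<->T) are the changes mimicked by post-mortem deamination.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(allele_a, allele_b):
    """Return True if the two alleles differ by a purine/pyrimidine swap."""
    return (allele_a in PURINES and allele_b in PYRIMIDINES) or (
        allele_a in PYRIMIDINES and allele_b in PURINES
    )


def pseudo_haploid_call(observed_bases, ref, alt, rng):
    """Sample one observed base at random to represent the individual.

    Bases that match neither known allele are ignored (a hypothetical
    simplification). Returns None when no usable base remains.
    """
    usable = [base for base in observed_bases if base in (ref, alt)]
    if not usable:
        return None
    return rng.choice(usable)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
if is_transversion("A", "C"):
    call = pseudo_haploid_call(["A", "C", "A"], "A", "C", rng)
```

Sampling a single base per site avoids calling heterozygous genotypes, which low-coverage data cannot support reliably.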

Phylogenetic reconstructions

Horse mitochondrial DNA haplotypes were obtained from the individual BAM alignments using BCFtools (v1.19)64 (Supplementary Text). Diploid genotype calling was employed to accommodate heteroplasmy, prevalent in horses60. A multifasta file was generated adding the donkey mitochondrial reference haplotype (Accession nb. CM027722.2)65 as an outgroup. We further excluded the highly repeated region located between positions 16,121–16,370 of the horse mitochondrial reference and those samples with over 50% missing data. Maximum Likelihood phylogenetic inference was performed using IQtree (v1.6.12)66 under the GTR + F + R10 substitution model, selected via the Akaike Information Criterion (AIC), with node support evaluated through 1,000 bootstrap pseudo-replicates and ultrafast bootstrap approximation (Fig. 3a, Supplementary Text; Supplementary Fig. 9).

Alignments against Y chromosomal contigs were processed following the same procedure as for autosomal variation identification (with a threshold corresponding to 99.5% of the site-wise coverage distribution across samples), but focusing on the single copy (“sc”) regions defined by Felkel and colleagues59, and excluding sites not covered in at least 10% of specimens. This provided data for a total of 14,992 transversion sites across 249 male individuals, including outgroups, which were converted into a multifasta alignment for phylogenetic reconstruction using IQtree (v1.6.12) with the TIMe+R3 substitution model, selected based on the Akaike Information Criterion (AIC). Node support was assessed as for mitogenome phylogenetic reconstruction (Fig. 3b, Supplementary Fig. 10; Supplementary Text).
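
The two site filters described above (an upper coverage threshold taken from the site-wise coverage distribution, and a minimum fraction of specimens covered) can be sketched as follows. This is a hedged illustration only: the nearest-rank percentile definition, the function names and the input layout are assumptions, and the actual filtering was performed with the tools cited in the text.

```python
import math


def nearest_rank_percentile(values, percent):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def filter_sites(site_depths, site_calls, depth_percent=99.5, min_covered=0.10):
    """Return the indices of sites passing both filters.

    site_depths: summed depth per site across samples.
    site_calls:  per-site list of calls, None marking a sample without data.
    """
    max_depth = nearest_rank_percentile(site_depths, depth_percent)
    kept = []
    for index, (depth, calls) in enumerate(zip(site_depths, site_calls)):
        covered = sum(call is not None for call in calls) / len(calls)
        # Drop sites with excessive depth or too few specimens covered.
        if depth <= max_depth and covered >= min_covered:
            kept.append(index)
    return kept
```

Excessive depth typically flags repeated or mis-assembled regions, while the coverage requirement limits the amount of missing data entering the alignment.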

The autosomal phylogeny was generated by applying FastMe (v2.1.6.2)67 to the matrix of pairwise genetic distances estimated by Plink v1.9 (--distance square 1-ibs flat-missing)68 from the 7,122,531 transversion sites. Node support was evaluated from 100 bootstrap pseudo-replicates using newick utils v1.669 (Fig. 2b, Supplementary Fig. 8; Supplementary Text).
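
As an illustration of the kind of distance that feeds the tree-building step, the sketch below computes a 1-IBS distance between pseudo-haploid samples as the fraction of jointly observed sites at which they differ. It is a simplification under stated assumptions: alleles are coded 0/1 with None for missing data, and the handling of missing data does not reproduce Plink's own correction.

```python
def one_minus_ibs(geno_a, geno_b):
    """Fraction of jointly observed sites where two pseudo-haploid samples differ."""
    compared = 0
    mismatches = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip sites missing in either sample
        compared += 1
        mismatches += a != b
    if compared == 0:
        return float("nan")  # no shared sites, distance undefined
    return mismatches / compared


def distance_matrix(samples):
    """Return sorted sample names and the square matrix of pairwise distances."""
    names = sorted(samples)
    matrix = [[one_minus_ibs(samples[r], samples[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical toy genotypes for three samples at four sites.
toy = {"s1": [0, 1, None, 1], "s2": [0, 0, 1, 1], "s3": [1, 0, 1, None]}
names, matrix = distance_matrix(toy)
```

A square, symmetric matrix of this form is the input expected by distance-based tree builders.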

Population structure and gene flow

Principal Component Analysis (PCA) was conducted on pruned autosomal transversions using smartPCA, EIGENSOFT (v6.1.4)70, restricting to sites with a minor allele frequency (MAF) of at least 1% (Supplementary Text). Donkey outgroups, two divergent Late Pleistocene horses from north America, as well as sites under linkage disequilibrium (--indep-pairwise 500 10 0.2), were excluded via Plink (v1.9)68, resulting in a final matrix of 3,021,627 transversion sites in 431 individuals (Fig. 2a, Supplementary Figs. 5, 6 and 7).

Pairwise genetic proximity between samples was determined using f3-outgroup statistics31, computed with Calc-f333 using the full matrix of autosomal variation (Supplementary Data 1), plus two donkey outgroups. Standard errors were assessed by jackknifing 5 Mb genomic blocks, following Librado and colleagues4. Samples defining individual genetic lineages were grouped together, resulting in N = 36 population groups (Supplementary Data 1), including two donkey outgroups, and N = 311 individual DOM2 horses. f3-outgroup statistics were visualized as a heatmap, sorted to reflect the main genetic affinities (Supplementary Fig. 18).
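
The logic of an outgroup f3 estimate with a block jackknife standard error can be sketched in a few lines. This is not the Calc-f3 implementation: it uses the plain estimator (mean of (o − a)(o − b) over sites) without the finite-sample bias correction, and an equal-weight delete-one-block jackknife; the function names and inputs are hypothetical.

```python
import math


def f3_outgroup_blocks(freq_o, freq_a, freq_b, block_ids):
    """Accumulate per-block sums of (o - a) * (o - b) and per-block site counts."""
    sums, counts = {}, {}
    for o, a, b, block in zip(freq_o, freq_a, freq_b, block_ids):
        sums[block] = sums.get(block, 0.0) + (o - a) * (o - b)
        counts[block] = counts.get(block, 0) + 1
    return sums, counts


def block_jackknife(sums, counts):
    """Return the genome-wide estimate and its delete-one-block standard error.

    Requires at least two blocks.
    """
    total_sum = sum(sums.values())
    total_n = sum(counts.values())
    estimate = total_sum / total_n
    # Re-estimate the statistic leaving out one block at a time.
    leave_one_out = [
        (total_sum - sums[block]) / (total_n - counts[block]) for block in sums
    ]
    g = len(leave_one_out)
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)
```

In the analyses described above, a block would correspond to a 5 Mb genomic window; larger f3 values indicate more shared drift between the two samples relative to the outgroup.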

The full matrix of autosomal variation was used for the calculation of D-statistics of the form (EA, IBE; DOM2*, X), where DOM2* represents the earliest DOM2 horses from Europe present in our dataset (N = 8), and X is a sample belonging to the IBEm or the DOM2 genetic lineage (Fig. 5a, Supplementary Fig. 16, Supplementary Fig. 17a). We also calculated D-statistics of the form (EA, PR; DOM2*, X), where PR represents Przewalski’s horses (N = 2), and X represents the full list of modern DOM2 horses present in our genome panel (N = 83) (Supplementary Fig. 17b; Supplementary Data 1; Supplementary Text). D-statistics were calculated using qpDstat (v751)31 with default parameters, turning “f4mode” off. Statistical significance was evaluated using the Z-score threshold and Bonferroni corrections for multiple testing. Individuals showing admixed genomic profiles were analyzed using DATES v401036 to estimate the time of admixture between IBE and four pre-defined groups, representing different combinations of DOM2 horses as proxies for early DOM2 sources (Supplementary Data 1 and 3). In these analyses, we considered all autosomes (numchrom=31), and the following parameters: maxdis 1, binsize 0.001, qbin 10, and lovalfit 0.45, turning runfit and afffit options on.
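
For readers unfamiliar with D-statistics, the following sketch shows the allele-frequency form of D(W, X; Y, Z) and a Bonferroni-adjusted Z-score threshold. It is an illustration under assumptions rather than the qpDstat implementation: the two-sided significance level of 0.05 is a hypothetical default not stated in the text, and standard errors (obtained in practice by block jackknife) are not computed here.

```python
from statistics import NormalDist


def d_statistic(w, x, y, z):
    """Compute D(W, X; Y, Z) from per-site allele frequencies of four groups."""
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if denominator == 0:
        return float("nan")  # no informative site
    return numerator / denominator


def bonferroni_z_threshold(n_tests, alpha=0.05):
    """Two-sided |Z| cutoff after Bonferroni correction for n_tests comparisons.

    alpha is an assumed significance level, chosen for illustration.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
```

With this orientation, a value of D significantly different from zero indicates that W and X are not symmetrically related to Y and Z, which is the signal interpreted as excess allele sharing.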

DOM2 and IBE ancestry fractions were estimated by calculating F4-ratios in qpF4ratio (version 310)31, assuming blocks of 5 Mb for standard error estimation. The topology tested was (A, B; X; IBE, Outgroup), where A and B represent a subset of N = 5 DOM2 horses from the 3rd and 2nd millennium BCE, and of N = 40 unadmixed Iberian horses from the 1st millennium BCE and early 1st millennium CE, respectively (Supplementary Data 1). The former provided a proxy for early DOM2 horses, while the latter was selected to approximate the Iberian DOM2 gene pool at the time of admixture with IBE.

Genetic ancestry profiling

Unsupervised ADMIXTURE (v1.3.0)32 analysis was carried out using the pruned SNP matrix (MAF = 1%) to investigate the genetic profile of the horse specimens present in our dataset (Supplementary Fig. 12a, b), with or without the two divergent Late Pleistocene horses from north America (LPNAMR, NAM; Supplementary Fig. 12c, d; Supplementary Data 1). The optimal number of K = 2 ancestry components was assessed through cross-validation. Confidence intervals for the proportions of genetic ancestries were determined through 100 bootstrap pseudo-replicates (-B 100 -c 0.0001).

Additionally, f4-statistics were computed using Calc-f433, using the same procedure as for the calculation of f3-Outgroup statistics, but considering more population groups to minimize permutations and running time (Supplementary Data 1). Unadmixed modern DOM2 horses were grouped together, as well as those ancient DOM2 individuals from the same archaeological site, when showing similar PCA placement and ADMIXTURE32 profiles. A total of K = 2 ancestry components were assumed, in line with the optimal number of ancestry components identified by cross-validation in ADMIXTURE32 analyses (Supplementary Fig. 13; Supplementary Text).

Admixture modelling and population graphs

We used qpADM v81031 to model the genomic make-up of sample MNP20_Spa_m1558 (IBEm, identified as a genetic outlier to the IBE genetic cluster), considering various combinations of outgroup (‘right’) populations and potential source (‘left’) populations encompassing the entire range of horse genetic lineages present in Europe (Supplementary Data 1 and 2). To test for possible sources other than the IBE lineage (Supplementary Data 2), we applied the procedure from Harney and colleagues34, ‘rotating’ one of the sources at a time. Analyses turned off the allsnps option, and significance was set at 0.01.

We also used OrientAGraph (version 1.0)35 to test for possible sources for the complex genomic makeup of sample MNP20_Spa_m1558 (IBEm) (Supplementary Figs. 14 and 15). This analysis included a total of N = 9 population groups from Europe, including CWC, IBE, FBPWC (FB), HUNG (HU), LPNFR (NF), LPSFR (SF), ENEOROM (NR), ENEOCZE (CZ), and Vert311, plus EA as outgroup, and a group of unadmixed ancient DOM2 horses from Iberia (N = 40). The analysis was repeated considering LPNAMR (NAM) as another outgroup, considering up to M = 3 migration edges. A total of 100 SNPs per block (-k) were used for estimation of the covariance matrix, and a round of global rearrangements was carried out after adding all populations (-global).

Population graphs were reconstructed with AdmixtureBayes29 considering 32 parallel MCMC chains for mixing (--MCMC_chains), while limiting the number of possible admixture events to 10 (--max_admixes). The first 50% of 250 million iterations were used as burn-in, and the remaining iterations were thinned every 250. In this analysis, the full autosomal TPED matrix was restricted, using Plink v1.968, to seven population groups representing major genetic horse lineages in Europe (IBE, SF, NF, CWC, FB, UR and NR), along with two donkey specimens as outgroups (EA; Fig. 4; Supplementary Data 1; Supplementary Text).
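
The effect of the burn-in and thinning settings on the number of retained MCMC states can be worked out with a short sketch. It assumes that iterations are indexed from zero and that thinning keeps every 250th iteration after burn-in; how AdmixtureBayes counts iterations internally is not specified here, so the figure is indicative only.

```python
def retained_iterations(total, burnin_fraction, thin):
    """Return the iteration indices kept after burn-in removal and thinning."""
    start = int(total * burnin_fraction)  # first iteration after burn-in
    return range(start, total, thin)


# Settings reported in the text: 250 million iterations, 50% burn-in, thinning of 250.
kept = retained_iterations(250_000_000, 0.5, 250)
n_kept = len(kept)  # number of posterior samples retained under these assumptions
```

Under these assumptions, the posterior is summarized from 500,000 sampled graphs.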