Introduction

Sheep (Ovis aries) were among the earliest domesticated ungulates, with archaeological, zooarchaeological, and genomic evidence converging on the Fertile Crescent as the primary center of domestication during the early Holocene1,2,3,4,5. From these origins, pastoral expansions, trade, and repeated episodes of human-mediated selection diversified sheep into a broad array of breeds specialized for meat, milk, wool, and adaptation to heterogeneous agroecological niches6. Over thousands of years, deliberate breeding and local environmental pressures have left discernible “fingerprints” in the ovine genome — so-called selection signatures — that reflect both ancient and recent responses to natural and artificial selection7,8,9. Systematic detection of these signatures provides a powerful lens for reconstructing the evolutionary and breeding history of populations, identifying candidate genes underlying complex traits, and informing evidence-based improvement and conservation strategies10.

Türkiye occupies a pivotal position in this history and contemporary sheep production. Its geography spans semi-arid steppes, continental plateaus, and Mediterranean littorals, supporting extensive and semi-extensive grazing systems that remain central to rural livelihoods11,12,13. Indigenous Turkish breeds have been shaped by decades to centuries of selection under nutritional constraints, thermal stress, and disease challenges typical of these environments. At the same time, state farms and research institutes have implemented structured crossbreeding and introgression programs to enhance productivity, particularly in wool and lamb production, without compromising local resilience14. This dual imperative, preserving adaptation while elevating output, has produced a rich landscape of indigenous and composite populations that are ideal for comparative genomic analysis15.

Akkaraman (also known as White Karaman) is the most widespread fat-tailed indigenous breed in Central Anatolia. Managed primarily in extensive systems, Akkaraman sheep exhibit notable robustness to feed scarcity, temperature extremes, and endemic disease pressure, coupled with moderate growth and prolificacy. As a genetic reservoir for adaptation to steppe ecosystems, Akkaraman represents a critical reference for understanding the genomic basis of resilience. In parallel, a set of crossbred/composite types has been developed to enhance meat and wool attributes. Karacabey Merino was initiated at the Karacabey State Farm through the crossing of Kivircik with German Mutton Merino, resulting in a line stabilized to produce finer wool and acceptable growth under Turkish conditions. The Central Anatolian Merino (Oamer; Orta Anadolu Merinosu) arose from structured introgression of Merino germplasm into local stocks (notably Akkaraman) on Central Anatolian state farms, combining improved fleece quality with environmental hardiness. More recently, the Bandirma Sheep Research Institute established composite meat-type populations, such as Hasmer and Hasak, by integrating genetics from terminal-sire breeds (e.g., German Blackhead Mutton, Hampshire Down) and Merino with Akkaraman. These programs were designed to exploit heterosis and complementarity, improving carcass traits, growth rate, and, where relevant, fleece characteristics while retaining adaptation to Turkish production systems16,17,18,19,20,21.

Despite their economic importance, the genomic architecture that differentiates indigenous and crossbred sheep populations, including the balance between introgressed performance alleles and indigenous adaptation loci, remains incompletely characterized. Importantly, adaptive genomic variation is not limited to single-nucleotide polymorphisms (SNPs). Structural variants (SVs) and copy-number variations (CNVs) can alter gene dosage, disrupt coding sequences, or modify regulatory landscapes and have been repeatedly associated with adaptive phenotypes in livestock, including climate- and production-related traits in sheep. For example, large-scale CNV scans in global sheep populations identified thousands of CNV events and linked CNV regions to diverse phenotypes, while recent work has shown CNV associations with climatic variables and candidate adaptive genes across worldwide sheep populations. Although SNP-based arrays (including medium-density chips) effectively capture broad population structure and many selection signals, SVs and CNVs represent complementary sources of functional variation that merit consideration in future fine-mapping and follow-up studies22,23.

Classical performance testing and quantitative genetic evaluations provide trait-level estimates (e.g., heritability, breeding values), but they do not directly reveal the specific genomic regions and biological pathways shaped by selection. Genome-wide selection signature scans fill this gap by interrogating population genetic patterns that deviate from neutrality24. Three complementary classes of signals are especially informative for livestock: (i) runs of homozygosity (ROH), (ii) haplotype-based metrics of extended haplotype homozygosity (EHH) such as the integrated haplotype score (iHS), and (iii) site-frequency-spectrum (SFS) statistics such as Tajima’s D25,26.

ROH are long, contiguous stretches of homozygous genotypes that arise when chromosomal segments are inherited identically by descent27,28,29. Their frequency, length distribution, and genomic clustering (“ROH islands”) provide insight into recent inbreeding, demographic contractions, and strong directional selection that fixes or nearly fixes haplotypes surrounding advantageous alleles29,30. Shorter ROH often reflect older shared ancestry, whereas very long ROH indicate recent common ancestors or intense selection31. iHS, in contrast, is a within-population haplotype test that contrasts the decay of EHH around the derived versus ancestral allele at a focal SNP32,33. Because incomplete or ongoing sweeps preserve extended haplotypes at elevated frequency around the favored allele, iHS is particularly sensitive to relatively recent selection where both alleles still segregate34. Finally, Tajima’s D summarizes the site-frequency spectrum (SFS) by comparing pairwise nucleotide diversity (π) to the number of segregating sites (S). Negative values—an excess of rare variants—are consistent with recent directional selection (including artificial selection) or population expansion. Positive values—an excess of intermediate-frequency variants — can reflect balancing selection, but in domestic or structured populations, they more commonly indicate past bottlenecks, substructure, or admixture35,36. By integrating ROH, iHS, and Tajima’s D, investigators can triangulate signals across timescales and demographic contexts: ROH captures fixation-proximal events and inbreeding structure; iHS targets ongoing sweeps; and Tajima’s D adds SFS sensitivity that is less dependent on haplotype phase37,38.

Applying selection-scan approaches to Turkish sheep is particularly informative because indigenous and crossbred/composite populations have experienced contrasting evolutionary histories: long-term environmental filtering and traditional management in local breeds versus programmatic selection and introgression aimed at production traits in crossbreds. These divergent histories create distinct genomic footprints, and admixture in crossbred animals reshapes local linkage disequilibrium and allele-frequency spectra, complicating the straightforward interpretation of selection signals. Because many economically important traits (e.g., growth, carcass composition, milk production, wool quality, parasite resistance, and thermotolerance) are highly polygenic, a comparative framework that samples multiple populations and applies complementary SNP- and haplotype-based methods increases robustness and helps distinguish shared from population-specific selection. Identifying recurrently targeted regions across breeds versus population-specific sweeps therefore generates testable hypotheses about core biological pathways and yields candidate regions that are useful for both conservation and breeding applications, while recognizing that fine-mapping requires higher-resolution genotyping or sequencing11,39,40,41,42.

This study addresses a clear knowledge gap at the interface of adaptation and productivity in Turkish sheep. A rigorous, comparative scan across indigenous and crossbred populations will (i) reveal breed-specific and shared selection footprints, (ii) nominate candidate genes and pathways underpinning economically important and adaptive traits, and (iii) provide actionable genomic targets for breeding programs seeking to balance performance gains with the preservation of local robustness. The overarching objective is to generate a consolidated map of selection in Turkish sheep that can guide marker-assisted and genomic selection, inform introgression and conservation decisions, and deepen our understanding of how historical and contemporary selection have sculpted small-ruminant genomes in the Anatolian context.

Materials and methods

Animal material

A total of 1,612 sheep from five breeds were included in this study: Akkaraman (n = 168), Karacabey Merino (n = 760), Oamer (n = 671), Hasak (n = 7), and Hasmer (n = 6) (Fig. 1). Blood samples from the Akkaraman, Oamer, Hasak, and Hasmer breeds were collected at the Bahri Dağdaş International Agricultural Research Institute in Konya. In contrast, samples from Karacabey Merino were collected at the Sheep Breeding Research Institute in Balikesir. Blood was drawn from the jugular vein using vacuum tubes containing ethylenediaminetetraacetic acid (EDTA) as an anticoagulant and stored under appropriate conditions until DNA extraction. The experimental procedures employed in this research were authorised by the “Bandırma Sheep Breeding and Research Institute Ethics Committee for the Use of Animals in Research and Experimentation”, Türkiye (Approval No: 04.10.2021/049). All procedures conformed to the ARRIVE reporting guidelines, and formal permission was obtained from the institute’s administration prior to initiating the study. No animals were euthanised at any stage of the work; following blood collection, all sheep continued their lives under the institute’s standard management, nutrition, and husbandry conditions.

Fig. 1

Representative Turkish sheep breeds, (a) Akkaraman, (b) Karacabey Merino, (c) Oamer, (d) Hasmer, (e) Hasak.

DNA extraction and genotyping

Genomic DNA was isolated from blood samples using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. DNA yield and purity were assessed spectrophotometrically (NanoDrop, Thermo Fisher Scientific), and DNA concentration was confirmed using a Qubit Fluorometer (Thermo Fisher Scientific). All individuals were genotyped using the Illumina OvineSNP50 BeadChip, and SNP coordinates were mapped to the Oar_v4.0 reference assembly.

Genotype quality control

Initial quality filtering was performed using PLINK (v1.9)43. SNPs were discarded if more than 10% of their genotypes were missing, if they significantly deviated from Hardy–Weinberg equilibrium (P < 0.0001), or if their MAF fell below 5%.
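The three filters above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the function name `snp_qc_mask`, the 0/1/2 coding with `NaN` for missing calls, and the use of a 1-d.f. chi-square test for Hardy–Weinberg equilibrium (PLINK uses an exact test) are all assumptions made for illustration.

```python
import math
import numpy as np

def snp_qc_mask(G, max_missing=0.10, hwe_p=1e-4, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass all three filters.

    G: individuals x SNPs array of 0/1/2 allele counts, np.nan = missing.
    HWE is tested with a chi-square approximation (assumption; PLINK
    applies an exact test).
    """
    G = np.asarray(G, dtype=float)
    keep = np.zeros(G.shape[1], dtype=bool)
    for j in range(G.shape[1]):
        g = G[:, j]
        called = g[~np.isnan(g)]
        n = called.size
        # Filter 1: call rate (more than 10% missing genotypes -> discard)
        if n == 0 or 1 - n / g.size > max_missing:
            continue
        # Filter 2: minor allele frequency below 5% -> discard
        p = called.sum() / (2 * n)
        if min(p, 1 - p) < min_maf:
            continue
        # Filter 3: Hardy-Weinberg equilibrium, P < 1e-4 -> discard
        q = 1 - p
        obs = np.array([(called == k).sum() for k in (0, 1, 2)], dtype=float)
        exp = n * np.array([q * q, 2 * p * q, p * p])
        chi2 = ((obs - exp) ** 2 / exp).sum()
        p_hwe = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 d.f.
        keep[j] = p_hwe >= hwe_p
    return keep
```

Applied to a genotype matrix, the returned mask selects the columns that would be retained; the thresholds are the ones stated in the text.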

Candidate gene identification and annotation

Candidate genomic regions (iHS/ROH/Tajima’s D peaks) were extended by ± 100 kb and used to retrieve gene annotations from Ensembl (Ovis aries, Oar_v4.0) via the biomaRt package in R. Briefly, region coordinates were formatted in R and queried against the oaries_gene_ensembl dataset (biomaRt); retrieved fields included Ensembl gene ID, external gene name, chromosome, start/end positions and description. Results were deduplicated (unique gene symbols), compiled by region, and exported as Excel/flat files for downstream analysis. All annotation steps were implemented in R using tidyverse, readxl, biomaRt, and writexl.
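The ± 100 kb extension step can be sketched as follows. This is an illustrative Python helper, not the R/biomaRt code used in the study; the function name `extend_and_merge`, the clipping of start coordinates at 1, and the merging of overlapping extended regions before querying are assumptions.

```python
def extend_and_merge(regions, flank=100_000):
    """Extend candidate regions by a flank and merge overlaps per chromosome.

    regions: iterable of (chrom, start_bp, end_bp).
    Returns a list of (chrom, start_bp, end_bp) tuples.
    Merging overlapping regions is an assumption made to avoid
    retrieving the same genes twice.
    """
    extended = sorted((c, max(1, s - flank), e + flank) for c, s, e in regions)
    merged = []
    for chrom, start, end in extended:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: widen it
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]
```

The resulting intervals are the coordinates that would then be passed to the annotation query.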

Genetic diversity and population structure analysis

Principal component analysis (PCA)

To explore genetic differentiation among the five breeds, PCA was performed on the post-QC genotype matrix. Genotypes were encoded as 0/1/2 (alternate-allele counts); SNPs with ≥ 10% missing data and monomorphic markers were removed; and the remaining genotypes were mean-centered and scaled per marker before analysis. To reduce marker correlation, we applied LD pruning using PLINK v1.9 (window = 50 SNPs, step = 5, r² threshold = 0.20). PCA was computed in R (v4.x) using prcomp() with center = TRUE and scale. = TRUE. The top two principal components (PC1 and PC2) are reported (see Results for percentage variance explained), and individuals were plotted in two-dimensional PC space, colored by breed, using ggplot2.
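For readers who want to see the computation behind the centered and scaled PCA, the following NumPy sketch reproduces the same linear algebra via singular value decomposition. It is not the R code used in the study; the function name `genotype_pca` and the mean imputation of any remaining missing genotypes are assumptions (prcomp() itself does not accept missing values).

```python
import numpy as np

def genotype_pca(G, n_components=2):
    """PCA of a 0/1/2 genotype matrix (individuals x SNPs, np.nan = missing).

    Returns (scores, explained_variance_ratio) for the leading components.
    Missing genotypes are replaced by the marker mean (assumption).
    """
    G = np.asarray(G, dtype=float)
    col_mean = np.nanmean(G, axis=0)
    X = np.where(np.isnan(G), col_mean, G)
    sd = X.std(axis=0, ddof=1)
    polymorphic = sd > 0                      # drop monomorphic markers
    X = X[:, polymorphic]
    X = (X - X.mean(axis=0)) / sd[polymorphic]  # center and scale per marker
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    variance = S ** 2
    scores = U[:, :n_components] * S[:n_components]
    return scores, variance[:n_components] / variance.sum()
```

The first two columns of `scores` correspond to the PC1 and PC2 coordinates that would be plotted per individual, and the returned ratios correspond to the percentage of variance explained.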

Selection signatures analysis

Three complementary approaches—iHS, ROH, and Tajima’s D—were used to detect genomic regions under selection.

iHS analysis

Haplotype phasing was performed with BEAGLE, and standardized iHS values were calculated using the rehh package in R. SNP-level iHS p-values were summarized as − log₁₀(p). The genome was scanned in non-overlapping 100 kb windows. A window was retained as a candidate if it contained at least one SNP with − log₁₀(p) ≥ 4 (i.e., p ≤ 1 × 10⁻⁴); adjacent significant windows were merged into intervals. A p ≤ 1 × 10⁻⁴ is a relatively stringent fixed cutoff widely used in genome-scan studies with medium-density arrays to limit single-marker noise and false positives. At the same time, 100-kb windows focus interpretation on regional rather than isolated haplotype signals.
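The window-level candidate rule described above can be sketched in Python as follows. The sketch assumes that per-SNP − log₁₀(p) values have already been computed; the function name `ihs_candidate_intervals`, the zero-based window origin, and the half-open interval convention are assumptions for illustration.

```python
from collections import defaultdict

def ihs_candidate_intervals(snps, window=100_000, min_logp=4.0):
    """Flag non-overlapping windows holding >= 1 significant SNP and merge
    adjacent flagged windows.

    snps: iterable of (chrom, pos_bp, neg_log10_p).
    Returns a list of (chrom, start_bp, end_bp), half-open [start, end).
    """
    hits = defaultdict(set)
    for chrom, pos, logp in snps:
        if logp >= min_logp:
            hits[chrom].add(pos // window)   # index of the window containing the SNP
    intervals = []
    for chrom in sorted(hits):
        idx = sorted(hits[chrom])
        start = prev = idx[0]
        for i in idx[1:]:
            if i != prev + 1:                # gap: close the current interval
                intervals.append((chrom, start * window, (prev + 1) * window))
                start = i
            prev = i
        intervals.append((chrom, start * window, (prev + 1) * window))
    return intervals
```

A SNP with − log₁₀(p) just below 4 does not flag its window, so a single sub-threshold window between two significant ones keeps them as separate intervals.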

ROH analysis

ROH were detected using a sliding-window approach. Main parameters: window size = 15 SNPs, minimum consecutive homozygous SNPs = 15, minimum ROH length = 1 Mb, maximum gap between adjacent SNPs within a run = 1 Mb, and a minimum SNP density threshold of ~ 1 SNP per 100 kb to avoid spurious short ROH in low-density regions. The 15-SNP sliding window smooths over isolated genotype errors, and the 1-Mb minimum length targets ROH likely arising from recent or intermediate shared ancestry rather than background homozygosity expected by chance on a 50 K array.
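To make the role of the length, gap, and density thresholds concrete, the sketch below calls ROH for one animal on one chromosome using a simplified consecutive-run rule. It is deliberately not the sliding-window algorithm used in the study: it tolerates no heterozygous or missing calls inside a run, and the function name `detect_roh` and the expression of the density threshold as a maximum of 100 kb per SNP are assumptions.

```python
def detect_roh(positions, genotypes, min_snps=15, min_length=1_000_000,
               max_gap=1_000_000, max_kb_per_snp=100):
    """Simplified ROH caller (consecutive homozygous SNPs, no error tolerance).

    positions: sorted bp positions on one chromosome.
    genotypes: 0/1/2 allele counts for one animal (1 = heterozygous).
    Returns a list of (start_bp, end_bp, n_snps).
    """
    runs = []
    start_i = None

    def close(end_i):
        # Keep the run only if it satisfies all three thresholds
        n = end_i - start_i + 1
        length = positions[end_i] - positions[start_i]
        if (n >= min_snps and length >= min_length
                and length / 1000 / n <= max_kb_per_snp):
            runs.append((positions[start_i], positions[end_i], n))

    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        if start_i is not None:
            gap = positions[i] - positions[i - 1]
            if not homozygous or gap > max_gap:
                close(i - 1)        # a heterozygote or a large gap ends the run
                start_i = None
        if homozygous and start_i is None:
            start_i = i
    if start_i is not None:
        close(len(genotypes) - 1)
    return runs
```

In this simplified form, a run of 17 homozygous SNPs spaced 80 kb apart (1.28 Mb) is reported, whereas a run of 20 SNPs spaced 150 kb apart is rejected by the density threshold despite spanning 2.85 Mb.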

Tajima’s D calculation

Tajima’s D was calculated per breed using VCFtools (v0.1.17) with 10-kb sliding windows across autosomes. Candidate windows were identified empirically by percentile: we report windows in the top 1% of the genome-wide distribution and also present the top 5% for comparison. Ten-kb windows provide relatively fine mapping on a 50 K dataset while maintaining sufficient variant counts per window for stable estimates; empirical percentile thresholds avoid over-reliance on asymptotic p-values, which can be biased by demography and sample size. Overlap among the five breed-specific sets was computed and visualized with the web tool InteractiVenn44.
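For reference, the statistic computed in each window can be written out directly from Tajima's standard formulation, which contrasts pairwise diversity (π) with the number of segregating sites (S) scaled by a₁. The Python sketch below is an illustration, not the VCFtools implementation; the function name `tajimas_d` and the input format (one allele count per segregating site) are assumptions.

```python
import math

def tajimas_d(allele_counts, n):
    """Tajima's D for one window.

    allele_counts: count of one allele at each SNP in the window.
    n: number of sampled chromosomes (2 x animals for diploids).
    Returns NaN when the window has no segregating site or n < 4.
    """
    counts = [c for c in allele_counts if 0 < c < n]   # segregating sites only
    S = len(counts)
    if S == 0 or n < 4:
        return float("nan")
    # pi: expected pairwise differences, summed over sites
    pi = sum(2 * c * (n - c) / (n * (n - 1)) for c in counts)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

A window made up only of singletons yields a negative value (excess of rare variants), whereas a window of intermediate-frequency variants yields a positive value.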

GO and KEGG functional enrichment analysis

Genes located within candidate selection regions were subjected to functional enrichment analysis using SRplot45. Gene Ontology (GO) terms were classified into three categories: biological process (BP), molecular function (MF), and cellular component (CC). KEGG pathway analysis was then performed to identify pathways that were significantly enriched46. SRplot computed enrichment scores, applied FDR correction for p-values, and generated graphical summaries of enriched terms.

Genetic diversity analysis

All diversity analyses were conducted on an Ubuntu 20.04 system under Windows Subsystem for Linux (WSL). We used PLINK v1.9 for genotype manipulation and basic summary statistics, VCFtools v0.1.17 for sliding-window nucleotide diversity, and R v4.x (with the data.table, knitr, and writexl packages) for downstream aggregation and reporting.

Nucleotide diversity

Each breed’s quality-controlled PLINK dataset was first exported to VCF format. We then ran VCFtools to scan the autosomes in 100-kb windows with a 50-kb step. Within every window, the average number of pairwise nucleotide differences per site among all sampled chromosomes was computed. Finally, these windowed estimates were averaged to yield a single genome‐wide nucleotide diversity value for each breed.
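The windowing and averaging steps can be sketched as follows. This is an approximation for illustration rather than the VCFtools implementation: the function names `windowed_pi` and `genome_wide_pi`, the zero-based window origin, and the treatment of every non-genotyped position in a window as invariant are assumptions.

```python
def windowed_pi(sites, n, chrom_length, window=100_000, step=50_000):
    """Per-site nucleotide diversity in sliding windows on one chromosome.

    sites: list of (pos_bp, allele_count) for genotyped SNPs.
    n: number of sampled chromosomes.
    Returns a list of (win_start, win_end, pi_per_site), half-open windows.
    """
    out = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        # Sum of expected pairwise differences over SNPs in the window
        diff = sum(2 * c * (n - c) / (n * (n - 1))
                   for pos, c in sites if start <= pos < end)
        out.append((start, end, diff / (end - start)))
        start += step
    return out

def genome_wide_pi(windows):
    """Average the windowed estimates into a single value."""
    return sum(w[2] for w in windows) / len(windows)
```

Because only array SNPs contribute to the numerator while the full window length enters the denominator, values obtained this way on a medium-density chip are expected to be several orders of magnitude below sequence-based estimates.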

Minor allele frequency and heterozygosity (MAF, HO, HE)

Within-breed allele frequencies were summarized by computing the MAF across all retained SNPs using PLINK’s frequency function. Observed heterozygosity (HO) and expected heterozygosity (HE) were then extracted from PLINK’s Hardy–Weinberg output. HO represents the actual proportion of heterozygous genotypes in the sample, whereas HE reflects the proportion expected under random mating given the allele frequencies.

Inbreeding coefficient (FIS)

Using the HO and HE values obtained from PLINK, we calculated the FIS for each breed in R. A negative FIS indicates an observed excess of heterozygotes relative to Hardy–Weinberg expectations, whereas a positive FIS indicates a deficit.
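The relationship among HO, HE, and FIS can be illustrated with a short Python sketch. The text does not state the exact formula used, so the common definition FIS = 1 − HO/HE, applied to breed-level means across SNPs, is an assumption here, as is the function name `heterozygosity_summary`.

```python
def heterozygosity_summary(genotype_counts):
    """Mean HO, mean HE, and FIS for one breed.

    genotype_counts: list of (n_AA, n_AB, n_BB) per SNP.
    FIS is computed as 1 - HO/HE (assumed definition).
    """
    ho_vals, he_vals = [], []
    for n_aa, n_ab, n_bb in genotype_counts:
        n = n_aa + n_ab + n_bb
        if n == 0:
            continue                      # SNP with no called genotypes
        p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
        ho_vals.append(n_ab / n)          # observed heterozygote proportion
        he_vals.append(2 * p * (1 - p))   # expected under random mating
    ho = sum(ho_vals) / len(ho_vals)
    he = sum(he_vals) / len(he_vals)
    return ho, he, 1 - ho / he
```

With genotype counts of 20/60/20, for example, HO is 0.6, HE is 0.5, and FIS is −0.2, the sign convention described above for a heterozygote excess.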

Within-breed allele sharing (DST) and genetic distance (D)

Pairwise genomic similarity between individuals was quantified via PLINK’s identity-by-state (IBS) analysis. From the proportions of loci where individuals shared zero, one, or two alleles identically, we derived an average DST for each breed. The D value was then expressed as the complement of DST (D = 1 − DST), with higher D values indicating greater genetic divergence among individuals.

Results

PCA analysis in five sheep breeds

To investigate genetic structure among five sheep breeds (Akkaraman, Hasak, Hasmer, Karacabey, and Oamer), PCA was performed on the post-QC SNP set after removing 12,161 monomorphic or entirely missing markers and retaining SNPs with < 10% missing data. Before PCA, we applied LD pruning (PLINK v1.9; window = 50 SNPs, step = 5, r² threshold = 0.20) to reduce marker correlation. After pruning, PC1 explained 15.81% and PC2 explained 5.13% of the total genetic variance (combined = 20.94%; Fig. 2). Akkaraman (n = 168) is clearly separated along PC1, with one individual (A115) showing a notably divergent PC1 score (− 0.72), consistent with possible substructure. Karacabey (n = 760) and Oamer (n = 671) form larger, partially overlapping clusters, suggesting closer genetic affinities or historical gene flow between these populations. Hasak (n = 7) and Hasmer (n = 6) are represented by small samples and therefore form less well-defined clusters; their dispersion along PC2 should be interpreted with caution. Because LD pruning was performed, the modest percentage of variance explained by PC1 and PC2 is unlikely to be an artifact of marker correlation; rather, it likely reflects genome-wide, broadly distributed differentiation in these populations.

Fig. 2

PCA illustrating genetic differentiation among five Turkish sheep breeds.

Genetic diversity

Genome-wide nucleotide diversity varied modestly among the five Turkish sheep breeds, ranging from 6.97 × 10⁻⁶ in Karacabey to 8.30 × 10⁻⁶ in Akkaraman (Table 1). MAF was similarly homogeneous, with values ranging from 0.286 in Oamer to 0.306 in Karacabey. HO spanned from 0.381 in Oamer to 0.445 in Hasmer, whereas HE ranged from 0.376 in Oamer to 0.394 in Karacabey. All breeds exhibited negative FIS, indicative of an excess of heterozygotes; the most pronounced heterozygote excess occurred in Hasmer (FIS = − 0.161), while Karacabey showed the mildest (FIS = − 0.016). Finally, DST values ranged from 0.033 in Oamer to 0.113 in Hasmer, corresponding to genetic distances (D = 1 − DST) of 0.887–0.967. Collectively, these metrics reveal moderate, broadly similar levels of genetic diversity across the five breeds, with Hasmer exhibiting the highest internal relatedness and Akkaraman the highest nucleotide diversity.

Table 1 Results of genetic diversity among the five sheep breeds.

ROH-based selection signatures

Using a window-based ROH calling strategy, we detected widespread runs of homozygosity across all five breeds, with apparent breed-specific differences in the number and genomic distribution of ROH islands (Fig. 3).

In Akkaraman sheep, ROH analysis revealed multiple extended homozygous tracts, with the strongest signal centered on BGLAP (osteocalcin, involved in bone mineralization/postnatal growth). Growth-related loci within ROH included MYF6, MEF2C, FTO, and FGF12; milk loci spanned the casein cluster (CSN1S1, CSN2, CSN1S2, CSN3) together with LTF and PRL; immune genes TLR2, TLR5, MYD88, NFKBIA, and IL15 showed pronounced homozygosity; reproductive candidates FSHB, GDF9, and OXTR were present; and metabolic/signaling genes ACLY, SHMT1, STAT5A, and AURKA also lay in ROH segments (Table S1).

In the Hasak crossbred population, ROH islands were widespread. The most pronounced centered on BGLAP; additional tracts encompassed FTO and GHRHR. Milk-related ROH included PRL and the casein cluster (CSN1S1, CSN2, CSN1S2, CSN3). Immune genes within the ROH comprised TLR2, TLR5, IFNAR1, IL15, and NFKBIA; reproductive candidate genes included FSHB, LHB, FSHR, and OXTR. Skeletal/muscle loci MYOG and MEF2C and metabolic loci ACLY and STAT5A also occurred in extended runs of homozygosity (Table S2).

In the Hasmer crossbred, ROH analysis revealed a pronounced island at POU1F1 (pituitary growth/lactotroph lineage regulator). Extended homozygous tracts also encompassed growth hormone axis genes GHR and GHRHR. Milk/lactation candidates ADIPOQ and AHSG were embedded within ROH, immune loci IL12A, TLR5, and IFNAR1 resided in ROH, and reproductive candidates INHBA and BMPR1B were likewise detected (Table S3).

In the Karacabey Merino crossbred, multiple ROH islands were detected, with a prominent signal at BGLAP. Additional extended tracts covered GH, GHR, GHRHR, and POU1F1 (somatotropic axis). Milk-associated LTF and PRL fell within ROH; immune genes TLR5, IFNAR1, IL12A, and NFKBIA mapped to homozygous regions; reproductive loci FSHB and LHB resided in ROH; and muscle/metabolic candidates MYOG, FGF12, ACLY, STAT5A, and IGFBP4 were also included (Table S4).

In the Oamer crossbred, extended homozygosity spanned multiple chromosomes. The leading ROH island centered on BGLAP; further tracts encompassed POU1F1 and GHRHR. Milk regions within ROH included the casein cluster (CSN1S1, CSN2, CSN1S2, CSN3) and PRLHR. Immune-related loci IL12A, IFNAR1, IFNAR2, and TLR4 were embedded within ROH; reproductive candidates FSHR, INHBA, and BMPR1B were present; and metabolic/growth genes LEPR, IGFBP3, and MSTN also occurred in extended runs of homozygosity (Table S5).

Candidate-gene counts by breed were: Akkaraman (214), Karacabey Merino (871), Oamer (499), Hasak (496), and Hasmer (386). All five breeds shared fifty genes. Breed-specific sets comprised 107 genes unique to Karacabey Merino and one gene unique to Oamer; no breed-specific genes were detected for Akkaraman, Hasak, or Hasmer. The remaining genes were distributed among pairwise, three-way, and four-way intersections as shown in Fig. 4.

Fig. 3

Comparative genome-wide ROH analysis in five sheep breeds.

Fig. 4

Venn analysis.

iHS-based selection signatures

We performed iHS analysis to detect signatures of positive selection within each sheep breed, and then summarized the standardized iHS values genome-wide (Fig. 5).

In Akkaraman sheep, iHS signals encompassed BGLAP (osteocalcin; bone mineralization and skeletal growth), FTO (nucleic acid demethylase linked to energy balance/adiposity), STAT5A (transcription factor in GH/PRL signaling), PRL (prolactin; lactation), CSN1S1, CSN2, CSN1S2, CSN3 (caseins; major milk proteins), TLR2, TLR5 (pathogen-recognition receptors), MYD88 (TLR adaptor), IL15 (lymphocyte activation), FSHB (FSH β-subunit), OXT (oxytocin; parturition/milk ejection), GDF9 (oocyte growth/folliculogenesis), and the myogenic regulators MEF2C and MYF6 (Table S6).

In the Hasak crossbred population, iHS highlighted BMPR1B (BMP receptor; folliculogenesis/prolificacy), FGF2 (skeletal growth/tissue repair), ABCG2 (milk secretion/transport), CSN1S1, CSN2, CSN1S2, CSN3 (caseins), innate immune sensors TLR1, TLR6, TLR10, cytokines IL2 and IL21, reproductive-axis genes GNRHR and ESR2, and additional candidates IGFBP7 (IGF-axis modulation/ECM) and GPX1 (oxidative stress defense) (Table S7).

In the Hasmer crossbred, iHS peaks were detected at FSHR (follicle-stimulating hormone receptor, involved in ovarian follicle maturation) and GNRHR (GnRH receptor), with growth/myogenesis signals at MYF6 and MEF2C, and IGF-axis support from IGFBP7. Milk synthesis selection involved LALBA (α-lactalbumin; lactose synthesis) and the caseins CSN1S1, CSN2, CSN1S2, CSN3; immune adaptation featured TLR1, TLR6, TLR10, IL22, and IFNG (Table S8).

In the Karacabey Merino crossbred, iHS signals included MSTN (myostatin; negative regulator of muscle growth), PRL (lactation), BMPR2 (BMP receptor; ovarian/follicular signaling), INHA (inhibin α; feedback on FSH), IGFBP5 (IGF bioavailability), and immune-related TNF, CTLA4, and CD28 (T-cell costimulation/checkpoint) (Table S9).

In the Oamer crossbred, iHS highlighted BGLAP (osteocalcin; bone mineralization), POU1F1 (pituitary transcription factor for GH/PRL/TSH lineages), LEPR (leptin receptor; energy balance/reproduction), LALBA (α-lactalbumin; lactose synthesis), innate sensors TLR1, TLR6, TLR10, cytokine/receptor genes IL12A and IFNAR1, reproductive genes BMPR2 and INHA, and metabolic/growth candidates ADIPOQ, AHSG (fetuin-A), and IGFBP5 (Table S10).

Fig. 5

Genome-wide iHS profiles across five sheep breeds.

Tajima’s D-based selection signatures

Using a sliding-window framework, we computed Tajima’s D across all autosomes in each breed (Fig. 6). In Akkaraman sheep, Tajima’s D identified three loci—CAPN2, PAG4, and IRF2—with CAPN2 showing the most significant deviation from neutrality. CAPN2 encodes a calcium-dependent protease central to cytoskeletal remodeling and muscle fiber hypertrophy; PAG4 reflects selection on placental function; IRF2 indicates adaptive pressure on immune regulation (Table S11). In the Hasak crossbred, Tajima’s D revealed a pronounced signal at CAST, consistent with an excess of low-frequency alleles and a recent sweep. As the endogenous inhibitor of calpains, CAST suggests a selective pressure on somatic growth, with plausible secondary effects on lactation efficiency and disease resilience (Table S12). In the Hasmer crossbred, two loci deviated significantly: CAPN3 (strongest) and PAG4. CAPN3 (muscle-specific calpain) supports selection on sarcomere remodeling and postnatal muscle development, while PAG4 suggests selection on placental/reproductive performance (Table S13). In the Karacabey Merino crossbred, three candidates—CAST (the strongest), MEF2C, and CAPN2—showed skewed allele-frequency spectra. CAST implicates selection on muscle remodeling and growth; MEF2C indicates pressure on myogenic differentiation; CAPN2 supports recent positive selection on cellular remodeling pathways (Table S14). In the Oamer crossbred, a strong signal at GHR underscores selection on somatic growth and metabolic regulation. Additional deviations at DGAT1, CAST, and CAPN2 suggest involvement in milk-fat synthesis and muscle remodeling; immune genes ITGB2 and IL1B indicate pressure on pathogen defense; B4GALNT2 suggests possible selection related to reproductive tract function (Table S15).

Fig. 6

Comparative genome-wide Tajima’s D analysis in five sheep breeds.

Synthesis of multi-method and cross-breed selection signals

High-confidence candidate genes were defined as those supported by two or more independent methods (ROH, iHS, Tajima’s D) within a breed or by the same method in two or more breeds. Exhaustive per-method gene lists are provided in Supplementary Table S16. Examples of high-confidence loci include growth and muscle regulators such as BGLAP (ROH + iHS in Akkaraman, Karacabey and Oamer), MEF2C and MYF6 (ROH + iHS), MSTN (ROH and iHS in different breeds), and proteolysis regulators CAST/CAPN2 (Tajima’s D with ROH overlap); lactation-related loci including the casein cluster (CSN1S1, CSN2, CSN1S2, CSN3), PRL and LALBA (recurrent ROH and/or iHS signals); somatotropic/IGF-axis candidates such as POU1F1 and GHR/GHRHR and several IGFBP genes; and immune-related signals including multiple TLR family members, MYD88, IRF2 and IL15 detected by ROH and/or iHS.
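The high-confidence rule stated above (two or more methods within a breed, or the same method in two or more breeds) amounts to a simple set operation. The Python sketch below illustrates it; the function name `high_confidence_genes`, the record format, and the placeholder gene labels in the usage note are hypothetical and do not reproduce the study's gene lists.

```python
from collections import defaultdict

def high_confidence_genes(hits):
    """Apply the multi-method / multi-breed rule to per-method gene hits.

    hits: iterable of (breed, method, gene) records.
    Returns the set of genes supported by >= 2 methods within one breed
    or by the same method in >= 2 breeds.
    """
    methods_by_breed_gene = defaultdict(set)
    breeds_by_method_gene = defaultdict(set)
    for breed, method, gene in hits:
        methods_by_breed_gene[(breed, gene)].add(method)
        breeds_by_method_gene[(method, gene)].add(breed)
    multi_method = {gene for (_, gene), methods in methods_by_breed_gene.items()
                    if len(methods) >= 2}
    multi_breed = {gene for (_, gene), breeds in breeds_by_method_gene.items()
                   if len(breeds) >= 2}
    return multi_method | multi_breed
```

For instance, a placeholder gene detected by both ROH and iHS in one breed qualifies, as does one detected by Tajima's D in two breeds, whereas a gene detected once by a single method in a single breed does not.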

When interpretation is limited to multi-method candidates, clear breed-level patterns emerge: the indigenous Akkaraman population shows relatively stronger multi-method signals for immune and local-adaptation loci (notably TLR family members, IRF2, IL15) alongside some muscle/growth regulators, whereas the crossbred populations (Karacabey Merino, Oamer, Hasak, Hasmer) show more consistent multi-method evidence for somatotropic and lactation-axis genes (POU1F1, GHR/GHRHR, caseins, PRL, LALBA). The recurrence of the casein cluster and PRL supports selection on milk production and composition; recurrent TLR signals are consistent with selection on immune responsiveness or disease resistance. At the same time, recurrent detections can result from shared ancestry, demographic history (bottlenecks or admixture) or SNP50K chip ascertainment bias; therefore primary biological claims are based on genes with multi-method and/or multi-breed concordance, while single-method hits—especially those derived from tiny samples (Hasak, Hasmer; n < 10)—are flagged as provisional and require replication.

Functional enrichment of candidate genes

Functional enrichment analysis of the candidate gene set revealed significant overrepresentation of biological processes governing developmental and regulatory pathways. Terms associated with the positive regulation of multicellular organismal processes, positive regulation of developmental processes, and regulation of cell differentiation displayed the highest enrichment, underscoring roles in growth and tissue formation (Fig. 7a). Cellular component annotations were dominated by membrane-associated locales, including the extracellular space, melanosome, and pigment granule, as well as integral and intrinsic components of the plasma membrane, membrane rafts, microdomains, and receptor complexes. Molecular function categories were principally characterized by signaling and regulatory activities, with signaling receptor regulator activity, signaling receptor activator activity, receptor ligand activity, and signaling receptor binding representing the strongest hits. Additional molecular functions, such as cytokine activity, cytokine receptor binding, hormone activity, growth factor receptor binding, and GTPase activity, further highlighted candidate genes involved in immune signaling, endocrine regulation, and signal transduction.

Fig. 7

(a) Gene Ontology enrichment, (b) KEGG pathway enrichment.

Pathway analysis of candidate genes

Pathway analysis identified hormone signaling as the most significantly enriched pathway, reflecting selection on endocrine regulators of growth and reproduction. The cytokine–cytokine receptor interaction pathway emerged as the next-highest hit, alongside Toll-like receptor signaling, indicating a strong immune component among the selected loci (Fig. 7b). Enrichment was also observed for infectious disease-related pathways, including pertussis, measles, leishmaniasis, and malaria, as well as inflammatory bowel disease, suggesting adaptation to a spectrum of pathogen pressures. Cardiovascular-related pathways, including fluid shear stress and atherosclerosis, were also overrepresented, while Chagas disease appeared at the lower end of significance. These results collectively indicate that coordinated selection has occurred on networks governing hormone action, immune defense, pathogen response, and vascular homeostasis in Turkish sheep breeds.

Discussion

Population structure and genetic diversity

Our interpretation of population structure and diversity is qualified by explicit methodological constraints that affect precision and comparability across breeds. Sample sizes were highly unbalanced, with tiny numbers in Hasak and Hasmer relative to the larger Akkaraman, Karacabey, and Oamer sets. This imbalance, combined with potential demographic confounding from recent admixture and drift, reduces power for detecting subtle structure and inflates uncertainty around point estimates in the crossbred groups. Accordingly, we emphasize patterns that are consistent across methods and better-sampled populations, and we treat inferences involving Hasak and Hasmer as exploratory.

Within this evidence-based framework, the PCA indicated weak overall differentiation, with the first two components accounting for 20.94% of the variance. Akkaraman separated along PC1, consistent with prior reports of its genetic distinctiveness among Turkish fat-tailed breeds. In contrast, Karacabey and Oamer overlapped broadly, consistent with shared ancestry and recent gene flow typical of regional production systems. The modest variance captured by PC1–PC2 cautions against over-interpreting fine structure and aligns with studies showing that most ovine genetic variation is partitioned within rather than between populations in Türkiye and neighboring regions. Comparable low between-breed differentiation has been documented for Turkish sheep using both microsatellites and SNP panels11, for Greek breeds where within-population variation predominates47, and for South Asian and Middle Eastern sheep characterized by extensive admixture. Together, these observations support a view of broad genomic continuity with localized pockets of differentiation rather than sharply segregated breed clusters. Diversity statistics further reinforce this picture: estimates of nucleotide diversity and heterozygosity were moderate across well-sampled breeds, and negative FIS values suggest recent admixture or balancing processes, consistent with earlier Turkish and regional reports in which heterozygosity was moderate and FIS tended to be near zero or negative11,47. Given the small n for Hasak and Hasmer, we refrain from attributing their point estimates to breed-level processes and instead note that additional sampling will be required to confirm whether these groups deviate meaningfully from the general pattern.
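
The sign of FIS discussed above follows directly from its standard definition, FIS = 1 − Ho/He, where Ho and He are the observed and expected heterozygosity. The short sketch below shows how an excess of observed heterozygotes yields a negative value. The heterozygosity figures are hypothetical and are not estimates from this study.

```python
def f_is(h_obs, h_exp):
    """Within-population inbreeding coefficient, FIS = 1 - Ho / He."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# ASSUMPTION: hypothetical (Ho, He) pairs for illustration only.
examples = {
    "heterozygote_excess": (0.36, 0.34),   # Ho > He gives negative FIS
    "hardy_weinberg": (0.34, 0.34),        # Ho = He gives FIS of zero
    "heterozygote_deficit": (0.30, 0.34),  # Ho < He gives positive FIS
}
for label, (ho, he) in examples.items():
    print(f"{label}: FIS = {f_is(ho, he):+.3f}")
```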

Analyses of runs of homozygosity provide complementary insight into selection and demography but warrant the same caution with respect to crossbred sample sizes. Across the dataset, ROH landscapes contained regions that plausibly reflect both older background selection and more recent inbreeding or directional selection, a mixture also reported for Mediterranean and Central-European sheep. The predominance of short to intermediate ROH and the presence of a limited number of longer tracts mirror findings from Greek populations, where ROH lengths commonly fall between 1 and 5 Mb and suggest moderate autozygosity47, and from Polish breeds, where many short ROH indicate historical inbreeding whereas longer segments point to recent events48. In Anatolian sheep, previously reported ROH islands49 encompassed loci such as ZNF208B, CBX1, and COPZ1, and the overlap of functionally coherent regions in our study with these prior signals supports the interpretation that some selection pressures are shared across regional contexts. At the same time, the extensive admixture documented for Turkish flocks means that ROH islands can aggregate variants introduced through crossbreeding, complicating attribution to specific breeds or management histories. For this reason, we avoid breed-by-breed catalogues, especially for Hasak and Hasmer, and instead emphasize the aggregate signal: in well-sampled groups, ROH architecture suggests moderate autozygosity against a background of gene flow, with a subset of tracts consistent with selection on pathways related to productivity and resilience. Targeted, demography-aware analyses with balanced sampling will be necessary to disentangle selection from drift and to validate the stability of these ROH features across cohorts and environments.
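
The link between ROH length and the age of the underlying inbreeding can be made concrete with a small sketch. It bins segments by length, computes the ROH-based inbreeding coefficient (total ROH length divided by autosome length), and applies the common approximation that a segment tracing to an ancestor g generations back has an expected length of 100/(2g) cM. The class boundaries, the 1 cM per Mb conversion, the autosome length, and the segment lengths are all assumptions made for illustration, not values from this study.

```python
from collections import Counter

# ASSUMPTION: illustrative length classes in Mb, not the study's definitions.
ROH_CLASSES_MB = [(1.0, 5.0, "1-5 Mb"), (5.0, 10.0, "5-10 Mb")]


def classify_roh(length_mb):
    """Assign an ROH segment to a length class."""
    if length_mb < 1.0:
        return "<1 Mb"
    for lower, upper, label in ROH_CLASSES_MB:
        if lower <= length_mb < upper:
            return label
    return ">=10 Mb"


def f_roh(lengths_mb, autosome_mb):
    """ROH-based inbreeding coefficient: summed ROH length / autosome length."""
    return sum(lengths_mb) / autosome_mb


def approx_generations(length_mb, cm_per_mb=1.0):
    """Approximate generations to the common ancestor, from E[L] = 100 / (2g) cM.
    ASSUMPTION: a uniform recombination rate of cm_per_mb."""
    return 100.0 / (2.0 * length_mb * cm_per_mb)


# ASSUMPTION: hypothetical segments (Mb) for one animal and a placeholder autosome length.
segments_mb = [1.2, 2.8, 4.5, 7.0, 16.0]
autosome_mb = 2450.0

print(Counter(classify_roh(s) for s in segments_mb))
print(f"F_ROH = {f_roh(segments_mb, autosome_mb):.4f}")
for s in (1.0, 5.0, 16.0):
    print(f"{s:5.1f} Mb -> ~{approx_generations(s):.0f} generations")
```

Under these assumptions, tracts of 1 to 5 Mb correspond to ancestors roughly 10 to 50 generations back, whereas tracts above 10 Mb point to inbreeding within the last few generations.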

Selection signatures for growth and body size

Extended haplotype homozygosity and ROH architecture around BGLAP and nearby regulators of muscle and metabolism are concordant with selection on skeletal robustness and carcass accretion in Turkish production systems. In the better-sampled Akkaraman and Karacabey groups, we observe clustering of transcriptional regulators of myogenesis (e.g., MEF2/MYF family members) together with components of the GH–GHR–GHRHR–STAT cascade, a combination that mechanistically links bone formation, fibre differentiation, and nutrient partitioning. Rather than over-weighting individual loci, we note that this pathway-level convergence mirrors reports from multiple ovine contexts in which body size and muscle growth map to growth-hormone signaling, chromatin/transcriptional control of myogenesis, and lipid–amino-acid metabolism. Studies in Merino-derived lines emphasize contributors to stature and muscling (e.g., HMGA2/FGF12-proximal regions)50; Iranian breeds implicate myogenic regulators consistent with muscle development and fibre composition51; Hu sheep and several Chinese panels highlight coordinated effects of HOX/MSTN modules alongside growth-hormone–related signals52,53; and high-altitude/body-size work in Tibetan sheep underscores the polygenic, pathway-level nature of growth with enrichment in cAMP/Rap1 and related signaling nodes54. Broad regional syntheses from Central and West Asia similarly recover growth-axis and limb-development components (e.g., SMAD/ESR/HAS2 pathways in Tarim Basin sheep)55 and polygenic architectures that integrate limb morphogenesis and immune–metabolic crosstalk in Middle Eastern/South Asian populations56. Taken together, these comparisons suggest that our Turkish signals fit a recurrent, cross-study motif in which selection on carcass yield and frame size is mediated by variants that modulate endocrine growth signaling, myogenic differentiation programs, and cellular energy homeostasis.

In this context, candidate regions proximate to BGLAP are most parsimoniously interpreted as markers of long-term selection on skeletal integrity rather than as singular “major-effect” drivers, because the surrounding haplotypic tracts repeatedly co-localize with genes that coordinate osteoblast activity, muscle fibre maturation, and substrate utilization. The co-occurrence of calpain–calpastatin signals with growth-axis markers is also biologically coherent: proteostasis modules (e.g., CAPN/CAST) influence post-mortem tenderness and may reflect indirect selection on carcass quality traits that covary with growth rate. Importantly, pathway-level agreement across our scans and the literature does not imply that any single locus is uniformly causal across breeds; instead, it supports a polygenic model in which different combinations of variants within a shared network yield comparable phenotypes under similar husbandry objectives.

Linking these genomic patterns to breed histories provides additional interpretive discipline while avoiding overreach. Akkaraman is a fat-tailed, meat-type landrace managed under relatively harsh Anatolian conditions; selection for skeletal robustness and efficient muscle accretion would be consistent with the repeated signals we see around osteogenic and somatotropic pathways. Karacabey has substantial Merino ancestry and a dual-purpose orientation; the co-enrichment of growth-axis and muscle-regulatory signals aligns with long-standing emphasis on frame and carcass traits in Merino-derived improvement schemes. By contrast, given the tiny n in Hasak and Hasmer, we deliberately refrain from detailing locus-by-locus findings for these crossbreds and instead note that any apparent “additional” signals must be validated with balanced sampling before they can be connected to introgression histories or management goals.

Overall, our best-supported conclusion is not that a single gene underlies body size in Turkish sheep, but that multiple, interacting pathways (endocrine growth signaling, myogenic transcriptional control, and metabolic regulation) show coordinated evidence of selection in the better-sampled populations, in agreement with independent studies across diverse ovine contexts50,51,52,54,55,56. Future work should prioritize replication with larger, more balanced cohorts, demographic modeling to separate selection from drift, and genotype–phenotype association analyses to identify which nodes within these networks most robustly predict growth and carcass outcomes in specific production environments.

Selection signatures for milk production

Long ROH and extended haplotypes center on the casein gene cluster together with endocrine and metabolic regulators of lactation, indicating selection on both milk composition (protein/fat synthesis) and secretion/transport. This architecture accords with independent reports: ROH islands in Greek and Polish sheep repeatedly include dairy-relevant loci such as ABCG2/SPP1 and related milk-yield or composition candidates, supporting a general signal of selection on lactation pathways rather than breed-specific outliers48,57. Cross-species comparative scans also converge on metabolic and transport functions—overlap between FST and XP-EHH highlights lipid handling, carbohydrate metabolism, and membrane transport components that plausibly scale milk volume and solids58; in specialized dairy panels (e.g., Lacaune), targets such as SUCNR1 and PPARGC1A emphasize mitochondrial signaling and transcriptional control of energy allocation during lactation59. Regional studies from Iran similarly implicate metabolic and regulatory genes associated with milk traits, reinforcing that selection on dairy performance in Southwest Asian contexts often acts through networks coordinating lipid mobilization, oxidative metabolism, and secretory capacity rather than through isolated major-effect loci51.

Linking these genomic patterns to breed histories provides a cautious, biologically grounded interpretation without overstatement. Akkaraman and Oamer have long been managed as dual-purpose populations under varied production environments; enrichment for casein-cluster and lactation-endocrine signals in the better-sampled groups is therefore consistent with sustained, polygenic selection on milk quality and yield under field conditions. By contrast, any apparent enrichment in the small crossbred samples could reflect sampling variance or recent drift; these findings require replication with balanced designs before attributing them to introgression or targeted selection. Overall, the weight of evidence, across our multi-method scans and the comparative literature, supports a model in which Turkish dairy performance has been shaped by coordinated selection on (i) milk-protein loci (casein cluster), (ii) endocrine regulators that tune lactation efficiency, and (iii) metabolic/transport modules that govern energy partitioning and milk constituent export48,51,57,59. Further validation with larger, phenotype-linked cohorts and explicit demographic modeling will be essential to separate true selection from background structure and to identify which nodes in these networks most robustly predict dairy outcomes in each breed.

Selection signatures for immune function and adaptation

Our results support a pathway-level model in which innate sensing (Toll-like receptors and downstream adapters), interferon signaling, and antibody effector functions have been recurrent targets of selection in Turkish sheep. Rather than relying on single-gene claims, the convergence of long ROH and extended haplotypes around TLR–MYD88–NF-κB modules, interferon receptors, and Fc-receptor loci suggests coordinated selection on early pathogen recognition, inflammatory tuning, and effector clearance. This architecture parallels reports from Anatolia and neighboring regions, where selection candidates repeatedly include innate immune and stress-response pathways, as well as odorant/chemosensory and heat-stress components that mediate host–environment interactions49,60. Broad comparative scans across climatic zones similarly find enrichment of interleukin and cluster-of-differentiation families among selection hits, pointing to polygenic immune adaptation rather than isolated major-effect variants60. Studies from the Middle East and South Asia further implicate loci tied to host defense and tissue repair, reinforcing a regional pattern in which limb/skin development, antiviral restriction, and leukocyte signaling co-segregate with immune candidates in selection scans56. In Iranian and Chinese datasets, independent analyses identify interferon-pathway members and mitochondrial/oxidative-stress components (e.g., STAT2, DOCK5, UBR1, NLRX1), underscoring the recurring theme that pathogen pressure and cellular stress tolerance are coupled targets of selection in sheep51,52. High-altitude and desert panels add heat-shock, redox, and barrier-integrity candidates (e.g., TRIM/FOX and glycolytic/oxidoreductive modules), consistent with thermal and hypoxic challenges shaping immune-metabolic crosstalk61. Signals in southern African Merino and Western Pyrenees populations likewise emphasize interferon-receptor complexes and central immune transcription factors, suggesting that components of the interferon-JAK/STAT axis are repeatedly tuned across diverse production systems50,57.

Linking these genomic patterns to breed histories and production ecologies provides a cautious biological rationale while avoiding over-interpretation. Akkaraman and Oamer, which are better sampled here and maintained under mixed crop–livestock systems spanning continental interiors to coastal zones, show pathway-level evidence consistent with long-term exposure to variable pathogen communities and climatic stressors. Enrichment for TLR–MYD88–NF-κB and interferon modules in these groups accords with management contexts where endemic parasitism, bacterial mastitis risk, and seasonal viral pressures are salient; similar combinations of innate and cytokine signaling candidates recur in independent studies from Anatolia, the Middle East, and Africa49,50,56,57,60,61. By contrast, any apparent immune enrichment in the small crossbred samples could reflect sampling variance or recent drift; attributing such patterns to recent introgression or targeted selection would be premature without larger cohorts and replication. Overall, the weight of evidence, across our multi-method scans and the comparative literature, supports a model in which Turkish sheep have undergone coordinated, polygenic selection on (i) early pathogen sensing and inflammatory control, (ii) interferon-mediated antiviral and stress responses, and (iii) antibody-dependent effector pathways that together enhance robustness under heterogeneous disease and climate regimes49,50,51,52,56,57,60,61. Future work should integrate balanced sampling, phenotype and exposure data (e.g., parasite burdens, mastitis records, heat-humidity indices), and demographic controls to separate true selection from background structure and to identify which nodes in these networks most robustly predict health outcomes in each breed.

Selection signatures for reproduction and fertility

Our results are most consistent with pathway-level selection on the gonadotropin axis (FSH/LH synthesis and signaling), oocyte/follicular development mediated by BMP/TGF-β components, and insulin-like growth factor modulation of follicle maturation. Rather than prioritizing individual loci, we note that extended homozygosity and long-range haplotypes surrounding the FSH/LH–GNRH signaling cascade and BMP receptors align with repeated observations from diverse sheep panels showing that fertility is a polygenic trait shaped by coordinated tuning of endocrine and oocyte-development pathways50,51,52,55,57,62. For example, independent studies in Hu, Suffolk, Tarim Basin, South African Merino, and Western Pyrenees sheep consistently recover networks involving BMP signaling (including BMPR family members), hypothalamic–pituitary control (GNRH/FSH/LH), and accessory modulators of gametogenesis and implantation, supporting a model of convergent selection on the same functional axes across environments and production systems50,51,52,55,57,62. Linking these genomic patterns to breed histories, Akkaraman—maintained under extensive systems with economic emphasis on lamb crop—shows signals consistent with long-term selection on follicular recruitment and ovulation efficiency; Karacabey and Oamer, used in more managed, dual-purpose contexts, show complementary signatures suggestive of balancing fecundity with perinatal survival and maternal behavior. By contrast, any apparent enrichment in Hasak/Hasmer should not be over-interpreted as introgressed without replication in larger, demographically balanced cohorts. 
Taken together, the weight of evidence supports a conservative interpretation: Turkish sheep exhibit polygenic, pathway-level selection on reproductive efficiency, centered on BMP/TGF-β and gonadotropin signaling with IGF-mediated modulation, consistent with regional reports and with breeding goals that favor higher lambing rates while preserving fitness-related traits50,51,52,55,57,62. Future work integrating balanced sampling, explicit demographic corrections, and reproductive phenotypes (ovulation rate, litter size, lamb survival) will be essential to distinguish true selection signals from sampling noise and to identify which nodes in these pathways most robustly predict fertility in each breed.

Convergence and divergence of selection signals

The most defensible result is pathway-level convergence on axes underpinning productivity and adaptation: somatotropic growth signalling (GH–GHR–GHRHR–STAT), osteogenesis/myogenesis modules linked to carcass traits, the casein–lactation complex for milk composition, innate/adaptive immunity (TLR/interferon/interleukin systems), and endocrine control of fecundity (gonadotropin and BMP/TGF-β cascades). The recurrence of these same functional routes across diverse sheep datasets elsewhere supports a shared architecture shaped by long-term husbandry goals and environmental pressures rather than study-specific noise50,52,55,57,59,61,62. In this context, Akkaraman provides the clearest breed-history link: as a fat-tailed landrace maintained under extensive Anatolian conditions, it plausibly accumulates selection for skeletal/metabolic efficiency and disease defence, consistent with resilience to heat, feed fluctuations, and endemic pathogens. Karacabey and Oamer, managed in more temperate, semi-intensive settings and historically targeted for dual-purpose use, fit a pattern in which lactation and growth pathways cosegregate with immune modulation—consistent with selection to sustain yield while mitigating the health risks typical of higher-input systems. Signals appearing to be unique to Hasak or Hasmer cannot be distinguished from sampling artefacts at present and should not be interpreted as definitive introgressed haplotypes without replication; their primary value here is to generate hypotheses for follow-up. Overall, our synthesis aligns with continental reports showing high within-breed diversity and admixture in European/Turkish flocks11,47 and with repeated detection of growth/BMP, casein/metabolic, and immune pathways across European, Asian, and African breed comparisons50,52,55,57,59,61,62. 
Thus, rather than emphasizing isolated loci, we conclude that Turkish sheep most likely share a conserved, polygenic backbone for productivity and adaptation, upon which management history and local breeding goals have produced modest, breed-contingent differences. Practically, this argues for improvement programs that leverage pathway-level markers while preserving standing diversity—especially in small or recently formed populations—coupled with future designs that equalize sampling, model demography explicitly, and validate candidate routes with phenotype-anchored replication.

Interpretive frame and key limitations

Three constraints particularly shape how the results should be read. First, sample sizes are unbalanced across breeds, with especially small cohorts for Hasak and Hasmer. Such imbalance lowers statistical power and increases the chance that single-breed outliers reflect sampling noise rather than true population signals. Accordingly, while we report findings for these crosses, we treat breed-specific signals from Hasak/Hasmer as provisional and avoid strong conclusions based on them alone. Second, we used a medium-density SNP array, which constrains genomic resolution, can be affected by ascertainment bias, and may miss rare/structural variants that contribute to selection signatures. This limitation is common in genome scans with legacy arrays and motivates our emphasis on multi-method consistency and cross-breed recurrence rather than isolated single-marker peaks. Third, demographic history (admixture, drift, uneven recent effective population sizes) can generate long homozygous tracts or skew neutrality statistics in ways that mimic positive selection.
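
The emphasis on multi-method consistency can be expressed as a simple filter that retains only genomic windows flagged by at least a minimum number of independent scans. The sketch below is illustrative and is not the pipeline used in this study: the method labels, the window coordinates, and the threshold of two methods are all hypothetical.

```python
from collections import Counter


def consistent_windows(hits_by_method, min_methods=2):
    """Return windows flagged by at least `min_methods` scan statistics.

    hits_by_method maps a method label to an iterable of windows,
    each window being a (chromosome, window_index) tuple.
    """
    support = Counter()
    for windows in hits_by_method.values():
        support.update(set(windows))  # count each method once per window
    return sorted(w for w, n in support.items() if n >= min_methods)


# ASSUMPTION: hypothetical method labels and window hits.
hits = {
    "haplotype_scan": [("2", 14), ("6", 85), ("11", 3)],
    "roh_islands": [("6", 85), ("11", 3), ("20", 40)],
    "differentiation_scan": [("6", 85), ("13", 22)],
}
print(consistent_windows(hits, min_methods=2))
print(consistent_windows(hits, min_methods=3))
```

Raising the threshold trades sensitivity for robustness. A window supported by a single statistic in a single small cohort would be dropped, which matches the provisional treatment of breed-specific outliers described above.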

Conclusion

This study combined population-structure summaries with multiple selection‐scan statistics to explore adaptation and productivity in Turkish sheep, while explicitly recognizing key limitations. Severe sample-size imbalance, medium-density SNP coverage, and potential demographic confounding (admixture, recent Ne changes) constrain power and precision, especially for crossbred-specific inferences. Accordingly, we place interpretive weight on patterns that recur across methods and better-sampled breeds and treat Hasak/Hasmer signals as provisional. Within that evidence-bounded frame, four pathway-level themes emerge: (i) somatotropic–myogenic regulation of growth and carcass traits; (ii) lactation signatures centered on the casein region with metabolic modulators affecting nutrient partitioning; (iii) innate and interferon-mediated immunity consistent with broad resilience rather than pathogen specificity; and (iv) reproductive networks spanning BMP, gonadotropin, and IGF-binding axes. These convergent tendencies align with breed histories—e.g., Akkaraman’s separation and robustness under continental management, and the more balanced somatotropic–lactation emphasis in Karacabey/Oamer under temperate, dual-purpose systems. Rather than a prescriptive roadmap, our results offer evidence-informed starting points. Priorities include balanced resampling (particularly for Hasak/Hasmer), higher-resolution genotyping or WGS, demography-aware null models, phenotype–environment integration, and functional validation. With these steps, pathway-level signals can be translated into robust, diversity-conserving tools for sustainable improvement of meat, milk, resilience, and fertility in Türkiye’s sheep.