Introduction

In recent years, there has been a clear trend in biomedical research towards incorporating non-traditional animal models1,2. Research on species living in extreme environments or possessing unusual traits has proven to be a useful route to biomedically significant discoveries3,4,5. Adaptations that help species survive in challenging conditions can offer valuable insights into fundamental physiological processes and thus also be useful for addressing human disorders. Among mammals, such promising models include the arid-adapted spiny mice for skin regeneration6, the subterranean blind mole rats for longevity and cancer resistance7,8, hibernating bears and the smallest primates, mouse lemurs, for natural obesity and diabetes9,10, and the bowhead whales for disease resistance and ageing11.

Among these emerging model organisms, one stands out for its relevance across multiple research fields. The naked mole-rat (Heterocephalus glaber), a small (<80 g), hairless rodent of the African mole-rat family (Bathyergidae), exhibits a truly unique suite of characteristics. These can be attributed to its strictly subterranean lifestyle, complex social organization resembling insect eusociality, and distribution in very arid regions of the Horn of Africa. After the description of its extreme longevity – nearly 40 years according to recent estimates12, which is exceptional for a non-hibernating mammal of its size13 – a series of important discoveries followed. These include insights into healthy aging (e.g., refs. 12,14), cancer resistance (e.g., refs. 15,16), pain insensitivity (e.g., refs. 17,18,19), hypoxia tolerance (e.g., ref. 20) and oxidative stress defense (e.g., ref. 21). As a result, the naked mole-rat – once heralded as the most unusual mammal22 – has repeatedly been highlighted as a valuable model for biomedical research23,24,25.

Given the substantial effort invested in studying the biology of the naked mole-rat in laboratory settings26, surprisingly little is known about its natural genetic variation. The genus is largely distributed across insecure and politically unstable regions of the Horn of Africa, including Somalia, eastern Ethiopia, and eastern Kenya27. As a result, only a limited number of old museum specimens from the colonial era and virtually no freshly collected tissue samples suitable for genomic analyses are available.

The most comprehensive study on the genetic structure of free-living populations of the naked mole-rat to date28 was still restricted to a relatively small part of its distributional range. It was based on the mitochondrial cytochrome b (CYTB) gene and a few nuclear markers and confirmed the existence of two distinct lineages in the genus29,30. One of them was found in Kenya and southern Ethiopia, the other in northeastern Ethiopia, and their divergence was dated to the middle Pleistocene, 0.8–1.4 million years (Ma) ago. Besides the pronounced genetic divergence, considerable morphological variation was recently described in the genus31. Both kinds of data thus raised the question of whether multiple species of the naked mole-rat exist. In our study, we address the issue with extended geographical sampling, including sequencing of old museum material from Somalia and numerous fresh tissue samples from large, poorly explored areas such as Somaliland and Djibouti. Analysis of mitochondrial and nuclear genomic data, using both well-established and novel phylogenetic and species delimitation methods, revealed the presence of three deeply divergent lineages. These lineages differ in habitat requirements, and micro-computed tomography (µCT) further demonstrated differences in molar tooth morphology. Integration of available data supports the existence of more than one species of the naked mole-rat.

Results and Discussion

Congruent mitonuclear structure

The analysis of 38 mitochondrial CYTB gene sequences and 37 reduced-representation nuclear genomes obtained via double-digest restriction-site associated DNA sequencing (ddRAD) revealed three deeply divergent phylogenetic lineages with non-overlapping geographic distributions (Figs. 1, 2, Supplementary Table S1). Two of these correspond to the previously recognized subspecies H. g. glaber Rüppell 1842 and H. g. ansorgei Thomas 1903 (see refs. 28,31). The third lineage, a distant sister group of the former two, comprises two specimens from localities in southern Somaliland (Xangay and Shanshacade). Based on molar characteristics31 (see below) and geographic position (cf. Fig. 1A), this lineage was identified as H. g. phillipsi Thomas 1885 – a taxon not previously studied genetically (cf. ref. 28).

Fig. 1: Distribution, localities, and CYTB phylogeny.

A Distribution of sampled sites, presence records of Heterocephalus and type localities of formally described species (the precise type locality of H. glaber is not known, but it should be around the Afar Triangle, see ref. 28). The basemap is Google Satellite imagery accessed via XYZ Tiles (QGIS), 2025; Map data © Google. The species’ distribution outline is from the IUCN Red List (International Union for Conservation of Nature, 2008). Country boundaries are taken from Natural Earth. B Cytochrome b tree with branch colours indicating species delimited by the branchcutting algorithm. Monophyly of all three lineages (phillipsi, glaber and ansorgei) was supported with PP = 1.00. The locality numbers used in the map are associated with locality names on the tree tips.

Fig. 2: Analyses of ddRAD data.

A Nuclear phylogeny estimated from the concatenated biallelic SNPs with colours indicating branchcutting delimitation. Monophyly of all three main lineages (phillipsi, glaber and ansorgei) was supported with PP = 1.00. B The coancestry matrix with squares indicating Infomap clustering. C Ordination of individual nuclear genotypes in the space of the first two principal components. Colours indicate the classification of the individuals to species based on the branchcutting and Infomap clustering analyses. D The STRUCTURE estimate of individual ancestry proportions as indicated by coloured bars.

In the CYTB tree, these three lineages were delimited as separate species by the branchcutting algorithm32 with high support (≥0.97, Fig. 1B), and each formed a monophyletic group with a posterior probability (PP) of 1.00. The average CYTB sequence distances among them were also remarkably high – exceeding 11% (Supplementary Table S2). The nuclear genomic dataset comprised 25,466 ddRAD loci, containing 147,986 single nucleotide polymorphisms (SNPs). Four different ordinations and clustering analyses of the nuclear genomic data consistently identified three clusters of individuals, each corresponding to one of the mitochondrial lineages. In the concatenated tree of all biallelic SNPs, the three lineages were again monophyletic (PP = 1.00) and delimited by the branchcutting algorithm with full (1.00) support (Fig. 2A). Infomap clustering33,34 on the coancestry matrix34 identified phillipsi as a very distinctive lineage (Fig. 2B). Although phillipsi, as a whole, was separated there from the others by markedly lower coancestry values, its representative from Shanshacade (SOM512) showed slightly elevated coancestry with two individuals (SOM584, SOM585) from Sheikh, the nearest sampled locality of glaber. Principal component analysis (PCA) separated phillipsi from the other two lineages along PC1, though the Shanshacade individual was again shifted towards the cluster of glaber (Fig. 2C). STRUCTURE analysis35 clearly supported the model with three groups (\(\text{K}=3\)) as the best one. Ancestry proportions were estimated at 1.00 for all individuals, except for the Shanshacade specimen, where the ancestry proportion of phillipsi was 93.9%, with the remaining 6.1% assigned to glaber (Fig. 2D).

The multispecies coalescent (MSC) analysis36 confirmed that the divergence between phillipsi and the other lineages – 0.0067 (95% highest posterior density [HPD] interval: 0.0059–0.0076) – is substantially deeper than that between glaber and ansorgei, which was estimated at 0.0021 (0.0019–0.0024) in substitution units. To test for potential recent gene flow, especially between glaber and phillipsi as suggested by the clustering analyses, we applied a series of MSC-M models, which extend the MSC model by including post-divergence migration37 and compared them using Bayes factors (BF). All pairwise combinations were tested, and the only supported migration was from glaber to phillipsi (BF = 25.8 compared to the no-migration model), though the estimated number of migrants per generation was very low, M = 0.0063 (95% HPD 0.0024–0.0103, Fig. 3A). Full results of the BPP analyses are provided in Supplementary Table S3.

Fig. 3: Migration and dating.

A Multispecies coalescent tree with branches (extant species and their ancestors) represented by rectangles whose lengths are proportional to time (difference between τs of the bounding nodes) and whose widths are proportional to half the population size parameters (\(\theta /2\)). Rectangle shapes thus reflect branch lengths in coalescent units. The arrow indicates the only supported migration with the posterior mean of its rate. B The time-calibrated tree of all extant bathyergids as estimated under the fossilized birth-death model (Petromus and Thryonomys were omitted from the figure).

The three lineages were genetically distinct. The genealogical divergence index (gdi) of all three lineages was ≥ 0.97, with the lower limit of its 95% HPD interval always ≥ 0.96. When glaber and ansorgei were merged, their collective gdi was of the same magnitude. The decomposition of sequence variation confirmed the split into three lineages as the main pattern in the data (Supplementary Table S4). Their fixation index (\({\phi }_{{ST}}\)) was 0.9888 when estimated relative to the variation in the whole genus. In contrast, \({\phi }_{{ST}}\) was just 0.1361 for the split into two lineages (phillipsi and glaber+ansorgei), and it did not increase much (just to about 0.998) when glaber and ansorgei were each further split into internal sub-lineages (see below). The absolute genetic divergence (\({d}_{\text{xy}}\)) was 0.0141 between phillipsi and the other lineages, but only 0.0056 between glaber and ansorgei (in substitution units). The latter was still ca. six times larger than the divergence between the major sub-lineages found within glaber and ansorgei. The nucleotide diversities (\(\pi\)) were relatively low, ranging from 0.0002 to 0.0031 across the hierarchy of lineages, and the net genetic divergences thus showed the same picture as the absolute ones (Supplementary Table S5).
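To illustrate how the summary statistics in this paragraph relate to one another, the following Python sketch computes nucleotide diversity, absolute divergence and net divergence from aligned sequences, and the gdi from MSC parameters. It assumes the commonly used definitions d_a = d_xy − (π_x + π_y)/2 and gdi = 1 − exp(−2τ/θ). The function names, the toy sequences and the θ value are hypothetical; this is not the pipeline used in the study.

```python
import math
from itertools import combinations

VALID = set("ACGT")

def p_distance(a, b):
    """Uncorrected proportion of differing sites; gaps and ambiguous bases are skipped."""
    compared = [(x, y) for x, y in zip(a, b) if x in VALID and y in VALID]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

def nucleotide_diversity(seqs):
    """pi: mean pairwise distance within one group (0.0 for a single sequence)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def absolute_divergence(xs, ys):
    """d_xy: mean pairwise distance between the sequences of two groups."""
    return sum(p_distance(a, b) for a in xs for b in ys) / (len(xs) * len(ys))

def net_divergence(xs, ys):
    """d_a = d_xy - (pi_x + pi_y) / 2."""
    within = (nucleotide_diversity(xs) + nucleotide_diversity(ys)) / 2
    return absolute_divergence(xs, ys) - within

def gdi(tau, theta):
    """gdi = 1 - exp(-2 * tau / theta), with tau and theta in substitutions per site."""
    return 1.0 - math.exp(-2.0 * tau / theta)

# Hypothetical toy alignment (not real Heterocephalus data)
group_x = ["ACGTACGTAC", "ACGTACGTAT"]
group_y = ["ACGTTCGTGC", "ACGTTCGTGC"]
d_xy = absolute_divergence(group_x, group_y)
d_a = net_divergence(group_x, group_y)

# tau = 0.0067 is the divergence reported for phillipsi; theta = 0.002 is a made-up value
example_gdi = gdi(0.0067, 0.002)

print(f"d_xy = {d_xy:.3f}, d_a = {d_a:.3f}, gdi = {example_gdi:.4f}")
```

Under these definitions, a d_xy of 0.0141 combined with nucleotide diversities of at most 0.0031 cannot yield a net divergence below 0.0110, which is why the net and absolute divergences give the same picture here.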

Extremely old basal split

The timing of divergence events within Heterocephalus was estimated by inferring a time-calibrated tree of phiomorph rodents38 using the fossilized birth-death model39 in combination with node calibrations. All four fossil species of Heterocephalus, described from the Plio-Pleistocene of Eastern Africa31, were included in the analysis (Supplementary Table S6). The posterior sample of trees indicates that the split between phillipsi and the other two lineages is very old, dating to the early Pliocene, 4.1 (2.9–5.5) Ma. This is older than the divergence between any other known pair of congeneric mole-rat species (Fig. 3B). The divergence between glaber and ansorgei is much younger, but still dates to the early Pleistocene, at 2.24 (1.3–3.4) Ma (Fig. 3B), which is notably older than the 0.8–1.4 Ma range previously estimated for the same two lineages28.

Diagnostic dental traits

We examined the morphological differentiation among the three lineages through a combined analysis of our and published31 dental measurements (Supplementary Table S7). The phillipsi lineage is separated from the other two in a scatter plot of upper and lower molar row lengths (Fig. 4A). This is not only due to its previously reported small body size31, but also due to vestigial third molars – both upper (M³) and lower (M₃) – which are not externally visible in the holotype of H. phillipsi (Fig. 5 in ref. 31). In our newly collected specimens, the third molars were either small, embedded within the bone, or absent (Supplementary Fig. S1). The distinctiveness of phillipsi is also apparent in PCA of the lengths and widths of the first two molars (Supplementary Fig. S2), where phillipsi is distinguished by longer lower teeth, possibly replacing the minute third molar. In addition to dental traits, phillipsi also exhibits a lower coronoid process of the mandible31. In contrast, the glaber and ansorgei lineages were quite similar in their dental traits, with overlapping distributions of individual measurements (Fig. 4A, Supplementary Fig. S2).

Fig. 4: Morphological and ecological differences.

A Scatterplot of the lengths of the upper and lower molar rows. Based on it, we included the holotype of H. dunni Thomas 1909 in phillipsi. The holotypes are labelled by the specific epithets they carry. See also Supplementary videos at https://doi.org/10.6084/m9.figshare.27952233. B Between-group principal component analysis of ecological variables. Small open circles correspond to sites with records of particular lineages (based on genetic and morphological data), large and filled circles are their means. The lines indicate loadings of ecological variables on the bgPC axes. Included are temperature variables: annual mean (BIO01), diurnal range (BIO02), diurnal range divided by annual range (BIO03), seasonality (BIO04), max. of warmest month (BIO05), min. of coldest month (BIO06), mean of driest quarter (BIO09) and mean of warmest quarter (BIO10); precipitation variables: annual total (BIO12), total of driest month (BIO14), seasonality (BIO15), total of warmest quarter (BIO18) and total of coldest quarter (BIO19) and soil variables: proportions of sand and clay particles. In both plots, (A, B), the points are coloured consistently with genetic data or geographical origin (cf. Fig. 1) and resemblance to the phillipsi phenotype31. The circles representing genotyped specimens are filled with semi-transparent colour.

Distribution of the lineages

All three lineages of Heterocephalus appear to be distributed parapatrically (Fig. 1), although additional sampling will be necessary to confirm this geographic pattern. The lineage glaber occurs in north-eastern Ethiopia, northern Somaliland, and southern Djibouti, mainly north of the Chercher Mountains. The lineage phillipsi is genetically confirmed from two sites in southern Somaliland. Based on the cranial and dental characteristics (Fig. 4A and ref. 31), phillipsi also includes a specimen from Mogadishu and the holotypes of the formerly recognized H. dunni Thomas 1909 and H. scorteccii de Beaux 1934. In that case, H. phillipsi probably has a large, yet poorly documented, geographical distribution covering the Somali region in Ethiopia, Somalia, and the southern part of Somaliland (Fig. 1A). The lineage ansorgei occurs farthest south, from eastern Kenya and southern Ethiopia to southern Somalia, tentatively also including a specimen from Genale, 80 km south-west of Mogadishu. In the west, ansorgei appears to be isolated from glaber by the large massif of the Arsi and Bale Mountains. This isolation may be reinforced by the large perennial rivers rising in this massif40, which create deep valleys in mountain areas and, farther east in the lowlands, may represent the main migration barrier in an otherwise topographically flat and uniform landscape. Nevertheless, part of a mole-rat population may occasionally be found on the opposite side of a river, either due to active crossing or, more likely, as a result of the river changing its course, particularly in lowland areas (see ref. 41).

Within both ansorgei and glaber, further genetic structure is evident, almost completely congruent between mitochondrial and nuclear trees. The ansorgei lineage is differentiated in the north-south direction – the southern sub-lineage contains Kenyan sites (Mtito Andei, Meru NP, Lerata) and the southernmost Ethiopian site (Borena Megadu), while the northern sub-lineage comprises all other Ethiopian sites of the lineage (Arero, Geralle, Borena, Dembalawachu) together with Luuq in Somalia (Fig. 1A). Zemlemerova et al.28 suggested that the Mega ridge and the presence of extensive lava fields form the barrier dividing the two sub-lineages. The division within the glaber lineage follows the ranges of the Chercher and Golis Mountains. In both trees, animals from the localities on their southern side (Babile and Sheikh) are separated from those on the northern side (Xeego, Boorama, Habas, Assamo, Dire Dawa, Jeldessa). An individual from the centrally located Agabar belongs to the southern sub-lineage in the mitochondrial tree, but to the northern one in the nuclear tree. The structure is well illustrated by CYTB distances: the distance is only 1.1% between Babile and Sheikh, which are 333 km apart, but 2.3% between Babile and Dire Dawa, which are just about 66 km apart but lie on opposite sides of the Chercher Mountains. Monophyly of the ansorgei and glaber sub-lineages was supported with PP ≥ 0.93 in the CYTB tree and PP = 1.00 in the ddRAD tree.
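The contrast between the two Babile comparisons can be made explicit by normalizing CYTB distance by geographic distance. The short Python sketch below does this arithmetic with the four values given in the text; the per-100-km normalization is only an illustrative device and was not an analysis performed in the study.

```python
# (CYTB distance in %, geographic distance in km), as reported in the text
PAIRS = {
    "Babile-Sheikh (same side of the mountains)": (1.1, 333.0),
    "Babile-Dire Dawa (opposite sides)": (2.3, 66.0),
}

def percent_per_100_km(cytb_percent, km):
    """CYTB divergence accumulated per 100 km of geographic distance."""
    return 100.0 * cytb_percent / km

rates = {name: percent_per_100_km(*values) for name, values in PAIRS.items()}
same_side = rates["Babile-Sheikh (same side of the mountains)"]
across = rates["Babile-Dire Dawa (opposite sides)"]
ratio = across / same_side

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} % per 100 km")
print(f"across/same-side ratio: {ratio:.1f}")
```

Per unit of geographic distance, the divergence across the Chercher Mountains is roughly ten times that between the two localities on the same side, consistent with the mountains rather than distance alone structuring the glaber lineage.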

The geographic distribution of the three lineages, together with the basal split between phillipsi and the rest of the genus, suggests that extant Heterocephalus originated in the eastern part of its current range, followed by a south-westward migration of the ancestor of ansorgei (see ref. 30). The relatively recent arrival of Heterocephalus in the southern part of its range could explain the lack of fossils in Kenyan sites31. However, fossils of Heterocephalus found in northern Tanzania, dated to 4.3–1.7 Ma (H. manthi, H. quenstedti, and H. jaegeri), and in southern Ethiopia, dated to 2.4–1.96 Ma (H. atikoi), may indicate a historically broader distribution of the genus and the subsequent disappearance of these Plio-Pleistocene species from the modern fauna (see ref. 31 for review).

Habitat requirements

The current distribution of the genus was estimated by Maxent42, based on 90 distinct presence records (Supplementary Table S8) and 13 climatic and two soil variables (Supplementary Table S9). The model predicts that suitable conditions for Heterocephalus occur in dry open savannah habitats in eastern Kenya, a part of the Somalian eastern coast, and the lower slopes of the Ethiopian Highlands adjacent to the Afar Triangle, extending to the plains of Ogaden and the hilly landscape of Somaliland (Supplementary Fig. S3). No records of naked mole-rats exist from the flat arid area of the Afar Triangle. This is likely due to a combination of extremely low rainfall (<200 mm/year), high temperatures (average max. of the warmest month \(\approx\)40 °C), and saline soils covering much of the area. Naked mole-rats are predicted to occur especially on sandy soil in a hot and dry, but not extremely hot and dry, climate (Supplementary Fig. S4). The genus is found, however, at the southern edge of Afar, and its presence there is already documented by fossils pre-dating the last glacial maximum (LGM)43.

Our distribution model for the LGM predicts that Heterocephalus occurred in two separate areas. In the northern part, mole-rats could have occurred in a narrow coastal strip of land along the Golis Mountains (Supplementary Fig. S3), in an area currently occupied by the glaber lineage. Notably, the Golis Mts. and adjacent parts of the Ethiopian Highlands also played a significant role in the diversification of other small mammals44,45. The southern area was much larger and coincides with the current distribution of ansorgei, which could have been more widespread during the glacial period. Nevertheless, it seems to have been separated from the distributional range of glaber by an area with unsuitable conditions. These predictions broadly correspond to the previously published models28, which indicated less suitable habitats during the LGM, especially in Eastern Ethiopia, and two potential refugia: (i) in the Afar Triangle and (ii) from Kenya to southern Ethiopia and Somalia. Given their current geographic distributions and internal genetic structure, we can assume that the southern refugium was inhabited by ancestors of current ansorgei and the northern refugium (Golis Mountains and/or part of the Afar Triangle) by glaber. The geographic record of phillipsi is too scarce to provide clear clues about its glacial distribution.

The three lineages occur in discernibly different climatic and soil conditions, based on 32 geographic sites assigned to them using molecular or morphological evidence (Supplementary Table S8). Whereas ansorgei occurs in sandy soils, phillipsi was found in soils with significantly less sand but more clay. In terms of climate, phillipsi inhabits hotter and much drier areas than those typically occupied by ansorgei and glaber. In contrast, ansorgei lives in areas with generally higher precipitation, while glaber occurs in conditions that are the most seasonal in temperature but the least seasonal in precipitation (Supplementary Table S9).

The ecological distinction of the lineages is comprehensively shown by between-group PCA (Fig. 4B, Supplementary Fig. S5) and scatter plots of the variables (Supplementary Fig. S6). Although these differences are intriguing, some caution is necessary when interpreting them in terms of ecological limits. The principal demand of African mole-rats, and the naked mole-rat particularly, is a combination of workable soil and the presence of geophytes (i.e., plants with underground storage organs46,47), which can be very patchy. The relationship between macro-scale predictors and the density of micro-scale suitable habitats may not be straightforward – especially given that local values of these predictors are interpolations from often sparse observational data. Without detailed local habitat data, such as geophyte density and characteristics across different soil horizons, we can only hypothesize that there is a degree of differential adaptation between the lineages. It seems that especially the localities of phillipsi stand out due to the combination of extremely low precipitation and significant clay content in the soils. Interestingly, there is an extensive overlap between the tentative distribution of phillipsi and the presence of calcisols and gypsisols – soils characterized by significant accumulation, or even strongly cemented layers, of calcium carbonates and gypsum, respectively48. Reddish loams are developed at the surface, a substrate which is apparent in satellite imagery (cf. Fig. 1A) and in which specimens from two Somaliland sites were captured. The high clay content in these loams supports the retention of rainfall/water; however, deeper infiltration is limited by the cemented layers. Consequently, the area becomes deserted within a few months after the rains49. Such conditions likely impose high physiological demands, particularly during the dry season, and may promote the evolution of specific adaptations.
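For readers unfamiliar with between-group PCA, the idea can be sketched in a few lines of Python: the principal axes are computed from the group means only, and individual sites are then projected onto them. The sketch below is a minimal illustration under several assumptions: variables are z-scored, group means are unweighted, eigenvectors are obtained by simple power iteration, and the three-variable data set and all function names are invented. It does not reproduce the analysis behind Fig. 4B.

```python
import math

def standardize(rows):
    """Z-score every column (variable) across all rows (sites)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def leading_eigenvectors(sym, k, iterations=500):
    """Top k eigenvectors of a symmetric matrix of rank >= k (power iteration + deflation)."""
    p = len(sym)
    a = [row[:] for row in sym]
    vectors = []
    for _axis in range(k):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary deterministic start vector
        for _step in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        vectors.append(v)
        # Deflate: remove the component already explained by this axis
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors

def between_group_pca(rows, groups, n_axes=2):
    """PCA of group means; returns the axes and the projected scores of all sites."""
    z = standardize(rows)
    p = len(z[0])
    labels = sorted(set(groups))
    means = [[sum(z[i][j] for i in range(len(z)) if groups[i] == g) / groups.count(g)
              for j in range(p)] for g in labels]
    centre = [sum(m[j] for m in means) / len(means) for j in range(p)]
    dev = [[m[j] - centre[j] for j in range(p)] for m in means]
    cov = [[sum(d[i] * d[j] for d in dev) / (len(labels) - 1) for j in range(p)]
           for i in range(p)]
    axes = leading_eigenvectors(cov, n_axes)
    scores = [[sum((r[j] - centre[j]) * ax[j] for j in range(p)) for ax in axes] for r in z]
    return axes, scores

# Invented example: three variables measured at nine sites of three groups
sites = [[10, 1, 5], [11, 2, 5], [9, 1, 6],
         [20, 1, 5], [21, 2, 6], [19, 2, 5],
         [15, 9, 5], [14, 8, 6], [16, 9, 6]]
groups = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
axes, scores = between_group_pca(sites, groups)

# Group centroids in the space of the two between-group axes
centroids = {g: [sum(s[k] for s, lab in zip(scores, groups) if lab == g) / groups.count(g)
                 for k in range(2)] for g in sorted(set(groups))}
print(centroids)
```

With three groups there are at most two between-group axes, so a two-dimensional plot captures all differences among the group means; whether those differences exceed the within-group scatter still has to be judged from the projected sites.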

More than one species of the naked mole-rat

Taking all our results into consideration, we conclude that the genus Heterocephalus contains at least two different species. Besides H. glaber, we resurrect here H. phillipsi Thomas 1885, which is supported by the mitochondrial CYTB gene as well as by the nuclear genomic dataset. Even though there is evidence of limited gene flow between H. glaber and H. phillipsi, this phenomenon has been documented many times between distinct mammalian species (e.g., refs. 50,51). Moreover, the initial split separating H. phillipsi from the ancestor of H. glaber and H. ansorgei is very old (4.1 Ma). Not only is it older than other congeneric mole-rat divergences, but it is also older than the age of most mammalian species52. H. phillipsi is also distinctive in its molar traits (Fig. 4A, Supplementary Fig. S2) and lives in specific habitats (Fig. 4B, Supplementary Table S9). The morphological differences between glaber and ansorgei are less conspicuous, and their divergence is much more recent. In line with ref. 31, we therefore suggest considering them as subspecies – H. glaber glaber Rüppell 1842 and H. glaber ansorgei Thomas 1903 – unless additional evidence supports their recognition as distinct species.

Our discovery of the hidden diversity of the naked mole-rat supports the idea that the Horn of Africa is one of the most endemic-rich regions of the continent53. While this has been long recognized for the neighboring Ethiopian Highlands54, geographical isolation of arid habitats of the Horn of Africa from the Sahara – caused by the Ethiopian Highlands and the lakes in the Great Rift Valley55 – also promoted the evolution of unique fauna. This is evidenced by the presence of mammalian endemics, including sengis56, gerbils44,57 and mice45,58. Unfortunately, the biodiversity of this part of the continent remains heavily understudied by modern methods, due to political instability, insecurity, and military conflicts. In fact, it is among the least-studied regions on Earth44,59 and further research will likely reveal additional genetic and species diversity. Importantly, the protection of Africa’s arid areas is insufficient, despite the fact that their unique biota faces increasing threats from global warming, overgrazing, and other pressures60.

New prospects for biomedical research

Although the naked mole-rat is among the best-studied wild rodents, if not mammals, information about its biology is limited almost exclusively to laboratory investigations. To the best of our knowledge, founders of all naked mole-rat colonies kept in labs worldwide were collected from just a few localities in Kenya at the end of the 20th century. This means that all revolutionary discoveries, including findings relevant to biomedical research (reviewed in ref. 26), have been obtained on mole-rats from a few peripheral populations of a single lineage, H. g. ansorgei.

The unique adaptations of naked mole-rats are generally attributed to their subterranean existence in a harsh arid environment, where food is sparse and patchily distributed, and to life in large families, where many individuals share poorly ventilated burrows and nests20,25,61. Not surprisingly, almost all information about the biology of free-ranging naked mole-rats is again available only for H. g. ansorgei27. This subspecies occupies areas with the highest precipitation (implying higher food supply) and relatively easily workable sandy soils. In contrast, H. phillipsi lives in significantly hotter and more arid areas, in soils that are probably much less workable due to a higher content of clay (Supplementary Fig. S7). If the harsh conditions are indeed responsible for the evolution of specific adaptations studied in labs on H. g. ansorgei, such adaptations could be more pronounced in H. phillipsi. Therefore, our study not only sheds light on the hidden diversity of naked mole-rats but also increases their potential as exciting models for advancing biomedical research.

Materials And Methods

General approach

This study integrates phylogenetic analyses of one mitochondrial gene and a reduced representation of the nuclear genome with a detailed examination of dental morphology and environmental niche modelling. The phylogenetic analyses were complemented by various clustering techniques used to delimit species boundaries and included divergence dating using the fossilized birth-death model, as well as inference of key population genetic parameters using the multispecies coalescent with migration. The morphological analysis was supported by µCT scanning and included a reanalysis of previously published data. Environmental niche modelling was conducted at the genus level, based on all georeferenced records available in the literature and online databases.

Material

Our research sample consisted of naked mole-rats (Heterocephalus spp.) assembled from a variety of sources over time to obtain the most complete dataset possible for capturing the genetic diversity of Heterocephalus across its range. These included field-collected animals captured using Hickman’s traps or by traditional hunting methods. The biological material and data came from the authors, people listed in the acknowledgements, museum collections, scientific literature, and online databases (e.g., GBIF, iNaturalist). Some captured animals were humanely euthanized with xylazine for detailed morphological study, while others contributed only a small skin sample and were then released. The genetic material was stored in 96% ethanol, and whole bodies in 70% ethanol.

All fieldwork complied with the legal regulations, and sampling was carried out with the permission of local authorities. Special thanks are due to Her Excellency Mrs. Shukri Haji Ismail (Minister of Environment & Rural Development, Republic of Somaliland) for issuing the collection and export permits (Ref. MOERD/M/I/251/2017 (3.9.2017), MOERD/M/I/721/2019 (13.7.2019)). Other fieldwork was enabled by the following permits: Ref. 425/DEDD/19 (5.1.2020; Ministere De L’Urbanisme, De L’environnement Et Du Tourisme; Republique de Djibouti), Ref. DA31/349/14 (14.4.2014; Ethiopian Wildlife Conservation Authority, Federal Democratic Republic of Ethiopia), Ref. DhBBBO/BH-36/1452 (29.3.2019; Oromia Forest and Wildlife Enterprise, Federal Democratic Republic of Ethiopia), Ref. 2231/311/211 (27.3.2019; Ethiopian Wildlife Conservation Authority, Federal Democratic Republic of Ethiopia), Memorandum of understanding between Oromia Forest and Wildlife Enterprise and The Joint Ethio-Russian Biological Expedition (20.4.2017), Memorandum of understanding between Oromia Forest and Wildlife Enterprise and The Joint Ethio-Russian Biological Expedition (18.2.2016). We have complied with all relevant ethical regulations for animal use. Museum material was obtained from an existing curated museum collection with institutional approval.

Sanger sequencing

Genomic DNA from fresh tissues was extracted using the GeneJET Genomic DNA Purification Kit (Thermo Fisher Scientific) following the manufacturer’s protocol. DNA quality was verified on a 1% agarose gel, and concentration was measured fluorometrically with a Qubit® 2.0 Fluorometer using the Qubit dsDNA BR Assay Kit (Invitrogen). Genomic DNA from the museum specimen was extracted from dry skin and purified with the DNeasy Blood & Tissue Kit (Qiagen).

The mitochondrial cytochrome b (CYTB) gene was Sanger-sequenced in 19 individuals. Sequencing failed in three samples from Djibouti, likely due to the presence of a nuclear CYTB pseudogene; however, in one of these, the mitochondrial CYTB sequence was successfully obtained using specifically designed primers. One additional sequence was obtained from the museum specimen. As this DNA was highly degraded, only short CYTB fragments (~75–250 bp) were amplified using six primer pairs. This dataset was supplemented with published sequences from 17 individuals included in our nuclear genomic dataset28,62 and two additional individuals from localities otherwise unrepresented in our data29. In total, CYTB sequences of 38 individuals were analyzed (Supplementary Table S1). For divergence dating, we additionally Sanger-sequenced five nuclear genes: recombination activating gene 1 (RAG1), intron 7 of β-fibrinogen gene (FGB), intron 7 of 24-dehydrocholesterol reductase precursor (DHCR), intron 9 of smoothened homolog precursor (SMO), and intron 7 of transient receptor potential cation channel, subfamily V, member 4 (TRPV).

All primers used are listed in Supplementary Table S10. Each 10 μl PCR reaction contained 5 μl of Qiagen Multiplex PCR Master Mix or HotStarTaq Master Mix Kit, 0.3 μl each of forward and reverse primer (10 μM), 0.5 μl of DNA, and 3.9 μl of ddH₂O. PCR conditions for the CYTB marker followed those described in ref. 29. For amplification of short museum fragments, the program included an initial denaturation at 95 °C for 3 min; 45 cycles of 95 °C for 30 s, 51 °C for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 6 min. The thermocycling conditions for RAG1, DHCR, SMO, and TRPV included an initial denaturation step at 95 °C for 15 min, 10 cycles of 94 °C for 30 s, 65 °C for 30 s (decreasing by 1 °C with each cycle), and 72 °C for 1 min, followed by 25 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min, and a final extension at 72 °C for 10 min. Amplification of the FGB gene started with an initial denaturation at 95 °C for 15 min, followed by 35 cycles of 94 °C for 40 s, 59 °C for 45 s, and 72 °C for 1 min 30 s, and a final extension at 72 °C for 10 min. PCR fragment quality and size were verified by electrophoresis on a 1% agarose gel. Purification of PCR products was performed enzymatically using Exonuclease I (E. coli; 20,000 U/ml) and Calf Intestinal Alkaline Phosphatase (CIP; 10,000 U/ml) from New England BioLabs according to the following protocol: 0.05 μl Exo I, 0.1 μl CIP, 1 μl ddH₂O, and 5 μl PCR product; incubation at 37 °C for 30 min followed by enzyme inactivation at 85 °C for 15 min in a thermocycler. All genes were sequenced with forward primers, and those with lower-quality results were also sequenced from the reverse side for verification. The sequencing was carried out by GenSeq s.r.o.
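The touchdown profile and the reaction composition described above can be written out programmatically, which makes it easy to check the annealing schedule and the total volume. The Python sketch below is only an illustration: it assumes that the first touchdown cycle anneals at 65 °C (so the tenth anneals at 56 °C), it ignores ramping times, and the 10% pipetting overage in the master-mix helper is a hypothetical convenience, not part of the published protocol.

```python
def touchdown_program(start_anneal=65, drop_per_cycle=1, touchdown_cycles=10,
                      final_anneal=55, standard_cycles=25):
    """Return the program as a list of (step name, temperature in deg C, seconds)."""
    program = [("initial denaturation", 95, 15 * 60)]
    for cycle in range(touchdown_cycles):
        anneal = start_anneal - cycle * drop_per_cycle  # 65, 64, ..., 56
        program += [("denaturation", 94, 30), ("annealing", anneal, 30),
                    ("extension", 72, 60)]
    for _ in range(standard_cycles):
        program += [("denaturation", 94, 30), ("annealing", final_anneal, 30),
                    ("extension", 72, 60)]
    program.append(("final extension", 72, 10 * 60))
    return program

# Volumes per 10 ul reaction, as given in the text
REACTION_UL = {
    "master mix": 5.0,
    "forward primer (10 uM)": 0.3,
    "reverse primer (10 uM)": 0.3,
    "template DNA": 0.5,
    "ddH2O": 3.9,
}

def master_mix(n_reactions, overage=0.10):
    """Scale all components except the template; `overage` is an assumed safety margin."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in REACTION_UL.items() if name != "template DNA"}

program = touchdown_program()
anneal_temps = [temp for name, temp, _ in program if name == "annealing"]
hold_minutes = sum(seconds for _, _, seconds in program) / 60

print("annealing schedule:", anneal_temps)
print("hold time (min, without ramping):", hold_minutes)
print("mix for 8 reactions (ul):", master_mix(8))
```

The ten touchdown cycles end one degree above the 55 °C used in the remaining 25 cycles, and the listed component volumes add up to the stated 10 μl.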

ddRAD sequencing and assembly

Double-digest restriction site-associated DNA (ddRAD) sequencing was performed on 37 individuals from 17 localities (Fig. 1A, Supplementary Table S1). Restriction enzymes SphI and MluCI were used, following the library preparation protocol of ref. 63 with modifications described in ref. 64. Sequencing was carried out on a NovaSeq 6000 platform (Illumina) with 150 bp paired-end reads at the United Kingdom branch of Novogene Company Ltd. Processing of raw ddRAD reads began with demultiplexing and trimming of adaptors, barcodes and restriction enzyme recognition sites in Stacks v2.465. Subsequently, ddRAD loci were assembled using ipyrad v0.966 by mapping paired-end reads to the chromosome-level assembly of the Heterocephalus genome (GenBank accession no. GCA_944319715.1). The ipyrad parameter file used for the assembly is provided in Supplementary Table S11. Assembled ddRAD loci were then filtered to retain only loci with ≥90% occupancy (i.e., sequencing success across all individuals). This threshold was selected to minimize missing data while avoiding ascertainment bias, which was evaluated using custom scripts available at https://github.com/onmikula/phyloeda. The final dataset comprised 25,466 loci containing 147,986 single nucleotide polymorphisms (SNPs; median per locus = 5). Subsets of this dataset were used for various analyses. The loci were distributed across all scaffolds of the reference genome, including 29 autosomes, the X chromosome, and one unplaced scaffold. A small fraction of loci (473; 1.8%) was invariant, whereas most loci (24,764; 97.2%) were variable and contained at least one biallelic, parsimony-informative SNP.
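The occupancy filter can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes that occupancy is the fraction of individuals in which a locus was recovered, and the data structure, names, and toy data are hypothetical.

```python
def filter_loci(loci, n_individuals, min_occupancy=0.9):
    """Keep loci recovered in at least `min_occupancy` of all individuals.

    `loci` maps a locus name to the set of individuals genotyped at it.
    """
    return {
        name: inds
        for name, inds in loci.items()
        if len(inds) / n_individuals >= min_occupancy
    }


# Hypothetical toy data: 10 individuals and three loci.
individuals = [f"ind{i}" for i in range(10)]
loci = {
    "locus_a": set(individuals),       # 100% occupancy
    "locus_b": set(individuals[:9]),   # 90% occupancy
    "locus_c": set(individuals[:7]),   # 70% occupancy
}

kept = filter_loci(loci, len(individuals))
print(sorted(kept))  # ['locus_a', 'locus_b']
```

Under the same assumption, a 90% threshold applied to 37 individuals would require a locus to be present in at least 34 of them.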

The mitochondrial (CYTB) phylogeny

The CYTB phylogenetic tree was constructed using sequences from 38 individuals representing most of the known distribution of the naked mole-rat, Heterocephalus (Fig. 1A). Of these, 35 individuals also had ddRAD data available, while three originated from localities without ddRAD data (Fig. 1A, Supplementary Table S1). One sequence was obtained from a museum tissue sample from Luuq, Somalia, representing the first genetic data for the naked mole-rat from this country (Supplementary Text).

The phylogeny was inferred as an unrooted tree with unconstrained branch lengths using Markov chain Monte Carlo (MCMC) simulation of the posterior probability distribution implemented in MrBayes v3.2.7a67. The ModelFinder tool68 in IQ-TREE v2.1.369 was used to jointly select the optimal partitioning scheme and substitution models for the analysis. Specifically, the alignment was partitioned by codon position, with nucleotide substitution models HKY + G, HKY + I, and GTR + G applied to the first, second, and third codon positions, respectively. Each MCMC was run for \(2\times {10}^{6}\) generations with Metropolis coupling (one cold and three heated chains), sampling every 1000 generations. Four independent runs were performed and convergence was assessed in Tracer v1.770. For each run, the first 10% of sampled trees were discarded as burn-in, and the remaining trees were pooled to form a single posterior sample. The pooled sample was summarized by an extended majority-rule consensus tree with posterior mean branch lengths. The tree was subsequently outgroup-rooted using two Fukomys mole-rats (GenBank Accessions: EF043451, EF043452), and the outgroups were pruned from the final tree.
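The approximate size of the pooled posterior sample implied by these settings can be worked out with a short calculation. The sketch below ignores the starting state, which some programs also record, so the exact count in a given output file may differ by one tree per run; the function name is hypothetical.

```python
def pooled_sample_size(generations, sample_every, burnin_fraction, n_runs):
    """Number of trees left after discarding burn-in and pooling all runs."""
    per_run = generations // sample_every
    discarded = round(per_run * burnin_fraction)
    return n_runs * (per_run - discarded)


n_trees = pooled_sample_size(
    generations=2_000_000, sample_every=1000, burnin_fraction=0.10, n_runs=4
)
print(n_trees)  # 7200
```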

Species delimitation was performed in the rooted consensus tree using the branchcutting algorithm32. After applying an inverse Laplacian transform to branch lengths (cf. refs. 71,72), the algorithm calculates, for each branch, a score equal to the reduction in mean tip-to-tip distance when that branch is contracted to zero length. Kernel density estimation is then used to identify score outliers, whose corresponding branches are inferred to be ancestral to distinct species. The analysis was also applied to all trees in the pooled posterior sample, and support for each delimited species (identified in the consensus tree) was calculated as the proportion of trees in which that species appeared as distinct – that is, not merged or intermixed with others, or internally split. The method was implemented in R73 and is available at https://github.com/onmikula/branchcutting.
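The branch score can be illustrated with a simplified sketch. It is not the authors' R implementation: the branch-length transform and the kernel-density outlier step are omitted, and the tree encoding and names are hypothetical. The sketch relies on the observation that contracting a branch of length b shortens only the paths between the k tips below it and the n − k tips elsewhere, so the mean tip-to-tip distance drops by b·k·(n − k) divided by the number of tip pairs.

```python
from math import comb


def contraction_scores(branches, n_tips):
    """Reduction in mean tip-to-tip distance when each branch is contracted.

    `branches` maps a branch name to (length, set of tips below the branch).
    """
    n_pairs = comb(n_tips, 2)
    scores = {}
    for name, (length, tips_below) in branches.items():
        k = len(tips_below)
        # Only the k * (n_tips - k) tip pairs separated by this branch
        # become closer, each by exactly `length`.
        scores[name] = length * k * (n_tips - k) / n_pairs
    return scores


# Hypothetical four-tip tree: short terminal branches and one long
# internal branch leading to the clade (A, B).
branches = {
    "A": (0.1, {"A"}),
    "B": (0.1, {"B"}),
    "C": (0.1, {"C"}),
    "D": (0.1, {"D"}),
    "stem_AB": (2.0, {"A", "B"}),
}

scores = contraction_scores(branches, n_tips=4)
print(max(scores, key=scores.get))  # stem_AB
```

In this toy tree the internal branch receives a far higher score than any terminal branch, which is the kind of outlier the method treats as a candidate species boundary.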

The nuclear genome differentiation

The nuclear genome differentiation was first examined using a phylogenetic tree inferred from 129,851 biallelic SNPs concatenated across all 24,764 loci and including the genome of Fukomys damarensis (GenBank accession GCA_000743615.1) as an outgroup. Sequences homologous to our ddRAD loci were identified in the F. damarensis genome using programs BWA74, SAMtools75, BEDtools76 and MAFFT77. The analysis followed the same procedure as for the CYTB phylogeny, except that the entire alignment was treated as a single partition. Also, species delimitation using the branchcutting algorithm was performed in the same way, i.e., on the consensus tree and pooled posterior sample after outgroup rooting and removal.

Next, we calculated the coancestry matrix34,78 from genotypes at all ddRAD loci, including invariant ones. The coancestry matrix (\(\text{C}\)) summarizes pairwise genetic similarity among individuals: each off-diagonal element \({\text{C}}_{i,j}\) (\(i\ne j\)) represents the proportion of loci at which an allele of individual \(i\) is most similar to an allele of individual \(j\) rather than to alleles of other individuals in the data set34. The maximum similarity – used as a proxy of the first-order coalescence relationship – was judged by the number of nucleotide differences (i.e., Hamming or uncorrected p distance). When alleles of multiple individuals were equally similar, fractional contributions were distributed equally among corresponding \({\text{C}}_{i,j}\) values. Species boundaries were then delimited using Infomap clustering33, which treats individuals as vertices in a network and coancestry values as weighted, directed edges connecting them. The method applies a modified Louvain algorithm79 to identify clusters that allow the most efficient description of a random walk through the network33. The coancestry matrix was estimated in RADpainter34 and clustering was conducted with Infomap v2.7.180.
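The construction of the coancestry matrix can be sketched as follows. This is a simplified illustration, not the RADpainter implementation: it assumes complete diploid data with equal-length alleles within a locus, treats every allele copy as one comparison, and normalises each row to sum to one. All names and the toy data are hypothetical.

```python
def hamming(a, b):
    """Number of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def coancestry(loci):
    """loci: list of dicts mapping individual -> (allele1, allele2)."""
    names = sorted({ind for locus in loci for ind in locus})
    C = {i: {j: 0.0 for j in names if j != i} for i in names}
    counts = {i: 0 for i in names}
    for locus in loci:
        for i, alleles in locus.items():
            for a in alleles:
                # Distance from allele `a` to the closest allele of each
                # other individual at this locus.
                dist = {j: min(hamming(a, b) for b in locus[j])
                        for j in locus if j != i}
                best = min(dist.values())
                nearest = [j for j, d in dist.items() if d == best]
                for j in nearest:  # ties share the contribution equally
                    C[i][j] += 1.0 / len(nearest)
                counts[i] += 1
    for i in names:
        for j in C[i]:
            C[i][j] /= counts[i]
    return C


# Hypothetical toy data: three individuals and three loci.
loci = [
    {"x": ("AAAA", "AAAA"), "y": ("AAAA", "AAAT"), "z": ("TTTT", "TTTT")},
    {"x": ("CC", "CC"), "y": ("GG", "GG"), "z": ("GG", "CC")},
    {"x": ("A", "A"), "y": ("A", "A"), "z": ("A", "A")},  # all tied
]

C = coancestry(loci)
print(round(C["z"]["y"], 3), round(C["y"]["z"], 3))  # 0.667 0.5
```

The matrix is not symmetric (here z is most often closest to y, but y is equally often closest to x and z), which is why the clustering step treats coancestry values as directed edge weights.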

To further explore patterns of genetic differentiation, we analyzed individual genotypes represented by a single SNP per locus, chosen at random from the SNPs that were biallelic, parsimony-informative and genotyped in as many individuals as possible. The resulting genotype matrix contained 24,764 sites (one per locus containing at least one such SNP). First, we performed principal component analysis (PCA) as described in ref. 81. For this purpose, the opposite alleles were arbitrarily labelled as ‘0’ and ‘1’ and individual diploid genotypes were coded by integers 0, 1 or 2, which indicate the counts of ‘1’ alleles. Next, we conducted Bayesian admixture analysis in STRUCTURE v2.3.335, which models individual multilocus genotypes as mixtures of \({\rm{K}}\) ancestral populations, each defined by characteristic allele frequencies and genotypes in Hardy-Weinberg proportions. Individual ancestry proportions and allelic frequencies in ancestral populations were co-estimated in a single analysis. The analysis was run with the number of clusters (\({\rm{K}}\)) ranging from 1 to 8, with ten replicate runs per \({\rm{K}}\) to check for MCMC convergence using StructureSelector82. The same application was used to determine the optimal number of ancestral populations based on the \(\Delta {\rm{K}}\) method83. Because sex was not known for all individuals (see Supplementary Table S1), analyses assumed diploidy and linkage equilibrium across loci (although the diploidy assumption is violated for male X chromosomes). Results were consistent when analyses were repeated separately for autosomal and X-chromosomal loci, the latter including only individuals of known sex and accounting for haploidy in males.
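The genotype coding and a basic PCA can be sketched as below. This is an illustration only: it assumes complete data and plain mean-centring, whereas the procedure of ref. 81 may apply additional scaling or handle missing genotypes differently. The function names and toy genotypes are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def code_genotype(genotype, one_allele):
    """Count copies of the allele labelled '1' in a diploid genotype."""
    return sum(base == one_allele for base in genotype)


def pca_scores(G, n_components=2):
    """PCA of a genotype matrix (individuals x sites) via SVD."""
    X = G - G.mean(axis=0)                      # centre each site
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)             # variance proportions
    return U[:, :n_components] * S[:n_components], explained[:n_components]


# Hypothetical toy data: four individuals, three biallelic sites.
raw = [
    ["AA", "CC", "GG"],
    ["AA", "CT", "GG"],
    ["AG", "TT", "GT"],
    ["GG", "TT", "TT"],
]
one_alleles = ["G", "T", "T"]  # arbitrary choice of the '1' allele per site

G = np.array(
    [[code_genotype(g, a) for g, a in zip(row, one_alleles)] for row in raw],
    dtype=float,
)
scores, explained = pca_scores(G)
print(G.astype(int).tolist())  # [[0, 0, 0], [0, 1, 0], [1, 2, 1], [2, 2, 2]]
```

Because the labelling of alleles as ‘0’ or ‘1’ is arbitrary, swapping the labels at a site only flips the sign of that column after centring and leaves the distances among individuals in PCA space unchanged.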

Species tree and post-divergence gene flow

Given that the previous analyses recovered three distinct clusters of individuals, we treated each as a separate lineage, i.e., species in the phylogenetic sense. The basal split was unambiguously positioned between H. phillipsi and the clade comprising H. g. glaber and H. g. ansorgei. Accordingly, we fixed the species tree topology as “(phillipsi,(glaber,ansorgei));” for the joint estimation of divergence times (\(\tau\)) and population size parameters (\(\theta\)) under the multispecies coalescent (MSC) model36 implemented in BPP v4.7.084. Both the mean \(\theta\) and the root time \({\tau }_{1}\) were assigned diffuse gamma priors, G (shape = 2, rate = 260).

Post-divergence gene flow was evaluated under the multispecies coalescent with migration (MSC-M) model42,85,86. We tested all pairwise migration rates (M) in both directions. For each rate, we calculated the Savage-Dickey density ratio87,88, an approximation of the Bayes Factor (BF)89, by comparing the prior and posterior densities at values near zero. A value of \(\text{BF}\ge 20\) was considered strong evidence for a non-zero migration rate90. Migration rates were first tested individually and ranked by their BFs; those meeting the threshold were then sequentially added to the model. All migration rate parameters were assigned the same prior, G(shape = 2, rate = 170), which allocates 50% of its probability density to M > 0.1. All MSC-based analyses in BPP were conducted using 1000 loci, selected to be as evenly distributed along the genome as possible (Supplementary Fig. S8).
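One common way to approximate the Savage-Dickey ratio from MCMC output is to compare the prior and posterior probability mass in a small interval next to zero. The sketch below follows that idea under stated assumptions: the interval width `eps` and the posterior samples are hypothetical, and the exact procedure used in the study may differ. The closed-form cumulative distribution holds for a gamma prior with shape 2.

```python
import math


def gamma2_cdf(x, rate):
    """CDF of a gamma distribution with shape 2 and the given rate."""
    return 1.0 - math.exp(-rate * x) * (1.0 + rate * x)


def savage_dickey_bf(posterior_samples, rate, eps):
    """Approximate BF in favour of M > 0 as prior mass / posterior mass
    in the interval [0, eps)."""
    prior_mass = gamma2_cdf(eps, rate)
    post_mass = sum(m < eps for m in posterior_samples) / len(posterior_samples)
    if post_mass == 0:
        return math.inf  # no posterior samples near zero
    return prior_mass / post_mass


# Hypothetical posterior sample: 5 of 1000 draws fall below eps.
samples = [0.0005] * 5 + [0.02] * 995
bf = savage_dickey_bf(samples, rate=170, eps=0.001)
print(bf >= 20)  # False
```

The fewer posterior draws fall near zero relative to the prior expectation, the larger the ratio and the stronger the support for including that migration rate in the model.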

The genetic distinctiveness of the lineages was further assessed using two complementary approaches. First, we estimated the genealogical divergence index (gdi), the probability that two sequences drawn at random from the same species coalesce within the species, i.e., before reaching the species’ origin and – in case of post-divergence gene flow – before coalescing with a sequence from another species91. The gdi was calculated for each branch of the species tree using the merge algorithm from the Hierarchical Heuristic Species Delimitation pipeline91, which iteratively applies the MSC-M model with specified migration bands. Second, we quantified nucleotide diversity (\(\pi\)), absolute divergence (\({d}_{\text{xy}}\)), net divergence (\({d}_{\text{a}}={d}_{\text{xy}}-\pi\)) and fixation indices (\({\phi }_{{ST}}\)) among lineages. These statistics were calculated at multiple hierarchical levels: between phillipsi and glaber+ansorgei, between glaber and ansorgei, and between the sub-lineages within each of glaber and ansorgei. All estimates were based on concatenated sequences of the 1,000 loci used in the MSC analyses. Nucleotide diversity measures were calculated using a custom R script. The fixation indices were estimated under the AMOVA framework92 implemented in the R package pegas93.
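The diversity and divergence statistics can be illustrated with a minimal sketch. It is not the custom R script used in the study: it assumes aligned sequences without missing data, and it takes \(\pi\) in the net divergence formula to be the mean of the two within-group diversities, which is the usual convention. Names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """Mean pairwise distance within a group (pi)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def absolute_divergence(seqs_x, seqs_y):
    """Mean pairwise distance between two groups (d_xy)."""
    pairs = list(product(seqs_x, seqs_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def net_divergence(seqs_x, seqs_y):
    """d_a = d_xy minus the mean within-group diversity (assumed convention)."""
    mean_pi = (nucleotide_diversity(seqs_x) + nucleotide_diversity(seqs_y)) / 2
    return absolute_divergence(seqs_x, seqs_y) - mean_pi


# Hypothetical toy alignment: two sequences per lineage.
lineage_x = ["AAAA", "AAAT"]
lineage_y = ["TTTT", "TTTA"]

print(nucleotide_diversity(lineage_x))            # 0.25
print(absolute_divergence(lineage_x, lineage_y))  # 0.875
print(net_divergence(lineage_x, lineage_y))       # 0.625
```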

Divergence dating

To estimate divergence times within Heterocephalus, we inferred a time-calibrated phylogeny based on Sanger sequenced nuclear loci of phiomorph rodents, i.e., the clade encompassing African mole-rats (Bathyergidae)38. This analysis included 30 extant species, each represented by sequences from up to five nuclear markers (Supplementary Table S1), and 42 fossil species with specified ages and, where possible, phylogenetic placements (Supplementary Table S6). The analysis employed the Bayesian implementation of the fossilized birth-death model39,94, supplemented by three node calibrations: a uniform prior (\(\min =34.0,\max =56.0\)) on the origin of phiomorph radiation, and two identical lognormal priors (\(\mu =2.0\), \(\sigma =0.25,\text{offset}=23.0\)) on the origins of lineages ultimately leading to the extant subfamilies Heterocephalinae (Heterocephalus) and Bathyerginae (the remaining five extant genera). After confirming MCMC convergence, fossil species were pruned from the posterior samples, and trees in the pooled sample were summarized by the maximum clade credibility tree with mean node heights95.
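To make the lognormal node calibrations easier to interpret, the sketch below converts the stated parameters into quantiles on the time scale. It assumes that \(\mu\) and \(\sigma\) are the mean and standard deviation on the log scale and that the offset is in the same time units as the uniform prior; some dating programs parameterise lognormal priors differently, so the numbers are indicative only. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def offset_lognormal_quantile(p, mu, sigma, offset):
    """Quantile of offset + LogNormal(mu, sigma), log-scale parameters."""
    z = NormalDist().inv_cdf(p)
    return offset + math.exp(mu + sigma * z)


mu, sigma, offset = 2.0, 0.25, 23.0
median = offset_lognormal_quantile(0.5, mu, sigma, offset)
lower = offset_lognormal_quantile(0.025, mu, sigma, offset)
upper = offset_lognormal_quantile(0.975, mu, sigma, offset)

print(round(median, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the prior median is about 30.4 and the central 95% interval spans roughly 27.5 to 35.1 time units, with no prior mass below the offset of 23.0.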

Environmental niche modelling

The environmental niche of Heterocephalus was estimated from climatic and soil data using the maximum entropy (MaxEnt) approach42 as implemented in the R package maxnet v0.1.496. We considered 19 bioclimatic variables from the CHELSA database (ref. 97; https://chelsa-climate.org), estimated for the past 100 years (present) and for the Last Glacial Maximum (LGM, 21 ka before present). Five soil variables were downloaded from SoilGrids 2.0 (ref. 98; https://soilgrids.org): mass percentage of sand and clay, volumetric percentage of coarse fragments, bulk density, and carbon content. We used data from 15–30 cm depth, which is most relevant for the mole-rats. The distribution of the genus was characterized by 90 unique georeferenced presence records, including individuals from our genetic analyses and additional records compiled from a variety of sources (Supplementary Table S8). The background area for the MaxEnt model encompassed all terrestrial habitats within the minimum convex polygon around the state territories of Djibouti, Eritrea, Ethiopia, Kenya, Somalia, and Somaliland, covering the whole area from which Heterocephalus is known or could be expected. All input raster layers were downsampled to a spatial resolution of 1/6° (10 arcminutes), resulting in a background of 7621 grid cells.

Variable selection for the MaxEnt model was performed by backward elimination minimizing the corrected Akaike information criterion (AICc99). The final model retained 13 bioclimatic variables – temperature-related (BIO1–6, 9, 10) and precipitation-related (BIO12, 14, 15, 18, 19) – and two soil variables (sand and clay fractions; Supplementary Table S9). The model used linear and hinge feature classes (transformations of predictor variables42), based on the AICc-based optimization performed at every step of the backward elimination process. The regularization coefficient was optimized internally by maxnet96.
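Backward elimination guided by AICc can be sketched generically as follows. This is a simplified illustration and not the workflow used with maxnet: the scoring function stands in for refitting the model on each candidate subset, and the log-likelihood gains, variable names, and the use of the number of variables as the parameter count are all hypothetical.

```python
def aicc(log_lik, k, n):
    """Corrected Akaike information criterion for k parameters, n records."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def backward_elimination(variables, score):
    """Drop one variable at a time for as long as the score decreases."""
    current = list(variables)
    best = score(current)
    while len(current) > 1:
        trials = [(score([v for v in current if v != drop]), drop)
                  for drop in current]
        trial_score, drop = min(trials)
        if trial_score >= best:
            break  # no single removal improves the model
        best = trial_score
        current.remove(drop)
    return current, best


# Hypothetical log-likelihood gain contributed by each variable.
GAINS = {"bio1": 12.0, "bio12": 8.0, "sand": 0.3, "clay": 0.1}


def toy_score(subset, n_records=90):
    log_lik = -500.0 + sum(GAINS[v] for v in subset)
    return aicc(log_lik, k=len(subset), n=n_records)


selected, best_aicc = backward_elimination(list(GAINS), toy_score)
print(selected)  # ['bio1', 'bio12']
```

In this toy example the two weakly informative variables are removed because the penalty for each extra parameter outweighs the small gain in fit.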

Predictions were expressed as relative occurrence rates (RORs), which can be thought of as fractions of the total predicted population that sum to unity across the background area. RORs were scaled relative to the uniform prior expectation (i.e., \(1/7621\) per grid cell), making them directly interpretable as evidence ratios: for example, a scaled ROR of 2 indicates that the model provides twice as much evidence for Heterocephalus presence as expected under a uniform prior (i.e., without data). For the LGM, only climatic data were available; thus, predictions were based on a reduced model including only transformed climatic variables with non-zero coefficients in the full model. Feature classes were set to “linear,” meaning the variables were not further transformed. Predictions from the full and reduced models for the present were highly correlated (r = 0.92), supporting the use of the reduced model for LGM projections. The importance of predictor variables was assessed both graphically and numerically. Graphically, “double density plots” compared unweighted and ROR-weighted distributions of each predictor variable. Numerically, variable importance was quantified as the Akaike weight100 of the full model relative to a reduced model lacking the focal variable.
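The scaling of RORs and the two-model Akaike weight can be illustrated with a short sketch. The raw scores, AICc values, and function names are hypothetical; the Akaike weight is computed here for a comparison of exactly two models, as described in the text.

```python
import math


def scaled_ror(raw_scores):
    """Normalise scores to sum to 1, then express them relative to the
    uniform expectation of 1/n per grid cell."""
    total = sum(raw_scores)
    n = len(raw_scores)
    return [(s / total) * n for s in raw_scores]


def akaike_weight(aicc_full, aicc_reduced):
    """Akaike weight of the full model in a two-model comparison."""
    best = min(aicc_full, aicc_reduced)
    w_full = math.exp(-0.5 * (aicc_full - best))
    w_reduced = math.exp(-0.5 * (aicc_reduced - best))
    return w_full / (w_full + w_reduced)


# Hypothetical raw model output for a background of four grid cells.
scaled = scaled_ror([1.0, 1.0, 2.0, 4.0])
print(scaled)  # [0.5, 0.5, 1.0, 2.0]

# Hypothetical AICc values: the reduced model is 4 units worse.
print(round(akaike_weight(100.0, 104.0), 3))  # 0.881
```

A scaled ROR above 1 marks a cell predicted to be more suitable than under the uniform expectation, and an Akaike weight close to 1 indicates that dropping the focal variable clearly worsens the model.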

Lineage-specific means and ranges of ecological variables were calculated from 32 presence records that could be assigned to species either by genotype or, when unavailable, by dental traits combined with locality information. For glaber (N = 10), only genetically confirmed records were used, whereas for ansorgei (N = 16) and phillipsi (N = 6), additional specimens were included based on morphological and locality criteria, including four phillipsi (mostly holotypes) and six ansorgei. Interspecific differences were explored using between-group PCA (bgPCA101) of climatic and soil variables (scaled to unit variance). The analysis was performed on both the selected subset of variables and the full set, as variables optimizing the genus-level MaxEnt model may differ from those showing the greatest differentiation among lineages.
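Between-group PCA can be sketched as an ordinary PCA of the group means onto which the individual records are then projected. The version below is one of several possible variants and is not the implementation of ref. 101: it scales variables to unit variance, weights all group means equally, and uses hypothetical data and names. NumPy is assumed to be available.

```python
import numpy as np


def bg_pca(X, groups):
    """Project records onto the principal axes of the group means."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # unit variance
    g = np.array(groups)
    labels = sorted(set(groups))
    means = np.array([Z[g == lab].mean(axis=0) for lab in labels])
    centre = means.mean(axis=0)              # unweighted mean of group means
    _, _, Vt = np.linalg.svd(means - centre, full_matrices=False)
    n_axes = min(len(labels) - 1, X.shape[1])
    axes = Vt[:n_axes].T                     # between-group axes
    return (Z - centre) @ axes, labels


# Hypothetical records: three environmental variables, three lineages.
X = [
    [1.0, 10.0, 5.0], [2.0, 11.0, 5.5],     # glaber
    [5.0, 10.0, 6.0], [6.0, 11.0, 5.0],     # ansorgei
    [9.0, 20.0, 5.0], [10.0, 21.0, 6.0],    # phillipsi
]
groups = ["glaber", "glaber", "ansorgei", "ansorgei", "phillipsi", "phillipsi"]

scores, labels = bg_pca(X, groups)
print(scores.shape)  # (6, 2)
```

With three lineages there are at most two between-group axes, so the differentiation among lineages can be displayed in a single two-dimensional plot.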

Morphometric analyses

Micro-computed tomography (µCT) scans enabled measurement of the poorly developed last molars (M3) in H. phillipsi, which only partly erupted from the jaw. The scanning was carried out at the X-ray imaging laboratory of the Institute of Experimental and Applied Physics, Czech Technical University in Prague, using a µCT scanner equipped with a micro-focus X-ray tube Hamamatsu L12531 and a flat-panel detector Dexela 1512 with a high-resolution CsI sensor. Tomographic reconstruction was performed using a dedicated module of Volume Graphics Studio MAX, and a manual segmentation of regions of interest was conducted in ORS Dragonfly software102.

We measured the length and width of both lower and upper molars from orthogonal views. For further analysis, we merged these data with published dental measurements31 that could be assigned to our lineages. To validate lineage assignments, we first plotted upper versus lower molar row lengths – with or without the third molars (when missing or unerupted). Next, we used bgPCA to assess differentiation in the shape of the first two molars (both upper and lower), which were present in all specimens. To correct for allometric effects, each molar length and width was standardized by dividing by the greatest skull length. Maximum sample sizes were N = 15 (ansorgei), 12 (glaber) and 5 (phillipsi).

Statistics and Reproducibility

All statistical analyses were conducted in R73 using packages and workflows detailed in the Methods and Supplementary Information. Data processing steps, parameter settings, and analysis scripts are fully described to allow replication. We did not encounter any issues affecting reproducibility.