Main

Global biodiversity is currently collapsing at an unprecedented rate, marked by extinction rates exceeding historical baselines, widespread erosion of ecosystem services and rapid declines in genetic diversity1,2,3. In response, the Kunming–Montreal Global Biodiversity Framework commits its signatories to protecting at least 30% of Earth’s land and seas by 2030 through ecologically representative, well-connected conservation networks4,5. Achieving this goal requires filling critical gaps in biodiversity knowledge, particularly in megadiverse regions where endemism and anthropogenic threats intersect.

Identification of global biodiversity hotspots (regions hosting ≥1,500 endemic vascular plant species and retaining ≤30% of their original natural vegetation)6 is pivotal for determining conservation priorities7,8 to guide protected area networks and ecological corridor planning9,10. Modern conservation also encompasses biodiversity dimensions beyond taxonomic distinction, including the phylogenetic (evolutionary) diversity of species11,12,13. Furthermore, distinguishing between neo-endemics (recently diverged taxa) and palaeo-endemics (ancient relicts) is crucial for interpreting regional biodiversity but remains challenging due to inconsistent temporal thresholds14,15 and limited fossil or molecular data16. Phylogenetic endemism (PE) and the Categorical Analysis of Neo- and Palaeo-endemism (CANAPE) offer quantitative solutions by incorporating evolutionary history11,17. Yet, comprehensive assessments combining taxonomic and phylogenetic endemism remain rare in megadiverse regions with persistent data gaps18,19.

China’s extensive landmass and exceptional climatic and topographic heterogeneity (Extended Data Fig. 1) support over 30,000 vascular plant species, approximately 50% of which are endemic to the country20,21. The country spans four currently recognized global biodiversity hotspots: the Himalaya, Indo–Burma, Mountains of Central Asia and Mountains of Southwest China22 (Extended Data Fig. 2). Among these, the Mountains of Southwest China hotspot exhibits the highest levels of species richness and endemism10, a pattern attributed to the combined influences of tectonic uplift and the intensification of the Asian monsoon23,24,25. These geological and palaeoclimatic processes also fostered the evolution of unique evergreen broad-leaved forests26, which starkly contrast with the deserts and savannahs prevalent in analogous subtropical latitudes elsewhere27. Despite this richness, historical gaps in taxonomy, distribution data and phylogenetic resolution have obscured accurate assessments of China’s diversity and endemism patterns. Compounding these challenges, China’s flora is threatened by spatially heterogeneous extinction drivers, from changes in land-use and vegetation structure to extreme weather events and climate change28. An evaluation of plant diversity and endemism to identify conservation gaps and align spatial priorities with global biodiversity targets is therefore urgent4.

Here we provide a comprehensive spatio-temporal assessment of China’s endemic vascular plants by integrating taxonomic and phylogenetic endemism. Using 49,488 sequences from 18,259 taxa, we reconstruct the most comprehensive time-calibrated phylogeny for China’s vascular plants to date (covering 99% of genera and 53% of species). By integrating >1.4 million distribution records with this phylogeny, we identify centres of neo- and palaeo-endemism at genus and species levels and evaluate their protection under national and global frameworks. Our analysis uncovers a critically overlooked biodiversity hotspot in Central China, a 1.54-million-km² region characterized by moderate topographic relief and centred on Hubei, Hunan and Jiangxi provinces. Under the pressure of dense human populations, this region demands urgent conservation attention to safeguard its ecological integrity and connectivity under accelerating anthropogenic changes5,28.

Results and discussion

The origin and diversification of endemic plants in China

According to our newly compiled checklist, China harbours 124 endemic genera (4.1%) and 15,942 endemic species (51.4%) of vascular plants (Supplementary Table 1; Supplementary Data 1 and 2 provide more details). Our dated phylogeny includes 3,029 genera (99.0%) and 16,585 species (53.4%) of native Chinese vascular plants, of which 117 genera and 6,265 species are endemic (Supplementary Table 2). In this Article, we generated 2,996 sequences (atpB: 608, matK: 593, matR: 620, ndhF: 580 and rbcL: 595) for 620 species and 353 genera, of which 436 species and 35 genera lacked target sequences in public databases before this work. Our updated phylogeny now leaves only 32 accepted genera native to China unsampled, often due to low representation in herbarium collections, which warrant prioritized sampling in future research (Supplementary Table 3). Our geospatial analysis reveals gaps in available molecular data from species predominantly clustered in Hengduan Mountains and adjacent areas (Supplementary Fig. 1a,b), while the Qinghai–Tibet Plateau and Xinjiang possess the highest proportion of unsampled species (Supplementary Fig. 1c), with Xinjiang showing particularly acute under-sampling of taxa endemic to China (Supplementary Fig. 1d).

Our dated phylogeny reveals consistent relationships and divergence times for major clades in comparison with previous large-scale phylogenies29,30,31 (Fig. 1 and Extended Data Figs. 3 and 4; Supplementary Discussion provides details). Notably, we find that the endemic plants are unevenly distributed across the phylogeny (Fig. 1). The top five orders with the most endemic genera are Lamiales, Asterales, Apiales, Ranunculales and Gentianales, whereas those with the highest number of endemic species are Lamiales, Ericales, Asterales, Poales and Ranunculales.

Fig. 1: Summarized dated phylogeny and taxonomic endemism of Chinese vascular plants.

The phylogeny includes 16,585 species and 3,029 genera of vascular plants native to China (outgroups not shown). Major clades and bootstrap support values for major nodes are indicated with different colours. Bar charts illustrate the percentage of China’s endemic genera (blue bars) and species (red bars) in each order. Representative images illustrate the ten orders with the highest proportion of species endemic to China, with the ranks (1–10) corresponding to numbers on the tree.

Our lineage accumulation rate (LAR) analysis, which quantifies speciation events per unit time32, reveals a prominent peak in diversification for all genera at ~19.3 million years ago (Ma) during the Early Miocene, consistent with previous studies33,34, underscoring the Miocene as a pivotal epoch for the divergence of Chinese plant genera. For endemic genera, the LAR exhibits an initial lower peak at ~46.7 Ma (Middle Eocene), followed by a more pronounced peak at ~18.9 Ma (Early Miocene) (Fig. 2a). This pattern is supported by the largest proportions of endemic genera diversified during the Miocene (37.6%) and Eocene (18.0%) (Supplementary Table 4). Endemic genera originating in the Eocene and Miocene reach their highest richness in the Hengduan Mountains (Supplementary Fig. 2a,c), highlighting the region’s pivotal role in China’s endemic flora. The earlier Eocene peak coincides with the emergence of palaeo-endemics such as Acanthochlamys, probably facilitated by geographic isolation during the India–Asia soft collision and the early uplift of the Hengduan Mountains25. The subsequent Miocene peak aligns with the rapid in situ diversification of alpine flora in the Hengduan Mountains, probably driven by intensified tectonic uplift and monsoon dynamics that increased environmental heterogeneity and climatic gradients23,24.
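
As a rough illustration of how peaks in a lineage accumulation curve can be located, the sketch below bins divergence ages and reports events per million years. It is a simplification under stated assumptions: the exact LAR formulation of ref. 32 is not reproduced here, and the function names, the 1-Myr bin width and the example ages are all hypothetical.

```python
from collections import Counter


def lineage_accumulation_rate(node_ages_ma, bin_width_ma=1.0):
    """Return {bin_start_ma: divergence events per Myr}.

    Assumption: the rate is approximated as the number of divergence
    events falling in a time bin divided by the bin width.
    """
    counts = Counter(int(age // bin_width_ma) for age in node_ages_ma)
    return {b * bin_width_ma: n / bin_width_ma for b, n in sorted(counts.items())}


def peak_bin(rates):
    """Return the start (Ma) of the bin with the highest rate."""
    return max(rates, key=rates.get)


# Hypothetical divergence ages (Ma), not data from the study.
example_ages = [46.2, 46.9, 19.1, 19.4, 19.8, 18.7, 2.3]
example_rates = lineage_accumulation_rate(example_ages, bin_width_ma=1.0)
example_peak = peak_bin(example_rates)
```

With real data the bin width matters: narrow bins make the curve noisy, whereas wide bins can merge distinct peaks such as the Eocene and Miocene ones described above.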

Fig. 2: Temporal origin of endemic and non-endemic Chinese plants.

a,b, Number of genera (a) and species (b) that originated during specific geological timespans, with red columns showing the number of China’s endemic taxa and grey columns for non-endemic taxa. The LARs are shown with blue lines for all taxa and red lines for endemic taxa. Asterisks on the lines indicate peaks in LAR. c,d, The inset violin plots illustrate comparisons in age distribution between all, endemic and non-endemic taxa. Horizontal lines in the violin plots represent medians, bottom and top edges of boxes represent first and third quartiles and whiskers extend to 1.5 times the interquartile range. Sample sizes (n): all genera = 2,725, endemic genera = 101, non-endemic genera = 2,624, all species = 14,925, endemic species = 5,748, non-endemic species = 9,177. Plc., Palaeocene; Olig., Oligocene; Plio., Pliocene; Plt., Pleistocene.

The LAR reveals a congruent trend for all species and endemic species, each exhibiting a pronounced Pleistocene diversification peak (~2.3 Ma and ~2.1 Ma, respectively; Fig. 2b), highlighting the Pleistocene as a critical epoch for speciation of China’s extant plant diversity. The Pleistocene climate was marked by repeated glacial–interglacial cycles35, with glacial phases driving plant populations southward or to lower altitudes on mountains, while interglacial intervals enabled northward or upslope range expansions35,36. Species originating during the Pleistocene are primarily concentrated in southern China, including the Hengduan Mountains in the southwest and the Daba, Dalou and Nanling Mountains in the southeast (Supplementary Fig. 2g,h). The exceptionally high species endemism in the Hengduan Mountains is closely linked to continuous and rapid in situ diversification since the mid-Miocene24, exemplified by spectacular alpine radiations in major clades such as Rhododendron37 and Corydalis38. Moreover, growing evidence suggests that the low- to mid-elevation mountains in southeastern China acted as crucial Pleistocene refugia36,39,40 and evolutionary ‘pumps’. During glacial–interglacial cycles, population contractions and expansions facilitated isolation, secondary contact and hybridization among refugial populations, promoting lineage divergence and speciation41,42,43,44,45.

Pleistocene climatic dynamics are inferred to have spurred rapid diversification of young, geographically restricted species, as the median divergence age of endemic species is significantly younger than that of widespread species (4.8 Ma vs 7.2 Ma, P < 0.001; Fig. 2d). This pattern highlights the profound and lasting influence of Pleistocene climatic oscillations on the assembly of China’s modern floristic diversity, in the aftermath of geological and climatic events at global and regional levels that set the stage for diversification46. Crucially, these insights are accessible only through a comprehensive, species-level reconstruction of the Tree of Life, underscoring the value of phylogenetic frameworks in deciphering evolutionary drivers31.

Conservation implications of endemism centres

China’s endemic vascular plant richness shows a strong positive correlation with total richness at both genus and species levels (Supplementary Fig. 3). Spatially, China’s endemic species share a similar diversity pattern with all species that peaks in southwestern China, particularly the Hengduan Mountains and adjacent regions (Extended Data Fig. 5 and Supplementary Fig. 4b). In contrast, China’s endemic genera exhibit a markedly distinct distribution of diversity, concentrated in the mountainous areas surrounding the Sichuan Basin, including the Hengduan and Daba mountains (Extended Data Fig. 6 and Supplementary Fig. 4a). This discrepancy is probably attributed to the limited proportion of endemic genera, approximately 4% of all vascular plant genera native to China (Supplementary Table 1).

We identify taxonomic centres of neo- and palaeo-endemism as the top 5% richest grid cells within the youngest and oldest quartiles of taxa, respectively (Extended Data Fig. 7), following a threshold widely applied in large-scale biodiversity studies16,33,47. Both genus- and species-level analyses consistently reveal three major endemism centres: (1) the Hengduan Mountains, (2) Central China and (3) Yunnan–Guizhou–Guangxi boundary region (Fig. 3a,b; Supplementary Table 5 provides area, richness and endemism data). At the genus level, the two eastern centres (Central China and Yunnan–Guizhou–Guangxi boundary region) are dominated by palaeo-endemic genera, whereas the Hengduan Mountains centre predominantly harbours neo-endemic genera (Fig. 3a). In contrast, species-level endemism centred in the Hengduan Mountains exhibits a mixed distribution of both neo- and palaeo-endemism (Fig. 3b).
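
The quartile-based rule can be sketched in a few lines. This is an illustrative sketch rather than the pipeline used in the study: the function names, the dictionary-based inputs and the decision to rank only cells that contain at least one taxon of the relevant quartile are assumptions.

```python
import statistics


def top_fraction_cells(richness_by_cell, fraction=0.05):
    """Return the richest `fraction` of grid cells (at least one cell)."""
    ranked = sorted(richness_by_cell, key=richness_by_cell.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])


def endemism_centres(taxon_age, taxon_cells, fraction=0.05):
    """Return (neo_centres, palaeo_centres) as sets of grid-cell ids.

    taxon_age:   {taxon: divergence age in Ma}
    taxon_cells: {taxon: set of grid cells occupied}
    Taxa in the youngest age quartile count as neo-endemics and those in
    the oldest quartile as palaeo-endemics.
    """
    q1, _, q3 = statistics.quantiles(list(taxon_age.values()), n=4)
    neo, palaeo = {}, {}
    for taxon, age in taxon_age.items():
        if age <= q1:
            target = neo
        elif age >= q3:
            target = palaeo
        else:
            continue  # middle quartiles are not used
        for cell in taxon_cells[taxon]:
            target[cell] = target.get(cell, 0) + 1
    return top_fraction_cells(neo, fraction), top_fraction_cells(palaeo, fraction)
```

A cell that appears in both returned sets would correspond to a mixed centre.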

Fig. 3: Geographic distribution of taxonomic and phylogenetic endemism centres for vascular plants in China.

a,b, Taxonomic endemism centres based on the top 5% criterion at the genus (a) and species (b) levels: (1) the Hengduan Mountains, (2) Central China and (3) Yunnan–Guizhou–Guangxi boundary region. Grid cells in pink represent centres of the youngest quartile, blue the oldest quartile and purple mixed centres of the two types. c,d, Phylogenetic endemism centres identified by the CANAPE analysis at the genus (c) and species (d) levels. Grid cells in pink represent centres of neo-endemism, blue centres of palaeo-endemism and purple centres of mixed-endemism. The donut charts show the proportion of grid cells occupied by different types of endemism centres. Current national nature reserves are highlighted in dark green, while four global biodiversity hotspots are highlighted in brown. Numbers above the maps denote taxa involved in the analyses. The national boundary layer was downloaded from the Standard Map Service website (approval number GS(2019)1823; http://bzdt.ch.mnr.gov.cn/browse.html?picId=%224o28b0625501ad13015501ad2bfc0256%22; accessed October 2022).

To validate the robustness of our quartile-based classification, we applied an alternative approach by categorizing neo- and palaeo-endemic taxa using fixed temporal thresholds (23 Ma for genera and 5 Ma for species; Extended Data Fig. 8a–d), according to previous studies16. This method yields comparable patterns of endemic richness and endemism centres (Extended Data Fig. 8e,f) to those derived from the quartile approach (Fig. 3a,b). Moreover, centres based on the top 5% richness (and top 5% proportions) of all China’s endemic species with distribution data available (13,724 species) also cluster in the Hengduan Mountains, Central China and Yunnan–Guizhou–Guangxi boundary region (Supplementary Fig. 5). This congruence reinforces the robustness of taxonomic endemism centres identified based on 39.2% of China’s endemic species (6,248 species) with both divergence age estimates and distribution data available (Fig. 3b).

Our genus-level CANAPE analyses of Chinese vascular plants identify centres of phylogenetic endemism across southern China (including the coastal regions of South China, Yunnan, Hainan, Taiwan and the Himalayas), northwestern China (Tianshan–Altai Mountains) and northeastern China (Changbai Mountains) (Fig. 3c), consistent with a previous angiosperm analysis12. Phylogenetic endemism patterns at the species level align broadly with the genus-level results, except for a notable expansion of neo-endemism centres in the Hengduan Mountains (Fig. 3d). This discrepancy is consistent with previous findings that genus-level CANAPE analyses tend to underestimate neo-endemism18,48. Such underestimation arises because large genera, particularly those with rapid evolutionary radiations, are treated as single operational units in genus-level analyses, obscuring recent species-level diversification. Notably, the phylogenetic approach often reveals endemism centres influenced by ‘border effects’12, particularly in countries such as China where geopolitical boundaries ‘fragment’ natural distributions. To minimize this bias, we applied the CANAPE analysis exclusively to China’s endemic taxa, which confirms the robustness of species-level phylogenetic endemism centres (Fig. 3d and Extended Data Fig. 9b). The observed discrepancies between the endemic-genera and all-genera analyses largely arise from the limited number of endemic vascular plant genera in China (Fig. 3c and Extended Data Fig. 9a).
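
For readers unfamiliar with the metric underlying CANAPE, the sketch below computes phylogenetic endemism in its standard form: each branch length is divided by the number of grid cells occupied by the clade it subtends, and the shares are summed per cell. The tree encoding as (length, tips) pairs and all names are assumptions made for illustration; CANAPE additionally compares such values against randomizations to classify cells, which is not shown here.

```python
def phylogenetic_endemism(branches, tip_cells):
    """Return {cell: PE} for a tree given as a list of branches.

    branches:  iterable of (branch_length, set_of_descendant_tips)
    tip_cells: {tip: set of grid cells where the tip occurs}
    """
    pe = {}
    for length, tips in branches:
        # Range of a branch = union of the ranges of its descendant tips.
        clade_range = set().union(*(tip_cells[t] for t in tips))
        if not clade_range:
            continue
        share = length / len(clade_range)
        for cell in clade_range:
            pe[cell] = pe.get(cell, 0.0) + share
    return pe


# Toy tree ((A:2,B:2):3,C:5) with hypothetical ranges.
toy_branches = [(2.0, {"A"}), (2.0, {"B"}), (3.0, {"A", "B"}), (5.0, {"C"})]
toy_ranges = {"A": {"x"}, "B": {"x", "y"}, "C": {"y"}}
toy_pe = phylogenetic_endemism(toy_branches, toy_ranges)
```

Summed over all cells, PE equals the total branch length of the tree, which is a convenient sanity check. The toy example also shows why collapsing a radiation into a single genus-level tip hides the short, range-restricted branches that signal neo-endemism.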

The disparities between centres of taxonomic and phylogenetic endemism underscore their complementary conservation implications. Taxonomic endemism centres highlight regions rich in taxa restricted to China, many of which face heightened vulnerability from habitat degradation and climate change49. Protecting these areas is critical for both national and global conservation strategies, as the loss of these taxa would represent an irreplaceable reduction of planetary biodiversity50,51. In contrast, phylogenetic endemism centres reflect over-representation of lineages with restricted ranges in China, even if some taxa extend beyond the national boundaries. While geopolitical boundaries may obscure geographic endemism patterns, they remain critical for conservation, as efforts to protect such taxa rarely receive equal attention across political boundaries, particularly in regions with stark socio-economic disparities52,53. Moreover, the substantial spatial incongruence between taxonomic and phylogenetic endemism centres demonstrates that neither metric alone captures the full breadth of biodiversity, limiting their effectiveness in isolation. Integrating both dimensions is therefore essential for comprehensive conservation planning and ensuring biologically unique areas in megadiverse regions are not overlooked.

Several methodological limitations in the identification of endemism centres should be acknowledged. First, while our species sampling is broadly representative, it remains incomplete, especially in the Qinghai–Tibet Plateau and Xinjiang of western China (Supplementary Fig. 1), which may hinder precise delineation of neo- and palaeo-endemism centres in these regions. Second, geographic distribution data, primarily derived from public databases and herbarium collections, are often spatially biased towards more accessible or well-surveyed areas. Third, phylogenetic reconstruction and dating, based on a limited number of molecular loci, may introduce uncertainties, particularly for deep divergences and rapid radiations. Despite these limitations, our core conclusions regarding spatio-temporal patterns and conservation priorities remain robust. This robustness is further supported by the consistent patterns of both taxonomic and phylogenetic endemism centres across ten randomly selected bootstrap trees (Supplementary Figs. 6 and 7). These patterns suggest that our findings are not artefacts of phylogenetic uncertainty. Future studies that incorporate broader species sampling, more geographically balanced distribution data and high-resolution genomic datasets will further refine these findings.

An ignored global biodiversity hotspot in Central China

We identify conservation gaps across China by overlaying endemism centres with national nature reserves and previously recognized global biodiversity hotspots. A pronounced spatial imbalance in conservation efforts emerges along the Hu Huanyong Line, a demarcation of China’s east–west population divide54,55. West of this line, national nature reserves are fewer in number but larger and more contiguous (n = 78, total area: 889,410 km²), whereas the densely populated east hosts a higher density of smaller, fragmented reserves (n = 346, total area: 103,417 km²). This fragmentation, reflecting intense anthropogenic pressures, including land-use conversion for urban, agricultural and infrastructural development10,47, directly challenges implementation of the Kunming–Montreal Global Biodiversity Framework’s target for maintaining ecosystem connectivity4,5. Globally, western China’s endemism centres intersect with three global biodiversity hotspots, Mountains of Central Asia, Himalaya and Mountains of Southwest China, whereas eastern China’s tropical–subtropical zones show only limited overlap with the Indo–Burma hotspot (Fig. 3 and Extended Data Fig. 2). Strikingly, Central China emerges as a critical conservation gap in both national and global frameworks (Figs. 3 and 4). To address this gap, we present updated, comprehensive taxonomic checklists (Supplementary Data 3 and 4) and advocate the formal recognition of Central China as a global biodiversity hotspot.

Fig. 4: Geographic range and protection status of the newly proposed Central China global biodiversity hotspot.

The newly proposed Central China hotspot (light green) encompasses taxonomic endemism centres (yellow grid cells) at genus and species levels using quartile and temporal threshold criteria. Brown shading shows partial overlap with four recognized global biodiversity hotspots in China, and dark green indicates current national nature reserves. The Hu Huanyong Line (black dashed) demarcates China’s east–west population divide, extending from Heihe City (Heilongjiang Province) to Tengchong City (Yunnan Province). Orange curves indicate major mountain ranges in Central China: (1) Qinling, (2) Daba, (3) Huaying, (4) Dalou, (5) Wuling, (6) Miaoling, (7) Xuefeng, (8) Luoxiao, (9) Nanling, (10) Wuyi, (11) Tianmu and (12) Dabie. Representative vascular plant genera endemic to Central China are illustrated on the right. The national boundary layer was downloaded from the Standard Map Service website (approval number GS(2019)1823; http://bzdt.ch.mnr.gov.cn/browse.html?picId=%224o28b0625501ad13015501ad2bfc0256%22; accessed October 2022).

Myers et al. have previously noted several regions of exceptional endemism under threat, including southeastern China, whose hotspot status remains unresolved due to data limitations6,22. Robust data foundations now exist for revising China’s hotspot system, built over two decades through compilation of national/regional checklists and floras20, digitization of >8.53 million specimens (Chinese Virtual Herbarium, http://www.cvh.ac.cn/) and updated regional phylogenies29,56. Building on this major increase in data availability and the results from previous analyses, we propose Central China (24–34° N and 103–122° E; encompassing much of southeastern China) as a biodiversity hotspot spanning 1.54 million km², broadly corresponding to Takhtajan’s Central China Floristic Province57 (Supplementary Note provides details of geographic delineation).

Comprehensive data analyses confirm that Central China meets the strict criteria for hotspot designation6 (Supplementary Tables 6 and 7). Comparisons with the four recognized subtropical hotspots further highlight its exceptional plant diversity and extensive vegetation loss (Table 1). This region harbours extraordinary biodiversity and endemism, hosting at least 14,431 vascular plant species, 2,024 of which are endemic to this region (Supplementary Table 6 and Supplementary Data 4). A parallel assessment based on coordinate data from the Global Biodiversity Information Facility suggests that the number of endemic species may be as high as 2,158 (Supplementary Data 5). Although the exact figures require further validation, current data place Central China sixth in total vascular plant species richness and 27th in endemics among the 36 recognized global biodiversity hotspots22,58,59. However, as one of China’s most densely populated areas (east of the Hu Huanyong Line), it faces intense land-use pressures and has experienced extensive habitat degradation55, with over 93% of its original vegetation lost (Extended Data Fig. 10 and Supplementary Table 7). Protected areas currently cover just 7% of the region, with national nature reserves and national parks accounting for only 3% (Supplementary Table 8).
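
The two hotspot criteria can be written as a simple check. The sketch below uses the thresholds quoted in the introduction (≥1,500 endemic vascular plants and ≤30% of original vegetation remaining) and the figures reported above; the function name is hypothetical, and treating ‘93% lost’ as a lower bound on vegetation loss is an assumption.

```python
def meets_hotspot_criteria(endemic_plants, vegetation_lost_pct):
    """Check the two hotspot thresholds quoted in the text.

    endemic_plants:      number of endemic vascular plant species
    vegetation_lost_pct: percentage of original vegetation lost
    """
    remaining_pct = 100.0 - vegetation_lost_pct
    return endemic_plants >= 1500 and remaining_pct <= 30.0


# Figures reported for Central China: 2,024 regional endemics and
# over 93% of original vegetation lost (93 used as a lower bound).
central_china_qualifies = meets_hotspot_criteria(2024, 93.0)
```

Both the checklist-based count (2,024) and the coordinate-based count (2,158) exceed the endemism threshold, so the conclusion does not depend on which estimate is used.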

Table 1 Comparison of area, vascular plant diversity, endemism and primary vegetation loss between Central China and four subtropical global biodiversity hotspots

Situated within the unique East Asian subtropical evergreen broad-leaved forest ecoregion (Fig. 4 and Supplementary Fig. 8), Central China serves as both a museum and a cradle of biodiversity16,33,40. This dual role is exemplified by its unparalleled concentration of palaeo-endemic genera (Fig. 3a and Extended Data Fig. 8e), including ‘living fossils’ such as Ginkgo, Metasequoia and Davidia34,60,61, alongside a mixture of neo- and palaeo-endemic plant species (Fig. 3b and Extended Data Fig. 8f). The region is also a critical hub for insect and vertebrate diversity62,63, providing refuge for flagship conservation species including the giant panda and snub-nosed monkey64,65. Amphibians, the most threatened vertebrate class (40.7% of species globally threatened)66, further exemplify the region’s richness and endemism, with over 100 regionally endemic species documented (Supplementary Table 6). However, six key amphibian diversity centres in this region remain inadequately protected47, reflecting systemic conservation gaps spanning both flora and fauna.

Alarmingly, accelerating climate change now represents the greatest extinction risk to the region’s flowering plants28, compounding the existing pressures from dense human populations and fragmented landscapes. Addressing protection gaps in this complex socio-ecological region requires a multi-path strategy that integrates spatial, institutional and socio-economic approaches. First, remaining intact patches of native vegetation should be promptly incorporated into high-level conservation instruments, such as the national park system, in line with China’s commitment to deliver on the ‘30 × 30’ protection target of the Kunming–Montreal Global Biodiversity Framework4,67, to ensure strict land-use regulation and enhance cross-jurisdictional coordination68. Priority should be given to formally establishing national park pilots in key areas, such as the Daba Mountains (Shennongjia pilot) in the north and the Nanling Mountains (Nanling pilot) in the south (Extended Data Fig. 10). Second, in heavily human-dominated landscapes, fine-scale conservation interventions led by local governments, such as plant micro-reserves69, stepping-stone habitats70 and a range of Other Effective Area-Based Conservation Measures71 (for example, community-managed forests, traditional sacred natural sites72 and biodiversity-friendly agricultural mosaics), can enhance functional connectivity and safeguard range-restricted endemic species67. Third, socio-economic mechanisms, including ecological compensation, community co-management and livelihood diversification, are vital for reducing conservation-development conflicts and fostering long-term stewardship73,74. These should be considered by both the public and private sectors, such as through the establishment of a science-based market for high-integrity biodiversity credits to support conservation and restoration efforts75. 
Finally, recognizing Central China as a global biodiversity hotspot during the upcoming revision of the framework announced by the International Union for Conservation of Nature76 would elevate the region’s international conservation priority and guide targeted, context-appropriate planning77, advancing conservation action and supporting China’s sustainability ambitions.

Methods

Taxon sampling and phylogenetic reconstruction

We used sequences of five loci, including four chloroplast genes (atpB, matK, ndhF and rbcL) and one mitochondrial gene (matR), for phylogenetic reconstruction. Our phylogeny was built upon two earlier versions of the Chinese vascular plant tree of life. The first version included 93% of genera and 19% of species56 and the second version encompassed 96% of genera and 44% of species29. Compared with the second-version phylogeny by Hu et al.29, we added 10,023 sequences representing 3,267 species and 996 genera, of which 2,996 sequences (representing 620 species and 353 genera) were newly sequenced in this study and 7,027 (representing 2,727 species and 858 genera) were newly downloaded from GenBank.

Initially, we downloaded from GenBank all target sequences uploaded after Hu et al.29, covering approximately 2,727 species (accessed September 2022). Low-quality sequences were identified and replaced or removed through phylogenetic tree reconstruction. For species with multiple sequences of the same locus, we retained the longest one. To address gaps in public databases, we conducted targeted field surveys to collect taxa lacking available sequences. Voucher specimens for newly sampled taxa were deposited at the Herbarium of Institute of Botany, Chinese Academy of Sciences (PE). We newly generated a total of 2,996 sequences for 620 species (92.6% are species endemic to China), most of which were either absent from GenBank or of low quality (for example, shorter than 200 bp). Of these, 436 species had no molecular data previously available in GenBank; for the remaining 184 species, existing sequences were replaced with newly generated, higher-quality ones.
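
The rule of retaining the longest sequence per species and locus can be sketched as follows. The record layout and function name are hypothetical, and the optional length filter simply reuses the 200-bp figure that the text gives as an example of low quality; it is not a documented step of the authors’ pipeline.

```python
def keep_longest(records, min_length=200):
    """Return {(species, locus): sequence}, keeping the longest sequence.

    records:    iterable of (species, locus, sequence) tuples
    min_length: sequences shorter than this are skipped (assumed filter)
    """
    best = {}
    for species, locus, seq in records:
        if len(seq) < min_length:
            continue
        key = (species, locus)
        if key not in best or len(seq) > len(best[key]):
            best[key] = seq
    return best
```

When two sequences tie in length, the one encountered first is kept, so the input order determines the outcome in that case.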

Our phylogeny has sampled all orders and families of vascular plants native to China, with the sole exception of Corsiaceae, represented by the presumably extinct genus Corsiopsis78. At the genus level, previous studies lacked molecular data for 133 genera29, many of which are monotypic, oligotypic or narrowly distributed and have been historically underrepresented in herbarium collections. We newly incorporated 70 of these genera into our phylogeny (Supplementary Table 9) and evaluated taxonomic statuses of the remaining unsampled genera. Only 32 accepted genera native to China remain absent from the phylogeny (Supplementary Table 3), highlighting critical targets for future research. Additionally, we examined the spatial distribution of molecular data coverage by mapping the richness and proportion of taxa lacking target molecular loci within grid cells to locate regional data gaps.

Total genomic DNA was extracted from silica-gel dried leaf materials and sequenced on the Illumina HiSeq and DNBSEQ-T7 platforms by Beijing Novogene Bioinformatics Technology Co. Ltd. The raw sequencing data yielded ~10 Gb of 150 bp paired-end reads per sample. For all newly sampled species, full-length reference sequences for each target gene from the same family were downloaded from GenBank as the target file. Coding sequences of the five genes were extracted from the whole-genome sequencing reads using the hybpiper assemble command of the Hybpiper79 pipeline v.1.3.1. Voucher information for newly sampled species and corresponding GenBank accession numbers are provided in Supplementary Data 6. To validate identification for species with newly generated sequences, we used the BLAST tool provided by the National Center for Biotechnology Information (NCBI) and checked morphology of specimens. The newly obtained sequences were automatically aligned using MAFFT80 v7.508 and then manually checked in BioEdit81.

Incorporating newly generated sequences with alignments from Hu et al.29, our final concatenated matrix included 18,259 taxa and 49,488 sequences (7,844 atpB, 15,666 matK, 1,843 matR, 8,326 ndhF and 15,809 rbcL). One hornwort (Anthoceros angustus), two liverworts (Pellia endiviifolia and Aneura mirabilis) and two mosses (Syntrichia ruralis and Physcomitrium patens) were selected as outgroup taxa following Chen et al.56. Each gene in the concatenated matrix was treated as a separate partition. To ensure taxonomic accuracy, we conducted preliminary maximum likelihood analyses and iteratively validated species placements against established classifications such as the Angiosperm Phylogeny Group classification (APG) IV, the Pteridophyte Phylogeny Group classification (PPG) I and Flora of China20,82,83. Sequences yielding phylogenetically incongruent positions were either replaced or removed until all placements were resolved as taxonomically consistent. Final phylogenetic reconstruction was performed in RAxML84 8.2.12 under the GTRGAMMA model, with 100 bootstrap replicates. The resulting tree included 18,259 tips, representing 17,853 species and 3,288 genera, of which 16,585 species (53%) and 3,029 genera (99%) are native to China. Monophyly assessments revealed six non-monophyletic families and 692 non-monophyletic genera (Supplementary Data 7; Supplementary Discussion provides details). The phylogenetic tree was visualized using iTOL85 (https://itol.embl.de/).
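
Treating each gene as a separate partition requires telling RAxML where each locus starts and ends in the concatenated matrix. The sketch below writes the corresponding partition lines in the format RAxML 8 reads through its -q option. The alignment lengths are placeholders, not those of the actual matrix, and the helper name is hypothetical.

```python
def raxml_partitions(locus_lengths):
    """Return RAxML partition lines for consecutively concatenated loci.

    locus_lengths: list of (locus_name, aligned_length_in_columns)
    """
    lines, start = [], 1
    for locus, length in locus_lengths:
        end = start + length - 1
        lines.append(f"DNA, {locus} = {start}-{end}")
        start = end + 1
    return lines


# Placeholder alignment lengths; the real values are not given in the text.
example_partitions = raxml_partitions(
    [("atpB", 1500), ("matK", 1600), ("matR", 1800), ("ndhF", 2200), ("rbcL", 1400)]
)
```

Each returned string is one line of the partition file, in the same order as the loci appear in the concatenated alignment.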

Divergence time estimation

Divergence times were estimated using the penalized likelihood (PL) approach implemented in treePL86, a computationally efficient and widely adopted method for dating large phylogenies31,33. We incorporated a total of 220 calibration points covering major clades (Extended Data Fig. 3; Supplementary Data 8 provides calibration sources). The maximum and minimum crown ages were constrained to 454–416 Ma for vascular plants, 366.8–318 Ma for seed plants and 245–136 Ma for flowering plants, following previous studies87,88. We dated the best maximum likelihood (ML) tree with the optimal parameters (opt = 2, optad = 2, optcvad = 0 and smooth = 1 × 10⁻¹⁰). To quantify uncertainty, confidence intervals for node ages were estimated from 100 bootstrap trees, which were constrained to the topology of the best ML tree while allowing branch lengths to vary, following Magallón et al.89. Divergence times for the 100 bootstrap trees were calculated using the same treePL parameters. Statistical distributions of ages for all nodes (for example, median, maximum and minimum) were summarized in TreeAnnotator v.2.7.590.

To cross-validate our divergence time estimates, we compared node ages at family and genus levels with those from two recent global macroevolutionary studies30,31. Although these studies employed different maximum age constraints for angiosperms, we focused on their estimates derived under constraints comparable to our own calibration (crown age constraint for angiosperms: max = 256 Ma or 247 Ma) to ensure comparability. We used Spearman’s rank correlation coefficient to evaluate concordance between estimates. The results revealed broad temporal concordance across clades, particularly at the genus level, supporting the robustness of our divergence time estimates.
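As an illustration of the concordance test, Spearman's rank correlation can be sketched in Python as the Pearson correlation of the two rank vectors. This is a minimal sketch, not the code used in the study; the function names and the five example ages are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical stem ages (Ma) for the same five genera in two studies.
ours = [12.1, 35.4, 8.7, 60.2, 23.0]
theirs = [10.5, 40.1, 9.9, 55.0, 30.3]
rho = spearman_rho(ours, theirs)
```

Because only the rank order enters the calculation, the coefficient is insensitive to systematic offsets in absolute ages between studies, which is why it suits comparisons across differently calibrated chronograms.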

Compilation and revision of vascular plants endemic to China

The list of China’s endemic vascular plant genera was compiled primarily from ref. 91, applying a strict definition of taxonomic endemism that excluded genera with updated distribution records beyond China. Scientific names for these genera were standardized according to Catalogue of Life China (Species 2000 China Node, http://www.sp2000.org.cn/) and Plants of the World Online (https://powo.science.kew.org). Discrepancies between the two databases were resolved through expert consultation to ensure taxonomic consistency. After excluding synonyms, 124 genera (4.1% of China’s native vascular plant genera) were retained (Supplementary Data 1).

For endemic species, we integrated datasets from Huang et al.92 for seed plants and Zhou et al.93 for lycophytes and ferns, resulting in 15,942 endemic species (Supplementary Data 2). These represent 51.4% of China’s vascular plant diversity. Our phylogenetic analysis included 117 endemic genera (94.4% of total endemic genera) and 6,265 endemic species (39.3% of total endemic species), covering all major clades of vascular plants (Extended Data Fig. 3). To evaluate endemism patterns across lineages of the phylogenetic tree, we quantified endemic genera and species per plant order and calculated their proportional representation within the total endemic taxa.

Distribution data assemblage

Geographic distribution records of vascular plants in China were assembled from two primary sources: the gridded distribution data for angiosperms from ref. 28 and county-level records for gymnosperms, lycophytes and ferns from the Chinese Virtual Herbarium (accessed December 2021). Circumscription of families followed PPG I for lycophytes and ferns83, Christenhusz et al. for gymnosperms94 and APG IV for angiosperms82. To ensure data quality, newly acquired records were cleaned and validated based on protocols from refs. 28,33, including (1) standardizing taxonomic names according to Flora of China20, (2) removing cultivated and non-native occurrences, (3) correcting county names, (4) merging records of infraspecific taxa into their corresponding species and (5) excluding duplicate records in each county. The angiosperm and non-angiosperm datasets were merged and mapped onto a standard map of China (review drawing number: GS(2019)1823), which was divided into 1,155 grid cells of 100 km × 100 km under the Albers equal-area projection. Grid cells with <50% land area (that is, <5,000 km²) were excluded, resulting in a final dataset of 1,421,390 records across 941 grid cells, which encompassed 26,604 species (13,725 endemic) and 2,873 genera (116 endemic) native to China. After matching the phylogeny with the distribution database, we obtained a total of 1,112,348 records for 14,906 species for downstream phylogenetically based analyses.
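The grid-cell filter can be sketched as follows. This is a simplified illustration under stated assumptions: records are already assigned to grid cells, duplicates are removed per cell rather than per county, and the function name, cell identifiers and land areas are hypothetical.

```python
CELL_AREA_KM2 = 100 * 100      # 100 km x 100 km equal-area cells
MIN_LAND_FRACTION = 0.5        # cells with less land than this are dropped


def clean_records(records, land_area_km2):
    """Drop duplicate (species, cell) records and records in cells with <50% land."""
    threshold = MIN_LAND_FRACTION * CELL_AREA_KM2
    kept_cells = {cell for cell, area in land_area_km2.items() if area >= threshold}
    return sorted({(species, cell) for species, cell in records if cell in kept_cells})


# Hypothetical input: one duplicate record and one coastal cell with 32% land.
records = [("sp1", "c1"), ("sp1", "c1"), ("sp2", "c1"), ("sp2", "c2")]
land_area_km2 = {"c1": 9800.0, "c2": 3200.0}
cleaned = clean_records(records, land_area_km2)
```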

Temporal diversity pattern analyses

Divergence times for endemic and non-endemic vascular plant genera and species in China were extracted from a time-calibrated phylogeny (chronogram). To mitigate biases from incomplete sampling, we used the stem ages (rather than crown ages) for genera. For monophyletic genera, the stem node age was directly extracted. For non-monophyletic genera, the stem age of the largest clade within the genus was used95. For species, divergence times were derived from the terminal tip ages of the chronogram96. To account for phylogenetic uncertainty, ages for each genus and species were calculated as the median values across 100 time-calibrated trees with a fixed topology.

We quantified the number of endemic and non-endemic genera that originated during each five-million-year period since the Jurassic, whereas species-level analysis was conducted in two-million-year intervals during the Cenozoic. We also documented the number of genera, species, endemic genera and endemic species that originated in each geological epoch (Supplementary Table 4). Age distributions for all, endemic and non-endemic taxa were log10-transformed and compared using the Kruskal–Wallis and Wilcoxon rank sum tests. To reduce the influence of outliers, we only included data within the 5th–95th percentile range of ages for statistical analyses.
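The binning and trimming steps can be illustrated with a short Python sketch. The percentile rule used here (linear interpolation between order statistics) is an assumption, since the text does not specify one, and all function names are hypothetical.

```python
import math
from collections import Counter


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trim_to_percentiles(ages, low=5, high=95):
    """Keep only ages inside the [low, high] percentile range."""
    ordered = sorted(ages)
    lower, upper = percentile(ordered, low), percentile(ordered, high)
    return [a for a in ages if lower <= a <= upper]


def count_per_bin(ages, width):
    """Count taxa per interval [k*width, (k+1)*width) Ma, keyed by the lower bound."""
    counts = Counter(int(a // width) * width for a in ages)
    return dict(sorted(counts.items()))


# Hypothetical genus stem ages (Ma), binned into five-million-year intervals.
genus_bins = count_per_bin([1.0, 4.9, 5.0, 12.3], width=5)

# Trim, then log10-transform before the rank-based tests.
trimmed = trim_to_percentiles(list(range(1, 102)))
log_ages = [math.log10(a) for a in trimmed]
```

Passing `width=2` gives the two-million-year intervals used at the species level.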

To reconstruct the diversification dynamics of endemic vascular plants in China, we estimated the lineage accumulation rates (LARs) at genus and species levels since the Cenozoic using the Julia package Biohistoria.jl97 following a previous study32. LAR quantifies speciation events per unit time by summing the probability density functions of the age distributions pointwise over all lineages. Compared to traditional indices, LAR carries additional statistical information beyond means, medians and confidence intervals32.
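The pointwise summation behind LAR can be sketched in Python. This is not the Biohistoria.jl interface: it assumes that each lineage's age distribution is approximated by a normal density fitted to its ages across the dated trees, and the function names, the `min_sd` floor and the example ages are all hypothetical.

```python
import math
from statistics import mean, stdev


def normal_pdf(t, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma at t."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def lineage_accumulation_rate(age_samples, grid, min_sd=0.1):
    """Sum the per-lineage age densities at each time point in `grid` (Ma)."""
    rate = [0.0] * len(grid)
    for samples in age_samples.values():
        mu = mean(samples)
        sigma = max(stdev(samples), min_sd)   # floor avoids a zero-width density
        for i, t in enumerate(grid):
            rate[i] += normal_pdf(t, mu, sigma)
    return rate


# Hypothetical ages (Ma) of two lineages across three dated trees.
age_samples = {"lineage_A": [4.0, 5.0, 6.0], "lineage_B": [19.0, 20.0, 21.0]}
step = 0.1
grid = [i * step for i in range(661)]          # 0 to 66 Ma
lar = lineage_accumulation_rate(age_samples, grid)
total_lineages = sum(lar) * step               # area under the curve
```

Because each density integrates to one, the area under the summed curve approximates the number of lineages, so the curve can be read as lineages originating per million years.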

Spatial diversity pattern analyses

We quantified taxonomic richness for both endemic and non-endemic taxa within each grid cell as the total number of genera or species, using the specnumber function in the R package vegan98. We also calculated richness patterns separately for the youngest 25% and oldest 25% of genera and species. Because regions with high endemism do not necessarily coincide spatially with those of high overall diversity99, we assessed their spatial congruence by calculating the Pearson correlation coefficient in R100 between endemic richness and total richness at both genus and species levels across grid cells.
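The richness and congruence calculations amount to counting distinct taxa per cell and correlating two vectors across cells. The Python sketch below mirrors that logic only; it is not the vegan or R code used in the study, and the function names and occurrence data are hypothetical.

```python
import math


def richness(occurrences):
    """Number of distinct taxa per grid cell from (taxon, cell) pairs."""
    cells = {}
    for taxon, cell in occurrences:
        cells.setdefault(cell, set()).add(taxon)
    return {cell: len(taxa) for cell, taxa in cells.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either vector is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical occurrences and endemic list.
occurrences = [("A", "c1"), ("B", "c1"), ("C", "c1"),
               ("A", "c2"), ("D", "c2"), ("E", "c3")]
endemics = {"B", "C", "E"}

total = richness(occurrences)
endemic = richness([(t, c) for t, c in occurrences if t in endemics])

cell_ids = sorted(total)
# Cells with no endemic taxa get a richness of zero.
r = pearson_r([total[c] for c in cell_ids],
              [endemic.get(c, 0) for c in cell_ids])
```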

We used the R package canaper101 to calculate phylogenetic metrics, conduct randomization tests and perform statistical analyses. These analyses were performed separately for all genera/species and endemic genera/species. We calculated three phylogeny-based diversity metrics, including phylogenetic diversity (PD), phylogenetic endemism (PE) and relative phylogenetic endemism (RPE). PD was calculated as Faith’s PD by summing the total branch length connecting all species within each grid cell102. PE was calculated as the sum of branch lengths connecting species present in a grid cell, weighted by the inverse of range size of their descendant species, to quantify the geographic concentration of evolutionary history17. RPE was defined as the ratio of PE calculated on the original phylogenetic tree to PE calculated on a comparison tree with identical topology but equal lengths for all branches11. This ratio distinguishes areas dominated by palaeo-endemism (rare, long branches) from those dominated by neo-endemism (rare, short branches). To ensure comparability, branch lengths were scaled as proportions of the total tree length before calculating PD and PE, resulting in values between 0 and 1.
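The three metrics can be made concrete with a small Python sketch. It is not the canaper implementation. It assumes the tree is supplied as a table of branches, each with a length and its set of descendant tips, that a branch's range is the number of cells occupied by any of its descendants, and that every cell holds at least one species. All names and the four-branch example tree are hypothetical.

```python
def scale(branches):
    """Rescale branch lengths so that they sum to 1."""
    total = sum(length for length, _ in branches.values())
    return {b: (length / total, tips) for b, (length, tips) in branches.items()}


def pd_pe(branches, cells):
    """Return (PD, PE) per cell, with branch lengths scaled to the total tree length."""
    pd_by_cell = {c: 0.0 for c in cells}
    pe_by_cell = {c: 0.0 for c in cells}
    for length, tips in scale(branches).values():
        # Range of a branch: cells holding at least one of its descendants.
        branch_range = [c for c, species in cells.items() if species & tips]
        for c in branch_range:
            pd_by_cell[c] += length
            pe_by_cell[c] += length / len(branch_range)
    return pd_by_cell, pe_by_cell


def rpe(branches, cells):
    """PE on the original tree divided by PE on an equal-branch-length tree."""
    _, pe_obs = pd_pe(branches, cells)
    equal = {b: (1.0, tips) for b, (_, tips) in branches.items()}
    _, pe_alt = pd_pe(equal, cells)
    return {c: pe_obs[c] / pe_alt[c] for c in cells}


# Hypothetical tree ((A:1,B:1):2,C:3) and three grid cells.
branches = {
    "A": (1.0, {"A"}),
    "B": (1.0, {"B"}),
    "AB": (2.0, {"A", "B"}),
    "C": (3.0, {"C"}),
}
cells = {"c1": {"A", "B"}, "c2": {"A", "C"}, "c3": {"C"}}

pd_by_cell, pe_by_cell = pd_pe(branches, cells)
rpe_by_cell = rpe(branches, cells)
```

In this example, cell c3 contains only the long branch leading to C and has RPE above 1, whereas c1 contains the shorter branches of the (A, B) clade and has RPE below 1. PE summed over all cells equals the scaled tree length of 1, which is a convenient check on any implementation.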

To identify grid cells with statistically significant patterns of RPE, we conducted randomization tests using 999 iterations of the ‘curveball’ algorithm implemented in the canaper package101. This algorithm generates null matrices by randomizing species occurrences within each grid cell while maintaining both the observed species richness per cell and the geographic range size of each species103. We then performed a two-tailed test (α = 0.05) against this null model. Grid cells were classified as significantly higher or lower than expected if the observed value fell above 97.5% or below 2.5% of the randomized values. Grid cells not meeting these thresholds were deemed consistent with patterns expected under random conditions11.
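A curveball-style randomization and the two-tailed classification can be sketched in Python as follows. This illustrates the general pair-trading idea only and is not the canaper or vegan code. The function names, the number of trades and the example community are hypothetical, and the cut-off rule is one reasonable reading of the thresholds described above.

```python
import random


def curveball(cells, n_trades, rng):
    """Randomize occurrences, preserving cell richness and species range sizes."""
    cells = {c: set(species) for c, species in cells.items()}   # work on a copy
    names = sorted(cells)
    for _ in range(n_trades):
        a, b = rng.sample(names, 2)
        only_a, only_b = cells[a] - cells[b], cells[b] - cells[a]
        if not only_a or not only_b:
            continue                      # nothing can be traded for this pair
        shared = cells[a] & cells[b]
        pool = sorted(only_a | only_b)
        rng.shuffle(pool)
        # Each cell receives as many unshared species as it had before.
        cells[a] = shared | set(pool[:len(only_a)])
        cells[b] = shared | set(pool[len(only_a):])
    return cells


def two_tailed_class(observed, null_values, alpha=0.05):
    """Classify an observed value against its null distribution."""
    n = len(null_values)
    below = sum(v < observed for v in null_values) / n
    above = sum(v > observed for v in null_values) / n
    if below > 1 - alpha / 2:
        return "high"
    if above > 1 - alpha / 2:
        return "low"
    return "not significant"


# Hypothetical community: grid cell -> set of species present.
community = {"c1": {"A", "B", "C"}, "c2": {"C", "D"},
             "c3": {"A", "E", "F"}, "c4": {"B"}}
randomized = curveball(community, n_trades=100, rng=random.Random(1))
```

In a full test, a metric such as RPE would be recomputed on each of the 999 randomized communities and the observed value for every grid cell passed to `two_tailed_class` together with its 999 null values.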

Identification of taxonomic endemism centres

We applied the quartile method to identify centres of taxonomic endemism and distinguish between young and old endemism33. We first ranked all endemic taxa (genera and species) by their estimated evolutionary age, from youngest to oldest. We then divided this ranked list into quartiles. For each grid cell, we calculated the richness (number of taxa) within the youngest 25% and oldest 25% of endemic taxa separately. We defined centres of taxonomic endemism as grid cells falling within the top 5% richness values for either the youngest quartile or the oldest quartile. Cells ranking in the top 5% for both quartiles were classified as mixed centres, while those only in the top 5% for the youngest quartile were young centres, and those only in the top 5% for the oldest quartile were old centres. Adjacent grid cells identified as centres were subsequently aggregated into major endemism centres based on geographic proximity and prior biogeographic knowledge104. Finally, we calculated the proportional representation of young, old and mixed centres at both the genus and species levels. To evaluate the impact of phylogenetic uncertainty on our results, we repeated the identification of quartile-based endemism centres using ten trees randomly selected from a set of 100 bootstrap trees.
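The quartile classification can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: the top 5% is taken as the ceiling of 5% of the number of cells, ties are broken by cell identifier, and cells with zero richness are never treated as centres. Function names and example data are hypothetical.

```python
import math


def top_cells(richness, top):
    """Cells in the top `top` fraction by richness (ties broken by cell ID)."""
    n_top = max(1, math.ceil(top * len(richness)))
    ranked = sorted(richness, key=lambda c: (-richness[c], c))
    return {c for c in ranked[:n_top] if richness[c] > 0}


def quartile_centres(ages, cells, top=0.05):
    """Label cells as 'young', 'old' or 'mixed' centres of endemism."""
    ranked = sorted(ages, key=ages.get)               # youngest first
    q = max(1, len(ranked) // 4)
    young, old = set(ranked[:q]), set(ranked[-q:])
    young_rich = {c: len(taxa & young) for c, taxa in cells.items()}
    old_rich = {c: len(taxa & old) for c, taxa in cells.items()}
    young_top, old_top = top_cells(young_rich, top), top_cells(old_rich, top)
    centres = {}
    for c in young_top | old_top:
        if c in young_top and c in old_top:
            centres[c] = "mixed"
        elif c in young_top:
            centres[c] = "young"
        else:
            centres[c] = "old"
    return centres


# Hypothetical endemic taxa with ages (Ma) and their grid-cell occurrences.
ages = {f"t{i}": float(i) for i in range(1, 9)}
cells = {"c1": {"t1", "t2", "t7", "t8"}, "c2": {"t1", "t2", "t3"},
         "c3": {"t5", "t7"}, "c4": {"t4"}}
# A 50% cut-off is used here only because the example has four cells.
centres = quartile_centres(ages, cells, top=0.5)
```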

To assess the robustness of our quartile-based classification of young and old endemism centres, we implemented an alternative temporal threshold method following established approaches16,91. Selecting appropriate thresholds to distinguish palaeo-endemics from neo-endemics is critical. At the genus level, we adopted 23 Ma, corresponding to the Palaeogene–Neogene boundary, as the threshold separating palaeo-endemics from neo-endemics. Nearly all Chinese endemic gymnosperm genera originated in the Cretaceous or earlier (palaeo-endemics). For angiosperms, palaeo-endemic genera mainly arose in the Late Cretaceous–Palaeogene, consistent with evidence that many relict lineages (for example, Davidia, Eucommia) arose during this interval and later contracted to China91. In contrast, neo-endemic genera mostly emerged during the Neogene, matching recent large-scale phylogenetic studies that identify the Miocene as the primary diversification epoch for China’s plant genera33,34,95.

At the species level, the threshold distinguishing palaeo- from neo-endemics remains debated16. Nonetheless, given that the majority of China’s endemic species are concentrated in the Hengduan Mountains, a region that experienced rapid uplift and intensified monsoon influence during the late Miocene to Pliocene105,106, this interval represents a critical phase of accelerated in situ diversification23. Moreover, recent studies of the Sino-Himalayan flora highlight ~5 Ma as a key turning point, marked by rapid radiations that generated a substantial number of young, range-restricted species96. Therefore, we set the species-level boundary at 5 Ma.

Using these thresholds, we calculated richness patterns for old (>23 Ma) and young (<23 Ma) genera and old (>5 Ma) and young (<5 Ma) species, identifying endemism centres based on the top 5% richness criterion. The resulting richness patterns and endemic centres were broadly consistent with those derived from the quartile method. Consequently, we present the quartile-based results in the main text, with the temporal threshold method results provided in Extended Data.

While our quartile method used estimated lineage ages to classify species as old and young, only 6,248 species endemic to China (39.2%) have both divergence age estimates and distribution data available. To address potential impacts of this incomplete sampling on identifying endemism centres, we utilized distribution records for a larger subset: 13,724 endemic species (86.1%). We calculated the species richness and endemic proportion for each grid cell. Grid cells within the top 5% for each metric were identified as endemism centres. Comparing these results with those derived from the phylogenetically incomplete dataset confirmed the robustness of our analysis, specifically validating Central China’s status as a taxonomic endemism centre.

Identification of phylogenetic endemism centres

We identified phylogenetic endemism centres using the Categorical Analysis of Neo- and Palaeo-endemism (CANAPE)11. For species-level analysis, we directly utilized the species tree and corresponding distribution data. For genera exhibiting non-monophyly, we treated each monophyletic lineage as an independent operational unit in genus-level analysis12. The CANAPE workflow comprised two steps. First, we identified grid cells in which PE on the actual tree (numerator of RPE), PE on a comparison tree with equal branch lengths (denominator of RPE) or both were significantly higher than expected (observed value > 95% of randomized values; one-tailed test, α = 0.05). Second, grid cells meeting these criteria were assigned to one of three categories. A grid cell with a significantly high or low RPE (two-tailed test, α = 0.05) was classified as a centre of palaeo-endemism or neo-endemism, respectively. A grid cell that was significantly high in both the numerator and the denominator (taken alone) but not significant for RPE was classified as a centre of mixed endemism11. Thus, neo-endemic centres cluster recent radiations (short branches), palaeo-endemic centres cluster narrow-range relictual lineages (long branches) and mixed centres contain both. Additionally, we repeated the CANAPE workflow using ten trees randomly selected from the 100 bootstrap trees to assess the impact of phylogenetic uncertainty on identifying phylogenetic endemism centres.
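The two-step CANAPE decision rule can be written as a small Python function. This sketch follows the rule as worded in the text and is not the canaper implementation. Each argument is assumed to be the fraction of randomized values that the observed value exceeds (or, for `rpe_p_lower`, falls below). The text assigns no category to cells that pass step one on a single PE term without a significant RPE, so the sketch labels them "unclassified" as an explicit assumption.

```python
def canape_class(pe_obs_p_upper, pe_alt_p_upper, rpe_p_upper, rpe_p_lower):
    """Classify one grid cell from its randomization-test rank fractions."""
    sig_obs = pe_obs_p_upper > 0.95      # PE on the actual tree (RPE numerator)
    sig_alt = pe_alt_p_upper > 0.95      # PE on the equal-branch tree (RPE denominator)

    # Step 1: at least one PE term must be significantly high (one-tailed).
    if not (sig_obs or sig_alt):
        return "not significant"

    # Step 2: two-tailed test on the RPE ratio.
    if rpe_p_upper > 0.975:
        return "palaeo-endemism"
    if rpe_p_lower > 0.975:
        return "neo-endemism"
    if sig_obs and sig_alt:
        return "mixed endemism"
    return "unclassified"                # assumption: case not covered by the text


# Hypothetical rank fractions for four grid cells.
examples = {
    "cell_1": canape_class(0.99, 0.60, 0.990, 0.010),
    "cell_2": canape_class(0.70, 0.98, 0.005, 0.995),
    "cell_3": canape_class(0.97, 0.96, 0.600, 0.400),
    "cell_4": canape_class(0.50, 0.40, 0.999, 0.001),
}
```

Note that cell_4 is not a centre of endemism despite its extreme RPE, because neither PE term passed the first step.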

As CANAPE relies on range-weighted phylogenetic endemism17, restricting sampling to China may inflate PE values near borders where taxa extend beyond our dataset (‘border effect’)12. To evaluate this bias, we replicated analyses using strictly Chinese endemic taxa. Based on the list of Chinese endemic vascular plants, we extracted the phylogenetic subtrees and distribution data subsets (116 genera; 6,248 species) for analyses. The subtrees were extracted using the drop.tip function in the ape package107 in R100, whereas the distribution data subsets were filtered using a series of functions from the dplyr package108. These endemic datasets, unaffected by cross-border ranges, provide more accurate range-size estimates for PE calculations. The CANAPE workflow was reapplied identically to these subsets.

Mapping biodiversity hotspots and gaps in protection

To assess the protection status of taxonomic and phylogenetic endemism centres, we overlaid these centres with current protected areas to identify conservation gaps. We used spatial data of protected areas primarily from ref. 33, focusing exclusively on national nature reserves. After excluding non-terrestrial reserves, a total of 424 nature reserves were included for downstream analysis. Furthermore, recognizing the growing emphasis on China’s national park system10,109, we incorporated both officially established national parks and pilot national parks specifically within the Central China region when calculating protected area coverage. The list of national parks was sourced from the Chinese National Forestry and Grassland Administration (http://www.forestry.gov.cn/main/5960/index.html). We further assessed the east–west distribution disparity of protected areas using the Hu Huanyong Line, a seminal demographic–geographic boundary dividing China into densely populated southeastern and sparsely populated northwestern regions54. The number and total area of national nature reserves on each side were calculated. For reserves intersecting the line, classification was based on the side containing the larger area.

Globally recognized biodiversity hotspots represent critical conservation priorities6. To identify conservation gaps of global importance, particularly potential hidden hotspots, we overlaid four global biodiversity hotspots involving China (that is, Mountains of Central Asia, Indo–Burma, Mountains of Southwest China and Himalaya)58 with our identified endemic centres. Spatial data for global biodiversity hotspots were downloaded from ref. 110, which provides vector layers for all 36 global hotspots. Our gap analysis revealed that Central China, a distinct floristic unit, receives inadequate global conservation attention. We subsequently delineated its geographic boundaries and quantified its vascular plant endemism and primary vegetation loss (Supplementary Note provides details) to evaluate whether Central China meets the formal criteria for designation as a global biodiversity hotspot6.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.