Introduction

It is now well appreciated that climate oscillations during the Quaternary have profoundly shaped the geographic distributions and current genetic diversity of many temperate species in the Northern Hemisphere1. Two general hypotheses on forest responses to the Quaternary climate changes in East Asia have been proposed2, 3. Palaeovegetation data from East Asia showed that temperate forests in this region were considerably more restricted than today and would have retreated southward to c. 30°N during the Last Glacial Maximum (LGM)2. Conversely, Harrison et al. suggested that temperate forests in this region did not migrate on a large scale, but rather persisted in local low-altitude refugia and formed discontinuous forest vegetation during glacial periods3. A limited number of phylogeographic studies appear to support the latter hypothesis. For example, in northern and northeast China, two species of shrubs, Ostryopsis davidiana Decne. and Quercus mongolica Fischer ex Ledebour, survived in multiple glacial refugia and showed regional postglacial expansions4, 5, while in southern and southeast China multiple refugia were recovered for a few trees and accompanying species, such as Tetrastigma hemsleyanum Diels & Gilg6, the East Asian Kirengeshoma 7, and species of the fir genus8. In subtropical China, constituents of the evergreen broad-leaved forest, such as Castanopsis tibetana, Machilus thunbergii and Schima superba 9, Sargentodoxa cuneata 10, and Loropetalum chinense 11, conform to either an in situ survival model or an expansion-contraction model. In addition, in the eastern Himalaya, some species that survived the LGM in multiple refugia had expanded their ranges to colonize extensive regions before the middle Quaternary12,13,14, although some climate-sensitive trees retreated and recolonized high-altitude regions after the LGM15, 16.
To date, no study has examined the phylogeographic structure of a single species, species complex or monophyletic group whose current distribution covers all of these regions in East Asia.

In this study, we focused on the phylogeographic patterns of the Indigofera bungeana complex (Fabaceae), a complex of deciduous shrubs widespread in temperate East Asia (between c. 23°N and 45°N in latitude, 95°E and 135°E in longitude), with a continuous geographic distribution covering southern China, northern China and the Hengduan Mountains region (HMR). Four species have been ascribed to this complex17: I. bungeana Walpers, I. amblyantha Craib, I. silvestrii Pampanini, and I. ramulosissima Hosokawa. Species delimitation among them remains unclear because of the lack of clear morphological and genetic gaps. Our unpublished phylogenetic analyses suggested that individuals of each species were intermixed, but together formed a highly supported monophyletic clade sister to the species of Indigofera from the Cape region of South Africa. We therefore treated them as a single evolutionary lineage in our phylogeographic analyses. Members of this complex grow in sunny, arid habitats at elevations of 100‒2700 m17. Their widespread distribution provides a unique opportunity to examine how plants responded to past climate changes over a large region in East Asia.

We sequenced two types of DNA fragments with contrasting modes of inheritance. First, two maternally inherited chloroplast DNA (cpDNA) fragments were used, as in most phylogeographic studies16, 18, because cpDNA rarely recombines and has a smaller effective population size19. This type of population genetic data allows inference of historical range shifts and recolonization routes20,21,22. Second, we also sequenced one nuclear DNA fragment. Population data from nuclear genetic polymorphisms can confirm the phylogeographic inferences from cpDNA23,24,25,26. Sequence variation data from a single nuclear locus are increasingly used for this purpose23, 27.

Finally, we used ecological niche modelling to infer the possible distributions of this complex during the LGM in East Asia. We expected the simulated distributions to be consistent with the phylogeographic inferences from the population genetic data. We aimed to address the following questions: (1) Are phylogeographic inferences from cpDNA data consistent with those from nuclear DNA data? (2) When and how did this complex obtain its widespread distribution in East Asia? (3) Did the I. bungeana complex retreat southward or survive in situ during the LGM?

Results

cpDNA variation and haplotype structure

The total alignment of the ndhJ-trnF and trnD-trnT fragments across the 472 individuals sampled was 2933 bp, containing 155 substitutions and 43 insertions/deletions (indels; 4‒116 bp). A total of 133 chlorotypes (C1‒C133) were identified, 104 (78.2%) of which occurred in a single population (Table 1). The most common haplotypes, C5, C23 and C24, were each found in five (9.8%) populations. Total haplotype (H d) and nucleotide (π) diversity of the cpDNA data were 0.982 and 0.0033, respectively. Seven of the 51 populations surveyed contained only one haplotype, whereas the remaining populations were polymorphic (Table 1; Fig. 1).

Figure 1

Geographic distribution of cpDNA haplotypes detected in the Indigofera bungeana complex. Haplotypes found in more than one population are color-coded, while private haplotypes (restricted to a single population) are shown in white. Figure was generated in DIVA-GIS 7.5 (http://www.diva-gis.org).

Table 1 Locations of the sampled populations of the Indigofera bungeana complex, sample sizes (n), frequencies of chloroplast and Pgk1 haplotypes per population, and the geographic region (Figs 1 and 2) and lineage (Figs 3a and 4a) for each population.

Phylogenetic trees reconstructed using NJ and Bayesian methods were largely consistent in topology. All chlorotypes from the I. bungeana complex formed a monophyletic lineage with three clades (Fig. 2). The basal clade A included only two chlorotypes, C62 and C63, which were restricted to a single population of I. silvestrii (HMR: MX3); clade B contained three chlorotypes, C8, C9 and C47, distributed in the Hengduan Mountains region and southern China. Clade C included all the remaining chlorotypes, with unresolved relationships, indicating radiative diversification. The haplotype network showed a star-like phylogeny, in which most cpDNA haplotypes radiated from a few central ones, e.g. C67, C36, C24, C75, C74 and C8 (Fig. 1). The dating analyses under different substitution rates suggested that the common ancestor of the I. bungeana complex originated during the Pliocene, and those of clades A, B and C in the Pliocene to early Pleistocene. Diversification of haplotypes in clade C, which comprised almost all the chlorotypes, was estimated to have occurred before the LGM (see Supplementary Table S1).

Figure 2

The evolutionary relationships among cpDNA haplotypes of the Indigofera bungeana complex. (a) NJ phylogenetic tree of the 133 cpDNA haplotypes. Numbers above/below branches represent Bayesian posterior probabilities/NJ support values. (b) Maximum parsimony network. The size of circles corresponds to the frequency of each haplotype and black dots represent missing haplotypes (not sampled or extinct). Lineages (A, B, C) and clades (a1, a2, a3, a4, a5) correspond to the lineages and clades in Table 1 and Supplementary Table S1.

Pgk1 variation and haplotype structure

The alignment of Pgk1 across the 434 individuals was 792 bp, containing 58 substitutions and 8 indels (4‒28 bp). These polymorphisms defined 68 haplotypes, 42 (61.8%) of which were unique to a single population (Table 1). The most common haplotypes, H2 and H6, occurred in 20 (39.2%) and 13 (25.7%) populations, respectively. Total haplotype (H d) and nucleotide (π) diversity of Pgk1 were 0.927 and 0.0106, respectively. Among the 51 populations surveyed, five were fixed for a single haplotype, and the remaining ones were polymorphic (Table 1; Fig. 3).

Figure 3

Geographic distribution of Pgk1 haplotypes detected in the Indigofera bungeana complex. Haplotypes found in more than one population are color-coded, while private haplotypes (restricted to a single population) are shown in white. Figure was generated in DIVA-GIS 7.5 (http://www.diva-gis.org).

Nuclear (Pgk1) haplotypes clustered into three clades (Fig. 4). The basal clade I comprised haplotypes occurring in the HMR and southern China; clade II included H26 and H27, which were restricted to one population of I. amblyantha (southern China: JZ); and clade III comprised the remaining haplotypes, which occurred throughout the distribution of the I. bungeana complex. The haplotype network likewise showed a star-like phylogeny, with most haplotypes radiating from central ones (e.g. H1, H2, H16 and H17) that occurred at high frequency (Fig. 4b). The crown age of the I. bungeana complex estimated from Pgk1 was 1.47 (95% HPD: 0.75‒2.37) Ma. The common ancestors of clades I, II and III were dated to 0.67 (95% HPD: 0.26‒1.28) Ma, 0.35 (95% HPD: 0.01‒0.93) Ma and 0.98 (95% HPD: 0.51‒1.66) Ma, respectively (Table S3).

Figure 4

The evolutionary relationships among Pgk1 haplotypes of the Indigofera bungeana complex. (a) NJ phylogenetic tree of the 68 Pgk1 haplotypes. Numbers above/below branches represent Bayesian posterior probabilities/NJ support values. (b) Maximum parsimony network. The size of circles corresponds to the frequency of each haplotype and black dots represent missing haplotypes (not sampled or extinct). Lineages (I, II, III) and clades (b1, b2, b3, b4, b5) correspond to the lineages and clades in Table 1 and Supplementary Table S1.

Genetic differentiation

The SAMOVA analysis did not recover any reliable population groups in either the cpDNA or the nuclear dataset (see Supplementary Fig. S1). We therefore divided all the populations into three groups according to the classical phytogeographic boundaries defined by Wu & Wu28: (A) Hengduan Mountains region (HMR), (B) Northern China, (C) Southern China (see Figs 1 and 2).

The level of total genetic diversity H T (cpDNA: 0.991; Pgk1: 0.939) across all populations was much higher than the average within-population gene diversity H S (Table 2). The highest genetic diversity occurred in southern China (cpDNA: 0.982; Pgk1: 0.922), in accordance with the occurrence of the most divergent haplotypes in this region (Figs 1 and 2). For both the cpDNA and Pgk1 datasets, N ST was significantly larger than G ST across all populations (Table 2), indicating significant phylogeographic structure.

Table 2 Estimates of average gene diversity within populations (H S), total gene diversity (H T), interpopulation differentiation (G ST), and number of substitution types (N ST) for cpDNA and Pgk1 of the Indigofera bungeana complex across regions.

Hierarchical AMOVA revealed low levels of regional and species differentiation (Table 3; see Supplementary Figs S2‒S5). Variation among regions accounted for 5.26% and 23.53% of the total genetic variation for the cpDNA and Pgk1 datasets, respectively. In the HMR and southern China, variation among populations was significantly higher than variation within populations, whereas in northern China differentiation among populations was much lower than variation within populations (Table 3). The highest level of genetic differentiation among populations (cpDNA: PV = 77.54%, F ST = 0.775; Pgk1: PV = 86.74%, F ST = 0.867) was observed in the HMR, which was consistent with the significantly higher G ST values there than in other regions (Table 3). The complex and heterogeneous climate and topography of the HMR may favor isolation, drift and barriers to gene flow. When populations were grouped by species, differences among species explained 7.56% (F CT = 0.076) and 4.76% (F CT = 0.048), those among populations within species 53.33% (F SC = 0.577) and 76.12% (F SC = 0.799), and those within populations 39.11% (F ST = 0.609) and 19.11% (F ST = 0.809) of the total cpDNA and nuclear DNA genetic variation, respectively (Table 3).
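The fixation indices quoted with the species-level AMOVA can be recovered from the percentages of variation alone. The following Python sketch applies the standard three-level AMOVA relations to the percentages given above; the function name and argument order are illustrative and are not part of ARLEQUIN.

```python
def fixation_indices(pct_among_groups, pct_among_pops, pct_within_pops):
    """Return (F_CT, F_SC, F_ST) from AMOVA percentages of variation.

    F_CT: share of total variance among groups (here, species)
    F_SC: share of within-group variance found among populations
    F_ST: share of total variance found among all populations
    """
    total = pct_among_groups + pct_among_pops + pct_within_pops
    f_ct = pct_among_groups / total
    f_sc = pct_among_pops / (pct_among_pops + pct_within_pops)
    f_st = (pct_among_groups + pct_among_pops) / total
    return f_ct, f_sc, f_st


# Percentages taken from the text: species-level partition.
cpdna = fixation_indices(7.56, 53.33, 39.11)
pgk1 = fixation_indices(4.76, 76.12, 19.11)

print([round(x, 3) for x in cpdna])  # [0.076, 0.577, 0.609]
print([round(x, 3) for x in pgk1])   # [0.048, 0.799, 0.809]
```

Both sets of values agree with the F CT, F SC and F ST reported in the paragraph, which also shows that F ST here equals one minus the within-population share of variation.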

Table 3 Hierarchical analysis of molecular variance (AMOVA) of cpDNA and Pgk1 for Indigofera bungeana complex, partitioned by species and region, respectively.

Tests of demographic expansion

The major clades identified in the cpDNA (clades A, B and C) and Pgk1 (clades I, II and III) phylogenies displayed bimodal or multimodal mismatch distributions (Fig. S6). However, except for cpDNA clade A and Pgk1 clade I, none of the statistical comparisons between these observed distributions and those simulated under a sudden expansion model significantly rejected the expansion model (P values > 0.05 based on SSD and H Rag, Table 4). The nonsignificant SSD and raggedness index, together with significantly negative F S values (C: F S = −24.002, P < 0.001; III: F S = −21.123, P < 0.01) and negative Tajima’s D values (C: D = −0.599, P = 0.306; III: D = −2.096, P < 0.001; significant only for clade III), indicated a historical demographic expansion within cpDNA clade C and Pgk1 clade III. Based on the corresponding τ, and assuming minimum and maximum substitution rates of 1.0 × 10−9 and 3.0 × 10−9 s s−1 y−1, the expansion of clade C was estimated to have occurred 698 (95% CI: 493‒961) and 232 (95% CI: 164‒320) thousand years ago (Kya), respectively. Bayesian skyline plots suggested that the effective population size of cpDNA clade C increased quickly (Fig. S7). The expansion of Pgk1 clade III was estimated to have occurred 60 (95% CI: 5‒220) Kya (Table 4).
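The conversion from τ to an expansion date can be sketched with the usual relation τ = 2ut, where u is the mutation rate for the whole sequence. The Python snippet below is a minimal illustration under two assumptions: u is expressed per year (rate per site per year multiplied by alignment length), and the τ value is a made-up placeholder, not the estimate in Table 4.

```python
def expansion_time_years(tau, rate_per_site_per_year, n_sites):
    """Time since expansion from tau = 2 * u * t, with u per sequence per year."""
    u = rate_per_site_per_year * n_sites
    return tau / (2.0 * u)


TAU = 4.0        # hypothetical tau, for illustration only
N_SITES = 2933   # length of the cpDNA alignment reported in the Results

t_slow = expansion_time_years(TAU, 1.0e-9, N_SITES)  # minimum rate
t_fast = expansion_time_years(TAU, 3.0e-9, N_SITES)  # maximum rate

print(f"slow rate: {t_slow / 1000:.0f} Kya, fast rate: {t_fast / 1000:.0f} Kya")
print(f"ratio of the two dates: {t_slow / t_fast:.1f}")
```

Because the date scales with the inverse of the rate, a threefold range in substitution rate gives a threefold range in dates, which matches the ratio between the 698 and 232 Kya estimates.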

Table 4 Mismatch distribution and neutrality tests for populations of clades of Indigofera bungeana complex.

Present and past distribution modelling

The AUC value for the current potential distribution of the I. bungeana complex was high (0.984), indicating good predictive performance of the model. The projection of the model onto present bioclimatic conditions shows good habitat suitability between 23°N and 45°N in East Asia (Fig. 5a). With 0.15 chosen as the suitability threshold, the CCSM (Fig. 5b) and MIROC (Fig. 5c) models yielded largely similar paleo-distributions for the LGM, although the MIROC model inferred a range more similar to that of the present day. However, the areas with high suitability (>0.60) were slightly reduced but significantly fragmented in both the CCSM and MIROC models compared with the present distribution, indicating possible habitat loss and fragmentation during the LGM.

Figure 5

Modelled climatically suitable areas for the Indigofera bungeana complex at different times using Maxent. Niche model results were modified in ArcGIS version 10.0 (http://www.esri.com/software/arcgis/arcgis-for-desktop). (a) The present; the last glacial maximum (LGM: c. 21 ka BP) under the (b) CCSM and (c) MIROC models, and (d) the last interglacial (LIG: c. 130 ka BP). The logistic value of habitat suitability is shown according to the color-scale bars.

Discussion

Pleistocene expansion

Chloroplast and nuclear datasets revealed high levels of genetic diversity in the I. bungeana complex (Table 4). In addition, genetic differentiation among the assumed species was very small, but it was extremely high among populations or regions for both cpDNA and nuclear DNA markers (Table 4). The small differentiation among species does not appear to support the previous taxonomic delimitations within this complex17. Relatively few haplotypes but high proportions of private haplotypes were detected across the investigated populations or regions in this study (Table 1). Most populations across the different regions are dominated by numerous private haplotypes, although a few common haplotypes are shared among populations within regions (Figs 3 and 4).

This phylogeographic pattern may be better explained by the hypothesis that all examined populations experienced a common expansion followed by rapid isolation29. The star-like phylogeny of haplotypes, which is evidence of a common expansion, was observed for both chloroplast and nuclear haplotypes (Figs 3 and 4). The BSP analysis, which is based on coalescent methods, revealed that the effective population size (N e) increased in the early and middle Pleistocene. Moreover, the results of the mismatch distribution and neutrality tests further supported the expansion hypothesis. All of this evidence suggests that a common expansion occurred within the I. bungeana complex. The range expansions detected in cpDNA clade C and nuclear clade III were estimated to have occurred approximately between 60 and 961 Kya, in the early and middle Pleistocene. Although we could not date the expansion precisely, it is highly likely that Quaternary climate change facilitated it30. Some cold-tolerant plants, such as species of the fir genus, expanded extensively and continuously in high-elevation regions during the Quaternary31,32,33,34.

However, it remains unknown how the ancestral haplotypes disappeared from different regions and populations. It is highly likely that geographic isolation following the range expansion allowed private haplotypes to replace the ancestral ones. Such scenarios have often been found for isolated species or populations35,36,37. Quaternary climate changes after the range expansion probably account for most of the fixation of private haplotypes in the different regions and for the strong differentiation among regions. The limited seed and pollen dispersal of this species complex may also have played an important role.

In situ survival in most distributions of the complex during the LGM

Although northern and southern China were never covered by large ice sheets during the LGM, the climate is estimated to have been cooler by at least 7‒10 °C and 4‒6 °C and drier by c. 200‒300 and 400‒600 mm yr−1 than at present in northern and southern China, respectively38,39,40. Quaternary climate changes have strongly affected the distribution and genetic diversity of temperate plants in East Asia, resulting in either southward glacial migration or in situ glacial survival18, 26. Species of the I. bungeana complex are highly tolerant of both cold and drought, and have the northernmost distribution in the genus (between c. 23°N and 45°N)41.

Most private haplotypes in this complex derived from the range expansion in the early and middle Pleistocene, earlier than the LGM. Our simulations of the distribution of the I. bungeana complex during the LGM also indicated that its range did not shift southward significantly, although the core distribution shrank. This is consistent with findings for other species with similar distributions in East Asia, such as Juglans cathayensis 42, Quercus variabilis 43, and Tetrastigma hemsleyanum 44.

Given the above evidence, we tentatively suggest that the I. bungeana complex might have survived in situ or in multiple large refugia in response to the climate change of the LGM. However, this and other Quaternary climate oscillations might together have accelerated the regional isolation of the I. bungeana complex, which promoted the fixation of the numerous private haplotypes. Our results suggest that not all woody species growing in the temperate deciduous forests of northern China migrated southward3. However, these climatic changes might have promoted species or genetic differentiation of plants occurring in East Asia, as suggested by Qian and Ricklefs2.

Methods

Population sampling

A total of 472 individuals representing four species were collected from 51 populations, with 1‒13 individuals per population (spaced at least 50 m apart), covering the whole range of the I. bungeana complex in China (see Supplementary Table S2). We tentatively assigned all collected materials to the four species names used in the Flora of China17. Voucher specimens are deposited in the Herbarium of the Chengdu Institute of Biology, Chinese Academy of Sciences (CDBI).

DNA extraction, amplification and sequencing

Total genomic DNA was extracted from silica-gel-dried leaves using the plant genomic DNA extraction kit (TIANGEN Biotech., Beijing, China). After preliminary screening of ten chloroplast fragments (i.e., atpF-atpH, rpl32-trnL, rps16-trnQ, ndhJ-trnF, matK, trnL-trnF, ndhC-trnV, trnD-trnT, ndhF-rpl32 and psbA-trnH) for the representative samples of I. bungeana complex, we chose two cpDNA intergenic spacer (IGS) regions (ndhJ-trnF and trnD-trnT)45,46,47 for the phylogeographic study. In addition, Pgk1, a single-copy nuclear (scn) gene responsible for coding plastid phosphoglycerate kinase isoenzymes, was surveyed among 51 populations using two new primers (see Supplementary Table S3) designed on the basis of sequences obtained using the primers of Huang et al.48. All the chlorotypes and nuclear haplotypes from Indigofera bungeana were deposited in GenBank under accession numbers (submitted).

All amplifications were performed in 25 µL reactions containing 17 µL deionized sterile water, 1.5 µL of 25 mM MgCl2, 2.5 µL Taq reaction buffer, 2 µL of 2.5 mM dNTP, 0.5 µL of each primer at 10 pmol mL−1, 0.5 µL (2.5 units) Taq DNA polymerase (TIANGEN, Beijing, China), and 0.5 µL genomic DNA (10‒50 ng). The PCR program was as follows: initial denaturation at 94 °C for 5 min; 33 cycles of denaturation at 94 °C for 45 s, annealing for 30 s (54 °C for ndhJ-trnF and trnD-trnT; 58 °C for Pgk1), and extension at 72 °C (1 min for ndhJ-trnF; 90 s for trnD-trnT; 45 s for Pgk1); and a final extension at 72 °C for 7 min, followed by an indefinite hold at 12 °C. PCR products were purified using an E.Z.N.A gel extraction kit (OMEGA, Biotech., USA). The purified PCR products were sequenced by Life Technologies™ (Shanghai, China).
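As a quick consistency check on the reaction recipe, the component volumes can be summed and the final concentrations of the diluted stocks computed with the dilution relation C_final = C_stock × V_stock / V_total. The Python sketch below uses only the volumes and stock concentrations stated above; the dictionary keys (including the forward/reverse primer labels) are illustrative names, and any Mg2+ supplied by the Taq buffer is ignored.

```python
# Component volumes in microlitres, as listed in the protocol.
REACTION_UL = {
    "water": 17.0,
    "MgCl2 (25 mM stock)": 1.5,
    "Taq reaction buffer": 2.5,
    "dNTP (2.5 mM stock)": 2.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "Taq DNA polymerase": 0.5,
    "genomic DNA": 0.5,
}


def final_conc(stock_conc, stock_ul, total_ul):
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_ul / total_ul


total_ul = sum(REACTION_UL.values())
mgcl2_mM = final_conc(25.0, REACTION_UL["MgCl2 (25 mM stock)"], total_ul)
dntp_mM = final_conc(2.5, REACTION_UL["dNTP (2.5 mM stock)"], total_ul)

print(f"total volume: {total_ul} uL")
print(f"added MgCl2: {mgcl2_mM:.2f} mM, dNTP: {dntp_mM:.2f} mM")
```

The volumes add up to the stated 25 µL, giving 1.5 mM added MgCl2 and 0.2 mM dNTP in the final reaction.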

Population genetic analyses

Sequences were assembled and edited with Sequencher 4.1 (Gene Codes, Ann Arbor, MI), aligned using Clustal X 1.8149 and then adjusted manually. Nuclear (Pgk1) allelic phases were resolved using the algorithm of PHASE27 implemented in DnaSP 5.050, with 1,000 iterations, a burn-in of 1,000 iterations and a thinning interval of 10. Indels were treated as single mutation events and coded as substitutions (A or T). Haplotypes of cpDNA (chlorotypes) and Pgk1 were identified using DnaSP 5.050. Genealogical relationships among chlorotypes and nuclear (Pgk1) haplotypes were reconstructed using a statistical parsimony algorithm51 as implemented in Network v. 4.6 (http://fluxus-engineering.com).

Population gene diversity (H S, H T) and between-population differentiation (G ST, N ST) were estimated using PERMUT52 with 1,000 permutations. An N ST value higher than G ST usually indicates the presence of phylogeographic structure, that is, closely related haplotypes occur together in the same area more often than less closely related haplotypes52. N ST and G ST were compared using the U-statistic test.
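The relationship between H S, H T and G ST can be illustrated with a small Python sketch. It uses the plain definitions (gene diversity as one minus the sum of squared haplotype frequencies, and G ST = (H T − H S) / H T) on made-up haplotype frequencies; it deliberately omits the sample-size corrections used by PERMUT, so it is a simplified stand-in rather than that program's estimator. N ST additionally weights haplotypes by their genetic distance and is not shown.

```python
def gene_diversity(freqs):
    """Gene diversity: probability that two random haplotypes differ."""
    return 1.0 - sum(p * p for p in freqs)


def gst(pop_freqs):
    """Return (H_S, H_T, G_ST) from per-population haplotype frequencies.

    pop_freqs: list of populations, each a list of haplotype frequencies
    in the same haplotype order and summing to 1.
    """
    n_pops = len(pop_freqs)
    n_haps = len(pop_freqs[0])
    h_s = sum(gene_diversity(f) for f in pop_freqs) / n_pops
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_haps)]
    h_t = gene_diversity(mean_freqs)
    return h_s, h_t, (h_t - h_s) / h_t


# Hypothetical example: three populations, three haplotypes.
toy = [
    [1.0, 0.0, 0.0],   # fixed for haplotype 1
    [0.0, 1.0, 0.0],   # fixed for haplotype 2
    [0.5, 0.0, 0.5],   # polymorphic
]
h_s, h_t, g_st = gst(toy)
print(f"H_S = {h_s:.3f}, H_T = {h_t:.3f}, G_ST = {g_st:.3f}")
```

When most populations carry their own private haplotypes, as in this toy case, H T is much larger than H S and G ST approaches 1.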

The spatial analysis of molecular variance (SAMOVA) was conducted using SAMOVA 1.053 to define the groups of populations that are geographically homogeneous and maximally differentiated. The SAMOVA analysis was conducted with the number of groups (K) ranging from 2 to 20. To verify the consistency, we ran the analysis five times for each K value with 1,000 independent iterations, starting from 100 random initial conditions. We assessed the optimal K as the one for which F CT (i.e., the genetic variance owing to divergences between groups) was the highest and significant.

Hierarchical analysis of molecular variance (AMOVA) was performed in ARLEQUIN 3.154 to partition genetic variance among groups, among populations within groups, and within populations. In the AMOVA, populations were grouped either by geography or by species. Geographical groups were taken from the SAMOVA analysis; if SAMOVA was unable to detect suitable groups, populations were grouped following Wu & Wu28.

Phylogenetic analyses and divergence time estimation

Two species of Indigofera (I. szechuensis Craib and I. lenticellata Craib) were chosen as outgroups on the basis of our unpublished phylogenetic analyses of Indigofera. Phylogenetic relationships of the chlorotypes and nuclear (Pgk1) haplotypes were reconstructed with Neighbor-joining (NJ) and Bayesian inference (BI) methods, using MEGA 5.0555 and MrBayes 3.156, respectively. In the NJ analysis, we used Kimura’s 2-parameter model57. Node support was assessed with 1,000 bootstrap replicates. Prior to BI analyses, the optimal nucleotide substitution model was determined using jModeltest 2.1.258 via the Akaike Information Criterion (AIC)59. The TVM + I + G model (cpDNA) and GTR + I + G model (Pgk1) were selected as the best-fit models. Four Markov chain Monte Carlo (MCMC) chains were run for 20,000,000 generations, starting from random trees and sampling one tree per 1,000 generations, with the first 4,000,000 generations discarded as burn-in. The program Tracer 1.560 was used to check parameter convergence and effective sample size. A 50% majority-rule consensus tree was summarized with posterior probabilities as nodal support.
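For readers unfamiliar with the distance used in the NJ analysis, Kimura's 2-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below is a minimal illustration of that formula, not MEGA's implementation; it assumes sites containing gaps or ambiguous bases are simply skipped, which differs from the indel coding described earlier.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (simplifying assumption)
        n += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p = ts / n  # proportion of transitions
    q = tv / n  # proportion of transversions
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (A<->G) in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(f"{d:.4f}")
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.1 in the toy example) because the model allows for multiple substitutions at the same site.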

A likelihood-ratio test61 in PAUP 4.10b62 indicated that both the chloroplast and nuclear datasets rejected a strict molecular clock (P < 0.01); we therefore used a relaxed molecular clock. Divergence times between the chloroplast and nuclear haplotype clades were estimated under a Bayesian approach63 in BEAST 1.6.264. As there is no fossil record of Indigofera, we adopted a substitution rate method. The cpDNA substitution rates for most angiosperm species have been estimated to vary between 1.0 × 10−9 and 3.0 × 10−9 substitutions per site per year (s s−1 y−1)65. Given the uncertainty of these rates in Indigofera, we used a minimum (1.0 × 10−9 s s−1 y−1), mean (2.0 × 10−9 s s−1 y−1) and maximum (3.0 × 10−9 s s−1 y−1) substitution rate. For Pgk1, we adopted a nucleotide substitution rate of 13.6 × 10−9 s s−1 y−1 according to Huang et al.66.

Demographic analyses

To test the assumption of selective neutrality, we performed Tajima’s D 67 and Fu’s F S 68 tests; both statistics are expected to be significantly negative under population expansion and positive under a population bottleneck. A mismatch distribution analysis69 (MDA) was also conducted to explore the demographic history of the major chlorotype and nuclear (Pgk1) haplotype clades. Populations that have experienced expansion are expected to show a unimodal mismatch distribution, whereas stable populations are expected to show a bi- or multi-modal one. The goodness-of-fit was assessed by the sum of squared deviations (SSD), Harpending’s raggedness index70 (H Rag) and the 95% confidence interval (CI) around τ under a sudden-expansion model. Statistical significance was determined by 1,000 bootstrap replicates. These analyses were conducted using ARLEQUIN 3.158. To estimate changes in population size over the history of the major clades, the historical demographic dynamics of the I. bungeana complex were inferred from Bayesian skyline plot (BSP) analyses using BEAST 1.6.264. Linear and stepwise models were explored using an uncorrelated lognormal relaxed clock. Runs consisted of 50,000,000 generations, with trees sampled every 1,000 generations. The BSP was visualized in Tracer version 1.5, which summarizes the posterior distribution of population size over time.
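The mismatch distribution and raggedness index can be sketched in a few lines of Python. The example counts pairwise differences among aligned sequences and computes Harpending's raggedness as the sum of squared differences between successive frequency classes, r = Σ (x_i − x_{i−1})² for i = 1 to d + 1. It is an illustration on made-up sequences, assuming equal-length alignments with no missing data, and does not reproduce ARLEQUIN's bootstrap tests.

```python
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences (0..max)."""
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    counts = [0] * (max(diffs) + 1)
    for d in diffs:
        counts[d] += 1
    total = len(diffs)
    return [c / total for c in counts]


def raggedness(freqs):
    """Harpending's raggedness index of a mismatch distribution."""
    x = list(freqs) + [0.0]  # class d + 1 is empty by definition
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


# Hypothetical aligned haplotypes, one per sampled individual.
toy_seqs = ["AAAA", "AAAT", "AATT", "ATTT"]
dist = mismatch_distribution(toy_seqs)
print(dist)
print(round(raggedness(dist), 4))
```

Smooth, unimodal distributions give low raggedness values, whereas the jagged, multimodal distributions typical of stable populations give higher ones.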

Present and past distribution modelling

We used the maximum entropy modelling implemented in MAXENT 3.3.3k71 to infer the potential geographic range of the I. bungeana complex at present, at the LGM (ca. 21 ka) and at the last interglacial (LIG, ca. 130 ka), based on bioclimatic layers downloaded from the WorldClim database72 (http://www.worldclim.org) at 2.5-arcmin resolution. The paleo-climatic conditions during the LIG were simulated by the Community Climate System Model73 (CCSM), while we used two available models for the LGM: CCSM and MIROC74. Distribution records of the species of the I. bungeana complex were sourced from the GBIF database (http://www.gbif.org/) and the Chinese Virtual Herbarium (http://www.cvh.org.cn), as well as from our own field collections. After screening for duplicates and aggregating records onto a 2.5-arcmin raster, 181 unique records were retained. Highly correlated variables (r > 0.7) were excluded, and we ultimately selected five bioclimatic variables (i.e., annual mean temperature, temperature seasonality, mean temperature of driest quarter, precipitation of wettest month, and precipitation of warmest quarter). To statistically evaluate model performance, we used the area under the “Receiver Operating Characteristic (ROC) Curve”75 (AUC), a threshold-independent measure of model performance as compared to null expectations.
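The exclusion of highly correlated predictors can be illustrated with a simple greedy filter in Python. The text does not state how the retained variable was chosen within each correlated pair, so the rule used here (keep a variable only if its absolute Pearson correlation with every already-kept variable is at most 0.7, scanning in input order) is an assumption, as are the layer names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(layers, threshold=0.7):
    """Greedy filter: keep a layer unless it correlates with a kept one."""
    kept = []
    for name, values in layers.items():
        if all(abs(pearson(values, layers[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values of three climate layers at five occurrence points.
toy_layers = {
    "layer_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "layer_b": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly collinear with layer_a
    "layer_c": [5.0, 1.0, 4.0, 2.0, 3.0],    # weakly correlated with layer_a
}
selected = drop_correlated(toy_layers)
print(selected)  # ['layer_a', 'layer_c']
```

Because the result depends on the order in which variables are scanned, in practice the order is usually set by ecological relevance or by each variable's contribution to a preliminary model.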