Introduction

Accurate and detailed spatial population distributions reflect differences in natural geographical conditions and levels of economic development within a region1. Currently, easily accessible population data are primarily macro-scale and fall into two types. The first is census data, which serve as the primary source of raw population distribution data and form the basis for various population-related studies. These data are categorized by statistical scale, such as provincial and municipal statistics or street-level statistics. However, population distribution within administrative units is not uniform2, and the long acquisition cycle leads to a lack of information on short-term fluctuations in population distribution3. Therefore, census data are often combined with other geospatial data to conduct spatial population studies, forming the second type of population data. For instance, population spatial studies4 often employ various modeling factors5 or individual data with coordinates6. An abundance of gridded population data is now available globally or for specific regions, such as the Gridded Population of the World (GPW), Global Human Settlement (GHS), LandScan, and China Population Dataset. These datasets reflect different population activity patterns and have been applied to issues such as immigration7, social issue prediction8, and disaster management9. However, current research mainly focuses on the overall distribution and mobility of the population, with relatively less attention paid to the spatial distribution characteristics of specific population groups.

Minority refers to an ethnic group other than the dominant ethnic group that constitutes a minority of the population. Countries like Singapore, Thailand, and China in Asia, as well as the United Kingdom and Spain in Europe, are widely recognized as multi-ethnic nations. Ethnic minority areas have always exhibited distinct clustering phenomena, which are prevalent worldwide10, such as Chinatowns11, black communities12, and ethnic cultural villages13. In ethnic minority concentration areas, unique ethnic cultural and economic phenomena often emerge14 like traditional ethnic village forms15, subsistence farming economies16, and ethnic tourism economies17. Historically, the underdevelopment and limited transportation in ethnic minority regions have played a significant role in preserving distinctive cultures18. However, industrialization and urbanization have altered the lifestyles and rural landscapes of indigenous peoples19, leading to the destruction of traditional ethnic villages20 and hindering the transmission of ethnic cultures21. Due to their numerical disadvantage, poverty rates in ethnic minority villages are significantly higher than in other areas22. Therefore, understanding the spatial distribution of ethnic minority populations plays a crucial role in addressing regional resource allocation, mitigating ethnic conflicts, and promoting local planning and development.

Recent studies on population distribution have predominantly utilized nighttime light data (NTL) combined with multi-source information to achieve high-resolution population mapping, with toponymic data playing a significant role in indicating specific population groups. In population estimation, researchers often incorporate land use/land cover change (LUCC) data, points of interest (POI), and social media data to improve the accuracy of population distribution models23. For attribution analysis, variables such as temperature, vegetation indices, road network density, and terrain elevation are commonly introduced. Compared to traditional approaches, these multi-source data integration methods have significantly enhanced the precision of population distribution analyses24,25. In addition to POI data, more granular toponymic information is increasingly being explored to characterize population patterns in greater depth. For instance, studies have mapped immigrant entrepreneurship through business registration data26, or depicted the spatial distribution of Chinese communities using street networks and Chinatown bus routes27. Furthermore, research has highlighted the role of place names in linking people to space—particularly among indigenous minority groups—by identifying territorial boundaries and supporting emotional bonds between people and place28. These communities often possess sufficient time and cultural strength to associate geographical features with names in their native languages, thereby creating unique “namescapes” that may resist translation into standard naming conventions. Current research primarily focuses on integrating NTL and multi-source data to enhance population distribution accuracy, but few studies have examined the population distribution of ethnic minority groups. Therefore, to better delineate the distribution of ethnic minorities, toponymic data specific to ethnic groups could serve as a valuable resource for more detailed analysis.

Regression models29, Geographically Weighted Regression (GWR)30, and machine learning models such as XGBoost and Random Forest31,32 have been applied to investigate the relationships between population and various explanatory variables. However, these studies have largely concentrated on mapping population distribution, with limited attention paid to the delineation or identification of specific population groups. Machine learning algorithms have made significant advances in data analysis and pattern recognition. Among them, the Random Forest (RF) algorithm, a powerful and popular ensemble learning method, has been widely applied in domains such as scene classification33, regression prediction34, and spatial interpolation35. Due to its high accuracy36, many researchers have applied it to population prediction studies. With the development of geospatial big data, an increasing number of machine learning studies use auxiliary data to estimate population distribution, making it possible to identify the spatial distribution of different ethnic groups.

Previous studies have primarily discussed large-scale population distribution patterns37, whereas relatively few have focused on the spatial distribution of specific population groups. Ethnic toponyms offer an intuitive indication of population concentration and activity ranges. In Dehong Prefecture, Yunnan Province, China, five indigenous ethnic minority groups (IEG) reside: the Dai, Jingpo, Lisu, Achang, and Deang peoples. These groups maintain rich traditional cultural practices. Existing research in this region has primarily focused on the distribution of cultural heritage sites38, ecological quality39, regional development40, and ethnic cultural exchange41. Such a distinctive cultural and humanistic background provides a natural foundation for exploring the spatial distribution of ethnic minority populations. Therefore, this study focuses on the distribution of small-scale ethnic groups and selects Dehong Prefecture as a case study to explore the changes in population spatial distribution and their influencing factors in multi-ethnic areas. The main contents include: (1) investigating population changes in Dehong Prefecture over the past decade and the contributions of ethnic minorities; (2) introducing machine learning algorithms to enhance the spatial resolution of population distribution; (3) quantifying the contributions of various influencing factors to population distribution predictions.

Study area and datasets

Study area

The Dehong Dai and Jingpo Autonomous Prefecture (97°31’ ~ 98°43’ E, 23°50’ ~ 25°20’ N) lies in the western part of Yunnan Province, China (Fig. 1), and governs 2 cities and 3 counties (Mangshi City, Ruili City, Longchuan County, Yingjiang County, and Lianghe County). It borders the Union of Myanmar to the south, west, and northwest. The elevation of the prefecture ranges from 210 to 3177 m, with higher terrain in the northeast and lower terrain in the southwest. The prefecture is home to more than 30 ethnic minorities. As of the end of 2022, the permanent population of Dehong Prefecture was 1.321 million, with ethnic minorities accounting for 45.84% of the total population. It is the main settlement area for the Jingpo, Achang, and Deang ethnic groups in the country. The Dai ethnic group is the largest, accounting for 27.27% of the total population, followed by the Jingpo (10.22%), Lisu (2.61%), Achang (2.36%), and Deang (1.09%) ethnic groups. These five groups are the IEG of Dehong and have created a rich and colorful ethnic culture over its long history.

Yunnan plays an important role in China’s “Belt and Road” strategy, and Dehong Prefecture, due to its unique geographical location, has become an important gateway to Southeast and South Asian countries. The prefecture has a border length of 503.8 km, accounting for one-fourth of the China-Myanmar border42, and sits at the intersection of the Chinese, Southeast Asian, and South Asian economic zones. It is a golden port for China to connect with Myanmar and enter Southeast and South Asia by land. Currently, Dehong Prefecture is implementing various ethnic policies. Against the backdrop of its vigorous development and prominent advantages, understanding the current population distribution will facilitate the implementation of these policies. For a region like Dehong Prefecture, which is rich in distinctive ethnic minority culture, exploring the current spatial layout of each ethnic minority population is crucial for formulating tailored local development strategies.

Fig. 1

Administrative division map of Dehong prefecture. The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

Dataset

Table 1 lists the datasets and sources used in the study. All data used in this paper have undergone projection transformation and spatial resolution standardization. The “WGS_1984_UTM_Zone_47N” projection was selected in ArcGIS Pro 3.0, and all data were resampled to a common spatial resolution of 500 m.

Census data on ethnic minority populations at the county and township levels in Dehong Prefecture are derived from the sixth and seventh Chinese census data released by Dehong Prefecture. The county-level and township-level administrative divisions of Dehong Prefecture were obtained from the National Centre for Basic Geographic Information. Data verification was conducted to ensure consistency between the population census boundaries and administrative boundaries. Because the area of a statistical unit strongly affects its population count, and the area of administrative units varies greatly in this study, this paper uses population density (people per square kilometre) to remove this area effect.

Nighttime light (NTL) data were obtained from the NPP-VIIRS product. The DMSP-OLS NTL data ceased publication in 2013, whereas VIIRS-DNB NTL data have only been available since 2012. To reduce discrepancies caused by switching between sensors, this study used the NPP-VIIRS annual average data for 2012 and 2020, taken from the VNL v.2 dataset downloaded from the Earth Observation Group website. It is a pre-processed NPP-VIIRS product published by Elvidge et al.43, which filters out pixels contaminated by sunlight, moonlight, and cloud, and removes the interfering effects of biomass burning and other uncommon noise sources using methods such as outlier removal. The VNL v.2 time series product pays particular attention to using the same thresholds and settings across years, so that radiance and spatial patterns can be compared quantitatively. Because NTL represents human activities44, it has been widely used to study them, for example in delineating human activity ranges and intensities45, carbon emissions46, and economic development47. Scholars often grid population information based on NTL combined with statistical data to explore the social patterns of population distribution48.

The land cover data utilized in this study were sourced from the China Land Cover Dataset (CLCD), as published by Yang and Huang49. This dataset provides 30-meter resolution land cover data for China, derived from 335,709 Landsat images on the Google Earth Engine platform. The dataset includes nine categories such as Cropland, Forest, Shrub, Grassland, Water, etc., with an overall accuracy of 80%. LUCC are important manifestations of human activities50 closely related to ecosystem services51 and human welfare52. Ethnic minorities living in rural areas rely more on land, and traces of human life can be explored based on land use changes.

Road data were sourced from the OpenStreetMap (OSM) platform. To achieve the goal of comprehensive well-being by the end of 2020, China proposed a strategy of targeted poverty alleviation, in which “transport” was a key domain of poverty alleviation and development. Since 2010, the country has invested $100 billion annually53, and the government subsequently achieved the goal of “roads to the doorsteps” through the poverty eradication campaign. In the past, ethnic minorities living in remote and impoverished areas often faced difficulties with transportation54, but targeted poverty alleviation efforts have greatly improved road conditions55. Therefore, it is speculated that changes in roads have a more significant impact on ethnic minorities living in rural areas than on those living in developed areas. The historical database of the OSM platform dates back to 2014. Because road network data for 2010 are unavailable, we downloaded road data for 2014 and 2020 and, as a compromise, used the 2014 OSM dataset as a proxy for road infrastructure conditions in the early 2010s.

The Digital Elevation Model (DEM) data were derived from SRTM and downloaded from the Geospatial Data Cloud Platform at a spatial resolution of 30 m. The slope was then generated in the ArcGIS platform. As an intuitive description of the Earth’s surface, the DEM is closely related to environment and climate. Some studies have pointed out that different ethnic groups have different preferences: the Dai people mainly live in river valleys, the Hani and Jingpo ethnic groups live on mid-slopes, and the Nu and Dulong ethnic groups are primarily concentrated in mountainous areas37. Therefore, including the DEM can further assist in studying the distribution of different ethnic minorities.

Normalized Difference Vegetation Index (NDVI) data were obtained from MODIS-Terra (MOD13A2), with a spatial resolution of 250 m and a temporal resolution of 16 days. This dataset primarily utilizes the Google Earth Engine cloud computing platform, synthesizing the annual mean NDVI for each pixel using data from the entire year. NDVI can intuitively reflect vegetation distribution and growth characteristics. Scholars point out that people living in areas for generations often manage traditional landscapes based on traditional ecological knowledge, and these areas’ ecosystems are stable, with ecosystem services being maintained56. Therefore, it is inferred that NDVI and land cover (LC) are the main influencing factors for the distribution of populations less affected by urbanisation.

POI data were obtained from the BIGEMAP platform, including 19 categories such as dining, accommodation, shopping, medical care, etc. After preprocessing steps including error data removal and duplicate dataset elimination, a total of 595,286 data points were retained. Kernel density analysis was performed on this data with a search radius of 3 km. POI data contain the locations of numerous urban facilities and can be used to calculate indicators such as urban structure57, economic vitality58, etc. It is also used as an auxiliary variable for population estimation, which significantly impacts populations living in urban areas.

The toponym data were sourced from the China National Geographical Names Database. The IEG have been spatially concentrated for at least four generations, preserving distinct cultural practices, including their native languages. The place names derived from these languages are often difficult to translate into standardized naming conventions. Therefore, toponymic data are considered a reasonable proxy for ethnic population distribution in certain regions. Following the approach proposed by Zhao et al.59, keywords from minority languages, such as Dai, Lisu, Jingpo, and Deang, were used to construct an ethnic toponym database, from which a dataset of minority-related place names was extracted. The data were then screened using keyword extraction to identify ethnic minority distributions. The geographical names of the IEG in Dehong (Dai, Jingpo, Lisu, Deang and Achang) were selected from the distribution area, resulting in 3,465 IEG toponyms. Similar to point of interest (POI) data, place name data contain precise location information (latitude and longitude) as well as information on the origin of the place names. However, because toponymic data are published by the national authority, the information is more authoritative and stable, which makes it an important basis for exploring the distribution of ethnic minorities.

Table 1 Research data description.

Methods

The method flowchart of this article is shown in Fig. 2.

Fig. 2

Graphic Summary. The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

Constructing a distribution index for ethnic minorities

Current research on the development of ethnic minority areas is predominantly qualitative and based on field surveys, lacking theoretical frameworks and methodologies for evaluating the distribution of ethnic groups, as well as standardized quality criteria. The development status of ethnic minority areas is influenced by various factors such as population, economy, environment, and geographical location. Therefore, establishing indicators for the development of ethnic minority areas involves considering multiple factors.

Nighttime light data can effectively reflect the economic development of a region and capture many indicators not covered by traditional models. This study uses a minority toponymic dataset to delineate the initial residential areas of ethnic minorities. To account for cases where ethnic minority place names exist but no ethnic minorities reside, this study adopts the methodology proposed by Zhao et al.60, in which ethnic minority areas are extracted by combining ethnic minority place name data with population data. The formulas for extracting IEG areas are presented as Eqs. (1)–(3).

$$PR_{i}=\frac{y_{ij}}{Y_{j}}$$
(1)
$$f\left(x\right)=\frac{1}{nh}\sum_{j=1}^{n}k\left(\frac{x-x_{j}}{h}\right)$$
(2)
$$EMD_{i}=\sqrt{PR_{i}\times KDE_{i}\times NTL_{i}}$$
(3)

Where \(y_{ij}\) represents the number of individuals belonging to minority group i in township j, and \(Y_{j}\) represents the total ethnic minority population in township j. The population ratio \(PR_{i}\) (0 ≤ \(PR_{i}\) ≤ 1) indicates the proportion of ethnic group i within the township's ethnic minority population.

The kernel density function is denoted as \(f\left(x\right)\), where \(k\left(\frac{x-x_{j}}{h}\right)\) is the Gaussian kernel function. Here, h is the smoothing bandwidth parameter (h > 0), and \(x_{1}\), …, \(x_{n}\) are n independently distributed samples. The term \(x-x_{j}\) refers to the distance between the estimation point x and a known sample point \(x_{j}\).

\(EMD_{i}\) represents the distribution index of ethnic minority group i, and \(KDE_{i}\) denotes the kernel density analysis result of the toponyms of ethnic minority i, with a final search radius of 3000 m. \(NTL_{i}\) represents the nighttime light radiance value at point i.
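As a rough illustration of how Eqs. (1)–(3) fit together, the following Python sketch evaluates the index for a single grid cell. It is only a sketch under stated assumptions: the function names and toy values are hypothetical, the term x − x_j is taken as a planar Euclidean distance in metres, the 3000 m search radius is used directly as the bandwidth h, and raw (unnormalised) inputs are combined, since the normalisation of the inputs is not specified here.

```python
import math


def population_ratio(group_count, minority_total):
    """Eq. (1): share of group i in the township's minority population."""
    if minority_total <= 0:
        return 0.0  # assumption: a township without minority residents gets PR = 0
    return group_count / minority_total


def gaussian_kernel(u):
    """Standard Gaussian kernel k(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(point, samples, bandwidth):
    """Eq. (2), with x - x_j taken as the Euclidean distance (an assumption)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth h must be positive")
    if not samples:
        return 0.0
    total = sum(gaussian_kernel(math.dist(point, s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


def emd(pr, kde, ntl):
    """Eq. (3): square root of the product of PR, KDE and NTL."""
    product = pr * kde * ntl
    if product < 0:
        raise ValueError("PR, KDE and NTL are expected to be non-negative")
    return math.sqrt(product)


# Hypothetical toy inputs: toponym coordinates in metres (projected CRS).
toponyms = [(0.0, 0.0), (1500.0, 0.0), (0.0, 2500.0)]
cell_centre = (500.0, 500.0)

pr = population_ratio(group_count=300, minority_total=1200)
kde = kernel_density(cell_centre, toponyms, bandwidth=3000.0)
index = emd(pr, kde, ntl=2.4)
print(pr, kde, index)
```

Because the three factors are multiplied, the index is zero wherever any one of them is zero, which matches the stated aim of discounting places that carry minority toponyms but have no minority residents.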

The distribution of ethnic minorities is influenced by a variety of factors, including historical, geographical, economic, political and socio-cultural aspects. Additionally, development differs between regions inhabited by the same ethnic group and between ethnic groups within the same region. Therefore, the ethnic minority distribution index (EMD) can effectively reflect these variations, shedding light on the spatial distribution and developmental status of different ethnic minorities.

Spatialised modelling of the distribution of ethnic minorities

Random forest

The random forest algorithm is an ensemble learning method built on decision trees and is commonly used for classification, regression, and clustering problems. Its results and usability can be further improved through the imputation of missing data and the modelling of categorical values61. Additionally, during training, the random forest algorithm can use out-of-bag (OOB) data to obtain an unbiased estimate of the prediction error and to assess the mutual influences between features. Compared to traditional regression models, random forests not only enable accurate predictions but also mitigate overfitting, demonstrating high adaptability and tolerance to outliers and noise62,63.

Furthermore, Stevens et al.64 pointed out that when using the random forest model in population spatialization, it is essential to have well-matched census data with GIS administrative boundaries. Therefore, based on the sixth and seventh census data of Dehong Prefecture obtained in this study, population spatialization was conducted using the random forest model in ArcGIS Pro 3.0. Firstly, population data at the township level were collected to obtain the total population of each ethnic minority and the population density in each area. Population density was selected as the dependent variable for this study, while seven influencing factors were chosen as auxiliary data, including land cover (LC), road density (RD), DEM, NDVI, POI kernel density (PD), slope, and EMD.

Taking into account the natural geographical features, population concentration patterns, and cultural and historical characteristics of ethnic minority distribution, separate population models were fitted for the different IEG. A total of twelve training models were constructed: IEG-2010, Dai-2010, Jingpo-2010, Achang-2010, Lisu-2010, Deang-2010, IEG-2020, Dai-2020, Jingpo-2020, Achang-2020, Lisu-2020, and Deang-2020. Here, “IEG” represents the overall situation of the IEG (the five minorities combined), “Dai”, “Jingpo”, “Achang”, “Lisu” and “Deang” represent the population distributions of the respective ethnic groups, and 2010 and 2020 refer to the year. Each model used the auxiliary data as independent variables and township-level population density as the dependent variable.

Accuracy verification

Because the random forest uses bootstrap sampling with replacement, 80% of the samples from the original training set were drawn for training, while the remaining 20% constituted the OOB data used for cross-validation. This study therefore used OOB unbiased estimates to assess the accuracy of the random forest models under different parameter settings. The coefficient of determination (R2) and root mean square error (RMSE) between the census population in the validation dataset and the predicted population were used to evaluate the effectiveness, rationality, and accuracy of the training models65. The simulated 500 m gridded population distribution was obtained by spatializing township-level census population data. To reflect the accuracy of the population spatialization model, the RMSE (Eq. 4) and the relative root mean square error (%RMSE, Eq. 5) were used to measure overall prediction accuracy. %RMSE is obtained by dividing the RMSE by the mean of the census values. It is zero when the predicted values exactly match the true values and increases as the error grows.

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left(\widehat{y}_{i}-y_{i}\right)}^{2}}$$
(4)
$$\%RMSE=\frac{RMSE}{\frac{1}{N}\sum_{i=1}^{N}y_{i}}$$
(5)

In these equations, \(\widehat{y}_{i}\) is the population density estimated by the population spatialization for the ith township, \(y_{i}\) is the reference population density obtained from the census data for the ith township, and N is the number of townships (N = 51).
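The two accuracy measures in Eqs. (4) and (5) can be sketched in a few lines of Python. The function names and the toy density values below are hypothetical and serve only to show the arithmetic; %RMSE is returned as a fraction, so multiplying by 100 gives a percentage.

```python
import math


def rmse(predicted, observed):
    """Eq. (4): root mean square error between predictions and census values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def relative_rmse(predicted, observed):
    """Eq. (5): RMSE divided by the mean of the census values (as a fraction)."""
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero")
    return rmse(predicted, observed) / mean_observed


# Hypothetical township-level population densities (census vs. model output).
census = [120.0, 80.0, 45.0, 200.0]
model = [118.0, 83.0, 44.0, 205.0]

print(rmse(model, census))
print(100.0 * relative_rmse(model, census))  # expressed as a percentage
```

Dividing by the census mean makes the error comparable across ethnic groups whose densities differ by an order of magnitude, which a raw RMSE would not allow.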

Mapping of distribution areas of ethnic minorities

Due to significant differences in population density among various IEG, this study employs a natural break method to classify the population density of each ethnic group into 7 levels and reclassify them. These levels include very sparse, sparse, generally sparse, medium, generally dense, dense, and very dense.

To show the main distribution area of each ethnic group in the population decomposition results, the study compares the population density classes of the IEG in each grid cell. For instance, if the Achang ethnic group is classified as “dense” in a grid cell while all other IEG fall below “dense”, that cell is identified as an Achang distribution area. On this basis, the overall distribution of IEG across the region is mapped. If two or more ethnic groups share the same highest level, the cell is labelled as a mixed pixel and is subsequently resolved using a 3 × 3 window, taking the ethnic category that occurs most often within the window. A mixed pixel is assigned an ethnic category if and only if more than 50% of the pixels within the window are valid; otherwise it is recorded as a background value.
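The mapping rule described above can be sketched as follows. This is an illustrative reading, not the exact procedure used in the study: the function names are hypothetical, density classes are assumed to be coded 1 to 7 with 0 meaning no data, a “valid” pixel is assumed to be one that already carries a single ethnic label, the 50% threshold is assumed to be taken over the in-bounds cells of the window, and ties inside the window are broken alphabetically.

```python
from collections import Counter

BACKGROUND = "background"
MIXED = "mixed"


def dominant_group(levels_by_group):
    """Label each cell with the group holding the uniquely highest density class."""
    groups = sorted(levels_by_group)
    rows = len(levels_by_group[groups[0]])
    cols = len(levels_by_group[groups[0]][0])
    labels = [[BACKGROUND] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            top = max(levels_by_group[g][r][c] for g in groups)
            if top <= 0:
                continue  # no data for any group: keep background
            winners = [g for g in groups if levels_by_group[g][r][c] == top]
            labels[r][c] = winners[0] if len(winners) == 1 else MIXED
    return labels


def resolve_mixed(labels):
    """Resolve mixed cells with a 3 x 3 majority vote over valid neighbours."""
    rows, cols = len(labels), len(labels[0])
    resolved = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != MIXED:
                continue
            window = [labels[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            valid = [v for v in window if v not in (MIXED, BACKGROUND)]
            if len(valid) * 2 > len(window):  # more than 50% valid pixels
                counts = Counter(valid)
                # sorted() makes the tie-break alphabetical and deterministic
                resolved[r][c] = max(sorted(counts), key=counts.get)
            else:
                resolved[r][c] = BACKGROUND
    return resolved


# Hypothetical 3 x 3 class rasters for two groups (classes 1-7).
dai = [[6, 6, 6], [6, 5, 6], [6, 6, 2]]
achang = [[2, 2, 2], [2, 5, 2], [2, 2, 6]]

labels = dominant_group({"Dai": dai, "Achang": achang})
final = resolve_mixed(labels)
print(labels[1][1], "->", final[1][1])
```

In this toy grid the centre cell ties at class 5, so it is first flagged as mixed and then takes the label that dominates its eight valid neighbours.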

Results

Characteristics of spatial demographic change

To explore the spatial changes in population across townships in Dehong Prefecture, population density increases and decreases from 2010 to 2020 were mapped for the total population, Han ethnicity, all ethnic minorities combined, IEG, and other ethnic minorities (except IEG). Uniform classification standards were applied to facilitate a better comparison of population changes across townships in Dehong Prefecture. The results are illustrated in Fig. 3.

In Dehong Prefecture, most areas show a decreasing trend in population, but population density increases noticeably in small areas within each county. This suggests a strengthening tendency toward population clustering, possibly driven by pressure on ethnic survival, which promotes the evolution of villages66. Migration and concentration of populations toward economically developed areas contribute to urbanization and high-quality development67. The areas of Han population density growth closely match the overall trend in population change across the prefecture (Fig. 3b). Significant increases in Han population density are observed in some townships, with increases of over 10 people per square kilometre (categories 3 and 4) being the most common among the five categories. The decrease in population density among ethnic minorities is substantial, particularly in Longchuan County and Lianghe County, where decreases mostly exceed 10 people per square kilometre (categories 3 and 4) (Fig. 3c). This indicates population loss in these areas, which may lead to the disappearance of ethnic-specific cultures. Therefore, exploring the population changes among ethnic minorities is of great significance for the future development of the region. Among ethnic minority populations, the changes among ethnic groups other than the IEG are minimal (Fig. 3e), indicating relative stability, possibly due to the smaller population base of these groups. However, the five IEG show significant fluctuations and dominate the overall change in ethnic minority populations. Therefore, it is necessary to examine the specific situations of these IEG.

Fig. 3

Population density changes in different ethnic groups over the past decade (2010–2020). The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

Based on population density, the population changes of each of the IEG are analysed (Fig. 4). Population density changes within the range of -1 to 1 are considered stable. Overall, the population density of the Dai and Achang groups exhibits significant and complex changes, while the Jingpo and Lisu groups show a significant increasing trend. The Deang group exhibits the most stable population density across the entire prefecture. As of 2020, the Dai group accounts for the largest proportion among ethnic minorities in Dehong, at 59.34%, representing 27.16% of the total population of the prefecture. Therefore, just as the Han population change plays a dominant role in the demographic change of the whole prefecture, the population changes of the Dai largely dominate the overall demographic change of the IEG. Due to the cultural similarities between the Dai and the Han, more detailed research on exchanges between the two groups is needed. As the second largest ethnic minority in Dehong, the Jingpo ethnic group accounts for 22.24% of the total ethnic minority population and 10.18% of the prefecture’s total population. Its population in Yingjiang, Longchuan, Ruili, and Mangshi remains relatively stable (level 2), with a slight increasing trend in the southwest part of Dehong Prefecture. The Achang ethnic group, accounting for 5.14% of the ethnic minority population, shows signs of population decline in most areas across the prefecture, except for a few townships that show a significant increase in population. The Lisu and Deang ethnic groups account for 5.67% and 2.37% of the ethnic minority population, respectively. Their population density changes are generally less than 4 people per square kilometre, indicating relatively stable living conditions and less survival pressure for these two ethnic groups in recent years.

Fig. 4

Population density changes of 5 ethnic groups over the past decade (2010–2020). The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

Spatial distribution of IEG areas based on random forest

The regional background, including natural geographical features, economic status, social status, and transportation accessibility, influences the spatial differentiation patterns of population migration68. This study uses the population density of the IEG as the dependent variable and selects seven independent variables, including EMD (IEG), LC (land cover), RD (road density), and PD (POI density), to establish population distribution models for different years and ethnic groups. As shown in Table 2, the R² values of the training models for the various ethnic groups in Dehong in 2010 and 2020 are all greater than 0.80, and the %RMSE values are less than 4.3%. This indicates that the models predict well at the township level, providing a reliable nonlinear relationship for subsequent grid-level population predictions.

Table 2 Accuracy evaluation of the models.

Figure 5a and b display the results of the IEG-2010 and IEG-2020 models, respectively. Compared to a decade earlier, the connectivity between townships for the IEG populations had diminished by 2020, and the darkening color in the central part of the belt indicates an increasing concentration of the population. The clustering effects in places like Mangshi and Ruili have intensified, with more pronounced aggregation features. Because Mangshi is the administrative center of Dehong Prefecture and Ruili is a well-known border city, this pattern suggests that IEG populations may be concentrating toward commercial and administrative hubs. In areas like Longchuan, which previously had two population centers, the clustering effect has weakened over the decade, possibly leading to a decline in minority cultural phenomena. The development of different regions, such as the rise of tourism and changes in economic models, attracts the migration of minority populations69. In other regions, populations are primarily concentrated along riverbanks and border areas, such as the banks of the Yingjiang and Ruili rivers, as well as border ports and areas adjacent to other prefectures in Longchuan and Ruili.

Figure 5c and d, created using the cartographic method described in Sect. 3.2.3, illustrate the distribution of the five IEG based on their population decomposition results in 2010 and 2020. In 2010, the Dai people were primarily concentrated in the central areas of various clusters, the Deang people were distributed along connecting pathways, and the Lisu people’s distribution, although limited, exhibited clear clustering characteristics. By 2020, the clustering patterns of all minority populations showed some degree of attenuation. However, the clustering effect of the Lisu people remained significant, indicating stable development. The Deang people continued to act as a link between townships. The differences in cultural and natural geographical conditions, such as living habits, terrain, and vegetation at different altitudes, have contributed to the long-term stability of these relationships.

Fig. 5

The overall situation and spatial distribution of different IEG in 2010 and 2020. The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

From a spatial perspective, the overall clustering trend of IEG populations is quite significant, especially in Yingjiang, Ruili, and Mangshi, where the population increase is particularly notable. Figure 6 illustrates the spatiotemporal changes in the IEG populations from 2010 to 2020. The regions with population growth are concentrated on both sides of the economic centers of Ruili and Mangshi. In contrast, the areas with population decline are more widely distributed but show lower values, indicating a trend toward concentrated population distribution. The population decrease at the borders of various counties aligns with the reduced population along the township connections shown in Fig. 5a and b. The Dai people exhibit a broad population decline, mirroring the overall trend of the IEG. Their areas of population increase are concentrated around the economic centers of Ruili and Mangshi, while the surrounding townships show a population decrease, which suggests that the commercial centers may be attracting more minority populations. In June 2010, the State Council of China officially designated Ruili in Yunnan Province as a key pilot zone for development and opening-up70. Compared to the Dai, the Jingpo people show a more pronounced population increase, with small-scale growth around economic centers and a notable increase in the Yingjiang River basin and at the junction of Longchuan and Ruili. The Achang and Lisu populations show broader areas of population growth than the Dai and Jingpo. The Achang people exhibit an increasing trend in Longchuan, Lianghe, Ruili, and Mangshi, with clearly defined clusters. In contrast, the Lisu population increase is mainly in the northern part of Dehong Prefecture, while the Achang are the primary drivers of population growth in the southern part. The spatial pattern of population growth for the Deang ethnic group is similar to that of the Achang group, though the decreasing trend is slightly more pronounced for the Deang.

Fig. 6

Spatial changes in population density among IEG over the past decade (2010–2020). The map was created using ArcGIS Pro (version 3.0.1, https://www.esri.com/en-us/arcgis/products/arcgis-pro).

Variable feature importance across different minority ethnic backgrounds

Investigating the impact of auxiliary data on population distribution across different geographic backgrounds and analyzing feature importance help clarify the influence of different variables on model accuracy and reveal patterns in population distribution65. The larger the feature importance, the greater the overall impact of that variable on the modeled distribution. In 2010, PD and NDVI consistently ranked high, along with EMD in some areas. In 2020, the importance of EMD and LC increased dramatically. The growing share of LC importance is particularly noteworthy.
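The idea that a larger importance means a larger impact on the model can be made concrete with a small sketch. This section does not state which importance measure the random forest models used, so the sketch below swaps in permutation importance (the mean increase in prediction error when one variable is shuffled) as a model-agnostic stand-in. The `predict` callable, the toy data, and all function names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(targets, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            increases.append(mse(targets, predict(shuffled)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def normalize(importances):
    """Clip negatives to zero and rescale so the importances sum to 1."""
    clipped = [max(value, 0.0) for value in importances]
    total = sum(clipped)
    return [value / total for value in clipped] if total > 0 else clipped


# Toy stand-in model that depends only on the first of two features.
def toy_predict(rows):
    return [2.0 * row[0] for row in rows]


toy_rows = [[float(i), float((i * 7) % 5)] for i in range(8)]
toy_targets = toy_predict(toy_rows)
toy_importance = permutation_importance(toy_predict, toy_rows, toy_targets)
print(normalize(toy_importance))
```

Normalizing the scores so that they sum to 1 makes it easier to compare the relative weight of variables across years and ethnic groups, in the spirit of the comparisons reported here.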

Figure 7 illustrates the feature importance of seven variables (RD, PD, EMD, NDVI, LC, DEM, S) among the IEG (Dai, Jingpo, Lisu, Achang, Deang). Overall, PD, EMD, and NDVI had stable high importance across ethnic groups in different years, with the exception of the Lisu. Compared to 2010, the importance of LC for population distribution increased dramatically in 2020. In 2010, the variable importance distribution for the Dai and Jingpo was similar to that of the IEG as a whole, likely because these two groups make up by far the largest share of the IEG population. For the Lisu, PD and NDVI are overwhelmingly dominant, with EMD having minimal impact compared to other ethnic groups, which may suggest that in 2010, the distribution of the Lisu ethnic group was more dependent on village-level settlement patterns, with population distribution influenced by road networks and vegetation. Among the Achang, EMD demonstrates significant dominance, followed by PD, DEM, and S, suggesting that the Achang’s distribution relies more on geographic factors and less on LC. Given the long-term stability of DEM and S, the likelihood of large-scale migration among the Achang is minimal. RD has the least impact on the Deang population distribution, suggesting that in 2010, the settlement pattern of the Deang ethnic group was less influenced by road infrastructure, possibly due to poor road conditions or limited accessibility.

In 2020, the variable importance of LC and RD increased significantly across all IEG, while DEM and S showed a slight decline, resulting in a more balanced importance of each variable overall. The increased importance of RD may be attributed to the Chinese government’s comprehensive rural revitalization policy, which has improved road access to villages and thus better reflects village distribution. Dehong has vigorously promoted the development of integrated transportation. From 2016 to 2020, the total investment in infrastructure projects across the prefecture exceeded 67 billion yuan71. The significant increase in LC importance among major ethnic groups suggests that land use has become more crucial for ethnic distribution compared to 2010, indicating an increasing awareness of development among ethnic groups. Investigations revealed that from 2010 to 2018, the degree of land use order in Dehong Prefecture gradually declined. During this period, land use homogeneity and diversity increased, indicating a shift toward a higher level of disorder. The dominance of single land use types weakened, while land functions became more comprehensive and integrated72.

The population changes across different land cover types are discussed in Sect. 5.2. The decreased importance of DEM and S may indicate a diminishing influence of natural features on the distribution of IEG populations, with economic factors becoming more attractive for migration and settlement. It is worth noting that the impact of EMD on the Lisu remains limited, possibly because they are predominantly distributed in densely vegetated areas, where dense vegetation obstructs NTL satellite observations. This pattern reflects the habitat preferences of this ethnic group.

Fig. 7

Importance statistics of model variables in 2010 and 2020.

Discussions

The effectiveness and importance of EMD data vary across different IEG distributions

Fig. 8

Comparison of model R² between NTL and EMD input.

To further demonstrate the effectiveness of incorporating EMD data rather than directly using NTL data, we constructed a population spatialization model based on NTL data to measure the degree of improvement in model accuracy (NTL was used to replace EMD, while the remaining six variables remained unchanged). Figure 8 compares the accuracy of the models trained with NTL and with EMD data. NTL-based models whose validation R² values were too low to meet usability criteria are not displayed. The results indicate that using EMD data as auxiliary information enhances the accuracy of population spatialization for ethnic minority populations in various regions compared to using NTL alone.

The improvement in model accuracy varies significantly among the IEG, with a more noticeable increase observed in the 2010 models. The Dai and Jingpo models exhibited the largest improvements, with increases of 0.022 and 0.033, respectively. For the other IEG, the models using NTL did not pass significance tests. This difference suggests that while NTL data effectively reflect the relationship between nighttime light and population, they need to be combined with ethnicity-specific data (such as the ethnic minority toponym distribution incorporated in EMD) to better reflect the spatial distribution of each group. However, the differences between the 2020 models are smaller, possibly because the importance of the variables became more balanced in 2020 and EMD was no longer prominent. Hence, substituting NTL for EMD had no significant effect in the 2020 experiments.

Previous studies have extensively demonstrated the importance of NTL in understanding population distribution. However, there have been fewer studies that integrate toponymic data to enhance model accuracy and reflect the distribution of populations of different ethnicities. This approach holds great potential for studying the distribution of different ethnic groups and also expands the scope of using NTL in conjunction with other spatio-temporal data.

Population changes across different land cover types

Taking into account the variations in the importance of CLCD as demonstrated in Sect. 4.3, this study compiled the different LC scenarios (Table 3) and the distribution of IEG changes among different LC types (Fig. 9).

From 2010 to 2020, there was an increase in bare land, farmland, built-up areas, water bodies, and shrubland, while forest land and grassland decreased. Particularly, built-up areas saw a significant increase, with an expansion rate of 181.71% compared to 2010. The primary sources of this increase were farmland (35.22%) and grassland (26.67%), indicating an overall increase in construction land in Dehong Prefecture73. The continuous urban development and expansion of built-up areas may be attracting and concentrating ethnic minority populations. In 2020, the expansion of farmland mainly originated from forest land (26.61%) and grassland (0.93%), suggesting an increase in land reclamation activities over the decade, possibly driven by increased agricultural activities among ethnic minority communities, although some forested areas have been reclaimed or lost74.
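The expansion rate and the source shares quoted above can be derived from two co-registered land cover grids. The sketch below is illustrative only: it assumes each grid is flattened to a list of class labels per cell, defines the source share as the percentage of newly gained cells coming from each former class (the paper's exact denominator is not stated), and uses made-up sample data.

```python
from collections import Counter


def expansion_rate(lc_start, lc_end, target):
    """Percentage change in the number of cells belonging to `target`."""
    before = sum(1 for cls in lc_start if cls == target)
    after = sum(1 for cls in lc_end if cls == target)
    if before == 0:
        raise ValueError("target class is absent in the start year")
    return 100.0 * (after - before) / before


def source_shares(lc_start, lc_end, target):
    """Share (%) of newly gained `target` cells contributed by each former class."""
    gained = Counter(start for start, end in zip(lc_start, lc_end)
                     if end == target and start != target)
    total = sum(gained.values())
    return {cls: 100.0 * count / total for cls, count in gained.items()}


# Hypothetical six-cell grids for the two years (illustrative values only).
lc_2010 = ["farmland", "farmland", "grassland", "forest", "built-up", "built-up"]
lc_2020 = ["built-up", "farmland", "built-up", "forest", "built-up", "built-up"]
print(expansion_rate(lc_2010, lc_2020, "built-up"))
print(source_shares(lc_2010, lc_2020, "built-up"))
```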

Table 3 Transformation of LC in the study area from 2010 to 2020.

The population of the IEG has significantly increased in built-up areas, with the Dai ethnicity showing the most noticeable growth. As shown by the deep red color of construction land in the fourth column of Fig. 9a, the increase in the Dai population by 2020 mainly occurred in construction land areas, including land converted from farmland, forest, and grassland. The first row represents the distribution of farmland in 2010. Except for the farmland converted to construction land shown in red, the remaining cells are blue, indicating a significant decrease in the Dai population in the original farmland areas by 2020. This population loss in original farmland and its shift toward construction land may reflect an improvement in living facilities for the Dai people and a gradual reduction in agricultural activities. Investigations found that after 2010, the agricultural and ecological spaces in Dehong were compressed, while urban spaces showed a clear expansion. This was manifested by an increase in rural residential land, whereas agricultural production lands such as cultivated land and orchards continued to decrease39.
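A matrix like the one read row by row above (2010 class) and column by column (2020 class) can be built by grouping grid cells by their land cover transition. The sketch below is a minimal illustration with hypothetical data. It aggregates with the mean density change per transition, which is an assumption, since the aggregation used for Fig. 9 is not stated here.

```python
from collections import defaultdict


def mean_change_by_transition(lc_start, lc_end, density_start, density_end):
    """Mean density change for every (start class, end class) pair of cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start, end, d0, d1 in zip(lc_start, lc_end, density_start, density_end):
        sums[(start, end)] += d1 - d0
        counts[(start, end)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Hypothetical cells: land cover in both years and group density in both years.
cells_lc_2010 = ["farmland", "farmland", "farmland"]
cells_lc_2020 = ["built-up", "built-up", "farmland"]
cells_density_2010 = [10.0, 20.0, 15.0]
cells_density_2020 = [30.0, 30.0, 10.0]

change_matrix = mean_change_by_transition(
    cells_lc_2010, cells_lc_2020, cells_density_2010, cells_density_2020
)
print(change_matrix)
```

A positive value for the farmland to built-up pair together with a negative value for farmland that stayed farmland would correspond to the red and blue cells described for the Dai.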

In Fig. 9e, the population of the Deang has increased in all areas except shrubland, indicating a stable increasing trend. At present, the Achang people are distributed across built-up areas, grasslands, and farmlands, while the Lisu ethnicity predominantly resides in built-up areas and grasslands, similar to the Dai ethnicity. Overall, the populations of all ethnic groups have increased in both pre-existing and newly developed built-up areas, suggesting a migration of population towards built-up areas. This phenomenon reflects an improvement in living standards, possibly attributable to the rural revitalization goal announced in late 2017 during the 19th National Congress of the Communist Party of China75. This goal aims to promote rural development, support commercial activities, enhance local interactions, ensure effective governance, and foster prosperity in rural areas. Under this initiative, significant improvements have been made in transportation infrastructure and residential environments for rural residents.

Fig. 9

Population density changes under different LC transformations.

Recommendations for future work

Spatial decomposition of population by ethnic group is an extremely complex task involving many economic, social, and environmental aspects. This study attempts to introduce geographical names as the main feature for identifying the distribution areas of indigenous ethnic minorities, and several areas for improvement remain:

  1.

    Improve the spatiotemporal resolution of data. Collecting more data would allow finer resolution in the spatial decomposition of minority populations. The SDGSAT-1 satellite, with a spatial resolution of up to 10 m, has been launched and can effectively enhance the spatial resolution of population spatialization76. Because the census data used in this study cover only 2010 and 2020, and the data from SDGSAT-1 are not yet available, the spatial resolution of this study is somewhat limited. Future studies on ethnic population distribution could benefit from data sources with higher spatial resolution that are temporally aligned with the research period. In addition, although place names show a significant correlation with ethnic distribution, their use as proxy indicators has certain limitations. Toponyms often reflect historical settlement patterns, which may not fully align with the current population distribution following recent migration. Future research could integrate oral histories, field investigations, and genealogical migration records to validate the spatial coupling between place names and the actual population distribution77,78.

  2.

    Optimize the population decomposition model. The current model treats urban and rural areas as homogeneous spaces, failing to adequately capture the differentiated impacts of urban–rural structures on the distribution of ethnic minority populations. Separate models for ethnic population decomposition could be established for urban and rural areas79,80. Scholars have already delineated urban and rural regions, and subsequent studies can incorporate this division to construct population decomposition models for different regions, thereby analyzing the different spatial distribution patterns of minority populations in urban and rural areas and exploring the underlying drivers of population migration. In addition, this study primarily employed KDE to illustrate the clustering patterns of toponyms. Future research could enhance this approach by, for example, integrating a gravity model or conducting anisotropic analysis81.

  3.

    Other applications of social mapping. For population groups characterized by specific cultural markers, distribution maps can be generated based on the extraction of such markers. This approach is applicable when two conditions are met: (a) identifiability, meaning that the target group possesses unique cultural markers that are both recognizable and spatially mappable (e.g., language, architecture, or commercial patterns); and (b) spatial clustering, meaning that the group exhibits a non-uniform spatial distribution, allowing density-based indicators to hold statistical significance. Based on the extracted cultural markers, local characteristics can be incorporated to construct a vitality index (e.g., the EMD used in this study), which can then be used to further analyze population distribution. For example, distribution maps of overseas Chinese populations can be generated using toponyms with Chinese linguistic features; dialect group maps can be developed based on lexical preferences in specific dialect regions77; and maps of religious communities can be constructed by identifying religious place names or architectural markers. By uncovering the spatial distribution of place names and regional cultural traits82, such approaches can help indicate population patterns and support the construction of population distribution indices, thereby shifting the focus of toponymic studies toward the spatial distribution of specific population groups.
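The KDE step mentioned in item 2, which also underlies the density-based indicators in item 3, can be sketched in a few lines. This is a generic isotropic Gaussian kernel density estimate, not the study's own implementation. The kernel choice, the bandwidth, and the sample toponym coordinates are all assumptions made for illustration.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Density at `query` estimated from 2D `points` with an isotropic Gaussian kernel."""
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for x, y in points:
        sq_dist = (x - qx) ** 2 + (y - qy) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total


# Hypothetical projected coordinates (km) of toponyms tied to one ethnic group.
toponyms = [(0.0, 0.0), (0.5, 0.2), (0.3, -0.4), (8.0, 8.0)]
near_cluster = gaussian_kde_2d(toponyms, (0.2, 0.0), bandwidth=1.0)
far_away = gaussian_kde_2d(toponyms, (4.0, 4.0), bandwidth=1.0)
print(near_cluster > far_away)
```

An anisotropic variant, as suggested for future work, would replace the single bandwidth with direction-dependent bandwidths.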

Conclusions

The main conclusions of this study are as follows:

  1.

    From 2010 to 2020, most areas in Dehong Prefecture witnessed a decrease in population. The population density changes of the Dai and Achang ethnic groups were large and complex, while the Jingpo and Lisu populations grew steadily. The population density of the Deang ethnic group changed the least and remained the most stable across the entire prefecture.

  2.

    Compared to 2010, the clustering characteristics of IEG became more prominent in 2020. In 2010, IEG populations were mainly distributed in Ruili, Mangshi, and the Yingjiang River basin, with clear “ethnic minority ties” connecting counties, mainly composed of the Deang, Lisu, and Dai ethnic groups. In 2020, the clustering trend of IEG populations became more pronounced, with the “tie effect” weakening. Populations near economic centers shifted towards these centers, while the Deang and Lisu ethnic groups still maintained ethnic ties, and the Dai ethnic group clustered towards economic centers.

  3.

    The EMD effectively indicated the distribution areas of IEG. PD, EMD, and NDVI consistently exhibited high importance for the distribution of IEG across different years and groups. The importance of LC for population distribution increased significantly over the ten years. Compared to 2010, the number of IEG residents on construction land increased significantly by 2020, with clustering towards areas of economic development, improved infrastructure, and easy access to transportation.

The clustering of ethnic minority areas not only reflects the spatial distribution of populations but also serves as an important indicator of cultural, economic, and social development. Because ethnic minority populations are widely distributed, in-depth research on this phenomenon helps to understand the uniqueness and diversity of ethnic minority areas and provides important decision-making references for local governments to promote economic development, cultural heritage, and social harmony in these areas.