Introduction

Ensuring inclusive and equitable access to quality education is one of the sustainable development goals (SDGs) for the 21st century (United Nations General Assembly, 2015; Allen et al., 2018; Bruns et al., 2019; Sajjad et al., 2022). Addressing urban-rural education inequality emerges as a particularly prominent issue (Baffoe et al., 2021; Han et al., 2023). Decades of rapid urbanization have led to a concentration of services and opportunities, thereby exerting pressures on the accessibility of public services and welfare in spatially dispersed areas such as rural regions (Atkinson, 2019; Ge et al., 2020; Yu et al., 2023). Furthermore, the evolving standards and planning requirements for urban public services have often overshadowed the welfare equity needs of rural communities (Bosworth and Venhorst, 2018; Johnson and Lichter, 2019). The “rural challenge” is a persistent issue that has yet to be resolved in both developed and developing countries (Moseley, 2023; Delbosc and Currie, 2011; Pot et al., 2023). Addressing this challenge is crucial for achieving educational parity between urban and rural areas, which is not only a means to reduce internal and international inequalities but also an integral component of sustainable urban development.

As the country with the fastest urbanization rate in this century, China also faces multiple “rural challenges”. In the context of economic globalization and manufacturing restructuring, the employment opportunities generated by rapid urbanization have drawn a growing share of rural labor into urban industrial and service jobs (Abbasi et al., 2023; Balsa-Barreiro et al., 2019; Chen et al., 2009). This trend has raised the income of the rural population and greatly improved the economic and educational resources available to migrant children (Chang et al., 2019; Zamir et al., 2023). However, due to the restrictions of the household registration system, rural migrants generally cannot enjoy the same social welfare benefits as urban citizens (Abbasi et al., 2022; Zhang et al., 2015). Consequently, the educational resources for left-behind children remain primarily dependent on rural educational facilities (Zhang, 2017). Despite China’s implementation of targeted policies like the Rural School Mapping Adjustment (RSMA) project (Zhao and Barakat, 2015; Liu and Xing, 2016) and the increasing public financial investment in rural basic education (Cai et al., 2017; Zhang, 2017), the challenges of educational equity and accessibility in rural China remain significant.

At the same time, rural settlements in China have also experienced unprecedented reconstruction and migration in the past 20 years. In recent years, with the slowdown of urbanization, China’s development strategy has gradually shifted from city-centered to urban-rural balance (Zhao et al., 2018). The neglect of rural development has caused problems such as soil and water pollution; insufficient infrastructure, including roads, water, electricity, and communications; a lack of social services such as medical care and education; extensive land use; and insufficient housing security and risk resistance (Zhang and Kanbur, 2009; Liu and Liu, 2021). To address these problems, a series of national and regional policies to improve the rural living environment have been implemented, such as “Beautiful Countryside”, “One Village One Product”, and “Rural Revitalization”. These policies have initiated a “campaign-style” migration, reorganization, and construction that has rapidly transformed the rural settlement pattern (Liu et al., 2017). As a result, the development trajectory in rural settlements is distinct from that of urban settlements, which are typically characterized by concentrated growth (He et al., 2017). The evolution of rural settlements is marked by greater diversity and unpredictability (Wilson et al., 2003; Aguilera et al., 2011). In rural areas that lack systematic planning guidance, there is a notable absence of dynamic research on the impact of large-scale settlement construction and relocation on educational accessibility.

In this context, this paper conducts a comprehensive analysis of the spatial and growth patterns of newly established rural settlements in China over the past two decades at the patch scale. Utilizing the Multiscale Geographically Weighted Regression (MGWR) method, the study delves into the spatial disparities in how the development and expansion of these settlements influence educational accessibility. The research aims to address three pivotal questions: (1) What are the observed change patterns in rural settlements during the last 20 years? (2) What has been the trajectory of rural educational equity development? (3) How does the expansion of rural settlements affect equity in rural education? The rest of the article is organized as follows: the section “Literature review” discusses the related research on rural settlement expansion and rural educational accessibility; the third section, “Materials and methodology”, describes the data sources and the MGWR analytical approach. The fourth section provides an in-depth analysis of the distribution of rural settlements, their expansion patterns, the evolution of educational accessibility, and the specific impacts of rural settlement growth on educational equity. The fifth section offers a discussion of the findings. The paper concludes with a summary of key insights, policy recommendations and limitations.

Literature review

Rural settlement changes

Rural settlements, as pivotal nodes of accessibility, play a crucial role in shaping the accessibility of education. As noted above, the dynamic changes of rural settlements are diverse and unpredictable (He et al., 2017; Ma et al., 2018). Researchers have probed into the distribution and evolution of rural settlements through various lenses, including landscape morphology (Tian et al., 2012), hierarchical structure (Wen et al., 2020), spatial configuration (Johnson and Maxwell, 2001), and social networks (Tian et al., 2016). The spatial distribution of rural settlements in China can be categorized into several distinct patterns, such as clustered, random, and uniformly dispersed (Yang et al., 2016). Regarding expansion patterns and dynamic evolution, the methodologies often draw parallels with the urban expansion typology, encompassing edge expansion, outlying, and infilling (Forman, 1995; Liu et al., 2014; He et al., 2023). Furthermore, existing research has delved into the intensity of rural land use (Robinson, 2003), the process of rural settlement urbanization (Zhu et al., 2020), the underlying driving mechanisms (Carrión‐Flores and Irwin, 2004), evolutionary simulations (Zhao et al., 2024), interactions with other land use types (Qu et al., 2019), and the broader impacts on cropland and ecosystem health (Chen et al., 2021).

Rural educational accessibility

The urban-rural gap in settlement research is mirrored in the study of educational accessibility (Chen et al., 2021; Liu and Xing, 2016; Shahraki et al., 2016). In urban settings, there is a well-established framework for planning and evaluating the provision of educational facilities, whether for new developments or to enhance the existing spatial distribution. Planning metrics, such as the community life circles (Liu et al., 2022; Song et al., 2024) and the ‘15-min city’ concepts (Allam et al., 2022; Zhang et al., 2023), are designed to ensure proximity and accessibility to educational resources. Moreover, urban areas generally exhibit a heightened awareness of educational accessibility, which extends beyond mere physical distance to encompass a multifaceted approach that includes time accessibility (Yan et al., 2018; Xiang et al., 2022; Wu et al., 2020). Unfortunately, in rural regions, the issue of educational accessibility is frequently neglected (Xiang and Stillwell, 2023). Challenges of educational equity and accessibility in rural China remain significant.

The spatial and geographical dimensions of educational access are integral to the concept of equality, representing a fundamental right of every citizen from a social perspective (Kučerová et al., 2015). The study of educational spatial accessibility has been thoroughly explored at the local scale, utilizing big data that combine walking, cycling, public transportation, and private car usage to more accurately measure the accessibility of primary and secondary schools (Wang et al., 2021; Han et al., 2023). In contrast, research in rural areas has predominantly taken a macro-evaluation approach, considering educational accessibility based on administrative statistics (Yang et al., 2014; Xiang and Stillwell, 2023). A limited number of studies have attempted a more detailed geographical analysis of educational accessibility in rural areas, but these studies are mainly static, and the research scale is limited to local studies (Wei et al., 2018; Han et al., 2023; Wang et al., 2021; Geng et al., 2023; Zhao and Barakat, 2015), resulting in conclusions and policy implications that vary significantly across regions. Furthermore, due to the regional differences in resource endowment and economic development, changes in educational accessibility are non-stationary in time and space, and few studies have conducted in-depth analyses of this issue.

To fill these gaps, this study measured the expansion patterns of rural settlements across the country, drawing parallels with urban expansion patterns. It also focused on the spatial non-stationarity of educational accessibility and explored the influencing factors and driving mechanisms of educational accessibility in different regions. Our study contributes to the theoretical understanding of rural educational accessibility and offers quantitative insights to inform planning policies concerning rural settlement and infrastructure development.

Materials and methodology

Study area

The study area includes eight natural and economic geographical regions in mainland China, involving 31 provinces and 2,867 county-level administrative regions (Fig. 1). These regions were delineated based on diverse natural terrain conditions and regional economic developments. They include the Northeast Region (NER), the Northern Coastal Region (NCR), the Middle and Lower Yangtze River Region (MLYR), the Eastern Coastal Region (ECR), the Southern Coastal Region (SCR), the Middle Yellow River Region (MYR), the Northwest Region (NWR), and the Southwest Region (SWR) (see Footnote 1). Additionally, the study takes into account the Hu Line, which was proposed by Chinese geographer Hu Huanyong in 1935. This line serves as a demarcation of population density, terrain and natural environment in China. The eastern side of the Hu Line represents the densely populated and economically developed areas, while the northwestern side is regarded as sparsely populated and economically underdeveloped. This line effectively captures the distribution pattern of rural settlements and rural infrastructure in China.

Fig. 1: Study area.

Data and sources

This study utilized various datasets, including land use data, digital elevation models (DEMs), and county socioeconomic statistics (Table 1). Land use data is the data source for identifying the spatial distribution of rural settlements and educational facilities. In this study, we used the China Land Use/Cover Database (NLUD-C) released by the Resources and Environmental Sciences and Data Center of the Chinese Academy of Sciences. It was derived mainly from TM/ETM images and China-Brazil Earth Resources Satellite imagery, and was produced through image correction, visual interpretation, supervised classification, and field surveys. The classification accuracy of NLUD-C exceeds 90%, making it one of the most accurate and widely used land cover datasets. The database contains land use conditions for multiple periods. This study selected data from 2000 to 2020 and extracted land types coded as 52 as the source of rural settlement patches for the subsequent analysis. The distribution data of educational facilities were collected from OpenStreetMap (https://wiki.openstreetmap.org/). Road data, administrative boundaries, and government locations were taken from the National Earth Systems Science Data Center of the Institute of Geography, Chinese Academy of Sciences (www.geodata.cn). The digital elevation model (DEM) was from the ASTER GDEM with a spatial resolution of 30 meters, which was downloaded from the Geospatial Data Cloud (http://www.gscloud.cn/search). The socioeconomic data were compiled and cleaned from the China County Statistical Yearbook, the National Bureau of Statistics, and other relevant websites.

Table 1 Data sources.

Variables and methodology

Definition of rural settlement expansion pattern

By overlaying rural settlement patches from 2000 and 2020, we delineated the new settlements that emerged over the two-decade period. We adapted the calculation method traditionally used for urban expansion types to rural settlements. The expansion patterns of these new rural settlements are categorized into three types based on their spatial relationship with existing settlements: filling expansion, edge expansion, and outlying expansion. To quantify these expansion patterns more precisely, we built on the measurement of the General Landscape Expansion Index (GLEI) proposed by Liu et al. (2010). The index is calculated using formula (1), which measures the proportion of the newly added rural residential patch that overlaps with the original settlement.

$${GLEI}=\frac{{A}_{0}}{{A}_{0}+{A}_{v}}\times 100 \%$$
(1)

In this formula, \({A}_{0}\) represents the intersection area between the newly added rural residential patch and the original settlement patch, while \({A}_{v}\) denotes the intersection area between the buffer of the newly added patch and the open space. The classification of expansion types is based on the value of the GLEI: if GLEI > 50, the expansion is filling type; if 0 < GLEI ≤ 50, it is edge expansion; and if GLEI = 0, the expansion is outlying.
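The classification rule can be sketched in a few lines of Python. This is a minimal illustration rather than the processing chain used in the study: the areas \({A}_{0}\) and \({A}_{v}\) are assumed to have been measured already in a GIS, the function names and patch values are hypothetical, and the boundary case GLEI = 50 is assigned to edge expansion as an assumption.

```python
def glei(a0: float, av: float) -> float:
    """General Landscape Expansion Index in percent, following formula (1)."""
    total = a0 + av
    if total <= 0:
        raise ValueError("A0 + Av must be positive")
    return a0 / total * 100.0


def expansion_type(value: float) -> str:
    """Map a GLEI value (in percent) to an expansion type."""
    if value > 50:
        return "filling"
    if value > 0:
        # Assumption of this sketch: GLEI = 50 is treated as edge expansion.
        return "edge"
    return "outlying"


# Hypothetical patches: (A0, Av) in square kilometers.
patches = {"p1": (0.8, 0.2), "p2": (0.3, 0.9), "p3": (0.0, 1.5)}
for name, (a0, av) in patches.items():
    value = glei(a0, av)
    print(name, round(value, 1), expansion_type(value))
```

In this toy input, the first patch is classified as filling, the second as edge expansion, and the third as outlying.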

Measurement of rural education accessibility

Given China’s administrative division system and the advantages of using distance to represent accessibility, this paper adopts the average shortest path method proposed by Yin et al. (2018) to measure rural education accessibility. The accessibility of education is assessed by calculating the shortest travel distance from each rural residential patch to the nearest educational facility. This measure is then used to determine the educational accessibility for all residential settlements within a county, followed by an average calculation to represent the county’s overall accessibility. The formula for this calculation is presented in formula (2). The shortest educational travel distance is operationalized using the nearest neighbor and network analysis tools in ArcGIS software by constructing origin–destination (OD) matrices. These matrices are applied to the network dataset to ascertain the travel-to-school distances, and the spatial distribution pattern is derived using the Manhattan method of nearest neighbor analysis.

$${D}_{k}=\frac{1}{m}\mathop{\sum }\limits_{i=1}^{m}{\min }_{j}^{n}{d}_{{ij}}$$
(2)

Here, \({D}_{k}\) signifies the educational accessibility of rural settlements in the kth county. \({d}_{{ij}}\) is the distance from the ith settlement to the jth educational facility, with \({\min }_{j}^{n}{d}_{{ij}}\) representing the distance to the nearest school. m is the total number of settlements in county k, and n is the total number of educational facilities.
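As a small worked example, the county-level measure can be computed from an origin-destination distance matrix. The sketch below takes the average as the sum of nearest-school distances divided by the number of settlements m, which is how the text describes the calculation; the function name and the distances are hypothetical, and in the study the distances come from network analysis rather than a hand-written matrix.

```python
from statistics import mean


def county_accessibility(dist):
    """Average, over settlements, of the distance to the nearest facility.

    dist[i][j] is the travel distance from settlement i to facility j.
    """
    if not dist or any(len(row) == 0 for row in dist):
        raise ValueError("need at least one settlement and one facility")
    return mean(min(row) for row in dist)


# Hypothetical OD matrix for one county (km): 3 settlements x 2 schools.
od_matrix = [
    [2.0, 5.0],
    [4.0, 3.0],
    [7.0, 10.0],
]
print(county_accessibility(od_matrix))  # (2 + 3 + 7) / 3 = 4.0
```

A smaller value of \({D}_{k}\) indicates shorter travel to school and therefore better accessibility.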

The impact of rural settlements on educational accessibility

Influencing factors

Rural educational accessibility is influenced by a multitude of interacting driving factors (Wei et al., 2018). In this study, we categorized these driving factors into three groups: rural settlement expansion factors, topographic and locational factors, and socioeconomic factors, encompassing a total of eleven variables.

The expansion of rural settlements exerts a multifaceted influence on educational accessibility, affecting not only the quantity of rural settlements but also their structural distribution and expansion pattern. On one hand, an increase in the number of rural settlements and the enlargement of their spatial footprint can lead to a reduction in the average distance between settlements and schools within a county. This, in turn, can enhance the accessibility of rural education by bringing educational facilities closer to the communities they serve (Geng et al., 2023). On the other hand, the pattern of rural settlement expansion is subject to the regional planning and layout of these settlements, which can manifest in various forms. These different patterns can have a range of impacts on the proximity to primary and secondary schools, thereby influencing educational accessibility in diverse ways. In light of these considerations, this paper selects two key variables to capture the volume and expansion pattern of rural settlements. The Area of Rural Settlements (ARS) serves as a proxy for the scale of rural settlement expansion, and the ratio of the patch area of the outlying growth pattern to the total area of newly added rural settlements (Out_ratio) represents the expansion pattern.
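To make the definition of Out_ratio concrete, the following sketch computes it for one county from a list of newly added patches. The function name, the input layout, and the patch areas are hypothetical, and returning 0 for a county without new settlement area is an assumption of the sketch.

```python
def out_ratio(new_patches):
    """Share of newly added settlement area that grew in the outlying pattern.

    new_patches: iterable of (expansion_type, area) pairs for one county.
    """
    patches = list(new_patches)
    total = sum(area for _, area in patches)
    if total == 0:
        # Assumption: a county with no new settlement area gets a ratio of 0.
        return 0.0
    outlying = sum(area for kind, area in patches if kind == "outlying")
    return outlying / total


# Hypothetical new patches in one county (areas in square kilometers).
county_patches = [("outlying", 1.5), ("edge", 0.5), ("outlying", 2.0)]
print(out_ratio(county_patches))  # 3.5 / 4.0 = 0.875
```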

Topography and location factors encompass DEM, distance to major urban areas (Dis_urban), distance to the administrative center (Dis_center), and the proximity to roads of various classifications, such as national highways and county roads. The distance to an urban area or to the administrative center reflects the potential disparities in educational accessibility that arise from the urban-rural divide. Settlements that are in close proximity to urban areas or administrative centers are likely to benefit from the spillover effects of better urban infrastructure, which can enhance educational accessibility. In contrast, more remote rural areas may face greater challenges due to the lack of such supportive infrastructure. The accessibility of transportation is also a critical factor measured by the distances to different levels of roads (Geng et al., 2023; Wei et al., 2018; Han et al., 2023). This study specifically examines the distance to national highways (Dis_CR) and county/village roads (Dis_VR). The Dis_CR is an indicator of vehicular accessibility, suggesting that settlements closer to national highways may enjoy more convenient vehicular traffic for commuting to educational facilities. Conversely, the Dis_VR reflects the suitability of the area for pedestrian access, with closer proximity to local roads indicating easier walking conditions for students and educators. The selection of these two variables is based on the premise that the smaller the values of Dis_CR and Dis_VR, the more favorable the transportation conditions, which can significantly improve the accessibility of educational facilities.

Socioeconomic development stands as a pivotal determinant of educational accessibility in rural areas, encompassing a spectrum of factors such as economic growth, industrial structure, and demographic characteristics (Yan et al., 2018; Sajjad et al., 2022; Yao et al., 2022). This paper delves into the influence of key variables, including Gross Domestic Product per capita growth rate from 2000 to 2020 (GDP), the growth rate of agricultural and secondary industrial output percentage of total GDP (1stGDP and 2ndGDP), and the permanent population growth rate from 2000 to 2020 (POP). While population density itself does not have a direct correlation with accessibility, it does have indirect implications. In rural areas with a substantial population, there is a correspondingly larger area of settlements, which may necessitate more educational institutions to serve the community, thus potentially enhancing educational accessibility. However, if the infrastructure in terms of school provision does not scale with population growth, or if settlements are dispersed without strategic school placement, this could lead to a situation where population density is inversely related to accessibility (Zhao and Barakat, 2015). GDP per capita is generally regarded as a facilitative variable for accessibility. In economically robust regions, the government’s fiscal capacity to invest in infrastructure, including roads and educational facilities, is higher. Such investments are instrumental in bolstering the region’s educational accessibility. In contrast, regions with a high percentage of 1stGDP may exhibit relatively undeveloped economies, which could be inversely related to accessibility. The impact of 2ndGDP is more nuanced and can vary. However, when considered alongside 1stGDP, it provides a comprehensive reflection of the region’s industrial structure. These variables are reliable controls for assessing the influence of economic and social development on educational accessibility (Zhao et al., 2018).

Table 2 presents the descriptive statistics for all variables. Because of the administrative division adjustments between 2000 and 2020, our statistical samples for rural settlement area and education accessibility consist of 2823 counties. Additionally, due to missing data in some yearbooks and manual inspection, the final effective sample size for other influencing factors is reduced to 1832 counties. Consequently, in the spatial analysis of rural settlement expansion and education accessibility in the results section, 2823 counties were analyzed, while the subsequent MGWR empirical analysis focused on 1832 counties. Notably, all factors examined in this study underwent rigorous collinearity testing to ensure the robustness of the statistical analysis and the validity of the findings.

Table 2 Descriptive statistics.

Multiscale geographically weighted regression (MGWR)

As previously noted, educational accessibility exhibits non-stationarity across time and space. This variation is largely attributable to differences in the spatial distribution of topography and resource endowments. These differences lead to significant disparities in the location and infrastructure development of rural settlements across regions (Fan and Zhang, 2004; González-González and Nogués, 2019). Moreover, the migration patterns of rural settlements exert varying degrees of influence and modes of action on educational accessibility. In addition, uneven population density and scale have resulted in substantial differences in the degree of educational equity development (Zhang and Rozelle, 2022). The level of economic development also significantly impacts policies and methods related to rural settlement relocation (Drucker, 2016). These factors collectively contribute to the spatial non-stationarity of educational accessibility.

Accurate identification of the influencing factors, their spatial domains, and scales of influence is a fundamental prerequisite for understanding the disparities in educational accessibility and for effectively improving educational equity in a location-specific manner. MGWR, proposed by Fotheringham et al. (2017), is particularly well-suited for this purpose. Unlike traditional multivariate regression methods, which fail to capture the spatial characteristics of data, MGWR can explore the spatial heterogeneity of different influencing factors by assigning a specific bandwidth to each variable. This capability makes it an ideal tool for examining the mechanisms underlying educational accessibility.

MGWR builds on geographically weighted regression (GWR), a spatial linear regression model that estimates regression coefficients by local weighted least squares and therefore allows the coefficients to vary over space. GWR addresses the limitations of traditional multivariate regression methods, which often fail to capture the spatial nuances of data. The model capitalizes on the variation in parameter estimates with respect to spatial geographic location to investigate the non-stationarity of spatial relationships. By integrating the spatial location of data directly into the regression parameters, GWR yields more realistic estimates of spatially varying relationships. The GWR model is mathematically expressed as:

$${y}_{i}=b({u}_{i},{v}_{i})+\mathop{\sum }\limits_{m=1}^{n}{a}_{m}({u}_{i},{v}_{i}){x}_{{im}}+{\partial }_{i}$$
(3)

In this equation, \({y}_{i}\) represents the dependent variable at the \(i{\rm{th}}\) sample point, where i indexes the sample points. \({x}_{{im}}\) (m = 1,2,…,n) denotes the independent variables; \(({u}_{i},{v}_{i})\) is the geographical location of the \(i{\rm{th}}\) sample point; \({a}_{m}({u}_{i},{v}_{i})\) (m = 1,2,…,n) is the regression coefficient for the \(m{\rm{th}}\) variable at the \(i{\rm{th}}\) sample point, so each coefficient is a function of geographic location. \(b({u}_{i},{v}_{i})\) is a constant term, often denoted as \({b}_{i}\), and \({a}_{m}({u}_{i},{v}_{i})\) is also written as \({a}_{{im}}\). \({\partial }_{i}\) is an independent error term that is normally distributed (\({\partial }_{i} \sim N(0,{\sigma }^{2})\)).
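The local weighted least squares idea behind Eq. (3) can be illustrated with a short NumPy sketch. It is a simplified illustration, not the calibration used in this study: it assumes a Gaussian distance-decay kernel with a single fixed bandwidth, uses hypothetical function names, and runs on synthetic data in which the slope drifts from west to east.

```python
import numpy as np


def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all points relative to point i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def gwr_fit(coords, X, y, bandwidth):
    """Weighted least squares fitted separately at every sample point.

    Returns an (n_points, 1 + n_vars) array: column 0 holds the local
    intercept b(u_i, v_i); the remaining columns hold a_m(u_i, v_i).
    """
    n_points = len(y)
    Xd = np.column_stack([np.ones(n_points), X])  # prepend intercept column
    betas = np.empty((n_points, Xd.shape[1]))
    for i in range(n_points):
        w = gaussian_weights(coords, i, bandwidth)
        XtW = Xd.T * w  # equivalent to Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas


# Synthetic data (hypothetical): one predictor with a spatially drifting slope.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=200)

betas = gwr_fit(coords, x, y, bandwidth=2.0)
# Correlation between estimated and true local slopes.
print(np.corrcoef(betas[:, 1], true_slope)[0, 1])
```

A smaller bandwidth makes the estimates more local but noisier; a larger one smooths them toward a single global coefficient.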

The spatial process effects of rural settlement expansion factors, topographic and locational factors, and socioeconomic factors on educational accessibility differ significantly. The development of various socioeconomic elements often corresponds to distinct spatial scales, and socioeconomic phenomena are frequently the result of multiple spatial processes operating at different scales. MGWR is adept at characterizing and elucidating these “cross-scale” dynamics, revealing how various factors influence the spatial scale of educational accessibility in rural settlements differently (Fotheringham et al., 2017; Song et al., 2021; Pramanik et al., 2022). MGWR allows each variable to have its own spatial smoothing level, with the specific bandwidth serving as an indicator of the spatial scale of each spatial process, thereby enhancing the robustness of spatial process simulation results. MGWR is an enhancement of the GWR model and is mathematically represented as:

$${y}_{i}=b({u}_{i},{v}_{i})+\mathop{\sum }\limits_{m=1}^{n}{a}_{b{wm}}({u}_{i},{v}_{i}){x}_{{im}}+{\partial }_{i}$$
(4)

Here, bwm signifies the bandwidth associated with the regression coefficient of the mth variable.
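The role of the variable-specific bandwidths in Eq. (4) can be illustrated with a simplified backfitting loop, in which each term is re-estimated in turn against the partial residual of the others. The sketch is not the calibration used in this study: it assumes a Gaussian kernel, takes the bandwidths as given instead of selecting them, starts from zero coefficients, and uses synthetic data. All function names are hypothetical.

```python
import numpy as np


def local_fit(coords, x, target, bandwidth):
    """Local coefficient a(u_i, v_i) of target ~ a * x at every point,
    estimated by Gaussian-weighted least squares without intercept."""
    coef = np.empty(len(target))
    for i in range(len(target)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        coef[i] = np.sum(w * x * target) / np.sum(w * x * x)
    return coef


def mgwr_backfit(coords, X, y, bandwidths, n_iter=50, tol=1e-6):
    """Backfitting with one fixed bandwidth per column of X.

    X must already contain a column of ones for the intercept.
    Returns local coefficients with the same shape as X.
    """
    n_points, n_terms = X.shape
    coefs = np.zeros((n_points, n_terms))
    for _ in range(n_iter):
        previous = coefs.copy()
        for m in range(n_terms):
            # Partial residual: y minus the contribution of all other terms.
            others = np.sum(coefs * X, axis=1) - coefs[:, m] * X[:, m]
            coefs[:, m] = local_fit(coords, X[:, m], y - others, bandwidths[m])
        if np.max(np.abs(coefs - previous)) < tol:
            break
    return coefs


# Synthetic data (hypothetical): constant intercept, spatially drifting slope.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(150, 2))
x1 = rng.normal(size=150)
slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + slope * x1 + rng.normal(scale=0.1, size=150)
X = np.column_stack([np.ones(150), x1])

# Wide bandwidth for the intercept (near-global), narrow one for the slope.
coefs = mgwr_backfit(coords, X, y, bandwidths=[50.0, 1.5])
print(coefs[:, 0].std(), np.corrcoef(coefs[:, 1], slope)[0, 1])
```

With a wide bandwidth the intercept is nearly constant across space, while the narrow bandwidth lets the slope follow the local drift; this is the sense in which each bandwidth indicates the spatial scale of its process.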

Results

Spatial distribution and diversified growth pattern of rural settlements

Spatial distribution of rural settlements and 20 years’ growth

Overall National Scale

In 2000, the landscape of rural settlements in China was characterized by a total area of 153,642.7 square kilometers, with an average county settlement area of 56.6 square kilometers. Notably, Nong’an County in Jilin Province and Shuyang County in Jiangsu Province stood out with expansive rural settlements, covering 618.9 and 603.9 square kilometers, respectively. As shown in Fig. 2a, the spatial distribution revealed a pronounced east-west gradient, with a higher concentration in the eastern regions. Particularly dense settlements were observed in the Northeast and the Huanghuaihai area, as well as in the southeastern regions, including the middle and lower reaches of the Yangtze River. Analyzing the longitudinal and latitudinal distribution, a concentration was noted between 100°E–130°E, peaking around 118°E, with a north-south bimodal structure evident near 40°N and 32°N.

Fig. 2: Distribution and evolution of rural settlements from 2000 to 2020.

a Area of rural settlements in 2000; b Area of rural settlements in 2020; c Area of newly added rural settlements from 2000 to 2020.

By 2020, the national settlement area had expanded to 198,109.2 square kilometers, with the average county settlement area increasing to 71.3 square kilometers. A significant number of counties, 78 in total, exhibited rural settlement areas exceeding 300 square kilometers, with the Inner Mongolia Autonomous Region and Anhui Province leading the count. Counties with settlement areas ranging from 100 to 300 square kilometers were predominantly found in northern regions such as Shandong, Henan, Hebei, Heilongjiang, and Inner Mongolia, alongside Jiangsu and Anhui provinces. In Fig. 2b, the spatial distribution in 2020 showed an emerging pattern of increased concentration in the northeast and a relative scarcity in the southwest. The longitudinal concentration persisted between 100°E–130°E, with enhanced density between 115–125°E. The latitudinal distribution now featured three distinct peaks near 40°, 38°, and 32.5°N.

The period from 2000 to 2020 marked two decades of swift rural development in China. The study area witnessed an addition of 44,417.2 square kilometers to the rural settlement area, comprising 22.45% of the total settlement area. These increases were concentrated in the North China Plain, the eastern coastal regions, and the Xinjiang Autonomous Region in the northwest (Fig. 2c). The growth was most substantial in the previously concentrated settlements between 110°E–130°E, with the majority of the expansion occurring between latitudes 28–42°N.

Natural and Economic Zoning Scale

At the scale of economic zones, the rural settlement area of the Middle Yellow River Economic Zone is the most extensive, followed by the Northern Coastal and Northeastern economic zones, both of which exceed 30,000 square kilometers (Table 3). The Middle Yangtze River, Eastern Coastal, and Northwest regions each have areas exceeding 10,000 square kilometers, while the Southwest and Southern Coastal regions have the least settlement area. On average, there is a pronounced tiered distribution among the eight economic regions. The Northern Coastal, Northeastern, Eastern Coastal, and Middle Yellow River economic zones form the first tier, with the average rural settlement area per county in these regions exceeding 40 square kilometers. In contrast, the remaining regions show an average settlement area of less than 23 square kilometers per county. Notably, Nong’an County in Jilin Province, located in the Northeastern Economic Zone, stands out with the largest area of rural settlements. Variance analysis reveals significant disparities in settlement areas among counties within the Northeastern and Northern Coastal economic zones, indicating a high degree of variation. The Middle Yellow River Economic Zone also shows considerable variance, suggesting uneven distribution of settlement areas. In contrast, the Northern Coastal and Middle Yangtze River regions, despite their large overall settlement areas, exhibit a more balanced distribution per county. Furthermore, the Northwest, Southwest, and Southern Coastal economic zones demonstrate lower heterogeneity, indicating a more uniform distribution of rural settlements across counties in these regions.

Table 3 Statistics of natural and economic regions.

Provincial Scale

In the year 2000, as illustrated in Fig. 3, the province with the largest expanse of rural settlements was the Inner Mongolia Autonomous Region, boasting an area of 17,077.7 square kilometers. It was closely followed by Henan Province, with a settlement area of 15,501.1 square kilometers. Additionally, Shandong, Anhui, Heilongjiang, Jiangsu, and Hebei provinces had rural settlement areas exceeding the 10,000 square kilometer threshold. In stark contrast, the Tibet Autonomous Region had the smallest area, with less than 10 square kilometers of rural settlements. Other provinces and municipalities such as Chongqing, Guizhou, Hainan, Shanghai, Qinghai, and Tianjin had rural settlement areas below 1,000 square kilometers.

Fig. 3

Rural settlements and their proportion at the provincial level from 2000 to 2020.

By 2020, Shandong and Henan provinces had risen to the top of the list, with rural settlement areas surpassing 20,000 square kilometers each. The Inner Mongolia Autonomous Region, Hebei, Jiangsu, Anhui, Heilongjiang, and Liaoning each had rural settlement areas over 10,000 square kilometers. Although it still had the smallest area, the Tibet Autonomous Region experienced a remarkable expansion, with its rural settlements growing to 149.5 square kilometers, a substantial increase from the 2000 figure. The number of provinces with rural settlement areas under 1,000 square kilometers decreased to five.

Of the newly added settlement areas between 2000 and 2020, 58.8% were concentrated in the aforementioned provinces with rural settlement areas larger than 10,000 square kilometers in 2020. Furthermore, several provinces have shown notable growth in their rural settlement areas, including the Tibet Autonomous Region, Xinjiang, Chongqing, Beijing, and Shanghai, where the newly added rural settlements account for more than 40% of their total area. This indicates that over the past two decades, rural development in these provinces and cities has been relatively swift, highlighting the importance of rural development within the broader context of urban and rural progress.

Differentiated growth patterns of rural settlements

Rural settlements across the country exhibit pronounced spatial disparities in both the extent of growth and the dynamics of their expansion. As depicted in Fig. 4a, provinces in the Northeast, along the Eastern coast, and Xinjiang are predominantly marked by outlying growth, with 13 provinces witnessing a 100 percent outlying expansion in their newly added settlements. In the Southwest, particularly in Tibet, Qinghai, and Yunnan, outlying growth remains the dominant trend, indicative of a strategic reconfiguration of rural settlements in harmony with the local ecology and in response to policies advocating for rural revitalization and residential relocation initiatives. In the Central and Southern provinces, such as Ningxia, Hubei, Guangxi, Guangdong, and Fujian, edge development emerges as the prevalent growth pattern. This involves expansion along the periphery of existing settlements, leading to the formation of larger settlements. These provinces boast a robust foundation for rural settlement development and possess favorable conditions for scaling up their development efforts. It is noteworthy that the infilling growth mode is exceedingly rare in the expansion of rural settlements. This scarcity is attributed to the typical formation process of rural settlements, where a concentrated group of residents, under the influence of the natural environment, establishes discrete villages. The scenario of an initial hollow formation that subsequently gets filled in is notably uncommon.

Fig. 4: Differentiated growth mode of NARS.

a Area and growth mode of newly added rural settlements in provinces. b Overall Growth Landscape Expansion Index of NARS in counties.

The overall Growth Landscape Expansion Index (GLEI), as shown in Fig. 4b, substantiates these findings. The highest GLEI value is recorded in Leye County, Guangxi Zhuang Autonomous Region, at 45.4. A total of 282 counties and cities have a GLEI above 34.6, indicating a relatively high value compared with other counties. However, the GLEI for all counties in China remains below the threshold of 50, underscoring that the expansion types are predominantly non-infilling in nature. This finding further reinforces the notion that the trajectory of rural settlement expansion diverges from the agglomerative and compact development models that are typical of urban landscapes.
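The threshold of 50 can be read against the usual patch-level definition of a landscape expansion index. The sketch below assumes the commonly used form LEI = 100 × A_o / (A_o + A_v), where A_o is the part of a new patch's buffer zone that overlaps existing settlement and A_v is the part that falls on vacant land, with LEI = 0 classed as outlying, 0 < LEI ≤ 50 as edge expansion, and LEI > 50 as infilling. The formula, the thresholds, and the function names are assumptions for illustration; the GLEI used in this paper is defined in its methods section and may differ in detail.

```python
def landscape_expansion_index(overlap_area: float, vacant_area: float) -> float:
    """LEI = 100 * A_o / (A_o + A_v) for the buffer zone of one newly added patch."""
    total = overlap_area + vacant_area
    if total <= 0:
        raise ValueError("buffer zone must have positive area")
    return 100.0 * overlap_area / total


def growth_mode(lei: float) -> str:
    """Classify a patch by LEI using the assumed 0 and 50 thresholds."""
    if not 0.0 <= lei <= 100.0:
        raise ValueError("LEI must lie in [0, 100]")
    if lei == 0.0:
        return "outlying"   # buffer touches no existing settlement
    if lei <= 50.0:
        return "edge"       # at most half of the buffer is existing settlement
    return "infilling"      # most of the buffer is existing settlement


# A patch whose buffer is entirely vacant land is outlying.
isolated_patch_mode = growth_mode(landscape_expansion_index(0.0, 10.0))
```

Under these assumed thresholds, an index value such as the reported maximum of 45.4 would fall in the edge-expansion range rather than the infilling range.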

Measurement of rural education accessibility over the past 20 years

The spatial distribution of education accessibility, as depicted in Fig. 5a, b, exhibits distinct characteristics when compared to the distribution of rural settlements. A consistent trend is observable from 2000 to 2020, with the Hu-line serving as a significant watershed. West of the Hu-line, education accessibility is generally poor: in many counties the shortest distance from rural settlements to primary and secondary schools exceeds 10 km, and in some cases 20 km, indicating severely limited educational accessibility for rural residents. Despite this, there are exceptions where accessibility is notably good, such as in Dunhuang City, Gansu Province, Taxkorgan Tajik Autonomous County, Xinjiang Uygur Autonomous Region, and certain counties in the Tibet Autonomous Region, including Gegye, Cona, Lhuntse, and Leiwuqi, as well as Chengguan District in Lhasa City. In these areas, characterized by harsh natural environments and sparse populations, settlements are highly concentrated, and small schools are present within settlements, resulting in an average distance to schools of less than 2 km and good educational accessibility.
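The accessibility values discussed here are distances from rural settlements to the nearest primary or secondary school, summarized per county. A minimal sketch of such a calculation follows, assuming settlements and schools are represented as points in a projected coordinate system with units of metres and that straight-line distance is used. The function name and the toy coordinates are hypothetical, and the paper's own procedure (for example, the handling of patch geometry) may differ.

```python
import numpy as np


def mean_nearest_school_km(settlements_m: np.ndarray, schools_m: np.ndarray) -> float:
    """Mean straight-line distance (km) from each settlement to its nearest school.

    Both inputs are (n, 2) arrays of projected x/y coordinates in metres.
    """
    # Pairwise coordinate differences, shape (n_settlements, n_schools, 2).
    diff = settlements_m[:, None, :] - schools_m[None, :, :]
    dist_m = np.sqrt((diff ** 2).sum(axis=2))
    # Nearest school for each settlement, then the county-level average.
    return float(dist_m.min(axis=1).mean() / 1000.0)


# Toy county: three settlements and two schools (coordinates are made up).
settlements = np.array([[0.0, 0.0], [3000.0, 4000.0], [10000.0, 0.0]])
schools = np.array([[0.0, 0.0], [10000.0, 3000.0]])
county_mean_km = mean_nearest_school_km(settlements, schools)
```

In the toy county the nearest-school distances are 0 km, 5 km, and 3 km, so the county mean is 8/3 km. The brute-force distance matrix is adequate for a sketch; a national dataset would normally call for a spatial index.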

Fig. 5: Rural educational accessibility changes from 2000 to 2020.

a Accessibility in 2000; b accessibility in 2020; c accessibility changes from 2000 to 2010.

In contrast, east of the Hu-line, the scenario is more favorable, with numerous coastal areas in Guangdong, Shanghai, and Fujian having rural educational accessibility of less than 2 km, and the majority of counties and cities within a 5 km radius. However, in a few counties in the northeast and southwest regions close to the Hu-line, educational accessibility exceeds 5 km, mainly in Heilongjiang Province, Inner Mongolia Autonomous Region, Sichuan Province, and Yunnan Province. Collectively, these findings indicate that the spatial distribution of educational accessibility in rural areas is highly uneven, underscoring the imperative for equitable access to compulsory education, at least in terms of spatial distribution.

A comparative analysis of Fig. 5a, b reveals that while the general characteristics in 2000 and 2020 are similar, localized changes have occurred. For instance, in certain counties within the North China and Southwest China regions, the distance between residential areas and schools has notably increased, leading to diminished accessibility. Overall, educational accessibility deteriorated in 1678 counties and cities, improved in 1086, and remained unchanged in 110. Among the counties that improved, 94 saw the distance to schools decrease by more than 1 km, while in the other 992 the improvement was marginal, at less than 1 km. Conversely, the distance increased by more than 1 km in 125 counties and by less than 1 km in 1553 counties and cities.
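The five change classes above can be reproduced with a simple rule on the difference between the 2020 and 2000 distances. The following sketch assumes a 1 km threshold as described in the text, places a change of exactly 1 km in the marginal class, and treats only an exactly zero difference as unchanged; those boundary choices and the names used are assumptions. It also checks that the reported counts are internally consistent.

```python
from collections import Counter


def change_class(dist_2000_km: float, dist_2020_km: float, threshold_km: float = 1.0) -> str:
    """Label a county by the change in its distance to school between the two years."""
    delta = dist_2020_km - dist_2000_km  # positive means farther from school, i.e. worse
    if delta == 0:
        return "unchanged"
    if delta < 0:
        return "improved >1 km" if -delta > threshold_km else "improved <=1 km"
    return "deteriorated >1 km" if delta > threshold_km else "deteriorated <=1 km"


# Toy distances (km) for four hypothetical counties: (2000, 2020).
toy_pairs = [(3.0, 1.5), (2.0, 2.4), (2.0, 2.0), (4.0, 6.5)]
toy_tally = Counter(change_class(a, b) for a, b in toy_pairs)

# Counts reported in the text, used only to check that they add up.
reported = {
    "improved >1 km": 94,
    "improved <=1 km": 992,
    "deteriorated >1 km": 125,
    "deteriorated <=1 km": 1553,
    "unchanged": 110,
}
improved_total = reported["improved >1 km"] + reported["improved <=1 km"]
deteriorated_total = reported["deteriorated >1 km"] + reported["deteriorated <=1 km"]
all_counties = sum(reported.values())
```

The subtotals match the figures in the text (1086 improved, 1678 deteriorated), and the five classes together cover 2874 counties and cities.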

Provincial statistics, as delineated in Fig. 5c, suggest that the majority of provinces have experienced a decline in accessibility, with 18 provinces showing a greater number of counties with downgraded accessibility compared to those with improved conditions, predominantly in the eastern region. The exceptions are the Inner Mongolia Autonomous Region, Fujian Province, Hubei Province, Guangxi Zhuang Autonomous Region, Hainan Province, Guizhou Province, Shaanxi Province, Gansu Province, and Qinghai Province, where the number of counties with enhanced accessibility surpasses those with declines. The Tibet Autonomous Region has exhibited minimal change.

Impact of rural settlement changes on educational accessibility

Comparison of estimation results of OLS, GWR, and MGWR models

The impact of rural settlement changes on educational accessibility is evaluated using three distinct modeling approaches: Ordinary Least Squares (OLS), Geographically Weighted Regression (GWR), and Multiscale Geographically Weighted Regression (MGWR). A comparative analysis of these models is presented in Table 4, highlighting key statistical indicators. Notably, the MGWR model surpasses both the OLS and GWR models in terms of the coefficient of determination (R2) and the adjusted coefficient of determination (Adj. R2), while also yielding a lower corrected Akaike Information Criterion (AICc). The MGWR model also has a more efficient parameterization and a smaller residual sum of squares, so its fitted values are closer to the observed values.
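For readers who want to see how such indicators relate to the residuals, the sketch below computes R2, adjusted R2, the residual sum of squares, and AICc for any fitted model. It assumes the generic Gaussian form AIC = n ln(RSS/n) + 2k with the small-sample correction 2k(k+1)/(n − k − 1), where k counts the estimated coefficients including the intercept. GWR and MGWR software typically replaces k with an effective number of parameters derived from the hat matrix and may use a related but different AICc expression, so this is an illustration rather than the computation behind Table 4. The data are synthetic.

```python
import numpy as np


def fit_metrics(y: np.ndarray, y_hat: np.ndarray, k: float) -> dict:
    """R2, adjusted R2, RSS and AICc for a fitted model with k estimated coefficients."""
    n = y.size
    rss = float(((y - y_hat) ** 2).sum())      # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    aic = n * np.log(rss / n) + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    return {"R2": r2, "Adj_R2": adj_r2, "RSS": rss, "AICc": float(aicc)}


# Synthetic example: a linear trend with alternating +1 / -1 noise.
x = np.arange(10, dtype=float)
y = 2.0 * x + np.where(np.arange(10) % 2 == 0, 1.0, -1.0)

# Model A: intercept only. Model B: straight line fitted by least squares.
null_metrics = fit_metrics(y, np.full_like(y, y.mean()), k=1)
line_metrics = fit_metrics(y, np.polyval(np.polyfit(x, y, 1), x), k=2)
```

A model is preferred when it has a higher adjusted R2 and a lower AICc; here the straight line wins on both, which mirrors the kind of comparison summarized in Table 4.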

Table 4 Parameters of OLS, GWR, and MGWR models.

Scale analysis based on MGWR model

Table 5 illustrates the capability of the MGWR model to directly capture variable-specific operational scales, a feature not available in the classic GWR model, which applies a single bandwidth to all variables. The classic GWR model’s bandwidth is 151, representing 8.24% of the total sample size. The MGWR model reveals significant variability in the operational scales of different variables.

Table 5 Bandwidths of GWR and MGWR models.

Specifically, the bandwidths of the variables Out_ratio and Dis_center are 44 and 43, respectively, each accounting for about 2.40% of the total sample size. This scale is notably smaller than those of the other variables. Given China’s land area of approximately 9.6 million square kilometers, 2.40% equates to an area of 230,400 square kilometers. This scale is larger than the area of most provinces located to the southeast of the Hu-line, whereas the average provincial area to the northwest of the Hu-line is approximately 803,200 square kilometers, roughly 3–4 times larger. It is observed that once the scale exceeds 230,400 square kilometers, the variable coefficients undergo substantial changes. The bandwidths of the variables Dis_VR, 2ndGDP, POP, Dis_CR and 1stGDP are 65, 98, 115, 142, and 169, respectively, indicating an operational scope roughly 1.5–4 times that of the first two variables. The bandwidths of the variables DEM, GDP, and Dis_urban exceed 300, suggesting a broad effect scale and highlighting the presence of spatial heterogeneity across the country. The bandwidth of the ARS variable is 1794, indicating a global scale effect, which implies minimal spatial heterogeneity.
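The percentages and the nominal area quoted above follow from simple arithmetic, sketched below. The sample size is not stated in this section; the value of 1833 counties is inferred here from the statement that a bandwidth of 151 is 8.24% of the sample, and should be read as an assumption. The bandwidth values are those quoted in the text.

```python
N_COUNTIES = 1833            # assumption: inferred from 151 neighbours = 8.24% of the sample
CHINA_AREA_KM2 = 9_600_000   # approximate land area quoted in the text

bandwidths = {
    "GWR (all variables)": 151,
    "Out_ratio": 44,
    "Dis_center": 43,
    "Dis_VR": 65,
    "2ndGDP": 98,
    "POP": 115,
    "Dis_CR": 142,
    "1stGDP": 169,
    "ARS": 1794,
}


def share_percent(bandwidth: int, n: int = N_COUNTIES) -> float:
    """Bandwidth (number of nearest neighbours) as a percentage of the sample."""
    return round(100.0 * bandwidth / n, 2)


def nominal_area_km2(percent: float, total_area: float = CHINA_AREA_KM2) -> int:
    """Area corresponding to a given share of the national land area."""
    return round(total_area * percent / 100.0)


shares = {name: share_percent(bw) for name, bw in bandwidths.items()}
# Each bandwidth relative to the Out_ratio bandwidth of 44.
ratio_to_out_ratio = {name: round(bw / 44, 2) for name, bw in bandwidths.items()}
area_at_2_4_percent = nominal_area_km2(2.40)
```

With the assumed sample size, the ARS bandwidth of 1794 corresponds to almost 98% of the sample, which is why it is read as a global effect, and the mid-range bandwidths of 65 to 169 are roughly 1.5 to 3.8 times the Out_ratio bandwidth.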

Spatial pattern analysis of coefficients based on MGWR

Statistical descriptions of MGWR coefficients are detailed in Table 6, and Fig. 6 shows the spatial distribution of coefficients. The coefficient statistics reveal that ARS exerts a significant and negative impact on the distance to educational institutions, thereby enhancing regional educational accessibility. As Fig. 6 shows, this impact is predominantly observed in the northeast region, Inner Mongolia and the northern part of Heilongjiang province. The ARS coefficients vary from −0.026 to −0.012, with an average value of −0.02 and a standard deviation of 0.002. This range indicates that for every unit increase in the area of rural settlements from 2000 to 2020, the distance to schools decreases by 0.012 to 0.026 units. This finding underscores the pivotal role that the establishment of new rural settlements plays in optimizing educational accessibility within rural regions. Especially in the northeastern region, where educational accessibility is already lagging behind the national average, the strategic expansion of rural settlements emerges as a potent measure for improvement.

Table 6 Variable statistics of the MGWR model.
Fig. 6

Significance results of the MGWR model.

The form of new rural settlements, quantified by the Out_ratio variable, plays a critical role in educational accessibility. Global regression analysis indicates that Out_ratio positively affects the average distance between settlements and schools, signifying an inhibitory effect on educational accessibility. Spatially differentiated coefficient results, as illustrated in Fig. 6, reveal a strong inhibitory effect in certain regions, with coefficients ranging from −1.391 to 1.143, indicating variable regional impacts. In eastern Inner Mongolia, Huma and Tahe counties in northern Heilongjiang, areas south of Lanzhou in Gansu Province, and counties north of Sichuan Province, such as Mianyang and Guangyuan, Out_ratio shows a significant positive correlation, suggesting a detrimental effect on educational accessibility in these regions. However, in two distinct clusters—Taiyuan City in Shanxi and surrounding counties, and a cluster stretching from Alxa League in western Inner Mongolia to Ganzi Tibetan Autonomous Prefecture in Sichuan—the increase in Out_ratio promotes educational accessibility. Of the counties analyzed, 117 show a promoting effect, while 84 show an inhibitory effect, with the inhibitory counties having larger coefficient absolute values, leading to a net global inhibitory effect.

DEM shows a positive correlation in all but 19 samples, with coefficients ranging from 0 to 0.5 and an average of 0.334. This indicates that higher altitudes are associated with longer distances to educational facilities and poorer accessibility, highlighting the need for focused improvements in high-altitude areas. Dis_urban is positively correlated with distance to education facilities, particularly in the southwest and northeast regions, reflecting the urban-rural divide in educational accessibility. The northeast region includes most counties in Heilongjiang and Jilin Provinces, parts of the Inner Mongolia Autonomous Region, and northern Hebei. The southwest region comprises most counties in Yunnan Province and southern Sichuan. Dis_center is also positively correlated, mainly distributed northwest of the Hu-line, where greater distances from county centers equate to worse educational accessibility. This is attributed to the vast land, sparse population, and challenging terrain in the northwest, resulting in larger county areas with relatively poor infrastructure and educational accessibility.

Dis_CR and Dis_VR show spatial differences in their impact on educational accessibility. Dis_CR is positively correlated in the eastern part of Inner Mongolia, suggesting that increasing national highway density could reduce educational travel distances. In some Qinghai and Gansu counties, a negative correlation indicates that county or rural roads are the main influence on educational access distance. Dis_VR exhibits positive correlations in 204 counties and negative correlations in 115, with the positively correlated counties spread across various regions, including Inner Mongolia, Ningxia Hui Autonomous Region, Gansu, Qinghai, northern Yunnan, and eastern Xinjiang. Negative correlations are most prevalent in Xinjiang and the eastern part of the Inner Mongolia Autonomous Region.

GDP significantly and negatively impacts the distance to schools in both global and local regression analyses, implying a positive influence on educational accessibility. The coefficient of GDP ranges from −0.365 to −0.008, with an average of −0.068, indicating that a unit increase in GDP from 2000 to 2020 is associated with a reduction in school distance of 0.008 to 0.365 units. In the MGWR model, 137 counties exhibit a significant negative correlation, primarily in the northeastern provinces of China, including Heilongjiang, Jilin, Liaoning, and the eastern part of the Inner Mongolia Autonomous Region. In these areas, economic development constraints directly affect educational accessibility levels, especially in impoverished regions. However, as regional economies develop, the distance to schools for residents is more easily reduced, suggesting that economic development can be leveraged to enhance educational accessibility. POP has a positive correlation in 44 counties, located mainly in Qinghai Province, with some in the northwest of Sichuan Province and 5 in the Inner Mongolia Autonomous Region. In these regions, higher population density correlates with worse educational accessibility. Conversely, in 6 counties (Mohe, Tahe, and Huma in Heilongjiang Province, and Oroqen Autonomous Banner, Genhe City, and Ergun City in Inner Mongolia), population growth can enhance educational accessibility, likely due to increased attention and resources for smaller populations.

1stGDP in the local economy shows a coefficient range of −0.076 to 0.344 in global regression, with an average of 0.036, indicating a predominantly positive correlation. This suggests that a higher proportion of 1stGDP inhibits the improvement of educational accessibility. MGWR local regression results identify 311 counties with a significant positive correlation, including all sample counties in Heilongjiang, Jilin, and Xinjiang Uygur Autonomous Region, most counties in Qinghai, and select counties in Gansu, Inner Mongolia, and Yunnan’s ethnic minority areas. In these predominantly agricultural counties, the larger rural population, lower infrastructure development compared to urban areas, and a relative scarcity of educational facilities collectively reduce educational accessibility. This finding underscores the link between regional development levels, industrial composition, and educational access. 2ndGDP has a positive correlation in 214 counties, predominantly in the Xinjiang Uygur Autonomous Region, Inner Mongolia Autonomous Region, Qinghai Province, and Gansu Province, as well as a few counties in Sichuan Province adjacent to Qinghai. Notably, counties with a 2ndGDP coefficient greater than 0.5 are found in eastern Qinghai, central Gansu, and specific county clusters in Inner Mongolia. In these counties, each unit increase in the proportion of secondary industry from 2000 to 2020 is associated with an increase of more than 0.5 units in the distance to schools, indicating a significant impact on educational accessibility in these regions.

Discussion

It is usually assumed that the reconstruction of rural settlements improves the welfare of rural people and makes the provision of public goods, such as education and healthcare, more effective (Yin et al., 2023). However, the divergence between policy design and implementation has been infrequently explored (Zhao and Barakat, 2015), and previous studies have highlighted negative impacts such as environmental degradation and food security concerns, which may overshadow new accessibility issues. This study addresses that gap by examining how changes in rural settlements affect educational accessibility.

This study delves into the intricacies of rural education accessibility based on patch-scale data. While previous research has identified regional disparities and issues in education accessibility at the county level, it has often overlooked the nuances of accessibility within rural communities, where issues of equity are particularly pronounced. In terms of rural settlement expansion patterns, our results are consistent with those of Rao et al. (2025), who also conducted their study on a national scale. Specifically, both studies find that after 2000, rural settlements primarily expanded in an outlying pattern. Regarding educational accessibility, the magnitude of our findings aligns with those of Gao et al. (2016). However, while their results measured comprehensive accessibility for both rural and urban residential areas, our study reveals that the educational accessibility problem in rural residential areas is significantly more severe. Additionally, we find that the spatial differences in educational accessibility development levels across the eastern, central, and western regions are relatively similar. For instance, Gao et al. (2016) reported an average proximity to educational facilities within 1 km for several provinces, including Henan, Anhui, Jiangxi, Guangdong, and Hainan. In stark contrast, this study’s findings, which are specific to rural areas, indicate that no province has an average accessibility distance of less than 1 km. Among these provinces, only Guangdong has rural education accessibility within 2 km; in the others it exceeds 2 km, as illustrated in Fig. 7. Nationally, the four municipalities (Tianjin, Shanghai, Beijing, and Chongqing) and the two provinces of Guangdong and Zhejiang exhibit the most favorable rural education accessibility, with distances typically under 2 km. Conversely, the northwestern provinces of Qinghai, Tibet, Inner Mongolia, and Xinjiang present the direst situation, with average distances from residences to schools exceeding 10 km, significantly impeding access to education.

Fig. 7

Ranking of rural education accessibility in provinces.

However, when urban residential areas are included in the calculation, the average accessibility distance of Qinghai Province is within 10 km, much smaller than the 16.05 km average for its rural residential areas. Similarly, Xinjiang, Gansu, and Heilongjiang Provinces show average accessibility distances within 5 km, markedly better than the figures for their rural residential areas. The findings underscore that rural education accessibility is a critical area where equity in service coverage is lacking, necessitating increased policy focus and strategic planning to bridge the urban-rural gap.

Conclusions and policy recommendations

Equity in compulsory education is a critical social and political issue, integral to state governance and economic development (Walker and Clark, 2010; Kučerová et al., 2015). The accessibility of rural education is a cornerstone of urban-rural equity and rural sustainable development, serving as a vital foundation for guiding spatial policies, planning, and management (Sajjad et al., 2022). This paper explores the spatial evolution of rural settlement development patterns and their impact on rural education accessibility, aiming to inform strategies that can bridge the urban-rural development gap and galvanize greater attention to rural development.

Drawing on the expansion characteristics of rural settlements in China over the past two decades, this study examines their influence on rural education accessibility. The key findings are as follows: (1) Over the past 20 years, there has been a significant increase in the area of newly added rural settlements, with outlying expansion emerging as the predominant growth pattern. (2) The spatial distribution of rural educational accessibility is delineated by the Hu-line. Between 2000 and 2020, more counties experienced a decline in accessibility than saw improvements. (3) The total area of rural settlements in counties has a positive effect on rural educational accessibility. The development of rural residential areas in mainland China over the past 20 years has been conducive to enhancing rural educational accessibility. (4) Outlying growth impacts accessibility in a spatially differentiated manner, with a notable inhibitory effect observed in some northeastern and northwestern counties and cities. Addressing the urban-rural divide and advancing social equity are global challenges that demand innovative solutions. This paper offers theoretical and empirical insights that can inform rural development strategies in developing countries, contributing to a broader understanding of the complexities involved in achieving equitable access to education.

According to these empirical findings, the paper recommends that (1) planners should consider the drawbacks of outlying expansion in light of diminished education accessibility. While increasing the area of rural settlements can enhance educational accessibility, specific regions like Northeast China, Gansu, and Sichuan require careful consideration of outlying expansion strategies. Such expansion involves the complete relocation of settlements or the development of new ones, which should be planned with the location and service scope of educational facilities in mind to prevent diminished accessibility and social issues like “hollow villages” (Sun et al., 2011; Wang et al., 2021). (2) Economically less developed countries could enhance educational accessibility by providing supporting infrastructure when relocating settlements. The construction and site selection of rural settlements should account for a variety of factors. In areas with a significant proportion of primary and secondary industries, educational accessibility tends to be poor. Planners should consider the construction of national and county roads around new settlements to select optimal locations or improve road density.

This study has some limitations. First, the analysis is based on cross-sectional data from 2000 and 2020, which, while informative, may lead to biased estimation results due to omitted variables. Panel data, offering more information, can mitigate this through fixed-effect or random-effect models, thus enhancing model estimation efficiency and empirical persuasiveness. Second, the data currently available for this study are constrained, limiting the use of models like the MGTWR for more nuanced spatiotemporal exploration. Future research with more extensive data could leverage such models for a deeper analysis. Additionally, the comparison data on urban and rural educational accessibility from previous studies used in this paper’s discussion section date back to 2015, which may introduce certain inaccuracies when compared with the 2020 rural educational accessibility data presented in this paper.