Introduction

According to statistics from the World Bank for 2022, the global population reached 7.837 billion by the end of 2021, with 43.4% residing in rural areas. High-income countries have 18.5% of their population living in rural areas, while the majority of rural populations are found in middle-income, low-middle-income, and low-income countries, particularly in low-income countries where rural residents constitute 65.8% of the population. Countries such as India, Cambodia, Myanmar, Laos, Pakistan, Bangladesh, Sri Lanka, and Vietnam have a rural population exceeding 60%. Moreover, the global poverty rate in rural areas has reached 17% (compared to 7% in urban areas), with rural communities accounting for 70% of the world’s extreme poverty, according to the Rural Culture Committee of China Cultural Management Association in 2024. The persistent disparity between urban and rural development is largely attributed to the long-standing “dual economy” in underdeveloped regions and inadequate public services, including education, infrastructure, healthcare, and sanitation in most rural areas.

Education, as a public or quasi-public good, serves as a fundamental driver of human progress and social development. Country-specific experiences underscore the importance of balanced educational development in enhancing national comprehensive strength and fostering sustainable social progress (Yuan et al. 2020). Enhancing national quality and ensuring social equity through balanced primary education has become a universal policy focus for governments worldwide (Wang, 2018). The United Nations, in its 2030 Education Action Framework, emphasizes “developing fair and high-quality education” as a core educational goal, considering the significant urban–rural development imbalance globally, particularly in underdeveloped regions (UNESDOC, 2015). The European Commission has similarly prioritized the efficiency and equity of education systems (EU Commission, 2006). In India, the role of high-quality primary education is recognized as vital for unlocking economic potential, particularly in rural areas. Effective primary education for rural communities lays a stable foundation for higher education, skill acquisition, and employment opportunities. To this end, India advocates for corporate investment in social welfare initiatives aimed at improving rural education, reducing poverty, and supporting sustained economic growth (Ojha and Chakravarty, 2012; Shukla, 2020).

Renowned economist Theodore W. Schultz, in his book “Transforming Traditional Agriculture”, emphasized that “neglecting primary and secondary education is very short-sighted” (Schultz, 1987). Similarly, Nobel laureate Douglass North argued that, throughout history, institutional incentives for investing in productive knowledge have been lacking, with resources often misdirected toward higher education instead of primary education (North, 1990). According to human capital theory, new structural economic theory, and the comparative advantage strategy of developing countries, primary education should be prioritized. This approach suggests that governments should assume greater responsibility for investing in primary and compulsory education to ensure equity and fairness. By adopting such a strategy, developing countries can alleviate income inequality, support economic development, and accelerate their progress toward parity with developed nations. Conversely, failure to prioritize primary education may hinder these countries’ ability to transition into the middle-income bracket and leave them at risk of falling into the middle-income trap (Krueger and Lindahl, 2001; Lin, 2008; Cao and Li, 2018).

China has long prioritized primary education, as evidenced by its efforts to make compulsory education widely accessible, achieve balanced development, and pursue high-quality and balanced compulsory education. These actions reflect the country’s strategic reforms and adjustments. By the end of 2021, 2895 county-level administrative units in China had achieved national recognition of basic educational balance (Meng and Yao, 2022). Despite efforts to enhance the urban–rural education balance through measures such as increased funding for rural compulsory education, improved rural school infrastructure, the “two exemptions and one subsidy” policy, and better compensation and training for rural teachers, an imbalance between urban and rural areas persists. Approximately 500 million Chinese people still live in rural areas, and urbanization lags behind industrialized nations by 72.6% (Cai et al. 2019; Tang et al. 2022). Contributing factors include the entrenched dual economic system, rural-to-urban migration, unequal economic and social development, and imbalanced educational resource allocation. Consequently, rural schools, especially primary schools, are often under-resourced yet underpopulated, while urban primary schools are well-resourced but overcrowded (Xie, 2014). These issues hinder China’s efforts toward rural revitalization and urban–rural integration.

Current research on the development gap between urban and rural education follows two approaches. The first is qualitative, covering urban and rural education integration (Zheng et al. 2023; Tang et al. 2023; Lei and Xie, 2023), education equity (Jia, 2023), high-quality and balanced rural compulsory education development (Lu et al. 2023), digital transformation of rural basic education (Hu et al. 2023), and rural primary education system evaluation (Singh and Parashar, 2024). The second is quantitative, examining resource allocation in urban and rural primary education (Zheng and Yue, 2017; Feng and Zhang, 2018), migrants’ primary education (Zhang et al., 2015; Zhang, 2017), and the role of compulsory education in poverty reduction (Li et al. 2019; Li, 2019; Zuo et al. 2023). Although various studies indicate that primary education in urban and rural China remains imbalanced, many suggest that it will become more balanced and eventually converge over time (Han and Li, 2013; Li and Luo, 2015; Yu and Wang, 2022; Xu and Liu, 2023). Educational forecasting has focused on estimating the extent of development (Xu and Liu, 2023) and predicting academic performance (Field et al. 2019), school-age population changes (Zhou, 2024), and dropout rates (Nascimento et al. 2022). However, to our knowledge, no existing research predicts future trends or tests convergence in urban and rural primary education based on historical evolution.

The purpose of this study is to assess the development level of primary education in China and to create an evaluation index system for both urban and rural areas. Annual time series data were utilized on national primary education development from 1995 to 2022 (a complete data set available at the national level). The coefficient of variation entropy-comprehensive index method is first employed to identify the historical evolution characteristics of urban and rural primary education development in China. Secondly, time difference correlation analysis is used to screen early warning indicators and identify the leading factors influencing primary education development in urban and rural areas. Thirdly, a multiple linear regression model is applied to predict the future development of primary education from 2023 to 2035. Finally, Beta convergence analysis examines past and projected future convergence in primary education development between urban and rural areas. This study explores the disparity in primary education development between urban and rural regions, comparing their differences and evaluating the overall development level. This study provides technical methods and decision-making support for measuring and evaluating the development level of primary education in both urban and rural areas.

This study makes two key contributions to understanding the development of primary education in urban and rural China. Firstly, it is an initial attempt to apply the time difference correlation analysis method to forecasting the development of primary education in urban and rural areas from 2023 to 2035, further exploring the convergence potential between urban and rural education. Secondly, this study introduces predictive warning models to the education field, offering a novel approach that may complement and expand existing research, thus enriching the current literature. In addition, this paper highlights the issue of resource imbalance in rural primary education and predicts future development gaps in China, providing an early warning for educational challenges in similar regions.

The paper is structured as follows. Section “Literature review and theoretical framework” reviews the literature on the development of primary education and the construction of the index system. Section “Data and methodology” presents the data and research methodology, including the coefficient of variation entropy-comprehensive index method, the time difference correlation analysis method, the multiple linear regression prediction model, and the Beta convergence analysis method. Section “Empirical results” discusses the empirical results and findings. Section “Conclusion and discussion” concludes the paper, and section “Limitations and future research” outlines limitations and future research directions.

Literature review and theoretical framework

Integration of urban–rural primary education

The integration of urban and rural primary education is essential for achieving equalization, fairness, coordinated and integrated development, and balanced linkage between urban and rural areas. There is no universally agreed-upon definition of urban–rural primary education integration, and interpretations vary among stakeholders. Based on existing literature, this integration involves positioning urban and rural primary education within a unified system that dismantles the traditional binary opposition between the two. It emphasizes coordinated development, shared educational resources, the rational flow of educational elements, and the establishment of policies, institutional mechanisms, and resource allocation strategies to promote joint advancement (Chu, 2009; Zhang, 2012).

The integration does not imply making urban and rural primary education identical. Instead, it should be understood in terms of three aspects: consistency in educational goals, synergy in development speed, and stability and convergence of educational quality. Urban primary education typically advances at a faster rate and with higher quality than rural primary education. Bridging the gap requires accelerating rural primary education to catch up with urban progress, aiming for overall coordinated development. In terms of quality, integration should cover education management, resources, teacher management, and standardized evaluation across urban and rural areas. Additionally, differentiation in teaching methods and content should be maintained to respect the unique characteristics of each area, alongside financial support directed at rural areas (Yang, 2019; Yan, 2020).

Numerous studies emphasize that the integration should start with equalizing the supply of primary education in urban and rural areas. This can be achieved through enhanced financial mechanisms, equitable resource allocation, and targeted policy implementation. Moving away from a historical “urban preference” policy bias and adopting “rural preference” policies can help ensure a fairer distribution of resources, reducing the developmental gap (Shi, 2007; Yuan, 2011). Additionally, some scholars advocate for a symbiotic approach involving government, schools, families, and communities to create supportive environments for urban–rural education integration (Wu and Wang, 2022). However, existing studies have largely focused on optimizing supply-side factors, often overlooking the demand side. This study aims to address this gap by developing an indicator system that considers both the supply and demand aspects of primary education development in urban and rural areas.

Allocation of urban–rural primary education resources

Educational equity reflects social equity within the educational field, and resource allocation is a critical starting point for achieving educational equity (Wen and Gu, 2017). The allocation of educational resources is categorized under educational supply (Zhao et al. 2018). Research has extensively explored policies, influencing factors, and the effects of resource allocation in primary education across urban and rural areas. For example, Assaad and Krafft (2015) examined Egypt’s free basic education policy, finding that issues of low quality and inefficiency often led families to incur significant costs for their children to succeed academically. Similarly, Molina-Muñoz et al. (2023) found that Spanish students’ average mathematics scores fell below EU and OECD averages, attributing differences to socioeconomic status, gender, immigration status, language spoken at home, and student–teacher ratios.

Evaluation index systems have been developed to measure the allocation of educational resources, considering factors such as teacher availability, funding, infrastructure, and school conditions. Analytical methods including PLS structural equation modeling and data envelopment analysis (DEA) have been employed to assess the efficiency and spatial distribution of resources. For example, Ramzi et al. (2016) applied DEA to assess the efficiency of primary and secondary education and found that inefficiency in education is strongly related to poverty within governorates. Research by Chinese scholars highlights that primary schools in China are mainly concentrated east of the “Hu Huanyong Line”, with significant regional differences in educational resources. Resource allocation is generally higher in the eastern region than in the central and western regions, where low comprehensive efficiency is often due to inadequate scale efficiency (Ma et al. 2020; Li and Lei, 2021; Cai and Zhang, 2023). Notably, focusing solely on resource allocation without considering the demand side overlooks critical aspects of primary education development.

Evaluation indicators of urban–rural primary education development

Past studies have largely measured primary education development by focusing on educational conditions, teaching staff, and educational resources. For example, Cai et al. (2022) constructed an index system featuring four primary indicators (educational opportunities, educational conditions, educational quality, and educational equity) and 13 secondary indicators to assess the level of education modernization in ethnic regions. In contrast, Mo et al. (2022) replaced “educational conditions” with “educational investment” while maintaining the other three indicators. However, few studies have incorporated social and economic indicators such as per capita GDP, despite their significant impact on primary education development in urban and rural areas. This study posits that both the supply and demand sides are crucial for understanding education development. Increases in educational investment, teacher allocation, and economic indicators such as GDP per capita and disposable income per capita positively affect education quality and quantity. Therefore, an indicator framework covering education supply, education demand, and education quality has been constructed based on existing literature, the “National Basic Equilibrium Evaluation Plan for Compulsory Education Development”, and our own analysis. The framework includes 16 sub-indicators organized into four subsystems: educational conditions, teacher strength, student development, and socio-economic conditions. The following subsections explain the reasons for their inclusion in the model.

Educational conditions

This subsystem comprises six sub-indicators to measure primary education quality: per-student education expenditure, per capita output value of fixed assets, floor space of school buildings per student, books per student, computers per student, and the percentage of schools with a campus network. Per-student education expenditure reflects the level of financial investment in primary education and indicates the level of governmental support. Investment in educational material resources is fundamental for educational development: the richer the investment, the higher the level of educational development. Therefore, the sub-indicators encompass both hardware (e.g., floor space and fixed assets) and software resources (e.g., books and computers per student, campus network availability), reflecting overall access to educational resources.

Teacher strength

This subsystem includes four sub-indicators: teacher–student ratio, the proportion of full-time teachers, average years of teacher education, and teacher title structure. Teachers are the essential human resources for education, and numerous studies have shown that the quality of teachers is a key factor in determining the quality of education. Small-class teaching fosters better academic outcomes, and a higher teacher–student ratio, more full-time teachers, and higher professional titles among teachers contribute to stronger educational performance.

Student development

This subsystem encompasses three sub-indicators: the number of primary schools per student, the number of classes per student, and the enrollment rate. The aim of education is student development. Liu and Fang (2017) noted that areas with concentrated schools and student populations benefit from resource consolidation. Smaller class sizes enhance teacher–student interaction, promoting overall student development. The enrollment rate reflects both demand-side conditions and the quality of school teaching, and is therefore included in this subsystem.

Socioeconomic conditions

This subsystem includes three sub-indicators: per capita GDP, per capita disposable income, and the dependency ratio. Income disparities between urban and rural areas affect investment in education. According to Sun et al. (2023), rising rural household income significantly promotes family investment in education, narrowing the urban–rural gap. Conversely, a higher dependency ratio signifies a greater economic burden on families, potentially reducing education investment.

Data and methodology

Data sources

This study primarily utilized data sourced from the China Education Statistics Yearbook, China Educational Finance Statistics Yearbook, China Statistical Yearbook, and China Population and Employment Statistics Yearbook for the period spanning 1995–2023. The data represent a comprehensive national sample. To facilitate a comparative analysis between urban and rural areas, the national data set was divided into two distinct groups: the urban sample and the rural sample. In instances where data points were missing for specific years, linear interpolation was applied to fill these gaps, ensuring the continuity and integrity of the data set.
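The gap-filling step can be illustrated with a minimal sketch. It assumes a single yearly series in which missing years are marked as NaN; the function name `fill_missing_linear` and the sample values are hypothetical and are not taken from the yearbooks.

```python
import numpy as np

def fill_missing_linear(years, values):
    """Fill NaN entries by linear interpolation between observed years."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    filled = values.copy()
    # np.interp draws straight lines between observed points; a gap at either
    # end of the series would be held at the nearest observed value.
    filled[~observed] = np.interp(years[~observed], years[observed], values[observed])
    return filled

# Hypothetical indicator with two missing years
years = [1995, 1996, 1997, 1998, 1999]
series = [10.0, np.nan, 14.0, np.nan, 20.0]
filled = fill_missing_linear(years, series)
print(filled)
```

Because interpolation only uses the two neighbouring observations, it preserves the trend of the series but cannot recover any year-specific shock in the missing year.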

Research methodology

Entropy of variation coefficient-comprehensive index method

The method of entropy variation coefficient-comprehensive index calculates the weights of each index using a combination of the coefficient of variation method and the entropy value method. It then applies the composite index method to measure the level of development in primary education.

The entropy method is an objective weighting method that determines weights based on the information provided by the indicators. The smaller the information entropy of an indicator, the greater its degree of variability and the greater its weight. However, this method does not consider the interactions between indicators, has difficulty handling missing values, and is sensitive to extreme values, all of which may distort the weights. The coefficient of variation is a statistical measure obtained by dividing the standard deviation of a data set by its mean. It is useful because it eliminates the issue of scale, assigns weights objectively, and avoids the subjectivity that may arise in other methods such as AHP (Zhang and Jia, 2016). It reflects the degree of data dispersion while also accounting for the average level of the data, which compensates for the shortcomings of the entropy method. The coefficient of variation method is insensitive to the distribution shape of the data and handles extreme and missing values better, so it can be used as a supplement to the entropy method in some cases.
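As a rough illustration of the two weighting schemes, the sketch below computes entropy weights and coefficient of variation weights for a small matrix of years by indicators. Several details are assumptions rather than the study's stated procedure: min-max standardization before the entropy step, all indicators treated as positive-direction, the sample standard deviation (`ddof=1`), and the hypothetical function names and data.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weights for an (n_years, m_indicators) matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Assumed min-max standardization; every indicator must vary over time.
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    P = Z / Z.sum(axis=0)                      # share of each year per indicator
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)   # convention: 0 * ln 0 = 0
    e = -terms.sum(axis=0) / np.log(n)         # information entropy in [0, 1]
    d = 1.0 - e                                # lower entropy -> larger weight
    return d / d.sum()

def cv_weights(X):
    """Coefficient of variation weights (standard deviation / mean, normalized)."""
    X = np.asarray(X, dtype=float)
    cv = X.std(axis=0, ddof=1) / X.mean(axis=0)
    return cv / cv.sum()

# Hypothetical data: 4 years, 3 indicators
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 11.0, 5.5],
              [4.0, 12.0, 6.0],
              [8.0, 13.0, 6.5]])
tau = entropy_weights(X)
v = cv_weights(X)
print(tau, v)
```

In this toy data the first indicator varies most relative to its level, so both methods give it the largest weight, while the third indicator (a rescaled copy of the second) receives the same weight as the second, which shows that both schemes are scale-free.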

Because the coefficient of variation method strongly compensates for the weaknesses of the entropy method, the two methods were combined after weighing their respective advantages and disadvantages. Since both methods are well established, only the calculation of the combined weights is explained in detail here. Let \({v}_{i}\) and \({\tau }_{i}\) denote the weights of indicator \(i\) determined by the coefficient of variation method and the entropy value method, respectively, where \(i\) indexes the 16 sub-indicators in this study. According to the principle of minimum information entropy, the two sets of weights are coupled to obtain the combined weight \({\omega }_{i}\).

$$\min \eta =\sum_{i=1}^{m}{\omega }_{i}\left(\ln {\omega }_{i}-\ln {v}_{i}\right)+\sum_{i=1}^{m}{\omega }_{i}\left(\ln {\omega }_{i}-\ln {\tau }_{i}\right)$$
$$\sum_{i=1}^{m}{\omega }_{i}=1$$
$${\omega }_{i} \,>\, 0$$

Here, \(\eta\) is the objective function of minimum information entropy and \(m\) is the number of indicators. Solving this constrained minimization problem with the Lagrange multiplier method gives:

$${\omega }_{i}=\frac{\sqrt{{v}_{i}{\tau }_{i}}}{\sum_{k=1}^{m}\sqrt{{v}_{k}{\tau }_{k}}}$$
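The closed form follows from the first-order condition of the Lagrangian: writing \(\mu\) for the multiplier on the sum-to-one constraint (a symbol introduced here for illustration), differentiating with respect to \({\omega }_{i}\) gives \(2\ln {\omega }_{i}+2+\mu =\ln {v}_{i}+\ln {\tau }_{i}\), so \({\omega }_{i}\) is proportional to the geometric mean of the two weights. A minimal numerical sketch, using hypothetical weight vectors and function names, checks that the normalized geometric mean attains a lower objective value than a few alternative weight vectors:

```python
import numpy as np

def combine_weights(v, tau):
    """Combined weight: normalized geometric mean of the two weight vectors."""
    g = np.sqrt(np.asarray(v, dtype=float) * np.asarray(tau, dtype=float))
    return g / g.sum()

def objective(omega, v, tau):
    """Minimum-information-entropy objective (eta) from the text."""
    omega, v, tau = (np.asarray(a, dtype=float) for a in (omega, v, tau))
    return (np.sum(omega * (np.log(omega) - np.log(v)))
            + np.sum(omega * (np.log(omega) - np.log(tau))))

# Hypothetical weights from the two methods
v = np.array([0.5, 0.3, 0.2])
tau = np.array([0.2, 0.3, 0.5])
omega = combine_weights(v, tau)

# Compare against other feasible weight vectors on the simplex
candidates = [v, tau, np.full(3, 1.0 / 3.0)]
eta_star = objective(omega, v, tau)
print(omega, eta_star, [objective(c, v, tau) for c in candidates])
```

The comparison is only a spot check on three candidates, not a proof; the optimality itself rests on the convexity of the objective.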

Drawing on Ding Jianjun’s method of calculating the Comprehensive Development Index (Ding, 2014), this study calculated the level of development of primary education nationally and in urban and rural areas from 1995 to 2022. The calculation formula is as follows:

$$E=\mathop{\sum }\limits_{i=1}^{m}{I}_{i}{\omega }_{i}$$

where \(E\) denotes the level of development of primary education, \({I}_{i}\) is the standardized value of the \(i\)-th sub-indicator, and \({\omega }_{i}\) is the weight of the \(i\)-th sub-indicator.
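In matrix form, the index for every year is simply the product of the standardized indicator matrix and the weight vector. The sketch below uses hypothetical standardized values and weights; the function name `composite_index` is illustrative.

```python
import numpy as np

def composite_index(I, omega):
    """E_j = sum_i I[j, i] * omega[i], evaluated for every year j."""
    return np.asarray(I, dtype=float) @ np.asarray(omega, dtype=float)

# Hypothetical standardized indicators (3 years x 3 indicators) and weights
I = np.array([[0.0, 0.2, 0.1],
              [0.5, 0.6, 0.4],
              [1.0, 1.0, 1.0]])
omega = np.array([0.5, 0.3, 0.2])
E = composite_index(I, omega)
print(E)
```

Because the weights sum to one, the index stays within the range of the standardized indicators, which makes national, urban, and rural series directly comparable when they share the same standardization.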

Time difference correlation analysis method

Time difference correlation analysis is a statistical method used to investigate the relationship between two time series when one is shifted relative to the other. The relationship is quantified by the correlation coefficient, which lies in the range [−1, 1]. A value of −1 indicates a perfectly negative correlation, while a value of 1 indicates a perfectly positive correlation (Liu and Liu, 2017). The principle of its calculation is as follows:

Let the time-difference correlation coefficient be \(\lambda\). The benchmark indicator is the vector of the primary education development index from 1995 to 2022, i.e., \(Y=\left\{{Y}_{1},{Y}_{2},\cdots ,{Y}_{n}\right\}\). The selected indicators are the sub-indicators in Table 1, and the vector of the \(i\)-th sub-indicator is \({X}_{i}=\left\{{X}_{i1},{X}_{i2},\cdots ,{X}_{in}\right\}\), where \(i=1,2,\cdots ,m\) and \(m\) is the number of sub-indicators (\(m=16\) in the present study). The years are indexed by \(j=1,2,\cdots ,n\), where \(n\) is the total number of years; the present study covers 28 years from 1995 to 2022, so \(n=28\).

$$\lambda =\frac{\sum_{j={j}^{\prime }}^{{n}_{t}}({X}_{i,j+t}-\bar{X})({Y}_{j}-\bar{Y})}{\sqrt{\sum_{j={j}^{\prime }}^{{n}_{t}}{({X}_{i,j+t}-\bar{X})}^{2}\sum_{j={j}^{\prime }}^{{n}_{t}}{({Y}_{j}-\bar{Y})}^{2}}}\,,\quad t=0,\pm 1,\pm 2,\ldots ,\pm T,\quad {j}^{\prime }=\left\{\begin{array}{ll}1, & t\ge 0\\ 1-t, & t<0\end{array}\right.$$

where \(t\) is the lead or lag period: t > 0 is a lead and t < 0 is a lag, so \(t\) is called the time difference or delay number. \(T\) is the maximum number of delays and \({n}_{t}\) is the amount of data after alignment. When selecting indicators, the time difference correlation coefficient is usually calculated for several different delays. The coefficient with the largest absolute value, \({\lambda }_{\max }=\mathop{\max }\limits_{-T\le t\le T}|{\lambda }_{t}|\), is taken as the time difference correlation coefficient between the selected indicator and the benchmark indicator, and the corresponding \(t\) is the lead or lag period. The closer this coefficient is to 1, the better. In practice, if the largest and second-largest absolute values are close to each other, the lead and lag periods can be adjusted according to the actual situation of economic activities. Lagging and coincident indicators have little influence on the benchmark indicator; only leading indicators with strong correlations are treated as its main influencing factors, and these are used as predictors in the subsequent multiple linear regression prediction models.
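The screening procedure can be sketched as follows. The code pairs \({X}_{i,j+t}\) with \({Y}_{j}\) exactly as in the formula and evaluates every shift from \(-T\) to \(T\). Two points are assumptions: the means are taken over the overlapping observations only, and the function names and the toy series are hypothetical. Whether a positive or negative \(t\) is labelled a lead or a lag follows the convention stated in the text.

```python
import numpy as np

def time_difference_correlation(x, y, max_shift):
    """Return {t: lambda_t} for t = -T..T, pairing x[j + t] with y[j]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    result = {}
    for t in range(-max_shift, max_shift + 1):
        if t >= 0:
            xs, ys = x[t:], y[:n - t]      # j = 1 .. n - t
        else:
            xs, ys = x[:n + t], y[-t:]     # j = 1 - t .. n
        xd, yd = xs - xs.mean(), ys - ys.mean()
        result[t] = float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))
    return result

def best_shift(corrs):
    """Shift whose coefficient has the largest absolute value."""
    t = max(corrs, key=lambda k: abs(corrs[k]))
    return t, corrs[t]

# Toy benchmark series and an indicator built so that x[k] equals y[k + 2]
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3], dtype=float)
x = np.roll(y, -2)
corrs = time_difference_correlation(x, y, max_shift=3)
t_best, lam_best = best_shift(corrs)
print(t_best, round(lam_best, 6))
```

In this constructed example the two series align perfectly at a shift of two periods, so the largest absolute coefficient is found at \(t=-2\); with real indicators the peak is rarely this sharp, which is why the text allows adjustment when the two largest values are close.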

Table 1 Evaluation index system for the development level of primary education in urban and rural areas of China.

Multiple linear regression models

A multiple linear regression model describes the linear relationship between a dependent variable and multiple independent variables. With \(n\) observations (\(n=28\)) of \(k\) independent variables, the model is:

$${y}_{j}=\alpha +{\beta }_{1}{x}_{j1}+{\beta }_{2}{x}_{j2}+\cdots +{\beta }_{k}{x}_{jk}+{\varepsilon }_{j}\,,\quad j=1,2,\cdots ,n$$

where \({y}_{j}\) is the level of development of primary education nationwide (or in urban or rural areas) in year \(j\), \(\alpha ,{\beta }_{1},{\beta }_{2},\cdots ,{\beta }_{k}\) are the parameters to be estimated, and \({\varepsilon }_{j}\) is the random error.

The goodness of fit and significance of the regression model also need to be tested. The \({R}^{2}\) statistic measures how well the explanatory variables jointly account for the explained variable, and the significance of the model is assessed by testing the overall linear relationship between the explanatory and explained variables (Chen et al. 2017).
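A minimal least-squares sketch of the model and its fit statistics is given below. It is not the study's estimation code: the data are synthetic, the function names are hypothetical, and the overall significance test is assumed to be the usual F test, for which only the statistic is computed.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns coefficients, R^2, F statistic."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # intercept column for alpha
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))  # overall significance
    return coef, r2, f_stat

def predict(coef, X_new):
    """Predicted y for new rows of explanatory variables."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return coef[0] + X_new @ coef[1:]

# Synthetic data: y = 1 + 2*x1 - 0.5*x2 plus a small alternating disturbance
x1 = np.arange(8, dtype=float)
x2 = np.array([1, 0, 2, 1, 3, 0, 2, 1], dtype=float)
noise = 0.01 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + noise
coef, r2, f_stat = fit_ols(np.column_stack([x1, x2]), y)
forecast = predict(coef, [[8.0, 2.0]])
print(coef, r2, f_stat, forecast)
```

Forecasting beyond the sample, as in a 2023 to 2035 projection, additionally requires future values of the explanatory variables, which is one reason leading indicators are convenient predictors.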

Beta convergence analysis

The convergence model is an important approach for analyzing regional disparities and is grounded in the convergence hypothesis of neoclassical growth theory (Xie and Yu, 2021). There are three types of convergence: sigma convergence, beta convergence, and club convergence. Among these, beta convergence analysis is the most widely used. Beta convergence links growth to its steady state: under beta convergence, less developed regions grow faster than developed regions and catch up with them over a long period. In other words, the growth rate of primary education development is negatively correlated with its initial level. Beta convergence can be divided into two types: absolute beta convergence and conditional beta convergence. Absolute beta convergence assumes that all regions share the same “basic characteristics” of primary education development (e.g., per-student education expenditure, teacher strength, level of economic development). Conditional beta convergence, by contrast, allows these “basic characteristics” to differ across regions and controls for them (Li and Luo, 2015).

Barro and Sala-i-Martin (1992) proposed the classical regression equation for testing absolute beta convergence:

$$\frac{1}{T}\log \left(\frac{{y}_{s,t+T}}{{y}_{s,t}}\right)=a+b\log ({y}_{s,t})+{\varepsilon }_{s,t}$$

where \(y\) denotes the level of development of primary education; \(s\) indexes the overall, urban, and rural areas; \(t\) is a point in time; \(T\) is the number of periods between the two observations; and \({\varepsilon }_{s,t}\) is the random error term. In addition, \(b=-\frac{1-{e}^{-\beta T}}{T}\), where \(\beta\) denotes the speed of convergence. When \(b \,<\, 0\), the examined region shows absolute beta convergence (Han and Li, 2013) within the period \(T\). In this study, to make full use of the data, let \(T=1\).
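With \(T=1\) the relation between \(b\) and \(\beta\) simplifies to \(b=-(1-{e}^{-\beta })\), so \(\beta =-\ln (1+b)\) whenever \(1+b>0\). The sketch below estimates \(a\) and \(b\) by least squares for a single series and recovers \(\beta\). The series is synthetic, built to converge geometrically to a steady state, and the function name is hypothetical.

```python
import numpy as np

def absolute_beta(y):
    """Regress log(y[t+1] / y[t]) on log(y[t]) (T = 1) and recover beta."""
    y = np.asarray(y, dtype=float)
    growth = np.log(y[1:] / y[:-1])
    level = np.log(y[:-1])
    A = np.column_stack([np.ones(len(level)), level])
    (a, b), *_ = np.linalg.lstsq(A, growth, rcond=None)
    # From b = -(1 - exp(-beta)) with T = 1; undefined when 1 + b <= 0.
    beta = -np.log(1.0 + b) if b > -1.0 else float("nan")
    return a, b, beta

# Synthetic index: log(y) shrinks toward 0 by a factor 0.8 each year,
# which implies b = -0.2 exactly.
t = np.arange(10)
y = np.exp(np.log(0.2) * 0.8 ** t)
a, b, beta = absolute_beta(y)
print(a, b, beta)
```

A negative estimate of \(b\) in this setting means that years with a lower index level are followed by faster growth, which is the pattern the convergence test looks for.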

The development level of primary education may vary greatly between urban and rural areas due to their internal structural characteristics. Therefore, the absolute beta convergence analysis method needs to be modified by adding relevant control variables to establish the conditional beta convergence of the development level of primary education. This is expressed as

$$\frac{1}{T}\log \left(\frac{{y}_{s,t+T}}{{y}_{s,t}}\right)=a+b\log ({y}_{s,t})+\mathop{\sum }\limits_{k=1}^{Z}{c}_{k}\log ({x}_{k,(s,t)})+{\varepsilon }_{s,t}$$

where \({x}_{k,(s,t)}\) is the \(k\)-th control variable (\(k=1,2,\cdots ,Z\)), \(Z\) is the number of control variables added, \({c}_{k}\) is the corresponding coefficient, and the remaining variables have the same meaning as in the absolute beta convergence formula. As above, let \(T=1\). Since the independent variables in the model are all lagged by \(T\) periods, the risk of endogeneity is reduced.
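The conditional specification only adds the logged control variables as extra regressors. A minimal sketch with one control variable follows; the data are simulated from known coefficients so that the estimates can be checked, and the function name and all values are hypothetical.

```python
import numpy as np

def conditional_beta(y, controls):
    """Regress log(y[t+1] / y[t]) on log(y[t]) and log(controls[t]) with T = 1."""
    y = np.asarray(y, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if controls.ndim == 1:
        controls = controls[:, None]               # single control variable
    growth = np.log(y[1:] / y[:-1])
    A = np.column_stack([np.ones(len(y) - 1),
                         np.log(y[:-1]),
                         np.log(controls[:-1])])   # regressors dated at t
    coef, *_ = np.linalg.lstsq(A, growth, rcond=None)
    return coef[0], coef[1], coef[2:]

# Simulate a series with a = 0.1, b = -0.3, c = 0.05 and one control variable x
x = np.array([1.0, 1.2, 1.1, 1.5, 1.4, 1.8, 2.0, 1.9, 2.3, 2.5, 2.4, 2.8])
log_y = [np.log(0.3)]
for step in range(len(x) - 1):
    log_y.append(log_y[-1] + 0.1 - 0.3 * log_y[-1] + 0.05 * np.log(x[step]))
y = np.exp(log_y)
a, b, c = conditional_beta(y, x)
print(a, b, c)
```

Comparing the estimated \(b\) with and without the control variables indicates how much of the apparent convergence is explained by differences in the controlled characteristics.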

Empirical results

Analysis of index weights

Figure 1, derived from Table 1, shows that the weights obtained using the entropy method are nearly identical to those derived from the coefficient of variation method, with only minor differences. Consequently, the combined weights overlap closely with the results from both methods, which indicates that the methods used in this study are robust and reasonable. The weights of \({x}_{1}\), \({x}_{2}\), \({x}_{5}\), \({x}_{6}\), \({x}_{14}\), and \({x}_{15}\) are the largest, each close to 0.1 in Fig. 1. Notably, four of these six indicators pertain to educational conditions, while two relate to economic and social conditions, underscoring the significant impact of resource allocation and socioeconomic support on primary education development in China. The large weights also show that the internal variation in these six indicators (i.e., the difference between urban and rural areas) is more pronounced, so they reflect the behavior of the evaluation system more sharply. \({x}_{8}\), \({x}_{9}\), \({x}_{10}\), and \({x}_{13}\) have the lowest weights, below or around 0.01. Three of these four indicators come from teacher strength and one from student development, suggesting a relatively lower impact on primary education development and a smaller degree of internal variation (see Appendix).

Fig. 1 Weights of the indicators of primary education level in China.

Evolution of primary education in urban and rural China throughout history

The entropy variation coefficient-comprehensive index method was used to measure the level of primary education development in China nationally and in urban and rural areas.

Evolutionary characterization of sub-indicators

Panel A in Fig. 2 highlights the significant improvement in educational conditions over time, with the gap between urban and rural areas initially widening before narrowing. From 1997 to 2022 (excluding 1995–1996 to avoid artificially inflated annual growth rates caused by extremely low initial values in rural areas), the average annual growth rates for educational conditions at the national, urban, and rural levels were 13.32%, 10.51%, and 14.88%, respectively. These trends can be attributed to three main factors. First, the overall improvement in educational conditions reflects the government’s prioritization of basic education and increased investment, such as the 13.62% annual growth rate in per capita education expenditure \({x}_{1}\). Second, rural education conditions have significantly improved due to targeted policies, including the “Two-free and one-subsidy” policy for rural compulsory education launched in 2001 (Guo et al. 2024), the “Deepening reform of the mechanism for guaranteeing rural compulsory education funding” introduced in 2005, and the “Targeted poverty alleviation policy” initiated in 2015 (see Appendix). Third, urban education conditions developed rapidly before 2015, widening the gap with rural areas. However, since 2015, this gap has started to narrow due to the implementation of precision poverty alleviation policies, with education being a central component. During this phase, significant resources were allocated to rural areas. Despite these efforts, the gap remains substantial due to the historical under-investment in rural education compared to urban areas.

Fig. 2 Historical evolution trend of sub-system comprehensive index measurement results.

Panel B in Fig. 2 shows that national teacher strength increased gradually and then declined slightly. The difference between the urban and rural teacher strength indices first narrowed, then widened, and finally narrowed again. From 1995 to 2022, the average annual growth rates of teacher strength at the national, urban, and rural levels were 3.00%, 1.72%, and 3.35%, respectively, with rural areas growing fastest. The urban–rural difference in indices decreased from 0.26 in 1995 to −0.08 in 2001, increased from 0.04 in 2002 to 0.61 in 2016, and then shrank from 0.57 in 2017 to 0.31 in 2022. As shown in Fig. 1, the ratio of teachers to students is a key influencing factor for teacher strength, and the trends in teacher strength and in the ratio of teachers to students are highly consistent (see Appendix ).

Panel C in Fig. 2 shows that the student development index trended downward, with a pronounced decline at the national level and in rural areas, whereas urban areas experienced two growth phases, in 1995–1999 and 2009–2014. The rural student development index was larger than the urban index until 2011. A sharp increase in the urban index after 2011, followed by a hump-shaped path, left it almost on par with the rural index by 2022. According to Fig. 1, the number of primary schools per student and the number of classes per student carry the largest weights in the student development subsystem, underscoring the importance of school-student concentration and small class sizes for student development (see Appendix ). The trend in student development is attributable to changes in compulsory education policy, accelerating urbanization, and the declining birth rate (Hannum and Wang, 2022; Zhang, 2017; Zhang et al. 2015) (see Appendix ).

Panel D in Fig. 2 shows that the national, urban, and rural socioeconomic development indices have risen continuously and steadily. From 1995 to 2022, their average annual growth rates were 10.54%, 8.00%, and 11.52%, respectively. This growth was driven primarily by sustained increases in GDP per capita and disposable income per capita. As illustrated in Fig. 1, these two indicators are the most significant components of the socioeconomic development sub-system and essentially explain its overall growth. The gap between urban and rural areas has nevertheless widened steadily since 1995, largely because of the underdeveloped state of rural socioeconomic infrastructure. Although the rural socioeconomic development index has grown faster, the urban–rural gap has still increased at an average annual rate of 7.09%. This indicates that, unless the growing income disparity between urban and rural populations is addressed, it will be difficult to foster social and economic development and achieve convergence.

Analysis of general evolutionary features

Fig. 3 shows that primary education development in China rose steadily from 1995 to 2022. The average annual growth rates were 6.89%, 6.91%, and 5.65% for the national, urban, and rural areas, respectively. Because urban areas grew faster than rural areas, the urban–rural gap in education development widened until 2018; the trend began to shift in 2019, and the gap is now narrowing (see Appendix ).

Fig. 3 Evolution of primary education level in urban and rural China.

Predicting the development of primary education in urban and rural areas of China

Predictor screening based on time-difference correlation analysis

The development of primary education in China is influenced by several factors. This study identified 16 core indicators for analyzing the development level of primary education in urban and rural areas, and a time-difference correlation analysis was conducted on these indicators using SPSS 26.0. The development levels of primary education in overall, urban, and rural areas from 1995 to 2022 serve as the benchmark indicators, and the 16 sub-indicators serve as the selected indicators affecting them. The correlation coefficients between each selected indicator and the benchmark indicator are calculated with the selected indicator shifted to lead, coincide with, or lag the benchmark by several periods (typically more than three years; this study uses a 5-year moving period). Each indicator is then classified as leading, synchronous, or lagging according to its time-difference correlation coefficients, with an absolute value exceeding 0.5 taken as the threshold.
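
The screening logic can be sketched in a few lines. The code below is a minimal illustration and not the SPSS procedure used in the study: it assumes a plain Pearson correlation on the overlapping part of the two shifted series, and the function names, the sign convention for shifts, and the synthetic data are all hypothetical.

```python
import numpy as np


def time_difference_correlation(indicator, benchmark, max_shift=5):
    """Return {shift: r}, where r correlates indicator[t + shift] with benchmark[t].

    A negative shift pairs earlier indicator values with later benchmark
    values, so a peak at a negative shift suggests a leading indicator.
    """
    x = np.asarray(indicator, dtype=float)
    y = np.asarray(benchmark, dtype=float)
    n = len(y)
    result = {}
    for shift in range(-max_shift, max_shift + 1):
        if shift < 0:
            xs, ys = x[: n + shift], y[-shift:]
        elif shift > 0:
            xs, ys = x[shift:], y[: n - shift]
        else:
            xs, ys = x, y
        result[shift] = float(np.corrcoef(xs, ys)[0, 1])
    return result


def classify(correlations, threshold=0.5):
    """Label an indicator by the shift with the largest absolute correlation."""
    best_shift = max(correlations, key=lambda s: abs(correlations[s]))
    strong = abs(correlations[best_shift]) > threshold
    if best_shift < 0:
        kind = "leading"
    elif best_shift > 0:
        kind = "lagging"
    else:
        kind = "synchronous"
    return kind, best_shift, strong


# Synthetic example: the benchmark repeats the indicator two years later.
rng = np.random.default_rng(0)
x = rng.normal(size=28)
y = np.concatenate([x[:2], x[:-2]])

kind, shift, strong = classify(time_difference_correlation(x, y))
print(kind, shift, strong)
```

In this toy series the indicator is classified as leading by two periods, which mirrors how a leading indicator with an absolute coefficient above 0.5 would be retained as a predictor.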

The time-difference correlation analysis yields Table 2. It identifies seven leading indicators at the national level: \({x}_{7}\), \({x}_{8}\), \({x}_{9}\), \({x}_{10}\), \({x}_{11}\), \({x}_{12}\), and \({x}_{16}\). There are also seven leading indicators for urban areas and six for rural areas. The leading indicators for urban and rural areas are partially consistent with those for the nation as a whole; urban areas have three additional indicators, \({x}_{3}\), \({x}_{4}\), and \({x}_{13}\), and rural areas have two, \({x}_{1}\) and \({x}_{13}\). Under the principle of time-difference correlation analysis, synchronous and lagging indicators, as well as indicators whose time-difference correlation coefficient has an absolute value <0.5, are considered to have a weaker influence on the benchmark indicators. By contrast, leading indicators whose correlation coefficient has an absolute value above 0.5 better reflect and predict the benchmark indicators.

Table 2 Leading, synchronous, and lagging indicators for the development level of primary education.

All leading indicators of the development level of primary education in overall and urban areas have absolute correlation coefficients >0.5 and are therefore treated as main factors affecting the development level of primary education. For rural areas, one leading indicator, \({x}_{13}\), has an absolute correlation coefficient <0.5; it is treated as a secondary factor and excluded.

Prediction and testing of multiple linear regression models

  (1) Model construction

    Using the time difference correlation analysis method, the study identified the early warning factors that affect the development levels of primary education in overall, urban, and rural areas. The dependent variable, in this case, was the development level of primary education. Regression analysis was performed using Eviews 8.0 software to obtain the multivariate regression prediction model for primary education development levels in China’s overall, urban, and rural areas:

    $$\begin{array}{l}{y}_{{{\rm {nation}}}}=-0.905-1.397\times 10^{-2}{x}_{7}-2.407\times 10^{-3}{x}_{8}\\ \qquad\quad\;\;\;+\,1.332\times 10^{-1}{x}_{9}-1.056\times 10^{-1}{x}_{10}-54.196{x}_{11}\\ \qquad\quad\;\;\;+\,17.409{x}_{12}+3.679\times 10^{-3}{x}_{16}\end{array}$$
    $$\begin{array}{l}{y}_{{{\rm {urban}}\; {\rm {areas}}}}=0.570+1.359\times 10^{-1}{x}_{3}+2.790\times 10^{-5}{x}_{4}\\ \qquad\quad\qquad\;\,-\,3.679\times 10^{-2}{x}_{7}+6.340\times 10^{-4}{x}_{10}-15.917{x}_{12}\\ \qquad\quad\quad\quad\;\,-\,3.745\times 10^{-3}{x}_{13}-1.708\times 10^{-3}{x}_{16}\end{array}$$
    $$\begin{array}{l}{y}_{{{\rm {rural}}\; {\rm {areas}}}}=-6.527\times 10^{-2}+2.650\times 10^{-5}{x}_{1}+6.296\times 10^{-3}{x}_{7}\\ \qquad\quad\;\quad\quad-\,1.640\times 10^{-3}{x}_{8}-60.977{x}_{11}+18.848{x}_{12}\end{array}$$
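
Equations of this form are ordinary least-squares fits of the development level on the retained leading indicators. The following sketch shows how such coefficients and \({R}^{2}\) can be obtained with NumPy instead of EViews; the data are randomly generated, and the function name and the three made-up regressors are hypothetical stand-ins for the study's indicators.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ...; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical data: 28 "years" and three made-up leading indicators.
rng = np.random.default_rng(42)
X = rng.normal(size=(28, 3))
true_coef = np.array([0.5, 0.2, -0.1, 0.03])  # intercept first
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.01, size=28)

coef, r2 = fit_ols(X, y)
print(np.round(coef, 3), round(r2, 4))
```

The first returned coefficient is the intercept, matching the constant term that leads each equation above.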
  (2) Validity test and modification of the model

    In general, Pearson correlation analysis requires that the absolute value of the correlation coefficient between two variables be >0.5 and the corresponding P-value be <0.1. The multicollinearity test based on the variance inflation factor (VIF) requires VIF values <10 for the independent variables. The closer \({R}^{2}\) is to 1, the better the model fits. The larger the F-statistic and the smaller the P-value of the model, the more significant the effect of the independent variables on the dependent variable. Applying these tests, \({x}_{7}\), \({x}_{8}\), and \({x}_{11}\) are eliminated from the prediction model of the overall primary education development level, \({x}_{4}\), \({x}_{10}\), \({x}_{13}\), and \({x}_{16}\) from the urban model, and \({x}_{7}\) and \({x}_{8}\) from the rural model, which yields the appropriate and effective modified prediction models.
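
The VIF criterion can be illustrated as follows. This is a minimal sketch using the textbook definition, VIF = 1/(1 − R²) from regressing each independent variable on the others; the function name and the synthetic regressors are hypothetical, and the study's own values come from its statistical software.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic regressors: c is nearly a copy of a, b is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=28)
b = rng.normal(size=28)
c = a + 0.05 * rng.normal(size=28)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

Here the two nearly collinear columns exceed the threshold of 10 while the unrelated column does not, so one of the collinear pair would be a candidate for removal.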

    Modified multiple regression prediction model for the primary education development levels in China’s overall, urban, and rural areas:

    $$\begin{array}{l}{y}_{{{\rm {whole}}\; {\rm {amendments}}}}=-1.706+0.174{x}_{9}-0.125{x}_{10}+13.809{x}_{12}\\ \qquad\qquad\qquad\qquad+\,4.171\times 10^{-3}{x}_{16}\end{array}$$
    $${y}_{{{\rm {urban}}\; {\rm {amendments}}}}=0.123+0.130{x}_{3}-3.292\times 10^{-2}{x}_{7}-14.600{x}_{12}$$
    $${y}_{{{\rm {rural}}\; {\rm {amendment}}}}=-0.211+2.62\times 10^{-5}{x}_{1}-62.013{x}_{11}+19.838{x}_{12}$$
  (3) Determination of the reasonableness of the model

Data from 1995 to 2022 for the overall, urban, and rural areas of China were used to predict the development level of primary education. The predicted value was compared with the actual value for each year, and the relative errors were analyzed; the results are presented in Table 3. The absolute relative errors for the overall, urban, and rural areas exceed 10% in 2, 5, and 3 years, respectively, and are below 10% in all other years. The forecasts are therefore accurate apart from these few years, and the model used to predict the development level of primary education can be considered reasonable.
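
The error check described here amounts to a simple calculation. The sketch below assumes the relative error is defined as (predicted − actual) / actual, which the text does not spell out; the function names and the four sample values are hypothetical.

```python
def relative_errors(predicted, actual):
    """Signed relative error (predicted - actual) / actual for each year."""
    return [(p - a) / a for p, a in zip(predicted, actual)]


def count_large_errors(errors, limit=0.10):
    """Number of years whose absolute relative error exceeds the limit."""
    return sum(1 for e in errors if abs(e) > limit)


# Hypothetical values for illustration only.
actual = [0.20, 0.25, 0.30, 0.40]
predicted = [0.23, 0.26, 0.29, 0.41]

errors = relative_errors(predicted, actual)
print([f"{e:+.1%}" for e in errors])
print(count_large_errors(errors))
```

Counting the years above the 10% limit for each area gives the kind of tally reported from Table 3.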

Table 3 Relative error between predicted and actual values of primary education development level in China from 1995 to 2022.

To further assess the model’s validity in predicting the development level of primary education, data from 1995 to 2017 were used to project development levels for 2018–2022 in the national, urban, and rural areas of China. The early warning indicators projected for 2018–2022 were entered into the multivariate linear regression model to obtain predicted values of the primary education development level for this period. The predicted values were then compared with the actual values, and the relative error was used to judge how accurately the model predicts the dependent variable from estimated independent variables.

The revised national prediction model incorporates four indicators: \({x}_{9}\), \({x}_{10}\), \({x}_{12}\), and \({x}_{16}\). The average years of teacher education and the structure of teacher titles have been increasing year by year, at average annual growth rates of 1% and 0.5%, respectively, and these rates are used as the projection estimates for \({x}_{9}\) and \({x}_{10}\) for the next 5 years. In response to population ageing, China changed its birth policy in 2021 to allow three children, but the effect of the policy has yet to be seen, and population growth has so far been negative for two consecutive years. Considering the possible lag in the policy’s effect, and given a historical average annual decline of 4.8%, a negative growth rate of 3% is assumed for \({x}_{12}\) for the next 5 years. The dependency ratio declined by an average of 2% annually until 2010 but has risen by 2% annually since 2011; taking into account the impact of the previous family planning policy, a 2% increase is used as the projection estimate for \({x}_{16}\) for the next 5 years.

For urban primary education, the revised prediction model includes three indicators: \({x}_{3}\), \({x}_{7}\), and \({x}_{12}\). \({x}_{3}\) uses the projected growth rate of 3%, consistent with the national projection. \({x}_{7}\) was calculated based on the average of the past 6 years, while \({x}_{12}\) was predicted using the 1995–2017 average for the next 5 years.

The revised rural primary education prediction model comprises three indicators: \({x}_{1}\), \({x}_{11}\), and \({x}_{12}\). Since sustainable macroeconomic growth rates typically range from 2% to 6%, an average annual growth rate of 8% is used to estimate \({x}_{1}\) for 2018–2022. Taking into account preferential state policy support for primary education, the same negative growth rate of 3% is used to predict \({x}_{11}\) and \({x}_{12}\) for the next 5 years.

This predictive approach was used to estimate the development level of primary education in China’s overall, urban, and rural areas, and the estimates were compared with actual data to assess the model’s accuracy and validity. The results, summarized in Table 4, indicate that the relative error for urban primary education development in 2022 was rated “Unqualified”, while the remaining predictions were rated “Good”, “Qualified”, or “Barely qualified”. Moreover, only the national and urban predictions for 2018 were rated “Barely qualified”. The indicator values estimated by this method are therefore feasible, and the model’s predictions are reasonable and valid.

  (4) Forecast results and analyses

Table 4 Prediction of the development level of primary education from 2018 to 2022.

The following indicators were estimated in light of current trends and conditions. For national primary education development, the average years of teacher education were projected using the 0.3% annual increase of the last 5 years, and the structure of teacher titles was set at its 1995–2022 average. As the impact of the birth policy has gradually become apparent, the average number of classes per student has changed little in the last 5 years, with a decline rate of <1%. The average number of classes per student is therefore projected to decline at a rate of 0.5% over 2023–2035. Owing to the increasing challenges of an aging population, China’s dependency ratio has been rising continuously since 2010, at an average annual rate of 2%. Given the fertility policy change, the dependency ratio is projected to increase by 1% from 2023 to 2030 and to decrease by 1% from 2031 to 2035.

For urban primary education, the floor space of school buildings per student was set at 11.5 square meters, its average over the past 15 years; the ratio of teachers to students was estimated at 8.5% of the average from 1995 to 2022; and the average number of classes per student was assumed to decline at the national forecast rate of 0.5%.

For rural primary education, the median sustainable macroeconomic growth rate of 3% was used as the forecast average annual growth rate of per-student education expenditure. The number of primary schools per student and the number of classes per student were projected to change at the same rate as the national number of classes per student.
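
The indicator projections described above can be sketched as a year-by-year extrapolation. The code assumes that each stated rate is an annual rate applied by compounding, which the text does not state explicitly; the function name and the base values (45.0 and 0.03) are hypothetical placeholders, while the rate schedules follow the assumptions given for the dependency ratio and for classes per student.

```python
def project(base_value, base_year, schedule):
    """Compound a value forward year by year.

    schedule: list of (first_year, last_year, annual_rate) tuples that
    together cover consecutive years starting at base_year + 1.
    """
    values = {}
    current = base_value
    year = base_year
    for first, last, rate in schedule:
        if first != year + 1:
            raise ValueError("schedule must cover consecutive years")
        for year in range(first, last + 1):
            current *= 1.0 + rate
            values[year] = current
    return values


# Dependency ratio: +1% for 2023-2030, then -1% for 2031-2035.
# The 2022 base value of 45.0 is a placeholder, not the study's figure.
x16 = project(45.0, 2022, [(2023, 2030, 0.01), (2031, 2035, -0.01)])

# Classes per student: -0.5% for 2023-2035 (placeholder base value).
x12 = project(0.03, 2022, [(2023, 2035, -0.005)])

print(round(x16[2030], 2), round(x16[2035], 2))
print(round(x12[2035], 5))
```

The projected series would then be substituted into the modified regression equations to produce the 2023–2035 forecasts.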

These estimated values were applied to the model to forecast the development levels of primary education in the national, urban, and rural areas from 2023 to 2035. Fig. 4 indicates that the primary education development level in China is improving gradually but slowly in all three categories. Because China’s economic development has shifted from high-speed to steady growth, and the forecast investment in primary education is consistent with this shift, the forecast curves in Fig. 4 are less steep than the historical curves in Fig. 3. Additionally, while the gap between rural and urban primary education levels in China narrows, the reduction remains modest, indicating that significant imbalances in urban and rural primary education development persist.

Fig. 4 Prediction of the development level of primary education in urban and rural areas of China from 2023 to 2035.

Figure 5 illustrates the gap in the development level of primary education between urban and rural areas, as derived from Figs. 2 and 4. From 1995 to 2018, this gap continued to widen. From 2019 to 2035, the gap gradually narrows, at an average annual reduction rate of 2.63%, owing to significant state support for rural education infrastructure, teacher recruitment, student development, and other key areas. Although the gap decreases, the decline is modest, which points to a persistent imbalance in primary education development between the two areas.

Fig. 5 The development level gap of primary education in urban and rural areas of China from 1995 to 2035.

Convergence analysis of the level of development of primary education in urban and rural areas in China

As Table 5 shows, the regression coefficients for the overall, urban, and rural primary education development levels are negative, whether the actual data for 1995–2022 are considered alone or the predicted data for 2023–2035 are included. However, absolute convergence is significant only for the national and urban development levels over 1995–2035; in all other cases it is not significant. Where absolute convergence is significant, the convergence rate is slow, and the urban rate is faster than the national rate over 1995–2035. This implies that the development level of primary education in rural areas is extremely unbalanced and difficult to converge.
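
The sign test on the regression coefficient can be illustrated with the textbook form of absolute beta convergence, in which the log growth rate is regressed on the log initial level and a negative slope indicates convergence. This is a sketch under that assumption; the exact specification and significance tests behind Table 5 may differ, and the function name, the speed formula −ln(1 + β), and the synthetic series are hypothetical.

```python
import numpy as np


def absolute_beta(levels):
    """Regress ln(y[t+1] / y[t]) on ln(y[t]); return (alpha, beta, speed)."""
    logs = np.log(np.asarray(levels, dtype=float))
    growth = np.diff(logs)  # ln(y[t+1]) - ln(y[t])
    design = np.column_stack([np.ones(len(growth)), logs[:-1]])
    (alpha, beta), *_ = np.linalg.lstsq(design, growth, rcond=None)
    # Implied convergence speed for a one-year step (assumed formula).
    speed = -np.log(1.0 + beta) if beta > -1 else float("nan")
    return float(alpha), float(beta), float(speed)


# Synthetic series that closes 10% of its log gap to 1.0 every year.
levels = [0.2]
for _ in range(27):
    levels.append(levels[-1] ** 0.9)

alpha, beta, speed = absolute_beta(levels)
print(round(beta, 4), round(speed, 4))
```

A negative beta with a small implied speed corresponds to the slow convergence described above; whether it is statistically significant requires the standard error of beta, which this sketch does not compute.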

Table 5 Absolute beta convergence of the development level of primary education.

The main factors affecting the development level of primary education at the national, urban, and rural levels in the multiple linear regression prediction models above, namely the eight variables \({x}_{1}\), \({x}_{3}\), \({x}_{7}\), \({x}_{9}\), \({x}_{10}\), \({x}_{11}\), \({x}_{12}\), and \({x}_{16}\), are added as control variables to the conditional beta convergence model, giving the results in Table 6. With the actual data for 1995–2022, the conditional beta convergence of the development level of primary education in urban and rural areas is not significant, and none of the influencing factors is significant. When the predicted data are included, both the rural and urban development levels show conditional beta convergence, but it is again not significant, and all the influencing factors remain insignificant.

Table 6 Conditional beta convergence of the development level of primary education.

Conclusion and discussion

The purpose of this study is to establish a comprehensive evaluation system for assessing the development level of primary education. The research used time series data for the national, urban, and rural areas from 1995 to 2022 (full sample data at the national level). The entropy variation coefficient-comprehensive index method was applied to explore historical trends in the development of primary education in China’s urban and rural regions. The concept of early warning was integrated through time-difference correlation analysis to identify key factors influencing primary education development. These factors were then used as explanatory variables to construct a multivariate regression prediction model, which forecasts the development trend of primary education from 2023 to 2035. Finally, the convergence of the development level of primary education was assessed using beta convergence analysis. The main findings can be summarized in four key points.

Firstly, the study found that primary education in China’s urban and rural areas has shown steady improvement (see Fig. 3). However, because urban areas grew faster than rural areas, the urban–rural divide remains significant. Only a few studies have measured the development level of urban and rural education in China. It is generally believed that the gap between urban and rural education in China has been narrowing, driven by national policy shifts, rapid industrialization, and other factors (Luo et al. 2022; Guo and Li, 2024). Two main perspectives emerge from existing research. One emphasizes the heterogeneity in regional education investment, teacher quality, and student demographics, which leads to a widening gap in education development (Tu et al. 2005; Cai and Song, 2021). The other suggests that large-scale migration driven by industrialization, regional coordinated development strategies, reform of the household registration system, and other factors have gradually reduced the imbalance in China’s regional education development (Xu and Liu, 2023; Xie and Yu, 2021). Our conclusion diverges from most studies because they rely on single indicators such as school enrollment, years of education, and academic achievement. For example, Zhang et al. (2015) take academic achievement as the dependent variable and use data from the China Family Panel Survey (CFPS) and the Rural–Urban Migration in China (RUMiC) survey to compare the education performance of rural children, children of rural-to-urban migrants, and urban children over the period 2009–2010. They find that there is still significant inequality in the education performance of children between rural and urban areas of China. In particular, rural-to-urban migration, which had been expected to play an active role in reducing education inequality, contributed little to narrowing the gap.
Similarly, Zhang (2017) uses the Rural–Urban Migration in China (RUMiC) survey, takes school enrollment as the dependent variable, and applies logistic regression and survival analysis to illustrate the new education-poverty trap imposed on migrant children by the institutional constraints and hierarchies in children’s education created by the household registration system in Chinese cities. By contrast, we take the development level of primary education as the dependent variable, measured through a comprehensive evaluation system that incorporates both internal and external factors. We deconstruct the level of education development in urban and rural areas into four aspects: educational conditions, teacher strength, student development, and socioeconomic development. These four sub-indicators show that education in urban areas is better than in rural areas (see Fig. 2). Combined, they reveal a widening gap between urban and rural primary education development levels (it started to shrink in 2022). The study highlights obstacles hindering the convergence of primary education development in China’s urban and rural areas, including educational infrastructure, teacher competency, student outcomes, and socioeconomic factors. Because rural areas start from a weaker foundation, priority should be given to improving rural educational infrastructure, teacher competency, and student development and to supporting socioeconomic growth. The government should thus focus on enhancing rural education, increasing teacher availability, improving students’ academic performance, and promoting rapid socioeconomic development. These efforts are essential for balancing primary education between urban and rural areas.

Secondly, our projected values for 2023–2035 indicate that the gap between urban and rural education is slowly narrowing. The predictions show that the level of education development in the national, urban, and rural areas will increase steadily from 2023 to 2035. Because the average annual growth rate of primary education development is lower in urban areas than in rural areas, the gap between the two narrows. Existing research mainly focuses on predicting students’ academic achievement, the demand for educational resources, and changes in the school-age population, and mostly relies on a single indicator. By using multiple indicators, we provide a useful supplement to the existing literature (Xu and Liu, 2023; Li et al. 2021; Zhou, 2024). In addition, our conclusions are based on projected future investment, considering the current investment scale and its growth forecast. This finding has important implications, suggesting that the government should focus on optimizing top-level design and addressing the constraints imposed by the weak foundations of rural primary education. Such constraints have long prevented rural education from self-regulating and catching up with urban education. They span education conditions, socioeconomic conditions, teacher strength, and student development, and include sub-indicators such as education funding, the number of students on campus, and the average years of education received by teachers.

Thirdly, our research uncovered concerning evidence that the urban–rural education divide has narrowed only slightly. The results of the absolute beta convergence analysis align with this finding, showing no significant absolute convergence for the national, urban, or rural areas between 1995 and 2022. Although the national and urban areas show slow but significant absolute convergence between 1995 and 2035, the same cannot be said for rural areas. In addition, convergence is faster in urban areas than at the national level, which further indicates that the disequilibrium within rural areas is difficult to reconcile. Our findings contribute to the existing literature on educational convergence and highlight the urgent need for action to address this inequality and ensure equal access to education for all. Previous studies noted convergence trends in national and urban education (Liaskos and Papadas, 2010; Han et al. 2013). Our findings, however, are clear: continuing to invest in education at current levels in both urban and rural areas will not sufficiently reduce the disparity in primary education development between them. Even if primary education development converges slowly within urban areas, long-term imbalances will remain. Targeted measures are therefore essential to address this issue and ensure equal access to quality education for all.

Lastly, conditional convergence is a key consideration for the development of primary education in urban and rural areas. From 1995 to 2022, there was no conditional convergence in the overall, urban, or rural areas, while the overall and urban areas showed insignificant conditional convergence from 2001 to 2035. After controlling for variables, it was observed that optimizing investment resources and promoting the concentration of schools and students in primary education can promote balanced development of primary education across the nation, including both urban and rural areas. This view is supported by existing research, such as the study by Sab et al. (2001) on education and health human capital convergence, the research by Stamatakis and Petrakis (2006) on human capital convergence within countries, and the work of Zhou et al. (2019) on educational convergence in China. The study also highlights key factors influencing primary education development (see Fig. 1). These include average per-student educational expenditure, per capita fixed assets output, the percentage of schools with a campus network, computers per student, the teacher–student ratio, teachers’ average years of education, and per capita disposable income. Enhancing teacher quality and availability, as well as increasing education investment and household income, can positively affect student numbers and overall educational development. Concentrating educational resources by encouraging student clustering can further support balanced development. Achieving balanced primary education development in urban and rural areas requires promoting income and resource balance between urban and rural residents. While some existing studies have reached similar conclusions, their focus has often been limited to specific indicators, such as graduation rates, teacher-to-student ratios, investment in education, and average years of schooling (Li et al. 2015). A more comprehensive approach is therefore necessary to fully understand the factors that contribute to the development of primary education.

Limitations and future research directions

It is important to note that the accuracy of our findings is limited by the reliability of our forecasts. Because our projections are based on historical data and on the assumption that future trends will mirror past patterns, any deviation from this assumption may bias the forecasts. Follow-up research is therefore recommended to validate our findings. For broader applicability, we suggest applying the research methods of this study to the development of primary education in other countries, particularly developing nations with significant educational disparities. In addition, our analysis is based on national-level time series data, which limits the granularity of insights. Unlike studies such as Zhang et al. (2015) and Zhang (2017), which examine the nuanced impact of rural–urban migration and the household registration system on the development of primary education, our approach does not capture micro-level influences. Future research should therefore examine micro-level factors such as policy effects, students’ mental and physical health, and home–school relationships, and compare the results with other types of education, such as vocational and higher education.