Introduction

Road traffic injuries (RTIs), defined as injuries or fatalities caused by collisions involving moving vehicles on roads1, result in 20–50 million non-fatal injuries and 1.35 million deaths annually worldwide, according to the WHO 2018 Global Status Report on Road Safety2. Notably, 93% of these fatalities occur in low- and middle-income countries. China, as the world’s largest developing country, faces a severe RTI burden, with over 250,000 annual deaths, representing 19% of global fatalities and ranking second worldwide3,4. These crashes constitute a critical public health crisis, posing significant threats to population health and safety.

According to research5,6,7, Shandong Province is among the regions of China most prone to road traffic crashes, with some of the highest mortality and crash rates in the country. Shandong Province is a major land transportation hub, boasting a dense, high-capacity road network with strong service and radiation capabilities8. However, road network development in the province is uncoordinated, with the road transportation network in the Jinan Economic Circle being among the best developed9. Although the road traffic crash-related mortality rate has recently been declining in Shandong Province10, road traffic crashes remain the leading cause of RTI-related deaths in the region and are accompanied by an enormous annual loss of healthy life years and substantial economic losses11,12.

The traditional notion that road traffic crashes are accidental and unavoidable is inaccurate. In fact, the occurrence of road traffic crashes follows intrinsic regularities, so crashes can to a large extent be predicted and prevented13. Given the severe consequences of road traffic crashes, it is imperative to employ statistical methods to identify the development pattern of fatalities resulting from road traffic crashes. This information could be analyzed to establish the likelihood and severity of road traffic crash-related fatalities in high-risk areas such as Shandong Province. Consequently, preventive measures could be promptly implemented, reducing the risk of road traffic crashes, especially fatal ones. Previous studies in Shandong Province have focused on factors like traffic flow14,15, residents’ trips16, traffic accessibility17, and road visibility18, yet no research has specifically addressed fatal crash prediction in this high-risk region. This gap poses a considerable challenge to road traffic crash prevention and public safety in Shandong Province.

Given the sudden and spontaneous nature of road traffic crashes, collecting relevant data for research is challenging, especially for fatal crashes. This lack of information is detrimental to predictive research. Grey system theory imposes no strict requirements on sample size or probability distribution and is designed to account for influencing factors beyond the primary variables. It is therefore particularly adept at making accurate predictions from limited data. The GM (1, 1) model, one of the most commonly used and classical models in grey system theory, has been widely employed in road traffic crash-related research19. For instance, Khatiwada et al.20 assessed the prediction of road traffic crash incidence rates in Nepal using the GM (1, 1) model and found that the model had an excellent prediction accuracy, with an average relative simulation accuracy of 92.59%. Notably, the GM (1, 1) model performs particularly well in road traffic crash studies when data are limited. Machine Learning (ML) algorithms21,22,23 are also being extensively employed in road traffic fatality prediction, with the back-propagation (BP) neural network model among the most commonly used. The BP neural network model offers an enhanced capacity to learn intricate correlations, high adaptability, parallel computing capabilities, strong generalization, and non-parametric characteristics; hence, it has been extensively employed across various fields24,25,26,27. For instance, Zhang et al.28 conducted a study aimed at predicting road traffic crashes in China between 1998 and 2009, revealing that the BP neural network model was effective in accurately predicting road traffic crashes.

However, multiple factors often influence road traffic crashes, and the available data series may fluctuate and be non-stationary. When data points fluctuate markedly, the GM (1, 1) model may fit poorly and forecast less accurately. The BP neural network model has inherent drawbacks of its own, including convergence to local optima and overfitting during training. Therefore, to address the critical gap in fatal crash prediction for high-risk regions, this study aims to develop a GM-BP joint model that combines the GM (1, 1) model’s stability in trend extraction with the BP neural network’s adaptability to nonlinear fluctuations. By applying this approach to Shandong Province’s 2012–2022 mortality data, we seek to predict subgroup-specific fatalities and systematically compare the predictive accuracy of the GM (1, 1) and GM-BP models across error metrics, which could help improve traffic management and foster a safer environment for all road users.

Materials and methods

Data sources

Data on road traffic fatalities in Shandong Province between 2012 and 2022 were collected from the Population Death Information Registration Management System (PDIRMS) of the Chinese Center for Disease Control and Prevention (China CDC). Specifically, based on the codes for road traffic fatalities in the International Classification of Diseases, 10th Revision (ICD-10), the V01-V04, V06, V09-V80, V87, V89, and V99 codes were selected and divided into four subgroups: pedestrians, non-motorized drivers, passengers, and motorized drivers. We also collected victims’ demographic information, including age, gender, education, marital status, and place and time of death. Furthermore, the population data of Shandong Province were sourced from the Shandong Statistical Yearbook.

Quality control

In 2013, the Chinese National Health and Family Planning Commission, the Ministry of Civil Affairs, and the Ministry of Public Security jointly issued circulars and memos calling for the regulation of the registration and management of death data. This initiative aimed to ensure the prompt and precise acquisition of such data. Furthermore, in line with the requirements of the China CDC’s underreporting survey program, the Shandong Provincial Center for Disease Control and Prevention conducts underreporting surveys every three years across 31 national disease surveillance sites within the province. Based on the findings of these surveys, death information is adjusted accordingly across the province. The latest underreporting survey was conducted in 2021. Together, these initiatives and programs support the reliability of the data used in this study.

Statistical analysis

Data on the basic characteristics of road traffic fatalities in Shandong Province were presented as mean ± standard deviation (SD) and constituent ratios. The mortality rates for the total population and road traffic crash subgroups in Shandong Province were determined using indirect standardization methods. The standardized mortality rates of the total population and road traffic crash subgroups were predicted using the GM (1, 1) model and the GM-BP joint model. Notably, the GM-BP joint model was internally validated using three-fold cross-validation. Prediction accuracy was assessed using the Mean Squared Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE), with smaller values indicating stronger predictive capability.
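As an illustration of the four accuracy indicators named above, the following is a minimal Python sketch using their standard textbook definitions. The function name error_metrics and the example values in the usage note are hypothetical; the study itself computed these indicators in SPSSPRO, whose exact implementation is not described here.

```python
import math


def error_metrics(observed, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n          # mean of squared errors
    mae = sum(abs(e) for e in errors) / n         # mean of absolute errors
    # MAPE divides by the observed value, so it is undefined if any observed value is 0.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}
```

For example, error_metrics([10.0, 20.0], [11.0, 18.0]) gives an MSE of 2.5, an MAE of 1.5, and a MAPE of 10%. Because MAPE is scale-free while MSE, RMSE, and MAE are not, the indicators can rank two models differently when the rates being predicted are small.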

The GM-BP joint model was constructed as follows:

(1) First, prediction data were obtained by inputting the original data into the GM (1, 1) model;

(2) The BP neural network was then trained using the relative error values from the GM (1, 1) model as input data and the original data as output data;

(3) The prediction data of the GM-BP joint model were then obtained by inputting the prediction data of the GM (1, 1) model into the trained BP neural network model.
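Steps (2) and (3) above can be sketched as follows, assuming the GM (1, 1) fitted values from step (1) are already available. This is an illustrative sketch only, not the SPSSPRO implementation used in the study: the network size, learning rate, number of epochs, min-max scaling, and the function names train_bp and gm_bp_predict are all assumptions, and the sketch feeds the GM (1, 1) fitted values (rather than relative errors) to the network so that training and prediction use the same kind of input. The numbers in the demo are hypothetical, not the study's data.

```python
import math
import random


def train_bp(xs, ys, hidden=4, lr=0.1, epochs=3000, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer by backpropagation."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = out - y  # gradient of 0.5 * (out - y) ** 2 with respect to out
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # error propagated back through tanh
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2

    return predict


def gm_bp_predict(observed, gm_fitted):
    """Step (2): train BP on (GM value -> observed value); step (3): pass GM values through it."""
    lo_x, hi_x = min(gm_fitted), max(gm_fitted)
    lo_y, hi_y = min(observed), max(observed)
    xs = [(v - lo_x) / (hi_x - lo_x) for v in gm_fitted]  # min-max scale inputs to [0, 1]
    ys = [(v - lo_y) / (hi_y - lo_y) for v in observed]   # min-max scale targets to [0, 1]
    net = train_bp(xs, ys)
    return [lo_y + net(x) * (hi_y - lo_y) for x in xs]    # map outputs back to the rate scale


if __name__ == "__main__":
    # Hypothetical rates per 100,000 and hypothetical GM (1, 1) fitted values.
    observed = [0.31, 0.29, 0.30, 0.27, 0.28, 0.25, 0.26, 0.23, 0.24, 0.22, 0.21]
    gm_fitted = [0.310, 0.296, 0.287, 0.279, 0.270, 0.262, 0.254, 0.246, 0.239, 0.232, 0.225]
    print([round(v, 3) for v in gm_bp_predict(observed, gm_fitted)])
```

With only eleven annual observations per group, a network this small can still overfit, which is one reason to pair such a model with cross-validation.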

The GM (1, 1) model was analyzed using the Grey Modeling Software V 7.0 (Grey System Research Institute, Nanjing University of Aeronautics and Astronautics, China, http://igss.nuaa.edu.cn/main.htm). On the other hand, statistical descriptions and analysis of the GM-BP joint model were performed using the SPSSPRO software V 1.0.11 (Online Application Software, China, https://www.spsspro.com).

Results

Descriptive statistics

Between 2012 and 2022, 176,129 road traffic crash-related deaths were reported in Shandong Province, with an average age of 52.65 ± 17.94 years. Most cases involved individuals aged > 40 years, accounting for 77.35% (136,239/176,129) of all cases. The gender ratio was 2.77:1 (129,405/46,724). Most victims were married, accounting for 83.08% of all cases. Furthermore, 79.60% of the reported fatalities had an education level of ≤ junior high school. The average standardized annual mortality rates were 18.59/100,000 persons, 0.27/100,000 persons, 0.78/100,000 persons, 0.32/100,000 persons, and 17.21/100,000 persons for the total population, pedestrians, non-motorized drivers, passengers, and motorized drivers, respectively (Table 1).

Table 1 Basic information on road traffic fatalities in Shandong Province, China, 2012–2022.

Predictive results of the GM (1,1) model

The GM (1, 1) model was used to construct the prediction model for the standardized mortality rates for each category of road traffic crashes in Shandong Province. According to the results, the development coefficients (a) and grey action quantities (b) for each group were as follows: total population (a = 0.050; b = 24.179), pedestrians (a = −0.010; b = 0.262), non-motorized drivers (a = −0.072; b = 0.525), passengers (a = −0.009; b = 0.313), and motorized drivers (a = 0.057; b = 23.274). Furthermore, the average relative errors for each group based on the prediction model were 2.742%, 3.165%, 4.544%, 4.497%, and 2.987% for the total population, pedestrians, non-motorized drivers, passengers, and motorized drivers, respectively. Notably, all groups had an average relative error below 10%, indicating that the prediction model fitted the data well. Table 2 shows the GM (1, 1) model predictions of standardized mortality rates for each group of road traffic crashes in Shandong Province between 2012 and 2022.
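To show how coefficients such as a and b and the average relative error arise, the following is a minimal Python sketch of the classical GM (1, 1) procedure: accumulate the series, estimate a and b by least squares, and difference the time response back to the original scale. It is not the Grey Modeling Software used in the study. The function names and the demo series are hypothetical, and the assumption that the average relative error excludes the first point (which is reproduced exactly by construction) is ours.

```python
import math


def fit_gm11(series):
    """Fit a classical GM(1,1) model; return (a, b, fitted values on the original scale)."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) is conventionally fitted to at least four points")
    # Accumulated generating sequence: x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for v in series:
        total += v
        x1.append(total)
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])
    # Ordinary least squares for the grey equation x0(k) = -a * z(k) + b.
    z_mean, y_mean = sum(z) / len(z), sum(y) / len(y)
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = sxy / sxx
    a = -slope                      # development coefficient (must be non-zero below)
    b = y_mean - slope * z_mean     # grey action quantity
    # Time response: x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a, then difference back.
    c = series[0] - b / a
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(n)]
    fitted = [series[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]
    return a, b, fitted


def average_relative_error(observed, fitted):
    """Mean absolute relative error in percent, skipping the first (exactly fitted) point."""
    pairs = list(zip(observed, fitted))[1:]
    return 100.0 * sum(abs(f - o) / o for o, f in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical series declining by 5% per step; not the study's data.
    demo = [20.0 * 0.95 ** k for k in range(11)]
    a, b, fitted = fit_gm11(demo)
    print(round(a, 4), round(b, 4), round(average_relative_error(demo, fitted), 4))
```

In this formulation a positive a corresponds to a declining fitted series and a negative a to a rising one, which is how the signs reported for the five groups can be read.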

Table 2 The GM (1, 1) model fitting results of standardized mortality rates for all groups of road traffic crashes in Shandong Province, 2012–2022 (/100,000).

Predictive results of the GM-BP joint model

The GM-BP joint model was constructed after training the BP neural network model with data from the GM (1, 1) model. The GM-BP joint model was used to determine the standardized mortality rates of each group of road traffic crashes in Shandong Province from 2012 to 2022. The R² value was used to evaluate the goodness of fit of the GM-BP joint model, with higher R² values (approaching 1) indicating a better fit. The R² values were 0.871, 0.716, 0.983, 0.757, and 0.889 for the total population, pedestrians, non-motorized drivers, passengers, and motorized drivers, respectively. Table 3 shows the predictive results of the GM-BP joint model for the standardized mortality rates in each group of road traffic crashes in Shandong Province. Predictive accuracy was assessed using the average relative error. The average relative error values for the total population, pedestrians, non-motorized drivers, passengers, and motorized drivers were 5.66%, 2.80%, 1.81%, 5.53%, and 6.25%, respectively. Notably, all values were below 10%, indicating good predictive capability.
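For reference, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that common definition; the function name r_squared is hypothetical, and whether SPSSPRO uses exactly this definition is an assumption.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Under this definition a perfect fit gives 1, predicting the observed mean for every year gives 0, and a model worse than the mean gives a negative value.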

Table 3 The GM-BP joint model fitting results of standardized mortality rates for all groups of road traffic crashes in Shandong Province, 2012–2022 (/100,000).

Evaluation of models

The predictive capabilities for the standardized mortality rates for all groups of road traffic crashes in Shandong Province were evaluated using MAPE, RMSE, MSE, and MAE values. According to the results, compared to the GM-BP joint model, the GM (1, 1) model exhibited lower MAPE, RMSE, MSE, and MAE values for the standardized mortality rates of both the overall population and motorized drivers but higher values for pedestrians and non-motorized drivers. Regarding the standardized mortality rates of passengers, compared to the GM (1, 1) model, the GM-BP joint model exhibited slightly higher RMSE, MSE, and MAE values but lower MAPE values (Table 4).

Table 4 Evaluation of prediction models for standardized mortality rates of all groups of road traffic crashes in Shandong Province.

Discussion

The significant number of road traffic fatalities previously reported in Shandong Province has imposed a heavy burden on individual victims, families, and society as a whole, necessitating the implementation of measures to prevent and reduce the occurrence of road traffic crashes, especially fatal ones. In this regard, it is noteworthy that research on the prediction of road traffic fatalities could have positive implications for developing targeted road safety measures. Herein, we constructed a prediction model for road traffic crashes in Shandong Province using data on road traffic fatalities from the PDIRMS of China CDC. In addition to providing a theoretical basis for preventing future road traffic fatalities in Shandong Province, our findings could lay the groundwork for developing targeted preventive measures for various subgroups of road traffic fatalities.

The GM (1, 1) model, a commonly used model in predictive research, remains applicable when information is incomplete and yields relatively accurate predictions even from small samples. According to research, several factors could influence the occurrence of road traffic crashes, including the enforcement of laws and regulations29, socioeconomic conditions30, and the atmospheric environment31. Furthermore, road traffic crashes are often sudden, and information about them is often incomplete. Given these characteristics, the GM (1, 1) model is well suited to prediction research on road traffic crashes in Shandong Province. Multiple studies32,33 have employed the GM (1, 1) model in predictive research for road traffic crashes, with a majority demonstrating its excellent prediction accuracy. Similarly, we found that the GM (1, 1) model effectively predicted the standardized mortality rates of the total population and motorized drivers associated with road traffic crashes in Shandong Province. The model’s predictions aligned closely with the original data, accurately reflecting how the rates evolved. However, we noted that the accuracies for predicting mortality rates for pedestrians, non-motorized drivers, and passengers were relatively low. This discrepancy could be attributed to the small values and weak trends in the original data, which made the GM (1, 1) model less responsive.

Although the GM (1, 1) model could effectively forecast the mortality rates for some groups of road traffic crashes in Shandong Province, it could not predict outcomes equally well for all groups, a limitation that could be attributed to its inherent drawbacks. Furthermore, although the GM (1, 1) model is accurate in short-term predictions, it has proven unsuitable for long-term, complex, and highly dynamic forecasting studies, where it is susceptible to underfitting. Conversely, the BP neural network model has been established to excel at handling large datasets and complex relational data, making it particularly suitable for long-term prediction research. Moreover, the combined use of the GM (1, 1) model and the BP neural network model could mitigate the weaknesses of each individual model, thus enhancing prediction accuracy and precision, especially in forecasting road traffic fatalities.

The GM-BP joint model has been widely employed in prediction studies in multiple fields, including ecology34,35,36, energy37,38,39, engineering40, aerospace41, and computer technologies42, demonstrating advantages that include better prediction accuracy than the individual models. Nonetheless, few studies have applied the GM-BP joint model in road traffic safety research. For instance, Zhang et al.43 used the GM-BP joint model to predict the number of city taxis, revealing that besides improving prediction accuracy, the joint model also illuminated the patterns of change in the number of city taxis. Similarly, Guo et al.44 analyzed major traffic accidents involving ≥ 10 fatalities in a single incident in China using the GM-BP joint model, demonstrating superior predictive performance over the standalone GM (1, 1) model, with results closely aligning with observed fatality and injury data.

In this study, the GM-BP joint model exhibited high prediction accuracy for standardized mortality rates among pedestrians, non-motorized drivers, and passengers in Shandong Province. This may be because, although the GM (1, 1) model is effective for small-sample predictions, it struggles with nonlinear data patterns, parameter sensitivity, and outlier susceptibility. The GM-BP joint model overcomes these limitations by combining grey theory with neural networks: the GM (1, 1) component extracts baseline trends, whereas the BP neural network dynamically adjusts its parameters through backpropagation to model nonlinear relationships and mitigate noise interference. This integration enhances prediction stability, improves adaptability to complex data environments, and strengthens generalization capabilities, offering a robust framework for real-world applications such as traffic safety analytics. Therefore, we deduced that the GM-BP joint model could facilitate the effective predictive analysis of road traffic crash-related standardized mortality rates in Shandong Province, particularly in capturing subtle changes in trend. In other words, the GM-BP joint model could help in formulating preventive strategies for reducing the incidence of road traffic crashes.

Limitations

This study had three key limitations. First, due to Chinese customs, home deaths are frequent, and information on some road traffic fatalities is reported by doctors in villages or community health centers, potentially leading to errors in death information. Second, death-related information used herein was sourced from the cause-of-death surveillance system, which provides limited variables (e.g., lacking behavioral or environmental data), constraining model comprehensiveness. Third, the predictive stability of our models may be affected by unaccounted external dynamics, including future policy reforms (e.g., revised traffic laws), rapid technological advancements (e.g., autonomous vehicle adoption), and extreme weather events (e.g., unprecedented rainfall patterns), which could alter baseline risk profiles. Nevertheless, this study offers a scientific basis and support for improving road safety and reducing crash risk in the future.

Conclusion

We found that the GM-BP joint model mitigates the influence of inherent deficiencies in the individual models on prediction outcomes, thus improving prediction stability and reliability. Herein, the GM-BP joint model yielded positive outcomes in forecasting the standardized mortality rates for pedestrians, non-motorized drivers, and passengers involved in road traffic crashes in Shandong Province. This study also evaluates the predictive performance of the GM (1, 1) and GM-BP joint models for road traffic mortality rates in Shandong Province, China, and proposes a comprehensive “legislation-technology-data-resource” policy framework to enhance road safety. Key recommendations include enacting targeted legislation, integrating data from multiple departments, adopting adaptive policies for emerging technologies, prioritizing infrastructure upgrades for hazardous road sections, and strengthening data governance. Our predictive research on road traffic fatalities in Shandong Province could provide theoretical support to transportation management departments and organizations, enhance the understanding of road traffic crash patterns, and facilitate the formulation of targeted traffic management policies and action plans, such as mandating helmet use for non-motorized drivers and intensifying law enforcement in crash hotspots. Although the GM-BP joint model exhibited limitations in predicting the standardized mortality rates for the total population and motorized drivers involved in road traffic crashes in Shandong Province, our findings still provide a valuable scientific foundation and support for advancing road safety and mitigating crash risks in the region.