Abstract
Skeletal fluorosis (SF) is a chronic metabolic bone disease resulting from long-term excessive fluoride exposure, affecting millions worldwide. Conventional diagnosis relies on radiographic evidence, which often detects the disease only at advanced stages, limiting opportunities for early intervention and prevention. A predictive model was developed to assess the severity of SF using comprehensive predictors, including demographic, environmental, and biomonitoring data, from 1,309 individuals across three major fluoride-endemic regions in China, representing coal-burning, drinking-water, and brick-tea fluoride exposure. After variable selection using the least absolute shrinkage and selection operator (LASSO) regression, five machine learning algorithms were trained and validated. Model performance was primarily evaluated using the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) were applied to enhance model interpretability. The Random Forest model achieved the best predictive performance (AUC = 0.875 in the training set; 0.832 in the test set). SHAP analysis identified pain score, joint function, age, and UF concentration as the most influential predictors of SF severity. The model also captured regional differences in exposure and severity patterns across the three fluoride sources. This interpretable machine learning framework provides a robust tool for early risk screening and severity stratification of SF in high-risk populations. By enabling timely identification of individuals at risk of progression, the model serves as a foundation for targeted public health interventions and highlights the utility of data-driven methods in large-scale environmental health surveillance.
Similar content being viewed by others
Introduction
Skeletal fluorosis (SF) is a neglected chronic disease that poses a significant public health burden in many low-and middle-income countries, with an estimated 100 million people affected globally, particularly in India, Africa, and China1,2. Fluoride occurs naturally in water, air, soil, and food; however, exposure levels exceeding the World Health Organization (WHO) threshold of 1.5 mg/L can lead to serious metabolic and skeletal disorders3,4,5. In China, fluorosis predominantly arises from three environmental exposure pathways: coal-burning, drinking-water contamination, and consumption of fluoride-rich brick tea. These pathways affect 28 provincial-level administrative regions and over 70,000 villages nationwide6. Although each exposure route leads to distinct fluoride absorption patterns, the majority of fluoride is ultimately deposited in mineralized tissues, cumulatively resulting in progressive skeletal damage over time7.
Clinically, early-stage SF presents with subtle symptoms such as joint pain and stiffness, which may progress to severe skeletal deformities and disability if left untreated8. Current diagnostic criteria rely heavily on radiographic evidence, which typically becomes apparent only after significant and often irreversible skeletal damage has occurred. Such diagnostic delays greatly undermine the effectiveness of early interventions9.
Recent advances in machine learning (ML) offer promising opportunities to overcome these diagnostic limitations through predictive modeling. ML algorithms can integrate heterogeneous clinical and environmental data to detect early, subtle indicators of SF progression, potentially outperforming conventional diagnostic approaches10. In this study, we aimed to develop and validate an ML-based predictive framework that integrates environmental fluoride exposure data with clinical biomarkers. We utilized data from three fluoride-endemic regions in China, each representing a unique environmental fluoride exposure route, and evaluated the performance of several ML algorithms. The inclusion of diverse exposure patterns enhances the generalizability of the model across different affected populations. This predictive tool enables early identification of high-risk individuals and provides a valuable approach for targeted public health interventions to reduce the burden and progression of SF.
Method
Study design and participants
Data and participants for this study were drawn from the China Fluorosis Cohort (CFC), initiated in 2022. Based on the national classification criteria for endemic fluorosis areas (GB/T 17018–2011) and historical surveillance data, three provinces with distinct sources of fluoride exposure were selected: Shanxi (drinking-water type), Guizhou (coal-burning type), and Sichuan (brick-tea type)11,12,13. A multi-stage stratified cluster sampling strategy was used to recruit participants. In the first stage, 1–2 endemic areas were selected for each exposure type. In the second stage, the local Centers for Disease Control and Prevention (CDC) selected 3–6 communities/villages in each district based on population size. In the final stage, the diagnosis and grading of SF were conducted by a team of two radiologists and three orthopedic specialists based on standardized X-ray images and the People’s Republic of China Health Industry Standard for Endemic Fluorosis (WS/T192-2021). The same team was responsible for reading all radiographs from the three study regions. All cases were reviewed independently, and any diagnostic discrepancies were resolved through expert consensus. This process led to the inclusion of 1,789 individuals with varying degrees of SF severity.
All procedures in this study adhered to the ethical standards of our university, the National Research Committee, and the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Ethics Committee of Guizhou Medical University (Ethics Approval No. 2023 − 189), and all participants provided written informed consent.
Data collection and preprocessing
A multimodal data collection framework combining epidemiologic survey, clinical examination, and laboratory test was used for the data of this study. A structured questionnaire designed by the research team for the CFC was administered through face-to-face interviews by master’s-level graduate students with medical backgrounds. All interviewers received centralized, standardized training prior to fieldwork to ensure consistency in question delivery and interpretation. To maintain data quality, a random sample of 20% of completed questionnaires was cross-checked each evening by an experienced quality control team. Feedback and retraining were provided as necessary, and questionnaires deemed invalid were excluded from the final analysis.
On the final day of fieldwork at each community site, standardized physical examinations were conducted. Local health center nurses were responsible for measuring anthropometric and physiological indicators, including height, weight, and blood pressure. Orthopedic specialists evaluated the functional status and pain levels of major limb joints (shoulders, elbows, knees, and hips) using standardized clinical protocols.
Blood and urine samples were collected from all participants during the survey. These specimens were used to analyze a comprehensive panel of laboratory markers, including liver function, kidney function, lipid level, electrolytes, bone metabolism indicators, and inflammatory markers.
Participants who did not undergo clinical examination or refused to participate in laboratory tests were excluded. Variables with more than 20% missing data were removed, and individuals with over 50% missing values were also excluded from the analysis. For variables with missing data less than or equal to 10%, multiple imputation was performed using the mice package in R to reduce potential bias. The final study dataset incorporated 1309 individuals and was randomly divided into training and test sets in a 7:3 ratio. Minimum-maximum scaling was used for continuous variables to ensure consistent scaling between the training and test sets, and categorical variables were converted to dummy variables by one-hot coding to match the input requirements of the ML models.
Feature selection and ML model building
In order to optimize the feature selection and improve the model interpretation, Least Absolute Shrinkage and Selection Operator regression (LASSO) was used to select the variables in the training set, and the coefficients of the non-critical features were compressed to zero by L1 regularization to reduce the influence of multiple covariates and achieve feature sparsification. Initially, 80 candidate variables were included in the study, and the standardized feature matrix and the dichotomous dependent variable (SF severity, coded 0/1) were input into the model. LASSO regression was conducted using the cv.glmnet() function in the glmnet R package, with 10-fold cross-validation. The regularization path of the penalty parameter was generated automatically using the default data-adaptive sequence provided by glmnet, which spans a logarithmically spaced range from maximal to minimal regularization. A fixed random seed was applied to ensure the reproducibility of fold assignments. Finally, 22 non-zero coefficients were selected for subsequent analysis based on the minimum cross-validation error (lambda.min).
Using these selected features, ML algorithms, including Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Decision Tree (DT), were constructed to develop prediction models for SF severity. To ensure model stability and prevent information leakage, all model training and hyperparameter tuning were conducted strictly within the training set using a grid-search strategy combined with 10-fold cross-validation. For each algorithm, a predefined and sufficiently wide hyperparameter space was systematically explored to identify the optimal configuration that maximized the area under the receiver operating characteristic curve (AUC). The search ranges were carefully calibrated to ensure that the finalized optimal values did not reside on the extreme boundaries of the predefined grids, thereby confirming the thoroughness of the optimization process. All evaluated parameters, their respective search ranges, and the final optimized values are detailed in Table 1.
After model optimization, the final models were applied exclusively to the independent test set to evaluate their generalization performance. Model performance was evaluated using AUC, accuracy (ACC), sensitivity (SENS), specificity (SPEC), and F1 score ((1)–(4)). Among these metrics, AUC was predefined as the primary criterion for model comparison and selection, while the remaining indicators provided complementary assessments of classification performance. For the training set, performance metrics were reported as the mean values across the 10-fold cross-validation, whereas for the test set, metrics were calculated directly based on the independent predictions. Furthermore, calibration curves and decision curve analysis (DCA) were conducted on the test set to assess the agreement between predicted probabilities and observed outcomes, as well as to quantify the net clinical benefit across a range of risk thresholds. The evaluation metrics of all classifiers are presented as follows:
-
(1)
\(\text{Accuracy=\:}\frac{\text{TP}\text{+}\text{TN}}{\text{TP}\text{+}\text{FP}\text{+}\text{FN}\text{+}\text{TN}}\)
-
(2)
\(\text{\:Sensitivity=\:}\frac{\text{TP}}{\text{TP}\text{+}\text{FN}}\)
-
(3)
\(\text{\:Specificity=\:}\frac{\text{TN}}{\text{TN}\text{+}\text{FP}}\)
-
(4)
\({\text{F1~Score = 2}} \cdot \frac{{{\text{Precision~}} \cdot {\text{Sensitivity}}}}{{{\text{Precision + Sensitivity}}}}\)
Model explanatory analysis
SHAP (Shapley Additive exPlanations) values are used in this study for feature importance analysis, which is based on the Shapley value principle in cooperative game theory, and can quantify the contribution of each feature variable to the model prediction results, and visualize the direction of the variable’s influence and its relative importance.
Statistical analysis
All statistical analyses were performed using R (version 4.4.2) and SAS (version 9.4). The normality of continuous variables was assessed using the Kolmogorov-Smirnov test, which revealed that all continuous variables were non-normally distributed. Therefore, continuous variables were expressed as median and interquartile range (IQR) and analyzed using Mann-Whitney U test or Kruskal-Wallis test. Categorical variables were expressed as counts and percentages for comparisons using the chi-square test. All statistical analyses were performed using two-sided tests with α < 0.05 considered statistically significant.
Result
Characteristics of participants
The training set included 917 participants (600 with mild SF and 317 with moderate-to-severe SF), and the test set consisted of 392 participants (254 mild, 138 moderate-to-severe). No significant differences in baseline characteristics were observed between the training and test sets (external P > 0.05), indicating that the data split preserved the representativeness of the overall sample. In the overall cohort, patients with moderate-to-severe fluorosis were significantly older than those with mild fluorosis (median age ~ 68 vs. 58 years in the training set, P < 0.001), a trend that was also observed in the test set (P < 0.001). In addition, several demographic characteristics were associated with fluorosis severity. Notably, living area (region of residence), ethnicity, education level, and annual income differed significantly between mild and moderate/severe groups (P < 0.001 for each), whereas there was no significant difference in severity between male and female participants. The distribution of disease severity varied markedly by region. The coal-burning fluorosis region, Guizhou, had a much higher proportion of moderate-to-severe cases compared to the brick-tea fluorosis region, Sichuan, with the drinking-water fluorosis region, Shanxi, in between (overall P < 0.001 for living area). For example, in the training set, approximately half of the participants from Guizhou had moderate-to-severe SF, versus only around 15% of those from Sichuan (brick-tea area), with Shanxi around 40%. A similar pattern was seen in the test set. These differences reflect the varying intensity of environmental fluoride exposure across regions.
As for bone health–related indices, bone mineral density and a bone mass index measure were significantly lower in participants with moderate-to-severe SF than in those with mild SF. Conversely, parathyroid hormone (PTH) levels were significantly higher in the moderate/severe group than in the mild group (median 52.8 vs. 48.3 pg/mL, P = 0.002), whereas β-CrossLaps (a marker of bone resorption) levels were significantly lower in moderate/severe compared to mild cases (0.45 vs. 0.52 ng/mL, P = 0.001). These differences in bone turnover markers and bone indices were consistent in both the training and test sets (P < 0.05 in each set).
In terms of general biochemical and hematological indices, we also observed significant differences between severity groups. Total bilirubin, urinary fluoride (UF), white blood cell count (WBC), neutrophil count (NEUT), mean corpuscular volume (MCV), mean corpuscular haemoglobin (MCH), and multiple inflammatory cytokines (e.g. interleukin-5, interferon α, IL-2, IL-1β, IFNγ, and IL-4) all differed significantly between mild and moderate/severe SF. Notably, UF levels in patients with moderate-to-severe SF were substantially higher than those in mild cases (median 2.04 vs. 1.30 mg/L, P < 0.001). Interestingly, several inflammatory cytokines were higher in mild fluorosis than in moderate/severe fluorosis. The trends for all these biomarkers were similar in both the training and test cohorts. Detailed data on the characteristics and group comparisons can be found in Table 2.
Feature selection
The LASSO regression identified 22 non-zero coefficient features from the 80 candidate variables that were significantly associated with the severity of SF, and the trends in the logarithmic values of the coefficient paths versus the regularization parameter λ are shown in Supplementary material 1.
The final selected predictors included demographic characteristics (Age, Living area, Average sleep time), skeletal health indicators (Bone Density, Bone Mass Index), joint mobility measures (Elbow, Spinal, Knee, Shoulder function), pain level score, bone metabolism markers (PTH, Osteocalcin, PINP, β-CrossLaps, Vitamin D), inflammatory marker (IL-4), hematological indices (WBC, HCT, MCV, MCHC), liver function indicators (Total bilirubin), and an environmental exposure marker (UF).
Evaluation and performance of ML models
We assessed the predictive performance of five ML classifiers using multiple evaluation metrics for predicting SF severity. As AUC was predefined as the primary metric for model comparison, the RF model achieved the highest cross-validated AUC in the training set among all evaluated classifiers. On the training set, the RF model achieved an AUC of 0.875 (95% CI: 0.851–0.913), along with favorable sensitivity (0.903; 95% CI: 0.871–0.936) and F1 score (0.855; 95% CI: 0.832–0.879). It also achieved an accuracy of 0.797 (95% CI: 0.765–0.829) (Fig. 1A; Table 3). When applied to the independent test set, the RF model maintained comparable performance with an AUC of 0.832, accuracy of 0.793, and F1 score of 0.841, indicating stable predictive performance and good generalizability (Fig. 1B; Table 4).
Calibration and decision curve analyses were used as complementary evaluations. Both RF and XGBoost showed good calibration and higher net benefit across decision thresholds (Fig. 1C,D). As RF consistently demonstrated the highest AUC during model development and stable performance in the independent test set, it was selected for subsequent interpretive analysis.
Performance evaluation of ML models for SF severity prediction. (A) ROC curves of the five ML models in the training set. (B) ROC curves of the five ML models in the test set. (C) Calibration curves. (D) Decision curve analysis.
To further explore the contribution of individual features to predicting SF severity, we conducted SHAP value analysis using the RF model. The SHAP analysis identified pain level score, knee function, age, shoulder function, and UF levels as the top five predictors with the greatest impact on the model’s predictions (Fig. 2A). Pain score emerged as the most influential predictor, exhibiting the highest SHAP values (> 0.06), strongly indicative of its critical role in differentiating SF severity. Other key factors, such as bone mass index, place of residence, total bilirubin, and bone metabolism markers (PINP, osteocalcin), also significantly influenced severity predictions.
The SHAP summary plot (beeswarm plot, Fig. 2B) visualizes how each feature contributes to model predictions. Specifically, lower pain scores were highly consistent with the prediction of mild SF, while individuals with no significant activity limitations in knee and shoulder function were similarly categorized as mild SF. Conversely, older age indicated a greater likelihood of moderate to severe SF in the elderly. In addition, lower bone mass index values and higher levels of bone metabolic markers (PINP and osteocalcin) were also associated with SF severity.
SHAP dependence plots for the most important predictors, including pain score and UF concentration, were further analyzed (Fig. 2C and D). UF exhibited a non-linear relationship with fluorosis severity prediction: initial increases in UF concentration sharply raised SHAP values, signifying increased disease severity risk, but beyond a certain threshold, additional increases produced diminishing predictive impact. In contrast, the dependence plot for pain score revealed a more complex pattern; low pain scores were consistently associated with mild fluorosis predictions (negative SHAP values), while higher scores correlated positively and increasingly with predictions of moderate-to-severe SF. These results underline the pivotal role of pain severity and environmental fluoride exposure, captured through UF levels, as predictors of SF severity.
SHAP-based interpretability analysis of key predictors for SF severity. (A) SHAP feature importance plot. (B) SHAP summary plot. (C) SHAP dependence plot for UF. (D) SHAP dependence plot for pain level score.
Discussion
This study presents a robust and interpretable ML model for predicting the severity of SF, integrating clinical biomarkers and environmental fluoride exposure data from three distinct endemic regions in China, representing coal-burning (Guizhou), drinking-water (Shanxi), and brick-tea consumption (Sichuan) exposure pathways. Among the five ML models tested, the RF model demonstrated the highest predictive performance (AUC = 0.875 in training and 0.832 in test), highlighting the utility of ML approaches in capturing the complex interplay between environmental exposure and disease progression. This represents an advancement beyond traditional radiographic severity assessment, offering a complementary data-driven perspective.
The selection of performance metrics is crucial when evaluating ML models in epidemiological contexts where class imbalance is common. Such an imbalance is frequently encountered in studies of chronic diseases, cancer prognosis, and disease severity stratification. In many of these settings, researchers may apply resampling techniques such as SMOTE or other balancing strategies during model training to address skewed class distributions14,15. Importantly, however, these preprocessing strategies do not alter the predominant use of AUC as the primary evaluation metric in medical classification research16,17. In this study, the dataset reflects the real-world distribution of SF severity, where moderate-to-severe cases are naturally less frequent than mild cases. Our objective was not to artificially rebalance the data, but to preserve this distribution to better reflect practical screening scenarios in endemic populations. While some researchers advocate for Precision-Recall (PR) curves in imbalanced settings, recent evidence suggests that AUC maintains robust discriminative power and remains highly competitive with PR-AUC in many medical classification scenarios18,19. Nevertheless, we acknowledge that relying solely on AUC may obscure the practical implications of false positives under specific decision thresholds. To address this, we complemented AUC with threshold-dependent metrics such as the F1-score, which balances precision and recall, as well as calibration and decision curve analyses to provide a more practically interpretable evaluation of model performance.
Pain level emerged as the most influential predictor of SF severity in our model, followed by knee function, age, shoulder function, and UF. This finding aligns closely with the clinical progression of SF, which typically begins with joint pain, stiffness, and limited mobility, and may eventually lead to deformities due to ossification of ligaments and the formation of bony outgrowths7,20. Deformities of the knees, hips, and spine are especially common in moderate-to-severe cases21,22. SHAP analysis revealed a clear gradient between higher pain scores and the predicted probability of severe SF. This suggests that pain, while often considered a subjective symptom, carries substantial risk stratification and severity discrimination value. Previous studies have demonstrated strong correlations between pain level and both SF severity (r = 0.71, P < 0.001) and patient age (r = 0.79, P < 0.001)23. This relationship indicates that pain may reflect not only current disease status but also cumulative exposure and biological vulnerability associated with aging.
In addition, pain scores are closely linked to joint function, particularly in the knees—a site frequently affected by fluorosis-related osteoarticular damage. A study emphasized the utility of pain scores in assessing functional impairment and disease burden in fluorosis-induced osteoarthritis24. This supports our finding that knee and shoulder function were among the top-ranking features in the predictive model. As such, pain scoring and joint function may serve as a practical and cost-effective indicator for early risk identification in endemic communities.
UF, as a direct biomarker of environmental fluoride exposure and body burden, also played a key role in our model23,25. Elevated UF levels have been shown to significantly increase the risk of SF (OR = 7.25)26, and our SHAP analysis confirmed that individuals with higher UF levels were more likely to be classified as having moderate-to-severe SF. Interestingly, the relationship between UF and SF severity appeared to follow a nonlinear threshold effect—suggesting that once UF levels surpass a critical value (e.g., 4 mg/L), disease progression may accelerate more rapidly and irreversibly. This highlights the public health importance of early intervention in high-exposure communities before fluoride accumulation crosses harmful thresholds. Multiple studies have also affirmed the value of UF as a non-invasive, objective biomarker for both individual-level risk assessment and community-level exposure surveillance27,28.
Together, these findings support a comprehensive model in which subjective symptoms (e.g., pain), functional measures (e.g., joint mobility), and objective biomarkers (e.g., UF) jointly contribute to SF severity assessment, risk evaluation, and prevention. The integration of such multidimensional indicators could significantly improve early detection and risk stratification strategies in endemic settings.
Beyond joint symptoms and fluoride exposure, alterations in bone metabolism also emerged as key indicators of SF severity. In particular, elevated levels of two bone formation markers, including PINP and osteocalcin, were strongly associated with moderate-to-severe disease in our model29,30. PINP, an early marker of type I collagen synthesis, reflects enhanced osteoblast proliferation and collagen production, while osteocalcin, secreted by mature osteoblasts, indicates active matrix mineralization31,32. The concurrent elevation of both markers points to a pathophysiological state of high bone turnover in SF patients. This metabolic profile underscores the involvement of systemic bone remodeling processes in the pathogenesis of SF, beyond the localized symptoms and fluoride burden. These findings align with the recognized pathological features of SF, which involve heightened osteogenic activity accompanied by compromised structural integrity of the newly formed bone. Although bone anabolism is accelerated, the newly formed bone tends to be structurally abnormal, characterized by increased density but higher brittleness, which ultimately compromises bone integrity33. Our results reinforce this paradoxical relationship: despite elevated bone formation markers, individuals with moderate-to-severe SF had significantly lower bone mass index and bone density, compared to those with milder disease. This “high turnover, low quality” state of bone metabolism represents a hallmark of osteofluorosis and likely contributes to increased fracture risk, skeletal deformities, and impaired mobility8,34.
Notably, bone metabolic disturbances were evident not only between healthy individuals and SF patients, but also among patients with differing levels of disease severity, indicating a progressive decline in bone quality along the disease continuum. Moreover, such imbalances appear to persist regardless of specific exposure pathways, highlighting the need to incorporate bone quality assessment into early screening and population-level health monitoring of SF. Evaluating markers like PINP and osteocalcin could aid in identifying individuals at risk for progressive bone damage, thereby informing preventive strategies and guiding the development of targeted therapeutic interventions aimed at restoring bone homeostasis.
Our results reveal a gradient association between fluoride exposure pathways and SF severity, with coal-burning areas showing the highest proportion of moderate-to-severe cases, followed by drinking-water and brick-tea regions. This pattern likely stems from differences in exposure complexity, cumulative dose, and intervention effectiveness. Coal-burning fluorosis in Guizhou is particularly severe due to combined dietary and inhalation exposure. Locally used coal and clay contain high fluoride levels- 78 mg/kg and 1116 mg/kg respectively, with coal-clay briquettes reaching 313 mg/kg35. Roasting staple foods like corn leads to surface fluoride accumulation, raising levels to 20.26 mg/kg- 16 times that of unroasted samples36. Despite stove renovation efforts, traditional practices persist, limiting control efficacy25. In contrast, drinking-water-type fluorosis, though globally widespread (≥ 1.5 mg/L exposure in ~ 200 million people; 72.07 million in China)12, has declined due to national interventions like the “Three Five-Year Plans” and the “Three-Year Action Plan”37,38,39,40. Brick-tea-type fluorosis, prevalent in Sichuan, is caused by long-term consumption of fluoride-rich Camellia sinensis products41. Though it leads to chronic accumulation, symptoms are generally milder and reversible upon cessation, making behavioral intervention and education the primary prevention strategy42,43. In summary, the severity of SF correlates with the intensity and multiplicity of fluoride exposure routes. These findings emphasize the need for tailored interventions, ranging from technological and environmental to behavioral measures, based on specific exposure contexts.
This study has several notable strengths. It is the first to construct an interpretable predictive model for SF severity that integrates three major fluoride exposure pathways (coal-burning, drinking-water, and brick-tea types), enhancing both the predictive utility and generalizability of the findings across endemic regions. The identification of key early-stage indicators, such as pain level scores, joint function, and UF, underscores their evaluative value throughout the disease continuum, extending beyond the distinction between patients and healthy individuals to the differentiation among varying degrees of severity. These insights support more precise risk stratification and provide an evidence base for targeted, context-specific public health interventions in fluoride-affected populations worldwide.
Nonetheless, several limitations should be acknowledged. First, the relatively low specificity observed in some models may be attributable to the substantial overlap in clinical and biochemical profiles between mild and moderate-to-severe SF groups, as well as the inherently non-linear and non-separable nature of disease severity progression. These characteristics may increase the likelihood of false-positive classifications when distinguishing between severity levels. Second, the study did not achieve precise quantification of individual-level fluoride exposure. Although regional fluoride types were considered, environmental monitoring data (e.g., from air, water, and food) were not directly integrated, limiting the granularity of exposure assessment. Third, key nutritional factors such as calcium and vitamin D intake, known to modulate fluoride toxicity were not included in the model, potentially affecting the comprehensiveness of disease prediction. Finally, while age was identified as a strong predictor, it does not fully capture cumulative fluoride exposure or distinguish age-related bone degeneration from fluoride-specific skeletal damage. Longitudinal cohort studies with detailed dietary and exposure tracking are needed to disentangle these effects and validate the model’s applicability over time.
Conclusions
This study developed and validated an interpretable machine learning framework for predicting the severity of SF by integrating multi-dimensional predictors, including demographic factors, environmental exposure levels, and key biomarkers. The model provides a practical tool for early identification and risk assessment in affected populations. By supporting timely diagnosis and stratified management, it holds potential value for guiding public health interventions in fluoride-endemic regions. Further work is needed to optimize the model’s performance and evaluate its broader applicability in diverse geographical contexts.
Data availability
Data described in the manuscript will be made available upon request pending application to and approval from the corresponding author PL.
Code availability
The R code used for data preprocessing, feature selection, ML model building, and SHAP analysis in this study is available as Supplementary 2. The provided script includes the comprehensive analysis pipeline necessary to reproduce the findings described in this study.
Abbreviations
- SF:
-
Skeletal fluorosis
- UF:
-
Uurinary fluoride
- PINP:
-
Procollagen I N-terminal propeptide
- PTH:
-
Parathyroid hormone
- WBC:
-
White blood cell count
- NEUT:
-
Neutrophil count
- RBC:
-
Red blood cell count
- HCT:
-
Red blood cell hematocrit
- MCV:
-
Mean corpuscular volume
- MCH:
-
Mean corpuscular hemoglobin
- MCHC:
-
Mean corpuscular hemoglobin concentration
- PDW:
-
Platelet distribution width
- PCT:
-
Platelet crit
- SVM:
-
Support vector machine
- XGBoost:
-
Extreme gradient boosting
- RF:
-
Random forest
- KNN:
-
K-nearest neighbors
- DT:
-
Decision tree
References
Bello, L. D. Fluorosis: an ongoing challenge for India. Lancet Planet. Health. 4, e94–e95 (2020).
Podgorski, J. & Berg, M. Global analysis and prediction of fluoride in groundwater. Nat. Commun. 13, 4232 (2022).
Malinowska, E., Inkielewicz, I., Czarnowski, W. & Szefer, P. Assessment of fluoride concentration and daily intake by human from tea and herbal infusions. Food Chem. Toxicol. 46, 1055–1061 (2008).
Wei, W., Pang, S. & Sun, D. The pathogenesis of endemic fluorosis: Research progress in the last 5 years. J. Cell. Mol. Med. 23, 2333–2342 (2019).
Solanki, Y. S., Agarwal, M., Gupta, A. B., Gupta, S. & Shukla, P. Fluoride occurrences, health problems, detection, and remediation methods for drinking water: A comprehensive review. Sci. Total Environ. 807, 150601 (2022).
Zhao, L. et al. Spatial-Temporal Analysis of Drinking Water Type of Endemic Fluorosis — China, 2009–2022. CCDCW 6, 25–29 (2024).
Krishnamachari, K. A. Skeletal fluorosis in humans: a review of recent progress in the understanding of the disease. Prog Food Nutr. Sci. 10, 279–314 (1986).
Sellami, M. et al. Skeletal fluorosis: don’t miss the diagnosis! Skeletal Radiol. 49, 345–357 (2020).
Yang, C., Wang, Y. & Xu, H. Treatment and Prevention of Skeletal Fluorosis. Biomed. Environ. Sci. 30, 147–149 (2017).
Handelman, G. S. et al. eDoctor: machine learning and the future of medicine. J. Intern. Med. 284, 603–619 (2018).
Luo, K., Li, L. & Zhang, S. Coal-burning roasted corn and chili as the cause of dental fluorosis for children in southwestern China. J. Hazard. Mater. 185, 1340–1347 (2011).
Sun, D., Gao, Y. & Liu, H. Achievements and prospects of endemic disease prevention and control in China in past 70 years. zgggws 35, 793–796 (2019).
Cao, J. et al. The relationship of fluorosis and brick tea drinking in Chinese Tibetans. Environ. Health Perspect. 104, 1340–1343 (1996).
Singh, J. et al. Batch-balanced focal loss: a hybrid solution to class imbalance in deep learning. J. Med. Imaging (Bellingham). 10, 051809 (2023).
Megahed, F. M. et al. The class imbalance problem. Nat. Methods. 18, 1270–1272 (2021).
Zheng, Y., Zhang, C. & Liu, Y. Risk prediction models of depression in older adults with chronic diseases. J. Affect. Disord. 359, 182–188 (2024).
Xiao, L. et al. Development and validation of an interpretable machine learning model for predicting in-hospital mortality for ischemic stroke patients in ICU. Int. J. Med. Informatics. 198, 105874 (2025).
Richardson, E. et al. The receiver operating characteristic curve accurately assesses imbalanced datasets. Patterns (N Y). 5, 100994 (2024).
Fu, G. H., Yi, L. Z. & Pan, J. Tuning model parameters in class-imbalanced learning with precision-recall curve. Biom. J. 61, 652–664 (2019).
Meena, L. & Gupta, R. Skeletal Fluorosis. N. Engl. J. Med. 385, 1510–1510 (2021).
Kumar, H., Boban, M. & Tiwari, M. Skeletal fluorosis causing high cervical myelopathy. J. Clin. Neurosci. 16, 828–830 (2009).
Richards, A., Mosekilde, L. & Søgaard, C. H. Normal age-related changes in fluoride content of vertebral trabecular bone—Relation to bone quality. Bone 15, 21–26 (1994).
Chen, G. et al. Clinical characteristics and related factors of elbow joint disease in skeletal fluorosis. Chin. J. Rheumatol. 28, 392–397 (2024).
Singh, V. et al. Clinical and Radiological Study of Serum Fluoride in Relation to Knee Osteoarthritis. Malays Orthop. J. 14, 151–154 (2020).
Chen, J. et al. Coal utilization in China: environmental impacts and human health. Environ. Geochem. Health. 36, 735–753 (2014).
Rawat, N. & Bafana, A. Health risk modeling and risk factors of fluorosis in the fluoride endemic village of Maharashtra: A cross-sectional study. Environ. Monit. Assess. 196, 1230 (2024).
Rugg-Gunn, A. J., Villa, A. E. & Buzalaf, M. R. A. Contemporary biological markers of exposure to fluoride. Monogr. Oral Sci. 22, 37–51 (2011).
Choubisa*, S. L. & Choubisa, A. A. Brief Review of Ideal Bio-Indicators, Bio-Markers and Determinants of Endemic of Fluoride and Fluorosis. J. Biomedical Res. Environ. Sci. 2, 920–925 (2021).
Bayer, M. & Palicka, V. Utilization and Reference Values of Bone Turnover Markers: Osteocalcin and Procollagen Type 1 N-Propeptide. Biomarkers in Bone Disease. 239-252 (2017).
Rathore, B., Singh, M., Kumar, V. & Misra, A. Osteocalcin: an emerging biomarker for bone turnover. Int. J. Res. Med. Sci. 4, 3670–3674 (2016).
Xu, H., Wang, C., Zhao, Z., Zhang, W. & Li, G. Role of Oxidative Stress in Osteoblasts Exposed to Sodium Fluoride. Biol. Trace Elem. Res. 123, 109–115 (2008).
Xu, H., Zhou, Y., Zhang, X., Lu, P. & Li, G. Activation of PERK signaling through fluoride-mediated endoplasmic reticulum stress in OS732 cells. Toxicology 277, 1–5 (2010).
Park, Y. A. et al. Skeletal fluorosis secondary to methoxyflurane use for chronic pain. JBMR Plus. 8, ziae032 (2024).
Iamandii, I., Mazzoli, R., De Pasquale, L., Vinceti, M. & Filippini, T. Fluoride exposure and risk of fractures: a systematic review and dose-response meta-analysis. Eur. J. Public Health 34, ckae144.1420. (2024).
Luo, K., Li, L. & Zhang, S. Coal-burning roasted corn and chili as the cause of dental fluorosis for children in southwestern China. J. Hazard. Mater. 185, 1340–1347 (2011).
Guo, J., Wu, H., Zhao, Z., Wang, J. & Liao, H. Review on Health Impacts from Domestic Coal Burning: Emphasis on Endemic Fluorosis in Guizhou Province, Southwest China. Reviews of Environmental Contamination and Toxicology. 258, 1-25 (2021).
Su, X. et al. Achievements, main experiences, challenges and suggestions on control of endemic diseases in China in the last decade. Chin. J. Endemiol. 780–784 (2022).
Sun, D. et al. Current main problems and countermeasures for prevention and control of endemic diseases in China. Chin. J. Endemiol. 1–3 (2023).
Zhang, L. et al. Probabilistic risk assessment of Chinese residents’ exposure to fluoride in improved drinking water in endemic fluorosis areas. Environ. Pollut. 222, 118–125 (2017).
Wang, F. et al. Epidemiological analysis of drinking water-type fluorosis areas and the impact of fluorosis on children’s health in the past 40 years in China. Environ. Geochem. Health. 45, 9925–9940 (2023).
Malinowska, E., Inkielewicz, I., Czarnowski, W. & Szefer, P. Assessment of fluoride concentration and daily intake by human from tea and herbal infusions. Food Chem. Toxicol. 46, 1055–1061 (2008).
Izuora, K. et al. Skeletal Fluorosis from Brewed Tea. J. Clin. Endocrinol. Metabolism. 96, 2318–2324 (2011).
Kurland, E. S. et al. Recovery From Skeletal Fluorosis (an Enigmatic, American Case). J. Bone Miner. Res. 22, 163–170 (2007).
Acknowledgements
We sincerely thank all the participants of the China Fluorosis Cohort (CFC) for their valuable contributions to this study. We would also like to thank the local Centers for Disease Control and Prevention (CDC) teams in Shanxi, Guizhou, and Sichuan, as well as the faculty at Harbin Medical University for their support in participant recruitment and data collection. Finally, we would like to thank each of the participating doctors and community workers for their help during the implementation of the project!
Funding
This work was supported by the National Key Research and Development Program of China (Grant no: 2022YFC2503003) and Guizhou Provincial Major Scientific and Technological Program (2024-015).
Author information
Authors and Affiliations
Contributions
H.L. and J.Z. wrote the main manuscript text and performed data analysis with model fitting. S.W., T.H., and Y.L. conducted data pre-processing and verification. Q.J., J.L., and H.Y. performed field investigations and data collection. Y.G. prepared visualizations and reviewed the manuscript. Z.Z. and P.L. supervised the study and revised the manuscript. All authors reviewed and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Guizhou Medical University (Approval No. 2023 − 189). Written informed consent was obtained from all participants.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Long, H., Zeng, J., Wei, S. et al. Predicting skeletal fluorosis severity using machine learning across diverse fluoride-exposed populations in China. Sci Rep 16, 13858 (2026). https://doi.org/10.1038/s41598-026-43429-4
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-026-43429-4




