Harnessing machine learning for the development, validation, and prognostic evaluation of MASHRisk score: insights from a multicohort study

Ye, Bicheng; Teng, Yuming; Chen, Qianqian; Zhang, Jie; Li, Shun; Dong, Li; Xu, Changan; Qiao, Xiao; Miao, Xiaye

doi:10.1038/s41746-025-02220-x

Article
Open access
Published: 08 December 2025

npj Digital Medicine volume 9, Article number: 47 (2026)

Abstract

Metabolic dysfunction-associated steatohepatitis (MASH) increases liver-related mortality risk more than tenfold, yet reliable predictive biomarkers remain scarce. This study developed the MASHRisk score, a blood-based non-invasive diagnostic tool integrating routine clinical and biochemical panels. Using ten machine learning algorithms, the score was derived from 218 participants and validated across multiple cohorts (n = 93, 96, and 26,256). The MASHRisk score demonstrated robust diagnostic performance, with area under the receiver operating characteristic curve (AUC) values of 0.791, 0.793, 0.806, and 0.796 in the training cohort, validation cohort, test cohort 1, and test cohort 2, respectively. It emerged as an independent predictor of MASH (p < 0.001) and outperformed existing indices including Fibrosis-4 (FIB-4), Aspartate Aminotransferase to Platelet Ratio Index (APRI), Aspartate Aminotransferase to Alanine Aminotransferase Ratio (AAR), and Non-Alcoholic Fatty Liver Disease Fibrosis Score (NFS). In a prognostic cohort of 390,574 individuals, high-risk participants showed significantly elevated hazard ratios (HR) for liver-related mortality (HR: 12.296), MASH (HR: 12.829), cirrhosis (HR: 8.863), hepatocellular carcinoma (HR: 9.278), atherosclerotic cardiovascular disease (ASCVD)-related mortality (HR: 2.303), and all-cause mortality (HR: 1.744) compared to low-risk individuals (all p < 0.001). The MASHRisk score represents a validated, user-friendly tool for early detection, risk stratification, and outcome prediction in MASH.

Introduction

Metabolic dysfunction-associated steatotic liver disease (MASLD) is strongly associated with obesity, type 2 diabetes mellitus (T2D), and atherosclerotic cardiovascular disease (ASCVD), and is linked to an increased risk of both all-cause and liver-related mortality^1,2. The progression from MASLD to metabolic dysfunction-associated steatohepatitis (MASH) is associated with a two-fold increase in all-cause mortality and more than a tenfold increase in the risks of hepatocellular carcinoma and liver-related mortality³. Thus, the identification of MASH is crucial for predicting individual clinical outcomes and informing personalized treatment strategies⁴.

Liver biopsy has long been regarded as the gold standard for diagnosing MASH⁵. However, its routine implementation in clinical settings is constrained by its invasive nature, poor patient acceptance, high costs, and substantial inter-observer variability. Moreover, the procedure involves risks, including post-procedural pain in nearly 50% of patients, and life-threatening hemorrhage in 0.6% to 1% of cases⁶. To enhance the diagnosis of MASH at the population level, it is crucial to develop non-invasive tests (NITs) that exhibit high diagnostic performance. This approach would not only decrease reliance on liver biopsies but also aid in stratifying patients who need follow-up treatments. Over the last 2 years, numerous international validation studies have investigated non-invasive methods for detecting MASH accompanied by fibrosis. These studies utilized a combination of imaging techniques and circulating biomarkers, such as the FAST (FibroScan-aspartate aminotransferase) score⁷ and NIS4⁸. However, crucial components for these evaluations, including vibration-controlled transient elastography, miR-34a-5p, YKL-40, and alpha-2 macroglobulin, are frequently inaccessible within primary care contexts. In light of these limitations, machine learning-based integration of routine clinical laboratory parameters and physical examination measurements represents a promising alternative strategy⁹. Machine learning algorithms excel at capturing complex, non-linear patterns and multi-dimensional interactions among readily available clinical variables, enabling superior predictive performance compared to traditional scoring systems. This approach facilitates the development of practical, cost-effective screening tools for population-level MASH detection in primary care settings¹⁰.

Consequently, there remains an unmet need to develop accurate tools using readily available laboratory and clinical variables to identify individuals at high risk of developing MASH and its complications, including liver-related events and mortality. Such tools would enable early detection and personalized monitoring of high-risk individuals and patients with MASH in primary care and other non-specialized healthcare settings, before the development of cirrhosis or its complications.

Results

Study population characteristics and cohort composition

The workflow of this study is illustrated in Fig. 1. The study included 218 participants in the training cohort, 93 in the validation cohort, 96 in test cohort 1, 26,256 in test cohort 2, and 390,574 in the prognostic cohort. Table 1 presents the baseline characteristics of the participants. Across cohorts, the mean age ranged from 31 to 57 years and the proportion of males from 31 to 71%. The training cohort, validation cohort, and test cohort 1 were obtained from hospitals in China, where the MASH positivity rate was between 44 and 48%. In contrast, test cohort 2 was derived from the general population in the United Kingdom and had a MASH positivity rate of 4.4%. The demographic diversity of the participants should, in principle, support the development of a broadly applicable MASH prediction model.

**Table 1: Baseline characteristics**

Development of the MASHRisk score

We calculated the correlation coefficients for the selected features in the training cohort, noting that the coefficients for aspartate aminotransferase (AST) and alanine aminotransferase (ALT), as well as those for low-density lipoprotein cholesterol (LDL-C) and total cholesterol, exceeded 0.7 (Fig. 2a). Employing the principle of maximizing the area under the curve (AUC) for feature selection, we ultimately identified the following features: AST, LDL-C, sex, age, body mass index (BMI), gamma-glutamyl transferase (GGT), albumin, total bilirubin (TB), uric acid, hemoglobin A1c (HbA1c), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C) (Fig. 2b, c). These features were subsequently utilized in the Boruta algorithm¹¹, which identified age, BMI, AST, HbA1c, and TG as the final modeling features (Supplementary Fig. 1a). Following this, we developed models using ten distinct machine learning algorithms based on the training set and assessed their performance on the validation set. Our results revealed that the model generated using the Partial Least Squares (PLS) algorithm achieved the highest AUC for predicting MASH in the validation set (Supplementary Fig. 1b, Supplementary Table 1). Consequently, we selected the PLS-derived model as the MASHRisk score. In the training, validation, test cohort 1, and test cohort 2, the MASHRisk score predicted MASH with AUC values of 0.791 (0.732–0.851), 0.793 (0.698–0.887), 0.806 (0.717–0.895), and 0.796 (0.785–0.808), respectively (Fig. 2d–g). The MASHRisk scores for participants diagnosed with MASH were significantly higher than those for non-MASH participants (all p < 0.001) (Fig. 2d–g). To promote broader utilization among researchers, we implemented a freely accessible online tool for MASHRisk score calculation, which can be accessed at https://njph.shinyapps.io/mashrisk/.
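As a minimal illustration of the AUC metric used throughout this section, the sketch below computes it as the probability that a randomly chosen MASH case receives a higher score than a randomly chosen non-case (the Mann–Whitney interpretation). It is written in plain Python rather than the R packages used in the study, the function name is hypothetical, and the data are toy values, not study data.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a positive > score of a negative).

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores and labels (1 = MASH, 0 = no MASH); not study data.
toy_scores = [0.91, 0.62, 0.55, 0.48, 0.30, 0.12]
toy_labels = [1, 1, 0, 1, 0, 0]
print(round(auc_mann_whitney(toy_scores, toy_labels), 3))
```

The pairwise loop is quadratic in cohort size, which is fine for a few hundred participants; a rank-based formulation would be preferable for a cohort the size of test cohort 2.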

**Fig. 2: Development and validation of the MASHRisk score.**

Assessment of the MASHRisk score

Multivariable logistic regression analyses of the MASHRisk score in the four cohorts, adjusted for sex, age, and BMI, showed that the score remained a statistically significant independent predictor of MASH (all p < 0.001) (Fig. 3a–d). Receiver operating characteristic (ROC) curve analysis in the training cohort identified an optimal cutoff value of 0.457, which was then used to generate confusion matrices for every cohort. In the training cohort, the model had a sensitivity of 0.729 and a specificity of 0.738 (Fig. 3e, Supplementary Table 2). In the validation cohort, sensitivity was lower at 0.689 and specificity was higher at 0.833 (Fig. 3f, Supplementary Table 2). In test cohort 1, sensitivity was 0.605 and specificity was 0.906 (Fig. 3g, Supplementary Table 2). In test cohort 2, sensitivity was 0.779 and specificity was 0.689 (Fig. 3h, Supplementary Table 2). Notably, test cohort 2 reflects a general population with a MASH positivity rate of only 4.4%, and the MASHRisk score showed a negative predictive value of 99.0% in this cohort (Fig. 3h, Supplementary Table 2). Overall, these results suggest that the MASHRisk score is a valuable tool for predicting MASH and demonstrates consistent performance across different populations.
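The cutoff procedure described above (choose a threshold on the training cohort, then apply it unchanged elsewhere) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' R code: the function names are hypothetical, a score at or above the cutoff is called positive, ties in Youden's index are resolved in favor of the lowest cutoff, and all numbers are toy values.

```python
def confusion(scores, labels, cutoff):
    """Return (TP, FP, TN, FN), calling score >= cutoff positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Sensitivity, specificity, and negative predictive value."""
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


def youden_cutoff(scores, labels):
    """Cutoff maximizing J = sensitivity + specificity - 1."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        r = rates(*confusion(scores, labels, cut))
        j = r["sensitivity"] + r["specificity"] - 1.0
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy training cohort: pick the cutoff here only.
train_scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(train_scores, train_labels)

# Toy external cohort: the cutoff is fixed, never re-estimated.
test_scores = [0.10, 0.45, 0.55, 0.65, 0.95]
test_labels = [0, 1, 0, 1, 1]
print(cutoff, rates(*confusion(test_scores, test_labels, cutoff)))
```

Keeping the threshold fixed is what makes the sensitivity and specificity in the later cohorts an honest external estimate; re-optimizing it per cohort would inflate them.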

Comparison of the MASHRisk score with other NITs

We compared the predictive performance of the MASHRisk score against the Fibrosis-4 Index (FIB-4), AST to Platelet Ratio Index (APRI), AST to ALT Ratio (AAR), and Non-Alcoholic Fatty Liver Disease Fibrosis Score (NFS) using ROC curves, calibration curves, and decision curve analysis (DCA). It is important to note that, due to the absence of platelet (PLT) data in test cohort 1, only the AAR could be calculated in this cohort. Across the four cohorts, the ROC curves indicated that the MASHRisk score exhibited the highest AUC for predicting MASH (Fig. 4a). Calibration curves demonstrated that the MASHRisk score's predicted probabilities were closest to the observed probabilities, reflected in the lowest Brier score (Fig. 4b). Additionally, DCA revealed that the MASHRisk score provided greater clinical net benefit than the other NITs (Fig. 4c). In summary, the MASHRisk score offers superior predictive performance for MASH compared to FIB-4, APRI, AAR, and NFS, supporting its use in clinical practice.
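For reference, the comparator indices can be computed as in the sketch below. The article does not restate their formulas, so the expressions here are the commonly published definitions, and the units (AST and ALT in U/L, platelets in 10^9/L, albumin in g/dL) and the default AST upper limit of normal of 40 U/L are assumptions that should be checked against local laboratory conventions. Function names are hypothetical.

```python
from math import sqrt


def aar(ast_u_l, alt_u_l):
    """AST to ALT ratio."""
    return ast_u_l / alt_u_l


def fib4(age_years, ast_u_l, alt_u_l, plt_1e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (plt_1e9_l * sqrt(alt_u_l))


def apri(ast_u_l, plt_1e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / upper limit of normal) / platelets x 100.

    The default upper limit of normal is an assumed value.
    """
    return (ast_u_l / ast_uln_u_l) / plt_1e9_l * 100.0


def nfs(age_years, bmi, diabetes_or_ifg, ast_u_l, alt_u_l, plt_1e9_l, albumin_g_dl):
    """NAFLD fibrosis score, using the commonly cited coefficients."""
    return (
        -1.675
        + 0.037 * age_years
        + 0.094 * bmi
        + 1.13 * (1 if diabetes_or_ifg else 0)
        + 0.99 * aar(ast_u_l, alt_u_l)
        - 0.013 * plt_1e9_l
        - 0.66 * albumin_g_dl
    )


# Illustrative patient, not taken from any study cohort.
print(round(fib4(50, 40, 40, 200), 3))
print(apri(40, 200))
print(aar(40, 40))
print(round(nfs(50, 30, True, 40, 40, 200, 4.0), 3))
```

FIB-4, APRI, and NFS all take a platelet count as input, which is why only AAR could be evaluated in a cohort without PLT data.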

**Fig. 4: Comparison of the MASHRisk score with other non-invasive tests (NITs).**

Model interpretation

We utilized the SHapley Additive exPlanations (SHAP) framework to interpret the MASHRisk score. The mean absolute SHAP values illustrated the impact of various risk factors, revealing that AST was the most significant variable (Supplementary Fig. 2a, b). To further elucidate the interpretability of the MASHRisk score, we highlighted two representative cases. SHAP values were employed to assess the influence of each feature on the predictions made by the MASHRisk score. In this study, diminished SHAP values correlated with a reduced likelihood of MASH, whereas elevated SHAP values indicated an increased probability. For example, patient 1, diagnosed with MASH, presented with an elevated SHAP value and a MASHRisk score of 0.884, suggestive of a greater MASH likelihood (Supplementary Fig. 2c). Conversely, patient 2, without MASH, exhibited a lower SHAP value and a prediction score of 0.353, indicating a decreased probability of MASH (Supplementary Fig. 2d).
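To make the additive reading of SHAP values concrete, the sketch below uses the closed form that holds for a linear predictor when features are treated as independent: each feature's contribution is its weight times its deviation from the background mean, and the contributions sum to the difference between the individual prediction and the average prediction. The weights, feature values, and function names are hypothetical; the published MASHRisk coefficients are not given in the text, and the study itself used the shapviz R package.

```python
FEATURES = ["age", "BMI", "AST", "HbA1c", "TG"]


def linear_predict(intercept, weights, x):
    """Linear predictor: intercept + sum of weight * feature value."""
    return intercept + sum(w * v for w, v in zip(weights, x))


def linear_shap(weights, x, background_means):
    """Per-feature contribution w_i * (x_i - mean_i) for a linear model."""
    return [w * (v - m) for w, v, m in zip(weights, x, background_means)]


# Hypothetical weights on z-scored features (background mean is 0 for each).
intercept = 0.0
weights = [0.10, 0.30, 0.50, 0.20, 0.15]
means = [0.0, 0.0, 0.0, 0.0, 0.0]

# One hypothetical standardized patient.
patient = [0.5, 1.0, 2.0, -0.5, 0.0]

phi = linear_shap(weights, patient, means)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")

# Additivity: baseline prediction plus contributions equals the prediction.
baseline = linear_predict(intercept, weights, means)
print(round(baseline + sum(phi), 6), round(linear_predict(intercept, weights, patient), 6))
```

Positive contributions push the individual prediction above the cohort average and negative ones pull it below, which matches the direction of interpretation used for the two representative patients.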

Prognostic value of the MASHRisk score

In the prognostic cohort, participants were stratified into two categories: low-risk (n = 378,978; 97%) and high-risk (n = 11,596; 3%). Liver-related mortality was observed in 0.1% (n = 417) of the low-risk group, compared to 1.3% (n = 152) of the high-risk group. The high-risk group had a significantly elevated hazard ratio (HR) for liver-related mortality compared to the low-risk group (HR = 12.296, 95% CI: 10.212–14.806, p < 0.001) (Fig. 5a). Time-dependent ROC analysis revealed strong prognostic performance, with AUC values of 0.83 at 3 years, 0.85 at 5 years, and 0.80 at 10 years (Fig. 5a). The incidence of MASH events was 0.4% in the low-risk group and 1.2% (n = 141) in the high-risk group, with the latter exhibiting a significantly increased hazard ratio (HR = 12.829, 95% CI: 10.560–15.585, p < 0.001) compared to the low-risk group (Fig. 5b). Time-dependent ROC analysis for MASH events yielded AUCs of 0.74 at 3 years, 0.79 at 5 years, and 0.81 at 10 years (Fig. 5b). For cirrhosis events, the incidence was 0.4% (n = 1676) in the low-risk group and 3.8% (n = 446) in the high-risk group. The high-risk group showed a significantly increased risk of cirrhosis compared to the low-risk group (HR = 8.863, 95% CI: 7.985–9.839, p < 0.001) (Fig. 5c). Time-dependent ROC analyses for cirrhosis yielded an AUC of 0.75 at 3, 5, and 10 years (Fig. 5c). Regarding liver cancer, the incidence was 0.07% (n = 254) in the low-risk group and 0.6% (n = 72) in the high-risk group. The high-risk group exhibited a significantly elevated risk of developing liver cancer compared to the low-risk group (HR = 9.278, 95% CI: 7.142–12.059, p < 0.001) (Fig. 5d). Time-dependent ROC analysis for liver cancer yielded AUC values of 0.77 at 3 years, 0.75 at 5 years, and 0.74 at 10 years (Fig. 5d). Finally, the high-risk group showed a significantly increased risk of ASCVD-related mortality and all-cause mortality compared to the low-risk group, with hazard ratios of 2.303 (95% CI: 2.047–2.592, p < 0.001) and 1.744 (95% CI: 1.661–1.831, p < 0.001), respectively (Supplementary Fig. 3a, b).

**Fig. 5: Prognostic value of MASHRisk score in the prognostic cohort.**

Enrichment analysis of the MASHRisk score

To explore the biological mechanisms of the MASHRisk score, we performed pathway enrichment analysis. Both over-representation analysis (ORA) and gene set enrichment analysis (GSEA) revealed that the high-risk group was significantly enriched in metabolic and immune-related pathways, such as the carboxylic acid metabolic process, carbon metabolism, and chemokine signaling pathway (Fig. 6a, b). This finding is in line with the known pathophysiology of MASH, as metabolic dysregulation and chronic inflammation are central to its development^12,13.

Discussion

We developed a biochemical examination-based MASH prediction score in a Chinese liver biopsy cohort using 14 candidate clinical parameters. Using a machine learning pipeline, we found that a combination of five parameters, namely age, BMI, AST, HbA1c, and TG, effectively predicts MASH. The MASHRisk score was then externally validated across various ethnic groups within a cohort from the British population (test cohort 2), achieving an AUC of 0.796. The score also shows a clinically meaningful improvement in detecting MASH compared to the widely used FIB-4 and other NITs. A strong association with liver-related events underscores the clinical utility of this referral tool.

Each element of the MASHRisk score has previously shown a robust individual correlation with MASH. Obesity is a recognized risk factor for fatty liver disease, and patients frequently exhibit elevated liver enzyme levels, thereby justifying the inclusion of BMI and AST as predictive factors^14,15,16. Age is a primary risk factor for chronic liver diseases, including MASLD and MASH¹⁷. Elevated TG levels are often observed in patients with metabolic syndrome, which is a significant risk factor for the development and progression of MASH¹⁷. High TG levels can contribute to hepatic steatosis and inflammation, thereby increasing the risk of MASH¹⁷. HbA1c, a marker of long-term blood glucose control, is closely related to T2D, which is one of the strongest risk factors for MASH¹⁸. The MASHRisk score therefore captures elevated liver enzymes, diabetes, obesity, and hyperlipidemia, which may be one of the reasons for its strong performance.

MASLD has become the primary contributor to liver-related health issues and deaths globally¹⁹. Recognizing patients with MASLD who are at an increased risk for liver disease-related mortality is crucial in clinical settings. Previous research indicates that significant hepatic steatosis, assessed through ultrasonography or by using surrogate markers such as liver enzyme levels and various steatosis scores (including the ultrasound liver fat index, fatty liver index, NAFLD liver fat score, and hepatic steatosis index), is linked to a higher risk of liver-related mortality^20,21. Nevertheless, these parameters alone are not highly effective; for example, the mortality risk only rises by four to five times among individuals exhibiting the highest liver enzyme levels or intermediate to high steatosis indices. Simple fibrosis scores, including AAR, FIB-4, APRI, and NFS, are commonly employed in clinical practice to estimate advanced liver fibrosis and may contribute to minimizing the need for liver biopsies^22,23. We found that the performance of the MASHRisk score was superior to several established NITs, including AAR, APRI, FIB-4, and NFS, in diagnosing MASH. It is important to note that the laboratory inputs of the MASHRisk score are entirely biochemical, whereas FIB-4, NFS, and APRI also incorporate complete blood count parameters such as PLT. For those indices, this increases the complexity of clinical implementation and testing costs, as it necessitates an additional blood sample for PLT testing. Incorporating the MASHRisk score as a targeted screening tool in primary care settings could alleviate unnecessary patient anxiety. Its user-friendly design encourages adoption across various healthcare settings, including resource-limited settings where advanced diagnostic tools may be unavailable.
By integrating this score into routine check-ups, healthcare providers can prioritize patients for further diagnostic evaluation, lifestyle modifications, or treatment, ultimately enhancing patient outcomes. From a public health perspective, the widespread adoption of the MASHRisk score can help mitigate the burden of MASH by facilitating early detection and intervention. This, in turn, may lower healthcare costs related to advanced liver disease and improve the quality of life for affected individuals.

While the MASHRisk score demonstrates strong predictive performance, successful implementation requires addressing key challenges: integration with electronic health records, laboratory standardization across facilities, and clinician training on score interpretation and patient management. We propose practical solutions, including the development of digital calculators and mobile applications, standardized laboratory protocols, and clear implementation guidelines with defined action thresholds for referral and intervention. The score’s reliance on routinely available parameters makes it suitable for resource-limited settings where advanced diagnostics are inaccessible, maximizing its real-world impact in reducing the global burden of MASH.

This study has several limitations. First, MASH in test cohort 2 was defined by imaging surrogates (proton density fat fraction (PDFF) together with cT1) rather than by histology, so some participants may have been misclassified. Second, the potential for longitudinal monitoring of liver health using the MASHRisk score has yet to be verified. Long-term prospective follow-up studies are essential to validate its accuracy and determine its clinical utility in tracking disease progression or regression in response to therapeutic interventions.

In summary, the MASHRisk score is a validated, user-friendly, non-invasive tool applicable to diverse populations. It exhibits robust predictive ability across various cohorts and effectively assesses the long-term risk of liver-related outcomes. As healthcare systems transition towards managing chronic diseases on a population level, the MASHRisk score presents a scalable approach for early detection and risk stratification. With its adaptability to different healthcare environments^24,25, MASHRisk shows potential for practical implementation and for alleviating the global burden of MASH.

Methods

Study design

The primary objective of the first phase was to develop and validate a blood-based NIT, the MASHRisk score, which integrates routine clinical and biochemical examinations to diagnose MASH. The second phase aimed to evaluate whether the MASHRisk score could predict subsequent liver-related events in patients with MASLD. The candidate predictors and their distributions are detailed in Table 1. The workflow of this study is illustrated in Fig. 1, which was created using the BioGDP platform (https://biogdp.com/)²⁶.

Participants

The study involved participants from the First Affiliated Hospital of Nanjing Medical University in Nanjing, China, who were randomly divided into a training cohort and a validation cohort in a 7:3 ratio. These MASLD patients were treated within the departments of Infectious Disease or General Surgery and Bariatric and Metabolic Surgery, with data collected from August 2023 to July 2024. Inclusion criteria mandated that patients had undergone a liver biopsy and histologic examination. Participants were excluded if they met any of the following criteria: (1) viral hepatitis; (2) previous cirrhosis; (3) previous liver cancer or other malignancies; (4) incomplete baseline information. Ethical approval was granted by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University (2023-SR-398). The test cohort 1 was derived from a previous study, and the specific inclusion and exclusion criteria can be referenced in that study²⁷. Test cohort 2 and the prognostic cohort were derived from the UK Biobank, a large prospective cohort study that recruited approximately 500,000 participants aged 40–69 years from across the United Kingdom between 2006 and 2010. The inclusion criteria for test cohort 2 required participants to have undergone PDFF evaluation. The inclusion criteria for the prognostic cohort specified that participants should have complete follow-up information. The exclusion criteria for these cohorts included: (1) viral hepatitis; (2) previous cirrhosis; (3) previous liver cancer or other malignancies; and (4) incomplete baseline information.

Liver histology

Liver sections embedded in paraffin were subjected to a standard staining protocol, utilizing hematoxylin and eosin as well as Masson trichrome. The evaluation of pathological changes in each sample was conducted by two experienced pathologists who were blinded to the experimental groups. This assessment was carried out according to the scoring system established by the NASH Clinical Research Network²⁸.

Outcome definitions and other definitions

In the training cohort, validation cohort, and test cohort 1, an NAFLD activity score (NAS) of 5 or higher was used to diagnose MASH²⁷. In test cohort 2, MASH was defined as a PDFF exceeding 5.5% and an iron-corrected T1 mapping (cT1) value greater than 800 ms. To evaluate prognostic accuracy in the prognostic cohort, key outcomes included liver-related events (the development of MASH, cirrhosis, hepatocellular carcinoma, and liver-related mortality), as well as ASCVD and all-cause mortality. A comprehensive definition of liver-related events is provided in Supplementary Table 3.

Modeling strategy

Within the derivation cohort, the Boruta algorithm, implemented via the Boruta R package¹¹, was first used to identify the variables most relevant to MASH. Subsequently, ten machine learning models were developed using the caret R package²⁹, based on the following algorithms: K-Nearest Neighbors, Gradient Boosting Machine, Recursive Partitioning and Regression Trees, Neural Network, Support Vector Machine, AdaBoost, Random Forest, Extreme Gradient Boosting, Generalized Linear Model with Elastic Net Regularization (GLMNET), and PLS. The PLS algorithm exhibited the best predictive performance in the validation cohort and was selected to construct the MASHRisk score, with grid search identifying optimal performance at ncomp = 1. To aid interpretation of the best model's predictions, SHAP values were computed and visualized with the shapviz R package³⁰.
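The following is a minimal sketch of what a one-component PLS fit (ncomp = 1) involves, combined with z-score scaling estimated on the training data only. It is a plain-Python illustration under several assumptions and is not the authors' caret pipeline: it regresses a 0/1 label with the PLS1 algorithm and returns a continuous score rather than a calibrated class probability, the helper names are hypothetical, and the two-feature data are toy values.

```python
from math import sqrt


def fit_scaler(rows):
    """Column means and sample standard deviations from the training rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return means, sds


def apply_scaler(rows, means, sds):
    """z-score any cohort with parameters estimated on the training cohort."""
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def fit_pls1(X, y):
    """One-component PLS regression on standardized X; returns (intercept, coef)."""
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    p = len(X[0])
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(row[j] * yi for row, yi in zip(X, yc)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent scores and their regression coefficient on y.
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return y_mean, [wj * q for wj in w]


def predict(X, intercept, coef):
    return [intercept + sum(a * b for a, b in zip(row, coef)) for row in X]


# Toy training data with two features (for example age and BMI); not study data.
train_X = [[30.0, 22.0], [45.0, 27.5], [52.0, 31.0], [60.0, 35.5]]
train_y = [0, 0, 1, 1]
means, sds = fit_scaler(train_X)
intercept, coef = fit_pls1(apply_scaler(train_X, means, sds), train_y)

# A new participant is scaled with the training parameters before scoring.
new_X = apply_scaler([[55.0, 33.0]], means, sds)
print(predict(new_X, intercept, coef))
```

With a single component the model reduces to one weighted sum of the standardized inputs, which is part of why such a score is straightforward to deploy as a calculator.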

Proteomics assays

High-throughput proteomics measurements were conducted using Olink®³¹. Details pertaining to the Olink proteomics assay, as well as data processing and quality control measures, have been documented in other sources. In summary, 2941 protein quantification values were obtained from their Normalized Protein eXpression, presented on Olink’s arbitrary log₂ scale³¹. Differential protein analysis was conducted using the limma package.

Enrichment analysis

Protein pathway enrichment analysis was conducted using two strategies. First, we employed the ORA method using Metascape³². Additionally, we performed GSEA using the clusterProfiler package³³. Differentially expressed proteins were identified using the limma package in R. Proteins with adjusted P value (adj.P.Val) < 0.05 and log2 fold change (log2FC) > 0.5 were selected for ORA.
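The selection rule for ORA stated above can be expressed as a simple filter. The sketch below is in Python for illustration only (the study used limma in R), the protein identifiers and values are made up, and the field names are hypothetical. It applies the thresholds exactly as written, so proteins with a negative log2 fold change are not selected; whether the authors also considered down-regulated proteins is not stated.

```python
def select_for_ora(results, p_cut=0.05, lfc_cut=0.5):
    """Keep proteins with adjusted P value < p_cut and log2FC > lfc_cut."""
    return [
        r["protein"]
        for r in results
        if r["adj_p_val"] < p_cut and r["log2fc"] > lfc_cut
    ]


# Made-up differential analysis output; identifiers are placeholders.
toy_results = [
    {"protein": "PROT_A", "log2fc": 1.20, "adj_p_val": 0.001},
    {"protein": "PROT_B", "log2fc": 0.30, "adj_p_val": 0.010},
    {"protein": "PROT_C", "log2fc": 0.80, "adj_p_val": 0.200},
    {"protein": "PROT_D", "log2fc": -0.90, "adj_p_val": 0.002},
]
print(select_for_ora(toy_results))
```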

Statistical analysis

We used z-score standardization during model development, with scaling parameters estimated on the training set and applied to validation and test cohorts. Missing observations were removed prior to scaling, and categorical variables were not scaled. Continuous variables are reported as means accompanied by standard deviations, while categorical variables are shown as counts and percentages. To evaluate relationships between the variables, Pearson or Spearman correlation methods were employed. The diagnostic efficacy of the MASHRisk score and other indices was determined through ROC curves. The optimal cutoff was determined on the training set by maximizing Youden’s index on the ROC. This threshold was then fixed and applied unchanged to the validation and test cohorts. Calibration curves were created to assess the alignment between predicted probabilities and actual outcomes. Furthermore, DCA was performed to investigate the clinical relevance of the models by calculating the net benefit of applying these scores at varying threshold probabilities. The follow-up period was established using the Kaplan–Meier approach. A p value of less than 0.05 was considered statistically significant. All statistical analyses were carried out using R version 4.4.2.
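Two of the quantities named above, the Brier score used alongside calibration curves and the net benefit used in DCA, have short standard definitions that can be sketched as follows. This is a plain-Python illustration with toy numbers and hypothetical function names, not the authors' R implementation.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def net_benefit(probs, labels, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which everyone is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted probabilities and outcomes; not study data.
toy_probs = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 0, 1, 0]

print(round(brier_score(toy_probs, toy_labels), 3))
print(round(net_benefit(toy_probs, toy_labels, 0.25), 3))
print(round(net_benefit_treat_all(toy_labels, 0.25), 3))
```

A decision curve plots the net benefit over a range of thresholds; a score is clinically useful at a given threshold when its net benefit exceeds both the treat-all line and zero (the treat-none strategy).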

Data availability

The data for the training cohort and the validation cohort are not publicly accessible to protect patient privacy. Test cohort 1 is derived from a previously published study that is publicly available²⁷. Test cohort 2 and the prognosis cohort are sourced from the UK Biobank, with Application Number 351033.

Code availability

While the source code is not openly accessible, it may be shared with eligible researchers who submit reasonable requests to the corresponding author.

References

Targher, G., Tilg, H. & Byrne, C. D. Non-alcoholic fatty liver disease: a multisystem disease requiring a multidisciplinary and holistic approach. Lancet Gastroenterol. Hepatol. 6, 578–588 (2021).
Article PubMed Google Scholar
Simon, T. G., Roelstraete, B., Khalili, H., Hagström, H. & Ludvigsson, J. F. Mortality in biopsy-confirmed nonalcoholic fatty liver disease: results from a nationwide cohort. Gut 70, 1375–1382 (2021).
Article PubMed Google Scholar
Younossi, Z. M. et al. Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64, 73–84 (2016).
Article PubMed Google Scholar
Rinella, M. E. & Sanyal, A. J. Management of NAFLD: a stage-based approach. Nat. Rev. Gastroenterol. Hepatol. 13, 196–205 (2016).
Article CAS PubMed Google Scholar
Chalasani, N. et al. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology 67, 328–357 (2018).
Article PubMed Google Scholar
Neuberger, J. et al. Guidelines on the use of liver biopsy in clinical practice from the British Society of Gastroenterology, the Royal College of Radiologists and the Royal College of Pathology. Gut 69, 1382–1403 (2020).
Article PubMed Google Scholar
Newsome, P. N. et al. FibroScan-AST (FAST) score for the non-invasive identification of patients with non-alcoholic steatohepatitis with significant activity and fibrosis: a prospective derivation and global validation study. Lancet Gastroenterol. Hepatol. 5, 362–373 (2020).
Article PubMed PubMed Central Google Scholar
Harrison, S. A. et al. A blood-based biomarker panel (NIS4) for non-invasive diagnosis of non-alcoholic steatohepatitis and liver fibrosis: a prospective derivation and global validation study. Lancet Gastroenterol. Hepatol. 5, 970–985 (2020).
Article PubMed Google Scholar
Cai, G. et al. Artificial intelligence-based models enabling accurate diagnosis of ovarian cancer using laboratory tests in China: a multicentre, retrospective cohort study. Lancet Digit. Health 6, e176–e186 (2024).
Article CAS PubMed Google Scholar
Greener, J. G., Kandathil, S. M., Moffat, L. & Jones, D. T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23, 40–55 (2022).
Article CAS PubMed Google Scholar
Kursa, M. B. & Rudnicki, W. R. Feature Selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
Article Google Scholar
Steinberg, G. R., Valvano, C. M., De Nardo, W. & Watt, M. J. Integrative metabolism in MASLD and MASH: pathophysiology and emerging mechanisms. J. Hepatol. 83, 584–595 (2025).
Article CAS PubMed Google Scholar
Sawada, K., Chung, H., Softic, S., Moreno-Fernandez, M. E. & Divanovic, S. The bidirectional immune crosstalk in metabolic dysfunction-associated steatotic liver disease. Cell Metab. 35, 1852–1871 (2023).
Article CAS PubMed PubMed Central Google Scholar
Nussbaumerova, B. & Rosolova, H. Obesity and dyslipidemia. Curr. Atheroscler. Rep. 25, 947–955 (2023).
Article CAS PubMed Google Scholar
Chalasani, N. et al. The diagnosis and management of non-alcoholic fatty liver disease: practice guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Hepatology 55, 2005–2023 (2012).
Article PubMed Google Scholar
Njei, B., Osta, E., Njei, N., Al-Ajlouni, Y. A. & Lim, J. K. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci. Rep. 14, 8589 (2024).
Article CAS PubMed PubMed Central Google Scholar
He, Q. J. et al. Recent advances in age-related metabolic dysfunction-associated steatotic liver disease. World J. Gastroenterol. 30, 652–662 (2024).
Article CAS PubMed PubMed Central Google Scholar
Crommen, S. et al. Prognostic characteristics of metabolic dysfunction-associated steatotic liver in patients with obesity who undergo one anastomosis gastric bypass surgery: a secondary analysis of randomized controlled trial data. Nutrients 16, 3210 (2024).
Riazi, K. et al. The prevalence and incidence of NAFLD worldwide: a systematic review and meta-analysis. Lancet Gastroenterol. Hepatol. 7, 851–861 (2022).
Unalp-Arida, A. & Ruhl, C. E. Noninvasive fatty liver markers predict liver disease mortality in the U.S. population. Hepatology 63, 1170–1183 (2016).
Unalp-Arida, A. & Ruhl, C. E. Liver fat scores predict liver disease mortality in the United States population. Aliment Pharmacol. Ther. 48, 1003–1016 (2018).
Huang, Q. et al. A metabolome-derived score predicts metabolic dysfunction-associated steatohepatitis and mortality from liver disease. J. Hepatol. 82, 781–793 (2025).
Zhang, X. et al. A blood-based biomarker panel for non-invasive diagnosis of metabolic dysfunction-associated steatohepatitis. Cell Metab. 37, 59–68.e53 (2025).
Pose, E. et al. Moving diagnosis of liver fibrosis into the community. J. Hepatol. 83, 258–270 (2025).
Adams, L. A., Thiele, M. & Tsochatzis, E. A. Detecting at-risk steatotic liver disease and liver fibrosis in the community. Hepatology (2025).
Jiang, S. et al. Generic Diagramming Platform (GDP): a comprehensive database of high-quality biomedical graphics. Nucleic Acids Res. 53, D1670–D1676 (2025).
Ding, J. et al. Integrative multiomic analysis identifies distinct molecular subtypes of NAFLD in a Chinese population. Sci. Transl. Med. 16, eadh9940 (2024).
Kleiner, D. E. et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 41, 1313–1321 (2005).
Kuhn, M. Building predictive models in R using the Caret package. J. Stat. Softw. 28, 1–26 (2008).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates Inc., 2017).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).

Acknowledgements

The authors gratefully acknowledge financial support from the Huai'an City Natural Science Research Project (No. HAB202318) and the National Natural Science Foundation of China (No. 82360226, No. 82300829). We also thank the UK Biobank for access to its resource under Application Number 351033.

Author information

These authors contributed equally: Bicheng Ye, Yuming Teng, Qianqian Chen, Jie Zhang, Shun Li.

Authors and Affiliations

Liver Disease Center of Integrated Traditional Chinese and Western Medicine, Basic Medicine Research and Innovation Center of Ministry of Education, Zhongda Hospital, Southeast University, Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology (Southeast University), Nanjing, China
Bicheng Ye
School of Medicine, Yunnan University, Kunming, China
Yuming Teng
Department of Gastroenterology, The Second People’s Hospital of Huai’an / The Affiliated Huai’an Hospital of Xuzhou Medical University, Huaian, China
Qianqian Chen, Changan Xu & Xiao Qiao
Department of Endocrinology and Metabolism, The Second People’s Hospital of Huai’an / The Affiliated Huai’an Hospital of Xuzhou Medical University, Huaian, China
Jie Zhang
Department of Neurology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China
Shun Li
Department of Infectious Disease, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
Li Dong
Department of Laboratory Medicine, Northern Jiangsu People’s Hospital, Yangzhou, China
Xiaye Miao

Contributions

B.Y., Y.T. and Q.C. drafted the initial manuscript. Y.T. was responsible for the data analysis of the UK Biobank. B.Y., J.Z. and S.L. were responsible for other data analyses. L.D. and C.X. were responsible for data collection and literature retrieval. X.Q. and X.M. conceived the study, supervised the research, and revised the manuscript. All the authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xiao Qiao or Xiaye Miao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ye, B., Teng, Y., Chen, Q. et al. Harnessing machine learning for the development, validation, and prognostic evaluation of MASHRisk score: insights from a multicohort study. npj Digit. Med. 9, 47 (2026). https://doi.org/10.1038/s41746-025-02220-x

Received: 11 September 2025
Accepted: 25 November 2025
Published: 08 December 2025
Version of record: 14 January 2026
DOI: https://doi.org/10.1038/s41746-025-02220-x