Abstract
Metabolic dysfunction-associated steatohepatitis (MASH) increases the risk of liver-related mortality more than tenfold, yet reliable predictive biomarkers remain scarce. This study developed the MASHRisk score, a blood-based non-invasive diagnostic tool that integrates routine clinical and biochemical measurements. The score was derived in 218 participants using ten machine learning algorithms and validated in three further cohorts (n = 93, 96, and 26,256). The MASHRisk score showed robust diagnostic performance, with area under the receiver operating characteristic curve (AUC) values of 0.791, 0.793, 0.806, and 0.796 in the training, validation, and two test cohorts, respectively. It was an independent predictor of MASH (p < 0.001) and outperformed existing indices, including the Fibrosis-4 index (FIB-4), aspartate aminotransferase to platelet ratio index (APRI), aspartate aminotransferase to alanine aminotransferase ratio (AAR), and Non-Alcoholic Fatty Liver Disease Fibrosis Score (NFS). In a prognostic cohort of 390,574 individuals, high-risk participants had significantly higher hazards than low-risk individuals (all p < 0.001) for liver-related mortality (hazard ratio [HR]: 12.296), MASH (HR: 12.829), cirrhosis (HR: 8.863), hepatocellular carcinoma (HR: 9.278), atherosclerotic cardiovascular disease (ASCVD)-related mortality (HR: 2.303), and all-cause mortality (HR: 1.744). The MASHRisk score is a validated, user-friendly tool for early detection, risk stratification, and outcome prediction in MASH.
Introduction
Metabolic dysfunction-associated steatotic liver disease (MASLD) is strongly associated with obesity, type 2 diabetes mellitus (T2D), and atherosclerotic cardiovascular disease (ASCVD), and is linked to increased risks of both all-cause and liver-related mortality1,2. Progression from MASLD to metabolic dysfunction-associated steatohepatitis (MASH) is associated with a two-fold increase in all-cause mortality and a more than tenfold increase in the risks of hepatocellular carcinoma and liver-related mortality3. Identifying MASH is therefore crucial for predicting individual clinical outcomes and informing personalized treatment strategies4.
Liver biopsy has long been regarded as the gold standard for diagnosing MASH5. However, its routine clinical use is limited by its invasiveness, poor patient acceptance, high cost, and substantial inter-observer variability. The procedure also carries risks, including post-procedural pain in nearly 50% of patients and life-threatening hemorrhage in 0.6% to 1% of cases6. To improve the diagnosis of MASH at the population level, non-invasive tests (NITs) with high diagnostic performance are needed. Such tests would reduce reliance on liver biopsy and help identify patients who require follow-up and treatment. In recent years, several international validation studies have evaluated non-invasive methods for detecting MASH with fibrosis, combining imaging techniques and circulating biomarkers, such as the FAST (FibroScan-aspartate aminotransferase) score7 and the NIS4 blood panel8. However, key components of these tests, including vibration-controlled transient elastography, miR-34a-5p, YKL-40, and alpha-2 macroglobulin, are often unavailable in primary care. Given these limitations, machine learning-based integration of routine laboratory parameters and physical examination measurements is a promising alternative strategy9. Machine learning algorithms can capture non-linear relationships and interactions among readily available clinical variables, which may yield better predictive performance than traditional scoring systems. This approach could support practical, low-cost screening tools for population-level MASH detection in primary care10.
Consequently, there remains an unmet need to develop accurate tools using readily available laboratory and clinical variables to identify individuals at high risk of developing MASH and its complications, including liver-related events and mortality. Such tools would enable early detection and personalized monitoring of high-risk individuals and patients with MASH in primary care and other non-specialized healthcare settings, before the development of cirrhosis or its complications.
Results
Study population characteristics and cohort composition
The workflow of this study is illustrated in Fig. 1. The study included 218 participants in the training cohort, 93 in the validation cohort, 96 in test cohort 1, 26,256 in test cohort 2, and 390,574 in the prognostic cohort. Table 1 presents the baseline characteristics of the participants. Mean age ranged from 31 to 57 years across cohorts, and the proportion of males from 31 to 71%. The training cohort, validation cohort, and test cohort 1 were recruited from hospitals in China and had MASH prevalences of 44–48%. In contrast, test cohort 2 was derived from the general population of the United Kingdom and had a MASH prevalence of 4.4%. This demographic diversity across cohorts should, in principle, support the development of a broadly applicable MASH prediction model.
Development of the MASHRisk score
We calculated correlation coefficients among the candidate features in the training cohort and found that the coefficients between aspartate aminotransferase (AST) and alanine aminotransferase (ALT), and between low-density lipoprotein cholesterol (LDL-C) and total cholesterol, exceeded 0.7 (Fig. 2a). From each of these highly correlated pairs, we retained the feature with the higher area under the curve (AUC) for predicting MASH, namely AST and LDL-C (Fig. 2b, c). This yielded the following features: AST, LDL-C, sex, age, body mass index (BMI), gamma-glutamyl transferase (GGT), albumin, total bilirubin (TB), uric acid, hemoglobin A1c (HbA1c), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C). These features were then entered into the Boruta algorithm11, which identified age, BMI, AST, HbA1c, and TG as the final modeling features (Supplementary Fig. 1a). We next developed models with ten machine learning algorithms in the training set and assessed their performance in the validation set. The model built with the Partial Least Squares (PLS) algorithm achieved the highest AUC for predicting MASH in the validation set (Supplementary Fig. 1b, Supplementary Table 1) and was therefore selected as the MASHRisk score. In the training cohort, validation cohort, test cohort 1, and test cohort 2, the MASHRisk score predicted MASH with AUCs of 0.791 (0.732–0.851), 0.793 (0.698–0.887), 0.806 (0.717–0.895), and 0.796 (0.785–0.808), respectively (Fig. 2d–g). In all four cohorts, MASHRisk scores were significantly higher in participants with MASH than in those without (all p < 0.001) (Fig. 2d–g). To promote broader use, we implemented a freely accessible online calculator for the MASHRisk score at https://njph.shinyapps.io/mashrisk/.
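The correlation-then-AUC filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`auc`, `pearson`, `prune_correlated`) are hypothetical, pairs are visited greedily in input order, AUC is treated as direction-free, and the toy values are invented only to exercise the logic. From each pair with |r| > 0.7 it drops the member with the lower univariate AUC, mirroring how AST was retained over ALT and LDL-C over total cholesterol.

```python
from itertools import combinations
from statistics import mean


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: share of positive/negative pairs ranked correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def prune_correlated(features, labels, threshold=0.7):
    """Greedily drop the lower-AUC member of each feature pair with |r| > threshold."""
    # Direction-free AUC (assumption): a marker lower in cases is as informative as one higher.
    score = {name: max(auc(v, labels), 1 - auc(v, labels)) for name, v in features.items()}
    dropped = set()
    for a, b in combinations(features, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(features[a], features[b])) > threshold:
            dropped.add(a if score[a] < score[b] else b)
    return [name for name in features if name not in dropped]


# Toy data (invented): 1 = MASH, 0 = no MASH.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "AST": [20, 25, 30, 45, 50, 60],
    "ALT": [24, 28, 36, 34, 50, 58],
    "HbA1c": [5.2, 6.0, 5.5, 5.8, 6.4, 5.6],
}
print(prune_correlated(features, labels))
```

With these toy values AST and ALT correlate at roughly r = 0.93, so ALT (AUC about 0.89) is dropped in favor of AST (AUC 1.0), while HbA1c survives because it is only weakly correlated with either. Boruta's shadow-feature test, which the study applied to the surviving features, is not reproduced here.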
a Correlation coefficients of the included features. b Receiver operating characteristic (ROC) curves of aspartate aminotransferase (AST) (left) and alanine aminotransferase (ALT) (right) for predicting metabolic dysfunction-associated steatohepatitis (MASH) in the training cohort. c ROC curves of low-density lipoprotein cholesterol (LDL-C) (left) and total cholesterol (TC) (right) for predicting MASH in the training cohort. ROC curves of the MASHRisk score for predicting MASH in the training cohort (d, left), validation cohort (e, left), test cohort 1 (f, left), and test cohort 2 (g, left). Distributions of MASHRisk scores in participants with and without MASH in the training cohort (d, right), validation cohort (e, right), test cohort 1 (f, right), and test cohort 2 (g, right).
Assessment of the MASHRisk score
Multivariable logistic regression in each of the four cohorts, adjusting for sex, age, and BMI, showed that the MASHRisk score remained an independent predictor of MASH (all p < 0.001) (Fig. 3a–d). Receiver operating characteristic (ROC) analysis in the training cohort identified an optimal cutoff of 0.457, which was then used to generate confusion matrices for each cohort. In the training cohort, the score had a sensitivity of 0.729 and a specificity of 0.738 (Fig. 3e, Supplementary Table 2). In the validation cohort, sensitivity was lower (0.689) and specificity higher (0.833) (Fig. 3f, Supplementary Table 2). In test cohort 1, sensitivity was 0.605 and specificity was substantially higher at 0.906 (Fig. 3g, Supplementary Table 2). In test cohort 2, sensitivity was 0.779 and specificity 0.689 (Fig. 3h, Supplementary Table 2). Notably, test cohort 2 represents a general population with a MASH prevalence of only 4.4%, and in this cohort the MASHRisk score had a negative predictive value of 99.0% (Fig. 3h, Supplementary Table 2). Overall, these results suggest that the MASHRisk score is a useful tool for predicting MASH, with consistent performance across different populations.
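The metrics summarized in the confusion-matrix panels all derive from four counts. The sketch below (hypothetical function names, illustrative counts that are not from the study) computes them, and also uses Bayes' rule to show why NPV is high when prevalence is low. With the rounded test cohort 2 values (sensitivity 0.779, specificity 0.689, prevalence 4.4%) the formula gives an NPV of about 0.985, whereas the same sensitivity and specificity at a hospital-like prevalence of 46% give about 0.785. The reported NPV comes from the observed counts, which this sketch does not reproduce.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from a 2x2 confusion matrix (positive class = MASH)."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)      # sensitivity = recall
    spec = tn / (tn + fp)      # specificity
    ppv = tp / (tp + fp)       # positive predictive value = precision
    npv = tn / (tn + fn)       # negative predictive value
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,               # true positives / all participants
        "detection_prevalence": (tp + fp) / n,  # predicted positives / all participants
        "balanced_accuracy": (sens + spec) / 2,
    }


def npv_from_prevalence(sens, spec, prev):
    """Bayes' rule: NPV = spec*(1-prev) / (spec*(1-prev) + (1-sens)*prev)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Illustrative counts only (not from the study).
print(confusion_metrics(tp=30, fp=10, fn=10, tn=50))

# Same rounded sensitivity/specificity at two prevalences.
print(round(npv_from_prevalence(0.779, 0.689, 0.044), 3))  # general-population setting
print(round(npv_from_prevalence(0.779, 0.689, 0.46), 3))   # hospital-like setting
```

The design point is that sensitivity and specificity are properties of the score and cutoff, while PPV and NPV also depend on how common MASH is in the population being screened.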
Multivariable logistic regression in the training cohort (a), validation cohort (b), test cohort 1 (c), and test cohort 2 (d). Confusion matrices in the training cohort (e, left), validation cohort (f, left), test cohort 1 (g, left), and test cohort 2 (h, left). Performance metrics derived from the confusion matrices in the training cohort (e, right), validation cohort (f, right), test cohort 1 (g, right), and test cohort 2 (h, right). Spec, specificity; Sens, sensitivity; PPV, positive predictive value; NPV, negative predictive value; F1, F1 score; Prec, precision; Rec, recall; Prev, prevalence; DP, detection prevalence; DR, detection rate; BA, balanced accuracy.
Comparison of the MASHRisk score with other NITs
We compared the predictive performance of the MASHRisk score with that of the Fibrosis-4 index (FIB-4), AST to platelet ratio index (APRI), AST to ALT ratio (AAR), and NAFLD Fibrosis Score (NFS) using ROC curves, calibration curves, and decision curve analysis (DCA). Because platelet (PLT) data were unavailable in test cohort 1, only the AAR could be calculated in that cohort. In all four cohorts, the MASHRisk score had the highest AUC for predicting MASH (Fig. 4a). Calibration curves showed that its predicted probabilities were closest to the observed probabilities, and it had the lowest Brier score (Fig. 4b). DCA further showed that the MASHRisk score provided a greater net clinical benefit than the other NITs (Fig. 4c). In summary, the MASHRisk score predicted MASH better than FIB-4, APRI, AAR, and NFS, supporting its use in clinical practice.
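Two of the comparison tools named above reduce to short formulas. As a hedged sketch (hypothetical function names, invented toy data), the Brier score is the mean squared error of predicted probabilities, and the decision-curve net benefit at threshold probability pt weighs true positives against false positives by the odds pt/(1 − pt). A model is clinically useful at a given pt when its net benefit exceeds both the "treat all" strategy and "treat none" (net benefit 0).

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit at threshold probability pt, classifying p >= pt as positive:
    NB = TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that classifies everyone as positive."""
    prev = sum(labels) / len(labels)
    return prev - (1 - prev) * threshold / (1 - threshold)


# Toy predictions (invented) for four participants.
probs, labels = [0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]
print(brier_score(probs, labels))
for pt in (0.1, 0.3, 0.5):
    print(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
```

A decision curve is simply `net_benefit` evaluated over a grid of thresholds and plotted alongside the two reference strategies.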
a The receiver operating characteristic (ROC) curves of different NITs. From left to right: training cohort, validation cohort, testing cohort 1, and testing cohort 2. b The calibration curves of different NITs. From left to right: training cohort, validation cohort, testing cohort 1, and testing cohort 2. c The decision curve analysis (DCA) curves of different NITs. From left to right: training cohort, validation cohort, testing cohort 1, and testing cohort 2.
Model interpretation
We used the SHapley Additive exPlanations (SHAP) framework to interpret the MASHRisk score. Mean absolute SHAP values quantified the contribution of each feature and identified AST as the most influential variable (Supplementary Fig. 2a, b). To further illustrate the interpretability of the score, we examined two representative cases, using SHAP values to quantify how each feature shifted the individual prediction. Negative SHAP values push a prediction toward a lower likelihood of MASH, whereas positive values push it toward a higher likelihood. For example, patient 1, who had MASH, had higher SHAP values and a MASHRisk score of 0.884, indicating a high likelihood of MASH (Supplementary Fig. 2c). Conversely, patient 2, who did not have MASH, had lower SHAP values and a score of 0.353, indicating a lower likelihood of MASH (Supplementary Fig. 2d).
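Because the selected PLS model uses a single latent component, its underlying score is linear in the (standardized) inputs. For a linear score f(x) = b0 + Σ β_i x_i with independent features, the SHAP value of feature i has the closed form φ_i = β_i (x_i − E[x_i]), and the φ_i sum to f(x) − E[f(X)]. The sketch below illustrates this with hypothetical coefficients and a hypothetical patient; it is not the fitted MASHRisk model, and it works on the linear-score scale, whereas the reported scores (0.884 and 0.353) are presumably on a probability scale where this exact decomposition need not hold.

```python
def linear_shap(beta, x, background_mean):
    """Exact SHAP values for a linear score, assuming independent features:
    phi_i = beta_i * (x_i - mean_i), where mean_i is the background (e.g. training) mean."""
    return [b * (xi - mu) for b, xi, mu in zip(beta, x, background_mean)]


features = ["AST", "age", "BMI", "HbA1c", "TG"]
beta = [0.30, 0.15, 0.25, 0.10, 0.12]   # hypothetical coefficients on z-scored inputs
background_mean = [0.0] * 5             # z-scored training data have mean 0
patient = [1.5, 0.2, 0.8, 1.0, -0.5]    # hypothetical z-scored patient

phi = linear_shap(beta, patient, background_mean)
for name, value in sorted(zip(features, phi), key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.3f}")
# Efficiency property: contributions add up to f(x) - f(background mean).
print(round(sum(phi), 3))
```

Ranking features by mean |φ_i| across many patients gives the kind of global importance plot used to identify AST as the most influential variable.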
Prognostic value of the MASHRisk score
In the prognostic cohort, participants were stratified into a low-risk group (n = 378,978; 97%) and a high-risk group (n = 11,596; 3%). Liver-related mortality occurred in 0.1% (n = 417) of the low-risk group and 1.3% (n = 152) of the high-risk group, corresponding to a hazard ratio (HR) of 12.296 for the high-risk group (95% CI: 10.212–14.806, p < 0.001) (Fig. 5a). Time-dependent ROC analysis showed strong prognostic performance, with AUCs of 0.83 at 3 years, 0.85 at 5 years, and 0.80 at 10 years (Fig. 5a). MASH events occurred in 0.4% of the low-risk group and 1.2% (n = 141) of the high-risk group (HR = 12.829, 95% CI: 10.560–15.585, p < 0.001) (Fig. 5b), with time-dependent AUCs of 0.74, 0.79, and 0.81 at 3, 5, and 10 years, respectively (Fig. 5b). Cirrhosis occurred in 0.4% (n = 1676) of the low-risk group and 3.8% (n = 446) of the high-risk group (HR = 8.863, 95% CI: 7.985–9.839, p < 0.001) (Fig. 5c), with time-dependent AUCs of 0.75 at 3, 5, and 10 years (Fig. 5c). Liver cancer occurred in 0.07% (n = 254) of the low-risk group and 0.6% (n = 72) of the high-risk group (HR = 9.278, 95% CI: 7.142–12.059, p < 0.001) (Fig. 5d), with time-dependent AUCs of 0.77, 0.75, and 0.74 at 3, 5, and 10 years, respectively (Fig. 5d).
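The percentages above follow from dividing event counts by group size. The sketch below (hypothetical names; counts as reported) makes that arithmetic explicit, assigning the 11,596 participants to the high-risk group and the 378,978 to the low-risk group, the only assignment under which the reported counts reproduce the reported percentages. The resulting crude ratios (about 11.9, 8.7, and 9.3) are close to the Cox hazard ratios but are not the same quantity, because they ignore follow-up time and censoring.

```python
# Event counts as reported; group sizes are the assignment under which the
# counts reproduce the reported percentages.
N_HIGH, N_LOW = 11_596, 378_978

events = {  # outcome: (events in high-risk group, events in low-risk group)
    "liver-related death": (152, 417),
    "cirrhosis": (446, 1676),
    "liver cancer": (72, 254),
}


def crude_summary(high_events, low_events, n_high=N_HIGH, n_low=N_LOW):
    """Cumulative proportions and their ratio; ignores follow-up time, unlike a Cox HR."""
    p_high = high_events / n_high
    p_low = low_events / n_low
    return p_high, p_low, p_high / p_low


for outcome, (high_ev, low_ev) in events.items():
    p_high, p_low, ratio = crude_summary(high_ev, low_ev)
    print(f"{outcome}: high-risk {p_high:.2%}, low-risk {p_low:.2%}, crude ratio {ratio:.1f}")
```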
Finally, the high-risk group also had a significantly higher risk of ASCVD-related mortality and all-cause mortality than the low-risk group, with HRs of 2.303 (95% CI: 2.047–2.592, p < 0.001) and 1.744 (95% CI: 1.661–1.831, p < 0.001), respectively (Supplementary Fig. 3a, b).
a Cumulative hazard plot comparing high-risk and low-risk groups over time (in days), with liver-related death as the outcome (left), and time-dependent receiver operating characteristic (ROC) curves of the MASHRisk score for predicting liver-related mortality (right). b Cumulative hazard plot comparing high-risk and low-risk groups over time (in days), with metabolic dysfunction-associated steatohepatitis (MASH) as the outcome (left), and time-dependent ROC curves of the MASHRisk score for predicting MASH (right). c Cumulative hazard plot comparing high-risk and low-risk groups over time (in days), with cirrhosis as the outcome (left), and time-dependent ROC curves of the MASHRisk score for predicting cirrhosis (right). d Cumulative hazard plot comparing high-risk and low-risk groups over time (in days), with liver cancer as the outcome (left), and time-dependent ROC curves of the MASHRisk score for predicting liver cancer (right).
Enrichment analysis of the MASHRisk score
To explore the biology underlying MASHRisk risk stratification, we performed pathway enrichment analysis of proteomic data. Both over-representation analysis (ORA) and gene set enrichment analysis (GSEA) showed that the high-risk group was significantly enriched in metabolic and immune-related pathways, such as the carboxylic acid metabolic process, carbon metabolism, and the chemokine signaling pathway (Fig. 6a, b). These findings are consistent with the known pathophysiology of MASH, in which metabolic dysregulation and chronic inflammation are central12,13.
Discussion
We developed a MASH prediction score based on routine biochemical examinations in a Chinese liver biopsy cohort, starting from 14 candidate clinical parameters. Using a machine learning pipeline, we found that a combination of five parameters, namely age, BMI, AST, HbA1c, and TG, effectively predicts MASH. The MASHRisk score was also successfully validated externally in a multi-ethnic cohort from the British population, with an AUC of 0.796. Moreover, it detected MASH better than the widely used FIB-4 and other NITs. Its strong association with liver-related events underscores its clinical utility as a referral tool.
Each component of the MASHRisk score has previously shown a robust individual association with MASH. Obesity is a recognized risk factor for fatty liver disease, and affected patients frequently have elevated liver enzyme levels, supporting the inclusion of BMI and AST as predictors14,15,16. Age is a primary risk factor for chronic liver diseases, including MASLD and MASH17. Elevated TG levels are common in patients with metabolic syndrome, a major risk factor for the development and progression of MASH17, and high TG levels can contribute to hepatic steatosis and inflammation17. HbA1c, a marker of long-term glycemic control, is closely related to T2D, one of the strongest risk factors for MASH18. By capturing elevated liver enzymes, diabetes, obesity, and hyperlipidemia, the MASHRisk score integrates several major risk domains, which may partly explain its strong performance.
MASLD has become the leading cause of liver disease and liver-related death worldwide19. Identifying patients with MASLD who are at increased risk of liver-related mortality is therefore crucial in clinical practice. Previous research indicates that significant hepatic steatosis, assessed by ultrasonography or by surrogate markers such as liver enzyme levels and steatosis scores (including the ultrasound liver fat index, fatty liver index, NAFLD liver fat score, and hepatic steatosis index), is linked to a higher risk of liver-related mortality20,21. Nevertheless, these parameters alone have limited prognostic value; for example, mortality risk rises only four- to five-fold among individuals with the highest liver enzyme levels or intermediate to high steatosis indices. Simple fibrosis scores, including AAR, FIB-4, APRI, and NFS, are commonly used in clinical practice to estimate advanced liver fibrosis and may help reduce the need for liver biopsy22,23. We found that the MASHRisk score diagnosed MASH more accurately than several of these established NITs, including AAR, APRI, FIB-4, and NFS. Notably, the MASHRisk score relies entirely on biochemical tests, whereas FIB-4, NFS, and APRI also require complete blood count parameters such as PLT. This adds complexity and cost to clinical implementation, because PLT testing requires an additional blood sample. Incorporating the MASHRisk score as a targeted screening tool in primary care could also help alleviate unnecessary patient anxiety. Its user-friendly design facilitates adoption across healthcare settings, including resource-limited settings where advanced diagnostic tools may be unavailable.
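To make the platelet dependence concrete, the comparator indices can be written out using their widely published formulas. Function names are hypothetical, and the AST upper limit of normal of 40 U/L used for APRI is an assumption that varies by laboratory. FIB-4, APRI, and NFS all take a platelet count, whereas AAR uses only AST and ALT.

```python
import math


def fib4(age_years, ast, alt, platelets):
    """FIB-4 = age x AST / (platelets x sqrt(ALT)); AST/ALT in U/L, platelets in 10^9/L."""
    return age_years * ast / (platelets * math.sqrt(alt))


def apri(ast, platelets, ast_uln=40.0):
    """APRI = (AST / upper limit of normal) x 100 / platelets; ast_uln=40 U/L is an assumed default."""
    return (ast / ast_uln) * 100 / platelets


def aar(ast, alt):
    """AST/ALT ratio: the only comparator here that needs no platelet count."""
    return ast / alt


def nfs(age_years, bmi, ifg_or_diabetes, ast, alt, platelets, albumin_g_dl):
    """NAFLD fibrosis score; ifg_or_diabetes is 1 or 0, platelets in 10^9/L, albumin in g/dL."""
    return (-1.675 + 0.037 * age_years + 0.094 * bmi + 1.13 * ifg_or_diabetes
            + 0.99 * ast / alt - 0.013 * platelets - 0.66 * albumin_g_dl)


# Hypothetical patient: 50 years, BMI 30, diabetes, AST 40, ALT 36, platelets 200, albumin 4.0 g/dL.
print(round(fib4(50, 40, 36, 200), 3), apri(40, 200), round(aar(40, 36), 3),
      round(nfs(50, 30, 1, 40, 36, 200, 4.0), 3))
```

Dropping the platelet argument leaves only `aar` computable, which is exactly the situation described for test cohort 1.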
By integrating this score into routine check-ups, healthcare providers could prioritize patients for further diagnostic evaluation, lifestyle modification, or treatment, potentially improving patient outcomes. From a public health perspective, widespread adoption of the MASHRisk score could help reduce the burden of MASH by facilitating early detection and intervention, which may in turn lower healthcare costs related to advanced liver disease and improve quality of life for affected individuals.
While the MASHRisk score demonstrates strong predictive performance, successful implementation requires addressing key challenges: integration with electronic health records, laboratory standardization across facilities, and clinician training on score interpretation and patient management. We propose practical solutions, including the development of digital calculators and mobile applications, standardized laboratory protocols, and clear implementation guidelines with defined action thresholds for referral and intervention. The score’s reliance on routinely available parameters makes it suitable for resource-limited settings where advanced diagnostics are inaccessible, maximizing its real-world impact in reducing the global burden of MASH.
This study has several limitations. First, in test cohort 2, MASH was defined using magnetic resonance imaging surrogates (proton density fat fraction [PDFF] and cT1) rather than histology, and these may not fully reflect biopsy-defined MASH. Second, the utility of the MASHRisk score for longitudinal monitoring of liver health has not yet been verified. Long-term prospective studies are needed to validate its accuracy and to determine whether it can track disease progression or regression in response to therapeutic interventions.
In summary, the MASHRisk score is a validated, user-friendly, non-invasive tool applicable to diverse populations. It exhibits robust predictive ability across various cohorts and effectively assesses the long-term risk of liver-related outcomes. As healthcare systems transition towards managing chronic diseases on a population level, the MASHRisk score presents a scalable approach for early detection and risk stratification. With its adaptability to different healthcare environments24,25, MASHRisk shows potential for practical implementation and for alleviating the global burden of MASH.
Methods
Study design
The primary objective of the first phase was to develop and validate a blood-based NIT, the MASHRisk score, which integrates routine clinical and biochemical examinations to diagnose MASH. The second phase aimed to evaluate whether the MASHRisk score could predict subsequent liver-related events in patients with MASLD. The candidate predictors and their distributions are detailed in Table 1. The workflow of this study is illustrated in Fig. 1, which was created using the BioGDP platform (https://biogdp.com/)26.
Participants
Participants were recruited from the First Affiliated Hospital of Nanjing Medical University in Nanjing, China, and randomly divided into a training cohort and a validation cohort at a 7:3 ratio. These patients with MASLD were treated in the departments of Infectious Disease or General Surgery and Bariatric and Metabolic Surgery, and data were collected from August 2023 to July 2024. Patients were eligible if they had undergone liver biopsy with histologic examination. Participants were excluded if they met any of the following criteria: (1) viral hepatitis; (2) previous cirrhosis; (3) previous liver cancer or other malignancies; or (4) incomplete baseline information. Ethical approval was granted by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University (2023-SR-398). Test cohort 1 was derived from a previous study, in which the specific inclusion and exclusion criteria are described27. Test cohort 2 and the prognostic cohort were derived from the UK Biobank, a large prospective cohort study that recruited approximately 500,000 participants aged 40–69 years from across the United Kingdom between 2006 and 2010. Participants in test cohort 2 were required to have undergone PDFF evaluation, and participants in the prognostic cohort were required to have complete follow-up information. The exclusion criteria for these cohorts were: (1) viral hepatitis; (2) previous cirrhosis; (3) previous liver cancer or other malignancies; and (4) incomplete baseline information.
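The 7:3 random division is consistent with the reported cohort sizes (311 biopsied participants split into 218 and 93). A minimal sketch of such a split, assuming a simple unstratified shuffle with an arbitrary seed; the text does not state whether the split was stratified by MASH status, nor which seed or tool was used, and `split_train_validation` is a hypothetical name.

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=2024):
    """Unstratified random split; the seed is arbitrary and only makes the split reproducible."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


participant_ids = range(311)  # hypothetical IDs for the 311 hospital participants
train_ids, validation_ids = split_train_validation(participant_ids)
print(len(train_ids), len(validation_ids))
```

A stratified variant would shuffle MASH and non-MASH participants separately and take 70% of each, keeping the MASH prevalence similar in both subsets.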
Liver histology
Paraffin-embedded liver sections were stained with hematoxylin and eosin and with Masson trichrome using standard protocols. Pathological changes in each sample were evaluated by two experienced pathologists blinded to group allocation, according to the scoring system established by the NASH Clinical Research Network28.
Outcome definitions and other definitions
In the training cohort, validation cohort, and test cohort 1, MASH was defined as a NAFLD Activity Score (NAS) of 5 or higher27. In test cohort 2, MASH was defined as a PDFF greater than 5.5% and an iron-corrected T1 mapping (cT1) value greater than 800 ms. Outcomes in the prognostic cohort included liver-related events (incident MASH, cirrhosis, hepatocellular carcinoma, and liver-related mortality), as well as ASCVD-related mortality and all-cause mortality. Full definitions of liver-related events are provided in Supplementary Table 3.
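The two outcome definitions can be written as simple labeling rules. A sketch with hypothetical function names, assuming the standard NAS components (steatosis 0–3, lobular inflammation 0–3, hepatocellular ballooning 0–2) and the strict inequalities stated above for the imaging definition:

```python
def nas(steatosis, lobular_inflammation, ballooning):
    """NAFLD Activity Score: steatosis (0-3) + lobular inflammation (0-3) + ballooning (0-2), range 0-8."""
    if not (0 <= steatosis <= 3 and 0 <= lobular_inflammation <= 3 and 0 <= ballooning <= 2):
        raise ValueError("component grade out of range")
    return steatosis + lobular_inflammation + ballooning


def mash_histology(steatosis, lobular_inflammation, ballooning):
    """Training, validation and test cohort 1: MASH if NAS >= 5."""
    return nas(steatosis, lobular_inflammation, ballooning) >= 5


def mash_imaging(pdff_percent, ct1_ms):
    """Test cohort 2: MASH if PDFF > 5.5% and cT1 > 800 ms (both strict)."""
    return pdff_percent > 5.5 and ct1_ms > 800


print(mash_histology(2, 2, 1), mash_histology(2, 1, 1))   # NAS 5 vs NAS 4
print(mash_imaging(6.0, 810), mash_imaging(5.5, 900))     # second case fails the PDFF cut
```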
Modeling strategy
Within the training (derivation) cohort, the Boruta algorithm, implemented in the Boruta R package11, was used to identify variables relevant to MASH. Ten machine learning models were then developed with the caret R package29 using the following algorithms: K-Nearest Neighbors, Gradient Boosting Machine, Recursive Partitioning and Regression Trees, Neural Network, Support Vector Machine, AdaBoost, Random Forest, Extreme Gradient Boosting, Generalized Linear Model with Elastic Net Regularization (GLMNET), and PLS. The PLS model showed the best predictive performance in the validation cohort and was selected to construct the MASHRisk score; grid search identified optimal performance with a single latent component (ncomp = 1). To aid understanding of the final model's predictions, SHAP values were used together with the shapviz R package to interpret the model30.
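With ncomp = 1, PLS reduces to a single weighted combination of the inputs. The sketch below shows one NIPALS-style PLS1 step in pure Python: the weight vector is proportional to Xᵀy (the direction of maximal covariance with the outcome), the latent score is t = Xw, and the outcome is regressed on t, so the fitted coefficients are w·q. This is a regression-style simplification on a 0/1 outcome with hypothetical names and toy data; caret's classification wrapper additionally converts the PLS output into class probabilities, which is omitted here.

```python
from statistics import mean


def fit_pls1_one_component(X, y):
    """Single-component PLS1 on centred data. X: list of rows (ideally standardised), y: 0/1 outcome."""
    x_means = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    Xc = [[v - m for v, m in zip(row, x_means)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w proportional to X^T y: direction of maximal covariance with the outcome.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(x_means))]
    norm = sum(v * v for v in w) ** 0.5  # assumes X^T y is not all zeros
    w = [v / norm for v in w]
    # Latent scores t = X w, then least-squares regression of y on t (no intercept: data centred).
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    beta = [wj * q for wj in w]  # with one component, coefficients are simply w * q
    return x_means, y_mean, beta


def predict(model, row):
    """Linear PLS score for one row (not a calibrated probability)."""
    x_means, y_mean, beta = model
    return y_mean + sum(b * (v - m) for b, v, m in zip(beta, row, x_means))


X = [[1, 0], [2, 1], [3, 0], [4, 1]]  # toy inputs
y = [0, 0, 1, 1]
model = fit_pls1_one_component(X, y)
print(model[2])
print([round(predict(model, row), 2) for row in X])
```

In the toy data the second input carries no covariance with the outcome, so its weight, and hence its coefficient, is zero.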
Proteomics assays
High-throughput proteomic measurements were performed using the Olink® platform31. Details of the Olink proteomics assay, data processing, and quality control have been described previously. In summary, quantification values for 2941 proteins were obtained as Normalized Protein eXpression values on Olink's arbitrary log₂ scale31. Differential protein expression was analyzed using the limma package.
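limma's `topTable` reports `adj.P.Val`, which by default is the Benjamini–Hochberg (BH) false-discovery-rate adjustment. A pure-Python sketch of that adjustment follows; the function name is hypothetical, and it is intended to match the behavior of R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```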
Enrichment analysis
Protein pathway enrichment analysis was conducted using two strategies: ORA with Metascape32 and GSEA with the clusterProfiler package33. For ORA, differentially expressed proteins identified with limma in R were selected using an adjusted P value (adj.P.Val) < 0.05 and a log2 fold change (log2FC) > 0.5.
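Over-representation analysis asks whether a pathway contains more of the selected proteins than expected by chance, usually via a one-sided hypergeometric test. The sketch below (hypothetical function names, toy protein IDs) applies the stated selection thresholds and computes that p-value. Metascape's exact background set and multiple-testing handling are not reproduced, and the log2FC filter is applied one-sided, as written.

```python
from math import comb


def select_proteins(rows, max_adj_p=0.05, min_log2fc=0.5):
    """Keep proteins with adj.P.Val < 0.05 and log2FC > 0.5, as stated in the text."""
    return {name for name, adj_p, log2fc in rows if adj_p < max_adj_p and log2fc > min_log2fc}


def ora_pvalue(selected, pathway, universe):
    """One-sided hypergeometric P(X >= k) for the overlap between selected proteins and a pathway."""
    universe = set(universe)
    selected, pathway = set(selected) & universe, set(pathway) & universe
    N, K, n = len(universe), len(pathway), len(selected)
    k = len(selected & pathway)
    # Sum the probabilities of drawing k or more pathway members among the n selected.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


rows = [("P0", 0.01, 0.8), ("P1", 0.02, 1.2), ("P2", 0.03, 0.6), ("P10", 0.04, 0.9), ("P11", 0.20, 1.5)]
selected = select_proteins(rows)              # P11 fails the adj.P.Val filter
universe = [f"P{i}" for i in range(20)]       # toy background of 20 measured proteins
pathway = [f"P{i}" for i in range(5)]         # toy pathway with 5 members
print(sorted(selected), ora_pvalue(selected, pathway, universe))
```

GSEA differs in that it uses the full ranked protein list rather than a thresholded subset, so it does not depend on the adj.P.Val and log2FC cutoffs.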
Statistical analysis
During model development, continuous variables were z-score standardized, with scaling parameters estimated in the training set and applied unchanged to the validation and test cohorts. Missing observations were removed before scaling, and categorical variables were not scaled. Continuous variables are reported as means with standard deviations, and categorical variables as counts and percentages. Relationships between variables were evaluated using Pearson or Spearman correlation. The diagnostic performance of the MASHRisk score and other indices was assessed using ROC curves. The optimal cutoff was determined in the training set by maximizing Youden's index and was then applied unchanged to the validation and test cohorts. Calibration curves were constructed to assess agreement between predicted probabilities and observed outcomes. DCA was performed to evaluate clinical usefulness by calculating the net benefit of applying each score across a range of threshold probabilities. Follow-up time was estimated using the Kaplan–Meier method. A p value < 0.05 was considered statistically significant. All statistical analyses were performed in R version 4.4.2.
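The leakage-avoiding standardization and the Youden-based cutoff can be sketched together. Function names and toy values are hypothetical; `statistics.stdev` uses the n − 1 denominator, like R's `sd`, and the rule "positive if score ≥ cutoff" is an assumption. The cutoff is chosen once on training scores and then reused unchanged, mirroring the procedure described above, which yielded 0.457 in the training cohort.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Estimate z-score parameters on the training cohort only."""
    return mean(train_values), stdev(train_values)


def apply_scaler(values, params):
    """Apply training-cohort parameters unchanged to any cohort."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def youden_cutoff(scores, labels):
    """Return the cutoff maximising J = sensitivity + specificity - 1 (positive if score >= cutoff)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


params = fit_scaler([1.0, 2.0, 3.0, 4.0, 5.0])        # toy training values
print(apply_scaler([3.0, 5.0], params))                # toy validation values
train_scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
train_labels = [0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(train_scores, train_labels)
test_predictions = [int(s >= cutoff) for s in [0.35, 0.5]]  # cutoff reused unchanged
print(cutoff, round(j, 3), test_predictions)
```

Estimating the mean, standard deviation, and cutoff on the training data alone keeps information from the validation and test cohorts out of model development, so their performance estimates remain honest.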
Data availability
The data for the training cohort and the validation cohort are not publicly accessible to protect patient privacy. Test cohort 1 is derived from a previously published study that is publicly available27. Test cohort 2 and the prognosis cohort are sourced from the UK Biobank, with Application Number 351033.
Code availability
While the source code is not openly accessible, it may be shared with eligible researchers who submit reasonable requests to the corresponding author.
References
Targher, G., Tilg, H. & Byrne, C. D. Non-alcoholic fatty liver disease: a multisystem disease requiring a multidisciplinary and holistic approach. Lancet Gastroenterol. Hepatol. 6, 578–588 (2021).
Simon, T. G., Roelstraete, B., Khalili, H., Hagström, H. & Ludvigsson, J. F. Mortality in biopsy-confirmed nonalcoholic fatty liver disease: results from a nationwide cohort. Gut 70, 1375–1382 (2021).
Younossi, Z. M. et al. Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64, 73–84 (2016).
Rinella, M. E. & Sanyal, A. J. Management of NAFLD: a stage-based approach. Nat. Rev. Gastroenterol. Hepatol. 13, 196–205 (2016).
Chalasani, N. et al. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology 67, 328–357 (2018).
Neuberger, J. et al. Guidelines on the use of liver biopsy in clinical practice from the British Society of Gastroenterology, the Royal College of Radiologists and the Royal College of Pathology. Gut 69, 1382–1403 (2020).
Newsome, P. N. et al. FibroScan-AST (FAST) score for the non-invasive identification of patients with non-alcoholic steatohepatitis with significant activity and fibrosis: a prospective derivation and global validation study. Lancet Gastroenterol. Hepatol. 5, 362–373 (2020).
Harrison, S. A. et al. A blood-based biomarker panel (NIS4) for non-invasive diagnosis of non-alcoholic steatohepatitis and liver fibrosis: a prospective derivation and global validation study. Lancet Gastroenterol. Hepatol. 5, 970–985 (2020).
Cai, G. et al. Artificial intelligence-based models enabling accurate diagnosis of ovarian cancer using laboratory tests in China: a multicentre, retrospective cohort study. Lancet Digit. Health 6, e176–e186 (2024).
Greener, J. G., Kandathil, S. M., Moffat, L. & Jones, D. T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23, 40–55 (2022).
Kursa, M. B. & Rudnicki, W. R. Feature Selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
Steinberg, G. R., Valvano, C. M., De Nardo, W. & Watt, M. J. Integrative metabolism in MASLD and MASH: pathophysiology and emerging mechanisms. J. Hepatol. 83, 584–595 (2025).
Sawada, K., Chung, H., Softic, S., Moreno-Fernandez, M. E. & Divanovic, S. The bidirectional immune crosstalk in metabolic dysfunction-associated steatotic liver disease. Cell Metab. 35, 1852–1871 (2023).
Nussbaumerova, B. & Rosolova, H. Obesity and dyslipidemia. Curr. Atheroscler. Rep. 25, 947–955 (2023).
Chalasani, N. et al. The diagnosis and management of non-alcoholic fatty liver disease: practice guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Hepatology 55, 2005–2023 (2012).
Njei, B., Osta, E., Njei, N., Al-Ajlouni, Y. A. & Lim, J. K. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci. Rep. 14, 8589 (2024).
He, Q. J. et al. Recent advances in age-related metabolic dysfunction-associated steatotic liver disease. World J. Gastroenterol. 30, 652–662 (2024).
Crommen, S. et al. Prognostic characteristics of metabolic dysfunction-associated steatotic liver in patients with obesity who undergo one anastomosis gastric bypass surgery: a secondary analysis of randomized controlled trial data. Nutrients 16, 3210 (2024).
Riazi, K. et al. The prevalence and incidence of NAFLD worldwide: a systematic review and meta-analysis. Lancet Gastroenterol. Hepatol. 7, 851–861 (2022).
Unalp-Arida, A. & Ruhl, C. E. Noninvasive fatty liver markers predict liver disease mortality in the U.S. population. Hepatology 63, 1170–1183 (2016).
Unalp-Arida, A. & Ruhl, C. E. Liver fat scores predict liver disease mortality in the United States population. Aliment. Pharmacol. Ther. 48, 1003–1016 (2018).
Huang, Q. et al. A metabolome-derived score predicts metabolic dysfunction-associated steatohepatitis and mortality from liver disease. J. Hepatol. 82, 781–793 (2025).
Zhang, X. et al. A blood-based biomarker panel for non-invasive diagnosis of metabolic dysfunction-associated steatohepatitis. Cell Metab. 37, 59–68.e53 (2025).
Pose, E. et al. Moving diagnosis of liver fibrosis into the community. J. Hepatol. 83, 258–270 (2025).
Adams, L. A., Thiele, M. & Tsochatzis, E. A. Detecting at-risk steatotic liver disease and liver fibrosis in the community. Hepatology (2025).
Jiang, S. et al. Generic Diagramming Platform (GDP): a comprehensive database of high-quality biomedical graphics. Nucleic Acids Res. 53, D1670–D1676 (2025).
Ding, J. et al. Integrative multiomic analysis identifies distinct molecular subtypes of NAFLD in a Chinese population. Sci. Transl. Med. 16, eadh9940 (2024).
Kleiner, D. E. et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 41, 1313–1321 (2005).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates Inc., 2017).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
Acknowledgements
The authors gratefully acknowledge financial support from the Huai'an City Natural Science Research Project (No. HAB202318) and the National Natural Science Foundation of China (No. 82360226, No. 82300829). We also thank the UK Biobank for its support (Application Number 351033).
Author information
Contributions
B.Y., Y.T. and Q.C. drafted the initial manuscript. Y.T. was responsible for the analysis of the UK Biobank data. B.Y., J.Z. and S.L. were responsible for the other data analyses. L.D. and C.X. were responsible for data collection and literature retrieval. X.Q. and X.M. conceived the study, supervised the research, and revised the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ye, B., Teng, Y., Chen, Q. et al. Harnessing machine learning for the development, validation, and prognostic evaluation of MASHRisk score: insights from a multicohort study. npj Digit. Med. 9, 47 (2026). https://doi.org/10.1038/s41746-025-02220-x