Abstract
Steatotic liver disease (SLD), formerly named fatty liver disease, has an estimated prevalence of 30–38% in adults. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, reduce adverse outcomes, and lower the economic burden associated with the disease. We report the development of a novel Deep Learning (DL) method for the prediction of SLD that transforms the input variables from tabular data into images, with the goal of exploiting the pattern recognition power of DL models to reach the best prediction performance. The dataset used in this study includes records from 2,999 patients. The data of each patient, originally represented as a vector, are converted into an image by replicating each variable along rows and columns. Our DL models reach better results than traditional machine learning (ML) models at various levels of sensitivity and specificity. One DL model reached a sensitivity of 0.9497, a specificity of 0.6417, and an AUCROC of 0.8662. Our DL models also reach higher AUCROC values than the traditional ML models, and significantly better results than those obtained with the Hepatic Steatosis Index (HSI).
Introduction
Steatotic liver disease (SLD), formerly known as fatty liver disease, is a common disease defined by the presence of steatosis in more than 5% of hepatocytes1. A new nomenclature was recently published that classifies SLD into 5 groups: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD), Alcohol-Associated Liver Disease (ALD), a combination of MASLD and ALD (MetALD), SLD secondary to another specific etiology, and cryptogenic SLD2. The term MASLD replaces the former terms Non-Alcoholic Fatty Liver Disease (NAFLD) and Metabolically-Associated Fatty Liver Disease (MAFLD). High concordance has been shown among these three definitions, and previous NAFLD studies are considered valid under the new MASLD definition3.
MASLD is the most common cause of chronic liver disease worldwide1, with a continuously growing global prevalence associated with the increasing prevalence of diabetes, obesity, and metabolic syndrome4,5,6. Its prevalence is estimated at 30–38% in adults, with a 50.4% increase over the last 3 decades4,7,8. A meta-analysis estimated a high prevalence of MASLD in South America (30.4%)9. MASLD prevalence is higher in metabolic risk groups, affecting > 70% of patients with type 2 diabetes mellitus (T2DM) and 90% of patients with severe and morbid obesity undergoing bariatric surgery6. Likewise, ALD is one of the leading causes of chronic liver disease10, but, contrary to MASLD, its prevalence has remained stable during the last decades, with an estimated global prevalence of 8%6,11. ALD may coexist with other liver diseases, such as viral hepatitis and MASLD (MetALD), contributing to the progression of liver disease12,13.
SLD is associated with both liver and non-liver adverse outcomes. It can progress to steatohepatitis, fibrosis, cirrhosis, end-stage liver disease, and hepatocellular carcinoma (HCC), and is associated with increased all-cause mortality1,6,14. Both MASLD and ALD are considered leading indications for liver transplantation1,6,13,14,15. ALD was the most commonly reported underlying chronic liver disease in patients with acute-on-chronic liver failure16. Furthermore, MASLD is associated with a significant number of comorbidities, including a higher risk of non-liver malignancies such as colorectal cancer, as well as lung diseases, chronic kidney disease, cognitive impairment, and complications of T2DM6,17,18. MASLD also shares a complex bidirectional relationship with cardiovascular disease, which is the main cause of death in these patients8,18.
Consequently, SLD carries a high economic burden, which has been increasing in recent decades19,20. Health-care costs are significantly higher in patients with MASLD than in non-MASLD patients: the average rate of overall outpatient visits at 5 years after diagnosis was 40% higher among patients with MASLD than among controls19,21. Furthermore, MASLD is associated with a reduction in health-related quality of life compared with patients with no liver disease and patients with liver disease due to other causes22,23. MASLD-related deaths due to cirrhosis and liver cancer increased by 76.7% and 95.1%, respectively, between 1990 and 201924.
The diagnosis of SLD requires evidence of hepatic steatosis by either imaging or histology2,5,15,25. Liver biopsy is considered the gold standard for diagnosis, but it is associated with low, though not negligible, complication rates26 and is reserved for specific scenarios such as diagnostic doubt or patients at increased risk of advanced fibrosis25. Consequently, in routine clinical practice, most diagnoses of SLD are made radiologically. Abdominal ultrasound is the most commonly used method because it is relatively inexpensive, accessible, and innocuous, with a sensitivity and specificity of approximately 85% and 94%, respectively1,26,27; international guidelines recommend it as a first-line diagnostic test28,29,30.
On ultrasound, hepatic steatosis is characterized by a bright liver echotexture and blurring of the hepatic vasculature. Ultrasound reliability is operator dependent and limited in patients with central obesity, and ultrasound has limited sensitivity in mild steatosis31. Alternative imaging techniques, including vibration-controlled transient elastography (VCTE), magnetic resonance spectroscopy (MRS), and magnetic resonance proton density fat fraction (MRI-PDFF), are associated with higher costs18. MRS correlates well with MRI-PDFF, which has a sensitivity of 93% and a specificity of 94%32. Despite its better accuracy for detecting steatosis, cost and limited availability restrict its use in clinical practice26,30.
Early SLD diagnosis is important, since prompt initiation of treatment can stop disease progression, reduce adverse outcomes, and lower the economic burden associated with the disease33,34. A recent study showed that a screening strategy for MASLD followed by intensive lifestyle interventions, or pioglitazone in persons with T2DM, is cost-effective35. Lifestyle interventions in patients with MASLD have been shown to reduce liver fat measured by MRI-PDFF36 and to improve liver histology37. In patients with ALD, alcohol abstinence reduces the risk of disease progression14. However, early diagnosis is difficult, since SLD patients are often asymptomatic and have no laboratory alterations, especially in early stages26. Screening techniques can help detect the disease in asymptomatic patients. Although abdominal ultrasound is a relatively low-cost test with good performance as a first-line diagnostic test, in larger screening studies the cost and availability of imaging limit feasibility, especially in primary care centers30.
Currently, there is no consensus in Latin America on recommending SLD screening in the general population, owing to the low cost-effectiveness of this practice and the risks associated with invasive tests. MASLD screening is, however, recommended in patients with repeatedly altered liver enzymes, features of metabolic syndrome, or obesity. In these cases, abdominal ultrasound is the recommended initial screening method38, but it is not widely available in primary health care.
From this perspective, there have been attempts to create prediction models that diagnose SLD without imaging or biopsy and that can be applied to the general population39,40. More recently, with the development of machine learning (ML), new models have been published41,42,43,44,45,46,47,48,49,50,51,52,53. Some of these models use simple variables, such as age, body mass index (BMI), alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyl transferase (GGT), fasting plasma glucose (FPG), and triglycerides.
ML models have also been used to predict other diseases: several models were developed to predict the risk of gestational diabetes mellitus (GDM)54, and ML was used to diagnose acute gastrointestinal (GI) bleeding55.
State of the art in SLD prediction with ML models
Several models have been developed to predict SLD, NAFLD, and MAFLD with various approaches39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,56,57,58,59,60,61,62,63,64,65. Most of them use data from medical visits and blood tests. Liver ultrasound combined with diffusion models has been used to improve the classification of NAFLD64. The best performance was reached with a combination of triglycerides, glycemia, and waist circumference (WC). For the prediction of NAFLD, 35 clinical and biochemical variables were used with an extreme gradient boosting (XGB) model42, and 12 variables were used as input to another XGB model46. A random forest (RF) model for the prediction of SLD used age, gender, systolic blood pressure (SBP), diastolic blood pressure (DBP), abdominal girth, and triglycerides43. Thirty variables were used in a random under-sampling (RUS) boosted tree48. A custom logistic regression (LR) model was compared with the Fatty Liver Index (FLI) and the Hepatic Steatosis Index (HSI) using patient data from two different regions of Europe; on average, the custom model reached better prediction results than FLI and HSI. The custom model required 8 variables: age, serum aspartate aminotransferase (AST), AST/alanine aminotransferase (ALT) ratio, waist circumference (cm), ferritin, body mass index (BMI), serum triglycerides, and gout49. Eleven variables were used for the prediction of SLD in a Chinese cohort with an XGB model50; the variables included BMI, albumin, ALT, globulin, fasting blood glucose (FBG), high-density lipoprotein cholesterol (HDL-c), low-density lipoprotein cholesterol (LDL-c), and triglycerides. A combination of 28 variables with an RF was used to predict NAFLD; however, the most important variables were WC, chest circumference, trunk fat, and BMI51. After variable selection, just 8 variables were used in an XGB model to predict NAFLD52. BMI, uric acid, triglycerides, HDL, height, hemoglobin, LDL, carcinoembryonic antigen (CEA), AST, age, glucose, and alpha-fetoprotein (AFP) were the inputs of an XGB model53.
A variety of clinical data has been used to diagnose MAFLD and NAFLD, with the best performance obtained using the Fatty Liver Index (FLI)56. Another approach employed genetic biomarkers to predict NAFLD, using ML methods to select the biomarkers and then a nomogram to perform the prediction57. An LR model was used for MAFLD and NAFLD prediction with blood tests, including triglycerides, glycemia, and HOMA-IR, and data measured in clinical visits58. An XGB model was also used with 27 variables59. Age, gender, BMI, cholesterol, HDL-C, LDL-C, glucose, GOT-AST, and GPT-ALT were used for the prediction of SLD with an LR model60. BMI, waist circumference, HDL, triglycerides, ALT, tuber consumption, fried food consumption, diabetes, and hyperuricemia were used for the prediction of NAFLD with a stepwise LR model61. Sex, age, gamma-glutamyl transferase (GGT), glucose, and the Abdominal Volume Index were used as inputs to a neural network to predict NAFLD62. Age, gender, BMI, waist-to-hip ratio (WHR), ALT, LDL, HDL, uric acid (UA), and smoking were used with an LR model to predict MAFLD in young adults (18–44 years old)63. Age, sex, waist circumference, BMI, ALT, and the triglyceride glucose index were used to predict NAFLD with an LR model65.
FLI was proposed by Bedogni et al.39 for the prediction of SLD. The model requires just 4 variables: gamma-glutamyltransferase (GGT), BMI, triglycerides, and WC. HSI was proposed by Lee et al.40 to predict NAFLD and requires the ALT/AST ratio, BMI, diabetes mellitus (DM), and gender. Both indices are LR models39,40. Other studies use more complex variables; e.g., data from multiple medical visits have been used to predict SLD47.
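Because HSI is later used as the clinical comparator, and FLI is discussed alongside it, it may help to see both indices written out. The sketch below uses the formulas as they are commonly cited from the original publications (HSI = 8 × ALT/AST + BMI, plus 2 if DM and plus 2 if female; FLI as a logistic function of ln(triglycerides), BMI, ln(GGT), and WC). The function names and the units (triglycerides in mg/dL, enzymes in U/L, WC in cm) are assumptions made for illustration, and the coefficients should be checked against the original papers before any clinical use.

```python
import math

def hepatic_steatosis_index(alt, ast, bmi, diabetes, female):
    """HSI = 8 * (ALT/AST) + BMI, +2 if diabetes mellitus, +2 if female."""
    hsi = 8.0 * (alt / ast) + bmi
    if diabetes:
        hsi += 2.0
    if female:
        hsi += 2.0
    return hsi

def fatty_liver_index(triglycerides_mg_dl, bmi, ggt_u_l, waist_cm):
    """FLI in [0, 100]: logistic function of ln(TG), BMI, ln(GGT) and waist circumference."""
    y = (0.953 * math.log(triglycerides_mg_dl) + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l) + 0.053 * waist_cm - 15.745)
    # e^y / (1 + e^y) * 100, written in the numerically equivalent form 100 / (1 + e^-y)
    return 100.0 / (1.0 + math.exp(-y))

print(hepatic_steatosis_index(alt=40, ast=20, bmi=30, diabetes=True, female=True))  # 50.0
```

Both indices are fixed linear or logistic formulas, which is precisely what the learned models in this work aim to improve upon.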
Deep learning capabilities in pattern recognition
Deep learning (DL), which has its roots in conventional neural networks, has significantly outperformed its predecessors in pattern recognition tasks66. DL models have a layered architecture of data representation, in which low-level features are extracted by the first layers of the network and high-level features by the last layers66,67. These architectures were originally inspired by the way key sensory areas of the human brain, such as vision, process information68. With great success in many fields, deep learning has reached excellent performance in pattern recognition, image classification, object detection, video processing, natural language processing, and speech processing, among other tasks68,69,70. However, patient variables obtained through exams or physical measurements and coded as tabular data remain a challenge for DL models71. Some approaches address this by transforming tabular data into images. DeepInsight converts non-image data into an image by placing features with similar characteristics at nearby pixel positions72. Bazgir et al. present a feature representation termed REFINED that creates a 2D image based on pairwise feature relationships73. An image generator for tabular data (IGTD) also positions features according to how similar they are to each other74. Sharma et al. proposed methods to represent a 1-D vector as a 2-D graphical image, using a bar graph, a normalized distance matrix, or a combination of both75. These methods use convolutional neural networks (CNNs) for classification and have obtained good results compared with traditional models.
A multimodal model was used to predict severe hemorrhage in placenta previa using MRI and tabular data76.
Our contributions
Several studies have shown the tremendous success of DL methods in medical image analysis77,78,79,80. In particular, with the introduction of CNNs, many pattern recognition problems in images have been solved successfully. Part of this success can be attributed to CNN architectures being modeled on the visual system, with convolutional layers that extract various features from images. Our proposed models take into account the spatial representation of each variable in the image, including its width and height, as well as its spatial position, since these could be matched spatially to the filters of the convolutional layers of the CNN to improve classification results.
We developed a DL model for disease prediction that transforms the input variables from tabular data into images, with the goal of using the power of CNN models to recognize complex patterns in images. We hypothesize that DL models can outperform traditional machine learning models in disease prediction; this is the main problem we address in our study, by proposing a novel method to treat tabular data as images. The main contributions of our method are summarized as follows:
(1) In this work we report the development of a novel DL method for the prediction of SLD, which transforms the input variables from tabular data into images, with the goal of matching the sizes of the variable representations in the image to those of the filters of the convolutional layers of the CNN models to reach the best classification performance. Additionally, based on our literature review, existing tabular-to-image approaches require a larger number of variables than are available in the SLD prediction problem, and they have not explored the size and position of the variable representations. These previously published methods have also not been applied to SLD prediction.
(2) We applied our method to an important illness, SLD, which has a prevalence between 30% and 38% in adults, using a database of 2,999 patients. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, reduce adverse outcomes, and lower the economic burden associated with the disease.
(3) Our proposed model is compared with twelve different traditional ML models that we also developed for SLD prediction.
(4) A search was performed to optimize the hyperparameters of all DL and ML models. We also applied a variable selection process that reduces data redundancy to improve model performance. Additionally, our proposed method could be extended to other illnesses.
(5) Our results show that by using DL on the transformed patient data, we obtained significantly better performance than that of traditional machine learning models and of the Hepatic Steatosis Index (HSI). For the same level of sensitivity, we obtained an average reduction in false positives (FP) of 9.85%.
Materials and methods
Database
The dataset used in this study was obtained from patients attending the Preventive Medicine Unit of the University of the Andes Clinic in Santiago, Chile. It includes data from 2,999 patients obtained between February 2022 and October 2023. The only inclusion criteria were that patients be at least 18 years old and have all variables assessed. AUDITc, the Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) score, was computed using the brief test proposed by Bush et al.81 in 1998, with values ranging from 0 to 12. Each input variable was normalized by subtracting its mean and dividing by its standard deviation. The dataset was randomly divided into three partitions: training set (70%), validation set (10%), and testing set (20%). Our study is a retrospective study on de-identified data. The institutional review board (IRB) of Clinica Universidad de los Andes, Santiago, Chile, approved the data usage and waived the need for informed consent. All methods were performed in accordance with relevant guidelines and regulations.
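The normalization and the 70/10/20 random split can be sketched with NumPy as follows. The text does not state on which partition the mean and standard deviation were computed; fitting them on the training partition only, as done here, is an assumption, as are the function names, the fixed seed, and the synthetic stand-in data. With 2,999 patients, rounding 70% and 10% gives the 2,099/300/600 partition sizes reported in the Results.

```python
import numpy as np

def random_split(n_patients, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition patient indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_patients)
    n_train = int(round(fractions[0] * n_patients))
    n_val = int(round(fractions[1] * n_patients))
    # the test set takes whatever remains, so no patient is dropped or duplicated
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def standardize(X_train, *other_sets):
    """z-score every variable: subtract the mean and divide by the standard deviation.

    The statistics come from the training set only (an assumption) and are reused for the other sets.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return [(X - mean) / std for X in (X_train, *other_sets)]

X = np.random.default_rng(1).normal(50.0, 10.0, size=(2999, 22))  # synthetic stand-in data
train_idx, val_idx, test_idx = random_split(len(X))
X_train, X_val, X_test = standardize(X[train_idx], X[val_idx], X[test_idx])
print(len(train_idx), len(val_idx), len(test_idx))  # 2099 300 600
```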
Data augmentation
Data Augmentation (DA) is a method commonly used in ML and DL to improve model performance54,82. In this study, we propose a DA method that is used only on the training set. It consists of creating new patients whose data must lie within ranges of values established by a medical specialist in Hepatology, so that the variables of the new, artificially created patients take values validated by medical experts. Table S1 of the Supplementary Material presents the range proposed by the medical specialists for each variable, including constraints for some variables, such as keeping the same category as the original patient for Glycemia and for AUDITc. BMI is recomputed from the modified weight and height; however, the new patient must remain within the same BMI category of the classification recommended by the WHO83,54. A similar DA was proposed and used in our previous work54. We named this DA Controlled Noise (CN), since the input variables of the new patients take random values within the assigned ranges. We used three options: in the first, we created i patients from each original patient; in the second, we created i patients from each original positive patient; and in the third, we created i patients from each original positive patient and j patients from each original negative patient, with i > j. With the last two options, the training set can be balanced between positive and negative cases.
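Controlled Noise and its three options can be sketched as follows. This is a minimal illustration under stated assumptions: the noise is taken to be uniform within a symmetric range around the original value, the per-variable half-widths in `NOISE_HALF_WIDTH` are hypothetical placeholders (the expert-defined ranges are in Table S1 and are not reproduced), and only the BMI category constraint is enforced; the Glycemia and AUDITc category constraints would follow the same redraw pattern. The WHO adult BMI cut-offs (18.5, 25, 30, 35, 40 kg/m²) are the standard ones.

```python
import numpy as np

# Hypothetical half-widths of the allowed noise per variable. The real,
# expert-defined ranges are in Table S1 and are NOT reproduced here.
NOISE_HALF_WIDTH = {"weight_kg": 2.0, "height_m": 0.01, "alt_u_l": 3.0}

# WHO adult BMI cut-offs: underweight, normal, overweight, obesity I, II, III
WHO_BMI_CUTS = [18.5, 25.0, 30.0, 35.0, 40.0]

def bmi_category(bmi):
    """Index of the WHO BMI class (0 = underweight ... 5 = obesity class III)."""
    return int(np.searchsorted(WHO_BMI_CUTS, bmi, side="right"))

def controlled_noise_patient(patient, rng, max_tries=100):
    """Create one synthetic patient by adding bounded uniform noise to selected variables.

    BMI is recomputed from the noisy weight and height, and the sample is redrawn
    until its WHO BMI class matches the original patient's class.
    """
    target = bmi_category(patient["weight_kg"] / patient["height_m"] ** 2)
    for _ in range(max_tries):
        new = dict(patient)
        for var, half_width in NOISE_HALF_WIDTH.items():
            new[var] = patient[var] + rng.uniform(-half_width, half_width)
        new["bmi"] = new["weight_kg"] / new["height_m"] ** 2
        if bmi_category(new["bmi"]) == target:
            return new
    # fall back to an unmodified copy if no valid sample was found
    new = dict(patient)
    new["bmi"] = patient["weight_kg"] / patient["height_m"] ** 2
    return new

def copies_per_patient(label, option, i, j=0):
    """Number of synthetic patients created from one original patient (options 1-3)."""
    if option == 1:
        return i
    if option == 2:
        return i if label == 1 else 0
    if option == 3:
        if not i > j:
            raise ValueError("option 3 requires i > j")
        return i if label == 1 else j
    raise ValueError("option must be 1, 2 or 3")

def augment_training_set(patients, labels, option, i, j=0, seed=0):
    """Return the original training patients followed by their Controlled Noise copies."""
    rng = np.random.default_rng(seed)
    out_patients, out_labels = list(patients), list(labels)
    for patient, label in zip(patients, labels):
        for _ in range(copies_per_patient(label, option, i, j)):
            out_patients.append(controlled_noise_patient(patient, rng))
            out_labels.append(label)
    return out_patients, out_labels

train = [{"weight_kg": 70.0, "height_m": 1.75, "alt_u_l": 25.0},   # negative patient
         {"weight_kg": 95.0, "height_m": 1.70, "alt_u_l": 60.0}]   # positive patient
aug_p, aug_y = augment_training_set(train, [0, 1], option=3, i=3, j=1)
print(len(aug_p), sum(aug_y))  # 6 4
```

With option 3, positives receive more synthetic copies than negatives, which is how the training set can be pushed toward balance.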
Development of a new DL method for the prediction of SLD
In this study we developed a DL model for the prediction of SLD that transforms the input variables from tabular data into images, with the goal of using the power of CNN models to recognize complex patterns in images. The method has three stages: the data transformation stage, the CNN stage, and the classification stage.
Data transformation stage
To use the power of CNN models to recognize complex patterns in images, the patient's data must first be transformed into an image. Each patient's data can be represented as a vector of n variables (features), and this vector can be transformed into a matrix. CNNs commonly take a 224 × 224 image as input67, whereas in our case only 22 variables are available for each patient. Therefore, we replicate the vector m times as rows, creating an (m × n) matrix. To increase the number of columns, we replicate each column k times, which can be interpreted as increasing the width of each column; this results in an (m × nk) matrix. The same matrix is used for each of the three input channels of the CNN67,84,85. Data Augmentation is applied to the whole image, changing the values randomly as described in the Data Augmentation section of Materials and Methods. Figure 1 shows an example of the replication process for a patient with 9 variables: Fig. 1a shows the patient's data, Fig. 1b the resulting (m × n) = (30 × 9) matrix after row replication, and Fig. 1c the resulting (m × nk) = (30 × 27) matrix after column replication with k = 3. Figure 2 shows the three DA options for the same patient: Fig. 2a shows the image without DA, Fig. 2b the image with DA modifying a percentage of each row, Fig. 2c the image with DA modifying a percentage of each column, and Fig. 2d the image with DA modifying a percentage of both rows and columns.
(A) Example of data from a patient with 9 variables. (B) Resulting matrix (m × n) = (30 × 9) after row replications. (C) Resulting matrix (m × nk) = (30 × 9*3) after column replication.
Example of DA options in the patient with 9 variables of Fig. 1. (A) Image without DA. (B) Image with DA modifying a percentage of each row. (C) Image with DA modifying a percentage of each column. (D) Image with DA modifying a percentage of both rows and columns.
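The row and column replication of the data transformation stage can be sketched with NumPy: tile the patient vector m times to obtain an (m × n) matrix, repeat each column k times to obtain (m × nk), and copy the result into three identical channels. The function name and the channels-first (3, height, width) layout are assumptions (channels-first is the layout PyTorch models expect); the example reproduces the 30 × 27 matrix of the 9-variable patient in Fig. 1.

```python
import numpy as np

def tabular_to_image(x, m, k, channels=3):
    """Turn one patient's vector of n (standardized) variables into a (channels, m, n*k) image."""
    x = np.asarray(x, dtype=np.float32)
    rows = np.tile(x, (m, 1))                   # (m, n): the vector replicated m times as rows
    wide = np.repeat(rows, k, axis=1)           # (m, n*k): each column replicated k times
    return np.stack([wide] * channels, axis=0)  # the same matrix in every channel

patient = np.arange(1, 10)                      # 9 illustrative variable values
img = tabular_to_image(patient, m=30, k=3)
print(img.shape)                                # (3, 30, 27)
print(img[0, 0, :6])                            # [1. 1. 1. 2. 2. 2.]
```

In this sketch, reordering the vector before the call (for example `patient[perm]` for a permutation `perm`) is what changes the spatial position of each variable in the image.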
Binary/categorical variables are not used to create the input image for the CNN in this transformation because they take only two values. Instead, these binary variables are fed directly to the classification stage, as shown in Fig. 3. Given this new image representation of patient data, we added another type of DA commonly used with images67,82,84: small random rotations of ± 5° and vertical flips with a probability of 50%, both applied only to the training set.
Block diagram of the proposed method including the data transformation stage, the CNN stage, and the classification stage for SLD prediction.
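The geometric augmentation could be obtained with torchvision's `RandomRotation` and `RandomVerticalFlip` transforms, but the text does not say which implementation was used. To keep the example dependency-free, the sketch below implements a nearest-neighbour rotation with zero fill and a vertical flip in NumPy; the interpolation method, the zero fill, and the function names are assumptions.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate every channel of a (C, H, W) array about its centre (nearest neighbour, zero fill)."""
    _, h, w = img.shape
    theta = np.deg2rad(angle_deg)
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse mapping: for each output pixel, find the source pixel it comes from
    src_y = (ys - yc) * np.cos(theta) + (xs - xc) * np.sin(theta) + yc
    src_x = -(ys - yc) * np.sin(theta) + (xs - xc) * np.cos(theta) + xc
    yi = np.rint(src_y).astype(int)
    xi = np.rint(src_x).astype(int)
    inside = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.zeros_like(img)
    out[:, inside] = img[:, yi[inside], xi[inside]]
    return out

def augment_image(img, rng, max_deg=5.0, p_flip=0.5):
    """Random rotation in [-max_deg, max_deg] degrees, then a vertical flip with probability p_flip."""
    out = rotate_nearest(img, rng.uniform(-max_deg, max_deg))
    if rng.random() < p_flip:
        out = out[:, ::-1, :]  # vertical flip: reverse the row order
    return np.ascontiguousarray(out)

rng = np.random.default_rng(0)
image = np.random.default_rng(1).normal(size=(3, 180, 36 * 8)).astype(np.float32)
augmented = augment_image(image, rng)  # applied to training images only
```

Because each variable occupies a band of identical rows, a vertical flip leaves the column layout unchanged, while the small rotation slightly shears the variable bands.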
CNN stage
As described in the previous section, after the data transformation stage each patient's data is represented by an image that becomes the input to a CNN. The selected CNN is ResNet-50 (version 1.5, as implemented in PyTorch86), which is used to extract features from each patient's image. The weights of the CNN model were pretrained on the ImageNet 2012 dataset87. The classification layer was removed and replaced by the classification stage.
Classification stage
The classification stage is a Multi-Layer Perceptron (MLP) with 3 hidden layers of 1000, 500, and 100 neurons. Its input is the output of the CNN together with the binary/categorical variables, and its output is the SLD prediction. The entire process, including the data transformation, CNN, and classification stages, is shown in Fig. 3.
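To make the data flow of the classification stage concrete, the following NumPy sketch runs a forward pass through an MLP with hidden layers of 1000, 500, and 100 units whose input is the CNN feature vector concatenated with the binary variables. It assumes the 2048-dimensional ResNet-50 average-pooling output as the CNN feature, ReLU activations, He-style random initialization, and a single sigmoid output unit; none of these details are stated in the text. Random features stand in for real ResNet-50 outputs, and the weights are untrained. If the last convolutional layer were used instead, its output would have to be flattened first.

```python
import numpy as np

def init_mlp(n_in, hidden=(1000, 500, 100), n_out=1, seed=0):
    """Random weights for an MLP with the hidden-layer sizes given in the text (He-style init assumed)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def classification_stage(cnn_features, binary_vars, params):
    """Concatenate CNN features with binary variables and return an SLD probability per patient."""
    h = np.concatenate([cnn_features, binary_vars], axis=1)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)       # ReLU hidden layers (assumed activation)
    W, b = params[-1]
    logits = (h @ W + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid output (assumed)

batch, n_cnn, n_binary = 4, 2048, 3          # 2048 = ResNet-50 avgpool size; 3 binary variables is illustrative
rng = np.random.default_rng(1)
features = rng.normal(size=(batch, n_cnn))   # stand-in for ResNet-50 features
binaries = rng.integers(0, 2, size=(batch, n_binary)).astype(float)
params = init_mlp(n_cnn + n_binary)
probs = classification_stage(features, binaries, params)
print(probs.shape)                           # (4,)
```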
Traditional ML models
To compare the performance of our proposed method, we used twelve traditional ML models developed for SLD prediction using tabular data: Gaussian Naïve Bayes (GNB)88, Bernoulli Naïve Bayes (BNB)88, Decision Trees (DT)88, Support Vector Machines (SVMs)88, Multi-Layer Perceptron (MLP)88, K-Nearest Neighbors (KNN)88, Logistic Regression (LR)88, Random Forest (RF)88, Extra Trees (ET)88, Balanced Random Forest (BRF)89, and Gradient Boosting Machines (GB) in two popular implementations, Extreme Gradient Boosting (XGB)90 and Light Gradient Boosting Machine (LGBM)91. These ML models have been used to predict various illnesses from tabular data, e.g., gestational diabetes54. DA Controlled Noise is also applied to these ML models.
DL and ML model implementations and hyperparameters
The models were implemented in Python 3.10.11 using the libraries PyTorch 2.0.1, Scikit-Learn 1.2.2, Imbalanced-Learn 0.10.1, XGBoost 1.7.3, and LightGBM 3.3.5. The hyperparameters specific to our proposed DL models are the replication factors m and k, which set the size of the generated image. The values studied for m are 50, 150, 180, 210, 250, and 300, and the values analyzed for k are 1, 2, 3, 4, 10, and 36. The layer selected as the output of the CNN stage is also a hyperparameter: two layers of the ResNet were analyzed, the Average Pooling layer (Avgpool) and the last convolutional layer. Selecting intermediate CNN layers as feature outputs has previously been studied with good results67. The spatial position of the variables in the image is also a hyperparameter: several random positions were first analyzed, and then we performed permutations of variable positions around those that yielded good results in the first analysis.
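The DL-specific search space described above can be enumerated as follows. The m and k values and the two candidate output layers come from the text; the number of image variables (8) is an illustrative assumption, and so is the neighbourhood used to perturb promising variable orderings (all orderings obtained by swapping the positions of two variables), since the text does not define how the permutations around good positions were generated.

```python
from itertools import combinations, product

M_VALUES = [50, 150, 180, 210, 250, 300]   # image height (row replication m)
K_VALUES = [1, 2, 3, 4, 10, 36]            # column width (column replication k)
OUTPUT_LAYERS = ["avgpool", "last_conv"]   # candidate outputs of the CNN stage

def dl_search_space(n_image_vars):
    """All (m, k, layer, image_shape) combinations for an image built from n_image_vars variables."""
    return [(m, k, layer, (m, n_image_vars * k))
            for m, k, layer in product(M_VALUES, K_VALUES, OUTPUT_LAYERS)]

def swap_neighbours(order):
    """Orderings reachable from `order` by swapping the positions of two variables (assumed neighbourhood)."""
    neighbours = []
    for a, b in combinations(range(len(order)), 2):
        new = list(order)
        new[a], new[b] = new[b], new[a]
        neighbours.append(tuple(new))
    return neighbours

space = dl_search_space(n_image_vars=8)
print(len(space))                             # 72
print(len(swap_neighbours((0, 1, 2, 3, 4))))  # 10
```

Each neighbour ordering would be turned into an image by permuting the patient vector before replication, and only orderings that improve the validation results would be explored further.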
The hyperparameters used for the traditional ML models are shown in Table S2 of the Supplementary Material. Hyperparameter selection was performed using a grid search evaluated with 5-fold Cross Validation (CV)92. Variable selection was part of the grid search, and the best variables were also used for our proposed model. To select the optimal number of variables for the best model performance while reducing redundancy, we used 4 methods/metrics: the ANOVA (Analysis of Variance) F-test, the Chi-Square test, and Mutual Information, using the Scikit-Learn implementations88, and Balanced Random Forest89.
The top 15% of the models with the highest area under the curve (AUC) were selected and assessed on the validation set.
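A grid search with variable selection inside 5-fold CV, followed by keeping the top 15% of configurations by AUC, can be sketched with scikit-learn. This is a minimal illustration under several assumptions: logistic regression stands in for the twelve models, synthetic data replace the clinical dataset, the grid values are hypothetical, and the number of configurations kept is rounded up. The Chi-Square test is left out of the grid because scikit-learn's `chi2` requires non-negative features, which z-scored variables are not (it would need, e.g., min-max scaled inputs); Balanced Random Forest importance from imbalanced-learn is also omitted.

```python
import math
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# synthetic stand-in: 22 variables, roughly 27% positives
X, y = make_classification(n_samples=400, n_features=22, n_informative=6,
                           weights=[0.73], random_state=0)

pipe = Pipeline([("select", SelectKBest()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = {"select__score_func": [f_classif, mutual_info_classif],
        "select__k": [5, 11, 22],
        "clf__C": [0.1, 1.0]}
# 5-fold CV (stratified by default for classifiers), configurations ranked by ROC AUC
search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=5)
search.fit(X, y)

def top_fraction(cv_results, fraction=0.15):
    """Parameter sets of the best `fraction` of configurations by mean CV score (rounded up)."""
    scores = np.asarray(cv_results["mean_test_score"])
    n_keep = max(1, math.ceil(fraction * len(scores)))
    best = np.argsort(-scores, kind="stable")[:n_keep]
    return [cv_results["params"][i] for i in best]

shortlist = top_fraction(search.cv_results_)  # these would then be assessed on the validation set
print(len(search.cv_results_["params"]), len(shortlist))  # 12 2
```

Placing `SelectKBest` inside the pipeline means the variables are re-selected within each CV fold, so the held-out fold never influences which variables are kept.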
Model evaluation
The validation set was used to select the best models and their decision thresholds. The test set was not used in training, model selection, or decision threshold selection; trained models were evaluated on the test set. Using the selected decision thresholds, we report accuracy, sensitivity, specificity, macro-averaged recall (recall macro), area under the ROC curve (AUCROC), false positives (FP), and false negatives (FN). High sensitivity is a priority for gastroenterologists because the model is intended for SLD screening; therefore, we paid special attention to sensitivities above 0.80 (80%).
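Threshold selection and the reported metrics can be sketched as follows. The rule used to pick the threshold (the highest score that still reaches the target sensitivity on the validation set) is an assumption, since the text does not specify it, and the function names are hypothetical. For a binary problem, recall macro is the mean of sensitivity and specificity, and AUCROC is computed here with the rank-based (Mann-Whitney) formulation.

```python
import numpy as np

def threshold_for_sensitivity(y_val, scores_val, target):
    """Highest threshold whose validation sensitivity is at least `target` (positive if score >= t)."""
    pos_scores = np.sort(scores_val[y_val == 1])[::-1]
    n_needed = int(np.ceil(target * len(pos_scores)))
    return pos_scores[n_needed - 1]

def screening_metrics(y, scores, threshold):
    """Confusion-matrix counts and the derived metrics at a fixed decision threshold."""
    pred = scores >= threshold
    tp = int(np.sum(pred & (y == 1)))
    fn = int(np.sum(~pred & (y == 1)))
    tn = int(np.sum(~pred & (y == 0)))
    fp = int(np.sum(pred & (y == 0)))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {"accuracy": (tp + tn) / len(y), "sensitivity": sens, "specificity": spec,
            "recall_macro": (sens + spec) / 2, "tp": tp, "fn": fn, "tn": tn, "fp": fp,
            "errors": fp + fn}

def auc_roc(y, scores):
    """Probability that a random positive scores higher than a random negative (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# toy validation data; in the study, the threshold chosen here would be reused unchanged on the test set
y_val = np.array([0, 0, 1, 1, 1, 1])
s_val = np.array([0.2, 0.6, 0.3, 0.5, 0.7, 0.9])
t = threshold_for_sensitivity(y_val, s_val, target=0.75)            # 0.5
print(screening_metrics(y_val, s_val, t)["fp"])                      # 1
print(auc_roc(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8])))  # 0.75
```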
Results
Population characteristics
A total of 2,999 patients were included in this study. The dataset was partitioned into a training set of 2,099 patients (70%), a validation set of 300 patients (10%), and a test set of 600 patients (20%). The prevalence of SLD in the dataset was 26.64% (799/2,999), and the test set included 159 patients positive for SLD. Table 1 presents the variables collected in the dataset.
Variable selection
Twenty-two variables were available for the prediction of SLD. Variable selection improves model performance by reducing the use of irrelevant or redundant data. The best 11 variables for each selection method are listed in Table S3 of the Supplementary Material.
Model performance
Table 2 shows the performance of our proposed DL models (OursDLM) compared with traditional ML models, including the number of variables used, the image size for our proposed models, and the DA used. In the traditional ML models, noise could only be applied by creating artificial patients, while in our proposed DL models the noise could be applied in rows and/or columns; in both cases, the number of patients created with noise is listed in the table. This means that both ML and DL models could use DA Controlled Noise. The noise is small, within ranges provided by a medical specialist, and only the training set is altered by this DA. Our proposed DL models can also use DA with vertical flips and random rotations. Table 2 also reports accuracy, sensitivity, specificity, recall macro, AUCROC, the confusion-matrix counts, and the total number of errors for each model, and it lists the best DL model and the best traditional ML model for each level of sensitivity from 1 down to 0.80. Additional results of OursDLM and of our traditional ML models (OursMLM) for more sensitivity levels are included in Table S4 of the Supplementary Material, which also contains the results presented in Table 2. Table S5 of the Supplementary Material provides the output layer used and the spatial position of the variables. All models are evaluated on a test set whose data were not used in any of the steps employed to find the best models.
Our DL models show excellent SLD prediction capability using simple-to-obtain clinical and laboratory variables. Table 2 presents models with excellent results from which medical specialists can choose, depending on the desired balance between sensitivity (FN) and specificity (FP).
The DL models reached better results compared to the traditional ML models for the same levels of sensitivity.
For three models (23, 27, and 31), we compared the results obtained with different replication parameters at the same sensitivity thresholds (Table 3). The replication parameters determine the size of the image input to the CNN. Table 3 shows that an image with a shorter height (50) performs worse than our selected height (180), with an average of 30 more FP patients. With a height of 100, there are on average 19 more FP. Increasing the height to 210 also produces more errors, on average 8 more FP patients. Similarly, decreasing the column width to 5 increases the errors by 86 FP patients on average compared with our selected width (36). With a width of 12, the errors increase on average by 57 FP patients compared with our selected value. Reducing the column width to 12, the FP error is 36 patients higher than with our selected width. If we set the column width to 1, i.e., no column replication, the error increases on average by 114 FP patients. Finally, increasing the column width to 40 also increases the error, by 32 FP patients on average.
Table S6 of the Supplementary Material shows the results of the Hepatic Steatosis Index (HSI) applied to our database for the various levels of sensitivity from 1 to 0.8050. Comparing these results with those in Table 2 shows that all the DL models perform significantly better than HSI. In this study, screening was performed on the general population to detect patients at high risk of SLD, and a second, confirmatory diagnostic test was then performed only on those high-risk patients. This first test uses the biochemical profile, which is widely available in the health care system and includes the variables needed to determine HSI.
Table S7 in the Supplementary Material shows the metrics for the traditional models without optimization (default hyperparameters), with and without variable selection, and without Data Augmentation. A comparison between models with and without DA is shown in Table S8. Thirty of the thirty-two models achieved better results with DA (all except models 23 and 31). Without DA, the number of errors increased by 4.65% on average, with the greatest difference for Model 1, which made 44 more errors without DA.
Table S9 in the Supplementary Material shows the thresholds used to calculate the metrics for each model.
A comparison between our proposed models and TabNet is presented in Table S4. In general, TabNet models achieved results better than, or similar to, those of the traditional ML models. TabNet achieved better results, i.e., fewer errors, than the traditional ML models in models 1–6, 16–19, 26–28, 30, and 31, and the same results in models 15, 21, and 29. At other sensitivity thresholds (0.9623–0.9182, 0.8805, 0.8679–0.8491, and 0.8050), the traditional ML models achieved better results than TabNet (models 7–15, 20–25, 29, and 32). In general, TabNet models yielded worse results than our proposed models in the high-sensitivity range (1 to 0.8050). For example, at the same sensitivity of 0.8805, our model 20 correctly classifies 14 more negative patients than TabNet 20 (328 vs. 314). In another example, TabNet 5 correctly identifies 33 fewer patients than our model 5.
Figure 4 shows the ROC curves for DL models 9, 17, 27, 31, and 32, with sensitivities of 0.9497, 0.8994, 0.8365, 0.8113, and 0.8050, respectively. These DL models reach higher AUCROC values than the corresponding traditional ML models for all sensitivity levels in the range 1–0.8050, and much higher AUCROC values than HSI.
Figure 5 provides another comparison, based on the total number of errors (FP + FN) as a function of the true positives (TP) in the test set. Our DL models (OursDLM) are shown in blue, our traditional ML models (with optimization and variable selection) in red, the traditional ML models without optimization but with variable selection in orange, and the traditional ML models with neither optimization nor variable selection in purple. The results of HSI on our database are shown in green.
Figure 4. ROC curves of our DL models (OursDLM) with sensitivities of 0.9497, 0.8994, 0.8365, 0.8113, and 0.8050.
Figure 5. Total number of errors (FP + FN) as a function of the true positives (TP) in the test set. Our DL models (OursDLM) are shown in blue; our traditional ML models (with optimization and variable selection) in red; the traditional ML models without optimization and with variable selection in orange; and the traditional ML models without optimization and without variable selection in purple. The results of HSI on our database are shown in green.
Discussion
Our study presents a new method that makes it possible to apply the power of DL models to the prediction of SLD. We also developed twelve different traditional ML models, optimized their hyperparameters, and compared their results with those of the DL models in the prediction of SLD. More generally, traditional ML models have emerged as tools to help identify diseases and support real-time decision-making43. In our study, we used a dataset of 2999 patients attending a preventive medicine unit. SLD prevalence in our population was 26.6%.
Our results show that the DL models outperform the traditional ML models in the high-sensitivity range (1–0.8050). For lower sensitivities (< 0.8050), the traditional ML models reach the best results. Figure 5 shows the total number of errors (FP + FN) as a function of the true positives (TP) in the test set. Our DL models (OursDLM, blue) achieve the lowest total number of errors, compared with our traditional ML models (OursTMLs, red curve). Both OursDLM and OursTMLs yield better results than the ML models without optimization (orange and purple curves). Variable selection and hyperparameter optimization significantly improve the results of the traditional ML models (orange, purple, and red curves).
Our DL models show excellent SLD prediction capability using simple-to-obtain clinical and laboratory variables. Table 2 presents several models with excellent results, from which medical specialists can choose according to the desired balance between sensitivity (FN) and specificity (FP). For example, model 9 reached a sensitivity of 0.9497 (8 FN), a specificity of 0.6417 (158 FP), and an AUCROC of 0.8662. Model 17 reached a sensitivity of 0.8994 (16 FN), a specificity of 0.7211 (123 FP), and an AUCROC of 0.8660. For cases where lower sensitivity and higher specificity are preferred, model 27 reached a sensitivity of 0.8365 (26 FN), a specificity of 0.7868 (94 FP), and an AUCROC of 0.8630. Additionally, model 31 achieved a sensitivity of 0.8113 (30 FN) and a specificity of 0.8004 (88 FP), with an AUCROC of 0.8562. These findings are particularly important given the increasing prevalence of SLD and its associated adverse outcomes. The high prevalence of this disease makes universal screening programs difficult to implement. Our DL models could help identify patients at increased risk of SLD who would benefit from confirmatory testing. Conversely, our DL models also identify patients with a very low risk of the disease, in whom further studies may be omitted, thus reducing costs. In our best models, the most effective predictor variables were age, weight, BMI, waist perimeter, AST, ALT, triglycerides, and HDL cholesterol. These clinical and laboratory variables are easy and inexpensive to obtain, which facilitates implementation of the model in primary care.
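The reported FN/FP counts and rates are linked by sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). They are consistent with a test set of 159 positive and 441 negative patients; this composition is our inference from the figures (e.g., 8 / (1 − 0.9497) ≈ 159 and 158 / (1 − 0.6417) ≈ 441), not a number stated in this section. Under that assumption, the sketch below recovers the published rates for models 9, 17, and 27.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Assumed test-set composition, inferred from the reported rates and counts
N_POS, N_NEG = 159, 441

# (FN, FP) reported for models 9, 17 and 27 of Table 2
reported = {9: (8, 158), 17: (16, 123), 27: (26, 94)}

for model, (fn, fp) in reported.items():
    sens, spec = sensitivity_specificity(N_POS - fn, fn, N_NEG - fp, fp)
    print(f"model {model}: sensitivity {sens:.4f}, specificity {spec:.4f}")
```

This prints sensitivity/specificity pairs of 0.9497/0.6417, 0.8994/0.7211, and 0.8365/0.7868, matching the values quoted above.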
The use of a two-step screening program, in which a formula is applied to select high-risk patients who benefit from abdominal ultrasonography, has shown a reduction in ultrasonography requests, with a low false-negative rate93. In a two-step screening program using one of our models (model 17, Table 2), we could avoid 55.7% of abdominal ultrasounds, with a false negative rate of 2.7%.
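The two percentages can be reproduced from the counts reported for model 17 (16 FN, 123 FP), assuming the 600-patient test set (159 positives, 441 negatives) inferred from Table 2. In that reading, every patient flagged positive (TP + FP) is referred for ultrasound, and the 2.7% is the share of all screened patients who are false negatives; measured against positive patients only, the miss rate would instead be 1 − sensitivity ≈ 10.1%. This choice of denominator is our interpretation.

```python
# Inferred test-set size and model 17 counts (Table 2)
n_total = 159 + 441
tp, fp, fn = 143, 123, 16

referred = tp + fp                   # patients flagged for ultrasound
avoided = 1 - referred / n_total     # share of ultrasounds not requested
missed = fn / n_total                # false negatives among all screened

print(f"ultrasounds avoided: {avoided:.1%}")              # 55.7%
print(f"false negatives among screened: {missed:.1%}")    # 2.7%
```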
Compared with OursTMLs, OursDLMs offer an average reduction of 15.625 FP (9.85%) at the same sensitivity level across the analyzed sensitivity range, with a minimum of 3 (a 3.37% reduction, at sensitivity 0.8050) and a maximum of 43 (a 20.09% reduction, at sensitivity 0.9748). We attribute this improvement to the transformation of the data into images and the subsequent application of CNNs for pattern recognition, including our proposed DA. Compared with the traditional models without optimization but with variable selection, OursDLMs provide an average reduction of 23.59 FP (14.91%) at the same sensitivity level, with a minimum of 12 (12.24%, at sensitivity 0.8050) and a maximum of 50 (22.62%, at sensitivity 0.9748). Both traditional-model optimization and variable selection are important; without them, the difference would be even greater. For example, compared with our traditional models without optimization or variable selection, OursDLMs provide an average reduction of 43.19 FP (23.83%) at the same sensitivity level, with a minimum of 29 (14.80%, at sensitivity 0.9623) and a maximum of 56 (31.64%, at sensitivity 0.8931). The difference is even larger against HSI: OursDLMs reach an average reduction of 88.28 FP (38.52%) at the same sensitivity level, with a minimum of 60 (30.61%, at sensitivity 0.9120) and a maximum of 183 (47.04%, at sensitivity 0.9874).
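Each percentage is a relative reduction, (FP_baseline − FP_ours) / FP_baseline. The FP counts below are back-calculated from the reported differences and percentages (e.g., 3 / 0.0337 ≈ 89), so they are our inference rather than values read from the tables. They are mutually consistent: both baselines at sensitivity 0.8050 imply 86 FP for OursDLMs, and both at 0.9748 imply 171.

```python
def fp_reduction(baseline_fp, ours_fp):
    """Absolute and relative (%) FP reduction at a fixed sensitivity."""
    diff = baseline_fp - ours_fp
    return diff, 100.0 * diff / baseline_fp


# Back-calculated FP counts (baseline, OursDLMs); inferred, not tabulated
comparisons = [
    ("OursTMLs, sensitivity 0.8050", 89, 86),
    ("OursTMLs, sensitivity 0.9748", 214, 171),
    ("no optimization, sensitivity 0.8050", 98, 86),
    ("no optimization, sensitivity 0.9748", 221, 171),
]
for label, base, ours in comparisons:
    diff, pct = fp_reduction(base, ours)
    print(f"{label}: -{diff} FP ({pct:.2f}%)")
```

The printed reductions (3 FP, 3.37%; 43 FP, 20.09%; 12 FP, 12.24%; 50 FP, 22.62%) match the figures in the paragraph above.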
Variable selection impact
Table S10 in the Supplementary Material compares traditional models with variable selection against the same models using all 22 available variables. Across the 32 models, using all 22 variables increases the number of errors by 1.93% on average. The largest increase was for model 24, with 13 more FP. Using 22 variables in models 1, 30, and 31 yielded the same results. Models 2, 5, 6, 10, 12, 23, 26, and 27 improved by up to 7 patients. However, in these cases, 22 variables are more than double the number required by the models with variable selection (e.g., model T2 requires 8 variables, whereas model V22-2 requires 22). Moreover, using 22 variables requires more time to enter the data when the models are used in clinical practice.
Weight-related variables (BMI, Waist Perimeter, and/or Weight) are ranked in the top 5 by all the variable selection methods, as are GPT ALT and Triglycerides. Cholesterol HDL is ranked 6th by all the methods except Chi-Square, which ranks it 4th.
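A Chi-Square ranking of this kind can be sketched with scikit-learn's `SelectKBest` and `chi2`. The sketch uses synthetic data, generic variable names, and k = 8 as an illustrative choice; it does not reproduce the study's 22 variables, preprocessing, or the other selection methods (such as BRF). Note that `chi2` requires non-negative features, e.g., raw or min-max scaled values.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)

# Synthetic, non-negative stand-in for 22 candidate variables
X = rng.random((300, 22))
# Synthetic label that depends only on columns 0 and 3
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
feature_names = [f"var_{i}" for i in range(X.shape[1])]

selector = SelectKBest(score_func=chi2, k=8).fit(X, y)
ranking = np.argsort(selector.scores_)[::-1]   # highest chi2 score first

print("top 8:", [feature_names[i] for i in ranking[:8]])
print("number selected:", int(selector.get_support().sum()))
```

The two informative synthetic columns receive the highest scores; in a real pipeline the ranking would be computed on the training folds only, to avoid selection bias in the evaluation.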
Using SHAP, global feature importance is shown in Fig. 6. GPT ALT is the most influential variable, followed by Triglycerides, BMI, and Age. Cholesterol HDL and GOT AST have less influence on the model prediction.
Figure 6. Global feature importance for SLD prediction, based on SHAP values.
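Global SHAP importance is conventionally the mean absolute SHAP value of each feature over the evaluated samples, the quantity shown in the bar summary plot of the `shap` library. The sketch below shows only that aggregation step, assuming a precomputed SHAP matrix (rows = patients, columns = features); the function name, toy numbers, and feature names are hypothetical and unrelated to the study's results.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across samples."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]          # most influential first
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Toy SHAP matrix: 3 patients x 3 features (illustrative values only)
toy_shap = [
    [0.5, -0.2, 0.1],
    [-0.4, 0.3, 0.0],
    [0.6, -0.1, -0.2],
]
names = ["feature_a", "feature_b", "feature_c"]
for name, score in global_importance(toy_shap, names):
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.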
Datasets from other published studies are rarely publicly available, which makes a direct comparison of model performance impossible. Patient inclusion criteria also differ between studies. Because our patients were selected from a screening study, the prevalence of SLD is similar to that of the general population. In contrast, in other publications51,52,60 patients were selected from a group that consulted for a disease, so the SLD prevalence is much higher. Therefore, the models may not be directly comparable. As a reference, models from previous studies for SLD prediction are shown in Table 4.
The input variables used by OursDLMs, OursTMLs, and the state-of-the-art models are shown in Table S11 of the Supplementary Material, together with the variable selection method used. Seven variables are used by all of our models: Age, Weight, BMI, Waist Perimeter, GPT ALT, Triglycerides, and Cholesterol HDL. Our traditional models use an average of 10.85 variables, compared with an average of 8.28 for OursDLMs. Notably, the 8-variable OursDLM uses variables selected by the Chi-Square Test, whereas the 11-variable OursDLM uses variables selected by BRF.
The purpose of our study is to develop a screening method for SLD. In patients at high risk of SLD, a second test would then be performed to confirm the diagnosis. In this context, the priority is to correctly identify positive patients (high sensitivity, i.e., a low FN rate). A good level of specificity also reduces the rate of false positives (FP), so that the second test is performed in a smaller group of patients at high risk of SLD. We therefore consider at least these two metrics together when assessing model performance. High sensitivity requires a large number of true positives (TP), i.e., most patients with the disease are detected and few are missed as false negatives (FN). High specificity requires a small number of FPs. A screening task thus requires, first, good sensitivity (few FNs) and, second, good specificity (few FPs).
The AUC ROC provides an overall assessment of a model's diagnostic performance across different values of sensitivity and specificity. Nevertheless, the ROC curves of different models may intersect, so their performance may differ in various FP ranges. Therefore, for screening (high sensitivity and high specificity), some models may perform better than those with the largest total AUC ROC94,95,96. For example, as shown in Table S4, traditional models 1 to 12 have a higher AUC ROC but more FPs at the same sensitivity level (sensitivities from 1 to 0.802). Another example in Table S4 is model 13, for which our proposed model has a slightly larger AUC ROC (0.8681 versus 0.8678), but the number of FPs is considerably reduced, from 157 to 142. Many other examples can be found in Table S4. Table S4 also shows that for all sensitivity levels from 1 to 0.8050, the best specificity (i.e., the best combination of few FNs and FPs) is reached by our proposed models.
The partial area under the ROC curve (pAUC) can be used to compare models in different regions of the ROC curve. Table S12 shows the pAUC computed over true positive rate (TPR) intervals of width 0.2. In each interval, one of our models achieved the best results. For example, model 31 achieved the best results in the 0–0.2 and 0.2–0.4 intervals, and our models 27 and 32 yielded the best results in the 0.8–1 interval.
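The exact pAUC definition and normalization behind Table S12 are not stated here, so the sketch below makes an assumption: the pAUC of a TPR band [a, b] is the unnormalized area between the ROC curve and the line FPR = 1 within that band, i.e., the integral of (1 − FPR) dTPR. scikit-learn's `roc_auc_score(..., max_fpr=...)` restricts the FPR axis and returns a standardized value, which is why a small custom function (hypothetical name `partial_auc_tpr`) is used instead. The data are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def partial_auc_tpr(y_true, y_score, tpr_lo, tpr_hi):
    """Integral of (1 - FPR) dTPR for TPR in [tpr_lo, tpr_hi].

    The ROC curve is treated as piecewise linear between the points
    returned by roc_curve; each segment is clipped to the TPR band.
    """
    fpr, tpr, _ = roc_curve(y_true, y_score)
    area = 0.0
    for x1, y1, x2, y2 in zip(fpr[:-1], tpr[:-1], fpr[1:], tpr[1:]):
        if y2 <= y1:  # horizontal segment: no extent along the TPR axis
            continue
        lo, hi = max(y1, tpr_lo), min(y2, tpr_hi)
        if hi <= lo:  # segment lies outside the band
            continue
        # FPR at the clipped ends, by linear interpolation along the segment
        x_lo = x1 + (x2 - x1) * (lo - y1) / (y2 - y1)
        x_hi = x1 + (x2 - x1) * (hi - y1) / (y2 - y1)
        area += (hi - lo) * (1.0 - 0.5 * (x_lo + x_hi))
    return area


# Synthetic example: scores loosely correlated with the labels
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=500)
scores = y + rng.normal(0.0, 1.0, size=500)

edges = np.linspace(0.0, 1.0, 6)  # TPR bands of width 0.2
parts = [partial_auc_tpr(y, scores, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
print("pAUC per TPR band:", np.round(parts, 4))
# The bands tile [0, 1], so their sum recovers the full AUC
print("sum equals full AUC:", np.isclose(sum(parts), roc_auc_score(y, scores)))
```

Under this definition the largest possible value in a band equals its width (0.2), reached only if FPR stays at 0 throughout the band; dividing by the width would give a normalized pAUC if that is preferred.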
The same conclusion is reached with the precision-recall (PR) curve. In the region of interest (recall 0.8–1), our models show greater precision (Figure S1). For example, model 9 has a larger area between recall values of 0.9 and 0.98. Model 17 achieves the best results between recall 0.86 and 0.9, and models 17, 27, 31, and 32 are better in the recall region 0.8–0.86. BRF has lower precision, despite achieving a larger AUC PR in the region between recall 0.925 and 1. MLP has good precision in the region of interest (recall 0.8–1), but our proposed model achieves better results.
Conclusion
SLD, formerly named fatty liver disease, has an estimated prevalence of 30–38% in adults. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, reduce adverse outcomes, and reduce the economic burden associated with the disease. In this study, we reported the development of a novel DL method for the prediction of SLD, which transforms the input variables from tabular data into images in order to use the pattern recognition power of DL models to achieve the best classification performance. For that purpose, the data of each patient, originally represented as a vector of n variables, was converted into an image by replicating each variable m times in one dimension and k times in the other, creating an m × kn matrix. Variable selection and a data augmentation method were used during training to improve prediction results. Twelve traditional machine learning (ML) models were implemented for comparison with our proposed DL models.
All our proposed DL models reached better results than traditional ML models at all sensitivity levels in the range 1–0.8050. This sensitivity range was selected by the hepatologist as appropriate for SLD screening. For example, one of our DL models reached a sensitivity of 0.9497, a specificity of 0.6417, and an AUCROC of 0.8662. Another model reached a sensitivity of 0.8113 and a specificity of 0.8004, with an AUCROC of 0.8562. These models require only 8 variables that are widely available in clinical practice. We also obtained significantly better results than those of the Hepatic Steatosis Index (HSI). All our DL models reached higher AUCROC values than the traditional ML models in the sensitivity range 1–0.8050, and much higher AUCROC values than HSI. Our proposed method converts tabular data into images, making it possible to apply the pattern recognition power of DL models to the prediction of SLD. The combination of our proposed DL model with variable selection, hyperparameter optimization, and data augmentation allows us to set a new state of the art for SLD prediction, with better performance than traditional ML models and HSI. The proposed method may be applied to the prediction of other illnesses by converting tabular data into images.
Data availability
The dataset used in this study is not publicly available for privacy reasons. The dataset was provided by Clinica los Andes; access may be granted to qualified researchers upon request and with the permission of this institution (gmezzano@clinicauandes.cl).
Abbreviations
- AFP: Alpha-fetoprotein
- AI: Artificial intelligence
- ALD: Alcohol-associated liver disease
- ALT: Alanine aminotransferase
- AST: Aspartate aminotransferase
- AUC: Area under curve
- AUDIT: Alcohol use disorders identification test
- BMI: Body mass index
- BNB: Bernoulli Naïve Bayes
- BRF: Balanced random forest
- CEA: Carcinoembryonic antigen
- CN: Controlled noise
- CNN: Convolutional neural network
- CV: Cross validation
- DA: Data augmentation
- DBP: Diastolic blood pressure
- DL: Deep learning
- DM: Diabetes mellitus
- DT: Decision tree
- ET: Extra trees
- FN: False negative
- FP: False positive
- FPG: Fasting plasma glucose
- GB: Gradient boosting machine
- GDM: Gestational diabetes mellitus
- GGT: Gamma-glutamyl transferase
- GI: Gastrointestinal
- GNB: Gaussian Naïve Bayes
- HCC: Hepatocellular carcinoma
- HDL: High-density lipoprotein
- HSI: Hepatic steatosis index
- HOMA-IR: Homeostatic model assessment for insulin resistance
- IRB: Institutional review board
- IQR: Interquartile range
- KNN: K-nearest neighbors
- LDL: Low-density lipoprotein
- LGBM: Light gradient boosting machine
- LR: Logistic regression
- MAFLD: Metabolically-associated fatty liver disease
- MASLD: Metabolic dysfunction-associated steatotic liver disease
- MetALD: Metabolic and alcohol-associated liver disease
- ML: Machine learning
- MLP: Multi-layer perceptron
- MRI: Magnetic resonance imaging
- MRI-PDFF: Magnetic resonance proton density fat fraction
- MRS: Magnetic resonance spectroscopy
- NAFLD: Non-alcoholic fatty liver disease
- RF: Random forest
- ROC: Receiver operating characteristic
- RUS: Random under-sampling
- SBP: Systolic blood pressure
- SLD: Steatotic liver disease
- SVM: Support vector machine
- T2DM: Type 2 diabetes mellitus
- TN: True negative
- TP: True positive
- UA: Uric acid
- VCTE: Vibration-controlled transient elastography
- WC: Waist circumference
- WHR: Waist-hip ratio
- XGB: Extreme gradient boosting machine
References
Powell, E. E., Wong, V. W. S. & Rinella, M. Non-alcoholic fatty liver disease. Lancet 397, 2212–2224 (2021).
Rinella, M. E. et al. A multisociety Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 78, 1966–1986 (2023).
Song, S. J., Lai, J. C. T., Wong, G. L. H., Wong, V. W. S. & Yip, T. Can we use old NAFLD data under the new MASLD definition? J. Hepatol. 80, e54–e56 (2024).
Le, M. H. et al. Prevalence of non-alcoholic fatty liver disease and risk factors for advanced fibrosis and mortality in the United States. PLoS One 12, e0173499 (2017).
Cotter, T. G. & Rinella, M. Nonalcoholic fatty liver disease 2020: the state of the disease. Gastroenterology 158, 1851–1864 (2020).
Staufer, K. & Stauber, R. E. Steatotic liver disease: Metabolic dysfunction, alcohol, or both? Biomedicines 11, 2108 (2023).
Younossi, Z. M. et al. The global epidemiology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): A systematic review. Hepatology 77, 1335–1347 (2023).
Lazarus, J. V. et al. A global research priority agenda to advance public health responses to fatty liver disease. J. Hepatol. 79, 618–634 (2023).
Younossi, Z. M. et al. Global epidemiology of nonalcoholic fatty liver disease—meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64, 73–84 (2016).
Ayares, G., Idalsoaga, F., Díaz, L. A., Arnold, J. & Arab, J. P. Current medical treatment for alcohol-associated liver disease. J. Clin. Exp. Hepatol. 12, 1333–1348 (2022).
Dang, K., Hirode, G., Singal, A. K., Sundaram, V. & Wong, R. J. Alcoholic liver disease epidemiology in the United States: A retrospective analysis of 3 US databases. Official J. Am. Coll. Gastroenterol. ACG 115, 96–104 (2020).
Younossi, Z. M., Wong, G., Anstee, Q. M. & Henry, L. The global burden of liver disease. Clin. Gastroenterol. Hepatol. 21, 1978–1991 (2023).
Devarbhavi, H. et al. Global burden of liver disease: 2023 update. J. Hepatol. 79, 516–537 (2023).
Seitz, H. K. et al. Alcoholic liver disease. Nat. Rev. Dis. Primers 4, 16 (2018).
Saiman, Y., Duarte-Rojo, A. & Rinella, M. E. Fatty liver disease: Diagnosis and stratification. Annu. Rev. Med. 73, 529–544 (2022).
Mezzano, G. et al. Global burden of disease: acute-on-chronic liver failure, a systematic review and meta-analysis. Gut 71, 148 (2022).
Paik, J. M. et al. Mortality related to nonalcoholic fatty liver disease is increasing in the United States. Hepatol. Commun. 3, 1459–1471 (2019).
Pal, S. C. & Méndez-Sánchez, N. Screening for MAFLD: Who, when and how? Ther. Adv. Endocrinol. Metab. 14, 20420188221145650 (2023).
Yoo, E. R., Ahmed, A. & Kim, D. Economic burden and healthcare utilization in nonalcoholic fatty liver disease. Hepatobiliary Surg. Nutr. 8, 181–183 (2019).
Younossi, Z. M., Henry, L., Bush, H. & Mishra, A. Clinical and economic burden of nonalcoholic fatty liver disease and nonalcoholic steatohepatitis. Clin. Liver Dis. 22, 1–10 (2018).
Allen, A. M., Van Houten, H. K., Sangaralingham, L. R., Talwalkar, J. A. & McCoy, R. G. Healthcare cost and utilization in nonalcoholic fatty liver disease: Real-World data from a large U.S. Claims database. Hepatology 68, 2230–2238 (2018).
Stepanova, M., Henry, L. & Younossi, Z. M. Economic burden and patient-reported outcomes of nonalcoholic fatty liver disease. Clin. Liver Dis. 27, 483–513 (2023).
Dan, A. A. et al. Health-related quality of life in patients with non-alcoholic fatty liver disease. Aliment. Pharmacol. Ther. 26, 815–820 (2007).
Jiang, W. et al. Global burden of nonalcoholic fatty liver disease, 1990 to 2019: Findings from the global burden of disease study 2019. J. Clin. Gastroenterol. 57, 631–639 (2023).
Chalasani, N. et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American association for the study of liver diseases. Hepatology 67, 328–357 (2018).
Huang, T., Behary, J. & Zekry, A. Non-alcoholic fatty liver disease: a review of epidemiology, risk factors, diagnosis and management. Intern. Med. J. 50, 1038–1047 (2020).
Hernaez, R. et al. Diagnostic accuracy and reliability of ultrasonography for the detection of fatty liver: A meta-analysis. Hepatology 54, 1082–1090 (2011).
Berzigotti, A. et al. EASL clinical practice guidelines on non-invasive tests for evaluation of liver disease severity and prognosis – 2021 update. J. Hepatol. 75, 659–689 (2021).
Eslam, M. et al. The Asian Pacific association for the study of the liver clinical practice guidelines for the diagnosis and management of metabolic associated fatty liver disease. Hepatol. Int. 14, 889–919 (2020).
European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD) & European Association for the Study of Obesity (EASO). EASL–EASD–EASO clinical practice guidelines for the management of non-alcoholic fatty liver disease. J. Hepatol. 64, 1388–1402 (2016).
Ciardullo, S., Vergani, M. & Perseghin, G. Nonalcoholic fatty liver disease in patients with type 2 diabetes: Screening, diagnosis, and treatment. J. Clin. Med. 12, 5597 (2023).
Gu, J. et al. Diagnostic value of MRI-PDFF for hepatic steatosis in patients with non-alcoholic fatty liver disease: A meta-analysis. Eur. Radiol. 29, 3564–3573 (2019).
Rinella, M. E. Nonalcoholic fatty liver disease: A systematic review. JAMA 313, 2263–2273 (2015).
Cusi, K. et al. American association of clinical endocrinology clinical practice guideline for the diagnosis and management of nonalcoholic fatty liver disease in primary care and endocrinology clinical settings: Co-sponsored by the American association for the study of liver diseases (AASLD). Endocr. Pract. 28, 528–562 (2022).
Noureddin, M. et al. Screening for nonalcoholic fatty liver disease in persons with type 2 diabetes in the United States is cost-effective: A comprehensive cost-utility analysis. Gastroenterology 159, 1985–1987.e4 (2020).
Wong, V. W. S. et al. Community-based lifestyle modification programme for non-alcoholic fatty liver disease: A randomized controlled trial. J. Hepatol. 59, 536–542 (2013).
Eckard, C. et al. Prospective histopathologic evaluation of lifestyle modification in nonalcoholic fatty liver disease: A randomized trial. Ther. Adv. Gastroenterol. 6, 249–259 (2013).
Arab, J. P. et al. Latin American association for the study of the liver (ALEH) practice guidance for the diagnosis and treatment of non-alcoholic fatty liver disease. Ann. Hepatol. 19, 674–690 (2020).
Bedogni, G. et al. The fatty liver index: A simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 6, 33 (2006).
Lee, J. H. et al. Hepatic steatosis index: A simple screening tool reflecting nonalcoholic fatty liver disease. Dig. Liver Dis. 42, 503–508 (2010).
Goldman, O. et al. Non-alcoholic fatty liver and liver fibrosis predictive analytics: risk prediction and machine learning techniques for improved preventive medicine. J. Med. Syst. 45, 22 (2021).
Liu, Y. X. et al. Comparison and development of advanced machine learning tools to predict nonalcoholic fatty liver disease: An extended study. Hepatobiliary Pancreat. Dis. Int. 20, 409–415 (2021).
Wu, C. C. et al. Prediction of fatty liver disease using machine learning algorithms. Comput. Methods Progr. Biomed. 170, 23–29 (2019).
Zhao, M., Song, C., Luo, T., Huang, T. & Lin, S. Fatty liver disease prediction model based on big data of electronic physical examination records. Front. Public Health 9 (2021).
Noureddin, M. et al. Predicting NAFLD prevalence in the United States using national health and nutrition examination survey 2017–2018 transient elastography data and application of machine learning. Hepatol. Commun. 6, 1537–1548 (2022).
Ji, W., Xue, M., Zhang, Y., Yao, H. & Wang, Y. A machine learning based framework to identify and classify Non-alcoholic fatty liver disease in a large-scale population. Front. Public Health 10 (2022).
Wu, C. T., Chu, T. W. & Jang, J. S. R. Current-visit and next-visit prediction for fatty liver disease with a large-scale dataset: Model development and performance comparison. JMIR Med. Inf. 9, e26398 (2021).
Atsawarungruangkit, A., Laoveeravat, P. & Promrat, K. Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database. World J. Hepatol. 13, 1417–1427 (2021).
Meffert, P. J. et al. Development, external validation, and comparative assessment of a new diagnostic score for hepatic steatosis. Official J. Am. Coll. Gastroenterol. ACG 109, 1404–1414 (2014).
Weng, S., Hu, D., Chen, J., Yang, Y. & Peng, D. Prediction of fatty liver disease in a Chinese population using Machine-Learning algorithms. Diagnostics 13, 1168 (2023).
Razmpour, F. et al. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci. Rep. 13, 4942 (2023).
Peng, H. Y. et al. Development and validation of machine learning models for nonalcoholic fatty liver disease. Hepatobiliary Pancreat. Dis. Int. 22, 615–621 (2023).
Pei, X., Deng, Q., Liu, Z., Yan, X. & Sun, W. Machine learning algorithms for predicting fatty liver disease. Ann. Nutr. Metab. 77, 38–45 (2021).
Cubillos, G. et al. Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy. BMC Pregnancy Childbirth 23, 469 (2023).
Deshmukh, F. & Merchant, S. S. Explainable machine learning model for predicting GI bleed mortality in the intensive care unit. Official J. Am. Coll. Gastroenterol. ACG 115, 1657–1668 (2020).
Otsubo, N. et al. Utility of indices obtained during medical checkups for predicting fatty liver disease in Non-obese people. Intern. Med. 62, 2307–2319 (2023).
Han, N. et al. Identification of biomarkers in nonalcoholic fatty liver disease: A machine learning method and experimental study. Front. Genet. 13 (2022).
Xue, Y., Xu, J., Li, M. & Gao, Y. Potential screening indicators for early diagnosis of NAFLD/MAFLD and liver fibrosis: triglyceride glucose index–related parameters. Front. Endocrinol. (Lausanne) 13 (2022).
Chen, Y. Y. et al. Machine-learning algorithm for predicting fatty liver disease in a Taiwanese population. J. Pers. Med. 12, 1026 (2022).
Islam, M. M., Wu, C. C., Poly, T. N., Yang, H. C. & Li, Y. C. J. Applications of machine learning in fatty live disease prediction. Stud. Health Technol. Inf. 247, 166–170 (2018).
Pan, X. et al. Risk prediction for Non-alcoholic fatty liver disease based on biochemical and dietary variables in a Chinese Han population. Front. Public Health 8 (2020).
Sorino, P. et al. Development and validation of a neural network for NAFLD diagnosis. Sci. Rep. 11, 20240 (2021).
Yuan, Y. et al. Development and validation of a nomogram model for predicting the risk of MAFLD in the young population. Sci. Rep. 14, 9376 (2024).
Hardy, R. et al. Improving nonalcoholic fatty liver disease classification performance with latent diffusion models. Sci. Rep. 13, 21619 (2023).
Li, D., Zhang, M., Wu, S., Tan, H. & Li, N. Risk factors and prediction model for nonalcoholic fatty liver disease in Northwest China. Sci. Rep. 12, 13877 (2022).
Pouyanfar, S. et al. A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. 51, 1–36 (2018).
Zambrano, J. E., Benalcazar, D. P., Perez, C. A. & Bowyer, K. W. Iris recognition using Low-Level CNN layers without training and single matching. IEEE Access. 10, 41276–41286 (2022).
Dong, S., Wang, P. & Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 40, 100379 (2021).
Khan, A., Sohail, A., Zahoora, U. & Qureshi, A. S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53, 5455–5516 (2020).
Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377 (2018).
Borisov, V. et al. Deep neural networks and tabular data: A survey. IEEE Trans. Neural Netw. Learn. Syst. 35, 7499–7519 (2024).
Sharma, A. et al. A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9, 11399 (2019).
Bazgir, O. et al. Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks. Nat. Commun. 11, 4391 (2020).
Zhu, Y. et al. Converting tabular data into images for deep learning with convolutional neural networks. Sci. Rep. 11, 11325 (2021).
Sharma, A. & Kumar, D. Classification with 2-D convolutional neural networks for breast cancer diagnosis. Sci. Rep. 12, 21857 (2022).
Akazawa, M. & Hashimoto, K. A multimodal deep learning model for predicting severe hemorrhage in placenta previa. Sci. Rep. 13, 17320 (2023).
Shen, D., Wu, G. & Suk, H. I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).
Suzuki, K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 10, 257–273 (2017).
Greenspan, H., van Ginneken, B. & Summers, R. M. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 35, 1153–1159 (2016).
Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
Bush, K. et al. The AUDIT alcohol consumption questions (AUDIT-C): An effective brief screening test for problem drinking. Arch. Intern. Med. 158, 1789–1795 (1998).
Montecino, D. A., Perez, C. A. & Bowyer, K. W. Two-level genetic algorithm for evolving convolutional neural networks for pattern recognition. IEEE Access 9, 126856–126872 (2021).
World Health Organization. A healthy lifestyle—WHO recommendations. (2010). https://www.who.int/europe/news-room/fact-sheets/item/a-healthy-lifestyle---who-recommendations
Pilataxi, J. I., Zambrano, J. E., Perez, C. A. & Bowyer, K. W. Improved search in neuroevolution using a neural architecture classifier with the CNN architecture encoding as feature vector. IEEE Access 12, 11987–12000 (2024).
Perez, J. P. & Perez, C. A. Face patches designed through neuroevolution for face recognition with large pose variation. IEEE Access 11, 72861–72873 (2023).
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. in Proceedings of the 33rd International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2019).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 1–5 (2017).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, New York, NY, USA, 2016). https://doi.org/10.1145/2939672.2939785.
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. in Advances in Neural Information Processing Systems (ed Guyon, I.) vol. 30 (Curran Associates, Inc., 2017).
Cawley, G. C. & Talbot, N. L. C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).
Procino, F. et al. Reducing NAFLD-screening time: A comparative study of eight diagnostic methods offering an alternative to ultrasound scans. Liver Int. 39, 187–196 (2019).
Halligan, S., Altman, D. G. & Mallett, S. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach. Eur. Radiol. 25, 932–939 (2015).
Bhat, S. et al. AUCReshaping: Improved sensitivity at high-specificity. Sci. Rep. 13, 21097 (2023).
Janssens, A. C. J. W. & Martens, F. K. Reflection on modern methods: Revisiting the area under the ROC curve. Int. J. Epidemiol. 49, 1397–1403 (2020).
Acknowledgements
This work was supported in part by Agencia Nacional de Investigación y Desarrollo (ANID) under Grant FONDECYT 1231675, and under Basal funding for Scientific and Technological Center of Excellence, Project AFB220002, IMPACT #FB210024, and in part by the Department of Electrical Engineering, Universidad de Chile. This work has also been supported by Centro de Enfermedades Digestivas and Unidad de Medicina Preventiva, Universidad de los Andes, Chile, and by Gastroenterology, Hospital del Salvador, Chile.
Author information
Contributions
G.C. participated in conceptualization, data curation, formal analysis, investigation, methodology, software implementation, results generation, validation, visualization, and writing of the original draft. J.P.V. participated in conceptualization, data curation, formal analysis, investigation, methodology, visualization, and writing of the original draft. H.A. participated in investigation, methodology, validation, and writing of the original draft. L.M. participated in data curation, formal analysis, investigation, and methodology. L.C. participated in conceptualization, investigation, methodology, and supervision. G.M. participated in conceptualization, investigation, methodology, supervision, writing of the original draft, and review. C.P. participated in conceptualization, formal analysis, funding acquisition, investigation, methodology, project administration, supervision, validation, visualization, and writing of the original draft. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cubillos, G., Perez-Valenzuela, J., Aguirre, H. et al. Development of a novel deep learning method that transforms tabular input variables into images for the prediction of SLD. Sci Rep 15, 28024 (2025). https://doi.org/10.1038/s41598-025-12900-z
DOI: https://doi.org/10.1038/s41598-025-12900-z