Development of a novel deep learning method that transforms tabular input variables into images for the prediction of SLD

Cubillos, Gabriel; Perez-Valenzuela, Javier; Aguirre, Herman; Martínez, Luz; Castro, Lorena; Mezzano, Gabriel; Perez, Claudio A.

doi:10.1038/s41598-025-12900-z

Article
Open access
Published: 31 July 2025

Gabriel Cubillos^1,2,
Javier Perez-Valenzuela³,
Herman Aguirre⁴,
Luz Martínez⁵,
Lorena Castro⁶,
Gabriel Mezzano⁶ &
…
Claudio A. Perez^1,2

Scientific Reports volume 15, Article number: 28024 (2025)

Abstract

Steatotic liver disease (SLD), formerly named fatty liver disease, has a prevalence estimated at 30–38% in adults. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, lead to a reduction in adverse outcomes, and reduce the economic burden associated with the disease. We report the development of a novel Deep Learning (DL) method for the prediction of SLD, which consists of transforming the input variables from tabular data into images, with the goal of using the pattern recognition power of DL models to reach the best prediction performance. The dataset used in this study includes registries from 2,999 patients. The data of each patient, originally represented as a vector, is converted into an image replicating each variable in rows and columns. Our DL models reach better results compared to those of traditional ML models at various levels of sensitivity and specificity. A sensitivity of 0.9497, a specificity of 0.6417, and an AUCROC of 0.8662 were reached with one DL model. We also achieved significantly better results relative to those obtained with the Hepatic Steatosis Index (HSI). Our DL models reach higher AUCROC values compared to those of the traditional ML models, and also with respect to those obtained with HSI.

Introduction

Steatotic liver disease (SLD), formerly known as fatty liver disease, is a common disease defined by the presence of steatosis in more than 5% of the hepatocytes¹. A new nomenclature was recently published, classifying SLD into 5 groups: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD), Alcohol-Associated Liver Disease (ALD), a combination of MASLD and ALD (MetALD), SLD secondary to another specific etiology, and cryptogenic SLD². The term MASLD replaces the former terms Non-Alcoholic Fatty Liver Disease (NAFLD), and Metabolically-Associated Fatty Liver Disease (MAFLD). A great concordance has been shown among these three definitions, and previous NAFLD studies are considered to be valid under the new MASLD definition³.

MASLD is the most common cause of chronic liver disease worldwide¹ with a continuously growing global prevalence associated with the increasing prevalence of diabetes, obesity, and metabolic syndrome^4,5,6. Its prevalence is estimated to be at 30–38% in adults with a 50.4% increase in the last 3 decades^4,7,8. A meta-analysis estimated a high prevalence of MASLD in South America (30.4%)⁹. MASLD prevalence is higher in metabolic risk groups, affecting > 70% of patients with type 2 diabetes mellitus (T2DM), and 90% of patients with severe and morbid obesity undergoing bariatric surgery⁶. Likewise, ALD is one of the leading causes of chronic liver disease¹⁰ but contrary to MASLD, ALD prevalence has remained stable during the last decades, with an estimated global prevalence of 8%^6,11. ALD may coexist with other liver diseases, such as viral hepatitis and MASLD (MetALD), contributing to the progression of liver disease^12,13.

SLD is associated with liver and non-liver adverse outcomes. It can progress to steatohepatitis, fibrosis, cirrhosis, end-stage liver disease, and hepatocellular carcinoma (HCC), and is associated with an increase in all-cause mortality^1,6,14. Both MASLD and ALD are considered leading indications for liver transplantation^1,6,13,14,15. ALD was the most common underlying reported chronic liver disease in patients with acute-on-chronic liver failure¹⁶. Furthermore, MASLD is associated with a significant number of comorbidities that lead to a higher risk of non-liver malignancies such as colorectal cancer, lung diseases, chronic kidney disease, cognitive impairment, and complications of T2DM^6,17,18. Also, MASLD shares a complex bidirectional relationship with cardiovascular disease, which is the main cause of death in these patients^8,18.

Consequently, SLD is associated with a high economic burden, which has been increasing in recent decades^19,20. MASLD significantly increases health-care costs compared to those of non-MASLD patients. The average rate of overall outpatient visits at 5 years following diagnosis was 40% higher among patients with MASLD compared with controls^19,21. Furthermore, MASLD is associated with a reduction in health-related quality of life compared to patients with no liver disease and patients with liver disease due to other causes^22,23. MASLD-related deaths due to cirrhosis and liver cancer have increased by 76.7% and 95.1%, respectively between 1990 and 2019²⁴.

The diagnosis of SLD requires evidence of hepatic steatosis by either imaging or histology^2,5,15,25. Liver biopsy is considered the gold standard for diagnosis, but it is associated with low, though not negligible, complication rates²⁶ and is reserved for specific scenarios such as diagnostic doubt, or patients at increased risk for advanced fibrosis²⁵. Consequently, in routine clinical practice, most diagnoses of SLD are made radiologically. Abdominal ultrasound is the most commonly used method because it is relatively inexpensive, accessible, and innocuous, with a sensitivity and specificity of approximately 85% and 94%, respectively^1,26,27, and its use is recommended by international guidelines as a first-line diagnostic test^28,29,30.

Hepatic steatosis is characterized by a bright liver echotexture and blurring of the hepatic vasculature. Ultrasound reliability is operator dependent, is limited in patients with central obesity, and has limited sensitivity in mild steatosis³¹. Alternative imaging techniques are associated with higher costs, including vibration-controlled transient elastography (VCTE), magnetic resonance spectroscopy (MRS), and magnetic resonance proton density fat fraction (MRI-PDFF)¹⁸. MRS has a good correlation with MRI-PDFF, which has a sensitivity of 93% and 94% specificity³². Despite its better accuracy for detecting steatosis, cost and limited availability restrict its use in clinical practice^26,30.

Early SLD diagnosis is important, since prompt initiation of treatment can stop disease progression, lead to a reduction in adverse outcomes, and reduce the economic burden associated with the disease^33,34. A recent study showed that a screening strategy for MASLD followed by intensive lifestyle interventions, or pioglitazone in persons with T2DM, is cost-effective³⁵. Life-style interventions in patients with MASLD have proven regression in MRI-PDFF³⁶ and improvement in liver histology³⁷. In patients with ALD, alcohol abstinence reduces the risk of disease progression¹⁴. However, early diagnosis is difficult, since SLD patients are often asymptomatic and have no laboratory alterations, especially in early stages²⁶. The use of screening techniques can help disease detection in asymptomatic patients. Although abdominal ultrasound is a relatively low-cost test and has good performance as a first-line diagnostic test, in larger screening studies the cost and availability of imaging impact feasibility, especially in primary care centers³⁰.

Currently in Latin America, there is no consensus on recommending SLD screening in the general population, due to the low cost-effectiveness of this practice, and the associated risks of invasive tests. However, MASLD screening is recommended in patients with repeatedly altered liver enzymes, features of metabolic syndrome, or obesity. In these cases, abdominal ultrasound is the recommended initial screening method³⁸. However, this method is not widely available in primary health care.

From this perspective, there have been attempts to create prediction models to diagnose SLD without the use of imaging, or biopsy, that can be applied to the general population^39,40. More recently, with the development of machine learning, new models have been published^{41,42,43,44,45,46,47,48,49,50,51,52,53}. Some of these models use simple variables, such as age, body mass index (BMI), alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyl transferase (GGT), fasting plasma glucose (FPG), and triglyceride.

ML models have also been used previously in the prediction of different diseases. Several models were developed to predict the risk of gestational diabetes mellitus (GDM)⁵⁴. ML was used to diagnose acute gastrointestinal (GI) bleed⁵⁵.

State of the art in SLD prediction with ML models

Several models have been developed to predict SLD, NAFLD and MAFLD with various approaches^{39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,56,57,58,59,60,61,62,63,64,65}. Most of them use data from medical visits and blood tests. Liver ultrasound combined with diffusion models is used to improve the classification of NAFLD⁶⁴. The best performance was reached with a combination of Triglycerides, Glycemia, and Waist Circumference (WC). For the prediction of NAFLD, 35 clinical and biochemical variables are used with an extreme gradient boosting (XGB) model⁴² and 12 variables are used as input for an XGB⁴⁶. A random forest (RF) model for the prediction of SLD was made using age, gender, systolic blood pressure (SBP), diastolic blood pressure (DBP), abdominal girth and triglycerides⁴³. Thirty variables are used in a random under-sampling (RUS) boosted tree⁴⁸. Comparisons among a custom model (LR), the Fatty Liver Index (FLI), and the Hepatic Steatosis Index (HSI) are made, using patient data from two different regions of Europe. On average, the custom model reached better prediction results compared to those of FLI and HSI. The custom model required 8 variables: age, serum aspartate aminotransferase (AST), the AST to alanine aminotransferase (ALT) ratio, waist circumference (cm), ferritin, body mass index (BMI), serum triglycerides and gout⁴⁹. Eleven variables were used for the prediction of SLD in a Chinese cohort using an XGB⁵⁰; the variables included BMI, albumin, ALT, globulin, fasting blood glucose (FBG), high-density lipoprotein cholesterol (HDL-c), low-density lipoprotein cholesterol (LDL-c) and triglyceride. A combination of 28 variables with an RF was used to predict NAFLD; however, the most important variables were WC, chest circumference, trunk fat and BMI⁵¹. After a selection of variables, just 8 were used for an XGB model to predict NAFLD⁵². BMI, uric acid, triglyceride, HDL, height, hemoglobin, LDL, carcinoembryonic antigen (CEA), AST, age, glucose, and alpha-fetoprotein (AFP) were the inputs of an XGB model⁵³. A variety of clinical data is used to diagnose MAFLD and NAFLD, with the best performance obtained by using the FLI⁵⁶. Another approach employed genetic biomarkers to predict NAFLD, using Machine Learning (ML) methods to select the biomarkers, followed by performing the prediction using the biomarkers with a nomogram⁵⁷. A logistic regression (LR) model is used for MAFLD and NAFLD prediction using blood tests including Triglycerides, Glycemia, HOMA-IR, and data measured in clinical visits⁵⁸. An XGB is also used with 27 variables⁵⁹. Age, gender, BMI, Cholesterol, HDL-C, LDL-C, Glucose, GOT-AST, and GPT-ALT were used for the prediction of SLD with an LR⁶⁰. BMI, Waist Circumference, HDL, Triglycerides, ALT, Tuber Consumption, Fried Food Consumption, Diabetes and Hyperuricemia were used for the prediction of NAFLD with a stepwise LR model⁶¹. Sex, Age, Gamma-Glutamyl Transferase (GGT), Glucose, and Abdominal Volume Index were used as inputs to a neural network to predict NAFLD⁶². Age, gender, BMI, WHR, ALT, LDL, HDL, UA, and smoking were used with an LR model to predict MAFLD in young adults (18–44 years old)⁶³. Age, sex, waist circumference, BMI, ALT and triglyceride glucose index were used to predict NAFLD with an LR model⁶⁵.

FLI was proposed by Bedogni et al.³⁹ for the prediction of SLD. The model requires just 4 variables: gamma-glutamyltransferase (GGT), BMI, triglyceride, and WC. HSI was proposed by Lee et al.⁴⁰ to predict NAFLD, and requires ALT/AST ratio, BMI, diabetes mellitus (DM) and gender. In both cases, the models employed were LR^39,40. In other studies, models use more complex variables; e.g., data from multiple medical visits is used to predict SLD⁴⁷.
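For orientation, the HSI can be written as a short function. The sketch below uses the formula as it is commonly cited from Lee et al.⁴⁰ (8 × ALT/AST + BMI, plus 2 for diabetes mellitus and plus 2 for female sex). The function name, argument names, and example patient are illustrative assumptions and are not taken from this study.

```python
def hepatic_steatosis_index(alt, ast, bmi, female, diabetes):
    """HSI as commonly cited: 8 * (ALT / AST) + BMI, +2 if diabetes, +2 if female."""
    if ast <= 0:
        raise ValueError("AST must be positive")
    score = 8.0 * (alt / ast) + bmi
    if diabetes:
        score += 2.0
    if female:
        score += 2.0
    return score


# Hypothetical patient: ALT 40 U/L, AST 20 U/L, BMI 30 kg/m^2, female, no diabetes.
example_hsi = hepatic_steatosis_index(alt=40, ast=20, bmi=30, female=True, diabetes=False)
```

Because the score is a fixed linear combination of four inputs, it can be computed from a routine biochemical profile, which is why it serves as a natural baseline for the models compared later.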

Deep learning capabilities in pattern recognition

Deep learning (DL), which has its roots in conventional neural networks, significantly outperformed its predecessors in pattern recognition tasks⁶⁶. DL models use a layered architecture of data representation, in which the high-level features can be extracted from the last layers of the network while the low-level features are extracted from the lower layers^66,67. These architectures were originally inspired by Artificial Intelligence (AI) simulating processes of key sensory areas, such as vision in the human brain⁶⁸. One of the main advantages of DL is mimicking how the human brain works. With great success in many fields, deep learning has reached excellent performance in tasks that require pattern recognition, image classification, object detection, video processing, natural language processing, and speech processing, among others^68,69,70. Patient variables, obtained through exams or physical measurements and coded as tabular data, remain a challenge for DL models⁷¹. Some approaches consist of transforming tabular data into an image. DeepInsight converts non-image data into an image, formed by arranging pixel positions with similar features together⁷². Bazgir et al. present a feature representation termed REFINED to create a 2D image based on the feature pairwise relationship⁷³. An image generator for tabular data (IGDT) also positions features based on how close they are to each other⁷⁴. Sharma et al. proposed methods to represent a 1-D vector in a 2-D graphical image, using a bar graph, a normalized distance matrix, and a combination of both⁷⁵. These methods use convolutional neural networks (CNNs) for classification and report good results in comparison with traditional models.

A multimodal model was used to predict severe hemorrhage in placenta previa using MRI and tabular data⁷⁶.

Our contributions

Several studies have shown the tremendous success of DL methods in medical image analysis^77,78,79,80. In particular, with the introduction of CNNs, many pattern recognition problems in images have been solved successfully. Part of the success could be attributed to CNN architectures that are based on the visual system architecture with convolutional layers that extract various features from images. Our proposed models consider the spatial representation of each variable, including width and height, in the image, and the importance of the spatial position of each variable since these could match the filters of the convolutional layers of the CNNs spatially to improve classification results.

We developed a model using DL for disease prediction that transforms the input variables from tabular data into images, with the goal of using the power of CNN models to recognize complex patterns in images. DL models should outperform traditional machine learning models in disease prediction. This is the main problem we address in our study, proposing a novel method to treat tabular data as images. The main contributions of our method are summarized as follows:

(1) In this work we report the development of a novel DL method for the prediction of SLD, which consists of transforming the input variables from tabular data into images, with the goal of matching the variable representation sizes in the image with those of the filters of the convolutional layers of the CNN models to reach the best classification performance. Additionally, based on our literature review, existing approaches require a larger number of variables than those available in the SLD prediction problem and have not explored the variable representation size and position. Also, previously published methods have not been applied to SLD prediction.
(2) We applied our method to an important illness, SLD, that has a prevalence between 30 and 38% in adults, using a database of 2,999 patients. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, lead to a reduction in adverse outcomes, and reduce the economic burden associated with the disease.
(3) Our proposed model is compared with the twelve different traditional ML models we have also developed for SLD prediction.
(4) A search was performed for the optimization of the hyperparameters of all DL and ML models. We also included the application of a variable selection process, reducing the redundancy of data to improve the performance of the model. Additionally, our proposed method could be extended to other illnesses.
(5) Our results show that by using DL on the transformed patients’ data, we obtained significantly better performance than that based on traditional machine learning models, and those based on the Hepatic Steatosis Index (HSI). We obtained an average reduction in false positives (FP) of 9.85% for the same level of sensitivity, which is very significant.

Materials and methods

Database

The dataset used in this study was obtained from patients attending the Preventive Medicine Unit of the University of the Andes Clinic, in Santiago, Chile. The dataset includes data from 2,999 patients, obtained between February 2022 and October 2023. The only inclusion criteria were that patients had to be at least 18 years old and have all variables assessed. AUDITc was computed using a brief test proposed by Bush et al. in 1998⁸¹, in which the range of values is between 0 and 12. The data for each input to the model was normalized by subtracting the mean and dividing by the standard deviation. The dataset was randomly divided into three partitions: training set (70%), validation set (10%) and testing set (20%). Our study is a retrospective study on de-identified data. The institutional review board (IRB) of Clinica Universidad de los Andes, Santiago, Chile, approved the data usage and waived the need for informed consent. All methods were performed in accordance with relevant guidelines and regulations.
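The normalization and partitioning described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the random seed, the use of the population standard deviation, and the rounding of partition sizes are all assumptions. The text also does not state whether the mean and standard deviation were computed on the training set only or on the full dataset.

```python
import random
import statistics


def zscore(values):
    """Standardize one variable: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population SD; the estimator is not stated
    return [(v - mean) / std for v in values]


def split_indices(n_patients, seed=0):
    """Randomly split patient indices into 70% training, 10% validation, 20% test."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = round(n_patients * 0.70)
    n_val = round(n_patients * 0.10)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder forms the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(2999)
```

With 2,999 patients, this rounding yields partitions of 2099, 300, and 600 patients, which matches the sizes reported in the Results.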

Data augmentation

Data Augmentation (DA) is a method commonly used in ML and DL to improve the performance of the models^54,82. In this study, a DA method is proposed that is only used on the training set. It consists of the creation of new patients, but these new patients must have their new data within a range of values established by a medical specialist in Hepatology. The objective of using these ranges is that the new, artificially created patients have their variables with values that are validated by the medical experts. Table S1 of the Supplementary Material presents the range for each of the variables proposed by the medical specialists, including constraints for some variables: Glycemia and AUDITc must each remain in the same category as in the original patient. BMI adapts to the changes of weight and height, and it is recomputed; however, the new patient must be within the same category of BMI classification recommended by the WHO^83,54. A similar DA was proposed and used by us⁵⁴. We named this proposed DA Controlled Noise (CN) since for the new patients the input variables take random values within the assigned ranges. We used three options. In the first one, we created i patients from each original patient. In the second option, we created i patients from each original positive patient, and in the third option we created i patients from each original positive patient, and j patients from each original negative patient, with i > j. Using the last two options, the training set may be balanced for the positive and negative cases.
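The Controlled Noise idea can be illustrated with a small sketch. Everything specific here is hypothetical: the variable names, the noise limits in `NOISE_RANGES` (the real ranges are in Table S1), the use of a uniform distribution, and the simplified four-class WHO BMI grouping. Only the general mechanism (random values within expert-defined ranges, BMI recomputed, category preserved) follows the description above.

```python
import random

# Hypothetical absolute +/- limits per variable; the real ranges are given in Table S1.
NOISE_RANGES = {"weight_kg": 1.0, "height_m": 0.01, "alt_u_l": 2.0}


def bmi_category(bmi):
    """Simplified WHO adult BMI classes (assumption: four classes only)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def controlled_noise(patient, n_new, rng, max_tries=100):
    """Create n_new synthetic patients whose values stay within the allowed ranges."""
    original_class = bmi_category(patient["weight_kg"] / patient["height_m"] ** 2)
    synthetic = []
    while len(synthetic) < n_new:
        for _ in range(max_tries):
            new = dict(patient)  # the label and untouched variables are copied as-is
            for name, limit in NOISE_RANGES.items():
                new[name] = patient[name] + rng.uniform(-limit, limit)
            new["bmi"] = new["weight_kg"] / new["height_m"] ** 2  # BMI is recomputed
            if bmi_category(new["bmi"]) == original_class:
                synthetic.append(new)
                break
        else:
            raise RuntimeError("could not stay within the original BMI category")
    return synthetic


def augment(training_set, i, j, rng):
    """i new patients per positive and j per negative; j == 0 gives the second option."""
    augmented = list(training_set)
    for patient in training_set:
        augmented.extend(controlled_noise(patient, i if patient["sld"] == 1 else j, rng))
    return augmented


rng = random.Random(0)
original = {"weight_kg": 80.0, "height_m": 1.75, "alt_u_l": 30.0, "sld": 1}
new_patients = controlled_noise(original, n_new=3, rng=rng)
```

Setting i equal to j reproduces the first option, while i > j shifts the class balance of the training set toward the positive cases.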

Development of a new DL method for the prediction of SLD

In this study we developed a model using DL for the prediction of SLD transforming the input variables from tabular data to images, with the goal of using the power of CNN models to recognize complex patterns in images. The method has three stages: the data transformation stage, the CNN stage, and the classification stage.

Data transformation stage

With the goal of using the power of CNN models to recognize complex patterns in images, it is necessary to transform the patient’s data into an image. Each patient’s data can be represented as a vector of n variables/features. This vector can be transformed into a matrix. CNNs normally use an image of 224 × 224 as input⁶⁷. In our case the number of available variables for each patient is 22, far fewer than the number of pixels along one side of such an image. Therefore, we replicate the data m times, creating a matrix of (m × n). To increase the number of columns in the matrix, we replicate each column k times, which can be interpreted as increasing the width of each column. This process results in a matrix of (m × nk). The matrix is used for the three channels of the CNN^67,84,85. Data Augmentation is applied to the whole image, changing the values randomly as described in the Data augmentation section of Materials and methods. Figure 1 shows an example of the replication process for a patient with 9 variables (Fig. 1a). The resulting matrix (m × n) = (30 × 9) after replication of the rows is shown in Fig. 1b. Figure 1c shows the resulting matrix (m × nk) = (30 × 27), with k = 3, after column replication. Figure 2 shows the three options of DA for the patient with 9 variables of Fig. 1. Figure 2a shows the image without DA. Figure 2b shows the image with DA modifying a percentage of each row. Figure 2c shows the image with DA modifying a percentage of each column, and Fig. 2d shows the image with DA modifying a percentage of both rows and columns.
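The replication step can be sketched in a few lines. The function name `vector_to_image` and the example values are illustrative assumptions; the sizes (9 variables, m = 30, k = 3) follow the example of Fig. 1.

```python
def vector_to_image(values, m, k):
    """Replicate a vector of n values into an (m x n*k) matrix, copied to 3 channels."""
    row = [v for v in values for _ in range(k)]   # each column repeated k times
    matrix = [list(row) for _ in range(m)]        # the widened row repeated m times
    return [[list(r) for r in matrix] for _ in range(3)]  # same matrix in each channel


# Hypothetical patient with 9 normalized variables.
patient = [0.1, -0.4, 1.2, 0.0, 0.7, -1.1, 0.3, 0.9, -0.2]
image = vector_to_image(patient, m=30, k=3)
```

The result has 3 channels of 30 rows by 27 columns. With NumPy, the same single-channel matrix could be built with `np.tile(np.repeat(x, k), (m, 1))`.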

Binary/Categorical variables are not used to create the input image for the CNN in this transformation because they just take two values. These binary variables are considered as inputs to the classification stage as shown in Fig. 3. Considering this new patient data representation, we added another type of DA usually used with images^67,82,84 consisting of small random rotations of ± 5° and vertical flips with a probability of 50%, both applied only to the training set.

CNN stage

As described in the previous section, after the data transformation stage, each patient’s data is represented by an image that becomes the input to a CNN. The selected CNN is the ResNet-50, using the 1.5 version implemented in PyTorch⁸⁶. This CNN is used to extract features from the image of each patient. The weights of the CNN model were pretrained using the ImageNet 2012 dataset⁸⁷. The classification layer was removed and replaced by the classification stage.

Classification stage

The classification stage is a Multi-Layer Perceptron (MLP), consisting of 3 hidden layers of 1000, 500, and 100 neurons. The input to this MLP is the output of the CNN and the binary/categorical variables. The output of this stage is the SLD prediction. The entire process including the three stages, i.e., the data transformation stage, the CNN stage, and the classification stage, can be observed in Fig. 3.

Traditional ML models

To compare the performance of our proposed method, we used twelve traditional ML models developed for SLD prediction using tabular data. The models are the following: Gaussian Naïve Bayes (GNB)⁸⁸, Bernoulli Naïve Bayes (BNB)⁸⁸, Decision Trees (DT)⁸⁸, Support Vector Machines (SVMs)⁸⁸, Multi-Layer Perceptron (MLP)⁸⁸, K-Nearest Neighbors (KNN)⁸⁸, Logistic Regression (LR)⁸⁸, Random Forest (RF)⁸⁸, Extra Trees (ET)⁸⁸, Balanced Random Forest (BRF)⁸⁹, and Gradient Boosting Machines (GB), in two popular implementations, Extreme Gradient Boosting Machines (XGB)⁹⁰ and Light Gradient Boosting Machines (LGBM)⁹¹. These ML models have been used in the prediction of various illnesses with tabular data as inputs, e.g., Gestational Diabetes⁵⁴. DA Controlled Noise is also applied to these ML models.

DL and ML model implementations and hyperparameters

The models were implemented in Python 3.10.11, using the libraries PyTorch 2.0.1, Scikit-Learn 1.2.2, Imbalanced-Learn 0.10.1, XGBoost 1.7.3, and LightGBM 3.3.5. The hyperparameters specific to our proposed DL models include the replication parameters m and k, which set the size of the generated image. The values studied for m are 50, 150, 180, 210, 250, and 300. The values analyzed for k are 1, 2, 3, 4, 10, and 36. The layer selected to be the output of the CNN stage is also a hyperparameter. Two layers of the ResNet were analyzed: the Average Pooling layer (Avgpool) and the last convolutional layer. The alternative of selecting some of the CNN intermediate layers has been studied with good results⁶⁷. The spatial position of the variables in the image is also a hyperparameter. Several random positions were first analyzed, and then we performed permutations of variable positions around those that yielded good results in the first analysis.

The hyperparameters used for the traditional ML models are shown in Table S2 of the Supplementary Material. The hyperparameter selection was performed using a grid search evaluated in a 5-Fold Cross Validation (CV)⁹². Variable selection was part of the grid search, and the best variables were used for our proposed model. Variable selection was performed using 4 methods/metrics to select the optimal number of variables required for the best model performance, whilst reducing redundancy: the F-test of ANOVA (Analysis of Variance), the Chi-Square Test, and Mutual Information, all using the implementation of Scikit-Learn⁸⁸, and Balanced Random Forest⁸⁹.

The top 15% of the models with the highest area under the curve (AUC) were selected and assessed on the validation set.
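As an illustration of this selection step, the sketch below computes the AUC from its rank-based definition and keeps the top 15% of candidate models. The function names, the model names, and the choice to round the 15% cutoff upward are assumptions; the text does not specify how the cutoff was rounded.

```python
import math


def auc_roc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_fraction(model_aucs, fraction=0.15):
    """Names of the top `fraction` of models ranked by AUC (at least one model)."""
    ranked = sorted(model_aucs, key=model_aucs.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))  # assumption: round up
    return ranked[:keep]


# Hypothetical cross-validation AUCs for ten candidate configurations.
candidate_aucs = {f"model_{i}": 0.70 + 0.01 * i for i in range(10)}
selected = top_fraction(candidate_aucs)
```

The pairwise definition used here is quadratic in the number of patients, which is acceptable for a dataset of a few thousand records; library implementations compute the same quantity more efficiently.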

Model evaluation

The validation set was used to select the best models and the decision threshold of our models. The test set was not used in training, model selection, or in decision threshold selection. Trained models were tested using the test set. With the decision thresholds, model results of accuracy, sensitivity, specificity, recall macro, area under the ROC curve (AUCROC), False positives (FP) and False Negatives (FN) are available. A high sensitivity is a priority for gastroenterologists because the model is intended to be used for SLD screening purposes. Therefore, we explored sensitivities above 0.80 (80%) with special attention.
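A minimal sketch of this evaluation logic is shown below: a decision threshold is chosen on the validation set to reach a target sensitivity, and the listed metrics are then computed from the confusion matrix. The function names, the rule of taking the highest threshold that meets the target, and the toy scores are assumptions for illustration only.

```python
def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(labels, scores, threshold):
    """Accuracy, sensitivity, specificity, macro recall, FP and FN at one threshold."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "recall_macro": (sensitivity + specificity) / 2,  # mean of per-class recall
        "FP": fp,
        "FN": fn,
    }


def threshold_for_sensitivity(val_labels, val_scores, target):
    """Highest validation threshold whose sensitivity is at least `target`."""
    for t in sorted(set(val_scores), reverse=True):
        if metrics(val_labels, val_scores, t)["sensitivity"] >= target:
            return t
    raise ValueError("target sensitivity not reachable")


# Toy validation data (hypothetical scores).
val_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
val_scores = [0.9, 0.8, 0.6, 0.55, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1]
chosen = threshold_for_sensitivity(val_labels, val_scores, target=0.80)
val_metrics = metrics(val_labels, val_scores, chosen)
```

The chosen threshold would then be applied unchanged to the test set, so that the test data plays no role in threshold selection.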

Results

Population characteristics

A total of 2999 patients was included in this study. The dataset was partitioned into a training set of 2099 patients (70%), a validation set of 300 patients (10%), and a test set with 600 patients (20%). The prevalence of SLD in the dataset was 26.64% (799/2999). The test set had 159 patients positive for SLD. Table 1 presents the variables collected in the dataset.
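These counts fix which sensitivity and specificity values are attainable on the test set, because each rate is a ratio of integer counts. The short check below recovers the counts implied by the sensitivity and specificity quoted in the Abstract, assuming those rates were computed on the full 600-patient test set; the variable and function names are illustrative.

```python
total, positives = 2999, 799
prevalence = positives / total                 # fraction of patients with SLD

test_size, test_positives = 600, 159
test_negatives = test_size - test_positives    # 441 patients without SLD


def counts_for(sensitivity, specificity):
    """Recover the TP and TN counts implied by reported test-set rates."""
    return round(sensitivity * test_positives), round(specificity * test_negatives)


tp, tn = counts_for(0.9497, 0.6417)            # values quoted in the Abstract
fn, fp = test_positives - tp, test_negatives - tn
```

Under that assumption, a sensitivity of 0.9497 corresponds to 151 of the 159 positive patients being detected, and each additional missed patient lowers the sensitivity by 1/159.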

Variable selection

Twenty-two variables were available for the prediction of SLD. Variable selection improves the performance of the models, reducing the use of irrelevant or redundant data. The best 11 variables for each method are listed in Table S3 of the Supplementary Material.

Table 1 Clinical variables of the patients in the dataset. IQR, interquartile range.

Model performance

Table 2 shows the performance of our proposed DL model (OursDLM) compared with traditional ML models. Table 2 includes the number of variables used, the image size for our proposed model, and the DA used. In the traditional ML models, only noise in artificial patients could be applied, while in our DL proposed model, the noise could be applied in rows and/or columns. In both cases, the number of patients created with noise appears next to them on the table. This means that both ML and DL models could use DA Controlled Noise. Noise is a small value, with ranges provided by a medical specialist. Only the training set is altered by this DA. Our DL proposed model also has the possibility of DA with Vertical Flip and Random Rotations. Table 2 also shows the following metrics: Accuracy, Sensitivity, Specificity, Recall Macro, AUCROC, a confusion matrix inline, and the total number of errors for each model. Table 2 also shows the best DL model and the best traditional ML model for each level of sensitivity from 1 to 0.8000. Additional results of OursDLM and OursMLM for more sensitivity levels are included on Table S4 of the Supplementary Material, and Table S4 includes results presented on Table 2. We also included Table S5 in the Supplementary Material, with information about the output layer used and spatial position of the variables. Models are tested using a test set with data that was not used in any of the methods employed to find the best models.

Our DL models show excellent SLD prediction capability with the use of simple-to-obtain clinical and laboratory variables. Table 2 presents the following models that obtained excellent results and that the medical specialists can choose depending on the desired balance between sensitivity (FN) and specificity (FP).

The DL models reached better results compared to the traditional ML models for the same levels of sensitivity.

Table 2 The top two models for different sensitivity levels, with sensitivity ≥ 0.8052: our DL model and our traditional ML model (model numbers 1 (T1) to 32 (T32)), with up to 14 variables.

A comparison of the results, after changing the parameters of replication, was performed for three models (23, 27, 31) at the same threshold of sensitivity as that in Table 3. Replication parameters affect the input image size to the CNN. It can be seen on Table 3 that an image with a shorter height (50) has a worse performance in comparison with our selected value (180), with an average increase in error of 30 FP patients. With a height of 100, there is an increase of error of 19 FP on average. An increase in height, to 210, also has a larger number of errors, up to an average of 8 FP patients. Conversely, decreasing the width of the columns to 5 increases the errors to 86 FP patients on average, in comparison with our selected value (36). With a width of 12, errors increase on average to 57 FP patients in comparison with our selected value. Reducing the width of the column to 12, the error of FP is 36 patients more than in the case of our selected width. If we set the column width to 1, i.e., no replication of the column, the error increases on average to 114 FP patients. Finally, if we increase the width of each column to 40, the error also increases on average to 32 FP patients.

Table 3 Comparison of three models (23, 27, 31) with the same model but different parameters of replication, which vary image size.

Table S6 of the Supplementary Material shows the results of the Hepatic Steatosis Index (HSI) applied to our database for the various levels of sensitivity from 1 to 0.8050. The comparison of these results to those in Table 2 shows that the results of all the DL models are significantly better than those of the HSI model. In this study, a screening was performed with the general population to detect patients at high risk of SLD. Then, a second confirmatory diagnostic test was performed only with those patients at high risk of SLD. In the first diagnostic test, which is widely available in the health care system, we use the biochemical profile that includes the variables to determine HSI.

Table S7 in the Supplementary Material shows the metrics for the traditional models without optimization (default hyperparameters), with and without variable selection, and without Data Augmentation (DA). A comparison of the results between models with DA and without DA is shown in Table S8. Thirty of the thirty-two models achieved better results with DA (all models except models 23 and 31). Without DA the errors increased by 4.65% on average, with the greatest difference in Model 1, with 44 more errors when DA is not used.

Table S9 in Supplementary Material shows the thresholds used to calculate the metrics for the models.

A comparison between our proposed model and TabNet is presented in Table S4. In general, TabNet models achieved better, or similar, results compared to those of the traditional ML models. TabNet achieved improved results, i.e., a smaller number of errors, compared to the traditional ML models, in models 1–6, 16–19, 26–28, 30 and 31. TabNet achieved the same results as the traditional ML models with models 15, 21 and 29. At other sensitivity thresholds (0.9623–0.9182, 0.8805, 0.8679–0.8491, 0.8050), traditional ML models achieved better results than TabNet (models 7–15, 20–25, 29 and 32). In general, TabNet models yielded lower results compared to our proposed models for high sensitivity ranges (0.8050 to 1). For example, our model 20 correctly predicts 14 more negative patients compared to TabNet 20 (328 vs. 314), with the same sensitivity value of 0.8805. Another example is our model 5 and TabNet 5, where the difference is 33 patients not detected by TabNet.

Figure 4 shows the ROC curves for DL models 9, 17, 27, 31, and 32, with sensitivities 0.9497, 0.8994, 0.8365, 0.8113 and 0.8050, respectively. These DL models reach higher AUCROC values compared to the corresponding traditional ML model for all levels of sensitivity in the range 1–0.8050. Also, these DL models reach much higher AUCROC values than those obtained with HSI.

Figure 5 shows another way to compare model results: the total number of errors (FP + FN) as a function of the True Positives (TP) in the test set. Our DL models (OursDLM) are shown in blue. Our traditional ML models, with optimization and variable selection, are shown in red. The traditional ML models without optimization, with variable selection, are in orange. The traditional ML models without optimization and with no variable selection are in purple. The results of the HSI models in our database are in green.

Discussion

Our study presents a new method that enables applying the power of DL models for the prediction of SLD. We also developed twelve different traditional ML models, optimizing their hyperparameters, and compared their results to those of the DL models in the prediction of SLD. In general, the application of the traditional ML models has emerged as a tool to help identify diseases and make decisions in real time⁴³. In our study, we used a dataset that includes data from 2999 patients attending a preventive medicine unit. SLD prevalence in our population was 26.6%.

Our results show that DL models outperform the traditional ML models in a high sensitivity range (1–0.8113). For lower sensitivities (< 0.8052) the traditional ML models reach the best results. Figure 5 shows the total number of errors (FP + FN) as a function of the True Positives (TP) in the test set. It can be observed that our DL models (OursDLM, blue curve) achieve the lowest total number of errors compared to our traditional ML models (red curve). However, both options, OursDLM and OursMLM, yield better results than those of the ML models without optimization (orange and purple curves). It is important to note that variable selection and parameter optimization in the traditional ML models improve results significantly (orange, purple and red curves).

Our DL models show excellent SLD prediction capability with the use of simple-to-obtain clinical and laboratory variables. Table 2 presents models that obtained excellent results, from which medical specialists can choose depending on the desired balance between sensitivity (FN) and specificity (FP). For example, model 9 reached a sensitivity of 0.9497 (8 FN), a specificity of 0.6417 (158 FP), and an AUCROC of 0.8662. Another good example in Table 2 is model 17, which reached a sensitivity of 0.8994 (16 FN), a specificity of 0.7211 (123 FP) and an AUCROC of 0.8660. Another choice for cases with lower sensitivity and higher specificity is provided by model 27 in Table 2, which reached a sensitivity of 0.8365 (26 FN), a specificity of 0.7868 (94 FP) and an AUCROC of 0.8630. Additionally, model 31 achieved a sensitivity of 0.8113 (30 FN) and a specificity of 0.8004 (88 FP), with an AUCROC of 0.8562. These findings are of particular importance given the increasing prevalence of SLD and its associated adverse effects. The high prevalence of this disease makes it difficult to implement universal screening programs. The use of our DL models could help identify patients at increased risk for SLD who would benefit from confirmatory testing. On the other hand, our DL models also allow us to identify patients with a very low risk of presenting the disease, for whom it will be possible to choose not to perform further studies, thus reducing costs. Our best models demonstrated that the most effective predictive variables were age, weight, BMI, waist perimeter, AST, ALT, triglycerides, and HDL cholesterol. These are clinical and laboratory variables that are easy to obtain and of low cost, which facilitates the implementation of this model in primary care.

The use of a two-step screening program, in which a formula is applied to select high-risk patients who benefit from abdominal ultrasonography, has shown a reduction in ultrasonography requests, with a low false-negative rate⁹³. In a two-step screening program using one of our models (model 17, Table 2), we could avoid 55.7% of abdominal ultrasounds, with a false negative rate of 2.7%.
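The two-step figures quoted for model 17 can be checked from its confusion-matrix counts. The following is a minimal Python sketch, not the authors' code: the FN and FP counts are those reported for model 17, while the test-set composition (159 positive and 441 negative patients) is an assumption back-calculated from the reported sensitivity and specificity. The function name `two_step_screening` is hypothetical.

```python
def two_step_screening(fn, fp, n_pos, n_neg):
    """Summarize a two-step screening from first-step confusion counts.

    Patients predicted negative in the first step (TN + FN) skip the
    ultrasound; FN are the diseased patients missed by the first step.
    """
    tp = n_pos - fn
    tn = n_neg - fp
    total = n_pos + n_neg
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "ultrasounds_avoided": (tn + fn) / total,
        "missed_fraction": fn / total,
    }


# Model 17: 16 FN and 123 FP are the reported counts; 159 positives and
# 441 negatives are ASSUMED (inferred from the reported rates).
summary = two_step_screening(fn=16, fp=123, n_pos=159, n_neg=441)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the avoided fraction comes out at about 55.7%, and the missed patients amount to about 2.7% of all screened patients, which suggests that the quoted false negative rate is expressed relative to the whole screened group rather than to the positive patients only.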

In comparison with OursTMLs, OursDLMs offer an average reduction in FP of 15.625 (9.85% reduction) for the same level of sensitivity in the sensitivity range analyzed, with a minimum of 3 (3.37% reduction at sensitivity of 0.8050) and a maximum of 43 (20.09% reduction at sensitivity of 0.9748). This improved performance is due to the transformation of the data into images and subsequent application of CNNs for pattern recognition, including use of our proposed DA. Compared with traditional models without optimization, but with variable selection, OursDLMs provide an average improvement of 23.59 (14.91% reduction) in FP for the same level of sensitivity, with a minimum of 12 (12.24% reduction at sensitivity of 0.8050) and a maximum of 50 (22.62% reduction at sensitivity of 0.9748). Both traditional model optimization and variable selection are important, since without them the difference would be even greater. For example, compared with our traditional models without optimization or variable selection, OursDLMs provide an average improvement of 43.19 (23.83% reduction) in FP for the same level of sensitivity, with a minimum of 29 (14.80% reduction at sensitivity of 0.9623) and a maximum of 56 (31.64% reduction at sensitivity of 0.8931). Compared against HSI, the difference is even higher: OursDLMs reach an average improvement of 88.28 (38.52% reduction) in FP at the same level of sensitivity, with a minimum of 60 (30.61% reduction at sensitivity of 0.9120) and a maximum of 183 (47.04% reduction at sensitivity of 0.9874).

Variable selection impact

Table S10 in the Supplementary Material shows a comparison between traditional models with variable selection and the same models using all available variables (22). Across the 32 models, using all 22 variables increases the number of errors by 1.93% on average. The greatest increase in error was for model 24, with an increase of 13 FP. Using 22 variables in models 1, 30, and 31 yielded the same results. Models 2, 5, 6, 10, 12, 23, 26, and 27 improve performance by up to 7 patients. However, in these cases, 22 variables are more than double the number required by the models with variable selection (e.g., model T2 requires 8 variables, and model V22-2 requires 22 variables). Also, using 22 variables will require more time to fill in the data to use the models in clinical practice.

Weight-related variables (BMI, Waist Perimeter and/or Weight) are chosen in the top 5 of all the variable selection methods. Something similar happens with GPT ALT and Triglycerides. Cholesterol HDL is chosen 6th in all the methods except Chi-Square, where it is chosen in 4th place.

Using SHAP, a plot of global feature importance is shown in Fig. 6. The plot shows that GPT ALT is the most influential variable, followed by Triglycerides, BMI, and Age. Cholesterol HDL and GOT AST have a lesser influence on model prediction.

There is limited public availability of datasets from other published studies, making a direct comparison of model performance impossible. There are also different criteria for patient inclusion in each study. Since in our case patients were selected from a screening study, the prevalence of SLD is similar to that of the population. However, in other previous publications^51,52,60 the selection of patients was made from a group that consulted for a disease, and therefore the SLD prevalence is much higher. Thus, models may not be directly comparable. As a reference, models from previous studies for SLD prediction are shown on Table 4.

Table 4 Results of models published in previous studies. Used as reference.


The input variables used by OursDLMs, OursTMLs, and the state-of-the-art models are shown in Table S11 of the Supplementary Material, including the variable selection method used. It is important to mention that 7 variables are used by all of our models: Age, Weight, BMI, Waist Perimeter, GPT ALT, Triglycerides, and Cholesterol HDL. The average number of variables used by our traditional models is 10.85, compared to an average of 8.28 variables for OursDLMs. It is important to notice that the OursDLM of 8 variables uses variables selected by the Chi-Square Test method, while the OursDLM of 11 variables uses variables selected by BRF.

The purpose of our study is to develop a screening method for SLD. Then, in patients with high risk of SLD, a second test would be performed to confirm diagnosis. In this context, it is a priority to make the correct prediction of positive patients (high sensitivity) and low FN rate. Also, having a good level of specificity reduces the rate of false positives (FP), and therefore, the second test is performed in a reduced group of patients with a high risk of SLD. We consider at least these two metrics together to make a decision for model performance. A high sensitivity requires a large number of true positives (TP), i.e., most patients with the disease are detected with a low number of false negatives (FNs are patients with the disease not detected). A high specificity requires a small number of FPs. In a screening task, it is necessary first to have good performance in sensitivity, and second in specificity, which could be interpreted as requisite for reducing False Positives and False Negatives.
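Comparing models "at the same level of sensitivity" amounts to fixing the sensitivity first and then counting the FPs. The following is a minimal Python sketch of that idea under our own assumptions: the function name `fp_at_sensitivity` is hypothetical, the labels and scores are synthetic toy values, and the rule for choosing the threshold (the highest score threshold that reaches the target sensitivity) is an illustrative choice, not necessarily the procedure used in the study.

```python
import numpy as np


def fp_at_sensitivity(y_true, scores, target_sensitivity):
    """Return (threshold, FN, FP) for the highest score threshold whose
    sensitivity is at least target_sensitivity."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    if n_pos == 0:
        raise ValueError("no positive samples")
    # Candidate thresholds: every distinct score, highest first.
    for thr in np.sort(np.unique(scores))[::-1]:
        pred = scores >= thr
        tp = int(np.sum(pred & y_true))
        if tp / n_pos >= target_sensitivity:
            fp = int(np.sum(pred & ~y_true))
            return float(thr), n_pos - tp, fp
    raise ValueError("target sensitivity cannot be reached")


# Synthetic toy data: 4 positive and 6 negative patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1, 0.05]

for target in (0.75, 1.0):
    thr, fn, fp = fp_at_sensitivity(y_true, scores, target)
    print(f"sensitivity >= {target}: threshold={thr}, FN={fn}, FP={fp}")
```

In this toy example, raising the required sensitivity from 0.75 to 1.0 lowers the threshold and increases the FP count, which is the trade-off that the model choice has to balance.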

AUC ROC provides an overall assessment of the model diagnostic performance at different values of sensitivity and specificity. Nevertheless, the ROC curves of the models may intersect and have different performances at various ranges of FP. Therefore, for screening (high sensitivity and high specificity) there may be models with better performance than those with the largest total AUC ROC^94,95,96. For example, as shown on Table S4, traditional models 1 to 12 have a higher AUC ROC, but worse performance in FP values for the same level of sensitivity (sensitivities from 1 to 0.802). Another example on Table S4 is model 13, in which our proposed model has a slightly larger AUC ROC (0.8681 versus 0.8678); however, the number of FPs is reduced significantly from 157 to 142 with our model. Many other examples can be observed on Table S4. Also, Table S4 shows that for all levels of sensitivity, from 1 to 0.8050, the best specificity (i.e., the best combination of low FNs and FPs) is reached with our proposed models.

The partial area under the ROC curve (pAUC) is a metric that can be used to compare models in different regions of the ROC curve. Table S12 shows the pAUC for true positive rate (TPR) computed in steps of 0.2. In each range, one of our models achieved the best results. For example, model 31 achieved the best results in the range 0–0.2, and in 0.2–0.4. In the range 0.8–1, our models 27 and 32 yielded the best results.
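As an illustration of a pAUC computed over TPR bands, the following Python sketch integrates the area to the right of a piecewise-linear ROC curve within each band. This is one common definition of pAUC over a TPR range and is an assumption on our part; the text does not state which definition was used for Table S12. The function name `pauc_tpr_band` and the ROC points are hypothetical.

```python
import numpy as np


def pauc_tpr_band(fpr, tpr, lo, hi):
    """Area to the right of a piecewise-linear ROC curve for lo <= TPR <= hi,
    i.e. the integral of (1 - FPR) with respect to TPR over the band."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    area = 0.0
    for f0, t0, f1, t1 in zip(fpr[:-1], tpr[:-1], fpr[1:], tpr[1:]):
        a, b = max(t0, lo), min(t1, hi)
        if t1 <= t0 or b <= a:
            continue  # horizontal segment, or no overlap with the band
        # FPR at the clipped ends, by linear interpolation along the segment.
        fa = f0 + (f1 - f0) * (a - t0) / (t1 - t0)
        fb = f0 + (f1 - f0) * (b - t0) / (t1 - t0)
        area += (b - a) * (1.0 - 0.5 * (fa + fb))
    return area


# Synthetic ROC points, ordered from (0, 0) to (1, 1).
fpr = [0.0, 0.0, 0.2, 0.2, 0.5, 1.0]
tpr = [0.0, 0.4, 0.4, 0.8, 1.0, 1.0]
bands = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)]
paucs = [pauc_tpr_band(fpr, tpr, lo, hi) for lo, hi in bands]
for (lo, hi), value in zip(bands, paucs):
    print(f"TPR {lo:.1f}-{hi:.1f}: pAUC = {value:.3f}")
print(f"sum of bands = {sum(paucs):.3f}")
```

With this definition, the band values add up to the full AUC for a curve that runs from (0, 0) to (1, 1), so each band shows how much of the total area a model earns in that TPR region.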

The same conclusion can be reached by using the precision-recall (PR) curve. Our models in the region of interest (recall 0.8–1) in Figure S1 show greater precision. For example, model 9 has a larger area between recall of 0.9 and 0.98. Model 17 achieves the best results between recall 0.86 and 0.9. Models 17, 27, 31 and 32 are better in the recall region 0.8–0.86. BRF has less precision, despite achieving a larger AUC PR in the region between a recall of 0.925 and 1. MLP has a good precision in the region of interest (recall 0.8–1), but our proposed model achieves improved results.

Conclusion

SLD, formerly named fatty liver disease, has a prevalence estimated at 30–38% in adults. Detection of SLD is important, since prompt initiation of treatment can stop disease progression, lead to a reduction in adverse outcomes, and reduce the economic burden associated with the disease. In this study we reported the development of a novel DL method for the prediction of SLD, which consists of transforming the input variables from tabular data into images, with the goal of using the pattern recognition power of DL models to achieve the best classification performance. For that purpose, the data of each patient, originally represented as a vector of n variables, was converted into an image by replicating each variable m times in one dimension and k times in the other, creating a matrix of size m × kn. Variable selection and a data augmentation method were used during training to improve prediction results. Twelve traditional machine learning (ML) models were implemented for comparison with our proposed DL models.
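The replication step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `vector_to_image` is hypothetical, the patient values are made up and assumed to be already normalized, and the choices m = 180 and k = 36 are taken as illustrative values for the image height and column width.

```python
import numpy as np


def vector_to_image(x, m, k):
    """Replicate each of the n entries of x k times along the columns and
    m times along the rows, giving an (m, k * n) array."""
    x = np.asarray(x, dtype=np.float32)
    row = np.repeat(x, k)        # x1 repeated k times, then x2, ... (length k * n)
    return np.tile(row, (m, 1))  # stack m identical rows


# Hypothetical, already-normalized values for n = 8 variables.
patient = [0.42, 0.61, 0.55, 0.70, 0.18, 0.23, 0.35, 0.47]
image = vector_to_image(patient, m=180, k=36)
print(image.shape)  # (180, 288)
```

Each variable then occupies a vertical band of constant intensity, k pixels wide and m pixels tall, which a CNN can take as a single-channel input image.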

All our proposed DL models reached better results compared to those of traditional ML models at all levels of sensitivity in the range 1–0.8050. This sensitivity range was selected by the hepatology specialist as appropriate for SLD screening purposes. For example, a sensitivity of 0.9497, a specificity of 0.6417, and an AUCROC of 0.8662 were reached with one of our DL models. Another model reached a sensitivity of 0.8113 and a specificity of 0.8004, with an AUCROC of 0.8562. These models require only 8 variables that are widely available in clinical practice. We also reached significantly better results compared to those obtained with the Hepatic Steatosis Index (HSI). All our DL models reached higher AUCROC values compared to those of the traditional ML models in the sensitivity range 1–0.8050. Additionally, our DL models reach much higher AUCROC values than those obtained with HSI. Our proposed method converts tabular data into images, enabling the pattern recognition power of DL models to be applied to the prediction of SLD. The combination of our proposed DL model with variable selection, hyperparameter optimization and data augmentation allows us to set a new state-of-the-art level for SLD prediction, reaching a better performance than traditional ML models and HSI. The proposed method may be applied to the prediction of other illnesses by converting tabular data into images.

Data availability

The dataset used in this study is not publicly available due to privacy reasons. The dataset is provided by Clinica los Andes; access to this data may be provided to qualified researchers upon request and with the permission of this institution (gmezzano@clinicauandes.cl).

Abbreviations

AFP: Alpha-fetoprotein
AI: Artificial intelligence
ALD: Alcohol-associated liver disease
ALT: Alanine aminotransferase
AST: Aspartate aminotransferase
AUC: Area under curve
AUDIT: Alcohol use disorders identification test
BMI: Body mass index
BNB: Bernoulli Naïve Bayes
BRF: Balanced random forest
CEA: Carcinoembryonic antigen
CN: Controlled noise
CNN: Convolutional neural network
CV: Cross validation
DA: Data augmentation
DBP: Diastolic blood pressure
DL: Deep learning
DM: Diabetes mellitus
DT: Decision tree
ET: Extra trees
FN: False negative
FP: False positive
FPG: Fasting plasma glucose
GB: Gradient boosting machine
GDM: Gestational diabetes mellitus
GGT: Gamma-glutamyl transferase
GI: Gastrointestinal
GNB: Gaussian Naïve Bayes
HCC: Hepatocellular carcinoma
HDL: High-density lipoprotein
HSI: Hepatic steatosis index
HOMA-IR: Homeostatic model assessment for insulin resistance
IRB: Institutional review board
IQR: Interquartile range
KNN: K-nearest neighbors
LDL: Low-density lipoprotein
LGBM: Light gradient boosting machine
LR: Logistic regression
MAFLD: Metabolically-associated fatty liver disease
MASLD: Metabolic dysfunction-associated steatotic liver disease
MetALD: Metabolic and alcohol-associated liver disease
ML: Machine learning
MLP: Multi-layer perceptron
MRI: Magnetic resonance imaging
MRI-PDFF: Magnetic resonance proton density fat fraction
MRS: Magnetic resonance spectroscopy
NAFLD: Non-alcoholic fatty liver disease
RF: Random forest
ROC: Receiver operating characteristic
RUS: Random under-sampling
SBP: Systolic blood pressure
SLD: Steatotic liver disease
SVM: Support vector machine
T2DM: Type 2 diabetes mellitus
TN: True negative
TP: True positive
UA: Uric acid
VCTE: Vibration-controlled transient elastography
WC: Waist circumference
WHR: Waist-hip ratio
XGB: Extreme gradient boosting machine

References

Powell, E. E., Wong, V. W. S. & Rinella, M. Non-alcoholic fatty liver disease. Lancet 397, 2212–2224 (2021).
Rinella, M. E. et al. A multisociety Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 78, 1966–1986 (2023).
Song, S. J., Lai, J. C. T., Wong, G. L. H., Wong, V. W. S. & Yip, T. Can we use old NAFLD data under the new MASLD definition? J. Hepatol. 80, e54–e56 (2024).
Le, M. H. et al. Prevalence of non-alcoholic fatty liver disease and risk factors for advanced fibrosis and mortality in the United States. PLoS One 12, e0173499 (2017).
Cotter, T. G. & Rinella, M. Nonalcoholic fatty liver disease 2020: the state of the disease. Gastroenterology 158, 1851–1864 (2020).
Staufer, K. & Stauber, R. E. Steatotic liver disease: Metabolic dysfunction, alcohol, or both? Biomedicines 11, 2108 (2023).
Younossi, Z. M. et al. The global epidemiology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): A systematic review. Hepatology 77, 1335–1347 (2023).
Lazarus, J. V. et al. A global research priority agenda to advance public health responses to fatty liver disease. J. Hepatol. 79, 618–634 (2023).
Younossi, Z. M. et al. Global epidemiology of nonalcoholic fatty liver disease—meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64, 73–84 (2016).
Ayares, G., Idalsoaga, F., Díaz, L. A., Arnold, J. & Arab, J. P. Current medical treatment for alcohol-associated liver disease. J. Clin. Exp. Hepatol. 12, 1333–1348 (2022).
Dang, K., Hirode, G., Singal, A. K., Sundaram, V. & Wong, R. J. Alcoholic liver disease epidemiology in the United States: A retrospective analysis of 3 US databases. Official J. Am. Coll. Gastroenterol. ACG 115, 96–104 (2020).
Younossi, Z. M., Wong, G., Anstee, Q. M. & Henry, L. The global burden of liver disease. Clin. Gastroenterol. Hepatol. 21, 1978–1991 (2023).
Devarbhavi, H. et al. Global burden of liver disease: 2023 update. J. Hepatol. 79, 516–537 (2023).
Seitz, H. K. et al. Alcoholic liver disease. Nat. Rev. Dis. Primers 4, 16 (2018).
Saiman, Y., Duarte-Rojo, A. & Rinella, M. E. Fatty liver disease: Diagnosis and stratification. Annu. Rev. Med. 73, 529–544 (2022).
Mezzano, G. et al. Global burden of disease: acute-on-chronic liver failure, a systematic review and meta-analysis. Gut 71, 148 (2022).
Paik, J. M. et al. Mortality related to nonalcoholic fatty liver disease is increasing in the United States. Hepatol. Commun. 3, 1459–1471 (2019).
Pal, S. C. & Méndez-Sánchez, N. Screening for MAFLD: Who, when and how? Ther. Adv. Endocrinol. Metab. 14, 20420188221145650 (2023).
Yoo, E. R., Ahmed, A. & Kim, D. Economic burden and healthcare utilization in nonalcoholic fatty liver disease. Hepatobiliary Surg. Nutr. 8, 181–183 (2019).
Younossi, Z. M., Henry, L., Bush, H. & Mishra, A. Clinical and economic burden of nonalcoholic fatty liver disease and nonalcoholic steatohepatitis. Clin. Liver Dis. 22, 1–10 (2018).
Allen, A. M., Van Houten, H. K., Sangaralingham, L. R., Talwalkar, J. A. & McCoy, R. G. Healthcare cost and utilization in nonalcoholic fatty liver disease: Real-World data from a large U.S. Claims database. Hepatology 68, 2230–2238 (2018).
Stepanova, M., Henry, L. & Younossi, Z. M. Economic burden and patient-reported outcomes of nonalcoholic fatty liver disease. Clin. Liver Dis. 27, 483–513 (2023).
Dan, A. A. et al. Health-related quality of life in patients with non-alcoholic fatty liver disease. Aliment. Pharmacol. Ther. 26, 815–820 (2007).
Jiang, W. et al. Global burden of nonalcoholic fatty liver disease, 1990 to 2019: Findings from the global burden of disease study 2019. J. Clin. Gastroenterol. 57, 631–639 (2023).
Chalasani, N. et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American association for the study of liver diseases. Hepatology 67, 328–357 (2018).
Huang, T., Behary, J. & Zekry, A. Non-alcoholic fatty liver disease: a review of epidemiology, risk factors, diagnosis and management. Intern. Med. J. 50, 1038–1047 (2020).
Hernaez, R. et al. Diagnostic accuracy and reliability of ultrasonography for the detection of fatty liver: A meta-analysis. Hepatology 54, 1082–1090 (2011).
Berzigotti, A. et al. EASL clinical practice guidelines on non-invasive tests for evaluation of liver disease severity and prognosis – 2021 update. J. Hepatol. 75, 659–689 (2021).
Eslam, M. et al. The Asian Pacific association for the study of the liver clinical practice guidelines for the diagnosis and management of metabolic associated fatty liver disease. Hepatol. Int. 14, 889–919 (2020).
European Association for the Study of the Liver (EASL). European association for the study of diabetes (EASD) & European association for the study of obesity (EASO). EASL–EASD–EASO clinical practice guidelines for the management of non-alcoholic fatty liver disease. J. Hepatol. 64, 1388–1402 (2016).
Ciardullo, S., Vergani, M. & Perseghin, G. Nonalcoholic fatty liver disease in patients with type 2 diabetes: Screening, diagnosis, and treatment. J. Clin. Med. 12, 5597 (2023).
Gu, J. et al. Diagnostic value of MRI-PDFF for hepatic steatosis in patients with non-alcoholic fatty liver disease: A meta-analysis. Eur. Radiol. 29, 3564–3573 (2019).
Rinella, M. E. Nonalcoholic fatty liver disease: A systematic review. JAMA 313, 2263–2273 (2015).
Cusi, K. et al. American association of clinical endocrinology clinical practice guideline for the diagnosis and management of nonalcoholic fatty liver disease in primary care and endocrinology clinical settings: Co-sponsored by the American association for the study of liver diseases (AASLD). Endocr. Pract. 28, 528–562 (2022).
Noureddin, M. et al. Screening for nonalcoholic fatty liver disease in persons with type 2 diabetes in the United States is cost-effective: A comprehensive cost-utility analysis. Gastroenterology 159, 1985–1987.e4 (2020).
Wong, V. W. S. et al. Community-based lifestyle modification programme for non-alcoholic fatty liver disease: A randomized controlled trial. J. Hepatol. 59, 536–542 (2013).
Eckard, C. et al. Prospective histopathologic evaluation of lifestyle modification in nonalcoholic fatty liver disease: A randomized trial. Ther. Adv. Gastroenterol. 6, 249–259 (2013).
Arab, J. P. et al. Latin American association for the study of the liver (ALEH) practice guidance for the diagnosis and treatment of non-alcoholic fatty liver disease. Ann. Hepatol. 19, 674–690 (2020).
Bedogni, G. et al. The fatty liver index: A simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 6, 33 (2006).
Lee, J. H. et al. Hepatic steatosis index: A simple screening tool reflecting nonalcoholic fatty liver disease. Dig. Liver Dis. 42, 503–508 (2010).
Goldman, O. et al. Non-alcoholic fatty liver and liver fibrosis predictive analytics: risk prediction and machine learning techniques for improved preventive medicine. J. Med. Syst. 45, 22 (2021).
Liu, Y. X. et al. Comparison and development of advanced machine learning tools to predict nonalcoholic fatty liver disease: An extended study. Hepatobiliary Pancreat. Dis. Int. 20, 409–415 (2021).
Wu, C. C. et al. Prediction of fatty liver disease using machine learning algorithms. Comput. Methods Progr. Biomed. 170, 23–29 (2019).
Zhao, M., Song, C., Luo, T., Huang, T. & Lin, S. Fatty liver disease prediction model based on big data of electronic physical examination records. Front. Public Health 9 (2021).
Noureddin, M. et al. Predicting NAFLD prevalence in the United States using national health and nutrition examination survey 2017–2018 transient elastography data and application of machine learning. Hepatol. Commun. 6, 1537–1548 (2022).
Ji, W., Xue, M., Zhang, Y., Yao, H. & Wang, Y. A machine learning based framework to identify and classify Non-alcoholic fatty liver disease in a large-scale population. Front. Public Health 10 (2022).
Wu, C. T., Chu, T. W. & Jang, J. S. R. Current-visit and next-visit prediction for fatty liver disease with a large-scale dataset: Model development and performance comparison. JMIR Med. Inf. 9, e26398 (2021).
Atsawarungruangkit, A., Laoveeravat, P. & Promrat, K. Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database. World J. Hepatol. 13, 1417–1427 (2021).
Meffert, P. J. et al. Development, external validation, and comparative assessment of a new diagnostic score for hepatic steatosis. Official J. Am. Coll. Gastroenterol. ACG 109, 1404–1414 (2014).
Weng, S., Hu, D., Chen, J., Yang, Y. & Peng, D. Prediction of fatty liver disease in a Chinese population using Machine-Learning algorithms. Diagnostics 13, 1168 (2023).
Razmpour, F. et al. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci. Rep. 13, 4942 (2023).
Peng, H. Y. et al. Development and validation of machine learning models for nonalcoholic fatty liver disease. Hepatobiliary Pancreat. Dis. Int. 22, 615–621 (2023).
Pei, X., Deng, Q., Liu, Z., Yan, X. & Sun, W. Machine learning algorithms for predicting fatty liver disease. Ann. Nutr. Metab. 77, 38–45 (2021).
Cubillos, G. et al. Development of machine learning models to predict gestational diabetes risk in the first half of pregnancy. BMC Pregnancy Childbirth 23, 469 (2023).
Deshmukh, F. & Merchant, S. S. Explainable machine learning model for predicting GI bleed mortality in the intensive care unit. Official J. Am. Coll. Gastroenterol. ACG 115, 1657–1668 (2020).
Otsubo, N. et al. Utility of indices obtained during medical checkups for predicting fatty liver disease in Non-obese people. Intern. Med. 62, 2307–2319 (2023).
Han, N. et al. Identification of biomarkers in nonalcoholic fatty liver disease: A machine learning method and experimental study. Front. Genet. 13 (2022).
Xue, Y., Xu, J., Li, M. & Gao, Y. Potential screening indicators for early diagnosis of NAFLD/MAFLD and liver fibrosis: triglyceride glucose index–related parameters. Front. Endocrinol. (Lausanne) 13 (2022).
Chen, Y. Y. et al. Machine-learning algorithm for predicting fatty liver disease in a Taiwanese population. J. Pers. Med. 12, 1026 (2022).
Islam, M. M., Wu, C. C., Poly, T. N., Yang, H. C. & Li, Y. C. J. Applications of machine learning in fatty live disease prediction. Stud. Health Technol. Inf. 247, 166–170 (2018).
Pan, X. et al. Risk prediction for Non-alcoholic fatty liver disease based on biochemical and dietary variables in a Chinese Han population. Front. Public Health 8 (2020).
Sorino, P. et al. Development and validation of a neural network for NAFLD diagnosis. Sci. Rep. 11, 20240 (2021).
Yuan, Y. et al. Development and validation of a nomogram model for predicting the risk of MAFLD in the young population. Sci. Rep. 14, 9376 (2024).
Hardy, R. et al. Improving nonalcoholic fatty liver disease classification performance with latent diffusion models. Sci. Rep. 13, 21619 (2023).
Li, D., Zhang, M., Wu, S., Tan, H. & Li, N. Risk factors and prediction model for nonalcoholic fatty liver disease in Northwest China. Sci. Rep. 12, 13877 (2022).
Pouyanfar, S. et al. A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. 51, 1–36 (2018).
Zambrano, J. E., Benalcazar, D. P., Perez, C. A. & Bowyer, K. W. Iris recognition using Low-Level CNN layers without training and single matching. IEEE Access 10, 41276–41286 (2022).
Dong, S., Wang, P. & Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 40, 100379 (2021).
Khan, A., Sohail, A., Zahoora, U. & Qureshi, A. S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53, 5455–5516 (2020).
Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377 (2018).
Borisov, V. et al. Deep neural networks and tabular data: A survey. IEEE Trans. Neural Netw. Learn. Syst. 35, 7499–7519 (2024).
Sharma, A. et al. A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9, 11399 (2019).
Bazgir, O. et al. Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks. Nat. Commun. 11, 4391 (2020).
Zhu, Y. et al. Converting tabular data into images for deep learning with convolutional neural networks. Sci. Rep. 11, 11325 (2021).
Sharma, A. & Kumar, D. Classification with 2-D convolutional neural networks for breast cancer diagnosis. Sci. Rep. 12, 21857 (2022).
Akazawa, M. & Hashimoto, K. A multimodal deep learning model for predicting severe hemorrhage in placenta previa. Sci. Rep. 13, 17320 (2023).
Shen, D., Wu, G. & Suk, H. I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).
Suzuki, K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 10, 257–273 (2017).
Greenspan, H., van Ginneken, B. & Summers, R. M. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 35, 1153–1159 (2016).
Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
Bush, K. et al. The AUDIT alcohol consumption questions (AUDIT-C): An effective brief screening test for problem drinking. Arch. Intern. Med. 158, 1789–1795 (1998).
Montecino, D. A., Perez, C. A. & Bowyer, K. W. Two-level genetic algorithm for evolving convolutional neural networks for pattern recognition. IEEE Access 9, 126856–126872 (2021).
World Health Organization. A healthy lifestyle—WHO recommendations. (2010). https://www.who.int/europe/news-room/fact-sheets/item/a-healthy-lifestyle---who-recommendations
Pilataxi, J. I., Zambrano, J. E., Perez, C. A. & Bowyer, K. W. Improved search in neuroevolution using a neural architecture classifier with the CNN architecture encoding as feature vector. IEEE Access 12, 11987–12000 (2024).
Perez, J. P. & Perez, C. A. Face patches designed through neuroevolution for face recognition with large pose variation. IEEE Access 11, 72861–72873 (2023).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. in Proceedings of the 33rd International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2019).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 1–5 (2017).
Chen, T. & Guestrin, C. XGBoost. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, New York, NY, USA, 2016). https://doi.org/10.1145/2939672.2939785.
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. in Advances in Neural Information Processing Systems (ed Guyon, I.) vol. 30 (Curran Associates, Inc., 2017).
Cawley, G. C. & Talbot, N. L. C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).
Procino, F. et al. Reducing NAFLD-screening time: A comparative study of eight diagnostic methods offering an alternative to ultrasound scans. Liver Int. 39, 187–196 (2019).
Halligan, S., Altman, D. G. & Mallett, S. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach. Eur. Radiol. 25, 932–939 (2015).
Bhat, S. et al. AUCReshaping: Improved sensitivity at high-specificity. Sci. Rep. 13, 21097 (2023).
Janssens, A. C. J. W. & Martens, F. K. Reflection on modern methods: Revisiting the area under the ROC curve. Int. J. Epidemiol. 49, 1397–1403 (2020).

Acknowledgements

This work was supported in part by Agencia Nacional de Investigación y Desarrollo (ANID) under Grant FONDECYT 1231675, and under Basal funding for Scientific and Technological Center of Excellence, Project AFB220002, IMPACT #FB210024, and in part by the Department of Electrical Engineering, Universidad de Chile. This work has also been supported by Centro de Enfermedades Digestivas and Unidad de Medicina Preventiva, Universidad de los Andes, Chile, and by Gastroenterology, Hospital del Salvador, Chile.

Author information

Authors and Affiliations

Department of Electrical Engineering, and Advanced Mining Technology Center, Universidad de Chile, Santiago, Chile
Gabriel Cubillos & Claudio A. Perez
IMPACT, Center of Interventional Medicine for Precision and Advanced Cellular Therapy, Santiago, Chile
Gabriel Cubillos & Claudio A. Perez
Internal Medicine Resident, Universidad de los Andes, Santiago, Chile
Javier Perez-Valenzuela
Gastroenterology, Hospital del Salvador, Santiago, Chile
Herman Aguirre
Unidad de Medicina Preventiva, Universidad de los Andes, Santiago, Chile
Luz Martínez
Centro de Enfermedades Digestivas, Universidad de los Andes, Santiago, Chile
Lorena Castro & Gabriel Mezzano

Contributions

G.C. participated in the conceptualization, data curation, formal analysis, investigation, methodology, software implementation, results generation, validation, visualization, and writing of the original draft. J.P.V. participated in the conceptualization, data curation, formal analysis, investigation, methodology, visualization, and writing of the original draft. H.A. participated in the investigation, methodology, validation, and writing of the original draft. L.M. participated in the data curation, formal analysis, investigation, and methodology. L.C. participated in the conceptualization, investigation, methodology, and supervision. G.M. participated in the conceptualization, investigation, methodology, supervision, writing of the original draft, and review. C.P. participated in the conceptualization, formal analysis, funding acquisition, investigation, methodology, project administration, supervision, validation, visualization, and writing of the original draft. All authors reviewed the manuscript.

Corresponding author

Correspondence to Claudio A. Perez.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (DOCX).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Cubillos, G., Perez-Valenzuela, J., Aguirre, H. et al. Development of a novel deep learning method that transforms tabular input variables into images for the prediction of SLD. Sci Rep 15, 28024 (2025). https://doi.org/10.1038/s41598-025-12900-z

Received: 08 July 2024
Accepted: 21 July 2025
Published: 31 July 2025
Version of record: 31 July 2025
DOI: https://doi.org/10.1038/s41598-025-12900-z
