Introduction

Metabolic dysfunction-associated fatty liver disease (MAFLD), previously termed non-alcoholic fatty liver disease (NAFLD), has emerged as the most prevalent chronic liver disease worldwide and is closely linked to obesity, type 2 diabetes, and metabolic syndrome1,2. MAFLD encompasses a wide spectrum of liver conditions ranging from benign hepatic steatosis (MASLD) to the more severe metabolic dysfunction-associated steatohepatitis (MASH), which is characterized by inflammation, hepatocyte damage, and varying degrees of fibrosis3. Diet and lifestyle modifications are effective in reversing the early stages of MASLD4,5,6. MASH may be reversible, but often progresses to complications such as liver fibrosis, cirrhosis, and hepatocellular carcinoma (HCC)7. The recent change in nomenclature reflects a shift towards acknowledging the central role of metabolic dysfunction in the pathogenesis of the disease and aims to reduce stigma while improving diagnostic clarity2. Approximately 25–30% of patients with MASLD are estimated to progress to MASH over the course of the disease8, underscoring the critical need to better understand the mechanisms driving disease progression.

Despite its high global prevalence, the pathogenesis of MAFLD remains poorly understood, particularly regarding the role of immune cell populations in disease progression9,10. MASLD-associated liver injury is driven by a complex interplay between metabolic stress and immune dysregulation11. Previous studies have focused on the role of individual immune cell types, such as macrophages or T cells, in MASLD12. However, the complex interactions among various immune cell types across the stages of MAFLD and how these interactions contribute to disease progression have not been fully explored13. This gap in knowledge limits the development of targeted therapies and personalized treatment strategies for MASLD.

Progression from MASLD to MASH is characterized by changes in hepatic immune cell composition14. For instance, an imbalance between pro-inflammatory M1 macrophages and anti-inflammatory M2 macrophages has been implicated in driving hepatic inflammation and fibrosis15. Similarly, the accumulation of exhausted CD8+ T cells exacerbates tissue damage through their cytotoxic effects on hepatocytes16. In a methionine-choline-deficient (MCD) MASH mouse model, the progressive recruitment of T lymphocytes, lobular inflammation, and ballooning are associated with the accumulation of CD8+ T cells in the liver17. These results indicate altered hepatic immune function during MASH development. However, the complex interplay between these immune cells and their role in orchestrating disease progression remains poorly understood18. A comprehensive analysis of immune cell landscapes in MASLD is essential to clarify the significance of cellular interactions in disease progression. Characterizing these cell states and their impact on patient outcomes could improve our understanding of MASLD and uncover new biomarkers for diagnosis and intervention19,20,21.

Furthermore, while recent advances in single-cell sequencing have provided valuable insights into cellular heterogeneity within MAFLD22,23,24,25, challenges such as sampling bias, high costs, and data noise limit its application in clinical settings26,27. This underscores the need for computational tools that can analyze bulk RNA-sequencing data to decipher immune cell compositions and interactions more cost-effectively18.

CIBERSORT is one such tool, designed to deconvolute bulk gene expression data and estimate immune cell proportions within mixed samples28. It uses a predefined signature matrix of reference gene expression profiles for distinct immune cell types, allowing for the estimation of immune cell abundances through mathematical modeling and linear regression29. Ecotyper, on the other hand, offers a more comprehensive approach by analyzing the transcriptional heterogeneity of cellular ecosystems30. It combines machine learning with a modular framework, encompassing steps such as in silico purification, cell state and ecotype discovery, and recovery31. By leveraging expression data, Ecotyper enables the identification of distinct cell states and ecotypes within a complex mixture of cells. This facilitates the understanding of cell-type-specific transcriptional patterns, aiding in the elucidation of cellular dynamics and interactions underlying disease progression and pathogenesis. While Ecotyper has been applied to study immune cell landscapes and survival in high-grade serous ovarian cancer32, its application in MAFLD remains limited.
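The regression idea behind reference-based deconvolution can be illustrated with a minimal sketch. This is not the CIBERSORT algorithm itself (which relies on support vector regression); it is a simplified non-negative least squares stand-in solved by projected gradient descent, and the signature matrix, mixture values, and function name below are hypothetical.

```python
def deconvolve(signature, mixture, n_iter=2000):
    """Estimate non-negative cell-type fractions f that minimize
    0.5 * ||signature * f - mixture||^2, then rescale them to sum to 1.

    signature: list of rows (genes), each a list of per-cell-type expression.
    mixture:   list of bulk expression values, one per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries bounds the largest eigenvalue of S^T S,
    # so 1 / that sum is a safe gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * f[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    return [v / total for v in f] if total > 0 else f


# Hypothetical signature: 4 genes x 3 cell types.
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
# Hypothetical bulk profile for the same 4 genes.
mixture = [6.3, 3.1, 1.2, 2.0]
print(deconvolve(signature, mixture))
```

Real deconvolution tools add steps this sketch omits, such as signature gene selection, normalization, and significance testing of the fit.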

Here, we aim to address these gaps by conducting a comprehensive investigation of immune cell states and their clinical significance across different stages of MAFLD. By applying computational tools such as Ecotyper, we examine how immune cell states and ecotypes are associated with disease progression and patient outcomes. Additionally, we identify resveratrol as a promising therapeutic candidate capable of modulating immune responses and extracellular matrix remodeling. These results were validated in independent datasets and with other analytical tools. The overall data analysis framework is shown in Fig. 1. Our study highlights the potential of these cell states as biomarkers for disease discrimination and prognosis and may inform drug development for MASLD.

Fig. 1

The overall data analysis framework. Public transcriptomic datasets (GSE167523, GSE135251, and GSE68421) were analyzed. Ecotyper was used to identify transcriptionally distinct immune cell states and ecotypes in MASLD and MASH. Differentially expressed immune cell states between MASLD and MASH were identified to reveal disease-specific signatures. WGCNA was applied to detect co-expression modules associated with disease progression. Functional enrichment analyses were performed using GO and KEGG to interpret the co-expression modules. Resveratrol was identified as a potential therapeutic candidate through statistical drug screening models. External datasets were used to validate the role of resveratrol and immune cell states in MASLD after treatment. The ability of immune cell states to distinguish MASLD from MASH was evaluated using AUC analysis. Mendelian randomization was conducted to assess causal relationships between key genes and MAFLD progression. WGCNA: weighted gene co-expression network analysis; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; MASLD: metabolic dysfunction-associated steatotic liver disease; MASH: metabolic dysfunction-associated steatohepatitis; AUC: area under the curve.

Materials and methods

Datasets and preprocessing

Three MAFLD tissue datasets were obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO). Two datasets, GSE167523 and GSE135251, are from RNA-Seq platforms. GSE167523 contains liver transcriptome profiles of 47 MASLD and 51 MASH patients33. GSE135251 contains liver transcriptome profiles of 216 MAFLD samples across the disease spectrum34. Because we focus on the changes between MASLD and MASH, GSE167523 was used for the analysis of immune cell states and ecotypes, and GSE135251 was used for validation. Additionally, GSE68421, which contains the MASLD hepatic transcriptome (n = 7) from a randomized clinical trial of resveratrol treatment (1.5 g daily for 6 months), was used for drug target validation35. Due to variations in clinical and histological information across datasets, they were analyzed individually rather than pooled.

Briefly, the expression matrix, with gene symbols in the first column, was uploaded to the Ecotyper online tool at https://ecotyper.stanford.edu/carcinoma/. Ecotyper is a platform for transcriptomic analysis that deciphers cell-type-specific transcriptional profiles and higher-order cellular ecosystems in heterogeneous biological specimens30. The output files containing information on cell states and ecotypes were then downloaded and merged into a single file for downstream analyses.

Weighted gene co-expression network analysis (WGCNA)

Samples were dichotomized into “CE1 high-expression” and “CE1 low-expression” groups based on the median CE1 abundance across all samples: samples with CE1 values above the median were classified as CE1-high, and those below the median as CE1-low. The R limma package was used to identify differentially expressed genes in CE1 high-expression compared to CE1 low-expression samples36. Thresholds were set as follows: |log2FoldChange| > 0.6 and BH-adjusted P < 0.05. Gene co-expression modules were identified with WGCNA following the package manual37. Specific parameters were set as follows: softPower = 22, corOptions = “use = ‘p’”, networkType = “signed”, minModuleSize = 20, deepSplit = 4, MEDissThres = 0.2. The process involves calculating the pairwise Pearson correlation coefficients between genes, followed by the construction of an adjacency matrix. This is done by raising the correlation-derived similarity matrix to a power of 22, a parameter known as the ‘soft power.’ The soft power controls the degree of network connectivity and is selected so that the network approximates a scale-free topology, which is important for the robustness and reliability of the results. The weighted network was transformed into a network of topological overlap (TO), and genes were hierarchically clustered based on their TO. Co-expression gene modules were identified using the Dynamic Tree Cut algorithm38. A module in WGCNA is defined as a group of genes whose expression patterns are highly correlated across samples. These genes are typically functionally related and may share common biological processes or pathways. The expression of each module was represented by its module eigengene (ME), the first principal component of the module. WGCNA also provided gene connectivity information, revealing hub genes with high connectivity39.
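The soft-thresholding step described above can be sketched in a few lines. The example below is an illustrative reimplementation, not the WGCNA package code: it assumes the standard signed-network form, in which the adjacency of two genes is ((1 + r) / 2) raised to the soft power, and the gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def signed_adjacency(expr, power=22):
    """Signed soft-threshold adjacency: a_ij = ((1 + r_ij) / 2) ** power.

    expr: dict mapping gene name -> list of expression values across samples.
    Returns a dict keyed by (gene_i, gene_j).
    """
    genes = list(expr)
    return {
        (gi, gj): ((1.0 + pearson(expr[gi], expr[gj])) / 2.0) ** power
        for gi in genes
        for gj in genes
    }


def connectivity(adj, genes):
    """Per-gene connectivity: summed adjacency to all other genes."""
    return {g: sum(adj[(g, h)] for h in genes if h != g) for g in genes}


# Hypothetical expression values for four genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with geneA
    "geneD": [1.0, -1.0, -1.0, 1.0], # uncorrelated with geneA
}
adj = signed_adjacency(expr, power=22)
print(connectivity(adj, list(expr)))
```

With a high power such as 22, uncorrelated and anti-correlated gene pairs receive adjacencies close to zero, so only strongly positively correlated genes contribute to connectivity and module membership.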

Single-cell analysis and cell type-level expression analysis were performed with the online tools Human Protein Atlas40 and GEPIA202141. Two independent MASH databases, SteatoSITE and NAFLD scRNA seq, were used to validate gene expression42,43.

Functional annotation and drug screening

Enrichment analysis of gene sets was conducted using the clusterProfiler package44. For drug screening, up-regulated and down-regulated genes were submitted separately to the Connectivity Map45, and significant results were determined at a significance level of adjusted P < 0.01. Protein-ligand docking was performed using SwissDock46, and ligand-protein interactions were visualized with LigPlot+47.

Mendelian randomization (MR) analysis

MR is a statistical method that uses genetic variants as instrumental variables (IVs) to assess the causal relationship between an exposure (e.g., gene expression) and an outcome (e.g., MASLD or MASH progression)48. The key assumption of MR is that genetic variants are randomly assigned at conception, influence the exposure, and are not related to confounding factors that could affect the outcome49. In this study, we employed MR to test whether the expression of AEBP1 plays a causal role in MASLD progression. The inverse-variance weighted (IVW) method and MR-Egger regression were applied to estimate the causal effects and test for potential pleiotropy50. The TwoSampleMR package was used with exposure data eqtl-a-ENSG00000106624 and outcome data finn-b-K11_FIBROCHIRLIV from the IEU OpenGWAS project51.
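As an illustration of the IVW estimator mentioned above, the following minimal sketch computes a fixed-effect IVW estimate from per-variant summary statistics. It is not the TwoSampleMR implementation, and the effect sizes and standard errors are hypothetical values chosen only to show the arithmetic.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j]
    with weight beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (estimate, standard_error).
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


# Hypothetical summary statistics for three instruments.
beta_x = [0.10, 0.20, 0.40]   # variant effects on the exposure
beta_y = [0.05, 0.10, 0.20]   # variant effects on the outcome
se_y = [0.01, 0.02, 0.03]     # standard errors of the outcome effects
estimate, se = ivw_estimate(beta_x, beta_y, se_y)
print(estimate, se)
```

MR-Egger differs from this estimator by fitting an intercept in the weighted regression of outcome effects on exposure effects; a non-zero intercept is taken as evidence of directional pleiotropy.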

Statistical analysis

Differential analyses were carried out using Student’s t-test for pairwise comparisons or ANOVA for comparisons among more than two groups. Adjusted P values below 0.05 were considered statistically significant. The associations of cell states, ecotypes, or gene expression with patient clinical parameters were analyzed with multivariate regression models in R. Association analyses were adjusted for patient age and gender. To identify independent predictors, we first performed univariate generalized linear modeling (GLM) for each immune cell state and clinical variable using the glm function in R. Variables with P < 0.1 in univariate analyses were included as candidates in the multivariate GLM. The final multivariate model was determined using a stepwise selection procedure (step function in R) based on the Akaike Information Criterion (AIC). Immune cell abundance values were scaled to percentages before modeling. Multicollinearity among variables in the final model was assessed using variance inflation factors (VIFs), all of which were below 2, indicating low multicollinearity. The area under the curve (AUC) was used to evaluate classification performance, where a higher AUC indicates a better ability to differentiate between the two groups. Gene-level survival curves were plotted with Kaplan-Meier Plotter52.
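To make the AUC criterion concrete, the sketch below computes it through its rank-based interpretation: the probability that a randomly chosen sample from the positive group (here, MASH) has a higher score than a randomly chosen sample from the negative group (MASLD), with ties counted as one half. The abundance values are hypothetical and the function name is ours.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical cell-state abundances for two groups of samples.
mash = [0.9, 0.8, 0.4]
masld = [0.5, 0.3, 0.1]
print(auc(mash, masld))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. A value below 0.5 means that higher scores favor the negative group, so reversing the score direction gives an AUC of one minus the original value.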

Results

Differential expression of immune cell states in MASLD and MASH

The clinical relevance of cell states in MAFLD has not been fully explored. We therefore used the abundance matrix of cell states to investigate their associations with MASH. Here, an ‘ecotype’ refers to a higher-order grouping of cell states identified by Ecotyper. These ecotypes represent clusters of transcriptionally similar cell states and offer a comprehensive view of cellular ecosystems within MAFLD. Of the 71 cell states quantified (Fig. 2A), 32 showed significant differential abundance, all of which were up-regulated in MASH compared to MASLD. These 71 cell states were further organized into 10 ecotypes in GSE167523 (Fig. 2A). Notably, the top 6 significant cell states were Fibroblasts.3 (fold change (FC) = 3.6), Epithelial.cells.3 (FC = 4.4), PMNs.3 (FC = 3.0), Macrophages.6 (FC = 3.8), Mast.cells.5 (FC = 4.1), and Fibroblasts.7 (FC = 3.1) (Fig. 2B, all with P < 3E-12). Interestingly, all six cell states belong to the ecotype CE1, which is characterized by fibroblasts, EMT, TGF-β response, and poor outcome30. According to Ecotyper, the marker gene of Fibroblasts.3 is COL10A1, and this state is annotated as tumor-associated. The marker gene of Epithelial.cells.3 is ITGA3 (annotated as pro-angiogenic), that of PMNs.3 is CLEC12A (classically activated), that of Macrophages.6 is AEBP1 (M2 foam cell-like), and that of Mast.cells.5 is ARRB2 (annotation unknown). Fibroblasts.7 is also annotated as unknown. Interestingly, the marker genes of Macrophages.6, Fibroblasts.3, Epithelial.cells.3, and Mast.cells.5 were progressively elevated across fibrosis stages 0 to 4 during MAFLD progression (P < 0.001, ANOVA, Fig. 2C), indicating the importance of these genes and their respective cell subsets. Single-cell analysis revealed that AEBP1 is expressed in fibroblasts, smooth muscle cells, endothelial cells, and macrophages (Fig. 2D). In normal and carcinoma livers, AEBP1 is highly expressed in M2 macrophages (Fig. 2E). These results may indicate the importance of macrophages in the transition from MASLD to MASH.

Fig. 2

Differences in the abundance of six cell states between MASLD and MASH, and marker gene expression across fibrosis stages in GSE167523. (A) The heatmap shows the distribution of 71 cell states in MAFLD. (B) Box plots for the six top significant cell states between MASLD and MASH (P < 0.0001, t test). PMNs: polymorphonuclear neutrophils, MASLD: metabolic dysfunction-associated steatotic liver disease, MASH: metabolic dysfunction-associated steatohepatitis. (C) Macrophages.6 marker gene AEBP1, Fibroblasts.3 marker gene COL10A1, Epithelial.cells.3 marker gene ITGA3, and Mast.cells.5 marker gene ARRB2 are progressively up-regulated across fibrosis stages 0 to 4 during MASH progression (P < 0.001, ANOVA). (D) Single-cell analysis in the Human Protein Atlas shows that in normal liver tissue, AEBP1 is expressed in fibroblasts, smooth muscle cells, endothelial cells, and macrophages. The x-axis and y-axis represent the first and second UMAP dimensions (UMAP1 and UMAP2), respectively, illustrating the two-dimensional projection of single-cell transcriptomic profiles. UMAP: Uniform Manifold Approximation and Projection. (E) AEBP1 is significantly expressed in M2 macrophages in normal and carcinoma liver (P < 0.001, ANOVA) in GEPIA2021. HCC: hepatocellular carcinoma.

Macrophages.6, Fibroblasts.7, and age are risk factors for the occurrence of MASH

To assess the contribution of cell states to MASLD progression, we performed multivariate analysis in GSE167523. In univariate analysis, all six cell states were significantly associated with MASH occurrence, even after adjusting for age and gender. However, only Macrophages.6 and Fibroblasts.7 remained significant in multivariable analysis with age and gender as covariates (Table 1). Macrophages.6 (M2 foam cell-like) had the highest HR, indicating a substantial contribution to MASH. In dataset GSE135251, Fibroblasts.3 and Macrophages.6 were also significantly higher in advanced fibrosis (stage F3–F4) than in mild fibrosis (stage F0–F2), with P values of 5.2E-6 and 1.9E-5, respectively (Fig. 3). These results support the importance of Macrophages.6 in disease progression.

Table 1 Multivariate analysis for the occurrence of MASH in GSE167523.
Fig. 3

Differentially abundant (A) Fibroblasts.3 and (B) Macrophages.6 were positively associated with MASH fibrosis severity in GSE135251. Disease groups were compared using the Kruskal-Wallis one-way analysis of variance with adjusted P values. MASLD: metabolic dysfunction-associated steatotic liver disease. MASH: metabolic dysfunction-associated steatohepatitis.

ROC analysis revealed that Fibroblasts.3 (AUC = 0.936) performed best in discriminating MASH from MASLD, followed by Epithelial.cells.3 (AUC = 0.923) and Macrophages.6 (AUC = 0.921) (Fig. 4). At the ecotype level, CE1 separated MASH best (AUC = 0.891), followed by CE3 (AUC = 0.826). Higher CE4 abundance corresponded to an increased likelihood of MASLD rather than MASH (AUC = 0.871, Supplementary Material Figure S1A-C). CE3 includes Macrophages.4 (classical M2), Mast.cells.6 (activated), Dendritic.cells.2 (myeloid cDC2-B (Inflammatory)), and CD8 T cells.2 (late-stage differentiated effector). CE4 includes Fibroblasts.1 (myofibroblast-like) and Endothelial.cells.5 (myoendothelium-like). According to the Ecotyper annotation, CE1, CE3, and CE4 are all associated with poor outcomes. CE1 was characterized by EMT, female sex, and TGF-β response. CE3 was characterized by COSMIC signature 17, which is associated with T > C and T > A mutations and cancers. CE4 was characterized by myogenesis, old age (> 60), and male sex. These findings suggest that the cell states and ecotypes identified in MAFLD may play a crucial role in patient outcomes and warrant further investigation of their clinical implications.

Fig. 4

ROC curves show that the six cell states could separate metabolic dysfunction-associated steatotic liver disease (MASLD) and metabolic dysfunction-associated steatohepatitis (MASH) patients in GSE167523. ROC: receiver operating characteristic, AUC: area under the ROC curve.

Differential gene expression analysis of CE1 high and low expression samples

As CE1 showed the best ability to discriminate MASH samples among the ecotypes, we conducted a differential gene expression analysis between CE1 high-expression samples and the remaining samples in GSE167523. Low CE1-expressing samples were treated as controls. A total of 727 DEGs were identified (Fig. 5A). Among them, the 10 most significant genes were MSN (log2FC = 0.76), EHD4 (log2FC = 0.69), CAPN2 (log2FC = 0.91), COL1A2 (log2FC = 1.25), PACSIN3 (log2FC = -1.00), COL1A1 (log2FC = 1.32), ANXA2 (log2FC = 0.94), FAT1 (log2FC = 1.21), COL4A2 (log2FC = 1.01), and MVP (log2FC = 0.73), all of which were up-regulated in CE1 high-expression samples except PACSIN3. Expression patterns of the 10 genes were the same in two independent MASH datasets (Supplementary Material Figure S2)42,43. AKR1B10 showed the largest up-regulation. AEBP1, THBS2, and many collagen genes were also up-regulated (Table S1). Notably, EHD4 was favorable and MSN and ANXA2 were unfavorable for early-stage HCC survival (Supplementary Material Figure S3A, C), indicating differential roles of these genes during disease progression. Clustering analysis based on the DEGs revealed that CE1 high-expression samples clustered together under the same major branch (Fig. 5B). Functional enrichment analysis suggested that the up-regulated genes were significantly enriched in extracellular matrix (ECM) organization and PI3K-Akt signaling. The down-regulated genes showed no association with ECM. Instead, they were associated with responses to copper ion (Fig. 5C, D). These results indicate the importance of the ECM, PI3K-Akt signaling, and the CE1 ecotype in disease progression.

Fig. 5

The distinctive expression patterns of samples expressing high levels of CE1 in GSE167523. (A) A total of 727 genes exhibiting differential expression (DEGs) are identified. (B) The clustering heatmap of the DEGs displays the discernible expression patterns that differentiate these groups. In the color bar situated above the heatmap, the color blue denotes MASLD or men, while the color pink denotes MASH or women. (C) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis for the up-regulated genes. (D) Gene Ontology (GO) biological processes enrichment analysis for the down-regulated genes. Both enrichment analyses were performed in the clusterProfiler tool.

Weighted gene co-expression network analysis identified modules associated with fibroblasts and macrophages

As the DEGs were screened based on cell states, we conducted gene co-expression analysis to assess whether these DEGs were associated with cell types and distinct biological processes. A scale-free network was constructed (Fig. 6A). We identified three modules associated with disease status, age, and gender (Fig. 6B, C). M1 and M2 were positively associated with MASH status, but negatively correlated with gender (i.e., higher module expression observed in males). M1 was also positively correlated with age (Fig. 6C). Functional enrichment analysis showed that module M1 was significantly enriched with fibroblasts and was associated with ECM organization (Fig. 6D). Module M2 showed a strong correlation with cell cycle regulation and leukocyte migration and was enriched with macrophages (Fig. 6E). The hub genes in M1 include LAMA2 and AEBP1. Interestingly, the expression of AEBP1 was highly correlated with that of M1 (Supplementary Material Figure S4A). Bidirectional MR supported a causal relationship between AEBP1 expression and liver fibrosis progression (Supplementary Material Figure S4B-D). Therefore, AEBP1 may be an upstream regulator of MASLD progression.

Fig. 6

Differentially expressed genes between ecotype CE1 high and low-expressing samples in GSE167523 were organized into three functional modules. (A) When a power value of 22 is set, the constructed co-expression network adheres to a power law. (B) The cluster dendrogram shows gene module assignments. The accompanying color bar denotes the respective gene module assignments. (C) Heatmap depicting the relationship between modules and traits. Each cell within the heatmap corresponds to the correlation between the expression of specific modules and distinct clinical parameters. The numerical values enclosed in brackets within each cell denote the statistical significance of the correlation. (D,E) Gene Ontology (GO) biological processes enrichment analysis is performed on the genes within modules M1 and M2 by the clusterProfiler tool.

Candidate drug screening and independent data validation

To identify candidate drugs for MASH, we submitted all up-regulated and down-regulated DEGs to the Connectivity Map tool. Six significant compounds were identified: resveratrol, etoposide, clioquinol, ciclopirox, ChemBridge ID compound 5109870, and perhexiline. The top four candidates are listed in Table 2. Protein-ligand docking analysis showed the interactions between the Macrophages.6 marker gene AEBP1 and resveratrol (Fig. 7A). Pose view analysis showed that hydrogen bonds may form at residues Asn685, Gly832, and Asp878 of AEBP1 (Fig. 7B). An independent dataset (GSE68421) was used to validate the results. In this randomized clinical trial dataset of MASLD patients (n = 7) treated with resveratrol for six months, the expression of AEBP1 and the abundance of Macrophages.6 (M2 foam cell-like) showed a downward trend that was not statistically significant, suggesting a potential modulatory effect of resveratrol on this cell state. Moreover, Fibroblasts.8 (pro-migratory-like), one of the 32 differentially abundant cell states identified earlier (Table S2), showed the most significant reduction following resveratrol treatment. The top 30 down-regulated genes were enriched in the immunoglobulin complex, including IGHV3-74, IGHV3-33, IGHA1, IGKC, and IGLV1-44 (Fig. 7C). These results may indicate potential mechanisms and reduced inflammation after resveratrol treatment. Given the very small sample size, these findings should be interpreted with caution.

Table 2 The top 4 candidate MASH drugs identified by Connectivity Map.
Fig. 7

Protein-ligand docking analysis of resveratrol and the Macrophages.6 marker gene AEBP1. (A) The three-dimensional structure models show the interactions between AEBP1 and resveratrol. (B) The pose view provides a visual representation of the interaction sites, with hydrogen bonds depicted by black dashed lines between red atoms. (C) Immune-related genes IGHV3-74, IGHV3-33, IGHA1, IGKC, and IGLV1-44 were down-regulated by resveratrol treatment in human MASLD (GSE68421, n = 7). MASLD: metabolic dysfunction-associated steatotic liver disease.

Discussion

This study provides valuable insights into the molecular mechanisms driving MASLD progression by identifying distinct cell states and their alterations across disease stages. These findings may provide potential cellular biomarkers and therapeutic targets for personalized medicine.

We identified six significantly up-regulated cell states (Fibroblasts.3, Epithelial.cells.3, PMNs.3, Macrophages.6, Mast.cells.5, and Fibroblasts.7), all belonging to CE1, indicating a shared biological significance. Among these, Mast.cells.5 was annotated as “unknown” in Ecotyper, with ARRB2 identified as its marker gene. Recent in vivo studies have demonstrated that β-arrestin 2 (ARRB2) is up-regulated in liver macrophages and circulating monocytes of MASH patients and correlates with disease severity. Myeloid-specific ARRB2 deletion in mice attenuates MASH by improving macrophage metabolic programming, inhibiting IRG1-mediated itaconate suppression, and reducing mitochondrial ROS and M1 polarization53. Future functional assays targeting ARRB2 in our data-derived cell state could further elucidate its causal role. Notably, Fibroblasts.3 shows strong potential as a biomarker for distinguishing MASLD from MASH, highlighting key pathways such as ECM organization and the PI3K-Akt signaling pathway. In multivariate analysis, Macrophages.6 and Fibroblasts.7 were confirmed as key contributors to MASH progression. Age showed a significant but smaller hazard ratio (HR = 1.1) compared to the immune cell states, indicating its relatively modest contribution to disease progression. High CE1 expression indicates a higher risk of MASH.

The robust discriminative power of these cell states, particularly Fibroblasts.3, suggests their potential utility in clinical settings for accurate MASH diagnosis and patient stratification. The Ecotyper annotation showed that Fibroblasts.3 is linked to tumor-related processes, while Epithelial.cells.3 demonstrated pro-angiogenic traits. The Fibroblasts.3 marker gene COL10A1 has been identified as the most strongly up-regulated collagen in cirrhotic liver, although type I collagen (COL1A1, COL1A2) is the most abundant protein in the ECM54. The Epithelial.cells.3 marker gene ITGA3 plays a role in both the regulation of the ECM and the infiltration of immune cells55. Additionally, the involvement of angiogenesis in MASH progression has been emphasized in recent studies56. These functional annotations enhance our understanding of the potential roles played by these cell states in MAFLD pathogenesis.

Notably, several identified immune cell states, such as Fibroblasts.3 and Macrophages.6, were significantly associated with fibrosis severity in independent cohorts, suggesting their potential as biomarkers of fibrogenesis. Given that liver fibrosis and cirrhosis are key determinants of long-term outcomes in MASLD, incorporating these immune signatures into risk stratification models may facilitate personalized monitoring and intervention. Further studies are warranted to validate their predictive value for fibrosis progression and to explore their utility in guiding therapeutic decisions in patients at risk of cirrhosis.

The identification of cell state ecotypes, such as CE1, CE3, and CE4, with strong discriminatory capabilities further enriches our understanding of cell state involvement in MAFLD. Fibrosis emerges as a consequence of an excessive synthesis of ECM that is not adequately balanced by degradation, thus resulting in net accumulation57. The PI3K-Akt signaling pathway was enriched in CE1 high-expression samples. It may play a critical role in cell proliferation, apoptosis, and macrophage polarization, underpinning its involvement in MASH pathogenesis58. This pathway’s role in metabolic and inflammatory responses further underscores its therapeutic potential59,60,61. The dysregulation of copper metabolism, reflected in the enrichment of genes responsive to copper ions in CE1 high-expression samples, also points to a novel aspect of MASH development, where mineral imbalances contribute to oxidative stress, mitochondrial dysfunction, and fibrogenesis62. Therefore, targeting copper metabolism may represent a novel therapeutic avenue, warranting further investigation in MASLD.

Most CE1 samples were from MASH patients, indicating their relevance to the disease. Module-level analysis confirmed processes such as cell cycle regulation, leukocyte migration, and ECM organization in MASH, with module M1 being closely linked to ECM remodeling. Interestingly, module M1 was negatively associated with gender, supporting findings that women are at greater risk of MASLD progression63. ECM exhibited consistent up-regulation in cases of advanced fibrosis compared to mild fibrosis64. Module M2 is associated with hepatic macrophages. Homeostasis of Kupffer cells is markedly impaired in the context of MASH, and this has a notable impact on liver pathology65. These results provide novel insights into the complex interplay between different cell types and disease progression.

Interestingly, several genes up-regulated in CE1-high samples were associated with survival outcomes in early-stage HCC: MSN and ANXA2 with unfavorable survival, and EHD4 with favorable survival. This observation aligns with the well-established notion that MASH is a major precursor of HCC and supports the idea that these genes may play pivotal roles throughout the disease continuum, from metabolic dysfunction and fibrosis to malignant transformation. The enrichment of ECM remodeling and pro-inflammatory pathways in CE1-high samples further suggests that the microenvironment created by these gene expression changes may not only drive fibrosis but also promote carcinogenesis. Therefore, the survival association observed in HCC highlights the potential prognostic and pathogenic significance of these genes beyond MASH, warranting further investigation.

Our drug screening analysis revealed possible interactions between the Macrophages.6 marker gene AEBP1 and resveratrol, highlighting the potential for targeting specific cell states to modulate disease outcomes. Resveratrol, known for its antioxidant properties and its ability to modulate inflammation and fibrosis through the PI3K-Akt pathway, emerged as a promising therapeutic candidate for MASLD/MASH66,67,68. Future studies on its structure-activity relationship (SAR) could facilitate the development of derivatives with improved bioavailability and targeted action69. We found a good linear correlation between AEBP1 and the ECM module M1, which contains 14 collagen genes, consistent with previous reports64. According to the Mouse Genome Informatics (MGI) database, Aebp1−/− mice have decreased circulating cholesterol and triglyceride levels, cell proliferation, and many other liver-related phenotypic alterations70. Our MR analysis supported AEBP1 as a potential causal gene for MASH. Previous studies also suggest that AEBP1 is associated with inflammation, fibrosis, and MASH71,72,73.

Single-cell analysis indicated that AEBP1 expression was elevated in M2 macrophages, which may drive MASH progression. Although AEBP1 down-regulation was not significant following resveratrol treatment, several immunoglobulin genes, including IgG, were significantly down-regulated. These genes reflect B cell–derived immunoglobulin production, and their reduction suggests a dampened humoral immune response and decreased inflammatory activity, consistent with the proposed anti-inflammatory effects of resveratrol. Increased levels of IgG are independently linked to a higher risk of hepatic decompensation and mortality in MASH74. In addition, MFSD2A, which can modulate the response of intestinal endothelial cells to inflammation75, was significantly down-regulated. Resveratrol can reduce neuropathic pain in mice by maintaining a balance between the release of pro-inflammatory and anti-inflammatory cytokines76. Resveratrol’s potential to reduce immune responses and mitigate inflammation aligns with its protective effects in early-stage hepatocellular carcinoma77. Therefore, resveratrol may be a promising candidate for MASH treatment.

Our analysis has several limitations. EcoTyper relies on pre-defined reference signatures of cell types30, which may not fully capture the diversity of immune cell states, especially in complex diseases like MAFLD. This could lead to incomplete or biased identification of certain cell states. To address this, we validated our findings using multiple datasets78,79. We also applied complementary analytical methods, including WGCNA, single-cell analysis, and bidirectional MR, to ensure the robustness and reproducibility of the results. It is important to acknowledge that the findings are based on secondary analyses of public transcriptomic datasets. Experimental validation at the cellular level is necessary to confirm the functional relevance of the identified cell states and pathways. Such experiments would strengthen the translational impact of our findings and help establish these immune alterations as actionable targets in MAFLD.

In conclusion, this study enhances our understanding of the clinical relevance of cell states in MAFLD. The identified cell states, particularly CE1, which exhibits high discriminatory ability, may serve as diagnostic biomarkers for distinguishing MASLD from MASH. Moreover, targeting specific immune cell states or pathways, such as the PI3K-Akt signaling pathway enriched in CE1, could lead to the development of more effective therapeutic interventions. Our findings highlight the translational potential of resveratrol, a polyphenolic compound, for the treatment of MASLD. By modulating immune responses and extracellular matrix remodeling, resveratrol could reduce inflammation and fibrosis, which are critical drivers of MASH progression. However, further studies are needed to clarify the underlying molecular mechanisms, particularly regarding inflammation, lipid metabolism, and gut-liver interactions. To advance these insights, we plan to conduct comprehensive in vitro and in vivo experiments to elucidate the molecular pathways that regulate these cell states and identify key therapeutic targets. Additionally, clinical trials are essential to validate the efficacy of resveratrol and optimize its therapeutic application. Future studies should also investigate the pharmacokinetics of resveratrol, its interactions with immune cell states, and its effectiveness in larger, diverse patient populations.

Conclusions

In this study, we systematically characterized the immune cell state alterations associated with MAFLD progression from MASLD to MASH. By integrating transcriptomic deconvolution and co-expression network analysis, we identified specific immune cell states and ecotypes, particularly CE1, that are strongly associated with disease severity. Furthermore, the identification of resveratrol as a candidate compound targeting key immune cell markers such as AEBP1 provides preliminary insights into potential therapeutic strategies. These findings enhance the current understanding of immune microenvironment remodeling in MASLD and highlight possible molecular targets for future intervention. However, further experimental validation and clinical studies are warranted.