Introduction

Esophageal cancer (ESCA) represents a significant global health challenge, characterized by rising incidence rates and a profound impact on patient survival and quality of life1. The disease not only inflicts a heavy toll on affected individuals but also imposes substantial economic burdens on healthcare systems due to high treatment costs and lost productivity2. Current diagnostic and therapeutic modalities, including surgical intervention, chemotherapy, and radiotherapy, are hindered by limitations such as late-stage diagnosis and the development of treatment resistance3,4. These challenges underscore a critical research gap, necessitating the exploration of novel approaches to enhance early detection and treatment efficacy5. In this context, the present study aims to investigate the role of perfluorooctane sulfonate (PFOS), an environmental toxicant, in the pathogenesis of ESCA.

Recent studies suggest that environmental factors may play an important role in the development of esophageal cancer6,7. Among these, PFOS has attracted considerable attention as a persistent organic pollutant that is widespread in the environment and has potential carcinogenic properties, raising concerns about its impact on human health8. Prior research has reported correlations between PFOS exposure and various health risks, including cancer, highlighting the need to examine its specific effects on esophageal cancer9,10. Understanding how PFOS relates to the carcinogenic processes involved in ESCA could provide valuable insights into the disease’s etiology and inform preventive strategies. Furthermore, a recent investigation suggested that environmental exposures and the accumulation of PFOS in the body may contribute to the progression of esophageal squamous cell carcinoma11. Although a growing body of research links environmental pollutants to cancer, the mechanisms by which PFOS affects the development of esophageal cancer remain poorly understood. Most existing studies have focused on general correlations and have not examined the molecular alterations caused by PFOS exposure12,13,14. Addressing this gap calls for bioinformatics methods that can analyze large datasets, pinpoint key toxicological targets, and clarify their interactions with PFOS.

In this study, we adopt a comprehensive bioinformatics methodology that combines differential expression analysis, machine learning algorithms, and molecular docking simulations. These tools allow us to identify PFOS-related targets among the differentially expressed genes (DEGs) of esophageal cancer and to assess their potential roles in disease progression. By leveraging these techniques, we aim to uncover the molecular mechanisms linking PFOS to esophageal cancer and to identify candidate biomarkers that could aid in its diagnosis and treatment.

The primary objective of this research is to elucidate how PFOS impacts the development of esophageal cancer at a molecular level and to explore the potential of identified biomarkers for clinical application. By integrating bioinformatics approaches, we hope to provide a clearer understanding of the influence of environmental toxins on cancer pathogenesis and contribute to the ongoing efforts in cancer research and public health. The findings from this study could significantly advance our knowledge of esophageal cancer and facilitate the development of targeted therapies aimed at mitigating the effects of environmental toxins like PFOS.

Methods

Collection of ESCA-related targets

We used publicly available transcriptomic data from the TCGA-ESCA dataset, which comprises RNA-sequencing profiles of 163 ESCA tumor samples and 11 normal tissue samples. The data were obtained from The Cancer Genome Atlas (TCGA) portal (https://portal.gdc.cancer.gov/). DEGs between ESCA tumor and normal tissues were identified with the limma package in R, using an adjusted p-value (p.adj) < 0.05 as the significance threshold.
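The adjusted p-value threshold above can be illustrated with a short sketch. The Methods do not name the multiple-testing correction, so the example assumes the Benjamini-Hochberg procedure (the default adjust.method in limma's topTable); the function names and toy gene values are hypothetical, and in practice the adjusted values would come directly from limma rather than from this Python re-implementation.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH correction, as in limma's topTable default.
    """
    m = len(pvalues)
    # Indices sorted by ascending raw p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(gene_pvalues, alpha=0.05):
    """Return genes whose BH-adjusted p-value is below alpha (hypothetical helper)."""
    genes = list(gene_pvalues)
    adjusted = bh_adjust([gene_pvalues[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < alpha}


# Toy example with placeholder gene names (not study data)
toy = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.03}
print(sorted(select_degs(toy)))
```

Here GENE_C has a raw p-value of 0.03 and an adjusted value of 0.045, so it still passes the 0.05 cut-off, while GENE_B does not.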

Prediction of PFOS toxicity targets

Potential toxicity targets of PFOS were predicted using the Comparative Toxicogenomics Database (CTD) (https://ctdbase.org/) and SuperPred (https://prediction.charite.de/index.php)15,16. The overlap between the ESCA DEGs and the predicted PFOS targets was then identified with an online Venn diagram tool (https://bioinfogp.cnb.csic.es/tools/venny/index.html); these overlapping genes were treated as PFOS-related toxicity targets potentially involved in the pathogenesis of esophageal cancer.
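The intersection step is a plain set operation. A minimal sketch follows, assuming (the Methods do not say) that the CTD and SuperPred predictions were pooled by union before intersecting with the ESCA DEGs; the function name and gene lists are placeholders.

```python
def pfos_esca_overlap(esca_degs, ctd_targets, superpred_targets):
    """Genes that are both ESCA DEGs and predicted PFOS targets.

    Assumption: PFOS targets from CTD and SuperPred are pooled by union.
    """
    pfos_targets = set(ctd_targets) | set(superpred_targets)
    return sorted(set(esca_degs) & pfos_targets)


# Placeholder inputs, not study data
degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
ctd = ["GENE_A", "GENE_C", "GENE_X"]
superpred = ["GENE_B", "GENE_Y"]
print(pfos_esca_overlap(degs, ctd, superpred))  # ['GENE_A', 'GENE_B', 'GENE_C']
```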

Expression and PPI analysis

The expression levels of the identified toxicity targets were visualized as a heatmap generated with the “ComplexHeatmap” package. Protein-protein interaction (PPI) analysis was performed using the STRING database (https://cn.string-db.org/)17, and the resulting network was visualized in Cytoscape (version 3.8.2). The top 40 hub genes were selected by degree centrality (the number of connections per node), on the premise that highly connected nodes are more likely to be biologically pivotal; all other nodes were excluded to focus on the most interconnected and potentially influential targets.
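Degree-based hub selection can be sketched as follows, assuming the STRING network has been exported as an undirected edge list of gene-symbol pairs. Duplicate edges and self-loops are ignored, and ties in degree are broken alphabetically; the tie-breaking rule and the helper name are our assumptions, as the Methods do not specify them.

```python
from collections import Counter


def top_hub_genes(edges, k=40):
    """Return the k nodes with the highest degree in an undirected edge list."""
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # count each undirected edge once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; alphabetical order breaks ties (assumption)
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


# Toy network with placeholder node names
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
print(top_hub_genes(edges, k=2))  # ['A', 'B']
```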

Enrichment analyses

The Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/) is an integrated database platform for the systematic representation and computational analysis of biological systems18. To gain insight into the functional roles of the identified genes, Gene Ontology (GO) and KEGG enrichment analyses were performed with the “clusterProfiler” package (version 3.18.0). The GO analysis covered three categories: biological process (BP), cellular component (CC), and molecular function (MF). The top 10 enriched terms or pathways in each analysis were displayed as bar plots.
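Over-representation analysis of this kind rests on a one-sided hypergeometric test, the test used by clusterProfiler for GO and KEGG enrichment. The sketch below computes that tail probability with the standard library; the function and variable names are ours, and it omits the multiple-testing adjustment that is applied to the resulting p-values.

```python
from math import comb


def hypergeom_enrichment_p(k, n, M, N):
    """One-sided P(X >= k) for over-representation of a term.

    k: query genes annotated to the term (overlap)
    n: query genes with any annotation
    M: background genes annotated to the term
    N: background genes in total
    """
    upper = min(n, M)
    # math.comb returns 0 when the lower index exceeds the upper one
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: all 3 query genes fall in a 3-gene term within a 10-gene background
print(hypergeom_enrichment_p(k=3, n=3, M=3, N=10))  # equals 1/120
```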

Machine learning algorithms

Four machine learning algorithms were used to identify key toxicological targets. Random Forest (RF), an ensemble of decision trees, was implemented with the “randomForest” package. An XGBoost gradient-boosting model was built with the “xgboost” package. LASSO regression, which performs variable selection and regularization simultaneously, was fitted with the “glmnet” package. A Support Vector Machine (SVM) classifier was implemented with the “e1071” package. The overlap among the targets selected by the four algorithms was visualized as an UpSet plot using the “UpSetR” package. Detailed descriptions of the machine learning methods are provided in the Supplementary Methods.
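The consensus step that the UpSet plot summarizes amounts to counting how many methods selected each gene. This sketch uses placeholder selections; the min_methods parameter is a hypothetical knob that defaults to requiring agreement across every method.

```python
from collections import Counter


def consensus_targets(selections, min_methods=None):
    """Genes selected by at least `min_methods` methods (default: all of them)."""
    if min_methods is None:
        min_methods = len(selections)
    # Count each gene once per method, even if a method lists it twice
    counts = Counter(g for genes in selections.values() for g in set(genes))
    return sorted(g for g, c in counts.items() if c >= min_methods)


# Placeholder selections, not the study's results
selections = {
    "RF": ["GENE_A", "GENE_B", "GENE_C"],
    "XGBoost": ["GENE_A", "GENE_B", "GENE_D"],
    "LASSO": ["GENE_A", "GENE_B", "GENE_E"],
    "SVM": ["GENE_A", "GENE_B", "GENE_C", "GENE_F"],
}
print(consensus_targets(selections))  # ['GENE_A', 'GENE_B']
print(consensus_targets(selections, min_methods=2))  # ['GENE_A', 'GENE_B', 'GENE_C']
```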

Differential expression analysis

The expression of the core toxicity targets (PLAU, TOP2A, and BAX) in tumor versus normal tissues was compared using box plots, and their ability to discriminate tumor from normal samples was assessed with receiver operating characteristic (ROC) curves. Plots were generated in R with the “ggplot2” and “pROC” packages.
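The area under an empirical ROC curve equals the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, counting ties as one half (the Mann-Whitney formulation). A minimal pure-Python sketch, assuming higher expression indicates tumor and using made-up values:

```python
def roc_auc(tumor_values, normal_values):
    """Empirical AUC, assuming higher expression indicates tumor.

    Equals P(tumor > normal) + 0.5 * P(tie) over all tumor-normal pairs.
    """
    score = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                score += 1.0
            elif t == n:
                score += 0.5
    return score / (len(tumor_values) * len(normal_values))


# Made-up values: perfect separation gives AUC = 1
print(roc_auc([5.1, 6.3, 7.0], [2.2, 3.4]))  # 1.0
```

An AUC of 0.5 corresponds to no discrimination; the pairwise loop is quadratic, which is fine for cohort sizes such as 163 versus 11 samples.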

GSVA analysis

The CancerSEA database defines 14 functional states of cancer cells. A previous study introduced the combined z-score algorithm, which integrates the expression of signature genes to reflect pathway activity19. Using the R package GSVA with the z-score method (method = "zscore"), we computed a combined z-score for each of the 14 functional-state gene sets. Each score was then standardized with R's scale function and defined as the gene set score. Finally, Pearson correlation coefficients were calculated between the expression of each gene and each gene set score.
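The combined z-score and correlation steps can be sketched in plain Python. Following the combined z-score method, each gene is standardized across samples, and the standardized values of the genes in a set are summed and divided by the square root of the set size; the score is then rescaled as R's scale function would do. The expression dictionary, gene set, and helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Center and scale like R's scale() (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def combined_zscore(expr, gene_set):
    """Combined z-score per sample: sum of gene z-scores / sqrt(set size).

    expr maps gene -> expression values across the same ordered samples.
    """
    genes = [g for g in gene_set if g in expr]
    z_rows = [standardize(expr[g]) for g in genes]
    n_samples = len(z_rows[0])
    return [sum(row[j] for row in z_rows) / math.sqrt(len(genes)) for j in range(n_samples)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def gene_vs_state_correlation(expr, gene_set, gene):
    # Rescaling does not change Pearson's r, but yields comparable gene set scores
    score = standardize(combined_zscore(expr, gene_set))
    return pearson(expr[gene], score)


# Toy data: 4 samples, hypothetical genes
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "X": [4, 3, 2, 1]}
print(gene_vs_state_correlation(expr, {"G1", "G2"}, "X"))  # close to -1.0
```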

Immune cell infiltration analysis

The ESTIMATE algorithm, implemented in the “estimate” package, was used to compute stromal, immune, and ESTIMATE scores characterizing the tumor microenvironment. Immune cell infiltration was then quantified by single-sample Gene Set Enrichment Analysis (ssGSEA) with the “GSVA” package. Pearson correlation was used to assess the relationships between the core toxicity targets and the infiltration levels of individual immune cell types, and the resulting correlation matrices were displayed as heatmaps generated with the “pheatmap” package.

Cell experiment protocol

The ESO-26 and FLO-1 cell lines, obtained from the American Type Culture Collection (ATCC), were cultured in RPMI-1640 medium (Life Technologies, Shanghai, China) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin, and maintained at 37 °C in a humidified incubator with 5% CO₂. For treatment, cells were exposed to 2 µM PFOS or DMSO vehicle control for 48 h, then washed twice with PBS and harvested for downstream analysis. Total RNA was extracted with a Thermo Fisher Scientific RNA isolation kit, and cDNA was synthesized with the PrimeScript RT Reagent Kit. Gene expression was quantified by quantitative PCR (qPCR) on an ABI 7900HT system (Applied Biosystems), and relative expression levels were calculated with the 2^−ΔΔCt method.
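As a worked illustration of the 2^−ΔΔCt calculation (Livak and Schmittgen), the sketch below averages replicate Ct values per condition, one common simplification, and normalizes to a reference gene. The Methods do not name the reference gene, so the reference Ct inputs are an assumption, as are the function name and toy Ct values.

```python
from statistics import mean


def relative_expression(ct_target_trt, ct_ref_trt, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by the 2^-ddCt method.

    Each argument is a list of replicate Ct values; the reference gene is
    an assumed housekeeping gene, not one named in the Methods.
    """
    dct_treated = mean(ct_target_trt) - mean(ct_ref_trt)  # dCt in PFOS-treated cells
    dct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # dCt in vehicle control
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)


# Toy Ct values: target crosses threshold one cycle earlier after treatment
print(relative_expression([24.0], [18.0], [25.0], [18.0]))  # 2.0
```

Each cycle of difference corresponds to a twofold change, which is why a ΔΔCt of −1 yields a relative expression of 2.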

Molecular docking analysis

To investigate the binding affinities between PFOS and key proteins, we conducted molecular docking simulations. We sourced 3D protein structures from the PDB (https://www.rcsb.org/) and refined them by eliminating water molecules, heteroatoms, and organic ligands using PyMOL. The PFOS ligand was sourced from PubChem (https://pubchem.ncbi.nlm.nih.gov/) and converted to PDBQT format via AutoDockTools, followed by energy minimization. We defined a 3D grid around each protein’s active site, carefully choosing dimensions to encapsulate the binding area. Docking was executed in AutoDock Vina with the prepared protein and ligand files. Binding affinities for each pose were computed, and the poses with the lowest binding energies were chosen. Visualization of the docking results was done using PyMOL 2.5.2.
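Defining the search box can be sketched as computing the center and extent of the active-site atom coordinates and adding a margin. The Methods do not report box centers, sizes, or how the active site was defined, so the coordinates, the 8 Å padding, and the helper names below are hypothetical; the output keys follow the fields of an AutoDock Vina configuration file (center_x, size_x, and so on).

```python
def vina_box(coords, padding=8.0):
    """Center and size (in Å) of a box enclosing active-site atoms plus padding.

    coords: list of (x, y, z) tuples for atoms chosen to define the site.
    padding: hypothetical total margin added to each dimension.
    """
    xs, ys, zs = zip(*coords)
    center = tuple((max(a) + min(a)) / 2 for a in (xs, ys, zs))
    size = tuple(max(a) - min(a) + padding for a in (xs, ys, zs))
    return center, size


def vina_config_lines(center, size):
    """Format the box as lines of a Vina configuration file."""
    lines = [f"center_{axis} = {c:.3f}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s:.1f}" for axis, s in zip("xyz", size)]
    return lines


# Hypothetical active-site atom coordinates
site = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)]
center, size = vina_box(site)
print("\n".join(vina_config_lines(center, size)))
```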

Molecular dynamics (MD) simulations

MD simulations were performed with the Desmond/Maestro 2022.1 software package. The system was solvated with the TIP3P water model, and 0.15 M NaCl was added to neutralize the system charge and approximate physiological ionic strength. Before the production run, the system was energy-minimized with the steepest descent method to remove steric clashes and unfavorable contacts in the initial configuration. It was then equilibrated for 100 ps in the isothermal-isobaric (NPT) ensemble at 300 K and 1 bar to stabilize temperature and pressure. Finally, a 100 ns production simulation was run under the same conditions, and the trajectory was analyzed with Desmond’s built-in tools.
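The RMSD values reported in the Results follow the standard definition: the square root of the mean squared displacement of matched atoms. The sketch below assumes the frames have already been superimposed on a reference (trajectory analysis tools typically perform this fitting; it is not reproduced here) and uses made-up coordinates.

```python
import math


def rmsd(frame, reference):
    """RMSD (Å) between two equally ordered lists of (x, y, z) coordinates.

    Assumes both frames are already superimposed; no fitting is done here.
    """
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(frame, reference)
    )
    return math.sqrt(squared / len(frame))


# One atom displaced by (3, 4, 0) Å gives an RMSD of 5 Å
print(rmsd([(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)]))  # 5.0
```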

Statistical analysis

Data are expressed as mean values ± standard deviation (SD). Comparisons between groups were performed using an unpaired two-tailed Student’s t-test, with a p-value below 0.05 considered statistically significant.
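For reference, the unpaired Student's t-test assumes equal variances and uses a pooled variance estimate. The sketch below computes only the t statistic and degrees of freedom; the two-tailed p-value would then come from the t distribution (for example, via R's t.test(x, y, var.equal = TRUE), since R's default is Welch's test). The function name and replicate values are made up for illustration.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


# Made-up relative expression values for three replicates per group
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```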

Results

Identification and analysis of PFOS toxicity targets in esophageal cancer

Using the TCGA-ESCA dataset, we identified 5,757 DEGs between esophageal cancer and normal tissues. We then used the Comparative Toxicogenomics Database and SuperPred to predict 255 potential toxicity targets of perfluorooctane sulfonate (PFOS). Intersecting these two gene sets yielded 98 PFOS-related toxicity targets that are differentially expressed in ESCA (Fig. 1A and Table S1), providing a starting point for examining the molecular mechanisms by which PFOS may contribute to esophageal cancer. The heatmap in Fig. 1B shows the expression levels of these 98 targets in the TCGA-ESCA dataset; their distinct expression patterns between normal and tumor tissues suggest that they may participate in ESCA-associated biological processes. Fig. 1C shows the PPI network of the 98 targets, which reveals extensive interactions among these proteins and points to signaling pathways and biological processes that may be affected by PFOS exposure. Together, these results are consistent with a link between PFOS exposure and esophageal cancer and warrant further study of the underlying mechanisms.

Fig. 1

Analysis of PFOS toxicity targets in esophageal cancer. (A) Venn diagram showing the intersection of DEGs in esophageal cancer from the TCGA-ESCA dataset and 255 predicted PFOS toxicity targets, resulting in 98 overlapping genes. (B) Heatmap displaying the expression levels of the 98 overlapping toxicity targets in the TCGA-ESCA dataset, comparing normal and tumor tissues. The color gradient indicates the level of gene expression, with red representing higher expression and blue representing lower expression. (C) PPI network analysis of the 98 overlapping toxicity targets. Node colors reflect the degree of connectivity, and larger nodes represent more highly connected proteins.

Enrichment analyses of differentially expressed PFOS-related toxicity targets in esophageal cancer

To elucidate the biological functions and pathways associated with the 98 differentially expressed PFOS toxicity targets, we performed GO and KEGG enrichment analyses. As shown in Fig. 2A, the top 10 enriched BP terms included response to oxygen levels, response to hypoxia, response to decreased oxygen levels, epithelium migration, and epithelial cell migration. These results suggest that PFOS exposure may influence cellular responses to hypoxia and migration-related processes, both of which are critical in tumor progression and metastasis. The top 10 enriched CC terms included the serine/threonine protein kinase complex, cell-substrate junction, focal adhesion, cyclin-dependent protein kinase holoenzyme complex, and protein kinase complex (Fig. 2B). These components help maintain cellular architecture and support the signaling required for cell adhesion and migration. The top 10 enriched MF terms included heme binding, tetrapyrrole binding, protein serine/threonine/tyrosine kinase activity, voltage-gated calcium channel activity, and gated channel activity (Fig. 2C). These functions are essential for signal transduction and ion transport, which are often dysregulated in cancer cells. KEGG pathway enrichment analysis (Fig. 2D) identified several pathways associated with the 98 targets, including Small cell lung cancer, Thyroid hormone signaling pathway, MicroRNAs in cancer, Oocyte meiosis, Human T-cell leukemia virus 1 infection, Prostate cancer, AGE-RAGE signaling pathway in diabetic complications, and Central carbon metabolism in cancer. The involvement of these pathways suggests that PFOS exposure may perturb multiple signaling mechanisms and cellular processes that contribute to the development and progression of ESCA.
Taken together, these enrichment analyses indicate that the PFOS-related targets span a wide range of biological processes, cellular components, and molecular functions, suggesting mechanistic routes through which PFOS may influence carcinogenesis in the esophagus.

Fig. 2

Enrichment analyses of differentially expressed PFOS-related toxicity targets. GO enrichment analysis of the 98 differentially expressed PFOS toxicity targets categorized by BP (A), CC (B), and MF (C). (D) KEGG pathway enrichment analysis of the 98 targets identifying significant pathways.

Identification of key toxicological targets using machine learning algorithms

We applied four machine learning algorithms to the expression profiles of the top 40 toxicity targets derived from the PPI network to identify key toxicological targets associated with esophageal cancer. The RF algorithm identified 8 important toxicity targets (Fig. 3A), XGBoost identified 9 (Fig. 3B), LASSO identified 11 (Fig. 3C), and SVM identified 24 (Fig. 3D, E). The UpSet plot in Fig. 3F reveals 3 core toxicity targets (PLAU, TOP2A, and BAX) that were selected by all four algorithms. These consensus targets provide a focused basis for further investigation of the mechanisms linking PFOS exposure to esophageal cancer development.

Fig. 3

Identification of key toxicological targets using machine learning algorithms. (A) RF algorithm identified 8 significant toxicity targets based on their mean decrease Gini values. (B) XGBoost identified 9 important toxicity targets, ranked by feature importance. (C) LASSO algorithm identified 11 significant toxicity targets, with the tuning parameter (lambda) selection plot showing the binomial deviance. (D, E) SVM algorithm identified 24 significant toxicity targets: (D) five-fold cross-validated accuracy plot and (E) cross-validated error plot showing the optimal number of features for maximum model performance. (F) UpSet plot highlighting the 3 core toxicity targets (PLAU, TOP2A, BAX) commonly identified by the RF, XGBoost, LASSO, and SVM algorithms.

Survival analysis and expression levels of key targets in esophageal cancer

Survival analysis of the TCGA dataset revealed differences in overall survival according to the expression levels of PLAU, TOP2A, and BAX. The Kaplan-Meier curve for PLAU (Fig. 4A) shows that high PLAU expression was significantly associated with poorer overall survival (hazard ratio [HR] for high vs. low expression = 1.92, P = 0.047). High TOP2A expression showed a trend toward worse overall survival that did not reach statistical significance (P = 0.061; Fig. 4B). Similarly, high BAX expression showed a nonsignificant trend toward reduced overall survival (HR = 1.89, P = 0.057; Fig. 4C). We further compared the expression of these genes between normal and esophageal cancer tissues using the TNMplot database. PLAU (Fig. 4D) and TOP2A (Fig. 4E) were significantly overexpressed in esophageal cancer tissues compared with normal tissues (P < 0.001 for both), and BAX expression was also significantly higher in tumor tissues (P < 0.01; Fig. 4F). Collectively, these findings indicate that PLAU, TOP2A, and BAX are overexpressed in esophageal cancer and that high expression of these genes tends to be associated with poorer prognosis, although only the association for PLAU reached statistical significance. Their differential expression between tumor and normal tissues supports their potential as biomarkers and therapeutic targets in esophageal cancer.
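The Kaplan-Meier estimator behind these curves multiplies, at each distinct event time, the fraction of at-risk patients who survive. A minimal sketch with made-up follow-up data follows; the function name is hypothetical, and because the Methods do not describe how patients were split into high- and low-expression groups or how hazard ratios were estimated, neither step is reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Made-up data: the patient censored at time 2 shrinks the risk set without a drop
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```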

Fig. 4

Survival analysis and expression levels of key targets in esophageal cancer. Kaplan-Meier survival curves for overall survival of esophageal cancer patients from the TCGA dataset based on the expression levels of PLAU (A), TOP2A (B), and BAX (C). Gene expression levels of PLAU (D), TOP2A (E), and BAX (F) in normal and esophageal cancer tissues from the TNMplot database. **P < 0.01, ***P < 0.001.

Validation of core gene expression in esophageal cancer cells following PFOS exposure

PFOS exposure significantly increased the expression of PLAU, TOP2A, and BAX in esophageal cancer cells. In ESO-26 cells (Fig. 5A), PFOS treatment upregulated PLAU (P < 0.01), TOP2A (P < 0.05), and BAX (P < 0.01). In FLO-1 cells (Fig. 5B), PFOS exposure significantly increased the expression of PLAU (P < 0.001), TOP2A (P < 0.001), and BAX (P < 0.01). These results indicate that PFOS upregulates PLAU, TOP2A, and BAX in esophageal cancer cell lines, suggesting a potential mechanism by which PFOS may influence esophageal carcinogenesis.

Fig. 5

Upregulation of core genes in esophageal cancer cells following PFOS exposure. Gene expression levels of PLAU, TOP2A, and BAX in ESO-26 cells (A) and FLO-1 cells (B) treated with PFOS (red bars) compared to the control group (NC, green bars). Data are expressed as mean values ± standard deviation (SD). *P < 0.05, **P < 0.01, ***P < 0.001.

GSVA analysis of core toxicity targets

GSVA was performed to examine the associations between the core toxicity targets (PLAU, TOP2A, and BAX) and cancer-related functional states. As illustrated in Fig. S1A, PLAU expression was significantly and positively correlated with multiple functional states, including angiogenesis, apoptosis, differentiation, DNA damage, EMT, hypoxia, inflammation, invasion, metastasis, proliferation, and quiescence. TOP2A expression was significantly correlated with cell cycle, DNA damage, DNA repair, and proliferation, and was negatively correlated with hypoxia and inflammation (Fig. S1B). BAX was negatively correlated with differentiation and hypoxia (Fig. S1C). In sum, the GSVA results link PLAU, TOP2A, and BAX expression to key cancer-related functional states, suggesting that these targets may be important in the molecular pathology of PFOS-associated esophageal cancer and providing insight into potential mechanisms through which they contribute to cancer progression.

Correlation analysis of core toxicity targets with immune cell infiltration

We analyzed the correlations between the core toxicity targets (PLAU, TOP2A, and BAX) and immune cell infiltration using the ESTIMATE and ssGSEA algorithms. Figure S2A presents the correlation matrix of the core targets with the StromalScore, ImmuneScore, and ESTIMATEScore. PLAU was significantly and positively correlated with both the StromalScore (P < 0.001) and the ESTIMATEScore (P < 0.01). TOP2A was negatively correlated with the ImmuneScore (P < 0.01) and the ESTIMATEScore (P < 0.05). BAX showed no significant correlation with any of these scores. Figure S2B shows the ssGSEA results for immune cell infiltration. PLAU expression was positively correlated with iDC, macrophages, NK cells, Tcm, Th1 cells, Th2 cells, and several other immune cell types. TOP2A expression was negatively correlated with CD8 T cells, DC, mast cells, neutrophils, and other immune cells. BAX showed few significant correlations, among them a correlation with Tcm. These results suggest that PLAU and TOP2A may be associated with immune cell infiltration in esophageal cancer and could thereby influence the tumor microenvironment and immune responses in the context of PFOS exposure.

Molecular docking analysis

Molecular docking was performed to characterize the interactions between PFOS and the core toxicity targets (BAX, PLAU, and TOP2A). The docking results indicate strong predicted binding between BAX and PFOS, with a Vina score of -8.5 kcal/mol (Fig. 6A). PLAU showed a more moderate binding affinity for PFOS, with a Vina score of -7.1 kcal/mol (Fig. 6B). TOP2A exhibited the strongest binding affinity of the three targets, with a Vina score of -10.2 kcal/mol (Fig. 6C). These findings suggest that PFOS may directly interact with BAX, PLAU, and TOP2A, potentially influencing their functional roles in esophageal cancer pathology. To assess the stability of the PFOS-PLAU complex, we performed an MD simulation. As illustrated in Fig. 6D, the root mean square deviation (RMSD) of the PLAU protein remained relatively stable throughout the simulation, ultimately settling between 2.8 Å and 3.2 Å. The ligand RMSD of PFOS fluctuated initially before stabilizing at approximately 6.0 Å, indicating that PFOS shifted from its initial docked pose and then adopted a more stable binding conformation during the simulation.

Fig. 6

Molecular docking analysis of core toxicity targets with PFOS. Molecular docking of PFOS with BAX (A), PLAU (B), and TOP2A (C). MD simulation analysis of the interaction between PFOS and PLAU (D).

Discussion

ESCA is a significant global health concern characterized by its high mortality rates and complex treatment challenges. The disease is predominantly classified into two main histological types: squamous cell carcinoma and adenocarcinoma, each associated with distinct risk factors and pathophysiological mechanisms. Early detection remains a critical challenge, as most patients are diagnosed at advanced stages, leading to poor prognoses. The rising incidence of ESCA, particularly in high-risk regions, underscores the urgent need for innovative approaches to improve diagnosis and treatment, as well as to understand the underlying biological mechanisms that contribute to its development and progression.

This study aims to explore the potential relationship between PFOS, an environmental toxicant, and the development of esophageal cancer. By employing a comprehensive bioinformatics approach, including differential expression analysis, machine learning algorithms, and molecular docking simulations, this research seeks to identify key toxicological targets and elucidate the molecular pathways involved in ESCA related to PFOS exposure. The findings from this study are expected to provide valuable insights into the pathogenesis of ESCA and pave the way for the identification of novel biomarkers for early diagnosis and targeted therapeutic interventions.

The association between PFOS exposure and various cancers has attracted increasing interest in recent years. Although the present study cannot establish a direct causal link between PFOS exposure and esophageal carcinogenesis, our results point to several ways in which PFOS may contribute to esophageal cancer by disrupting key biological processes such as oxidative stress, DNA damage, and immune modulation. Previous studies have indicated that PFOS can disrupt endocrine functions and induce oxidative stress, leading to cellular damage and tumorigenesis20,21,22. Our enrichment analyses are consistent with these findings, showing significant involvement of biological processes such as response to hypoxia and epithelial cell migration, which are critical in cancer progression23. The enrichment of genes related to these processes suggests that PFOS may exacerbate the hypoxic tumor microenvironment and promote aggressive tumor behavior. Moreover, the identification of pathways such as the Thyroid hormone signaling pathway and Human T-cell leukemia virus 1 infection aligns with previous research implicating these pathways in various malignancies24,25. The AGE-RAGE signaling pathway has also been explored in cancer development and is known to contribute to inflammation, fibrosis, and tumor progression, which may be particularly relevant in the context of PFOS exposure26. Thyroid hormone signaling influences the regulation of cell differentiation and proliferation, with implications for tumor growth27; its dysregulation has been linked to various cancers, including esophageal cancer, and could contribute to tumor cell proliferation and survival in the presence of environmental toxicants such as PFOS. The enrichment of the MicroRNAs in cancer pathway in our KEGG analysis is particularly noteworthy.
MicroRNAs are small non-coding RNAs that regulate gene expression at the post-transcriptional level and have been implicated in cancer progression, including esophageal cancer28. We hypothesize that PFOS exposure may disrupt normal microRNA expression, leading to the upregulation of oncogenes and downregulation of tumor suppressor genes, which could in turn contribute to cancer initiation and progression.

The machine learning algorithms used in this study helped narrow down the core toxicological targets associated with ESCA, and the convergence of results from RF, XGBoost, LASSO, and SVM supports the robustness of these findings. The identification of PLAU, TOP2A, and BAX as core targets is particularly notable. Beyond their association with esophageal cancer, these genes have established functions in tumorigenesis, cell survival, and metastasis that could plausibly link them to carcinogenesis, and we consider each in turn. PLAU expression was significantly upregulated in tumor tissues relative to normal tissues, consistent with previous reports linking PLAU to cancer progression29. PLAU has been implicated in multiple oncogenic processes, including angiogenesis, invasion, and metastasis, through regulation of extracellular matrix degradation and promotion of cell migration30,31. In our analysis, PLAU expression was significantly correlated with key oncogenic functional states such as epithelial-mesenchymal transition (EMT), invasion, and metastasis. This aligns with findings in other cancers, where PLAU overexpression facilitates tumor progression by altering the tumor microenvironment32. The positive correlation between PLAU expression and immune cell infiltration (e.g., macrophages, NK cells, and T cells) further supports a role for PLAU in shaping the tumor immune microenvironment, as previously demonstrated in pancreatic ductal adenocarcinoma32. Thus, PLAU emerges as a promising candidate diagnostic biomarker and potential therapeutic target for PFOS-related esophageal cancer. TOP2A expression was also significantly elevated in esophageal cancer tissues, consistent with findings in various other cancers33,34.
TOP2A is crucial for DNA replication and repair, and its dysregulation can lead to genomic instability, a hallmark of cancer35. Our study’s GSVA analysis indicated that TOP2A expression strongly correlated with DNA damage and proliferation. These findings align with earlier reports that showed elevated TOP2A expression in tumors correlates with higher cell proliferation rates and poor prognosis36,37. Moreover, TOP2A’s negative correlation with hypoxia and inflammation pathways in our analysis suggests that TOP2A may play a complex role in modulating the tumor microenvironment, potentially influencing tumor adaptation to stress conditions such as hypoxia and immune infiltration38. In the context of PFOS exposure, the upregulation of TOP2A may be a result of DNA damage induced by environmental toxins. PFOS has been shown to disrupt the DNA repair mechanisms in various cell types, which may enhance the oncogenic potential of molecules like TOP2A39,40. These findings underscore the potential of TOP2A as both a biomarker for esophageal cancer and a target for therapeutic interventions. Interestingly, BAX exhibited a more variable expression pattern in our study, with upregulation observed in cancer tissues compared to normal tissues. The pro-apoptotic protein BAX plays a pivotal role in regulating cell survival and apoptosis by interacting with other Bcl-2 family members to promote mitochondrial outer membrane permeabilization and caspase activation41. The negative correlations observed between BAX and differentiation and hypoxia pathways suggest that BAX may contribute to the deregulation of apoptotic signaling in esophageal cancer, potentially allowing for tumor cell survival in adverse conditions such as hypoxia. These results corroborate findings in other cancers where reduced BAX expression is linked to poor prognosis and resistance to apoptosis42,43. 
Thus, BAX might serve as an additional target for therapeutic strategies aimed at restoring apoptotic sensitivity in PFOS-exposed esophageal cancer cells.
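The convergence of feature selections across RF, XGBoost, LASSO, and SVM can be illustrated with a small consensus-voting sketch. It assumes that each algorithm has already produced its own list of selected genes and that "convergence" means genes chosen by every method, or by at least a chosen number of methods; the exact rule used in this study is not stated here. The function name `consensus_features` and the gene names `GENE_A` to `GENE_E` are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def consensus_features(selections, min_methods=None):
    """Return features selected by at least `min_methods` methods.

    `selections` maps a method name (e.g. "RF") to the features it selected.
    When `min_methods` is None, every method must agree (strict intersection).
    """
    if not selections:
        return []
    if min_methods is None:
        min_methods = len(selections)
    votes = Counter()
    for features in selections.values():
        # set() so a method cannot vote twice for the same feature
        votes.update(set(features))
    kept = [f for f, n in votes.items() if n >= min_methods]
    # Most-supported features first; ties broken alphabetically for reproducibility
    return sorted(kept, key=lambda f: (-votes[f], f))


# Hypothetical selections from four algorithms (placeholder gene names)
selections = {
    "RF": ["GENE_A", "GENE_B", "GENE_C"],
    "XGBoost": ["GENE_A", "GENE_B", "GENE_D"],
    "LASSO": ["GENE_A", "GENE_B", "GENE_E"],
    "SVM": ["GENE_A", "GENE_B", "GENE_C"],
}
print(consensus_features(selections))                 # ['GENE_A', 'GENE_B']
print(consensus_features(selections, min_methods=2))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A strict intersection is the most conservative choice; lowering `min_methods` admits genes supported by fewer algorithms, trading robustness for sensitivity.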

The role of PFOS in esophageal cancer development is a growing area of interest, and our study offers new insights into the molecular mechanisms by which this environmental toxicant may promote carcinogenesis. PFOS, a perfluorinated compound, is known to induce oxidative stress, DNA damage, and inflammation in various cell types44. Our findings suggest that PFOS exposure may dysregulate critical genes involved in DNA repair, apoptosis, cell proliferation, and immune response, potentially contributing to the development and progression of esophageal cancer. In addition, our immune cell infiltration analysis suggested that PLAU and TOP2A may help modulate the tumor immune microenvironment, raising the possibility that PFOS exposure affects interactions between cancer cells and immune cells. Immunosuppressive effects of PFOS have been documented in other studies, which report that PFOS exposure can induce immunotoxicity and skew immune responses45. These findings point to the importance of investigating the immunomodulatory effects of PFOS in future research, as targeting the tumor microenvironment may offer new therapeutic avenues for PFOS-related cancers.
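The correlations between a gene's expression and immune infiltration (or GSVA pathway) scores across samples can be sketched as a per-sample rank correlation. The coefficient used is not specified in this section; the sketch assumes Spearman's rho, a common choice for such comparisons, computed as the Pearson correlation of average ranks so that tied values are handled. In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value. The helper names and input values below are hypothetical, not study data.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if either variable is constant
    return cov / (sx * sy)


# Hypothetical per-sample values: one gene's expression and one infiltration score
expression = [2.1, 3.4, 5.0, 6.2, 7.8]
score = [0.10, 0.25, 0.22, 0.40, 0.55]
print(round(spearman_rho(expression, score), 3))  # 0.9
```

Because a rank correlation depends only on ordering, it is robust to the differing scales of expression values and infiltration or pathway scores; when many genes or cell types are tested, the resulting p-values would also need multiple-testing correction.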

The limitations of this study warrant careful consideration. First, the absence of wet-lab experiments to validate the bioinformatics findings limits our ability to confirm the biological relevance of the identified differentially expressed genes and toxicological targets. Second, without clinical validation, the translational potential of the identified biomarkers remains uncertain, limiting their applicability in real-world settings. These constraints underscore the need for future studies that incorporate experimental validation and more diverse datasets to substantiate our findings. Finally, while this study focused on PFOS because of its recognized carcinogenic potential and emerging evidence linking it to esophageal cancer, other environmental toxicants may also contribute to disease pathogenesis. Future studies should extend this framework to comparative analyses of additional pollutants, such as PFOA, BPA, or heavy metals, to provide a more comprehensive understanding of the role of environmental toxicants in cancer progression.

Conclusions

In conclusion, this research provides insights into the molecular mechanisms by which PFOS exposure may contribute to the development of esophageal cancer. The identification of key differentially expressed genes and candidate biomarkers highlights the importance of environmental factors in cancer pathogenesis. A multidisciplinary approach integrating bioinformatics, experimental validation, and clinical assessment will be pivotal in advancing our understanding of the interplay between environmental toxicants and cancer progression.