Abstract
PLOD3 (procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3), a key enzyme in collagen post-translational modification, is critical for maintaining the structural integrity of the extracellular matrix (ECM). Dysregulation of PLOD3 has been implicated in various malignancies, including colorectal cancer (CRC). This study aimed to elucidate the role of PLOD3 in CRC and to evaluate its potential as a prognostic biomarker and therapeutic target. We analyzed data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) to evaluate PLOD3 expression in CRC. Univariate and multivariate Cox regression analyses were conducted to investigate its impact on overall survival. Functional assays, including wound healing, Transwell migration, and invasion assays, were carried out in CRC cell lines with modified PLOD3 expression to elucidate its role in regulating tumor cell behavior. Furthermore, Gene Set Enrichment Analysis (GSEA) was employed to identify signaling pathways associated with PLOD3 expression. Our findings demonstrate that PLOD3 is significantly overexpressed in CRC tissues compared with normal tissues, and that its elevated expression is associated with poor prognosis and reduced overall survival. We also developed a RiskScore model incorporating PLOD3 and 11 other genes, which showed strong predictive performance for patient outcomes. Functional experiments showed that PLOD3 knockdown impaired CRC cell migration and invasion. GSEA linked high PLOD3 expression to the activation of epithelial-mesenchymal transition (EMT) and metastasis-related pathways. In conclusion, PLOD3 plays a pivotal role in CRC progression by promoting tumor growth and metastasis. Its elevated expression serves as an independent prognostic marker and a potential target for therapeutic intervention, offering new insights into the molecular mechanisms driving CRC.
Introduction
Colorectal cancer ranks as the third most common cancer globally and is the second leading cause of cancer-related mortality. According to GLOBOCAN 2020, there were approximately 1.9 million new cases and 930,000 deaths worldwide in 2020, with an emerging trend of increasing incidence among younger age groups in high-income countries1. This highlights a growing public health challenge. Current guidelines from the Chinese Society of Clinical Oncology (CSCO) and the National Comprehensive Cancer Network (NCCN) recommend treatment modalities for CRC that include surgical resection, radiotherapy, chemotherapy, and targeted therapies. However, early detection and intervention remain critical for improving patient survival2. In recent years, advances in molecular biology have facilitated the identification and clinical application of numerous biomarkers, significantly enhancing the diagnosis and treatment of CRC.
The PLOD gene family encodes a group of enzymes involved in the post-translational modification of collagen, with PLOD3 encoding lysyl hydroxylase 3 (LH3). Aberrant expression of PLOD3 has been observed in various tumor types. Emerging evidence suggests that PLOD3 contributes to tumor development and progression by modulating extracellular matrix (ECM) stability and promoting tumor cell invasiveness3. However, the precise role and underlying mechanisms of PLOD3 in CRC remain poorly understood, and its potential as a biomarker for diagnosis or therapy has yet to be fully explored.
This study examines the expression pattern of PLOD3 in CRC and its association with tumor biological behaviors through a comprehensive analysis of gene expression data and single-cell sequencing information obtained from publicly available databases. Furthermore, a risk prediction model was constructed using machine learning approaches, and the relationship between PLOD3 expression and the immune microenvironment, as well as its potential impact on immunotherapy response, was investigated. Finally, the functional role of PLOD3 in CRC was validated through in vitro experiments, highlighting its influence on the biological characteristics of CRC cells. These findings provide a novel perspective for the early diagnosis and treatment of CRC.
Methods
Single-cell clustering and cell annotation analysis
The single-cell dataset GSE161277⁴ was retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), and CRC and normal tissue samples were selected for analysis. The R package Seurat (v5.1.0)5 was used to load and preprocess the single-cell data. The ‘PercentageFeatureSet’ function was used to calculate the proportion of mitochondrial gene expression in each cell. Cells were retained if they expressed more than 250 genes, had a mitochondrial content below 20%, had at least 500 UMIs, and had a ‘log10GenesPerUMI’ value greater than 0.8 (calculated as log10(nFeature_RNA) / log10(nCount_RNA)).
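The cell-level QC rules above can be expressed as a small filter. This is a minimal Python sketch, not the authors' Seurat code: the per-cell values and the function names (`passes_qc`, `log10_genes_per_umi`) are hypothetical, and it assumes the complexity metric is log10(nFeature_RNA) / log10(nCount_RNA), the usual convention for a 0.8 cutoff.

```python
import math

# QC thresholds taken from the Methods text.
MIN_GENES = 250        # nFeature_RNA must be > 250
MAX_MITO_PCT = 20.0    # percent mitochondrial reads must be < 20
MIN_UMI = 500          # nCount_RNA must be >= 500
MIN_COMPLEXITY = 0.8   # log10GenesPerUMI must be > 0.8


def log10_genes_per_umi(n_genes: int, n_umi: int) -> float:
    """Library complexity: log10(genes detected) / log10(UMIs counted)."""
    return math.log10(n_genes) / math.log10(n_umi)


def passes_qc(n_genes: int, n_umi: int, mito_pct: float) -> bool:
    """True if a cell meets all four QC criteria.

    The UMI check runs before the complexity ratio, so log10(n_umi) is never zero.
    """
    return (
        n_genes > MIN_GENES
        and mito_pct < MAX_MITO_PCT
        and n_umi >= MIN_UMI
        and log10_genes_per_umi(n_genes, n_umi) > MIN_COMPLEXITY
    )


# Hypothetical cells: (nFeature_RNA, nCount_RNA, percent.mt)
cells = {
    "cell_A": (1200, 4000, 5.0),   # passes every filter
    "cell_B": (200, 800, 3.0),     # too few genes detected
    "cell_C": (900, 3000, 35.0),   # mitochondrial fraction too high
}
kept = [name for name, (g, u, m) in cells.items() if passes_qc(g, u, m)]
print(kept)  # ['cell_A']
```

Ordering the checks this way keeps the ratio well defined and makes each rejection reason easy to trace.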
Subsequently, the filtered GSE161277 dataset was log-normalized, and highly variable genes were identified using the FindVariableFeatures function with default parameters, which selects genes by the variance-stabilizing transformation (VST) method. To address batch effects among samples, canonical correlation analysis (CCA) was performed, and anchors for sample integration were identified with the FindIntegrationAnchors function.
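Seurat's LogNormalize method divides each count by the cell's total count, multiplies by a scale factor (10,000 by default), and applies a natural-log transform with a pseudocount of 1. The sketch below reproduces that arithmetic for a single cell in plain Python; the gene counts are invented for illustration, and it does not cover the VST-based variable-feature selection or the CCA integration step.

```python
import math


def log_normalize(counts: dict, scale_factor: float = 10_000) -> dict:
    """Seurat-style LogNormalize for one cell: ln(1 + count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Hypothetical raw UMI counts for one cell (total = 100).
cell = {"PLOD3": 5, "GAPDH": 95, "EPCAM": 0}
norm = log_normalize(cell)
print(norm)
```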
After data integration with the “IntegrateData” function, we used the “ScaleData” function to scale gene expression across all features. Principal component analysis (PCA) was then performed for dimensionality reduction, and cells were clustered with the “FindNeighbors” and “FindClusters” functions.
Following clustering, we annotated the cell populations based on previously established CRC and immune cell markers6,7,8,9,10,11, enabling analysis of the distribution of cell types within each sample. Marker genes for each cell type were identified using the “FindAllMarkers” function, with selection criteria of FC > 2 and FDR < 0.05. Finally, functional enrichment analysis of these marker genes was performed using the “clusterProfiler” package to explore their potential biological significance.
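The Methods specify FC > 2 and FDR < 0.05 but not the correction procedure. The sketch below assumes Benjamini-Hochberg FDR and treats FC > 2 as avg_log2FC > 1. Note that Seurat's FindAllMarkers reports Bonferroni-adjusted p-values in its p_val_adj column by default, so this illustrates the filtering logic rather than reproducing the authors' pipeline. Gene names and values are hypothetical.

```python
import math


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(results, min_fc=2.0, max_fdr=0.05):
    """results: list of (gene, avg_log2FC, p_value); keep FC > min_fc and FDR < max_fdr."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [gene for (gene, lfc, _), q in zip(results, fdr)
            if lfc > math.log2(min_fc) and q < max_fdr]


# Hypothetical differential-expression results for one cluster.
results = [
    ("COL1A1", 3.2, 1e-10),
    ("COL3A1", 2.5, 1e-6),
    ("ACTB", 0.3, 1e-4),     # fold change too small
    ("GENE_X", 1.8, 0.04),
]
markers = select_markers(results)
print(markers)  # ['COL1A1', 'COL3A1', 'GENE_X']
```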
Cellular communication and ligand analysis
We used the “CellChat” R package (version 1.6.0)12 to analyze intercellular communication based on the single-cell data and the defined cell types. The analysis used the “CellChatDB.human” database built into “CellChat”, and only cell types containing at least 10 cells were included.
Expression profile data processing and analysis
We retrieved expression profile data and associated clinical information for colon adenocarcinoma (COAD) from the TCGA database. Additionally, we downloaded the raw data for datasets GSE17538¹³, GSE33113¹⁴, GSE39582¹⁵, GSE14333¹⁶, and GSE38832¹⁷ from the GEO database and retained the samples profiled on the GPL570 platform.
The RMA (robust multi-array average) function in the R package affy (version 1.66.0)18 was used to process and normalize the expression profile data. Probes were mapped to gene symbols using the GPL570 platform annotation file. When multiple probes mapped to the same gene symbol, their median expression value was assigned to that gene; probes mapped to multiple gene symbols were excluded. Only CRC tumor samples with complete survival time and survival status information were retained for further analysis.
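The probe-collapsing rule can be sketched as follows. As an illustration, it assumes that multi-gene probes are recognizable by the "///" separator used in Affymetrix annotation files; the function name `collapse_probes` and the example probes are hypothetical.

```python
from collections import defaultdict
from statistics import median


def collapse_probes(expr: dict, probe_to_symbol: dict) -> dict:
    """Collapse probe-level expression to gene level.

    expr: {probe_id: [value per sample]}
    probe_to_symbol: {probe_id: gene symbol annotation}
    Probes with no symbol, or whose annotation lists several symbols
    (assumed to be separated by '///'), are dropped. Remaining probes
    for the same gene are combined by the per-sample median.
    """
    rows_by_gene = defaultdict(list)
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe, "").strip()
        if not symbol or "///" in symbol:
            continue
        rows_by_gene[symbol].append(values)
    return {
        gene: [median(sample_values) for sample_values in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# Hypothetical probes measured in two samples.
expr = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 9.0], "p4": [7.0, 7.0]}
annotation = {"p1": "PLOD3", "p2": "PLOD3", "p3": "PLOD3", "p4": "GENE1 /// GENE2"}
collapsed = collapse_probes(expr, annotation)
print(collapsed)  # {'PLOD3': [3.0, 4.0]}
```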
The six datasets (TCGA-COAD, GSE17538, GSE33113, GSE39582, GSE14333, and GSE38832) were integrated into a single dataset, hereafter referred to as Meta. Batch effects across datasets were corrected using the ComBat function from the sva package. Subsequently, we applied single-sample gene set enrichment analysis (ssGSEA) to the expression data to compute enrichment scores for the marker gene sets derived from the single-cell analysis.
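To show what an ssGSEA score measures, the sketch below computes a raw single-sample enrichment score in the style of Barbie et al.: genes are ranked by expression within one sample, and the score sums, over all ranks, the difference between a rank-weighted cumulative distribution for genes in the set and the unweighted cumulative distribution for genes outside it. It is a simplification under stated assumptions (weight exponent alpha = 0.25, no tie handling, no normalization across samples); the Methods do not name the implementation used, and the gene names are placeholders.

```python
def ssgsea_score(expr: dict, gene_set, alpha: float = 0.25) -> float:
    """Raw (unnormalized) ssGSEA-style enrichment score for one sample.

    expr: {gene: expression value in this sample}
    gene_set: gene symbols in the signature (e.g. fibroblast markers)
    """
    members = set(gene_set) & set(expr)
    genes = sorted(expr, key=lambda g: expr[g], reverse=True)  # highest first
    n, n_hit = len(genes), len(members)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the profile without covering all genes")
    # Rank weight: the top gene gets rank n, the bottom gene rank 1.
    hit_weight_total = sum((n - i) ** alpha for i, g in enumerate(genes) if g in members)
    p_hit = p_miss = score = 0.0
    for i, g in enumerate(genes):
        if g in members:
            p_hit += (n - i) ** alpha / hit_weight_total   # weighted ECDF of set genes
        else:
            p_miss += 1.0 / (n - n_hit)                    # ECDF of non-set genes
        score += p_hit - p_miss
    return score


# Hypothetical expression profile for one sample.
sample = {"GENE_A": 10.0, "GENE_B": 8.0, "GENE_C": 5.0, "GENE_D": 1.0}
top = ssgsea_score(sample, {"GENE_A", "GENE_B"})     # set genes highly expressed
bottom = ssgsea_score(sample, {"GENE_C", "GENE_D"})  # set genes lowly expressed
print(top, bottom)
```

A positive score indicates that the signature genes sit near the top of the sample's expression ranking.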
Building prognostic models and evaluating their prognostic performance
We performed univariate Cox regression analysis on fibroblast marker genes in the TCGA dataset, applying a p-value threshold of 0.01 to identify survival-associated genes. We then extracted the expression profiles of these genes from six independent datasets: TCGA, GSE39582, GSE17538, GSE33113, GSE14333, and GSE38832. The TCGA dataset was designated as the training set, and the remaining five datasets served as independent validation sets.
To identify the optimal model, we applied 36 different machine learning methods19 and then conducted multivariate Cox regression analysis on the selected genes to determine the risk coefficient of each gene; the results were visualized for comparison. We then computed a risk score for each sample from the expression levels of the model genes and plotted Kaplan-Meier (KM) survival curves to evaluate survival differences.
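According to the Results, the candidate models were compared by their concordance index (C-index). A minimal sketch of Harrell's C-index for right-censored data is shown below. It skips pairs with tied survival times, a simplification that established implementations handle more carefully, and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores) -> float:
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable if subject i had an event and a strictly
    shorter follow-up time than subject j. It is concordant if subject i
    also has the higher predicted risk; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = death), model risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.4, 0.1]
print(harrell_c_index(times, events, risk))  # 1.0
```

A C-index of 0.5 corresponds to random ranking; averaging it across the training and validation cohorts gives one summary per method.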
In the TCGA-COAD, GSE39582, GSE17538, and GSE14333 datasets, we performed univariate and multivariate Cox regression analyses using clinical data to derive hazard ratios (HRs), their 95% confidence intervals (CIs), and associated P values. The significant clinical characteristics identified in these analyses, together with the risk score, were used to construct a nomogram for survival prediction. We then validated the nomogram using calibration curves for 1-, 3-, and 5-year survival and visualized its performance. Finally, we conducted decision curve analysis (DCA) and compared the area under the receiver operating characteristic (ROC) curve (AUC) across time points.
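Decision curve analysis rests on the net-benefit formula NB(pt) = TP/n − (FP/n) × pt / (1 − pt), where pt is the threshold probability. The sketch below applies it to a binary outcome at a fixed time horizon and ignores censoring, which survival DCA implementations account for; the outcomes and predicted risks are invented.

```python
def net_benefit(y_true, predicted_risk, threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: 1 if the event occurred by the chosen horizon, else 0 (censoring ignored).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, predicted_risk) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, predicted_risk) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes and predicted risks for ten patients.
y = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
risk = [0.8, 0.7, 0.6, 0.2, 0.3, 0.1, 0.1, 0.05, 0.05, 0.05]
nb_model = net_benefit(y, risk, threshold=0.25)
nb_all = net_benefit_treat_all(y, threshold=0.25)
print(nb_model, nb_all)
```

Plotting these values over a range of thresholds, together with the "treat none" line at zero, yields the decision curve.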
Analysis of immune-related properties and functions in the model
We applied the ESTIMATE algorithm to the Meta dataset to obtain stromal, immune, and ESTIMATE scores, and compared these scores between the high-risk and low-risk groups. Immune checkpoint-related genes were identified with reference to PMID 31043417; we then calculated the correlation between these genes and the risk score in the Meta dataset and presented the results as a heatmap.
Subsequently, we used the “limma” package to identify differentially expressed genes (DEGs) between the high-risk and low-risk groups in the Meta dataset, using thresholds of |FC| > 1.2 and p < 0.05. KEGG pathway enrichment analysis was then performed on the identified DEGs20. Additionally, we used the GSEA function in the clusterProfiler package to perform KEGG pathway enrichment analysis comparing the high-risk and low-risk groups in the Meta dataset.
The potential of the model in immunotherapy
We utilized the online tool TIDE (Tumor Immune Dysfunction and Exclusion)21,22 to predict immunotherapy responses in the Meta dataset. The correlation between RiskScore and TIDE scores was assessed, and the probabilities of immunotherapy response were compared between high-risk and low-risk groups.
Subsequently, we performed a SubMap analysis comparing the expression profiles of the high- and low-risk groups in the Meta dataset with those of IMvigor210, a published cohort of patients treated with anti-PD-L1 therapy. Additionally, we obtained the immunotherapy datasets GSE78220²³ and GSE135222²⁴, applied our risk score formula to calculate risk scores for their samples, and compared the proportions of immunotherapy responders between the high- and low-risk groups in each dataset.
Finally, to evaluate sensitivity to individual drugs, we used the R package oncoPredict to estimate IC50 values based on GDSC1 data and analyzed the correlation between drug IC50 values and RiskScore.
Mendelian randomization analysis of model genes and CRC
We obtained the GWAS dataset for PLOD3 (prot-a-2305) and the GWAS dataset for CRC (bbj-a-107) from the OpenGWAS database (https://gwas.mrcieu.ac.uk/). SNPs from the prot-a-2305 dataset were selected as instrumental variables at P < 5 × 10⁻⁶. Their associations with CRC were then analyzed in the bbj-a-107 dataset using five Mendelian randomization methods: MR Egger, Weighted median, Inverse variance weighted, Simple mode, and Weighted mode.
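For reference, the inverse variance weighted (IVW) estimate combines per-SNP Wald ratios (beta_outcome / beta_exposure), each weighted by beta_exposure² / se_outcome². The sketch below computes the fixed-effect IVW estimate and its odds ratio from summary statistics. The SNP effect sizes are hypothetical, and MR software may report a different standard error (for example, under a multiplicative random-effects model) than the fixed-effect one used here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted MR estimate.

    Returns (beta, se, odds_ratio). beta is the weighted mean of per-SNP
    Wald ratios beta_outcome / beta_exposure with weights
    beta_exposure**2 / se_outcome**2; se uses these first-order weights.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, se, math.exp(beta)


# Hypothetical summary statistics for two instrumental SNPs.
beta, se, odds_ratio = ivw_estimate([0.2, 0.4], [0.02, 0.04], [0.01, 0.01])
print(round(beta, 4), round(odds_ratio, 4))
```

An odds ratio above 1 corresponds to a positive causal estimate on the log-odds scale, which is how the Results interpret the OR values.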
Experimental verification
To further elucidate the association between PLOD3, one of the RiskScore model genes, and CRC, we performed in vitro experiments.
Cell culture and transfection
Normal intestinal epithelial cells (NCM460) and human CRC cell lines (SW1116 and Caco2) were obtained from BNCC. Cells were cultured in DMEM (Gibco, USA) supplemented with 10% fetal bovine serum (FBS; Invitrogen, Carlsbad, CA, USA) and 1% penicillin/streptomycin, and maintained at 37 °C in an atmosphere of 5% CO2. The negative control (Vector) and a PLOD3-targeting siRNA (5’-GCGGTGATGAACTTTGTGGTT-3’) were transfected into SW1116 and Caco2 cells using Lipofectamine 2000 (Invitrogen, USA).
RNA extraction and quantitative reverse transcription polymerase chain reaction (RT-PCR)
Total cellular RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA). The total RNA was then reverse transcribed into cDNA according to the manufacturer’s instructions for the PrimeScript™ RT Kit (TaKaRa, Otsu, Shiga, Japan). Quantitative PCR amplification was performed using the SYBR Premix Ex Taq II (TaKaRa, Otsu, Shiga, Japan) on the TaKaRa system, following the standard real-time PCR protocol to quantify the relative mRNA expression levels.
The RT-PCR primers were sourced from Gene Pharma (Shanghai Gene Pharma, Shanghai, China). mRNA expression levels were normalized to GAPDH, and the relative gene expression was calculated using the 2^−ΔΔCt method.
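As a worked example of the 2^−ΔΔCt (Livak) calculation, the function below normalizes the target Ct to GAPDH in both the test and control samples and converts the difference to a fold change. The Ct values are invented for illustration and are not data from this study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene, e.g. GAPDH), computed per sample
    ddCt = dCt(test sample) - dCt(control sample)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: PLOD3 and GAPDH in a CRC line versus NCM460.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)  # 4.0
```

A target that crosses threshold two cycles earlier (after GAPDH normalization) corresponds to a fourfold increase, assuming near-100% amplification efficiency.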
PLOD3: Forward: GACCCGGTCAACCCAGAGA, Reverse: CTCCACCAACTGTTCGAGCC.
GAPDH: Forward: GTCTCCTCTGACTTCAACAGCG, Reverse: ACCACCCTGTTGCTGTAGCCAA.
Protein extraction and Western blot analysis
Total cellular proteins were extracted using RIPA buffer (Beyotime Biotechnology Co., Shanghai, China) containing protease inhibitors (Roche Diagnostics, Indianapolis, IN, USA). Protein concentration was determined according to the instructions provided with the BCA kit (Thermo Fisher Scientific, Waltham, MA, USA). A total of 40 µg of protein was separated by 10% SDS-PAGE and subsequently transferred onto a polyvinylidene fluoride (PVDF) membrane (Millipore, Billerica, MA, USA).
The membrane was blocked with 5% fat-free milk at 25 °C for 1 h, followed by overnight incubation with primary antibodies at 4 °C. The primary antibodies (details provided in Table 1) were sourced from Abcam (Cambridge, MA, USA). After washing, the membrane was incubated with a corresponding secondary antibody (1:50,000) at 25 °C for 1 h.
Immunoreactive protein bands were detected using the Pierce ECL detection system (Thermo Fisher Scientific, Waltham, MA, USA), and quantitative analysis was performed using ImageJ software (NIH, Bethesda, MD, USA).
Transwell
Cells (5 × 10⁴) were seeded into Transwell chambers that were either pre-coated with Matrigel (BD Biosciences, USA) for invasion assays or left uncoated for migration assays. The upper chamber was filled with serum-free medium, while the lower chamber contained complete medium. After 24 h of incubation, the migrated or invaded cells were fixed with 4% paraformaldehyde and stained with 0.1% crystal violet.
Scratch test
Cell migration was assessed using wound healing experiments. A total of 5 × 10⁴ transfected cells were seeded into 6-well plates and cultured until they reached 90% confluence. A linear wound was created in the cell monolayer using a 20 µL pipette tip, followed by washing with phosphate-buffered saline (PBS). After 24 h, the cells were fixed with 4% paraformaldehyde and stained with 0.1% crystal violet (Servicebio) for 15 min. The stained cells were imaged using an inverted microscope (Leica), and the wound closure rate was quantified using ImageJ software.
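The Methods do not give the wound closure formula. A common definition, assumed here, is the percentage reduction in scratch area relative to time zero; the areas in the example are hypothetical ImageJ measurements.

```python
def wound_closure_percent(area_0h: float, area_t: float) -> float:
    """Percent of the initial scratch area that has closed at time t."""
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return (area_0h - area_t) / area_0h * 100.0


# Hypothetical ImageJ wound areas (pixels^2) at 0 h and 24 h.
closure = wound_closure_percent(1000.0, 400.0)
print(closure)
```

Normalizing to each well's own 0 h area removes differences in initial scratch width between wells.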
Statistical analysis
The data were analyzed using SPSS 19.0 software (SPSS Inc., Chicago, IL, USA) and GraphPad Prism 5.0 software (GraphPad Software, La Jolla, CA, USA). One-way analysis of variance (ANOVA) was performed to evaluate differences between groups, followed by post hoc testing using the Student-Newman-Keuls method. A p-value < 0.05 was considered statistically significant.
Results
Data sorting and analysis
After quality filtering and dimensionality reduction of the CRC single-cell dataset, we identified a total of 24 subpopulations (Fig. 1A). Based on criteria established in previous studies6,7,8,9,10,11, we selected a series of key markers to distinguish colorectal cancer cells and the associated immune and stromal cells. These covered epithelial cells (EPCAM, KRT19, CLDN4), fibroblasts (COL3A1, COL1A1, COL1A2), macrophages (IL7R, CD68, IL1B), T cells (IL7R, CD3G), NK cells (NKG7), B cells (CD79A, VPREB3, CD79B), endothelial cells (PECAM1), mast cells (TPSAB1, TPSB2, KIT), and plasma cells (SSR4, IGLL5). Based on the expression patterns of these marker genes across clusters, we classified the cells into nine distinct types (Fig. 1B) and visualized marker gene expression both across and within cell types (Fig. 1C-D). To gain deeper insight into the functions of these marker genes, we performed functional enrichment analysis (Fig. 1E). Additionally, we calculated the proportion of each cell type across samples (Fig. 1F-G). Cell proportions varied considerably between samples, highlighting notable cellular heterogeneity. Notably, epithelial cells accounted for 44% of cells in tumor tissues compared with 17% in normal tissues, consistent with findings from current clinical research.
Single-cell clustering and cell annotation analysis. (A) Subgroup clustering at various resolutions; (B) Annotated TSNE plot of cells post single-cell dimensionality reduction clustering; (C) Expression map of cell marker genes within the cells; (D) Display of differential marker genes across distinct cell types; (E) KEGG functional enrichment analysis of differential marker genes in various cell types; (F) Proportion of cells in tumor versus normal samples; (G) Proportion of cells across different sample types.
In the COAD samples, we identified nine distinct cell types whose proportions varied across samples. CellChat analysis revealed a complex intercellular communication network in which fibroblasts showed the largest number and strength of interactions (Fig. 2A-E), suggesting a potential central role in regulating intercellular communication.
Cellular communication and ligand analysis. (A) Directionality and quantity of communication between cells; (B) Directionality and intensity of communication between cells; (C) Ligand analysis with fibroblasts as the source across different cell types; (D) Ligand analysis with fibroblasts as the target across different cell types; (E) Identification of signals that contribute most significantly to the output or input of various cell groups; (F–J) Intercellular communication diagram of a portion of the signaling network.
Further analysis of intercellular signaling pathways revealed interactions through the collagen signaling pathway, which is closely associated with the tumor extracellular matrix, and the transmission patterns of the CXCL signaling pathway, which plays critical roles in immune responses and cell migration (Fig. 2F-J).
Subsequently, we used the single-cell marker genes to score cellular states in the merged Meta dataset. Our analysis focused on comparing fibroblast scores between normal and tumor samples in the TCGA-COAD and GSE33113 datasets. Fibroblast scores were significantly higher in normal samples than in tumor samples (Fig. 3A).
Prognostic model construction and validation: differential expression, survival analysis, and machine learning performance in CRC cohorts. (A) Differential expression of fibroblast scores across various groups in the TCGA and GSE14333 datasets; (B) Results of univariate analysis from the TCGA dataset; (C) Multivariate Cox analysis of model genes; (D) Calculation of the C-index for different datasets using 36 machine learning methods; (E) Kaplan-Meier (KM) curves and Receiver Operating Characteristic (ROC) analysis for the TCGA, GSE39582, GSE17538, GSE33113, GSE14333, GSE38832, and Meta datasets; (F) Differential expression analysis of model genes in normal and tumor samples within the TCGA dataset.
Construction and effect of prognostic model
We identified 57 survival-related genes among the fibroblast marker genes in the TCGA dataset (Fig. 3B), including MMP23B, TRIP10, SLC39A1, CHID1, ABL2, and PLOD3. After extracting their expression profiles from the Meta dataset, we constructed models using the training and validation sets and identified key prognostic genes. We then evaluated the selected genes using multivariate Cox regression analysis (Fig. 3C), calculated the risk coefficient of each gene, and assessed the C-index of each dataset under the different machine learning methods (Fig. 3D). The 12 genes selected by the StepCox[backward] method yielded the highest average C-index. We therefore used this method to construct the prognostic model, represented by the following formula:
RiskScore = 0.8921 × ABL2 + 0.2645 × AIF1L + 0.1339 × APOD − 0.3421 × BGN + 0.3051 × CD248 + 0.274 × CRIP2 − 0.6143 × NPR1 + 0.1456 × NR4A1 + 0.1928 × PCOLCE2 + 0.3098 × PLOD3 + 0.1424 × S100A4 − 0.5725 × USP53,
where each gene symbol denotes the expression level of that gene.
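Applying the published formula to a new sample is a weighted sum over the 12 genes. The sketch below encodes the coefficients above. It assumes expression values are on the same normalized scale as the training data, and the median cutoff in the hypothetical `stratify` helper is an assumption, since the text does not state how the high- and low-risk groups were defined.

```python
from statistics import median

# Coefficients copied from the RiskScore formula above.
COEFFICIENTS = {
    "ABL2": 0.8921, "AIF1L": 0.2645, "APOD": 0.1339, "BGN": -0.3421,
    "CD248": 0.3051, "CRIP2": 0.274, "NPR1": -0.6143, "NR4A1": 0.1456,
    "PCOLCE2": 0.1928, "PLOD3": 0.3098, "S100A4": 0.1424, "USP53": -0.5725,
}


def risk_score(expression: dict) -> float:
    """Weighted sum of the 12 model genes' expression values for one sample."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing model genes: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores: dict) -> dict:
    """Label samples 'high' or 'low' risk using the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {sample: "high" if s > cutoff else "low" for sample, s in scores.items()}


# Hypothetical normalized expression for one sample: every gene at 1.0.
example = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(example), 4))  # 1.1313
```

Because the coefficients come from a Cox model, positive weights (such as PLOD3) raise the predicted hazard and negative weights (such as NPR1 and USP53) lower it.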
We further calculated the KM and ROC curves for each dataset to evaluate the predictive performance of the model (Fig. 3E). Additionally, we compared the expression of model genes in normal and tumor tissues within the TCGA-COAD and GSE33113 datasets. The results revealed significant differences in the expression of genes such as ABL2, AIF1L, APOD, BGN, NPR1, PCOLCE2, PLOD3, and USP53 between the two types of samples (Fig. 3F).
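The KM curves compare survival between risk groups. A minimal product-limit estimator for one group is sketched below. It returns the survival probability at each event time, omits confidence intervals and the log-rank test, and uses toy data.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    Returns a list of (time, survival probability) at each distinct event time.
    events: 1 = event (death), 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve


# Toy follow-up data for one risk group.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 0, 1])
print(curve)
```

Running the same function on the high-risk and low-risk samples separately gives the two curves that a KM plot overlays.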
To validate the independence of the RiskScore model in clinical applications, we conducted further analyses. First, across multiple datasets, RiskScore showed significant prognostic value in both univariate and multivariate analyses (Fig. 4A-D). Second, the significance of other clinical characteristics was notably reduced in multivariate analysis, further supporting the independence and reliability of the RiskScore model. In the TCGA-COAD dataset, both univariate and multivariate analyses identified T stage, M stage, and RiskScore as significant predictors of prognosis. We therefore incorporated these three features to construct a nomogram (Fig. 4E).
Comprehensive prognostic analysis and model validation across multiple datasets for colorectal cancer. (A) Univariate and multivariate analyses of the TCGA-COAD dataset; (B) Univariate and multivariate analyses of the GSE39582 dataset; (C) Univariate and multivariate analyses of the GSE17538 dataset; (D) Univariate and multivariate analyses of the GSE14333 dataset; (E) Nomogram for the TCGA-COAD dataset; (F) Calibration plot of the nomogram for the TCGA-COAD dataset; (G) Decision Curve Analysis (DCA) plot of clinical features and Risk Score for the TCGA-COAD dataset; (H) Area Under the Curve (AUC) at different time points for clinical features and Risk Score in the TCGA-COAD dataset.
The nomogram showed that RiskScore contributed most to survival prediction, underscoring the prognostic value of the gene-based risk model. Calibration curves for 1-, 3-, and 5-year survival further supported the performance of the model (Fig. 4F). Finally, decision curve analysis (DCA) and time-dependent ROC AUC values confirmed that the RiskScore model exhibits robust predictive performance across time points (Fig. 4G-H).
Association of the model with immune features and biological function
By comparing immune microenvironment scores between the high- and low-risk groups in the Meta dataset, we found that the immune score of the high-risk group was significantly higher than that of the low-risk group (Fig. 5A). Further analysis revealed a significant positive correlation between RiskScore and ImmuneScore (Fig. 5B).
Analysis of immune-related properties and functions in the model. (A) Comparison of the immune microenvironment across risk groups in the Meta dataset; (B) Correlation analysis between ImmuneScore and RiskScore in the Meta dataset; (C) Heatmap illustrating the correlation between RiskScore and immune checkpoint genes in the Meta dataset; (D) Differential comparison of immune checkpoint genes across risk stratification in the Meta dataset; (E) KEGG enrichment analysis of differentially expressed genes in the risk stratification of the Meta dataset; (F) An enrichment map organizes enriched terms into a network, connecting terms that share common genes; (G,H) Significantly enriched pathways in the high-risk group of the Meta dataset as identified by GSEA analysis; (I) Significantly enriched pathways in the low-risk group of the Meta dataset as identified by GSEA analysis.
Additionally, a comparative analysis of immune checkpoint genes showed that their expression was significantly and positively correlated with RiskScore in most cases (Fig. 5C). Differential gene expression analysis between the high- and low-risk groups identified 1923 differentially expressed genes. Enrichment analysis of these genes revealed significant enrichment in several signaling pathways, including the IL-17 signaling pathway, ECM-receptor interaction, TNF signaling pathway, PPAR signaling pathway, Hippo signaling pathway, PI3K-Akt signaling pathway, Wnt signaling pathway, Cytokine-cytokine receptor interaction, Chemokine signaling pathway, NF-kappaB signaling pathway, Focal adhesion, and Human papillomavirus infection pathway (Fig. 5D-F).
Further GSEA analysis revealed significant enrichment in the high-risk group for pathways such as the PI3K-Akt signaling pathway, NF-kappa B signaling pathway, VEGF signaling pathway, TGF-beta signaling pathway, and Wnt signaling pathway. Additionally, the high-risk group showed enrichment in pathways like the Chemokine signaling pathway, Toll-like receptor signaling pathway, Natural killer cell-mediated cytotoxicity, Th17 cell differentiation, and Th1 and Th2 cell differentiation. In contrast, the low-risk group exhibited significant enrichment in pathways including Homologous recombination, DNA replication, Mismatch repair, Base excision repair, and Nucleotide excision repair (Fig. 5G-I).
Potential association of the model with immunotherapy
We used the TIDE tool to predict immunotherapy responses for samples in the Meta dataset and compared the predicted response rates between risk groups. TIDE scores were lower in the low-risk group than in the high-risk group, indicating that low-risk patients may be more likely to benefit from immunotherapy (Fig. 6A). Further analysis demonstrated a positive correlation between RiskScore and TIDE score (Fig. 6B). Moreover, the proportion of predicted responders was significantly higher in the low-risk group than in the high-risk group, reaching 62% (Fig. 6C).
The potential of models in immunotherapy. (A) Demonstration of the differences in TIDE between high and low RISKTYPE groups in the meta-dataset; (B) Correlation analysis between RISKTYPE and TIDE outcomes within the dataset; (C) Presentation of the proportion of immune therapy efficacy within the RISKTYPE groups in the Meta-dataset; (D) Submap of immune response in the meta-dataset compared to IMvigor210; (E) Proportion of immune therapy response in the risk groups of dataset GSE78220; (F) Proportion of immune therapy response in the risk groups of dataset GSE135222; (G) Display of the correlation analysis results between RISKTYPE scores and drug IC50 in the dataset; (H) Differential analysis of drug IC50 across RISKTYPE score groups in the dataset.
When the expression profiles of the Meta dataset were compared with those of the IMvigor210 anti-PD-L1 cohort, the expression profile of the high-risk group was significantly correlated with those of the progressive disease (PD) and stable disease (SD) groups (p = 0.016 and p = 0.002, respectively), suggesting that high-risk patients resemble non-responders to anti-PD-L1 therapy. Conversely, the expression profile of the low-risk group was significantly correlated with that of the partial response (PR) group (p = 0.026), indicating that the low-risk group may respond more favorably to anti-PD-L1 therapy (Fig. 6D). Furthermore, we calculated risk scores for the two immunotherapy datasets using the risk model and found that the proportion of PR/CR was higher in the low-risk group: 79% in GSE78220 and 57% in GSE135222 (Fig. 6E-F). These findings suggest that the RiskScore model could serve as a potential biomarker for guiding personalized anti-PD-L1 therapy.
Finally, we used the GDSC1 dataset to analyze the correlation between drug IC50 and RiskScore. Using thresholds of |R| > 0.3 and P < 0.05, we identified 38 significantly associated drugs: the IC50 values of 24 were significantly positively correlated with RiskScore, and those of 14 were significantly negatively correlated (Fig. 6G-H). For example, the IC50 values of Pyrimethamine, Gemcitabine, and Tozasertib showed a significant positive correlation with RiskScore, whereas those of Erlotinib, Lapatinib, and Dasatinib showed a significant negative correlation.
Mendelian randomization analysis of model genes and CRC
To further investigate the association between the model genes and CRC, we searched the OpenGWAS database. Of the model genes, only PLOD3 had available genome-wide association study (GWAS) data, so we conducted Mendelian randomization analysis for PLOD3. Heterogeneity tests using the MR Egger and Inverse Variance Weighted methods indicated no significant heterogeneity among the instrumental SNPs (p > 0.05), and the pleiotropy analysis showed no significant pleiotropy (p > 0.05). The scatter plot showed that all five methods supported a positive association between the exposure (PLOD3) and the outcome (CRC), suggesting that PLOD3 may be a risk factor for CRC (Fig. 7A). The odds ratios (OR) from all five methods (MR Egger, Weighted Median, Inverse Variance Weighted, Simple Mode, and Weighted Mode) were greater than 1 (Fig. 7B).
Mendelian randomization analysis of model genes and CRC. (A) Scatterplot illustrating the results of five methods. The abscissa represents the impact of SNP on the gene PLOD3, while the ordinate represents the impact of SNP on CRC. The dots in the plot represent SNPs, and the crosshairs indicate the fluctuation range of the effect size. (B) Results of the Mendelian randomization analysis utilizing the five methods.
Experimental verification of model gene
To further verify the role of PLOD3 in vitro, we compared PLOD3 expression between CRC cells and normal cells. The mRNA and protein levels of PLOD3 were significantly higher in SW1116 and Caco2 cells than in normal NCM460 cells (p < 0.05, Fig. 8A-C). After PLOD3 knockdown, the migration and invasion capabilities of the tumor cells were impaired (Fig. 8D-H): significantly fewer cells crossed the Transwell membranes in the si-PLOD3 group (p < 0.05), and wound closure was notably slower in the si-PLOD3 group in the wound healing assay (p < 0.05).
PLOD3 is significantly upregulated in CRC cells and promotes cell migration and invasion. (A) High PLOD3 mRNA expression in CRC cells. (B,C) Elevated PLOD3 protein levels in CRC cells; the original gel is shown in Supplementary Fig. 1. (D) qPCR validation of PLOD3 knockdown efficiency. (E,F) PLOD3 knockdown significantly inhibited the invasive capacity of SW1116 and Caco2 cells. (G,H) PLOD3 knockdown significantly decreased the migratory capacity of SW1116 and Caco2 cells. **p < 0.01, ***p < 0.001, ****p < 0.0001. #1, results of the first experiment; the results of the second experiment (#2) are provided in Supplementary Fig. 2.
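Fig. 8A and 8D report qPCR measurements, and Fig. 8G,H report wound closure. The source does not state the quantification method in this section; a common choice for qPCR is the Livak 2^-ΔΔCt method, sketched below in Python together with a simple wound-closure percentage. All function names, the choice of reference gene, and the example values are hypothetical.

```python
import numpy as np


def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Livak 2^-ddCt relative expression of a target gene versus a control group.

    Each argument is a sequence of replicate Ct values. The reference gene
    (for example a housekeeping gene such as GAPDH) is an assumption.
    """
    dct_sample = np.mean(ct_target) - np.mean(ct_ref)            # normalize to the reference gene
    dct_control = np.mean(ct_target_ctrl) - np.mean(ct_ref_ctrl)
    ddct = dct_sample - dct_control                              # normalize to the control group
    return 2.0 ** (-ddct)


def knockdown_efficiency(fold_change):
    """Percentage reduction in target mRNA relative to the negative-control siRNA group."""
    return 100.0 * (1.0 - fold_change)


def wound_closure(area_0h, area_t):
    """Percentage of the initial scratch area that has closed at time t."""
    return 100.0 * (area_0h - area_t) / area_0h


if __name__ == "__main__":
    # Hypothetical Ct values: si-PLOD3 cells versus negative-control siRNA cells.
    fc = fold_change_ddct([26.1, 25.9], [18.0, 18.1], [24.0, 24.2], [18.0, 18.0])
    print(f"fold change {fc:.2f}, knockdown {knockdown_efficiency(fc):.1f}%")
    print(f"wound closure {wound_closure(1.00, 0.55):.1f}%")
```

A fold change below 1 in the si-PLOD3 group indicates reduced PLOD3 mRNA; slower wound closure appears as a smaller closure percentage at the same time point.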
Discussion
The molecular mechanisms underlying CRC are intricate and involve the dysregulation of multiple genes and signaling pathways. The PLOD family, consisting of PLOD1, PLOD2, and PLOD3, comprises enzymes essential for the post-translational modification of collagen, and their aberrant expression in various tumors is closely associated with tumor progression and metastasis25. PLOD3 encodes lysyl hydroxylase 3 (LH3), whose primary function is to catalyze the hydroxylation of lysine residues in collagen precursors, a step crucial for the correct folding and function of collagen. The hydroxylysine residues introduced by PLOD3 serve as essential sites for subsequent glycosylation and cross-linking reactions26. Studies have shown that PLOD3 is overexpressed in various cancers and is closely associated with collagen modification in the tumor microenvironment; increased PLOD3 activity in cancer cells may remodel the collagen network and thereby influence tumor invasiveness and metastatic potential27. Collagen cross-linking is a critical feature of the tumor microenvironment, and PLOD3 contributes to it by hydroxylating lysine residues. The resulting cross-links stabilize the extracellular matrix, which may be linked to increased tumor invasiveness and drug resistance. In oral cancer, for example, PLOD3 activity positively correlates with the degree of collagen cross-linking28. Together, these studies indicate that PLOD3 expression and activity in cancer cells directly affect lysine hydroxylation in collagen, a process that substantially influences the stability and function of the collagen network in the tumor microenvironment. These observations are consistent with our findings in CRC.
In this study, we combined gene expression data from public databases, single-cell RNA sequencing data, and in vitro experiments to elucidate the role of PLOD3 in CRC and its potential molecular mechanisms.
By performing single-cell clustering analysis on the GSE161277 dataset from the GEO database, we annotated and classified the major cell types in CRC. Data normalization and batch effect correction with the Seurat package improved the consistency and reliability of the data across samples. After clustering, cells were reclassified according to established markers of CRC and immune cells, which allowed us to identify the primary cell populations in the tumor microenvironment and determine their proportions across samples. This approach captures the heterogeneity of the tumor microenvironment, particularly the key immune cell populations in CRC. Functional enrichment analysis of the marker genes further clarified their biological significance and suggested that PLOD3 may play an important role in specific cell types, thereby affecting tumor progression.
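The normalization step mentioned above can be illustrated with a NumPy re-implementation of the log-normalization scheme that Seurat's `NormalizeData()` applies by default (`LogNormalize` with a scale factor of 10,000). This is a sketch for intuition only; the source does not specify the exact Seurat parameters used, and the function name is hypothetical.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000.0):
    """Per-cell library-size normalization followed by a natural-log (log1p) transform.

    counts: genes x cells matrix of raw UMI counts.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0, keepdims=True)   # total counts per cell (column sums)
    lib_size[lib_size == 0] = 1.0                  # avoid division by zero for empty cells
    return np.log1p(counts / lib_size * scale_factor)


if __name__ == "__main__":
    # Hypothetical 3-gene x 2-cell count matrix with different sequencing depths.
    raw = [[10, 100], [0, 50], [30, 250]]
    print(np.round(log_normalize(raw), 3))
```

Dividing by each cell's total count removes differences in sequencing depth before clustering; batch correction across samples is a separate step that this sketch does not cover.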
Analysis of cellular communication and its role within the tumor microenvironment
Cell-cell communication analysis with the CellChat package revealed ligand-receptor interactions among different cell types, in particular a complex communication network between tumor cells and immune cells. The analysis showed significant communication activity between PLOD3-overexpressing cell populations and pro-inflammatory macrophage subtypes, suggesting that PLOD3 may influence CRC progression by modulating the immune response within the tumor microenvironment. This finding is in line with published literature highlighting the critical role of inflammation in CRC and further emphasizes the contribution of the tumor microenvironment to tumor growth and metastasis. Specifically, high PLOD3 expression in macrophages may facilitate tumor immune evasion and progression through the release of cytokines such as TNF-α, IL-6, and IL-17. These findings not only shed light on the function of PLOD3 within the tumor microenvironment but also provide a theoretical basis for the future development of PLOD3-based immunotherapies.
The integration of meta-datasets and the removal of batch effects enabled us to validate the role of PLOD3 in a larger cohort, supporting the reliability and generalizability of the findings. Importantly, the consistent expression pattern of PLOD3 across multiple independent datasets reinforces its value as a potential prognostic marker.
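The meta-dataset integration described above depends on removing systematic shifts between datasets. The source does not name the correction method in this section (ComBat from the sva R package is a frequent choice). As a much simpler stand-in, and explicitly not ComBat, the Python sketch below standardizes each gene within each dataset; the function name and example data are hypothetical.

```python
import numpy as np


def standardize_within_batches(expr, batches):
    """Center and scale every gene within every batch (dataset).

    expr: genes x samples expression matrix; batches: one batch label per sample.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(expr)
    for b in np.unique(batches):
        idx = batches == b
        sub = expr[:, idx]
        mu = sub.mean(axis=1, keepdims=True)   # gene-wise mean within this batch
        sd = sub.std(axis=1, keepdims=True)    # gene-wise SD within this batch (ddof=0)
        sd[sd == 0] = 1.0                      # constant genes are centered but not scaled
        out[:, idx] = (sub - mu) / sd
    return out


if __name__ == "__main__":
    # Hypothetical: one gene measured on two platforms with very different scales.
    expr = [[5.1, 5.4, 6.0, 110.0, 130.0, 150.0]]
    labels = ["GSE_A", "GSE_A", "GSE_A", "GSE_B", "GSE_B", "GSE_B"]
    print(np.round(standardize_within_batches(expr, labels), 2))
```

Because this removes every between-dataset difference in gene-wise mean and variance, it is only appropriate when biological groups (for example tumor versus normal) are balanced across datasets; ComBat's empirical Bayes shrinkage is generally more robust for small batches.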
To construct the prognostic models, we used univariate Cox regression analysis to identify genes significantly associated with the survival of patients with CRC and developed various predictive models based on these genes. The TCGA dataset served as the training set, allowing the models to be trained on a large sample and thereby improving their stability. The genes selected for model construction were ABL2, AIF1L, APOD, BGN, CD248, CRIP2, NPR1, NR4A1, PCOLCE2, PLOD3, S100A4, and USP53. Several of these genes have previously been linked to tumor biology. Jones et al. showed through in vitro experiments that ABL proto-oncogene 2 (ABL2), a member of the non-receptor protein tyrosine kinase family, not only participates in tumor adhesion and metastasis but also promotes leptomeningeal metastasis of medulloblastoma by regulating the downstream c-MYC gene29. Cao et al. identified a novel circular RNA (hsa_circ_0007368) by RNA sequencing and found that it inhibits thyroid cancer progression by upregulating allograft inflammatory factor 1 like (AIF1L)30. He et al. screened metastasis-associated differentially expressed genes in 681 CRC patients and found that BGN (biglycan) and THBS2 were closely related to metastasis, epithelial-mesenchymal transition (EMT), and overall survival31. NPR1 (natriuretic peptide receptor 1) and USP53 (ubiquitin-specific peptidase 53), which are prominent inhibitory genes in our model, have also been implicated in the regulation of various tumors. For instance, Liu et al. experimentally confirmed that loss of NPR1 increases the expression of integrin subunit beta 4, thereby enhancing vascular endothelial adhesion32. Zhao et al. found that USP53 is downregulated in lung adenocarcinoma and that its low expression is associated with reduced apoptosis and increased glycolysis, suggesting a tumor-suppressive role; mechanistically, USP53 regulates apoptosis and glycolysis through FKBP51-AKT1 signaling, ultimately inhibiting the growth of lung adenocarcinoma33. Together, these clinical and experimental findings support the biological rationale of the gene model we constructed.
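A common form for a gene-based risk score is the linear predictor of a multivariable Cox model: each gene's expression multiplied by its regression coefficient and summed, with patients then split into high- and low-risk groups. The Python sketch below assumes that form and a median cutoff. The coefficients are hypothetical placeholders rather than the fitted values from this study, and the source does not state the cutoff rule in this section.

```python
import numpy as np

# Genes in the model as listed in the text. Any coefficients passed in are HYPOTHETICAL.
MODEL_GENES = ("ABL2", "AIF1L", "APOD", "BGN", "CD248", "CRIP2",
               "NPR1", "NR4A1", "PCOLCE2", "PLOD3", "S100A4", "USP53")


def risk_score(expr, coefs):
    """Linear predictor: expression matrix (samples x genes) times coefficients (genes,)."""
    expr = np.asarray(expr, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    if expr.shape[1] != coefs.shape[0]:
        raise ValueError("one coefficient is needed per gene column")
    return expr @ coefs


def stratify(scores, cutoff=None):
    """Label samples 'high' or 'low' risk; the median score is the default cutoff."""
    scores = np.asarray(scores, dtype=float)
    if cutoff is None:
        cutoff = np.median(scores)
    return np.where(scores > cutoff, "high", "low")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(5, len(MODEL_GENES)))          # hypothetical expression values
    coefs = rng.normal(scale=0.2, size=len(MODEL_GENES))   # hypothetical Cox coefficients
    scores = risk_score(expr, coefs)
    print(list(zip(np.round(scores, 2), stratify(scores))))
```

Genes with positive coefficients raise the score (risk genes), while genes with negative coefficients lower it, which is how inhibitory genes such as NPR1 and USP53 would act in such a model.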
Subsequently, validation in multiple independent datasets showed that the prognostic model incorporating PLOD3 and the other 11 genes achieved a high C-index, indicating its potential value for prognostic assessment in patients with CRC. Notably, PLOD3 not only showed significant predictive power for overall survival but also correlated strongly with immune-related pathways, which strengthens its potential for future clinical application.
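The C-index used to assess the model measures how often, among comparable patient pairs, the patient with the higher risk score experiences the event first. A minimal Python sketch of Harrell's C-index follows. Pairs with tied survival times are skipped for simplicity, which is only one of several tie-handling conventions, so values may differ slightly from those reported by dedicated packages such as the R survival package.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    time: follow-up times; event: 1 if death observed, 0 if censored;
    risk: model risk scores (higher score = predicted shorter survival).
    """
    if not (len(time) == len(event) == len(risk)):
        raise ValueError("time, event and risk must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the shorter time ends in an observed event
        for j in range(len(time)):
            if time[i] < time[j]:  # patient i had the event before patient j was last seen
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up (months), event indicators and risk scores for five patients.
    print(harrell_c_index([5, 12, 20, 30, 41], [1, 1, 0, 1, 0], [2.1, 1.4, 1.6, 0.3, 0.2]))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so a "high" C-index means the risk score orders patients' survival times well beyond chance.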
Previous studies of PLOD family genes have indicated that they play a significant role in tumor development and metastasis by influencing collagen maturation and extracellular matrix (ECM) remodeling34. For instance, high PLOD2 expression in patients with various tumors is associated with increased tumor aggressiveness and poor prognosis35. In glioblastoma, PLOD1 contributes to pathogenesis through NF-κB signaling and the transition to the malignant mesenchymal subtype36. PLOD1, PLOD2, and PLOD3 are commonly overexpressed in ovarian cancer, and PLOD3 correlates strongly with the gap junction protein connexin 43, suggesting that PLOD3 may play a particularly important role in regulating intercellular communication and collagen synthesis37. Together, these studies provide essential background on the role of the PLOD family in tumor biology.
Conclusions
The results of this study indicate that PLOD3 plays a crucial role in the occurrence and development of CRC and may also modulate the immune response within the tumor microenvironment, thereby influencing the efficacy of immunotherapy. This finding may have important clinical implications, particularly given the widespread clinical use of immune checkpoint inhibitors38. PLOD3 may eventually be incorporated as a novel biomarker into the molecular diagnostic and prognostic assessment systems for CRC. Moreover, PLOD3-based combination therapies, such as its integration with existing immunotherapies, may enhance therapeutic efficacy and offer patients more personalized and precise treatment options.
In future studies, we propose combining multi-faceted experimental designs with advanced technical approaches to evaluate the therapeutic potential of PLOD3 inhibition in CRC and to enhance the clinical translational significance of this research. First, CRISPR/Cas9 gene editing could be used to knock out PLOD3 in CRC cells used for xenograft models, allowing its effects on tumor growth and metastasis to be observed in vivo. Concurrently, small-molecule inhibitors or monoclonal antibodies targeting PLOD3 could be developed, and pharmacokinetic (PK) analysis and maximum tolerated dose (MTD) assessments, together with efficacy testing, could provide a preliminary evaluation of these agents. Moreover, multi-omics approaches could further elucidate the underlying mechanisms: transcriptomics (RNA-seq) to examine the impact of PLOD3 inhibition on tumor-related signaling pathways (e.g., TGF-β, Wnt, VEGF), proteomics (LC-MS/MS) to investigate changes in collagen modifications, and metabolomics (LC-MS or GC-MS) to explore its role in tumor metabolic regulation (e.g., glycolysis and glutamine metabolism). Finally, liquid biopsy techniques, such as exosomal RNA and circulating tumor DNA detection, may enable dynamic monitoring of therapeutic efficacy and resistance mechanisms. These strategies would provide critical data to strengthen the scientific rigor and clinical relevance of this research.
Data availability
The datasets analyzed in this article were obtained from the GEO and TCGA databases and are publicly accessible.
References
Morgan, E. et al. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut 72(2), 338–344 (2023).
Benson, A. B. et al. Colon cancer, version 2.2021, NCCN clinical practice guidelines in oncology. J. Natl. Compr. Canc. Netw. 19(3), 329–359 (2021).
Chen, Y. et al. PLOD3 contributes to HER-2 therapy resistance in gastric cancer through FoxO3/Survivin pathway. Cell. Death Discovery. 8(1), 321 (2022).
Zheng, X. et al. Single-cell transcriptomic profiling unravels the adenoma-initiation role of protein tyrosine kinases during colorectal tumorigenesis. Signal. Transduct. Target. Therapy. 7(1), 60 (2022).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184(13), 3573–3587.e29 (2021).
Becker, W. R. et al. Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer. Nat. Genet. 54(7), 985–995 (2022).
Wang, X. et al. Single-cell RNA-seq reveals the genesis and heterogeneity of tumor microenvironment in pancreatic undifferentiated carcinoma with osteoclast-like giant-cells. Mol. Cancer. 21(1), 133 (2022).
Devlin, J. C. et al. Single-cell transcriptional survey of ileal-anal pouch immune cells from ulcerative colitis patients. Gastroenterology 160(5), 1679–1693 (2021).
Khaliq, A. M. et al. Refining colorectal cancer classification and clinical stratification through a single-cell atlas. Genome Biol. 23(1), 113 (2022).
Fawkner-Corbett, D. et al. Spatiotemporal analysis of human intestinal development at single-cell resolution. Cell 184(3), 810–826.e23 (2021).
Che, L-H. et al. A single-cell atlas of liver metastases of colorectal cancer reveals reprogramming of the tumor microenvironment in response to preoperative chemotherapy. Cell. Discovery. 7(1), 80 (2021).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12(1), 1088 (2021).
Smith, J. J. et al. Experimentally derived metastasis gene expression profile predicts recurrence and death in patients with colon cancer. Gastroenterology 138(3), 958–968 (2010).
de Sousa E Melo, F. et al. Methylation of cancer-stem-cell-associated Wnt target genes predicts poor prognosis in colorectal cancer patients. Cell Stem Cell 9(5), 476–485 (2011).
Marisa, L. et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 10(5), e1001453 (2013).
Jorissen, R. N. et al. Metastasis-associated gene expression changes predict poor outcomes in patients with Dukes stage B and C colorectal cancer. Clin. Cancer Res. 15(24), 7642–7651 (2009).
Tripathi, M. K. et al. Nuclear factor of activated T-cell activity is associated with metastatic capacity in colon cancer. Cancer Res. 74(23), 6947–6957 (2014).
Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. Affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20(3), 307–315 (2004).
Liu, Z. et al. Machine learning-based integration develops an immune-derived lncRNA signature for improving outcomes in colorectal cancer. Nat. Commun. 13(1), 816 (2022).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000).
Fu, J. et al. Large-scale public data reuse to model immunotherapy response and resistance. Genome Med. 12, 21 (2020).
Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24(10), 1550–1558 (2018).
Hugo, W. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 165(1), 35–44 (2016).
Kim, J. Y., Choi, J. K. & Jung, H. Genome-wide methylation patterns predict clinical benefit of immunotherapy in lung cancer. Clin. Epigenetics. 12, 1–10 (2020).
Tong, Y. et al. The PLOD2/succinate axis regulates the epithelial–mesenchymal plasticity and cancer cell stemness. Proc. Natl. Acad. Sci. USA 120(20), e2214942120 (2023).
GeneCards. GeneCards Commercial Trial. GeneCards [Internet]. 2025 [cited 2025 Jan 29]. https://www.genecards.org/cgi-bin/carddisp.pl?gene=PLOD3
Gong, S. et al. A human pan-cancer system analysis of procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3 (PLOD3). Int. J. Mol. Sci. 22(18), 9903 (2021).
Uzawa, K., Kasamatsu, A. & Yamauchi, M. Collagen cross-linking in oral cancer. Oral Sci. Int. 21(1), 3–14 (2024).
Jones, J. K. et al. ABL1 and ABL2 promote medulloblastoma leptomeningeal dissemination. Neuro-oncology Adv. 5(1), vdad095 (2023).
Cao, S. et al. CircGLIS3 inhibits thyroid cancer invasion and metastasis through miR-146b-3p/AIF1L axis. Cell. Oncol. (Dordrecht). 46(6), 1777–1789 (2023).
He, Z. et al. Identification of BGN and THBS2 as metastasis-specific biomarkers and poor survival key regulators in human colon cancer by integrated analysis. Clin. Translational Med. 12(11), e973 (2022).
Liu, H. et al. Lack of NPR1 increases vascular endothelial adhesion through induction of integrin Beta 4. Int. J. Mol. Sci. 23(20), 12627 (2022).
Zhao, X., Wu, X., Wang, H., Yu, H. & Wang, J. USP53 promotes apoptosis and inhibits glycolysis in lung adenocarcinoma through FKBP51-AKT1 signaling. Mol. Carcinog. 59(8), 1000–1011 (2020).
Yuan, Z. et al. Extracellular matrix remodeling in tumor progression and immune escape: from mechanisms to treatments. Mol. Cancer. 22(1), 48 (2023).
Du, H., Pang, M., Hou, X., Yuan, S. & Sun, L. PLOD2 in cancer research. Biomed. Pharmacother. 90, 670–676 (2017).
Wang, Z., Shi, Y., Ying, C., Jiang, Y. & Hu, J. Hypoxia-induced PLOD1 overexpression contributes to the malignant phenotype of glioblastoma via NF-κB signaling. Oncogene 40(8), 1458–1475 (2021).
Guo, T., Gu, C., Li, B. & Xu, C. PLODs are overexpressed in ovarian cancer and are associated with gap junctions via connexin 43. Lab. Invest. 101(5), 564–569 (2021).
Robert, C. A decade of immune-checkpoint inhibitors in cancer therapy. Nat. Commun. 11(1), 3801 (2020).
Acknowledgements
We express our gratitude to the oncology specialists from the Traditional Chinese Medicine Hospital of Lu’an City, Xu Sheng, Zhang Hongxia, Wu Dezhen, Shi Yue, Xu Jing, Cheng Li, and Ci Xueping, for their invaluable advice and guidance during the manuscript writing process.
Funding
This study was supported by the Natural Science Foundation of Anhui Provincial Higher Education Institutions (Project No. 2023AH050834) and the Science and Technology Plan Project of Lu’an City, Anhui Province (Project No. 2023lakj-zyy007).
Author information
Contributions
WD and BX were responsible for the overall conceptual framework of the manuscript, data collection, and the drafting of the initial manuscript. SL, WT, and QS were in charge of data processing and visualization, as well as translating, polishing, and checking for duplication in the manuscript. PZ and MW were responsible for the validation and visualization of the experimental sections.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dong, W., Li, S., Tang, W. et al. To investigate the tumor promotion role of PLOD3 in colorectal cancer and its potential as a prognostic biomarker and therapeutic target. Sci Rep 15, 5371 (2025). https://doi.org/10.1038/s41598-025-89521-z
DOI: https://doi.org/10.1038/s41598-025-89521-z