Introduction

Colorectal cancer (CRC) is the third most common cancer worldwide and the second most common cause of cancer-related death1,2. In 2023, 153,020 new cases of CRC were diagnosed in the United States3. Most CRCs begin with abnormal crypts, which gradually develop into adenomas after the accumulation of genetic and epigenetic mutations4,5 and eventually progress to CRC in 10–15 years6.

Adenomas can be further divided into non-advanced/low-risk adenomas (NAAs) and advanced adenomas (AAs) based on the size of the lesion and its pathological manifestations7. Colorectal lesions that meet any of the following conditions are defined as AAs: (1) ≥ 10 mm in size; (2) tubulovillous or villous histology; or (3) high-grade dysplasia. No significant difference was observed in the incidence of CRC or CRC-related mortality between patients with NAA and individuals without adenomas, whereas the prognosis of patients with AA is similar to that of patients with CRC7,8. Therefore, AAs may be a crucial intermediate in the development of CRC.

Patients with limited early CRC lesions (American Joint Committee on Cancer stages 0, I, or II) have a good prognosis, with a 5-year survival rate of > 80%9; therefore, early diagnosis and intervention are crucial for prevention and treatment. Colonoscopy is the most effective diagnostic method for CRC10, as it allows both screening and treatment in the early stages of disease. Unfortunately, the missed diagnosis rate of colonoscopy varies between 12% and 26%11 because of the sporadic nature12,13 of adenomas and their small size. In addition, some patients with contraindications cannot undergo colonoscopy. Therefore, auxiliary screening tests are necessary. Several noninvasive methods have been used to detect CRC, such as the fecal occult blood and carcinoembryonic antigen (CEA) tests. However, the accuracy of these methods is relatively poor, and a novel auxiliary screening test with high accuracy is needed14,15.

Metabolomics is a discipline that studies small molecules (metabolites) present in biological systems at a given time point and can be used to identify changes in metabolite composition under the influence of exogenous (such as environment, lifestyle, and gut microbiota) and endogenous factors (such as genetic variations)16,17. This molecular characterization reflects physiological and pathological phenotypes18. Metabolomics provides a unique understanding of the mechanisms of disease occurrence and development19. It is therefore an ideal method for examining early characteristic changes in CRC progression. There are multiple prospective and retrospective serum metabolomic studies using different metabolite identification methods and different research cohorts, which provide insights into the mechanisms of CRC occurrence, development, risk prediction, and diagnosis20,21,22,23,24. However, current research cohorts only involve healthy controls and patients with adenomas or CRC and do not reflect the metabolic characteristics of key evolutionary stages from NAAs to AAs.

In this study, we performed a non-targeted metabolomic analysis using serum samples collected from healthy control individuals and those with NAAs, AAs, and CRC. This study aimed to explore the mechanism of CRC occurrence and development based on the unique serum metabolic fingerprints of different stages of CRC and to discover a new non-invasive diagnostic method based on serum metabolomics to differentiate high-risk populations (AA and CRC) and low-risk populations (NAA).

Results

Patient characteristics

After screening, 40 healthy individuals (NC group; age 55.94 ± 10.84 years), 40 patients with NAAs (NAA group; age 60.05 ± 9.13 years), 40 patients with AAs (AA group; age 56.45 ± 12.55 years), and 22 patients with early-stage cancer (CRC group; age 59.95 ± 9.94 years) were ultimately included. There were no statistically significant differences in age, sex, or tumor markers (CEA, CA19-9, CA125, and AFP) among the four groups (Table 1).

Table 1 General information of the subjects.

Serum metabolomic characteristics of the CRC progression cohort

The metabolites in the serum samples were identified and analyzed using liquid chromatography/mass spectrometry. Principal component analysis (PCA) and permutational multivariate analysis of variance (PERMANOVA) showed that metabolite composition differed among the four groups (R2 = 0.0783; P = 0.002) (Fig. 1A). Pairwise PERMANOVA showed that the differences lay mainly between the NC and NAA groups (R2 = 0.1136; P = 0.002), the NC and AA groups (R2 = 0.0585; P = 0.003), the NC and CRC groups (R2 = 0.1022; P = 0.001), and the NAA and CRC groups (R2 = 0.0808; P = 0.006). Although the intergroup differences were slightly greater than the intragroup differences when comparing the NAA and AA groups, this was not statistically significant (R2 = 0.0257; P = 0.117). In contrast to the comparisons involving the NC and NAA groups, the serum metabolic profiles of the AA and early CRC groups were highly homogeneous (R2 = 0.0123; P = 0.503) (Table 2). Moreover, these changes in serum metabolite composition during the occurrence and development of cancer were synchronous and continuous (Fig. 1B).

Fig. 1

The composition of serum metabolites changes during the occurrence and development of colorectal cancer. (A) Principal component analysis; (B) partial least squares discriminant analysis. NC: normal control; NAA: non-advanced adenoma; AA: advanced adenoma; CRC: colorectal cancer; PERMANOVA: permutational multivariate analysis of variance.

Table 2 Permutational multivariate analysis of variance.

Key metabolites associated with NAA deterioration

To determine the characteristic serum metabolites associated with the occurrence and development of colorectal adenomas, intergroup comparisons were conducted for each metabolite. However, no metabolite showed a statistically significant gradual increase or decrease across the four groups.

Given that the serum metabolite compositions of patients with AA and CRC were relatively similar, and that the prognoses of the two patient groups are also similar7,8, we combined data from the two groups to identify key serum metabolites that changed during NAA progression. After comparing the semi-quantitative data of serum metabolites in the NC, NAA, and AA + CRC groups, 33 metabolites in the AA + CRC group changed compared to the NAA group; the abundances of 30 metabolites increased (Table 3). The functions of these metabolites mainly involved glycerophospholipid metabolism, the sphingolipid signaling pathway, and caffeine metabolism (Fig. 2A–B).

Fig. 2

Pathway and functional prediction of serum metabolites associated with non-advanced adenoma deterioration. (A) Functional annotation of differential metabolites based on KEGG database; (B) functional annotation of differential metabolites based on GO database.

Table 3 Permutational multivariate analysis of variance.

Construction of the diagnostic model

To evaluate the potential of serum metabolites as diagnostic biomarkers for AA/CRC, we randomly divided the NC/NAA groups (n = 80) and the AA/CRC groups (n = 62) into training (55/41) and test sets (25/21) in a 2:1 ratio to construct and evaluate the diagnostic model. First, LASSO was used to select serum metabolite variables that distinguished patients with AA/CRC from patients with NAA and healthy controls. A total of 57 metabolites in the training set had the potential to serve as diagnostic biomarkers. Next, diagnostic models were constructed using the LR, SVM, GBM, NNET, RF, and XGB algorithms, and ROC curves were plotted. The RF and XGB algorithms performed best on the training set, with the area under the curve (AUC) reaching 1.000 (95% confidence interval [CI], 1.000–1.000) for both. On the test set, the diagnostic models using the LR (AUC = 0.708, 95% CI [0.570–0.847]), NNET (AUC = 0.729, 95% CI [0.595–0.863]), and RF (AUC = 0.685, 95% CI [0.540–0.830]) algorithms performed best (Fig. 3A–B). Overall, the model generated using the RF algorithm exhibited the best diagnostic performance. Feature importance in the RF diagnostic model was ranked, with PI (18:1 [9Z]/18:1 [9Z]) and SM (d18:1/22:0) making the most prominent contributions to the model (Fig. 3C–D).
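For readers less familiar with the AUC values reported above, the following minimal Python sketch shows how an AUC can be computed from class labels and model scores. It is an illustration only and is not the authors' code (the analysis in this study was performed in R); the function name `roc_auc` and the toy labels and scores are hypothetical. The sketch uses the rank interpretation of the AUC: the probability that a randomly chosen positive sample (here, AA/CRC) receives a higher score than a randomly chosen negative sample (NC/NAA), with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (positive, e.g. AA/CRC) or 0 (negative, e.g. NC/NAA)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one positive (score 0.4) is ranked below one negative (0.7).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

An AUC of 1.000 corresponds to every positive sample outscoring every negative sample, and 0.5 corresponds to chance-level ranking.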

Fig. 3

Construction of a diagnostic model for patients with advanced adenomas/colorectal cancer. (A) Receiver operating characteristic curves of the training set; (B) receiver operating characteristic curves of the test set; (C–D) permutation importance. LR: logistic regression; SVM: support vector machine; GBM: gradient boosting machine; NNET: neural network classifier; RF: random forest; XGB: extreme gradient boosting; AUC: area under curve; CI: confidence interval.

Discussion

Many studies have reported classification models for colorectal adenomas or cancers based on gut metabolites25,26, circulating metabolites20,21,22,23,24, and complex microbial metabolite characteristics, demonstrating their potential in diagnosing CRC and exploring the pathogenesis of different types of CRC27. However, the differences in serum metabolomics between NAA and AA, which have clear significance for clinical classification, require further investigation. In this study, a cohort study of CRC progression was conducted, and differences in serum metabolite composition were identified among the four groups.

Consistent with the clinical outcome and prognosis8, the serum metabolite composition of the AA and CRC groups was similar, while the NC and NAA groups showed significant differences compared to the CRC group. A study in Austria analyzed the serum metabolome of adenomas (excluding healthy controls) and found no significant difference in the composition of metabolites between patients with AAs and NAAs28. In contrast, the present study found that NAAs and AAs can be partially distinguished based on serum metabolites, although the difference was not statistically significant. A larger sample size or more precise targeted metabolomics may better explain these differences.

The occurrence and development of CRC involve multifactorial accumulation and slow progression6. AAs are important intermediate nodes in terms of serum metabolites. Patients with AA and CRC had significant changes in 33 metabolites compared to patients with NAA; these metabolites may indicate or affect the deterioration of NAAs. Altered metabolic functions mainly involved glycerophospholipid metabolism, the sphingolipid signaling pathway, and caffeine metabolism. The metabolism of glycerophospholipids may be a hallmark of NAA deterioration. By analyzing the metabolomic differences among stage II CRC tissues, adjacent tissues, and normal tissues, Lin et al. found a significant relationship between glycerophospholipids and CRC development29. In a prospective cohort of 250 patients with CRC, dysregulation of glycerophospholipids in plasma was identified as a factor that may increase the risk of CRC30. Our results suggest that metabolic abnormalities in glycerophospholipids may be observed in the serum of patients with AAs at an earlier stage of the disease. Cancer cells must continuously generate glycerophospholipids for the production of cell membranes to match their high proliferative capacity, which may explain the increased glycerophospholipid content in the serum of patients with AAs31. In addition, abnormal activity of the sphingomyelin metabolic pathway is one of the important serum metabolite characteristics during the deterioration of non-advanced adenomas (Fig. 2, Table 2). Elevated sphingomyelin in the cell membrane reduces membrane fluidity and permeability while increasing rigidity and strength32. This, to some extent, leads to immune suppression and abnormal proliferation. Moreover, abnormal accumulation of sphingomyelin can lead to a decrease in the production of ceramides, which are important signaling molecules in cancer biology and can affect the apoptosis, proliferation, migration, aging, and inflammation of tumor cells33,34.

Unlike that of glycerophospholipids and sphingolipids, the role of caffeine in CRC occurrence and development remains controversial. Caffeine can interact with signaling pathways such as transforming growth factor beta, phosphoinositide 3-kinase/AKT/mammalian target of rapamycin, and mitogen-activated protein kinase, playing an important role in the onset, metastasis, and prognosis of CRC. In addition, caffeine can act as an antioxidant to protect cells from oxidative stress and as a cell cycle regulator that regulates the DNA repair system35. The controversy over the role of caffeine in the progression of CRC may stem from the spatiotemporal heterogeneity of the disease progression cycle, where it may play a role in specific processes of adenoma deterioration during advanced stages.

Several studies have constructed early diagnostic models for CRC based on serum metabolites20,21,22,23,24. However, the classification in these models has certain shortcomings, and the developmental stages of colorectal adenomas are not further divided. As the risk of colorectal cancer differs significantly between patients with AAs and those with NAAs8, developing a convenient diagnostic model that can distinguish AAs from NAAs is necessary. Therefore, this study used LASSO to screen for variables in serum metabolites and constructed models to distinguish the NC/NAA groups from the AA/CRC groups based on multiple machine learning algorithms. The RF model performed best in the training set (AUC = 1.000, 95% CI [1.000–1.000]) and achieved an AUC of 0.685 (95% CI [0.540–0.830]) in the test set, with PI (18:1 [9Z]/18:1 [9Z]) and SM (d18:1/22:0) being the highest contributing metabolite variables. These results suggest that the serum metabolite diagnostic model based on PI (18:1 [9Z]/18:1 [9Z]) and SM (d18:1/22:0) has the potential to be used as a supplementary diagnostic tool to provide additional information about the disease status of patients with colorectal adenoma and to inform corresponding prevention and treatment measures.

Serum metabolites show specificity at different stages of the development of colorectal adenoma into CRC. Significant differences in the composition of serum metabolites were observed between the NC/NAA groups and CRC group, whereas the serum metabolite compositions of the AA and CRC groups were similar. Auxiliary diagnostic measures based on serum metabolites have the potential to guide the follow-up and treatment of patients with colorectal adenomas.

Limitation

Our study should be considered preliminary, as research based on a single cohort may be affected by selection and analytical biases. Independent validation in a larger, multicenter, prospective trial is necessary to explore the diagnostic performance under real-world conditions. Such large studies reduce the effect of confounding factors (e.g., age and comorbidities) that may limit the model’s applicability or reduce its overall performance. In addition, the sources and biological functions of the 33 serum metabolites related to the development of colorectal adenomas require further in vitro and in vivo experiments for interpretation.

Methods

Patients and specimen collection

This study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Ningbo University (KS202111002) and conducted in accordance with relevant guidelines. All participants and/or their legal guardians provided informed consent. Patients who visited the Gastroenterology Department of the First Affiliated Hospital of Ningbo University between December 2021 and January 2023 were enrolled in the study. The inclusion criteria were as follows: (1) age > 18 years and complete medical record information; (2) satisfactory bowel preparation and examination process and successful removal of tumor-like tissue; (3) clear staging of tissue pathological biopsy; and (4) provision of informed consent for this study. The exclusion criteria were as follows: (1) familial polyposis of the colon, Peutz-Jeghers syndrome, and other malignant tumors of the digestive system; (2) receipt of antibiotics, immunosuppressants, or probiotics within 2 months prior to enrollment; (3) chronic diseases such as hypertension and diabetes; and (4) inability to actively cooperate. Patients with lesions < 1 cm in diameter and pathological results of tubular adenoma with or without low-grade intraepithelial neoplasia were included in the NAA group; patients with lesions ≥ 1 cm in diameter and pathological results of tubular adenoma with or without low-grade intraepithelial neoplasia or tubulovillous adenoma or villous adenoma were included in the AA group; and patients with pathological results of high-grade intraepithelial neoplasia/submucosal carcinoma/submucosal superficial carcinoma were included in the CRC group.

Three milliliters of blood from each participant were centrifuged at 8000 rpm for 8 min, and the upper serum samples were collected. The serum samples were stored at -80°C until metabolomic testing was performed. Clinical data such as routine blood tests, biochemistry, coagulation function, tumor markers, pathological results, and colonoscopy reports were collected from electronic medical records.

Metabolite extraction

One hundred microliters of each serum sample was transferred to an EP tube, and 400 μL of extraction solution (methanol:acetonitrile = 1:1 (V/V), with isotope-labeled internal standard mixture) was added. Subsequently, the solution was mixed by vortexing for 30 s, followed by sonication for 10 min in an ice-water bath. The mixture was left to stand at -40°C for 1 h and then centrifuged for 15 min at 4°C and 12,000 rpm (centrifugal force 13,800 × g, radius 8.6 cm). The supernatant was then transferred to a sample vial for analysis. An equal amount of supernatant was collected from all samples and mixed to form a quality control (QC) sample for instrumental analysis.
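The stated centrifugation settings can be cross-checked with the standard conversion between rotor speed and relative centrifugal force, RCF = 1.118 × 10^-5 × r × rpm^2, with r in centimeters. The short Python sketch below is an illustration only (the function name `rcf_from_rpm` is hypothetical); with the reported 12,000 rpm and 8.6 cm radius it gives approximately 13,845 × g, which agrees with the reported 13,800 × g after rounding.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) from rotor speed and radius.

    Uses the standard conversion RCF = 1.118e-5 * r(cm) * rpm^2.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


# Settings reported for metabolite extraction: 12,000 rpm, 8.6 cm radius.
rcf = rcf_from_rpm(12_000, 8.6)
print(f"{rcf:.0f} x g")
```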

Metabolite analysis

A Vanquish ultra-high-performance liquid chromatography system (Thermo Fisher Scientific) with a Waters ACQUITY UPLC BEH Amide column (2.1 mm × 100 mm, 1.7 μm) was used for chromatographic separation of the target compounds. Mobile phase A was aqueous and contained 25 mmol/L ammonium acetate and 25 mmol/L ammonium hydroxide; mobile phase B was acetonitrile. The temperature of the sample tray was 4°C, and the injection volume was 2 μL. An Orbitrap Exploris 120 mass spectrometer was used to acquire primary (MS) and secondary (MS/MS) mass spectrometry data under the control of Xcalibur software (version 4.4, Thermo). The parameters were as follows: sheath gas flow rate, 50 Arb; auxiliary gas flow rate, 15 Arb; capillary temperature, 320°C; full MS resolution, 60,000; MS/MS resolution, 15,000; collision energy, 10/30/60 in NCE mode; and spray voltage, 3.8 kV (positive) or -3.4 kV (negative).

Data processing

The raw data were converted into mzXML format using ProteoWizard software and processed using a self-developed R package (with an XCMS kernel) for peak recognition, extraction, alignment, and integration. The processed data were matched against the self-built BiotreeDB (V2.1) secondary mass spectrometry database for substance annotation, and the cutoff value for algorithm scoring was set to 0.3. OmicsBean was used for pathway enrichment analysis, and MetaboAnalyst was used for principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). Permutational multivariate analysis of variance (PERMANOVA) was used to evaluate the differences in metabolomics between groups. A paired non-parametric Wilcoxon test was used to identify metabolites with significant changes, and P < 0.05 was considered statistically significant.
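To make the R2 and P values reported for PERMANOVA easier to interpret, the following Python sketch outlines how the statistic can be computed. It is a simplified illustration and not the software used in this study: it assumes squared Euclidean distances (the distance measure used by the authors is not stated here), and the function names and the toy data are hypothetical. R2 is the share of the total sum of squared distances explained by group membership, and the P value is the proportion of label permutations whose pseudo-F is at least as large as the observed one.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def permanova_stats(samples, groups):
    """Return (R2, pseudo-F) for one assignment of samples to groups."""
    n = len(samples)
    levels = sorted(set(groups))
    # Total sum of squares: squared pairwise distances divided by N.
    ss_total = sum(sq_dist(samples[i], samples[j])
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in levels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(sq_dist(samples[i], samples[j])
                         for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_between = ss_total - ss_within
    r2 = ss_between / ss_total
    pseudo_f = (ss_between / (len(levels) - 1)) / (ss_within / (n - len(levels)))
    return r2, pseudo_f


def permanova(samples, groups, n_perm=999, seed=0):
    """Permutation test: shuffle group labels and compare pseudo-F values."""
    rng = random.Random(seed)
    r2, f_obs = permanova_stats(samples, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, f_perm = permanova_stats(samples, shuffled)
        if f_perm >= f_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return r2, f_obs, p_value


# Hypothetical toy data: two well-separated groups of three samples each.
toy_samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
               [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_groups = ["A", "A", "A", "B", "B", "B"]
toy_r2, toy_f, toy_p = permanova(toy_samples, toy_groups)
```

In the toy example nearly all of the variation lies between the groups, so R2 is close to 1; with only six samples the smallest attainable P value is limited by the number of distinct label arrangements.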

Construction of the diagnostic model

All clinical samples were randomly divided into training and testing sets in a 2:1 ratio, and least absolute shrinkage and selection operator (LASSO) regression analysis was used to screen variables for the samples in the training set. Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Neural Network Classifier (NNET), Random Forest (RF), and Extreme Gradient Boosting (XGB) algorithms were used to construct diagnostic models and draw receiver operating characteristic (ROC) curves. Diagnostic performance was tested in the test set. These steps were performed using R (4.3.2).
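The random 2:1 division described above can be sketched as follows. This Python snippet is illustrative only and is not the authors' R code; the function name `split_two_to_one`, the seed, and the rounding rule are assumptions, so the resulting set sizes need not match those reported in the Results. The point it shows is that the split is made once, before variable screening, so that LASSO selection and model fitting use the training set only and the test set remains untouched until evaluation.

```python
import random


def split_two_to_one(sample_ids, seed=42):
    """Randomly split sample identifiers into training and test sets (about 2:1)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)      # reproducible shuffle (seed is arbitrary)
    n_train = round(len(ids) * 2 / 3)     # assumed rounding rule
    return ids[:n_train], ids[n_train:]


# 142 participants in total (80 NC/NAA and 62 AA/CRC); IDs here are placeholders.
train_ids, test_ids = split_two_to_one(range(142))
```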