Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and is responsible for the second-highest number of cancer-related deaths worldwide1. Recurrence following surgery is the leading cause of morbidity and mortality in patients with CRC. The prognosis of CRC is highly favorable when the disease is detected and treated in its early stages and when the tumor is confined to the bowel wall. Neoadjuvant treatment approaches have been developed to reduce disease burden and, thereby, the risk of recurrence2,3.

In recent years, several factors have been recognized as pivotal in estimating the risk of recurrence before surgery. Approaches such as determining the lymphocyte infiltration and assessing the transcriptional profile of the tumor microenvironment (TME) have been associated with oncological outcomes4,5,6. Particularly, Immunoscore, a score that quantifies lymphocyte infiltration in invasive and central tumor regions, has been identified as a prognostic marker in CRC while also being predictive of neoadjuvant therapy efficacy in rectal cancers7. This requires access to the tumor, which may be cumbersome or not available. Being able to phenotype the tumors without access to the tumor may thus be beneficial in the neoadjuvant setting.

Circulating tumor DNA (ctDNA) levels before surgery have been shown to be prognostic factors for recurrence8,9. However, ctDNA analysis often requires tumor-informed analysis, necessitating access to tumor tissue for analysis. Recently, the TriMeth test, a tumor-agnostic ctDNA detection approach targeting hypermethylation of short regions of the C9orf50, KCNQ5, and CLIP4 genes, was shown to have high sensitivity and specificity for the blood-based detection of colorectal cancer10. During discovery and validation of the TriMeth test, it was also established that 100% of CRC tumors showed hypermethylation of at least one of the three genes10. Also, TriMeth was found to be highly prognostic in the postoperative setting after curative-intended cancer surgery for metastatic CRC11. However, the TriMeth test has not been linked to any transcriptional changes in the TME before.

To further enlighten the phenotype of ctDNA shedding tumors, we have analyzed the association between the transcriptional TME profile, including both cancer and non-cancer cells, immune cell invasion of the tumor, and the ctDNA status, determined by the TriMeth test. This may provide essential insights in the neoadjuvant setting, which potentially could guide targets at the tumor level and possibilities for measuring treatment response.

Results

Patient characteristics

Samples were available from 140 patients for ctDNA analysis, immunohistochemistry (IHC), and targeted gene expression analyses (Fig. S2). The patients had a median age of 69 years (range, 42–86 years), and 41.4% were females. A total of 16 (11.4%) patients had a mismatch repair (dMMR) deficiency. The baseline characteristics of the patients are presented in Table 1. Preoperative clinical biochemistry variables are available in Supplementary Table S1. Patients had a follow-up period of a median 4.6 years (IQR 3.8–5,6 years). In our cohort, 19 out of 140 patients (13.6%) were diagnosed with a recurrence within the follow-up period.

Table 1 Baseline characteristics

Expression of immune-related functional pathways in patients experiencing recurrence

First, we investigated whether the transcriptional profile of the primary TME was associated with recurrence. To analyze this, a targeted gene expression assay (n = 750 genes) was utilized to identify patient clusters and gene programs (GPs) associated with recurrence, and these findings were validated in the Cancer Genome Atlas (TCGA) colon and rectal adenocarcinoma (COADREAD) cohorts (Fig. 1a)12.

Fig. 1: Association between mRNA expression analysis of tumor samples and recurrence.
Fig. 1: Association between mRNA expression analysis of tumor samples and recurrence.The alternative text for this image may have been generated using AI.
Full size image

a Overview of the analysis approach. b Heatmap of scaled expression of all 750 endogenous genes. Each column designates a patient, and each row expression. Patients were clustered in C1-3 through an unsupervised k-means approach, with three clusters chosen based on the elbow and silhouette methods. Gene programs 1–6 were discovered based on visual inspection and the identified pathways. c Validation of the identified pathways in the 411 patients with non-metastatic CRC study cohort in the public TCGA COADREAD cohort. Only the GPs significantly associated with recurrence in the study cohort are depicted. Normalized mean mRNA expression in the pathways is depicted as boxplot showing median, upper, and lower quartiles. Whiskers extend into a max of 1.5 times the interquartile range. c Patient cluster. COADREAD Colon and rectal adenocarcinoma cohort, EMT Epithelial to mesenchymal transition pathway, GP Gene program, GO Gene ontology, HM Hallmarks of cancer, IFNA Interferon alpha signaling pathway, IFNG Interferon gamma signaling pathway, NS NanoString, PI3K PI3K-Akt-mTOR signaling pathway, TCGA The Cancer Genome Atlas, TIL1 Tumor-infiltrating lymphocytes pathway 1, TIL2 Tumor-infiltration lymphocytes pathway 2, TNFA TNF alpha signaling via NF-kappa beta signaling pathway.

Initially, we investigated the optimal number of patient clusters using the elbow and silhouette methods (Fig. S2) and clustered the patients with an unsupervised k-means clustering approach. The optimal number of patient clusters was deemed to be c = 3, and was associated with recurrence (fisher’s exact test, p = 0.015), suggesting that these patients had different TME phenotypes. Patients with recurrence were over-represented in clusters one (C1) and three (C3), n = 13/62 (21.0%) and 5/36 (13.9%), respectively. Cluster two (C2) included only one patient with recurrence, n = 1/42 (2.4%) (Fig. 1b). After identifying these clusters, we discovered 6 GPs with different gene expression levels between patient clusters (gene names, raw expression, and normalization variables are available in Supplementary Data file 1). A gene set enrichment analysis was performed, which identified several cancer- and immune-related pathways being represented in each GP. GP1 included pathways such as Hallmarks (HM) of Cancer Hypoxia, epithelial to mesenchymal transition, and Angiogenesis, all associated with the ability of cancer cells to migrate and metastasize13,14. GP2 included the disrupted cell division-associated pathways, HM G2M Checkpoint and E2F Targets, and the TNF Alpha signaling pathway associated with cancer proliferation15. GP5 included the cancer proliferation-associated pathways PI3K and p5316. GP3 included the immune system-related gene ontology (GO) pathways of lymphocyte activation, immune response, and adaptive immune response pathways. GP6 included the NanoString (NS) tumor-infiltrating lymphocytes (TIL) 1 and 2 pathways17, the NS Cytotoxic pathway, and the HM interferon gamma and alpha pathways. GP4 included the cancer differentiation-associated pathways HM, TGF Beta, and NOTCH signaling pathways18,19. In Fig. S3 a functional enrichment score analysis of the identified patient clusters and GPs is depicted, showing that C1 and C2 differ significantly in normalized expression of genes related to GP1, GP2, GP3, GP4, and GP6. Moreover, we notice that C2 and C3 are similar in GP1 and GP6, while C3 and C2 differ in all GPs but GP3.

In a functional enrichment score analysis (Fig. S4), the cancer metastasis-related GP1 was significantly increased (p = 0.010), and the cytotoxic activity-related GP6 (p = 0.006) was significantly decreased in patients with recurrence. A trend was evident for GP3 (p = 0.06) and GP5 (p = 0.08).

To externally validate our findings, we overlaid the identified GPs with the public TCGA COADREAD cohort. After curation, 411 patients with non-metastatic CRC and bulk RNA sequencing data remained. In a functional enrichment score analysis, the immune activity GP6 remained statistically significantly associated with recurrence in the TCGA cohort (Fig. 1c).

Association between ctDNA and a metastatic transcriptional phenotype in the tumor

Next, we investigated whether the preoperative ctDNA test results were associated with the transcriptional signatures of the primary TME and recurrence-free survival (Fig. 2a). The ctDNA test result was considered positive if two or more methylation markers (methC9orf50, methKCNQ5, and methCLIP4) were detected. The ctDNA test or its individual components were not associated with recurrence in a fisher’s exact test (Fig. 2b) nor in a log-rank test with recurrence-free survival as an outcome (Fig. 2c, d), although a trend was evident.

Fig. 2: Association between the ctDNA test with recurrence and the tumor microenvironment.
Fig. 2: Association between the ctDNA test with recurrence and the tumor microenvironment.The alternative text for this image may have been generated using AI.
Full size image

a Overview of the analysis approach. The ctDNA test comprised three different methylation markers, methKCNQ5, methC9orf50, and methCLIP4, and is considered positive if two or more markers are detected. b Association between the ctDNA test and its individual markers with recurrence. c, d Association between the ctDNA test and the methylation marker methKCNQ5 with recurrence-free survival analyzed via a log-rank test. e Functional enrichment score analysis of the association between identified GPs. Normalized mean mRNA expression in the pathways is depicted as boxplot showing median, upper, and lower quartiles. Whiskers extend into a max of 1.5 times the interquartile range. cfDNA cell-free circulating DNA, GP Gene program, Neg Negative, Pos Positive, RFS Recurrence-free survival, TME Tumor microenvironment.

In a functional enrichment score analysis, five of the six identified GPs were significantly associated with the ctDNA test results (Fig. 2e). The cancer metastasis and proliferation-associated GPs 1, 2, and 5 were significantly upregulated in patients with a positive ctDNA test (p < 0.001, p = 0.046, and p = 0.046, respectively). The immune activity-related GP3 was significantly downregulated in patients with a positive ctDNA test (p = 0.012), with a similar result in the cancer differentiation-associated GP4 (p = 0.019). There was no difference in the cytotoxicity-associated GP6 (p = 0.27). These results show that the ctDNA test can inform of the transcriptional profile of the TME, with a significant association with a metastatic phenotype.

Immune infiltration correlates with immune-related functional pathways in the tumor

We then investigated if the infiltration of CD3+ and CD8+ lymphocytes in the tumor was associated with recurrence-free survival and the transcriptional TME signature (Fig. 3a). CD3+ and CD8+ lymphocyte densities were quantified in tumors using an IHC-based validated quantification app20. Immune infiltration was defined as the mean percentile density of each cell population. Next, unsupervised hierarchical clustering was performed that split our patient population at a value of 55, designating patients with an immune infiltration of 0–55 to be in the “low” and patients with an immune infiltration of 56–100 to be in the “high” group (Fig. S5).

Fig. 3: Association between immune infiltration of the tumors with recurrence and the tumor microenvironment.
Fig. 3: Association between immune infiltration of the tumors with recurrence and the tumor microenvironment.The alternative text for this image may have been generated using AI.
Full size image

a Overview of the analysis approach. The immune infiltration variable comprised of the mean percentile of the CD3+ and CD8+ lymphocyte density in the central tumor. b Association between the immune infiltration and recurrence. c Association between immune infiltration and recurrence-free survival. d Functional enrichment score of the identified pathways in gene set enrichment analysis. Normalized mean mRNA expression in the pathways is depicted as boxplot showing median, upper, and lower quartiles. Whiskers extend into a max of 1.5 times the interquartile range. GP Gene program, RFS Recurrence-free survival, TME Tumor microenvironment.

IHC-based immune infiltration was not significantly associated with recurrence or recurrence-free survival (Fig. 3b, c). Next, the association of immune cell infiltration on the TME was examined (Fig. 3d). In a functional enrichment analysis, the cancer metastasis-related GP1 and the cancer differentiation-related GP4 were significantly downregulated in patients with high immune infiltration (p < 0.001 and p = 0.001, respectively). The cytotoxicity-related GP6 was significantly upregulated in patients with high immune infiltration (p < 0.001), while a trend was evident for the immune activity-related GP3 (p = 0.071). There was no difference in expression levels of GP2 and GP5 (p = 0.241 and p = 0.291, respectively).

ctDNA status differentiates transcriptional phenotypes within clinical risk factors

Next, we investigated how the transcriptional phenotypes of clinical N stage, MMR status, and immune infiltration were impacted by either a positive or negative ctDNA test. First, we found a significant difference in a functional enrichment score analysis of the cancer metastasis-related GP1 between clinical N0 and N1 or N2 stage (p = 0.010) (Fig. 4a). Then, we investigated whether a positive or negative ctDNA test influenced the phenotypes within clinical N0, and we discovered a significant association between a positive ctDNA test with GP1 (p = 0.030), with no significant difference for clinical N1 or N2 (p = 0.480) (Fig. 4b). For MMR status there was no significant difference between dMMR and pMMR in GP1 (p = 0.079) or the immune activity-related GP3 (p = 0.525) (Fig. 4c). Within pMMR a significant difference between ctDNA negative and ctDNA positive was evident for GP1 (p < 0.001) with a trend evident for GP3 (p = 0.069) (Fig. 4d). Within dMMR, no significant difference was seen between ctDNA positivity or negativity within GP1 (p = 0.117), but a significant difference was determined within GP3 (p < 0.001). For immune infiltration, we had previously seen a significant difference between immune infiltration low and high in GP1 with a trend evident in GP3 (Fig. 3d). Within the immune infiltration low group, ctDNA positivity was significantly associated with higher expression of genes related to GP1 (p < 0.001) and significantly associated with lower expression of genes related to GP3 (p = 0.012). Within the immune infiltration high group, ctDNA positivity was significantly associated with GP1 (p = 0.026), with no significant association with GP3 (p = 0.486).

Fig. 4: Functional enrichment score analysis of ctDNA status and different transcriptional phenotypes within clinical risk factors associated with recurrence.
Fig. 4: Functional enrichment score analysis of ctDNA status and different transcriptional phenotypes within clinical risk factors associated with recurrence.The alternative text for this image may have been generated using AI.
Full size image

a Association between clinical N stage and the GP1 phenotype. b Association between ctDNA status and clinical N stage with the GP1 phenotype. c Association between MMR status and the GP1 and GP3 phenotype. d Association between ctDNA status and MMR status with the GP1 and GP3 phenotype. e Association between ctDNA status and immune infiltration with the GP1 and GP3 phenotype. Numbers in parentheses denote the number of patients in each group. Normalized mean mRNA expression in the pathways is depicted as boxplot showing median, upper, and lower quartiles. Whiskers extend into a max of 1.5 times the interquartile range. dMMR Deficient MMR status, GP1 Metastatic phenotype, GP3 Immune activation phenotype, Imm Immune infiltration, N Clinical N stage, pMMR Proficient MMR status.

ctDNA levels are attributed to high importance in an elastic net regression model with recurrence as the outcome

We investigated how the different variables performed in an elastic net regression model with recurrence as the outcome. To determine the relative importance of ctDNA, we investigated how the models performed with or without available DNA data (Fig. 5a) (input variables and model settings are in Materials and Methods, and Data file 1, Table S2, ROC curves are available in Fig. S6). We chose an elastic net regression model because it is suitable for datasets where the number of variables is close to or more than the number of patients21. It handles multicollinearity well, which was suited for our dataset, as several pathways were correlated with ctDNA, immune infiltration, and recurrence. With this approach, redundant and correlated variables would be either reduced or removed from contributing to the model. The model could, therefore, suggest which variables from our dataset were relevant for predicting recurrence.

Fig. 5: Prediction of recurrence using an elastic net regression model approach.
Fig. 5: Prediction of recurrence using an elastic net regression model approach.The alternative text for this image may have been generated using AI.
Full size image

a Overview of the design of the two elastic net regression models. b, c Variable importance plot of the 20 most important variables in the model with all data and without DNA data. cfDNA cell-free circulating DNA, ctDNA cell-free circulating tumor DNA, CVD Cerebrovascular disease, GP Gene program, HM Hallmarks of Cancer, NS NanoString.

In the model with the full dataset, clinical, biochemistry, pathway, immune infiltration, and circulating DNA data were represented among the 20 most important variables (Fig. 5b). Whether methylation of the KCNQ5 gene was positive or negative was attributed the highest importance, while methC9orf50, ctDNA positivity, and cfDNA variables were also present. IHC data were represented by the high/low immune infiltration variable and the CD3+ lymphocyte density, while the transcriptional signatures GP5 and GP4 were also present. Among the clinical variables, MMR status, history of different comorbidities, and clinical N status were represented, with creatinine, hemoglobin, and potassium levels likewise assigned an importance.

When we trained the model with no circulating DNA data, the immune infiltration variable increased in importance (3.9 vs. 6.1, Fig. 5c). Alongside, GP5, GP1, and GP6 were also assigned an importance. Interestingly, now the MMR status and history of several comorbidities were highly important for estimating recurrence.

Discussion

Our study demonstrated that preoperative analysis using ctDNA can provide important information regarding the TME. Some TME phenotypes were significantly correlated with recurrence, and high immune infiltration was correlated with immune-related pathways in the TME.

Other studies have demonstrated that blood methylation analysis informs the detection and prognosis of several cancers10,11,22, with some based on whole genome methylation analysis. In our study, we utilized the validated TriMeth test, which has a simple, quick, and cheap lab workflow and straightforward interpretation of results, increasing its potential clinical usage.

Access to tumor tissue can be cumbersome for patients in the preoperative period. Thus, blood-based testing has gained much attention as a minimally invasive approach. Among blood markers, the neutrophil-to-lymphocyte ratio has been indicated to be prognostic in patients with advanced-stage cancers and negatively correlated with immune infiltration23. ctDNA has generally been utilized to detect minimal residual disease after curative-intent surgery, as a biomarker of recurrence, or for use in targeted therapies8,24,25. Few studies have investigated the correlation between cfDNA and its correlated features in the TME, besides the early signal of cfDNA being correlated with the stage and size of the cancer24,26. cfDNAs have been recognized as a damage-associated molecular pattern of toll-like receptor 9 (TLR-9), linked with the hosts defense against microbial products27. Activation of these receptors attracts, among other things, tumor-associated macrophages to the TME, thus inhibiting the function of adaptive immune cells28,29. Although the origin of cfDNA is still debated30, tumors with a phenotype of high yield of cfDNA and ctDNA in blood may reversely alter the TME to a more metastatic phenotype, which could explain the association to the GPs associated with a more aggressive cancer phenotype. This could ideally be tested in organoid or in vitro models where they are exposed to a medium with either high or low cfDNA/ctDNA concentrations.

The ability of ctDNA to distinguish phenotypes within clinically established risk factors of recurrence was interesting, especially as we showed a significant difference in the immune activity-related GP3 within the dMMR group, with a negative ctDNA test associated with a more immunologically active tumor. This is interesting, as although the high efficacy of immune checkpoint inhibitor treatment in dMMR CRC is evident, still some patients do not have a complete response. Based upon our findings, the ability of ctDNA to stratify responders/non-responders to immune checkpoint inhibitor treatment should be investigated further.

We wanted to investigate how ctDNA, in combination with the other variables, ranked in a prediction model with recurrence as the outcome. In the model with all available data, the ctDNA variables were of high importance, while only the immune infiltration and CD3 density were among the top 20 most important variables. This underlines the importance of ctDNA in determining the phenotype of the tumors. However, we also noted that the model still utilized some GPs associated with ctDNA, which suggests that multiple modalities are relevant. This was also underlined by the several clinical factors that persisted across the two models, which advocate that the patient should be represented by a phenotype based on medical history and different translational methods to capture and predict recurrence reliably. When DNA data was removed from the dataset, the immune infiltration variable and the GPs associated with metastasis increased in importance. This suggests that although immune infiltration is important, a more profound phenotypic representation of the TME is valuable for predicting recurrence. Interestingly, this difference in the assigned importance to pathways associated with ctDNA underlines that ctDNA is potentially useful as a surrogate marker of the TME.

One central consideration of our results is that some patients received adjuvant chemotherapy after surgery. This could mean that even though a patient had a phenotype that would result in recurrence, this could have been abrogated by adjuvant chemotherapy treatment. This represents a common limitation in using preoperative and pre-adjuvant chemotherapy samples.

Although our study presents intriguing results, there are some limitations. We excluded patients if not all samples and analyses were available, which could have biased our results. We identified GP1, GP3, and GP6 as significantly associated with recurrence in our study cohort, while we only found this association with GP6 in the TCGA cohort. Finally, the elastic net regression model in our study is not suited for clinical use due to the low number of patients, but it only serves to showcase the behavior when ctDNA data is present or not.

Recently, the biomarker-directed phase 2 trial KEYNOTE-495 showed that prospective characterization of a tumor-based T-cell-inflamed gene expression profile and tumor mutational burden was possible before initiation of first-line combination pembrolizumab treatment in advanced non-small cell lung cancer and could predict responders to therapy31. Future studies should investigate whether ctDNA could be utilized to select patients for neoadjuvant therapy.

This study shows that preoperative tumor-agnostic analysis of ctDNA is associated with the TME and could identify patients with a metastatic phenotype.

Methods

Patients

The study consecutively included patients who met the following inclusion criteria: age > 18 years, non-metastatic clinically suspected or histologically verified colon or rectal adenocarcinoma, no indication for neoadjuvant treatment, and scheduled for curative-intent surgery. Exclusion criteria were previous or concurrent cancer or synchronous tumors. Written informed consent was obtained from all study participants. This study was approved by the Danish Regional Committee on Health Research Ethics (SJ-829 and 1-10-72-223-14) and was conducted in accordance with the Declaration of Helsinki.

Study design

The study was a prospective cohort study, with consecutive inclusion of patients adhering to the inclusion and exclusion criteria from September 2016 to September 2019 from the Department of Surgery, Slagelse Hospital. All patients underwent curative-intent surgery.

Sample collection

Blood samples were collected within 40 days before surgery. Blood samples were centrifuged at 3000 g for 10 min at room temperature, and plasma was isolated. The plasma was centrifuged using the same approach and frozen at −80 °C until analysis. Tumor samples were collected from the resected specimen containing both the invasive margin and the central part of the tumor and stored as formalin-fixed paraffin-embedded (FFPE) blocks.

cfDNA isolation and quantification

Plasma was thawed at room temperature, and cfDNA was extracted from 8 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen) following the manufacturer’s instructions. Purified cfDNA was eluted in LoBind tubes (Eppendorf) in a volume of 60 µL and stored at −80 °C until further use. cfDNA was quantified by ddPCR as previously described10,32.

ddPCR assays

cfDNA in plasma was quantified using an assay targeting a reference gene located on chromosome 1 (the “CF assay”), which rarely shows copy number alterations in cancer. This assay amplifies a cytosine-free genomic region, allowing it to be used before and after sodium bisulfite conversion. Thus, the cfDNA recovery can be estimated as the measured “CF quantity” after bisulfite conversion divided by the “CF quantity” before bisulfite conversion.

Methylation-based ctDNA detection included specific primers and probes targeting the fully methylated, bisulfite-converted regions of the C9orf50, KCNQ5, and CLIP4 genes. The identification, optimization, and validation of the markers were described in previous studies10,32. All samples were analyzed using two duplex reactions (C9orf50 + KCNQ5 and CLIP4 + “CF-assay”), i.e., the sample volume was divided into two reactions.

ddPCR setup

All ddPCR experiments were conducted on the Droplet Digital PCR System (Bio-Rad) according to the manufacturer’s instructions. The reaction mix included 8 µL template cfDNA, 18 pmol forward primer, 18 pmol reverse primer, 5 pmol probe, 2x Supermix for Probes (no UTP; Bio-Rad), and AccuGENE™ Molecular Biology Water (Lonza) to a final volume of 22 µL. Droplets of 1 nL were generated on the QX200 AutoDG Droplet Generator (Bio-Rad). After droplet generation, the samples were amplified by PCR on an S1000 Thermal cycler (Bio-Rad) using the following program: 95 °C for 10 min, 45 cycles of 95 °C for 30 s, and 56 °C for 1 min, and one final cycle of 98 °C for 10 min. PCR products were stored at 4 °C for up to 18 h before analysis on a QX200 reader (Bio-Rad). Positive, negative, and no-template controls (NTC) were included for each assay. For methylation-based ctDNA detection, the positive control was 5 ng methylated bisulfite-converted DNA, the negative control was 66 ng non-methylated bisulfite-converted DNA (Zymo Research), and a water sample was NTC.

Sodium bisulfite conversion of cfDNA

Before cytosine conversion, cfDNA was dried using vacuum centrifugation at 30 °C (speedVac, Concentrator plus 5350, Eppendorf AG) and resuspended in 20 µL AccuGENE™ Molecular Biology Water (Lonza). cfDNA was sodium bisulfite converted using the EZ-96 DNA Methylation-Direct™ MagPrep kit (Zymo Research) either manually or automated on a Zephyr robot according to the manufacturer’s instructions, but with the following volumes of reagents: 60 µL CT conversion reagent, 280 µL M-Binding buffer, 5 µL Mag-Binding Beads, 185 µL M-Wash buffer, 93 µL M-Desulphonation Buffer, and 25 M-Elution Buffer. Fully methylated and fully unmethylated DNA control samples were included in all batches as positive and negative controls, respectively. Reactions were performed on an S1000 Thermal Cycler (Bio-Rad). All cfDNA was analyzed directly after bisulfite conversion.

Methylation-based ctDNA analysis

The raw fluorescence intensity data for all individual droplets in each well were extracted using Quantasoft v1.7 software (Bio-Rad) and analyzed plate-wise as previously described10,32. Briefly, fluorescence data from a fully methylated positive control sample on each plate were used to identify fluorescence maxima for the negative and positive droplet populations. A Gaussian kernel density was applied to identify exactly two maxima and one minimum. All samples on each plate were subsequently normalized to the median fluorescence of the negative population from the positive control. The fluorescence threshold for calling droplets positive or negative was finally set for all wells using a stringency parameter, beta, of 0.15, thus including 85% of the data after the threshold point set at the minimum point between the negative and positive populations as defined by the positive control sample. The concentration c (copies per well) of methylated DNA was calculated as c = -N*ln(1-P/N), where N is the total number of droplets and P is the number of positive droplet33. The code in the R language is available on GitHub (https://github.com/MOMA-CRC/ddanalyzor.git). A previously developed and validated algorithm was used for classifying samples as ctDNA positive or negative10,11,32,34. The validated algorithm has recently been tested on more than 330 CRCs and 800 non-cancer controls and delivered a specificity of >98.5% and a sensitivity ranging from 68 to 100%, dependent on patient UICC TNM stage34. The algorithm classifies a sample as “ctDNA positive” if at least two of three methylation markers shows a positive signal above the threshold, otherwise the sample is “ctDNA negative” 10.

Immunohistochemistry and digital quantification of infiltrating CD3+ and CD8+ lymphocytes

Sections for downstream analysis of CD3+ and CD8+ lymphocytes with a thickness of 4 µm were cut from FFPE tissue blocks. Staining was conducted using the following primary antibodies: anti-Cytokeratin clone BS5 (Nordic Biosite Aps, Denmark, # BSH-7124-1), anti-CD3 clone LN10 (Leica/Triolab AS, Denmark, # NCL-L-CD3-565), and anti-CD8 clone C8/144B (Agilent/Dako, Denmark, # GA623). The double-labeling staining procedure was performed on the automated instrument Omnis (Agilent/Dako, Denmark). To begin, antigen retrieval was carried out with EnVision™ FLEX Target Retrieval Solution, High pH (Agilent/Dako, Denmark, # GV804), and subsequently, the slides were incubated with primary antibodies, CD3 (1:50) or CD8 (Ready-to-Use), for 30 min at 32 °C. The reactions were then detected using the standard polymer technique EnVision™ FLEX /HRP Detection Reagent (Agilent/Dako, Denmark, # GV800/GV821) and visualized using the EnVision™ Flex DAB+ Chromogen system (Agilent/Dako, Denmark, # GV825) according to the manufacturer’s instructions. In the second sequence, the slides were incubated with anti-Cytokeratin (1:800) using the same protocol as described above, except the reactions were visualized with the EnVision™ Flex Magenta Chromogen system (Agilent/Dako, Denmark, # GV900). Finally, the sections were counterstained with hematoxylin and mounted using Pertex mounting media (Pertex™/Histolab, Sweden, #00801-EX). We used a Nanozoomer s60 slide scanner (Hamamatsu, Japan) and processed the digital images using the Visiopharm Quantitative Digital Pathology software (Visiopharm, Denmark). We used an in-house developed and validated application to quantify the densities of CD3+ and CD8+ lymphocytes, described in detail elsewhere20. We used only the densities of CD3+ and CD8+ lymphocytes measured in the central part of the tumor.

Gene expression panel

We used the same FFPE block as for IHC analysis. We extracted total RNA from 10 µm sections of FFPE slides using the High Pure FFPET RNA Isolation Kit (Roche Life Science, Germany) as per the manufacturer’s instructions. The quantification of total RNA was performed using spectrophotometry (NanoDrop, Thermo Scientific, USA), and the quality assessment of RNA was carried out using the Bioanalyzer (Agilent, Denmark). For each sample, approximately 300 ng of total RNA was utilized to ensure adequate determination of gene expression levels. To conduct RNA hybridization overnight, we employed the nCounter® IO360 panel consisting of 750 endogenous human transcripts for gene expression analysis (NanoString, USA). The sample acquisition was accomplished using the nCounter® system following the manufacturer’s instructions.

Gene expression data analysis

We used the nCounter platform to generate a raw gene count matrix and then performed an iterative quality control (QC) and normalization as described elsewhere35. Initially, technical sample quality was assessed, and several quality control measures were conducted. These included running principal component analysis (PCA), an Euclidean distance heat map, and a gene count histogram. Raw gene counts were normalized using an upper-quartile normalization approach, followed by a variance stabilizing transformation. Unwanted variation was estimated using the “RUVg” function from the RUVSeq package (v.1.32.0). To eliminate the estimated unwanted variation, the “removeBatchEffects” function from the limma package (v.3.56.1) was employed. Through iterative quality control and normalization procedures, including gender as a factor in normalization, a total of n = 3 vectors of unwanted variation were subsequently removed from the dataset.

To partition patients based on the transcriptional profile, the k-Means clustering approach was applied using “kmeans” function from the stats package (v.4.4.1.). The elbow method and average silhouette width were calculated for cluster k = 2–20 to estimate the optimal number of partitions in the dataset. Parameters for final sample clustering were k = 3. Genes were divided into n = 6 gene programs (GPs) using a hierarchical clustering algorithm, based on Euclidean distances and the farthest neighbor linkage. Optimal number of GPs was 6, determined by visual inspection of the different iterations of GPs. To generate a heatmap of the complete gene panel (n = 750), the “Heatmap” function from the ComplexHeatmap package (v.2.16.0) in R was employed36. Gene set enrichment analysis was carried out to assess the functional enrichment score of GPs. Gene set annotations were obtained from the Molecular Signatures Database v7.4, including gene sets from the Hallmark database37 and the “biological processes” (BP) from the Gene Ontology (GO) database38. In addition, NanoString panel-specific gene sets were retrieved from Danaher et al.17. To perform the gene set enrichment analysis, enrichment functions specific to each gene set annotation were utilized from the “clusterProfiler” package (v.4.8.1)39, such as “enrichGO” for the GO database. Pathways were considered for functional enrichment score analysis if they met the threshold criteria of an adjusted p-value < 0.05.

One sample was removed after QC and normalization. To identify GPs associated with recurrence, we iteratively investigated the visual distribution of high or low expression of genes in the different patient clusters with a higher or lower number of GPs. The functional enrichment score analysis was performed using the mean of genes represented in each GP, and the association with different categorical variables, such as recurrence, was analyzed with an unpaired t-test.

TCGA colon adenocarcinoma and rectal adenocarcinoma cohort

Baseline, pathological, follow-up, and RNA sequencing data for the TCGA colon and rectal adenocarcinoma (COADREAD) cohort were downloaded from the public National Cancer Institute GDC Data portal (portal.gdc.cancer.gov). Baseline, pathological, and follow-up data were merged, and only patients with non-metastatic disease with available RNA-sequencing data from tumor samples were retained. The recurrence event variable was derived from the progression-free interval (PFI) and the overall survival (OS) variable. PFI was defined as the period from the date of diagnosis until the first occurrence of a new tumor event (NTE), which could be the progression of disease, locoregional recurrence, distant metastasis, new primary tumor, or death with tumor12. As only patients with non-metastatic disease were retained, we considered PFI not to include disease progression. The recurrence variable was defined as equal to “Recurrence” if a patient experienced a PFI = 1 without death at the same date. This ensured that the recurrence event comprised only locoregional recurrence, distant metastasis, or a new primary tumor. Although a new primary tumor may be regarded as a censoring event rather than a recurrence, it was not possible to distinguish this in the data. The incidence of recurrence was comparable between TCGA data and the study cohort (14.5% vs. 13.5%). RNA-sequencing data were normalized by variance stabilizing transformation (standard DESeq2 workflow) (v.1.44.0)40 with additional correction for patient sex using the “removeBatchEffects” function from the limma package (v.3.56.1). To perform the functional enrichment score analysis, we only utilized the genes identified in our GPs from the study cohort. Then, we took the mean of genes represented in each GP in the RNA-sequencing data and tested the association with recurrence with an unpaired t-test.

Variables in the elastic net regression model

The variables utilized in the elastic net regression model are provided in the Supplementary Materials. All variables were considered numerical and were scaled around the mean with the standard deviation as variance, and variables with no variation were removed prior to model training. Variable importance plots were computed using the vip package (v.0.3.2).

Statistical considerations

The primary endpoint was the association between ctDNA and the TME. Exploratory outcomes were recurrence-free survival, defined as the time to recurrence (local or distant), and death after surgery. The end of the follow-up period was 01/01/2023. No formal sample size calculations were made. The log-rank test was used to analyze the time to recurrence between the groups. Fisher’s exact test was used to compare categorical variables, and unpaired t-tests were used to compare continuous variables between the groups. We used the Wald significance test within DESeq2 frame to compute the DE genes between the groups. All box plots are presented as median and interquartile ranges.

The tidyverse package (v.2.0.0) was used for the elastic net regression model. Patients were randomly split into training (80%) and test (20%) sets, stratified for incidence of recurrence. We applied five-fold cross-validation to the training set. The elastic net regression model was tuned in the training set to discover the best-performing hyperparameter settings (penalty and shrinkage) using a maximal entropy design with a grid of 500. The area under the receiver operator characteristic curve (AUROC) was used as a benchmark for model performance. After determining the optimal hyperparameters, the model was applied to the test set. For the model with all data, the optimal penalty was 0.35 and the mixture of 0.16. The area under the receiver operator curve (AUROC) was 0.68. For the model with no DNA data, the optimal penalty was 0.33 and the shrinkage of 0.14. The AUROC was 0.80. For performance metrics, 1000 bootstrapped runs were performed to determine 95% CI intervals. No data were missing. The outcome designated for the elastic net regression model was the incidence of recurrence. Statistical analyses were performed using R version 4.3.0. Multiple testing corrected (Bonferroni–Hochberg correction) and uncorrected values of p < 0.05, unless stated otherwise.