Introduction

Mutation of the downstream TGF-β-BMP ligand-dependent signalling pathway can result in context-dependent and paradoxical gain of function (GOF) of the cancer phenotype, i.e. ligand activation of the normally functioning signalling pathway exerts a tumour suppressive effect, yet following downstream pathway mutation, ligand activation can result in ‘oncogenic’ GOF activation, epithelial-mesenchymal transition (EMT) and potent promotion of the invasive and metastatic cancer phenotype1. It is well established that human cancer associated variants (mutations) of the TGF-β-BMP-SMAD4 pathway are correlated with poorer prognosis in enriched sub-types of intestinal cancers such as colorectal and pancreatic adenocarcinoma2. Despite the incorporation of whole genome sequencing, gene level RNA sequencing and proteomic biomarkers into supervised machine learning prognostic classifiers in colorectal cancer, there remains a paucity of pathway-specific gene expression biomarkers more closely linked to molecular mechanisms and indicative of pathway activity3,4. This background provided the impetus for further evaluation of the critically important TGF-β-BMP-SMAD4 pathway that, when disrupted, is associated with the worse prognostic sub-types of intestinal cancer. Here, we used murine conditional Apc dependent intestinal models to generate in vivo and in vitro adenoma with and without Smad4 pathway modification to prospectively identify phenotypes and context specific gene expression.

The TGF-β1 ligand acts as a dimer to bring together specific heterodimeric cell surface type I and II serine/threonine kinase receptors to activate signals by phosphorylation of R-Smad transcription factors. For example, TGF-β1 binds to TGFBR1 and TGFBR2 to signal through p-Smad2/3, but can also bind to activin receptor type 1A (ACVR1) to transduce signal through p-Smad1/55,6. Following receptor activation, phospho-R-Smads form either heterotrimeric or heterodimeric complexes with Smad4, the co-Smad that continuously shuttles between the cytoplasm and nucleus5,7. Smad4-R-Smad complexes bind with co-factors and DNA to co-ordinate gene expression6. Chromatin regions can alter quickly (within 10 min) following TGF-β1 stimulation, with assembled Smad complexes binding to cis-regulatory elements with AP-1 footprints8,9. Smad4 and R-Smad proteins consist of two highly conserved globular domains, named Mad homology domains, both at the N-terminus (MH1) and C-terminus (MH2) separated by a less well-conserved serine and proline-rich linker region critical for protein–protein interactions and stability6,9,10. The MH1 domain is involved in DNA binding through a conserved identical hairpin structure present in all members except Smad25,11,12. The majority of SMAD4 loss of function (LOF) cancer somatic mutations are missense, although nonsense, splice site, frameshift, and in-frame insertion/deletion mutations exist. In terms of protein localisation, frequent hot spot mutations occur within the conserved R-SMAD binding surface of the MH2 domain and in the MH1 domain along the DNA-binding interface13.

The specialised intestinal epithelium is comprised of crypt-villus functional cell units that provide an ideal experimental system for the investigation of specific components of the TGF-β-BMP-SMAD4 signalling pathway14. Differentiated cells in the crypt and villus comprise four epithelial lineages: enterocytes, goblet, enteroendocrine and Paneth cells14. Leucine-rich repeat containing G-protein-coupled receptor (Lgr5) positive crypt base columnar cells (CBC) are stem cells that interact with Paneth cells to maintain the stem cell niche through Wnt activation and β-catenin stability15,16. Lgr5 positive CBC cells divide, renew and create transit amplifying cells that undergo 4 to 5 cycles of cell division before differentiating as they ascend the villus. By using a Lgr5 driven Cre recombinase, genes with associated loxP sites can be randomly disrupted in CBCs once the Cre is activated. BMP signalling is active from the crypt-villus border towards the villus tip and is controlled by a ligand gradient to promote differentiation towards secretory and enteroendocrine cells17,18,19. BMP concentrations are regulated in the intestinal crypts by several different BMP antagonists, such as Gremlin, Noggin and Chordin-like120. TGF-β type I/II receptors and ligands are present and expressed in the muscularis, lamina propria and the differentiated compartment of both the small intestine and colon21. TGF-β is also frequently present in the intestinal tumour microenvironment and is released by cancer cells, tumour stromal cells or immune cells22. Intestinal adenoma can be conditionally generated when Apc loss of function no longer regulates the degradation of β-catenin, resulting in constitutive activation of transcription by β-catenin/Tcf4 and adenoma formation. Subsequent mutational events in humans are selected for during the progression of adenoma to invasive and metastatic carcinoma, e.g. RAS/MAPK GOF, PTEN and TP53 LOF, and later acquired LOF mutation of the TGF-β-BMP ligand activated receptor pathway, such as mutation of receptors and Smad2/3/423.

Here, we first set out to evaluate loss of the TGF-β pathway signal transduction regulator, Mothers against decapentaplegic homolog 4 (Smad4), in vivo and in vitro, with respect to a murine intestinal adenoma phenotype and dependent gene expression. In view of the complexity of the interacting pathways that can alter TGF-β1-Smad4 induced phenotypic effects, especially in immortalised human cancer cell lines, we sought to better define the contextual impact of Smad4 LOF, to specifically evaluate the Smad4 and TGF-β1 ligand dependency of gene expression in murine models, and to translate these findings with respect to human intestinal cancer. The conditional Apcfl/fl model of intestinal adenoma using Lgr5CreERT2 is well characterised, and so we first incorporated loxP-dependent LOF of Smad4 (Smad4fl/fl alleles) into this model.

Results

Combined Apc and Smad4 conditional disruption results in discordant adenoma phenotypes

To investigate the adult stage-specific genetic interactions between Apc and Smad4 in the murine intestine, we combined conditional alleles with a Cre driven by the promoter Lgr5 and a ROSA26-loxSTOP reporter allele (Lgr5-EGFP-IRES-CreERT2 or Lgr5-CreERT2)24 (Fig. 1a). Heterozygote alleles were inter-crossed (Apcfl/+Smad4fl/+Lgr5-CreERT2) on a C57Bl/6J background (backcrossed to C57Bl/6J for > 10 generations) to generate homozygotes and heterozygote littermates that differ in Smad4 conditional alleles25,26. Conditional recombination followed tamoxifen injection (4-OHT) in adult animals that were then phenotypically analysed with respect to survival and adenoma phenotypes (Fig. 1, Supplementary Fig. 1). The floxed alleles of Smad4 (Smad4Δ/Δ) resulted in loss of the MH1 DNA binding domain and the production of a truncated 43 kDa protein which was not capable of translocating to the nucleus (see below, Supplementary Fig. 2). The germline homozygote loxP alleles are denoted Smad4fl/fl and Apcfl/fl, and where somatic tissues have been directly tested for the presence of the resulting Cre-induced recombined alleles (floxed) using PCR, were then denoted as Smad4Δ/Δ and ApcΔ/Δ. Recombination rates in intestinal epithelial cells were lower overall with the Lgr5 transgene (5–6%), but highly specific for Lgr5 CBC stem cells15,27. Rapid adenoma development occurred as expected following homozygous Apcfl/fl loss of function (median 35–40 days post injection 4-OHT) and animals were culled immediately if they exceeded predefined humane endpoints. Comparison of overall survival (Kaplan–Meier) between Apcfl/flSmad4fl/flLgr5-CreERT2 and Apcfl/flSmad4fl/+Lgr5-CreERT2 with Apcfl/flSmad4+/+Lgr5-CreERT2 controls surprisingly showed a modest but significant improvement in survival (p = 0.011, Fig. 1a). These observations suggest that introducing Smad4 LOF in Apc LOF intestinal adenoma does not result in overwhelming tumour progression and metastasis. 
However, analysis of intestinal adenoma distribution between the small intestine and colon showed a marked discordance in growth phenotypes in Apcfl/flSmad4fl/flLgr5-CreERT2. A significant reduction in small intestinal adenoma burden (number of adenoma per mouse) was observed concurrent with paradoxical growth promotion of adenoma in the caecum (Fig. 1b). As the distribution of adenoma on a C57BL/6 background is predominantly in the small intestine rather than the colon and caecum, the multiple small intestinal adenoma would be expected to promote anaemia, small intestinal obstruction and decreased survival27,28. The significantly reduced small intestinal adenoma burden in Apcfl/flSmad4fl/flLgr5-CreERT2 compared to controls may have improved overall survival if this was the only phenotypic effect (Fig. 1a, b). Examination of the caecum and large intestine revealed markedly enlarged coalesced adenoma with associated mucus cysts where numerical analysis of single and separate adenoma was not possible, hence we reported caecal weight as a surrogate (Fig. 1b). The difference in adenoma burden between heterozygote Apcfl/flSmad4+/flLgr5-CreERT2 and controls was statistically non-significant and does not support either Smad4 haplo-insufficiency or a dominant negative effect of Smad4fl/+. Histological grading of small intestinal adenoma showed low-grade dysplasia in all genotypes, whereas more advanced high grade dysplasia with cytonuclear and architectural changes, including foci of microinvasion, was more frequently observed in regions of Apcfl/flSmad4fl/flLgr5-CreERT2 in the caecum (Supplementary Fig. 1, author HM pathological review). Further examination of H&E stained small intestinal adenoma tissue sections showed that adenoma from Apcfl/flSmad4+/+ mice had a larger sessile adenomatosis appearance, whereas adenoma from Apcfl/flSmad4fl/fl mice had a mainly tubular adenomatous appearance. 
Crypt cystic regions (dirty necrosis) were detected in both genotypes but no muscularis tissue invasion, lymph node involvement and metastasis were observed within the relatively short median overall survival (Fig. 1c, Supplementary Fig. 1). The efficiency of Cre-mediated Smad4 allele deletion was examined by immunohistochemistry (IHC, Fig. 1c). Smad4 protein was clearly detected in ApcΔ/ΔSmad4+/+ adenoma and adjacent tissue as expected. Adenoma from ApcΔ/Δ Smad4Δ/Δ did not label strongly with anti-Smad4 antibodies, consistent with adenoma-specific disruption in both the small and large intestine (Fig. 1c). The presence of active TGF-β (BMP) signaling was supported by p-Smad2 and p-Smad1/5/8 immunoblots (Fig. 1d). Both p-Smad2 and p-Smad 1/5/8 labelling were detected in adenoma at higher levels from ApcΔ/ΔSmad4Δ/Δ compared to normal tissue and littermate control adenoma, consistent with either TGF-β alone or with BMP ligand local context specific activation in combination (p < 0.001, Fig. 1d).

Fig. 1

Adenoma growth following conditional disruption of Apc and Smad4 loxP alleles using Lgr5-CreERT2. a Breeding strategy for experimental mice with combined loxP conditional alleles with Lgr5-CreERT2 prior to 4-OHT injection in adult mice. Chromosome 18 and the location of the centromere (black dot) are shown for the location of both Apc (18qB1) and Smad4 (18qE2). Kaplan–Meier survival of Apcfl/flSmad4+/+, Apcfl/flSmad4fl/+ and Apcfl/flSmad4fl/fl mice post-Cre induction. Comparison between Apcfl/flSmad4+/+ and Apcfl/flSmad4fl/fl genotypes, *p = 0.011, Log-rank test. Caecal enlargement (arrow) from Apcfl/flSmad4fl/fl mice. Bar 1 cm. b Caecum (adenoma) weight, small intestine and colon total adenoma per mouse from Apcfl/flSmad4+/+ (n = 9), Apcfl/flSmad4fl/+ (n = 17) and Apcfl/flSmad4fl/fl (n = 13). Horizontal bars; mean. One-way ANOVA with Bonferroni’s multiple comparison test. ***p < 0.001. c Haematoxylin and Eosin (H&E) sections and anti-Smad4 antibody immunohistochemistry localisation in Apcfl/flSmad4+/+ and Apcfl/flSmad4fl/fl caecal adenoma. Arrows show crypt abscesses, upper panel. Smad4 localisation is reduced in adenoma and confined to adjacent normal colonic crypt cells in Apcfl/flSmad4fl/fl mice. Scale bar 100 µm. d Western blot of adenoma (T) and normal tissue (N) from Apcfl/flSmad4+/+ and Apcfl/flSmad4fl/fl with anti-phospho-Smad2 (p-Smad2) and anti-phospho-Smad1/5/8 antibodies. Densitometric analysis for p-Smad2 represented as ratio to β-actin (for Apcfl/flSmad4+/+ n = 3 and for Apcfl/flSmad4fl/fl n = 4 experimental replicates). Significant increase in p-Smad2 in adenoma (T) compared to normal tissues (N) in Apcfl/flSmad4fl/fl mice. (See also, Supplementary Information Western Blots Fig. 1d). ***p < 0.001. Non-parametric one-way ANOVA with Dunn’s multiple comparison test. Mean ± S.D.

Combined Apc and Smad4 disruption modifies growth of adenoma organoids

To generate organoid cultures, we isolated ApcΔ/ΔSmad4Δ/Δ (Smad4Δ/Δ) and ApcΔ/ΔSmad4+/+ (Smad4+/+) caecal adenoma and generated in vitro cultures in Matrigel using published conditions24,29. Using a ROSA26-LSL-YFP reporter line bred with Apcfl/flSmad4fl/flLgr5-CreERT2, we confirmed that 4-OHT exposure resulted in uniform YFP expression in adenoma organoid culture that was also TGF-β1 independent (Fig. 2a). In culture, Smad4Δ/Δ adenomas appeared smaller and grew slower compared to Smad4+/+ adenomas, as measured by both area and the alamarBlue (AB) growth assay (Supplementary Fig. 3). Noggin is a BMP pathway inhibitor that normally prevents enterocyte differentiation and is required for crypt culture maintenance23,24,29. Noggin addition had no effect on adenoma growth (Supplementary Fig. 3). SB-505124, a TGF-βR1 small molecule inhibitor, also had no significant inhibitory effect on Smad4+/+ and Smad4Δ/Δ adenoma growth (1 μM) without the exogenous addition of TGF-β1 (Supplementary Fig. 3). These data suggest that adenoma growth under these culture conditions was independent of endogenous TGF-β supply.

Fig. 2

TGF-β1 and Smad4 dependent signalling in intestinal adenoma organoids. a Expression of YFP reporter, E-cadherin localisation and DNA (DAPI) in ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ adenoma organoids with and without exposure to TGF-β1 (390 pM). b TGF-β1 (range: 0.039–3900 pM) induced growth inhibition of adenoma organoids quantified by alamarBlue (measured at 24 h). Data normalised to untreated control (100%) (n = 4 experimental replicates). Growth inhibition by 390 pM TGF-β1 was rescued by pre-treatment with SB-505124 (ALK4,5,7 inhibitor) for 2 h (1 mM). Error bars ± S.D. c Representative immunoblots of Smad4, total R-Smads and p-R-Smads after TGF-β1 (2 h) from ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ adenoma organoids with and without SB-505124. Graphs of p-Smad2/Smad2 and p-Smad3/Smad3 densitometry relative to β-actin (n = 3 experimental replicates). Non-parametric one-way ANOVA with Dunn’s multiple comparison. (See also, Supplementary Information Western Blots Fig. 2c). d Confocal immunofluorescence images of Smad2 and Smad3 nuclear localisation (DAPI) following TGF-β1 (390 pM) exposure and SB-505124 inhibition. Scale bar 50 µm. Associated immunoblots of p-Smad2 at different time points (h) following TGF-β1 (390 pM), densitometry relative to β-actin. (See also, Supplementary Information Western Blots Fig. 2d). e Bright field confocal images of organoids 24–72 h following TGF-β1 exposure. Red arrows show intact spherical (circular) organoid morphology, yellow arrows show shape change, and the red dotted line indicates organoids with marked disruption, cell spreading and cellular transformation. Quantification shows relative preservation of spheroid (circular) organoids in ApcΔ/ΔSmad4Δ/Δ. One-way ANOVA with Sidak’s multiple comparison test ****p < 0.0001. f Suppression of organoid proliferation following TGF-β1 treatment assessed with EdU incubation for 2 h. Confocal images of EdU incorporation (green) in nuclei (DAPI, blue) from ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ and quantification using ImageJ (n = 10 organoids). One-way ANOVA with Sidak’s multiple comparison test ***p < 0.001, ****p < 0.0001, respectively. g Cleaved Caspase 3 (CC3) activation following 24–72 h of TGF-β1 (390 pM). CC3 (arrowheads) labelling in confocal images supported by immunoblot labelling (18 h). Arrows indicate margin of organoids. (See also, Supplementary Information Western Blots Fig. 1g). Scale bar 50 µm.

To exclude whether there may have been a genotype dependent difference in the floxing efficiency between Apc and Smad4 that may have accounted for the two-fold differences in growth in culture, we also generated adenoma with an alternative conditional Cre (Villin-CreERT2). Here, cultured small intestinal organoids from healthy animals bred with non-floxed alleles, Apcfl/flSmad4+/+Villin-CreERT2Rosa26-YFP and Apcfl/flSmad4fl/flVillin-CreERT2Rosa26-YFP genotypes, were cultured in organoid media containing R-spondin-1, Noggin and EGF. Intestinal organoids that formed a normal crypt-villous morphology were exposed to 4-OHT followed by withdrawal of R-spondin-1. This resulted in adenoma organoid formation in both ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ (Supplementary Fig. 4). Allele recombination was confirmed by PCR product of 259 bp and 512 bp for Apc and Smad4 alleles, respectively. Both the Apcfl/fl and Smad4fl/fl alleles recombined at the same time and rate following 4-OHT treatment, and were detected as early as day 2 post treatment (Supplementary Fig. 4). Thus, floxing efficiency appeared similar between alleles.

Deletion of MH1 domain of Smad4 following Cre-mediated recombination

To evaluate the expression of Smad4 mRNA in Smad4Δ/Δ and Smad4+/+ in vitro, we used bulk RNA-seq of adenoma organoids and discovered two different Smad4 mRNA transcripts. Smad4 gene tracking with Cuffdiff revealed transcripts from the intact Smad4 gene, and a second transcript from the Smad4 gene with exon 2 excised following loxP mediated recombination (Supplementary Fig. 2). Immunoblots confirmed expression of Smad4 full-length protein (61 kDa) in both Smad4+/+ and Smad4+/Δ adenomas, with a smaller band (43 kDa) detected in protein extracts from Smad4+/Δ and Smad4Δ/Δ adenomas (Supplementary Fig. 2). In silico analysis revealed an open reading frame in exon 5 of the Smad4 sequence and mapping with ExPASy revealed that exon 2 deletion results in a truncated protein in which the Smad4 MH1 DNA binding domain is selectively deleted. The predicted size of the truncated protein was 43 kDa, comprising the linker and Smad4 MH2 domain (Supplementary Fig. 2). The full-length Smad4 protein in Smad4+/+ adenoma showed nuclear localisation following exposure to 390 pM TGF-β1 for 2 h. Under the same conditions, no nuclear localisation of Smad4 (43 kDa) was detected in Smad4Δ/Δ adenomas, consistent with loss of DNA binding function (Supplementary Fig. 2). Taken together, Cre-mediated recombination of the Smad4 gene resulted in selective MH1 deletion and a truncated protein that did not undergo nuclear localisation following TGF-β1 stimulation. These data, and the apparent reduction of detection using the same anti-Smad4 antibody in IHC of Smad4Δ/Δ adenoma, are consistent with a non-DNA binding and presumably soluble truncated SMAD4 protein that is depleted from cells during tissue processing, resulting in reduced detection signals.

TGF-β1 dose dependency in Smad4Δ/Δ adenoma organoids

We next tested the dose and timing effects of exogenous addition of TGF-β1 to adenoma organoid cultures on adenoma phenotypes and the subsequent localisation of R-Smads. TGF-β1 induced dose-dependent growth inhibition of both the Smad4+/+ and Smad4Δ/Δ adenomas, but with a 20-fold differential sensitivity (Fig. 2b). Smad4+/+ adenomas were more sensitive to TGF-β1, with an IC50 of 24 pM and threshold of 0.39 pM. For Smad4Δ/Δ, the TGF-β1 IC50 was 534 pM with a threshold between 39–390 pM. The decrease in adenoma viability with TGF-β1 dosing was prevented when the adenoma cultures were pre-incubated with SB-505124 (Fig. 2b). For subsequent experiments, a TGF-β1 concentration of 390 pM was used to differentiate the TGF-β1 context-dependent sensitivity effects between Smad4+/+ (100 fold higher than the threshold) and Smad4Δ/Δ (at threshold but below IC50) adenomas. TGF-β1 addition (390 pM) resulted in the detection of phosphorylated Smad2 (p-Smad2) and Smad3 (p-Smad3) in both the Smad4+/+ and Smad4Δ/Δ adenomas as expected (Fig. 2c). Despite the differential sensitivity, the p-Smad2 kinetics of response to TGF-β1 stimulation were similar between genotypes, with p-Smad2 maximally detected after 1–2 h and associated with nuclear localisation (Fig. 2d). These data also suggest that the increased levels of p-Smad2 detected in the in vivo ApcΔ/ΔSmad4Δ/Δ adenoma were most likely due to increased local TGF-β-BMP ligand supply (Fig. 1d). Within 24 h of TGF-β1 administration, > 90% of the Smad4+/+ adenoma cells had either morphological changes of apoptosis with organoid retraction, or changes in shape toward a spindle or mesenchymal phenotype (see Supplementary Movies 14). These features appeared delayed to 72 h in Smad4Δ/Δ adenomas but were still detectable (Fig. 2e). 
The reduced viability and altered morphology were accompanied by significantly decreased proliferation and increased apoptosis in Smad4+/+ adenoma relative to Smad4Δ/Δ adenomas when quantified by EdU and cleaved caspase 3 (CC3) labelling, respectively (Fig. 2f, g).
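
The differential sensitivity described above can be illustrated with a simple dose-response calculation. The sketch below is not the fitting procedure used for Fig. 2b. It assumes a one-site inhibitory Hill model with a Hill slope of 1 and complete inhibition at saturating dose (both are assumptions), and the function and constant names are hypothetical. It takes the two reported IC50 values (24 pM and 534 pM) and compares the predicted relative viability at the 390 pM working concentration.

```python
def hill_viability(dose_pm, ic50_pm, hill_slope=1.0):
    """Percent viability relative to untreated control (100%).

    Assumed model: inhibitory Hill curve falling from 100% to 0%.
    The Hill slope of 1 is an illustrative assumption, not a fitted value.
    """
    if dose_pm < 0 or ic50_pm <= 0:
        raise ValueError("dose must be >= 0 and IC50 must be > 0")
    return 100.0 / (1.0 + (dose_pm / ic50_pm) ** hill_slope)


# IC50 values reported in the text (pM)
IC50_SMAD4_WT_PM = 24.0
IC50_SMAD4_MUT_PM = 534.0
WORKING_DOSE_PM = 390.0

# Ratio of IC50 values, i.e. the fold difference in sensitivity
fold_difference = IC50_SMAD4_MUT_PM / IC50_SMAD4_WT_PM

if __name__ == "__main__":
    print(f"IC50 ratio (Smad4 mutant / wild type): {fold_difference:.2f}")
    for label, ic50 in (("Smad4+/+", IC50_SMAD4_WT_PM),
                        ("Smad4 mutant", IC50_SMAD4_MUT_PM)):
        viability = hill_viability(WORKING_DOSE_PM, ic50)
        print(f"{label}: predicted viability at 390 pM = {viability:.1f}%")
```

Under these assumptions the ratio of IC50 values is about 22, in line with the approximately 20-fold differential sensitivity reported, and the working concentration sits well above the Smad4+/+ IC50 but below the Smad4Δ/Δ IC50.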

TGF-β1 and Smad4-dependent Id1 and Spp1 gene expression in adenoma organoids

To assess the transcriptome response to TGF-β1 that might inform differences between Smad4+/+ and Smad4Δ/Δ adenoma organoids, we performed bulk paired-end RNA-seq, which detected 16,143 gene transcripts that we utilised for the analysis of differentially expressed genes (DEG) (Fig. 3a–h). Three separate mice per genotype provided adenoma biological replicates, which were harvested at t = 0, t = 1 and t = 12 h after exposure to 390 pM TGF-β1 (Fig. 3a). Principal component analysis (PCA) of gene expression showed a clear separation of the samples based on genotype and duration of TGF-β1 exposure (Fig. 3b). Comparison of DEG at baseline between Smad4Δ/Δ and Smad4+/+ at t = 0 (no TGF-β1) showed 111 upregulated genes (log2FC > 1.5) and 51 downregulated genes (log2FC < -1.5) (Fig. 3c, Supplementary Table 1). The most important observation was the magnitude of the genotype dependent expression of a group of genes that included Id1 (Inhibitor of DNA binding 1), Ddit4, Serpine1, Skil, Mucins, Myc, JunB and E2f target genes (Fig. 3c, d, Supplementary Fig. 5). Gene ontology (GO) analysis of biological processes showed enrichment for epithelial cell proliferation and response to TGF-β1 (Supplementary Fig. 5, Supplementary Table 2). Gene set enrichment analysis (GSEA) using hallmark gene sets with “tmod” in Smad4Δ/Δ showed downregulation of Myc target genes, differential regulation of TGF-β signalling genes, mainly by upregulation, and enrichment of other pathways including IL2-Stat5 and TNF signalling (AUC > 0.5, padj < 0.0001, Fig. 3e, Supplementary Table 2). Importantly, mucins in Smad4Δ/Δ adenomas were among the DEGs, with Muc4, Muc20 and Muc2 being upregulated and Muc6 being downregulated (padj < 0.05, Supplementary Table 1, Supplementary Fig. 6). Subsequent Mucin 2 immunolabelling also confirmed protein overexpression in Smad4Δ/Δ adenoma tissue sections (Supplementary Fig. 6).
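
The DEG thresholds quoted above (log2FC beyond ±1.5 with an adjusted p-value below 0.05) amount to a simple classification rule, sketched here for illustration. This is not the DESeq2 workflow used in the study. The record type, function name and toy values are hypothetical and invented purely to exercise the rule; they are not data from Supplementary Table 1.

```python
from collections import Counter
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str      # gene identifier (placeholder names below)
    log2fc: float  # log2 fold change, e.g. mutant vs control
    padj: float    # multiple-testing adjusted p-value


def classify_deg(result, lfc_cutoff=1.5, padj_cutoff=0.05):
    """Return 'up', 'down' or 'ns' using the thresholds quoted in the text."""
    if result.padj >= padj_cutoff:
        return "ns"
    if result.log2fc > lfc_cutoff:
        return "up"
    if result.log2fc < -lfc_cutoff:
        return "down"
    return "ns"


# Invented toy values, NOT results from the paper
toy_results = [
    GeneResult("geneA", 2.1, 0.001),    # passes both cut-offs, upregulated
    GeneResult("geneB", -3.0, 0.0004),  # passes both cut-offs, downregulated
    GeneResult("geneC", 2.5, 0.20),     # large change but not significant
    GeneResult("geneD", 0.8, 0.001),    # significant but small change
]

deg_counts = Counter(classify_deg(r) for r in toy_results)

if __name__ == "__main__":
    for r in toy_results:
        print(r.gene, classify_deg(r))
    print(dict(deg_counts))
```

Both conditions must hold for a gene to be counted, which is why a large fold change with a weak adjusted p-value (geneC in the toy data) is not reported as a DEG.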

Fig. 3

TGF-β1 and Smad4 dependent bulk RNA expression signatures in adenoma organoids. a Adenoma organoids derived from 3 mice per genotype of ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ were exposed to TGF-β1 (390 pM) (0, 1 and 12 h, eight wells per time point) and were then pooled for paired-end bulk RNA-Seq (total of 18 RNA-Seq samples). b Principal components of transcriptional clusters according to genotype and time (vst-transformed). c Venn diagram classifies TGF-β1 and Smad4 dependent down-regulated and upregulated genes. Threshold utilised 1.5 log2FC and adjusted p-value < 0.05. d Volcano plots of differentially expressed genes comparing time and genotype (+/+ = ApcΔ/ΔSmad4+/+, Δ/Δ = ApcΔ/ΔSmad4Δ/Δ). In all volcano plots, genes with -log(p-adj) = 30 are shown. Cut-off values for log2FC (1.5) and adjusted p-value (1e-3) are marked with dashed lines. ApcΔ/ΔSmad4Δ/Δ DEGs are much reduced in number following TGF-β1 treatment. e Enriched gene modules of DEG using the hallmark gene set enrichment (GSEA with tmod package). Comparisons shown in each column. Red and blue indicate the proportion of genes in a module that are either upregulated or downregulated, respectively. Width of each box relates to its effect size, where lighter, less saturated colours indicate lower significance with p-values. f Heatmap showing time-dependent changes in expression (log2FC) for the 50 genes with the lowest adjusted p-values (likelihood ratio test) and g pathway enrichment for these genes, generated using Metascape. h Summary of gene specific expression comparing ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ with time post TGF-β1 exposure. Statistical comparison performed using the Wald test in DESeq2, followed by Benjamini and Hochberg multiple hypothesis correction.

For subsequent timepoints post-TGF-β1, DEGs were mainly observed in Smad4+/+ adenoma organoids and included Skil, Junb, Serpine1, Ddit4 and Id1 (Fig. 3d, Supplementary Fig. 5, Supplementary Table 1). The differential suppression of Id1 in Smad4Δ/Δ compared to Smad4+/+ after 1 h achieved a 2.83 log2FC difference. By 12 h, Id1 suppression was maintained (1.2 log2FC, Fig. 3d, Supplementary Table 1). Despite the repression of expression, Id1 protein abundance and immunolocalisation did not appear to be significantly altered between adenoma genotypes (Supplementary Fig. 7). We also observed that Spp1 (Secreted phosphoprotein 1 or osteopontin), a monocyte/macrophage and extracellular matrix gene, was the only gene significantly upregulated following TGF-β1 exposure (12 h) in Smad4Δ/Δ (log2FC > 1.5 threshold, Fig. 3c, d, h, Supplementary Table 3).
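
Because the expression differences above are quoted on a log2 scale, it can help to convert them to linear fold changes. The short sketch below does only that conversion; the function name is hypothetical and the inputs are the log2FC magnitudes quoted in the text for Id1 at 1 h and 12 h.

```python
def log2fc_to_fold(log2fc):
    """Convert a log2 fold change to a linear fold change (2 ** log2FC)."""
    return 2.0 ** log2fc


# Magnitudes of the Id1 differences quoted in the text
ID1_LOG2FC_1H = 2.83
ID1_LOG2FC_12H = 1.2

if __name__ == "__main__":
    for label, value in (("1 h", ID1_LOG2FC_1H), ("12 h", ID1_LOG2FC_12H)):
        print(f"Id1 at {label}: log2FC {value} = {log2fc_to_fold(value):.2f}-fold")
```

On this scale a 2.83 log2FC difference corresponds to roughly a 7-fold difference in expression, and 1.2 log2FC to a little over 2-fold.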

Further significant time dependent differences in gene expression occurred by 12 h post-TGF-β1. For example, 35 DEGs were observed in Smad4Δ/Δ compared to 689 in Smad4+/+ adenomas at t = 12 h vs t = 0 h (Fig. 3d, Supplementary Table 1). Pathway enrichment analysis using tmod also showed that E2f, G2m, Myc and mTORC targets were more significantly downregulated in Smad4+/+ adenomas compared to Smad4Δ/Δ adenomas (Fig. 3e, Supplementary Table 2). Time series analysis using the likelihood ratios showed differences in expression of genes including Fos, Lama3, Aldh1a3, Col7a1 and Doks associated with increased induction by t = 12 h, whereas E2f8, Exo1, Clspn, Kif14 and Kif12 all were decreased in Smad4+/+ adenomas. Overall, the suppression of cell cycle genes was evident in Smad4+/+ compared to Smad4Δ/Δ adenomas, as shown by Metascape overrepresentation (Fig. 3f, g, Supplementary Table 3). Finally, TRRUST (Transcriptional Regulatory Relationship Unravelled by Sentence based Text mining) enrichment analysis implemented in Metascape also suggested potential regulation by transcription factors, with Smad4-dependent downregulated genes being co-regulated by Sp1, Smad3 and Jun, and upregulated genes by Rbl2, Tp53 and E2f1 (Supplementary Fig. 5). Several gene signatures were therefore determined that were Smad4 and TGF-β1 dependent, with the two genes with the greatest magnitude of differential expression being Id1 (low) and Spp1 (high). A summary of gene specific expression of Id1, Fn1, Myc, Pak3, Sema5a, Spp1 and Slc14a1 is shown with comparison of ApcΔ/ΔSmad4+/+ and ApcΔ/ΔSmad4Δ/Δ with respect to time post-TGF-β1 exposure (Fig. 3h).

TGF-β1 dependent gene expression and protein correlation

Smad4 LOF did not fundamentally alter the range of TGF-β1 dose dependent phenotypic responses, but did change the overall magnitude and relative gene expression. Whilst TGF-β1 administration in Smad4+/+ adenomas resulted in a > 4.5 Log2FC relative increase in expression of Fn1, we did not observe a similar change in Fn1 protein by Western blot (fibronectin 1, Supplementary Fig. 8a). Similar findings were observed for N-cadherin and when N-cadherin was compared relative to β-catenin, even when EMT associated actin reorganisation was evident (Supplementary Fig. 8b, c). In addition, TGF-β1 exposure resulted in a 0.25 log2FC reduction in Survivin (Birc5) expression after 12 h in Smad4Δ/Δ compared to a 2.55 log2FC reduction in Smad4+/+ adenoma. Anti-survivin antibody western blots and nuclear localisation suggested detectable reduction in protein levels in Smad4+/+ adenoma by 18 h post- TGF-β1 (Supplementary Fig. 9). The Wnt dependent intestinal stem cell marker Lgr5 was downregulated in Smad4+/+ following TGF-β1 (12 h) compared to Smad4Δ/Δ adenoma (2 log2FC, Supplementary Table 1), also suggesting modification in Wnt pathway activity. E-cadherin and β-catenin proteins remained immunolocalised at the adherens junctions and in nuclei in both adenoma genotypes as expected (Supplementary Fig. 10). Immunoblots of adenomas from both genotypes showed similar multiple molecular weight bands for β-catenin and E-cadherin in the context of disruption of the β-catenin destruction complex by LOF of ApcΔ/Δ (Supplementary Fig. 10b, c). Gene expression for Myc was not markedly different between genotypes, yet c-Myc protein levels appeared increased in Smad4Δ/Δ (12–18 h) compared to Smad4+/+ adenomas (Supplementary Fig. 10c, Supplementary Table 1). Overall, there did not appear to be significant direct correlation between changes in gene expression and protein localisation and abundance between genotypes. 
This implies that protein abundance, regulated by numerous and variable processes independent of gene expression, may not be a universal and reliable surrogate for mRNA gene expression in this context.

Single cell RNA-seq of Smad4fl/fl caecal adenoma are enriched in Pak3Hi epithelial progenitor cells

In view of the differential changes in bulk mRNA expression, we next sought to quantify mRNA expression at the single cell level in the large adenoma directly harvested from the caecum. Epithelial cells (Epcam cluster) consisted of the Lgr5 stem cell (Lgr5 ISC) and seven stem cell-like (SC-L) clusters based on differential expression of Lgr5 and other stem-cell markers such as Lrig1, Smoc2, Hopx as previously defined30,31,32,33 (Fig. 4a, b, Supplementary Fig. 11, Supplementary Table 4). Differential enrichment in stem cell clusters was observed between the two genotypes, consisting of 61.8% of the total cell population in Smad4fl/fl adenoma compared to 47.3% in Smad4+/+ (Fig. 4c, d, Supplementary Table 3). A Pak3High cluster (Pak3; P21 Rac1/CDC42 activated kinase 3) was the most enriched in Smad4fl/fl (10.5%) compared to Smad4+/+ (2.5%) control adenoma. Pathway enrichment analysis for DEGs between Smad4fl/fl and Smad4+/+ for the Pak3High cluster showed an enrichment for the Wnt pathway and pluripotency markers (Supplementary Fig. 12). The same comparison in adenoma organoids between genotypes and all timepoints post TGF-β1 showed a non-significant difference in expression of Pak3, suggesting this was an in vivo adaptation that was not reproduced in organoids (Supplementary Table 1, Fig. 3h). The Slc14a1High cluster was also enriched in Smad4fl/fl (7%) compared to Smad4+/+ (3.5%), where expression of the stem cell marker Ly6a was enriched (Sca1). Conversely, the Lgr5 ISC cluster was significantly enriched in Smad4+/+ (5.8%) compared to Smad4fl/fl adenoma (2.7%) (Fig. 4d). Smad4fl/fl showed significantly lower proportions of enterocytes (EC) as a consequence (7.8% Smad4fl/fl vs 23.4% Smad4+/+) yet Fmnl2High, Cachd1High and Slc12a2High showed similar proportions between genotypes. Note that the SC-L and the Lgr5 ISC clusters originate from the same node when viewed with cluster tree (0.2 resolution) (Fig. 4a). 
The Tacstd2High cluster (murine foetal intestinal progenitor gene34) and the Clhigh cluster (Clusterin marker of the revival stem cell35) are SC-L clusters with similar proportions and originate from the same node (0 at 0.2 resolution). Markers for non-stem cells that lacked stem cell expression markers included ribosomal gene expression33 in transient amplifying cells (TA), the Paneth cell marker Pnliprp1 (pancreatic lipase-related protein 1)30, the enteroendocrine precursor marker Gadd45g33, and Mt3 and Pnliprp2 in secretory precursor cells (secretory pr).
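
The shifts in cluster frequency described above can be put on a common scale as log2 ratios of the cluster percentages. The sketch below is a simplification: it uses only the percentages quoted in the text and does not reproduce the permutation test with FDR adjustment used for Fig. 4d. The dictionary and function names are hypothetical.

```python
import math

# Cluster frequencies (% of cells) quoted in the text:
# (Smad4fl/fl adenoma, Smad4+/+ adenoma)
CLUSTER_PERCENT = {
    "Pak3High": (10.5, 2.5),
    "Slc14a1High": (7.0, 3.5),
    "Lgr5 ISC": (2.7, 5.8),
    "EC": (7.8, 23.4),
}


def log2_ratio(pct_mutant, pct_control):
    """log2 of the ratio of cluster frequencies (mutant / control)."""
    if pct_mutant <= 0 or pct_control <= 0:
        raise ValueError("percentages must be positive")
    return math.log2(pct_mutant / pct_control)


log2_ratios = {
    name: log2_ratio(mutant, control)
    for name, (mutant, control) in CLUSTER_PERCENT.items()
}

if __name__ == "__main__":
    for name, value in log2_ratios.items():
        direction = "enriched" if value > 0 else "depleted"
        print(f"{name}: log2 ratio {value:+.2f} ({direction} in Smad4fl/fl)")
```

Positive values indicate clusters over-represented in Smad4fl/fl adenoma (Pak3High, Slc14a1High) and negative values indicate depletion (Lgr5 ISC, EC). Whether a given shift is statistically significant still depends on the permutation-based analysis, which this sketch does not attempt.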

Fig. 4

Smad4 dependent adenoma cell populations using single cell RNA-seq from in vivo caecal adenoma. a Pooled caecum adenoma cells derived from different mice, Apcfl/flSmad4+/+ n = 3 and Apcfl/flSmad4fl/fl n = 5, were subjected to scRNA-Seq. Cluster trees show the resolution from 0 to 1.2 with the EClpr and ECepr clusters identified at resolution 2.2. b Heatmap of the scaled expression of the top 5 genes identified in each cluster (genes selected based on the biggest difference between the cluster compared to all clusters). Lgr5 ISC = Lgr5 intestinal stem cell; EC = enterocyte; EClpr = enterocyte late precursor; ECepr = enterocyte early precursor; TA = transit amplifying; CAF = cancer associated fibroblast; Secretory Pr = secretory precursor; hi = high referring to specific genes. c UMAP plot of the cell-type clusters from Apcfl/flSmad4+/+ (9674, n = 3) and Apcfl/flSmad4fl/fl (11,212, n = 5) based on the expression of known marker genes. d Bar graph; frequencies of clusters of the single cells in Apcfl/flSmad4+/+ and Apcfl/flSmad4fl/fl adenoma. Point range plot; the confidence interval for the absolute log2FC (obs_log2FC) for the different identified cell types. Significant results are labelled in red. Note enrichment of Pak3high, Slc14a1high, CAF and neutrophil populations and the reduction in ECepr and EClpr populations in Apcfl/flSmad4fl/fl adenoma. Permutation test with FDR adjustment for p-value. e UMAP plots comparing genotypes for single cell expression of Id1 (suppression in all cell types in Apcfl/flSmad4fl/fl), Spp1 (monocytes and macrophage expression) and Pak3 (increased expression in progenitor enterocyte population of Apcfl/flSmad4fl/fl). f Violin plots of pseudobulk RNA expression of Id1, Spp1 and Pak3. Statistical analysis using MAST with Bonferroni adjustment for p-value.

Using both the 1.2 and 2.2 resolution cluster trees, we identified two groups of cells: the enterocyte early precursor (ECepr) expressing Apoa1, Dmbt1, Reg4, and Reg3g, and the enterocyte late precursor (EClpr) expressing Reg3b, Mt1, and Mt233. In Smad4fl/fl, EClpr (0.26% vs 4.1%) and ECepr (1.3% vs 6.5%) were depleted compared to Smad4+/+ adenoma. The mature enterocyte clusters, EC-1 and EC-2, showed expression of Krt19, Krt8, Lgals3, Lypd8, Emp1, and Krt20 with differences in Mxd1, Cdkn1a and Fam3b expression. Significantly, 2.2% of Smad4fl/fl cells were EC-1 cells compared to 5.4% for Smad4+/+ (Fig. 4a, b, c, d). Thus, scRNA-seq of in vivo adenomas identified Pak3 and Slc14a1 as genes with increased expression that marked populations relatively enriched among progenitor epithelial cells in Smad4fl/fl caecal adenoma. UMAP plots comparing genotypes for single cell expression of Id1 (suppression in all cell types in Apcfl/flSmad4fl/fl), Spp1 (monocytes and macrophage expression) and Pak3 (increased expression in progenitor enterocyte population of Apcfl/flSmad4fl/fl) confirmed the distribution changes for Pak3 and Id1 (Fig. 4e). Moreover, comparison of combined scRNA-seq data using a pseudo-bulk approach also confirmed significant downregulation of Id1 and upregulation of Pak3, consistent with the bulk-organoid RNA-Seq (Fig. 3h, Fig. 4f). In addition, Id1 and Sema5a were the uniquely deregulated genes that were common between organoid bulk RNA-seq and the primary pseudo-bulk caecal adenoma scRNA-seq data (Supplementary Fig. 13). For Spp1, the high overall mRNA expression in adenoma derived organoids was not matched in the caecal derived single cell mRNA in epithelial cells, presumably as expression was confined mainly to the monocyte-macrophage lineage not present in organoid cultures (Fig. 4f).

Smad4 dependent ID1, SPP1 and PAK3 expression in human colorectal cancer

Having identified TGF-β1-BMP-Smad4 pathway dependency of Id1low and Spp1high gene expression, and Pak3high expressing cell enrichment in primary caecal intestinal adenoma, we next explored whether any of these functional gene expression biomarkers are also present in human colorectal cancer (Fig. 5, Fig. 6, Supplementary Table 5). Data was derived from 382 individual cases from the human Cancer Genome Atlas project (TCGA; COAD, Colon Adenocarcinoma, READ, Rectal adenocarcinoma) that had both mutational profiles in c-BioPortal and RNA-seq data. We tested whether the co-existing mutational context with and without SMAD4 variants in colorectal cancer altered SMAD4 dependent gene expression and key regulatory gene expression pathways. We first assembled common mutational subgroups of colorectal cancer based on WNT-APC-β-catenin, RAS-MAPK, PI3K, and TP53 pathway variants (Fig. 5a). This test matrix derived four variant based sub-groups named Group 1–4 (Fig. 5a). Firstly, colorectal cancers with confirmed WNT-APC-β-catenin variants were selected and subsequently sub-divided. Group 1 and Group 2 cases were SMAD4 wild-type and Group 3 and 4 cases had SMAD4 pathogenic variants. Groups 1&3, and 2&4 differed depending on the presence or absence of second pathway variants in either the RAS-MAPK, PI3K or TP53 pathway, respectively. We next utilised the 76 DEGs identified in murine adenoma organoids and applied the equivalent genes to the human colorectal RNA-seq data. Volcano plots revealed only 7 of these genes were significantly differentially expressed based on the presence (n = 44) or absence (n = 191) of a SMAD4 pathogenic variant (Fig. 5b). These genes were BCAS1, CACNB2, CREB3L1, ID1, ID3, RASGRF2 and SLC30A2. Further comparison of the subgroups based on these genes along with second pathway variants revealed 26 gene-context pairs that were significantly upregulated and 15 gene-context pairs that were significantly downregulated (Fig. 5c, d). 
Despite the correlation established for Spp1 and Pak3 in mouse adenoma organoids, these two genes did not appear significantly differentially expressed in human colorectal cancer with respect to SMAD4 variants (Fig. 5c). Importantly, the upregulated expression of CREB3L1 and BCAS1 and downregulated expression of ID1 and ID3 appeared least impacted by the co-existing second (cancer) pathway variants based on visualisation of the heat maps. However, when the increased expression of CREB3L1 and BCAS1 and the repression of ID1 and ID3 were assessed against our three predefined criteria (see Methods), only ID1low met the significance criteria. Analysis of the TCGA cohort with respect to either the presence or absence of any additional TGF-β/BMP pathway variants also showed that ID1 expression was significantly downregulated with co-existing SMAD4 and other TGF-β pathway downstream variants compared to normal control tissues (Supplementary Table 5).
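The sub-group comparisons summarised above rely on a Holm adjustment of the pairwise Dunn test p-values (see the Fig. 5 legend). As a minimal sketch of how a Holm step-down adjustment works, the Python function below adjusts a list of raw p-values. The function name and the example p-values are illustrative assumptions and are not taken from the study, where the analysis was run in R.

```python
def holm_adjust(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted p-value never falls below an earlier one.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p-values for three pairwise comparisons (illustration only).
example_raw = [0.01, 0.04, 0.03]
example_adjusted = holm_adjust(example_raw)
print(example_adjusted)
```

The smallest p-value is multiplied by the number of comparisons, the next by one fewer, and so on, which is less conservative than a Bonferroni correction while still controlling the family-wise error rate.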

Fig. 5

SMAD4 pathogenic variant dependent expression biomarkers for the Smad4-TGF-β gene-pathway in human colorectal cancer (TCGA). a Flow diagram of TCGA bulk somatic DNA and RNA NGS sequencing data (COAD = colon adenocarcinoma, READ = rectal adenocarcinoma) with known WNT pathway driver variants (n = 326). Pathogenic variants for TGF-β, activin or BMP receptors and Smad 2/3/4/5/7/9 were identified (n = 135) from wild-type cases (n = 191). Sub-groups (G1-4) were then assembled with or without co-existing common pathway pathogenic variants in either RTK-RAS, TP53 or PI3K pathways. b The human equivalents of the 76 significantly differentially expressed genes obtained from the comparison of bulk mRNA expression in the mouse adenoma were then compared in the human TCGA mRNA data set. Volcano plot of median log2 fold-change against adjusted p-value (-Log10) using Mann–Whitney U test (non-normal distribution using Shapiro–Wilk test) identified 7 human genes (in red) significantly different in expression between Smad4 wild-type (n = 191) and Smad4 pathogenic variants (n = 44) [BCAS1, CACNB2, CREB3L1, ID1, ID3, RASGRF2 and SLC30A2]. c, d Comparison between sub-groups with and without context dependent variants G1-G4 are shown as heat maps only for the significantly differentially expressed genes (median RSEM expression log10). Non-parametric Kruskal–Wallis test (FDR adjustment p < 0.05) was followed by Dunn’s test and Holm adjustment for multiple comparisons. 26 gene expression-context pairs were identified to be upregulated in the TCGA cohort (c) and 15 gene expression-context pairs were significantly down regulated (d). Spp1 and Pak3 median expression are shown separately but were non-significant between groups (right heatmap). Of the comparison of the increased expression of CREB3L1 and BCAS1 and repression of ID1 and ID3, only ID1 met the significance criteria.

Fig. 6

SMAD4, ID1, SPP1 and PAK3 expression biomarkers for the Smad4-TGF-β gene-pathway and human colorectal cancer (TCGA) survival. a Comparison of ID1, SPP1 and PAK3 mRNA expression in cases with somatic pathogenic variants in either SMAD4 alone, SMAD4 with other genes in the TGF-β/BMP pathway (TGF-β SMAD4 mutated), single gene variants in either the TGF-β or BMP pathway (TGF-β or BMP mutated), more than one variant in TGF-β/BMP (TGF-β and BMP mutated) or variants of BMP pathway genes alone (BMP mutated). Statistical analysis was performed using the Kruskal–Wallis test (p < 0.0001) followed by Dunn’s multiple comparison test with Holm adjustment (shown in the plot). Note the significant repression of ID1 and increase in SPP1 in SMAD4 mutated cases. b Kaplan–Meier survival curves showing survival differences between high and low gene expression of SMAD4, ID1, SPP1 and PAK3 (left). Cut points were determined using the maximally selected rank statistics (maxstat) method and are shown for each gene (right). Statistical significance between the high vs low expression groups was determined with the Log-rank test (p values shown).

Compared to non-SMAD4 associated TGF-β/BMP pathway pathogenic variants (e.g. in Type 1 receptors, other SMADs or in the BMP pathway), SMAD4 variants specifically appeared to have the greatest impact on overall ID1 repression (Fig. 6a, p < 0.0001). In the context of all TGF-β/BMP pathway mutations combined, however, SPP1 mRNA did appear to be expressed significantly higher compared to cases without variants in the TGF-β and BMP pathway (p = 0.044, Dunn test, Fig. 6a). For PAK3, no significant overall expression differences were observed between SMAD4 mutational groups (Fig. 6b). Having established that the Smad4 dependent gene expression signature in mouse adenoma only overlapped with the SMAD4 dependent gene expression in human colorectal cancer with respect to ID1low, we next evaluated whether any of the expression biomarkers impacted on the survival outcome of colorectal cancer. By using maximally selected rank statistics to define a single cut-point (Fig. 6b), SMAD4low expression (p = 0.0019) and ID1low expression (p = 0.0062), as well as SPP1high (p = 0.0072) and PAK3high (p = 0.012), all appeared to be associated with significantly poorer overall survival in the TCGA cohort (Kaplan–Meier, Fig. 6b). The less significant impact of PAK3high may reflect the relatively low proportion of PAK3high expressing sub-populations of cells within tumours.
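The cut-point approach described above can be illustrated with a simplified sketch: every candidate expression threshold splits the cohort into low and high groups, a log-rank statistic is computed for each split, and the threshold with the largest statistic is kept. The Python code below is only an approximation of that idea under stated assumptions. It uses an unstandardised two-group log-rank chi-square, hypothetical function names, and made-up toy data, and it omits the standardisation and p-value correction that the maxstat method applies.

```python
def logrank_chi2(times, events, in_low_group):
    """Two-group log-rank chi-square (1 d.f.); events are 1 = death, 0 = censored."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, in_low_group) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, in_low_group)
                 if ti == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(expression, times, events, min_group=2):
    """Scan thresholds and return (cut, statistic); low group is expression <= cut."""
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(expression))[:-1]:
        low = [x <= cut for x in expression]
        if min(sum(low), len(low) - sum(low)) < min_group:
            continue  # skip splits that leave a very small group
        stat = logrank_chi2(times, events, low)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy data (made up): low expression paired with short survival, all events observed.
toy_expression = [1, 2, 3, 4, 5, 6, 7, 8]
toy_times = [2, 3, 4, 5, 20, 22, 25, 30]
toy_events = [1] * 8
toy_cut, toy_stat = best_cutpoint(toy_expression, toy_times, toy_events)
print(toy_cut, round(toy_stat, 3))
```

Because the threshold is chosen to maximise the group difference, p-values taken at the selected cut-point are optimistic unless they are corrected for the search, which is one reason a dedicated implementation is preferable for real analyses.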

Discussion

The identification of cancer cells that have disruption of the TGF-β-BMP-SMAD4 pathway may have important implications for risk assessment for invasion and metastasis. Validated models are required in order to discover mechanism and pathway specific gene biomarkers2. Moreover, validated RNA based molecular prognostic classifiers for colorectal cancer have rapidly developed in line with the now clinically applied breast cancer equivalents. Such bulk RNA based machine learning classifiers have identified several clusters (e.g. CRISB) that include a stromal based signature that correlates with poor prognosis in colorectal cancer3,36,37. A supervised machine learning CMS1-4 classifier (Consensus Molecular Classifier) has correlated gene level expression signatures with four prognostic groups, where the poor prognostic group CMS4 was correlated with EMT and TGF-β dependency, yet the classifier does not utilise specific RNA expression markers that are validated readouts of TGF-β-BMP-SMAD4 pathway disruption. With respect to the latter, human colorectal polyp derived organoids have recently been used to discover TGF-β dependent gene sets in human serrated adenomas combined with a BRAFV600E mutation, where there was an induced mesenchymal phenotype, yet this model did not address the specific context of Smad4 loss of function38. Further analysis of classifiers of the human expression data has also newly identified three classes of pathway derived subtypes (PDS1-3) in colorectal cancer with and without KRAS mutations, where CMS4 and CMS1 combine in the PDS2 subtype, in which TGF-β-BMP-SMAD4 pathway expression profiles appear enriched4. These subtypes appeared independent of KRAS driver gene variant classes and phenotypic appearance, and for PDS1 and 2, correlated with RNA expression data from mouse genetically modified models.
Despite the likely pathway specific variations, gene-pathway specific gene expression signatures are becoming better understood and being directly applied from mouse model systems to human intestinal cancer classifiers. In this murine system, we sought to evaluate gene-pathway specific gene expression signatures for Smad4.

We discovered that the generation of homozygous Smad4 with Apc LOF alleles was associated with a discordant intestinal phenotype, with adenoma suppression in the small intestine and adenoma progression in the caecum. By deriving organoid cultures from the adenoma, we discovered that the high sensitivity to TGF-β1 induces similar EMT and cell death phenotypes in a dose dependent manner, with a relative tenfold dose resistance to TGF-β1 in Smad4Δ/Δ adenoma. Following TGF-β1 exposure, we identified genes with extremes of differential expression by genotype, including reduced Id1 and increased Spp1 gene expression as consistent features in Smad4Δ/Δ murine adenoma organoids. The gene expression changes are consistent with Smad4 independent activity of the R-Smads, which in this case is likely to be mainly Smad339,40,41. Single cell RNA sequencing of the primary caecal adenomas also detected enrichment of the Pak3High expressing progenitor epithelial cell population in Smad4Δ/Δ but no changes in bulk expression in adenoma organoids. We assume that these differentially expressed genes, for example Id1Low, are the main underlying differences that account for the functionally discordant progression of adenoma, although this will need to be directly tested in future experiments.

Initial models of Smad4 LOF in the intestine have shown Smad4 dependent progression phenotypes, that have been further evaluated using conditional disruption42. For example, Smad4+/- mice spontaneously developed juvenile polyposis (JP) at 1.5 years, and invasive adenocarcinoma when crossed with Apc∆716/+ mice42,43,44,45. In addition, an Apc1638N/+Smad4fl/flK19-CreERT2 model appeared to show an increased small intestinal and colonic adenoma burden46. SMAD4 LOF also blocked differentiation during tumour progression in an orthotopic organoid transplantation model mimicking an intestinal adenoma-carcinoma transition47. Additional murine models incorporating disruption of the TGF-β-BMP pathway (Tgfbr2, Smad3) with other contextual drivers (e.g. Apc, Kras, inflammation) have also shown increased intestinal invasion and metastasis that are strongly pathway dependent42,48,49. Selective disruption of Apc and Smad4 in the caecal and colonic epithelium using a carbonic anhydrase CreER also resulted in pathway dependent invasive adenoma50. CRISPR/Cas9-mediated genome editing of human intestinal epithelial organoid cultures introduced mutations in APC, TP53, KRAS and SMAD4 which, when combined and selected, developed aggressive invasive carcinomas23. Moreover, these findings may explain the selection of deletions of chromosome arms 18p and 18q (including SMAD4) in regions of human adenoma and 66% of invasive colorectal cancers51,52. The generation of homozygous floxed alleles in this conditional model results in rapid adenoma formation within 30 days, as it avoids stochastic LOH of the WT allele in the Apc truncated variant alleles of previous models. As a result, temporal development of further invasion and metastasis are restricted compared to models that are either dependent on time for LOH of the Apc allele or have lower recombination events. 
In our case, the strictly applied humane endpoints, reached because of anaemia and intestinal obstruction, are likely to occur before there is any prospect of detecting invasive and metastatic sites. The discordant pattern of adenoma development in Apcfl/flSmad4fl/fl suggests that the caecum and colon may be more prone to adenoma initiation and progression when both genes are disrupted. As with similar mouse models, this is most likely to be due to the caecal inflammatory microenvironment and the local TGF-β-BMP ligand supply. The anatomy of the mouse appendix and colon is markedly different from that of the human; even so, the model may be relevant to the rare mucinous type adenocarcinomas that can arise in the human appendix. In terms of the inflammatory microenvironment, we show downregulation of Pla2g2a (2.5log2FC) in Smad4Δ/Δ adenoma. Pla2g2a is considered to be a tumour suppressor independent of the Wnt/β-catenin signalling pathway, and intraluminal secretion of Pla2g2a is a host defence mechanism during the active phase of ulcerative colitis53. Moreover, Reg3γ and Reg3β were also downregulated (1.2log2FC) in Smad4Δ/Δ adenomas, and both genes are important for epithelial defence against bacterial invasion54,55. Thus, adenoma Smad4 specific alteration in gene expression may indirectly alter the inflammatory and innate immune environment. A further confounder to the phenotype may be modifier alleles co-segregating with LoxP alleles, although this was mitigated by backcrossing all conditional alleles to C57Bl/6J for > 10 generations27. Excess TGF-β-BMP may also directly promote adenoma formation independent of Apc LOF, as targeted deletion of Bmpr1a can also lead to crypt expansion, fused villi and adenoma formation due to an increase in β-catenin56.
Overall, adenoma Smad4 LOF may stimulate the conditions for increased local TGF-β1 ligand supply and further stimulation of the pathway, contributing to differentiation arrest, chronic inflammation and subsequent adenoma progression.

The dose dependency of TGF-β1 ligand supply with respect to adenoma organoid cell growth phenotype has not been previously described to our knowledge, but may contribute functionally to the manifestation of differential phenotypes with Smad4 LOF; at a given dose, pathway gene expression positively modifies differentiation and EMT function, but at the same time there is resistance to cell death and cell cycle arrest. The time-dependent analysis of TGF-β1 on gene expression used a fixed concentration of 390 pM, which exceeds the TGF-β1 IC50 of Smad4+/+ adenoma, and resulted in differential expression of genes that were Smad4 and TGF-β1 dependent, with the most significant being decreased expression of Id1. Id1 codes for a non-DNA binding inhibitor of basic-helix-loop-helix (bHLH) transcription factors (e.g. GATA4) to inhibit multiple cell processes including differentiation. ChIP-Seq has shown that Smad3 and Smad4 can bind the Id1 promoter to directly stimulate early induction following TGF-β1 exposure40. Feedback Id1 repression is then thought to be also pSmad3 dependent but through feedback activation of ATF3, which then directly represses Id1 expression57. Id1 over-expression in pancreatic ductal adenocarcinoma in mouse and human is also associated with more invasive and undifferentiated cells, a mechanism proposed to be dependent on EIF2 signalling58 and independent of the TGF-β pathway59. Id1 proteins detected by IHC may show increased expression in colorectal cancer and mouse models, and one now retracted report proposed that a genetic knock-out of Id1 results in adenoma suppression in the small intestine of ApcMin/+, but not when colitis is chemically induced. Id1 protein localisation may not correlate with gene expression in tissues, as we have observed, and so is likely to be context dependent at the protein level60.
Recently, a Nur77 based molecular switch has also been proposed to regulate Id1, where stabilisation of TGF-β induced Smad3 degradation leads to an increase in Id1 expression, whereas in the absence of TGF-β, Nur77 enhances Smad3 degradation and results in reduced Id1 expression61. The signalling of TGF-β and Id1 associated gene expression may not be solely dependent on Smad4, as Smad1/5 dual signalling through TGFBR1 and ACVR1 also results in increased Id1 expression and EMT60.

Spp1 (osteopontin) expression normally occurs in osteoclasts and macrophages, but has also been specifically detected in tumour associated M2 macrophages, in senescence associated secretory phenotype (SASP) epithelial tumour cells and at the invasive front of human colorectal tumours62,63,64,65,66. As well as the extensive circumstantial data associating high SPP1 expression with human colorectal cancer, Spp1 was also previously reported to be a key Smad4 dependent expressed gene functionally associated with murine prostatic cancer invasion in a Pten prostate deletion model67. High SPP1 expression may have therapeutic implications, as immune exclusion correlates in the tumour microenvironment with cancer associated fibroblasts resulting in resistance to immune checkpoint therapy68,69.

Pak3 is a well-known downstream effector of Cdc42 and Rac1 GTPases that mediates motility and MAPK activation associated with cancer invasion and metastasis70,71,72. In a recent adenovirus Cre-induced KrasG12D, p53fl/fl, Smad4fl/fl model of lung cancer, increased metastasis was associated with Smad4 LOF and Pak3 activation, the latter identified as a downstream effector of Smad4 via the PAK3-JNK-Jun pathway72. Interestingly, the proposed mechanism appeared to be due to attenuation of Smad4 dependent transcription of miR-495 and miR-543 that can directly bind to the PAK3 3’UTR. Overexpression of PAK family members is also associated with CRC outcome, transition of adenoma to carcinoma and stabilisation of β-catenin73,74,75,76,77,78,79.

There are significant limitations in comparing mouse adenoma derived gene expression signatures and human colorectal cancer in this study, hence we would consider the comparison with respect to the evolutionarily conserved TGF-β-BMP-SMAD4 pathway as being exploratory. The numerous challenges that restrict the phenocopy of the mouse models to human colorectal cancer include the fundamentally different developmental, physiological and regulatory gene expression profiles of human and mouse, best exemplified by the different distributions of adenoma formation when the Apc gene is disrupted: small intestine in the mouse and colon in the human. In terms of the comparison of pathway dependent gene expression signatures, of the 76 differentially expressed TGF-β-BMP-SMAD4 gene-pathway dependent candidate gene expression signatures in mouse, only 7 of these DEGs were also identified in human, several of which (CREB3L1, BCAS1, ID1 and ID3) also appeared independent of common co-existing pathogenic variants in the RTK-RAS, TP53 and PI3K pathways, but only ID1low remained significant. SMAD4, ID1, SPP1 and PAK3 mRNA expression also appeared to have prognostic implications for colorectal cancer using the TCGA cohorts and, with further evaluation required in extended human cohorts, are candidate functional readouts associated with disruption of the TGF-β-BMP-SMAD4 pathway. In summary, we identified TGF-β and SMAD4 gene expression signatures in mouse adenoma, both for bulk and single cell sequencing, that require future prospective evaluation of pathway based prognostic-predictive classifiers across intestinal cancer subtypes4.

Materials and methods

Conditional Cre intestinal adenoma model

Animal work was performed under a UK Home Office licence, and approved by University of Oxford Animal ethics committee. All methods were carried out in accordance with relevant guidelines and regulations. Home Office licence approvals include planned experimental design, statistical considerations, procedures (including schedules of humane killing in this case using enflurane anaesthesia prior to neck dislocation) and outcomes and are in full accordance with the Arrive guidelines (https://arriveguidelines.org). Mouse lines were obtained from Elisabeth Robertson, Sir William Dunn School, Oxford University, U.K. (Smad4fl/fl and RosaYFPfl/fl)25 and Alan Clarke (deceased), Cardiff University, U.K. (Lgr5-CreERT2, Apcfl/fl and Vil-CreERT2)24. The mouse lines were backcrossed to C57Bl/6J as reported including Smad4fl/fl (Smad4tm1Rob) (Supplementary Material and Methods). Genotyping PCRs were briefly (5'-3', forward-F and reverse-R): Apc F-GTTCTGTATCATGGAAAGATAGGTGGTC and R-CACTCAAAACGCTTTTGAGGGTTG (GAGTACGGGGTCTCTGTCTCAGTGAA for recombined product), Smad4 F-CTTTTATTTTCAGATTCAGGGGTTC (F-AAAATGGGAA-AACCAACGAG for recombined product) and R-TACAAGTGTATGTCTTCAGCG. Genotype combinations were generated by experimental breeding. Smad4fl/flRosaYFPfl/fl and Apcfl/flSmad4fl/flRosaYFPfl/fl were crossed with either Lgr5-CreERT2 or Vil-CreERT2 mice. Apcfl/flSmad4+/+RosaYFPfl/fl Lgr5-CreERT2/Vil-CreERT2 animals were used as controls. Mice bred with Vil-CreERT2 mice were used for an in vitro experiment only. To generate combined heterozygote Apcfl/+ Smad4fl/+loxP alleles, heterozygote Apcfl/+ was bred with homozygote Smad4fl/fl and combined heterozygotes (both loxP alleles located on mouse Chromosome 18) were inter-crossed to generate homozygote alleles and the experimental mice as in Fig. 1a. For Lgr5-CreERT2 activation, adult mice age ≥ 8 weeks had an intra-peritoneal injection of 5 mg (200 mg.kg−1) tamoxifen dissolved in corn oil. 
Mice were monitored daily, and small intestinal and colonic adenomas were scored for number, diameter, distribution and weight as described27.

Intestinal adenoma organoids

Adenoma formed in vivo from Smad4fl/fl or Smad4+/+ Lgr5 mice were cultured as described27,80. For crypt culture, the small intestines were collected from Smad4fl/flRosaYFPfl/fl and Apcfl/flSmad4fl/flRosaYFPfl/fl Vil-CreERT2 mice and processed as described80. Briefly, crypts were cultured in Matrigel and supplemented with ADF2 and crypt growth factors, ENR (8.1 nM EGF (E), 500 ng ml-1 R-spondin1 (R), and 2.16 nM Noggin (N)) and incubated at 37 °C, 5% CO2 as described. At D3, crypts generated from transgenic mice were incubated for 12 h at 37 °C with either 500 nM 4-hydroxy-tamoxifen (4-OHT) (Millipore, 579002) dissolved in 100% EtOH (test) or 5 µl EtOH vehicle (negative control). The test and the vehicle treated organoids were supplemented with EN only (no R) and the normal crypts with ENR. Proteins were extracted from organoids as described81.

Cell labelling and imaging

Neutral buffered formalin fixed tissues were paraffin embedded and 5 µm sections were processed either by immunohistochemistry (IHC) or immunofluorescence (IF). The antibodies used in this study are listed (Supplementary Materials and Methods). Organoids growing on coverslips in 24 well plates were used for IF. Organoid proliferation was measured using the thymidine analogue 5-ethynyl-2’-deoxyuridine (EdU) for 2 h with the Click-iT EdU Alexa Fluor-555 Imaging Kit (Invitrogen, C10338). H&E and IHC slides were viewed by brightfield-fluorescent microscopy (Olympus BX60). Images were acquired by CCD camera and Nuance 2.10 software (CRI, Massachusetts). Confocal images were captured with an Olympus FluoView FV1000 using a 40 × 1.3NA lens and associated software. Labelled slides (EdU, IHC or IF) were analysed using ImageJ 1.48 software. AlamarBlue (AB) (AbD Serotec, BUF012B) was used to quantify adenoma growth. Absorbance after 24 h was read at 570 nm and 600 nm using a Spectramax M5 plate reader (MTX Lab Systems). Live cell imaging of adenoma organoids treated with TGF-β for 24 h was conducted with a Zeiss Axiovert 200 inverted microscope with a BSI Prime CMOS camera and a Neofluar 10 × 0.3NA lens. MetaMorph software (v 7.8.12.0) was used for multidimensional Z-stack acquisition with 100 ms exposure at regular intervals of 15 min over an extended duration of 6–18 h culture at 37.5 °C and 5% CO2 in air. Figure images were assembled following auto adjustment of tone, colour and brightness/contrast.

TGF-β1 induced gene expression bulk RNASeq

Early passage adenomas seeded at 9,000 cells per 25 µl well were used for RNA-Seq. Adenomas were grown in ADF2 medium supplemented with 8.1 nM EGF, 10 µM Y-27632 and 1 µM SB-505142. The medium was changed on D2, the cells were washed, EGF and 390 pM TGF-β1 were added and samples were extracted at three different time points (t = 0, t = 1, t = 12 h post TGF-β1 treatment). Three biological replicates were used from each genotype, and RNA was pooled from 8 wells per treatment. RNA extraction was performed using Quick-RNA MiniPrep (Zymo Research, R1054), and RNA was assessed using a Nanodrop-1000 spectrophotometer and processed with RNA Clean & Concentrator™-5 (Zymo Research, R1015). RNA-Seq was performed at the Wellcome Trust Center for Human Genetics, University of Oxford. Briefly, directional multiplexed mRNA and miRNA libraries were prepared. All mRNA was ribo-depleted and converted to cDNA. Paired-end sequencing (100nt) was run on a HiSeq 2500 (Illumina) for mRNA-Seq.

Transcript abundances from RNA-Seq data were quantified using the Tuxedo pipeline82. Fastq reads were aligned to the mouse genome (mm10/GRCm38) with the TopHat-Bowtie2 aligner, versions 2.1.0 and 2.3.4, respectively, and expression of transcripts was quantified with Cufflinks 2.2.1 as fragments per kilobase per million mapped reads (FPKM). Counts were also generated using featureCounts from the Subread package. Pairwise differential expression analysis between groups was performed in R using DESeq2 version 1.2883. Genes with fewer than 5 reads in at least 3 samples were removed. Genes with adjusted p-values (FDR) < 0.05 were considered significant. Shrunk log2fold change (log2FC) was calculated using a shrinkage estimator. Genes considered as differentially regulated between 2 conditions have ± 1.5log2FC. Time-series analysis was performed using the likelihood ratio test implemented in DESeq2. Gene ontology term enrichment analysis for the significant DEGs was performed using “clusterProfiler”, where we evaluated biological processes and molecular functions84. For easier visualisation, we used the “tmod” package for enrichment analysis (https://peerj.com/preprints/2420v1). In tmod, we applied the CERNO (Coincident Extreme Ranks in Numerical Observations) test after converting mouse genes to human orthologs. We performed the enrichment analysis using Hallmark gene sets from MSigDB v7.2 (Broad Institute) and the blood signature from LI modules in tmod; gene sets with an area under the curve (AUC) ≥ 0.5 and padj < 0.05 were considered positive. The Metascape website was utilised for pathway analysis of certain gene lists, and to identify the transcription regulatory network based on TRRUST ontology85.
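The filtering and significance rules in the paragraph above can be summarised in a short sketch. The Python below is an illustration only, since the study used DESeq2 in R. It reads the low-count rule literally (a gene is removed when fewer than 5 reads are seen in at least 3 samples), treats the ± 1.5 log2FC cut-off as inclusive, and uses hypothetical function names and made-up example values; all three are assumptions.

```python
def passes_count_filter(counts, min_reads=5, max_low_samples=2):
    """Keep a gene unless it has fewer than min_reads in 3 or more samples.

    This is one literal reading of the stated rule (an assumption).
    """
    low_samples = sum(1 for c in counts if c < min_reads)
    return low_samples <= max_low_samples


def is_deg(log2fc, padj, lfc_cutoff=1.5, alpha=0.05):
    """Differentially expressed if FDR < 0.05 and |shrunk log2FC| >= 1.5."""
    return padj < alpha and abs(log2fc) >= lfc_cutoff


# Made-up example genes: (name, per-sample read counts, shrunk log2FC, adjusted p).
examples = [
    ("geneA", [120, 98, 143, 110, 87, 131], -2.3, 0.001),
    ("geneB", [3, 0, 2, 40, 55, 61], 3.1, 0.0004),   # low counts in 3 samples
    ("geneC", [220, 180, 240, 260, 210, 250], 0.4, 0.01),  # small effect size
]

selected = [
    name for name, counts, lfc, padj in examples
    if passes_count_filter(counts) and is_deg(lfc, padj)
]
print(selected)
```

Applying the count filter before testing reduces the multiple-testing burden, while the effect-size cut-off is applied to the shrunk estimates so that noisy low-count genes are less likely to pass on fold change alone.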

Early TGF-β target genes were classified based on log2FC ± 1.5 expression and adjusted p < 0.05 after 1 h compared to t = 0 in Smad4+/+ adenomas. Transient early upregulated/downregulated genes were the genes with opposite directions to their initial regulation when compared to late time points (t = 12 vs t = 0 and/or t = 12 vs t = 1 h) in Smad4+/+ adenomas. Late TGF-β upregulated/downregulated genes were classified according to their upregulation/downregulation at 12 h in Smad4+/+ adenomas compared to t = 0 and/or t = 1 h with log2FC ± 1.5 and adjusted p < 0.05. Genes were considered Smad4-independent if they were significantly differentially expressed in both Smad4+/+ and Smad4Δ/Δ adenomas with TGF-β1 treatment compared to their corresponding untreated baseline. However, if genes were differentially expressed in an opposite direction to that of TGF-β target genes in Smad4Δ/Δ vs Smad4+/+ at t = 0, they were classified as Smad4-dependent genes and excluded from the Smad4Δ/Δ vs Smad4+/+ t = 0 list. TGF-β target genes that were only upregulated in Smad4+/+ adenomas were classified as Smad4-dependent genes. Genes that were differentially expressed in Smad4Δ/Δ vs Smad4+/+ at t = 0 h and were not among TGF-β target genes were considered for baseline comparisons.
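The temporal classification described above can be expressed as a small decision procedure. The Python sketch below covers only the early, transient early and late categories in Smad4+/+ adenomas and makes several assumptions that are not stated in the text: each contrast is summarised as a (log2FC, adjusted p) pair, a late contrast must itself pass the thresholds to count as a reversal, and the contrast keys and function names are hypothetical.

```python
LFC_CUTOFF = 1.5
ALPHA = 0.05


def significant(contrast):
    """contrast is a (log2FC, adjusted p-value) pair."""
    log2fc, padj = contrast
    return padj < ALPHA and abs(log2fc) >= LFC_CUTOFF


def direction(contrast):
    return "up" if contrast[0] > 0 else "down"


def classify_target(contrasts):
    """Classify one gene from its Smad4+/+ contrasts.

    Expected keys (hypothetical naming): '1h_vs_0h', '12h_vs_0h', '12h_vs_1h'.
    """
    early = contrasts["1h_vs_0h"]
    late_hits = [contrasts[key] for key in ("12h_vs_0h", "12h_vs_1h")
                 if significant(contrasts[key])]
    if significant(early):
        # Transient: a significant late contrast moves against the early change.
        if any(direction(hit) != direction(early) for hit in late_hits):
            return f"transient early {direction(early)}"
        return f"early {direction(early)}"
    if late_hits:
        return f"late {direction(late_hits[0])}"
    return "not a TGF-beta target"


# Made-up example: induced at 1 h, back towards baseline by 12 h.
example_gene = {
    "1h_vs_0h": (2.0, 0.001),
    "12h_vs_0h": (0.1, 0.80),
    "12h_vs_1h": (-1.9, 0.002),
}
print(classify_target(example_gene))
```

Smad4 dependence would then be assigned by repeating the same contrasts in Smad4Δ/Δ adenomas and comparing the labels between genotypes, a step that is left out of this sketch.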

Caecal adenoma single cell mRNA sequencing

Pooled single-cell suspensions were obtained from caecal tumours from three Apcfl/fl, Smad4+/+, Lgr5CreERT2 and five Apcfl/fl, Smad4fl/fl, Lgr5CreERT2 mice. Cell suspensions underwent library preparation using the Chromium Single Cell 3’ kit (10X Genomics) and sequencing using Illumina HiSeq 2 × 150 bp (Azenta, USA), with approximately 25,000 reads per cell. Sample demultiplexing, barcode processing and single-cell 5′ unique molecular identifier (UMI) counting were performed using the Cell Ranger Software Suite (v7.0.1), with FASTQ alignment to the mouse reference transcriptome mm10-3.0.0. The following criteria were then applied to each cell: gene number between 250 and 7,000, UMI count > 500, mitochondrial gene ratio < 0.15 and log10 genes per UMI > 0.80. Doublets were removed using the scDblFinder library. Genes with ten read counts or fewer were removed. A total of 20,886 cells were analysed (11,212 for Smad4fl/fl and 9,674 for Smad4+/+). Dimensionality reduction and clustering used the filtered gene-barcode matrix normalised using SCTransform in Seurat (v4.3.0)86. Variables including mitochondrial ratio, nUMI, nGene and the cell cycle gene difference were regressed out, and the top 3000 variable genes were used. The filtered gene-barcode matrices of all samples were then integrated, a k-nearest neighbour (k-NN) graph was generated using FindNeighbors (Seurat) with the top 40 principal components, and clusters were identified using the FindClusters function (Louvain algorithm). We used resolution 1.2 to classify the different cell types and resolution 2.2 to determine the enterocyte precursors based on clustree (v0.5.1)87. Gene markers that identified each cluster relative to all other clusters were determined with the FindAllMarkers function in Seurat using the Wilcoxon rank-sum test (log-fold change threshold 0.25). Manual cell labelling of the top marker genes for each cluster utilised previous scRNA-seq studies30,32,88.
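The per-cell quality-control criteria can be written as a single predicate, sketched below. The function and argument names are hypothetical. Two choices are assumptions because the text does not define them: log10 genes per UMI is computed as log10(genes) / log10(UMIs), and the gene number bounds are treated as inclusive.

```python
import math


def passes_qc(n_genes, n_umi, mito_ratio):
    """Apply the stated per-cell thresholds.

    n_genes    : number of detected genes in the cell
    n_umi      : total UMI count in the cell
    mito_ratio : fraction (0-1) of counts from mitochondrial genes
    """
    if n_umi <= 500 or n_genes <= 0:
        return False
    # Assumed definition of the complexity score.
    log10_genes_per_umi = math.log10(n_genes) / math.log10(n_umi)
    return (
        250 <= n_genes <= 7000
        and mito_ratio < 0.15
        and log10_genes_per_umi > 0.80
    )
```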
Data were visualised using Uniform Manifold Approximation and Projection (UMAP) dimensional reduction, and plots were generated using Seurat and the scCustomize library. Manual cell labelling utilised gene clusters in previous studies30,31,32,33, the Lgr5 intestinal stem cell signature32 and the CellMarker 2.0 database88. Significant differences in cell type proportion between genotypes were determined with the permutation test implemented in the scProportionTest package in R, which calculates a p-value for each cluster and a confidence interval for the magnitude of the difference via bootstrapping89. All analysis was performed in R V4.2.3. Differential gene expression analysis for each cluster between Smad4fl/fl and Smad4+/+ was performed using MAST90 with the FindAllMarkers function in Seurat. Differentially expressed genes were selected based on the log fold change threshold and expression in at least 25% of cells in one of the groups. Pathway analysis was performed using Metascape85. Pseudo-bulk expression analysis was performed using MAST implemented in Seurat’s FindMarkers function, with all Smad4+/+ and all Smad4fl/fl cells as the first and second identities, respectively.
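The idea behind a permutation test on cell type proportions can be illustrated with a simplified Python sketch. This is not the scProportionTest implementation: the function names are hypothetical, the statistic (log2 ratio of cluster proportions) and the two-sided p-value are assumptions, and the bootstrap confidence interval is omitted.

```python
import math
import random


def log2_fd(labels_a, labels_b, cluster):
    """log2 ratio of the cluster's proportion in group B versus group A."""
    p_a = labels_a.count(cluster) / len(labels_a)
    p_b = labels_b.count(cluster) / len(labels_b)
    if p_a == 0 or p_b == 0:
        # Guard against log2(0); infinite when only one group has the cluster.
        return 0.0 if p_a == p_b else math.copysign(math.inf, p_b - p_a)
    return math.log2(p_b / p_a)


def permutation_pvalue(labels_a, labels_b, cluster, n_perm=1000, seed=0):
    """Shuffle genotype membership across the pooled cells and compare the
    permuted statistics with the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = log2_fd(labels_a, labels_b, cluster)
    pooled = list(labels_a) + list(labels_b)
    n_a = len(labels_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = log2_fd(pooled[:n_a], pooled[n_a:], cluster)
        if abs(stat) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `labels_a` and `labels_b` would hold the cluster label of every cell from each genotype, and the test would be repeated once per cluster.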

TCGA colorectal cancer analysis

Normalised gene expression data from RNA-seq (Illumina HiSeq Level 3 RSEM normalized genes) were retrieved from http://firebrowse.org/?cohort=COADREAD (accessed October 2019).

Genes and pathways that are frequently mutated in colorectal cancer, retrieved from The Cancer Genome Atlas project (TCGA)51, include the Wnt/β-catenin pathway (APC, CTNNB1, DKK1, DKK2, DKK3, DKK4, LRP5, FZD10, FAM123B/AMER1, AXIN2, TCF7L2, FBXW7, and ARID1A), the TP53 pathway (ATM, TP53), the PI3K pathway (IGF2, IRS2, PTEN, PIK3R1, and PIK3CA), the RTK-RAS pathway (NRAS, KRAS, BRAF, ERBB2, and ERBB3), the TGF-β pathway (ACVR1B, ACVR2A, SMAD2, SMAD3, SMAD4, TGFBR1 and TGFBR2) and the BMP pathway (SMAD1, SMAD9, SMAD5, BMPR1A, BMPR1B and ACVR1). Genomic variants (missense mutation, homozygous deletion, amplification or fusion gene) in these genes for the COAD and READ TCGA cohorts were retrieved from cBioPortal (https://www.cbioportal.org), accessed December 2019. In the COAD and READ TCGA cohorts, the variant status was available for 56 patients with SMAD4 variants. 98.1% of these patients had alterations in the Wnt/β-catenin pathway, 59.3% had alterations in the TP53 pathway, 46.3% in the PI3K pathway and 64.8% had variants in the RTK-RAS pathway. Normalized RNA-seq data were available for 51 normal tissues and 326 patient tumours with variant data. In the TCGA COAD-READ cohorts, 44 patients with probably pathogenic and possibly pathogenic SMAD4 variants, but with no other variants in the TGF-β pathway, had RNA-seq data. One patient did not have a variant in the Wnt pathway and was excluded from the analysis. Patients with known variants in the TGF-β pathway were excluded from the control group. Using this patient group, we validated the top 76 DEGs from the mouse adenomas (genes with log2FC ± 2 and padj < 0.05 from Smad4Δ/Δ vs Smad4+/+ at t = 0 h), either in the presence or absence of variants in the PI3K, TP53 and RTK-RAS pathways in the context of SMAD4 variants. Statistical analysis was performed using the Kruskal–Wallis test followed by FDR adjustment. Genes with p < 0.05 following FDR adjustment were considered significant.
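The pathway gene lists given above lend themselves to a simple lookup structure for assigning altered pathways to each patient. The following Python sketch is illustrative only: the dictionary keys, function names and patient identifiers are hypothetical, and it does not reproduce the cBioPortal query.

```python
# Gene lists as given in the text; the short keys are labels chosen for this sketch.
PATHWAYS = {
    "WNT": {"APC", "CTNNB1", "DKK1", "DKK2", "DKK3", "DKK4", "LRP5", "FZD10",
            "FAM123B", "AMER1", "AXIN2", "TCF7L2", "FBXW7", "ARID1A"},
    "TP53": {"ATM", "TP53"},
    "PI3K": {"IGF2", "IRS2", "PTEN", "PIK3R1", "PIK3CA"},
    "RTK_RAS": {"NRAS", "KRAS", "BRAF", "ERBB2", "ERBB3"},
    "TGFB": {"ACVR1B", "ACVR2A", "SMAD2", "SMAD3", "SMAD4", "TGFBR1", "TGFBR2"},
    "BMP": {"SMAD1", "SMAD9", "SMAD5", "BMPR1A", "BMPR1B", "ACVR1"},
}


def altered_pathways(variant_genes):
    """Return the set of pathway labels with at least one variant gene."""
    variant_genes = set(variant_genes)
    return {name for name, genes in PATHWAYS.items() if genes & variant_genes}


def percent_with_pathway(patients, pathway):
    """patients: {patient_id: iterable of variant genes}. Returns a percentage."""
    hits = sum(1 for genes in patients.values()
               if pathway in altered_pathways(genes))
    return 100.0 * hits / len(patients)
```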
Multiple comparisons were performed using the Dunn test followed by Holm adjustment to detect SMAD4-dependent genes. A difference in expression between two groups was considered significant if adjusted p < 0.05. We considered genes to be SMAD4-dependent based on three criteria. First, there needs to be a significant difference in gene expression between the groups (Group 1-Group 3, Group 1-Group 4, Group 2-Group 3 and Group 2-Group 4) in the context of each of the three secondary pathways (PI3K, TP53 and RTK-RAS), with and without variants. Second, there should be no significant difference in gene expression between Group 1-Group 2 or between Group 3-Group 4. Third, any change in gene expression should follow the same direction as observed in the mouse adenomas. Note that 84 genes were differentially regulated at t = 0 with absolute log2FC ≥ 2 and adjusted p ≤ 0.05. Of those, 76 genes had human orthologs, and 73 genes had RNA data in the TCGA cohorts. Survival data for the COAD-READ TCGA cohort were retrieved (Table S1, Tab TCGA-CDR, accessed May 2019)91.
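The Holm adjustment and the three criteria can be expressed compactly, as in the following Python sketch. The function names and the representation of group pairs as tuples are hypothetical, group numbers follow the text, and the Dunn test itself is not implemented here (its p-values are taken as input).

```python
def holm_adjust(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum enforces monotonicity.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted


ACROSS = [(1, 3), (1, 4), (2, 3), (2, 4)]  # pairs that must differ (criterion 1)
WITHIN = [(1, 2), (3, 4)]                  # pairs that must not differ (criterion 2)


def is_smad4_dependent(padj_by_pair, human_direction, mouse_direction, alpha=0.05):
    """padj_by_pair: {(group_i, group_j): Holm-adjusted p}. Directions are +1 or -1."""
    differs_across = all(padj_by_pair[pair] < alpha for pair in ACROSS)
    same_within = all(padj_by_pair[pair] >= alpha for pair in WITHIN)
    same_direction = human_direction == mouse_direction  # criterion 3
    return differs_across and same_within and same_direction
```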

Statistical analysis

Statistical and graphical data analyses were performed using either GraphPad Prism version 6.0 software or R version 4.0.2. Images were analysed using ImageJ 1.48 software. Final figures were generated using Photoshop (CS5). Immunoblot analysis was performed as follows: the loading control and test bands from scanned immunoblots were aligned horizontally in Photoshop (CS5) and analysed using ImageJ 1.48 software. For TCGA COAD-READ expression data, the cut-off point between high and low mRNA expression levels for each gene was computed based on maximally selected rank statistics (maxstat) using the surv_cutpoint function from the ‘survminer’ package in R.
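The principle of choosing an expression cut-off that best separates survival can be illustrated with a simplified Python sketch. This is not the maxstat algorithm used by surv_cutpoint: it substitutes a plain two-group log-rank chi-square scanned over candidate cut-offs, applies no correction for the multiple cut-offs tested, and all function and parameter names (including `min_prop`) are hypothetical.

```python
import math


def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [h for ti, h in zip(times, in_high) if ti >= t]
        n = len(at_risk)
        n_high = sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d_high = sum(1 for ti, e, h in zip(times, events, in_high)
                     if e and ti == t and h)
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(expr, times, events, min_prop=0.1):
    """Scan cut-offs (high = expression > cut) and return (cut, statistic)
    for the largest log-rank statistic, keeping both groups above min_prop."""
    n = len(expr)
    min_group = max(1, math.ceil(min_prop * n))
    best = (None, -1.0)
    for cut in sorted(set(expr)):
        in_high = [x > cut for x in expr]
        n_high = sum(in_high)
        if n_high < min_group or n - n_high < min_group:
            continue
        stat = logrank_chi2(times, events, in_high)
        if stat > best[1]:
            best = (cut, stat)
    return best
```

Here `events` holds 1 for an observed event and 0 for a censored observation, and the returned cut-off would define the high and low expression groups for one gene.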