Introduction

Inflammatory bowel disease (IBD) is a chronic inflammatory condition that affects more than 3.5 million people worldwide. Ulcerative colitis (UC), one of the major forms of IBD, causes inflammation of the mucosal lining of the colon and rectum1. The disease usually follows a chronic course with remissions and exacerbations2,3. Patients with UC are susceptible to colorectal cancer4, which develops earlier than sporadic colorectal cancer5. Although microorganisms, genetic factors, environmental factors, and immune factors are known to contribute to the disease, the exact pathogenesis of UC remains unresolved6. The most widely accepted explanation of UC aetiology is loss of tolerance to commensal bacteria, an important pathogenic mechanism involving the innate immune response7.

Although multiple imaging techniques can detect UC with high accuracy, their requirement for experienced personnel and sophisticated instruments, together with their high cost, restricts their application8,9. Hence, non-invasive biomarkers are urgently required for early detection and for monitoring disease progression and therapeutic response. Several inflammatory markers, such as C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), leucine-rich alpha-2 glycoprotein (LRG), and faecal calprotectin (FCP), are routinely used in the laboratory for IBD diagnosis10,11. However, CRP and ESR are non-specific and have low sensitivity12, while LRG has not been widely tested. In this regard, the potential roles of anti-microbial peptides (AMPs), e.g. defensins (DEFA5, DEFA6), hepcidin, cathelicidins, lactoferrin, elafin, galectin 1, galectin 3, and FCP, in the pathogenesis of IBD have been studied13,14,15,16,17,18,19. Among these, FCP is the most widely used faecal marker for IBD, although it is also altered in other colonic and intestinal diseases20. As clinical symptoms alone are insufficient to determine the extent of the disease, transcriptomic and proteomic analyses have gained importance because they provide more informative and disease-specific data. However, differences in platforms and study populations, together with limited sample sizes, ultimately make the resulting data difficult to compare.
In this context, a meta-analysis of publicly available UC transcriptomic datasets identified six hub genes for UC: lipocalin 2 (LCN2), C-X-C motif chemokine ligand 1 (CXCL1), matrix metalloproteinase 3 (MMP3), indoleamine 2,3-dioxygenase 1 (IDO1), matrix metalloproteinase 1 (MMP1), and S100 calcium binding protein A8 (S100A8)21. Machine learning tools have also been applied to 10 UC datasets available in the Gene Expression Omnibus (GEO) to identify potential therapeutic targets, suggesting that olfactomedin 4 (OLFM4) and complement component 4 binding protein beta (C4BPβ) may help to identify UC patients22. The latest clinical practice guideline suggests that a combination of markers, such as FCP > 150 µg/g, elevated faecal lactoferrin, and CRP, should be considered for treatment decisions instead of colonoscopy23. However, colonoscopy is still recommended for patients in remission with elevated biomarkers and for patients with active disease but normal biomarkers.

Thus, there is still no single cost-effective, non-invasive biomarker for patients with UC. Although FCP is widely used, its high cost and limited specificity restrict its routine clinical utility. Hence, there is an urgent need for affordable, accurate, non-invasive, and disease-specific biomarkers to improve the diagnosis and monitoring of patients with active UC and UC patients in remission.

Methods

Ethical permission

The study was approved by the ethical committee of the Institute of Post Graduate Medical Education and Research (IPGME&R), Kolkata, India [IPGME&R/IEC/2019/529]. Written informed consent was obtained from each participant or a family member.

Inclusion of human subjects

UC patients aged 18–65 years attending the IBD clinic of the Gastroenterology Department, School of Digestive and Liver Diseases, IPGME&R, for evaluation of the disease were enrolled in the study. Active UC patients (n = 39) were diagnosed on the basis of the European Crohn’s and Colitis Organisation (ECCO) guidelines, i.e., bloody diarrhea with stool frequency ≥ 6/day, together with pulse rate > 90/minute, temperature > 37.8 °C, hemoglobin < 10.5 g/dl, ESR > 30 mm/hour, or CRP > 30 mg/L, with verification by colonoscopy and colonic histopathology. Disease activity and chronicity were determined by the presence of neutrophil infiltration, crypt abscesses, crypt architectural distortion, and related features. Patients who had recovered, with stool frequency ≤ 3/day and no blood in stool or urgency, were considered to be UC in remission (n = 25). Age- and sex-matched controls (n = 10) were recruited from IBS patients attending the outpatient department (OPD) of the IBS Clinic, IPGME&R, and diagnosed according to the Rome IV criteria. In addition, 21 age- and sex-matched healthy controls (HC) were enrolled for ELISA. Crohn’s disease (CD) patients (n = 10) were included as an additional inflammatory disease group. The flow diagram of the study is presented in Fig. 1.

Fig. 1

Flow diagram of sample inclusion in the study.

Exclusion of subjects

Patients aged < 18 or > 65 years, and those with other pre-existing GI diseases or chronic medical illnesses such as chronic kidney disease or diabetes mellitus, were excluded.

Biopsy tissue and blood collection

Colonic biopsy tissues from IBS, active UC, UC in remission, and CD patients were washed with sterile phosphate-buffered saline (PBS) and placed in RNAlater and in 10% formalin immediately after biopsy. Tissue in RNAlater was kept overnight at 4 °C and then stored at -80 °C until use. Blood was collected from all subjects in clot vials, and serum was separated by centrifugation at 5000 rpm for 15 min and stored at -20 °C in small aliquots.

Total RNA isolation and microarray analysis

Total RNA was isolated from 0.5 mg of colonic tissue using Trizol (Ambion) following the manufacturer’s protocol. In brief, the colonic tissue was homogenized in 500 µl of Trizol, 100 µl of chloroform (1/5 of the Trizol volume) was added, and the mixture was centrifuged at 13,000 rpm at 4 °C. RNA was precipitated from the supernatant by adding 250 µl of isopropanol (1/2 of the Trizol volume), pelleted by centrifugation for 15 min at 13,000 rpm and 4 °C, washed with 70% ethanol, air-dried, and dissolved in RNase-free water. RNA integrity number (RIN) was assessed using a Bioanalyzer (Agilent), and tissue samples from active UC (n = 4) and IBS-control (n = 3) patients with RIN > 7 were subjected to microarray analysis on the Illumina platform. Differentially expressed genes (DEGs) were identified using the “limma” package of R/Bioconductor24. The Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR). Genes with an adjusted p-value ≤ 0.1 and an absolute log2 fold change ≥ 1 were considered DEGs. Heatmaps of the DEGs were generated using Heatmapper.
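
The DEG filtering step can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes raw p-values and log2 fold changes have already been produced (in the study, by limma's moderated statistics, which are not reimplemented here), applies the Benjamini–Hochberg step-up adjustment, and then the thresholds stated above (adjusted p ≤ 0.1, |log2 fold change| ≥ 1). The function names and gene labels are hypothetical.

```python
from typing import Dict, List, Optional


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending raw p-values
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest raw p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes: List[str], log2fc: List[float], pvalues: List[float],
              fdr: float = 0.1, min_abs_lfc: float = 1.0) -> Dict[str, Optional[str]]:
    """Label each gene 'up', 'down' or None from BH-adjusted p and |log2FC|."""
    padj = benjamini_hochberg(pvalues)
    calls: Dict[str, Optional[str]] = {}
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q <= fdr and lfc >= min_abs_lfc:
            calls[gene] = "up"
        elif q <= fdr and lfc <= -min_abs_lfc:
            calls[gene] = "down"
        else:
            calls[gene] = None
    return calls


# Hypothetical inputs (not study data): gene labels, log2 fold changes, raw p-values.
print(call_degs(["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
                [4.2, 3.1, -1.5, 0.2],
                [0.0001, 0.001, 0.02, 0.5]))
```

In R, `p.adjust(p, method = "BH")` performs the same adjustment, and limma's `topTable` reports it in the `adj.P.Val` column.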

Pathway analysis

Gene Ontology (GO) enrichment analysis was performed with the DEGs to identify deregulated biological processes (BP). The BP category was analysed using the “clusterProfiler” package25 in R (version 4.0.3), with a significance threshold of p < 0.05.
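
Over-representation analysis of GO terms, as performed by tools such as clusterProfiler, is typically based on a one-sided hypergeometric test: given N annotated background genes, of which K belong to a term, how surprising is it that k of the n DEGs fall in that term? The sketch below is a minimal stdlib Python illustration under that assumption; the function name and the example counts are hypothetical, and the multiple-testing adjustment that is normally applied across all tested terms is omitted.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: n DEGs drawn from N annotated background genes,
    K of which belong to the GO term; k is the observed DEG overlap.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(n, K)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 108 DEGs, 8 of them in a term that covers
# 60 of 15,000 annotated background genes.
print(hypergeom_enrichment_p(k=8, n=108, K=60, N=15000))
```

Exact integer binomial coefficients (`math.comb`) keep the tail sum free of rounding until the final division.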

cDNA synthesis and quantitative real time polymerase chain reaction (qRT-PCR)

Total RNA (2.5 µg) was reverse-transcribed into cDNA using RevertAid Reverse Transcriptase (Thermo Scientific) following the manufacturer’s protocol. The cDNA was diluted 1:30 and subjected to real-time PCR using SYBR Green PCR master mix (Thermo Scientific) and gene-specific primers (Table 1) on an ABI QuantStudio 7 Flex Real-Time PCR System. Relative expression values, calculated as $2^{-(\Delta C_{t,\mathrm{sample}} - \Delta C_{t,\mathrm{control}})} \times 10^{6}$, were plotted. Each reaction was performed in triplicate, and each experiment was repeated three times.
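
The relative expression calculation can be illustrated with a short Python sketch. It assumes, as is conventional but not stated in the Methods, that each ΔCt is the mean Ct of the target gene minus the mean Ct of a reference (housekeeping) gene over the triplicate wells; the reference gene, the function names, and the Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(ct_target: Sequence[float], ct_reference: Sequence[float]) -> float:
    """ΔCt = mean Ct of the target gene minus mean Ct of the reference gene.

    Averages technical replicates (e.g. triplicate wells). The reference
    gene is an assumption; the Methods do not name it.
    """
    return mean(ct_target) - mean(ct_reference)


def relative_expression(dct_sample: float, dct_control: float,
                        scale: float = 1e6) -> float:
    """2^-(ΔCt_sample - ΔCt_control) x scale, the expression given in the Methods."""
    return 2.0 ** (-(dct_sample - dct_control)) * scale


# Hypothetical Ct values (not study data).
d_sample = delta_ct([24.0, 24.0, 24.0], [18.0, 18.0, 18.0])   # ΔCt = 6.0
d_control = delta_ct([27.0, 27.0, 27.0], [18.0, 18.0, 18.0])  # ΔCt = 9.0
print(relative_expression(d_sample, d_control))  # 2^3 x 10^6 = 8000000.0
```

A ΔCt three cycles lower than the control corresponds to an eight-fold (2³) higher relative expression, before the constant 10⁶ scaling.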

Table 1 Sequences of Primers.

ELISA assay

Serum DEFB4A/hBD2 levels in active UC patients, UC patients in remission, and HC were determined using an ELISA kit from Elabscience following the manufacturer’s protocol. Serum from CD patients was included as a negative control.

Statistical analysis

Statistical analysis was performed using GraphPad Prism 8.0.1 software (La Jolla, CA, USA). Two-group comparisons were performed using the Mann–Whitney U test and the chi-square test. qRT-PCR data are presented as mean ± standard deviation (SD). Receiver operating characteristic (ROC) analysis, including the area under the ROC curve (AUROC), was performed with R packages. A p value ≤ 0.05 was considered statistically significant.
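
The Mann–Whitney U statistic is also directly linked to the AUROC: U for the case group divided by the product of the two group sizes equals the AUC (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half). The sketch below is a minimal Python illustration, not the GraphPad Prism implementation; it computes U with average ranks for ties and a two-sided p-value from the tie-corrected normal approximation. Prism may report exact p-values for small groups, so values can differ. Function names and data are hypothetical, and both groups are assumed to be non-empty.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(cases: Sequence[float],
                 controls: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U, two-sided normal-approximation p, AUC = U / (n1 * n2))."""
    n1, n2 = len(cases), len(controls)
    pooled = list(cases) + list(controls)
    ranks = average_ranks(pooled)
    # U counts (case, control) pairs with case > control; ties count 1/2.
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u, 1.0, u / (n1 * n2)
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail probability of the standard normal
    return u, p, u / (n1 * n2)


# Hypothetical serum values in pg/ml (not study data).
u, p, auc = mann_whitney([250.0, 310.0, 205.0, 400.0], [120.0, 230.0, 150.0])
print(u, round(p, 3), auc)
```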

Results

Clinical, biochemical, and demographic profiles of individuals included in the study

The clinical, biochemical, and demographic characteristics of the subjects included in the cohort are presented in Table 2. Comparison of the clinical and biochemical parameters of active UC patients (n = 39), HC (n = 21), and the IBS-control group (n = 10) showed that fever, abdominal pain, weight loss, stool frequency, blood in stool, hemoglobin, and ESR were associated more with active UC than with HC and IBS-controls (p ≤ 0.05). UC patients in remission (n = 25) and CD patients (n = 10) showed similar results when compared with active UC patients and HC, respectively.
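
For a categorical parameter compared between two groups (for example, presence or absence of fever), the chi-square test named in the Statistical analysis section operates on a 2 × 2 table. The sketch below is an illustrative Pearson chi-square without Yates' continuity correction; the paper does not state whether a correction was applied, and with small expected counts Fisher's exact test is often preferred. The function name is hypothetical, and the counts in the usage line are hypothetical, not taken from Table 2.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                 feature present   feature absent
        group 1        a                 b
        group 2        c                 d

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 df, chi2 = z^2, so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: a symptom present in 20 of 25 vs 5 of 25 subjects.
print(chi_square_2x2(20, 5, 5, 20))
```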

Table 2 Clinical and biochemical parameters of the subjects included in the study.

Microarray analysis and qRT-PCR validation to identify DEGs in active UC patients compared to IBS-controls

Colonic biopsy tissues were stained with haematoxylin and eosin (H&E) to verify each sample before gene expression analysis (Fig. 2a). To identify DEGs in UC, colonic tissues from active UC patients (n = 4) were analysed on the Illumina HT platform and compared with IBS-control samples (n = 3). After normalization, principal component analysis (PCA) showed that UC and IBS-control samples clustered separately (Fig. 2b). DEG analysis identified 94 significantly up-regulated genes (log fold change > 1.0, padj < 0.05) and 14 down-regulated genes (log fold change < -1.0, padj < 0.05), shown in the volcano plot (Fig. 2c); the heatmap shows the top deregulated genes (Fig. 2d).

Fig. 2

(a) Haematoxylin and eosin staining of colonic biopsy tissue from IBS-control and active ulcerative colitis (UC) patients. (b) Principal component analysis (PCA) showing the relatedness between samples. (c) Volcano plot showing the statistical significance and magnitude of expression changes. (d) Unsupervised hierarchical clustering heatmap of deregulated genes showing expression variation among IBS-control and active UC patients; red to green represents low to high expression. P < 0.05 was considered significant.

Pathway analysis with DEGs to select the top deregulated pathways in UC patients

The genes altered in the microarray analysis were subjected to pathway analysis, which revealed that the anti-microbial humoral response mediated by anti-microbial peptides (AMPs), the anti-microbial immune response, and the immune response to bacterial lipopolysaccharide (LPS) were among the most altered pathways (log2 fold change > 4, p < 0.001) in active UC patients compared with IBS-controls (Fig. 3a). Selected candidate genes from the top three pathways, namely IL6, TLR4, STAT3, and IFN-γ from the immune response pathways and REG3A, S100A8, and DEFB4A/hBD2 from the anti-microbial immune response pathways, were validated by qRT-PCR (Fig. 3b, c). All of these genes except TLR4 and S100A8 were overexpressed in colonic tissue of active UC patients compared with IBS-controls; thus, the qRT-PCR data were largely consistent with the microarray data.

Fig. 3

(a) Pathway analysis of deregulated genes in active UC patients vs. the IBS-control group using KEGG pathway analysis. qRT-PCR validation of genes from the top two altered pathways: (b) immune response pathways (IL6, TLR4, STAT3, and IFN-γ) and (c) anti-microbial immune response pathways (REG3A, S100A8, and hBD2). *, p < 0.05; **, p < 0.01; ns, not significant.

Of the three genes validated from the anti-microbial immune response pathways, DEFB4A/hBD2 showed the most significant alteration in active UC patients. Its overexpression was therefore also verified by immunohistochemistry with an anti-DEFB4A/hBD2 antibody in colonic tissue from active UC patients compared with IBS-controls (Fig. 4a). In addition, treatment of the colon cancer cell line SW480 with LPS increased DEFB4A/hBD2 expression compared with untreated cells (Fig. 4b).

Fig. 4

Validation of DEFB4A/hBD2 expression in colonic tissue of active UC patients and IBS-controls by (a) immunostaining with an anti-DEFB4A/hBD2 antibody and (c) qRT-PCR. DEFB4A/hBD2 levels determined by (b) qRT-PCR in the colon cancer cell line SW480 after LPS treatment vs. untreated cells, and (d) ELISA of serum from healthy controls, active UC, UC in remission, and Crohn’s disease (CD) patients. (e, f) Area under the receiver operating characteristic curve (AUROC) analysis evaluating DEFB4A/hBD2 as a biomarker for (e) active UC and (f) UC in remission. **, ***, and **** indicate p < 0.01, 0.001, and 0.0001, respectively; ns, not significant.

Verification of serum DEFB4A/hBD2 levels to assess its potential as a diagnostic biomarker for UC

As DEFB4A/hBD2 was the most altered AMP in active UC patients, it was further explored as a potential biomarker for the diagnosis of UC. Its expression was verified by qRT-PCR in additional colonic tissue samples from active UC patients and UC patients in remission and compared with IBS-controls. CD patients were included as another inflammatory disease group to assess the specificity of DEFB4A/hBD2. DEFB4A/hBD2 expression was higher in active UC patients than in IBS-controls and decreased significantly after remission (Fig. 4c). No significant difference in DEFB4A/hBD2 levels was observed between CD patients and IBS-controls.

Serum DEFB4A/hBD2 levels were significantly higher in active UC patients than in HC and were lower in UC patients in remission, whereas no significant change was observed in the serum of CD patients compared with HC (Fig. 4d).

ROC analysis was performed to assess the potential of serum DEFB4A/hBD2 as a biomarker for distinguishing active UC patients from HC. The area under the curve (AUC) was 0.95, and the predictive cut-off value of > 220.74 pg/ml gave a sensitivity of 89% and a specificity of 95%. DEFB4A/hBD2 differentiated active UC patients from HC with an accuracy of 95% (95% confidence interval (CI), 0.88–0.98), a positive predictive value (PPV) of 97%, and a negative predictive value (NPV) of 83% (Fig. 4e). At the same cut-off value, the comparison of UC patients in remission with active UC patients gave an AUC of 0.90 (95% CI, 0.81–0.97), a sensitivity of 90%, a specificity of 77%, a PPV of 86%, and an NPV of 81% (Fig. 4f).
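
Sensitivity, specificity, PPV, and NPV at a fixed cut-off all derive from a 2 × 2 confusion matrix. The sketch below is a minimal Python illustration, assuming that a serum value strictly above the cut-off is called positive (matching "> 220.74 pg/ml"); the class and function names are hypothetical, the serum values in the usage line are hypothetical rather than study data, and the study's underlying confusion-matrix counts are not reported.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Diagnostics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def diagnostics_at_cutoff(cases: Sequence[float], controls: Sequence[float],
                          cutoff: float) -> Diagnostics:
    """Call a sample positive when its value is strictly above the cut-off."""
    tp = sum(1 for v in cases if v > cutoff)     # cases correctly called positive
    fn = len(cases) - tp                         # cases missed
    fp = sum(1 for v in controls if v > cutoff)  # controls wrongly called positive
    tn = len(controls) - fp                      # controls correctly called negative
    return Diagnostics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),  # raises ZeroDivisionError if nothing is called positive
        npv=tn / (tn + fn),
    )


# Hypothetical serum values in pg/ml (not study data).
print(diagnostics_at_cutoff([250.0, 310.0, 205.0, 400.0],
                            [120.0, 230.0, 150.0], 220.74))
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases in the tested group, so values obtained in a case-control comparison may not carry over to a clinical population with a different prevalence.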

Overall, these data suggest that serum DEFB4A/hBD2 may serve as a non-invasive biomarker for differentiating active UC patients from HC. It may also help to distinguish UC patients in remission from those with active UC.

Discussion

In this study, we identified a secretory protein, DEFB4A/hBD2, and validated its elevated level in the blood of active UC patients compared with HC. To identify tissue-specific genes for classifying this patient group, microarray analysis was performed on colonic tissue from active UC and IBS-control patients using the Illumina platform. Pathway analysis of the significantly upregulated genes identified the AMP-mediated anti-microbial immune response as the top deregulated pathway, and qRT-PCR validation showed that DEFB4A/hBD2 had the most significant alteration, consistent with the microarray results. We therefore measured this protein in the blood of active UC patients and UC patients in remission, and the ELISA data suggest that it can distinguish the two groups. The marker classified active UC patients from HC with high accuracy: at a cut-off of > 220.74 pg/ml, it achieved 89% sensitivity, 95% specificity, a PPV of 97%, and an NPV of 83%. At the same cut-off value, it also distinguished UC patients in remission from active UC patients with 90% sensitivity, 77% specificity, a PPV of 86%, and an NPV of 81%.

Despite the availability of transcriptomic, proteomic, and metabolomic data from IBD patients, clinicians still rely on CRP and FCP levels in routine clinical practice, although CRP is non-specific and FCP has major limitations26. FCP levels vary over a few days27, and diet and exercise have a profound impact on them, although FCP can differentiate inflammatory from non-inflammatory gastrointestinal diseases28. However, elevated FCP levels can be seen in various inflammatory diseases28. FCP may also be increased in non-intestinal inflammatory diseases associated with altered microbiota, such as decompensated liver cirrhosis29 and pneumonia30. Proton pump inhibitors31, glucocorticoids32, and non-steroidal anti-inflammatory drugs (NSAIDs)33 also increase FCP levels. Thus, colonoscopic confirmation is required in addition to FCP before therapy.

FCP is a complex of two distinct proteins, S100A8 and S100A9: monomeric S100A8 forms a calcium-dependent heterodimer with S100A9 to form FCP34. Microarray analysis of colonic tissues revealed significantly higher S100A8 expression in active UC than in IBS-controls; however, this difference was not significant in the qRT-PCR data from colonic tissue of active UC and IBS-control patients. Notably, the expression pattern of S100A8 in the IBS-control group was unusual, with two distinct subgroups of patients: one with high S100A8 levels (60%) and another with low levels (40%). In addition, FCP testing is more expensive than DEFB4A/hBD2 estimation. Since DEFB4A/hBD2 showed a more significant alteration in expression than S100A8 in our UC cohort, we measured serum DEFB4A/hBD2 and observed significantly higher levels in active UC patients, which decreased during remission. A longitudinal study is required to track DEFB4A/hBD2 levels in UC patients from diagnosis throughout their clinical course.

Although this study identified a non-invasive marker for monitoring the disease course of UC patients, larger sample sizes in all groups would allow the results to be interpreted more robustly. In addition, molecular analyses are needed to understand the impact of increased DEFB4A/hBD2 on disease progression.