Introduction

Globally, colorectal cancer (CRC) ranks second in cancer related mortality and third in incidence1. By 20,230, the global burden of CRC is expected to increase by 60%in terms of new cases and deaths2. Traditional treatments such as radiotherapy and chemotherapy are often associated with significant adverse effects3,4. Although, the 5-year survival rate for CRC is approximately 64%, it drops drastically to 12% in metastatic cases5. In Egypt, CRC is the 7th most common cancer in Egypt according to the Global Cancer Observatory6. Notably, Egyptian patients are typically diagnosed at late stages and face poor prognosis7. This underscores the urgent need for reliable and accurate prognostic molecular biomarkers.

One of the key emerging hallmarks of cancer is epigenetic dysregulation,, which is often influenced by environmental factors, inflammation, and cellular stress8,9,10. Among (epi) genetic mechanisms, non-coding RNAs (ncRNAs) have been identified as important molecular biomarkers implicated in various diseases, including multiple types of cancer11,12,13,14,15.

Long non-coding RNAs (lncRNAs), which are transcripts longer than 200 nucleotides that do not encode known proteins, play a significant role in tumorigenesis by regulating gene expression, protein synthesis, and epigenetic modifications16,17. Recent studies revealed that several lncRNAs were aberrantly expressed in CRC patients and were considered to be an indicator of a poor prognosis including lncRNA nicotinamide nucleotide transhydrogenase antisense RNA 1; NNT-AS118, nicotinamide phosphoribosyltransferase antisense RNA 1; NAMPT-AS119.

Long intergenic noncoding RNA 00,511 (LINC00511) known as onco-lncRNA-12 is a 2265 bp ncRNA that maps to chromosome 17q24.3. LINC00511has been shown to exert oncogenic functions in glioma20, lung cancer21, cervical cancer22, gastric cancer23, and CRC24,25. Mechanistically, LINC00511 may function as competitive endogenous RNA (ceRNA), contributing to the induction and progression of various cancers20,22,23 including breast cancer26,27. We selected LINC00511 as the focus of our study based on prior filtration and discussion processes. While genetic variations in LINC00511 have been recently detected in Chinese breast cancer patients28. The role of LINC00511 single nucleotide polymorphisms (SNPs) in CRC remains unexplored. Specifically, no studies to date have examined the association between LINC00511 SNPs and CRC susceptibility or related risk factors.

Therefore, to better understand the association between LINC00511 SNPs and CRC pathogenesis and severity, this study aims to quantify relevant epigenetic variations. This study aims to examine whether the genotype distribution of the tested LINC00511 SNP alleles is associated with CRC risk and/or patient outcomes by comparing colorectal cancer patients with cancer-free controls. Genetic and epigenetic variations often co-occur in specific combinations of SNPs known as haplotypes, which, when inherited together, can influence the expression of both coding and non-coding genes, including lncRNAs29,30. These expression changes may, in turn, impact the progression and clinical course of CRC—a hypothesis that this study seeks to explore.

Objectives

The objectives of this study are fourfold. First, to evaluate whether LINC00511 SNP(s) variants influence CRC susceptibility by conducting a case-controlled study involving Egyptian CRC patients and cancer-free controls. This analysis will also explore whether these SNPs are associated with clinicopathological characteristics or tumor subtypes, providing a basis for precision diagnosis. Second, a logistic regression analysis will be utilized to assess the associations between LINC00511 SNPs (rs17780195 or rs9906859, and rs1558535) and CRC susceptibility or CRC risk while adjusting for potential confounders such as age, BMI, and family history.

Third, the study will investigate whether these SNPs can predict CRC prognosis by calculating odds ratios (ORs) and 95% confidence intervals (CIs) under established genetic models.

Fourth, in silico analyses using haplotype and survival association databases will be conducted to evaluate the potential role of LINC00511 SNPs in CRC outcomes. Collectively, the findings from this study are expected to inform future precision oncology strategies, including the development of targeted therapies against downstream signaling pathways regulated by LINC00511, thereby enhancing both diagnostic and therapeutic precision in CRC management.

Subjects

Sample size and power study

Based on the previous study by Chong et al.28, the sample size for the current study was calculated using the reported prevalence of the GG genotype of the rs17780195 SNP among breast cancer patients and healthy controls. The calculation was performed based on comparing genotype prevalence, the odds ratio (OR), between CRC cases and cancer-free controls using a Fisher’s Exact test, with a two-sided α-error level set at 0.05, a power of 80%, and a case-to-control ratio of 1:1. According to the referenced study, the prevalence of the GG genotype among healthy individuals was approximately 64.9%, and the odds ratio (OR) for the association with breast cancer was approximately 0.398. Using these parameters, including a two-sided 95% confidence level and a case-to-control ratio of 1:1, the minimum required sample size was estimated to be 85 participants per group (i.e., 85 cases and 85 controls) to achieve adequate statistical power (80%) to detect a significant association, assuming the null hypothesis that the genotype distribution is equal between groups. The Type I error probability (α) for this hypothesis test was set at 0.05. Sample size estimation was performed using the Power and Sample Size online calculator for genetic association studies (G*Power), accessed in October 2021.https://csg.sph.umich.edu/abecasis/gas_power_calculator/31.

Ethical Approval and Consent to Participate

All clinical and pathological data were collected using structured questionnaires. The study was approved by the Research Ethical Committee of the Faculty of Pharmacy, Ain Shams University (Approval No: 157, Date: 17 January 2023). The study was carried out in adherence to the Declaration of Helsinki Guidelines (World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects, 2013)32. Written informed consent was obtained from all participants—both CRC patients and healthy controls—after they were fully informed about the study objectives and procedures.

Study design

Case-controlled, single-center, retrospective study.

Study clinical trial registration

ClinicalTrials.gov Identifier: NCT06534242.

Study participants

A total of 200 CRC patients were recruited from Mansoura University Hospitals, Mansoura, Egypt. For the control group, 200 age- and sex-matched apparently healthy individuals were randomly selected using frequency matching (±2 years). Control subjects had no history of chronic illness, were not on any medications, and had normal kidney and liver function tests. Additionally, they showed no clinical or laboratory evidence of CRC. Controls were recruited during routine health examinations—either for themselves or while accompanying relatives—or through the Chronic Diseases Screening National Presidential Program.

Patients’ criteria

Inclusion criteria required participants in the study to be adult with newly diagnosed CRC confirmed by pathological examination. Diagnosis was based on clinical presentation—including symptoms such as constipation, rectal bleeding, and significant weight loss—and confirmed through colonoscopy, abdominal imaging, and histopathological analysis. Only patients with CRC of no specific histological subtype were included. Exclusion criteria included any history of blood disorders, Hepatitis B or C infection, HIV, schistosomiasis, thyroid dysfunction, alcohol intake, diabetes mellitus, cardiovascular disease, or other inflammatory conditions. Patients who had received chemotherapy or radiotherapy, had undergone gastrointestinal surgical operations, or were diagnosed with any cancer other than CRC were also excluded from the study.

CRC patients’ clinical and pathological features

For all CRC participants, full family history of cancer and records of any prior surgical procedures were collected. Clinical staging and histological grading were determined based on colonoscopy findings, abdominal radiographic imaging, pathological analysis, and clinical judgment, following the American Joint Committee on Cancer (AJCC) 2010 criteria33. Tumor staging was categorized according to the TNM classification system33,34, as follows: stage 0 (carcinoma in situ) refers to early lesions confined to the mucosa; stage I includes small, localized tumors considered early-stage and potentially curable; stage II encompasses larger primary tumors that may extend locally but without lymph node involvement; stage III indicates regional lymph node metastasis; and stage IV denotes the presence of distant metastasis Histological grading was classified as low-grade (grade I–II), representing well or moderately differentiated tumors, or high-grade (grade III–IV), which includes poorly differentiated, undifferentiated, adenocarcinoma, or mucinous carcinoma.. Additional clinical data such as age, weight, and height were retrieved from patient records and used to calculate Body Mass Index (BMI) using the NIH BMI calculator

https://www.nhlbi.nih.gov/health/educational/lose_wt/BMI/bmicalc.htm. Participants with a BMI ≥ 25 kg/m2 were considered overweight or obese. Laboratory findings, including complete blood count (CBC) and classical tumor markers such as carcinoembryonic antigen (CEA) and cancer antigen 19.9 (CA19.9), were also recorded from medical files and tabulated for analysis.

Methods

In silico analysis

In silico LINC00511 gene database(s) analysis

Gene card identification was used for the identification of LINC00511 gene and its mRNA expression in different human tissues using https://www.genecards.org/ databases (Accessed on April, 2022)35. LINC00511 gene expression was obtained from RNA-seq data unit TPM, from 53 human tissue samples from the Genotype-Tissue Expression (GTEx) Project from Expression Atlas Gene expression across species and biological conditions https://www.ebi.ac.uk/gxa/home36 (Accessed Jan. 3rd, 2022).

LINC00511 SNPs selection

Based on the previous study by Chong et al. which investigated several LINC00511 SNPs in breast cancer28, we screened for a SNP with a minor allele frequency (MAF) above 0.05 (≥ 5%). MAF data were obtained from the International Genome Sample Resource (IGSR) Supporting open human variation data https://www.internationalgenome.org/ in 1000 genomes data https://www.ensembl.org/ 1000GENOMES:phase 337. The information on the selected Reference SNP (rs) obtained from https://www.ncbi.nlm.nih.gov/snp/ and RefSNP report38.

In silico linc00511 snps haplotype association data analysis

. EnsembI release 108—Oct 2022 © EMBL-EBI EMBL’s European Bioinformatics Institute EMBL-EBI was used to access human genomic data and annotation via https://www.ensembl.org. Pairwise linkage disequilibrium (LD) metrics—including heatmap-based pairwise correlation coefficients (R2) and standardized linkage disequilibrium values (D′)—were obtained for the three tested LINC00511 SNP variants, The analysis was conducted using data from the 1000 Genomes Project Phase 3, specifically the YRI (Yoruba in Ibadan, Nigeria) population, to assess the genetic correlation and potential haplotype structure among the selected SNPs. Direct access to variant-level data and visualization tools was provided through the Ensembl variation browser at http://www.ensembl.org/Homo_sapiens/Info/Index?db=core;r=17:72618050-72638049;v=rs17780195;vdb=variation;vf=106739215

LINC00511 gene differential expression and snps survival platform analysis

Pan-Cancer Survival Analysis for LINC00511 gene was conducted across 32 different types of Cancers, comprising a total of 10,882 RNA-seq samples. Among these, expression data for colon adenocarcinoma were specifically examined. The RNA-seq expression data of colon adenocarcinoma were downloaded from The Cancer Genome Atlas Program; TCGA project https://cancergenome.nih.gov/ via Harmonized Cancer Datasets; Genomic Data Commons Data Portal https://portal.gdc.cancer.gov from the National Cancer Institute GDC Data Portal39,40. LINC00511 expression values were log-transformed using the formula log2(FPKM + 0.01) to normalize the data. For pan-cancer comparative analysis, expression profiles were retrieved from ENCORI – The Encyclopedia of RNA Interactomes https://starbase.sysu.edu.cn/panCancer.php, which integrates ~ 10,000 RNA-seq and ~ 9,900 miRNA-seq samples from the TCGA project41.. This analysis enabled cross-cancer comparisons and evaluation of LINC00511’s expression pattern and prognostic significance, including in colorectal cancer.

Blood samples

Four mls of peripheral blood were withdrawn from CRC patients (collected at the time of diagnosis for those who met the inclusion criteria, and signed the IC, and were rejected for those who met the exclusion criteria) and controls, under strict sterile conditions for molecular testing, on EDTA anticoagulant vacutainers and stored at −80º C, until DNA extraction and biochemical assessment at the Advanced Biochemistry Research Lab, Faculty of Pharmacy, Ain Shams University (Research Setting), Abassia, Cairo, Egypt.

DNA Extraction

DNA was extracted from 200 μL whole blood using a QIAmp DNA Blood Mini extraction kit (Qiagen, Valenica, CA) according to the manufacturer’s instructions.

DNA Quantification

The yield was quantified, and its purity was measured by Platinum-colored DS11 Spectrophotometer (DeNovix Inc, USA) and stored at −80°C, until assessment.

SNPs Genotyping

Was performed using predesigned TaqMan® SNP genotyping assays for rs17780195, rs9906859, and rs1558535 (catalog no: 4351379, Thermo Fisher Scientific, USA) and TaqMan genotyping Master Mix (Catalog no: 4371353, Thermo Fisher Scientific, USA). For each sample, 20 ng of DNA template genotyped using 10 μL (2×) TaqMan® genotyping Master Mix, 0.5 μL (40×) TaqMan® SNP genotyping assay, and continued by DNAse/RNAse-Free water (Gibco, Life Technologies, USA) to a total volume of 20 μL reaction using default settings for genotyping with the appropriate negative control. The real-time polymerase chain reaction (RT-PCR) was performed on StepOne Plus Thermal Cycler (Applied Biosystems, USA).

Data statistical-analysis

The statistical analysis was performed using R software (version 4.2.0). First, for each studied SNP, genotype frequencies were checked to be in concordance with the Hardy–Weinberg Equilibrium (HWE) using Pearson’s χ2 Chi-squared tests. Qualitative variables were expressed as number and percentage while quantitative data were expressed as median and interquartile range (IQR). Mann–Whitney’s W Test and the χ2 test were used to compare quantitative and qualitative variables between the CRC and control groups, respectively. After adjusting for demographic factors (age, sex, BMI, family history of cancer in first degree relatives), an unconditional logistic regression applied to explore the independent risk factors for CRC among the examined LINC00511 target SNPs using the co-dominant, dominant, over-dominant, and recessive genotypic models, then being compared using measures of model fit and prediction (the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Deviance Information Criterion (DIC), Pseudo R2 (McFadden’s, Cox and Snell’s, and Nagelkerke’s), and the area-under-receiver operating characteristics (AUC) curve. The additive/co-dominant model was superior to all models based on these criteria. Sensitivity (SN), specificity (SP), positive predictive value (PPV), and negative predicted value (NPV) were reported for the model with the best fit and predictive power (not for mere disease diagnosis). The optimal cut-offs for age, CA19-9, and CEA categorization of LINC00511 SNPs were determined using ROC analysis. SNPs genotypes were analyzed using the snpReady library in R42. Missing genotypes were imputed using Wright’s method (based on Wright’s equilibrium) and missing baseline demographic and clinical data were imputed using the predictive mean matching method implemented in R43,44.

Multiple logistic regression analysis was done to examine the association between each SNP alleles, genotypes, haplotypes, and CRC prevalence, stage, and grade, while adjusting for baseline covariates (age, BMI, and additional risk factors for tumor stage and tumor grade, family history, tumor site, history of inflammatory bowel disease (IBD)), classical tumor markers (CEA and CA19.9), and the presence or absence of vascular infiltration.

Firth’s logistic regression was implemented in the case of quasi-separation in one or more variables. Haploview software version 2.0 was used to calculate r2 and D’ as the measurements of linkage disequilibrium extent between pairwise SNP combinations blocks of different genotypes were determined using the SHEsis software http://analysis.bio-x.cn/myAnalysis.php45. A stratified analysis and the SHEsis plus online software http://shesisplus.bio-x.cn/SHEsis.html applied to further evaluate the association between LINC00511 SNPs and CRC susceptibility as well as haplotypes frequency as a measurement of genetic distribution was directly calculated in CRC and healthy control groups. Further, Epistasis was analyzed by Multifactor Dimensionality Reduction (MDR) package software in R https://ritchielab.org/software/mdr-download for carrying out SNP-SNP or gene–gene and SNP-Environment or gene-environment interaction analysis, applied to evaluate the interactive role of genetic and demographic factors (false Discovery rate was controlled by adjusting the significance level using Benjamini and Hochberg, Benjamini and Yekutieli, Holm step-down, Sidak step-down, and Sidak single-step p-value adjustment procedures). When comparing the predictive ability of the models using the pseudo-R-squared measures and the SN, SP, PPV, and NPV by ROC curve, in addition to the MDR part, aids for LINC00511 SNPs rs prediction. Prediction method LD values were calculated by a pairwise estimation between LINC00511 SNPs genotyped in the same sample and within a given window. An established method was used to estimate the maximum likelihood of the proportion that each possible haplotype contributed to the double heterozygote. To confirm differences between obtained results in the patient group were not by chance, a Bonferroni calculation was applied to the data46.

Finally, we analyzed the overall survival (OS) for 1–3 years in the CRC patients (n = 200) with recording the date of death or last contact with the Clinician as the follow-up end point. Median follow-up time for the patients was 1 year. For disease-free survival (DFS), as the event free survival (EFS), in patients with non-metastatic CRC at the time of diagnosis, date of relapse or last contact with the Clinician was the follow-up end point. Median follow-up time was 1 years. The non-parametric Kaplan–Meier method was done for OS curves and EFS. The survival probability calculator generates the Kaplan–Meier curve with 95% CI, using log-rank test using the Chi square distribution, for comparison of more than of two groups.

The relative risk of disease relapse was estimated as hazard ratio (HR). Univariable survival analyses were done for gender, MBI, tumor site, family history of cancer, lymph node involvement, TNM stage and tumor grades separately, for the CRC patients’ group as well as for LINC00511 SNPs genotypes. A two-sided value of P < 0.05 was deemed as a sign of statistical significance.

Results

LINC00511 gene In silico databases analysis

We used the HUGO Gene Nomenclature Committee (HGNC), supported by the National Human Genome Research Institute (NHGRI) https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:43564, to provide the Symbol report for LINC00511. LINC00511 is an oncogene lncRNA located on chromosome 17, its cytogenetic band is 17q24.3 identified by HGNC and Entrez Gene or Ensembl algorithms. LINC00511 gene card (GC) id is GC17M072323 and its length is 2,265 bp being made up of 5 exons https://www.genecards.org/cgi-bin/carddisp.pl?gene=LINC00511&keywords=LINC00511 (Accessed on April, 2022) (Fig. 1A). Using gene cards algorithm RNAseq https://www.genecards.org/cgi-bin/carddisp.pl?gene=LINC00511#function, LINC00511 mRNA expression was obvious in sigmoid and transverse colon (Accessed on April, 2022) (Fig. 1B). LINC00511 differential gene expression box plot (Fig. 1C) from the ENCORI project in 471 colon adenocarcinoma patient’s vs 41 normal samples. https://rnasysu.com/encori/panGeneDiffExp.php#COAD Pan-Cancer Survival Analysis of Genes across 32 types of Cancers across 447 samples and HR 0.87 was non-significant (p = 0.48) for LINC00511 gene. https://rnasysu.com/encori/panGeneSurvivalExp.php#COAD (Accessed on Jan. 3rd, 2023) (Figure 1D).

Figure. 1
figure 1

A) LINC00511 genomic location: bands according to Ensembl https://www.genecards.org/cgi-bin/carddisp.pl?gene=LINC00511 locations according to GeneLoc. Accessed on April, 2022 and B) LINC00511 gene mRNA expression in normal human tissues from RNAseq https://www.genecards.org/cgi-bi n/carddisp.pl?gene = LINC00511#function showing sigmoid and transverse colon (Accessed on April, 2022). Accessed Jan. 3rd, 2022. C) LINC00511 gene differential expression in 471 colon adenocarcinoma vs 41 normal samples https://rnasysu.com/encori/panGeneDiffExp.php#COAD Data source is ENCORI project. D) LINC00511 gene overall survival plot from Pan-Cancer Survival Analysis for Colon Adenocarcinoma by ENCORI_ The Encyclopedia of RNA Interactomes https://starbase.sysu.edu.cn/panGeneSurvivalExp.php# Accessed on Jan. 3rd, 2023).

Selected SNPs criteria

Based on the previous study by Chong et al. 2020 that investigated several LINC00511 SNPs, and following screening for SNPs with a MAF above 0.05, three SNPs located within the LINC00511 gene were selected for analysis: rs1558535 (MAF = 0.36), rs17780195 (MAF = 0.18), and rs9906859 (MAF = 0.47). These SNPs were chosen by our research group to evaluate their potential association with colorectal cancer (CRC). Detailed information about the selected SNPs is provided in Table 1. The rs of selected SNPs report obtained from https://www.ncbi.nlm.nih.gov/snp/ and RefSNP report. MAF obtained from the International Genome Sample Resource (IGSR) Supporting open human variation data https://www.internationalgenome.org/ in 1000 genomes data https://www.ensembl.org/ as reported previously. The pathogenic allele in the rs1558535 is T, and for rs17780195 is G, however, it is C for rs9906859.

Table 1 LINC00511 SNPs rs9906859, rs1558535, and rs17780195 allele, chromosome, functional consequence, and MAF info report.

For all the variant type is SNV, intron variants, being validated by frequency, by alfa, by cluster. Reference SNP (rs) Report https://www.ncbi.nlm.nih.gov/snp/ and Ensembl genome browser 110, [LINC00511: Long Intergenic Non-Coding RNA 00,511, MAF: Minor Allele Frequency, SNP: Single Nucleotide Polymorphism, SPD: Sequence Position Deletion Insertion.] Via the SNPinfo web server, LINC00511 SNP rs9906859 info in DNA sequence https://manticore.niehs.nih.gov/cgi-bin/snpinfo/snpinfo1.cgi#t168135875 neighbor rs is rs17780195.

In the current study, the three LINC00511 SNPs were in strong linkage disequilibrium (LD) and were moderately correlated as appears in Table 2.

Table 2 Linkage disequilibrium and correlation between LINC00511 studied SNPs.

LINC00511 SNPs Allele/Genotype frequencies

LINC00511 SNPs rs1558535, rs17780195, and rs9906859 were checked for agreeing with the HWE using Pearson’s χ 2 Chi-squared tests. All LINC00511 studied SNPs had call rates more than 0.95 and MAF greater than 0.05, in all the investigated groups. Imputed genotypes comprised 1.9% of all data. There was a significant deviation from the HWE in LINC00511 SNP rs1558535 in controls (χ2 = 10.5, P=0.001), and for LINC00511 SNP rs9906859 in controls (P < 0.001) as well as for all the study participants (n=400) (P=0.047) (Table 3).

Table 3 LINC00511 SNPs rs1558535, rs17780195, and rs9906859 frequencies attributes and HWE testing in the studied groups.

Subjects demographic basic data and snps results

The median age of the whole study group was 45 years, range (20—78 years). Patients were significantly older than the healthy controls with median age 53 vs. 38 years, (P < 0.001). A significantly higher proportion of CRC patients were overweight or obese (BMI ≥ 25 kg/m2) compared to healthy controls (79.0% vs. 64.5%, P < 0.01), as shown in Table 4. In addition, rs1558535 AT and variant was more frequent in CRC patients than controls (55.5% vs. 37.0%, P < 0.01). Also, rs9906859 TC variant was higher in CRC cases than controls (52.0% vs. 35.5%, P < 0.01). CRC patients carrying wild genotype of both rs1558535 and rs9906859 were much less than controls carrying such genotype (19.0% and 13.0% vs. 30.6% and 25.0%, respectively).

Table 4 Baseline age and BMI as well as LINC00511 SNPs rs1558535, rs17780195, rs9906859 genotypes in CRC patients (n = 200) vs control group (n = 200).

Association tests

Single locus (Alleles) association tests

Comparisons between alleles proportions in CRC patients vs. healthy controls showed no difference in the proportions of the pathogenic allele in rs1558535 (53% vs. 54%), rs17780195 (23% vs. 24%), and rs9906859 (61% vs. 57%) (Table 5). Similarly, logistic regression analysis, adjusted for age and BMI category, showedno association between the allele type and CRC in rs1558535 (OR: 1.05, 95% CI: 0.77—1.43), rs17780195 (OR: 0.88, 95% CI: 0.60—1.30), and rs9906859 (OR: 1.09, 95% CI: 0.81—1.49).

Table 5 Association between LINC00511 SNPs rs1558535, rs17780195, and rs9906859 allele frequencies (proportion) in CRC patients (n = 200) vs. controls (n = 200).

Genotype association tests

Concerning the tumor grade and stage association to LINC00511 SNPs presented in Tables 6 and 7, respectively. First, none of the studied SNPs correlated with tumor grade (Table 6). rs1558535 AT variant correlated with more advanced stages III or IV diseases compared to earlier stages I or II (67% vs. 41%, P < 0.001) and a nearly fourfold increase in the odds of presenting with CRC stages III or IV (adjusted OR: 3.99, 95% CI: 1.30—13.0, P = 0.022). The rs17780195 AG variant correlated with more advanced stages III or IV diseases compared to earlier stages I or II (46% vs. 30%, P < 0.045) and a nearly threefold increase in the odds of presenting with CRC stages III or IV (adjusted OR: 2.72, 95% CI: 1.28—6.02, P = 0.011). On the other hand, rs9906859 did not correlate with the tumor stage (Table 7). Interestingly, data shown in Table 6 and 7 revealed that gender, family history and IBD history are associated with high tumor grades with OR 3.04, 4.02 and 125.04, respectively, at P < 0.05. In addition, transverse tumor site, vascular infiltration, and patients having CEA > 4 ng/mL or CA 19–9 > 40 U/mL are associated with late CRC stages.

Table 6 Co-dominant Model for CRC group (n = 200) tumor grade, low or high, association with demographic characteristics (age, BMI, sex, and family cancer history) and tumor clinical features (tumor site and vascular infiltration, IBD, and blood TMs) with LINC00511 SNPs rs1558535, rs17780195, and rs9906859 genotypes.
Table 7 Co-dominant model table for CRC group (n = 200) tumor stages association with demographic characteristics (age, BMI, sex, and family cancer history) and tumor clinical features (tumor site and vascular infiltration, IBD, and blood TMs) with LINC00511 SNPs rs1558535, rs17780195, and rs9906859 genotypes.

P1, Pearson’s Chi-Squared Test. Odds ratios are obtained by Firth’s logistic regression analysis. Odds ratios are adjusted for age, BMI, sex, family history tumor site, vascular infiltration, IBD history, CEA, and CA19-9 at baseline. P2; P values for odds ratios and corrected for multiple testing using Benjamini and Hochberg’s method. Age, CEA, and CA19-9 optimal cut-offs for grade prediction were determined using ROC curve analysis.

[BMI: Body mass index, CA-19–9: Carbohydrate antigen 19–9, CEA: Carcinoembryonic antigen, IBD: Inflammatory bowel disease, LINC00511: Long intergenic non-coding RNA 50,011, NS: non-significant, OR: Odds ratio, CI: Confidence interval.]

P1, Pearson’s Chi-Squared Test. Odds ratios are obtained by Firth’s logistic regression analysis. Odds ratios are adjusted for age, BMI, sex, family history tumor site, vascular infiltration, IBD history, CEA, and CA19-9 at baseline. P2; P values for odds ratios and corrected for multiple testing using Benjamini and Hochberg’s method. Age, CEA, and CA19-9 optimal cut-offs for grade prediction were determined using ROC curve analysis.

[BMI: Body mass index, CA-19–9: Carbohydrate antigen 19–9, CEA: Carcinoembryonic antigen, IBD: Inflammatory bowel disease, LINC00511: Long intergenic non-coding RNA 50,011, NS: non-significant, OR: Odds ratio, CI: Confidence interval.]

To explore the independent risk factors for CRC among the examined LINC00511 SNPs, unconditional logistic regression was applied, compared model fit and prediction using the co-dominant genotypic model, WSD significantly better than the null model. The co-dominant, and over-dominant models showed comparable adequacy of fit (DIC: 446 vs. 449, AIC: 464 vs. 461, BIC: 500 vs. 485), and predictive ability demonstrated by comparable AUCs (0.786 vs 0.784) and pseudo R2 values (McFadden’s: 0.196 vs. 0.190, Cox and Snell’s: 0.238 vs. 0.232, Nagelkerke’s: 0.317 vs. 0.309) (Table 1S). In the co-dominant model, rs1558535 genotypes were independently correlated with CRC; The frequency of the wild-type AA variant was greater in controls compared to cases (30% vs. 20%, P = 0.020) and the frequency of the heterozygous AT variant was greater in cases than controls (56% vs. 39%, P < 0.001). However, when adjusted for age and BMI, the association between rs1558535 variants and CRC was insignificant. rs9906859 independently correlated with CRC; the frequency of the wild-type TT variant was greater in controls compared to cases (26% vs. 13%, P = 0.002) and the frequency of the heterozygous TC variant was greater in cases than controls (52% vs. 34%, P = 0.001). In addition, when adjusted for age and BMI, the heterozygous TC variant was associated with a more than threefold increase in the odds of CRC (OR: 3.54, 95% CI: 1.58–8.13, P = 0.002). Moreover, both dominant and over-dominant models showed association between rs9906859 and increased risk for CRC (OR: 3.04, 95% CI: 1.43–6.64, P = 0.004 and OR: 2.57, 95% CI: 1.49–4.52, P = 0.001, respectively). For rs17780195, there were no associations between any of its variants and CRC (P = 0.221) (Table 2S).

Haplotype analysis

A significant correlation between LINC00511 SNPs rs1558535, rs17780195, and rs9906859 haplotype and CRC was observed (P < 0.001). Post-hoc analysis revealed that the ‘TAC’ haplotype—comprising the mutant allele for rs1558535, wild-type for rs17780195, and mutant for rs9906859—was associated with a modestly increased risk of CRC (observed in 31.5% of cases vs. 24.0% of controls; P = 0.020). This haplotype conferred an approximately 1.5-fold increased risk of CRC (OR: 1.46; 95% CI: 1.07–1.99), and the association remained statistically significant after adjustment for false discovery rate.. On the other hand, the ‘TAT’ mutant-wild-wild haplotype conferred lower risk of CRC (Cases: 1.7%, Controls: 8%, P < 0.001) and fivefold lower odds of CRC (OR: 0.20, 95% CI: 0.09–0.47). Adjusting for false detection rate did not impact the association between the ‘TAT” haplotype and CRC (Table 8).

Table 8 Haplotype analysis of LINC00511 SNPs rs1558535, rs17780195, and rs9906859 in CRC (n = 200).

Gene interaction analysis (GIA)/Epistasis analysis based on Shannon’s entropy

To identify and visualize the specific multi-locus genotype combinations involving the 3 studied SNPs that are associated with either high or low-risk of CRC, Multi-Factor Dimensionality Reduction (MDR) interaction plot was used (Fig. 2). The MDR analysis showed that individuals who are all-heterozygous or all-homozygous for the alternate allele in all studied loci are at high risk of CRC. Finally, individuals who are heterozygote and homozygote or homozygote and heterozygote for rs1558535 and rs9906859, respectively (i.e. TT/AG/TC or AT/AG/CC) are at high risk of CRC as well, suggesting that the interaction between rs1558535 and rs9906859 is more predictive of CRC. Furthermore, individuals who are double heterozygotes for any combination of the two loci are also at high risk.

Fig. 2
figure 2

Multifactor dimensionality reduction (MDR) model fit with a three-way split for LINC00511 SNPs for CRC cases (n = 200) and controls (n = 200). Epistasis analysis is based on Shannon’s Entropy flowing chi square distribution. [SNP1 is rs1558535, SNP2 is rs17780195, and SNP3 is rs9906859. Genotypes are coded ‘0’ for the wild-type, ‘1’ for heterozygous, and ‘2’ for homozygous. Grey bars denote high-risk and the white bars for low-risk.].

Data revealed that all the three SNPs contribute to CRC high-risk prediction through complex three-way interaction and rs9906859, particularly its heterozygous genotype, appears to be the most broadly and consistently associated with CRC high-risk. In addition, the predictive power of rs1558535 and rs17780195 is evident when they are in specific combinations, indicating that these three SNPs are crucial for precise risk stratification. Therefore, W-test and post-hoc epistasis analysis using 2-way interactions were performed to provides a more detailed breakdown of pairwise SNP associations with CRC.

MDR analysis identified the three-way model (Table 9) of LINC00511 rs1558535, rs17780195, and rs9906859 as the best predictive model of disease status, which minimized balanced accuracy in 5 out of 5 cross-validation intervals and estimates a prediction accuracy of 64.9% and classification accuracy of 66.1%. Data shown in Table 8 revealed that rs1558535 and rs9906859 are independently contribute to CRC risk, confirming data illustrated in Table 4. In addition, there are highly significant two-way interactions between all pairs of SNPs, and crucially, a highly significant three-way interaction among all three SNPs. This indicates that the risk of CRC is influenced by complex epistatic relationships between rs1558535, rs17780195, and rs9906859.

Table 9 W-Test of LINC00511 SNPs rs1558535, rs17780195, and rs9906859 Gene Interaction Analysis in colorectal cancer (n = 200) group and control (n = 200) group.

The mosaic plots (Fig. 3) that represent the post-hoc epistasis analysis display the relationship between (rs1558535 and rs17780195) (Fig. 3A) and (rs1558535 and rs9906859) (Fig. 3B). It is clear that there is a stronger statistically significant association between the later pair of SNPs and CRC, manifested by extremely low P-value. Certain genotype combinations such as (rs1558535 AT, rs9906859 TT) is significantly enriched in CRC patients, indicating their link to increased risk of CRC.

Fig. 3
figure 3

Mosaic plot showing post-hoc epistasis analysis after multifactor dimensionality reduction (MDR) model fit with a two-way split for LINC00511 SNPs for CRC cases (n = 200) and controls (n = 200) (A) between rs1558535 and rs17780195 (B) between rs1558535 and rs9906859. [Pearson Residuals; blue shading (positive residuals) indicates a positive association or enrichment, red shading (negative residuals) indicates a negative association, gray/white shading (near zero residuals) indicates that the observed counts are close to what would be expected under independence].

Testing for linkage disequilibrium (LD) and pairwise correlation coefficient with haplotypes

CRC susceptibility would be influenced by the LD of neighbor genetic variations or linkage among other indirect SNPs, which may provide further insight into the connection of several non-random associated SNPs (haplotypes). A pair of alleles is said to be in linkage disequilibrium when they co-occur more frequently than would be expected based on their individual allelic frequencies. Linkage disequilibrium was assessed using the genotype data that had been gathered. Pairwise standardized linkage disequilibrium (D’) and haplotype analysis were calculated between each pair of LINC00511 polymorphism loci examined to evaluate the association between CRC susceptibility and LINC00511 SNPs rs17780195, rs9906859, and rs1558535.

Figure 4A shows a strong LD observed in the CRC group LINC00511 SNPs rs17780195 vs rs9906859 were highly linked according to the D’ (0.94), as well as the other two sets of pairwise loci LINC00511 SNPs rs9906859 vs rs1558535 D’ = 0.89, and finally, rs17780195 vs rs1558535 D’ = 0.73, exhibiting strong linkage as well. The most striking finding from comparing these two LD plots is LD between rs1558535 and rs9906859 in control group (Fig. 4B) that showed only moderate LD (D’ = 0.61) compared to their very strong LD (D’ = 0.89) in the CRC group. This suggests that a particular risk haplotype involving rs1558535 and rs9906859 might be more tightly linked and prevalent in individuals with CRC.

Fig. 4
figure 4

Haplotype block analysis between LINC00511 SNPs rs17780195, rs9906859, and rs1558535 A) calculated Pairwise standardized linkage disequilibrium D’ in pairs in CRC patients (n = 200) (left panel) and B) controls (n = 200) (right panel), C) calculated pairwise correlation coefficient R2 in pairs in CRC patients (n = 200) (left panel) and D) controls (n = 200) (right panel), [D’: Pairwise standardized linkage disequilibrium, R2: Correlation coefficient.].

Pairwise correlation coefficient (R2) analysis (Figs. 4C and 4D) gave the same results as LD but with lesser extent. Greater correlation was observed between LINC00511 SNPs rs9906859 vs rs1558535 in CRC patients (R2 = 0.57) versus 0.33 in control group. Again, this difference indicates possible formation of a"risk-associated haplotype” involving rs1558535 and rs9906859 in CRC patients than in healthy controls. This comes in according with data shown in Table 7 considering ‘TAC’ mutant-wild-mutant haplotype for LINC00511 SNPs rs1558535, rs17780195, and rs9906859 as the haplotype with highest risk for CRC.

Kaplan-meier survival analysis

Univariate analysis of the OS among the CRC patients (n = 200) without distant metastasis according to gender, tumor site, and late CRC stages are prognostic factors (Table 10).

Table 10 Univariate analysis of CRC patients’ (n = 200) survival and the known CRC prognostic clinicopathological factors.

Kaplan–Meier survival analysis among the CRC patients (n = 200) without distant metastasis according to LINC00511 SNPs genotypes rs1558535 A > T DFS (Fig. 5A) and OS (Fig. 5B) and rs17780195 A > G DFS (Fig. 5C) and OS (Fig. 5D). However, for rs9906859 T > C DFS and OS were non-significant with P = 0.17 and P = 0.38, respectively.

Fig. 5
figure 5

Kaplan–Meier survival analysis among CRC patients (n = 200) without distant metastasis according to LINC00511 SNPs genotypes A) DFS for rs1558535 A > T, B) OS for rs1558535 A > T, C) DFS for rs17780195 A > G and D) OS for rs17780195 A > G. [DFS, disease-free survival; OS, overall survival].

Discussion

CRC is one of the leading causes of cancer-related deaths worldwide1,2. There are more than approximately 1 million new cases of CRC globally1,5 with high mortality and poor prognosis. More research is required to identify potential cancer prognostic and predictive markers to mitigate mortality and improve prognosis47. Moreover, CRC is a multifactorial disease in which both genetic and epigenetic modification are highly implicated8,14.Genome association studies have highlighted the significant association between SNPs and different diseases48,49 including various cancer types30,50,51.

LncRNAs represent a new frontier in molecular biology, participating in the regulation of nearly every stage of gene expression16,52 and were found to be implicated in variety of cancers18,53. It is evident that only 7% of disease-associated SNPs are found within protein-coding regions while the rest 93% of SNPs are located within the non-coding regions54. In addition, molecular research has revealed significant links between CRC susceptibility and lncRNA SNPs55,56,57.

LINC00511 has been shown as oncogene linked to various cancers as breast and CRC20,25,26 and its genetic variants have been found previously to be associated with breast cancer in the Chinese population28. To the best of our knowledge, this is the first study to explore the association between LINC00511 SNPs and CRC risk, via elucidating LINC00511 SNPs (rs1558535, rs17780195 and rs9906859) variants in CRC patients’ liquid biopsy as well as to determine the implication of these SNPs prognostic performance in CRC Egyptian patients’ cohort.

This study revealed a prospective role of the 3 studied SNPs (rs1558535, rs17780195 and rs9906859) in mounting CRC risk. The results obtained revealed significant increased frequencies of heterozygous genotype of both rs1558535 A > T and rs9906859 T > C in CRC patients in comparison to controls. Also, carriers of the wild genotype of both rs1558535 and rs9906859 are much less in the patients’ group than the control subjects carrying such genotype. Unconditional logistic regression showed an independent correlation between rs1558535 and rs9906859 and CRC risk. The codominant regression model revealed that TC genotype of rs9906859 was associated with 3.54-fold increased risk in CRC, after adjustment for age and BMI. Dominant regression results depicted that rs9906859 TC + CC increased the risk of CRC by 3.04-fold. This suggests “the strong predictive role of these genetic variants and CRC”. The association between LINC00511 SNPs and the increased risk of CRC could be attributed to changes in the expression of the secondary structure of LINC00511.

Several studies have demonstrated that mutant variants can alter the secondary structure of lncRNAs, thereby affecting their interaction with microRNA (miRNA; miR) binding sites. LINC00511 could mediate tumorigenesis of CRC through sponging miRs such as miR-625-5p25 and miR-29c-3p58. Therefore, LINC00511 genetic variants could be implicated in CRC increased risk via interrupting the binding of LINC00511 with its target miRNAs and subsequently affecting the expression of downstream genes1.

In examining the association between tumor stage and grade with the three studied LINC00511 SNPs in the CRC patient cohort, none of the variants showed a significant correlation with tumor grade. However, a notable association was observed between certain SNP genotypes and advanced disease stages. Specifically, patients heterozygous for the mutant allele of rs1558535 were more frequently found in advanced CRC stages (III or IV) compared to early stages (I or II), with an odds ratio (OR) of 3.99, indicating a nearly fourfold increased likelihood of presenting with late-stage disease. Similarly, carriers of the rs17780195 AG genotype were significantly more likely to be diagnosed at stages III or IV, showing a 2.72-fold increase in the odds of advanced-stage CRC presentation compared to those with earlier-stage disease. These findings are clinically relevant, given that tumor stage is a well-established prognostic factor in CRC, with the 5-year survival rate dropping from approximately 64% in early-stage disease to just 12% in metastatic CRC1,2,5.. Therefore, the strong association observed between tumor stage and both rs1558535 and rs17780195 variants in our CRC patient cohort suggests that these genetic variants may serve as potential prognostic biomarkers, given their link to CRC progression. Furthermore, regression analysis identified several prognostic risk factors for CRC, including male gender, positive family history, and history of inflammatory bowel disease (IBD), all of which were significantly associated with higher tumor grade. Additionally, both tumor site and the presence of vascular infiltration were found to be significantly correlated with advanced CRC stages, further supporting their role in disease progression.

Haplotype analysis offers significant potential for disease gene mapping by leveraging the relationship between causative mutations and their ancestral haplotypes of origin. One of the most important findings in the current study is the detection of a significant correlation between LINC00511 SNPs haplotype and CRC. It was found that ‘T rs1558535 A rs17780195 Crs9906859’ ‘TAC’ as mutant-wild-mutant haplotype is associated with 1.5-fold CRC increased risk (OR: 1.46, 95% CI: 1.07–1.99). While ‘TAT’ haplotype conferred a fivefold lower CRC risk (OR: 0.20, 95% CI: 0.09–0.47). Gene interaction analysis and epistasis analysis showed that CRC patients who are all-heterozygous or are all-homozygous for the alternate allele in all studied loci are at CRC high risk. Furthermore, individuals who are double heterozygotes for any combination of the two loci are also at high risk. Additionally, individuals who carry at least one mutant allele (i.e. are heterozygote and homozygote or homozygote and heterozygote) at rs1558535 and rs9906859, respectively, are at high risk of CRC as well. Even individuals who are heterozygous at rs1558535 are also at high risk. This highlights “the role of both rs1558535 and rs9906859 in predicting CRC risk”.

Results of this study provided another level of evidence on the role of rs1558535 and rs9906859 in CRC risk. Result of one-way W-Test for gene interaction analysis revealed that either rs1558535 or rs9906859 is highly statistically significant independent predictors for CRC risk. In addition, two-way interaction W-test using different pairs of the studied SNPs showed high W-value of 52.4 for rs1558535 and rs9906859 pair, indicating a highly significant epistatic interaction between them. Moreover, the mosaic plot of post-hoc epistasis analysis visualized the strong and complex interplay between these genotypes in determining CRC risk. Some genotype combinations between this pair of SNPs especially (rs1558535 AT, rs9906859 TT) genotype found to significantly enriched in CRC patients than control, underscoring how these two SNPs might influence an individual’s susceptibility to CRC.

This would be further confirmed or rolled out by LD analysis of neighbor genetic variants to assess the connection of several haplotypes. In this study, pairwise LD analysis showed a strong association in CRC group between different pairs (rs1558535 and rs9906859) as well as (rs17780195 and rs9906859). The most notable difference is the substantially stronger LD between rs1558535 and rs9906859 in the CRC group compared to the control group (D’ = 0.89 vs 0.61, respectively). This suggests that a specific haplotype combination involving these two SNPs is preferentially preserved and more prevalent in CRC patients. In the same line, correlation analysis revealed the same results. Greater correlation was observed between LINC00511 SNPs rs1558535 and rs9906859 in CRC patients than in the control group (R2 = 0.57 in CRC vs 0.33 in controls). This implies that certain alleles at these two loci might be preferentially combined together in CRC patients forming a"risk-associated haplotype". These finding provide further insights into “the connections of the studied SNPs, particularly rs1558535 and rs9906859 and their significant association with CRC susceptibility”.

Finally, we analyzed the OS and DFS in CRC group. The univariate analysis revealed that the late CRC stage IV showed 15.66-fold increase in the OS hazard ratio (OR: 15.66, 95%CI: 1.147–213.71). This is consistent with previous research considering advanced CRC stage, and its related factors, as strong predictor(s) of poor CRC prognosis/outcome1,5,7.

Shortcomings to this study were missing data for the genotypes for some patients, decreasing the statistical efficiency.

Strength(s) within the present study

First, the study included a relatively adequate sample size and was conducted on a population that represents a common and relevant demographic, enhancing its generalizability. Second, confounding bias was minimized through frequency matching of cases and controls. Third, the study is a step-toward personalized and precision medicine application, determine the genetic variants in lncRNA SNPs “implicating-to-disease”, aligning with the objectives of broader epigenomic and genomic research initiatives. Fourth, as part of the study’s future perspective, 10% of the samples will be selected for validation of genotyping results using lncRNA sequencing (lncRNA-seq) to confirm and expand upon current findings. Moreover, two ongoing researches, by our research group, at the Advanced Biochemistry Research Lab, Faculty of Pharmacy, Ain Shams University, are addressing the same LINC00511 SNPs variants haplotype role in breast cancer Egyptian female patients as well as hepatocellular carcinoma Egyptian patients’ prognosis and pathogenesis.

Sustainability plan

First, lncRNA-seq. via next generation sequencing (NGS) to study functional roles in diverse biological processes and human diseases, such as cancer, using the mutant samples. Second, LINC00511 neighbor SNPs prediction (rs4432291) using genotype imputation technique by the relevant possible software as MACH, IMPUTE, BEAGLE, SNPTEST (used for genotype imputation uncertainty when performing a test for association between genotypes and phenotypes) or BIMBAM and PLINK (the free open-source program) used as GWA analysis toolset59. Third, the OS and DFS are to be correlated with LINC00511 SNPs variants and lncRNA-seq. results.

Conclusion(s)

In brief, LINC00511 SNPs (rs1558535, rs17780195 and rs9906859) are associated with CRC increased risk. LINC00511 SNPs rs1558535 and rs17780195 are highly linked to CRC late stages. Individuals who are TT/AG/TC or AT/AG/CC for the alternate allele in all studied loci of SNPs rs1558535 and rs9906859 are at high risk of CRC.