Fig. 1: Characterization of HERVH expression in CRCs.

a PCA of gene expression in 51 normal and 631 CRC tumor tissues from the TCGA COREAD dataset. b PCA based on the expression of repetitive sequences. c Classification of differentially expressed repetitive sequences identified using DESeq2 (v1.22.2, two-tailed likelihood ratio test). Differentially expressed repetitive elements are defined by the cut-off values of adjusted p value < 0.05 and |log2 fold change (FC)| > 0.585. d Volcano plot of the differentially expressed ERVs identified using DESeq2 (v1.22.2, two-tailed likelihood ratio test). Upregulated (red) and downregulated (green) ERVs are defined by the cut-off values of adjusted p value < 0.05 and |log2 FC| > 0.585. e Overlap analysis of the ERVs upregulated in CRC samples and in early embryonic cells identifies the internal coding sequence of HERVH (HERVH-int) and its corresponding LTR (LTR7Y) as the commonly upregulated elements. f Overall survival (OS) analysis of 493 patients with AJCC pathologic tumor stage >I, stratified by HERVH-int expression level. The mean expression value of HERVH-int is used to separate the HERVH-int-High (145 patients) and HERVH-int-Low (348 patients) groups. Log-rank test, p = 0.011. g Correlation between HERVH-int expression and the mutational status of the most frequently mutated genes in CRCs in the TCGA dataset. h Correlation between HERVH-int expression and gene mutations in CRC cell lines in the CCLE dataset. The data used to generate each panel in Fig. 1 are listed in Supplementary Data 1. Source data, including exact p values, are provided as a Source data file.
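The cut-offs used in panels c and d can be illustrated with a minimal sketch. It assumes the log2 fold change and adjusted p value for each element have already been exported from the differential expression analysis. The function name, constant names, and example values are hypothetical and are not taken from the study. The threshold 0.585 is approximately log2(1.5), so it corresponds to roughly a 1.5-fold change in either direction. Treating a missing adjusted p value as not significant is an assumption of this sketch.

```python
import math

# Cut-offs quoted in the caption; the names below are hypothetical.
PADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 0.585  # approximately log2(1.5), i.e. about a 1.5-fold change


def classify(log2_fc, padj):
    """Label one element as 'up', 'down', or 'ns' (not significant)."""
    # Assumption: a missing adjusted p value is treated as not significant.
    if padj is None or math.isnan(padj):
        return "ns"
    if padj < PADJ_CUTOFF and abs(log2_fc) > LOG2FC_CUTOFF:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Made-up values for illustration only, not results from the study.
examples = {
    "element_A": (1.2, 0.001),    # large increase, significant
    "element_B": (-0.9, 0.02),    # large decrease, significant
    "element_C": (0.3, 0.0001),   # significant but below the fold-change cut-off
    "element_D": (2.0, 0.2),      # large change but not significant
}

labels = {name: classify(fc, p) for name, (fc, p) in examples.items()}
for name, label in labels.items():
    print(name, label)
```

Both conditions must hold for an element to be called up- or downregulated, which is why the last two made-up examples are labelled not significant.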