Fig. 3

Technical validation of the PCMR resource. (A) Clinical information and profile metadata for precancerous samples of the GSE13898 dataset, as cataloged in PCMR. For each precancerous, normal, and/or malignant sample, PCMR records the sample group (normal/premalignant/cancer), disease name, biological tissue from which the data were derived, data platform, cancer stage, gender, age, and PubMed ID. (B) Distribution of PIGR gene expression across precancerous, normal, and/or malignant conditions within GSE13898. P-values were calculated with the Mann–Whitney U test and adjusted by Bonferroni correction. (C) Precancer-gene associations collected in PCMR, originating from differential analysis of precancerous profiles, text mining of abstracts using the ChatGPT large language model, and manual curation of cancer-related genes from published resources. (D) Differential expression analysis of 7 previously reported esophageal precancer-related genes (CFTR, PIGR, NAT2, CCDC25, ABCG2, CD86, GPA33) and 13 additional, previously unreported relevant genes (HES1, RETSAT, NAT8, BMP4, SHH, TGFB1, SMO, CDKN2A, PTCH1, IGF1, GLI1, TP53, NFKB1). Based on the GSE13898 dataset, RNA expression levels are compared across precancerous, normal, and malignant conditions. P-values were calculated with the Mann–Whitney U test and adjusted by Bonferroni correction. (E) Kaplan-Meier survival analysis of NAT8 in esophageal carcinoma based on the Gene Expression Profiling Interactive Analysis (GEPIA) database, with data from The Cancer Genome Atlas Program (TCGA). (F) Gene functional enrichment analysis of 13 differentially expressed genes based on the Database for Annotation, Visualization and Integrated Discovery (DAVID).
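
The testing scheme named in panels B and D (pairwise Mann–Whitney U tests followed by Bonferroni adjustment) can be illustrated with a minimal Python sketch. The caption does not say which software or test variant was used, so the sketch assumes a two-sided test with a normal approximation and tie correction, and no continuity correction. The function names (`mann_whitney_u`, `bonferroni`, `pairwise_tests`) and the expression values are hypothetical and are not taken from GSE13898.

```python
import math
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based average ranks and the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Assumption: no continuity correction and no exact small-sample method.
    Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:  # all observations identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def pairwise_tests(groups):
    """Test every pair of groups and return Bonferroni-adjusted p-values."""
    pairs = list(combinations(groups, 2))
    raw = [mann_whitney_u(groups[a], groups[b])[1] for a, b in pairs]
    return dict(zip(pairs, bonferroni(raw)))


# Made-up expression values for one gene, for illustration only.
example = {
    "normal": [8.1, 8.4, 7.9, 8.6, 8.2],
    "premalignant": [10.3, 10.9, 9.8, 11.2, 10.1],
    "cancer": [9.0, 8.8, 9.5, 10.0, 8.7],
}
adjusted = pairwise_tests(example)
for pair, p in adjusted.items():
    print(pair, round(p, 4))
```

With three sample groups there are three pairwise comparisons, so each raw p-value is multiplied by 3 here. Whether the correction in the figure was applied across group pairs, across genes, or both is not stated in the caption.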