Table 1 COADREAD dataset information list.
From: Identification of a novel macrophage-related prognostic signature in colorectal cancer
- TCGA, The Cancer Genome Atlas; COADREAD, colon adenocarcinoma/rectum adenocarcinoma.
- We used the R package TCGAbiolinks [45] to download the expression matrix of the CRC dataset TCGA-COADREAD (colon adenocarcinoma/rectum adenocarcinoma, COADREAD) from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/). After removing samples that lacked key clinical information, we obtained 644 CRC samples (cancer group, grouping: COADREAD) and 51 paracancerous samples (normal group, grouping: Normal). Expression values were normalized to Fragments Per Kilobase per Million (FPKM) format, and the corresponding clinical data were acquired from the UCSC Xena database [49] (http://genome.ucsc.edu). The R package limma [13] was used to normalize the count sequencing data of the TCGA-COADREAD dataset.
- We obtained the COADREAD-related datasets GSE14333 [46], GSE74602 [47] and GSE87211 [48] from the GEO database [50] via the R package GEOquery [51]. GSE14333 (Homo sapiens; platform GPL570, [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) contains microarray gene expression profiles of 290 CRC patient samples. GSE74602 (Homo sapiens; platform GPL6104, Illumina humanRef-8 v2.0 expression beadchip) contains microarray gene expression profiles of 30 CRC patient samples and 30 fully matched adjacent normal tissue samples. GSE87211 (Homo sapiens; platform GPL13497, Agilent-026652 Whole Human Genome Microarray 4x44K v2, Probe Name version) contains microarray gene expression profiles of 203 CRC patient samples and 160 partially matched paracancerous normal tissue samples. All samples were included in this study. The datasets were annotated with the corresponding GPL platform files, and all three GEO datasets were used as validation sets (Table 1).
- We collected macrophage-related genes (MRGs) from the GeneCards database [52] (https://www.genecards.org/), which provides comprehensive information on human genes. After searching GeneCards with the keyword "Macrophage", we retained only MRGs annotated as "Protein Coding" with a relevance score > 5, which yielded 576 MRGs. We obtained a further 92 MRGs from the references, then combined and de-duplicated the two lists to obtain a total of 637 MRGs (Table S1).
- We downloaded somatic mutation data for the TCGA-COADREAD dataset, including single nucleotide polymorphism (SNP) data, from the TCGA website and visualized them using the R package maftools [53]. To analyze copy number variation (CNV) in COADREAD patients, we used the R package TCGAbiolinks to download the "Copy Number Variation" data of the TCGA-COADREAD dataset and then integrated the data for GISTIC 2.0 analysis [54] with default parameter settings. Tumor mutation burden (TMB) and microsatellite instability (MSI) data for the TCGA-COADREAD dataset were downloaded from the cBioPortal for Cancer Genomics database [55] (https://www.cbioportal.org/).
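
The FPKM normalization mentioned above for the TCGA-COADREAD expression matrix can be illustrated with a minimal sketch. The study itself worked in R, so this Python version is only an illustration of the standard FPKM formula, not the authors' code. The function name `counts_to_fpkm`, the gene names and the toy numbers are all hypothetical, and the library size is assumed to be the sum of the counts supplied for the sample.

```python
def counts_to_fpkm(counts, lengths_bp):
    """Convert raw fragment counts for one sample into FPKM values.

    counts     -- dict mapping gene -> raw fragment count
    lengths_bp -- dict mapping gene -> gene length in base pairs
    Assumption: the library size is the sum of the counts passed in.
    """
    total_fragments = sum(counts.values())
    if total_fragments == 0:
        raise ValueError("sample has no mapped fragments")
    # FPKM = count * 1e9 / (gene length in bp * total mapped fragments)
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_fragments)
        for gene, count in counts.items()
    }


# Toy, made-up values for three hypothetical genes.
toy_counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
toy_lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 4000}
fpkm = counts_to_fpkm(toy_counts, toy_lengths)
```

In this toy sample GENE_A and GENE_B receive the same FPKM even though their raw counts differ threefold, because GENE_B is also three times longer; this length correction is the purpose of the conversion.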
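
The MRG selection described above (keep GeneCards hits that are "Protein Coding" with a relevance score > 5, then merge with the literature list and de-duplicate) could be sketched as follows. The reported totals imply that 576 + 92 - 637 = 31 genes appeared in both sources. The field names `symbol`, `category` and `relevance_score`, the function name `select_mrgs`, and the scores in the toy rows are assumptions made for illustration and are not taken from an actual GeneCards export.

```python
def select_mrgs(genecards_rows, literature_genes, min_score=5.0):
    """Filter GeneCards search hits, then merge with literature genes.

    genecards_rows   -- iterable of dicts with the hypothetical keys
                        'symbol', 'category' and 'relevance_score'
    literature_genes -- iterable of gene symbols collected from references
    Returns a sorted, de-duplicated list of gene symbols.
    """
    kept = {
        row["symbol"]
        for row in genecards_rows
        if row["category"] == "Protein Coding"
        and row["relevance_score"] > min_score
    }
    # Set union removes genes that appear in both sources.
    return sorted(kept | set(literature_genes))


# Toy rows with made-up scores, for illustration only.
toy_rows = [
    {"symbol": "CD68", "category": "Protein Coding", "relevance_score": 12.3},
    {"symbol": "CSF1R", "category": "Protein Coding", "relevance_score": 9.1},
    {"symbol": "MIR155", "category": "RNA Gene", "relevance_score": 20.0},
    {"symbol": "TLR4", "category": "Protein Coding", "relevance_score": 4.2},
]
toy_literature = ["CD68", "CD163"]
mrgs = select_mrgs(toy_rows, toy_literature)
```

Here MIR155 is dropped for not being protein coding, TLR4 for its score of 4.2, and CD68 is counted once although it occurs in both lists.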