Table 2 List of scripts necessary to reproduce Method-2016 and Method-2021 (available at https://github.com/sysbiomed/MONET).

From: Updating TCGA glioma classification through integration of molecular data following the latest WHO guidelines

| Scripts | Files | Description |
| --- | --- | --- |
| 2016-classification.R | INPUT: TCGA-LGG.RData; TCGA-GBM.RData | TCGA-LGG and -GBM clinical information |
| | OUTPUT: Integrated_data_2016.RData | Matrices (one per glioma type + unclassified) containing clinical + integrated molecular information needed for reclassification |
| 2021-classification.R* | INPUT: TCGA-LGG.RData; TCGA-GBM.RData | TCGA-LGG and -GBM clinical information |
| | INPUT: IDHstatus-TCGA-case_study.csv | Information about IDH status retrieved manually |
| | OUTPUT: Case_set.csv | List of cases with missing IDH status information |
| | OUTPUT: Integrated_data_2021.RData | Matrices (one per glioma type + unclassified) containing clinical + integrated molecular information needed for reclassification |
| Creation_output1.R | INPUT: Integrated_data_2016.RData | |
| | OUTPUT: Matrix_WHO2016.csv | Sample classification provided by Method-2016 |
| Creation_output2.R | INPUT: Integrated_data_2021.RData | |
| | OUTPUT: Matrix_WHO2021.csv | Sample classification provided by Method-2021 |
| Creation_output3.R | INPUT: Integrated_data_2016.RData; Integrated_data_2021.RData | |
| | OUTPUT: SIMPLIFIED_CLASSIFICATION_TCGA_2016_2021.csv | Comparison between TCGA, 2016 and 2021 classifications (simplified labels) |

  1. Files in bold are further described in the item below.
  2. *In the “2021-classification.R” script, a commented line (No. 84) creates the csv file listing the samples that were searched manually on the portal (the “case_set.csv” file). The file “IDHstatus-TCGA-case_study.csv” needs to be provided as input to include the information we integrated from the GDC Data Portal. It lists n = 20 samples together with their IDH status as determined by our information retrieval (WT or IDH1- and/or IDH2-mutant) and the corresponding mutation, if present.
  3. The scripts “2016-classification.R” and “2021-classification.R” each produce an RData file as output, “Integrated_data_2016.RData” and “Integrated_data_2021.RData” respectively, and these files serve as input to create our final results. Each file comprises four matrices: one per glioma type (using the simplified labels Astrocytoma, Oligodendroglioma and GBM), plus a list of the unclassified samples. These matrices contain all the clinical information provided by the TCGA-LGG and -GBM projects, integrated with the curated molecular features needed for classification. In particular, we included IDH mutation status and the presence/absence of 1p/19q codeletion and, for Method-2021 only, TERT promoter status and the combination of chromosome 7 gain and chromosome 10 loss, which are needed to classify IDH-wildtype samples as GBM.
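
To make the role of the molecular features listed in item 3 concrete, the following is a minimal sketch of how they could combine into the simplified labels for Method-2021. It is written in Python for illustration only and is not the code in “2021-classification.R”. The function name, argument names and the exact branching are assumptions, and the repository scripts may apply additional criteria that are not listed in this table.

```python
from typing import Optional


def classify_2021_sketch(
    idh_mutant: Optional[bool],
    codel_1p19q: Optional[bool],
    tert_promoter_mutant: Optional[bool] = None,
    chr7_gain_chr10_loss: Optional[bool] = None,
) -> str:
    """Hypothetical simplified decision rule (assumption, not the authors' code).

    None means the feature is missing for that sample.
    """
    # Without an IDH status the sample cannot be assigned a label.
    if idh_mutant is None:
        return "Unclassified"

    if idh_mutant:
        # IDH-mutant samples are split by 1p/19q codeletion status.
        if codel_1p19q is None:
            return "Unclassified"
        return "Oligodendroglioma" if codel_1p19q else "Astrocytoma"

    # IDH-wildtype: TERT promoter mutation or combined chromosome 7 gain
    # and chromosome 10 loss supports a GBM label.
    if tert_promoter_mutant or chr7_gain_chr10_loss:
        return "GBM"
    return "Unclassified"


if __name__ == "__main__":
    # Example: IDH-wildtype sample with a TERT promoter mutation.
    print(classify_2021_sketch(False, False, tert_promoter_mutant=True))
```

In this sketch, samples with a missing IDH status fall into the unclassified group, which mirrors why the manually retrieved IDH status file is supplied as an extra input for Method-2021.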