Table 3 Metadata for tumor and PDX compendia.
From: Consistently processed RNA sequencing data from 50 sources enriched for pediatric data
Metadata field | Column name | Values |
---|---|---|
Treehouse dataset identifier | th_dataset_id | [study_id]_[donor number]_S[dataset within donor], e.g. “THR37_1294_S01” |
Treehouse harmonized disease | disease | one of 192 values, e.g. “acute lymphoblastic leukemia”, “Ewing sarcoma” |
ICD-O disease code | icd_disease | one of 139 values, e.g. “9801/3: Acute leukemia, NOS”, “9364/3: Ewing sarcoma”; or “unavailable” |
Age at diagnosis, in years | age_at_dx | 0–90, e.g. 1.9; or “unknown” |
Organism | organism | “Homo sapiens” |
Is the patient pediatric, adolescent or young adult? | pedaya | “yes”, “no”, or “unknown” |
Sex | sex | “female”, “male”, or “unknown” |
Treehouse code for the source of a group of datasets, used in the Treehouse dataset identifier | study_id | typically represents a research group or publication, usually starting with TH (for clinical partners) or THR, e.g. “TH03”, “THR37”; exceptions include “TCGA” and “TARGET” |
Repository ID for the source of a group of datasets | study_accession | Study ID from SRA, EGA, dbGaP or St. Jude, e.g. “SRP092501”, “EGAD00001001098”; or “unavailable” |
Name of the repository or host of the study | source_name | e.g. “St. Jude Cloud”; or “unavailable” |
Identifier used by the repository or publication to refer to the individual who donated the tissue | study_donor_id | e.g. “TARGET-30-PASSRS”, “BZ11-Tumor”, “RMS1”; or “unavailable” |
Identifier used by the repository or publication to refer to RNA-Seq data generated from one sample | study_dataset_id | e.g. “EGAN00001179688”, “SRR4306220”; or “unavailable” |
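The th_dataset_id format in the first row can be split back into its components with a regular expression. The sketch below is a minimal, hypothetical helper: `parse_th_dataset_id`, `TH_ID_PATTERN` and `ThDatasetId` are illustrative names, not part of any Treehouse tooling. It assumes the study_id contains no underscores and that the donor number and dataset index are decimal digits, as in “THR37_1294_S01”. Identifiers that do not fit this pattern, which may include datasets whose study_id is “TCGA” or “TARGET”, raise ValueError instead of being guessed at.

```python
import re
from typing import NamedTuple

# Assumption: study_id has no underscores; donor number and the
# dataset-within-donor index are decimal digits (e.g. "THR37_1294_S01").
TH_ID_PATTERN = re.compile(r"^(?P<study_id>[A-Za-z0-9]+)_(?P<donor>\d+)_S(?P<dataset>\d+)$")


class ThDatasetId(NamedTuple):
    study_id: str
    donor_number: str
    dataset_index: str


def parse_th_dataset_id(th_dataset_id: str) -> ThDatasetId:
    """Split a [study_id]_[donor number]_S[dataset] identifier into its parts."""
    match = TH_ID_PATTERN.match(th_dataset_id)
    if match is None:
        raise ValueError(f"not a [study_id]_[donor]_S[dataset] identifier: {th_dataset_id!r}")
    # Keep donor and dataset as strings so leading zeros (e.g. "01") survive.
    return ThDatasetId(match["study_id"], match["donor"], match["dataset"])


print(parse_th_dataset_id("THR37_1294_S01"))
# ThDatasetId(study_id='THR37', donor_number='1294', dataset_index='01')
```

Keeping the numeric parts as strings is a deliberate choice: it lets the parsed fields be joined back into the original identifier without reformatting.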
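Several columns in the table take values from a small fixed set, so a metadata file can be sanity-checked row by row. The following is a minimal sketch, assuming each row has already been read into a Python dict keyed by the column names listed above (for example with `csv.DictReader`). `check_row` and `ALLOWED` are hypothetical names, and only the organism, pedaya, sex and age_at_dx constraints stated in the table are checked.

```python
# Allowed values transcribed from the table; other columns are not checked here.
ALLOWED = {
    "organism": {"Homo sapiens"},
    "pedaya": {"yes", "no", "unknown"},
    "sex": {"female", "male", "unknown"},
}
AGE_MIN, AGE_MAX = 0.0, 90.0  # age_at_dx range in years, per the table


def check_row(row: dict) -> list:
    """Return a list of problems found in one metadata row (empty if none)."""
    problems = []
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}={value!r} not in {sorted(allowed)}")
    age = row.get("age_at_dx")
    if age != "unknown":
        try:
            years = float(age)
        except (TypeError, ValueError):
            problems.append(f"age_at_dx={age!r} is neither a number nor 'unknown'")
        else:
            if not AGE_MIN <= years <= AGE_MAX:
                problems.append(f"age_at_dx={years} outside {AGE_MIN}-{AGE_MAX}")
    return problems


row = {"organism": "Homo sapiens", "pedaya": "yes", "sex": "female", "age_at_dx": "1.9"}
print(check_row(row))  # []
```

Returning a list of problems, rather than raising on the first one, makes it possible to report every malformed field across a large compendium in a single pass.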