Table 3 Metadata for tumor and PDX compendia.

From: Consistently processed RNA sequencing data from 50 sources enriched for pediatric data

| Metadata field | Column name | Values |
|---|---|---|
| Treehouse dataset identifier | th_dataset_id | [study_id]_[donor number]_S[dataset within donor], e.g. “THR37_1294_S01” |
| Treehouse harmonized disease | disease | one of 192 values, e.g. “acute lymphoblastic leukemia”, “Ewing sarcoma” |
| ICD-O disease code | icd_disease | one of 139 values, e.g. “9801/3: Acute leukemia, NOS”, “9364/3: Ewing sarcoma”; or “unavailable” |
| Age at diagnosis, in years | age_at_dx | 0–90, e.g. 1.9; or “unknown” |
| Organism | organism | “Homo sapiens” |
| Is the patient pediatric, adolescent, or young adult? | pedaya | “yes”, “no”, or “unknown” |
| Sex | sex | “female”, “male”, or “unknown” |
| Treehouse code for the source of a group of datasets, used in the Treehouse dataset identifier | study_id | typically represents a research group or publication; usually starts with “TH” (for clinical partners) or “THR”, e.g. “TH03”, “THR37”; exceptions include “TCGA” and “TARGET” |
| Repository ID for the source of a group of datasets | study_accession | study ID from SRA, EGA, dbGaP, or St. Jude, e.g. “SRP092501”, “EGAD00001001098”; or “unavailable” |
| Name of the repository or host of the study | source_name | e.g. “St. Jude Cloud”; or “unavailable” |
| Identifier used by the repository or publication to refer to the individual who donated the tissue | study_donor_id | e.g. “TARGET-30-PASSRS”, “BZ11-Tumor”, “RMS1”; or “unavailable” |
| Identifier used by the repository or publication to refer to the RNA-Seq data generated from one sample | study_dataset_id | e.g. “EGAN00001179688”, “SRR4306220”; or “unavailable” |
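The th_dataset_id row describes a three-part identifier. The sketch below splits such an identifier back into its study_id, donor number, and within-donor dataset number. It is illustrative only: the function and class names are hypothetical, and it assumes that study_id is alphanumeric and that the donor and dataset parts are digits, as in the example “THR37_1294_S01”. Identifiers that do not follow this layout (possibly including the “TCGA” and “TARGET” exceptions) raise `ValueError` rather than being guessed at.

```python
import re
from typing import NamedTuple

# Hypothetical pattern derived from the documented layout
# [study_id]_[donor number]_S[dataset within donor]. Assumes study_id is
# alphanumeric (e.g. "THR37", "TH03") and donor/dataset parts are digits.
TH_DATASET_ID = re.compile(r"(?P<study_id>[A-Za-z0-9]+)_(?P<donor>\d+)_S(?P<dataset>\d+)")


class ThDatasetId(NamedTuple):
    study_id: str
    donor_number: str          # kept as text so leading zeros survive
    dataset_within_donor: str  # e.g. "01", also kept as text


def parse_th_dataset_id(th_id: str) -> ThDatasetId:
    """Split a Treehouse dataset identifier into its three components."""
    m = TH_DATASET_ID.fullmatch(th_id)
    if m is None:
        raise ValueError(f"does not match [study_id]_[donor]_S[dataset]: {th_id!r}")
    return ThDatasetId(m["study_id"], m["donor"], m["dataset"])


print(parse_th_dataset_id("THR37_1294_S01"))
# prints: ThDatasetId(study_id='THR37', donor_number='1294', dataset_within_donor='01')
```

Keeping the numeric parts as strings makes the split lossless, so joining the parts with the same separators reproduces the original identifier.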
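Several columns in the table use the literal strings “unknown” or “unavailable” for missing values, and age_at_dx mixes numbers with “unknown”. A minimal sketch of normalizing these fields when the metadata are read as plain text is shown below. The helper names and the decision to treat both sentinels as missing are assumptions for illustration, not part of the published compendium tooling.

```python
from typing import Optional

# Sentinel strings that the table uses for missing values. Treating both as
# "missing" is an assumption made for this sketch.
MISSING = {"unknown", "unavailable"}


def clean_field(value: str) -> Optional[str]:
    """Map the 'unknown'/'unavailable' sentinels to None; keep other values."""
    value = value.strip()
    return None if value in MISSING else value


def parse_age_at_dx(value: str) -> Optional[float]:
    """Return age at diagnosis in years, or None when it is missing."""
    cleaned = clean_field(value)
    if cleaned is None:
        return None
    age = float(cleaned)
    # The table documents a range of 0-90 years.
    if not 0 <= age <= 90:
        raise ValueError(f"age_at_dx outside documented range 0-90: {age}")
    return age


print(parse_age_at_dx("1.9"), parse_age_at_dx("unknown"), clean_field("unavailable"))
# prints: 1.9 None None
```

If the metadata are loaded with pandas instead, passing the same sentinel strings to the `na_values` parameter of `read_csv` achieves a similar effect.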