Table 2 Levels of HTAN data

From: Sharing data from the Human Tumor Atlas Network through standards, infrastructure and community engagement

| Level | Single-cell RNA-seq | Multiplex imaging | Spatial transcriptomics |
| --- | --- | --- | --- |
| 1 | Unaligned sequencing reads, usually in the FASTQ file format. | Raw imaging tiles that require preprocessing, such as stitching, registration or background subtraction. Typically TIFF or a proprietary format. | Unaligned sequencing reads, usually in the FASTQ file format. |
| 2 | Aligned sequencing reads, usually in the BAM file format. | Multichannel image. Usually in the OME-TIFF file format, accompanied by a CSV file containing channel metadata. | Aligned sequencing reads, usually in the BAM file format. |
| 3 | Gene expression matrix. For example, a matrix of all cells by all genes, with expression counts. Multiple file formats are supported, including CSV, MTX and h5ad. | Segmentation masks denoting nuclei, cytoplasm, whole cells or regions of interest. Multiple file formats are supported, although TIFF and OME-TIFF are recommended. | Gene expression matrix. For example, a matrix of all cells by all genes, with expression counts. Multiple file formats are supported, including CSV, MTX and h5ad. |
| 4 | Feature matrix. For example, a matrix of cluster assignments or imputed cell types across all sequenced cells. Multiple file formats are supported, including CSV and h5ad. | Feature matrix. For example, a matrix of mean intensity values per cell and channel. Multiple file formats are supported, including CSV and h5ad. | Feature matrix. For example, a matrix of cluster assignments or imputed cell types across all sequenced cells. Multiple file formats are supported, including CSV and h5ad. |

Lower levels indicate raw data and higher levels indicate data analyzed by one or more bioinformatics or image-processing pipelines. Three primary categories of data are highlighted.
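
For readers who want to work with this level scheme programmatically, the table can be transcribed into a small lookup structure. The following is a minimal sketch, not part of any HTAN tooling: the assay keys, the `HTAN_LEVELS` dictionary and the `describe` helper are hypothetical names introduced here for illustration, and the format lists only repeat the formats named in the table.

```python
from typing import Dict, Tuple

# Hypothetical short keys for the three data categories in Table 2.
SC_RNA_SEQ = "single-cell RNA-seq"
IMAGING = "multiplex imaging"
SPATIAL = "spatial transcriptomics"

# (assay, level) -> (content, formats named in the table).
# Transcribed from Table 2; not an official HTAN schema.
HTAN_LEVELS: Dict[Tuple[str, int], Tuple[str, Tuple[str, ...]]] = {
    (SC_RNA_SEQ, 1): ("unaligned sequencing reads", ("FASTQ",)),
    (SC_RNA_SEQ, 2): ("aligned sequencing reads", ("BAM",)),
    (SC_RNA_SEQ, 3): ("gene expression matrix", ("CSV", "MTX", "h5ad")),
    (SC_RNA_SEQ, 4): ("feature matrix", ("CSV", "h5ad")),
    (IMAGING, 1): ("raw imaging tiles", ("TIFF", "proprietary")),
    (IMAGING, 2): ("multichannel image", ("OME-TIFF", "CSV")),
    (IMAGING, 3): ("segmentation masks", ("TIFF", "OME-TIFF")),
    (IMAGING, 4): ("feature matrix", ("CSV", "h5ad")),
    (SPATIAL, 1): ("unaligned sequencing reads", ("FASTQ",)),
    (SPATIAL, 2): ("aligned sequencing reads", ("BAM",)),
    (SPATIAL, 3): ("gene expression matrix", ("CSV", "MTX", "h5ad")),
    (SPATIAL, 4): ("feature matrix", ("CSV", "h5ad")),
}


def describe(assay: str, level: int) -> str:
    """Return a one-line summary of what a given level holds for an assay."""
    try:
        content, formats = HTAN_LEVELS[(assay, level)]
    except KeyError:
        raise ValueError(f"no entry for assay={assay!r}, level={level}") from None
    return f"Level {level} {assay}: {content} ({', '.join(formats)})"


if __name__ == "__main__":
    # Print the full progression from raw to processed data for one assay.
    for lvl in range(1, 5):
        print(describe(IMAGING, lvl))
```

Keying the dictionary on the (assay, level) pair mirrors the table layout directly, so a cell in the table corresponds to exactly one entry in the mapping.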