Table 1 Description of the 23 datasets used in the experiments.

From: A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data

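| Application | Dataset | Cell number | Class number | Protocol | Reference | Minor cell types (<5% of total cells) | Cells in smallest cell type |
|---|---|---|---|---|---|---|---|
| Intra-dataset benchmark | Deng | 268 | 6 | Smart-seq2 | Deng et al. [49] | 2 | 12 |
| Intra-dataset benchmark | Darmanis | 466 | 9 | SMARTer | Darmanis et al. [50] | 3 | 16 |
| Intra-dataset benchmark | Usoskin | 622 | 4 | STRT-Seq | Usoskin et al. [51] | 0 | 81 |
| Intra-dataset benchmark | CampLiver | 777 | 7 | SMARTer | Camp et al. [52] | 0 | 70 |
| Intra-dataset benchmark | Baron Mouse | 1886 | 13 | inDrop | Baron et al. [53] | 8 | 6 |
| Intra-dataset benchmark | Muraro | 2122 | 9 | CEL-Seq2 | Muraro et al. [54] | 4 | 3 |
| Intra-dataset benchmark | Lake | 3042 | 16 | Fluidigm C1 | Lake et al. [55] | 10 | 45 |
| Intra-dataset benchmark | Baron Human | 8569 | 14 | inDrop | Baron et al. [53] | 9 | 7 |
| Intra-dataset benchmark | Campbell | 21,086 | 32 | Drop-Seq | Campbell et al. [56] | 29 | 30 |
| Intra-dataset benchmark | Zilionis | 34,558 | 9 | inDrop | Zilionis et al. [57] | 4 | 108 |
| Intra-dataset benchmark | TM (Tabula Muris) | 54,865 | 55 | 10X Genomics | Schaum et al. [58] | 48 | 24 |
| Intra-dataset benchmark | Zheng 68K | 65,943 | 11 | 10X Genomics | Zheng et al. [59] | 6 | 92 |
| Inter-dataset benchmark | PbmcBench 10X (V2) | 23,154 | 9 | 10X Genomics (v2) | Ding et al. [60] | 4 | 132 |
| Inter-dataset benchmark | PbmcBench 10X (V3) | 19,690 | 8 | 10X Genomics (v3) | Ding et al. [60] | 4 | 209 |
| Inter-dataset benchmark | PbmcBench (CEL-Seq) | 19,754 | 7 | CEL-Seq2 | Ding et al. [60] | 3 | 559 |
| Inter-dataset benchmark | PbmcBench (Drop-Seq) | 23,154 | 9 | Drop-Seq | Ding et al. [60] | 4 | 102 |
| Inter-dataset benchmark | PbmcBench (inDrop) | 21,832 | 7 | inDrop | Ding et al. [60] | 3 | 134 |
| Inter-dataset benchmark | PbmcBench (Seq-Well) | 18,966 | 7 | Seq-Well | Ding et al. [60] | 3 | 102 |
| Inter-dataset benchmark | PbmcBench (SMARTseq) | 18,886 | 6 | Smart-seq2 | Ding et al. [60] | 2 | 569 |
| Inter-dataset benchmark | Xin | 1449 | 4 | SMARTer | Xin et al. [61] | 1 | 46 |
| Inter-dataset benchmark | Baron Human | 8569 | 14 | inDrop | Baron et al. [53] | 9 | 7 |
| Inter-dataset benchmark | Segerstolpe | 2133 | 13 | Smart-seq2 | Segerstolpe et al. [62] | 7 | 5 |
| Inter-dataset benchmark | Muraro | 2122 | 9 | CEL-Seq2 | Muraro et al. [54] | 4 | 3 |
| Case study | Cardiac Atlas | 487,106 | 11 | 10X Genomics | Litviňuková et al. [41] | 5 | 3799 |
| Case study | PKU_Covid Atlas | 1,462,702 | 64 | 10X Genomics | Ren et al. [17] | 54 | 17 |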

1. The Dataset column gives the dataset name used in the article. The Cell number column gives the total number of cells in the dataset before preprocessing. The Protocol column gives the sequencing protocol used to generate the dataset. The minor cell type column gives the number of cell types whose cells make up less than 5% of the total cell number. The last column gives the number of cells in the smallest cell population of the dataset. The Application column lists every use of each dataset, so a dataset used in more than one setting (Baron Human, Muraro) appears in more than one row.
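The last two columns follow directly from the per-class cell counts. As a minimal sketch, assuming the cell-type labels are available as a flat list of strings (the function name `summarize_cell_types` is hypothetical and not part of the paper's code), the summary columns could be computed as below. A cell type counts as minor only when its share is strictly below 5%, matching the footnote's "less than 5%".

```python
from collections import Counter


def summarize_cell_types(labels, minor_fraction=0.05):
    """Summarize per-cell type labels in the style of Table 1 (hypothetical helper).

    Returns (cell number, class number, number of minor cell types,
    cell number of the smallest cell type). A cell type is "minor" when
    its cells make up strictly less than `minor_fraction` of all cells.
    """
    counts = Counter(labels)  # number of cells per cell type
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must contain at least one cell")
    # Count the cell types whose share of all cells is below the threshold.
    n_minor = sum(1 for n in counts.values() if n / total < minor_fraction)
    smallest = min(counts.values())
    return total, len(counts), n_minor, smallest


if __name__ == "__main__":
    # Hypothetical toy labels (not from any dataset in Table 1): 100 cells, 4 types.
    toy = ["A"] * 60 + ["B"] * 36 + ["C"] * 3 + ["D"] * 1
    print(summarize_cell_types(toy))  # (100, 4, 2, 1)
```

For a real dataset, the labels would come from the annotation column of the dataset before preprocessing, since the Cell number column in Table 1 is reported on the unfiltered data.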