Table 1 Total numbers of spectra (# Spec) and compounds (# Mol) used for training, evaluation, and external test

From: FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra

Training and evaluation datasets

| Dataset | Q-TOF # Spec | Q-TOF # Mol | Orbitrap # Spec | Orbitrap # Mol | Instruments |
|---|---|---|---|---|---|
| NIST23 | 32,057 | 2,490 | 569,157 | 27,836 | See the complete list in Table 2 |
| NIST20 | 31,749 | 2,479 | 394,395 | 17,883 | |
| Agilent PCDL | 42,410 | 11,479 | 0 | 0 | |
| MoNA | 21,609 | 2,652 | 591 | 303 | |
| GNPS | 2,642 | 1,270 | 1,513 | 509 | |
| Waters Q-TOF | 757 | 610 | 0 | 0 | |
| Unique Totals | 131,224 | 15,399 | 965,656 | 28,838 | - |

External test datasets

| Dataset | Q-TOF # Spec | Q-TOF # Mol | Orbitrap # Spec | Orbitrap # Mol | Instruments |
|---|---|---|---|---|---|
| CASMI 2016 | 502 | 405 | 0 | 0 | Agilent 6540 Q-TOF and Waters Synapt G2 Q-TOF |
| CASMI 2017 | 40 | 40 | 0 | 0 | Waters Synapt G2 Q-TOF |
| EMBL-MCF 2.0 | 0 | 0 | 224 | 130 | Orbitrap Exploris 240 (Thermo Fisher) and Q-Exactive Plus (Thermo Fisher) |

  1. Training and evaluation datasets include NIST20 and NIST23 [19], Agilent PCDL, MoNA [2], GNPS [21], and Waters Q-TOF, while external test datasets include CASMI 2016 and 2017 [29], and EMBL-MCF 2.0 [30].
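
The "Unique Totals" row can be checked against the per-dataset rows with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from Table 1, while the names `TRAINING`, `REPORTED_TOTALS`, and `column_sums` are hypothetical and not part of FIDDLE. It shows that the spectra totals equal the plain column sums, whereas the compound totals are smaller than the column sums, presumably because compounds that appear in more than one library are counted once.

```python
# Counts copied from Table 1 (training and evaluation datasets).
# Tuple order: (Q-TOF # Spec, Q-TOF # Mol, Orbitrap # Spec, Orbitrap # Mol)
TRAINING = {
    "NIST23": (32057, 2490, 569157, 27836),
    "NIST20": (31749, 2479, 394395, 17883),
    "Agilent PCDL": (42410, 11479, 0, 0),
    "MoNA": (21609, 2652, 591, 303),
    "GNPS": (2642, 1270, 1513, 509),
    "Waters Q-TOF": (757, 610, 0, 0),
}

# "Unique Totals" row as reported in Table 1.
REPORTED_TOTALS = (131224, 15399, 965656, 28838)

LABELS = ("Q-TOF # Spec", "Q-TOF # Mol", "Orbitrap # Spec", "Orbitrap # Mol")


def column_sums(rows):
    """Sum each column across all datasets (no deduplication)."""
    return tuple(sum(column) for column in zip(*rows.values()))


if __name__ == "__main__":
    for label, summed, reported in zip(LABELS, column_sums(TRAINING), REPORTED_TOTALS):
        # A positive difference means some entries are shared between datasets
        # (an assumption about how the unique totals were derived).
        print(f"{label}: sum={summed:,} reported={reported:,} difference={summed - reported:,}")
```

Under this reading, the difference in the two "# Mol" columns gives the number of compound entries that overlap between libraries; the table itself does not say how duplicates were identified.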