Table 1 Published reference databases commonly used for taxonomic assignment in COI eukaryotic metabarcoding studies.

From: MARES, a replicable pipeline and curated reference database for marine eukaryote metabarcoding

| Reference database | Target organisms | Source repository | Method | Sequences | Unique species (% unique species) | Marine species (% marine species) | Reference |
|---|---|---|---|---|---|---|---|
| BOLD | Eukaryotes | BOLD | Keyword search | 5,586,934 | 169,705 (3.04) | 18,328 (10.80) | Ratnasingham and Hebert [18] |
| GenBank | Eukaryotes | GenBank | Keyword search | 1,933,547 | 160,061 (8.28) | 17,943 (11.21) | NCBI Resource Coordinators [20] |
| Midori | Metazoans | GenBank | Keyword search | 927,386 | 131,988 (14.23) | 14,057 (10.65) | Machida et al. [43] |
| db_COI_MBPK | Eukaryotes | EMBL, BOLD | In silico ecoPCR + custom R script | 188,975 | 48,853 (25.85) | 6,844 (14.01) | Wangensteen and Turon [39] |
| CRUX_CO1 | Eukaryotes | EMBL, GenBank | CRUX (in silico ecoPCR + BLAST) | 1,401,802 | 127,422 (9.10) | 15,737 (12.35) | Curd et al. [41] |
| MARES_BAR | Marine eukaryotes | GenBank, BOLD | Keyword search | 1,224,187 | 61,123 (4.91) | 17,884 (29.26) | This data descriptor [38] |
| MARES_NOBAR | Marine eukaryotes | GenBank, BOLD | Keyword search | 1,491,691 | 71,499 (4.79) | 19,154 (26.79) | This data descriptor [38] |

  1. The BOLD and GenBank reference databases were built using Steps 1 and 2 of the bioinformatic pipeline (Fig. 1). ‘BOLD’ was generated by retrieving all COI sequences available from the BOLD repository. ‘GenBank’ was generated with a keyword search for Eukaryota and COI synonyms. ‘Unique species’ is the number of taxa remaining after a quality control procedure that retains only unique, fully identified taxa with binomial species names. ‘% Unique species’ was calculated using the number of unique species as the numerator and the total number of sequences as the denominator. ‘Marine species’ is the number of unique species in each database that appear in the World Register of Marine Species (WoRMS) or AlgaeBase [27]. ‘% Marine species’ was then calculated using the number of marine species as the numerator and the number of unique species as the denominator.
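
The two percentage columns can be recomputed from the counts as a quick check. The following is a minimal sketch, not part of the MARES pipeline: the counts are copied from three rows of Table 1, and the names `TABLE_1_COUNTS`, `percent` and `table_percentages` are hypothetical helpers introduced here for illustration. The marine percentages in the table are reproduced when the unique-species count is used as the denominator.

```python
# Counts copied from Table 1: (sequences, unique species, marine species).
# Only three rows are included to keep the sketch short.
TABLE_1_COUNTS = {
    "BOLD": (5_586_934, 169_705, 18_328),
    "GenBank": (1_933_547, 160_061, 17_943),
    "MARES_NOBAR": (1_491_691, 71_499, 19_154),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage rounded to 2 decimals."""
    return round(100 * numerator / denominator, 2)


def table_percentages(sequences: int, unique_species: int, marine_species: int):
    """Recompute both percentage columns for one database row.

    % unique species = unique species / total sequences
    % marine species = marine species / unique species
    """
    return (
        percent(unique_species, sequences),
        percent(marine_species, unique_species),
    )


if __name__ == "__main__":
    for name, counts in TABLE_1_COUNTS.items():
        pct_unique, pct_marine = table_percentages(*counts)
        print(f"{name}: {pct_unique:.2f}% unique, {pct_marine:.2f}% marine")
```

For the BOLD row this gives 3.04 and 10.80, matching the values reported in the table.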