Background & Summary

Macrobenthic invertebrates, animals larger than 1 mm, are key components in environmental monitoring and are extensively used for ecological status assessment of marine ecosystems worldwide because of their sensitivity to natural and anthropogenic disturbances1,2,3,4,5,6,7. Anthropogenic disturbances such as fisheries, sand extraction, pollution and shipping can impact growth, mortality, dispersal and recruitment of macrobenthic invertebrates, which in turn will affect ecosystem structure and function, along with their resilience8,9,10. Macrobenthic invertebrates are among the key obligatory components of biological monitoring surveys implemented in numerous countries in support of environmental directives, such as the European Union’s Marine Strategy Framework Directive (MSFD 2008/56/EC) and the Environmental Impact Assessment Directive (EIAD 2014/52/EU). Furthermore, for ecosystem health assessment, often ecological indices, such as the AZTI’s Marine Biotic Index (AMBI) and Shannon-Wiener diversity index (H’), are applied to macrobenthic communities, and these mostly require species-level identifications2.

Taxonomic identifications of macrobenthic invertebrates for routine assessments in marine areas, including the North Sea, have to this day been carried out almost entirely with morphology-based methods. This approach is time-consuming, costly and skill-dependent, which can create a bottleneck (sampled vs processed specimens) in processing benthic samples for e.g. status assessments11,12. Moreover, species-level identifications are often hindered, either because morphology-based identifications are difficult and require specialised expertise (especially in groups such as Bryozoa, Hydrozoa, and Nemertea), or because specimens get damaged during sampling and processing and lose key taxonomic characters. Furthermore, species-level identifications can be extremely difficult or nearly impossible when dealing with non-adult stages such as juveniles, larvae or eggs12. The increasing reports of cryptic species, even among many common macrobenthic species, complicate morphology-based species identifications even further13. Finally, non-indigenous species resembling native species can be overlooked in routine rapid monitoring assessments14.

DNA-based approaches, such as DNA metabarcoding, have the potential to help tackle many of the limitations encountered by the morphology-based approach15,16,17,18,19. DNA metabarcoding appears to be more cost- and time-effective, does not require taxonomic experts for species identification and can detect non-indigenous, rare, or even undiscovered species that can go unnoticed with conventional methods14,20,21. Instead of the specimens being identified one by one morphologically, DNA is extracted from the total community, and a small fragment of the genome is amplified through PCR22. The resulting amplicons are sequenced using high-throughput sequencing and the sequences produced are processed through bioinformatic pipelines23. However, the potential power of DNA metabarcoding is currently limited mainly by the considerable endeavour needed to build comprehensive and reliable taxonomic sequence reference libraries that are required for matching DNA sequences to species names15,24,25. To ensure a high quality reference library, sequences must have a priori curated taxonomic information, and are preferably restricted to a list of species of the study area as taxonomic misassignment increases with geographic distance25,26. Most studies re-use sequences obtained from public sequence repositories, with the most common being NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/)27,28 and Barcode of Life Data System (BOLD; http://www.boldsystems.org/)29. Although the databases’ importance is unquestionable, there is a significant percentage of sequence data without quality control and taxonomic validation that could lead to misleading results25,30,31,32.

The North Sea is amongst the most heavily human-impacted marine areas worldwide33,34,35 (Fig. 1). At the same time, the North Sea is also one of the best-studied and most data-rich marine areas in the world34, and it is routinely monitored by several countries organized in OSPAR (Convention for the Protection of the Marine Environment of the North-East Atlantic) and ICES (International Council for the Exploration of the Sea). The management of such a system with many anthropogenic pressures requires timely and efficient monitoring approaches. Therefore, the Interreg project GEANS (Genetic Tools for Ecosystem Health Assessment in the North Sea Region), a transnational project among nine institutions across the North Sea, aimed to implement accurate, fast, cost-effective DNA-based tools in routine biomonitoring of the North Sea. To this end, GEANS deemed it essential to develop a curated DNA reference library based on mitochondrial cytochrome c oxidase subunit I (COI) for the North Sea macrobenthos (mainly soft bottom) in support of the routine monitoring programs in the North Sea. The choice of the COI marker was driven by (i) the marker’s taxonomic resolution, which permits species discrimination, identification and discovery in most of the marine invertebrate groups15,36; (ii) the vast amount of data already available as reference in the collaborators’ labs and in public repositories27,29 that could be used for cross-checking; and (iii) the consistent use of COI in barcoding species and especially the 5′ end (COI-5P), the region that can be amplified using universal DNA-barcoding primers, such as LCO1490/HCO219837 and their variations developed in recent years38. Furthermore, prior national and international initiatives have demonstrated the general effectiveness of DNA barcoding for various marine invertebrate groups in the North Sea, such as Mollusca39, Echinodermata40 and Crustacea41,42.

Fig. 1

Map showing the collection localities of the barcoded specimens included in the GEANS reference library.

The aim of the present work was to create a curated DNA barcode reference library for the macrobenthic invertebrates of the North Sea (with priority to soft bottoms) by: (i) producing new COI reference sequences from a targeted region-defined species list; (ii) assessing and curating COI data already available to the collaborators’ labs; and (iii) providing a workflow for the creation of a curated reference library (Fig. 2).

Fig. 2

Simplified overview of curation workflow for the GEANS reference library. Inspired by Collins et al. 2021 (logos and images are public domain and were acquired from https://www.phylopic.org, https://www.ncbi.nlm.nih.gov, https://boldsystems.org, https://www.keyence.eu, https://www.deutsche-meeresforschung.de).

The GEANS reference library comprises a total of 4005 COI-5P barcode sequences from 732 macrobenthic taxa (715 identified to species level), assigned to 764 BINs (Barcode Index Number43), which in turn were distributed over 15 phyla, 29 classes, 92 orders, 333 families, and 537 genera (Fig. 3; GEANS Reference Library44). Compared to the number of macrobenthic species present in the North Sea (2514 species, North Sea species list44), the reference library covers over 29% of macrobenthos species diversity. Of the total number of taxa barcoded and identified to species level (715), 77 correspond to NIS (GEANS targeted species list44). A total of 1714 new DNA barcodes were generated through this study, of which 173 belonged to 62 species barcoded for the first time (Fig. 4; Table 1). The number of individuals per species ranged from 1 to 158, with 346 species (48%) represented by fewer than three individuals, 272 of which were represented by only a single specimen. Arthropoda was the best-represented phylum in number of sequences in the library, with 1886 (47%; Fig. 3; Table 1) belonging to 246 species (Figs. 4, 5). Annelida, although represented by a low number of sequences (358), were well represented in number of species (126 species, 18% of the total number of species barcoded within GEANS, Fig. 5). Among all sequenced groups, Echinodermata had the highest barcode coverage, with 93% of all Echinodermata species included in the GEANS target checklist being barcoded (corresponding to 40 species; Fig. 5) and 48% (40 species) when compared to the whole North Sea fauna (84 species; Fig. 4). In contrast, Bryozoa had the lowest barcode coverage, with 50% of the total number of Bryozoa species in the checklist but only 8% when compared with the total North Sea fauna (Figs. 4, 5).
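The percentages in this paragraph can be re-derived from the counts it quotes. The following is a minimal Python sketch of that arithmetic, not part of the GEANS workflow; the helper `percent` and the dictionary labels are hypothetical names chosen for illustration. Note that the text does not state which numerator underlies the "over 29%" figure: using the 732 taxa gives about 29.1%, whereas using only the 715 species-level taxa gives about 28.4%.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the paragraph above.
NORTH_SEA_SPECIES = 2514   # North Sea species list
LIBRARY_TAXA = 732         # all barcoded taxa
LIBRARY_SPECIES = 715      # taxa identified to species level
TOTAL_SEQUENCES = 4005     # COI-5P barcodes in the library

coverage = {
    # Which numerator the paper used is an assumption; both are shown.
    "taxa vs North Sea species": percent(LIBRARY_TAXA, NORTH_SEA_SPECIES),
    "species vs North Sea species": percent(LIBRARY_SPECIES, NORTH_SEA_SPECIES),
    "Arthropoda share of sequences": percent(1886, TOTAL_SEQUENCES),
    "Annelida share of species": percent(126, LIBRARY_SPECIES),
    "Echinodermata vs North Sea fauna": percent(40, 84),
    "species with fewer than three specimens": percent(346, LIBRARY_SPECIES),
}

for label, value in coverage.items():
    print(f"{label}: {value:.1f}%")
```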

Fig. 3

Taxonomic composition of the 4005 sequenced invertebrate marine specimens included in the GEANS DNA reference library (ds-GEANS1).

Fig. 4

Barcode coverage of marine macrobenthic species of the North Sea in the GEANS DNA reference library. Numbers on bars refer to the species barcoded in comparison to the North Sea species.

Table 1 Number of sequenced specimens, genera, species and the corresponding BINs per phylum included in the GEANS reference library.
Fig. 5

Barcode coverage of marine macrobenthic species of the North Sea included in the GEANS target list. Numbers on bars refer to the species successfully barcoded, species with specimens present but with unsuccessful barcoding, and species with no specimens acquired.

Methods

The curated DNA barcode reference library (GEANS Reference Library44) presented here for North Sea macrobenthos was constructed based on a seven-step workflow (Fig. 2) that generated a diverse set of validated data, starting with a targeted species checklist (GEANS targeted species list) restricted mainly to the southern North Sea (Fig. 1).

Targeted species checklist and North Sea species list

The nine GEANS partners (originating from seven countries: Belgium, Denmark, Germany, Netherlands, Norway, Sweden, United Kingdom) provided regional species lists based on species encountered in their long-term morphology-based monitoring data. These lists were concatenated into a target list (GEANS targeted species list) of 1016 marine macrobenthic species (119 non-indigenous species, NIS), whose taxonomy was checked (e.g. synonyms removed, validity of species names verified). As such, the majority of these species occur in the areas of the North Sea where the GEANS partners performed the case studies for testing the effectiveness of metabarcoding for specific monitoring questions14,45,46. The targeted checklist served as the basis of the GEANS reference library. To put the targeted list in a wider North Sea perspective, a North Sea macrobenthic species list44 was generated. This North Sea species list was created by extracting macrobenthic data from EurOBIS in a similar manner to Herman et al.47. It was then supplemented with the list of Zettler et al.48 and cross-checked against the list provided by WoRMS for the North Sea, which in turn was verified based on the relevant literature.
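The concatenation and synonym clean-up described above could be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the procedure the partners actually used: the function `merge_species_lists`, the partner labels and the small hand-written synonym map are all hypothetical, and a real check would consult WoRMS rather than a hard-coded dictionary.

```python
def merge_species_lists(partner_lists, synonyms):
    """Merge regional lists into one de-duplicated, sorted target list.

    partner_lists: dict mapping a partner label to a list of species names.
    synonyms: dict mapping an unaccepted name to its accepted name
              (hand-written here; an assumption standing in for a WoRMS check).
    """
    merged = set()
    for names in partner_lists.values():
        for name in names:
            cleaned = " ".join(name.split())            # normalise whitespace
            merged.add(synonyms.get(cleaned, cleaned))  # resolve synonyms
    return sorted(merged)


# Hypothetical input: two partners reporting overlapping species,
# one of them under older names.
partner_lists = {
    "partner_a": ["Hydrobia ulvae", "Pagurus bernhardus"],
    "partner_b": ["Peringia ulvae", "Nereis  diversicolor"],
}
synonyms = {
    "Hydrobia ulvae": "Peringia ulvae",
    "Nereis diversicolor": "Hediste diversicolor",
}

target_list = merge_species_lists(partner_lists, synonyms)
print(target_list)
```

Using a set before sorting means a species reported by several partners, or under several names, is counted only once in the final list.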

Specimen collection, and identification of specimens

Specimens were collected from the North Sea during various research expeditions that took place in the years 2019–2021 using Van Veen grabs, boxcorers, and dredges (ring dredge, Triple-D dredge). Sampling was conducted by three GEANS partners, the German Centre for Marine Biodiversity Research (DZMB), Naturalis Biodiversity Center (Naturalis), and Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), with research vessels RV Senckenberg, RV Pelagia, RV Belgica A956, RV Simon Stevin and GeoSurveyor XI. Subsequently, the same partners performed the morphological and genetic analyses. Following their collection, bulk samples or separated animals were fixed in precooled 96% or 99.8% ethanol. For all samples and specimens collected by DZMB, the ethanol was decanted after 24 hours and replaced with new 96–99.8% EtOH to guarantee sufficient ethanol concentration for preservation of high-quality DNA; samples were subsequently stored at −20 °C in one or more of the collaborative laboratories. In the laboratory, samples were sorted and identified to species level by taxonomic experts. The taxonomic status of each species was validated based on the World Register of Marine Species (www.marinespecies.org). For each species, when possible, at least three voucher specimens were archived. Additional specimens were provided by the Gothenburg Natural History Museum, Sweden, as well as by the German authority Landesbetrieb für Küstenschutz Nationalpark und Meeresschutz Schleswig-Holstein.

Barcoding data collection

DNA extraction, amplification, and sequencing

Total genomic DNA was extracted from animal tissue. At DZMB, for samples where DNA of high quality was expected, the DNA extractions were carried out using 30 μL Chelex (InstaGene Matrix, Bio-Rad) according to the protocol of Estoup et al.49, and the extract was used directly as DNA template for PCR. For samples where DNA of low quality was expected, the Monarch Genomic DNA Purification Kit was used following the manufacturer’s instructions. At ILVO, DNA was extracted using the DNeasy Blood & Tissue kit (Qiagen) following the manufacturer’s protocol. The concentration of the DNA was determined using the Quantus Fluorometer with the QuantiFluor dsDNA System (Promega). At Naturalis, DNA was extracted using the NucleoMag 96 Tissue kit (Macherey-Nagel) on the KingFisher (Thermo Scientific) according to the manufacturer’s protocol. DNA extractions were stored at −20 °C. A 658 bp fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene, which is the standard barcoding marker for animals, was amplified by polymerase chain reaction (PCR). Amplifications at DZMB were performed using AccuStart PCR SuperMix (ThermoFisher Scientific) in a 25 μL volume using a standardised protocol (Table 2). All PCR products were purified using ExoSap-IT (ThermoFisher Scientific). Amplifications at Naturalis were performed using Phire II Hotstart (Thermo Scientific) in a 25 μL volume (Table 2). For the COI amplification, the degenerate forward primer jgLCO1490 and reverse primer jgHCO219838, tailed with M13F and M13R-pUC, respectively, were used by both DZMB and Naturalis. DZMB also used the Echinodermata-specific forward primer LCOech1aF150 and a Polychaeta-specific primer pair51, and additionally tested a universal pair that amplifies a shorter barcode region11. Amplifications at ILVO were performed with the LCO1490 and HCO2198 primers in a 40 μL volume (Table 2). PCR products produced by ILVO were purified using the Wizard® SV Gel and PCR Clean-Up System (Promega).
Purified PCR products from DZMB and ILVO were sequenced by Sanger sequencing in both directions at Macrogen Europe BV (Amsterdam, The Netherlands), whereas Naturalis fragments were sequenced at BaseClear BV (Leiden, The Netherlands).

Table 2 PCR amplification conditions for COI gene in each research institute.

Existing barcodes in collaborators’ databases

The collaborators’ internal databases were mined for barcode sequences of macrobenthic animals collected from the North Sea. Only barcodes above 500 bp were considered, unless shorter fragments were the only ones available for a targeted species. Specifically, the DZMB supplemented the GEANS reference library with COI sequences from past barcode initiatives such as the “Molecular taxonomy and DNA barcoding of marine organisms (metazoa) of the North Sea”39,40,41,42. These sequences correspond to specimens or tissue archived in DZMB’s collections. The forward and reverse sequence chromatograms for each specimen were inspected, assembled, and edited using Geneious v.9.1.7 (www.geneious.com52). The COI sequences were aligned using MAFFT v7.30853 with the G-INS-i algorithm, and the alignments were then edited manually.
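The length rule stated above (keep barcodes longer than 500 bp, but fall back to shorter fragments when nothing else exists for a species) can be expressed compactly. The sketch below is a hedged Python illustration only; the function `select_barcodes`, the record layout and the placeholder species and sequence IDs are assumptions, not the partners' actual database tooling.

```python
from collections import defaultdict

MIN_LENGTH = 500  # bp threshold stated in the text


def select_barcodes(records, min_length=MIN_LENGTH):
    """Apply the length rule per species.

    records: iterable of (species, sequence_id, sequence) tuples.
    Returns a list of (species, sequence_id) pairs that are retained.
    """
    by_species = defaultdict(list)
    for species, seq_id, seq in records:
        by_species[species].append((seq_id, seq))

    kept = []
    for species, entries in by_species.items():
        # Length is counted without alignment gaps.
        long_enough = [
            (seq_id, seq) for seq_id, seq in entries
            if len(seq.replace("-", "")) > min_length
        ]
        # Fall back to the shorter fragments only if nothing passes.
        chosen = long_enough if long_enough else entries
        kept.extend((species, seq_id) for seq_id, _ in chosen)
    return kept


# Synthetic placeholder records (not real barcodes).
records = [
    ("Species a", "s1", "A" * 658),
    ("Species a", "s2", "A" * 300),   # dropped: a longer barcode exists
    ("Species b", "s3", "A" * 420),   # kept: only fragment available
]
print(select_barcodes(records))
```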

Data Records

The GEANS Reference Library (summary information), the GEANS Targeted Species List, the North Sea species list and the Neighbour Joining trees are available in Figshare44. The DNA barcodes and specimen photos corresponding to the newly produced barcodes are also deposited in Figshare44. In addition, barcodes produced during GEANS are available in GenBank (BioProject PRJNA123682254), and the data are available in BOLD through the dataset DS-GEANS155 (dx.doi.org/10.5883/DS-GEANS1). Each COI barcode included in the GEANS reference library is accompanied by the following mandatory information: 1) sample ID; 2) specimen taxonomic identification and classification; 3) collection date; 4) collection coordinates; 5) storing institution; 6) when possible, one photograph of the specimen (Fig. 6) and photos of the key diagnostic features; 7) name of taxonomic expert; 8) sequence chromatograms; 9) museum ID when specimens are archived in museum collections. Finally, the GEANS reference library also includes: (1) voucher specimens; (2) tissue samples; (3) total DNA extractions. A specimen was considered a species reference when molecular and morphological assessments agreed. The library follows the barcode data standard requirements29,32,36,56. Samples and extractions are available at the partner institutes (DZMB, Naturalis, ILVO).

Fig. 6

GEANS reference library online gallery of photo vouchers of sequenced specimens identified by taxonomic experts. (A) Hippasteria phrygiana (Parelius, 1768); (B) Pagurus bernhardus (Linnaeus, 1758); (C) Macoma balthica (Linnaeus, 1758); (D) Peringia ulvae (Pennant, 1777); (E) Loimia ramzega Lavesque et al. 2017; (F) Psammechinus miliaris (P.L.S. Müller, 1771); (G) Ampelisca brevicornis (A. Costa, 1853); (H) Doris pseudoargus Rapp, 1827; (J) Diastylis bradyi Norman, 1879; (I) Magelona johnstoni Fiege, et al., 2000; (K) Euspira nitida (Donovan, 1803); (L) Pilumnus hirtellus (Linnaeus, 1761); (M) Lekanesphaera rugicauda (Leach, 1814). Scales: 1 cm (A, B, D, F); 1 mm (K, M); 2 mm (G, H, J, I, L); 5 mm (E). Photos by: V. Borges (A); M. Christodoulou (B, D, F); H. Hillewaert (C, E, G, J, I, K); GiMaRIS (H, L); W. Stamerjohanns (M).

Technical Validation

Each institution performed independent morphological identifications prior to the genetic identification. When disagreements were found, they were listed and the voucher specimens or the photos were re-examined to verify the original identifications. Obvious mistakes in identification or curation (e.g., mixing of photos) were corrected; in all other cases the mismatch between genetic and morphological identification was recorded as such. Finally, the species names were updated to the current taxonomy based on the World Register of Marine Species (WoRMS). Curation cycles were performed at regular intervals (Fig. 2). In addition to morphological validation, all barcodes were translated into amino acids to check for stop codons and to detect the presence of nuclear mitochondrial pseudogenes (NUMTs). The obtained COI sequences were initially compared with the GenBank nucleotide database using BLASTN57 to confirm the phylum identity (Fig. 2). Additionally, the BOLD database was used for verification once the barcodes had been uploaded to it, since BOLD contains more barcode sequences than GenBank (including unpublished barcodes). For each taxonomic group (phylum or order, depending on the number of sequences), Neighbour Joining (NJ) analysis based on p-distances with 1000 non-parametric bootstrap replicates was performed using the software MEGA v.1158, and any irregularities (possible contaminations) were removed from the library (trees are available in Figshare44). Sequences were considered to belong to the same taxon if sequence identity was ≥97.5%.
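Two of the checks described above, the stop-codon screen and the identity threshold, lend themselves to a short illustration. The Python sketch below is a simplified stand-in under explicit assumptions and does not reproduce the authors' tools: it treats only TAA and TAG as stop codons (as in the invertebrate mitochondrial code, NCBI translation table 5; other tables apply to some phyla), and it computes identity as 1 minus the p-distance over aligned positions without gaps or N, which is one plausible reading of the 97.5% criterion. The function names are hypothetical.

```python
# Assumption: invertebrate mitochondrial code (NCBI table 5), where
# TGA encodes Trp and AGA/AGG encode Ser, leaving TAA and TAG as stops.
STOP_CODONS = {"TAA", "TAG"}


def frames_without_stops(seq):
    """Return the forward reading frames (0, 1, 2) free of stop codons."""
    seq = seq.upper().replace("-", "")
    open_frames = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            open_frames.append(frame)
    return open_frames


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap or N in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def same_taxon(a, b, threshold=0.975):
    """Apply the identity threshold quoted in the text."""
    return 1.0 - p_distance(a, b) >= threshold
```

A barcode with no stop-free reading frame would be a candidate NUMT or sequencing error and would warrant manual inspection rather than automatic removal.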

Usage Notes

The GEANS DNA reference library offers a comprehensive, publicly available barcode dataset for North Sea macrobenthos, hosted in BOLD (DS-GEANS1; dx.doi.org/10.5883/DS-GEANS155). This resource enables specimen identification through barcoding and metabarcoding, thereby greatly facilitating macrobenthic biodiversity assessments using molecular tools in the North Sea region. The DNA barcode reference library presented in this study includes around 30% of North Sea macrobenthic species, and aims to complement and facilitate the morphological identification of species through barcoding or metabarcoding.

From the total number of targeted species in the checklist (1016 species, GEANS targeted species list), we were unable to recover barcode sequences for 215 species (21%), and were not successful in finding specimens for an additional 86 species (8%). The phylum with the lowest amplification success was Annelida-Polychaeta (37%, Fig. 5), followed by Arthropoda (18%) and Mollusca (14%).

The majority of BINs allocated to the species within the GEANS dataset were considered concordant (i.e., one BIN = one species), with 684 species (96% of the species) each corresponding to a single BIN (GEANS Reference Library). A total of 31 species were assigned to more than one BIN (72 BINs, 4% of the species). Although a larger number of BINs were originally found to be discordant (BINs shared by more than one species), a subsequent validation revealed that this was due mainly to misidentifications. A small number of shared BINs are most likely due to the presence of unvalidated or erroneously identified data in BOLD rather than to wrong records in our dataset. However, some closely related species may not be distinguishable by COI alone and may therefore appear to share BINs. At the same time, species found to hold more than one BIN could indicate the presence of cryptic species (e.g., Astropecten irregularis, Crepidula fornicata, Hediste diversicolor).
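The three situations distinguished above (one BIN per species, one species with several BINs, one BIN shared by several species) can be tallied from a simple species-to-BIN table. The following Python sketch shows one way to do so; the function `bin_summary`, the category labels and the placeholder species and BIN identifiers are hypothetical, and the sketch only sees the records passed to it, whereas BOLD assigns BINs against all of its data.

```python
from collections import defaultdict


def bin_summary(records):
    """Classify species by their BIN assignments.

    records: iterable of (species, bin_id) pairs, one per barcode.
    Returns a dict with three sorted lists of species names.
    """
    bins_per_species = defaultdict(set)
    species_per_bin = defaultdict(set)
    for species, bin_id in records:
        bins_per_species[species].add(bin_id)
        species_per_bin[bin_id].add(species)

    # BINs containing more than one species are discordant.
    shared_bins = {b for b, sp in species_per_bin.items() if len(sp) > 1}

    summary = {"concordant": [], "multiple_bins": [], "shared_bin": []}
    for species, bins in sorted(bins_per_species.items()):
        if bins & shared_bins:
            summary["shared_bin"].append(species)      # possible misidentification
        elif len(bins) > 1:
            summary["multiple_bins"].append(species)   # possible cryptic diversity
        else:
            summary["concordant"].append(species)      # one BIN = one species
    return summary


# Placeholder records (not real BINs).
records = [
    ("Species a", "BIN-1"), ("Species a", "BIN-1"),
    ("Species b", "BIN-2"), ("Species b", "BIN-3"),
    ("Species c", "BIN-4"), ("Species d", "BIN-4"),
]
print(bin_summary(records))
```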

The library is expected to significantly expand the reach and accuracy of DNA metabarcoding studies in the North Sea, and its structure allows for continued growth to better capture the diversity of the North Sea fauna.