Background & Summary

Fish are the most abundant and diverse group of vertebrates on Earth, and they include the lineage that gave rise to modern tetrapods. Studying fish diversity and distribution is critical to understanding biodiversity in aquatic ecosystems. Fish also play crucial roles in marine food webs: changes in fish populations can alter ecosystems by affecting their predators, including marine birds and mammals, and by disrupting nutrient cycling1.

Moreover, fish are vital to human life, with social, cultural, recreational, and economic significance1,2. They are a global protein source, particularly in developing countries, and contribute substantially to global and local economies3. In its 2022 report to Congress, the National Oceanic and Atmospheric Administration (NOAA) stated that recreational and commercial marine fisheries contributed $321 billion to the US economy and supported 2.3 million jobs. According to a 2018 survey conducted by the Suffolk County Department of Economic Development and Planning, the marine sector on Long Island, New York (USA), supported approximately 34,000 jobs with an estimated $1 billion in wages. Additionally, the NOAA Fisheries database indicates that in 2022, the Montauk and Shinnecock ports landed a combined 12.5 million pounds of fish, generating $24 million in revenue.

Understanding fish population dynamics is essential to maintaining these important ecological, social, and economic benefits. Considerable resources are devoted to monitoring fish, including regular surveys (e.g., refs. 4,5,6,7) and tagging and tracking studies (e.g., refs. 8,9,10,11). These efforts primarily gather data on adult fish, which are used to develop population dynamics models that in turn inform stock assessments and management guidelines. However, these models often lack data on early life stages, especially the distribution and abundance of fish embryos. This knowledge gap contributes to uncertainty in stock assessments12, hindering effective management of marine fish species.

Fish have complex life histories. Many species begin their life cycle as plankton, residing within 200 m of the ocean surface and having limited swimming capability13. Their small size and limited mobility make planktonic fish embryos and larvae (collectively referred to as ichthyoplankton) vulnerable to environmental conditions, and survival through these fragile early stages can determine overall population dynamics13. As fish mature into juveniles, some shift from pelagic waters to the benthos14. Fish may also migrate long distances to spawn or to recruit into adult habitats, a strategy that compensates for the differing habitat requirements of a species' successive life stages14. Therefore, monitoring both ichthyoplankton and adult fish is important for a complete understanding of fish population dynamics.

However, studying the early life stages of fish, especially embryos, has historically been challenging because of the high degree of morphological similarity among species. Our inability to identify fish embryos visually to the species level leads to extensive gaps in knowledge of their spatial and temporal distribution and abundance, which in turn adds to uncertainty in overall estimates of population dynamics.

DNA barcoding offers an effective solution to this challenge, as it allows easy and reliable species-level identification of ichthyoplankton. A specific genetic region, most commonly the mitochondrial cytochrome c oxidase subunit 1 (CO1) gene, is amplified using the polymerase chain reaction (PCR), and the resulting sequence is compared with reference databases such as the Barcode of Life Data System (BOLD). As of October 2024, over 19.3 million samples have been identified through this database, including 524,359 bony fish (class Actinopterygii).
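BOLD's identification engine performs its own alignment and scoring, so the following is only a simplified illustration of the underlying idea: percent identity between a query barcode and a pre-aligned reference, with an identification accepted only at or above a cutoff. The 99% default matches the threshold applied in this study; the gap and `N` handling, the function names, and the toy sequences are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def percent_identity(query: str, reference: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence
    are skipped; this handling is an assumption made for illustration.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q in "-N" or r in "-N":
            continue
        compared += 1
        matches += q == r  # True counts as 1
    return 100.0 * matches / compared if compared else 0.0


def best_match(query: str, references: Dict[str, str],
               threshold: float = 99.0) -> Optional[Tuple[str, float]]:
    """Best-scoring reference as (name, identity), or None if below threshold."""
    if not references:
        return None
    name, identity = max(
        ((n, percent_identity(query, s)) for n, s in references.items()),
        key=lambda pair: pair[1],
    )
    return (name, identity) if identity >= threshold else None
```

As a worked check, if all ~658 positions of the barcode are compared, a 99% cutoff tolerates at most six mismatches (652/658 ≈ 99.09%), whereas seven mismatches (651/658 ≈ 98.94%) fall below it.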

However, although DNA barcoding of ichthyoplankton can fill essential gaps in our understanding of fish early life history, the technique has not been widely applied to samples collected on the Northeast US continental shelf. In this region, ichthyoplankton have been studied for decades15,16, including through NOAA's plankton surveys, which have been conducted regularly since the 1970s. These studies typically rely on visual methods and therefore do not provide species-level information on embryos, which are virtually impossible to distinguish morphologically. Lewis et al. (2016) used DNA barcoding to identify 1,495 (93.26%) embryos collected from 2002 to 2012 (ref. 17), but this pilot project has not been continued.

We therefore present a DNA barcoding dataset that contributes key missing information on ichthyoplankton distribution and abundance. We sampled across the New York Bight subregion of the Northeast US continental shelf, from the south shore of Long Island to the continental shelf break (Fig. 1), generating a large dataset of species-level identifications of the early life stages of fish. Concurrently with biological sampling, we collected environmental data using a conductivity, temperature, and depth (CTD) unit. Our dataset contains results from seasonal sampling from 2021 to 2023. We isolated 2,294 individual ichthyoplankton, including 1,344 embryos, nearly doubling the number of species-level identifications of fish embryos available for this commercially important region.

Fig. 1

Map of the New York Bight region surveyed between spring 2021 and fall 2023. The pie charts reflect our sampling effort: circle diameter represents how often a location was sampled, and the colored slices show the seasonal breakdown. The westernmost transect, part of the initial survey design, served as the starting point of survey cruises; a slight change in the survey design later moved the stations eastward.

This dataset can help address several critical questions about species presence and the spatial and temporal dynamics of fish spawning, and it can be used to detect shifting species ranges. Relationships between ichthyoplankton abundance and abiotic factors, such as temperature and stratification, as well as potential impacts of climate change, can also be explored with our data. These results can inform the development of the conservation measures needed to sustain local fish populations and preserve ecosystem stability. Our dataset is especially significant because information on fish embryos in the New York Bight is generally lacking, despite the importance of fishing to local economic and recreational activities.

Methods

Sampling protocols

From 2021 to 2023, we sampled fixed stations in the New York Bight subregion of the Northeast US continental shelf, which spans from the mouth of the Hudson River to Montauk Point and extends to the continental shelf break (Fig. 1). Cruises were conducted seasonally, although adverse weather, mechanical difficulties, and COVID-related disruptions sometimes shortened sampling. In 2023, the station design was altered slightly and winter sampling was discontinued. Ichthyoplankton samples were collected by vertical tows of a 60 cm diameter, 333 μm mesh plankton net from a maximum depth of 25 m. Collections were permitted by the New York State Department of Environmental Conservation (scientific permit #1145). After collection, samples were preserved in 95% ethanol. At each sampling station, we also collected water column profiles of temperature, salinity, fluorescence, dissolved oxygen (DO), photosynthetically active radiation (PAR), and pH using a Seabird 911+ CTD with a rosette carrying eight 10 L Niskin bottles. These data were processed using Seabird's data processing software.
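Converting counts from these vertical tows into densities requires the volume of water filtered. No flowmeter is mentioned, so the sketch below uses the nominal volume of a cylinder with the net's mouth area and the tow depth as its height; the efficiency factor, the function names, and the choice of nominal volume are assumptions. At shallow stations the actual tow depth, not the 25 m maximum, would be used.

```python
import math


def vertical_tow_volume(mouth_diameter_m: float, tow_depth_m: float,
                        efficiency: float = 1.0) -> float:
    """Nominal volume (m^3) filtered by a vertical net tow.

    Treats the sampled water column as a cylinder with the net's mouth area
    and the tow depth as its height, scaled by an assumed filtration
    efficiency (1.0 = no clogging or net avoidance).
    """
    radius_m = mouth_diameter_m / 2.0
    return math.pi * radius_m ** 2 * tow_depth_m * efficiency


def density_per_m3(count: int, volume_m3: float) -> float:
    """Individuals per cubic metre of water filtered."""
    if volume_m3 <= 0:
        raise ValueError("filtered volume must be positive")
    return count / volume_m3


# 60 cm mouth diameter and 25 m maximum tow depth, as described in the text.
volume = vertical_tow_volume(0.60, 25.0)
print(f"filtered volume: {volume:.2f} m^3")  # filtered volume: 7.07 m^3
```

Under these assumptions a full-depth tow filters roughly 7 m³, so each individual found corresponds to about 0.14 individuals per m³.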

Ichthyoplankton analyses

Samples were stored at 4 °C prior to analysis and then diluted to a 2:1 ethanol:biovolume ratio. Each sample was sorted in its entirety under a Nikon SMZ1270 stereo zoom microscope. Ichthyoplankton were isolated and imaged using a DFK 33UX174 camera (The Imaging Source) and IC Imaging Control software. We used ImageJ to measure the diameter of embryos and the body length of larvae. For larvae, we report notochord length, measured from the tip of the snout to the end of the tail, excluding the finfold. For bent larvae, we estimated length by measuring adjacent segments along the spine. No measurement was recorded when a specimen was severed or folded onto itself.
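Measuring a bent larva segment by segment amounts to summing the straight-line distances between successive points placed along the body. A minimal sketch, assuming the points are (x, y) pixel coordinates ordered from snout to tail and a hypothetical calibration factor `um_per_pixel` (for example, from an image of a stage micrometer); this is conceptually what ImageJ's segmented-line selection measures, but it is not ImageJ code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polyline_length(points: Sequence[Point], um_per_pixel: float = 1.0) -> float:
    """Length along a chain of points (e.g., snout to tail tip), in calibrated units."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # math.dist gives the Euclidean distance between consecutive points (Python 3.8+).
    pixels = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return pixels * um_per_pixel


# A larva bent once: two straight segments of 300 px and 400 px.
bent = [(0.0, 0.0), (300.0, 0.0), (300.0, 400.0)]
print(polyline_length(bent, um_per_pixel=2.5))  # 1750.0
```

Placing more points along a strongly curved body brings the polyline closer to the true spine length; with only the two end points, the result would be the shorter straight-line distance (500 px here).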

After imaging, isolated individuals were placed in a 96-well PCR plate containing 30 μL of 95% ethanol per well, with one well left without a specimen as a negative control. The plate was shipped to the Canadian Centre for DNA Barcoding (CCDB) at the University of Guelph (Canada), where standard protocols were followed, as detailed below.
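Keeping track of which specimen sits in which well is easiest with an explicit layout map. The sketch below is hypothetical: it assumes column-wise filling (A01, B01, ..., H01, A02, ...) and reserves H12 for the negative control, since the text does not state the fill order or which well served as the control. Specimen IDs are placeholders.

```python
from string import ascii_uppercase
from typing import Dict, List, Sequence

ROWS = ascii_uppercase[:8]   # plate rows A-H
COLS = range(1, 13)          # plate columns 1-12


def plate_wells(by_column: bool = True) -> List[str]:
    """All 96 well names (e.g. 'A01') in column-wise or row-wise order."""
    if by_column:
        return [f"{r}{c:02d}" for c in COLS for r in ROWS]
    return [f"{r}{c:02d}" for r in ROWS for c in COLS]


def assign_specimens(specimen_ids: Sequence[str], control_well: str = "H12",
                     by_column: bool = True) -> Dict[str, str]:
    """Map wells to specimen IDs, reserving one well as the negative control."""
    wells = [w for w in plate_wells(by_column) if w != control_well]
    if len(specimen_ids) > len(wells):
        raise ValueError(f"at most {len(wells)} specimens fit on one plate")
    layout = dict(zip(wells, specimen_ids))
    layout[control_well] = "NEGATIVE_CONTROL"
    return layout
```

With one control well, each plate holds at most 95 specimens, so a cruise's catch may span several plates.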

PCR protocols

Samples were digested in a standard lysis buffer containing proteinase K (Thermo Fisher Scientific Inc., Waltham, MA), and DNA was extracted using an automated protocol. The ~658 base pair barcode region of the CO1 gene was amplified using the primer cocktails C_FishF1t1 (VF2_t1: TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC and FishF2_t1: TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC; ref. 18) and C_FishR1t1 (FishR2_t1: CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA and FR1d_t1: CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA; ref. 18) in a PCR mix of 6.25 μL of 10% trehalose, 1.25 μL of 10× PCR buffer, 0.625 μL of MgCl2 (2.5 mM), 0.625 μL of deoxyribonucleotide triphosphates (dNTPs; 10 mM), 0.625 μL of Platinum Taq polymerase (Thermo Fisher Scientific Inc.), 3 μL of H2O, and 1 μL of DNA template. This target region was chosen so that our results align with those of the previous ichthyoplankton barcoding study conducted in our region17. The thermocycling profile consisted of 1 min at 94 °C; 5 cycles of 30 s at 94 °C, 40 s at 55 °C, and 1 min at 72 °C; 35 cycles of 30 s at 94 °C, 40 s at 55 °C, and 1 min at 72 °C; and a final 10 min at 72 °C. PCR products were checked by gel electrophoresis on the E-Gel system (Thermo Fisher Scientific Inc.). Amplicons were sequenced using BigDye Terminator v3.1 chemistry on an Applied Biosystems 3730xl DNA Analyzer (Thermo Fisher Scientific Inc.). Trace files were assembled into contiguous sequences using CodonCode Aligner (CodonCode Corp., Centerville, MA). Species identifications were obtained by comparison with published sequences in BOLD, and the resulting sequences were shared on GenBank in addition to BOLD.
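The tailed primer sequences can be unpacked programmatically. This sketch assumes (from the `_t1` suffix and the sequences themselves, not from the text) that their 5' ends carry the standard M13 forward and reverse sequencing tails. It strips those tails and expands IUPAC degenerate bases (R = A/G, Y = C/T) to count how many distinct oligonucleotides each primer represents. Function names are illustrative.

```python
from itertools import product
from typing import List

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed M13 forward and reverse tails on the 5' end of the "_t1" primers.
M13_TAILS = ("TGTAAAACGACGGCCAGT", "CAGGAAACAGCTATGAC")


def strip_tail(primer: str) -> str:
    """Remove a leading M13 tail, if present, leaving the CO1-binding part."""
    for tail in M13_TAILS:
        if primer.startswith(tail):
            return primer[len(tail):]
    return primer


def expand_degenerate(primer: str) -> List[str]:
    """All concrete sequences encoded by a primer containing IUPAC codes."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


primers = {
    "VF2_t1": "TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC",
    "FishF2_t1": "TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC",
    "FishR2_t1": "CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA",
    "FR1d_t1": "CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA",
}
for name, seq in primers.items():
    core = strip_tail(seq)
    print(name, core, len(expand_degenerate(core)), "variant(s)")
```

Under these assumptions, FR1d_t1 has three degenerate positions and so represents 8 distinct sequences, while the other three cocktail primers are non-degenerate; the same function gives 32 variants for the AquaF2 alternative forward primer.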

In the rare instances when DNA barcoding or sequencing failed, we used alternative approaches for identification. For one plate with a particularly low success rate, we reran PCRs using an alternative forward primer (AquaF2, ATCACRACCATCATYAAYATRAARCC; ref. 18) and processed the fragments and sequences as above. In some cases, sequence information was of low quality but salvageable; in these instances, we used the DNA Subway toolkit19 to clean the sequence and identified the individual via a BLAST search. When PCR and/or sequencing failed entirely and no sequence information was available, we attempted visual identification of larvae, consulting both our own images of larvae that had been successfully identified through BOLD and ref. 15, which details the distinguishing features of Western North Atlantic species throughout larval development. Figure 2 describes our full workflow, from collection to identification. We also note that the elimination of winter sampling should not affect the utility of this dataset, as only a small minority of resident fish species spawn in winter; when we sampled in winter 2022, for example, we isolated only one individual across seven stations.
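The text does not specify how DNA Subway cleans low-quality reads. As an illustration of the general idea only, the sketch below swaps in the modified Mott algorithm, a common way to trim the noisy ends of Sanger reads using per-base Phred quality scores from the trace files. The 0.05 error limit and the function names are assumptions, not DNA Subway's settings.

```python
from typing import Sequence, Tuple


def mott_trim(quals: Sequence[int], limit: float = 0.05) -> Tuple[int, int]:
    """Return half-open (start, end) indices of the highest-quality segment.

    Modified Mott algorithm: each base scores `limit - P(error)`, where
    P(error) = 10 ** (-Q / 10) for Phred quality Q, and the segment with the
    maximum cumulative score is kept (a maximum-subarray search).
    Returns (0, 0) if no base beats the error limit.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += limit - 10 ** (-q / 10)
        if run_sum <= 0:
            # Restart the running segment after a stretch of poor bases.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, quals: Sequence[int], limit: float = 0.05) -> str:
    """Keep only the best-scoring stretch of a read."""
    start, end = mott_trim(quals, limit)
    return seq[start:end]
```

For example, `trim_read("NNACGTNN", [5, 5, 30, 30, 30, 30, 5, 5])` keeps the four Q30 bases and returns `"ACGT"`; the trimmed read would then be used for the BLAST search.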

Fig. 2

Flow chart representing the steps from collection to species-level identification of ichthyoplankton. (A) is a Northern Searobin (Prionotus carolinus) embryo collected during October 2023. (B) is a Gulfstream Flounder (Citharichthys arctifrons) embryo collected during October 2023. These two embryos cannot be distinguished visually but are readily identifiable genetically.

For a small number of closely related species, we made assumptions based on known species ranges. For example, some embryos' sequences matched both Atlantic Menhaden (Brevoortia tyrannus) and Gulf Menhaden (Brevoortia patronus). Our study area overlaps with the range of Atlantic but not Gulf Menhaden, so we assigned these embryos to the Atlantic species (note also that recent genetic evidence does not categorically support the historical species distinction, instead suggesting that the two may be separate populations of a single species20). Similarly, we inferred that embryos were Atlantic, not Gulf, Butterfish (Peprilus triacanthus, not Peprilus burti) and Northern, not Blackwing, Searobin (Prionotus carolinus, not Prionotus rubio). Finally, we concluded that sequences matching both Bullet and Frigate Tuna (Auxis rochei and Auxis thazard) matched reference sequences previously misidentified as Frigate Tuna, and we therefore counted them all as Bullet Tuna. We did not use visual information to distinguish between these closely related species.
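These range-based rules amount to a small lookup table keyed on the set of species a sequence matched equally well. A minimal sketch, assuming each specimen's tied BOLD matches are available as a list of scientific names; the table and function names are hypothetical, and combinations not covered by the stated rules are returned as `None` for manual review rather than resolved automatically.

```python
from typing import Iterable, Optional

# Tied matches (as unordered sets) and the species they were counted as.
RESOLUTIONS = {
    frozenset({"Brevoortia tyrannus", "Brevoortia patronus"}): "Brevoortia tyrannus",
    frozenset({"Peprilus triacanthus", "Peprilus burti"}): "Peprilus triacanthus",
    frozenset({"Prionotus carolinus", "Prionotus rubio"}): "Prionotus carolinus",
    frozenset({"Auxis rochei", "Auxis thazard"}): "Auxis rochei",
}


def resolve(matches: Iterable[str]) -> Optional[str]:
    """Single species name for a set of equally good matches, or None if unresolved."""
    match_set = frozenset(matches)
    if len(match_set) == 1:
        return next(iter(match_set))
    return RESOLUTIONS.get(match_set)
```

Using `frozenset` keys makes the lookup independent of the order in which matches are listed.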

In total, we isolated 2,294 individual ichthyoplankton (1,344 embryos and 950 larvae) from 7 offshore cruises. Of these, 2,175 (95%) were identified to species using one of the techniques described above, representing 50 species in our region and sampling period (Table 1).

Table 1 Summary of sampling and ichthyoplankton findings across research cruises.

Noteworthy findings

Of the 50 species identified, 32 (64%) were also found in an earlier study that used DNA barcoding to identify fish eggs collected across the Northeast US continental shelf, an area that includes but extends beyond our study region17. Of the remaining 18 species, 8 were found exclusively at a shelf break station adjacent to the Slope Sea that we sampled in September 2022. An additional species (Bothus robinsi) was found both at the shelf break in September 2022 and at one other station. Of these nine species found at the shelf break in September 2022, eight were previously recorded in a 1988 study of larval fish in the New York Bight and described as belonging to a 'slope assemblage'16. Notably, the September 2022 cruise occurred between Hurricanes Fiona (Category 4) and Ian (Category 5), suggesting that the transport of these uncommon species may have been storm-related.

Nine taxa were found neither in refs. 16,17 nor at the shelf break in September 2022. Of these, 4 are regularly identified in routine larval sampling by NOAA: Ophidion sp., Opisthonema oglinum, Vinciguerria attenuata, and Rhomboplites aurorubens21. Of the remaining 5, larvae of 4 species (Antigonia capros, Astroscopus guttatus, Rachycentron canadum, and Stenotomus chrysops) are extremely rare in routine offshore ichthyoplankton sampling, with fewer than 10 records in decades of bimonthly effort (Dr. David Richardson, personal communication). Scup (Stenotomus chrysops) is of particular interest because it is a commercially important species whose early life history is poorly understood. We found eight scup embryos in our study, which could not have been identified using traditional visual methods. Cobia (Rachycentron canadum) supports a valuable recreational fishery, and it has been hypothesized that this species is undergoing a climate change-induced northward range shift22,23, a hypothesis our results support.

Bathyanthias mexicanus is the one species we found that, to our knowledge, had not previously been identified in our region. This species was first described from samples collected in the Gulf of Mexico in the 1950s24. It is a deep-water serranid of the subfamily Epinephelinae that ranges from the Guianas and Venezuela to the east coast of Florida but is most common in the Gulf of Mexico25.

None of the species included in our results are considered endangered, threatened, or protected.

Data Records

All data records are available on Figshare26. These include each species identification (scientific and common name), cruise ID, collection date, time, longitude, and latitude, and links to images27. We also report whether each identification was determined via BOLD, BLAST, or visually, as well as temperature and salinity measurements from CTD casts, together with the longitude, latitude, date, and time of each cast. Additionally, we have shared folders of specimen images organized by the 96-well PCR plate number submitted to CCDB and the corresponding BOLD process IDs28.
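Because specimen records and CTD casts are reported with their own coordinates, dates, and times, users will typically need to pair each specimen with a cast. One simple approach, sketched here under stated assumptions, picks the closest cast from the same date by great-circle (haversine) distance and rejects pairs farther apart than a cut-off. The `Cast` fields, the date format, the 5 km cut-off, and the decimal-degree convention (west longitudes negative) are assumptions, not the Figshare column names.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Cast:
    cast_id: str   # hypothetical identifier
    date: str      # collection date, e.g. "2022-09-20" (format assumed)
    lat: float     # decimal degrees, north positive
    lon: float     # decimal degrees, east positive (west is negative)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_cast(date: str, lat: float, lon: float,
                 casts: Iterable[Cast], max_km: float = 5.0) -> Optional[Cast]:
    """Closest same-day cast within max_km of the net tow, or None."""
    same_day = [c for c in casts if c.date == date]
    if not same_day:
        return None
    best = min(same_day, key=lambda c: haversine_km(lat, lon, c.lat, c.lon))
    return best if haversine_km(lat, lon, best.lat, best.lon) <= max_km else None
```

Where a station was occupied more than once on the same day, matching on collection time as well as position would be a natural refinement.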

Technical Validation

Species identifications were obtained through BOLD Systems with a strict identification threshold of 99% sequence similarity. We also estimated our sorting error rate, i.e., the probability that we failed to isolate a fish embryo or larva from other zooplankton during sorting, at 5.6%. We derived this estimate by re-sorting a randomly selected 10% of our sampled volume. Two people re-sorted simultaneously to increase the likelihood of finding any individuals missed in the initial round. For most re-sorted samples, we found no additional individuals; in the minority, we isolated one or two. However, we noticed one anomalous sample in which a large number of individuals had been missed in the first round: re-sorting 14% of its volume yielded 12 individuals, whereas 17 had been found previously. We therefore concluded that there was a more systematic issue with this particular sample, likely related to poor shipboard storage, and decided to re-sort it in its entirety. This work is ongoing, and the databases will be updated to reflect new results once they are available.
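The text does not spell out how the 5.6% figure was computed from the re-sorts. One natural estimator, sketched here as an assumption rather than the calculation actually used, is the pooled fraction of all individuals in the re-sorted sub-volumes that were missed on the first pass: missed / (found on first pass + missed). It assumes both counts refer to the same re-sorted portion of each sample; the function name and input format are hypothetical.

```python
from typing import Iterable, Tuple


def sorting_error_rate(resorts: Iterable[Tuple[int, int]]) -> float:
    """Pooled probability that an individual was missed during initial sorting.

    Each item is (found_first_pass, found_on_resort) for the same re-sorted
    sub-volume (an assumption about how the counts are paired).
    """
    pairs = list(resorts)  # materialize so generators can be summed twice
    first_pass = sum(found for found, _ in pairs)
    missed = sum(extra for _, extra in pairs)
    total = first_pass + missed
    return missed / total if total else 0.0
```

Because a pooled estimate weights samples by their counts, a single anomalous sample such as the one described above could dominate it, which is one reason to flag and handle such samples separately.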

Any individuals isolated during the quality control re-sorting were barcoded, and their results are included in our current dataset. No new species were identified in this process.

Usage Notes

In addition to Figshare, data are available on BOLD Systems as a public dataset29. Because this study is ongoing, data reported in BOLD, GenBank, and Figshare reflect the work completed at the time of publication of this manuscript; additional data will be released as new versions of the respective datasets.