Introduction

Mitochondrial genomes are a useful tool for understanding evolution at different levels of biological organization, including ecological effects on populations1, divergence patterns among closely related species2, and elucidating phylogenetic relationships3. The reason for the versatility of the mitogenome is that the mitochondrial genes and regulatory regions such as the control region, also known as D-loop—herein used in the broad sense to refer to the entire non-coding region between the tRNA-Pro and tRNA-Phe genes4—have been shown to evolve at different rates5,6,7,8. Thus, the relative ease and affordability of generating mitogenomes makes them an appealing tool for comparative population genomics and phylogenomics.

Here we describe the structure and content of a new high-quality mitochondrial genome of the Eastern Blacknose Dace, Rhinichthys atratulus. Mitogenomic data have revealed important insights into phylogenetic relationships of fishes3,9. An important consideration is whether to include D-loop sequence as part of the data set. In some prior studies, D-loop sequences have successfully contributed important information about relationships10,11. Other studies, however, have excluded D-loop sequence from analyses because of length variation or alignment problems1,12,13. In this paper, we illustrate the potential value of including the D-loop sequence in mitogenomic datasets, even when some sequence length variation exists across sampled taxa.

Osborne et al.12 used mitogenomes without the D-loop to estimate phylogenetic relationships among a small set of cyprinoid fishes (minnows), rooting their tree with two outgroup species from the family Catostomidae. We expanded the number of ingroup cyprinoid terminal taxa to 27: two cyprinids and 25 leuciscids. We included our de novo mitogenome of R. atratulus in the study group to determine whether R. atratulus is sister to R. obtusus, as demonstrated by Kraczkowski and Chernoff14 (see below). We compared our phylogenetic results to those of Osborne et al.12 and illustrated the potential importance of including sequence information from the D-loop. Furthermore, we compared our results to the comprehensive study of cyprinoid species with 358 leuciscid taxa based upon two mitochondrial (cytb, COI) and two nuclear (rag1, IRBP) genes15.

Study material and species identification

Eastern Blacknose Dace, Rhinichthys atratulus Hermann (1804), was obtained from a feeder stream to the Coginchaug River, a small tributary of the Connecticut River in central Connecticut, USA. The species generally inhabits medium to small streams and rivers, preferring riffles and rocky habitats over sand or clay14. Kraczkowski and Chernoff14 demonstrated that R. atratulus is found in Atlantic slope drainages from Nova Scotia to the Roanoke River drainage in Virginia, and in southeastern tributaries to Lake Ontario.

The putative R. atratulus tissues for Schönhuth et al.15,16 and the mitogenome NC_033943.1 are from specimens collected together in Russell Creek, northeast of Tazewell, Tennessee, a tributary of the Powell River, Tennessee, in the Clinch River and Tennessee River drainages. The specimens are catalogued in the University of Alabama Ichthyology Collection (UAIC 9850.01). We examined the specimens in UAIC 9850.01, and determined that they are Western Blacknose Dace, Rhinichthys obtusus Agassiz (1854). R. atratulus is not known from the Tennessee River basin14. It should be noted that the first paper by Schönhuth et al.16 (2012) and the deposition of the sequence in GenBank (2011) predate the removal of R. obtusus from the synonymy of R. atratulus by Kraczkowski and Chernoff14 (2014).

Results

The R. atratulus mitogenome assembly has a length of 16,646 bases and is composed of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and a 974 bp D-loop (Fig. 1), with a GC content of 45.7%. The assembly is supported by a mean read depth of 1677.481 (Fig. S1). Overall, nucleotide composition is relatively consistent across the sampled taxa (Fig. 2).

Fig. 1

Annotation of our de novo R. atratulus mitogenome assembly using MitoFish, visualized with Chloroplot (Zheng et al., 2020). The innermost ring represents the GC content of a given 100 bp interval of the sequence, with the subdivided rings representing increasing 25% intervals. The arrows in this section correspond to the direction of transcription for the genes and RNAs displayed in the outer ring, with the inner features transcribed clockwise (forward strand) and the outer features transcribed counterclockwise (reverse strand). Features are color coded by their product, per the key. The inset photograph is of a male R. atratulus individual in breeding colors. This individual was collected and photographed by the authors from the same locality as the sequenced individual in May 2023.

Fig. 2

Comparison of nucleotide composition (as percent of total sequence, y-axis) between each mitochondrial genome (x-axis). The differently colored sections of each bar represent the proportion of A (blue), T (green), G (gold), and C (pink) in each mitogenome. The species are ordered according to the phylogenetic results shown in Fig. 3. *This specimen (NC_033943.1) is labeled as Rhinichthys atratulus in GenBank but is properly identified as R. obtusus14; see Study Material and Species Identification for further information.

Among all mitogenomes examined, GC content is below 50% (except in Meda fulgida), largely because guanine makes up less than 15% of the mitogenome. Thymine accounts for more than 25% of all bases and is more common than adenine, which is almost uniformly 25% (Fig. 2).
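The composition values discussed above are simple proportions of each base in a sequence. The following is a minimal sketch of that calculation, not the seqtk procedure used in this study; the function names and the toy sequence are hypothetical and for illustration only.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of the unambiguous bases."""
    composition = base_composition(sequence)
    return composition["G"] + composition["C"]


# Hypothetical 20-base toy sequence (not real mitogenome data).
toy_sequence = "ATGCATTTACGCTTAACCTA"
toy_composition = base_composition(toy_sequence)
toy_gc = gc_content(toy_sequence)
```

Ambiguous symbols such as N are excluded from the denominator in this sketch; whether they should be excluded is a design choice that a real analysis would need to state.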

The D-loop of our R. atratulus assembly represented the median length among the taxa at 974 bp and was close to the mean length of 977.448 bp (Table 1). D-loop lengths varied considerably among sampled taxa, ranging from 890 bp (Oregonichthys crameri) to 1135 bp (Tiaroga cobitis). This large variation in D-loop length (e.g., the 245 bp difference between Oregonichthys and Tiaroga) could potentially bias phylogenetic trees. Our results did not reflect this; the D-loop contained meaningful signals and increased bootstrap support at a number of nodes.
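Because the mitogenome is circular, the non-coding region between tRNA-Pro and tRNA-Phe may wrap around the end of a linearized sequence record. The sketch below shows one way to compute such a length from annotation coordinates. The coordinates used are hypothetical, chosen only so that the result matches the 974 bp reported for the 16,646 bp assembly; the actual annotation coordinates are not given in the text.

```python
def circular_gap_length(upstream_end: int, downstream_start: int,
                        genome_length: int) -> int:
    """Count the bases strictly between two features on a circular genome.

    Coordinates are 1-based and inclusive. The gap runs from the base after
    `upstream_end` to the base before `downstream_start`, wrapping past the
    end of the linearized record if necessary.
    """
    for position in (upstream_end, downstream_start):
        if not 1 <= position <= genome_length:
            raise ValueError("coordinate outside the genome")
    return (downstream_start - upstream_end - 1) % genome_length


GENOME_LENGTH = 16_646   # assembly length reported in the Results
TRNA_PRO_END = 15_672    # hypothetical coordinate, for illustration only
TRNA_PHE_START = 1       # hypothetical: record assumed to start at tRNA-Phe

dloop_length = circular_gap_length(TRNA_PRO_END, TRNA_PHE_START, GENOME_LENGTH)

# Range of D-loop lengths reported above (Tiaroga cobitis vs. Oregonichthys crameri).
length_range = 1135 - 890
```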

Table 1 Mitogenome and D-loop lengths for selected taxa with corresponding accession numbers. The Length column lists the total length of the RefSeq in base pairs (bp), while the D-loop length column lists the length of the non-coding control region between the tRNA-Pro and tRNA-Phe genes (Nicholls and Minczuk, 2014). *This specimen (NC_033943.1) is labeled as Rhinichthys atratulus in GenBank but is properly identified as R. obtusus14; see Study Material and Species Identification for further information.

There was 100% consensus of the branching patterns for the phylogenetic trees that we generated, both with and without the D-loop sequences (Fig. 3). Both phylogenetic estimates demonstrated that ingroup cyprinoids and leuciscids were monophyletic, and that R. atratulus is most closely related to a monophyletic R. obtusus. Overall, the tree with the D-loop exhibited better bootstrap support than the tree that did not include D-loop sequences. Of the 26 nodes present in both trees, four nodes in the tree lacking D-loop sequence were poorly supported (i.e., bootstrap support < 70%). Inclusion of the D-loop reduced the number of poorly supported nodes to two. The support for the monophyly of Semotilus, Meda, and Tribolodon jumped from 50.2% to 77.7% (Fig. 3), and support for the monophyly of Cyprinella and two species of Hybognathus increased from 22% to 72.2% (Fig. 3).
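The comparison above rests on a fixed threshold: a node counts as poorly supported when its bootstrap value is below 70%. A minimal sketch of that tally follows, using only the two nodes whose values are reported in the text; the dictionary keys and function name are hypothetical labels, and the remaining nodes are not included because their values are not listed here.

```python
POORLY_SUPPORTED_BELOW = 70.0  # threshold (%) used in the text


def count_poorly_supported(supports, threshold=POORLY_SUPPORTED_BELOW):
    """Count nodes whose bootstrap support falls below the threshold."""
    return sum(1 for support in supports if support < threshold)


# Bootstrap support (%) for the two nodes reported in the text.
without_dloop = {"Semotilus+Meda+Tribolodon": 50.2, "Cyprinella+Hybognathus": 22.0}
with_dloop = {"Semotilus+Meda+Tribolodon": 77.7, "Cyprinella+Hybognathus": 72.2}

poor_without = count_poorly_supported(without_dloop.values())
poor_with = count_poorly_supported(with_dloop.values())

# Gain in support per node when the D-loop is included.
support_gain = {node: round(with_dloop[node] - without_dloop[node], 1)
                for node in without_dloop}
```

Both reported nodes cross the 70% threshold once the D-loop is included, which accounts for the drop from four poorly supported nodes to two.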

Fig. 3

Phylogenetic estimates from mitogenomes of 27 ingroup cyprinoid taxa (see Table 1 for complete list of NCBI accessions). The de novo Rhinichthys atratulus (PP958238) mitogenome is highlighted in gold. The trees were rooted at the two outgroups of the genus Catostomus. Trees were constructed (A) with D-loops and (B) without D-loops. Bootstrap values are displayed adjacent to each node. *This specimen (NC_033943.1) is labeled as Rhinichthys atratulus in GenBank but is properly identified as R. obtusus14; see Study Material and Species Identification for further information.

Discussion and conclusions

Mitochondrial genomes are widely used to investigate biological questions at different levels in the hierarchy of life, from population-level questions17 to species-level relationships3 and higher-order phylogenetic analyses18. Recent advances in genomic sequencing have reduced the time and cost of obtaining mitogenomes. In this paper we add a new mitogenome, that of Rhinichthys atratulus, to the growing list of mitogenomes available. Our new mitogenome is important because it comes from the exact locality of the neotypic series for the species; thus, if there is an R. atratulus, it comes from the Coginchaug River at Wadsworth Falls State Park.

We generated two phylogenetic trees, with and without sequences from the D-loop, from a sample of 29 cypriniform fishes (27 ingroup cyprinoids and two catostomid outgroups). Although the D-loop sequences required careful alignment, these sequences provided important support for nodes on the tree; the number of poorly supported nodes was halved (from 4 to 2). Osborne et al.12 omitted the D-loop from their analyses due to concerns over variation in D-loop length across sampled taxa; however, our results suggest that this variation can carry phylogenetically useful information without reducing support elsewhere. Indeed, the species with the shortest and longest D-loop regions (O. crameri and T. cobitis, respectively) are sister taxa with very strong node support in both of our trees (99.8%, Fig. 3) despite the 245 bp difference between their respective D-loops.

We compared our phylogenetic results with the findings of Schönhuth et al.15, Schönhuth et al.16, and Osborne et al.12. Schönhuth et al.15 produced three phylogenetic trees from (i) two mitochondrial genes, (ii) two nuclear genes, and (iii) combined mitochondrial and nuclear genes. Schönhuth et al.16 produced a similar set of trees, but instead used only one mitochondrial gene and three nuclear genes. Our study and those of Osborne et al.12 and Schönhuth et al.16 included many fewer taxa than did Schönhuth et al.15 (27, 21, 71, and 410, respectively). Our results were largely congruent with these studies. An important finding of our study is that R. atratulus is the sister group to R. obtusus (Fig. 3); the former species was not included in the other studies (see Study Material and Species Identification above for further details).

There are three areas of our tree that differ from Osborne et al.12, Schönhuth et al.15, or Schönhuth et al.16. The first concerns the species relationships within a clade containing Hybognathus amarus, H. nuchalis, Cyprinella lutrensis, and Pimephales promelas. Our results had C. lutrensis as sister to the Hybognathus species clade with P. promelas the next closest taxon. Osborne et al.12 alternatively estimated that P. promelas was the sister group to Hybognathus with C. lutrensis their nearest relative. Lastly, Schönhuth et al.15 had C. lutrensis and P. promelas as sister taxa with Hybognathus as sister to this clade. There is only medium support for each of these hypotheses in their respective studies.

A second difference is that our mitogenome trees have Tiaroga and Oregonichthys as most closely related to Rhinichthys (Fig. 3), in agreement with Schönhuth et al.15. However, the trees of Schönhuth et al.16 with a different set of genes and taxa than Schönhuth et al.15 suggested that Tiaroga is related to Rhinichthys at a more inclusive level. Our findings lend further support to the relationships described in Schönhuth et al.15.

The third difference involves the relationship of Erimystax to Exoglossum and other pogonichthyines. Exoglossum is positioned as the sister to the non-Mylocheilus-clade16 pogonichthyines in Schönhuth et al.15,16 and our study. In our study Erimystax is sister to Exoglossum with 100% bootstrap support and with an exceedingly short branch length joining them (Fig. 3). In Schönhuth et al.15 these genera are distantly related, and Erimystax is the sister group to Phenacobius, a result that occurs in their mitochondrial, nuclear, and combined trees.

The differences noted above can be due to several factors. The most obvious is taxon sampling, especially in relation to the potential problems of long branch attraction19,20. However, the branches in our analyses are not particularly long—especially in the case of Erimystax and Exoglossum, which have exceptionally short branches—and are well distributed. While we recognize that the superb sampling of Schönhuth et al.15 is optimal and has the best chance of escaping this issue, the differences among our studies12,15,16 may also have to do with phylogenetic signal. The complete mitogenome, including the D-loop, of the cypriniform fishes of our study is at least five times larger than the total sequence used in Schönhuth et al.15. Thus, the phylogenetic signal of the mitogenomes in our study may be very strong. Additional study to compare phylogenetic signal from nuclear genomes with mitogenomes will resolve this issue.

This paper demonstrated the importance of mitogenomic data in addressing phylogenetic relationships, and especially the inclusion of the D-loop sequence. Many other studies have benefitted from inclusion of D-loop sequences ranging from phylogenetic relationships to population level studies10,11. R. atratulus and R. obtusus have been demonstrated to be monophyletic, confirming previous work14. Our study has raised questions about the phylogenetic relationships of Cyprinella, Hybognathus and Pimephales; Tiaroga and Rhinichthys; and Exoglossum and Erimystax. The question of their relationships within the Pogonichthyinae will require additional taxa and potentially analyses of mitogenomes and nuclear genomes to resolve the issues.

Our introduction of a new Rhinichthys atratulus mitogenome highlights the importance of collecting mitogenomes from specimens at or near type localities to ensure correct taxonomic nomenclature. Furthermore, the mitogenome of R. atratulus is an important resource for our ongoing phylogeographic, population evolution, and conservation studies of the R. atratulus–R. obtusus species complex.

Materials and methods

The sequenced specimen was captured in a small tributary of the Coginchaug River at Wadsworth Falls State Park, Middletown, Middlesex County, Connecticut River Drainage, CT, USA, longitude -72.687042, latitude 41.535731. The fish was collected via electrofishing and euthanized in a solution of tricaine methanesulfonate (MS-222) at a concentration of 500 mg/L. Fish collection and handling were humane, ethical, and approved by state and university animal care and use committees (IACUC2017-1212-Chernoff-A); all methods were performed in accordance with the relevant guidelines and regulations, and this study conforms to ARRIVE reporting guidelines. We sent a single whole fish to Cantata Genome Services (Cantata Bio, LLC, Scotts Valley, CA, USA). Whole genomic DNA was extracted from the muscle tissue and sequenced via PacBio circular consensus sequencing (Pacific Biosciences of California Inc., Menlo Park, CA, USA), producing a total read length of 169.1 Gbp. A chromosome-level genome was assembled de novo with HiFiasm21 (v. 0.15.4-r347). Contamination was removed with BlobTools22 (v. 1.1.1) using the NCBI nt database23 as reference, while haplotigs and contig overlaps were removed with purge_dups24 (v. 1.2.5). The mitogenome was isolated from the de novo assembly using Minimap225 (v. 2.26) by aligning against the Rhinichthys cataractae mitogenome (GenBank accession MG570448.1) and extracting the aligned scaffold using seqtk26 (v. 1.3). We validated the identity of the extracted mitogenome by running a BLASTn search27 (v. 2.14.0) against the NCBI mitochondrial RefSeq database28. Read coverage depth was determined using Winnowmap229 (v. 2.03). Annotations were completed using MitoFish30.
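The step of identifying the mitochondrial scaffold from an alignment against a reference mitogenome can be illustrated with a short sketch. It assumes the aligner output is in PAF format (the default tabular output of Minimap2) and that the assembly scaffolds were supplied as queries with the reference mitogenome as target; the exact command-line options used in this study are not stated, so that orientation, the scaffold names, and all numeric values below are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class PafHit(NamedTuple):
    query: str         # column 1: query sequence name
    query_length: int  # column 2: query sequence length
    target: str        # column 6: target sequence name
    matches: int       # column 10: number of matching bases
    block_length: int  # column 11: alignment block length


def parse_paf_line(line: str) -> PafHit:
    """Parse the mandatory fields of one PAF record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a PAF record needs at least 12 tab-separated fields")
    return PafHit(fields[0], int(fields[1]), fields[5],
                  int(fields[9]), int(fields[10]))


def best_hit(paf_lines: Iterable[str]) -> Optional[PafHit]:
    """Return the record with the most matching bases, or None if empty."""
    hits = [parse_paf_line(line) for line in paf_lines if line.strip()]
    return max(hits, key=lambda hit: hit.matches, default=None)


# Hypothetical PAF records: scaffold names and all values are made up.
toy_paf = [
    "scaffold_A\t16646\t0\t16646\t+\treference_mitogenome\t16600\t0\t16600\t15200\t16700\t60",
    "scaffold_B\t5000000\t100\t900\t-\treference_mitogenome\t16600\t200\t1000\t600\t820\t12",
]
mito_candidate = best_hit(toy_paf)
```

The name of the selected scaffold could then be passed to a tool such as seqtk to extract the sequence. Ranking by matching bases is one possible criterion; identity or coverage of the reference would be reasonable alternatives.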

Phylogenetic analysis involved alignment of the de novo R. atratulus mitogenome with 28 additional complete mitogenomes sourced from the NCBI RefSeq database (see Table 1). We aligned complete mitochondrial genomes with Clustal Omega31 (v. 1.2.3) using default parameters. Due to variation in D-loop length among the selected genomes, we ran an additional alignment with the D-loops removed. Phylogenetic trees were constructed in R32 (v. 4.3.2) using tidyverse33 (v. 2.0.0), ape34 (v. 5.7–1), phangorn35 (v. 2.11.1; Schliep, 2011), and data.table36 (v. 1.15.0) packages. Using the phangorn function modelTest, we selected GTR + G(4) + I as the evolutionary model based on AIC values. Each phylogeny was constructed with 1000 bootstrap iterations. Trees were visualized using the R package ggtree37 (v. 3.10.1). Sequence base composition was calculated using seqtk.
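The bootstrap values reported for each node come from repeatedly re-estimating the tree on resampled alignments. The sketch below illustrates only the resampling step of the standard nonparametric bootstrap, in which alignment columns are drawn with replacement. It is not the phangorn implementation, the function name and toy alignment are hypothetical, and tree inference on each replicate is not shown.

```python
import random


def bootstrap_alignment(alignment: dict[str, str],
                        rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement.

    Every taxon receives the same sampled columns, so each column of the
    replicate is an intact column of the original alignment.
    """
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    sampled = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(sequence[i] for i in sampled)
            for taxon, sequence in alignment.items()}


# Hypothetical toy alignment (three taxa, eight aligned sites).
toy_alignment = {
    "taxon_1": "ACGTAC-T",
    "taxon_2": "ACGTTCGT",
    "taxon_3": "ATGTACGA",
}
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

In a full analysis, a tree would be estimated from each of the 1000 replicates, and the support for a node would be the percentage of replicate trees that contain it.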