Introduction

The genetic population structure of a species is shaped by interactions between geological events, climatic changes, and ecological dynamics1,2,3. These processes influence population divergence, gene flow, and adaptation, contributing to current biodiversity. Primary freshwater crabs exhibit a unique evolutionary history, characterized by a land-locked life history and direct development without a planktonic larval stage, and these traits often lead to genetic differentiation and speciation among islands4,5,6. This relationship between life history traits and genetic structure, shaped by the geographical history, makes these crabs a valuable model for understanding genetic divergence mechanisms.

In particular, the genus Geothelphusa is a diverse freshwater crab group and the northernmost in East Asia, ranging from Taiwan to the Japanese Archipelago6, and some of its species show island-specific differentiation7,8,9. Geothelphusa dehaani (Japanese name, Sawagani), the most commonly observed freshwater crab in Japan, occurs in freshwater habitats from coastal areas to high-altitude mountains, from Nakanoshima in the northern Ryukyu Islands (29.9°N) to southern Hokkaido (41.5°N)9,10,11,12. This wide distribution makes it ideal for investigating how the geological history of the Japanese Archipelago has shaped the genetic structure of this freshwater crab.

Previous genetic studies on G. dehaani were limited in scope, using samples from restricted areas and examining enzymes or partial mitochondrial DNA sequences13,14,15,16. Takenaka et al.17 were the first to comprehensively analyze its phylogeny, revealing 10 populations across the Japanese Islands and identifying island-specific genetic differentiation using a combination of partial mitochondrial DNA (mtDNA) markers, namely cytochrome oxidase subunit 1 (COI) and 16S rRNA, and nuclear DNA (nuDNA) markers, namely internal transcribed spacer (ITS) and histone H3. However, discordance between mtDNA and nuDNA phylogenies is often observed in other taxa, e.g., the geotrupid dung beetle Phelotrupes auratus18 and the Japanese fire-bellied newt Cynops pyrrhogaster19, which indicates the need for further analysis. Such discordance can result from differing evolutionary rates or adaptive mtDNA introgression20. If this applies to G. dehaani, the mtDNA and nuDNA genetic structures must each be examined in detail to understand its phylogeny and evolutionary processes.

Differences in genetic structure often reflect phenotypic traits, such as body coloration. Geographic variation in body color is well-documented in G. dehaani15,21,22,23,24, with individuals traditionally classified into three main color types: dark (DA), red (RE), and blue (BL)21,22. These body color types exhibit distinct geographic distributions, with DA being widespread and RE and BL more localized21,22,23,24,25. Body color could serve as a population marker if there were a relationship between body color and genetic structure; in this case, G. dehaani could be used as a model for studying gene flow, selection, and demographic history underlying color variation, as animal coloration is a key adaptive trait with diverse functions18,26. However, previous studies on color–genetic relationships yielded conflicting results, largely due to reliance on partial mtDNA data14,15,17,24,27. As mtDNA data alone offers limited insights into the demographic history and geographic division of color forms, genome-wide sequence data are necessary to elucidate the genetic basis of geographic color variation18.

Genome-wide approaches using large datasets of single nucleotide polymorphisms (SNPs) in nuDNA help uncover patterns that are not visible when analyzing a single region of mtDNA or nuDNA or when relying on morphological analyses28. Further, genome-wide approaches provide a fundamental baseline for tackling taxonomic issues in the G. dehaani species complex, which may include cryptic species or distinct genetic groups. Molecular phylogenetic studies based on partial mtDNA regions have suggested multiple clades and potential undescribed species within G. dehaani14,15,17,24,27. In fact, Naruse and Ng9 recently described two new species based on morphological differences, G. mutsu and G. amakusa, from Aomori in Honshu Island and Amakusa Island in Nagasaki Prefecture in Kyushu, respectively. They also redesignated the lectotype of G. dehaani sensu lato from its type locality in Nagasaki City. However, Naruse and Ng9 did not describe the phylogenetic relationships among these three species, and Takenaka et al.17 found no genetic differences between specimens from Amakusa and those near the G. dehaani type locality (Clade 7). This situation emphasizes the need to clarify the phylogenetic relationships within the G. dehaani species complex, including G. amakusa and G. mutsu.

Addressing these issues requires extensive sampling across the distribution of the G. dehaani species complex and advanced molecular tools, such as next-generation sequencing17,27. Accordingly, this study aimed to investigate the genetic population structure of the G. dehaani species complex using genome-wide SNP data. We comprehensively collected 504 specimens from 217 sites across the Japanese Islands to compare nuDNA- and mtDNA-based phylogenetic relationships and assess potential discordance between them. Moreover, we used approximate Bayesian computation (ABC) analysis to infer the demographic history of the complex and analyzed the relationship between body color and genetic structure to evaluate the potential of body color as a population marker. Finally, we discuss the evolutionary patterns of the G. dehaani species complex in the Japanese Islands.

Results

Body color distributions

In this study, we identified four body color types, following Chokki22 and Naruse and Ng9: DA (dark purplish brown carapace and thoracic legs); RE (dark brown carapace, sometimes with orange or brown color on the posterolateral portion, and reddish orange thoracic legs); BL (grayish blue or bluish green carapace, occasionally including dark reddish purple, with light grayish or light yellow thoracic legs); and amakusa (a sky blue carapace with whitish, brownish, or reddish thoracic legs). We also found a fifth type, OC, which did not match the descriptions by Chokki22 and Naruse and Ng9. OC individuals had either a reddish purple carapace with a milky white wedge pattern in the posterior two-thirds or a dark brown anterior half with a dark orange pattern on the anterior margin and posterior half; in both cases, the bases of the thoracic legs were whitish (Fig. 1, Table S1). As expected, the distribution patterns in the Japanese Archipelago differed among body color types (Fig. 1): DA was widespread in Hokkaido, Honshu, Shikoku, Kyushu, and adjacent islands (Sadogashima, Okinoshima, and Goto islands); BL was locally concentrated along the Pacific coast, particularly in Honshu from the Boso Peninsula to the Izu Peninsula and the southeastern Kii Peninsula, the southern part of Shikoku, the southwestern part of Kyushu, and the Osumi Islands (Yakushima and Tanegashima islands); RE was primarily found in southern Japan, including the southern Kii Peninsula, southwestern Shikoku, and Kyushu; amakusa was located in the Amakusa Islands; and OC was found in Nagasaki and Oita in Kyushu.

Fig. 1

Map showing the geographical distribution of the analyzed samples. Symbol shapes represent species and data sources: circles, the Geothelphusa dehaani species complex; squares, outgroups; triangles, reference data of G. dehaani sensu lato from the literature. Symbol colors represent body coloration types: sky blue, BL type; brown, DA type; red, RE type; blue, amakusa type; orange, OC type; white, specimens too small to be identified; and black, unknown.

Genetic population structure based on mtDNA

The amplified mtDNA sequences ranged from 557 to 590 base pairs, yielding 195 haplotypes (Table S2). The G. dehaani species complex was divided into three monophyletic populations (clades pop1–3) and 14 subclades in the neighbor network (Fig. 2). Pop1 comprised specimens from Shikoku and the southeastern Kii Peninsula, further divided into three subpopulations: 1a (central and western Shikoku), 1b (southeastern Kinki), and 1c (southeastern Shikoku). Pop2 included specimens from Shikoku and Kyushu, comprising 2a (Kyushu) and 2b (Shikoku and Kyushu). Pop3 was separated into nine subpopulations: 3a (Hokkaido, Honshu from Tohoku to eastern Chugoku, and eastern Shikoku), 3b (western Chugoku and northwestern Shikoku), 3c (western Kyushu, including G. amakusa, and South Korea), 3d (northeastern Shikoku and Kyushu), 3e (Yakushima and Nakanoshima islands), 3f (Tanegashima Islands), 3g (Koshikishima Islands), 3h (southern Kanto and eastern Shizuoka Prefecture), and 3i (Danjo Islands in Oshima Islands).

Fig. 2

Neighbor-net phylogenetic network based on the 557–590 base pairs of the mtDNA COI region.

Based on mtDNA, haplogroup diversity was highest in pop3 and nucleotide diversity was highest in pop2; both were lowest in pop1 (Table S3). Tajima’s D and Fu’s Fs were significant (P < 0.01) only in pop3, and both were negative. Pairwise fixation index (FST) values were significant for all population pairs and increased in the following order: pop1–pop2 < pop1–pop3 < pop2–pop3 (Table S4).
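As a rough illustration of how sequence-level diversity statistics of this kind can be computed, the Python sketch below calculates haplotype diversity and nucleotide diversity from a small aligned data set. The toy sequences and function names are hypothetical and are not taken from this study; the formulas are the standard Nei estimators, and the software and settings actually used for Table S3 may differ.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment (four sequences, eight sites), not real data.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"haplotype diversity = {h:.4f}, nucleotide diversity = {pi:.4f}")
```

The sketch assumes gap-free sequences of equal length; real COI alignments of 557–590 base pairs would need gaps and missing sites handled before the pairwise comparison.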

Genetic differences among populations based on MIG-seq

Table S5 shows the number of SNPs and cross-validation (CV) error values for datasets 0–6. For set 0 (all individuals included in the SNP analysis), the lowest CV error value was obtained for K = 5, dividing the population into five groups: HO (Hokkaido, Honshu from Tohoku to eastern Chugoku, and Shikoku), SHI (southeast Kinki and Shikoku), nKC (western Chugoku and northern Kyushu), cK (central Kyushu, including G. amakusa), and sKK (southern Kyushu, southern Kanto, and eastern Shizuoka) (Fig. 3; Fig. S1, S2). These groups partially corresponded to our observed color types, excluding juveniles (S) and individuals that could not be color-discriminated (un), with HO primarily comprising DA (89% of all specimens) and RE (11%); SHI primarily comprising BL (71%); nKC primarily comprising DA (80%) and RE (7%); cK primarily comprising DA (48%), RE (24%), and amakusa (14%); and sKK exclusively comprising BL (100%) (Figs. 1 and 3). In the SHI population, only four specimens corresponded to DA and RE types (specimen numbers 62, 276, 432, and 527). A few OC specimens were observed in nKC and cK populations. Among the 140 samples in sets 1–6, 128 had a Q value (Q) of > 95% (Fig. S3, S4). Twelve samples had Q values between 5% and 95%, including set 1 (Kanto region: Nos. 104, 105), set 2 (Kinki region: No. 166), set 4 (Shikoku region & Awaji Island: No. 346), set 5 (northern Kyushu: Nos. 422, 423), and set 6 (southern Kyushu: Nos. 422, 423, 457, 458, 465, 483) (Fig. S4). Except for No. 483, these samples were collected near geographical population boundaries.
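The two decision rules described above (choosing the K with the lowest CV error, and separating samples with Q > 95% from those with intermediate Q values) can be sketched in Python as follows. The CV errors, Q vectors, and function names are hypothetical placeholders rather than values from Table S5 or Fig. S4, and the sketch operates on already parsed numbers rather than on ADMIXTURE output files.

```python
def best_k(cv_errors):
    """Return the K whose cross-validation error is lowest."""
    return min(cv_errors, key=cv_errors.get)


def classify(q_values, threshold=0.95):
    """Label a sample by its largest ancestry proportion (Q)."""
    top = max(q_values)
    return "single" if top > threshold else "admixed"


# Hypothetical CV errors per K and Q vectors per sample (not real data).
cv = {3: 0.52, 4: 0.47, 5: 0.44, 6: 0.46}
samples = {"A": [0.99, 0.01], "B": [0.60, 0.40], "C": [0.03, 0.97]}

k = best_k(cv)
labels = {name: classify(q) for name, q in samples.items()}
print(k, labels)
```

With two ancestry components, a largest Q above 0.95 implies that the other component is below 0.05, which matches the 5%–95% band used in the text to flag admixed samples.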

Fig. 3

Map showing the results of the ADMIXTURE analysis for each location (set 0) and the distribution of color types. Large pie chart, genetic elements of each individual; small pie chart, color types of individuals at each location used in this analysis. Color types are indicated as in Fig. 1. The colors in the pie charts corresponding to each group and genetic elements are as follows: HO (brown), Hokkaido, Honshu from Tohoku to eastern Chugoku, Shikoku; SHI (blue), southeast Kinki and Shikoku; nKC (orange), western Chugoku and northern Kyushu; cK (green), central Kyushu, including Geothelphusa amakusa; and sKK (pink), southern Kyushu, southern Kanto, and eastern Shizuoka. The colors of the ADMIXTURE graph correspond to each genetic element on the map, and the sample numbers refer to Table S1. A distribution map with sample numbers is shown in Fig. S2.

The results for both variant positions and all positions were consistent across the five populations (Table 1). Nucleotide diversity (π) was highest in sKK and lowest in HO and SHI. nKC had the highest observed heterozygosity (Ho), and sKK had the highest expected heterozygosity (He). All groups exhibited higher He than Ho, indicating homozygous excess, with all inbreeding coefficients (FIS) > 0. Pairwise FST and corrected FST values were all high and significant (Table 2; FST = 0.3375–0.5746; corrected FST = 0.3382–0.5684; all P < 0.01), indicating deep genetic divergence among populations. The FST values were largest for the SHI-HO, SHI-cK, and HO-sKK pairs and smallest for the HO-nKC pair (Table 2).
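To make the link between Ho, He, and FIS concrete, the following Python sketch computes the three quantities for a single biallelic SNP. The genotype counts and function names are hypothetical, and the sketch uses the simple Hardy-Weinberg expectation without a sample-size correction, so it is only an approximation of what population-genomics software would report for Table 1.

```python
def locus_stats(n_aa, n_ab, n_bb):
    """Observed and expected heterozygosity for one biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    ho = n_ab / n                    # observed heterozygosity
    he = 2 * p * (1 - p)             # expected under Hardy-Weinberg
    return ho, he


def inbreeding_coefficient(ho, he):
    """FIS = 1 - Ho/He; positive values indicate a heterozygote deficit."""
    return 1 - ho / he


# Hypothetical genotype counts (AA, AB, BB) for a single SNP.
ho, he = locus_stats(30, 20, 50)
fis = inbreeding_coefficient(ho, he)
print(f"Ho = {ho:.3f}, He = {he:.3f}, FIS = {fis:.3f}")
```

Because FIS is defined as 1 − Ho/He, any population with He greater than Ho necessarily has FIS > 0, which is the pattern reported for all five groups.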

Table 1 Overall population genomic statistics for the Geothelphusa dehaani species complex based on 354 SNP loci (variant positions) and 9058 SNP loci (all positions). Abbreviations: observed heterozygosity (Ho), expected heterozygosity (He), nucleotide diversity (π), inbreeding coefficient (FIS). Population abbreviations were based on the results of ADMIXTURE: HO (Hokkaido, Honshu from Tohoku to eastern Chugoku, and Shikoku), SHI (southeast Kinki and Shikoku), nKC (western Chugoku and northern Kyushu), cK (central Kyushu, including G. amakusa), and sKK (southern Kyushu, southern Kanto, and eastern Shizuoka).
Table 2 FST and corrected FST values among the five Geothelphusa dehaani species complex populations based on 354 SNP loci. Bottom left: pairwise fixation index (FST); top right: corrected FST value. Population abbreviations were based on the results of ADMIXTURE: HO (Hokkaido, Honshu from Tohoku to eastern Chugoku, and Shikoku), SHI (southeast Kinki and Shikoku), nKC (western Chugoku and northern Kyushu), cK (central Kyushu, including the G. dehaani sensu lato type locality and the G. amakusa distribution), and sKK (southern Kyushu, southern Kanto, and eastern Shizuoka). Asterisks indicate significance at P < 0.01. The values shown below each FST estimate represent the 95% CI.

Demographic history

To explore the demographic history of the five genetic clades, we analyzed individuals composed of single genetic elements based on the ADMIXTURE results (see “Materials and methods”), using DIYABC-RF based on 313 SNPs. Principal component analysis (PCA) pre-scenario checks indicated that the observed dataset aligned well with the simulated dataset (Fig. 4a), suggesting that the analysis conditions were suitable for random forest analysis.

Fig. 4

Results of DIYABC-RF for the Geothelphusa dehaani species complex based on 313 SNPs. (a) Principal component analysis plots evaluating the fit between the observed data and simulated datasets for 32 demographic scenarios of the G. dehaani species complex in the Japanese Islands. (b) The scenario receiving the most votes for the divergence history of G. dehaani populations (Scenario 30; see Fig. S5).

Among the 32 hypothetical divergence scenarios tested (Fig. S5), Scenario 30 was selected as the best fit with 52.0% of the votes (520 of 1000 votes; Table S6), with a mean posterior probability of 0.831 and prior and posterior error rates (i.e., global and local errors) of 0.133 and 0.169, respectively. Scenario 30 proposed a divergence history in which the SHI population diverged first, followed by HO, with the nKC, sKK, and cK populations differentiating around the same time (Fig. 4b). The divergence events for the SHI population at t3 and for the HO population at t2 were estimated at 2.78 × 10⁵ (95% confidence interval (CI): 1.35 × 10⁵–4.32 × 10⁵) and 1.94 × 10⁵ (95% CI: 1.04 × 10⁵–3.00 × 10⁵) generations ago, respectively. Finally, the divergence event among the Kyushu populations at t1 was estimated at 1.06 × 10⁵ (95% CI: 5.59 × 10⁴–1.59 × 10⁵) generations ago. Assuming a generation time of four years in the G. dehaani species complex29, the divergence events were estimated to have occurred at 1.11 (95% CI: 0.54–1.72) MYA for the SHI population, at 0.77 (0.41–1.20) MYA for the HO population, and at 0.42 (0.22–0.63) MYA for the nKC, cK, and sKK populations. Scenarios diverging by body color (Scenarios 1–3) received only two votes.
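The conversion from generations to absolute time used above is a single multiplication by the assumed generation time. The short Python sketch below reproduces that arithmetic for the three point estimates; the function and variable names are hypothetical, and the four-year generation time is the assumption stated in the text.

```python
GENERATION_TIME_YEARS = 4  # generation time assumed in the text


def generations_to_mya(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a divergence time in generations to millions of years ago."""
    return generations * generation_time / 1e6


# Point estimates in generations, as reported for t3, t2, and t1.
estimates = {"t3 (SHI)": 2.78e5, "t2 (HO)": 1.94e5, "t1 (Kyushu)": 1.06e5}
for event, gens in estimates.items():
    print(f"{event}: {generations_to_mya(gens):.3f} MYA")
```

The products agree with the reported values to within about 0.01 MYA; small differences in the last digit reflect how the published figures were rounded.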

Discussion

Genetic population structure and evolutionary history

In this study, the ADMIXTURE analysis identified five populations across the Japanese Archipelago, delineated by distinct geographical boundaries. The ABC results suggest that SHI diverged first, followed by HO, and finally the three Kyushu populations. In addition, FST values demonstrated clear differentiation between populations (Table 2; Fig. 4). We treat all populations as intraspecific to avoid taxonomic confusion because, at present, we cannot judge whether they are cryptic species without morphological observations.

Our findings indicate that the genetic population structure of the G. dehaani species complex was shaped by multiple geological events throughout its complex evolutionary history. Notably, most populations, except the cK population distributed in Kyushu, span multiple islands. Takenaka et al.17 identified genetic differentiation among nine major populations according to island, based on a combination of partial mtDNA and nuDNA regions (COI, 16S rRNA, ITS, and histone H3), suggesting that straits between islands act as major genetic barriers for G. dehaani. In contrast, our results highlight the importance of geological events, including volcanic activity and climate change, in shaping the genetic structure of this species group.

For example, the co-occurrence of three populations (nKC, cK, and sKK) on Kyushu Island cannot be explained solely by the geographical barrier posed by a strait, as these populations are separated along a North–South axis with distinct geographical boundaries. ABC analysis suggested that they diverged nearly simultaneously from a common ancestor, implying that a large-scale or continuous event disrupted gene flow across Kyushu. Moreover, the identified boundaries appear to be influenced by sea level fluctuations and volcanic activity. For example, the nKC population spans the Kanmon Strait, occupying central Chugoku and northern Kyushu. Although its boundary with the HO population is unclear, it aligns with the distribution boundaries of other species, such as calopterygid damselflies and small salamanders30,31, hinting at historical events in the Chugoku region that impeded gene flow. Major rivers, such as Gohnokawa and Ota, which flow across the Chugoku Mountains to the Sea of Japan, may have once served as barriers connected to the Seto Inland Sea32. In contrast, the southern boundary between the nKC and cK populations in Kyushu lacks a single geographical barrier, suggesting that multiple factors influenced their genetic divergence. For example, during past interglacial periods, the Tsukushi and Fukuoka plains in northern Kyushu were submerged by rising sea levels, which restricted the movement of freshwater organisms between these plains and the adjacent Sefuri Mountains and other mountain ranges27,33. The nKC and cK populations, which are distributed adjacent to each other along the Tsukushi and Fukuoka plains, may have diverged for the same reason. Additionally, volcanic activity 5–6 million years ago (MYA) and 0.5–2.0 MYA along the Beppu–Shimabara Graben, which runs East–West through central Kyushu34, likely created local barriers and limited gene flow. In fact, the activity in the latter period may have influenced amphibian distribution35, and the estimated divergence age for the Kyushu populations in this study (0.42 MYA) is consistent with this hypothesis. Such disturbances would have promoted local isolation and genetic differentiation. Similarly, the cK–sKK boundary in southern Kyushu appears to be influenced by volcanic activity, as noted by Suzuki & Tsuda23.

The HO, SHI, and sKK populations exhibited a unique distribution pattern, with the latter two clearly displaying enclave distributions. Two scenarios could explain such a distribution: a stable contact zone scenario, in which a population persists as a remnant patch of a historically widespread distribution, or a moving contact zone scenario, in which a population migrates into new areas36. The SHI population distributed in Shikoku likely aligns with the first scenario for several reasons: (1) geographic and ecological barriers make recent colonization unlikely; (2) the SHI population exhibits low genetic diversity (π = 0.057), suggesting a historical bottleneck; and (3) the HO population shows evidence of rapid expansion, namely low genetic diversity in SNPs (π = 0.059) and negative neutrality test values (Tajima’s D = −1.69; Fu’s Fs = −23.7). ADMIXTURE analysis further supports this hypothesis, revealing that the SHI population shares ancestry with the HO population in both the Kii Peninsula and Shikoku Island (Fig. S1). These findings suggest that the SHI population once extended across these regions but became fragmented as the rapidly expanding HO population isolated SHI individuals into enclaves. In contrast, the presence of the HO population in northwestern Shikoku represents a moving contact zone scenario driven by its rapid expansion. Notably, the HO population has the widest distribution of all populations, extending as far as Hokkaido and crossing the Blakiston Line, a key faunal boundary. Our results revealed no significant genetic differences in mtDNA or nuDNA between the Hokkaido specimens and other HO specimens, indicating either recent expansion or human-mediated introduction, as proposed by Sugime et al.11. These findings highlight the dynamic role of the HO population in shaping the distribution of G. dehaani populations through rapid expansion and its influence on the enclave distribution of other populations.

On the other hand, the sKK population, found in southern Kyushu, its adjacent islands, and the coastal areas from the Izu Peninsula to the Boso Peninsula (southern Kanto), forms an enclave with a disjunct distribution across Kyushu and Honshu islands that creates a unique geographic pattern. Takenaka et al.17 proposed that the southern Kanto lineage (clade 3h in this study; see Fig. 2) expanded via oceanic dispersal from southern Kyushu when the Izu Peninsula was an island (i.e., a moving contact zone scenario) and estimated its divergence at 1.1–6.0 MYA using mtDNA 16S and COI regions. Our genome-wide SNP data estimated the divergence time for the sKK population at 0.42 (0.22–0.63) MYA, which is more recent than their estimate. If the spread to the southern Kanto region had occurred millions of years ago, the G. dehaani species complex, which lacks a planktonic larval stage, should have differentiated after geographical isolation and produced significant differences in our ADMIXTURE analysis. No such differences were detected, which suggests that 1.1–6.0 MYA is an overestimate of the divergence time for the southern Kanto lineage. Hence, we propose two alternative possibilities: (i) dispersal via marine routes after the collision of the Izu Peninsula with Honshu, or (ii) persistence as remnant patches of a formerly widespread population. However, the accuracy of our divergence time estimation by DIYABC is limited by the absence of fossil records or secondary calibrations for Geothelphusa species, which prevents the use of an external molecular clock. Future studies employing high-resolution methods, such as whole-genome sequencing, are necessary to clarify the intrapopulation genetic structure and provide deeper insights into the unique distribution of the sKK population.

Relationship between body color trait and genetic population structure

This study did not find a complete correspondence between body color types (DA, RE, BL, OC, and amakusa) and populations using COI and SNP analyses. However, we observed a biased distribution pattern along Pacific coast areas for the BL and RE types. Such a biased distribution of specific coloration types could arise either from repeated local adaptation to the coastal climate and environment or from phenotypic plasticity that is environmentally induced along the same gradient. The BL type emerged twice independently in the evolutionary history of the G. dehaani species complex (Fig. 4, sKK and SHI populations), which may be due to repeated climate adaptation. For example, pale carapace coloration in crustaceans may play a thermoregulatory role37; furthermore, the range of the BL type overlaps with warm Pacific coastal areas influenced by the Kuroshio Current. In addition, BL-type individuals do not change to other body color types during rearing24, indicating that the blue body color trait is genetically determined. On the other hand, the RE type, which showed a distribution pattern similar to that of the BL type, may result from phenotypic plasticity in response to environmental or dietary variation. Although COI and SNP data revealed no genetic differences between DA and RE types within the same populations, dietary availability of astaxanthin precursors, such as β-carotene and zeaxanthin23,38, may generate a plastic response. Alternatively, an undetected genetic structure within the population may be another contributor. Regardless of whether the mechanism is genetic or plastic, the repeated emergence and coastal bias of the BL and RE types make the G. dehaani species complex a promising model for investigating how color traits respond to climate and dietary gradients.

Despite the incomplete correspondence between body color traits and populations, combining body color with collection locality data could help identify the population affinities of collected specimens. For example, the sKK population, exclusively comprising BL individuals, spans southern Kyushu and southern Kanto, indicating that body color and collection locality can together distinguish populations. Similarly, the HO and nKC populations, predominantly of DA or RE type, can often be identified this way, despite rare BL-type individuals (Table S1, No. 249 in the HO population area and No. 313 in the nKC area). These rare cases likely represent an individual-level color variant, considering the limited collection sites and specimen numbers, similar to findings for Ryukyum yaeyamense39.

In contrast, the SHI and cK populations exhibited greater intraspecific variation in body coloration (Figs. 1 and 3). Although the SHI population predominantly comprises BL individuals, it also includes DA/RE types. Mismatched specimens that share multiple ancestral components (Fig. 3) suggest that genetic introgression through secondary contact contributes to the discordance in body color within the population. The cK population also displayed various coloration types, including DA, RE, OC, and even G. amakusa types, despite consistent genetic proportions (Fig. 3). Such variation in the SHI and cK populations implies historical or ongoing gene flow and highlights the need to reevaluate the genetic and morphological boundaries in the G. dehaani species complex, including G. amakusa. Molecular methods, especially genome-wide analyses, will be essential in future research for resolving the genetic structure and reliably identifying individuals in SHI and cK.

Discordance between mtDNA and nuDNA sequences

Our molecular analysis highlights the need for caution when interpreting previous studies relying on the combinations of partial mtDNA and nuDNA sequences. While our COI-based neighbor-net diagram aligns with earlier low-resolution studies using isozymes or partial sequences14,15,17,27, the population structure revealed by our ADMIXTURE analysis using SNPs significantly differs. This discordance emphasizes the importance of genome-wide analyses alongside partial sequence analyses for accurately elucidating the genetic phylogeny and population structure of the G. dehaani species complex.

Discordances between mtDNA and nuDNA are common in various taxa and are often attributed to incomplete lineage sorting (ILS) or other biological processes affecting mtDNA (e.g., adaptive introgression, demographic disparities, or sex-related asymmetries)20. Although we cannot at present determine the mechanism and process of mito-nuclear discordance in the G. dehaani species complex, ILS is unlikely for the following reason: the discordance between nuDNA and mtDNA patterns was strongly geographic, and the distribution patterns were geographically coherent (Figs. 1 and 3), whereas stochastic ILS is not expected to produce such a clear biogeographic pattern20. In many taxa where mtDNA and nuDNA do not match, mtDNA is less structured than nuDNA20; a similar result was obtained for the G. dehaani species complex.

In contrast, in areas where populations identified from nuclear data overlap, individuals exhibited genetic elements from both populations, indicating past secondary contact between the two populations (e.g., nKC and HO populations; Fig. 3). Breeding between isolated populations with divergent patterns of gene flow between mtDNA and nuDNA may promote mito-nuclear discordance20. Our results imply that nuDNA may become homogenized within populations while mtDNA variation is maintained over time because of gene flow or ecological differences, such as sex-biased mobility. We therefore speculate that repeated contact and separation between populations created the complex structure observed in mtDNA COI, in contrast with the population structure derived from inter-simple sequence repeat (ISSR) SNPs. Our research indicates that gene flow may still be ongoing or might have historically occurred but has since ceased. This study did not analyze gene flow dynamics in detail, as the samples were selected to investigate the overall genetic characteristics of G. dehaani populations. Future studies focusing on genetic boundary regions are needed to clarify these patterns.

Conclusion

Our study reveals the complex genetic population structure of the Japanese freshwater crab G. dehaani species complex, resulting from multiple geological events, including volcanic activity and sea level fluctuations, in the Japanese Islands. ADMIXTURE analysis identified five populations with distinct geographic boundaries. On Kyushu Island, three distinct populations (nKC, cK, and sKK) separated along a North–South gradient were likely influenced by large-scale disturbances, such as volcanic activity. The recent rapid expansion of the HO population, indicated by low genetic diversity, has contributed to the unique enclave distribution of the SHI population. Previously, strong genetic subdivision in G. dehaani sensu lato was expected because of its low dispersal ability and lack of a planktonic phase; in fact, a recent study divided it into 10 populations or cryptic species across the Japanese Islands using a combination of partial mtDNA markers17. In contrast, our ADMIXTURE results indicate that the G. dehaani species complex has a higher dispersal ability than expected. We therefore hypothesize that adult crabs can move over wide distances; the strong adaptation of G. dehaani sensu lato to terrestrial life and its terrestrial excursions with roaming behavior could support this hypothesis16,40. Further, regional body color variations partially correlate with SNP clades, suggesting the potential of combining collection locality and body color for population identification. However, further molecular and morphological approaches are needed for precise population identification from specimens. Such integrative approaches will also help resolve the taxonomic issues in the G. dehaani species complex. Contrasting patterns between mtDNA and nuDNA highlight historical gene flow and introgression, underscoring the need for comprehensive genomic analysis. In addition, the discordance between nuclear SNP and mtDNA classifications warrants caution when interpreting earlier phylogenetic studies based on combined mtDNA and nuDNA data. These findings provide new insights into the evolutionary history and genetic diversity of the G. dehaani species complex and a basis for future research on its taxonomy, phylogeny, and population dynamics.

Materials and methods

Sample collection and laboratory procedures

From 2019 to 2023, 504 Geothelphusa specimens, including the G. dehaani species complex, G. exigua, G. koshikiensis, G. marmorata, and G. sakamotoana, were collected from 217 sites, spanning from Hokkaido to the central Ryukyu Islands (Table S1). In this study, G. dehaani, G. mutsu, and G. amakusa were all treated as the G. dehaani species complex without distinction because of ambiguous species boundaries and limited knowledge of their distributions. Almost all specimens were transported to the laboratory either alive or anesthetized, and photographs were taken to document live body coloration. To correct for color differences caused by photographic conditions, we performed color correction using Adobe Photoshop Elements 8.0 and a color scale bar (Casmatch, Bear Medic Co.) for quantitative comparisons. The specimens were categorized into four types based on Chokki21,22 and Naruse and Ng9: BL (grayish blue or bluish green carapace, occasionally including dark reddish purple, with light grayish or light yellow thoracic legs), DA (dark purplish brown carapace and thoracic legs), RE (dark brown carapace, sometimes with orange or brown color on the posterolateral portion, and reddish orange thoracic legs), and “amakusa” (a sky blue carapace with whitish, brownish, or reddish thoracic legs). Specimens with unclassifiable color patterns were labeled as OC (other color). Specimens were fixed in 80% ethanol, and part of a leg was transferred to 99% ethanol for DNA analysis. Some specimens were directly fixed in 99.5% ethanol in the field. In addition, out-group specimens, including G. exigua (n = 2, collected in southern Kyushu), G. marmorata (n = 2, Yakushima Island), G. koshikiensis (n = 6, Koshiki Islands), and G. sakamotoana (n = 2, Okinawa Island and Tokunoshima Island), were treated in the same way as the G. dehaani samples. The identification of out-group specimens was based on morphological characteristics, such as body color and male first pleopod shape10.
All fixed specimens were registered and are preserved at the Wakayama Prefectural Museum of Natural History (WMNH). Our research protocols complied with the current laws of Japan, including those on animal welfare, and followed the guidelines of the Insect Welfare Research Society41 for protecting and promoting decapod crustacean welfare in research.

DNA extraction, sequencing, and alignment

Mitochondrial DNA

A total of 485 specimens from 209 localities were selected for mtDNA analysis of the COI region, covering the distribution range of the G. dehaani species complex (Table S1). The selection criteria were set to include specimens as comprehensively as possible from Hokkaido to the Tokara Islands and to ensure that a variety of color patterns was represented. Genomic DNA was extracted from the leg muscles of each specimen using a DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany). Partial sequences of the mtDNA COI-encoding region were amplified by PCR using the universal primers jgLCO1490 and jgHCO219842 with TaKaRa Ex Taq (TaKaRa, Shiga, Japan). The PCR mixture contained 10 µL of Premix Taq™ (Takara Bio Inc., Tokyo, Japan), 1.2 µL of each primer (10 µM), 6.6 µL of Milli-Q water, and 1 µL of template DNA. PCR was performed using an Applied Biosystems VeritiPro thermal cycler (Thermo Fisher Scientific K.K., Massachusetts, USA) under the following conditions: 94 °C for 1 min, followed by 35 cycles of 94 °C for 40 s, 51 °C for 40 s, and 72 °C for 1 min, and a final elongation at 72 °C for 7 min. A 3-µL sample of the PCR product was mixed with 1 µL of Midori Green Direct DNA Stain (NIPPON Genetics, Tokyo, Japan) and electrophoresed on a 2% agarose gel at 100 V for 15 min. Then, the gel was visualized under UV light to confirm DNA amplification. Amplification products were purified by mixing 17 µL of PCR product with 1 µL of ExoSAP-IT solution (Thermo Fisher Scientific K.K., Massachusetts, USA) and incubating at 37 °C for 15 min and then at 80 °C for 15 min. The purified PCR products were then sent to Macrogen Japan (Tokyo, Japan) for sequencing with the same primers used for PCR amplification. The resulting sequences were aligned using MEGA 11 and ClustalW43. The mtDNA sequences were registered in the DNA Data Bank of Japan (DDBJ) (accession numbers: LC864606–LC865102), and the haplotypes were determined (Table S2).
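As a quick check on the reaction setup described above, the listed component volumes can be summed and the final primer concentration derived from the dilution formula. The following is a minimal sketch of that arithmetic only; the dictionary keys and function names are illustrative and are not part of the original protocol.

```python
# Component volumes (µL) for one PCR reaction, as listed in the text.
# The key names are hypothetical labels chosen for this sketch.
REACTION_UL = {
    "premix_taq": 10.0,
    "forward_primer": 1.2,
    "reverse_primer": 1.2,
    "milli_q_water": 6.6,
    "template_dna": 1.0,
}
PRIMER_STOCK_UM = 10.0  # primer stock concentration (µM), from the text


def total_volume_ul(components):
    """Return the summed volume of all reaction components in µL."""
    return sum(components.values())


def final_conc_um(stock_um, added_ul, total_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock_um * added_ul / total_ul


total = total_volume_ul(REACTION_UL)
primer_final = final_conc_um(
    PRIMER_STOCK_UM, REACTION_UL["forward_primer"], total
)
print(f"total volume: {total:.1f} µL; each primer: {primer_final:.2f} µM")
```

Under these volumes the reaction totals 20 µL, so each primer is present at 0.6 µM.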

SNPs

Genome-wide SNPs were analyzed using multiplexed inter-simple sequence repeat genotyping by sequencing (MIG-seq)44 for 154 individuals from 92 sites in the G. dehaani species complex on the Illumina HiSeq system (Illumina) (Table S1). MIG-seq uses inter-simple sequence repeat (ISSR) primers as universal multiplex PCR primers to amplify and sequence many genomic regions simultaneously, yielding genome-wide sequence information44. To ensure high-quality sequence data, adapter regions and low-quality reads were removed from reads 1 and 2 using fastp ver. 345. The quality threshold was set to 30 and the required length to 134; trim front 1 and trim front 2 were both set to 14, and cut front/tail was set to 20. The raw data obtained by MIG-seq were registered at DDBJ (accession numbers: DRR641568–DRR641721).
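The read-filtering settings above can be written out as a fastp invocation. The sketch below only assembles the argument list and does not run fastp. The file names are placeholders, and mapping "cutting front/tail set to 20" onto `--cut_front --cut_tail --cut_mean_quality 20` is our assumption about how that setting was specified, not something stated in the text.

```python
import shlex


def build_fastp_command(read1, read2, out1, out2):
    """Return a fastp argument list mirroring the settings in the text."""
    return [
        "fastp",
        "--in1", read1, "--in2", read2,
        "--out1", out1, "--out2", out2,
        "--qualified_quality_phred", "30",  # quality threshold of 30
        "--length_required", "134",         # discard reads shorter than 134
        "--trim_front1", "14",              # trim 14 bases from the front of read 1
        "--trim_front2", "14",              # trim 14 bases from the front of read 2
        # ASSUMPTION: sliding-window cutting at both ends with mean quality 20.
        "--cut_front", "--cut_tail",
        "--cut_mean_quality", "20",
    ]


# Placeholder file names, for illustration only.
cmd = build_fastp_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.clean.fastq.gz", "sample_R2.clean.fastq.gz",
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where fastp is installed.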

Population genetic analysis

Mitochondrial DNA analysis

In addition to the samples collected in this study, GenBank sequences of G. dehaani from Hachijyo Island (LC743147, LC743148), G. dehaani from Kuchinoshima Island (LC743174)17, G. dehaani from Naju, South Korea (MG674171)46, and G. sakamotoana from Takara Island (LC743301, LC743302)17 were included in the analysis. To assess genetic diversity and neutrality, haplotype diversity (h), nucleotide diversity (π)47, Tajima’s D48, and Fu’s Fs49 were calculated for each population using Arlequin version 3.5.2.250. Genetic differentiation among populations was examined using the pairwise fixation index (FST) and its p-values51. A network diagram was created using the neighbor-net method52 with SplitsTree453.
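For readers unfamiliar with the two diversity statistics named above, the sketch below computes haplotype diversity as h = n/(n − 1) · (1 − Σ p_i²) and nucleotide diversity as the mean number of pairwise differences per site. It is an illustration on made-up toy sequences, not a reimplementation of Arlequin: it assumes aligned sequences of equal length with no gaps or ambiguity codes, and the function names are ours.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): four sequences, three haplotypes.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACCTACGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"h = {h:.4f}, pi = {pi:.4f}")
```

For this toy alignment, h = 5/6 and π = 7/48.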

SNP analysis

Quality-filtered sequence data were used to identify SNPs using Stacks 2.4154. The Stacks pipeline denovo_map.pl was run with default values, except where indicated. The values for ustacks and populations were set following Onuki & Fuke55. In ustacks, the minimum coverage depth (m) was set to 5 and the maximum distance between stacks (M) to 2. In populations, min-samples-overall (R) was set to 0.5, the minimum minor allele count (min-mac) to 2, and the maximum observed heterozygosity (max-obs-het) to 0.75; only the first SNP per locus was used. We calculated observed heterozygosity (Ho), expected heterozygosity (He)47, π, and FIS56 using the fstats module in the populations program in Stacks 2.41. Pairwise population differentiation indices FST and FST57 were calculated with the packages hierfstat 0.5.11 and mmod 1.3.3 in R software 4.5.058, respectively, with 1,000 bootstrap replications and Bonferroni correction.
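The three locus-level filters described above (min-samples-overall, min-mac, and max-obs-het) can be illustrated on a toy genotype table. This is a simplified sketch of what such filters do, not the Stacks implementation: genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing data, the locus names and data are invented, and the exact handling of boundary values in Stacks is not reproduced here.

```python
def passes_filters(genotypes, min_samples=0.5, min_mac=2, max_obs_het=0.75):
    """Return True if a biallelic locus passes all three filters.

    genotypes: list of alternate-allele counts (0, 1, 2), None if missing.
    """
    called = [g for g in genotypes if g is not None]
    # Filter 1: fraction of samples genotyped at this locus.
    if not called or len(called) / len(genotypes) < min_samples:
        return False
    # Filter 2: minor allele count across all called genotypes.
    alt = sum(called)
    ref = 2 * len(called) - alt
    if min(alt, ref) < min_mac:
        return False
    # Filter 3: observed heterozygosity among called genotypes.
    obs_het = sum(g == 1 for g in called) / len(called)
    return obs_het <= max_obs_het


# Hypothetical loci for six samples each.
loci = {
    "locus_a": [0, 1, 1, 2, None, 0],           # passes all filters
    "locus_b": [0, None, None, None, 1, None],  # genotyped in only 2 of 6
    "locus_c": [0, 0, 0, 1, 0, 0],              # minor allele count of 1
    "locus_d": [1, 1, 1, 1, 1, 0],              # heterozygosity of 5/6
}
kept = [name for name, g in loci.items() if passes_filters(g)]
print(kept)
```

Only `locus_a` survives; each of the other loci fails exactly one of the three criteria.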

Genetic structure analysis was performed with ADMIXTURE version 1.3.059, with the termination criterion (C) set to 0.0001 and the random seed (s) set to time. Output data were visualized in Excel. All individuals included in the SNP analysis (set 0) were analyzed with the individuals assigned to the three mtDNA clades in the popmap. Five populations were identified from set 0, and ADMIXTURE analysis was then performed on six sets of adjacent populations to assess gene flow (Table S5; Fig. S3). This approach tested whether individuals carrying genetic components of multiple populations in set 0 reflect actual gene flow or spurious admixture signals caused by analyzing many groups at once. In each set, we selected individuals presumed to have been affected by gene flow, together with pure individuals from two of the five groups identified in set 0, from adjacent regions and surrounding areas (Fig. S4). Individuals included in each set were treated as a single group in the popmap. Set 1 included populations in the Kanto region (17 individuals from Ibaraki, Tochigi, Gunma, Saitama, Tokyo, Kanagawa, and Shizuoka prefectures). Set 2 covered the Kinki region (16 individuals from Shiga, Mie, Nara, and Wakayama prefectures), and set 3 the Chugoku region (16 individuals from Kyoto, Hyogo, Tottori, Okayama, Shimane, Hiroshima, and Yamaguchi prefectures). Set 4 included populations in the Shikoku region (21 individuals from Awaji Island in Hyogo Prefecture and from Kagawa, Tokushima, Kochi, and Ehime prefectures). Set 5 represented northern Kyushu (31 individuals from Fukuoka, Saga, Oita, Nagasaki, and Kumamoto prefectures), and set 6 included populations in southern Kyushu (39 individuals from Saga, Nagasaki, Kumamoto, Miyazaki, and Kagoshima prefectures).

Estimation of population demographic history

To investigate the history of genetic diversification, we conducted a coalescent analysis based on approximate Bayesian computation (ABC) using DIYABC Random Forest version 2.160. Based on the results of our ADMIXTURE analysis (see Results), we classified the populations into five groups: pop1 (Pacific coast from Boso Peninsula to Izu Peninsula + southern Kyushu and surrounding islands, hereafter referred to as the sKK population), pop2 (southern Shikoku + part of Kii Peninsula, SHI population), pop3 (central Kyushu, cK population), pop4 (northern Kyushu + Chugoku region of Honshu, nKC population), and pop5 (Hokkaido + Honshu from Tohoku to Chugoku + part of Shikoku, HO population). As ABC frameworks require populations without continuous gene flow, we focused the analysis on 118 specimens that each likely descended from a single ancestral population, based on the ADMIXTURE results (see Results for details). We applied a Q-value threshold (< 0.95) that is often used to assign individuals to a single genetic class (e.g.,61). In addition, we prepared a dataset with no missing values for the DIYABC analysis, comprising 313 SNP loci. We used Hudson’s algorithm62 to calculate the minor allele frequency. Assuming a generation time of four years for G. dehaani29 and an Early Pleistocene age (2 Mya) for the ancestral population’s initial colony formation63, we set the effective population sizes and generation parameters a priori at 100,000–500,000 (Table S7). All prior values were drawn from uniform distributions under the condition t1 < t2 < t3 < t4, where t1–t4 are the times (in generations) at which populations merge.
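The prior specification above can be made concrete with a short sketch. With a four-year generation time, 2 Mya corresponds to 500,000 generations, which matches the stated upper bound. One simple way to honor an ordering condition such as t1 < t2 < t3 < t4 is rejection sampling from uniform priors, as shown below. This illustrates the idea only and is not a description of how DIYABC-RF draws its priors internally; the bounds passed in are taken from the range given in the text, while the actual prior for each parameter is listed in Table S7.

```python
import random

YEARS_PER_GENERATION = 4        # generation time assumed in the text
OLDEST_EVENT_YEARS = 2_000_000  # Early Pleistocene age (2 Mya), from the text
MAX_GENERATIONS = OLDEST_EVENT_YEARS // YEARS_PER_GENERATION


def draw_ordered_times(rng, low, high, k=4, max_tries=10_000):
    """Rejection-sample k uniform times until t1 < t2 < ... < tk holds."""
    for _ in range(max_tries):
        times = [rng.uniform(low, high) for _ in range(k)]
        if all(a < b for a, b in zip(times, times[1:])):
            return times
    raise RuntimeError("no ordered draw found within max_tries")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# ASSUMPTION: the same bounds for every time parameter, for illustration only.
times = draw_ordered_times(rng, 100_000, MAX_GENERATIONS)
print(MAX_GENERATIONS, [round(t) for t in times])
```

On average only 1 in 24 independent draws of four times is already ordered, so the default `max_tries` leaves ample room.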

For scenario selection, DIYABC-RF assigns each scenario a number of votes, reflecting how often it was chosen across a forest of n trees. The scenario with the highest vote count is considered the best fit for the dataset and is reported alongside an estimate of its posterior probability. A training set comprising 96,000 simulations and a forest of 500 trees were used to identify the most supported scenario. Prior to that, we performed pre-scenario checks using principal component analysis (PCA) to detect potential model misspecification by comparing the prior and posterior parameter distributions.
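The voting step described above amounts to a majority count over the per-tree predictions. The sketch below shows that tally on invented predictions from a hypothetical 500-tree forest; the scenario numbers and vote counts are made up for illustration and are not results from this study.

```python
from collections import Counter


def tally_votes(tree_predictions):
    """Return (winning scenario, vote counts) from one prediction per tree."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    counts = Counter(tree_predictions)
    winner, _ = counts.most_common(1)[0]
    return winner, counts


# Hypothetical per-tree predictions for three competing scenarios.
predictions = [5] * 310 + [12] * 140 + [29] * 50
winner, counts = tally_votes(predictions)
vote_share = counts[winner] / len(predictions)
print(f"best scenario: {winner} ({counts[winner]} of {len(predictions)} votes)")
```

Note that the vote share is not itself the posterior probability of the selected scenario, which DIYABC-RF estimates in a separate step.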

DIYABC-RF allows many alternative scenarios to be compared simultaneously, because its Random Forest classifier is designed for multi-class model choice and remains accurate even when dozens of competing scenarios are analyzed at once60. In this study, we evaluated a total of 32 scenarios across eight groups in a single run (Fig. S5): 1, scenarios based on body color (i.e., BL vs. DA, scenarios 1–3); 2, scenarios in which geographic areas separated first (i.e., Honshu vs. Kyushu, scenarios 4–7); 3–7, scenarios in which each population diverged sequentially (i.e., from sKK to HO, scenarios 8–11, 12–15, 16–19, 20–23, and 24–28, respectively); and 8, scenarios involving nearly simultaneous multiple divergences (scenarios 29–32). Scenario groups 1 and 2 were derived from earlier hypotheses about geological (island) isolation and body color variation13,14,15,17,27. Furthermore, we added scenario groups 3–8 to obtain a detailed divergence order among all populations. Although we also ran a preliminary analysis with a two-step approach (i.e., DIYABC for each scenario group, followed by a final DIYABC combining the best scenario from each group), the results were almost the same as those of the single-run approach. For conciseness, we present the single-run approach in this study.