Introduction

Despite ongoing efforts to understand the factors that influence the evolution of species, it is widely recognized that speciation is a complex process involving mutation, gene flow, natural selection, and genetic drift1. Natural selection is often considered the primary force shaping genetic diversity in species2,3,4, while neutral theories suggest that evolution is not natural selection, but rather random genetic drift5. In the last three decades, many authors have considered the neutral theory to be outdated6, but recent studies have demonstrated that short tandem repeat sequences (STRs) can bind transcription factor DNA-binding domains7. A new theory-the theory of maximum genetic diversity8 has also emerged, which retains the advantages of the neutral theory. Therefore, the present form of the neutral theory is incomplete and, despite its limited applicability, likely to remain an integral part of the exploration of molecular evolution5. The comparison of DNA sequences has become a valuable tool in studying the evolutionary forces at play9. For pests, insights into how phylogeography and populations’ evolutionary history patterns are shaped and their major associated evolutionary mechanisms are still lacking, but this is important for sustainable management and ecology and evolution studies10,11,12.

Mitochondrial DNA cytochrome c oxidase subunit I [COI] and ribosomal DNA Internal Transcribed Spacer I [ITS1] markers have been widely and successfully used to reveal the genetic structure and evolutionary biology of numerous agricultural pests, including pink rice borer13, Spodoptera frugiperda14, Arctodiaptomus dorsalis15, and Aculops lycopersici16. It is well known that ribosomal genes are commonly used in molecular phylogeny, kinship analysis, since ITS is under less natural selection pressure during the evolutionary process, more variation exists17, and the evolutionary rate of ITS fragments is 10 times higher than that of 18S rDNA. Recent studies have found that ITS1 is more efficient than ITS2, and its primers’ applicability are more general and widespread, making ITS1 superior to ITS2. The evolutionary rate of mitochondrial genes is faster18 and is more effective in revealing haplotypes and population histories than that of ribosomal genes.

The Pomacea spp., freshwater gastropod snails native to the Amazon region of South America, has spread and become a worldwide pest, owing to its extreme adaptability, very wide host range, and lack of natural enemies19,20. The Pomacea spp. are a group consisting of several species, including Pomacea canaliculata and Pomacea maculata, the primary species introduced into China. These two snails have been frequently mistaken for one species because of their morphological similarities21, and, for a long time, P. canaliculata was considered to be the only invasive snail, meaning that the two were conflated in early literature reports22. Analyses of the Pomacea spp. based on mitochondrial genes appeared in 2011, followed by the molecular identification of the Pomacea spp. using COI23,24, and genetic differentiation analyses of different populations have gradually increased in recent years25,26,27,28,29, with COI also being widely used in population history dynamics studies30,31,32. However, an analysis of the Pomacea spp. based on ribosomal genes has not been reported. Therefore, it is extremely necessary to detect the geographical distribution of the genetic structure of P. canaliculata from the perspective of various populations based on the two genes.

In China, however, knowledge of the phylogeography and evolutionary biology of Pomacea spp. is limited compared to other pests. In this study, we explored the invasive pathway and dispersal patterns of P. canaliculata through an integrated phylogeographic approach. We collected a large number of P. canaliculata specimens from almost all the damaged areas in China (20 geographic populations in total) and analyzed genetic data with mitochondrial and ribosomal DNA COI and ITS1, respectively, as deciphering the phylogeography and population evolutionary history can facilitate the development of effective pest suppression protocols and sustainable management strategies33,34. The primary objectives of this study were to: (a) evaluate the genetic diversity of P. canaliculata; (b) investigate the evolutionary history of population in China.

Materials and methods

Sample collection and DNA extraction

In 2021–2022, we collected both snails and egg of Pomacea spp. from rivers, ponds, and public farmland at 20 different sites across China, almost covering all the regions. The total DNA from the evenly sized, collected adult snails was purified using the AxyPrep Genomic DNA Kit (Suzhou, China).

The mitochondrial and ribosomal gene sequencing

The primers for COI and ITS1 were designed using Primer5. COI was amplified using Primers(F) (5’AGAGTGGTGCTGGAACTGGATGAA3’ and 5’CCGGTGCTCTATTAGGGGATGATCAA3’) and Reverse(R) (5’GCTAATATAGCATAAATTATTCCTAAAGTACC3’). The ITS1 region was amplified using primers F (5’CGGATTGGTCTCGGCCCGCCCTTCA3’) and R (5’ATCCACCGCCTAAAGTTGTT3’) also used for the sequencing reactions. For the polymerase chain reactions (PCRs), we used 2μL DNA in the reaction mixture containing 12.5μL Taq Master Mix, 1μL of each primer, and 8.5μL ddH2O. The PCR temperature profile was the following: 94℃ for 3min, 94℃ for 1min, 55℃ for 45s, 72℃ for 1min, 35 cycles, and 72℃ for 10min. An electrophoretic analysis of the PCR fragments was performed on 1% agarose gel. The resulting PCR products were purified using a Gel DNA Extraction Kit (Vazyme) and sequenced with an ABI 3730XL DNA sequencer (Sangon Biotech, Shanghai). Using the National Center for Biotechnology Information (NCBI)-Blast (http://www.ncbi.nlm.nih.gov/BLAST) based on the COI gene against homologous sequences in GenBank, the highest sequence homology (> 97%) was identified for P. canaliclata. The distribution of P. canaliculata was screened, with 6 uniformly sized adult snails selected at each locus for subsequent studies; the locations are shown in Fig. 1 and Table 1. Sequences of 18 locations were submitted to the NCBI (GenBank accession nos. PP461874-PP461981 and PP448193-PP448300).

Figure 1
figure 1

Sampling locations of 18 P. canaliculata populations in China: map created using Arcgis platform. Abbreviations described in Table 1.

Table 1 Sample collecting information of P. canaliculata in this study.

Genetic diversity analysis

We utilized CLUSTAL X 1.8335 to manually align the sequences after removing any redundant ones. Sequences were calculated for parameters such as parsimony informative sites (Pi), single mutation sites(S), variant sites(V), GC content of the sequences, reversion values(R) of the sequences, and distances within and among groups using MEGA X36,37. Genetic diversity was analyzed, including parameters such as the number of polymorphic sites(S), the total number of mutations(η)38, the number of haplotypes(H), nucleotide diversity (π), haplotype diversity (Hd)39, and the average number of nucleotide differences (K)40 using DNASP v. 5.041.

Population genetic structure analyses

We applied multiple complementary approaches to explore the population genetic structure of P. canaliculata based on both mtDNA and rDNA datasets.

Using the Bayesian Markov Chain Monte Carlo (MCMC) model in the STRUCTURE v2.3.3 software42,43. We performed this analysis with K ranging from 2 to 10, a burn-in of 10,000, and a run length of 100,000, each K-value obtained with 10 independent runs. The statistic “K”, indicating the most likely number of subpopulations, was calculated following the Evanno method44, and then the Q matrix under the highest K-value was obtained. Next, to visualize the genetic divergence between populations, a pairwise distance matrix derived from the COI and ITS1 genes Nei’s genetic distance for all samples was calculated, and a neighbor-joining(NJ) tree was constructed using TASSEL v5.0, which was used to analyze kinship45. In addition, we utilized principal component analysis (PCA) in the R environment as a complementary method to identify the genetic structure of P. canaliculata.

F-statistics value (Fst; differentiation index), gene flow (Nm), and exact tests between the populations were calculated based on both mtDNA and rNDA datasets using the Arlequin v. 3.5 software46. and an analysis of molecular variance (AMOVA).

Haplotype relationship analysis

We constructed split networks to reveal relationships among mitochondrial haplotypes using NETWORK v. 10.2.0.047,48and performed a neighbour-net analysis under a Kimura-2-parameter (K2P) distance model after 1000 bootstraps.

Demographic history analysis

The demographic history of P. canaliculata populations in China was studied using mismatch distributions data in DNASP v. 5.041. Neutrality tests based on Tajima's D and Fu's Fs parameters were performed, with all the parameters evaluated based on 1000 bootstrap replicates.

Result

High level of genetic diversity

There were 445 base pairs in the mtDNA COI sequences, with 232 bp and 114 bp being variable and parsimony informative, respectively. For the COI genes, 30 haplotypes were identified (Supplementary Table S1 online), with 22 distinct haplotypes only present in one individual, the remaining 8 haplotypes present in at least two or more individuals, and Hap9 being the most prevalent haplotype in all populations. The majority of the mutation sites were found at position 1 (140, 136, and 135 at position 1, 2, 3), according to our analysis of the polymorphisms at each of the codon's positions (1, 2, or 3) (Supplementary Table S3 online). There were 118 single-ton variable sites. In addition, the rDNA ITS1 sequences were 469 base pairs, with 89 bp and 56 bp being, respectively, variable being parsimony informative. There were 61 haplotypes found for the ITS1 genes (Supplementary Table S2 online), of which 52 were distinct haplotypes found in just one individual, while the other 9 haplotypes were found in at least two or more individuals. Hap1 was the most common and ubiquitous haplotype across the populations. The majority of the mutation sites were found at position 3 (146, 144 and 147 at position 1, 2, 3), according to our analysis of the polymorphisms at each of the codon's positions (Supplementary Table S4 online); 33 of these were singleton variables. The genetic variance in the eighteen P. canaliculata populations in China varied, with the majority showing significant levels of variation. For the COI and ITS1 genes independently, we created haplotype-based network graphs where each node corresponded to a haplotype (Fig. 2).

Figure 2
figure 2

Haplotype-based network graphs of P. canaliculata. (a) based on mtDNA COI gene. (b) based on rDNA ITS1 gene.

In terms of COI, the average haplotype diversity (Hd) across all the populations was 0.877, with an average nucleotide diversity (π) of 0.04304 and a mean number of nucleotide differences (K) of 14.719 (Table 2). Regarding ITS1, the mean haplotype diversity (Hd) across all the populations was 0.924, with a mean nucleotide diversity (π) of 0.02336 and an average number of nucleotide differences (K) of 10.161 (Table 2). Additionally, an evaluation of the genetic diversity of P. canaliculata was conducted by excluding unique haplotypes (e.g., Hap3 and Hap5) exclusive to individual P. canaliculata. Upon the removal of these distinctive haplotypes, the average haplotype diversity (Hd) for all the populations in the COI gene dropped to 0.812. The mean nucleotide diversity (π) across all the populations was 0.03600, with an average number of nucleotide differences (K) of 14.938. Similarly, for the ITS1 gene, the average haplotype diversity (Hd) among all the populations decreased to 0.717, the mean nucleotide diversity (π) was 0.02257, and the average count of nucleotide differences (K) was 9.910. These parameters collectively indicate a substantial level of genetic diversity within P. canaliculata.

Table 2 Parameters of genetic diversity of 18 P. canaliculata populations based on mtDNA and rDNA.

Demography history

We determined the population genetic structure of P. canaliculata based on both mtDNA and rDNA datasets.

From the mtDNA data analyses, a Bayesian cluster analysis using STRUCTURE revealed an optimal value of K = 4 was the best fit for the 18 populations in China and a relatively high value of Evanno’s delta (K) (Fig. 3a). The FST value among all the geographical groups was 0.45296 (Table 3), indicating a high level of genetic differentiation among different geographic populations, alongside genetic exchange (Fig. 4a). The Exact test showed that 67.32% of the populations (Table S5 online) did not support the random mating group hypothesis (P < 0.05), suggesting that reproductive isolation has occurred between some of the geographic populations. As a result, a possible cryptic species of P. canaliculata was found based on the COI gene alone. The results of the NJ phylogenetic tree (Fig. S1a online) and PCA (Fig. 3b) were consistent with the population stratification obtained from the STRUCTURE software (Fig. 3c). Therefore, four more subgroups were identified (Fig. 3a): the first subgroup included the JSYZ, WBWH, JGZTR, GXNN, GDGZ, and HNDZ populations; the second subgroup contained the ZJHZ, JXNC, JXSR, FJFZ, and CQ populations; the third subgroup contained the JSYC, SCCD, and YNYX populations; and the fourth subgroup comprised the JSSZ, FJXM, GZGY, and ZJZJ populations.

Figure 3
figure 3

Four P. canaliculata subgroups in China based on mtDNA data analysis. (a) Graph showing the number of inferred clusters (K) using Evanno’s delta (K) method. (b) Principal components analysis plot for P. canaliculata populations based on mtDNA genotypes; two-dimensional scales are used to reveal population stratification. (c) Four P. canaliculata subgroups in China based on STRUCTURE analysis. Abbreviations of the populations are described in Table S1online.

Table 3 Analysis of molecular variance (AMOVA) of COI gene and ITS1 gene for 108 individuals in 18 P. canaliculata populations.
Figure 4
figure 4

Heatmaps of genetic differentiation (Fst, lower left triangles) and gene flow (Nm, top right triangles) of different P. canaliculata populations in China using GraphPad Prism v8.0.2. (a) Analyzed using mtDNA COI gene. (b) Analyzed using rDNA ITS1 gene. Note: *P < 0.05, **P < 0.01, ***P < 0.001, ∞ infinity, green squares indicate Nm > 1.

From the rDNA data analyses, a Bayesian cluster analysis using STRUCTURE revealed an optimal value of K = 7 was the best fit for the 18 populations in China (Fig. 5a), and a relatively high Evanno’s delta (K) of K = 3 (Fig. 5a). Our TASSEL analyses also confirmed these three subgroups (Fig. S1b online). In the PCA, the populations from the second subgroup were located at the tips and connected the first and third subgroups (Fig. 5b). The FST value among all the geographic populations was 0.13573(Table 3), occurring primarily within populations, with the genetic variation within populations being caused by differences in the haplotypes among them. There was less genetic differentiation and more gene flow among different geographic populations (Fig. 4b), suggesting sufficient genetic exchange but some genetic differentiation between populations. The Exact test showed that 77.22% of the populations (Table S6 online) supported the random mating group hypothesis (P > 0.05), suggesting that the majority of the populations were not reproductively segregated from one another, and, therefore, there may be no cryptic P. canaliculata species found based on the ITS1 gene alone. Three more subgroups were identified (Fig. 5c): the first subgroup included the FJXM, FJFZ, JXNC, HNDZ, SCCD, GDGZ, GZGY, and GXNN populations; the second subgroup contained the JXSR, ZJZJ, ZJHZ, JSYC, JSSZ, JSYZ, GZTR, and CQ populations; and the third subgroup contained the HBWH and YNYX populations.

Figure 5
figure 5

Three P. canaliculata subgroups in China based on rDNA data analysis. (a) Graph showing the number of inferred clusters(K) using Evanno’s delta(K) method. (b) Principal components analysis plot in the populations of 108 accessions, two-dimensional scales were used to reveal population stratification. (c) Population structure derived from STRUCTURE software. Each parallel row represents one individual and each colored segment in each column represents the percentage of the individual in the population. Abbreviations of the populations are described in Table S1 online.

Demography history

To uncover the demography history of P. canaliculata in China, neutrality tests were conducted using Tajima’s D and Fu’s Fs statistics. Based on the mtDNA COI gene, the values of Tajima’s D for all the samples were positive (D = 0.12648), and Fu’s Fs statistic was positive (Fs = 3.37497) but not significant (P > 0.05). Tajima’s D and Fu’s Fs statistics for the FJFZ, GZGY, GZTR, HBWH, and ZJZJ populations were 0, indicating that these 5 populations conformed to a neutral evolution model. Meanwhile, Tajima’s D and Fu’s Fs statistics for the GDGZ, JSSZ, JSYZ, YNYX, and ZJHZ populations were positive but not significant(P > 0.05), which suggested that these 6 populations may have been in equilibrium or contraction (Table 4). Based on the rDNA ITS1 gene, the values of Tajima’s D for all the samples were positive (D = -0.02777), while Fu’s Fs statistic was negative (Fs = 0.76997) but not significant (P > 0.05). Tajima’s D and Fu’s Fs statistics for the GXNN population were 0, indicating that conformed to a neutral evolution model. Instead, Tajima’s D and Fu’s Fs statistics for the GDGZ, JSYC, JXSR, and CQ populations were negative but not significant (P > 0.05), suggesting that these four populations may have been in an expansionary phase. Finally, Tajima’s D and Fu’s Fs statistics for the FJFZ, FJXM, GZTR, HNDZ, JXNC, and ZJHZ populations were positive but not significant (P > 0.05), indicating that these 6 populations may have been in equilibrium or contraction (Table 4). Based on our analysis, the multimodal mismatch distribution of all P. canaliculata samples (Fig. 6) may indicate that the P. canaliculata populations in China fit a neutral evolution model or that our samples covered several divergent populations.

Table 4 Neutraility test based on mtDNA and rDNA.
Figure 6
figure 6

Mismatch distribution of P. canaliculata. (a) based on mtDNA COI gene. (b) based on rDNA ITS1 gene.

Discussion

For the COI gene, the genetic distance between groups of different geographical populations was within 7.0%, and the results of the molecular variance analysis (AMOVA) showed that the Fst value was 0.45296 (P = 0.0000), the Fst statistic at the population level showed a degree of genetic differentiation among the 18 populations, that is, values greater than 0.05 (P < 0.05) and 0.25 (P < 0.05) between 71.24 and 64.71% of the populations, respectively, and less than 0.05 between 13.73% of the populations, but the difference was not significant (P > 0.05). Previously, Balloux and Lugon-Moulin49 established that Fst values greater than 0.05 indicate the presence of genetic differentiation, with larger values indicating greater genetic differentiation. As a result, the values observed in our study indicated that a very great degree of genetic differentiation had occurred among the different geographical populations, with FJFZ and GXNN populations having the largest Fst value, followed by ZJHZ, ZJZJ, GZTR, and CQ populations. And FJFZ and GXNN populations shared one haplotype, GZTR shared one haplotype with JSYC and JSYZ, ZJHZ shared one haplotype with JSSZ, meanwhile ZJZJ was in the largest haplotype with CQ. In the gene flow results, we found that YNYX had an infinite gene flow with GDGZ, JSYZ, JXNC, and SCCD populations. According to the literature, the Pomacea spp. has been introduced into Yunnan Province multiple times. It was first introduced in 1985 into Dehong Dai and the Jingpo Autonomous Prefecture from the Aquatic Research Institute of Guangzhouand, in the same year, into the Xishuangbanna Dai Autonomous Prefecture as a farmer enrichment project. In 1986, it was brought into Jiangyin Farm, Huangping town, Heqing County50. In 1987, it was introduced into Xiashankou, Yousou town, Eryuan County, Dali Prefecture, subsequently abandoned in the Xihu and Donghu Wetlands and the farmland ditches around Xiashankou51, and brought back into the Dehong Prefecture during a study tour in 1990 to rear two snails, subsequently entered the farmland of Cangyuan County. In 2000, the infestation was heavier in the areas around Lake Erhai52, whose large-scale invasion began in 201053,54, consistent with our results.

For the ITS1 gene, the genetic distance between groups of different geographical populations was within 2.6%, and the results of the molecular variance analysis (AMOVA) showed the Fst value was 0.13573 (P = 0.0000), the Fst values between 24.18 and 17.65% of the populations were greater than 0.05 (P < 0.05) and 0.25 (P < 0.05), respectively, which produced a large genetic differentiation between GXNN and GZGY and other geographic populations, while Fst values was 0 for the GXNN and GZGY populations. According to Li et al.55, the first appearance of the Pomacea spp. in Guizhou Province corresponded to the introduction of fish fry from Guangxi into reservoirs and ponds in Pingtang County in the early 1990s, then spreading to rivers, ditches, and farmlands, while the earliest discovery of these species in Guangxi was in the mid-1980s in Fangchenggang city, and with uncertain origins. Therefore, it can be determined that GXNN and GZGY belong to the same invasion pathway and Fst values of GZGY with JXNC, HNDZ were less than 0.05 with Fst values less than 0 for JXNC with FJFZ, FJXM, GDGZ, SCCD, and HNDZ, therefore, JXNC, FJFZ, FJXM, GDGZ, GXNN, HNDZ, and GZGY were classified into a subgroup. The genetic differentiation between JXNC and JXSR populations was small (Fst = 0.1), and the gene flow large (Nm = 4.5). In contrast, there was greater genetic differentiation between GZGY and GZTR populations (Fst = 0.36667), suggesting different invasion pathways for different geographical populations in Guizhou Province. The Fst values of GZTR and JSSZ, JSYC, and JSYZ in Jiangsu Province were all less than 0.05, and the first discovery of Pomacea spp. in Jiangsu Province was in Zhangpu Town, Kunshan City56, however, the geographical distance between the two places is 1457 km, with no way to achieving population migration by relying on the free dispersal of snails, meaning that there must have been some influence from anthropogenic activities. The results of this study also found that there may have been population expansion in GDGZ, JSYC, JXSR, and CQ, preserving the genetic diversity of P. canaliculata and increasing the possibility of colonization in new environments, and that population dispersal occurred within China.

To sum up, the results obtained based on COI and ITS1 have both similarities and differences due to the varied information they carry. The greatest genetic differentiation occurred in the GXNN population, which was first introduced into Guangdong Province, in mainland China, in 198157, and then into Guangxi and Hainan Provinces from Guangdong Province in the mid-1980s58,59, autonomously spreading to Fujian Province in 198460 In the same year, snails were introduced into Sichuan Province, and in 1987, 1988, and 2004, more specimens were introduced from different areas. In different time periods, snails were introduced from different areas into Yunnan Province61,62. At the end of the 1980s, after the introduction of overseas Pomacea specimens into Jiangxi Province due to a flood, the species spread to the whole province63. During the same period, specimens appeared in Ningbo, in Zhejiang Province, and again in the early 1990s, two introductions which we infer originated from different places64,65. In 1989 it was also introduced in Chongqing and gradually spread66, and in the early 1990s snails were introduced from Guangxi Province into Guizhou Province, where a genetically distinct geographic population (GZTR) also exists55,67. In 1998 it was introduced into Hubei Province, reintroduced in 2014, and spread by flooding in 201668,69, and artificially introduced by GZTR into Jiangsu Province in 200356. From the time of invasion and our results, we obtained that the GXNN, HNDZ, and FJ populations were introduced from Guangzhou; subsequently, SCCD, YNYX, JX, ZJ, and CQ populations were introduced from different regions, and through environmental influences and active dispersal, several populations at different times underwent sufficient genetic exchange.

The results obtained from different genes were analyzed, and it was found that the evolutionary rate based on COI gene was faster and carried more information; the Fst was greater than that obtained by ITS1; and the number of mutation sites (V = 232 and V = 89, respectively) single mutation sites (S = 118 and S = 33, respectively) was larger. A total of 30 haplotypes were based on COI, while 61 were based on ITS1. Although the haplotype diversity of COI gene (Hd = 0.877, P < 0.05) was less than that of ITS1 gene (Hd = 0.924, P < 0.05), the nucleotide diversity of COI (Pi = 0.04304, P < 0.01) was greater than that of ITS1 gene (Pi = 0.02336, P < 0.01). The Exact tests for the COI gene revealed the possible existence of a cryptic P. canaliculata species, which is consistent with the result of Lv et al.70 and Yang et al.71, who named this cryptic species as Pomacea occulta72, whereas Exact tests for ITS1 gene did not support its existence. Further indicating that the information carried by the two genes is different and that mitochondrial DNA evolves at a faster rate.

Overall, COI can better reflect the origin, migration, and dispersal of a species in a certain geographic region and be used for the study of population history and dynamics, genealogical geography, genetic diversity, and phylogenetic classification, among other topics. More than 40 years have passed since the introduction of the Pomacea spp. to mainland China, and its genetic richness should be relatively evenly represented in its various populations rather than being concentrated in one or a few specific populations, which is in line with the following findings that the genetic diversity of P. canaliculata is relatively high in the populations in the lower reaches of the Yangtze River in China32, in Zhejiang Province27, in Suzhou City28, and in Nanchong City29 populations, all of the P. canaliculata populations showed high genetic diversity, and the Chinese population of P. canaliculata was closer to the Argentinean population71, and multiple invasions occurred in the different populations, which could obtain more genotypes from the original gene pool, thus facilitating the P. canaliculata population adaptation and expansion73. We hypothesized that the evolutionary history the Pomacea spp. in China may have also been complexly influenced by some abiotic (e.g., climate) and biotic components (e.g., host plants). We need more studies, both theoretical and experimental, to reach a full understanding of evolution and variation at the molecular level of this specie.