Abstract
Evidence for ancient interspecific gene flow through hybridization has been reported in many animal and plant taxa based on genetic markers. The study of genomic patterns of closely related species with allopatric distributions allows the assessment of the relative importance of vicariant isolating events and past gene flow. Here, we investigated the role of gene flow in the evolutionary history of four closely related freshwater fish species with currently allopatric distributions in western Iberian rivers—Squalius carolitertii, S. pyrenaicus, S. torgalensis and S. aradensis—using a population genomics dataset of 23,562 SNPs from 48 individuals, obtained through genotyping by sequencing (GBS). We uncovered a species tree with two well-differentiated clades: (i) S. carolitertii and S. pyrenaicus; and (ii) S. torgalensis and S. aradensis. By using D-statistics and demographic modelling based on the site frequency spectrum, comparing alternative demographic scenarios of hybrid origin, secondary contact and isolation, we found that the S. pyrenaicus North lineage is likely the result of an ancient hybridization event between S. carolitertii (contributing ~84%) and S. pyrenaicus South lineage (contributing ~16%), consistent with a hybrid speciation scenario. Furthermore, in the hybrid lineage, we identify outlier loci potentially affected by selection favouring genes from each parental lineage at different genomic regions. Our results suggest that ancient hybridization can affect speciation and that freshwater fish species currently in allopatry are useful to study these processes.
Introduction
How populations diverge and ultimately originate new species has always intrigued evolutionary biologists. Speciation is assumed to occur due to a continuous reduction in gene flow until reproductive isolation is achieved and populations maintain phenotypic and genetic distinctiveness (Coyne and Orr 2004). Thus, irrespective of whether speciation occurs in a strictly allopatric scenario or in sympatry (Bush 1975; Schluter 2009), it is important to understand the role of gene flow.
Hybridization, i.e. mating between different species or genetically distinct lineages, can lead to interspecific gene flow. While its prevalence and role in evolution have been discussed for some time (e.g. Anderson and Stebbins 1954; Stebbins 1959; Grant 1981), advances in our ability to sequence and analyse genomes of non-model species (Davey et al. 2011; da Fonseca et al. 2016; Payseur and Rieseberg 2016) have shown that hybridization is more widespread than previously thought, especially in the animal kingdom (reviewed in Taylor and Larson 2019). Hybrids might be unfit in their parentals’ habitat or have reduced viability and fertility due to hybrid incompatibilities (Bateson–Dobzhansky–Muller incompatibilities; Dobzhansky 1937; Muller 1942). However, mixing old alleles into new combinations through hybridization can fuel adaptation and speciation (Rieseberg et al. 2003; Hermansen et al. 2014; Wallbank et al. 2016; Richards and Martin 2017; Svardal et al. 2020; reviewed in Marques et al. 2019b). In this context, determining the timing and mode of gene flow is crucial. If hybridization between two species occurs at the time of origin of a third new reproductively isolated lineage, it is likely a case of hybrid speciation (reviewed in Mallet 2007; Abbott et al. 2013; Vallejo-Marín and Hiscock 2016). Instead, if hybridization occurs without producing a new lineage but results in introgression of genetic material through backcrossing, it is a case of secondary contact.
Due to their outstanding diversification and adaptive radiations, freshwater fish have been widely used as model systems to study speciation (reviewed in Bernardi 2013; Seehausen and Wagner 2014). Different events fuelled this diversity, including transitions from marine to freshwater habitats (Jones et al. 2012; Terekhanova et al. 2014), adaptation to extreme environments (Pfenninger et al. 2015), differentiation along water depth clines (Barluenga et al. 2006; Gagnaire et al. 2013) and changes in the configuration of rivers and lakes over geological time (Sousa-Santos et al. 2019). Freshwater fish also stand out as a vertebrate group with high hybridization rates (Hubbs 1955; Wallis et al. 2017). For example, hybridization events promoted rapid adaptive radiations within lake systems (Meier et al. 2017; Svardal et al. 2020), and several instances of hybridization between native and invasive species are documented (Nolte et al. 2006; Meraner et al. 2013).
In this work, we analyse the role of gene flow during the speciation process of four species of Iberian endemic chubs found in Western Iberian rivers: Squalius carolitertii (Doadrio, 1988), Squalius pyrenaicus (Günther, 1868), Squalius torgalensis (Coelho et al., 1998) and Squalius aradensis (Coelho et al., 1998) (Fig. 1). As obligatory freshwater fish, their evolutionary history is intertwined with the geomorphological rearrangements of river systems. These species are distributed along a latitudinal cline with increasing temperatures and propensity for drought from north to south, reflecting the transition from an Atlantic to a Mediterranean climate type (Gasith and Resh 1999; Jesus et al. 2017). Squalius carolitertii inhabits northern rivers down to the Mondego basin, while Squalius pyrenaicus has a more southern distribution (Tagus, Sado, Guadiana and south-eastern basins) (Coelho et al. 1995, 1998). In contrast, the two other species are confined to small southwestern basins: Squalius torgalensis inhabits the Mira basin and Squalius aradensis inhabits basins in the extreme southwestern area (e.g. Arade) (Coelho et al. 1998). These last two species are ‘critically endangered’ and S. pyrenaicus is ‘endangered’ in the Portuguese Red List (Cabral 2005) and ‘vulnerable’ in the Spanish Red List (Doadrio 2002).
Although these species have allopatric distributions (Fig. 1) associated with different river basins, studies based on mitochondrial and nuclear markers found phylogenetic incongruences between them, which have been interpreted as possible past hybridization but remained unresolved (Brito et al. 1997; Waap et al. 2011; Sousa-Santos et al. 2019; Perea et al. 2020). Thus, this work had three major goals: (i) to characterize the genome-wide patterns of genetic differentiation and reconstruct the species tree for these four Squalius species; (ii) to investigate the role of gene flow in the evolution of these species, particularly to assess whether previously reported incongruent mitochondrial and nuclear phylogenies can be explained by incomplete lineage sorting; and (iii) to perform demographic modelling to compare alternative divergence scenarios, and to date and quantify past gene flow. To achieve these goals, we obtained genome-wide single-nucleotide polymorphisms (SNPs) for the four species through genotyping by sequencing (GBS) and applied methods to quantify genetic differentiation, reconstruct the evolutionary relationship between species and test for evidence of gene flow. Furthermore, we performed demographic modelling to compare alternative models of isolation, hybrid origin and secondary contact, accounting for drift and shared ancestry (i.e. incomplete lineage sorting). We found support for a potential case of hybrid speciation in S. pyrenaicus, uncovering genomic regions that might have been under selection in the hybrid.
Methods
Sampling and sequencing
We sampled 65 individuals from eight locations (Fig. 1), including at least one sampling location from a representative river basin per species. S. carolitertii individuals were collected from the Mondego basin (n = 10). In the northern part of S. pyrenaicus distribution, individuals were collected from the Ocreza river (n = 10) and Canha stream (n = 10), tributaries of the Tagus basin. Specimens were also collected in the Lizandro basin (n = 10). From here on, we use ‘S. pyrenaicus North’ to refer to S. pyrenaicus from Ocreza, Canha and Lizandro. In the southern part of the distribution, S. pyrenaicus was sampled in the Guadiana (n = 2) and Almargem (n = 8) basins, which we refer to as ‘S. pyrenaicus South’. S. torgalensis individuals were collected in the Mira basin (n = 10) and S. aradensis individuals from the Arade basin (n = 5). GPS coordinates of the sampling locations and fishing licences from the Portuguese authority for conservation of endangered species (Instituto de Conservação da Natureza e das Florestas) can be found in Table S1.
Fish were collected by electrofishing (300 V, 4 A), and total genomic DNA was extracted from fin clips using a phenol–chloroform protocol adapted from Taggart et al. (1992) and quantified using a Qubit® 2.0 Fluorometer (Life Technologies). Samples were subjected to a paired-end GBS protocol (adapted from Elshire et al. 2011), performed at Beijing Genomics Institute (www.bgi.com). DNA was sent to the facility mixed with DNAstable Plus (Biomatrica) for preservation at room temperature during shipment. Briefly, upon arrival, DNA was fragmented using the restriction enzyme ApeKI and the fragments were amplified after adaptor ligation (Elshire et al. 2011). The resulting library was sequenced using Illumina HiSeq 2000 with a read length of 91 base pairs (bp).
Multi-species SNP dataset
We assessed sequence quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC (Ewels et al. 2016). We used process_radtags from Stacks v2.2 (Rochette et al. 2019) to trim all reads to 82 bp and discard reads with uncalled bases and low-quality scores, using the default settings for window size (0.15 of the read length) and base quality threshold (Phred score of 10). Given the absence of a reference genome for any of the species in this study, we built a catalog of all loci using the de novo approach from Stacks v2.2 (Rochette et al. 2019). We tested different M and n parameter values (see definition below) for catalog construction using a subset of representative individuals, as recommended by Paris et al. (2017). We selected the parameters for final catalog construction based on the effect of M and n on the number of SNPs (Fig. S1), and on expected low intraindividual diversity (M) and moderate-to-high differentiation between individuals (n), according to previous estimates indicating a divergence time of 14 million years ago (Mya) (Sousa-Santos et al. 2019) and low genetic diversity (Almada and Sousa-Santos 2010). We constructed the final catalog using all individuals, requiring a minimum depth of coverage of 4x (m = 4) for every stack (set of exactly matching reads) and considering as putative alleles of the same locus stacks with a maximum of two mismatches within each individual (M = 2). We then allowed a maximum of four differences between stacks from different individuals (n = 4) to be considered one locus on the catalog. Given the possibility that forward and reverse sequences of the same fragment were treated as different loci, similar reads within the catalog were clustered using CD-HIT-EST from CD-HIT v4.7 (Li and Godzik 2006; Fu et al. 2012) with a word length of 6 and a sequence identity threshold of 0.85.
The reads from each individual were aligned against the catalog using BWA-MEM from BWA v0.7.17-r1188 (Li 2013) with default parameters. We sorted the output alignments and removed unmapped reads using Samtools v1.10 (Li et al. 2009) and evaluated the alignments for each individual using Picard v2.18.13 (http://broadinstitute.github.io/picard/) (Table S2). To call genotypes for each individual at each site and identify SNPs, we used the method implemented in Freebayes v1.2.0 (Garrison and Marth 2012), discarding low-quality reads and bases and without using Hardy–Weinberg equilibrium priors (-p 2 --min-mapping-quality 30 --min-base-quality 20 --hwe-priors-off).
To discard sites and genotypes likely due to sequencing or mapping errors and to maximize the number of individuals with data, we applied extra filters using VCFtools v0.1.15 (Danecek et al. 2011) and BCFtools v1.6 (Li et al. 2009). First, we kept only SNPs present in all sampling sites in at least 50% of individuals. Second, genotypes with a depth of coverage (DP) outside of ¼ to 4 times the individual median DP and SNPs with an excess of heterozygotes when pooling all individuals were removed. Third, individuals with more than 50% missing data were removed. Finally, only SNPs with a minor allele frequency (MAF) of at least 0.01 were kept (MAF ≥ 0.01).
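As an illustration only, the per-individual depth filter described above could be sketched in Python as follows. This is not the pipeline used in the study (the filters were applied with VCFtools and BCFtools); the function name, the use of None for missing genotypes and the inclusive bounds are assumptions made for the example.

```python
from statistics import median


def mask_depth_outliers(depths, low=0.25, high=4.0):
    """Mask genotypes whose depth is outside [low, high] x the individual median.

    depths: per-site DP values for ONE individual (None = already missing).
    Returns a new list in which outlying depths are replaced by None.
    The inclusive bounds are an assumption of this sketch.
    """
    called = [d for d in depths if d is not None]
    if not called:
        return list(depths)
    med = median(called)
    lower, upper = low * med, high * med
    return [d if d is not None and lower <= d <= upper else None
            for d in depths]


# Hypothetical depths for one individual: median = 20.5, so the kept
# range is 5.125 to 82 and the values 2 and 95 are masked.
print(mask_depth_outliers([2, 20, 22, 18, 95, None, 21]))
# [None, 20, 22, 18, None, None, 21]
```

Using the individual median rather than a fixed cut-off lets the same rule apply to samples sequenced at different depths.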
Global patterns of genetic differentiation
To quantify the levels of differentiation between sampling locations, we calculated the pairwise FST using the Hudson estimator (Hudson et al. 1992), assessing the significance using 10,000 permutations. We then investigated fine population structure using individual-based methods. Principal component analysis (PCA) was performed using the package LEA (Frichot and François 2015) in RStudio v1.1.383 and R v3.4.4. We used ADMIXTURE v1.3 (Alexander et al. 2009) to determine the ancestry proportion of each individual from a specified number of clusters (K), testing values of K between 1 and 8, performing 100 independent runs for each K value. We identified the best K value as the one with the lowest fivefold cross-validation error. For 2 ≤ K ≤ 5, we assessed similarity across the 100 replicates using the Greedy algorithm implemented in CLUMPP v1.1.2 (Jakobsson and Rosenberg 2007).
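For readers unfamiliar with the Hudson estimator, a minimal Python sketch is given below. It uses a commonly applied formulation in which the numerator and denominator are summed across SNPs before taking the ratio; whether the analysis here combined SNPs in exactly this way is an assumption, and the function and argument names are hypothetical. The permutation test is not shown.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson FST between two populations, as a ratio of sums over SNPs.

    p1, p2: per-SNP frequencies of the same allele in populations 1 and 2.
    n1, n2: per-SNP numbers of sampled gene copies (must be > 1).
    """
    num = 0.0
    den = 0.0
    for a, na, b, nb in zip(p1, n1, p2, n2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that two copies, one from each population, differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else float("nan")
```

With a fixed difference between populations (frequencies 1 and 0) the estimate is 1, whereas identical frequencies give a slightly negative value because of the sample-size correction.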
Relationship between species and populations
We reconstructed a graph describing the relationships between populations using TreeMix v1.13 (Pickrell and Pritchard 2012), exploring models with no migration and up to two migration events. Since there is only one individual from S. pyrenaicus Guadiana in the final dataset (see ‘Results’), S. pyrenaicus South is represented only by S. pyrenaicus Almargem. The position of the root was not specified, thus the resulting trees are unrooted.
We also constructed a maximum likelihood phylogeny of individuals using IQ-TREE v1.6.12 (Nguyen et al. 2015). We used vcf2phylip v2.0 (Ortiz 2019) to convert the VCF file into PHYLIP format. We used ModelFinder (Kalyaanamoorthy et al. 2017) implemented in IQ-TREE to determine the best substitution model, limiting the search to models with ascertainment bias correction (+ASC). According to the corrected Akaike information criterion (AIC), the best model was GTR + F + ASC + R2, which was used to construct the phylogeny with 5,000 standard non-parametric bootstrap replicates. The resulting best tree was visualized with midpoint-rooting in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
Effect of linked SNPs
To verify whether the results were influenced by potential linkage between SNPs, we produced a dataset minimizing the number of linked SNPs by dividing the catalog into blocks of 200 bp, which is larger than the mean size of GBS loci, and sampling one SNP per block. We selected the SNP with the least missing data per block. Using this ‘single SNP’ dataset, we repeated analyses that could be affected by non-independence of SNPs (PCA, ADMIXTURE and TreeMix).
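The thinning step above can be sketched as follows. This is a hedged illustration, not the script used in the study: the tuple layout, the 0-based positions and the tie-breaking rule (first SNP encountered is kept) are assumptions.

```python
def thin_snps(snps, block_size=200):
    """Keep one SNP per block of `block_size` bp within each catalog locus.

    snps: iterable of (locus_id, position, missing_fraction) tuples, with
          0-based positions within the locus (assumed layout).
    Within each block the SNP with the least missing data is retained.
    """
    best = {}
    for locus, pos, missing in snps:
        key = (locus, pos // block_size)
        if key not in best or missing < best[key][2]:
            best[key] = (locus, pos, missing)
    return sorted(best.values())


# Hypothetical input: two SNPs share the first block of locus "L1".
snps = [("L1", 10, 0.3), ("L1", 50, 0.1), ("L1", 210, 0.4), ("L2", 5, 0.2)]
print(thin_snps(snps))
# [('L1', 50, 0.1), ('L1', 210, 0.4), ('L2', 5, 0.2)]
```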
Detection of introgression between S. carolitertii and S. pyrenaicus
To test for past introgression between S. carolitertii and S. pyrenaicus, we used the D-statistic (Durand et al. 2011) with four different combinations of populations, using both S. aradensis and S. torgalensis as outgroups. First, we tested either S. carolitertii or S. pyrenaicus South as the possible sources of introgression into S. pyrenaicus North. To test for a geographical cline in admixture proportions in S. pyrenaicus North, we tested if the northernmost sampling site of S. pyrenaicus (Ocreza) showed more shared alleles with S. carolitertii than the other S. pyrenaicus North (Lizandro and Canha). Finally, we also considered all S. pyrenaicus North as sister populations and S. carolitertii as the potential source of introgressed genes.
As in the TreeMix analysis, we used S. pyrenaicus Almargem as the S. pyrenaicus South population. Significance of D-statistic values was assessed using a block jackknife approach (Soraggi et al. 2018), dividing the dataset into 25 blocks with a similar number of SNPs. These computations were performed in RStudio v1.1.383 and R v3.4.4 using custom scripts.
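A frequency-based D-statistic with a delete-one-block jackknife can be sketched in Python as below. The custom R scripts used in the study are not reproduced here; the function names, the choice of contiguous equal-sized blocks and the sign convention (positive D indicating excess allele sharing between P2 and P3) are assumptions of this example.

```python
import math


def d_statistic(p1, p2, p3, po):
    """D from per-site derived-allele frequencies in P1, P2, P3 and outgroup."""
    abba = baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, po):
        abba += (1 - a) * b * c * (1 - o)   # P2 and P3 share the derived allele
        baba += a * (1 - b) * c * (1 - o)   # P1 and P3 share the derived allele
    total = abba + baba
    return (abba - baba) / total if total > 0 else float("nan")


def block_jackknife(p1, p2, p3, po, n_blocks=25):
    """Return (D, standard error, Z) using a delete-one-block jackknife.

    Inputs are lists of equal length with at least `n_blocks` sites; blocks
    are contiguous and contain a similar number of sites.
    """
    n_sites = len(p1)
    bounds = [round(i * n_sites / n_blocks) for i in range(n_blocks + 1)]

    def drop(values, lo, hi):
        return values[:lo] + values[hi:]

    d_full = d_statistic(p1, p2, p3, po)
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = bounds[i], bounds[i + 1]
        leave_one_out.append(d_statistic(drop(p1, lo, hi), drop(p2, lo, hi),
                                         drop(p3, lo, hi), drop(po, lo, hi)))
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(var)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z
```

A Z-score far from zero would then be converted to a p-value under a standard normal approximation; that step is omitted here.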
If introgression between populations occurred in the relatively recent past, we would expect individuals within the same population to show different degrees of introgression. To test this hypothesis, we calculated the D-statistic for each S. pyrenaicus North individual for the same scenarios as above.
Demographic modelling of the divergence of S. carolitertii and S. pyrenaicus
We compared alternative divergence scenarios of S. pyrenaicus and S. carolitertii using the composite likelihood method based on the site frequency spectrum (SFS) implemented in fastsimcoal2 v2.6 (Excoffier et al. 2013).
First, we compared the fit of three models with and without admixture to the observed SFS (see Fig. S2 for parameters inferred in each model). We compared a model that assumes S. pyrenaicus North received a contribution α from S. pyrenaicus South and 1-α from S. carolitertii at the time of split (‘admixture’—Fig. S2A) with models without admixture (‘no admixture C-PN’ and ‘no admixture PN-PS’—Fig. S2B, C). Importantly, models without admixture account for incomplete lineage sorting as they consider a shared common ancestor (different split times), and specific effective sizes for each population. To ensure models have the same number of parameters, in models without admixture, we allowed for the possibility of a bottleneck associated with the split of S. pyrenaicus North, mimicking a founder effect. Models B and C without the possibility of bottlenecks were also considered. Second, we compared three models to distinguish between a hybrid origin of S. pyrenaicus North (‘hybrid origin’—Fig. S2D) and two models of secondary contact (‘C-PN + sec. contact PS-PN’ and ‘PN-PS + sec. contact C-PN’—Fig. S2E, F).
To obtain the observed MAF spectrum without missing data, we built the pairwise 2D-SFS downsampling three individuals from S. carolitertii and S. pyrenaicus South, and four individuals from S. pyrenaicus North from Ocreza and Canha. We excluded Lizandro since it is an independent river basin, and its removal maximized the number of SNPs in the SFS. To sample individuals without missing data, we used the initial dataset without the MAF filter and divided it into blocks of 200 bp. For each block, we sampled individuals from each population with the least missing data keeping only sites with data across all individuals. Since the SFS is affected by the depth of coverage, only genotypes with DP > 10x were used (Nielsen et al. 2011). This resulted in an observed SFS with 8,758 SNPs. For each model, we performed 100 independent runs with 100 cycles, approximating the SFS with 100,000 coalescent simulations. Since we used the SFS without monomorphic sites, all parameters were scaled in relation to a reference effective size (Ne), which was arbitrarily set as the Ne of S. carolitertii. To convert relative divergence times into absolute time in Mya, we assumed a generation time of 3 years (Magalhães et al. 2003; Almada and Sousa-Santos 2010).
We then used the AIC to compare models. Since composite likelihoods computed with linked sites and pairwise 2D-SFS tend to overestimate the likelihood due to non-independence of data points (Excoffier et al. 2013), we built a 3D-SFS minimizing linked sites. We obtained 1,000 bootstrap joint 3D-SFS MAF by resampling either all sites or one site from each GBS locus with more than two SNPs. We obtained the expected 3D-SFS for each model based on 100,000 coalescent simulations according to the parameters that maximized the composite likelihood. To account for noise of the approximation, we averaged the expected 3D-SFS of ten runs. We then computed the likelihood of the 1,000 bootstrap unlinked 3D-SFS for each model. For each bootstrap replicate, we computed the AIC and relative likelihood of each model, following Excoffier et al. (2013).
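The AIC-based comparison can be illustrated with the short Python sketch below. It assumes that likelihoods are reported in log10 units (as in the figure legends here) and that relative likelihoods are computed as normalized Akaike weights; the function names and the example values are hypothetical.

```python
import math


def aic_from_log10(log10_lik, n_params):
    """AIC = 2k - 2 ln(L), converting a log10 likelihood to natural log."""
    return 2 * n_params - 2 * log10_lik * math.log(10)


def relative_likelihoods(aics):
    """Normalized relative likelihoods (Akaike weights) for a set of models."""
    best = min(aics)
    weights = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log10 likelihoods for three models with nine parameters each.
aics = [aic_from_log10(l, 9) for l in (-1000.0, -1000.5, -1002.0)]
print(relative_likelihoods(aics))
```

In a bootstrap setting, the same calculation would be repeated for each replicate SFS and the relative likelihoods summarized across replicates.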
Detection of outlier loci in S. pyrenaicus North
To identify variation from the parental species (S. carolitertii or S. pyrenaicus South) that was likely selected in the introgressed S. pyrenaicus North, we scanned for outlier loci using pairwise FST (Hudson et al. 1992). We assumed that selection favouring alleles from a target parental (P1) in the potential hybrid population (H) would increase effective introgression from P1 to H, leading to SNPs where (i) the target parental and hybrid have low differentiation (FST(H,P1) < quantile 0.05 FST(H,P1)); (ii) the hybrid and the other parental (P2) have high differentiation (FST(H,P2) > quantile 0.95 FST(H,P2)); and (iii) the differentiation between the hybrid and the other parental is higher than between the two parental populations (FST(H,P2) > quantile 0.95 FST(P1,P2) between parentals), because under neutrality we would expect the same level of differentiation between the minor parental (S. pyrenaicus South) and both the major parental (S. carolitertii) and the hybrid lineage (S. pyrenaicus North). Focusing on S. carolitertii and S. pyrenaicus (n = 34), we applied a filter of MAF > 0.05 and kept only sites with more than 20% of data per population. We used only catalog loci with more than two SNPs and a mean distance between SNPs higher than nine. To determine the significance quantile thresholds, we either (i) simulated 10,000 neutral blocks of 160 bp according to the inferred demographic history under model D-‘hybrid origin’, ensuring that the average number of SNPs per block was identical to the observed data, or (ii) used the quantiles from the empirical distribution of FST. These two approaches lead to similar results when pooling all S. pyrenaicus North locations. The analysis based on empirical distributions was repeated considering individuals from the three sampling locations of S. pyrenaicus North separately to identify outlier regions shared between them. Calculations were done in RStudio v1.1.383 and R v3.4.4 using custom scripts.
Finally, we blasted the identified outlier catalog loci against the NCBI database using the default settings of BLASTN (Zhang et al. 2000) and BLASTX (Altschul 1997) v2.11.0+.
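The three FST criteria used to flag outlier loci can be sketched in Python as below, using the empirical-quantile variant. This is an illustrative reconstruction rather than the custom R code used in the study; the quantile interpolation rule, the per-locus input lists and the function names are assumptions.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def outliers_favouring_p1(fst_h_p1, fst_h_p2, fst_p1_p2):
    """Indices of loci consistent with selection for P1 alleles in the hybrid H.

    Each argument is a list of per-locus FST values for one population pair.
    A locus is flagged when (i) FST(H,P1) is below its 0.05 quantile,
    (ii) FST(H,P2) is above its 0.95 quantile, and (iii) FST(H,P2) is above
    the 0.95 quantile of FST(P1,P2).
    """
    low_h_p1 = quantile(fst_h_p1, 0.05)
    high_h_p2 = quantile(fst_h_p2, 0.95)
    high_parentals = quantile(fst_p1_p2, 0.95)
    return [i for i, (a, b) in enumerate(zip(fst_h_p1, fst_h_p2))
            if a < low_h_p1 and b > high_h_p2 and b > high_parentals]
```

Swapping the roles of the two parental populations gives the complementary scan for alleles from the other parental lineage.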
Results
Multi-species SNP dataset
After the initial processing, we obtained a mean of 5,759,264 high-quality reads per individual (Table S2). The catalog comprised 524,911 loci with a mean length of 160 bp. Mean depth of coverage per sample was 59.7x. The filters applied resulted in 17 individuals with more than 50% of missing data, which were removed. The final dataset comprised 23,562 SNPs with 36.98% missing data and 48 individuals: S. carolitertii (n = 10), S. pyrenaicus Ocreza (n = 6), S. pyrenaicus Lizandro (n = 4), S. pyrenaicus Canha (n = 7), S. pyrenaicus Almargem (n = 6), S. pyrenaicus Guadiana (n = 1), S. torgalensis (n = 9) and S. aradensis (n = 5).
Global patterns of genetic differentiation
Overall, the highest levels of genetic differentiation were between the two southwestern species (S. torgalensis and S. aradensis) and the two more widely distributed species (S. carolitertii and S. pyrenaicus) (FST ≥ 0.294; Table 1). The lowest levels of genetic differentiation were between pairs of S. pyrenaicus North sampling locations and between S. pyrenaicus North and S. carolitertii (FST ≤ 0.139). The latter were lower than those between S. pyrenaicus North and S. pyrenaicus South (FST ≥ 0.171).
The first three principal components (PCs) of the PCA explain ~25% of the variation (Fig. S3). PC1 (Figs. 2A.1 and S3C) explains ~14% of the variance and clearly separates two groups: (i) S. carolitertii and S. pyrenaicus, and (ii) S. aradensis and S. torgalensis. This is consistent with the higher pairwise FST values obtained between these two groups. PC2 explains ~6% of the variance and separates S. aradensis from S. torgalensis (Fig. 2A). Finally, PC3 explains ~5% and separates S. pyrenaicus South from a cluster formed by S. carolitertii and S. pyrenaicus North (Figs. 2A.2 and S3C).
A Principal component analysis. A.1 PC1 and PC2; A.2 PC2 and PC3. Each point corresponds to one individual. B Individual ancestry proportions inferred with ADMIXTURE. B.1 K = 2; B.2 K = 3; B.3 K = 4; B.4 K = 5. Each vertical bar corresponds to one individual and the proportion of each colour corresponds to the estimated ancestry proportion from a given cluster. Individuals are grouped from north to south. Sampling locations separated by black lines.
Regarding the ADMIXTURE analysis, K = 2 achieved the lowest cross-validation error (Fig. S4). The results were consistent across the 100 runs for 2 ≤ K ≤ 5 (mean G′ = 1.000 for K = 2, K = 4 and K = 5 and mean G′ ≥ 0.997 for K = 3). For the best K value (K = 2), one of the clusters includes the two southwestern species (S. torgalensis and S. aradensis), while the other comprises S. carolitertii and S. pyrenaicus (Fig. 2B.1). This clustering mimics the separation created by PC1 in the PCA. For K = 3, a third cluster formed by S. pyrenaicus South appears (Fig. 2B.2). For K = 4, the two southwestern species (S. torgalensis and S. aradensis) are differentiated into different clusters (Fig. 2B.3). Interestingly, in these last two K values, one of the individuals from the Canha sampling location of S. pyrenaicus North shows a substantial proportion from the S. pyrenaicus South cluster. Two S. pyrenaicus North individuals also show a small proportion of the S. aradensis cluster in K = 4, which may be due to ancestral polymorphism. K = 5 does not fit the data as well (mean cross-validation error much higher than for 2 ≤ K ≤ 4, Fig. S4), and fails to separate S. carolitertii from S. pyrenaicus North.
Relationship between species and populations
The unrooted population tree obtained with TreeMix (Fig. 3) shows a clear separation between two groups: (i) S. aradensis and S. torgalensis, and (ii) S. carolitertii and S. pyrenaicus. Within the group of S. carolitertii and S. pyrenaicus, we found two main lineages: S. pyrenaicus South (here represented by S. pyrenaicus Almargem) and S. carolitertii and S. pyrenaicus North, which is consistent with pairwise FST, PCA and ADMIXTURE results. One of the two migration edges inferred is between S. pyrenaicus North and S. pyrenaicus South.
Regarding the maximum likelihood phylogeny of individuals (Fig. S5), the topology is consistent with the one obtained with TreeMix. It strongly supports the paraphyly of S. pyrenaicus with respect to S. carolitertii, with two well-supported subclades with individuals of: (i) S. carolitertii and S. pyrenaicus North, and (ii) S. pyrenaicus South.
Effect of linked SNPs
The dataset with one SNP per block of 200 bp comprised 2,607 SNPs with ~32.4% missing data. The results of PCA, ADMIXTURE and TreeMix analysis were overall consistent with those from the initial dataset of 23,562 SNPs (Figs. S6–S8).
Detection of introgression between S. carolitertii and S. pyrenaicus
When considering S. carolitertii as the potential source of introgression, we found significant positive D-statistic values indicating an excess of sites where S. pyrenaicus North shares the same allele with S. carolitertii, independently of the outgroup used (Fig. 4A). This could indicate that S. carolitertii and S. pyrenaicus North share a more recent common ancestor, in agreement with TreeMix and phylogeny results (Figs. 3 and S5). However, when considering S. pyrenaicus South as the potential source of introgression, we also found significant positive D-statistic values, indicating an excess of sites where both S. pyrenaicus share the same allele (Fig. 4B), in agreement with the TreeMix migration edge (Fig. 3). This indicates that the relationship between the species is not described by a bifurcating tree with a more recent common ancestry of S. carolitertii and S. pyrenaicus North, as we would expect D = 0 if that were the case.
A S. carolitertii as the potential source of introgression. B S. pyrenaicus South as the potential source of introgression. For each topology, the results are shown for the different S. pyrenaicus North sampling locations (Ocreza, Lizandro, Canha) used. ‘S. carol’ stands for S. carolitertii and ‘S.pyr Almargem’ stands for S. pyrenaicus Almargem. Results obtained with each outgroup are represented by a different symbol (circles for S. torgalensis and triangles for S. aradensis). Full symbols represent significant D values (p < 0.05).
We find no evidence of a geographical cline of admixture with S. carolitertii along the S. pyrenaicus North distribution (Fig. S9), with its northernmost sampling location (Ocreza, location 2 on Fig. 1) showing no signs of sharing more alleles with S. carolitertii than with other S. pyrenaicus North populations. Thus, the absence of a geographical cline of admixture and the consistent signal across different S. pyrenaicus North sampling locations in Fig. 4 suggest a scenario of introgression between S. carolitertii and the ancestor of S. pyrenaicus North prior to the divergence of the different S. pyrenaicus North populations.
In case of recent introgression, we would expect to find differences in the D-statistic among individuals from a given population. However, when we computed the D-statistic by individual (Fig. S10 and Table S4), we found limited variation among different individuals from the same location, suggesting that introgression events likely pre-date the divergence of populations.
Demographic modelling of divergence of S. carolitertii and S. pyrenaicus
First, we compared models with admixture (Fig. 5A) to models of bifurcating trees without gene flow accounting for incomplete lineage sorting, i.e. different topologies, times of split and effective population sizes (Fig. 5B, C). The model with admixture reached a higher likelihood (Table S5) than the best bifurcating tree model with the topology supported by the individual-based phylogeny (Fig. S5), i.e. with a more recent ancestor of S. carolitertii and S. pyrenaicus North (Fig. 5B). This result was not affected by the number of parameters, as bifurcating tree models with or without an extra effective size parameter to allow for a potential bottleneck (see ‘Methods’) had lower likelihood values than the admixture model (Tables S5 and S6B). This indicates that S. pyrenaicus North likely experienced introgression and that its relationship with S. carolitertii and S. pyrenaicus South cannot be explained by a simple bifurcating tree, in agreement with TreeMix and D-statistics (Figs. 3 and 4). Next, we explored models to distinguish between scenarios of hybrid origin of S. pyrenaicus North (Fig. 5D) and secondary contact (Fig. 5E, F). The parameter estimates were similar and consistent across models with admixture (Fig. 5A, D–F), indicating (i) a major contribution of 80–84% from S. carolitertii into S. pyrenaicus North; (ii) a relatively old divergence of the parental lineages (~15 times older than admixture times); and (iii) relatively larger effective sizes in ancestral lineages (Fig. 5A, D–F). To convert relative parameters into years, we considered different effective sizes of S. carolitertii, which was used as the reference (Table S6). Model D achieved the highest average relative likelihood, followed by Model F, based on AIC of 1,000 bootstrap replicates with one SNP per GBS locus (Table S5), irrespective of the threshold used to pool SFS entries (Fig. S11B and Table S7). This was also found with all SNPs (Fig. S11A.1), but with a high uncertainty when using a minimum entry threshold of 5 (Fig. S11A.2 and Table S7). The best model (D) is compatible with a ‘hybrid origin’, inferring that at the time of admixture S. pyrenaicus North received ~84% from S. carolitertii (major parental) and the remaining ~16% from S. pyrenaicus South (minor parental).
The name given to each model is indicated above the schematic representation, as well as the difference to maximum likelihood (dif. to max. likelihood), which is the difference in log10 units between the estimated likelihood and the maximum likelihood if there was a perfect fit to the observed site frequency spectrum. The closer to zero (less negative values), the better the fit. The estimated admixture proportion is the parameter α. Models A–C have eight parameters and therefore are directly comparable. Models D–F have nine parameters and are also directly comparable. All inferred parameters are indicated in relation to the Ne of S. carolitertii (NeC). TA: divergence from the ancestral; TP: divergence of the northern S. pyrenaicus; TS: secondary contact. Population sizes are not to scale. See Supplementary Fig. S11 for relative likelihoods based on AIC, and Supplementary Tables S5–S7 for estimated likelihoods and parameter values.
Detection of outlier loci in S. pyrenaicus North
We identified 12 outlier loci with low FST between S. carolitertii and S. pyrenaicus North and high FST between S. pyrenaicus North and S. pyrenaicus South (Fig. 6A—red triangles), corresponding to alleles from S. carolitertii which might have been selected in S. pyrenaicus North. However, when separating S. pyrenaicus North into the three sampling locations, none of these was shared between them (Fig. 6B). Moreover, none produced relevant BLAST hits.
A FST outliers identified in S. pyrenaicus North. Red triangles correspond to alleles from S. carolitertii potentially selected in S. pyrenaicus North and blue triangles to alleles from S. pyrenaicus South potentially selected in S. pyrenaicus North. B Outlier loci from S. carolitertii shared between the different S. pyrenaicus North sampling locations. C Outlier loci from S. pyrenaicus South shared between the different S. pyrenaicus North sampling locations. An asterisk indicates the catalog locus identified as vitellogenin.
We also identified four loci with high FST between S. carolitertii and S. pyrenaicus North and low FST between S. pyrenaicus North and S. pyrenaicus South (Fig. 6A—blue triangles). These correspond to alleles from S. pyrenaicus South that might have been selected in S. pyrenaicus North. One of the identified catalog loci, shared between the individuals from the locations ‘Ocreza’ and ‘Canha’ and also identified when all locations were pooled together (Fig. 6C), revealed an alignment match to vitellogenin, mostly from other cyprinid fish (Table S8). The other three outlier loci produced no relevant BLAST hits.
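The logic of the two outlier classes (low FST to one parental and high FST to the other) can be sketched as follows. This is an illustration under stated assumptions, not the detection procedure used in the study: the quantile thresholds, the nearest-rank quantile, the function names and the toy FST values are all hypothetical.

```python
import math

def empirical_quantile(values, q):
    """Value at quantile q (between 0 and 1) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def classify_loci(fst_c_pn, fst_pn_ps, low_q=0.1, high_q=0.9):
    """Label each locus by comparing two pairwise FST values.

    fst_c_pn  : per-locus FST, S. carolitertii vs S. pyrenaicus North
    fst_pn_ps : per-locus FST, S. pyrenaicus North vs S. pyrenaicus South
    low_q, high_q : hypothetical quantile cut-offs (assumptions)
    """
    low_c = empirical_quantile(fst_c_pn, low_q)
    high_c = empirical_quantile(fst_c_pn, high_q)
    low_s = empirical_quantile(fst_pn_ps, low_q)
    high_s = empirical_quantile(fst_pn_ps, high_q)
    labels = []
    for a, b in zip(fst_c_pn, fst_pn_ps):
        if a <= low_c and b >= high_s:
            labels.append("carolitertii-like")      # red triangles in Fig. 6A
        elif a >= high_c and b <= low_s:
            labels.append("pyrenaicus-South-like")  # blue triangles in Fig. 6A
        else:
            labels.append("background")
    return labels

# Toy per-locus FST values for illustration only (not data from the study).
fst_c_pn = [0.00, 0.30, 0.35, 0.40, 0.32, 0.38, 0.33, 0.36, 0.31, 0.95]
fst_pn_ps = [0.98, 0.50, 0.55, 0.52, 0.58, 0.51, 0.56, 0.53, 0.57, 0.02]

labels = classify_loci(fst_c_pn, fst_pn_ps)
for i, label in enumerate(labels):
    if label != "background":
        print(f"locus {i}: {label}")
```

Applying the same function separately to each sampling location and intersecting the resulting locus sets would give the kind of sharing summarised in Fig. 6B, C.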
Discussion
Contrasting role of gene flow in the two species pairs
The genomic analysis presented here indicates a species tree comprising two main groups: (i) S. torgalensis and S. aradensis and (ii) S. carolitertii and S. pyrenaicus, in agreement with phylogenies previously obtained for mitochondrial (Brito et al. 1997; Sanjur et al. 2003; Mesquita et al. 2007) and different nuclear markers (Almada and Sousa-Santos 2010; Waap et al. 2011; Sousa-Santos et al. 2019; Perea et al. 2020). The results also show strong genetic differentiation between the two species pairs (Table 1 and Figs. 2, 3), as formerly reported (Coelho et al. 1995; Almada and Sousa-Santos 2010), reflecting an old divergence time, estimated to be ~14 Mya based on phylogenies of mitochondrial and nuclear genes (Perea et al. 2010; Sousa-Santos et al. 2019).
Gene flow appears to have played a very different role in the evolutionary history of the two species pairs. We found no evidence of gene flow involving S. torgalensis and S. aradensis. While we could not calculate D-statistics due to our sampling scheme, the results of the analyses including the four species (Table 1 and Figs. 2, 3) did not support interspecific gene flow involving S. torgalensis and/or S. aradensis. Thus, for these two species with more restricted distribution areas, our results are compatible with divergence of isolated lineages in allopatry. In contrast, for the two species with wider distributions (S. carolitertii and S. pyrenaicus), our results indicate that gene flow was paramount in their evolutionary history, specifically in the S. pyrenaicus North lineage (Figs. 4 and 5). This contrasting role of gene flow between the two more restricted and the two more widely distributed species might reflect their historical dispersal routes into different river basins. Our results are consistent with the hypothesis that the ancestor of the two southwestern species (S. torgalensis and S. aradensis) became isolated in southwestern Iberia around 11.6–7.2 Ma, while S. carolitertii and S. pyrenaicus only reached their current ranges in the last 2.6 Ma (Sousa-Santos et al. 2019), the latter as a consequence of geological changes leading to the current conformation of the major Iberian river basins (Cunha et al. 2005; Casas-Sainz and de Vicente 2009; Pais et al. 2012; Antón et al. 2014). Thus, the older isolation of river basins in southwestern Iberia probably created less opportunity for gene flow involving S. torgalensis and S. aradensis. This is in agreement with results from other freshwater fish species (Sousa et al. 2010) and amphibians (Martínez-Solano et al. 2006; Gonçalves et al. 2009), indicating higher genetic differentiation in this geographic region.
Evidence of hybridization between S. carolitertii and S. pyrenaicus
Our demographic modelling results indicate that the S. pyrenaicus North lineage is likely the result of hybridization – best model estimates indicate 84% contribution from S. carolitertii and 16% from S. pyrenaicus South (Fig. 5D and Fig. S11). A hybridization scenario is also supported by other analyses (Figs. 3, 4). For example, TreeMix infers a common ancestor for S. carolitertii and S. pyrenaicus North, together with a migration event between northern and southern S. pyrenaicus, indicating allele sharing that cannot be explained by a bifurcating tree (Fig. 3). The uncovered hybridization provides an explanation for the incongruences between mitochondrial and nuclear markers previously reported: while in mtDNA phylogenies all S. pyrenaicus had a more recent common ancestor (Brito et al. 1997; Mesquita et al. 2007), in nuclear phylogenies individuals of river basins corresponding to S. pyrenaicus North shared a more recent ancestor with S. carolitertii (Waap et al. 2011; Sousa-Santos et al. 2019; Perea et al. 2020).
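The D-statistic (ABBA-BABA test; Durand et al. 2011) behind Fig. 4 can be illustrated with a minimal sketch based on derived-allele frequencies for a four-population tree (((P1, P2), P3), O). The function name and the toy frequencies are hypothetical, and the assignment of lineages to P1–P3 used in the study is not reproduced here. Significance is usually assessed with a block jackknife, which this sketch omits.

```python
def d_statistic(p1, p2, p3, p4):
    """Patterson's D from per-site derived-allele frequencies.

    p1, p2, p3 : frequencies in the ingroup populations P1, P2 and P3
    p4         : frequencies in the outgroup O
    D > 0 indicates excess allele sharing between P2 and P3;
    D < 0 indicates excess allele sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, f4 in zip(p1, p2, p3, p4):
        abba += (1 - f1) * f2 * f3 * (1 - f4)
        baba += f1 * (1 - f2) * f3 * (1 - f4)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)

# Toy example 1: one ABBA site and one BABA site, so the two patterns balance.
d_balanced = d_statistic([0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0])

# Toy example 2: two ABBA sites and one BABA site, so P2 shares more with P3.
d_excess = d_statistic([0.0, 0.0, 1.0], [1.0, 1.0, 0.0],
                       [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])

print(d_balanced, d_excess)
```

Under a strictly bifurcating tree with incomplete lineage sorting only, ABBA and BABA patterns are expected to be equally frequent, so D is expected to be zero. A consistent deviation from zero is the signal interpreted as introgression.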
The D-statistics results indicate that the hybridization event predates the differentiation of the three S. pyrenaicus North sampling locations, since the introgression signal is shared between them (Fig. 4). Given that Lizandro (Fig. 1, location 3) is an independent river basin without connections to the Tagus basin (where the other two S. pyrenaicus North locations were sampled), the most likely explanation is that the hybridization event predates the colonization of this independent river. Thus, the combination of the D-statistics with the best demographic model indicates that S. pyrenaicus North likely originated through hybridization and subsequently dispersed to its current range. Geological data indicate that prior to reaching their current conformation, the major Iberian river basins were endorheic (i.e. without flow to the sea, draining to lake-like water masses), with the transition to exorheism occurring in the last 2.6 Ma (Cunha et al. 2005; Casas-Sainz and de Vicente 2009; Pais et al. 2012; Antón et al. 2014). Thus, it is possible that this transition increased the chances of contact between previously isolated lineages. Another possibility is that these ‘lake-like’ endorheic basins allowed for lineages to mix, with subsequent dispersal of resulting hybrids.
The origin of S. pyrenaicus North through a single hybridization event followed by isolation, as modelled in Fig. 5D, is consistent with a scenario of hybrid speciation. Most authors agree that homoploid hybrid speciation (HHS), or recombinational speciation, occurs when hybridization plays a key role in establishing a new reproductively isolated lineage without changes in ploidy (Anderson and Stebbins 1954; Grant 1981; Mallet 2007; Mavárez and Linares 2008; Abbott et al. 2013). Schumer et al. (2014) establish three criteria that must be simultaneously fulfilled to demonstrate HHS: (1) hybrid lineages must be reproductively isolated from the parentals; (2) the genome of the hybrids must show evidence of hybridization; (3) reproductive isolation must be demonstrated to be a consequence of hybridization. Our results demonstrate that S. pyrenaicus North meets the second criterion. In addition, it has a non-overlapping distribution with S. carolitertii and S. pyrenaicus South, thus being currently reproductively isolated from its parentals through their allopatric distributions, and ploidy is maintained between the three lineages (2n = 50) (Collares-Pereira et al. 1998). Yet, we currently lack information on the possible role of hybridization in the establishment of reproductive isolation, which is needed to meet all three criteria. Indeed, very few biological systems clearly meet all the criteria outlined above. A notable example is Helianthus sunflowers, in which hybridization between two species produced independent lineages reproductively isolated from the parentals, with clear evidence of hybridization in their genomes, that can occupy dry and saline environments that the parents cannot as a direct result of hybridization (Rieseberg et al. 1995, 1996, 2003). In freshwater fish, some taxa have been proposed as being the result of HHS (e.g. DeMarais et al. 1992), although prior to the definition of the above criteria. Regarding S. pyrenaicus North, we note that models of secondary contact reach similar likelihood values to the hybrid speciation model when using all SNPs with pairwise 2D-SFS (Fig. 5 and Table S5). Thus, secondary contact remains a possibility and future work based on more extensive sampling and a larger number of markers should aim at clarifying this issue.
Potential implications of hybridization in S. pyrenaicus North
We estimated relative times indicating that the parental S. carolitertii and S. pyrenaicus South lineages diverged much earlier than the hybridization event that originated the S. pyrenaicus North lineage (Fig. 5D–F and Table S6C). In this scenario, S. pyrenaicus North might have benefited from the combination of different alleles from the two parentals, already filtered by selection over many generations. Yet, the accumulation of genetic differentiation between the two parental lineages may also have led to genetic incompatibilities when the two genomes mixed. Our estimates indicate an asymmetric and larger contribution from S. carolitertii at the time of hybridization (~84%). Such asymmetries in admixture proportions have been reported in other freshwater fish, albeit in cases of recent hybridization, e.g. swordtail fish (Schumer et al. 2018) and sticklebacks (Marques et al. 2019a). Here, this asymmetry could reflect the relative proportions of S. carolitertii and S. pyrenaicus South in the initial reproductive pool that originated S. pyrenaicus North. Another possibility is that incompatibilities might have been resolved towards the S. carolitertii lineage. Notably, at mitochondrial markers, S. pyrenaicus North is more closely related to S. pyrenaicus South (Brito et al. 1997; Mesquita et al. 2007), despite the larger nuclear contribution of S. carolitertii. This contrasts with the pattern uncovered, for example, in Italian sparrows (Passer italiae), a species of potential hybrid origin with mtDNA more similar to its major parental (Elgvin et al. 2017). The mitochondrial pattern of S. pyrenaicus North could be due to neutral stochasticity, but it might also reflect cytonuclear incompatibilities or behavioural mechanisms favouring mating between S. carolitertii males and S. pyrenaicus South females. For example, in Darwin's finches, female F1 hybrids resulting from crosses between Geospiza fortis females and G. scandens males preferentially mated with G. scandens males, which led to autosomal and mitochondrial introgression of G. fortis into G. scandens but little introgression on the Z chromosome (Lamichhaney et al. 2020).
Despite the asymmetry in parental contributions, there is variation in differentiation along the genome between the hybrid and parental lineages (Figs. 6 and S12). This variation reflects the stochasticity of neutral processes (e.g. drift and gene flow), but it also provides indirect evidence for the action of selection removing incompatibilities and/or favouring specific parental alleles (adaptive introgression). We identified four outlier regions compatible with selection of alleles from the minor parental lineage in S. pyrenaicus North, one of them corresponding to vitellogenin. Vitellogenin is an egg yolk precursor that is usually only expressed in females during oogenesis (Tyler et al. 1996), although its expression can also be triggered in males of different fish species in response to exogenous oestrogen exposure (Harries et al. 1997; Flammarion et al. 2000; Van Den Belt et al. 2003). Given its role in egg yolk formation, vitellogenin is of great importance to egg-laying organisms like fish. One possibility is that this genomic region was selected and maintained from the minor parental (S. pyrenaicus South) due to adaptive introgression, although the underlying selective pressures remain unclear. However, we cannot discard the hypothesis that genetic incompatibilities were involved and thus constrained the maintenance of this region from S. pyrenaicus South.
While we could not annotate other outlier genomic regions in S. pyrenaicus North, the geographic distribution of this lineage at an intermediate position between the Atlantic and Mediterranean climate types provides a future opportunity to investigate the potential adaptive role of hybridization. In particular, increasing temperatures and propensity for seasonal drought from north to south (the latter leading to isolation of fish in deeper water sections during summer, when large portions of the riverbed dry out) (Gasith and Resh 1999; Magalhães et al. 2003; Jesus et al. 2017) might impose strong selective pressures. This is consistent with previous studies that found (i) signatures of positive selection based on dN/dS on circadian genes likely associated with thermal adaptation in Squalius species (Moreno et al. 2021), and (ii) differences in gene expression between species inhabiting Atlantic (S. carolitertii) and Mediterranean climates (S. torgalensis) when exposed to increasing temperatures (Jesus et al. 2016, 2017). Thus, its hybrid origin might confer some advantage to S. pyrenaicus North at intermediate environmental conditions, as described in other organisms. In Saccharomyces yeast, laboratory crosses between S. cerevisiae and S. paradoxus produced transgressive F2 hybrids that slightly increased their environmental range and outcompeted their parentals across concentration clines of several environmental stressors (Stelkens et al. 2014). In freshwater fish, hybrids between low and high elevation swordtail species experience higher fitness at intermediate conditions (Culumber et al. 2012).
Implications for conservation
S. pyrenaicus populations, like other endemic Iberian cyprinids, have been impacted by habitat degradation, due to the construction of dams and water extraction for irrigation, and the introduction of exotic species. Consequently, S. pyrenaicus is listed as ‘endangered’ in the Portuguese Red List (Cabral 2005) and as ‘vulnerable’ in the Spanish Red List (Doadrio 2002). Our work provides evidence of two distinct lineages, each with its own evolutionary history, within the analysed S. pyrenaicus distribution: one inhabiting the Tagus and small Atlantic coastal basins (S. pyrenaicus North) and one distributed along the Guadiana and small Mediterranean basins (S. pyrenaicus South). Although a more extensive fine scale sampling is necessary for a clear delimitation of management units, we recommend that studies aiming to define such conservation units should consider sampling schemes covering the distribution area of the two groups detected.
Conclusions
Our study revealed a species tree with two well-differentiated groups in which gene flow had a contrasting role: (i) S. torgalensis and S. aradensis and (ii) S. carolitertii and S. pyrenaicus. We find no evidence of past gene flow in the first, consistent with divergence in allopatry. Contrastingly, we uncover past hybridization between S. carolitertii and S. pyrenaicus, originating the S. pyrenaicus North lineage, with ~84% contribution from S. carolitertii and ~16% from S. pyrenaicus South. Our estimates indicate that a hybridization event originating a new lineage, consistent with a scenario of hybrid speciation, is more likely than a secondary contact scenario. However, the results do not fully exclude the possibility of secondary contact or more complex scenarios, e.g. further changes in the past effective sizes. In the future, whole genome sequencing and more extensive sampling would be helpful to explore the sources of the unbalanced estimated admixture proportions in S. pyrenaicus North and evaluate the adaptive potential of this hybridization, given the contrasting environments of the parental lineages. This work adds to the growing list of examples where hybridization has been uncovered, describing a study system suitable for future work on the processes and consequences of hybridization.
Data availability
The catalog, datasets and scripts used are available from the Dryad Digital Repository https://doi.org/10.5061/dryad.v15dv41wk. Sequencing files (bam files for all individuals) are deposited on the Sequence Read Archive (SRA) under BioProject PRJNA751527.
References
Abbott R, Albach D, Ansell S, Arntzen JW, Baird SJE, Bierne N et al. (2013) Hybridization and speciation. J Evol Biol 26:229–246
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664
Almada V, Sousa-Santos C (2010) Comparisons of the genetic structure of Squalius populations (Teleostei, Cyprinidae) from rivers with contrasting histories, drainage areas and climatic conditions based on two molecular markers. Mol Phylogenet Evol 57:924–931
Altschul S (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Anderson E, Stebbins GL (1954) Hybridization as an evolutionary stimulus. Evolution 8:378
Antón L, De Vicente G, Muñoz-Martín A, Stokes M (2014) Using river long profiles and geomorphic indices to evaluate the geomorphological signature of continental scale drainage capture, Duero basin (NW Iberia). Geomorphology 206:250–261
Barluenga M, Stölting KN, Salzburger W, Muschick M, Meyer A (2006) Sympatric speciation in Nicaraguan crater lake cichlid fish. Nature 439:719–723
Van Den Belt K, Verheyen R, Witters H (2003) Comparison of vitellogenin responses in zebrafish and rainbow trout following exposure to environmental estrogens. Ecotoxicol Environ Saf 56:271–281
Bernardi G (2013) Speciation in fishes. Mol Ecol 22:5487–5502
Brito RM, Briolay J, Galtier N, Bouvet Y, Coelho MM (1997) Phylogenetic relationships within genus Leuciscus (Pisces, Cyprinidae) in Portuguese fresh waters, based on mitochondrial DNA cytochrome b sequences. Mol Phylogenet Evol 8:435–442
Bush GL (1975) Modes of animal speciation. Annu Rev Ecol Syst 6:339–364
Cabral M, Almeida J, Almeida P, Dellinger T, Ferrand de Almeida N, Oliveira M et al. (2005) Livro vermelho dos vertebrados de Portugal. Instituto da Conservação da Natureza, Lisboa
Casas-Sainz AM, de Vicente G (2009) On the tectonic origin of Iberian topography. Tectonophysics 474:214–235
Coelho MM, Bogutskaya NG, Rodrigues JA, Collares-Pereira MJ (1998) Leuciscus torgalensis, and L. aradensis, two new cyprinids for Portuguese fresh waters. J Fish Biol 52:937–950
Coelho MM, Brito RM, Pacheco TR, Figueiredo D, Pires AM (1995) Genetic variation and divergence of Leuciscus pyrenaicus and L. carolitertii (Pisces, Cyprinidae). J Fish Biol 47:243–258
Collares-Pereira MJ, Próspero MI, Biléu RI, Rodrigues EM (1998) Leuciscus (Pisces, Cyprinidae) karyotypes: transect of Portuguese populations. Genet Mol Biol 21:63–69
Coyne JA, Orr HA (2004) Speciation. Sinauer Associates, Inc., Sunderland, Massachusetts
Culumber ZW, Shepard DB, Coleman SW, Rosenthal GG, Tobler M (2012) Physiological adaptation along environmental gradients and replicated hybrid zone structure in swordtails (Teleostei: Xiphophorus). J Evol Biol 25:1800–1814
Cunha PP, Martins AA, Daveau S, Friend PF (2005) Tectonic control of the Tejo river fluvial incision during the late Cenozoic, in Ródão—Central Portugal (Atlantic Iberian border). Geomorphology 64:271–298
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA et al. (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510
DeMarais BD, Dowling TE, Marsh PC, Douglas ME, Minckley WL (1992) Origin of Gila seminuda (Teleostei: Cyprinidae) through introgressive hybridization: Implications for evolution and conservation. Proc Natl Acad Sci USA 89:2747–2751
Doadrio I (ed) (2002) Atlas y libro rojo de los peces continentales de España. Dirección General de Conservación de la Naturaleza, Museo Nacional de Ciencias Naturales, Madrid
Dobzhansky T (1937) Genetics and the origin of species. Columbia University Press, New York
Durand EY, Patterson N, Reich D, Slatkin M (2011) Testing for ancient admixture between closely related populations. Mol Biol Evol 28:2239–2252
Elgvin TO, Trier CN, Tørresen OK, Hagen IJ, Lien S, Nederbragt AJ et al. (2017) The genomic mosaicism of hybrid speciation. Sci Adv 3:e1602996
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:1–10
Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048
Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PLoS Genet 9:e1003905
Flammarion P, Brion F, Babut M, Garric J, Migeon B, Noury P et al. (2000) Induction of fish vitellogenin and alterations in testicular structure: preliminary results of estrogenic effects in chub (Leuciscus cephalus). Ecotoxicology 9:127–135
da Fonseca RR, Albrechtsen A, Themudo GE, Ramos-Madrigal J, Sibbesen JA, Maretty L et al. (2016) Next-generation biology: sequencing and data analysis approaches for non-model organisms. Mar Genom 30:3–13
Frichot E, François O (2015) LEA: an R package for landscape and ecological association studies. Methods Ecol Evol 6:925–929
Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152
Gagnaire PA, Pavey SA, Normandeau E, Bernatchez L (2013) The genetic architecture of reproductive isolation during speciation-with-gene-flow in lake whitefish species pairs assessed by rad sequencing. Evolution 67:2483–2497
Garrison E, Marth G (2012) Haplotype-based variant detection from short-read sequencing. http://arxiv.org/abs/1207.3907
Gasith A, Resh VH (1999) Streams in Mediterranean climate regions: abiotic influences and biotic responses to predictable seasonal events. Annu Rev Ecol Syst 30:51–81
Gonçalves H, Martínez-Solano I, Pereira RJ, Carvalho B, García-París M, Ferrand N (2009) High levels of population subdivision in a morphologically conserved Mediterranean toad (Alytes cisternasii) result from recent, multiple refugia: Evidence from mtDNA, microsatellites and nuclear genealogies. Mol Ecol 18:5143–5160
Grant V (1981) Plant speciation, 2nd edn. Columbia University Press, New York
Harries JE, Sheahan DA, Jobling S, Matthiessen P, Neall P, Sumpter JP et al. (1997) Estrogenic activity in five United Kingdom rivers detected by measurement of vitellogenesis in caged male trout. Environ Toxicol Chem 16:534–542
Hermansen JS, Haas F, Trier CN, Bailey RI, Nederbragt AJ, Marzal A et al. (2014) Hybrid speciation through sorting of parental incompatibilities in Italian sparrows. Mol Ecol 23:5831–5842
Hubbs CL (1955) Hybridization between fish species in nature. Syst Zool 4:1–20
Hudson RR, Slatkin M, Maddison WP (1992) Estimation of levels of gene flow from DNA sequence data. Genetics 132:583–589
Jakobsson M, Rosenberg NA (2007) CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801–1806
Jesus TF, Grosso AR, Almeida-Val VMF, Coelho MM (2016) Transcriptome profiling of two Iberian freshwater fish exposed to thermal stress. J Therm Biol 55:54–61
Jesus TF, Moreno JM, Repolho T, Athanasiadis A, Rosa R, Almeida-Val VMF et al. (2017) Protein analysis and gene expression indicate differential vulnerability of Iberian fish species under a climate change scenario (S Rutherford, Ed.). PLoS ONE 12:e0181325
Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J et al. (2012) The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484:55–61
Kalyaanamoorthy S, Minh BQ, Wong TKF, Von Haeseler A, Jermiin LS (2017) ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589
Lamichhaney S, Han F, Webster MT, Grant BR, Grant PR, Andersson L (2020) Female-biased gene flow between two species of Darwin’s finches. Nat Ecol Evol 4:979–986
Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. http://arxiv.org/abs/1303.3997
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
Magalhães MF, Schlosser IJ, Collares-Pereira MJ (2003) The role of life history in the relationship between population dynamics and environmental variability in two Mediterranean stream fishes. J Fish Biol 63:300–317
Mallet J (2007) Hybrid speciation. Nature 446:279–283
Marques DA, Lucek K, Sousa VC, Excoffier L, Seehausen O (2019a) Admixture between old lineages facilitated contemporary ecological speciation in Lake Constance stickleback. Nat Commun 10:1–14
Marques DA, Meier JI, Seehausen O (2019b) A combinatorial view on speciation and adaptive radiation. Trends Ecol Evol 34:531–544
Martínez-Solano I, Teixeira J, Buckley D, García-París M (2006) Mitochondrial DNA phylogeography of Lissotriton boscai (Caudata, Salamandridae): evidence for old, multiple refugia in an Iberian endemic. Mol Ecol 15:3375–3388
Mavárez J, Linares M (2008) Homoploid hybrid speciation in animals. Mol Ecol 17:4181–4185
Meier JI, Marques DA, Mwaiko S, Wagner CE, Excoffier L, Seehausen O (2017) Ancient hybridization fuels rapid cichlid fish adaptive radiations. Nat Commun 8:1–11
Meraner A, Venturi A, Ficetola GF, Rossi S, Candiotto A, Gandolfi A (2013) Massive invasion of exotic Barbus barbus and introgressive hybridization with endemic Barbus plebejus in Northern Italy: where, how and why? Mol Ecol 22:5295–5312
Mesquita N, Cunha C, Carvalho GR, Coelho MM (2007) Comparative phylogeography of endemic cyprinids in the south-west Iberian Peninsula: evidence for a new ichthyogeographic area. J Fish Biol 71:45–75
Moreno JM, Jesus TF, Coelho MM, Sousa VC (2021) Adaptation and convergence in circadian‐related genes in Iberian freshwater fish. BMC Ecol Evol 21:38
Muller H (1942) Isolating mechanisms, evolution, and temperature. Biol Symp 6:71–125
Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ (2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274
Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451
Nolte AW, Freyhof J, Tautz D (2006) When invaders meet locally adapted types: rapid moulding of hybrid zones between sculpins (Cottus, Pisces) in the Rhine system. Mol Ecol 15:1983–1993
Ortiz EM (2019) vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis
Pais J, Cunha PP, Pereira D, Legoinha P, Dias R, Moura D et al. (2012) The Paleogene and Neogene of Western Iberia (Portugal): a Cenozoic record in the European Atlantic domain. Springer, Berlin, Heidelberg, p 1–138
Paris JR, Stevens JR, Catchen JM (2017) Lost in parameter space: a road map for stacks. Methods Ecol Evol 8:1360–1373
Payseur BA, Rieseberg LH (2016) A genomic perspective on hybridization and speciation. Mol Ecol 25:2337–2360
Perea S, Böhme M, Zupancic P, Freyhof J, Sanda R, Ozuluğ M et al. (2010) Phylogenetic relationships and biogeographical patterns in Circum-Mediterranean subfamily Leuciscinae (Teleostei, Cyprinidae) inferred from both mitochondrial and nuclear data. BMC Evol Biol 10:265
Perea S, Sousa-Santos C, Robalo J, Doadrio I (2020) Multilocus phylogeny and systematics of Iberian endemic Squalius (Actinopterygii, Leuciscidae). Zool Scr 49:440–457
Pfenninger M, Patel S, Arias-Rodriguez L, Feldmeyer B, Riesch R, Plath M (2015) Unique evolutionary trajectories in repeated adaptation to hydrogen sulphide-toxic habitats of a neotropical fish (Poecilia mexicana). Mol Ecol 24:5446–5459
Pickrell JK, Pritchard JK (2012) Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet 8:e1002967
Richards EJ, Martin CH (2017) Adaptive introgression from distant Caribbean islands contributed to the diversification of a microendemic adaptive radiation of trophic specialist pupfishes (J Mallet, Ed.). PLOS Genet 13:e1006919
Rieseberg LH, Van Fossen C, Desrochers AM (1995) Hybrid speciation accompanied by genomic reorganization in wild sunflowers. Nature 375:313–316
Rieseberg LH, Raymond O, Rosenthal DM, Lai Z, Livingstone K, Nakazato T et al. (2003) Major ecological transitions in wild sunflowers facilitated by hybridization. Science 301:1211–1216
Rieseberg LH, Sinervo B, Linder CR, Ungerer MC, Arias DM (1996) Role of gene interactions in hybrid speciation: Evidence from ancient and experimental hybrids. Science 272:741–745
Rochette NC, Rivera-Colón AG, Catchen JM (2019) Stacks 2: analytical methods for paired-end sequencing improve RADseq-based population genomics. Mol Ecol 28:4737–4754
Sanjur OI, Carmona JA, Doadrio I (2003) Evolutionary and biogeographical patterns within Iberian populations of the genus Squalius inferred from molecular data. Mol Phylogenet Evol 29:20–30
Schluter D (2009) Evidence for ecological speciation and its alternative. Science 323:737–741
Schumer M, Rosenthal GG, Andolfatto P (2014) How common is homoploid hybrid speciation? Evolution 68:1553–1560
Schumer M, Xu C, Powell DL, Durvasula A, Skov L, Holland C et al. (2018) Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science 360:656–660
Seehausen O, Wagner CE (2014) Speciation in freshwater fishes. Annu Rev Ecol Evol Syst 45:621–651
Soraggi S, Wiuf C, Albrechtsen A (2018) Powerful inference with the D-statistic on low-coverage whole-genome data. G3 Genes, Genomes, Genet 8:551–566
Sousa-Santos C, Jesus TF, Fernandes C, Robalo JI, Coelho MM (2019) Fish diversification at the pace of geomorphological changes: evolutionary history of western Iberian Leuciscinae (Teleostei: Leuciscidae) inferred from multilocus sequence data. Mol Phylogenet Evol 133:263–285
Sousa V, Penha F, Pala I, Chikhi L, Coelho MM (2010) Conservation genetics of a critically endangered Iberian minnow: evidence of population decline and extirpations. Anim Conserv 13:162–171
Stebbins GL (1959) The role of hybridization in evolution. Proc Am Philos Soc 103:231–251
Stelkens RB, Brockhurst MA, Hurst GDD, Miller EL, Greig D (2014) The effect of hybrid transgression on environmental tolerance in experimental yeast crosses. J Evol Biol 27:2507–2519
Svardal H, Quah FX, Malinsky M, Ngatunga BP, Miska EA, Salzburger W et al. (2020) Ancestral hybridization facilitated species diversification in the Lake Malawi cichlid fish adaptive radiation. Mol Biol Evol 37:1100–1113
Taggart JB, Hynes RA, Prodöhl PA, Ferguson A (1992) A simplified protocol for routine total DNA isolation from salmonid fishes. J Fish Biol 40:963–965
Taylor SA, Larson EL (2019) Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat Ecol Evol 3:170–177
Terekhanova NV, Logacheva MD, Penin AA, Neretina TV, Barmintseva AE, Bazykin GA et al. (2014) Fast evolution from precast bricks: genomics of young freshwater populations of threespine stickleback Gasterosteus aculeatus. PLoS Genet 10:e1004696
Tyler CR, Van Der Eerden B, Jobling S, Panter G, Sumpter JP (1996) Measurement of vitellogenin, a biomarker for exposure to oestrogenic chemicals, in a wide variety of cyprinid fish. J Comp Physiol - B Biochem Syst Environ Physiol 166:418–426
Vallejo-Marín M, Hiscock SJ (2016) Hybridization and hybrid speciation under global change. New Phytol 211:1170–1187
Waap S, Amaral AR, Gomes B, Coelho MM (2011) Multi-locus species tree of the chub genus Squalius (Leuciscinae: Cyprinidae) from western Iberia: New insights into its evolutionary history. Genetica 139:1009–1018
Wallbank RWR, Baxter SW, Pardo-Diaz C, Hanly JJ, Martin SH, Mallet J et al. (2016) Evolutionary novelty in a butterfly wing pattern through enhancer shuffling. PLoS Biol 14:1–16
Wallis GP, Cameron-Christie SR, Kennedy HL, Palmer G, Sanders TR, Winter DJ (2017) Interspecific hybridization causes long-term phylogenetic discordance between nuclear and mitochondrial genomes in freshwater fishes. Mol Ecol 26:3116–3127
Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203–214
Acknowledgements
We thank Tiago F. Jesus for help with sample preparation. We also thank three anonymous reviewers and the associate editor for constructive comments on previous versions of the manuscript. This work was funded by strategic projects UID/BIA/00329/2013 (2015–2018) and UIDB/00329/2020 granted to cE3c by the Portuguese National Science Foundation, Fundação para a Ciência e a Tecnologia (FCT). SLM is funded by an FCT scholarship (SFRH/BD/145153/2019). VCS was funded by FCT (CEECIND/02391/2017 and CEECINST/00032/2018/CP1523/CT0008) and by the EU H2020 program (Marie Skłodowska-Curie grant 799729). We thank the INCD (https://incd.pt/) for use of their computing infrastructure, which is funded by FCT and FEDER (project 01/SAICT/2016 n° 022153).
Author information
Contributions
MMC and VCS conceived and designed the study; MPM performed laboratory work; SLM and VCS analyzed the data; SLM, VCS, and MMC wrote the manuscript. All authors contributed to the drafts and gave final approval for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mendes, S.L., Machado, M.P., Coelho, M.M. et al. Genomic data and multi-species demographic modelling uncover past hybridization between currently allopatric freshwater species. Heredity 127, 401–412 (2021). https://doi.org/10.1038/s41437-021-00466-1
DOI: https://doi.org/10.1038/s41437-021-00466-1