Short-term evolutionary implications of an introgressed size-determining supergene in a vulnerable population

Lesturgie, Pierre; Denton, John S. S.; Yang, Lei; Corrigan, Shannon; Kneebone, Jeff; Laso-Jadart, Romuald; Lynghammar, Arve; Fedrigo, Olivier; Mona, Stefano; Naylor, Gavin J. P.

doi:10.1038/s41467-025-56126-z

Article
Open access
Published: 27 January 2025

Nature Communications volume 16, Article number: 1096 (2025)

Abstract

The Thorny Skate (Amblyraja radiata) is a vulnerable species displaying a discrete size-polymorphism in the northwest Atlantic Ocean (NWA). We conducted whole genome sequencing of samples collected across its range. Genetic diversity was similar at all sampled sites, but we discovered a ~31 megabase bi-allelic supergene associated with the size polymorphism, with the larger size allele having introgressed within the last ~160,000 years. While both Gulf of Maine (GoM) and Canadian (CAN) populations exhibit the size polymorphism, we detected a significant deficit of heterozygotes at the supergene and longer stretches of homozygosity in the GoM population. This suggests inbreeding driven by assortative mating for size in GoM but not in CAN. Coalescent-based demographic modelling reveals strong migration between regions maintaining genetic variability in the recombining genome, preventing speciation between morphs. This study highlights short-term context-dependent evolutionary consequences of a size-determining supergene, providing new insights for the management of vulnerable species.

Introduction

Chromosomal inversions prevent recombination, thereby maintaining the specific allelic arrangement within the genes they encompass, leading to enhanced fitness^1,2,3,4,5. Suites of genes in inversions are often referred to as supergenes, as they can lead to the Mendelian inheritance of complex phenotypes^6,7. While the existence of supergene systems has long been acknowledged⁸, the accessibility of whole genome sequencing (WGS) data has significantly increased their detection. Supergene-associated traits have been described in several systems, including sociality in ants^{9,10,11,12,13}, migratory behavior and adaptation to salinity and temperature in Atlantic cod^14,15,16, and wing morphology and pattern coloration in butterflies^17,18. Supergenes are maintained in populations by an interplay between demographic and selective processes⁶, the relative contributions of which can be difficult to disentangle without careful reconstruction of the demographic history of populations based on neutral markers. Varying selection in space and time is often invoked as one of the mechanisms to explain the long-term persistence of supergenes¹. Yet, the limited variability resulting from recombination suppression alongside complex phenotypes may impede swift adaptive responses to rapid environmental shifts. This could critically impact survival potential in threatened species, underscoring the potential role that supergenes might play in driving short-term evolutionary dynamics of species.

The thorny skate (Amblyraja radiata) is a demersal species found on the continental shelf from South Carolina in the Northwest Atlantic (NWA) via Greenland and Iceland to the British Isles and the Barents Sea in the Northeast Atlantic (NEA, Fig. 1)¹⁹. In the NWA, the thorny skate has long been taken as target catch or bycatch in commercial fisheries. Their abundance has declined steeply in waters off Canada and the US Gulf of Maine over the last 50 years^19,20. This decline prompted stringent conservation measures that eliminated directed harvest in large geographic areas. However, the Gulf of Maine population has shown no signs of recovery despite two decades of reduced fishing mortality^19,20. The factors impeding population recovery in the Gulf of Maine remain unknown^21,22,23,24. Complicating this conservation issue are two regionally sympatric size morphs present in the NWA, each displaying characteristic growth curves^{25,26,27,28,29}. The large morph is restricted to the NWA and reaches a maximum size of 105 cm total length (TL). The small morph occurs over the species’ entire range and reaches a maximum size of 72 cm TL²⁸. To date, the genetic underpinnings of this morphological polymorphism have eluded detection. Mitochondrial data suggest weak population genetic structure and isolation by distance across the entire range^30,31 and microsatellite data show regional genetic differentiation between the NWA and NEA³². However, neither data type indicates genetic differentiation between large and small morphotypes^31,32,33.

**Fig. 1: Whole genome sample scheme and population structure of thorny skates.**

Here we sought to understand the genomic and/or environmental origins of the size polymorphism in the NWA and any implications for the conservation of the species. We first established a high-quality reference genome for the thorny skate based on a combination of long read (PacBio), Hi-C, Bionano, and Illumina short read sequencing in collaboration with the Vertebrate Genomes Project (https://vertebrategenomesproject.org). Using this high-quality reference assembly, we carried out whole-genome population resequencing (Illumina ~24x coverage) of 49 individuals from representative locations spanning the species’ distribution range (Fig. 1A). This approach allowed us to discover a ~31 megabase (Mb) supergene on chromosome 2, likely contained in a chromosome inversion, that was polymorphic in the NWA. We expanded the scope of our study and screened for the genotype of the supergene in 465 individuals across the range of the species to characterize the distribution of the supergene’s alleles and investigate its association with size. To better understand the origin, maintenance, and allelic distribution of the supergene, we reconstructed the demographic history of the species based on analysis of millions of putatively neutral genome-wide Single Nucleotide Polymorphisms (SNPs). PacBio long read sequencing of an individual of the closely related species A. hyperborea revealed the supergene to be present in at least one of the congeners. When this information was combined with the extant geographic distribution of both alleles and the historical reconstruction of demography, we were able to infer that the supergene was most likely transmitted to A. radiata through cross-species introgression. Our findings showed a significantly higher level of inbreeding, driven by assortative mating for size polymorphism, in the highly vulnerable Gulf of Maine population. We further discuss how assortative mating could hamper its recovery, presenting a particular challenge for thorny skate conservation and management in the NWA.

Results

Summary statistics

Mapping rate, total number of reads, average coverage, average depth and average mapping quality computed for each individual before filtering is presented in Table S1. After filtering and binning, we performed population structure analyses using ~1.15 to ~1.19 million SNPs. Genetic diversity estimates were computed from the Site Frequency Spectrum (SFS) in sampling locations with N ≥ 5 specimens, based on ~9.98 to ~13.93 million SNPs after filtering (Table 1). Additional description of summary statistics can be found in the supplementary material.

Table 1 Summary statistics for each sampling location

Population structure

We used Principal Component Analysis (PCA) of SNP variation to explore population structure. The first axis (~14% of total variance) revealed two clusters, corresponding to the NEA and NWA regions, respectively (Fig. 1B). The second axis (~2% of total variance) separated NWA individuals sampled from the Gulf of Maine (GoM) from those sampled off Newfoundland (CAN). The sparse non-Negative Matrix Factorization (sNMF) algorithm³⁴ further identified K = 2 as the most likely number of ancestral populations, with individual ancestry coefficients perfectly matching the clusters detected by the PCA (Fig. 1B, S1). We further ran both the PCA and sNMF within each cluster separately. The first two PC axes explained only ~5% and ~4% of the total variance in NWA and NEA respectively (Fig. S2), and in both datasets K = 1 was the most likely number of ancestral populations (Fig. S1). However, both algorithms revealed signatures of fine scale population structure driven by the clinal distribution of genetic variation within both regions (Fig. S1 and S2). All pairwise F_ST comparisons were statistically significant (p ≤ 0.001) and generally consistent with the results provided by the clustering algorithms. Values ranged from 0.002 to 0.004 in intra-cluster comparisons (i.e., within NEA and within NWA) and from 0.173 to 0.189 when comparing NEA vs NWA sampling sites (Fig. 1C).
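As an illustration of how a pairwise F_ST value of this kind can be obtained from allele frequencies, the following is a minimal sketch using Hudson's estimator computed as a ratio of averages across SNPs. The estimator choice, the function name `hudson_fst` and the toy frequencies are assumptions made for illustration; this section does not state which F_ST estimator was used.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST between two populations (ratio of averages over SNPs).

    p1, p2: per-SNP frequencies of the same allele in populations 1 and 2.
    n1, n2: number of sampled chromosomes (2 x diploid individuals) per population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference, corrected for sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that two chromosomes from different populations differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    total = den.sum()
    return float(num.sum() / total) if total > 0 else float("nan")

# Toy frequencies (hypothetical, not taken from the study).
weakly_differentiated = hudson_fst([0.50, 0.30, 0.80], [0.52, 0.28, 0.79], 32, 10)
fixed_difference = hudson_fst([1.0, 1.0], [0.0, 0.0], 20, 20)
print(weakly_differentiated, fixed_difference)
```

With small samples the estimate can be slightly negative when two populations share the same frequencies, which is expected behaviour for an unbiased estimator rather than an error.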

Detection of a supergene

Genome wide analyses highlighted geography as the main driver of population differentiation and did not detect any differences between the large and small morphs in the NWA (Fig. S2). However, when we used a genomic sliding windows analysis of PCA over the pooled NWA sample (to compute the percentage of total variance explained by the first axis within each window), we identified a ~ 31 Mb region in chromosome 2 (coordinates: ~17 Mb - ~48 Mb) that was strikingly different from the genome-wide average (Fig. 2A). Local PCA computed within this region displayed three clusters segregated by the first axis (Fig. 2B). The two most distant clusters on the first axis were characterized by an excess of the two alternative homozygous genotypes, while individuals in the middle of the first axis displayed an excess of heterozygous genotypes (Fig. 2C). This result was corroborated by the sNMF, which found K = 2 as the most likely number of ancestral populations with individuals showing an excess of heterozygotes being almost exactly half admixed between them (Fig. S3). Finally, we investigated Linkage Disequilibrium (LD) in both the pooled sample and in the two clusters separately (Fig. 2D and S3): the region is characterized by strong LD in the pooled sample when compared to the rest of chromosome 2. Conversely, LD values were similar to the rest of the genome (or lower) when computed within each previously identified cluster. Additionally, F_ST values between the two clusters characterized by an excess of homozygosity reached up to ~1 in the region (suggestive of fixation) while remaining distributed around ~0 outside (Fig. 2E). Collectively, these results suggest that this region is an inversion, where recombination has been suppressed.
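The sliding-window scan described above can be sketched as follows: within each window, a PCA is computed on the genotype matrix and the share of variance carried by the first axis is recorded. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `pc1_variance_by_window`, the 0/1/2 genotype coding, the non-overlapping windows and the toy data are all hypothetical.

```python
import numpy as np

def pc1_variance_by_window(genotypes, positions, window_bp):
    """Return a list of (window_start, fraction of variance on PC1).

    genotypes: (n_individuals, n_snps) array of 0/1/2 alternate-allele counts.
    positions: (n_snps,) array of base-pair coordinates.
    window_bp: width of each non-overlapping window, in base pairs.
    """
    genotypes = np.asarray(genotypes)
    positions = np.asarray(positions)
    results = []
    start = 0
    end = int(positions.max()) + 1
    while start < end:
        in_window = (positions >= start) & (positions < start + window_bp)
        block = genotypes[:, in_window].astype(float)
        if block.shape[1] >= 2:
            centered = block - block.mean(axis=0)
            # Squared singular values are proportional to the variance per PC axis.
            s = np.linalg.svd(centered, compute_uv=False)
            total = np.sum(s ** 2)
            frac = float(s[0] ** 2 / total) if total > 0 else 0.0
            results.append((start, frac))
        start += window_bp
    return results

# Toy data: 30 individuals, 300 SNPs, with a perfectly linked block in the
# middle window that mimics three arrangement classes (two homozygotes and
# the heterozygote).
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(30, 300))
toy_genotypes[:, 100:200] = np.repeat([0, 1, 2], 10)[:, None]
toy_positions = np.arange(300)
for window_start, frac in pc1_variance_by_window(toy_genotypes, toy_positions, 100):
    print(window_start, round(frac, 3))
```

In the toy data the middle window stands out because every SNP in it carries the same three-class signal, which is the pattern an inversion with suppressed recombination is expected to produce in the pooled sample.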

Given the occurrence of 226 annotated genes in the 17–48 Mb region of chromosome 2 of the reference genome, we hereafter refer to this region as a supergene, characterized by two haplotypes (HB and HS) inherited in a Mendelian fashion⁶. Individuals with reference homozygous genotypes are HS/HS, individuals with alternative homozygous genotypes are HB/HB, and heterozygote individuals are HB/HS. Preliminary results suggested that the size of individuals was different between the two homozygous genotypes: HB/HB had an average size of ~71.7 cm and HS/HS of ~53.9 cm. However, sample sizes were too low (HB/HB: N = 9, HS/HS: N = 10) to model confounding factors such as sex and maturity. When a local PCA was run including the NEA samples, the first axis segregated NEA and HS/HS individuals from HB/HB and HB/HS individuals before the NEA-NWA geographical divergence detected by the genome-wide structure analyses (Fig. 2F). The same pattern was observed when computing the individual ancestry with the sNMF (Fig. S4), which implies that NEA individuals are all HS/HS and the divergence between HB and HS alleles predates the split between the NEA and NWA regions.

Genotype screening and size association

To further investigate the association between the supergene genotypes and size, we first selected two regions each with ≥ 4 SNPs discriminating HB and HS alleles within the supergene and further screened by PCR and Sanger sequencing of 501 individuals (465 after filtering, see supplementary results) throughout the species’ distribution range (Table 1). HB was absent in the NEA, consistent with the lack of body size polymorphism in this part of the range. Conversely, HB reached a frequency of 0.29 and 0.38 in the GoM and CAN sampling sites, respectively (Table 1 and Fig. 3A). However, the distribution of genotypes in the two sampling sites was strikingly different: the GoM displayed a strong deficit in heterozygotes (only 10 out of 282 individuals, Hardy-Weinberg exact-test: p < 0.001), while CAN was in Hardy-Weinberg equilibrium (Table 1). We then investigated the relationship between size and genotype controlling for maturity and/or sex using linear models in a Bayesian framework (Table S2) using the 241 GoM individuals with no missing information on any trait. Leave-One-Out cross validation indicated that the model including size and maturity provided the best fit (see supplementary results). Posterior distribution of size for HB/HB and HB/HS individuals largely overlapped, with median values and 95% credibility intervals (averaged over the levels of maturity) of 66.95 cm (95% CI [64.73; 69.17] cm) for the former and 65.61 cm (95% CI [60.81; 70.50] cm) for the latter. Conversely, size for HS/HS individuals was strikingly lower (median value of 50.68 cm, 95% CI [49.28; 52.07] cm) and associated with disjunct posterior distribution from HB carriers (Fig. 3B).
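To make the heterozygote deficit concrete: if the reported HB frequency of 0.29 applies to the 282 GoM individuals, Hardy-Weinberg proportions predict about 2 × 0.29 × 0.71 × 282 ≈ 116 heterozygotes, against 10 observed. The sketch below computes the expected genotype counts, the single-locus inbreeding coefficient F = 1 − H_obs/H_exp, and a chi-square approximation with one degree of freedom. It is an assumption-laden illustration: the study reports an exact test, whereas this uses the simpler chi-square approximation, and the function name `hwe_summary` and the genotype counts in the example are hypothetical (the real counts are in Table 1).

```python
import math

def hwe_summary(n_hom_a, n_het, n_hom_b):
    """Compare observed bi-allelic genotype counts with Hardy-Weinberg expectations."""
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_a, n_het, n_hom_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    f = 1.0 - n_het / expected[1] if expected[1] > 0 else float("nan")
    return {"freq_a": p, "expected_het": expected[1], "chi2": chi2,
            "p_value": p_value, "f": f}

# Hypothetical counts with a strong heterozygote deficit.
print(hwe_summary(n_hom_a=40, n_het=5, n_hom_b=55))
# Hypothetical counts close to Hardy-Weinberg proportions.
print(hwe_summary(n_hom_a=16, n_het=48, n_hom_b=36))
```

A value of F near 1 indicates that almost all expected heterozygotes are missing, while a value near 0 is compatible with random mating at the locus.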

**Fig. 3: Distribution of supergene’s genotypes and their association with size.**

Range expansion and genomic inbreeding

Population structure (Fig. 1B) suggested strong genetic differences between NEA and NWA. To test for signatures of Range Expansion (RE) and understand the colonization history of the species, we examined genomic signatures of RE by fitting the directionality index between each pair of individuals to the time difference of arrival (TDOA) location algorithm³⁵. The density distribution of the center of origin of the RE computed over 100 independent runs was always found in the NEA region, off the coast of Greenland, with more than 80% of runs indicating an origin off the eastern coast of Greenland (Fig. 1A). We then investigated the distribution of Runs of Homozygosity (ROH). The length and number of ROH depend on both species-wide demographic processes and intra-population levels of genomic inbreeding. Consistent with the RE evidence, we found larger ROH in areas away from the putative center of origin (Fig. S5). For instance, the number (N_ROH) and genomic coverage (SUM_ROH) of ROH were always the lowest in the two Iceland sampling sites (W-IC and E-IC) and strikingly higher in the GoM, followed by CAN and N-NW (Fig. S5). The significantly larger N_ROH and SUM_ROH in GoM than in CAN, despite the similar amount of genetic diversity (Table 1) and low genetic differentiation (Fig. 1), strongly suggest higher inbreeding in the former as a consequence of a different mating strategy.
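The two ROH statistics used above, N_ROH and SUM_ROH, can be illustrated with a naive scan over one individual's genotypes along a chromosome. This is only a sketch: the ROH caller used in the study is not described in this section, and the function names `find_roh` and `roh_summary`, the thresholds, and the toy data are hypothetical. Missing genotypes and sequencing errors, which real callers tolerate, are ignored here.

```python
def find_roh(positions, is_het, min_length_bp, min_snps=2):
    """Return (start, end) coordinates of runs of consecutive homozygous SNPs.

    positions: sorted base-pair coordinates of the SNPs.
    is_het: booleans, True where the individual is heterozygous.
    A run is kept if it spans at least min_length_bp and min_snps SNPs.
    """
    runs = []
    start, prev, count = None, None, 0
    # A trailing heterozygous sentinel closes a run that reaches the last SNP.
    for pos, het in list(zip(positions, is_het)) + [(None, True)]:
        if het:
            if start is not None and count >= min_snps and prev - start >= min_length_bp:
                runs.append((start, prev))
            start, count = None, 0
        else:
            if start is None:
                start = pos
            prev = pos
            count += 1
    return runs

def roh_summary(runs):
    """Return (N_ROH, SUM_ROH) for a list of (start, end) runs."""
    return len(runs), sum(end - start for start, end in runs)

# Toy individual: ten SNPs spaced 100 bp apart (hypothetical data).
toy_positions = list(range(0, 1000, 100))
toy_is_het = [False, False, False, False, True, False, False, True, False, False]
toy_runs = find_roh(toy_positions, toy_is_het, min_length_bp=200)
print(toy_runs, roh_summary(toy_runs))
```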

Historical demography

The restricted distribution of HB might be the consequence of neutral and/or selective forces. To better understand the origin, maintenance, and historical demography of the size polymorphism, we first ran the Pairwise-Sequential Markovian Coalescent (PSMC) algorithm³⁶ on each of the 49 whole-genome sequenced individuals. PSMC curves were almost identical for every individual at the regional scale, but the dynamics differed between the NEA and NWA regions, whose trajectory started to diverge ~1 million years B.P. (Fig. 4A, S6). While the exact date of divergence between the two trajectories may be inaccurate³⁷, there is a clear separation of the evolutionary trajectories between the NEA and NWA, consistent with population structure (Fig. 1B).

To further investigate patterns of colonization, migration and divergence between and within the two meta-populations (NEA and NWA; Fig. 4C and S7), we investigated five demographic scenarios using fastsimcoal2³⁸. The AIC computed after choosing the best out of 10 replicates of each model indicated IMM-5-NM-STOP as the most likely (Fig. S7). IMM-5-NM-STOP depicts two meta-populations composed (in this order) of GoM and CAN sampling locations (NWA meta-population) and W-IC, E-IC and N-NW sampling locations (NEA meta-population). Deme effective sizes were highly similar between the NEA and NWA demes (N_E ~ 80,000, 95% CI [74,595; 81,106] and N_W ~ 82,000, 95% CI [78,482; 90,962], Table S3). However, demes were twice as connected in the NEA as in the NWA, despite largely overlapping confidence intervals (Nm_E ~ 118, 95% CI [76.74; 144.91] and Nm_W ~ 61, 95% CI [49.67; 124.05], respectively, Fig. 4C), suggesting high local connectivity within both meta-populations. Going backward in time, the two meta-populations were isolated until T_CH ~ 160,000 (95% CI [47,000; 163,000]) years B.P., when an asymmetrical exchange of migrants began, three times greater from the NEA to the NWA than the reverse (Nm_E→W ~ 5.1, 95% CI [4.88; 13.73] and Nm_W→E ~ 1.5, 95% CI [1.69; 5.56] per generation). All lineages finally merged into an ancestral population of N_ANC ~ 101,000 (95% CI [99,116; 106,634]) at T_DIV ~ 891,000 (95% CI [800,000; 920,000]) years B.P., corresponding to the NEA-NWA divergence time (Table S3).
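The model-selection step (best replicate per model, then AIC) can be sketched as below. The sketch assumes that likelihoods are expressed in log10 units, as in fastsimcoal2 output, so that AIC = 2k − 2 ln(10) log10(L), where k is the number of estimated parameters. The model labels, parameter counts and likelihood values are hypothetical placeholders, not values from the study.

```python
import math

def aic_from_log10(log10_likelihood, n_params):
    """AIC from a log10 likelihood: 2k - 2 ln(L), with ln(L) = ln(10) * log10(L)."""
    return 2 * n_params - 2 * math.log(10) * log10_likelihood

def rank_models(runs):
    """Rank models by AIC using the best replicate of each.

    runs: {model_name: (n_params, [log10 likelihood of each replicate])}
    Returns a list of (model_name, aic, delta_aic) sorted by increasing AIC.
    """
    table = {name: aic_from_log10(max(lhoods), k) for name, (k, lhoods) in runs.items()}
    best = min(table.values())
    return sorted(((name, aic, aic - best) for name, aic in table.items()),
                  key=lambda row: row[1])

# Hypothetical replicates for two placeholder models.
toy_runs = {
    "model_A": (5, [-1000.0, -990.0, -993.5]),
    "model_B": (7, [-995.0, -996.2]),
}
for name, aic, delta in rank_models(toy_runs):
    print(name, round(aic, 2), round(delta, 2))
```

Taking the maximum likelihood across replicates before computing AIC mirrors the description above, where the best of 10 replicates of each model enters the comparison.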

Origin of the supergene

Population structure results within the supergene revealed that the divergence between alleles HB and HS pre-dates the divergence between the NEA and NWA regions (Figs. 1 and 2). This could be due to an introgression of allele HB^39,40. To test this hypothesis, we first computed a PCA within the supergene using A. radiata and a single representative of the congeneric species A. hyperborea. The analysis showed that the A. hyperborea individual was homozygous for allele HB as it clustered with HB/HB individuals from GoM, consistent with an introgression of HB (Fig. S8). We then computed rooted phylogenetic trees in 500 kb windows across chromosome 2 including A. hyperborea to test whether each tree followed a “species-tree” (Fig. 5A) or an “introgression tree” (Fig. 5B) using topology weighting⁴¹. The analysis strongly supported an introgression of HB: topological support was always of 100% for the introgression tree in the supergene region and almost always of ~100% for the species tree outside the region (Fig. 5C). This clearly suggests that HB allele is more closely related to A. hyperborea than to the HS allele. In addition, the Time to the Most Recent Common Ancestor (T_MRCA) extracted from each window’s consensus tree was similar along the chromosome (Fig. 5D), which would be consistent with A. hyperborea being the donor species of HB. The PSMC computed in both HS/HS and HB/HB individuals within the supergene was strikingly different from the trajectory estimated over the whole genome (Fig. 4A, B). Similarly, the one hundred PSMC curves obtained by randomly sampling each time a 31 Mb region in the genome from both HB/HB and HS/HS individuals were incongruent with the supergene’s PSMC curves but consistent with the genome-wide estimates (Fig. S9). Following⁴², we further estimated the divergence time between alleles HB and HS at ~1.5 million years B.P. by computing the PSMC in heterozygote individuals in the supergene region (Fig. 4B).

**Fig. 5: Characterization of the supergene’s introgression.**

Discussion

The striking size polymorphism in the thorny skate^{26,27,28,29,43} offers a rare opportunity to improve our understanding of how the presence of two (or more) morphs may affect the population dynamics through time. We first identified a ~ 31 Mb size-determining supergene characterized by two alleles, HB and HS (the reference genome is built from an individual with the HS/HS genotype), between which recombination is prevented. HB was only found in the NWA where genotype distribution was strikingly spatially differentiated, with a significant deficit of heterozygotes detected in the GoM but not in CAN. Heterozygotes and homozygotes for HB had an average size of ~67 cm, while homozygotes for HS had an average size of ~51 cm (Fig. 3B), suggesting the dominance of HB. Supergenes are known to determine a wide variety of traits^{9,10,11,12,13,14,15,16,17}, but this is a compelling example of the clear association between a supergene and a continuous quantitative phenotype in a group of individuals living in sympatry in a vertebrate species. Notably, size is a continuous trait with polygenic determinism across various species^44,45,46. It is likely that the near-discrete size-determinism revealed in our study involves several genes spanning the supergene region, but we cannot exclude the interplay between them and other genes across the genome^47,48. Given the substantial length of the supergene (~31 Mb) and the presence of numerous genes within it (226), it is also possible that this region controls multiple phenotypes, as exemplified in ants, cods, and butterflies^{9,10,11,12,13,14,15,16,17}. More phenotypic data will be required to better characterize the consequences of this supergene. 
In addition, the mechanisms underlying the variation in size should be investigated in the future, as it can arise from different non-exclusive molecular processes, e.g., gene silencing, promoter and enhancer placement, differences in transcriptional efficiencies and their downstream impacts on dosage dependent epistasis and pleiotropy, as well as from a different number of genes present in the two alleles. Characterizing differences in gene expression linked to the observed genotypes, as well as obtaining a high quality de novo assembly of the HB haplotype to uncover fine scale structural variation (such as indels and multiple inversions), will provide valuable knowledge in the future.

Supergenes are maintained in space and time through a combination of demographic (neutral) and selective processes⁶. To shed light on the distribution of the two supergene alleles and on the maintenance of genotypes in the NWA, we investigated the historical demography of A. radiata throughout its whole range. Defining a comprehensive demographic scenario explaining the whole history of a species is a challenging task, but WGS data provide a rich basis upon which to test competing models of demographic history. First, clustering algorithms, F_ST and PSMC analyses supported the unambiguous signature of long-term divergence between the NEA and NWA, and weak but spatially continuous genetic differentiation within each region (Figs. 1, 4, S1, S2), consistent with recent findings based on mitogenomic variation³¹. We further detected signatures of range expansion, with shared derived allele frequencies supporting an origin close to Iceland/Greenland and a two-wave colonization: first eastward to the European coasts, and then westward into the NWA (Fig. 1). This is consistent with the distribution of ROH, which were never larger than ~1.41 Mb and were fewer and shorter in the peripheral populations of Greenland (Fig. S5). Range expansions are indeed characterized by a series of founding events that result in higher genetic drift in populations far away from the origin of the expansion⁴⁹. This pattern is consistent with the GoM, CAN and N-NW as the more derived and E-IC and W-IC as the more ancestral populations. The most likely scenario as inferred under the framework of fastsimcoal2⁵⁰ highlights a divergence between the NWA and NEA occurring ~900,000 years B.P., corresponding to the colonization time of the NWA. After colonization, the NWA and NEA metapopulations remained connected by asymmetrical migration, the rate of exchange being ~3 times higher from NEA to NWA than the reverse, until becoming isolated ~160,000 years B.P.
The modelling of the genome wide diversity suggests that HB either originated or introgressed in the NWA regions more recently than ~160,000 years B.P., as this allele is absent in the NEA.

Sequencing of the congeneric species A. hyperborea provided strong evidence that HB did not originate in A. radiata, but rather introgressed from a donor species. Indeed, A. hyperborea variants were more frequent in HB than in HS, which was confirmed by a topology weighting approach strikingly grouping HB/HB individuals and the one A. hyperborea individual in the supergene region (Fig. 5). Furthermore, clustering algorithms indicate that HB/HB individuals are separated from all HS/HS independently of their geographic origin (NWA or NEA) (Fig. 2F and S4), in stark contrast with the genome wide results (Fig. 1), and closer to A. hyperborea (Fig S8). This clearly suggests that the HB-HS divergence predates the NEA-NWA divergence. Indeed, the PSMC estimated the divergence between HB and HS at ~1.5 million years B.P. (Fig. 4B), likely corresponding to the separation between A. radiata and the donor species. Given the demographic scenario, the time separation between HB and HS as well as their present-day spatial distribution, we believe that the time when migration stopped between the NEA and NWA provides a reasonable upper limit to the HB introgression into NWA individuals. The donor species could be A. hyperborea, in line with the similar distribution of the T_MRCA inside and outside the supergene (Fig. 5). However, this would need confirmation by studying other Amblyraja species. For instance, we notice that no other Amblyraja species exhibit size polymorphism, and their size distribution generally resembles that of HB carriers except for A. doellojuradoi, which reaches a maximum size closer to that of HS/HS⁵¹. This suggests that multi-species investigations will be required to understand the dynamics of the evolution of this supergene, similar to studies on the origin of the social supergene in fire ants and the timing of associated introgression^39,52,53.

Demographic modelling of genome wide data highlighted high connectivity between the GoM and CAN sampling sites (Nm~61, Fig. 4C and Table S3), which explains why the two sites have similar level of genetic variability, as shown by θ values, Tajima’s D (Table 1), as well as the genome-wide distribution of coalescence times estimated by the PSMC. However, demographic modelling cannot account for two differences observed between GoM and CAN: the genotype distribution at the supergene and the genome-wide excess of ROH in the former. Moreover, neither HB/HB nor HS/HS individuals in the NWA show a coalescence rate dynamic over time within the supergene consistent with the genome-wide pattern (Fig. 4, S6, S9). Supergenes can promote local adaptations even when gene flow is high⁵⁴, and more generally, local adaptations are usually maintained by various selective pressures^1,6,55. Here, we argue that positive assortative mating for size polymorphism could explain both the observed deficit in heterozygotes in the GoM and the significantly greater inbreeding observed through ROH in GoM than in CAN (Fig. S5). We note that it is possible that the supergene controls other traits than size, which may also be under negative selection against heterozygotes in the GoM and thus explain their deficit. However, negative selection could not explain the inbreeding excess, suggesting that positive assortative mating remains the most parsimonious hypothesis to reconcile all the observed genetic characteristics. This is consistent with the previously discussed physical incompatibility in mating between larger and smaller skates in the GoM^31,32 due to evident differences in maximum size and size-at-maturity^25,26,28. These differences tend to disappear northwards as skates sampled off the coast of Newfoundland (i.e., CAN sampling site) do not show a bi-modal distribution of size at first maturity as in the GoM, but rather a unimodal distribution associated with larger variance²⁸. 
Maturity can covary with the environment⁵⁶ and could be a key factor in explaining possible mating between the size morphs in CAN but not in the GoM. This would have considerable implications for the trajectory and conservation of the NWA population in the context of climate induced environmental change as increasing sea temperature can alter age and size-at-maturity⁵⁷.

Positive assortative mating can have strong short-term consequences. First, it can lead to sympatric speciation⁵⁸; however, high connectivity between the GoM and CAN could maintain recombination between HB and HS carriers (Figs. 2, 4 and Table S3) and thus explain the absence of neutral divergence between small and large individuals in the GoM (Fig. 1, S2). Second, it decreases the probability to find a mate in comparison to panmictic scenarios. We hypothesize that this could progressively lead to an Allee effect^59,60 in the GoM. For instance, Allee effects in marine organisms can occur due to exploitation⁶¹, and we argue here that past overfishing coupled with positive assortative mating could have led to an extinction vortex in the GoM, explaining the recovery trends observed in CAN (Newfoundland) but not in the GoM. All in all, this highlights the significant short-term and context-dependent effects related to phenotypes determined by a supergene locus in a Vulnerable population. We stress that our hypothesis will require direct validation in the future, as our data cannot provide explicit evidence of an Allee effect linked to supergene-determined phenotypes.

We discovered a size-determining supergene and demonstrated its introgression in A. radiata in the last ~160,000 years from a donor species. Our work brings to light new findings of general interest in evolutionary biology: (i) we provide a unique direct example of a continuous quantitative trait whose distribution is largely explained by a simple Mendelian inheritance in a vertebrate species. It is likely that other genomic regions (in conjunction with the environment) contribute to the determination of size, but the cluster of genes controlling such a complex trait will provide an opportunity to dissect its genetic architecture. (ii) Supergenes are known to alter mating patterns as a function of spatial heterogeneity⁷, but for the first time we are able to hypothesize a link between a supergene system and population survival. This is crucial as one relevant question in evolutionary biology is how supergene alleles are maintained through time. In addition, here, we speculate that changing environmental conditions (warming temperatures) may ultimately lead to the loss of HB (as, in the worst scenario, HS would still be present in NEA). (iii) Finally, our study demonstrates (once more) the importance of reconstructing the neutral evolutionary history of a species, an essential background needed to uncover complex non-neutral processes. The inferred demographic scenario was of paramount importance not only to interpret the spatial distribution of the two alleles of the supergene but also to formulate a hypothesis for genotype distribution in the NWA, which in turn carries profound implications for the conservation of the thorny skate¹.

Methods

Whole genome sequencing

Forty-nine Amblyraja radiata individuals were sampled from ten regions throughout the North Atlantic including the US Gulf of Maine (GoM, N = 16), Newfoundland, Canada (CAN, N = 5), South West Greenland (SW-G, N = 2), South East Greenland (SE-G, N = 3), East Greenland (E-GR, N = 2), Western Iceland (W-IC, N = 5), Eastern Iceland (E-IC, N = 5), Western Norway (W-NW, N = 1), Southern Norway (S-NW, N = 1), and Northern Norway (N-NW, N = 9). Genomic DNA was extracted using the E.Z.N.A. Tissue DNA Kit (Omega Bio-Tek, Inc., Norcross, GA, USA) following the manufacturer’s instructions. The extracted DNAs were then sent to the Next-Generation Sequencing (NGS) Core of the University of Florida’s Interdisciplinary Center for Biotechnology Research (UF ICBR) for quality control. After that, libraries were prepared, pooled, and loaded on the Illumina NovaSeq 6000 platform for whole genome sequencing with S4 flow cell and 2×151 setup.

Bioinformatics

We used the reference genome of the thorny skate available from the NCBI website (sAmbRad1.1.pri; accession GCF_010909765.2). The genome was first masked using the Chondrichthyes database in a first run of RepeatMasker v.4.1.0⁶². We then created a de novo database for A. radiata by using RepeatModeler v.2.0.3⁶³ on the genome masked at the first step. Then, we masked the repeated elements annotated in the de novo database by running RepeatMasker a second time on the initially masked genome. We finally extracted a bed-file of the masked regions further used in downstream bioinformatic analyses.

Illumina reads for the 49 samples were trimmed for adapters and quality using bbduk from the bbmap v.38.44 suite (sourceforge.net/projects/bbmap/). After checking for quality using FastQC v0.11.7⁶⁴, reads were mapped against the reference genome using the bwa mem algorithm v.0.7.17⁶⁵ with the -M option. Mapped reads were sorted and indexed using samtools v.1.10⁶⁶ and then marked for duplicates using Picard v.2.21.2 MarkDuplicates⁶⁷. Except for the PSMC analysis (see below), indexed reads were fed to the HaplotypeCaller algorithm in GATK v.4.1.9.0⁶⁸ for variant discovery using the -gvcf option to obtain individual variant calling files (VCF) with annotations for all sites. Individual VCFs were then combined using CombineGVCFs to build different datasets according to the downstream analysis (see below). Joint calling was then performed for each dataset using GenotypeGVCFs by including both monomorphic and polymorphic sites (all-sites argument), which are necessary for scaling genetic diversity correctly. We then selected the 49 identified autosomes and removed the regions annotated as repeats using the bed-file produced by the repeat masking step. By combining GATK’s VariantFiltration and SelectVariants tools, we then filtered out sites with Mapping Quality <40 and marked genotypes as missing if genotypic depth (i.e., depth per individual and per site) was below 6 or over 50. We further removed chromosomes 2 and 8 for all genome-wide historical demographic analyses after genomic scans identified two potential large chromosomal inversions (Table S4). Additional filters were applied on the resulting VCF depending on the analysis (described below).
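The genotype-level depth filter described above (genotypes set to missing when depth is below 6 or over 50) can be illustrated on plain arrays. This is a minimal sketch, not the GATK commands used in the study: it assumes genotypes and per-genotype depths have already been extracted from the VCF into matching arrays, and the function name `mask_by_depth` and the missing-value code -1 are hypothetical choices.

```python
import numpy as np

MISSING = -1  # hypothetical code for a missing genotype

def mask_by_depth(genotypes, depth, min_dp=6, max_dp=50):
    """Set genotypes to MISSING where per-genotype depth is outside [min_dp, max_dp].

    genotypes: (n_sites, n_individuals) array of 0/1/2 allele counts.
    depth: array of the same shape holding the depth per individual and per site.
    """
    genotypes = np.asarray(genotypes)
    depth = np.asarray(depth)
    if genotypes.shape != depth.shape:
        raise ValueError("genotypes and depth must have the same shape")
    outside = (depth < min_dp) | (depth > max_dp)
    return np.where(outside, MISSING, genotypes)

# Toy example: two sites, three individuals (hypothetical values).
toy_gt = np.array([[0, 1, 2], [2, 1, 0]])
toy_dp = np.array([[5, 6, 50], [51, 24, 3]])
print(mask_by_depth(toy_gt, toy_dp))
```

Depths of exactly 6 and 50 are retained, matching the "below 6 or over 50" wording.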