Introduction

Understanding how species will respond to climate change is vital to design effective conservation actions that enhance species resilience and persistence on a rapidly changing planet1,2. Often it is not obvious what traits provide adaptation in species today, or what heritable genotypic variation supports adaptive trait values across a species range3,4. Genomics provides a means to identify DNA variants with clinal population frequencies spatially correlated with environmental variation5,6,7. Clinal variants are likely only a subset of adaptive variation for a given trait8, but in some cases the molecular signature of adaptation across the landscape is sufficient to evaluate whether populations are currently maladapted9, which populations occupy locations where current genomic variation is a poor match with future environments10,11, and which populations will need to shift to track environmental change. These insights can help prioritize conservation approaches to mitigate constraints on dispersal or evolutionary capacity that can lead to extirpation when environments change rapidly10. This is especially needed in freshwater systems, which are facing rapid and global biodiversity loss and are limited in dispersal capabilities by the boundaries of aquatic systems12,13,14,15.

Freshwater fishes, in particular, are one of the most threatened groups of vertebrates in the world16,17. The multifaceted and interacting effects of climate change on freshwater systems, such as increasing temperatures, decreasing snowpack, increasing fires, changes to flood regimes, and encroachment by invasives, are causing serious declines in suitable habitat for cold-water fishes18,19,20. A mechanistic evaluation of genomic influences upon thermal adaptation is needed to manage cold-water fishes in the face of warming aquatic environments by identifying populations most at risk, as well as source populations for translocation and reintroduction efforts10.

The brook trout (Salvelinus fontinalis) is a cold-water species experiencing climate induced population declines21,22,23. The brook trout native range spans the southern Appalachians through northern Quebec and the Great Lakes region. Previous work has revealed fine scale genetic differentiation within watersheds24,25, as well as intraspecific differences in brook trout physiological responses to thermal conditions26. The broad distribution of brook trout across North America, sensitivity to environmental conditions, and high level of population structuring make brook trout ideal for understanding local climate adaptation. In addition, this is complemented by the ecological and social importance of this fish in many north temperate aquatic ecosystems, which is exemplified by the fact that brook trout is the official state fish for 10 US states and of cultural importance to many Native American tribes. In this work we used genomics combined with a common garden experiment and climate change modeling to identify local adaptation to climate conditions in brook trout populations and identify areas where populations are at greatest risk of extirpation from climate change. We did this by (1) mapping genomic variation associated with climate conditions across the brook trout native range, (2) identifying genomic diversity linked with molecular responses to climate conditions, and (3) characterizing risk of extirpation across the range under varying climate conditions. Collectively, this work highlights the importance of understanding the links between genomic diversity, molecular response to climate conditions, and environmental conditions to predict population extirpation risk under future climate scenarios.

Results

Range-wide genotype-environment associations

First, we identified genetic variation associated with climate adaptation in brook trout by using restriction-site associated DNA sequencing (RADseq) to survey 24,337 single-nucleotide polymorphisms (SNPs) in 201 brook trout individuals from 82 sites across the native range of eastern North America. We then conducted a series of partial redundancy analyses (pRDA)11 to model genotypes as the multivariate response to climate variables from the ClimateNA database27. After controlling for multicollinearity, we retained a total of five uncorrelated climate variables for pRDA (mean annual precipitation, mean annual radiation, number of degree days >18 °C, summer heat moisture index, and mean annual relative humidity; Supplementary Fig. S1). Next, we used pRDA to perform variance partitioning aimed at identifying major drivers of spatial genetic variation, after accounting for genetic drift. We first quantified the amount of variance explained by a full model containing explanatory variables for climate, along with variables for neutral population structure (Supplementary Fig. S2) and geography. We then compared separate models for each set of variables, which allowed us to assess their independent contributions to observed genetic variance. Our full model explained 33.11% of total genetic variance among brook trout populations across northeastern North America (p < 0.001; Supplementary Table S1), suggesting the variables included in our model explain a large amount of genetic variation in native brook trout. We found that 15.39%, 32.36%, and 4.10% of this explainable variance was attributed to climate, neutral population structure, and geography, respectively (p < 0.001 for all models). The percentage of total variance explained by our climate model (5.10%) is relatively high compared to analogous values reported for other systems (0.1–5.0%), demonstrating a strong influence of climate on genomic variance in native brook trout11,28,29,30. This highlights the importance of local adaptation to climate in driving patterns of genomic diversity across the brook trout native range.

We next used pRDA to perform genotype-environment association analyses aimed at identifying loci under selection. We evaluated a model with the five climate variables as predictors of genetic variation while controlling for neutral population structure. This model explained 5.76% of total genetic variation, where the majority of the genetic variation associated with the climate predictors (60.7%) was explained by the first two RDA axes (Supplementary Fig. S3). Relationships on the first axis were primarily driven by variables for precipitation and radiation, whereas the second axis was strongly associated with moisture and humidity, suggesting that these climate variables play an important role in shaping genetic variation in brook trout across the native range. We used these pRDA results to identify 1059 outlier SNPs distributed throughout the brook trout genome (Supplementary Fig. S4) that potentially reflect adaptation to local climatic conditions.

After identifying candidate adaptive loci, we used the relationship of these loci to the climate variables to predict the degree of adaptive similarity among brook trout populations across the native species range. We performed an adaptively enriched pRDA11 based on the outlier loci (Fig. 1A). This model explained 19.02% of total genetic variation for the outlier loci, showing that the climate variables evaluated here explain a large proportion of the variation at candidate adaptive loci. Genotypes along the first RDA axis, which explained 57.8% of the genetic variation associated with the climate predictors, were positively associated with mean annual precipitation, mean annual radiation, and the number of degree days above 18 °C. Negative associations along this axis corresponded with the remaining two climate variables (summer heat moisture index, mean annual relative humidity). In comparison, an adaptive landscape based on RDA axis two (22% of the variation, Fig. 1B), which was strongly driven by the summer heat moisture index, primarily differentiated eastern and western regions of the native range. These results highlight how climate variables related to both precipitation and temperature have been important drivers of adaptation in brook trout. This is in line with other work that has shown warming water temperatures and changing precipitation patterns are two of the leading effects of climate change on freshwater fishes31. We then used the relationships on RDA axis one to calculate an adaptive index11,32 (Fig. 1B). By projecting this index across the native range, we show an adaptive landscape that primarily differentiates regions along a north-south gradient, with the greatest contrast between regions in the southern Appalachians and northern Canada. In addition, these two geographic regions also exhibit the highest number of private alleles within the outlier loci identified using pRDA, compared to all other locations along the native range (Fig. 2). Even though other regions exhibit private alleles within non-outlier loci, they have little to no private alleles in outlier SNPs. These results highlight the genomic distinction of brook trout at the northern and southern ends of the native range, which is largely driven by climate adaptation.

Fig. 1: SNPs associated with climate in brook trout across northeastern North America.
figure 1

A Adaptively enriched pRDA space depicting relationship of climate variables and outlier loci identified using pRDA. The first two RDA axes explained 80% of the genetic variance associated with the climate variables (DD18 the number of degree days above 18 °C, MAP mean annual precipitation, MAR mean annual radiation, SHM summer heat moisture index, and RH mean annual relative humidity). Outliers also identified as eQTLs are highlighted in red. B Adaptive landscape based on adaptive index from RDA1 and RDA2 projected across the native range. The color-scale indicates the relationship between climate variables and adaptive genetic variation on RDA axis one. Darker scores reflect a strong positive association between DD18, MAP, and MAR and adaptive diversity; lighter scores reflect a strong negative association with SHM and RH.

Fig. 2: Private variation found within major genetic groups of brook trout across the native species range.
figure 2

A Private genetic variation associated with non-outlier SNPs identified using pRDA and B outlier SNPs identified using pRDA. Genetic groups defined by Mamoozadeh et al. (25). Source data are provided as a Source Data file.

Identifying expression quantitative trait loci via a common garden experiment

Associations between climatic variables and genetic variation can show us patterns of adaptation across the landscape and highlight areas of the genome affected by selection. However, they do not show the underlying molecular mechanisms that link genotype with phenotype and climate33. A primary way that the genome of an organism interacts with its environment is through gene expression—the up and down regulation of genes underlying responses to environmental conditions. To understand if the outliers we identified are directly involved in gene expression responses under thermal stress, we conducted a common garden experiment to isolate the genotype-gene expression phenotype relationship. We used four populations of brook trout from Adirondack Park in upstate New York that are in close geographic proximity to one another. These lakes provide an excellent system for studying thermal adaptation as they support distinct populations in the same climatic region and genetic group (Northeastern group, as shown in Mamoozadeh et al.25), but experience varying thermal regimes as two of the lakes are thermally stratified in the summer, while two do not34. After wild fish were acclimatized to a captive environment for 30+ days, we exposed them to contrasting treatments—thermally stressful (21 C) or thermally optimal (15–17 C) temperature. We then sampled gill tissue from each fish for RNA-sequencing (RNAseq) and fin tissue for RADseq on day 3, 9, or 15 of the treatment.

Using the RADseq dataset derived from the thermal adaptation experiment, we identified 22,986 SNPs spread across the genome and found that each lake was genetically distinct (Fig. 3A). We also found, based on the RNAseq dataset, that each lake had a unique gene expression response to thermal stress (Fig. 3B, C). Within each lake, there was a small core set of genes (49–87) responding across time points, but there were large differences across time points in the number of genes differentially expressed (Fig. 3B). At each timepoint, the largest number of differentially expressed genes were unique to each lake; however, a core set of genes responded similarly to thermal stress regardless of lake origin (29–45 genes at each time point). This is consistent with other findings showing strong divergence in gene expression responses to thermal stress across fish species35 and ecotypes36. Here we show even finer scale divergence, at the level of populations within a geographic area, highlighting the fine scale at which responses to thermal conditions can occur and, therefore, the importance of identifying local adaptation to evaluate population responses to climate change. Using all of the differential expression data, we identified 42 modules of genes that were significantly correlated in their expression using weighted gene correlation network analysis (WGCNA, Supplementary Fig. S5, Supplementary Data 1). These modules represent groups of genes that are likely to interact during regulation. Twenty-five of these modules were significantly associated with temperature, with the remaining modules showing lake-specific expression patterns or linked with age or sex of the fish. The most highly significant module contained 2037 genes and was significantly associated with both temperature and the number of days at a given temperature, regardless of lake of origin. These results show a tight linkage between thermal conditions and gene expression patterns in brook trout, demonstrating the molecular mechanisms underlying responses to thermal stress and potential pathways for adaptation.

Fig. 3: Genetic and gene expression differences among the four lake populations.
figure 3

A MDS plot of 22,986 SNPs showing genetic distinction of each lake population. Individuals are given a unique number-letter combination and are color coded by their lake. Corresponding lake name is given next to each cluster. B Number of genes differentially expressed in response to thermal stress (>21 C), by lake and sampling time. C Venn Diagrams of differentially expressed genes in response to thermal stress by lake and time. The total number of genes differentially expressed in response to thermal stress for each lake is provided in parentheses after each lake name. Source data are provided as a Source Data file.

The gene expression patterns show the cellular response to thermal stress, however, the ability of populations to evolutionarily adapt to increasing temperatures relies on genetic underpinning for that response, along with the presence of variation in that genetic basis for selection to act upon. Therefore, to characterize the genetic basis for the observed gene expression responses to thermal stress, we identified SNPs where an allele at a particular locus was significantly correlated with up and/or down regulation of a particular gene in response to thermal conditions (ie expression quantitative trait loci, or eQTLs). We found 1154 SNPs, spread across the genome, where the allele at a particular SNP was highly significantly correlated with the gene expression pattern of a gene that was differentially expressed in response to thermal conditions. There were 292 differentially expressed genes that were highly significantly correlated with the genotype of at least one of these eQTLs, for a total of 1448 gene-eQTL associations, spread across the genome (Supplementary Fig. S6).

Ninety-one of these eQTLs were also present in the range-wide RADseq dataset and ten of them were identified as being significantly correlated with the five climate variables via the pRDA analysis (Supplementary Table S2, Supplementary Fig. S7). Given our conservative significance thresholds used to identify outliers in the pRDA, there is lower than 1% probability that any of the ten eQTLs had false positive associations with climate. This highlights the strength of evidence that these eQTLs not only have an impact on thermal gene expression responses, but are also linked with environmental variables across the native species range. For example, an eQTL found on chromosome 5 (Supplementary Fig. S4) exhibited the largest correlation among the eQTLs with three of the five climate variables analyzed here (mean annual precipitation, summer heat moisture index, mean annual radiation; Supplementary Table S2); this was also in the most highly significant gene module in the WGCNA analysis, which was associated with temperature. Allele frequencies for this locus were highest in the warmest part of the range (southern Appalachians) and lowest in the coldest part (northerly and westerly regions of the native species range) (Fig. 4; see Supplementary Fig. S8 for remaining climate variables).

Fig. 4: eQTLs were highly associated with climate in brook trout across eastern North America.
figure 4

Results are shown for the eQTL exhibiting the largest correlation with mean annual precipitation (MAP; Panel A) and summer heat moisture index (SHM; Panel B) in results from pRDA. Two eQTLs exhibited an equally large correlation with SHM; results for only one of these loci are shown here. Climate variables are shown in color-scale on each map. Points correspond with individual sampling sites and are color-coded by group allele frequency. Allele frequencies were calculated by combining sites into 31 groups by geographic and climatic proximity (see Methods for explanation of grouping). Left side of color-scale corresponds with allele frequency. Right side of color-scale corresponds with mm (Panel A) or C/(mm*1000) (Panel B). Pearson coefficients (r) describing the level of correlation between the climate variable and individual genotypes at the eQTL are shown (two-sided tests). The p-value used to distinguish the eQTL as an outlier based on pRDA results is also shown (using Bonferroni correction at an alpha significance level of 1%).

Genomic offset across the brook trout native range

Given this tight linkage between genotype, molecular response to heat stress, and environmental variables, we quantified genomic offset across the native species range to identify where populations are at greatest risk of being unable to adapt to changing climate due to a lack of genomic variation. Genomic offset1,11,37,38 quantifies the difference between adaptive index based on current vs. future conditions, and therefore is a proxy for the shift in adaptive genomic variation necessary to track climate change11. We used the relationships on RDA axis one from the adaptively enriched pRDA to calculate genomic offset between historical and future climate scenarios to identify areas within the native range where shifts in adaptive index may be necessary under climate change (Fig. 5). The resulting maladaptive landscapes reflected genomic offset across large portions of the native species range, with the greatest offset found in the southern Appalachians. This analysis shows that populations of brook trout at the southern end of the native range are at greatest risk of lacking genomic variation needed to withstand future warming at their current location, even under modest projections of climate change. This is alarming, and demonstrates a genomic framework is needed to mitigate and plan for the projected loss of up to 92% of trout habitat under similar climate change models in the southern Appalachians39. Under more dire climate scenarios (RCP 8.5), the high level of climate vulnerability extends throughout much of the US portion of the native range.

Fig. 5: Maladaptive landscapes based on genomic offset projected across eastern North America.
figure 5

Results are shown for the RCP4.5 emission scenario at the years 2050 and 2080 (likely global mean surface temperature change of 0.9–2.0 and 1.1–2.6 C, respectively) and for the RCP8.5 emission scenario at the years 2050 and 2080 (likely global mean surface temperature change of 1.4–2.6 and 2.6–4.8 C, respectively). Higher genomic offset values correspond with greater mismatch between current genomic diversity and future conditions. Insets depict linear model (solid line) with 95% confidence intervals (shading) describing highly significant relationship between genomic offset and latitude for each emission scenario (exact p-values as follows: 1.185e−30 [top left], 9.719e−31 [top right], 3.158e−28 [bottom left], and 2.591e−28 [bottom right]). Linear least squares were approximated using QR factorization using the lm function in R. Source data are provided as a Source Data file.

Discussion

Collectively, this study yields crucial insights for conservation strategies aiming to safeguard brook trout populations and highlights the importance of adopting a population-level approach to conservation in the face of climate change, particularly in wide-ranging species like brook trout. By delving into intraspecific variation within individual populations, our work sheds light on nuances that might be overlooked with a range-wide perspective alone. Notably, we emphasize the imperative to protect the southern edge of the range, recognizing it as an adaptively unique segment that contains variation that may be increasingly important to mid-latitude populations as temperatures rise. Given the additional heightened vulnerability of southern populations due to the greatest genomic offset under future warm conditions, targeted interventions, such as assisted migration and/or geneflow, may also be needed to prevent their extirpation due to warming. Furthermore, our research underscores the significance of safeguarding diverse populations across the range to preserve the full spectrum of genetic diversity within the species. Different populations may require different interventions (e.g. translocations, reintroductions, assisted geneflow, etc), depending on the amount of adaptive variation present and the range shift distances required to match habitable conditions40. For instance, regions exhibiting high genomic offsets and low allele frequencies at crucial thermal tolerance loci may necessitate assisted gene flow to provide an influx of adaptive variation or assisted migration to more habitable areas, or a combination of both. Additionally, while our genomic dataset interrogates a fraction of the genome, it uncovers important genomic variation that can be monitored throughout the range to ensure conservation and management actions (such as assisted gene flow, captive breeding, and reintroductions) are equipping populations with the variation needed to withstand future warming.

This study opens exciting avenues for future research to understand the efficacy of assisted gene flow, and provides testable hypotheses for the design of assisted gene flow projects within a species range, particularly given the tractable nature of working with brook trout in the wild and captivity40,41. An exciting next step in this work will be to further study the linkages between the important genomic variation identified here, gene expression patterns, and other fitness related phenotypes, such as growth and fecundity, under different environmental conditions. Additional future work exploring the entire genome with whole genome sequencing and through comparisons to the newly available brook trout reference genome may also uncover insights by identifying additional linkages between genomic variation and climate adaptation, including possible structural variants42. Additionally, our list of eQTLs, while highlighting important genomic variation for temperature adaptation, is certainly not exhaustive. Given that our pRDA identified climatic variables beyond temperature as being important for adaptive variation in brook trout, future research evaluating the gene expression patterns in response to additional important climatic conditions (such as precipitation) is likely to find additional important eQTLs. Further, studies using individuals from the southern end of the range, which contains the second highest number of private alleles in outlier SNPs (Fig. 2), would potentially find additional important variants. We expect that additional regions of the genome are likely involved in climate adaptation but were not identified here due to non-monotonic relationships between genetic and climatic variation8, which may affect predictions about future adaptation. In summary, additional work is now needed to verify the levels of genomic offset predicted here, including to account for other ecological or evolutionary dynamics that may positively or negatively affect future adaptation, among other factors potentially influencing the realized response of organisms to climate change over near- and long-term timescales8,43,44,45.

Collectively, we hope our findings will contribute to the ongoing discourse on conservation science and planning in the face of climate change. In addition to highlighting the importance of understanding the basis for adaptive capacity and the potential consequences of limited adaptive variation for species persistence under climate change, our study highlights the value of combining methodologies and spatial scales for studying adaptive variation. Much previous work has focused on associations between SNP genotypes and environmental variables, but few studies have then examined the role those SNPs play in molecular responses to environmental variables10. By linking molecular responses in controlled experiments at local scales with range-wide patterns in allele frequencies and environmental variables, we uncover direct links among genotypes, gene expression patterns, climate conditions, and climate change vulnerability. We also find that these loci are likely involved in adaptation to climatic conditions, beyond just thermal stress (e.g. precipitation), highlighting the complex ways organisms’ genomes and their environment interact. Collectively, this work demonstrates the capacity to understand an organism’s potential for climate adaptation by leveraging genomic and environmental data across a broad species range with experiments that reveal underlying mechanisms associated with this adaptation.

Methods

All research conducted in this study complies with all relevant ethical regulations. Animal care and use was approved by the Cornell University Institutional Animal Care and Use Committee.

Range-wide genotype-environment associations

Sample collection and range-wide SNP discovery via restriction site-associated DNA sequencing

To explore range-wide temperature associations, we leveraged the genome-wide SNP dataset generated by Mamoozadeh et al.25 in brook trout sampled from 82 sites spanning the full extent of the brook trout native range. See Fig. 4 for depiction of the 82 sampling sites. See Mamoozadeh et al.25, for the detailed methods on sample collection and SNP discovery. Briefly, we extracted genomic DNA from tissue samples using the magnetic bead-based protocol described by Ali et al.46. Gentra Puregene Tissue Kit (Qiagen), DNEasy 96 Blood and Tissue Kit (Qiagen), E-Z 96 Tissue DNA Kit (Omega Bio-Tek), or a modified version of the salt extraction protocol described by Aljanabi and Martinez47. Isolated DNA was quantified using Quant-iT PicoGreen assays (Thermo Fisher Scientific). We selected DNA for two to eight individuals from each site to construct six RADseq libraries. We followed the library preparation methods reported in 42, except we used 100 ng of DNA in digestion reactions and ligation reactions were incubated for 12 h. We performed PE150 sequencing of libraries on an Illumina HiSeq 4000 located at the Michigan State University Research Technology Support Facility.

SNPs were called from the resulting RADseq data using Stacks (version 2.448). We demultiplexed FASTQ files using the process_radtags module of Stacks and options for a barcode mismatch threshold of one base pair and to remove reads with uncalled bases or low quality scores. Demultiplexed reads were aligned to the Salvelinus sp. reference genome49 using BWA (version 0.7.17) as described above. We used SAMtools (version 1.9) to remove secondary and supplementary alignments and alignments exhibiting quality scores <30. We called SNPs from quality-filtered alignments using the gstacks module of Stacks and default settings. The gstacks module enables reference-informed variant calling, similar to the Stacks modules used to identify variants for the common garden experiment (described below). A VCF file was produced using the populations module of Stacks.

We applied a series of quality filters to the SNP dataset produced using Stacks. We used VCFtools to exclude loci missing ≥75% of genotypes, individuals missing ≥90% of genotypes, and genotypes with quality scores ≤30 and read depths <6, in that order. We used jvarkit to remove genotypes with an allele balance >0.80 or <0.20. We also removed loci with minor allele counts <3. We next performed iterative filtering to remove loci and individuals missing excessive genotype calls, and ultimately retained loci where <20% of genotype calls were missing and individuals where <30% of genotype calls were missing. We used HDplot to remove potentially paralogous SNPs by excluding loci corresponding with >50% heterozygotes or read ratio deviations >|4|. We removed SNPs rendered monomorphic by the filtering process using dartR (version 1.1.6). Finally, to reduce the probability of linkage disequilibrium among loci, we used a custom R script to retain only the first SNP on each RAD locus.

Climate data

We extracted data from the ClimateNA database for 27 bioclimatic variables at 82 sites across the brook trout native range. These bioclimatic variables consisted of:

MAT: mean annual temperature

MWMT: mean temperature of the warmest month

MCMT: mean temperature of the coldest month

TD: continentality (difference between MCMT and MWMT)

MAP: mean annual precipitation

MSP: mean summer (May to Sep) precipitation

AHM: annual heat moisture index

SHM: summer heat moisture index

DD_0: degree days below 0 C

DD5: degree days above 5 C

DD_18: degree days below 18 C

DD18: degree days above 18 C

NFFD: number of frost-free days

bFFP: julian date on which frost-free period begins

eFFP: julian date on which frost-free period ends

FFP: frost-free period

PAS: precipitation as snow

EMT: extreme minimum temperature over 30 years

EXT: extreme maximum temperature over 30 years

Eref: Hargreave’s reference evaporation

CMD: Hargreave’s climatic moisture index

MAR: mean annual solar radiation

RH: mean annual relative humidity

Tave_wt: winter (Dec to Feb) mean temperature

Tave_sm: summer (Jun to Aug) mean temperature

PPT_wt: winter (Dec to Feb) precipitation

PPT_sm: summer (Jun to Aug) precipitation

We extracted values for the period 1961–1990 to reflect historical climatic conditions. We also extracted values for projected future climatic conditions based on an ensemble of 15 CMIP5 AOGCMs and two emission scenarios (RCP4.5, RCP8.5) for the years 2050 and 2080. ClimateNA data were re-projected into the WGS84 reference coordinate system. We standardized the bioclimatic variables to control for differences in mean and standard deviation. We also controlled for multicollinearity among bioclimatic variables by retaining a single variable from groups of variables with Pearson correlation coefficients ≥0.70. Where possible, we determined which variable to retain based on known impacts on brook trout physiology.

Identification of climate-associated SNPs

We performed a series of partial redundancy analyses (pRDA) using the vegan R package50 to disentangle drivers of genetic variation and identify SNPs associated with climate in brook trout across the native range. We used pRDA in four ways: 1) to conduct variance partitioning to estimate the proportion of genetic variance explained by sets of predictor variables corresponding with climate, neutral population structure, and geography; 2) to perform genotype-environment association analyses to identify regions of the genome associated with climate and evaluate the relationship (magnitude and direction) of climate predictors; 3) to calculate an adaptive index based on the relationship of putatively adaptive loci and climate predictors to assess patterns of adaptation across the native range; and 4) to quantify genomic offset based on the difference in adaptive index between current and future predicted climate scenarios to identify the most at-risk regions across the native range. In all pRDA models, quality filtered SNPs for brook trout sampled across the native range were treated as the multivariate response to climate predictors. We tested five bioclimatic variables (MAP, SHM, DD18, MAR, RH) that remained after controlling for multicollinearity. We first used pRDA to conduct variance partitioning aimed at estimating the amount of genetic variance explained by one set of variables while controlling for the effect of other variables11. We tested a pRDA model that included variables for climate, neutral population structure, and sampling geography. We then performed pRDA using models with predictors for either climate or neutral population structure or sampling latitude, where remaining variables were retained as conditioning variables. Comparing results from models based on a single set of variables to the model based on all three sets of variables allowed us to assess the independent contribution of each set of variables to observed genetic variance. To account for neutral population structure, we used the allele loadings for axes one and two from a principal component analysis performed using putatively neutral SNPs (Supplementary Fig. S2). We identified putatively neutral SNPs based on results from PCadapt, where a false discovery rate of 40% was used to distinguish outlier loci, leaving behind loci conservatively identified as neutral. We took this approach to minimize the chance of false positives from neutral variation being identified as correlated with environmental variables. We also used the latitude of sampling sites to account for the primarily north-to-south geographic distribution of sampled individuals.

We next used pRDA to identify climate-associated SNPs. We tested a pRDA model that included the five selected bioclimatic variables as predictors of genetic variation. Additionally, we included conditioning variables for neutral population structure since variance partitioning analyses attributed a large proportion of variance to neutral population structure. We also inferred relationships where both neutral structure and latitude were used as conditioning variables; however, results were similar to models without latitude, therefore we describe results based on the more simplified model here. From these pRDA results, we delimited a set of outlier loci based on the Mahalanobis distance of each locus to the center of ordination space and a p-value of 0.01 modified using a Bonferonni correction11. Mahalanobis distances were based on the first two ordination axes because these axes explained a large proportion of genetic variance associated with the predictors. To assess the consistency of loci identified as outliers, we also tested a pRDA model where sampling site latitude was included among the conditioning variables. We also tested a model that did not include any conditioning variables. Finally, we tested whether statistically significant associations between SNPs and climate variables were apparent when analyzing a random set of brook trout across sampling sites. We selected 20 brook trout from across the native species range and used this same set of individuals to reflect genetic diversity at each sampling site. To ensure that genetic variation representative of the full native range was included in our sample set, we randomly selected four individuals from each of the five major genetic groups identified across the native range (as presented in 25). Thus, each sampling site corresponded with genotypes for the same 20 individuals, which reflected each of the major genetic groups of brook trout previously identified across the native range. We then conducted pRDA aimed at identifying SNPs significantly associated with climate; however, this pRDA model did not have any explanatory power (R2 = 0.000, Adjusted R2 = −0.003). This result indicates that the climate-associated SNPs we identified using the true dataset reflect associations driven by adaptation to local conditions.

Calculation of allele frequencies

We compared climate-associated SNPs identified using pRDA with eQTLs to assess whether SNPs underlying the molecular response to thermal stress in brook trout from Adirondacks Park are also associated with adaptation to thermal conditions elsewhere across the native species range. For SNPs identified as significant in both analyses, we visualized mean allele frequencies relative to the five selected bioclimatic variables across the native range. Because only a small number of individuals were surveyed at each sampling site, we formed larger groups of individuals by combining sites that met the following criteria: 1) sites located within one degree latitude of each other, 2) sites located within one degree longitude of each other, and 3) sites with values for a given bioclimatic variable within 0.5 units (standardized values). Each of these criteria were required to be met for individuals from more than one site to be combined into a single group prior to calculating allele frequencies within groups. These criteria were evaluated separately for each of the five climate variables; however, the resulting groups were identical across variables, thus a single grouping scheme was employed here. This resulted in 31 groups containing 1–25 (mean = 6) individuals sampled from 1 to 12 (mean = 2.5) sites (Supplementary Fig. S9). Allele frequencies were calculated within these groups. Although several groups contained only a single individual, this approach reduced the number of instances where allele frequences were calculated using only a small number of individuals sampled from a single site.

Adaptive and maladaptive landscapes

We used the relationship between climate-associated SNPs identified using pRDA and their potential climate drivers to predict adaptation to historical climatic conditions across the native range of brook trout. We performed an adaptively enriched RDA where climate-associated SNPs were treated as the multivariate response to the five selected bioclimatic variables. We then used the scores of the bioclimatic variables along the RDA axes to calculate an adaptive index for each environmental pixel of the brook trout native range. This index was calculated following the methods described in 11 (including provided analysis scripts), and calculations were performed independently for the first two ordination axes. We then mapped the adaptive index across the brook trout native range to visualize adaptive gradients across the landscape.

The relationship between climate-associated SNPs and their potential climate drivers was also used to predict maladaptation to future climatic conditions across the brook trout native range. We extrapolated results from the adaptively enriched RDA to future climatic conditions under the RCP4.5 and RCP8.5 emission scenarios at the years 2050 and 2080. We then calculated the Euclidean distance between predictions based on historical and future climatic conditions for each environmental pixel of the brook trout native range. This distance corresponds with genomic offset, which reflects the change in genetic composition necessary for populations to track changing climatic conditions. We followed the methods described in 11 (including provided analysis scripts) to calculate genomic offset and map results across the brook trout native range to visualize predicted future maladaptation across the landscape. Because genomic offset is derived from the RDA relationship, it is ultimately informed by information contained in individual genotypes.

Common garden experiment

Fish collections

We collected brook trout from four lakes in Adirondack Park, NY (East, Wilmurt, Panther, and Rock Lakes). We chose these lakes because they are in close geographic proximity to each other and they vary in their thermal profiles34. Two of the lakes (East and Wilmurt Lakes) stratify in the summer, so there is cold water refuge for brook trout during the heat of the summer. The other two lakes (Rock and Panther Lakes) are unstratified, meaning the lake mixes entirely and the temperature at the top of the lake is the same as the bottom of the lake. Adult fish were collected from each lake using trap nets in April-May of 2015. Trap nets were deployed for 24 h (East and Panther Lakes: April 29-30, Rock and Wilmurt Lakes: May 18-19,22.), after which fish were collected from the pen portion of the net and anesthesized with buffered MS-222 (500 mg/L MS222, 420 mg/L NaCO3). Weight and length was measured on each fish, and they were tagged with numbered and colored Floy tags, using one color per lake. This allowed us to then mix the fish from different lakes in tanks and still identify individuals and the lakes from which they originated. Immediately after measuring and tagging, fish were brought back to the Cornell Little Moose Field Station in Old Forge, NY where they were allowed to acclimate to holding conditions in large flow-through rearing circular tanks for at least 36 days prior to beginning the thermal stress experiment. Fish were fed to satiation each day, to ensure food availability did not affect experimental outcomes, on a hatchery meal diet.

Heat stress common garden experiment

Fish were transferred to 712-L tanks with a flow-through freshwater system for the heat stress common garden experiment. We took length and weight measurements on each fish prior to placing them in their assigned experimental tank. There were two replicate tanks per temperature treatment (thermally optimal and thermally stressful) with 9–12 individuals from each lake, for a total of 38–44 total fish per tank. The hypolimnetic water source for the tanks came from a nearby lake. After at least 24 h acclimation in the experimental tanks, we began the heat ramp for the 21 C tanks. Temperature in the “hot” treatment tanks was increased 0.5–1.0 C per 24 h using submersible heaters until reaching the target temperature of 21 C. Both tanks reached the target temperature within 8 days, marking the beginning of the exposure. Temperature loggers were deployed in the bottom and top of each tank at the beginning of the experiment and took measurements every 30 min through the entirety of the experiment. The water temperature in the “thermally optimal” treatment tanks ranged from ~15 to 17 C. Fish were fed each day to satiation, except 24 h prior to sampling. Each tank was aerated with submersible air blocks and the tanks were covered to avoid stress from movements outside the tanks, other than cleaning every other day to avoid food waste and algae buildup.

On days 3, 9, and 15 of exposure, we sampled fish from each lake/tank combination for transcriptome and genome sequencing (See Supplementary Table S2 for number of fish sampled per tank at each timepoint). Fish were euthanized by submersion in buffered MS-222. Fish were weighed and measured, and then gill tissue was immediately sampled and placed in RNAlater. Fin clips were also taken and placed in 95% ethanol. Otoliths were dissected and placed in 1.5 ml tubes for aging. Samples in RNAlater were allowed to sit at room temperature until that evening and then transferred to 4 C overnight. At 11am the following day samples were transferred to −80 C storage.

Fish were aged using one of two approaches: using known length-age relationships for each lake or with otoliths. The age range for Wilmurt, Rock, and Panther lakes was 1–2 years (average 1.08 years). The age range for East Lake was 2–4 years (average 2.95 years). Given this difference, age was used as a cofactor in all subsequent analyses.

Library preparation and sequencing

We extracted genomic DNA from fin clips using the Qiagen DNeasy 96 Blood and Tissue extraction kit, and quantified DNA concentrations using the Invitrogen Qubit Assay Kit (Life Technologies, Carlsbad, California). Samples were individually barcoded and randomly included in one of two library constructions. We constructed RADseq libraries with 150 ng of DNA per sample using the SbfI restriction enzyme and bestRAD protocols described in Ali et al.46. Libraries were sequenced at the Cornell Institute of Biotechnology using the Illumina NextSeq 500, generating 75 base pair paired end reads.

Gill tissue were homogenized using the Qiagen TissueLyser II in Buffer RLT Pluse (RNeasy Pluse Mini Kit) and then RNA was extracted following standard procedure using the Qiagen RNeasy Plus Mini Kit. Libraries were prepared for sequencing using the Illumina TruSeq Stranded mRNA High Throughput kit, following manufacturer’s protocol. Each fish was individually barcoded. Samples were randomly pooled across three library preparations, resulting in 45, 46, and 48 samples per library. Libraries were then sequenced at the Cornell Institute of Biotechnology on the Illumina HiSeq 2500 using 100 base pair read lengths. Each library was sequenced on 2–3 lanes to achieve desired sequence depth.

Statistics & reproducibility

SNP discovery

SNPs were called from the RADseq data using Stacks (version 1.4848). First, the fastq files were demultiplexed and cleaned with Stacks process_radtags. We removed individuals that had less than 120 K reads. After removing potential PCR duplicates with Stacks clone_filter, the reads for the remaining samples were aligned to the Salvelinus sp. reference genome49 with BWA mem51, and the output SAM files were filtered for primary reads in a proper pair with minimum mapping quality (MAPQ) of 30, using the view command of samtools (version 1.852). A catalog of loci was then generated from the filtered sam files using pstacks (minimum depth of coverage, -m of 3) and cstacks (using the –aligned and –gapped options). Reads were matched to the catalog with sstacks (again using the –aligned and –gapped options) and a VCF file was generated with the populations command. The Stacks workflow described here enables reference-informed variant calling, similar to the workflow described above for the range-wide RADseq dataset. We used VCFTools (version 0.1.12a53) to filter the dataset exported from sstacks by removing individual genotypes with a depth of less than seven, and then removing SNPs with more than 20% missing data. After filtering for a minimum minor allele frequency (MAF) of 1%, we used VCFTools to filtered loci for FIT between −0.2 and 0.5, to remove potentially paralogous false positive SNPs, while still allowing for substantial population structure.

SNPs were called from the RNAseq data following Illumina adapter trimming with Trimmomatic (options: ILLUMINACLIP 2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:14 MINLEN:36; version 0.3654) and “gap-aware” alignment to the Arctic Charr reference genome with STAR (version 2.5.3a55). Prior to indexing the genome with STAR, a gtf file was generated from the annotation file using the gffread program (version 0.9.12) from cufflinks56. The key parameters used with STAR and alignment step were sjdbOverhang of 100 for indexing and, for alignment, outFilterMismatchNoverLmax = 0.10 and outFilterIntronMotifs = RemoveNoncanonicalUnannotated. The resultant sam files were converted to bam, sorted, and indexed with samtools. For SNP calling, uniquely mapping reads (MAPQ of 255 from STAR) were fed into BCFtools (version 1.957) command mpileup with the skip-indels option invoked to produce a VCF file containing the allele depth for each putative SNP for each sample. The resultant VCF was weakly filtered via BCFtools view command for putative SNPs having an average depth per individual of 2 and a minor allele depth across all of the samples of at least seven, and then input into BCFtools call with the options: variants-only, skip indels, multiallelic-caller, and prior = 0 (i.e., no prior). The resultant SNPs were then sequentially filtered for those with an average depth per sample of 9, a minimum MAF of 2%, no more than 10% missing data, biallelic, a quality score of 999, having more than 99.9% of reads on one strand, and an FIT between −0.2 and 0.5 to remove potentially paralogous false positive SNPs, resulting in 130 total individuals genotyped. The RADseq and RNAseq generated SNP sets were then merged, removing redundancies, with a custom awk script (available upon request).

PCA and PCAdapt

Multidimensional Scaling (MDS) analysis was performed with PLINK (version 1.0758) on the merged SNP dataset. Outlier SNPs (q < 0.1) putatively under selection were identified with PCAdapt (version 4.3.159) with K = 4 chosen from the generated scree plot.

Differential gene expression in response to heat stress

Gene counting was performed on the sorted, indexed bam files from STAR alignment using the count.py program from HTSeq (version 0.6.160) and then the gene counts were merged across technical (sequencing) replicates with a custom awk script and read into the R package DESeq2 (version 1.24.061) via the DESeqDataSetFromHTSeqCount function. Exploratory PCA analysis with DESeq on the variance stabilized read counts identified three outlier samples (one sample from each of three lakes) which were excluded from further analyses (excluded due to anomolous results on PCA axes and/or poor alignment). A composite factor “TreatDayLake” was generated indicating the treatment (hot vs. cold), the duration in days (3, 9, or 15) and the lake of origin (East, Wilmurt, Panther, Rock) of each sample and then the counts were fit to a negative binomial model with the design “~Library + TreatDayLake”. Exploratory analysis indicated a strong library effect (driven by strong differences in Library #2) but no discernable effect of the additional covariates sex and age; hence, to minimize overfitting, sex and age were excluded from the negative binomial model. Differential expression analysis was performed between the hot vs. cold treatment within each lake and time and differentially expressed genes with an adjusted p-value less than 0.01 were identified. Lake stratification status (strat) vs. treatment (treat) effects were explored for each time point by fitting a negative binomial model with design “~Library + strat + treat + treat:strat” and then performing a likelihood ratio test vs. the reduced model without the interaction term (~Library + strat + treat) for each gene. Variance stabilized read counts were generated for genes with at least 5 reads per individual, with the library effect removed by using the removeBatchEffect function from limma (version 3.40.662). The variance stabilized read counts and differentially expressed gene lists (DEGs) from each contrast were used for downstream analyses.

GO enrichment

Tests for enrichment of gene ontology (GO) terms in each set of DEGs (p-adj < 0.01) between the hot vs. cold treatment for each timepoint for each lake of origin were performed with the R package TopGO (version 2.34.063) using the GO annotations provided by Christensen et al.49. The DEGs were split into up and down regulated sets prior to enrichment analysis. The reference set of genes consisted of the 28,204 genes with at least 5 reads per individual on average. Each of the three GO categories, Biological Process (BP), Molecular Function (MF), and Cellular Compartment (CC) were tested. GO terms with p < 0.001 in the Classic Fisher test were considered significantly enriched.

WGCNA and GO enrichment of WGCNA

We identified groups of highly correlated genes using the variance stabilized read counts and a weighted gene correlation network analysis (WGCNA R package version 1.6864). A power of 6 was used for adjacency and TOMsimilarity, the hclust method was set to “average”. We used a deepSplit of 2 and minClusterSize of 30 for cutreeDynamic. Similar modules were then merged based on the distance between their eigengenes: close modules were merged with a cutHeight of 0.25. Pearson correlations between module eigengenes and traits were examined for the following traits: hot, hotDays, coldDays, tankDays, stratified, East, Panther, Rock, Wilmurt, age, and female. Most of these are indicator variables (1 or 0), whereas hotDays is the number of days of heat exposure (0 for cold fish), tankDays is the number of days in a tank (3, 9, or 15), and age is the estimated age. GO category enrichment was performed with TopGO for each module using the set of genes with highest module membership, split into those with the highest positive or negative correlation with the module eigengene. For modules with at least 100 genes, either the top and bottom 10% of genes were used for GO category enrichment, or the top and bottom 50 genes, whichever was larger. For modules with fewer than 100 genes, all module genes were used, split into positively and negatively correlated. GO terms that were significant at the p < 0.001 were output for each module and direction for all three GO categories (Biological process, molecular function, and cellular component).

eQTL analysis

We analyzed the data to identify eQTLs using the variance stabilized counts from DESeq2, including all genes with total raw counts of at least 20. We used the 18,881 SNPs with a MAF of at least 5% in the eQTL analysis, after imputation (4.95^ of genotypes imputed) with Beagle 4.165) with a window size of 200 and an overlap of 50. Associations between gene expression levels and SNP genotypes for the 130 individuals with the SNPs were detected using the EqtlAssociationPlugin from TASSEL (version 5.2.5066) with MaxPValue set at 0.05. The EqtlAssociationPlugin is a highly computationally efficient implementation of the Matrix eQTL method of Shabalin67. Library, lake, sex, age (age 1 or not), heatDays (with all control treatments = 0), and coldDays (with all hot treatments = 0) were included as co-variates in the eQTL analysis. Additional hidden factors in the data were detected using PEER (version 1.068) with K (the number of hidden covariates) set at 34 (=25% of the sample size of 136 fish remaining in dataset after removing the three outliers). The PEER model converged after 290 iterations.

The significant eQTL hits were split into cis and trans acting SNPs with a custom awk script (cis defined as SNPs within 5 Mb of the start or end of the associated gene and trans defined as >5 MB away, scripts available upon request). We then used TreeQTL (version 2.069) for hierarchical control of the multiple testing error rate. In the TreeQTL analysis, the number of base pairs defined as “nearby” was set to 5e6. eQTL hits were considered as overlapping with differentially expressed genes if the q-value of the eQTL association was less than 0.01 and the associated gene was differentially expressed with p-adj less than 0.01.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.