Introduction

Endemic species worldwide face rapid environmental change resulting from various types of human activity. For example, human-mediated translocation of species into new environments promotes novel ecological interactions, often with detrimental effects1. The consequences can be dire: species introductions have been identified as an underlying cause of at least 170 animal extinctions (ref. 2). In some invasive-endemic interaction cases, standing genetic variation may provide endemic populations with the raw materials needed for quick adaptive responses3. Yet even then, such adaptations can induce cascading effects on the broader ecosystem4,5,6,7. Therefore, understanding the process of genetic adaptation to novel species interactions will better inform us of the scope of potential ecological impacts related to human behavior.

In the 1930s, the red imported fire ant, Solenopsis invicta, was accidentally transported by humans from South America (presumably northeastern Argentina) to Port Mobile, Alabama, in the United States of America8. Since then, fire ants have established and steadily expanded their range in the Southeastern USA, with both economic and public health impacts arising from their potent venom and aggressive nature9. Furthermore, ecological impacts stemming from fire ant invasion are marked and diverse. In the United States, fire ants outcompete and displace native ants, with cascading effects on the broader invertebrate community10,11,12. Experimental studies have confirmed that fire ants directly and indirectly impact various endemic small vertebrates. Specifically, predation by fire ants led to a reduction of up to 66% of Ambystoma salamander populations within 48 h (ref. 13), whereas fire ant suppression led to significant increases in small vertebrate abundance14. Meanwhile, fire ant disruption of arthropod communities led to a 10% decrease in the number of eastern bluebird fledglings and to the displacement of adult birds15.

Among impacted vertebrates, the eastern fence lizard Sceloporus undulatus has been an important model system for studying the ecological impacts of this fire ant introduction. First, despite their toxicity, fire ants have become a novel prey item in the fence lizard diet16. Second, fire ants are a novel predator of fence lizards: swarming worker ants can envenomate and kill juvenile and adult individuals17, as well as prey on their eggs18. The predatory impact is likely very high: experimental removal of fire ants from enclosures increased fence lizard recruitment by ~60% (ref. 19), while lizard hatchling survival is negatively associated with fire ant mound density20. Even lizards that initially survive fire ant encounters experience a 20% increase in mortality rate over the 11 weeks post-exposure relative to unexposed lizards21. Yet fence lizards remain abundant in fire ant-invaded habitats, prompting behavioral, ecological, and evolutionary investigations into the underlying mechanisms that allow them to persist.

Exposure to fire ant venom induces wide-ranging immunological responses in fence lizards22,23. Prior to venom exposure, however, behavioral defenses are also employed by fence lizards against swarming fire ants. These defenses include body-twitching (using their hind legs to directly remove ants with a flicking motion) and fleeing from the attack17. In behavioral trials, the success of these tactics was positively linked to hind limb length (relative to body size), with a ~20% longer relative hind limb resulting in the removal of ~30% more fire ants17. A morphological study of fence lizard museum specimens that predate the fire ant introduction revealed a latitudinal cline in which lizards from the southern extreme of their range had ~5% smaller relative hind limb lengths than those in the northern extreme (a difference of ~14° in latitude), a trend likely driven by temperature and precipitation gradients24. However, a different pattern is observed in the present day, following the fire ant introduction: relative limb length is greater in populations with the longest history of invasion, a pattern that could be explained by recent evolutionary change in response to fire ants24. A positive, albeit non-significant, correlation between mother and offspring relative limb lengths suggests that this trait is likely heritable in fence lizards17, as it is for other lizards25,26. Together, these observations suggest that relative limb length in fence lizards may have been and could continue to be a viable target for natural selection in response to novel interactions with fire ants.

In this work, we combine a genome-wide association study of fence lizard limb length variation with an evolutionary genomic comparison of one lizard population with long-term exposure to fire ants versus two unexposed populations. We identify alleles associated with longer limb lengths that are highly differentiated between exposed and unexposed populations. Signatures of positive selection exclusive to the exposed population overlap genes with immune system, growth factor pathway, and morphological development functions, a pattern that may reflect a history of adaptation to fire ant invasion in the exposed population. However, our results and conclusions should be considered preliminary pending replication in additional populations (including confirmation of parallel adaptive responses).

Results

In this study we sought to identify potential fire ant invasion-related genomic signatures of adaptation in eastern fence lizards. To do so, we analyzed whole-genome sequencing data for a total of n = 420 lizards from three sampling sites (Fig. 1A): two sites were naïve to fire ants (in Arkansas, n = 19, and Tennessee, n = 20) and one had a long history of fire ant exposure (66 to 77 years in Alabama, n = 381). We first generated moderate-coverage sequencing data (average of 13.01X sequence reads per site, per individual) from n = 19 or 20 individuals from each site (total n = 59; 19 from Arkansas) for population history and evolutionary genomic analyses. We also generated low-coverage (4.39X) sequencing data from an additional n = 361 individuals from the fire ant-invaded site in Alabama (total n = 381 when including those individuals also in the moderate-coverage dataset) to identify any significant genotype associations with relative hind limb length.

Fig. 1: Invasive fire ant distribution and fence lizard population structure.

A Reported detection and quarantine of the red imported fire ant Solenopsis invicta in the Southeastern US since the 1930s (ref. 27), along with sampling sites, sampling periods, and number of collected eastern fence lizards, Sceloporus undulatus. Clip art reproduced from PhyloPic (CC0). Photo by Tracy Langkilde. Map generated by the R package maps. B Principal Components Analysis and mean Weir & Cockerham FST values for the three pairwise population comparisons. C Genetic distance-based neighbor-joining tree analysis. D Admixture analysis for K = 2 and K = 3 ancestral groups. K = 3 yielded the lowest rate of cross-validation error (Supplementary Fig. 2). Source data are provided as a Source Data file.

The lizards from the fire ant-exposed population were collected between 2006 and 2017 from the Solon Dixon Forestry Education Center, in Andalusia, Alabama (AL). Fire ants were recorded at this site, and the site was quarantined, in the early 1940s (ref. 27). Estimates of eastern fence lizard generation time vary from 1.15 to 2.24 years28; thus, the approximately 70 years of coexistence with fire ants may have encompassed 31 to 60 generations of fence lizards. The other two sampled populations had no record of invasion by fire ants at the time of sampling. These lizards were collected from St. Francis National Forest, Arkansas (AR) in 2006, and from Edgar Evins State Park, Tennessee (TN) between 2011 and 2013.
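The generation range quoted above follows from dividing the years of coexistence by the two generation-time estimates. A minimal sketch of that arithmetic, assuming a round figure of 70 years (the function and variable names are illustrative and not part of the study's pipeline):

```python
import math


def generations_elapsed(years, generation_time_years):
    """Whole generations completed in a given number of years."""
    return math.floor(years / generation_time_years)


# Values taken from the text: ~70 years of coexistence,
# generation time between 1.15 and 2.24 years.
YEARS_OF_COEXISTENCE = 70  # assumed round figure for the 66-77 year range
fewest = generations_elapsed(YEARS_OF_COEXISTENCE, 2.24)  # slowest turnover
most = generations_elapsed(YEARS_OF_COEXISTENCE, 1.15)    # fastest turnover
print(fewest, most)  # prints "31 60"
```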

We mapped sequence reads from the moderate-coverage dataset to the fence lizard 1.0 reference assembly29. After quality control and filtering (see “Methods”), we identified a total of 46,934,027 SNPs across the three populations, though only 2,249,567 were variable in all three populations. The AL population was the most genetically diverse (AL genome-wide mean pairwise nucleotide distance in 100 Kb windows, π = 0.323%; TN π = 0.241%; AR π = 0.196%) and also had the largest number of private variants by far (AL = 24,705,862; TN = 7,360,015; AR = 5,985,756). The numbers of SNPs variable in two of the three populations were 5,975,685 for TN and AL; 3,452,917 for AR and AL; and 4,378,018 for TN and AR. In the AL population, the presence of such a large proportion of rare genetic variants (including 10,507,067 singleton or doubleton SNPs) is consistent with a recent, large-scale demographic expansion30,31, even in the face of fire ant invasion.

We used several descriptive approaches to examine genetic relationships among the three populations. First, we examined patterns of pairwise population differentiation using the FST statistic. For this analysis, we restricted the set of SNPs to loci that were either variable in both populations, or for which the minor allele in one population was fixed in the second. Average FST values were AL-TN = 0.210 (16,075,146 SNPs), AL-AR = 0.141 (33,225,824 SNPs), and TN-AR = 0.223 (16,383,584 SNPs; Fig. 1B).

Next, we restricted the set of SNPs to loci that were variable between at least two of the three populations (11,437,455 SNPs). Using this dataset, the three populations clustered independently based on results from a principal components analysis (Fig. 1B; results were qualitatively equivalent when using all 46,934,027 SNPs; Supplementary Fig. 1). We then constructed a genetic distance matrix-based neighbor-joining tree, in which individuals from each population were distinctly separated (Fig. 1C). Lastly, we used ADMIXTURE32 to compute model-based estimates of individual ancestry. When specifying K = 3 populations (which has the lowest cross-validation error; Supplementary Fig. 2), cluster membership proportions for members of all three populations were distinct (Fig. 1D). With K = 2, AL and AR membership proportions were distinct, with TN individuals exhibiting a mix of the two.

Taking all these results together, despite the Mississippi River being a putative biogeographic barrier for AR lizards, we did not observe a clear, strong genetic separation between AR and AL + TN, precluding the definitive assignment of any population as a true outgroup for downstream evolutionary analyses. For this reason, we focused on population-specific and AL vs. (AR and TN) statistics.

Signatures of selection exclusive to Alabama lizards overlap morphology- and immune system-related genes

To identify candidate regions for recent positive selection in each lizard population, we used three different population genetic approaches: Tajima’s D (ref. 33), saltilassi34, and LSBL (locus-specific branch length35). If selective pressures imposed by fire ants resulted in recent genetic adaptation in the fire ant-exposed AL population, then we would predict the commensurate signatures of selection reflecting such adaptations to be absent in the two northern populations naïve to fire ants.

Tajima’s D compares the mean number of pairwise genetic differences to the sample size-corrected number of variable sites in a population across a given genomic region. Low Tajima’s D values (an excess of rare alleles) may reflect recent population expansion or positive selection30,31. Because genome-wide deviations from D = 0 most likely reflect past demographic history, we considered genomic regions containing Tajima’s D values in the lowest 0.5% of all 100 Kb regions for a given population as positive selection candidates. For AL we identified n = 143 such regions (D < −1.81; Fig. 2A). For the two uninvaded populations, AR and TN, we identified n = 98 (D < −2.11) and n = 97 (D < −1.9) Tajima’s D candidate selection regions, respectively (Supplementary Figs. 3, 4).
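For readers less familiar with the statistic, the comparison described above can be sketched as follows. This is a minimal illustration of the standard Tajima's D formula computed from per-site allele counts in a single window; the function name and toy inputs are assumptions for illustration, and this is not the software used in the study.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for one genomic window.

    allele_counts: for each site, the count of one allele among the
                   n sampled chromosomes (non-segregating sites are ignored).
    n: number of sampled chromosomes (two per diploid lizard).
    """
    counts = [k for k in allele_counts if 0 < k < n]
    s = len(counts)  # number of segregating sites
    if s == 0 or n < 4:
        return float("nan")  # undefined without variation or with tiny samples
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    # Sample-size correction terms from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy windows with 10 chromosomes and 10 segregating sites each.
rare_excess = tajimas_d([1] * 10, 10)      # all singletons: negative D
intermediate = tajimas_d([5] * 10, 10)     # all at 50% frequency: positive D
```

In a genome scan, a function like this would be applied to each 100 Kb window, with the lowest 0.5% of values per population flagged as candidates.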

Fig. 2: Signatures of genetic adaptation in fence lizards from the fire ant-invaded population in Alabama.

A Top: genome-wide Tajima’s D in 100 Kb windows with a 20 Kb step. Candidate selection regions fall below the 0.5th percentile cutoff (D < −1.81). Middle: genome-wide saltilassi statistic for sweeping haplotypes34. Candidate selection regions fall above the 99.5th percentile cutoff (Λ > 92.24). Bottom: locus-specific branch length35 means for the AL population in 50 Kb windows and a 10 Kb step. B Number of genes overlapping or nearby (± 25 Kb) candidate selection regions for each of the three selection statistics. C Functional enrichment analysis for genes overlapping candidate regions under selection for Tajima’s D. Plausible adaptations to selective pressure from fire ants include variants in genes for myosin, collagen, growth factors, and complement immune response. D Selected multi-signal candidate selection regions, with scores for the Tajima’s D and saltilassi statistics and annotated positions for relevant genes. Horizontal lines represent the same significance thresholds as in (A). Source data are provided as a Source Data file.

The saltilassi statistic computes likelihood ratio tests on the haplotype frequency spectrum of a given population to identify haplotypes with high frequencies compared to the genome-wide expectation. This statistic implements a model that allows for one or more sweeping haplotypes (i.e., hard or soft sweeps), making it well suited to test the hypothesized rapid adaptation of invaded fence lizards. The ad hoc null distribution in saltilassi is generated from the genome-wide average haplotype frequency spectrum, thus not requiring a linkage map for its calculation. This feature makes saltilassi valuable for studies of non-model taxa without this information, as in our case. To identify candidate regions, we considered regions in the top 0.5% of the distribution of saltilassi’s Λ statistic (see “Methods”). For AL, Λ > 92.24 and n = 1,582 regions (Fig. 2A); for AR, Λ > 407.56 and n = 566 regions; and for TN, Λ > 252.27 and n = 764 regions (Supplementary Figs. 5, 6).

Finally, we used the LSBL statistic35 to identify variants with frequencies highly differentiated in one population relative to each of the two others, based on all pairwise FST values: LSBL_A = (FST(A,B) + FST(A,C) − FST(B,C)) / 2, where FST(X,Y) is the pairwise FST between populations X and Y. LSBL is the precursor of the Population Branch Statistic (PBS), with the difference that FST values are not log-transformed. This feature allowed us to retain sites with FST = 0 (i.e., not log-transformable) in a given pairwise comparison; excluding them would otherwise remove SNPs in our dataset that are invariant in two of the three populations. Here, we considered regions as candidates for positive selection if at least 3 SNPs in the top 0.1% of the LSBL distribution were observed within 50 Kb of one another and in high linkage disequilibrium (r² ≥ 0.9). In total, we identified n = 2,210 candidate regions (LSBL > 0.76) for AL (Fig. 2A), n = 2,960 regions (LSBL > 0.74) for TN, and n = 1,686 regions (LSBL > 0.78) for AR (Supplementary Figs. 7, 8; Supplementary Data 1-3).
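The LSBL definition above can be written out for all three branches at once. The sketch below is illustrative only: the function name is hypothetical, and the second example applies the formula to the genome-wide mean FST values reported earlier purely as an arithmetic demonstration (those means were computed on different SNP sets, so the resulting numbers are not results of the study).

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Locus-specific branch lengths for populations A, B, and C.

    Each branch is half of (the two pairwise FST values involving that
    population, minus the FST between the other two populations).
    """
    branch_a = (fst_ab + fst_ac - fst_bc) / 2.0
    branch_b = (fst_ab + fst_bc - fst_ac) / 2.0
    branch_c = (fst_ac + fst_bc - fst_ab) / 2.0
    return branch_a, branch_b, branch_c


# A SNP fixed for one allele in A and for the other allele in both B and C:
# all of the differentiation is assigned to the A branch.
print(lsbl(1.0, 1.0, 0.0))  # prints "(1.0, 0.0, 0.0)"

# Arithmetic demonstration with the mean FST values from the text
# (A = AL, B = TN, C = AR): AL-TN = 0.210, AL-AR = 0.141, TN-AR = 0.223.
al, tn, ar = lsbl(0.210, 0.141, 0.223)
```

Note that the FST(B,C) = 0 case in the first example is exactly the kind of site the text says is retained by not log-transforming FST.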

We initially focused our analysis on genomic regions that were flagged as candidates for a history of past selection by at least two of the above approaches for a given population, resulting in a dataset of 42, 35, and 24 multi-signal selection candidate regions for AL, TN, and AR, respectively (Supplementary Data 4). None of these candidate regions overlapped between populations. We used g:Profiler36 to perform functional profiling enrichment analyses to identify known biological and molecular functions and pathways (see “Methods”) significantly overrepresented among the set of genes overlapping or nearby (± 25 Kb) the candidate selection regions for each population.

For the fire ant-invaded population, AL, genes within the multi-signal selection candidate regions were significantly enriched for multiple anatomical structure and immune system functional categories (Supplementary Data 5), including Myofibril (GO:0030016; 7 observed vs. 1.21 expected genes; Fisher’s one-tailed test, FDR-adjusted P = 0.011) and Complement and coagulation cascades (KEGG:04610; 3 observed vs. 0.18 expected genes; FDR = 0.042). One of the multi-signal candidate regions that contains myofibril-related genes overlaps a myosin gene cluster on chromosome 2 (Fig. 2D). The different myosin proteins are responsible for the distinct contractile properties across muscle cells37,38. The complement immune system genes of interest include C8A and C8B (Fig. 2D), which are directly involved in the membrane attack complex (MAC). Nucleated cells targeted by the MAC can undergo autoimmune and inflammatory processes through the secretion of proinflammatory proteins such as IL-1β and IL-18 (refs. 39,40), making genetic variants within or nearby these genes plausible targets for natural selection by fire ant venom exposure.

Lastly, another locus of interest (for Tajima’s D and saltilassi) is a 411 Kb region on chromosome 1 that overlaps two genes, COL4A1 and COL4A2, involved in the production of collagen IV (Fig. 2D). Collagen IV is the main component of the cellular basal membrane and is responsible for skeletal muscle stability41,42. In the green anole Anolis carolinensis, skeletal muscle is the tissue with the second-highest expression of COL4A1 and the third-highest expression of COL4A2 (ref. 43).

Among regions identified in the Tajima’s D analysis (but not necessarily as outliers with the other two statistics), growth and growth factor-related pathways were significantly overrepresented in the invaded AL lizard population (Supplementary Data 6). Significant enrichments included regulation of insulin-like growth factor transport and uptake by insulin-like growth factor binding proteins (REAC:R-MMU-381426; 7 observed vs. 1.12 expected genes; FDR < 0.01) and osteoblast signaling (WP:WP238; 2 observed vs. 0.1 expected genes; FDR = 0.042).

For the TN and AR populations, naïve to fire ants, genes overlapping candidate regions for positive selection in at least two statistics showed no functional enrichments analogous to those in the fire ant-invaded AL population. In TN (Supplementary Data 8), the most significant enrichment was for ovarian infertility (WP273; FDR = 0.021), while there were no significant enrichments for AR. However, when considering only the saltilassi results, there was a functional enrichment for negative regulation of developmental growth in AR (GO:0048640, 6 observed vs. 0.28 expected genes; FDR < 0.001) and analogous terms (Supplementary Data 12).
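The observed versus expected gene counts reported in this section come from one-tailed over-representation tests. A minimal sketch of such a test using the hypergeometric tail (equivalent to a one-tailed Fisher's exact test) is given below. The gene-universe size, category size, and candidate-set size are hypothetical values chosen only so that the expected count equals the 1.21 reported for Myofibril; they are not taken from the study, and the FDR adjustment across terms is not reproduced here.

```python
from math import comb


def expected_overlap(universe, annotated, selected):
    """Expected number of annotated genes in a random set of `selected` genes."""
    return annotated * selected / universe


def enrichment_p_value(universe, annotated, selected, overlap):
    """One-tailed hypergeometric P(X >= overlap).

    universe:  total number of annotated genes considered
    annotated: genes in the universe that belong to the functional category
    selected:  genes overlapping or near candidate selection regions
    overlap:   selected genes that belong to the category
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# HYPOTHETICAL inputs (not from the paper): 20,000 genes in the universe,
# 200 in the category, 121 near candidate regions, 7 of them in the category.
expected = expected_overlap(20000, 200, 121)      # 1.21
p_raw = enrichment_p_value(20000, 200, 121, 7)    # unadjusted P-value
```

The raw P-value from each tested term would then be adjusted for multiple comparisons before being compared with a significance threshold.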

Genomic associations with lizard limb length in the fire ant-invaded population in Alabama

To uncover genomic associations with relative limb length variation, we brought low-coverage (average 4.39X sequence coverage per individual) whole-genome sequence data from an additional n = 361 lizards from the same population in Alabama into our study. We first used the moderate-coverage data for the 20 AL individuals (described above) to improve genotyping rates for the 361 low-coverage genomes via genomic imputation44. Following QC and filtering, our dataset consisted of 4,245,544 SNP genotypes for n = 381 total AL lizards, each with limb and body length phenotypic data available. We tested for genetic associations with relative limb length separately for each SNP using linear models, with individual sex and the first four components of a principal components analysis (to account for population structure) as covariates (see “Methods”). Using this procedure, we identified a total of n = 24 genomic loci with significant (P < 1e-6) genotype-phenotype associations (Fig. 3A; Q-Q plot available in Supplementary Fig. 9).
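The per-SNP test described above can be illustrated with a deliberately simplified sketch. It first takes relative limb length as the residual of hind limb length (HLL) regressed on snout-to-vent length (SVL), and then estimates the additive effect of one SNP on that residual. This is an assumption-laden simplification: the sex and principal-component covariates and the significance test are omitted, all names and the toy data are hypothetical, and it is not the software used in the study.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def snp_effect(genotypes, svl, hll):
    """Additive effect (beta) of one SNP on relative hind limb length.

    genotypes: count of the tested allele per individual (0, 1, or 2)
    svl, hll:  snout-to-vent and hind limb lengths for the same individuals
    """
    relative_hll = residuals(svl, hll)
    _, beta = simple_ols(genotypes, relative_hll)
    return beta


# Toy data (hypothetical): limb length grows with body size, and each copy
# of the tested allele adds one unit of limb length.
svl = [60, 60, 60, 70, 70, 70]
genotypes = [0, 1, 2, 0, 1, 2]
hll = [0.5 * s + 1.0 * g for s, g in zip(svl, genotypes)]
beta = snp_effect(genotypes, svl, hll)
```

In a full analysis this calculation would be repeated for each of the several million SNPs, with covariates included in the model and a P-value computed for each beta.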

Fig. 3: Genome-wide genotype associations with fence lizard hind limb length.

A Genome-wide associations with relative hind limb length (residuals of hind limb length on snout-to-vent length with sex and PCs 1-4 from a Principal Components Analysis of the genome-wide SNP genotype data as covariates) for n = 381 individual lizards. The green line indicates a significance cutoff of P < 5e-8 (four genomic loci) and the gray line a significance cutoff of P < 1e-6 (24 total loci) for two-sided linear regressions. B GWAS results for SNPs at the ARHGAP44 gene locus, with vertical bars representing exons, and arrows representing transcriptional direction. C Representative SNP immediately upstream of ARHGAP44 and its association with relative limb length (β ± SE = 0.11 ± 0.02, t = 5.83, P = 2.75e-8). Horizontal lines represent the 25th, 50th (median), and 75th percentiles. Upper and lower whiskers represent the largest and smallest values, respectively, within 1.5 times the interquartile range above the 75th and below the 25th percentile (n = 381 individuals). D For this same SNP, the negative-effect allele is fixed in the fire ant-free TN and AR populations, while the limb length positive-effect allele is abundant in the fire ant-invaded AL population. P-values in panels (A, B, and C) are for two-sided tests with no adjustments for multiple comparisons. Source data are provided as a Source Data file.

A gene ontology enrichment analysis on the set of genes overlapping or nearby (± 25 Kb) these regions (Supplementary Data 13) revealed an overrepresentation of signaling by epidermal growth factor receptor pathway genes (REAC:R-MMU-177929; 2 observed vs. 0.03 expected genes; FDR = 0.049). The two genes, both members of the protein tyrosine phosphatase family, are PTPN11 on chromosome 7 and PTPN12 on chromosome 5. In humans, mutations in PTPN11 have a strong association with Noonan syndrome, characterized by short stature and skeletal malformations45.

We did not observe more overlaps than expected by chance between the GWAS regions and our AL lizard candidate positive selection regions (permutation analysis; for Tajima’s D: P = 0.31; Supplementary Fig. 10; for saltilassi, no overlaps). However, two loci were both strongly associated with relative limb length and contained or were in proximity to alleles that were highly differentiated between AL and the fire ant-naïve AR and TN populations. The first locus overlaps PTPN11 (chromosome 7), discussed above as part of the enrichment for signaling by epidermal growth factor receptor (Supplementary Fig. 11). The second locus overlaps ARHGAP44 (chromosome 2), which belongs to the gene family of Rho GTPase-activating proteins. These proteins interact with insulin-like growth factors and the CREB transcription factor to modulate body size during embryonic development46. Here, three non-coding SNPs, located between 2747 and 3711 bp upstream of ARHGAP44 exon 1 (Fig. 3B), are strongly associated with relative limb length (β = 0.11, P = 2.75e-8; Fig. 3C). For these SNPs, the alleles with positive effects on relative limb length are abundant (> 84% frequency) in the AL population yet absent in TN and AR (Fig. 3D). The orthologous region in humans overlaps an annotated alternative splicing isoform of ARHGAP44 (AS-1).
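The overlap test mentioned at the start of the preceding paragraph can be illustrated with a generic permutation scheme. The sketch below is an assumption, not the authors' procedure: it treats the genome as a single linear coordinate space, repositions each GWAS region uniformly at random while preserving its length, and asks how often the shuffled regions overlap the selection candidates at least as often as observed. All names and coordinates are hypothetical.

```python
import random


def count_overlaps(regions_a, regions_b):
    """Number of regions in A overlapping any region in B (half-open intervals)."""
    return sum(
        any(start < end_b and start_b < end for start_b, end_b in regions_b)
        for start, end in regions_a
    )


def permutation_p_value(gwas, selection, genome_length, n_perm=1000, seed=0):
    """Return (observed overlaps, one-sided permutation P-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(gwas, selection)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in gwas:
            length = end - start
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlaps(shuffled, selection) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical example: one tiny GWAS region coinciding with one tiny
# selection region in a 1 Mb genome; chance overlap is very unlikely.
obs, p_value = permutation_p_value([(0, 10)], [(0, 10)], 1_000_000, n_perm=200)
```

A large P-value from such a test, like the P = 0.31 reported above, indicates that the observed number of overlaps is unremarkable under random placement.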

Discussion

The adaptive potential of natural populations when faced with the sudden and pervasive impacts of anthropogenic activity is uncertain. Understanding these fundamental drivers of ecosystem change is crucial, especially in an era of unprecedented human activity and climate change. Our study focused on endemic fence lizards, which have co-existed with introduced fire ants for more than 70 years in parts of the Southeastern United States. We sampled lizards from a long-invaded site in Alabama and identified multiple genomic signatures of positive selection that were generally absent from two lizard populations naïve to fire ants, in Arkansas and Tennessee.

Putative adaptations to fire ants

The set of genes contained within or nearby the candidate signatures of positive selection that we identified in lizards from the fire ant-invaded AL population were significantly enriched for morphological development functions. Two additional loci (which respectively overlap and are immediately upstream of genes involved in body growth processes) contain alleles that are simultaneously strongly associated with longer legs in AL lizards and occur in regions of high genetic differentiation relative to the fire ant-unexposed TN and AR populations.

The natural history of the fire ant invasion (introduction in Southern Alabama followed by range expansion from there) precludes us from disentangling biogeographical vs. invasion-related evolutionary effects on the patterns of genetic variation in fence lizards that we observed, in the absence of further experimental confirmation. That said, given that the latitudinal patterning of fence lizard hind limb length variation today is reversed relative to that before the fire ant invasion24 and that longer hind limbs are associated with increased survival against attacking ants17, the genetic variants detected in our study may plausibly constitute some of the adaptations underlying fence lizard limb length variation.

We also identified multiple signatures of positive selection for variants within or near two genes associated with membrane attack complex inflammatory processes39,40. Inflammatory processes are a key component of the immunological response to cellular damage from venom toxins47,48,49,50. Remarkably, S. undulatus field and laboratory experiments have previously detected associations between fire ant venom and complement immune activity22,23, which includes the membrane attack complex. Specifically, field-caught lizards from AL had significantly lower levels of complement activity relative to northern, unexposed lizards23. Separately, lizards from a TN population naïve to fire ants showed higher levels of complement activity immediately after consuming fire ants, as well as three weeks post-fire ant stinging22. Therefore, it seems likely that the complement immune system is involved in both routes of exposure to fire ant venom, albeit with temporally distinct signals for each.

Given this context, the findings from our study lead us to suggest that population-level differences in complement immune activity may in part be a product of recent evolutionary response to the introduced fire ant. Specifically, we hypothesize that variants within the C8A and C8B membrane attack genes candidate selection region that we detected in the fire ant-exposed AL population but not the fire ant-naïve TN and AR populations may underlie an adaptive suppression of the innate complement system. If constant consumption of fire ants triggers chronic, costly inflammatory responses that are of limited immunological benefit, then it follows that such inflammatory suppression would spare resources that would otherwise be diverted from other physiological processes51. Such a tradeoff between immune response and energetic resources has been demonstrated in poultry52, house sparrows53, and the tobacco hornworm54.

Community-scale consequences

Across ecosystems, evidence for rapid adaptation to human-induced environmental changes continues to build6,55,56,57. A few illustrative examples include the water flea Daphnia showing signals of adaptation for resistance to salinization derived from human activity58, and independent populations of crested anoles (Anolis cristatellus) carrying signatures of convergent genetic adaptation to urbanization59. Furthermore, human-commensal house sparrows (Passer domesticus) have signatures of positive selection overlapping an amylase gene, which encodes an enzyme involved in starch digestion60, and poaching of African elephants for ivory appears to have resulted in rapid adaptation for a tuskless phenotype61.

Although the above examples and our results in fence lizards following the introduction of fire ants contribute to the growing evidence of successful native species adaptations in response to anthropogenic change, the species-specific and community-wide ramifications of rapid adaptation in a keystone species remain unknown4,6,62. For fence lizards, while fire ant predation is at least partly counteracted by longer hind limbs combined with a twitch/flee response for fire ant removal and escape, such behavior is likely accompanied by a break in crypsis. Field surveys in an Alabama fence lizard population have demonstrated that close to 50% of male lizards show signs of injuries – a two-fold increase from fire ant-naïve populations – speculatively due to increased detection by birds of prey63. This conundrum likely represents a delicate balance between context-dependent antipredator responses64 faced by fence lizards since fire ant introduction, exemplifying the broader community ramifications of adaptation to human activity.

Meanwhile, if it is true that fence lizards from fire ant invaded sites, such as in Alabama, have adapted to exploit fire ants as a novel food resource, as our results suggest, then this may have significant cascading effects on that ecosystem’s food web. The voracious predatory behavior of fire ants has been broadly demonstrated; they can effectively prey on higher-level consumers such as salamanders13, bird nestlings65, cotton rats66, and hatchling sea turtles67. Consequently, with potential adaptations to include fire ants in their dietary niche, fence lizards may indirectly assimilate biomass otherwise unavailable. The ability to exploit this nutritional input may be one factor underlying a recent, rapid population expansion of AL fence lizards, evidenced by an abundance of rare alleles observed in our analysis. Such food web disruptions by invasive species are to be expected68,69. However, less discussed is how standing genetic variation can capitalize on rapid anthropogenic change for an ecological advantage.

Potential limitations

As discussed above, the inter-population evolutionary genomic patterns we detected are consistent with adaptive responses to selective pressures imposed on native fence lizards by invasive fire ants. Yet the specific nature of the fire ant invasion in the Southeastern USA (originating in Alabama and subsequently expanding outward from there) limits the ability to distinguish conclusively between fire ant invasion and confounded environmental factors as ultimate selection drivers for the signatures of past positive selection.

Expanded geographic sampling and population genomics work could aid this effort, even within the confines of this natural experiment. Specifically, if there is evidence of both parallel and recent selection-driven increases in the frequencies of standing genetic variants disproportionately located nearby immune system, growth factor pathway, and morphological development genes across multiple fire ant-invaded sites (and not observed at sites without fire ants), then this result would more strongly support the adaptive response to fire ant invasion hypothesis. If the selection events were instead not recent, then that result would favor a non-fire ant environmental explanation. If the signatures of selection are instead observed only in the Alabama population (and not at other fire ant-exposed sites), then local environmental factors may explain the results.

The sample size for our genotype-phenotype association analysis (n = 381) is also relatively small for a GWAS; we were thus underpowered to identify smaller-effect variants. A marked sample size increase would benefit this component of the study. Other phenotypes exhibited predominantly by exposed lizards and relevant to fire ant defense, such as the propensity to flee and body twitch rather than freeze17, also warrant genome-wide association studies.

Summary

In conclusion, our study identifies genetic signatures of positive selection in fence lizards exposed to human-introduced fire ants. These plausible adaptations to the fire ant introduction are observed in conjunction with a recent and large-scale population expansion in fence lizards, inferred from our population genomic data. Together, our findings highlight the potential of standing genetic variation in promoting population resilience in the face of anthropogenic disturbance.

Methods

Animal sampling

Permits for animal collections were granted by the Alabama Department of Conservation and Natural Resources, the Arkansas Game & Fish Commission, and the Tennessee Department of Environment and Conservation, for each collection year. Between 2006 and 2017, adult male and female lizards were collected from the three study populations using the lasso method. The AL population was included in the study due to its long history with introduced fire ants. The individuals from the TN and AR populations were included because these sampling locations had not yet been invaded by fire ants at the time of collection. Phenotypic data were collected from the n = 381 sexually mature lizards in our AL population sample: hind limb length (HLL) and snout-to-vent length (SVL) were each measured to the nearest 0.5 mm using a ruler following a protocol described in ref. 17.

DNA extraction

All fence lizard toe and/or tail tissue samples were stored in 70% ethanol at 4°C. We used up to 30 mg of each of the 421 preserved fence lizard tissue samples for DNA extraction with the E.Z.N.A.® tissue kit (D3396, Omega Bio-Tek, Inc., Norcross, GA, USA). DNA extractions were performed following the manufacturer’s instructions with the following exceptions: each tissue sample was ground with a polypropylene pestle in a 1.5-mL microcentrifuge tube; total digestion time was increased to 14–15 h in a 600 rpm shaking thermomixer; 1 μL of Pellet Paint® NF Co-Precipitant was added to each sample to increase DNA adherence in the HiBind® DNA Mini Column; and the total elution volume was halved (two 50-μL portions). The DNA concentration of each extract was measured with a Qubit® 3.0 Fluorometer dsDNA High Sensitivity Assay Kit, and the extracts were then stored at −20°C until library preparation.

Library preparation and sequencing

Portions of each DNA extract were sheared to a target length of 500 bp with a Covaris M220 Focused-ultrasonicator (Peak Incident Power: 50, Duty Factor: 20%, Cycles per Burst: 200, Temperature: 20 °C). Libraries for each sample were prepared from ≥ 200 ng of sheared DNA with the TruSeq® Nano DNA High Throughput Library Prep Kit (20015965, Illumina Inc., San Diego, CA, USA) and IDT for Illumina – TruSeq® DNA UD Indexes v1 (Illumina Inc., San Diego, CA, USA). The libraries were pooled and sequenced with a paired-end 150 bp strategy on two Illumina NovaSeq 6000 S4 flowcells for 1.3 T of paired-end raw read data each. One pool (MC-60) contained 20 randomly selected AL individuals as well as the 20 lizards from the uninvaded sites TN and AR. An average of 165.24 million reads were generated for each sample in pool MC-60. The second pool (LC-381) contained the full set of 381 AL individuals, with an average of 27.88 million reads sequenced. Raw sequence data have been deposited in NCBI SRA BioProject: PRJNA656311.

Read mapping and quality filtering

For read mapping, we used a chromosome-level reference genome assembly, “PBJelly,” that was recently developed from two male S. undulatus individuals collected at Solon Dixon Forestry Education Center, AL29,70. The annotated reference assembly was indexed with bwa v0.7.16 index and SAMtools v1.5 faidx71,72, and a sequence dictionary was created with Picard CreateSequenceDictionary (Picard Toolkit 2019) for use in read mapping, SNP identification, and downstream analyses.

The LC-381 group reads were sequenced without lanes in their NovaSeq S4 flowcell, but the MC-60 group reads were sequenced across four lanes and needed to be combined into one forward and one reverse read file per sample prior to trimming and mapping to the reference genome. The raw reads were trimmed with Trimmomatic v0.36 to remove the Illumina TruSeq3-PE-2 adapters, reads < 36 bases long, and leading and trailing low-quality or N bases73. The trimmed reads were aligned to the reference genome with bwa v0.7.16 mem (default settings), an alignment tool specialized for large genome sizes that seeds alignments with maximal exact matches and extends seeds with the affine-gap Smith-Waterman algorithm to accommodate insertions or deletions74.

SAMtools v1.5 flagstat was used to calculate estimated genome-wide coverage for the mapped reads, and “view” was used to convert the mapped .sam files to .bam files72. SAMtools and BAMtools v2.4.1 were used72,75 to sort the reads and to filter out unmapped reads and mapped reads with mapQuality less than 50. The subset of AL (Solon Dixon) samples subject to both moderate- and low-coverage sequencing (n = 20) were merged after mapping (using samtools merge) and were processed as part of the LC-381 pool. Duplicates were marked in all samples (AL, TN, AR) using Picard MarkDuplicates (Picard Toolkit 2019). Read groups were added to the mapped read files with Picard AddOrReplaceReadGroups (Picard Toolkit 2019), then the reads were sorted and indexed with SAMtools v1.572. Sequence metrics were collected using Picard CollectWgsMetrics (Picard Toolkit 2019). For the remaining analyses, we found that several of the more computationally intensive programs had to be run iteratively at the chromosome level rather than the whole-genome level in order to finish processing within the limits imposed by our computational cluster system; we indicate such cases accordingly.

SNP identification

We followed the Genome Analysis Toolkit (GATK, v4.2.0.0) “Best Practices” pipeline for germline short variant discovery in each of the sequencing pools76,77,78. Even though GATK’s pipeline was designed and optimized for analyzing human genetic data, it has been successfully applied in multiple non-model systems with available high-quality reference genomes for evolutionary genomic inferences79,80,81,82,83 and it has been reported to outperform other variant callers in capability and accuracy84,85.

GATK’s pipeline began with HaplotypeCaller calling germline SNPs and indels for each individual via local de novo assembly. In short, HaplotypeCaller defined active regions based on the presence of evidence for allele variation in each individual’s mapped reads, then built a De Bruijn-like graph to detect overlaps between sequences and reassemble the active region86. The possible active regions were realigned against the reference haplotype with the Smith-Waterman algorithm to identify potential variant sites, i.e., single nucleotide polymorphisms (SNPs)86. Likelihoods of alleles were determined using GATK’s PairHMM algorithm, and the most likely genotype per Bayes’ rule was assigned to each potentially variant site.

HaplotypeCaller generated an intermediate GVCF file that contained likelihood data for every position in each of the 24 largest chromosomes in every individual’s mapped read data. The per-chromosome GVCFs were merged and then indexed with GATK’s IndexFeatureFile program. Following the “Best Practices” pipeline, GenomicsDBImport was used to import the single-sample GVCFs into a per-chromosome database (GenomicsDB) before joint genotyping with GenotypeGVCFs (per-chromosome, with each of the largest chromosomes, scaffolds 1–6, subset into six intervals). The resulting chromosome VCFs were then combined with VCFtools v0.1.12 “vcf-concat”87 into one file for each sequencing pool. There were 59,006,281 possible SNP sites identified in the MC-60 group and 67,124,902 SNP sites in the LC-381 group (bcftools stats; bcftools v1.12)87.

SNP filtering and quality control

The raw SNPs were filtered with a series of thresholds recommended by GATK77. GATK SelectVariants kept only variants that were classified as SNPs, then VariantFiltration removed SNPs with hard-filters based on the INFO and FORMAT fields of the VCF files: quality score by depth (QD) < 2.0, Phred-scaled p-value using Fisher’s exact test (FS) > 60.0, and mapping quality score (MQ) < 40.0. SelectVariants was applied again to only keep SNPs that were not filtered out by VariantFiltration. After GATK filtering there remained 56,598,888 SNPs in the MC-60 group and 64,214,575 SNPs in the LC-381 group.
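
The hard-filter thresholds listed above amount to a simple per-site rule. The following is a minimal Python sketch of that rule, not the GATK implementation: the function name `passes_hard_filters` and the dictionary representation of the INFO annotations are hypothetical, and every site is assumed to carry all three annotations.

```python
# Illustrative sketch only: mirrors the QD / FS / MQ thresholds stated in the text.
# `passes_hard_filters` and the dict layout are hypothetical, not part of GATK.

def passes_hard_filters(info):
    """Return True if a SNP survives the three hard filters.

    `info` is assumed to be a dict of INFO annotations for one site,
    e.g. {"QD": 12.3, "FS": 1.2, "MQ": 60.0}.
    """
    fails = (
        info["QD"] < 2.0       # low variant quality relative to depth
        or info["FS"] > 60.0   # strong strand bias (Phred-scaled Fisher's exact test)
        or info["MQ"] < 40.0   # low mapping quality
    )
    return not fails


# Toy records (made-up values) to show the rule in use.
sites = [
    {"QD": 12.0, "FS": 1.0, "MQ": 60.0},   # kept
    {"QD": 1.5, "FS": 1.0, "MQ": 60.0},    # removed: QD < 2.0
    {"QD": 12.0, "FS": 75.0, "MQ": 60.0},  # removed: FS > 60.0
]
kept = [s for s in sites if passes_hard_filters(s)]
```

A site is removed if it fails any one of the three criteria, so the criteria combine with a logical OR.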

The SNPs that remained in each pool after applying GATK’s suggested parameters were additionally filtered with VCFtools v0.1.12 for analysis87. Both pools were filtered to keep only biallelic sites (min-alleles 2, max-alleles 2) and to remove sites with insertions and deletions87. The MC-60 pool was also filtered for Hardy-Weinberg Equilibrium with a threshold low enough to remove sites that were likely to be erroneous variant calls (hwe 0.000001), leaving 49,837,059 SNPs for analysis87. PLINK (v1.9) was then used to filter the MC-60 VCF with the --geno 0.05 and --mind 0.1 flags, which remove variants and individuals with high missing call rates, prior to downstream population analyses, with 46,934,026 SNPs remaining88. The LC-381 group was also filtered for Hardy-Weinberg Equilibrium (hwe 0.001) and to remove sites with a minor allele frequency (MAF; the number of times the minor allele appears over all individuals at that site divided by the total number of non-missing alleles at that site) less than 0.05, to prevent inflation in downstream statistical estimates and during imputation, leaving 458,533 SNPs87. For access to genomic data used in all subsequent analyses, see Data availability.
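
To make the MAF definition concrete, a minimal Python sketch follows. It assumes diploid, biallelic genotypes coded as alternate-allele counts (0, 1, or 2) with `None` for missing calls; the function name `minor_allele_frequency` is hypothetical and this is not the VCFtools implementation.

```python
# Illustrative sketch of the MAF definition given in the text.
# Assumes diploid biallelic genotypes coded 0/1/2 (alternate-allele count),
# with None marking a missing call.

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one site, or None if no calls exist."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called diploid individual contributes two non-missing alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


site = [0, 0, 0, 1, None]            # one heterozygote among four called individuals
maf = minor_allele_frequency(site)   # 1 alternate allele out of 8 called alleles
keep_site = maf is not None and maf >= 0.05
```

Under this coding, the 0.05 threshold retains a site only when the rarer allele accounts for at least 5% of the called alleles.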

Genotype imputation

To improve genotyping rates for the low coverage dataset of 381 individuals, we leveraged the 20 moderate coverage sequences from AL as a reference panel for genotype imputation. To that end, we first used Shape-IT89 (version 2.r837) to phase each of the 11 chromosomes of the 20 moderate coverage sequences and obtain haplotype files. Imputation was performed with Beagle44 (version 5.2) using a window size of 100 and overlap of 10. Because the 20 reference individuals were also sequenced in our low coverage data generation step, this provided an opportunity to assess imputation accuracy. Specifically, we removed individual i from the moderate coverage reference panel (20 – 1 = 19) and then imputed chromosome 6 of individual i based only on that individual’s low coverage data. This process was repeated in turn for all 20 individuals, yielding a mean rate of allele concordance between moderate coverage genotyping and low coverage imputation of 91.3% (Supplementary Fig. 12). We also imputed all chromosomes (1–11) for one individual using this approach, resulting in a genome-wide mean concordance rate of 91.7% (Supplementary Fig. 13). These rates exceed those reported in multiple other studies of non-model organisms and in simulations across various methods44,90,91,92. Because these tests were based on a smaller reference panel than the one used in our actual analysis (i.e., 19 individuals rather than 20), these accuracy estimates are conservative. Prior to the genome-wide genotype-phenotype association analysis that was conducted with the n = 381 AL lizard dataset, we removed genotypes with minor allele frequencies < 0.05, leaving 4,245,544 SNPs.
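
The leave-one-out accuracy check described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually run: genotypes are coded as alternate-allele counts (0, 1, 2), allele concordance is taken to be the fraction of the two alleles per site that agree (the exact definition used for the reported rates is not given here), and `impute` is a hypothetical stand-in for an external imputation step such as a Beagle run.

```python
# Illustrative leave-one-out concordance sketch. `impute` is a hypothetical
# callable standing in for the external imputation software; the concordance
# definition (2 - |difference| matching alleles per site) is an assumption.

def allele_concordance(truth, imputed):
    """Fraction of alleles that agree between two genotype vectors (0/1/2 coding)."""
    matched = 0
    total = 0
    for t, g in zip(truth, imputed):
        if t is None or g is None:
            continue  # skip sites without a call in either dataset
        matched += 2 - abs(t - g)
        total += 2
    return matched / total if total else float("nan")


def leave_one_out_concordance(panel, low_cov, impute):
    """Mean concordance when each panel member is imputed without itself in the panel.

    panel:   dict sample -> moderate coverage genotypes (treated as truth)
    low_cov: dict sample -> low coverage genotypes (None = missing)
    impute:  callable(low_cov_genotypes, reference_panel) -> imputed genotypes
    """
    rates = []
    for sample in panel:
        reference = {k: v for k, v in panel.items() if k != sample}
        imputed = impute(low_cov[sample], reference)
        rates.append(allele_concordance(panel[sample], imputed))
    return sum(rates) / len(rates)
```

Because each target individual is excluded from its own reference panel, the resulting estimate is based on one fewer reference individual than the full panel.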

Demographic analyses

We first calculated the mean genome-wide FST between each of the three pairs of populations using the VCFtools function --weir-fst-pop87. For this analysis, we filtered the SNP dataset to remove any SNPs whereby an allele that is fixed in one population is also the minor allele in the second population (for AL/TN, SNP count n = 16,075,146; for AL/AR, n = 33,225,824; for TN/AR, n = 16,383,584). We then ran three demographic analyses involving the three populations: a principal components analysis (PCA), an admixture analysis, and a neighbor-joining tree based on a genetic distance matrix. For these, the original set of 46,934,026 SNPs was filtered to remove any SNPs whereby an allele that is fixed in two populations is also the minor allele in the third population, leaving 11,437,455 SNPs. The PCA was calculated using PLINK’s --pca function88. Population structure was inferred using ADMIXTURE32 for a number of ancestral populations K of 2 through 5. The neighbor-joining tree was built using the R package phangorn93 with a sample genetic distance matrix generated with PLINK88. Subsequent within-population analyses below were conducted on the original set of 46,934,026 SNPs.

Genome-wide estimates of Tajima’s D

For each population, Tajima’s D was estimated in windows across the genome using vcf-kit94 (v0.2.6). Windows were 100 Kb in length with a 20 Kb step. For each population, putative windows under recent positive selection were those in the bottom 0.5th percentile of the genome-wide distribution (i.e., D < −1.81 for AL; D < −2.11 for AR; and D < −1.9 for TN).
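
The windowing and outlier scheme above can be illustrated with a short Python sketch. It does not compute Tajima’s D itself; it only shows how 100 Kb windows with a 20 Kb step tile a chromosome and how a lower-tail cutoff can be applied to per-window values. The function names are hypothetical, incomplete windows at the chromosome end are assumed to be dropped, and the rank-based cutoff is one of several reasonable percentile conventions.

```python
import math

# Illustrative sketch: sliding-window coordinates and a lower-tail cutoff.
# Assumptions: partial windows at the chromosome end are dropped, and the
# cutoff is the value at rank floor(n * fraction) in ascending order.

def sliding_windows(chrom_length, size=100_000, step=20_000):
    """Yield (start, end) coordinates of full-length windows along a chromosome."""
    start = 0
    while start + size <= chrom_length:
        yield (start, start + size)
        start += step


def lower_tail_windows(window_values, fraction=0.005):
    """Return (cutoff, indices) for windows in the lowest `fraction` of values."""
    ranked = sorted(window_values)
    n_tail = max(1, math.floor(len(ranked) * fraction))
    cutoff = ranked[n_tail - 1]
    indices = [i for i, v in enumerate(window_values) if v <= cutoff]
    return cutoff, indices


windows = list(sliding_windows(200_000))  # six overlapping 100 Kb windows
```

The same ranking logic, applied to the upper tail instead, would select the largest values of a statistic.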

Genome-wide haplotype distributions via saltilassi statistic

The filtered VCF files were assessed for signatures of positive selection with lassip v1.1.1 using the “saltiLassi” method34. The VCF files for each population were further separated into a single VCF file for each scaffold. A population ID file was created containing each individual sample ID and the corresponding population IDs. Each scaffold was passed to lassip with the following parameters to calculate haplotype statistics and the haplotype frequency spectrum (HFS): --hapstats --winsize 201 --k 20 --calc-spec --winstep 100. The genome-wide average of the HFS, which functions as the null spectrum, was then determined before calculating the saltilassi statistic, a likelihood ratio statistic denoted as Λ, for each chromosome. For each population, haplotypes in the top 0.5% of the genome-wide Λ distribution (i.e., Λ > 92.24 for AL; Λ > 407.56 for AR; and Λ > 252.27 for TN) were designated as candidate signatures of positive selection.

Locus-specific branch lengths (LSBL)

We used LSBL to identify genomic regions in the AL population that are significantly differentiated from the two northern populations naïve to fire ants. We first used VCFtools to calculate the per-SNP FST of the three pairwise population comparisons, and then used those values to calculate LSBL for AL following ref. 35: LSBL = (AL-AR FST + AL-TN FST − TN-AR FST) / 2. We identified SNPs in the top 0.1% of the genome-wide distribution (LSBL > 0.76) and grouped all outlying SNPs that were within 50 Kb of and in high linkage disequilibrium (LD) with each other (R2 ≥ 0.9) into a single region. LD was calculated using VCFtools (--geno-r2)87, and the R package ivs95 was used to combine the linked SNPs into candidate genomic regions.
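
The LSBL calculation and the grouping of outlying SNPs can be sketched in Python as below. This is a simplified illustration, not the analysis code: function names are hypothetical, the grouping step uses only the 50 Kb distance criterion (the LD criterion is omitted), and chaining by the gap between consecutive outliers is one possible reading of “within 50 Kb”.

```python
# Illustrative sketch of LSBL for the AL branch and of distance-based grouping.
# Assumption: the LD (R2) criterion described in the text is NOT applied here.

def lsbl_al(fst_al_ar, fst_al_tn, fst_tn_ar):
    """Locus-specific branch length for AL from three pairwise per-SNP FST values."""
    return (fst_al_ar + fst_al_tn - fst_tn_ar) / 2


def merge_outliers(positions, max_gap=50_000):
    """Chain outlier SNP positions on one chromosome into (start, end) regions."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # open a new region
    return [tuple(r) for r in regions]


# Toy example with made-up FST values and positions.
value = lsbl_al(0.9, 0.8, 0.1)            # approximately 0.8, above a 0.76 cutoff
regions = merge_outliers([100, 30_000, 200_000, 240_000, 400_000])
```

A SNP scores a high LSBL for AL when AL differs strongly from both AR and TN while AR and TN remain similar to each other.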

Functional profiling

For each population, we investigated functional enrichments in the sets of genes located within or nearby (± 25 Kb) candidate regions for positive selection identified by any two of the three selection statistics (Tajima’s D, saltilassi, or LSBL), and separately for Tajima’s D only and saltilassi only (Supplementary Data 5–12). To that end, we used the g:GOSt function of the g:Profiler platform36 based on functional annotations for Mus musculus and across all available databases (i.e., GO: molecular function, GO: biological process, GO: cellular component, KEGG, Reactome, WikiPathways, TRANSFAC, miRTarBase, Human Protein Atlas, CORUM, HP). The background gene set comprised all genes in the SceUnd 1.0 assembly29. We corrected for multiple tests using the False Discovery Rate (FDR)96.
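
For readers unfamiliar with FDR correction, the following Python sketch shows the Benjamini-Hochberg adjustment of a list of p-values. It is offered as an assumption about what “FDR” denotes here, since the text does not name the specific procedure, and it is not the g:Profiler implementation; the function name is hypothetical.

```python
# Illustrative Benjamini-Hochberg adjustment (assumed to be the FDR procedure;
# the text does not specify which FDR method was applied).

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up) for four enrichment terms.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```

Each adjusted value is the smallest FDR level at which the corresponding term would be declared significant.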

Genome-wide association study

Relative limb length was calculated by extracting the residuals of a linear regression of hind limb length on snout-to-vent length. The residuals were then assigned as the response variable for all models. We used the imputed genotype dataset (see SNP filtering and quality control above) for the GWAS. Each SNP was numerically coded for each biallelic genotype (0, 1, or 2) and included as a predictor in its respective model. Covariates were individual sex, to control for sex-specific morphological differences, along with the eigenvectors of a population principal components analysis of the unimputed low coverage genomic dataset, to help control for population structure (Supplementary Fig. 14). Principal components 1 through 4 were included in each model; together they explained 23.24% of the variation. Results for each of the 4,245,544 models are available in the Source Data.
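
The phenotype used in the association models, relative limb length, can be sketched in Python as the residuals of a simple least-squares regression. The function name `ols_residuals` and the toy measurements are hypothetical; the per-SNP models with sex and principal component covariates are not reproduced here.

```python
# Illustrative sketch: residuals of hind limb length (HLL) regressed on
# snout-to-vent length (SVL), using ordinary least squares with one predictor.

def ols_residuals(x, y):
    """Return residuals of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy measurements in mm (made-up values, not study data).
svl = [50.0, 60.0, 70.0, 80.0]
hll = [30.0, 36.5, 41.0, 47.5]
relative_limb_length = ols_residuals(svl, hll)
```

A positive residual indicates hind limbs longer than expected for a lizard of that body size, and a negative residual indicates shorter limbs.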

Because genotype imputation of the low coverage sequences introduced a small rate of genotyping error (see Genotype imputation), we assessed the potential impact of this error rate on the results reported for the GWAS. Following a series of simulations, we did not observe any false-positive result at the same level of significance as that reported for locus Chr2: 129,424,718 (Fig. 3), although false-negative results may have occurred. More details can be found in the Supplementary Information (Supplementary Figs. 15–18).

Permutation analysis

To test whether candidate regions for positive selection and genomic regions associated with limb length overlapped each other more often than expected by chance, we performed permutation analyses with the R package regioneR97. For each pairwise permutation (e.g., pairwise comparisons between GWAS and Tajima’s D, saltilassi, LSBL, and the set of regions highlighted by at least two out of the three statistics), the set of regions for one statistic was held static while regions of the same length as those from the second statistic were randomly placed across the genome in each of 10,000 permutations. To constrain the available genomic space wherein regions were to be permuted, the number and length of chromosomes was specified using the argument genome in the permTest function. These random permutations generated a null distribution, and we assessed whether our observed number of overlaps significantly deviated from this distribution (α = 0.05).
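
The logic of this overlap permutation test can be sketched in Python as follows. This is a simplified illustration and not the regioneR implementation: all function names are hypothetical, regions are (chromosome, start, end) tuples, randomized regions are placed on a chromosome chosen in proportion to its length, and the one-sided empirical p-value with a +1 correction is a choice made for this sketch.

```python
import random

# Illustrative overlap permutation test (not the regioneR code).
# Regions are (chrom, start, end) tuples; chrom_lengths maps chrom -> length.

def count_overlaps(query, fixed):
    """Number of query regions overlapping at least one fixed region."""
    return sum(
        any(qc == fc and qs <= fe and fs <= qe for fc, fs, fe in fixed)
        for qc, qs, qe in query
    )


def randomize(regions, chrom_lengths, rng):
    """Place each region, keeping its length, at a random genomic position."""
    placed = []
    for _, start, end in regions:
        length = end - start
        # Only chromosomes long enough to hold the region are eligible.
        fits = [c for c in chrom_lengths if chrom_lengths[c] >= length]
        chrom = rng.choices(fits, weights=[chrom_lengths[c] for c in fits])[0]
        new_start = rng.randint(0, chrom_lengths[chrom] - length)
        placed.append((chrom, new_start, new_start + length))
    return placed


def overlap_perm_test(fixed, moving, chrom_lengths, n_perm=10_000, seed=0):
    """Return (observed overlaps, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(moving, fixed)
    as_extreme = sum(
        count_overlaps(randomize(moving, chrom_lengths, rng), fixed) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)
```

Holding one region set fixed and re-placing the other preserves the number and lengths of the permuted regions, so only their genomic positions vary under the null.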

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.