Background & Summary

Orestias ascotanensis is a killifish endemic to the springs of the Ascotán Salt Pan on the Chilean Andean Plateau, at an altitude of over 3,600 meters above sea level (m.a.s.l.). This environment poses challenging conditions for animal life, including extreme diurnal temperature fluctuations, low humidity, reduced oxygen partial pressure, and increased exposure to UV radiation1. Despite these challenges, various organisms, including snails2,3, frogs4, fishes5, birds6, and camelids7, thrive in this harsh environment. While genomic and transcriptomic technologies have shed light on molecular adaptations to high-altitude living, primarily through studies on the Tibetan Plateau8,9,10,11,12,13,14,15,16, the adaptations of Andean species remain less explored. The genus Orestias, which comprises species from various aquatic systems across the Andean basins of Peru, Bolivia, and Chile, shows remarkable adaptability to both saltwater and freshwater environments, coping with low oxygen concentrations, high UV radiation, and variable heavy metal and salt concentrations associated with South American summer monsoons17. Embryonic analysis of O. ascotanensis revealed early pigmentation development and increased blood cell production, suggesting adaptations to UV radiation and low oxygen pressure18. The recent sequencing of the O. ascotanensis genome highlighted genes related to DNA repair that are under positive selection, pointing to a mechanism for maintaining genome integrity under high UV exposure19. These features make O. ascotanensis a valuable model for understanding fish adaptations to high-altitude environments.

Fish biology is closely tied to seasonal change, which strongly influences reproductive strategies by aligning offspring production with optimal environmental conditions. Seasonality also shapes many other aspects of fish life, including reproductive timing, body mass, locomotion, and immune system performance20,21,22. In temperate aquatic ecosystems, these fluctuations are driven mainly by changes in temperature and photoperiod, which are closely interconnected and cyclical: in spring, water temperature and day length increase, whereas the reverse occurs in autumn. Fish adapt to these environmental shifts through compensatory responses at the cellular and molecular levels. These responses involve changes in the proteome23 and the transcriptome, including altered expression of both protein-coding mRNAs and non-coding RNAs (ncRNAs)24, as well as changes in DNA methylation patterns25 and histone modifications26. Recent studies have also identified roles for long non-coding RNAs (lncRNAs) in modulating seasonal behavioral patterns, including migration and spawning27.

By definition, lncRNAs are RNA molecules longer than 200 nucleotides that are not translated into proteins. Their role in regulating gene expression, and thereby a variety of biological processes, has attracted considerable attention28. LncRNAs act through diverse mechanisms: they can function as guides that direct chromatin-modifying enzymes to target genes, as scaffolds that assemble ribonucleoprotein complexes, or as decoys that bind and sequester transcription factors away from their target genes29. LncRNAs can also regulate microRNA (miRNA) activity by acting as competing endogenous RNAs (ceRNAs): by harboring miRNA-binding sites, they sequester miRNAs and prevent them from repressing their target protein-coding genes30. Although next-generation sequencing has identified thousands of lncRNAs across a wide range of organisms, including mammals, zebrafish, insects, and nematodes, the cataloging of lncRNAs in non-model fish species remains limited. Current lncRNA studies in fish focus on model organisms and commercially important species, leaving the lncRNAs of other fish species largely unexplored.

A growing body of evidence describes the transcriptomic responses of fish to changes in temperature, photoperiod, and other abiotic factors under controlled laboratory conditions31. However, field-based studies of these dynamics in natural settings are rare. Our study addresses this gap by using the distinct seasonal variation of the Ascotán Salt Pan as a “natural laboratory” to examine the molecular responses of Orestias ascotanensis. We exploited the pronounced temperature and photoperiod contrasts between summer (January) and winter (July) in the Ascotán Salt Pan to profile both mRNA and lncRNA transcriptomes, with the aim of delineating gene expression changes associated with seasonal acclimatization across tissues. We selected gills, muscle, and skin for analysis, given their documented sensitivity to environmental shifts32,33,34. This choice is further justified by the tissue-specific expression and lower sequence conservation of lncRNAs compared with protein-coding mRNAs35, which suggest that examining multiple tissues can reveal both common and tissue-specific transcriptomic responses. Our multi-tissue approach thus provides a detailed view of transcriptional adaptation to environmental change in this Andean killifish, covering both protein-coding and long non-coding RNAs. In this work, we generated and released a total of ~4.38 billion raw reads from 42 transcriptome libraries of gill, skin, and muscle tissues of Orestias ascotanensis. These libraries, representing 14 individuals sampled in summer and winter, yielded high-quality data with a median of 105 million raw reads and 92.85 million mapped reads per library, and an overall mapping efficiency of 88.45%. Given the unique ecological and physiological adaptations of O. ascotanensis to its high-altitude environment, this transcriptome is a valuable resource for understanding tissue-specific and seasonal expression patterns and for studying adaptation, and potentially speciation, of endemic fish on the Andean Plateau.

Methods

Ethical statements and sample collection

All procedures for the collection and handling of adult Orestias ascotanensis specimens were conducted in strict accordance with the ethical guidelines of Universidad Andrés Bello, under official permissions issued by this institution (‘Resolución n° 2267’) and by the ‘Subsecretaría de Pesca y Acuicultura’, granted to Dr. Rodrigo Maldonado-Agurto. Specimens were collected with a fishing net from the Ascotán Salt Pan (21°29′46.5″S 68°15′26.3″W) at an elevation of 3,740 meters above sea level in July 2019 (winter, n = 7) and January 2020 (summer, n = 7). Euthanasia was performed by an overdose of tricaine methanesulphonate (MS-222, 500 mg L−1; Sigma-Aldrich). Gill, skin, and muscle samples were harvested on-site, and each sample was preserved individually in RNAlater (Invitrogen) immediately after dissection to stabilize the RNA. The samples were transported on ice to the laboratory and stored at −80 °C until RNA extraction.

RNA extraction and sequencing

Total RNA was extracted with TRIzol (Invitrogen), following the manufacturer’s protocol, from gill, skin, and muscle samples of each of the 14 individuals captured in summer (n = 7) or winter (n = 7). RNA concentration was measured with a Qubit fluorometer (Invitrogen), and RNA quality and integrity were assessed with a Bioanalyzer 2100 (Agilent). Unstranded poly-A libraries were prepared from each tissue sample using the Illumina TruSeq RNA sample preparation kit (Illumina) according to the manufacturer’s instructions. The prepared samples were sent to Quick Biology (Pasadena, CA, USA) for sequencing on the Illumina HiSeq 4000 platform, generating 150 bp paired-end reads. In total, the 42 RNA-seq libraries (14 individuals × 3 tissues) yielded approximately 4.4 billion raw reads, with a median of 105 million reads per library, providing robust coverage for transcriptome profiling across tissues and seasons.

Data processing

The transcriptomic data of Orestias ascotanensis were processed to measure protein-coding gene expression and to identify long non-coding RNAs (lncRNAs). Raw sequencing reads were quality-checked with FastQC36 and trimmed with TrimGalore (v0.6.6; parameters: -q 30 --retain_unpaired --phred33 --illumina)37 to remove low-quality bases and adapter sequences. High-quality reads were aligned to the O. ascotanensis reference genome using HISAT238 (v2.2.1; parameters: --dta-cufflinks), and coordinate-sorted BAM files were generated using samtools (v1.10)39. Alignment performance was evaluated per sample from mapping statistics, including the number of raw reads, mapped reads, mapping percentages, and percentages of properly paired reads40.
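As an illustration only, the per-library steps above can be written as a small command builder. This is a sketch, not the authors' pipeline: the sample names, paths, index prefix, thread count and the trimmed-read file names are hypothetical placeholders, `--paired` is added on the assumption that the paired-end libraries were trimmed in TrimGalore's paired mode, and only the TrimGalore and HISAT2 options quoted in the text come from the Methods.

```python
from pathlib import Path


def build_commands(sample: str, read1: str, read2: str, index_prefix: str,
                   outdir: str = "processed", threads: int = 8) -> list[list[str]]:
    """Return trimming, alignment and sorting commands for one library.

    Hypothetical placeholders: sample names, paths, index prefix, thread
    count and trimmed-read names. Options quoted in the Methods are kept.
    """
    out = Path(outdir)
    # 1) Quality/adapter trimming with the reported TrimGalore options
    #    (--paired assumed for paired-end input).
    trim = ["trim_galore", "--paired", "-q", "30", "--retain_unpaired",
            "--phred33", "--illumina", "-o", str(out), read1, read2]
    # Placeholder names: adjust to the files TrimGalore actually writes.
    trimmed1 = str(out / f"{sample}_trimmed_R1.fq.gz")
    trimmed2 = str(out / f"{sample}_trimmed_R2.fq.gz")
    # 2) Spliced alignment; --dta-cufflinks tailors alignments for Cufflinks.
    sam = str(out / f"{sample}.sam")
    align = ["hisat2", "-p", str(threads), "--dta-cufflinks", "-x", index_prefix,
             "-1", trimmed1, "-2", trimmed2, "-S", sam]
    # 3) Coordinate-sorted BAM for counting and transcript assembly.
    bam = str(out / f"{sample}.sorted.bam")
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]
    return [trim, align, sort]


if __name__ == "__main__":
    for cmd in build_commands("gill_S1", "gill_S1_R1.fastq.gz",
                              "gill_S1_R2.fastq.gz", "index/o_ascotanensis"):
        print(" ".join(cmd))  # replace with subprocess.run(cmd, check=True) to execute
```

Building argument lists rather than shell strings keeps file names safe to pass to `subprocess.run` without quoting issues.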

Principal component analysis

To evaluate sample similarity and clustering based on protein-coding gene expression, we performed principal component analysis (PCA) after quantifying transcript abundance. Gene-level count matrices were generated using featureCounts41, and raw counts were normalized with DESeq242 to correct for differences in library size (sequencing depth). PCA of the normalized expression data revealed distinct clustering of samples by tissue type (gill, skin, and muscle) (Fig. 1a), underscoring the robustness of the tissue-specific transcriptomic profiles. This analysis also confirmed the consistency of biological replicates within each tissue group, supporting the reliability of subsequent differential expression analyses. Separate PCAs were performed within each tissue to examine seasonal variation (Fig. 1b–d). Muscle showed the clearest seasonal separation, whereas summer and winter samples of gill and skin largely overlapped.
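To make the normalization-then-PCA idea concrete, here is a minimal NumPy sketch, not the authors' code. It computes median-of-ratios size factors (the normalization DESeq2 uses), applies log2(x + 1) as a simplified stand-in for a variance-stabilizing transform (the exact transform used for Fig. 1 is an assumption here), keeps the 500 most variable genes (the default `ntop` of DESeq2's `plotPCA`), and projects samples onto principal components with an SVD. The function names are hypothetical.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors (the DESeq2 approach); counts is genes x samples."""
    # Only genes with non-zero counts in every sample have a finite log geometric mean.
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Each sample's factor is the median ratio of its counts to the per-gene geometric mean.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def pca_scores(counts: np.ndarray, n_components: int = 2, ntop: int = 500):
    """Return sample scores and explained-variance fractions for the first PCs."""
    normalized = counts / size_factors(counts)   # divide each sample column by its factor
    logged = np.log2(normalized + 1.0)           # simple stand-in for a VST
    # Restrict to the most variable genes, mirroring plotPCA's default ntop = 500.
    top = np.argsort(logged.var(axis=1))[::-1][:ntop]
    x = logged[top].T                            # samples x genes
    x = x - x.mean(axis=0)                       # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by tissue or season, gives panels analogous to Fig. 1; the sign of each PC is arbitrary.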

Fig. 1

Principal component analysis (PCA) of transcriptome profiles in Orestias ascotanensis. (a) PCA of all samples, colored by tissue type (gills: red, muscle: blue, skin: green). (b) Seasonal PCA for gills (summer: light red, winter: dark red). (c) Seasonal PCA for muscle (summer: light blue, winter: dark blue). (d) Seasonal PCA for skin, showing seasonal overlap (summer: light green, winter: dark green).

Tissue-specific gene expression analysis

Coding and non-coding transcript abundance was quantified using featureCounts41. Differential gene expression between tissues and seasons was estimated with DESeq2 as implemented in the SARTools package43. To visualize tissue-specific protein-coding gene expression, we constructed an UpSet plot for all mRNAs with more than 50 counts in at least one tissue (Fig. 2a). Most expressed genes (13,261) were detected in all three tissues, indicating shared core biological functions. Muscle showed the highest tissue specificity, with 763 uniquely expressed genes, whereas gills and skin had fewer uniquely expressed genes (568 and 199 mRNAs, respectively). Overlaps between pairs of tissues included 209 genes shared between gills and muscle, 733 between muscle and skin, and 821 between gills and skin. We next used differential gene expression analysis (fold change ≥ 2, adjusted p-value < 0.05) to identify genes potentially relevant to each tissue. In gills, 5,277 and 2,674 genes showed significantly higher expression than in muscle and skin, respectively, and 1,981 of these were higher in gills than in both other tissues (Fig. 2b). In muscle, 5,623 and 3,788 genes showed higher expression than in gills and skin, respectively, with 3,183 genes upregulated relative to both (Fig. 2c). Skin also showed distinct expression patterns, with 3,536 and 3,872 genes upregulated compared with gills and muscle, respectively, and 584 genes upregulated relative to both (Fig. 2d).
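The set logic behind Fig. 2 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the SARTools code: it assumes one count value per gene and tissue (for example, a mean normalized count), counts UpSet-style bars as genes present in exactly one combination of tissues, and treats "tissue-enriched" genes as the intersection of the two pairwise upregulated sets (fold change ≥ 2, adjusted p < 0.05). All function names are hypothetical.

```python
from itertools import combinations
from math import log2


def expressed_sets(counts_by_tissue: dict[str, dict[str, float]],
                   min_count: float = 50) -> dict[str, set[str]]:
    """Genes whose count in a tissue exceeds min_count (one value per gene and tissue)."""
    return {tissue: {gene for gene, count in genes.items() if count > min_count}
            for tissue, genes in counts_by_tissue.items()}


def exclusive_intersections(sets: dict[str, set[str]]) -> dict[tuple[str, ...], int]:
    """UpSet-style bars: number of genes found in exactly this combination of tissues."""
    names = sorted(sets)
    bars = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            bars[combo] = len(inside - outside)
    return bars


def upregulated(results: dict[str, tuple[float, float]],
                min_fold_change: float = 2.0, alpha: float = 0.05) -> set[str]:
    """Genes with log2FC >= log2(min_fold_change) and adjusted p < alpha.

    results maps gene -> (log2 fold change, adjusted p-value).
    """
    cutoff = log2(min_fold_change)
    return {gene for gene, (lfc, padj) in results.items()
            if lfc >= cutoff and padj < alpha}
```

A gill-enriched set would then be `upregulated(gill_vs_muscle) & upregulated(gill_vs_skin)`, which corresponds to the central region of the Venn diagram in Fig. 2b.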

Fig. 2

Tissue-specific gene expression analysis in Orestias ascotanensis. (a) UpSet plot showing the number of genes uniquely expressed in, or shared among, gills, skin, and muscle. The tallest bar represents genes expressed in all tissues (13,261). (b–d) Venn diagrams showing the number of genes upregulated in (b) gills compared with muscle and skin (1,981 genes), (c) muscle compared with gills and skin (3,183 genes), and (d) skin compared with gills and muscle (585 genes).

Gene ontology of tissue-enriched genes

To evaluate whether the gene expression patterns reflect the functional identity of each tissue, we performed gene ontology (GO) enrichment analysis with Metascape44 on genes with significantly higher expression in one tissue than in the other two. Because the tissues were manually dissected, this analysis aimed to determine whether the observed expression patterns correspond to the physiological roles of each tissue. In gills, tissue-enriched genes were associated with processes such as regulation of blood circulation and ion channel activity, consistent with the roles of gills in osmoregulation and gas exchange (Fig. 3a). In muscle, enriched terms included oxidative phosphorylation and muscle contraction, highlighting the activity of genes involved in locomotion and energy metabolism (Fig. 3b). In skin, enriched processes included melanogenesis and pigmentation, reflecting the tissue’s protective and barrier functions (Fig. 3c). These findings suggest that the gene expression profiles accurately represent the functional identity of each tissue despite their cellular heterogeneity, supporting the use of these profiles to infer tissue-specific roles in Orestias ascotanensis.
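Enrichment tools of this kind commonly score each GO term with a one-sided hypergeometric test and then correct for multiple testing; the sketch below shows that calculation in plain Python so the p-values in Fig. 3 are easier to interpret. It is a generic illustration, assumed rather than taken from Metascape's implementation, and the function names and arguments are hypothetical.

```python
from math import comb


def hypergeometric_p(overlap: int, term_size: int, list_size: int, background: int) -> float:
    """P(X >= overlap): probability of at least `overlap` term genes in a random gene list.

    background: genes in the universe; term_size: universe genes annotated to the term;
    list_size: genes in the submitted list (e.g. gill-enriched genes).
    """
    upper = min(term_size, list_size)
    hits = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(overlap, upper + 1))
    return hits / comb(background, list_size)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Exact integer arithmetic via `math.comb` avoids numerical underflow for small universes; for genome-scale backgrounds a library routine such as SciPy's hypergeometric survival function is the more practical choice.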

Fig. 3

Gene ontology (GO) enrichment of tissue-enriched genes. Heatmaps of the top 20 enriched terms among genes with enriched expression in (a) gills, (b) muscle, and (c) skin. The color gradient of the bars reflects the p-value, with darker colors representing lower p-values (higher significance). The heatmaps were produced using Metascape.

Seasonal differential gene expression

As shown by the principal component analysis, muscle exhibited the clearest separation in gene expression between summer and winter, whereas the other two tissues showed substantial overlap. Accordingly, the number of differentially expressed genes (DEGs; FC ≥ 2, adjusted p ≤ 0.05) between summer and winter was 3, 42, and 1,019 in gills, skin, and muscle, respectively. In gills and skin, all DEGs were upregulated in summer. In muscle, comparing summer (n = 6) with winter (n = 7), 281 genes (28.27%) were upregulated and 713 (71.73%) were downregulated in summer. Summer muscle sample 4 was excluded from the DEG analysis because it had markedly fewer mapped reads (20.8%) than the other samples and appeared as an outlier in the PCA (Fig. 1c).
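The sample exclusion and the up/down split above can be sketched as follows. This is an illustrative sketch, not the authors' code: the 50% mapping-rate cut-off is a hypothetical threshold (the text only reports that the excluded library mapped at 20.8%), and a positive log2 fold change is assumed to mean higher expression in summer. Function names are hypothetical.

```python
from math import log2


def passing_samples(mapping_rates: dict[str, float], min_rate: float = 50.0) -> list[str]:
    """Keep libraries whose mapping rate (%) reaches min_rate (hypothetical cut-off)."""
    return [sample for sample, rate in mapping_rates.items() if rate >= min_rate]


def split_degs(results: dict[str, tuple[float, float]],
               min_fold_change: float = 2.0, alpha: float = 0.05):
    """Count up- and downregulated DEGs and the percentage that are up.

    results maps gene -> (log2 fold change summer vs winter, adjusted p-value);
    a positive log2FC is taken to mean higher expression in summer.
    """
    cutoff = log2(min_fold_change)
    significant = {gene: lfc for gene, (lfc, padj) in results.items()
                   if padj <= alpha and abs(lfc) >= cutoff}
    up = sum(1 for lfc in significant.values() if lfc > 0)
    down = len(significant) - up
    pct_up = 100 * up / len(significant) if significant else 0.0
    return up, down, pct_up
```

For example, 281 upregulated and 713 downregulated genes give 28.27% and 71.73% of the significant set, matching the percentages reported above.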

We based subsequent GO analyses on the muscle DEGs, as these showed robust expression patterns and a clear seasonal influence. We submitted the sets of up- and downregulated O. ascotanensis genes to Metascape. Metascape allows different species to be selected for the input and for the analysis, so that genes from one organism can be assessed using the functional annotations of another species. Because Orestias is not yet supported by this platform, we selected the extensively annotated Homo sapiens for both input and analysis, which provided a more comprehensive GO enrichment and pathway analysis of muscle-related genes. GO term enrichment analysis of muscle DEGs revealed distinct biological processes associated with seasonal variation. Genes upregulated in summer (Fig. 4a) were enriched for processes related to hormonal regulation (‘regulation of hormone levels’, GO:0010817), secretion (‘regulation of secretion’, GO:0051046), and immune cell activity (‘granulocyte migration’, GO:0097530). In contrast, genes downregulated in summer (Fig. 4b) were significantly associated with cellular structural changes, including ‘epithelial cell differentiation’ (GO:0030855) and ‘actin filament-based process’ (GO:0030029), as well as key regulatory pathways such as ‘VEGFA-VEGFR2 signaling’ (WP3888). Within this broader physiological context, we also observed a complementary pattern in proliferation-related processes: genes linked to negative regulation of proliferation were upregulated (e.g., ‘negative regulation of cell population proliferation’, GO:0008285), whereas those involved in mitotic control were downregulated (e.g., ‘mitotic cell cycle’, GO:0000278).
These findings suggest that seasonal variation in gene expression may be linked to broader physiological adaptations in O. ascotanensis, influencing both tissue remodeling and systemic regulatory processes.

Fig. 4

Seasonal variation in Gene Ontology (GO) terms in O. ascotanensis muscle tissue. Top 20 enriched terms for (a) upregulated and (b) downregulated genes in summer in O. ascotanensis muscle tissue. The color gradient of the bars reflects the p-value, with darker colors representing lower p-values (higher significance). The heatmaps were produced using Metascape.

Genome-wide identification of lncRNAs in tissues of Orestias ascotanensis

Coordinate-sorted BAM files generated above were used as input for transcriptome assembly with the Cufflinks pipeline45, yielding 264,422 reconstructed transcripts from gill, muscle, and skin tissues. These transcripts passed through a series of filtering steps to exclude coding sequences and other non-lncRNA features, producing an initial set of 17,697 lncRNA candidates (Fig. 5a). Filtering began with the removal of 51 transcripts with the designation “class code”. Transcripts overlapping protein-coding sequences were then removed using IntersectBed. The remaining 105,582 sequences were screened for similarity to proteins in the SwissProt/UniProt database with BLASTx searches run locally in DIAMOND46, and all transcripts matching proteins under the specified parameters [-e 0.01 -k 5 --matrix BLOSUM62 --gapopen 11 --gapextend 1 --more-sensitive] were removed. The remaining 17,697 sequences were then assessed for protein-coding potential with four tools, CPAT47, TransDecoder48, RNAmining49, and FEELnc50, which independently predicted 17,392, 4,878, 16,829, and 7,624 lncRNAs, respectively. The intersection of these predictions produced a high-confidence set of 1,580 transcripts identified by all four tools (Fig. 5b), and 10,365 transcripts were predicted as lncRNAs by at least three tools. This latter set of putative lncRNAs was used in subsequent analyses.
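The final consensus step is a simple vote across the four coding-potential predictors. The sketch below assumes each tool's output has already been reduced to a set of transcript IDs it calls non-coding; the function name and toy IDs are hypothetical.

```python
from collections import Counter


def consensus_lncrnas(predictions: dict[str, set[str]], min_tools: int) -> set[str]:
    """Transcripts called non-coding by at least `min_tools` of the tools.

    predictions maps tool name -> set of transcript IDs predicted as lncRNA.
    """
    votes = Counter()
    for transcript_ids in predictions.values():
        votes.update(transcript_ids)  # each tool contributes one vote per transcript
    return {tid for tid, n in votes.items() if n >= min_tools}
```

With `min_tools=4` this returns the strict all-tool intersection (the 1,580 set); with `min_tools=3` it returns the larger set carried forward (the 10,365 set), which by construction contains the strict one.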

Fig. 5

Workflow for the identification and filtering of lncRNA candidates in O. ascotanensis. (a) Step-by-step process for identifying and filtering long non-coding RNA (lncRNA) candidates from the assembled transcripts of O. ascotanensis using multiple computational tools. (b) Venn diagram showing the overlap among candidate lncRNAs filtered by FEELnc, RNAmining, CPAT, and TransDecoder.

Calculation of Tau scores

The Tau (τ) index indicates how specific or broad the expression of a given gene or transcript is across the analyzed tissues51. A higher τ value implies a predominantly tissue-specific expression pattern, whereas a lower τ value indicates a gene or transcript that is expressed broadly across multiple tissues. In our study, we used the τ index to assess tissue specificity, calculated from normalized gene expression data obtained with featureCounts41. The τ index is calculated as:

$$\tau=\frac{\sum_{i=1}^{N}\left(1-\hat{x}_{i}\right)}{N-1};\qquad \hat{x}_{i}=\frac{x_{i}}{\max_{1\le j\le N}\left(x_{j}\right)}$$

Here, \(N\) is the number of tissues, \(x_{i}\) is the expression level in tissue \(i\), and \({\hat{x}}_{i}\) is that value normalized by the maximal expression across tissues. τ ranges from 0 to 1 and reflects the spectrum of expression specificity: genes with τ ≤ 0.25 are considered ubiquitously expressed, those with 0.25 < τ < 0.80 show intermediate expression patterns, and those with τ ≥ 0.80 are considered tissue-specific.
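The formula above translates directly into code. This minimal sketch computes τ for one gene from one non-negative expression value per tissue and bins it with the thresholds stated in the text; whether expression is log-transformed before computing τ is not specified here, so the input is simply assumed to be the normalized value. Function names are hypothetical.

```python
def tau(expression: list[float]) -> float:
    """Tau index for one gene; expression holds one non-negative value per tissue."""
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("tau is undefined for a gene with no expression")
    # x_hat_i = x_i / max_j x_j ; tau = sum_i (1 - x_hat_i) / (N - 1)
    return sum(1 - x / peak for x in expression) / (n - 1)


def specificity_class(t: float) -> str:
    """Bins used in the text: <= 0.25 ubiquitous, >= 0.80 tissue-specific."""
    if t <= 0.25:
        return "ubiquitous"
    if t < 0.80:
        return "intermediate"
    return "tissue-specific"
```

For three tissues, expression of (4, 2, 0) gives normalized values (1, 0.5, 0) and τ = (0 + 0.5 + 1)/2 = 0.75, an intermediate pattern; (5, 0, 0) gives τ = 1, and equal expression in all tissues gives τ = 0.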

Characterization of lncRNAs

The lncRNAs identified in Orestias ascotanensis were further characterized in terms of their expression patterns and structural features. Tissue specificity was assessed with the τ index (Fig. 6a). Among the putative lncRNAs, 21.9% showed tissue-specific expression (τ ≥ 0.80), compared with 15.9% of mRNAs, whereas 13.5% of lncRNAs were ubiquitously expressed (τ ≤ 0.25), compared with 22% of mRNAs, indicating that lncRNA expression is more tissue-restricted than mRNA expression. Structurally, lncRNAs also differed from mRNAs: most lncRNAs were monoexonic (Fig. 6b), whereas mRNAs predominantly had multiple exons, and lncRNAs had shorter exons than mRNAs (Fig. 6c). These differences underscore the distinct characteristics of lncRNAs and their potential regulatory roles, which are often linked to their tissue-specific expression and genomic architecture. Differential expression analysis revealed differentially expressed lncRNAs in skin and muscle, but none in gills (Table 1). In skin, 2 lncRNAs were downregulated and 18 upregulated in summer, whereas in muscle 128 lncRNAs were downregulated and 49 upregulated in summer.
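The structural summaries in Fig. 6b,c (exons per transcript and exon length) can be derived from a GTF file. The sketch below is a minimal, assumed approach rather than the authors' script: it reads `exon` records, groups them by the `transcript_id` attribute, and uses GTF's 1-based inclusive coordinates for lengths. Function names are hypothetical; in practice it would be run separately on the lncRNA and mRNA annotations.

```python
import re
from collections import defaultdict
from statistics import median

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exon_lengths_by_transcript(gtf_lines) -> dict[str, list[int]]:
    """Collect exon lengths per transcript from GTF text lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[match.group(1)].append(int(fields[4]) - int(fields[3]) + 1)
    return dict(exons)


def structure_summary(exons: dict[str, list[int]]) -> tuple[float, float]:
    """Fraction of monoexonic transcripts and median exon length."""
    mono = sum(1 for lengths in exons.values() if len(lengths) == 1)
    all_lengths = [n for lengths in exons.values() for n in lengths]
    return mono / len(exons), median(all_lengths)
```

The per-transcript exon counts (`len(lengths)`) and the flattened exon lengths are the inputs one would pass to a histogram or density plot to reproduce panels like Fig. 6b and 6c.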

Fig. 6

Characterization of O. ascotanensis lncRNAs. (a) Density distribution of tau scores, (b) frequency distribution of the number of exons, and (c) density distribution of exon lengths for mRNAs and lncRNAs.

Table 1 Differentially expressed lncRNAs in O. ascotanensis tissues.

Data Records

Gene expression profiles were deposited in the Gene Expression Omnibus (GEO) database under accession number GSE294105 (ref. 52). Raw sequencing data are available through the Sequence Read Archive (SRA) under study accession SRP553383 (ref. 53). Processed data, including mapping statistics and genome annotation files (reference genome, GTF annotation, and functional gene annotations), are publicly available on Figshare40.

Technical Validation

RNA integrity and library preparation

RNA concentration was measured with a Qubit fluorometer (Invitrogen), and RNA integrity was evaluated with a Bioanalyzer 2100 (Agilent Technologies, RNA 6000 Nano Kit); all RNA samples had RNA Integrity Numbers (RIN) ≥ 7. High-quality RNA was used to construct 42 unstranded paired-end poly-A libraries with the Illumina TruSeq RNA Sample Preparation Kit. Sequencing was performed on the Illumina HiSeq 4000 platform, generating 150 bp paired-end reads.

Quality filtering of sequencing reads

Raw sequencing reads underwent quality control with FastQC, confirming their quality met required thresholds. Adapter sequences and low-quality bases were trimmed using TrimGalore (v0.6.6; parameters: -q 30 --retain_unpaired --phred33 --illumina). Post-filtering, a total of 4.4 billion high-quality paired-end reads were retained across the 42 RNA-seq libraries, with a median of 105 million reads per library.

Read mapping and transcript assembly

Filtered reads were aligned to the O. ascotanensis reference genome using HISAT2 (v2.2.1; parameters: --dta-cufflinks), achieving a median mapping rate of 88.45%40. Coordinate-sorted BAM files were generated for downstream analyses.

Sample clustering and reproducibility

Principal component analysis (PCA) was performed to validate transcriptomic consistency across tissues and seasons. PCA of all samples revealed distinct clustering by tissue type (gill, skin, and muscle), underscoring the reproducibility of tissue-specific expression patterns (Fig. 1a).