Background & Summary

Skeletal muscle is a highly heterogeneous tissue that constitutes approximately 40% of body weight and is integral to the regulation of body movement, metabolism and homeostasis1,2,3. Porcine satellite cells (PSCs) are essential for skeletal muscle development and regeneration4. These quiescent cells reside between the basal lamina and plasma membrane of myofibers and, upon activation by injury or stress, proliferate and differentiate into myoblasts, thereby facilitating muscle repair and growth. Understanding the molecular mechanisms that govern PSC proliferation and differentiation is critical for advancing animal agriculture, regenerative medicine, and research on muscular dystrophies.

Long non-coding RNAs (lncRNAs) have recently attracted considerable attention due to their complex regulatory roles. LncRNAs are RNA transcripts longer than 200 nucleotides with little or no protein-coding potential, and they often exhibit cell- or tissue-specific expression patterns. These molecules participate in diverse biological processes, including epigenetic regulation, cell differentiation, apoptosis, metabolism, signal transduction and immune response5,6,7. Emerging evidence indicates that certain lncRNAs are integral to skeletal muscle development. For instance, during early human myogenesis, lncFAM recruits HNRNPL to the MYBPC2 promoter, increasing its mRNA and protein levels and thereby facilitating the differentiation of human myoblasts into myotubes8. In mice, the novel lncRNA lncMREF interacts with Smarca5, thereby promoting the binding of p300/CBP/H3K27ac to myogenic regulatory elements, which accelerates muscle regeneration9. Additionally, the mouse lncMGPF, homologous to pig lncRNA AK394747 and human lncRNA MT510647, facilitates muscle differentiation by sponging miR-135a-5p, resulting in increased MEF2C expression. In pigs, lncRNA H19 regulates porcine satellite cells by sponging miR-140-5p and binding to DBN1 (ref. 10). Overall, lncRNAs are integral to skeletal muscle formation. Despite their prevalence and functional significance, the molecular mechanisms of lncRNAs in various species, including pigs, are not well understood.

In this study, we isolated PSCs in vitro and induced their differentiation, collecting cells at two time points during proliferation (P24h, P48h) and two time points during differentiation (D18h, D28h) (Fig. 1). We then performed strand-specific RNA sequencing and bioinformatics analysis to characterize both lncRNA and mRNA expression profiles (Fig. 2). This comprehensive dataset is a valuable resource for exploring the molecular mechanisms underlying muscle development and regeneration, with implications for both animal agriculture and human health.

Fig. 1

Proliferation and differentiation of PSCs. (a) The state of PSCs during proliferation and differentiation. P24h: PSCs proliferated for 24 hours; P48h: PSCs proliferated for 48 hours; D18h: PSCs differentiated for 18 hours; D28h: PSCs differentiated for 28 hours. (b) Number and proportion of PAX7-positive cells. (c) Expression of MYHC in PSCs at different stages of proliferation and differentiation. (d) Differentiation ability of PSCs detected by MYHC immunofluorescence.

Fig. 2

The workflow of RNA-seq analysis and lncRNA identification.

Methods

Animals

Three seven-day-old male Large White piglets, all full-sib offspring, were used in this study. All animal care and experimental procedures were conducted in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals and were approved by the Animal Care and Use Committee of Xinyang Normal University (XYEC-2019-009).

Cell isolation and culture

Porcine satellite cells were isolated from male Large White piglets euthanized via carotid artery incision. Hind leg muscles were rapidly collected and washed with PBS supplemented with 1% antibiotic-antimycotic (AA; Gibco, 15240-096). After removing connective and adipose tissue, the muscle samples were minced into small pieces and digested with 300 U/mL type II collagenase (Gibco, 17101-015) at 37 °C in a shaking water bath for 2.5 hours. The digestion was terminated using high-glucose Dulbecco’s modified Eagle’s medium (DMEM; Gibco, 10569-010) containing 10% fetal bovine serum (FBS; Gibco, 10099-141).

The cell suspension was sequentially filtered through 100, 70, and 40 µm cell strainers, and after centrifugation the pellets were collected and resuspended in PBS (Gibco, SH30256.01), RPMI-1640 (Gibco, A10491-01), or complete culture medium. Finally, cells were resuspended and cultured at 37 °C in 5% CO2 in complete culture medium consisting of RPMI-1640 with 20% FBS, 0.5% Chicken Embryo Extract (CEE; Gemini, 100-163P), 1% GlutaMax (Gibco, 35050-061), 1% Non-Essential Amino Acids (NEAA; Gibco, 11140-050), 1% AA, and 2.5 µg/L basic fibroblast growth factor (bFGF; Invitrogen, 13256-029).

Fibroblasts were removed from the mixed cell population by differential adhesion, incubating the cells on uncoated plates for 2.5 hours. Purified satellite cells were then transferred to Matrigel-coated plates (BD Biosciences, 356234) for proliferation. Once the porcine satellite cells reached 80–90% confluence, they were transferred into differentiation medium (DMEM supplemented with 5% horse serum (HS; Gibco, 26050-070) and 1% AA) to induce myotube formation at 37 °C with 5% CO2 (ref. 11).

Total RNA isolation and RT-qPCR

Total RNA was extracted from satellite cells using TRIzol (Invitrogen, 15596-026) following the manufacturer’s protocol. Briefly, satellite cells were lysed with 1 mL of TRIzol, and supernatants were collected after centrifugation at 13,000 rpm for 15 minutes at 4 °C. Chloroform was then used to separate the supernatants from proteins and DNA, and total RNA was precipitated with isopropyl alcohol. Finally, RNA pellets were washed with ethanol and dissolved in RNase-free water. RNA purity and concentration were assessed using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA), with 260/280 absorbance ratios ranging from 1.90 to 2.00. RNA integrity and contamination were verified via 1% agarose gel electrophoresis.

Complementary DNA (cDNA) was synthesized using the PrimeScript RT Reagent Kit with gDNA Eraser (Perfect Real Time) (TaKaRa, RR047A). Real-time quantitative polymerase chain reaction (RT-qPCR) amplification was performed with TB Green® Premix Ex Taq™ II (TaKaRa, RR820A) on a Bio-Rad CFX-96 Real-Time PCR detection system (Bio-Rad, USA). Relative gene expression levels were calculated using the 2^−ΔΔCT method, with 18S ribosomal RNA (18S rRNA) as an internal control for normalization.
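The 2^−ΔΔCT calculation can be illustrated with a minimal Python sketch. The function name and the Ct values below are hypothetical and are not taken from this dataset; the reference gene stands in for 18S rRNA and the control condition for whichever stage is chosen as the baseline.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each Ct is first normalized to the reference gene (dCt), then the
    sample dCt is compared with the control dCt (ddCt).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with an unchanged reference gene.
fold_change = relative_expression(24.0, 10.0, 26.0, 10.0)
print(fold_change)  # 4.0
```

A two-cycle decrease in normalized Ct corresponds to a four-fold increase in relative expression, assuming roughly 100% amplification efficiency.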

Western blot

Total protein was extracted from porcine satellite cells using RIPA buffer (Beyotime, P0013B) and a protease inhibitor mix (Beyotime, P1008) after washing the cells three times with PBS. Protein concentration was determined using the BCA Protein Assay Kit (Beyotime, P0012), following standard protocols. A total of 20 μg of protein was loaded and separated on a 12% SDS polyacrylamide gel, followed by transfer to a polyvinylidene fluoride (PVDF) membrane (Millipore, IPVH00010). The membrane was blocked with 5% skimmed milk at room temperature for 2 hours, then incubated overnight at 4 °C with antibodies (1:1000) against MYHC (Santa Cruz, sc-32732) and β-tubulin (Beyotime, AF2835). After washing with Tris-buffered saline with Tween (TBST), goat anti-mouse IgG-HRP (Abmart, M21001) or goat anti-rabbit IgG-HRP (Abmart, M21002) secondary antibodies were applied. Protein detection was performed using an ultrasensitive ECL chemiluminescence kit (Beyotime, P0018M) and visualized using the ChemiDoc™ MP Imaging System (Bio-Rad, California, USA).

Strand-specific RNA-seq library preparation & sequencing

We prepared a strand-specific RNA-seq library for each sample. First, ribosomal RNA (rRNA) was removed from 2 μg of total RNA using Ribo-Zero™ Gold Kits (Epicentre, USA). Sequencing libraries were then prepared using the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina (NEB, Ipswich, USA) according to the manufacturer’s recommendations, with a distinct index label for each sample. Finally, PCR products were purified using the AMPure XP system, and library quality was assessed with the Agilent Bioanalyzer 2100 system. Libraries were sequenced on an Illumina NovaSeq 6000 platform to generate 150 bp paired-end reads.

Transcript assembly and novel lncRNA transcript prediction

The workflow for transcript assembly and novel lncRNA transcript prediction is shown in Fig. 2. Sequence quality was evaluated using FastQC (version 0.11.8)12. Low-quality reads, adapter sequences, and reads containing poly-N were removed from the raw sequencing reads using Trimmomatic (version 0.36) to obtain clean data13.

To ensure the accuracy of lncRNA identification, potential ribosomal RNA (rRNA) contamination was systematically addressed. Porcine rRNA reference sequences were retrieved from Silva (release 138.2), Ensembl (Sscrofa11.1) and NCBI RefSeq (assembly GCF_000003025.6). Cleaned data were aligned to these composite rRNA references using Bowtie2 (version 2.5.4) with stringent parameters (--norc).

The clean reads were then mapped to the Sus scrofa reference genome (Sscrofa11.1) using HISAT2 (version 2.1.0) with default parameters14,15. Subsequently, transcript assembly was performed using StringTie (version 2.0)15,16 with default parameters, guided by the Sus scrofa reference annotation (Sscrofa11.1). The merge function of StringTie was used to merge the assembled transcripts from all samples with the reference annotation into a single annotation file. This merged annotation file was used to reassemble transcripts and obtain the FPKM values of all genes.
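For readers unfamiliar with the unit, the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments) can be written as a short Python sketch. This is an illustration of the definition only, not a reimplementation of how StringTie estimates abundance; the function name and input values are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)  # 12.5
```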

To assess the uniformity of sequencing, we analyzed 5′-3′ read coverage across all transcripts using deepTools (version 3.5.5)17. Transcript coordinates were divided into 100 bins, and coverage was normalized by RPKM.
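The idea of scaling each transcript to 100 bins can be sketched in plain Python as follows. This is only a conceptual illustration under the assumption that a per-base coverage vector is available for a transcript; deepTools performs the equivalent scaling internally on coverage tracks, and the function name here is hypothetical.

```python
def bin_coverage(per_base_coverage, n_bins=100):
    """Average a per-base coverage vector into n_bins equal-width bins.

    Scaling every transcript to the same number of bins lets transcripts
    of different lengths be compared along a common 5'-to-3' axis.
    """
    n = len(per_base_coverage)
    if n < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    binned = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        window = per_base_coverage[start:end]
        binned.append(sum(window) / len(window))
    return binned


# Hypothetical 200 bp transcript whose coverage rises steadily 5' to 3'.
profile = bin_coverage(list(range(200)))
print(len(profile), profile[0], profile[-1])
```

A flat binned profile indicates no positional bias, whereas a slope toward either end would point to 5′ or 3′ bias.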

Novel lncRNA transcripts were identified using the following criteria: (1) transcripts unannotated in the genome; (2) FPKM > 0.5 in at least one sample; (3) transcripts comprising multiple exons; (4) transcripts exceeding 200 bp in length; (5) transcripts that neither overlap with exons of protein-coding genes nor lie within 2 kb of protein-coding genes; and (6) transcripts predicted to be non-coding by both CPC and CNCI, where a CPC score and a CNCI score below 0 indicate a non-coding transcript18,19. Using these stringent criteria, we identified a total of 1,950 novel lncRNA transcripts.
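The six criteria above can be expressed as a single predicate. The following Python sketch assumes that each candidate transcript has already been summarized into the fields shown; the class, field names and example values are hypothetical and do not correspond to any file in the deposited dataset.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    annotated: bool                  # present in the reference annotation
    max_fpkm: float                  # highest FPKM across all samples
    exon_count: int
    length_bp: int
    overlaps_coding_exon: bool
    distance_to_coding_gene_bp: int  # distance to nearest protein-coding gene
    cpc_score: float
    cnci_score: float


def is_novel_lncrna(t):
    """Apply criteria (1)-(6) to one candidate transcript."""
    return (
        not t.annotated                          # (1) unannotated
        and t.max_fpkm > 0.5                     # (2) expressed in >= 1 sample
        and t.exon_count > 1                     # (3) multi-exonic
        and t.length_bp > 200                    # (4) longer than 200 bp
        and not t.overlaps_coding_exon           # (5a) no coding-exon overlap
        and t.distance_to_coding_gene_bp > 2000  # (5b) not within 2 kb
        and t.cpc_score < 0                      # (6) non-coding by CPC
        and t.cnci_score < 0                     # (6) non-coding by CNCI
    )


# Hypothetical candidates: the second fails only the multi-exon criterion.
candidates = [
    Transcript("cand_1", False, 3.2, 3, 1500, False, 8000, -1.1, -0.4),
    Transcript("cand_2", False, 3.2, 1, 1500, False, 8000, -1.1, -0.4),
]
novel = [t.transcript_id for t in candidates if is_novel_lncrna(t)]
print(novel)  # ['cand_1']
```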

We then aligned the novel lncRNAs against the ALDB database20 using BLASTN (e-value < 1e-5). None of the transcripts showed significant homology (identity > 90%), confirming their novelty.
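A homology check of this kind can be sketched as a filter over BLAST tabular output. The sketch below assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`), which is an assumption because the output format used is not stated here; the function name and the example hit lines are hypothetical.

```python
def queries_with_homology(blast_lines, max_evalue=1e-5, min_identity=90.0):
    """Return query IDs having at least one hit that passes both thresholds.

    Expects the 12 standard tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and identity > min_identity:
            hits.add(query_id)
    return hits


# Hypothetical hits: only the first passes both thresholds.
example_hits = [
    "query_1\tsubject_1\t95.2\t300\t14\t0\t1\t300\t1\t300\t1e-50\t400",
    "query_2\tsubject_2\t82.0\t250\t45\t0\t1\t250\t1\t250\t1e-20\t200",
    "query_3\tsubject_3\t97.0\t30\t1\t0\t1\t30\t1\t30\t0.5\t30",
]
print(queries_with_homology(example_hits))  # {'query_1'}
```

An empty result for the full candidate set would correspond to the outcome reported above.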

Differential expression of lncRNAs and mRNAs

First, read counts for all coding genes and lncRNAs, including annotated and novel lncRNAs, were obtained from SAM files using featureCounts (v2.0.0)21. Read counts were then used to perform differential expression analyses between different PSC stages using the R package DESeq2 (v1.36.0)22. Significantly differentially expressed mRNAs and lncRNAs were identified based on |log2(fold change)| > 1 and adjusted P-value < 0.05. Volcano plots of differentially expressed mRNAs and lncRNAs were generated using the R package ggplot2 (v3.4.2)23, and heatmaps of their expression were produced using the R package pheatmap (v1.0.12).
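The significance thresholds can be applied to an exported results table as in the following Python sketch. It illustrates only the classification rule, not the DESeq2 analysis itself; the function name and example values are hypothetical, and a missing adjusted P-value (which DESeq2 may report as NA) is treated here as not significant.

```python
def classify(log2_fold_change, adjusted_p, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene as 'up', 'down' or 'not significant'."""
    if adjusted_p is None:
        return "not significant"  # no adjusted P-value available
    if adjusted_p < padj_cutoff and abs(log2_fold_change) > lfc_cutoff:
        return "up" if log2_fold_change > 0 else "down"
    return "not significant"


# Hypothetical results: (gene ID, log2 fold change, adjusted P-value)
results = [
    ("gene_a", 2.3, 0.001),
    ("gene_b", -1.7, 0.020),
    ("gene_c", 0.8, 0.001),   # fold change too small
    ("gene_d", 3.0, 0.200),   # not statistically significant
    ("gene_e", 1.5, None),    # adjusted P-value missing
]
labels = {gene: classify(lfc, padj) for gene, lfc, padj in results}
print(labels)
```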

Technical Validation

Isolation and identification of porcine satellite cells

PSCs were isolated from 7-day-old piglets using an enzymatic digestion method. The morphological characteristics of PSCs were evaluated at 24 and 48 hours of proliferation (P24, P48) and at 18 and 28 hours of differentiation (D18, D28). As shown in Fig. 1a, PSCs exhibited a fusiform shape during proliferation and became progressively elongated following differentiation induction. Myotubes formed through cell fusion by D18, with both the number and thickness of myotubes increasing by D28. PAX7, a marker of quiescent and proliferative skeletal muscle satellite cells, was used to assess purity. More than 95% of the isolated cells were PAX7 positive (Fig. 1b), indicating that the adherent PSCs were of high purity and suitable for subsequent studies. The differentiation potential of the PSCs was evaluated using myosin heavy chain (MYHC) as a marker, with expression levels measured at various time points by western blot. MYHC was expressed at D18 and D28, but not during the proliferative phase of the PSCs (Fig. 1c). Immunofluorescence analysis at D28 further confirmed the myogenic differentiation potential (Fig. 1d). These findings are consistent with previous studies24, confirming that the isolated PSCs are appropriate for further experimental applications.

Quality control of RNA integrity

The quality of total RNA was assessed using a NanoDrop 2000 and agarose gel electrophoresis. All samples exhibited high RNA integrity, with concentrations ranging from 200 to 500 ng/µL, OD260/280 values between 1.90 and 2.00, and OD260/230 values between 1.80 and 2.00. These metrics confirmed that the RNA samples were of sufficient quality for sequencing.

RNA-Seq data quality

FastQC was used to evaluate the quality of raw sequencing data. As illustrated in Fig. 3, the reads exhibited consistently high quality scores (Fig. 3a), with GC content approximating 50% (median: 46%), which aligns closely with the theoretical GC content of mammalian coding regions (45–50%) as reported in genome-wide studies of Sus scrofa25,26 (Fig. 3b). Approximately half of the reads were unique rather than duplicated (Fig. 3c). Additional FastQC metrics confirmed that the data were suitable for downstream analysis.

Fig. 3

Quality monitoring results of RNA-seq. (a) Representative quality score distribution across all bases of the 150 bp reads. (b) Representative distribution of per-sequence GC content. (c) Representative distribution of unique and duplicate reads. (d) Density distribution of transcript length among newly assembled lncRNAs, annotated lncRNAs and protein-coding RNAs. (e) Cumulative distribution of FPKM values among newly assembled lncRNAs, annotated lncRNAs and protein-coding RNAs. (f) Density distribution of coding probability among newly assembled lncRNAs, annotated lncRNAs and protein-coding RNAs.

The alignment rates to rRNA sequences were consistently below 0.01% across all samples (Table S2). This demonstrates the effectiveness of our library preparation protocol in minimizing ribosomal RNA carryover, ensuring that subsequent analyses focus on non-coding RNA species without interference from abundant rRNA fragments.

Furthermore, sequencing reads were uniformly distributed across both chromosomes and genome strands (Fig. 4a). Mapping to the reference genome Sscrofa11.1 yielded a mapping rate exceeding 97% across all samples (Table 1). Gene expression levels were also analyzed, revealing that the overall expression profiles of all transcripts were consistent across all samples (Fig. 4b). Moreover, principal component analysis (PCA) and hierarchical clustering revealed distinct expression patterns between different time points, indicating variability between groups and high reproducibility within each group (Fig. 4c,d).

Fig. 4

(a) Reads density across all chromosomes. (b) The expression of all mRNAs in all samples. (c) Heatmap showing differentially expressed genes between proliferation and differentiation of PSCs. (d) Principal component analysis results.

Table 1 RNA-seq reads information.

To assess the uniformity of sequencing, we analyzed 5′-3′ read coverage across all transcripts. Coverage plots for all transcripts were generated using deepTools plotProfile and showed consistent read distribution patterns across the transcript bodies, with no significant positional bias (Fig. S2).

Identification of novel lncRNAs

A total of 1,950 novel lncRNAs were identified through a rigorous screening process (Fig. 2). We further analyzed and compared the length distribution, expression levels, and coding potential of novel lncRNAs, annotated lncRNAs, and mRNAs. Similar to annotated lncRNAs, novel lncRNAs exhibited shorter transcript lengths, lower coding potential, and reduced expression levels compared to mRNAs (Fig. 3d–f). These findings confirm that the identified novel lncRNAs are reliable and can be used for differential expression analysis.

Differential expression of lncRNA and mRNA

We analyzed the expression profiles of lncRNAs and mRNAs in porcine satellite cells (PSCs) across various stages of proliferation and differentiation. Gene expression analysis of PSCs at distinct time points revealed significant variations. Compared with the P24 stage, 565 lncRNAs and 1,173 mRNAs were upregulated in P48 PSCs, while 116 lncRNAs and 820 mRNAs were downregulated (Fig. 5a,e). In contrast, when comparing the D18 stage to the earlier time point, 407 lncRNAs and 973 mRNAs were upregulated, while 315 lncRNAs and 1,203 mRNAs were downregulated (Fig. 5b,f).

Fig. 5

Differential expression analysis of lncRNAs and mRNAs. (a–c) Volcano plots showing differential expression results of lncRNAs in different stages of PSC proliferation and differentiation. (d) Venn diagram showing that 16 lncRNAs were differentially expressed during the proliferation and differentiation of PSCs. (e–g) Volcano plots showing differential expression results of mRNAs in different stages of PSC proliferation and differentiation. (h) Venn diagram showing that 136 mRNAs were differentially expressed during the proliferation and differentiation of PSCs.

However, when analyzing gene expression differences between the two stages of differentiated PSCs (D18 and D28), fewer differentially expressed lncRNAs and mRNAs were observed. Specifically, compared with D18, only 40 lncRNAs and 171 mRNAs were upregulated in D28, while 23 lncRNAs and 170 mRNAs were downregulated (Fig. 5c,g). In total, 16 lncRNAs and 136 mRNAs were differentially expressed in all comparisons (Fig. 5d,h).
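The overlap shown in a Venn diagram of this kind corresponds to a set intersection of the differentially expressed gene lists from each comparison. A minimal Python sketch follows; the function name and gene identifiers are hypothetical placeholders and not entries from the deposited DEG lists.

```python
def shared_across_comparisons(*de_gene_lists):
    """Return the genes present in every differential expression list."""
    if not de_gene_lists:
        return set()
    return set.intersection(*(set(genes) for genes in de_gene_lists))


# Hypothetical DEG lists for three pairwise comparisons.
comparison_1 = ["lnc_A", "lnc_B", "lnc_C"]
comparison_2 = ["lnc_B", "lnc_C", "lnc_D"]
comparison_3 = ["lnc_C", "lnc_B", "lnc_E"]
shared = shared_across_comparisons(comparison_1, comparison_2, comparison_3)
print(sorted(shared))  # ['lnc_B', 'lnc_C']
```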

To further evaluate the reliability of the RNA sequencing data, two differentially expressed lncRNAs were randomly selected, and their relative expression levels were assessed during PSC proliferation and differentiation using RT-qPCR. As shown in Fig. 6, the RT-qPCR results were consistent with the RNA-seq results, confirming the reliability of the RNA-seq findings.

Fig. 6

Quantitative verification of two differentially expressed lncRNAs. (a,b) The FPKM values of MSTRG.2252 (a) and MSTRG.20791 (b). (c,d) The relative expression of MSTRG.2252 (c) and MSTRG.20791 (d) measured by RT-qPCR.

Data Records

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive27 of the National Genomics Data Center28, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA019704; ref. 29), and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. Detailed sample information and RNA-seq read statistics are provided in Table 1. The expression data, read count files, DEG lists and the BED file of novel lncRNAs have been deposited in the Figshare database30.