Introduction

Cashmere goats are a valuable livestock breed renowned for producing high-quality cashmere, often referred to as the “fiber gem” or “soft gold”1. Studies have shown that the fleece of cashmere goats consists of two distinct fibers, guard hair (coarse hair) and the cashmere undercoat, which originate from primary and secondary follicles, respectively2,3.

Biologically, the development of a double coat is essential for the goat’s adaptation to harsh environments4. The guard hair offers protection against the elements, while the cashmere undercoat provides insulation during cold winters, safeguarding the goat’s skin5. This adaptation allows cashmere goats to thrive in diverse climates, from the Himalayas to other regions6. The fine, soft undercoat known as cashmere is highly valued for its warmth and luxurious feel and is a sought-after commodity7,8 used in the production of pashmina, sweaters, and other cashmere products2,6. In contrast, the coarser guard hair is removed and often discarded or repurposed for low-value items such as brushes or interlinings6.

Goat hair follicle development is a complex, multi-stage process involving intricate cellular interactions and signaling pathways9. This process begins with the formation of the hair follicle, which serves as the structural unit responsible for hair growth10. The hair follicle undergoes a cyclical pattern of growth and regression, known as the hair cycle, encompassing three main phases: anagen (AN), catagen (CA), and telogen (TE)11,12.

The early stage of the anagen phase (ESA) is crucial for initiating hair growth and ensuring the proper development of hair fibers. The ESA is a period of intense activity, marked by the activation of hair follicle stem cells, the rapid proliferation of matrix cells, and the involvement of dermal papilla cells3,13. This phase lays the foundation for the subsequent stages of hair growth and development, ultimately leading to the formation of a new hair shaft14,15. However, research focusing specifically on this early stage of the anagen phase remains limited.

The Jinlan Cashmere Goat, a domesticated breed originating from Shanxi Province, China, is renowned for its high-quality cashmere production. In the predominant coat type in this population, the coarse hairs are longer than the cashmere (CHLC), while a minority of individuals display the opposite trait, in which the coarse hairs are shorter than the cashmere (CHSC). We collected skin tissues from Jinlan Cashmere Goats at the early stage of anagen for transcriptome sequencing and used functional enrichment analysis and regulatory network construction to elucidate the mechanisms behind the coat type differences, aiming to screen for key non-coding RNAs that affect goat fleece type.

Materials and methods

Ethics statement

The Institutional Animal Care and Use Committee of Shanxi Agricultural University approved this research (SXAU-EAW-2023G.UV.005009210). All experiments were performed in accordance with the relevant guidelines and regulations of China, and the study was conducted in accordance with the ARRIVE guidelines.

Animals and tissues

The Jinlan Cashmere Goat is a domestic goat breed originating from Shanxi Province, China. The anagen phase of the hair follicles in Jinlan Cashmere Goats lasts from April to November each year, with a rapid growth phase after September. In July, during the early stage of the anagen phase, we collected skin tissue samples from 2-year-old Jinlan Cashmere Goats of both coat types (CHLC and CHSC). The sampling procedure was as follows. First, 0.2 ml of Lumianning Injection (Jilin Huamu Animal Health Products Co., Ltd., Changchun, Jilin, China) was administered to the goat for anesthesia. After 7–10 min, once the goat was lying down stably, the anesthesia was considered to have taken effect. Then, a 5 cm × 5 cm patch of hair was trimmed from the posterior edge of the left scapula, near the midline of the body, and any residual fine hairs on the skin were scraped away with a razor. The skin in this area was wiped with alcohol and disinfected with iodine. Next, a circular skin biopsy punch with a diameter of 1 cm was used to collect the skin sample, and Yunnan Baiyao powder was promptly applied to the wound to stop any bleeding.

The four skin tissues collected in each group (CHLC and CHSC) were promptly placed into a 2 ml sterile tube and preserved in liquid nitrogen.

RNA extraction, library construction and sequencing

Total RNA was extracted using TRNzol Universal Reagent (Tiangen, China). Ribosomal RNA was depleted from the total RNA samples using the rRNA Removal Kit (Tiangen, China). Sequencing libraries were generated using the Fast RNA-seq Lib Prep Kit V2 (ABclonal Technology, China) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. Finally, library quality was assessed on the Agilent 5400 system and quantified by qPCR (1.5 nM). The qualified libraries were pooled and sequenced on the Illumina NovaSeq 6000 platform with the PE150 strategy at Novogene Bioinformatics Technology Co., Ltd (Beijing, China).

Bioinformatics analysis

The original fluorescence image files obtained from the Illumina platform were transformed into short reads (raw data) by base calling and recorded in FASTQ format, which contains the sequence information and the corresponding sequencing quality information. Quality control, an essential step for ensuring meaningful downstream analysis, was then performed. We used fastp (version 0.23.1)16 to compute basic quality statistics on the raw reads and to obtain clean reads.

Clean reads from each sample were then mapped to the reference genome (NCBI_CAPRA_HIRCUS_GCF_001704415.2_ARS1_2) using HISAT2 (v2.0.5)17. We used StringTie (v1.3.3b)18 for transcript assembly and merging, and novel transcripts were then identified with gffcompare (v0.10.6). The protein-coding potential of the novel transcripts was predicted using CNCI (v2.0)19, PFAM (v1.6)20, and CPC2 (v3.2.0)21; transcripts lacking coding potential according to all three tools were identified as novel lncRNAs, while those showing coding potential according to all three were classified as novel mRNAs.
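The unanimous-vote rule described above can be sketched as follows. This is only an illustration: the boolean inputs stand in for each tool's parsed coding/non-coding call, the transcript IDs are hypothetical, and the "unclassified" label for split calls is an assumption, since the text does not say how such transcripts were handled.

```python
from typing import Dict, Tuple


def classify_transcript(cnci_coding: bool, pfam_coding: bool, cpc2_coding: bool) -> str:
    """Label one novel transcript from the three tools' coding-potential calls."""
    calls = (cnci_coding, pfam_coding, cpc2_coding)
    if not any(calls):
        # No tool reports coding potential.
        return "novel_lncRNA"
    if all(calls):
        # Every tool reports coding potential.
        return "novel_mRNA"
    # Assumption: the text does not describe what happens when the tools disagree.
    return "unclassified"


# Hypothetical transcript IDs and calls, for illustration only.
tool_calls: Dict[str, Tuple[bool, bool, bool]] = {
    "TX_A": (False, False, False),
    "TX_B": (True, True, True),
    "TX_C": (True, False, False),
}
labels = {tx: classify_transcript(*calls) for tx, calls in tool_calls.items()}
print(labels)
```

Requiring agreement across all three tools trades sensitivity for specificity: a transcript flagged as coding by even one tool is not reported as a novel lncRNA.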

The target genes of lncRNAs were predicted using two strategies: co-location and co-expression. Co-location focuses on the potential regulation of neighboring genes within 100 kb of the lncRNA, based on genomic proximity22. Co-expression targets are predicted from the correlation of expression levels between the lncRNA and other genes across multiple samples23; a Pearson correlation coefficient threshold of |r| > 0.75 was used in this study.
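A minimal sketch of the two target-prediction criteria is given below. It assumes that "within 100 kb" means the gap between the lncRNA and the gene on the same chromosome is at most 100,000 bp, and that expression values are supplied as equal-length per-sample lists. The coordinates and expression values are hypothetical, and the software actually used for this step is not named in the text.

```python
import math
from typing import Sequence, Tuple

WINDOW_BP = 100_000  # 100 kb co-location window (from the text)
R_CUTOFF = 0.75      # |r| threshold for co-expression (from the text)

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def is_colocated(lnc: Interval, gene: Interval, window: int = WINDOW_BP) -> bool:
    """True if the gene lies within `window` bp of the lncRNA on the same chromosome."""
    if lnc[0] != gene[0]:
        return False
    # Gap between the two intervals; zero when they overlap.
    gap = max(0, max(lnc[1], gene[1]) - min(lnc[2], gene[2]))
    return gap <= window


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_coexpressed(x: Sequence[float], y: Sequence[float], cutoff: float = R_CUTOFF) -> bool:
    """True if the absolute Pearson correlation exceeds the cutoff."""
    return abs(pearson_r(x, y)) > cutoff


# Hypothetical coordinates and expression values, for illustration only.
lnc = ("chr1", 1_000, 2_000)
near_gene = ("chr1", 90_000, 95_000)
far_gene = ("chr1", 150_000, 160_000)
print(is_colocated(lnc, near_gene), is_colocated(lnc, far_gene))
print(is_coexpressed([1, 2, 3, 4], [8, 6, 4, 2]))
```

Because the threshold is applied to |r|, strongly negative correlations qualify as co-expression relationships as well as strongly positive ones.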

Differentially expressed lncRNAs (DE lncRNAs) and genes (DE genes) were identified using the DESeq2 package (v1.20.0)24 with P_value < 0.05. Functional enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)25, was performed using clusterProfiler (v3.8.1)26 with a significance threshold of P_value < 0.05. Targeting relationships in the ceRNA network were predicted using miRanda (v3.3a), and the network was visualized using Cytoscape (v3.8.2).
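Over-representation analysis of the kind clusterProfiler performs is typically based on a hypergeometric test. The sketch below shows that calculation in plain Python as an illustration of the idea; it is not the authors' code. The function name and the gene counts in the example are hypothetical, and the background set and any multiple-testing adjustment used in the study are not stated in the text.

```python
from math import comb

P_CUTOFF = 0.05  # significance threshold from the text


def enrichment_p_value(background: int, annotated: int, query: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    background: number of genes in the background set
    annotated:  background genes annotated to the GO term or KEGG pathway
    query:      number of target genes being tested
    hits:       query genes annotated to the term
    """
    upper = min(annotated, query)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, query - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, query)


# Hypothetical counts, for illustration only.
p = enrichment_p_value(background=10, annotated=4, query=3, hits=2)
print(p, p < P_CUTOFF)
```

In this toy example, 2 of 3 query genes hit a term that covers 4 of 10 background genes, which gives P = 40/120 and is therefore not significant at the 0.05 threshold.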

Results

Sequencing data quality control and alignment

Sequencing yielded a total of 692,785,222 raw reads. After filtering, 678,782,742 clean reads were obtained, accounting for 97.98% of the raw reads (Table S1). The clean read proportion was at least 97.26% in every sample, with the lowest value observed in CHSC_2_ESA. The GC content of the clean reads fell within a narrow range, from 43.53 to 45.21% across all samples (Table S1), indicating that the data were uniform. More than 96.29% of the clean reads from the CHSC_3_ESA sample, and similarly high percentages from the other samples, were mapped to the reference genome (Table S2), indicating that the alignment was effective.
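The overall clean read proportion can be checked directly from the two totals reported above. The variable names are illustrative.

```python
raw_reads = 692_785_222    # total raw reads (from the text)
clean_reads = 678_782_742  # total clean reads (from the text)

removed_reads = raw_reads - clean_reads
clean_fraction = clean_reads / raw_reads

print(f"removed reads: {removed_reads:,}")
print(f"clean read proportion: {clean_fraction:.2%}")
```

This shows that about 14.0 million reads were removed by filtering and reproduces the reported proportion of 97.98%.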

Together, these results indicate that the sequencing data were of high quality and suitable for downstream analysis.

Differential expression analysis

We identified a total of 130 DE lncRNAs between the two groups. Compared with the CHLC_ESA group, the CHSC_ESA group showed 68 upregulated and 62 downregulated lncRNAs (Fig. 1A). We also identified 341 DE mRNAs, of which 165 were upregulated and 176 were downregulated (Fig. 1B). In addition, we performed heatmap clustering analysis to visualize the expression patterns (Fig. 1C,D). The results showed distinct expression clusters, and samples from the same group clustered together.
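The up/down split reported above can be sketched as a simple filter over a differential expression results table. The sketch assumes that each row carries a log2 fold change (CHSC_ESA relative to CHLC_ESA) and a P value, and that direction is assigned by the sign of the fold change alone; the text states only the P_value < 0.05 criterion. The rows are hypothetical.

```python
from collections import Counter
from typing import NamedTuple

P_CUTOFF = 0.05  # threshold from the text


class DEResult(NamedTuple):
    feature_id: str
    log2_fold_change: float  # CHSC_ESA relative to CHLC_ESA
    p_value: float


def direction(result: DEResult, p_cutoff: float = P_CUTOFF) -> str:
    """Classify one feature as 'up', 'down', or 'not_significant'."""
    if result.p_value >= p_cutoff or result.log2_fold_change == 0:
        return "not_significant"
    # Assumption: no minimum fold-change cutoff, since none is stated in the text.
    return "up" if result.log2_fold_change > 0 else "down"


# Hypothetical rows, for illustration only.
rows = [
    DEResult("feature_1", 1.8, 0.003),
    DEResult("feature_2", -2.1, 0.010),
    DEResult("feature_3", 0.4, 0.200),
    DEResult("feature_4", -0.9, 0.049),
]
counts = Counter(direction(r) for r in rows)
print(dict(counts))
```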

Fig. 1 Differential expression analysis. (A) Volcano plot of DE lncRNAs; (B) Volcano plot of DE mRNAs; (C) Heatmap of DE lncRNAs; (D) Heatmap of DE mRNAs.

GO enrichment analysis

We predicted the target genes of the DE lncRNAs across the entire genome, yielding 552 co-location (Table S3) and 303 co-expression (Table S4) relationship pairs, which involved 109 and 43 DE lncRNAs, respectively. We then carried out GO functional enrichment analysis (Tables S5 and S6) on these target genes; the top 30 GO terms are shown in Fig. 2.

Figure 2A shows that the Cellular component (CC) category accounts for the highest proportion of terms, followed by the Molecular function (MF) category and then the Biological process (BP) category. Among these terms, keratin filament, intermediate filament and intermediate filament cytoskeleton are closely related to the growth of hair follicles and hair. Further analysis revealed that all three terms were enriched with the same 24 genes, which corresponded to two lncRNAs with co-location relationships, namely TCONS_00050130 and TCONS_00050142 (Table 1). We also found several protein-related and nucleic acid-related terms, such as protein heterodimerization activity, protein dimerization activity, DNA binding, nucleic acid binding, DNA packaging complex, and protein-DNA complex. Hair follicle development and hair growth both depend on the regulation of proteins and nucleic acids.

The top 30 GO terms for the co-expression target genes of the 43 DE lncRNAs are presented in Fig. 2B; MF terms were the most abundant, with 25 entries. Among these top 30 terms, we found multiple terms related to receptor activity and signal transduction, including cytokine activity, cytokine receptor binding, receptor regulator activity, receptor ligand activity, transmembrane signaling receptor activity, signaling receptor activity, signaling receptor binding, signal transducer activity and G-protein-coupled receptor signaling pathway. In addition, several terms for enzymes involved in metabolic activities were identified, for instance serine-type endopeptidase activity, serine-type peptidase activity, serine hydrolase activity and pyrophosphatase activity. These findings suggest that cytokine activity, signal transduction regulation and enzymatic activity might play a role in modulating follicle development.

Fig. 2 GO function enrichment analysis results. (A) The top 30 GO terms of co-location targets for the DE lncRNAs; (B) The top 30 GO terms of co-expression targets for the DE lncRNAs.

Table 1 Three GO terms related to hair follicle development enriched for co-location target genes of DE lncRNAs.

KEGG pathway analysis

KEGG pathway enrichment analysis of the target genes of the DE lncRNAs was also performed (Tables S7 and S8), and the top 20 KEGG pathways are shown in Fig. 3. The enrichment results for the co-location target genes revealed several important pathways (Fig. 3A), such as the Estrogen signaling pathway, Amino sugar and nucleotide sugar metabolism, and the Adipocytokine signaling pathway. The KEGG analysis of the co-expression target genes identified the Cytokine-cytokine receptor interaction pathway, which echoes the cytokine activity and cytokine receptor binding terms from the GO enrichment results. The Oxidative phosphorylation and Chemokine signaling pathways were also enriched, suggesting their potential involvement in the growth and development of hair follicles or hair.

Fig. 3 KEGG pathway enrichment results. (A) The top 20 KEGG pathways of co-location targets for the DE lncRNAs; (B) The top 20 KEGG pathways of co-expression targets for the DE lncRNAs.

Construction of ceRNA regulatory network

We used the DE lncRNAs and DE mRNAs to construct a ceRNA regulatory network incorporating all detected miRNAs (Fig. 4 and Table S9). The Pearson correlation coefficients (r) between the expression levels of miRNAs and lncRNAs, and between those of miRNAs and mRNAs, were used to further screen lncRNA-miRNA-mRNA regulatory pairs. We identified a total of 197 lncRNA-miRNA-mRNA regulatory pairs with |r| > 0.75, involving 11 DE lncRNAs and 52 DE mRNAs. These included multiple ceRNA regulatory axes whose target genes, such as EPPK1, IRF4, CCL20, and CXCR7, are closely related to hair follicle or hair development. The network therefore points to candidate regulatory mechanisms and pathways for further study.
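The screening step described above can be sketched as joining miRNA-lncRNA and miRNA-mRNA pairs that share a miRNA and both pass the |r| > 0.75 cutoff. The sketch assumes the input dictionaries contain only pairs already predicted as targets (for example by miRanda) together with their Pearson r values. All identifiers and values are hypothetical, and the function is illustrative rather than the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

R_CUTOFF = 0.75  # |r| threshold from the text

PairScores = Dict[Tuple[str, str], float]  # (miRNA, target) -> Pearson r


def passing_targets(pair_scores: PairScores, cutoff: float) -> Dict[str, List[str]]:
    """Group the targets whose |r| exceeds the cutoff by miRNA."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for (mirna, target), r in pair_scores.items():
        if abs(r) > cutoff:
            grouped[mirna].append(target)
    return grouped


def build_cerna_triples(
    mirna_lnc_r: PairScores, mirna_mrna_r: PairScores, cutoff: float = R_CUTOFF
) -> List[Tuple[str, str, str]]:
    """Return sorted (lncRNA, miRNA, mRNA) triples that share a miRNA and pass the cutoff."""
    lncs = passing_targets(mirna_lnc_r, cutoff)
    mrnas = passing_targets(mirna_mrna_r, cutoff)
    triples = [
        (lnc, mirna, mrna)
        for mirna, lnc_list in lncs.items()
        for lnc in lnc_list
        for mrna in mrnas.get(mirna, [])
    ]
    return sorted(triples)


# Hypothetical predicted pairs and correlations, for illustration only.
mirna_lnc_r = {("miR_x", "lnc_1"): -0.82, ("miR_x", "lnc_2"): 0.40, ("miR_y", "lnc_2"): 0.91}
mirna_mrna_r = {("miR_x", "gene_A"): -0.79, ("miR_x", "gene_B"): 0.10, ("miR_y", "gene_B"): 0.60}
triples = build_cerna_triples(mirna_lnc_r, mirna_mrna_r)
print(triples)
```

In this toy input only lnc_1, miR_x and gene_A form a regulatory triple: lnc_2 passes the cutoff with miR_y, but that miRNA has no mRNA target passing the cutoff.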

Fig. 4 The ceRNA regulatory network diagram of lncRNA-miRNA-mRNA.

Discussion

Cashmere, prized for its softness and warmth, originates from secondary hair follicles and is finer, shorter, and less variable in diameter than guard hair. Conversely, guard hair, produced by primary hair follicles, is characterized by coarser, longer, and more variable fibers and serves as a protective outer coat7. Studies have explored the impact of external factors, such as nutrition, on hair follicle development. Nutritional restriction, for example, can significantly affect the development and transcriptome of hair follicles in sheep27. Other studies have examined the genetic basis of these differences, identifying candidate genes associated with hair traits. For instance, the expression of fibroblast growth factor 21 (FGF21) has been found to differ significantly between cashmere and guard hair follicles, suggesting its potential involvement in regulating hair type-specific development28. Another study demonstrated that overexpression of thymosin beta-4 (Tβ4) in the hair follicle tissue of Alpas cashmere goats increases cashmere yield and promotes hair follicle development.

However, relatively few studies have examined the role of lncRNAs in hair development in goats. In this study, we collected skin tissues from Jinlan Cashmere Goats with two different coat types at the early stage of the anagen phase (ESA) for transcriptome sequencing analysis. A total of 130 DE lncRNAs were identified in the comparison CHSC_ESA vs. CHLC_ESA.

The GO enrichment analysis of the co-location target genes of lncRNAs revealed several terms that are directly associated with hair follicle or hair development. Keratin filaments are composed of two types of keratin proteins: type I (28 members) and type II (26 members)29. They are essential for hair follicle development and hair growth, providing structural integrity to the hair shaft, contributing to follicle formation, and regulating hair follicle cycling30,31. For example, keratin 15 (K15) is a keratin protein mainly expressed in hair follicle stem cells (HFSCs) in humans and mice32,33. Studies have shown that K15 is excessively degraded by KLHL24-ΔN28 in KLHL24-mutant mice and patients, leading to disturbance of the cytoskeleton network in HFSCs, which in turn causes premature hair loss, a decrease in HFSCs, and finally hair follicle degeneration34. The intermediate filament cytoskeleton, comprising intermediate filaments and associated proteins, is essential for maintaining tissue integrity and responding to mechanical stress35. It refers to the network formed by all intermediate filaments within a cell, including keratin filaments36,37. This network plays a crucial role in maintaining cell shape, resisting mechanical stress, and facilitating cell signaling38.

Additionally, we also found some protein-related and nucleic acid-related GO terms. Proteins and nucleic acids are essential for hair follicle development and hair growth, playing complementary roles in this intricate process. Proteins, particularly keratin, provide the structural integrity of hair shafts, giving them strength and resilience39,40. Nucleic acids, specifically DNA and RNA, are responsible for encoding the genetic information that dictates the synthesis of these proteins, including keratin41. Furthermore, proteins are involved in signaling pathways that regulate the hair follicle cycle, ensuring proper growth, regression, and resting phases42,43. In essence, proteins provide the building blocks and structural components of hair, while nucleic acids act as the blueprints and regulators, ensuring the precise synthesis and function of these proteins. This intricate interplay between proteins and nucleic acids is fundamental for healthy hair growth and development.

The KEGG pathway enrichment analysis of the co-location target genes of the DE lncRNAs suggests that the Estrogen signaling pathway, Amino sugar and nucleotide sugar metabolism, and the Adipocytokine signaling pathway may play important roles. The estrogen signaling pathway involves multiple genes and proteins, including estrogen and its receptors44,45. Two important cell populations in the hair follicle (HF), hair follicle stem cells (HFSCs) and dermal papilla (DP) cells, are required for initiating the first step of a new hair cycle46. Estrogen receptors are expressed in both DP cells and HFSCs, suggesting a direct influence on hair follicle activity47. Another study showed that estrogen can modulate the hair follicle cycle by influencing the expression of genes involved in hair growth and differentiation48. Amino sugars and nucleotide sugars are essential building blocks for various macromolecules, including glycosaminoglycans (GAGs), which are crucial components of the extracellular matrix (ECM) surrounding hair follicles49. The ECM provides structural support and regulates cell signaling within the hair follicle, influencing its development and growth50. Adipocytokines, secreted by adipocytes, play a crucial role in regulating hair follicle activity. These signaling molecules can influence hair follicle cycling, growth, and even hair loss51. For example, leptin, an adipocytokine, has been shown to promote hair growth by stimulating hair follicle stem cell proliferation52.

Cytokine-related entries were identified in both GO and KEGG enrichment results of co-expression target genes for DE lncRNAs. Cytokines are small, secreted proteins that act as signaling molecules, mediating communication between cells within the immune system and other tissues53. They are essential for regulating a wide range of biological processes, including inflammation, immune responses, and cell growth and differentiation54. Within the hair follicle, cytokines play a crucial role in regulating hair growth, cycling, and even hair loss55. They act as messengers, mediating complex interactions between immune cells and hair follicle cells, ultimately shaping the fate of hair54,55,56.

The Oxidative phosphorylation and Chemokine signaling pathways were also found among the top 20 pathways. Oxidative phosphorylation (OXPHOS) is a fundamental metabolic process that occurs in mitochondria, the powerhouses of cells. It involves the breakdown of glucose and other nutrients to generate ATP, the primary energy currency of cells57. In hair follicles, OXPHOS is essential for supporting the high energy demands of hair growth and cycling58. Chemokine signaling has also been shown to play a critical role in regulating hair growth and cycling59. Chemokines such as CCL2 are involved in attracting immune cells to the hair follicle during the anagen (growth) phase, contributing to the formation of the hair shaft60.

LncRNAs can function as competing endogenous RNAs (ceRNAs) by competitively binding to microRNAs (miRNAs), thereby regulating the expression of target messenger RNAs (mRNAs)61. Emerging evidence suggests that lncRNAs, acting as ceRNAs, are vital for proper hair follicle development and regeneration62,63. Epiplakin (EPPK1) is a member of the plakin protein family that is expressed exclusively in epithelial tissues, where it binds to keratins64. A previous study found that the expression of EPPK1 was highest in the catagen phase of the yak hair follicle cycle, consistent with hair keratin expression65, suggesting that EPPK1 might be one of the proteins involved in hair growth. IRF4 has been linked to hair graying, as it is involved in the production and storage of melanin, the pigment that determines hair, skin, and eye color66. When not properly regulated, IRF4 can cause excessive cell death in hair follicles, which could lead to hair loss67,68. The ceRNA regulatory network shows that chi-let-7a-3p directly targets IRF4 and that as many as five DE lncRNAs interact with this miRNA. CCL20 is a chemokine that plays a role in diverse physiological and pathological processes69. Scharschmidt et al.70 identified CCL20 as an HF-derived, microbiota-dependent chemokine and found its receptor, Ccr6, to be preferentially expressed by Tregs in neonatal skin. The ceRNA regulatory network indicates that CCL20 is regulated by chi-let-7a-3p, which corresponds to six DE lncRNAs. ACKR3, also known as CXCR7, is an atypical chemokine receptor that controls the migration of normal human epidermal melanocytes71,72,73. Melanocyte stem cells located in the hair follicle bulge contribute to follicular structures during each anagen phase of the hair cycle, synchronizing periodic activities to impart color to the hair74.
The above DE genes, together with the upstream DE lncRNAs and miRNAs in the ceRNA network, all have the potential to regulate the development of different goat coat types.

Conclusions

In this study, we collected skin tissues from goats with different coat types for transcriptome sequencing analysis. GO functional enrichment analysis highlighted the keratin filament, intermediate filament, cytokine activity, and cytokine receptor binding terms. KEGG pathway analysis identified the Estrogen signaling pathway, Amino sugar and nucleotide sugar metabolism, and the Adipocytokine signaling pathway. Finally, analysis of the ceRNA regulatory network between DE lncRNAs and DE mRNAs revealed several ceRNA regulatory axes, including those with EPPK1, IRF4, CCL20, and CXCR7 as target genes. This study identifies candidate DE lncRNAs and regulatory pathways that can be explored in future functional studies to elucidate their roles in coat type determination.