Introduction

Piper longum (long pepper; family Piperaceae) is an important medicinal plant that ranks third after P. nigrum (black pepper) and P. betle (betel) in economic importance1. It grows extensively in the hot and humid Indo-Malaya region2 and is valued for its many therapeutic uses in both traditional and modern medicine. For ages, the Ayurvedic, Siddha, and Unani medical systems have used this climber to treat various illnesses, such as infections, digestive problems, and respiratory diseases. P. longum contains several important classes of bioactive compounds, viz. alkaloids, flavonoids, glycosides, tannins, phenols, and sterols3. Piperine, a potent alkaloid with anti-inflammatory, antioxidant, antibacterial, and bioavailability-enhancing properties, is the primary source of long pepper’s pharmacological significance. Because of these properties, long pepper is a crucial component of many herbal preparations, particularly nutraceuticals and complementary medicines. The dried unripe fruits are used as a substitute for tonics. A decoction of immature fruits and roots is used to treat colds, coughs, and chronic bronchitis. Fruits and roots are also used as antidotes for scorpion stings and snake bites. A mixture of equal amounts of powdered Embelia ribes seeds, P. longum L. fruit, and borax has been used as an Ayurvedic contraceptive. P. longum L. has been used to treat a variety of illnesses in both conventional medicine and the Ayurvedic system4,5.

The plant is a powerful stimulant for both the digestive and respiratory systems and has a rejuvenating effect on the lungs. The spikes are reported to raise thyroid hormone levels and to show thermogenic, immunostimulatory, antiulcer, anti-amoebic, antioxidant, hepatoprotective, cardiovascular, and anti-inflammatory activities6,7,8. The medicinal value of this plant has led to extensive and indiscriminate collection from the wild, threatening its existence and necessitating the development of methods to conserve P. longum9. In addition to its therapeutic value, long pepper has significant commercial value. It is extensively grown and traded for its dried spikes, which are used as a raw medicinal ingredient and as a spice. Growing worldwide demand for natural and plant-derived substances has increased the market value of long pepper and sparked interest in its sustainable production and large-scale cultivation. Nevertheless, despite this promise, little is known about the molecular and genetic basis of the biosynthesis of its secondary metabolites, especially piperine10.

The collection pressure on P. longum could be relieved by finding alternative plant sources with similar medicinal properties or by biotechnological intervention to increase the piperine content of the leaves. Piperine accumulation is highest in the spikes, followed by the roots, and lowest in the leaves of both P. nigrum11 and P. longum8. However, piperine biosynthesis in P. longum, its transport within the plant, and its storage are still poorly understood, although the conversion of piperidine to piperine using piperoyl-CoA has been worked out in P. nigrum and the steps converting lysine to piperidine have been elucidated in the bacterium Pseudomonas aeruginosa12.

Acetyl-CoA or malonyl-CoA is a conceivable donor for the C2 extension of the cinnamic acid-derived precursor feruloyl-CoA that leads to piperoyl-CoA; however, no molecular or enzymatic data are available for these steps. In another case, the related amide capsaicin is derived from butyryl-CoA by C2 elongations, which are encoded by the Pun1 gene12. Recent advances in transcriptome analysis using next-generation sequencing have elucidated metabolic pathways in several medicinal plants, such as Artemisia annua, Withania somnifera, Panax ginseng, Picrorhiza kurroa, and Tribulus terrestris, to name a few13,14,15,16,17. Transcriptome analysis of the seed18 and root of P. nigrum19 has generated a large data bank of gene sets involved in various processes. However, such a transcriptomic study in P. longum is still lacking. A detailed investigation of the long pepper transcriptome using RNA-seq can give great insight into this species. The sequence data generated could be useful for understanding the genes involved in the biosynthetic pathways of not only piperine but also other important secondary metabolites.

Although Piper longum and its bioactive component, piperine, have enormous medical value, little is known about the molecular mechanisms underlying piperine biosynthesis. While prior research has shed light on aspects of piperine production in related species such as P. nigrum, thorough transcriptome data for P. longum are lacking19,20. Furthermore, few molecular and enzymatic insights are available regarding the biosynthetic pathways of the key precursors, namely piperidine derived from lysine and piperoyl-CoA formed through the phenylpropanoid pathway. Next-generation sequencing (NGS) has made it possible to perform transcriptome analysis on several medicinal plants, providing useful tools for studying secondary metabolite pathways, but such studies in P. longum are scarce. Therefore, a detailed transcriptomic investigation of P. longum can provide critical insights into the genes, pathways, and regulatory mechanisms involved in piperine biosynthesis, paving the way for genetic and biotechnological interventions to enhance its medicinal potential.

In the present study, the transcriptome of long pepper leaves, roots, and spikes was analyzed using the Illumina HiSeq 2000 platform. Analysis of transcriptome data showed gene families that were involved in the biosynthesis of piperine and other secondary metabolites and housekeeping genes. This is the first dataset of sequence analysis of long pepper leaves, roots, and spikes and will be a useful genomic library for any molecular research.

Results

HPLC-based estimation of piperine content in leaves, roots, and spikes

In the present study, piperine content was estimated in leaves, roots, and spikes using HPLC (Table 1). Spike tissue had the highest piperine content (76 µg/ml), followed by roots (43 µg/ml) and leaves (18 µg/ml) (Fig. 1).

Table 1 Quantitative analysis of the piperine content in P. longum using HPLC.

Fig. 1 Graphical representation of piperine content in different parts of P. longum as determined by HPLC. Error bars above the means show the ± SE of three replicates.

Sequencing and de novo transcriptome assembly

Next-generation sequencing of the P. longum leaf, root, and spike samples was performed on the Illumina platform in 2 × 150 bp paired-end mode and generated high-quality data. Raw reads totaling 5,070,436,800 from leaves, 8,160,880,210 from roots, and 6,870,010,500 from spikes were obtained. The high-quality read sets comprised 16,901,456 paired reads for leaves, 54,993,496 paired reads for roots, and 22,900,035 paired reads for spikes. A de novo master assembly of the pooled high-quality paired-end reads from roots, leaves, and spikes yielded a total of 173,381 transcripts (Table 2). The total transcriptome size, N50, and maximum transcript length were 99,612,836, 722, and 12,994 bases, respectively. The size distributions of the raw reads and the assembled transcripts were characterized (Fig. 2). CDS were predicted from the 173,381 assembled transcript sequences using TransDecoder (rel16JAN2014) with default parameters, which identified 58,773 CDS with a total size of 38,734,722 bp. The maximum and N50 lengths of the CDS were 12,990 and 747 bases, respectively (Table 3), and the size distribution of the CDS is shown in Fig. S1. Transcriptome completeness was evaluated by BUSCO analysis against a plant-specific database of 425 genes, which found 174 (40.9%) complete BUSCOs, of which 133 (31.3%) were complete single-copy genes and 41 (9.6%) were duplicated genes (Supplementary Table S1).
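The N50 values reported above for transcripts and CDS follow the usual definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that calculation, not the assembler's own code; the function name `n50` and the toy lengths are illustrative assumptions and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up transcript lengths (hypothetical values).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so this prints 400
```

Applied to the lengths of all assembled transcripts, a calculation of this kind would be expected to give the N50 listed in Table 2.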

Table 2 Statistics of the assembled transcript of P. longum sample.

Fig. 2 Length distribution of P. longum transcripts. Y axis represents the length range whereas the X axis represents the number of transcripts.

Table 3 Statistics of predicted CDS of Piper longum sample.

Gene annotation and functional classification

A homology-based approach was used for functional annotation. All CDS generated from the combined assembly of the three P. longum tissues were annotated by aligning them against the NCBI non-redundant (Nr) protein database using BLASTx 2.2.30+ with an e-value below 1e-5. A total of 50,277 CDS showed significant similarity to known proteins in the Nr database. The largest number of top BLAST hits was to Nelumbo nucifera (around 11,934 CDS), followed by Vitis vinifera (3816 CDS) and Elaeis guineensis (3496 CDS) (Fig. 3). Analysis of the CDS data set showed that lysine synthesis occurs in P. longum. The spike has a higher alkaloid content, particularly of piperine, than the leaf and root, which is attributed to increased biosynthesis of lysine that is then converted into piperine. Gene Ontology (GO) classification, an internationally accepted gene functional classification framework, was used to annotate the transcriptome data. Metabolic processes (24.6%) and cellular processes (21.9%) were the primary subcategories of biological process. In the cellular component category, membrane (13.8%) and cell (13.6%) were prominent. In molecular function, binding (23.7%) and catalytic activity (22.4%) were the key subcategories (Fig. 4). In P. longum, 18,508 CDS were related to biological processes, 14,429 CDS to cellular components, and 21,103 CDS to molecular functions (Supplementary Table S2).

Fig. 3 Homologous Resemblance in P. longum: Evaluation of unigene similarity based on annotations using the Nr database. The figure displays the distribution of E-values for the best BLAST hits associated with each unigene (E-value < 1e-5), the distribution of similarities for the top BLAST hits per unigene, and the species distribution of the most closely related sequence results for each unigene.

Fig. 4 Comparative Gene Ontology (GO) classifications of CDS with functional annotations commonly expressed across various tissues in the P. longum transcriptome. These genes were categorized into three principal domains: cellular component (CC), molecular function (MF), and biological process (BP).

Functional characterization and scanning of piperine-related genes using KEGG

KEGG pathway analysis was used to identify the functional profiles of the genes. The analysis was carried out for all samples using the KEGG Automatic Annotation Server. All CDS were compared with the KEGG database using BLASTx with a threshold bit score of 60 (default). A total of 2097 CDS (3.5%) were assigned to the metabolism category, followed by other pathways (Fig. 5). Of these, 113 (0.2%), 290 (0.5%), and 103 (0.2%) CDS were involved in the biosynthesis of other secondary metabolites, amino acid metabolism, and the metabolism of terpenoids and polyketides, respectively. These data provide a valuable resource for studying specific processes and pathways in the development of P. longum. A few genes related to lysine biosynthesis, amino acid biosynthesis, and the piperidine pathway were identified. A total of 14 CDS related to tropane, piperidine and pyridine alkaloid biosynthesis [PATH: ko00960] were identified, and they provide a valuable resource for further research.

Identification of candidate genes involved in the piperine biosynthesis pathway

A total of 4730 CDS were assigned to 377 different pathways in the KEGG database. The biosynthesis of secondary metabolites was classified into 28 sections, including phenylpropanoid biosynthesis (61 CDS), flavonoid biosynthesis (17 CDS), tropane, piperidine, and pyridine alkaloid biosynthesis (14 CDS), isoquinoline alkaloid biosynthesis (11 CDS), stilbenoid, diarylheptanoid and gingerol biosynthesis (9 CDS), streptomycin biosynthesis (8 CDS), and biosynthesis of other secondary metabolites (Fig. 6). A higher proportion of CDS belonged to the translation, signal transduction, and carbohydrate metabolism pathways. The majority of CDS were classified under metabolism, and a number of these were related to different secondary metabolic pathways. To explore the regulatory mechanisms for the accumulation of piperine in P. longum, the expression profile of genes involved in piperine biosynthesis was analyzed (Fig. 7). A total of 14 expressed CDS encoding enzymes of tropane, piperidine, and pyridine alkaloid biosynthesis were identified in P. longum (Supplementary Table S3).

Fig. 5 KEGG functional pathway of Piper longum. The Y axis represents the number of CDS whereas the X axis represents the KEGG functional pathway categories.

Fig. 6 Classification based on categories of secondary metabolite biosynthesis. The Y axis represents the different biosynthetic pathways whereas the X axis represents the number of CDS.

Fig. 7 The pictorial representation of the proposed piperine biosynthesis pathway based on prior studies, incorporating the CDS found in P. longum.

Piperoyl-CoA and piperidine are the two precursors of piperine, which are joined by an acyltransferase. The genes related to piperine biosynthesis fall into three main groups. Group A: genes associated with the phenylpropanoid pathway (map00940), in which cinnamoyl-CoA is formed for piperoyl-CoA biosynthesis. Group B: genes related to L-lysine metabolism (map01064), in which lysine is converted into piperidine through a series of reactions. Group C: genes encoding acyltransferases, which catalyze the reaction between piperoyl-CoA and piperidine. The spike data highlight the importance of arogenate dehydratase (ADT), aminotransferase (PPA-AT), p-coumarate 3-hydroxylase (C3H), cinnamate 4-hydroxylase (C4H/CYP7), HCT, caffeic acid-3-O-methyltransferase (CAOMT), and p-coumarate-CoA-ligase (4CL), all of which are involved in the conversion of phenylpropanoids to piperoyl-CoA. The transcripts associated with the different enzymes and compounds are shown in Table 4. Finally, genes related to piperine synthesis were found to be expanded into multiple gene families.

Table 4 Transcripts associated with different enzymes and compounds.

Transcription factors analysis (TFs)

Transcription factors play an important role in gene expression, plant metabolism, and biotic and abiotic stress management. Transcription factor-encoding transcripts were identified by CDS comparison to determine which transcription factor gene families are represented. In total, 21,235 transcription factor genes were distributed across approximately 65 families; the most abundant were bHLH, NAC, and MYB-related, with 2264, 1508, and 1370 CDS, respectively. During annotation of the transcriptome data, we identified several TFs belonging to different families (Fig. 8) that might be involved in the regulation of gene expression during flower and fruit development. Some transcripts were putative transcription factor-encoding regions that did not belong to any particular transcription factor family. The identified families are known to regulate secondary metabolism and play an important role in the regulation of piperine biosynthesis. C3H transcription factors belong to the zinc finger motif family, which plays an important role in interactions with other molecules.

Fig. 8 Distribution of different transcription factors in CDS of Piper longum. Y axis represents the transcription factor families and X axis represents the number of transcription factors.

A total of 8041 SSRs were found in the assembled transcripts of P. longum (Table 5), of which 2619 satisfied the 150 bp flanking-sequence criterion. Among the 8041 SSRs, trinucleotide repeat motifs were the most abundant, accounting for 4506 SSRs (56.04%), followed by 3209 (39.91%) dinucleotide repeat motifs, 216 (2.69%) tetranucleotide repeat motifs, 79 (0.98%) pentanucleotide repeat motifs, and 31 (0.38%) hexanucleotide repeat motifs (Supplementary Table S4).

Table 5 SSRs identification statistics in Piper longum.

qRT-PCR validation of selected genes

To validate the Illumina sequencing expression profiles, 26 genes were randomly selected, including GAPDH as a housekeeping gene (Supplementary Tables S5 and S6), and their expression levels were confirmed by quantitative real-time PCR (qRT-PCR) (Fig. 9). The figure shows the expression profile in the spikes. Genes including spindle checkpoint protein, kinesin-like protein, cyclin-b2-4, protein arginine n-methyltransferase, histone-lysine n-methyltransferase, aspartokinase/homoserine dehydrogenase, 4-hydroxy-tetrahydrodipicolinate synthase, syntaxin-related protein, mitotic spindle checkpoint protein, G2/mitotic-specific cyclin, and protein CHUP1 were down-regulated in spike tissue. This suggests differential regulation of these genes in spikes and provides insight into their potential functional roles within this organ.

Fig. 9 Validation of selected genes involved in piperine biosynthesis in P. longum (spike) through qRT-PCR. The Y-axis depicts the log2 fold change of selected differentially expressed genes and the X-axis depicts the genes from the transcriptome.

Discussion

The current study is the first to conduct a complete transcriptome investigation of the leaves, roots, and spikes of Piper longum, using next-generation sequencing to elucidate the molecular mechanisms underlying piperine biosynthesis. By producing and annotating 173,381 transcripts, it identifies important candidate genes, pathways, and transcription factors involved in the biosynthesis of piperine and other secondary metabolites. It also shows the differential accumulation of piperine, with the spike tissue exhibiting the highest concentration, and reveals tissue-specific gene expression profiles. By combining functional annotation, KEGG pathway analysis, and SSR marker discovery, the study offers new insights into lysine metabolism, phenylpropanoid pathways, and regulatory mechanisms. This genetic resource lays the groundwork for further molecular research and biotechnological applications in P. longum.

In nature, approximately 12,000 alkaloids have been identified, with many more awaiting discovery, indicating the vast unexplored diversity of plant secondary metabolites21. Transcriptome databases of specific plant species play a crucial role in uncovering biosynthetic genes encoding enzymes with valuable catalytic activities. These often reveal variants with unique biochemical features that remain underutilized in biotechnology22. Despite the extensive medicinal applications of Piper longum and its primary bioactive compound, piperine, there is a noticeable gap in the genetic and molecular understanding of its biosynthetic pathway23,24,25.

Our study sought to address this gap by generating a high-quality transcriptome dataset from P. longum leaves, roots, and spikes using Illumina TruSeq RNA, producing 99 million high-quality reads. Analysis of these reads yielded 173,381 transcripts and 8,041 SSRs, providing a significant resource for understanding the plant’s genetic framework. This study identified tissue-specific expression, with 2,741 CDS unique to leaves and 1,595 CDS exclusive to roots, while a comparative analysis of leaves and spikes revealed 821 and 1,033 CDS, respectively. These findings underscore the functional differentiation among tissues and offer insights into the specific roles of genes involved in secondary metabolism26,27,28,29. GO analysis revealed 28 transcripts associated with secondary metabolite pathways, particularly piperine biosynthesis, including genes involved in lysine metabolism. This aligns with the limited existing knowledge on lysine-derived alkaloids, where lysine decarboxylation produces cadaverine, which is subsequently converted into piperidine, a key precursor of piperine30,31,32,33. Our findings also identified CDS related to enzymes integral to these steps, highlighting variations in their expression profiles across tissues.

Transcriptome analysis is widely used for identifying novel genes in secondary metabolite biosynthesis31,34,35. In this research, piperine biosynthesis was examined in relation to the tropane, piperidine, and pyridine alkaloid biosynthesis pathways, and 15 functional genes were reported. Our analysis provides the first comprehensive genetic information for P. longum. By KEGG mapping of the best-hit CDS, we identified many CDS involved in different processes, including genetic information processing, metabolism, cellular processes, environmental information processing, and organismal systems. In the future, these CDS will be important resources for genetic manipulation of P. longum. The metabolic pathway of piperine is poorly understood, and little information is available on the biosynthesis of tropane, piperidine, and pyridine alkaloids in plants14,29,30.

Piperine is a lysine-derived alkaloid: the first step in its biosynthesis is the decarboxylation of lysine to produce cadaverine32, which is then used as a substrate for the synthesis of piperidine33. The metabolic pathway of piperine, while not yet fully elucidated, shares similarities with other alkaloid biosynthetic pathways, such as those of tropane, piperidine, and pyridine alkaloids. Notably, the N-heterocycle piperidine and the thioester piperoyl-CoA are hypothesized to be pivotal intermediates in piperine synthesis36,37,38. Computational analysis of related compounds such as coumaroyl agmatine and feruloyltyramine provided additional insights, with earlier studies suggesting that piperine biosynthesis might involve enzymatic reactions analogous to those reported in P. nigrum39,40,41.

In this study, BLASTn analysis identified 14 candidate genes across various tissues involved in piperine biosynthesis, including piperic acid coenzyme A ligase and CYP719A37, which are known to catalyze critical steps in the pathway42. In an earlier study on Piper nigrum, piperic acid coenzyme A ligase and CYP719A37, which catalyzes the formation of the methylenedioxy bridge in piperine biosynthesis, were reported (MN729603.1 and MT643912.1). In the present study, BLASTn (e-value cutoff 0.00001) of these sequences against the de novo assembly found 13 genes from different parts of the plant. Comparison with the complete CDS of piperine synthase and piperamide synthase (MW354956.1 and MW354957) identified 6 additional genes, which are listed in Supplementary Table S7. In the derived piperine biosynthetic pathway, 11 genes might encode 4-coumarate ligase, 4 genes putative CYP719, and 6 genes a hypothetical protein for piperoyl-CoA. This study not only advances knowledge of piperine biosynthesis but also highlights the untapped potential of genetic manipulation to enhance the production of this pharmaceutically important compound. By uncovering the molecular basis of piperine biosynthesis, this research lays the groundwork for future biotechnological work aimed at optimizing piperine yield in P. longum and strengthening its medicinal and economic significance.

Conclusion

This article provides an in-depth study of the transcriptome of Piper longum (commonly known as Pippali and belonging to the Piperaceae family) to elucidate the molecular mechanisms involved in piperine biosynthesis across various tissues. Next Generation Sequencing was implemented to acquire reads of superior quality from the leaves, roots, and spikes of P. longum. The assembly of these reads yielded a total of 173,381 transcripts. These transcripts were subsequently subjected to further analysis to identify gene families associated with the biosynthesis of piperine and other secondary metabolites. The functional annotation and classification of the predicted coding sequences indicated the participation of diverse gene families in metabolic processes, cellular functions, and molecular activities.

KEGG pathway analysis identified candidate pathways linked to the biosynthesis of piperine, with a specific focus on lysine metabolism and acyltransferases. The investigation also identified SSR markers and transcription factors that potentially influence the regulation of gene expression and secondary metabolism in the plant. Furthermore, HPLC quantification of piperine showed that the spike tissue had the highest concentration, followed by the roots and leaves, suggesting that piperine accumulates in a tissue-specific manner.

This study offers significant contributions to our understanding of the genetic mechanisms underlying piperine biosynthesis in P. longum. The genes, pathways, and transcription factors that have been identified present intriguing possibilities for further investigation in the fields of functional genomics and molecular genetics. The results of this study enhance our comprehension of the biosynthetic pathways and regulatory mechanisms associated with the production of piperine. The functional characterization of identified genes involved in piperine biosynthesis should be the focus of future research, with the use of sophisticated genome-editing technologies to increase piperine output. Additionally, Piper longum conservation and optimal therapeutic usage can be ensured by investigating regulatory processes through transcription factor research and creating sustainable growing practices.

Materials and methods

Plant materials

Rooted cuttings of Piper longum plants were procured from the National Bureau of Plant Genetic Resources, Thrissur, Kerala, and were cultivated in the net house of the DEI Herbal Garden of the Department of Botany, Dayalbagh Educational Institute, Agra (Uttar Pradesh), India. Young leaves, roots, and mature spikes were collected in November from four-year-old plants with a sterilized blade. All samples were immediately frozen in liquid nitrogen for further use.

Instrumentation and chromatographic conditions

The Waters 600E HPLC system used for the analysis included a manual sample injector with an injection volume of 5 to 20 µL, a UV-Vis detector, a solvent delivery module, an online degasser, a low-pressure gradient pump, and a system controller. Chromatographic separation was carried out using a Waters Symmetry C-18 column (4.6 mm × 250 mm, 5 μm) with a PDA detector. The mobile phase consisted of 0.01 M acetate buffer (pH 5) and ACN in a ratio of 30:70 v/v. The samples were analyzed at a flow rate of 1 mL/min, with the detection wavelength set at 360 nm. For every analysis, a 20 µL sample was injected into the column. To prepare the mobile phase, the 0.01 M acetate buffer (pH 5) and ACN were filtered through a 0.22 μm nylon membrane, and the solvent was then degassed in a sonicator before use.

Preparation of standard and sample solution

Ethanol extracts of the P. longum leaves, roots, and spikes were prepared by reflux at 40 °C for 24 h. The extracts were evaporated to dryness at 60 °C under reduced pressure using a rotary vacuum evaporator and stored at 4 °C until use. The dried extract was dissolved in HPLC-grade methanol at a concentration of 1 mg/ml to give a working solution, which was filtered through a 0.22 μm membrane. A stock solution of standard piperine (1 mg/mL) was prepared by dissolving 10 mg of accurately weighed piperine in 10 ml of HPLC-grade methanol. Further working standard solutions were prepared by diluting the stock solution with the mobile phase.

RNA isolation

The TRIzol (Invitrogen) method was used to extract total RNA from roots, leaves, and fruits (three replicates). Complete cell lysis was ensured by homogenizing approximately 100 mg of tissue in 1 ml of TRIzol reagent. Nucleoprotein complexes were dissociated by incubating the homogenate for 5 min at room temperature. To the homogenate, 200 µl of chloroform was added, mixed thoroughly by vigorous shaking, and incubated for 3 min. The mixture was centrifuged at 12,000 × g for 15 min at 4 °C to separate it into aqueous and organic phases. Once the RNA-containing aqueous phase had been carefully moved to a fresh tube, 500 µl of isopropanol was added to precipitate the RNA, and it was centrifuged at 12,000 × g for 10 min at 4 °C. The resultant RNA pellet was washed with 1 ml of 75% ethanol, allowed to air dry for a short while, and then dissolved in 30–50 µl of RNase-free water. The quality of the total RNA was assessed through a 1% formaldehyde denaturing agarose gel, while the quantity was determined using the “Nanodrop 8000 spectrophotometer” (Thermo).

De novo assembly and sequencing

4 µg of total RNA was pooled for sequencing and the preparation was multiplexed on a single flow cell of an Illumina TruSeq. Briefly, double-stranded cDNA was subjected to Covaris shearing followed by end repair of overhangs. The fragments were A-tailed, adapter-ligated, and then enriched by a limited number of PCR cycles. The final library was denatured, and the appropriate working concentration was loaded into the flow cell. The prepared library was loaded onto the Illumina platform for cluster generation, and sequencing was performed using a paired-end (PE) 2 × 150 bp library on the Illumina platform (Illumina TruSeq RNA 500)43. Subsequently, all assembled transcripts were validated using CLC Genomics Workbench 6.0 by mapping high-quality reads back to the assembled transcripts. All CDS (protein-coding sequences) were predicted from the assembled transcripts using TransDecoder (rel16JAN2014) with default parameters. The completeness of the P. longum transcriptome was evaluated using BUSCO analysis against the Viridiplantae_odb10 database (updated on 2024-12-05) with default parameters.

Functional annotation and gene ontology (GO) analysis

Functional annotation was performed on the predicted CDS by aligning them to the NCBI non-redundant (Nr) protein database using the Basic Local Alignment Search Tool (BLASTx)44 with an E-value below 1e-5. Standalone BLASTx 2.2.30+ was used for this purpose, and the number of CDS with a BLASTx hit against the NCBI Nr database was recorded. The best match from each BLAST search was used to annotate the transcript tags. After each BLAST search, annotation tags with no matches and ones with predicted annotations were extracted for the next sequential BLAST search. GO assignment was used to classify the functions of the predicted CDS. GO mapping provides an ontology of defined terms representing gene product properties, grouped into three main categories: molecular function (MF), biological process (BP), and cellular component (CC)45. GO mapping was carried out using Blast2GO Pro software46 to retrieve GO terms for all BLASTx functionally annotated CDS, using the accession IDs of the BLASTx results to retrieve gene names or symbols.

KAAS (KEGG Automatic Annotation Server - http://www.genome.jp/kegg/ko.html) was used to functionally annotate the CDS by BLAST comparisons against the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene database47,48,49. The BBH (Bi-directional best hit) option was used to assign KEGG orthology (KO) terms. The KEGG Orthology database was used for pathway mapping50.
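The bi-directional best hit idea behind the BBH option can be illustrated with a short Python sketch. It is a simplification and not the KAAS implementation, which applies additional scoring of its own. The input format of (query, subject, bit score) tuples, the function names, and the toy identifiers are assumptions made for illustration; the default `min_score` of 60 mirrors the bit-score threshold mentioned in the Results.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward, reverse, min_score=60.0):
    """Return sorted (query, reference) pairs that are each other's best hit.

    `forward` holds query-vs-reference hits and `reverse` holds
    reference-vs-query hits. Hits scoring below `min_score` are ignored.
    """
    fwd = best_hits(h for h in forward if h[2] >= min_score)
    rev = best_hits(h for h in reverse if h[2] >= min_score)
    return sorted((q, r) for q, r in fwd.items() if rev.get(r) == q)


# Hypothetical identifiers and scores, for illustration only.
forward = [("cds1", "K1", 200), ("cds1", "K2", 150),
           ("cds2", "K2", 90), ("cds3", "K1", 50)]
reverse = [("K1", "cds1", 198), ("K2", "cds1", 140), ("K2", "cds2", 95)]
print(bidirectional_best_hits(forward, reverse))  # [('cds1', 'K1')]
```

In this toy case cds2 is not assigned to K2 because the best hit of K2 in the reverse search is cds1, which is the reciprocity requirement that distinguishes BBH from a single-direction best hit.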

Transcription factors analysis

To identify the transcription factor families represented in the transcriptome, all predicted CDS were searched against all transcription factor protein sequences in the Plant Transcription Factor Database (PlantTFDB; http://planttfdb.cbi.pku.edu.cn/download.php) using BLASTx with an e-value below 1e-5.

For the identification of SSRs, all assembled transcripts from the leaves, roots, and spikes of the same plants were searched with the Perl script MISA (MIcroSAtellite identification tool). The search covered microsatellites from dinucleotide to hexanucleotide motifs. Mononucleotide repeat motifs were not considered because of the possibility of homopolymer tailing in the generated ESTs. SSRs with 150 bp of flanking sequence were filtered and used for primer design51. The SSR analysis can serve as a foundation for further investigations into the genetic diversity of P. longum and its closely related species.
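The kind of scan described above (perfect di- to hexanucleotide repeats, mononucleotide runs excluded) can be sketched in Python with regular expressions. This is an illustration only and not the MISA script itself. The minimum repeat counts in `MIN_REPEATS` are assumed values, since the thresholds used in the study are not stated, and the function names are hypothetical.

```python
import re

# Minimum repeat counts per motif length. ASSUMED values for illustration;
# the thresholds used in the study are not stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_of_shorter(motif):
    """True if the motif is a shorter unit repeated, e.g. 'AA' or 'ATAT'."""
    for k in range(1, len(motif)):
        if len(motif) % k == 0 and motif[:k] * (len(motif) // k) == motif:
            return True
    return False


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    `start` is 0-based. Motifs of 2-6 nt are scanned. Motifs that are a
    repeat of a shorter unit are skipped, which also excludes
    mononucleotide runs such as 'AAAA...'.
    """
    seq = seq.upper()
    found = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound_of_shorter(motif):
                continue
            found.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(found)


# Made-up sequence: an (AG)7 repeat and a (GAT)5 repeat.
example = "TTGC" + "AG" * 7 + "CCTA" + "GAT" * 5 + "CG"
print(find_ssrs(example))  # [(4, 'AG', 7), (22, 'GAT', 5)]
```

A filter on flanking sequence, such as the 150 bp criterion used here for primer design, could be added by checking that `start` and the distance from the repeat end to the end of the transcript are both at least 150.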

Quantitative gene expression analysis

Total RNA was isolated from the samples using the PureLink® RNA Mini Kit (Invitrogen, Thermo Fisher Scientific) and the quality of the RNA was checked. The cDNA synthesis of each RNA sample was carried out using the SuperScript® III first-strand synthesis kit for RT-PCR (Invitrogen, US). Expression in spikes was analyzed using the Real-Time PCR Detection Machine (Stratagene Mx3005P, Agilent Technologies) and the KAPA SYBR® FAST qPCR Master Mix (2X) kit (KAPA BIOSYSTEMS, US). A no-template control reaction was included for each set of primers. All qRT-PCRs were performed using the following conditions: 95 °C for 10 min followed by 40 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s, and a melting curve of 95 °C for 1 min, 55 °C for 30 s and 95 °C for 30 s. The GAPDH gene was used as an internal control to estimate the relative transcript level of the genes studied52. Data from qRT-PCR were analyzed using the comparative ∆∆Ct method, and fold changes in transcript level were calculated as 2^(−∆∆Ct)53,54, with the Ct value of GAPDH as the internal control. For each sample, the Ct value of the internal control gene was subtracted from the Ct value of the target gene to give the ∆Ct. The ∆Ct of the control sample was then subtracted from the ∆Ct of the test sample to give the ∆∆Ct, and the fold change in gene expression was calculated as 2^(−∆∆Ct). Each experiment was repeated using three biological replicates and three technical replicates, and the data were statistically analyzed (± Standard Deviation).
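The comparative Ct calculation can be written out as a short Python sketch. The function and argument names are illustrative assumptions, the Ct values in the example are made up, and the choice of calibrator (control) sample is left to the caller because it is not specified in the text.

```python
import math


def relative_expression(ct_target_test, ct_ref_test,
                        ct_target_calib, ct_ref_calib):
    """Fold change of a target gene by the comparative Ct method.

    delta Ct       = Ct(target) - Ct(reference gene, e.g. GAPDH)
    delta-delta Ct = delta Ct(test sample) - delta Ct(calibrator sample)
    fold change    = 2 ** (-delta-delta Ct)
    """
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_calib = ct_target_calib - ct_ref_calib
    dd_ct = d_ct_test - d_ct_calib
    return 2 ** (-dd_ct)


# Made-up Ct values: the target crosses threshold two cycles later
# (relative to the reference gene) in the test sample than in the calibrator.
fold = relative_expression(24.0, 18.0, 22.0, 18.0)
print(fold)             # 0.25, i.e. four-fold lower expression
print(math.log2(fold))  # -2.0, the scale used on a log2 fold-change axis
```

A fold change below 1 (negative log2 value) corresponds to down-regulation relative to the calibrator, which is how the down-regulated genes in the spike would appear on a log2 fold-change plot.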