Abstract
Jakyakgamcho-tang (JGT), among the simplest forms of herbal medicine, comprises Paeoniae Radix (PR) and Glycyrrhizae Radix et Rhizoma (GR). It has been used to treat muscle-related diseases and inflammation. However, its pharmacological effects may vary with the proportions of ingredients and preparatory factors such as the extraction method, and gene expression datasets that systematically reflect these variables are lacking. A total of 513 RNA-seq transcriptome profiles were generated using three concentrations and three replicates. This dataset structure enables multidimensional analysis of how JGT preparation factors affect gene expression; these factors include the PR to GR ratio (2:1, 1:1, and 1:2), solvent (water or 70% ethanol), and extraction method (combined or individual extraction). The HepG2, C2C12, and PC12 cell lines were targeted. All raw and preprocessed data are available through GEO. Standardized metadata and ingredient data are also provided. This dataset provides a foundation for exploring the effects of traditional herbal formulations on cellular transcriptomic responses and can facilitate the scientific optimization of herbal medicines.
Background & Summary
Jakyakgamcho-tang (JGT), a traditional East Asian herbal medicine composed of Paeoniae Radix (PR) and Glycyrrhizae Radix et Rhizoma (GR), has long been used to treat muscle spasms and pain1,2. Recent studies have extended its indications to include muscular atrophy3, cognitive impairment4, and inflammation5, making it a candidate for multi-targeted therapeutic applications. To investigate the organ systems that are directly or indirectly associated with the pathophysiology of the muscle spasms and pain targeted by JGT6,7, we generated a multidimensional transcriptomic dataset of JGT treatment in C2C12, HepG2, and PC12 cells. This dataset systematically captures transcriptional responses across (1) different extraction solvents, (2) PR:GR mixing ratios, and (3) extraction methods, thereby providing a comprehensive molecular landscape of JGT’s biological activity.
Herbal medicine is influenced by the proportions of its constituent herbs, which is an important but underexplored relationship. Herbal formulas with high PR content have shown different therapeutic effects, such as a shift in indications. For instance, Gyeji-tang is used to treat common colds8, but Gyeji-ga-jakyak-tang, which contains twice as much PR, is used to treat stomachache9. The extraction method also influences herbal efficacy. The combined extraction method (CEM; mix before extraction) allows for chemical interactions between herbs during co-boiling (or sonication), and this may alter bioactive compound availability10. In contrast, the individual extraction method (IEM; extraction before mix) enhances standardization and preserves volatile compounds by extracting herbs individually before mixing11. Therefore, the pharmacological effects of herbal medicine may vary with the proportions of constituents and the extraction method. Pharmacological data incorporating the proportions of ingredients and the extraction method under the same conditions are needed.
With the growing interest in the systematic investigation of drug mechanisms of action, drug-induced transcriptomics has emerged as a scalable and informative approach. The Connectivity Map (CMap)12 was one of the early large-scale initiatives that laid the foundation for data-driven pharmacology by linking small molecules to gene expression changes. This was advanced by the LINCS L1000 project13, which increased both the scale and diversity of perturbagen-induced transcriptome data by profiling thousands of compounds across various cell lines and treatment conditions. These resources have enabled comprehensive comparisons of compound-induced transcriptional responses and facilitated the development of computational methods for inferring relationships among drugs, genes, and diseases14,15. CMap and LINCS primarily focused on FDA-approved and experimental small molecules, but recent efforts have aimed to establish transcriptomic datasets tailored to context-specific or culturally relevant substances, such as traditional herbal medicines16. The HERB database systematically curated more than 6,000 herb- or ingredient-induced expression profiles and cross-referenced them with targets, diseases, and FDA-approved drugs to provide a ready interface between ethnopharmacology and modern systems biology17. More recently, the KORE-Map dataset compiled 1,200 perturbome signatures for tonifying prescriptions, generated under standardized IEM-only protocols with both hot water and 70% ethanol solvents and standardized dosing conditions18. This dataset provides pathway- and network-level annotations in a reusable format.
However, these specialized resources do not provide data that allow comparative analyses of differences in herbal constituent proportions or between CEM and IEM, leaving an important gap in data-driven herbal medicine research. To address this gap, this study generated a transcriptome matrix containing a total of 513 RNA-seq profiles. The data were collated in an orthogonal design across three cell lines with three PR to GR ratios (2:1, 1:1, and 1:2), two solvents (water and 70% ethanol), and two extraction methods (CEM and IEM). The dataset was designed based on the rigorous methodologies of existing large-scale data resources, and raw and preprocessed data are publicly available via GEO. The purpose is to provide the first public dataset that allows quantitative analysis of the mechanisms of proportion- and extraction method-specific effects of traditional two-herb combinations.
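The orthogonal design described above can be sketched as a simple factorial grid. This is a minimal illustration that assumes only the factor levels named in the text; the variable and function names are hypothetical, and single-herb extracts and control samples are deliberately left out.

```python
from itertools import product

# Factor levels as stated in the text. Single-herb extracts and control
# samples are NOT enumerated here (assumption: they sit outside this grid).
CELL_LINES = ["HepG2", "C2C12", "PC12"]
RATIOS = ["2:1", "1:1", "1:2"]        # PR:GR, w/w
SOLVENTS = ["water", "70% ethanol"]
METHODS = ["CEM", "IEM"]


def mixture_conditions():
    """Return every cell line x ratio x solvent x method combination."""
    return [
        {"cell_line": c, "ratio": r, "solvent": s, "method": m}
        for c, r, s, m in product(CELL_LINES, RATIOS, SOLVENTS, METHODS)
    ]


conditions = mixture_conditions()
print(len(conditions))  # 3 * 3 * 2 * 2 = 36 mixture conditions
```

Because the 513-profile total also covers samples outside this grid, the count printed here is not expected to match it.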
Methods
Selection of herbs and experimental material
JGT is widely used in clinical practice19. Because it consists of only two herbal ingredients, it is well suited for generating transcriptome data across simple variations in mixing ratio and extraction method. The efficacy of herbal medicine may vary with the extraction solvent, mixing ratio, and extraction method. This study developed herbal formulas using (a) water and 70% ethanol solvents; (b) single herbs, herbs in a 2:1 ratio, herbs in a 1:1 ratio, and herbs in a 1:2 ratio; and (c) two extraction methods (CEM and IEM). Water extraction was chosen to reflect the classical decoction method commonly used in East Asian medicine, whereas 70% ethanol extraction was included to improve the recovery of both hydrophilic and lipophilic constituents, as supported by previous pharmacological studies20,21,22. According to the Korean Herbal Pharmacopoeia, JGT is standardized at a 1:1 ratio of PR to GR23. However, variations such as 2:1 or 1:2 have also been documented, with PR-rich formulations traditionally used for abdominal pain and muscle-related disorders and GR-rich formulations employed for anti-inflammatory and antispasmodic effects24,25,26. To capture these differences, we systematically included two solvents and three mixing ratios.
Preparation of herbs
The dried medicinal herbs used in the preparation of JGT, namely PR and GR, were procured from Kwangmyung-dang Medicinal Herbs Co. (Ulsan, Republic of Korea) in accordance with the Korean Pharmacopoeia. Organoleptic assessment was conducted by Dr. Goya Choi, a certified expert in herbal quality evaluation recognized by the Korea Food and Drug Administration. Botanical identification was confirmed via DNA barcoding, and voucher specimens were deposited at the Korean Herbarium of Standard Herbal Resources (KHSR) at the Herbal Medicine Resources Research Center, Korea Institute of Oriental Medicine (KIOM), Naju, Republic of Korea (Table 1). The detailed specimen records are accessible online (https://oasis.kiom.re.kr/herblib).
Preparation method of hot water and 70% ethanol extracts of herbs and JGT
Two preparation approaches were employed to investigate the effects of the constituent proportion, extraction solvent, and extract preparation method on the composition of JGT: (a) CEM (mix to extraction): combined extraction of ingredients after mixing of herbs; and (b) IEM (extraction to mix): extraction of individual ingredients followed by their mixing.
CEM
Five different ratios of Paeoniae Radix to Glycyrrhizae Radix et Rhizoma (0:3, 1:2, 1:1, 2:1, and 3:0, w/w) were used in blending the powdered raw herbs. A total of 900 g of each mixture was extracted using two different solvents: (a) hot water extraction (10 volumes of distilled water [1:10, w/v]) via reflux at 100 ± 2 °C for 3 hours with a reflux extraction system (MS-DM609; MTOPS, Seoul, South Korea); and (b) ethanol extraction (70% ethanol [1:4, w/v]) via ultrasonic treatment for 1 hour, followed by a second 1-hour cycle with fresh solvent (total extraction time: 2 hours) with an ultrasonication system (VCP-20, Lab Companion, Daejeon, South Korea). All samples were filtered through a 53-μm mesh filter after extraction. The filtrates were concentrated under reduced pressure at 60 °C using a rotary evaporator (Ev-1020, SciLab, Seoul, South Korea) and lyophilized using a freeze dryer (LP-20, Ilshin-Bio-Base, Dongducheon, South Korea). The dried extracts were homogenized using a mortar to ensure uniformity. The extraction yields for each combination are presented in Table 2.
IEM
The lyophilized single-herb extracts extracted separately as described above were mixed in fixed ratios (1:2, 1:1, and 2:1, w/w) based on their extraction yields (Table 3). All extracts were stored at 4 °C. For in vitro use, 100 mg of the extract was dissolved in 10 mL of phosphate buffered saline (PBS; Gibco, Thermo Fisher Scientific, Waltham, MA, USA) containing 2% dimethyl sulfoxide (DMSO; Sigma-Aldrich, St. Louis, MO, USA). The mixture was vortexed for 30 min, sterilized through a 0.22-μm Minisart RC syringe filter (Sartorius, Göttingen, Germany), and stored at −80 °C until use. The 10 mg/mL stock solution was prepared before use.
Quantitative analysis of major compounds by high-performance liquid chromatography
The major bioactive constituents in JGT extracts were quantified using high-performance liquid chromatography. Chromatographic separation was performed using a Waters e2695 separation module equipped with a 2998 photodiode array detector (Waters Corp., Milford, MA, USA) in an INNO C18(2) column (250 × 4.6 mm, i.d. 5 µm; Youngjin Biochrom, Seongnam, Republic of Korea) maintained at 40 °C. The sample compartment was kept at 25 °C, and the injection volume was 20 μL. The flow rate was 1.0 mL/min. The mobile phases consisted of (a) 0.5% acetic acid in water and (b) 0.5% acetic acid in acetonitrile. Gradient elution was programmed as follows: 0–10 min, 95% A; 10–25 min, 95 to 80% A; 25–35 min, 80% A; 35–45 min, 80 to 70% A; 45–55 min, 70% A; 55–70 min, 70 to 55% A; 70–85 min, 55% A; and 85–95 min, 55 to 45% A. All samples were prepared by dissolving lyophilized extracts in 80% methanol and filtering them through a 0.45 μm syringe filter before injection. Each sample was analyzed in triplicate. The quantitative results (mean ± SD, μg/mL) for each herbal mixing ratio (0:3, 1:2, 1:1, 2:1, 3:0, w/w) and solvent (water or 70% ethanol) are summarized in Tables 4, 5. This quantitative HPLC analysis was performed to confirm the presence and relative levels of major marker compounds under different PR:GR ratios and solvent conditions, providing chemical reference data to support the interpretation of transcriptomic responses.
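The gradient program above can be read as a piecewise function of time. The sketch below transcribes the stated segments and assumes linear ramps within each changing segment, which is the usual convention but is not stated explicitly in the text; the function names are hypothetical.

```python
# (start_min, end_min, %A at start, %A at end), transcribed from the text.
GRADIENT = [
    (0, 10, 95, 95),
    (10, 25, 95, 80),
    (25, 35, 80, 80),
    (35, 45, 80, 70),
    (45, 55, 70, 70),
    (55, 70, 70, 55),
    (70, 85, 55, 55),
    (85, 95, 55, 45),
]


def percent_a(t_min):
    """Percentage of mobile phase A at time t_min (assumes linear ramps)."""
    if not 0 <= t_min <= 95:
        raise ValueError("time is outside the 0-95 min gradient program")
    for start, end, a_start, a_end in GRADIENT:
        if start <= t_min <= end:
            fraction = (t_min - start) / (end - start)
            return a_start + fraction * (a_end - a_start)


def percent_b(t_min):
    """Percentage of mobile phase B, the remainder of the binary mixture."""
    return 100 - percent_a(t_min)


for t in (0, 17.5, 40, 90, 95):
    print(t, percent_a(t), percent_b(t))
```

Under this reading, mobile phase B rises from 5% at the start to 55% at 95 min.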
Cell culture and differentiation
Three cell lines were used in this study: HepG2 (a human hepatocellular carcinoma cell line), PC12 (a rat adrenal medulla-derived pheochromocytoma cell line), and C2C12 (a mouse skeletal muscle-derived myoblast cell line). These cell types were selected to represent the hepatic, neuronal, and muscular systems that are directly or indirectly associated with the pathophysiology of muscle spasms. All cell lines were obtained from the American Type Culture Collection (ATCC, Manassas, VA, USA) and confirmed to be free of mycoplasma contamination. Basal medium and supplements used for cell culture were purchased from Gibco, Thermo Fisher Scientific. The cell line information and culture conditions are summarized in Table 6. The HepG2 cells (ATCC HB-8065) were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin (P/S). Cultures were incubated at 37 °C in a humidified atmosphere with 5% CO2. The PC12 cells (ATCC CRL-1721) were cultured on 100 mm collagen I-coated culture dishes (Corning BioCoat, Corning Inc., NY, USA) in DMEM containing 10% heat-inactivated horse serum (HS), 5% non-heat-inactivated FBS, and 1% P/S. For neuronal differentiation, cells were seeded onto collagen IV-coated multi-well plates (Corning BioCoat) and induced using differentiation medium (DM) composed of DMEM, 0.5% FBS, 100 ng/mL nerve growth factor (NGF; R&D Systems, Minneapolis, MN, USA), 1% N2 supplement (R&D Systems), and 1% P/S. Media were partially exchanged on days 3 and 6 of differentiation, and neuronal morphology was monitored under phase-contrast microscopy (IX71, Olympus Corporation, Tokyo, Japan). Differentiated PC12 cells on day 7 were treated with various drugs for testing. C2C12 myoblasts (ATCC CRL-1772) were cultured in DMEM supplemented with 10% FBS and 1% P/S.
Upon reaching 80–90% confluence, cells were trypsinized and seeded into multi-well culture plates (Nunc, Thermo Fisher Scientific, Waltham, MA, USA). After 24 hours, the growth medium was replaced with differentiation medium composed of DMEM supplemented with 2% heat-inactivated horse serum and 1% P/S. The media were refreshed every 2 days to promote myotube formation, and differentiation was confirmed by morphological changes characteristic of myotube formation. Differentiated C2C12 cells on day 5 were treated with various drugs for testing. The ATCC does not provide the exact passage number at distribution for PC12 and C2C12 cells; therefore, thawing was designated as passage 1. Frozen vials were prepared at passage 10 for PC12 cells and passage 8 for C2C12 cells. For HepG2 cells, the cumulative passage number had reached 81 at the time of freezing. For all three cell lines, experiments were conducted using passages 3–10 after thawing.
Drug treatment and total RNA preparation for RNA sequencing (RNA-seq) analysis
The cells were treated after IC20 determination using the WST-8 cell viability assay (Biomax, Guri, Republic of Korea). Each JGT extract prepared using hot water (JGW) or 70% ethanol (JGE) was tested at a maximum concentration of 500 μg/mL on HepG2, PC12, and C2C12 cells. The IC20 values for each cell line and extract were calculated based on dose-response curves, and the mean ± SD for biological triplicates is reported. The results of the IC20 determination are summarized in Table 7. The extract concentrations used for mRNA sequencing analysis were selected based on these results. The IC20 was chosen as the reference concentration to minimize cytotoxicity while ensuring sufficient transcriptional perturbation. This sub-cytotoxic level (>80% viability) enables the identification of pharmacologically relevant gene expression changes without the confounding effects from extensive cell death. Concentrations of 100, 20, and 4 μg/mL were used for PC12 cells, while 500, 100, and 20 μg/mL were used for the C2C12 and HepG2 cells. These concentrations were not associated with cytotoxicity. The cells were seeded into 6-well plates at a predetermined density for an overnight culture (HepG2) or until differentiation (PC12 and C2C12) and treated with three serial concentrations of the extracts for 24 hours. At the time of RNA collection, HepG2 cells typically reached ~70–80% confluency, PC12 cells displayed ~60–70% confluency with extensive neurite outgrowth, and C2C12 cells were ~90–100% confluent with robust myotube formation.
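As a rough illustration of how an IC20 can be read off a dose-response curve, the sketch below interpolates linearly between the two tested doses that bracket 80% viability. The curve-fitting method actually used is not specified in the text, so the interpolation scheme, the function name, and the viability values are all assumptions for illustration only.

```python
def ic20_by_interpolation(doses, viability, target=80.0):
    """Estimate the dose at which viability first falls below `target` percent.

    Uses linear interpolation between bracketing dose points (an assumption;
    a fitted sigmoidal model is a common alternative). Returns None when
    viability never drops below the target within the tested range.
    """
    points = sorted(zip(doses, viability))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if v0 >= target > v1:
            return d0 + (v0 - target) * (d1 - d0) / (v0 - v1)
    return None


# Hypothetical viability readings (% of vehicle control), not measured data.
doses = [0, 31.25, 62.5, 125, 250, 500]      # ug/mL
toxic = [100, 98, 95, 90, 84, 72]
non_toxic = [100, 101, 99, 97, 95, 92]

ic20 = ic20_by_interpolation(doses, toxic)
print(round(ic20, 1))                               # 333.3
print(ic20_by_interpolation(doses, non_toxic))      # None
```

A `None` result corresponds to an extract that stays above 80% viability up to the highest tested dose, in which case the maximum concentration itself would be the natural upper bound for treatment.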
Treatment solutions were prepared from 20 × extract stock solutions dissolved in 2% DMSO/PBS, and a final DMSO concentration of 0.1% was maintained. In addition, a vehicle control group (2% DMSO/PBS) was included in all experiments to account for solvent-related effects. The cells were washed twice with 3 mL of ice-cold PBS after treatment, and total RNA was extracted using 1 mL QIAzol Lysis Reagent (Qiagen, Hilden, Germany) following the manufacturer’s guidelines. All samples were collected in triplicate. To validate RNA-seq performance and provide a reference for a transcriptional response, each cell line was treated concurrently with positive control compounds known to induce well-characterized transcriptional changes relevant to their respective tissue or phenotype. All positive control compounds were solubilized in 100% DMSO and diluted to working concentrations in PBS immediately before application. The treatment duration and RNA extraction followed the same protocol used for JGT extract treatment. The treated concentrations for each compound were determined based on reference and cytotoxicity assays. Information on the positive control drugs is summarized in Table 8.
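The dilution arithmetic behind the 20 × stock scheme can be checked with a short worked example. The function names and the 2 mL well volume are hypothetical; the 20-fold dilution, the 2% DMSO vehicle, and the 500 μg/mL top concentration come from the text.

```python
def stock_mg_per_ml(final_ug_per_ml, fold=20):
    """Stock concentration (mg/mL) needed for a given final concentration."""
    return final_ug_per_ml * fold / 1000.0


def final_dmso_percent(stock_dmso_percent=2.0, fold=20):
    """DMSO percentage in the well after diluting the stock `fold` times."""
    return stock_dmso_percent / fold


def stock_volume_ml(well_volume_ml, fold=20):
    """Volume of 20x stock to include in a well of the given total volume."""
    return well_volume_ml / fold


print(stock_mg_per_ml(500))      # 10.0 mg/mL, matching the stated stock
print(final_dmso_percent())      # 0.1 %
print(stock_volume_ml(2.0))      # 0.1 mL for a hypothetical 2 mL well
```

The top concentration of 500 μg/mL therefore corresponds to the 10 mg/mL stock described in the IEM section, and a 2% DMSO stock diluted 20-fold gives the stated 0.1% final DMSO.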
RNA-seq data generation and preprocessing
More than 500 ng of total RNA was extracted from each sample. RNA sequencing libraries were prepared using the MGIEasy RNA Directional Library Prep Kit (MGI Tech Co., Ltd., China) following the manufacturer’s instructions. Library concentrations were quantified using the QuantiFluor ONE dsDNA System (Promega Corporation, WI, USA). DNA nanoballs (DNBs) for sequencing were generated using DNB enzyme, and the libraries were quantified using the QuantiFluor ssDNA system (Promega Corporation, WI, USA). Paired-end sequencing (100 bp read length) was performed on the MGISeq system (MGI Tech Co., Ltd., China). The quality of the raw RNA-seq reads was assessed using FastQC (v0.11.9). Common MGISEQ adapter sequences were trimmed using TrimGalore (v0.6.6) to remove adapter contamination. High-quality reads were mapped to the respective reference genomes (HepG2 samples to hg38, C2C12 to mm10, and PC12 to Rn6) using STAR Aligner (v2.7.3a)27. The mapped reads were quantified for expression values for each gene using RSEM (v1.3.3)28. The raw sequence data (FASTQ files) and preprocessed expression matrix for each gene were deposited in Gene Expression Omnibus (GEO) under the accession numbers GSE299063 (ref. 29), GSE297726 (ref. 30), GSE295069 (refs. 31,32), GSE227494 (refs. 32,33), GSE298414 (ref. 34), and GSE289929 (ref. 35) (Table 9).
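One way to picture the preprocessing chain (FastQC, TrimGalore, STAR, RSEM) is as four command lines per sample. The sketch below only assembles argument lists and executes nothing. The specific options are assumptions about a typical paired-end invocation, not the authors' recorded parameters; every path and sample name is hypothetical, and the adapter sequences are left as placeholders.

```python
import shlex


def build_commands(sample, r1, r2, trimmed_r1, trimmed_r2,
                   star_index, rsem_ref, adapter_r1, adapter_r2, threads=8):
    """Return the four pipeline steps as argument lists (nothing is run)."""
    prefix = f"star/{sample}."
    return [
        # 1. Raw read quality report.
        ["fastqc", "-o", "qc", r1, r2],
        # 2. Adapter and low-quality (Phred < 20) trimming, paired-end mode.
        ["trim_galore", "--paired", "--quality", "20",
         "--adapter", adapter_r1, "--adapter2", adapter_r2,
         "--output_dir", "trimmed", r1, r2],
        # 3. Alignment; TranscriptomeSAM also emits a transcriptome BAM.
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", trimmed_r1, trimmed_r2,
         "--readFilesCommand", "zcat",
         "--quantMode", "TranscriptomeSAM",
         "--outSAMtype", "BAM", "Unsorted",
         "--outFileNamePrefix", prefix],
        # 4. Gene-level quantification from the transcriptome alignments.
        ["rsem-calculate-expression", "--paired-end", "--alignments",
         "-p", str(threads),
         f"{prefix}Aligned.toTranscriptome.out.bam", rsem_ref,
         f"rsem/{sample}"],
    ]


commands = build_commands(
    sample="S1",
    r1="raw/S1_R1.fastq.gz", r2="raw/S1_R2.fastq.gz",
    trimmed_r1="trimmed/S1_R1_val_1.fq.gz",   # assumed TrimGalore output name
    trimmed_r2="trimmed/S1_R2_val_2.fq.gz",   # assumed TrimGalore output name
    star_index="index/hg38_star", rsem_ref="index/hg38_rsem/ref",
    adapter_r1="<ADAPTER_R1>", adapter_r2="<ADAPTER_R2>",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Strandedness settings for the directional library and the species-specific index paths (hg38, mm10, Rn6) would need to be chosen per sample and are not covered here.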
Comparisons with external drug-induced transcriptomic profiles
To assess the reproducibility of our RNA-seq data, we compared drug-induced transcriptomic changes in HepG2 cells, the only one of the three cell lines used in this study (C2C12, PC12, and HepG2) that is included in the CMap database13. Corresponding CMap L1000 profiles were obtained for three positive control compounds (trichostatin A, wortmannin, and vorinostat) from the Clue.io platform (clue.io/data/CMap2020#LINCS2020, the level5_beta_trt_cp_n720216x12328.gctx file). The CMap L1000 dataset provides replicate-collapsed and quality-controlled moderated z-scores representing normalized gene expression changes upon drug-induced perturbations. To align with our experimental conditions, we selected signatures from HepG2 cells treated for 24 hours with trichostatin A (≤1 μM), wortmannin (10 μM), and vorinostat (10 μM). Only high-quality profiles that met the criteria distil_cc_q75 > 0.5 and pct_self_rank_q25 < 0.05 were retained. This resulted in 3 profiles for trichostatin A, 19 profiles for wortmannin, and 20 profiles for vorinostat. Data retrieval and filtering were performed using the cmapR R package (v1.8.0). To enable a cross-platform comparison between the RNA-seq and L1000 assay platforms, we transformed gene-level signatures into pathway-level activity scores by performing a Gene Set Enrichment Analysis (GSEA)36 as described in previous studies37,38,39. Genes were ranked according to drug-induced expression changes: for RNA-seq data, by the DESeq2 Wald statistic and for CMap data, by the level-5 moderated z-scores (MODZ). A unified ranking metric was not imposed because RNA-seq (count-based, negative binomial modeling) and CMap L1000 (z-score standardized expression) differ fundamentally in the measurement scale and statistical distribution. Enrichment was calculated for 2,229 curated gene sets from the Molecular Signatures Database (MSigDB), including Hallmark, KEGG, REACTOME, Biocarta, PID, and WikiPathways collections.
For each pathway \(g\), the pathway enrichment score (PES) was quantified as
where \({{NES}}_{g}\) and \({p}_{g}\) denote the normalized enrichment score and the nominal p-value from GSEA, respectively. A PES vector with a length of 2,229 was generated for each sample. Pearson correlation coefficients were then computed for comparisons between RNA-seq-derived and CMap-derived PES vectors to assess cross-dataset concordance. Comparisons were made under matched (same drug) and unmatched (different drugs) conditions. Matched correlations were defined as pairwise comparisons between PES vectors derived from the same compound (e.g., trichostatin A RNA-seq vs. trichostatin A L1000), whereas unmatched correlations were computed between different compounds among the three positives (e.g., trichostatin A RNA-seq vs. wortmannin L1000). When multiple L1000 profiles were available for a given compound, all pairwise correlations between its profiles and the corresponding RNA-seq signature were calculated and included in the matched or unmatched distribution. Differences between the matched and unmatched correlation distributions were evaluated by performing a two-sided Wilcoxon rank-sum test. The msigdbr and fgsea R packages were used to access the gene sets and implement GSEA. All analyses were performed using R software (v 4.2.1).
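The matched versus unmatched comparison can be sketched as follows, taking the PES vectors as given. This is an illustration in standard-library Python rather than the R workflow named above: the toy vectors are invented, the function names are hypothetical, and only the rank-sum U statistic is computed (a p-value would normally come from a statistics package).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matched_unmatched(rnaseq, l1000):
    """Split all RNA-seq vs L1000 correlations by whether the drugs agree.

    rnaseq: {drug: PES vector}; l1000: {drug: [PES vector, ...]}.
    """
    matched, unmatched = [], []
    for drug_r, vec_r in rnaseq.items():
        for drug_l, profiles in l1000.items():
            rs = [pearson(vec_r, p) for p in profiles]
            (matched if drug_r == drug_l else unmatched).extend(rs)
    return matched, unmatched


def rank_sum_u(a, b):
    """Mann-Whitney U statistic for sample a versus b (ties count 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Toy PES vectors of length 4 (real vectors have 2,229 entries).
rnaseq = {"drugA": [1, 2, 3, 4], "drugB": [4, 3, 2, 1]}
l1000 = {
    "drugA": [[1.1, 1.9, 3.2, 3.9], [0.8, 2.1, 2.9, 4.2]],
    "drugB": [[3.9, 3.1, 2.2, 0.9]],
}
matched, unmatched = matched_unmatched(rnaseq, l1000)
print(sum(matched) / len(matched), sum(unmatched) / len(unmatched))
print(rank_sum_u(matched, unmatched))  # equals len(matched) * len(unmatched)
```

When every matched correlation exceeds every unmatched one, U reaches its maximum of `len(matched) * len(unmatched)`, which is the extreme case the rank-sum test is designed to detect.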
Data Records
The dataset in this study is available at GEO under accession numbers GSE299063 (ref. 29), GSE297726 (ref. 30), GSE295069 (refs. 31,32), GSE227494 (refs. 32,33), GSE298414 (ref. 34), and GSE289929 (ref. 35). The corresponding accession links for each dataset can be found in Table 9. For each accession, both raw sequencing data (FASTQ files) and processed gene-level count matrices are provided, enabling full reprocessing as well as immediate downstream analyses. All GEO entries provide sample-level metadata as supplemental annotation files, including information on production methods, cell lines, and dosage, together with detailed descriptions of extraction methods, solvents, and combination ratios. The metadata are summarized in Supplementary Tables 1–3. The HPLC dataset generated in this study is publicly available via Figshare (https://doi.org/10.6084/m9.figshare.30962618)40. All raw and processed data, as well as associated metadata and calibration information, are provided to enable independent validation and reproducibility of the analyses. An overview of the standard operating procedure and conditions for generating the transcriptome data in this study is provided in Fig. 1.
Fig. 1. Transcriptome data production protocols. A brief overview of the standard operating procedure (SOP) and conditions for generating transcriptome data.
Technical Validation
To comprehensively present reliability, we validated the (i) input integrity and extract chemistry (botanical identity and HPLC profiling), (ii) sequence-level quality and mapping, (iii) internal reproducibility and cross-batch stability, and (iv) external benchmarking against independent drug-induced transcriptomic profiles.
Assessment of HPLC profiles of JGT
Quantitative HPLC analysis confirmed the systematic compositional differences depending on the PR:GR ratio and solvent (Tables 4, 5). PR-derived compounds (e.g., paeoniflorin, albiflorin) were increased proportionally in PR-rich formulations, whereas GR-derived compounds (e.g., glycyrrhizin, isoliquiritin) were predominant in GR-rich formulations. In addition, 70% ethanol extracts generally yielded higher concentrations of both PR- and GR-derived compounds compared with those in water extracts. Specifically, the PR-derived compound oxypaeoniflorin was extracted preferentially in the ethanol extract. These findings demonstrate that the chemical composition of JGT varies systematically with the ratio and solvent, providing an important context for interpreting the proportion- and solvent-specific transcriptomic profiles. These controlled, ratio- and solvent-dependent compositional differences establish well-defined chemical perturbations across conditions, which in turn contextualize the proportion- and solvent-specific transcriptomic responses described in the following sections.
Assessment of experimental reliability
The raw herbal materials were authenticated through DNA barcoding, and voucher specimens were deposited in the Korean Herbarium of Standard Herbal Resources. In addition, all cell-based experiments were performed based on independent biological triplicates, and positive control compounds with well-characterized transcriptional signatures were included to validate the responsiveness of each cell line.
Assessment of RNA quality and integrity
Comprehensive assessments of RNA purity and integrity were performed to ensure the suitability of the samples for downstream sequencing. Of the three cell lines, the stable HepG2 cells had the highest RNA purity and integrity. The C2C12 and PC12 cells, which undergo cellular differentiation processes, had lower values. The optical densities at 260 nm and 280 nm were measured using a Trinean Dropsense™ 96 microvolume spectrophotometer. The A260/280 ratio was used to estimate RNA purity, with values between 1.5 and 2.0 indicating relatively pure RNA. Most RNA samples had A260/280 ratios within this range, indicating minimal contamination by proteins or other impurities (Fig. 2a). The 28S/18S ribosomal RNA ratio (Fig. 2b) and RNA integrity number (RIN) were measured using an Agilent Bioanalyzer system. The average 28S to 18S rRNA ratio was 1.56, and most samples had a RIN ≥ 7, indicating high RNA integrity. Some samples with RIN values below 7, the generally accepted threshold, were nevertheless included in the analysis to ensure experimental reproducibility (Fig. 2c and Supplementary Table 4). Overall, these results indicate that the RNA samples were of suitable quality and integrity for subsequent RNA sequencing.
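The two thresholds quoted above (A260/280 between 1.5 and 2.0, RIN of at least 7) lend themselves to a simple flagging step when reusing the sample-level QC table. The sketch below is a hypothetical helper with invented sample values; it flags samples rather than excluding them, in keeping with the inclusion of some low-RIN samples.

```python
def qc_flags(a260_280, rin, purity_range=(1.5, 2.0), min_rin=7.0):
    """Return a list of QC flags for one RNA sample (empty list = no flags)."""
    flags = []
    low, high = purity_range
    if not low <= a260_280 <= high:
        flags.append("purity")       # A260/280 outside the quoted range
    if rin < min_rin:
        flags.append("integrity")    # RIN below the commonly used threshold
    return flags


# Hypothetical samples: (A260/280, RIN). These are not values from the study.
samples = {
    "HepG2_rep1": (1.95, 9.4),
    "PC12_rep2": (1.82, 6.5),
    "C2C12_rep3": (1.42, 7.8),
}
flagged = {name: qc_flags(*values) for name, values in samples.items()}
print(flagged)
```

Downstream users could then decide per analysis whether flagged samples should be down-weighted, inspected, or kept as is.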
Fig. 2. Quality evaluation of RNA samples used for sequencing. (a) The A260/A280 ratio for the three cell lines. (b) The 28S/18S rRNA ratio and (c) RNA integrity number for individual samples, stratified by the three cell lines. The dotted line represents the widely accepted minimum value indicating high RNA quality and integrity.
Quality of RNA-seq data
The quality of raw RNA-seq data was assessed using FastQC (v0.11.9), which provides comprehensive metrics including per-read quality score distributions. As shown in Fig. 3a, the majority of reads revealed high average quality scores, with a sharp peak around a Phred score of 36. The overall shape of the distribution indicated that most reads maintained consistently high base quality, with only a minor fraction showing quality degradation. Figure 3b shows the distribution of the GC content, demonstrating that the GC counts per read were similar to the theoretical distribution. This pattern was consistently observed across all samples, ensuring the reliability of downstream analyses. Adapter trimming and removal of low-quality bases (Phred score <20) were performed using TrimGalore (v0.6.6). A high proportion of uniquely aligned reads was observed across all cell lines after preprocessing, indicating reliable mapping efficiency to their respective reference genomes: 95.7% for HepG2 (hg38; Fig. 3c), 93.7% for C2C12 (mm10; Fig. 3d), and 90.2% for PC12 (Rn6; Fig. 3e).
Fig. 3. Quality assessment of RNA sequencing data. (a) Representative FastQC results showing per-sequence quality scores and (b) distribution of GC content. Read mapping summary showing the number of uniquely and multiply mapped reads for (c) HepG2, (d) C2C12, and (e) PC12 cell lines. The total number of reads and the percentage of uniquely mapped reads are indicated below each panel. bp, base pairs.
Biological and technical reproducibility across species
The following analyses were performed to examine replicate concordance and cross-batch stability to confirm internal reproducibility. We determined the correlation of expression values for 15,095 common protein-coding genes to quantify biological and technical batch effects and ensure the reproducibility of gene expression data across multiple species (human, mouse, and rat). We analyzed three independent biological replicates for each treatment condition, including cell line, extraction method, solvent, treatment ratio, and dosage, to assess biological reproducibility. We calculated the pairwise Pearson correlation coefficients between replicates to quantify the similarity of each replicate. The average correlation coefficient was 0.977 for all conditions, indicating high biological reproducibility. Furthermore, the average correlation of expression levels exceeded 0.90 in three replicates for 94.23% of conditions (Fig. 4a). To assess technical reproducibility and evaluate potential sequencing batch effects, we investigated data generated across eight sequencing batches from three different cell lines. Pearson correlation analyses of these control samples showed minimal batch effects, with all control sample pairs exhibiting high correlation coefficients (>0.90), despite being sequenced in different batches (Fig. 4b and Supplementary Table 5). These results collectively demonstrate that both biological and technical reproducibility were well maintained across experimental conditions.
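The replicate concordance metric described above (the mean of the pairwise Pearson correlations among three replicates, then the share of conditions above 0.90) can be sketched as follows. The expression values are toy numbers, the names are hypothetical, and whether the expression values were log-transformed before correlation is not stated in the text, so no transformation is applied here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_replicate_correlation(replicates):
    """Average Pearson r over all pairs of replicate expression vectors."""
    pairs = list(combinations(replicates, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy expression vectors over five genes; three replicates per condition.
conditions = {
    "consistent": [[1, 2, 3, 4, 5],
                   [1.1, 2.0, 2.9, 4.2, 5.1],
                   [0.9, 2.2, 3.1, 3.8, 5.0]],
    "noisy": [[1, 2, 3, 4, 5],
              [5, 1, 4, 2, 3],
              [2, 5, 1, 3, 4]],
}
scores = {name: mean_replicate_correlation(reps)
          for name, reps in conditions.items()}
share_above = sum(r > 0.90 for r in scores.values()) / len(scores)
print(scores, share_above)
```

In the real data the vectors would span the 15,095 common protein-coding genes, and the share above 0.90 is the quantity reported as 94.23%.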
Fig. 4. Replicability of RNA-seq profiles. (a) Distribution of average Pearson’s correlation coefficients between replicate samples (yellow; Same) and between replicate and non-replicate samples (gray; Different). The gray dashed line represents the mean value for each condition. (b) Heatmap of Pearson’s correlation coefficients for replicate samples across various sequencing batches. Correlations for vehicle samples in three cell lines are shown by batch.
Comparisons with external drug-induced transcriptomic profiles
As an independent benchmark of reliability, we compared pathway-level activity profiles for positive-control perturbations with an established external resource. To evaluate the similarity between our in-house RNA-seq data and external CMap transcriptomic signatures, we computed Pearson correlation coefficients between the pathway enrichment scores for each positive control drug under matched (same drug) and unmatched (different drugs) treatment conditions (Fig. 5). For trichostatin A, the average correlation with matched CMap profiles was 0.535 ± 0.037. In contrast, the correlations with unmatched CMap profiles were lower, with a mean of 0.209 ± 0.342. For wortmannin, the matched condition yielded a mean correlation of 0.532 ± 0.098, whereas the unmatched condition showed a considerably lower average of 0.200 ± 0.322. Similarly, the matched samples for vorinostat showed a mean correlation of 0.584 ± 0.035, relative to 0.181 ± 0.281 for unmatched samples. The correlation coefficients for all three compounds were consistently and significantly higher under matched conditions, supporting the transcriptomic reproducibility and external validity of our RNA-seq profiles in capturing drug-specific pathway perturbations.
Fig. 5. Concordance between RNA-seq data and CMap transcriptomic profiles. Density plots show the distribution of Pearson’s correlation coefficients for pathway activity scores derived from our RNA-seq data and CMap L1000 profiles for positive control compounds in HepG2 cells: (a) trichostatin A, (b) wortmannin, and (c) vorinostat. Correlations were computed under matched treatment conditions (same drug; yellow) and unmatched conditions (different drugs; gray). Dashed vertical lines indicate the mean correlation for each group. Statistical significance of the differences in correlation distributions between matched and unmatched groups was assessed using the Wilcoxon rank-sum test.
Data availability
The datasets generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession numbers GSE299063 (ref. 29), GSE297726 (ref. 30), GSE295069 (refs. 31,32), GSE227494 (refs. 32,33), GSE298414 (ref. 34), and GSE289929 (ref. 35) (Table 9). The HPLC dataset underlying this manuscript is publicly available via Figshare (https://doi.org/10.6084/m9.figshare.30962618; ref. 40). This includes all chromatographic files, processed quantitative data, and associated metadata necessary for reproducing the analyses presented in the Technical Validation section.
Code availability
The software used for RNA-seq data analysis, along with its parameters, is fully detailed in the Methods section. Unless otherwise noted, the default settings recommended by the software developers were applied. Dataset curation and validation were carried out using custom R scripts, as described in detail in the Methods section. Researchers are encouraged to cite this publication when using the RNA-seq data available in GEO.
References
Jung, W. S., Moon, S. K., Park, S. U., Ko, C. N. & Cho, K. H. Clinical assessment of usefulness, effectiveness and safety of jackyakamcho-tang (shaoyaogancao-tang) on muscle spasm and pain: a case series. Am J Chin Med 32, 611–620, https://doi.org/10.1142/S0192415X04002247 (2004).
Han, K. et al. Jakyakgamcho-tang in the relief of delayed-onset muscle soreness in healthy adults: study protocol for a randomized, double-blind, placebo-controlled, crossover design clinical trial. Trials 21, 211, https://doi.org/10.1186/s13063-020-4119-4 (2020).
Kim, A. et al. Jakyak-gamcho-tang, a decoction of Paeoniae Radix and Glycyrrhizae Radix et Rhizoma, ameliorates dexamethasone-induced muscle atrophy and muscle dysfunction. Phytomedicine 123, 155057, https://doi.org/10.1016/j.phymed.2023.155057 (2024).
Chiu, Y. J. et al. Formulated Chinese medicine Shaoyao Gancao Tang reduces NLRP1 and NLRP3 in Alzheimer’s disease cell and mouse models for neuroprotection and cognitive improvement. Aging (Albany NY) 13, 15620–15637, https://doi.org/10.18632/aging.203125 (2021).
Chen, I. C. et al. Formulated Chinese Medicine Shaoyao Gancao Tang Reduces Tau Aggregation and Exerts Neuroprotection through Anti-Oxidation and Anti-Inflammation. Oxid Med Cell Longev 2018, 9595741, https://doi.org/10.1155/2018/9595741 (2018).
Mehta, S. S. & Fallon, M. B. Muscle cramps in liver disease. Clin Gastroenterol Hepatol 11, 1385-1391; quiz e1380, https://doi.org/10.1016/j.cgh.2013.03.017 (2013).
Sawlani, K. & Katirji, B. Peripheral Nerve Hyperexcitability Syndromes. Continuum (Minneap Minn) 23, 1437–1450, https://doi.org/10.1212/CON.0000000000000520 (2017).
Baek, E. B. et al. Anti-inflammatory effect of Gyeji-tang in a chronic obstructive pulmonary disease mouse model induced by cigarette smoke and lipopolysaccharide. Pharm Biol 60, 2040–2048, https://doi.org/10.1080/13880209.2022.2131841 (2022).
Takayama, S. et al. Clinical Practice Guidelines and Evidence for the Efficacy of Traditional Japanese Herbal Medicine (Kampo) in Treating Geriatric Patients. Front Nutr 5, 66, https://doi.org/10.3389/fnut.2018.00066 (2018).
Bayliak, M. M., Burdyliuk, N. I. & Lushchak, V. I. Effects of pH on antioxidant and prooxidant properties of common medicinal herbs. Open Life Sci 11, 298–307, https://doi.org/10.1515/biol-2016-0040 (2016).
Cheung, H. P. et al. Comparison of chemical profiles and effectiveness between Erxian decoction and mixtures of decoctions of its individual herbs: a novel approach for identification of the standard chemicals. Chin Med 12, 1, https://doi.org/10.1186/s13020-016-0123-8 (2017).
Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935, https://doi.org/10.1126/science.1132939 (2006).
Subramanian, A. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437–1452 e1417, https://doi.org/10.1016/j.cell.2017.10.049 (2017).
Musa, A. et al. A review of connectivity map and computational approaches in pharmacogenomics. Brief Bioinform 19, 506–523, https://doi.org/10.1093/bib/bbw112 (2018).
Kwon, O. S., Kim, W., Cha, H. J. & Lee, H. In silico drug repositioning: from large-scale transcriptome data to therapeutics. Arch Pharm Res 42, 879–889, https://doi.org/10.1007/s12272-019-01176-3 (2019).
Lee, M. et al. Systems pharmacology approaches in herbal medicine research: a brief review. BMB Rep 55, 417–428, https://doi.org/10.5483/BMBRep.2022.55.9.102 (2022).
Fang, S. et al. HERB: a high-throughput experiment- and reference-guided database of traditional Chinese medicine. Nucleic Acids Res 49, D1197–D1206, https://doi.org/10.1093/nar/gkaa1063 (2021).
Park, M. et al. KORE-Map 1.0: Korean medicine Omics Resource Extension Map on transcriptome data of tonifying herbal medicine. Sci Data 11, 974, https://doi.org/10.1038/s41597-024-03734-x (2024).
Chen, F. P. et al. Modern use of Chinese herbal formulae from Shang-Han Lun. Chin Med J (Engl) 122, 1889–1894 (2009).
Tourabi, M. et al. Efficacy of various extracting solvents on phytochemical composition, and biological properties of Mentha longifolia L. leaf extracts. Sci Rep 13, 18028, https://doi.org/10.1038/s41598-023-45030-5 (2023).
Hwang, E. S. & Thi, N. D. Effects of Extraction and Processing Methods on Antioxidant Compound Contents and Radical Scavenging Activities of Laver (Porphyra tenera). Prev Nutr Food Sci 19, 40–48, https://doi.org/10.3746/pnf.2014.19.1.040 (2014).
Erhabor, J. O., Omokhua, A. G., Ondua, M., Abdalla, M. A. & McGaw, L. J. Pharmacological evaluation of hydro-ethanol and hot water leaf extracts of Bauhinia galpinii (Fabaceae): A South African ethnomedicinal plant. South African Journal of Botany 128, 28–34, https://doi.org/10.1016/j.sajb.2019.10.008 (2020).
Ministry of Food and Drug Safety (MFDS). Korean Herbal Pharmacopoeia. Notice No. 2023-93 (Dec 27,2023) edn, (Sejong, Korea: Ministry of Food and Drug Safety, 2023).
Kim, J. Y., Kim, M., Kim, R. Y., Park, W. K. & Park, Y. H. A 12-week, randomized, double-blind, placebo-controlled study assessing the efficacy of EGHB010, a standardized extract of Paeoniae radix and Glycyrrhizae radix, in patients with early age-related macular degeneration. Ann Transl Med 9, 541, https://doi.org/10.21037/atm-20-4701 (2021).
Wu, Y. et al. Shaoyao-Gancao Decoction, a famous Chinese medicine formula, protects against APAP-induced liver injury by promoting autophagy/mitophagy. Phytomedicine 135, 156053, https://doi.org/10.1016/j.phymed.2024.156053 (2024).
Bi, X., Gong, M. & Di, L. Review on prescription compatibility of shaoyao gancao decoction and reflection on pharmacokinetic compatibility mechanism of traditional chinese medicine prescription based on in vivo drug interaction of main efficacious components. Evid Based Complement Alternat Med 2014, 208129, https://doi.org/10.1155/2014/208129 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21, https://doi.org/10.1093/bioinformatics/bts635 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323, https://doi.org/10.1186/1471-2105-12-323 (2011).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE299063 (2025).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE297726 (2025).
Kim, A., Park, S. & Baek, S. GEO. https://identifiers.org/geo/GSE295069 (2025).
Kim, A. et al. Integration of Transcriptomic Analysis, Network Pharmacology, and Experimental Validation Demonstrates Enhanced Muscle-Protective Effects of Ethanol Extract of Jakyak-Gamcho-Tang. Antioxidants (Basel) 14, https://doi.org/10.3390/antiox14070795 (2025).
Cha, S. & Kim, N. GEO. https://identifiers.org/geo/GSE227494 (2023).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE298414 (2025).
Baek, S. GEO. https://identifiers.org/geo/GSE289929 (2025).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545–15550, https://doi.org/10.1073/pnas.0506580102 (2005).
Ramilowski, J. A. et al. Functional annotation of human long noncoding RNAs via molecular phenotyping. Genome Res 30, 1060–1072, https://doi.org/10.1101/gr.254219.119 (2020).
Kirouac, D. C. et al. Deconvolution of clinical variance in CAR-T cell pharmacology and response. Nat Biotechnol 41, 1606–1617, https://doi.org/10.1038/s41587-023-01687-x (2023).
Ho, J. S. Y. et al. TOP1 inhibition therapy protects against SARS-CoV-2-induced lethal inflammation. Cell 184, 2618–2632 e2617, https://doi.org/10.1016/j.cell.2021.03.051 (2021).
Baek, S. Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures. figshare https://doi.org/10.6084/m9.figshare.30962618 (2026).
Acknowledgements
Total RNA isolation and RNA-seq were conducted by LAS Co. Ltd., South Korea. This study was supported by Grant number KSN2235120 from the Korea Institute of Oriental Medicine.
Author information
Contributions
All authors contributed to the study. Conceptualization, S.C., J.M.Y., M.P., S.J.B., H.L. and S.M.P.; Investigation, S.J.B., H.L., S.M.P., A.K. and N.S.K.; Formal analysis, S.J.B., H.L., S.M.P. and J.M.Y.; Data curation, S.J.B., S.M.P., H.L. and J.M.Y.; Writing–original draft preparation, S.J.B., H.L., S.M.P., M.P., J.M.Y. and S.C.; Writing–review and editing, A.K., N.S.K., E.H.S., A.Y.L., Y.R.K., Y.J.K. and J.Y.S.; Funding acquisition, S.C. All authors reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Baek, SJ., Lee, H., Park, SM. et al. Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures. Sci Data 13, 367 (2026). https://doi.org/10.1038/s41597-026-06759-6
DOI: https://doi.org/10.1038/s41597-026-06759-6







