Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures

Baek, Su-Jin; Lee, Haeseung; Park, Sang-Min; Kim, Aeyung; Kim, No Soo; Seo, Eun-Hye; Lee, A Yeong; Kim, Yu Ri; Kim, Wook Jin; Seo, Kyu-Won; Park, Musun; Yi, Jin-Mu; Cha, Seongwon

doi:10.1038/s41597-026-06759-6


Data Descriptor
Open access
Published: 06 February 2026


Scientific Data volume 13, Article number: 367 (2026)


Abstract

Jakyakgamcho-tang (JGT), the simplest form of herbal medicine, comprises Paeoniae Radix (PR) and Glycyrrhizae Radix et Rhizoma (GR). It has been used to treat muscle-related diseases and inflammation. However, its pharmacological effects may vary with the proportions of ingredients and with preparatory factors such as the extraction method, and gene expression datasets that systematically reflect these variables are lacking. A total of 513 transcriptome profiles were generated by RNA-seq, with three concentrations and three replicates per condition. This dataset structure enables multidimensional analysis of the effects of various JGT preparation factors on gene expression; these factors include the PR to GR ratio (2:1, 1:1, and 1:2), the solvent (water or 70% ethanol), and the extraction method (combined or individual extraction). The HepG2, C2C12, and PC12 cell lines were targeted. All raw and preprocessed data are available through GEO. Standardized metadata and ingredient data are also provided. This dataset provides a foundation for exploring the effects of traditional herbal formulations on cellular transcriptomic responses and can facilitate the scientific optimization of herbal medicines.


Background & Summary

Jakyakgamcho-tang (JGT), a traditional East Asian herbal medicine composed of Paeoniae Radix (PR) and Glycyrrhizae Radix et Rhizoma (GR), has long been used to treat muscle spasms and pain^1,2. Recent studies have extended its indications to include muscular atrophy³, cognitive impairment⁴, and inflammation⁵, making it a candidate for multi-targeted therapeutic applications. To investigate the organ systems that are directly or indirectly associated with the pathophysiology of the muscle spasms and pain targeted by JGT^6,7, we generated a multidimensional transcriptomic dataset of JGT treatment in C2C12, HepG2, and PC12 cells. This dataset systematically captures transcriptional responses across (1) different extraction solvents, (2) PR:GR mixing ratios, and (3) extraction methods, thereby providing a comprehensive molecular landscape of JGT’s biological activity.

Herbal medicine is influenced by the proportions of its constituent herbs, which is an important but underexplored relationship. Herbal formulas with high PR content have shown different therapeutic effects, such as a shift in indications. For instance, Gyeji-tang is used to treat common colds⁸, but Gyeji-ga-jakyak-tang, which contains twice as much PR, is used to treat stomachache⁹. The extraction method also influences herbal efficacy. The combined extraction method (CEM; mix before extraction) allows for chemical interactions between herbs during co-boiling (or sonication), and this may alter bioactive compound availability¹⁰. In contrast, the individual extraction method (IEM; extraction before mix) enhances standardization and preserves volatile compounds by extracting herbs individually before mixing¹¹. Therefore, the pharmacological effects of herbal medicine may vary with the proportions of constituents and the extraction method. Pharmacological data incorporating the proportions of ingredients and the extraction method under the same conditions are needed.

With the growing interest in the systematic investigation of drug mechanisms of action, drug-induced transcriptomics has emerged as a scalable and informative approach. The Connectivity Map (CMap)¹² was one of the early large-scale initiatives that laid the foundation for data-driven pharmacology by linking small molecules to gene expression changes. This was advanced by the LINCS L1000 project¹³, which increased both the scale and diversity of perturbagen-induced transcriptome data by profiling thousands of compounds across various cell lines and treatment conditions. These resources have enabled comprehensive comparisons of compound-induced transcriptional responses and facilitated the development of computational methods for inferring relationships among drugs, genes, and diseases^14,15. CMap and LINCS primarily focused on FDA-approved and experimental small molecules, but recent efforts have aimed to establish transcriptomic datasets tailored to context-specific or culturally relevant substances, such as traditional herbal medicines¹⁶. The HERB database systematically curated more than 6,000 herb- or ingredient-induced expression profiles and cross-referenced them with targets, diseases, and FDA-approved drugs to provide a ready interface between ethnopharmacology and modern systems biology¹⁷. More recently, the KORE-Map dataset compiled 1,200 perturbome signatures for tonifying prescriptions, generated exclusively under standardized IEM protocols with both hot water and 70% ethanol solvents and standardized dosing conditions¹⁸. This dataset provides pathway- and network-level annotations in a reusable format.

However, these specialized resources do not provide data that allow comparative analyses of differences in herbal constituent proportions or between CEM and IEM, leaving an important gap in data-driven herbal medicine research. To address this gap, this study generated a transcriptome matrix containing a total of 513 RNA-seq profiles. The data were collected in an orthogonal design across three cell lines with three PR to GR ratios (2:1, 1:1, and 1:2), two solvents (water and 70% ethanol), and two extraction methods (CEM and IEM). The dataset was designed based on the rigorous methodologies of existing large-scale data resources, and raw and preprocessed data are publicly available via GEO. The purpose is to provide the first public dataset that allows quantitative analysis of the proportion- and extraction method-specific effects of traditional two-herb combinations.
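The factorial layout described above can be enumerated programmatically. The following Python sketch lists the combination-treatment conditions only. The dose values are taken from the Methods section, the variable and key names are illustrative rather than GEO metadata fields, and single-herb extracts, vehicle controls, and positive controls are deliberately left out, so the count it produces is smaller than the 513 profiles in the full dataset.

```python
from itertools import product

# Factors as described in the text; names are illustrative, not GEO field names.
CELL_LINES = ["HepG2", "C2C12", "PC12"]
RATIOS = ["2:1", "1:1", "1:2"]           # PR:GR, w/w
SOLVENTS = ["water", "70% ethanol"]
METHODS = ["CEM", "IEM"]
REPLICATES = 3

# Treatment concentrations (ug/mL) per cell line, from the Methods section.
DOSES = {
    "HepG2": [500, 100, 20],
    "C2C12": [500, 100, 20],
    "PC12": [100, 20, 4],
}

def enumerate_conditions():
    """Return one dict per cell line x ratio x solvent x method x dose."""
    conditions = []
    for cell, ratio, solvent, method in product(CELL_LINES, RATIOS, SOLVENTS, METHODS):
        for dose in DOSES[cell]:
            conditions.append({
                "cell_line": cell, "ratio": ratio, "solvent": solvent,
                "method": method, "dose_ug_ml": dose,
            })
    return conditions

conditions = enumerate_conditions()
n_profiles = len(conditions) * REPLICATES
print(len(conditions), "conditions;", n_profiles, "profiles before single herbs and controls")
```

Such a list can serve as a checklist when matching GEO sample annotations against the intended design.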

Methods

Selection of herbs and experimental material

JGT was selected for pharmacological transcriptome profiling because it is widely used in clinical practice¹⁹. It consists of only two herbal ingredients, which makes it suitable for generating transcriptome data across simple variations in mixing ratio and extraction method. The efficacy of herbal medicine may vary with the extraction solvent, mixing ratio, and extraction method. This study developed herbal formulas using (a) water and 70% ethanol solvents; (b) single herbs, herbs in a 2:1 ratio, herbs in a 1:1 ratio, and herbs in a 1:2 ratio; and (c) two extraction methods (CEM and IEM). Water extraction was chosen to reflect the classical decoction method commonly used in East Asian medicine, whereas 70% ethanol extraction was included to improve the recovery of both hydrophilic and lipophilic constituents, as supported by previous pharmacological studies^20,21,22. According to the Korean Herbal Pharmacopoeia, JGT is standardized at a 1:1 ratio of PR to GR²³. However, variations such as 2:1 or 1:2 have also been documented, with PR-rich formulations traditionally used for abdominal pain and muscle-related disorders and GR-rich formulations employed for anti-inflammatory and antispasmodic effects^24,25,26. To capture these differences, we systematically included two solvents and three mixing ratios.

Preparation of herbs

The dried medicinal herbs used to prepare JGT, namely PR and GR, were procured from Kwangmyung-dang Medicinal Herbs Co. (Ulsan, Republic of Korea) in accordance with the Korean Pharmacopoeia. Organoleptic assessment was conducted by Dr. Goya Choi, a certified expert in herbal quality evaluation recognized by the Korea Food and Drug Administration. Botanical identification was confirmed via DNA barcoding, and voucher specimens were deposited at the Korean Herbarium of Standard Herbal Resources (KHSR) at the Herbal Medicine Resources Research Center, Korea Institute of Oriental Medicine (KIOM), Naju, Republic of Korea (Table 1). The detailed specimen records are accessible online (https://oasis.kiom.re.kr/herblib).

Table 1 Botanical information of the medicinal herbs used in Jakyakgamcho-tang.


Preparation method of hot water and 70% ethanol extracts of herbs and JGT

Two preparation approaches were employed to investigate the effects of the constituent proportion, extraction solvent, and extract preparation method on the composition of JGT: (a) CEM (mix to extraction): combined extraction of ingredients after mixing of herbs; and (b) IEM (extraction to mix): extraction of individual ingredients followed by their mixing.

CEM

Five different ratios of Paeoniae Radix to Glycyrrhizae Radix et Rhizoma (0:3, 1:2, 1:1, 2:1, and 3:0, w/w) were used in blending the powdered raw herbs. A total of 900 g of each mixture was extracted using two different solvents: (a) hot water extraction (10 volumes of distilled water [1:10, w/v]) via reflux at 100 ± 2 °C for 3 hours with a reflux extraction system (MS-DM609; MTOPS, Seoul, South Korea); and (b) ethanol extraction (70% ethanol [1:4, w/v]) via ultrasonic treatment for 1 hour, followed by a second 1-hour cycle with fresh solvent (total extraction time: 2 hours) with an ultrasonication system (VCP-20, Lab Companion, Daejeon, South Korea). All samples were filtered through a 53-μm mesh filter after extraction. The filtrates were concentrated under reduced pressure at 60 °C using a rotary evaporator (Ev-1020, SciLab, Seoul, South Korea) and lyophilized using a freeze dryer (LP-20, Ilshin-Bio-Base, Dongducheon, South Korea). The dried extracts were homogenized using a mortar to ensure uniformity. The extraction yields for each combination are presented in Table 2.
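The quantities implied by this protocol follow from simple arithmetic, sketched below in Python. The function names are illustrative. The sketch assumes that a ratio of 1:10 (w/v) means 10 mL of solvent per gram of herb, and that the second sonication cycle used the same volume of fresh solvent as the first; the text does not state the latter explicitly.

```python
def herb_masses(total_g, pr_parts, gr_parts):
    """Split a blend of total_g grams into PR and GR masses for a w/w ratio."""
    parts = pr_parts + gr_parts
    return total_g * pr_parts / parts, total_g * gr_parts / parts

def solvent_volume_l(total_g, ml_per_g):
    """Solvent volume in litres for a 1:ml_per_g (w/v) herb-to-solvent ratio."""
    return total_g * ml_per_g / 1000.0

TOTAL_G = 900
RATIOS = [(0, 3), (1, 2), (1, 1), (2, 1), (3, 0)]  # PR:GR, w/w

for pr, gr in RATIOS:
    pr_g, gr_g = herb_masses(TOTAL_G, pr, gr)
    print(f"PR:GR {pr}:{gr} -> PR {pr_g:.0f} g, GR {gr_g:.0f} g")

water_l = solvent_volume_l(TOTAL_G, 10)    # 1:10 w/v, hot water reflux
ethanol_l = solvent_volume_l(TOTAL_G, 4)   # 1:4 w/v, per sonication cycle (assumed)
print(f"water: {water_l} L; 70% ethanol: {ethanol_l} L per cycle")
```

For example, the 1:2 blend corresponds to 300 g of PR and 600 g of GR in a 900 g batch.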

Table 2 Extraction yields of co-extracted JGT using different Paeoniae Radix (PR): Glycyrrhizae Radix et Rhizoma (GR) ratios.


IEM

The single herbs were extracted separately as described above, and the lyophilized extracts were then mixed in fixed ratios (1:2, 1:1, and 2:1, w/w) based on their extraction yields (Table 3). All extracts were stored at 4 °C. For in vitro use, 100 mg of the extract was dissolved in 10 mL of phosphate-buffered saline (PBS; Gibco, Thermo Fisher Scientific, Waltham, MA, USA) containing 2% dimethyl sulfoxide (DMSO; Sigma-Aldrich, St. Louis, MO, USA). The mixture was vortexed for 30 min, sterilized through a 0.22-μm Minisart RC syringe filter (Sartorius, Göttingen, Germany), and stored at −80 °C until use. The 10 mg/mL stock solution was prepared before use.

Table 3 Composition of JGT mixtures prepared by post-extraction mixing of individual herb extracts.


Quantitative analysis of major compounds by high-performance liquid chromatography

The major bioactive constituents in JGT extracts were quantified using high-performance liquid chromatography (HPLC). Chromatographic separation was performed using a Waters e2695 separation module equipped with a 2998 photodiode array detector (Waters Corp., Milford, MA, USA) and an INNO C18(2) column (250 × 4.6 mm i.d., 5 µm; Youngjin Biochrom, Seongnam, Republic of Korea) maintained at 40 °C. The sample compartment was kept at 25 °C, and the injection volume was 20 μL. The flow rate was 1.0 mL/min. The mobile phases consisted of (A) 0.5% acetic acid in water and (B) 0.5% acetic acid in acetonitrile. Gradient elution was programmed as follows: 0–10 min, 95% A; 10–25 min, 95 to 80% A; 25–35 min, 80% A; 35–45 min, 80 to 70% A; 45–55 min, 70% A; 55–70 min, 70 to 55% A; 70–85 min, 55% A; and 85–95 min, 55 to 45% A. All samples were prepared by dissolving lyophilized extracts in 80% methanol and filtering them through a 0.45 μm syringe filter before injection. Each sample was analyzed in triplicate. The quantitative results (mean ± SD, μg/mL) for each herbal mixing ratio (0:3, 1:2, 1:1, 2:1, 3:0, w/w) and solvent (water or 70% ethanol) are summarized in Tables 4 and 5. This quantitative HPLC analysis was performed to confirm the presence and relative levels of major marker compounds under different PR:GR ratios and solvent conditions, providing chemical reference data to support the interpretation of transcriptomic responses.
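The gradient program above can be written down as a list of breakpoints, which makes it easy to look up the mobile-phase composition at any time point. The Python sketch below transcribes the stated values and assumes linear ramps between breakpoints, which is the usual instrument behaviour but is not stated in the text. The function names are illustrative.

```python
# Gradient breakpoints (time in min, % mobile phase A) transcribed from the text.
GRADIENT = [(0, 95), (10, 95), (25, 80), (35, 80), (45, 70),
            (55, 70), (70, 55), (85, 55), (95, 45)]

def percent_a(t_min):
    """Return % A at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-95 min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)

def percent_b(t_min):
    """% B is the remainder, since the mobile phase is binary."""
    return 100 - percent_a(t_min)

for t in (5, 17.5, 40, 90):
    print(f"{t} min: {percent_a(t):.1f}% A / {percent_b(t):.1f}% B")
```

Midway through the 10–25 min ramp (17.5 min), for instance, the interpolated composition is 87.5% A.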

Table 4 Concentration of Paeoniae Radix-derived major compounds in co-extraction JGT extracts (μg/mL).


Table 5 Concentration of Glycyrrhizae Radix et Rhizoma-derived major compounds in co-extraction JGT extracts (μg/mL).


Cell culture and differentiation

Three cell lines were used in this study: HepG2 (a human hepatocellular carcinoma cell line), PC12 (a rat adrenal medulla-derived pheochromocytoma cell line), and C2C12 (a mouse skeletal muscle-derived myoblast cell line). These cell types were selected to represent the hepatic, neuronal, and muscular systems that are directly or indirectly associated with the pathophysiology of muscle spasms. All cell lines were obtained from the American Type Culture Collection (ATCC, Manassas, VA, USA) and confirmed to be free of mycoplasma contamination. Basal medium and supplements used for cell culture were purchased from Gibco, Thermo Fisher Scientific. The cell line information and culture conditions are summarized in Table 6.

The HepG2 cells (ATCC HB-8065) were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin (P/S). Cultures were incubated at 37 °C in a humidified atmosphere with 5% CO₂.

The PC12 cells (ATCC CRL-1721) were cultured on 100 mm collagen I-coated culture dishes (Corning BioCoat, Corning Inc., NY, USA) in DMEM containing 10% heat-inactivated horse serum (HS), 5% non-heat-inactivated FBS, and 1% P/S. For neuronal differentiation, cells were seeded onto collagen IV-coated multi-well plates (Corning BioCoat) and induced using differentiation medium (DM) composed of DMEM, 0.5% FBS, 100 ng/mL nerve growth factor (NGF; R&D Systems, Minneapolis, MN, USA), 1% N2 supplement (R&D Systems), and 1% P/S. Media were partially exchanged on days 3 and 6 of differentiation, and neuronal morphology was monitored under phase-contrast microscopy (IX71, Olympus Corporation, Tokyo, Japan). Differentiated PC12 cells were treated with the test substances on day 7.

C2C12 myoblasts (ATCC CRL-1772) were cultured in DMEM supplemented with 10% FBS and 1% P/S. Upon reaching 80–90% confluence, cells were trypsinized and seeded into multi-well culture plates (Nunc, Thermo Fisher Scientific, Waltham, MA, USA). After 24 hours, the growth medium was replaced with differentiation medium composed of DMEM supplemented with 2% heat-inactivated horse serum and 1% P/S. The media were refreshed every 2 days to promote myotube formation, and differentiation was confirmed by the characteristic morphological changes. Differentiated C2C12 cells were treated with the test substances on day 5.

The ATCC does not provide the exact passage number at distribution for PC12 and C2C12 cells; therefore, thawing was designated as passage 1. Frozen vials were prepared at passage 10 for PC12 cells and passage 8 for C2C12 cells. For HepG2 cells, the cumulative passage number had reached 81 at the time of freezing. For all three cell lines, experiments were conducted using passages 3–10 after thawing.

Table 6 Cell line information and culture conditions.


Drug treatment and total RNA preparation for RNA sequencing (RNA-seq) analysis

The cells were treated after IC20 determination using the WST-8 cell viability assay (Biomax, Guri, Republic of Korea). Each JGT extract prepared using hot water (JGW) or 70% ethanol (JGE) was tested at a maximum concentration of 500 μg/mL on HepG2, PC12, and C2C12 cells. The IC20 values for each cell line and extract were calculated based on dose-response curves, and the mean ± SD for biological triplicates is reported. The results of the IC20 determination are summarized in Table 7. The extract concentrations used for mRNA sequencing analysis were selected based on these results. The IC20 was chosen as the reference concentration to minimize cytotoxicity while ensuring sufficient transcriptional perturbation. This sub-cytotoxic level (>80% viability) enables the identification of pharmacologically relevant gene expression changes without the confounding effects from extensive cell death. Concentrations of 100, 20, and 4 μg/mL were used for PC12 cells, while 500, 100, and 20 μg/mL were used for the C2C12 and HepG2 cells. These concentrations were not associated with cytotoxicity. The cells were seeded into 6-well plates at a predetermined density for an overnight culture (HepG2) or until differentiation (PC12 and C2C12) and treated with three serial concentrations of the extracts for 24 hours. At the time of RNA collection, HepG2 cells typically reached ~70–80% confluency, PC12 cells displayed ~60–70% confluency with extensive neurite outgrowth, and C2C12 cells were ~90–100% confluent with robust myotube formation.
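The text does not state which curve-fitting model was used to derive the IC20 values. As one simple possibility, the Python sketch below estimates the concentration at which viability falls to 80% by linear interpolation on a log10 dose scale. The function name and the viability readings are invented for illustration and are not taken from Table 7.

```python
import math

def ic20_by_interpolation(doses, viability, target=80.0):
    """Estimate the dose giving `target` % viability by linear interpolation
    on log10(dose). Returns None if viability never falls to the target."""
    points = sorted(zip(doses, viability))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if v0 >= target >= v1 and v0 != v1:
            frac = (v0 - target) / (v0 - v1)
            log_d = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_d
    return None

# Made-up viability readings (% of vehicle control), for illustration only.
doses = [4, 20, 100, 500]             # ug/mL
viability = [99.0, 95.0, 85.0, 75.0]
estimate = ic20_by_interpolation(doses, viability)
print(f"estimated IC20: {estimate:.1f} ug/mL")
```

A `None` result corresponds to the case where viability stays above 80% across the tested range, in which case the IC20 lies above the maximum tested concentration.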

Table 7 IC₂₀ results to determine concentration for RNA-seq.


Treatment solutions were prepared from 20× extract stock solutions dissolved in 2% DMSO/PBS, so that a final DMSO concentration of 0.1% was maintained. In addition, a vehicle control group (2% DMSO/PBS) was included in all experiments to account for solvent-related effects. The cells were washed twice with 3 mL of ice-cold PBS after treatment, and total RNA was extracted using 1 mL QIAzol Lysis Reagent (Qiagen, Hilden, Germany) following the manufacturer’s guidelines. All samples were collected in triplicate. To validate RNA-seq performance and provide a reference for a transcriptional response, each cell line was treated concurrently with positive control compounds known to induce well-characterized transcriptional changes relevant to their respective tissue or phenotype. All positive control compounds were solubilized in 100% DMSO and diluted to working concentrations in PBS immediately before application. The treatment duration and RNA extraction followed the same protocol used for JGT extract treatment. The treatment concentration for each compound was determined based on the literature and cytotoxicity assays. Information on the positive control drugs is summarized in Table 8.
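The dilution arithmetic behind these figures is short enough to check directly. The Python sketch below uses illustrative function names and computes the stock concentration required for each treatment dose, together with the DMSO level after a 20-fold dilution of the 2% DMSO/PBS vehicle.

```python
DILUTION_FACTOR = 20          # 20x stock, as stated in the text
STOCK_DMSO_PERCENT = 2.0      # stock vehicle: 2% DMSO in PBS

def stock_concentration(final_ug_ml, factor=DILUTION_FACTOR):
    """Stock concentration (ug/mL) needed to reach final_ug_ml after dilution."""
    return final_ug_ml * factor

def final_dmso_percent(stock_percent=STOCK_DMSO_PERCENT, factor=DILUTION_FACTOR):
    """DMSO percentage in the well after diluting the stock."""
    return stock_percent / factor

for final in (500, 100, 20, 4):
    stock = stock_concentration(final)
    print(f"{final} ug/mL final -> {stock / 1000:g} mg/mL stock")
print(f"final DMSO: {final_dmso_percent()}%")
```

The highest dose (500 μg/mL) requires a 10 mg/mL stock, which is consistent with the 10 mg/mL stock solution described in the extract preparation, and 2% DMSO diluted 20-fold gives the stated 0.1%.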

Table 8 Positive control compounds for RNA-seq quality validation.


RNA-seq data generation and preprocessing

More than 500 ng of total RNA was extracted from each sample. RNA sequencing libraries were prepared using the MGIEasy RNA Directional Library Prep Kit (MGI Tech Co., Ltd., China) following the manufacturer’s instructions. Library concentrations were quantified using the QuantiFluor ONE dsDNA System (Promega Corporation, WI, USA). DNA nanoballs (DNBs) for sequencing were generated using DNB enzyme, and the libraries were quantified using the QuantiFluor ssDNA system (Promega Corporation, WI, USA). Paired-end sequencing (100 bp read length) was performed on the MGISeq system (MGI Tech Co., Ltd., China). The quality of the raw RNA-seq reads was assessed using FastQC (v0.11.9). Common MGISEQ adapter sequences were trimmed using TrimGalore (v0.6.6) to remove adapter contamination. High-quality reads were mapped to the respective reference genomes (HepG2 samples to hg38, C2C12 to mm10, and PC12 to rn6) using STAR (v2.7.3a)²⁷. Gene-level expression values were quantified from the mapped reads using RSEM (v1.3.3)²⁸. The raw sequence data (FASTQ files) and preprocessed expression matrix for each gene were deposited in Gene Expression Omnibus (GEO) under the accession numbers GSE299063²⁹, GSE297726³⁰, GSE295069^31,32, GSE227494^32,33, GSE298414³⁴, and GSE289929³⁵ (Table 9).
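For readers who wish to reprocess the FASTQ files, the steps above can be sketched as a sequence of command lines. The Python snippet below only builds the argument lists and executes nothing. It is an approximation under several assumptions that are not in the text: the index directory layout, the sample and file names, gzipped paired FASTQ inputs, and the hand-off of STAR's transcriptome-coordinate BAM to RSEM are all hypothetical choices, and adapter and strandedness options are omitted because the exact settings used are not reported.

```python
import os

# Reference assembly per cell line, as stated in the text.
GENOME = {"HepG2": "hg38", "C2C12": "mm10", "PC12": "rn6"}

def trimmed_name(fastq, mate, out_dir):
    """Trim Galore's default paired-end output name for a gzipped FASTQ."""
    base = os.path.basename(fastq)
    for ext in (".fastq.gz", ".fq.gz"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    return f"{out_dir}/{base}_val_{mate}.fq.gz"

def build_commands(sample, cell_line, fq1, fq2, index_root="indexes", out_dir="out"):
    """Return the four steps as argument lists; nothing is executed here."""
    genome = GENOME[cell_line]
    t1, t2 = trimmed_name(fq1, 1, out_dir), trimmed_name(fq2, 2, out_dir)
    prefix = f"{out_dir}/{sample}."
    bam = prefix + "Aligned.toTranscriptome.out.bam"  # written by STAR --quantMode
    return [
        ["fastqc", "--outdir", out_dir, fq1, fq2],
        ["trim_galore", "--paired", "--quality", "20", "--output_dir", out_dir, fq1, fq2],
        ["STAR", "--genomeDir", f"{index_root}/{genome}/star",
         "--readFilesIn", t1, t2, "--readFilesCommand", "zcat",
         "--quantMode", "TranscriptomeSAM", "--outSAMtype", "BAM", "Unsorted",
         "--outFileNamePrefix", prefix],
        ["rsem-calculate-expression", "--paired-end", "--alignments", bam,
         f"{index_root}/{genome}/rsem/{genome}", f"{out_dir}/{sample}"],
    ]

# Hypothetical sample and file names, for illustration only.
commands = build_commands("sample01", "PC12", "sample01_1.fastq.gz", "sample01_2.fastq.gz")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools and the STAR and RSEM indexes for the relevant assembly are in place.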

Table 9 Sample information.


Comparisons with external drug-induced transcriptomic profiles

To assess the reproducibility of our RNA-seq data, we compared drug-induced transcriptomic changes in HepG2 cells, the only one of the three cell lines used in this study (C2C12, PC12, and HepG2) that is included in the CMap database¹³. Corresponding CMap L1000 profiles were obtained for three positive control compounds (trichostatin A, wortmannin, and vorinostat) from the Clue.io platform (clue.io/data/CMap2020#LINCS2020, the level5_beta_trt_cp_n720216x12328.gctx file). The CMap L1000 dataset provides replicate-collapsed and quality-controlled moderated z-scores representing normalized gene expression changes upon drug-induced perturbations. To align with our experimental conditions, we selected signatures from HepG2 cells treated for 24 hours with trichostatin A (≤1 μM), wortmannin (10 μM), and vorinostat (10 μM). Only high-quality profiles that met the criteria distil_cc_q75 > 0.5 and pct_self_rank_q25 < 0.05 were retained. This resulted in 3 profiles for trichostatin A, 19 profiles for wortmannin, and 20 profiles for vorinostat. Data retrieval and filtering were performed using the cmapR R package (v1.8.0). To enable a cross-platform comparison between the RNA-seq and L1000 assay platforms, we transformed gene-level signatures into pathway-level activity scores by performing a Gene Set Enrichment Analysis (GSEA)³⁶ as described in previous studies^37,38,39. Genes were ranked according to drug-induced expression changes: for RNA-seq data, by the DESeq2 Wald statistic, and for CMap data, by the level-5 MODZ z-scores. A unified ranking metric was not imposed because RNA-seq (count-based, negative binomial modeling) and CMap L1000 (z-score standardized expression) differ fundamentally in measurement scale and statistical distribution. Enrichment was calculated for 2,229 curated gene sets from the Molecular Signatures Database (MSigDB), including the Hallmark, KEGG, REACTOME, Biocarta, PID, and WikiPathways collections.
For each pathway $g$, the pathway enrichment score (PES) was quantified as

$$\mathrm{PES}_{g}=\operatorname{sign}(\mathrm{NES}_{g})\times \left(-\log_{10}(p_{g})\right),$$

where $\mathrm{NES}_{g}$ and $p_{g}$ denote the normalized enrichment score and the nominal p-value from GSEA, respectively. A PES vector with a length of 2,229 was generated for each sample. Pearson correlation coefficients were then computed for comparisons between RNA-seq-derived and CMap-derived PES vectors to assess cross-dataset concordance. Comparisons were made under matched (same drug) and unmatched (different drugs) conditions. Matched correlations were defined as pairwise comparisons between PES vectors derived from the same compound (e.g., trichostatin A RNA-seq vs. trichostatin A L1000), whereas unmatched correlations were computed between different compounds among the three positive controls (e.g., trichostatin A RNA-seq vs. wortmannin L1000). When multiple L1000 profiles were available for a given compound, all pairwise correlations between its profiles and the corresponding RNA-seq signature were calculated and included in the matched or unmatched distribution. Differences between the matched and unmatched correlation distributions were evaluated by performing a two-sided Wilcoxon rank-sum test. The msigdbr and fgsea R packages were used to access the gene sets and implement GSEA. All analyses were performed using R software (v4.2.1).
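The scoring and comparison steps can be illustrated with a small Python sketch. The authors performed the analysis in R; the version below is a language-neutral restatement of the PES formula and the Pearson correlation, applied to toy GSEA results for three hypothetical pathways. The pathway names, NES values, and p-values are invented, and the sketch assumes nominal p-values strictly greater than zero.

```python
import math

def pes(nes, p_value):
    """Signed pathway enrichment score: sign(NES) * -log10(p)."""
    sign = (nes > 0) - (nes < 0)
    return sign * -math.log10(p_value)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pes_vector(gsea_results, pathways):
    """gsea_results maps pathway name -> (NES, nominal p); order follows pathways."""
    return [pes(*gsea_results[name]) for name in pathways]

# Toy GSEA results for three hypothetical pathways (not real data).
pathways = ["PATHWAY_A", "PATHWAY_B", "PATHWAY_C"]
rnaseq = {"PATHWAY_A": (2.1, 0.001), "PATHWAY_B": (-1.8, 0.01), "PATHWAY_C": (0.9, 0.5)}
l1000 = {"PATHWAY_A": (1.7, 0.01), "PATHWAY_B": (-1.5, 0.05), "PATHWAY_C": (-0.8, 0.6)}

r = pearson(pes_vector(rnaseq, pathways), pes_vector(l1000, pathways))
print(f"PES correlation between the two toy signatures: {r:.3f}")
```

In the actual analysis each vector has 2,229 entries, and the correlation is computed for every RNA-seq signature against every retained L1000 profile before the matched and unmatched distributions are compared.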

Data Records

The dataset in this study is available at the GEO under accession numbers GSE299063²⁹, GSE297726³⁰, GSE295069^31,32, GSE227494^32,33, GSE298414³⁴, and GSE289929³⁵. The corresponding accession links for each dataset can be found in Table 9. For each accession, both raw sequencing data (FASTQ files) and processed gene-level count matrices are provided, enabling full reprocessing as well as immediate downstream analyses. All GEO entries provide sample-level metadata as supplemental annotation files, including information on production methods, cell lines, dosage, as well as detailed descriptions of extraction methods, solvents, and combination ratios. The metadata are summarized in Supplementary Tables 1–3. The HPLC dataset generated in this study is publicly available via Figshare (https://doi.org/10.6084/m9.figshare.30962618)⁴⁰. All raw and processed data, as well as associated metadata and calibration information, are provided to enable independent validation and reproducibility of the analyses. An overview of the standard operating procedure and conditions for generating the transcriptome data in this study is provided in Fig. 1.

Technical Validation

To comprehensively present reliability, we validated the (i) input integrity and extract chemistry (botanical identity and HPLC profiling), (ii) sequence-level quality and mapping, (iii) internal reproducibility and cross-batch stability, and (iv) external benchmarking against independent drug-induced transcriptomic profiles.

Assessment of HPLC profiles of JGT

Quantitative HPLC analysis confirmed systematic compositional differences depending on the PR:GR ratio and solvent (Tables 4 and 5). PR-derived compounds (e.g., paeoniflorin, albiflorin) increased proportionally in PR-rich formulations, whereas GR-derived compounds (e.g., glycyrrhizin, isoliquiritin) were predominant in GR-rich formulations. In addition, 70% ethanol extracts generally yielded higher concentrations of both PR- and GR-derived compounds than water extracts. Specifically, the PR-derived compound oxypaeoniflorin was extracted preferentially in the ethanol extract. These controlled, ratio- and solvent-dependent compositional differences establish well-defined chemical perturbations across conditions, which in turn contextualize the proportion- and solvent-specific transcriptomic responses described in the following sections.

Assessment of experimental reliability

The raw herbal materials were authenticated through DNA barcoding, and voucher specimens were deposited in the Korean Herbarium of Standard Herbal Resources. In addition, all cell-based experiments were performed based on independent biological triplicates, and positive control compounds with well-characterized transcriptional signatures were included to validate the responsiveness of each cell line.

Assessment of RNA quality and integrity

Comprehensive assessments of RNA purity and integrity were performed to ensure the suitability of the samples for downstream sequencing. Of the three cell lines, the stable HepG2 cells had the highest RNA purity and integrity. The C2C12 and PC12 cells, which undergo cellular differentiation processes, had lower values. The optical densities at 260 nm and 280 nm were measured using a Trinean Dropsense™ 96 microvolume spectrophotometer. The A260/280 ratio was used to estimate RNA purity, with values between 1.5 and 2.0 indicating relatively pure RNA. Most RNA samples had A260/280 ratios within this range, indicating minimal contamination by proteins or other impurities (Fig. 2a). The 28S/18S ribosomal RNA ratio (Fig. 2b) and RNA integrity number (RIN) were measured using an Agilent Bioanalyzer system. The average 28S to 18S rRNA ratio was 1.56, and the RIN was generally ≥ 7, indicating high RNA integrity. Some samples with RIN values below this generally accepted threshold were nevertheless included in the analysis to ensure experimental reproducibility (Fig. 2c and Supplementary Table 4). These results demonstrated that the RNA samples were of suitable quality and integrity for subsequent RNA sequencing.

Quality of RNA-seq data

The quality of raw RNA-seq data was assessed using FastQC (v0.11.9), which provides comprehensive metrics including per-read quality score distributions. As shown in Fig. 3a, the majority of reads had high average quality scores, with a sharp peak around a Phred score of 36. The overall shape of the distribution indicated that most reads maintained consistently high base quality, with only a minor fraction showing quality degradation. Figure 3b shows the distribution of the GC content, demonstrating that the GC counts per read were similar to the theoretical distribution. This pattern was consistently observed across all samples, ensuring the reliability of downstream analyses. Adapter trimming and removal of low-quality bases (Phred score <20) were performed using TrimGalore (v0.6.6). A high proportion of uniquely aligned reads was observed across all cell lines after preprocessing, indicating reliable mapping efficiency to their respective reference genomes: 95.7% for HepG2 (hg38; Fig. 3c), 93.7% for C2C12 (mm10; Fig. 3d), and 90.2% for PC12 (rn6; Fig. 3e).

Biological and technical reproducibility across species

To confirm internal reproducibility, we examined replicate concordance and cross-batch stability. To quantify biological and technical batch effects and ensure the reproducibility of gene expression data across multiple species (human, mouse, and rat), we determined the correlation of expression values for 15,095 common protein-coding genes. To assess biological reproducibility, we analyzed three independent biological replicates for each treatment condition, defined by cell line, extraction method, solvent, treatment ratio, and dosage. We calculated the pairwise Pearson correlation coefficients between replicates to quantify their similarity. The average correlation coefficient was 0.977 for all conditions, indicating high biological reproducibility. Furthermore, the average correlation of expression levels across the three replicates exceeded 0.90 for 94.23% of conditions (Fig. 4a). To assess technical reproducibility and evaluate potential sequencing batch effects, we compared control samples generated across eight sequencing batches from the three cell lines. Pearson correlation analyses of these control samples showed minimal batch effects, with all control sample pairs exhibiting high correlation coefficients (>0.90), despite being sequenced in different batches (Fig. 4b and Supplementary Table 5). These results collectively demonstrate that both biological and technical reproducibility were well maintained across experimental conditions.
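The replicate-concordance summary can be reproduced in outline as follows. This Python sketch computes the mean pairwise Pearson correlation among the replicates of each condition and the share of conditions above a 0.90 threshold. The toy expression vectors are invented, the function names are illustrative, and the text does not specify whether expression values were transformed (for example, log-scaled) before correlation, so no transformation is applied here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_pairwise_correlation(replicates):
    """Average Pearson r over all replicate pairs of one condition."""
    pairs = list(combinations(replicates, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def fraction_above(condition_to_replicates, threshold=0.90):
    """Share of conditions whose mean replicate correlation exceeds threshold."""
    means = [mean_pairwise_correlation(r) for r in condition_to_replicates.values()]
    return sum(m > threshold for m in means) / len(means)

# Toy expression vectors (hypothetical values; real vectors span 15,095 genes).
toy = {
    "condition_1": [[5.0, 2.0, 8.0, 1.0], [5.2, 1.9, 7.7, 1.1], [4.9, 2.2, 8.1, 0.8]],
    "condition_2": [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.5, 2.0], [2.0, 4.0, 1.0, 3.0]],
}
share = fraction_above(toy)
print(f"{share:.0%} of toy conditions exceed a mean replicate correlation of 0.90")
```

With three replicates per condition there are three replicate pairs, so each condition contributes the mean of three correlation coefficients.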

Comparisons with external drug-induced transcriptomic profiles

As an independent benchmark of reliability, we compared pathway-level activity profiles for positive-control perturbations with an established external resource. To evaluate the similarity between our in-house RNA-seq data and external CMap transcriptomic signatures, we computed Pearson correlation coefficients between the pathway enrichment scores for each positive control drug under matched (same drug) and unmatched (different drugs) treatment conditions (Fig. 5). For trichostatin A, the average correlation with matched CMap profiles was 0.535 ± 0.037. In contrast, the correlations with unmatched CMap profiles were lower, with a mean of 0.209 ± 0.342. For wortmannin, the matched condition yielded a mean correlation of 0.532 ± 0.098, whereas the unmatched condition showed a considerably lower average of 0.200 ± 0.322. Similarly, the matched samples for vorinostat showed a mean correlation of 0.584 ± 0.035, relative to 0.181 ± 0.281 for unmatched samples. The correlation coefficients for all three compounds were consistently and significantly higher under matched conditions, supporting the transcriptomic reproducibility and external validity of our RNA-seq profiles in capturing drug-specific pathway perturbations.

Data availability

The datasets generated in this study have been deposited in the Gene Expression Omnibus (GEO) under accession numbers GSE299063²⁹, GSE297726³⁰, GSE295069³¹,³², GSE227494³²,³³, GSE298414³⁴, and GSE289929³⁵ (Table 9). The HPLC dataset underlying this manuscript is publicly available via Figshare (https://doi.org/10.6084/m9.figshare.30962618)⁴⁰. The Figshare deposit includes all chromatographic files, processed quantitative data, and associated metadata necessary for reproducing the analyses presented in the Technical Validation section.

Code availability

The software used for RNA-seq data analysis and its parameters are fully detailed in the Methods section. Unless otherwise noted, the default settings recommended by the software developers were applied. Dataset curation and validation were carried out using custom R scripts, as described in detail in the Methods section. Researchers are encouraged to cite this publication when using the RNA-seq data available in GEO.

References

Jung, W. S., Moon, S. K., Park, S. U., Ko, C. N. & Cho, K. H. Clinical assessment of usefulness, effectiveness and safety of jackyakamcho-tang (shaoyaogancao-tang) on muscle spasm and pain: a case series. Am J Chin Med 32, 611–620, https://doi.org/10.1142/S0192415X04002247 (2004).
Han, K. et al. Jakyakgamcho-tang in the relief of delayed-onset muscle soreness in healthy adults: study protocol for a randomized, double-blind, placebo-controlled, crossover design clinical trial. Trials 21, 211, https://doi.org/10.1186/s13063-020-4119-4 (2020).
Kim, A. et al. Jakyak-gamcho-tang, a decoction of Paeoniae Radix and Glycyrrhizae Radix et Rhizoma, ameliorates dexamethasone-induced muscle atrophy and muscle dysfunction. Phytomedicine 123, 155057, https://doi.org/10.1016/j.phymed.2023.155057 (2024).
Chiu, Y. J. et al. Formulated Chinese medicine Shaoyao Gancao Tang reduces NLRP1 and NLRP3 in Alzheimer’s disease cell and mouse models for neuroprotection and cognitive improvement. Aging (Albany NY) 13, 15620–15637, https://doi.org/10.18632/aging.203125 (2021).
Chen, I. C. et al. Formulated Chinese Medicine Shaoyao Gancao Tang Reduces Tau Aggregation and Exerts Neuroprotection through Anti-Oxidation and Anti-Inflammation. Oxid Med Cell Longev 2018, 9595741, https://doi.org/10.1155/2018/9595741 (2018).
Mehta, S. S. & Fallon, M. B. Muscle cramps in liver disease. Clin Gastroenterol Hepatol 11, 1385–1391; quiz e1380, https://doi.org/10.1016/j.cgh.2013.03.017 (2013).
Sawlani, K. & Katirji, B. Peripheral Nerve Hyperexcitability Syndromes. Continuum (Minneap Minn) 23, 1437–1450, https://doi.org/10.1212/CON.0000000000000520 (2017).
Baek, E. B. et al. Anti-inflammatory effect of Gyeji-tang in a chronic obstructive pulmonary disease mouse model induced by cigarette smoke and lipopolysaccharide. Pharm Biol 60, 2040–2048, https://doi.org/10.1080/13880209.2022.2131841 (2022).
Takayama, S. et al. Clinical Practice Guidelines and Evidence for the Efficacy of Traditional Japanese Herbal Medicine (Kampo) in Treating Geriatric Patients. Front Nutr 5, 66, https://doi.org/10.3389/fnut.2018.00066 (2018).
Bayliak, M. M., Burdyliuk, N. I. & Lushchak, V. I. Effects of pH on antioxidant and prooxidant properties of common medicinal herbs. Open Life Sci 11, 298–307, https://doi.org/10.1515/biol-2016-0040 (2016).
Cheung, H. P. et al. Comparison of chemical profiles and effectiveness between Erxian decoction and mixtures of decoctions of its individual herbs: a novel approach for identification of the standard chemicals. Chin Med 12, 1, https://doi.org/10.1186/s13020-016-0123-8 (2017).
Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935, https://doi.org/10.1126/science.1132939 (2006).
Subramanian, A. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437–1452 e1417, https://doi.org/10.1016/j.cell.2017.10.049 (2017).
Musa, A. et al. A review of connectivity map and computational approaches in pharmacogenomics. Brief Bioinform 19, 506–523, https://doi.org/10.1093/bib/bbw112 (2018).
Kwon, O. S., Kim, W., Cha, H. J. & Lee, H. In silico drug repositioning: from large-scale transcriptome data to therapeutics. Arch Pharm Res 42, 879–889, https://doi.org/10.1007/s12272-019-01176-3 (2019).
Lee, M. et al. Systems pharmacology approaches in herbal medicine research: a brief review. BMB Rep 55, 417–428, https://doi.org/10.5483/BMBRep.2022.55.9.102 (2022).
Fang, S. et al. HERB: a high-throughput experiment- and reference-guided database of traditional Chinese medicine. Nucleic Acids Res 49, D1197–D1206, https://doi.org/10.1093/nar/gkaa1063 (2021).
Park, M. et al. KORE-Map 1.0: Korean medicine Omics Resource Extension Map on transcriptome data of tonifying herbal medicine. Sci Data 11, 974, https://doi.org/10.1038/s41597-024-03734-x (2024).
Chen, F. P. et al. Modern use of Chinese herbal formulae from Shang-Han Lun. Chin Med J (Engl) 122, 1889–1894 (2009).
Tourabi, M. et al. Efficacy of various extracting solvents on phytochemical composition, and biological properties of Mentha longifolia L. leaf extracts. Sci Rep 13, 18028, https://doi.org/10.1038/s41598-023-45030-5 (2023).
Hwang, E. S. & Thi, N. D. Effects of Extraction and Processing Methods on Antioxidant Compound Contents and Radical Scavenging Activities of Laver (Porphyra tenera). Prev Nutr Food Sci 19, 40–48, https://doi.org/10.3746/pnf.2014.19.1.040 (2014).
Erhabor, J. O., Omokhua, A. G., Ondua, M., Abdalla, M. A. & McGaw, L. J. Pharmacological evaluation of hydro-ethanol and hot water leaf extracts of Bauhinia galpinii (Fabaceae): A South African ethnomedicinal plant. South African Journal of Botany 128, 28–34, https://doi.org/10.1016/j.sajb.2019.10.008 (2020).
Ministry of Food and Drug Safety (MFDS). Korean Herbal Pharmacopoeia. Notice No. 2023-93 (Dec 27, 2023) edn, (Sejong, Korea: Ministry of Food and Drug Safety, 2023).
Kim, J. Y., Kim, M., Kim, R. Y., Park, W. K. & Park, Y. H. A 12-week, randomized, double-blind, placebo-controlled study assessing the efficacy of EGHB010, a standardized extract of Paeoniae radix and Glycyrrhizae radix, in patients with early age-related macular degeneration. Ann Transl Med 9, 541, https://doi.org/10.21037/atm-20-4701 (2021).
Wu, Y. et al. Shaoyao-Gancao Decoction, a famous Chinese medicine formula, protects against APAP-induced liver injury by promoting autophagy/mitophagy. Phytomedicine 135, 156053, https://doi.org/10.1016/j.phymed.2024.156053 (2024).
Bi, X., Gong, M. & Di, L. Review on prescription compatibility of shaoyao gancao decoction and reflection on pharmacokinetic compatibility mechanism of traditional chinese medicine prescription based on in vivo drug interaction of main efficacious components. Evid Based Complement Alternat Med 2014, 208129, https://doi.org/10.1155/2014/208129 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21, https://doi.org/10.1093/bioinformatics/bts635 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323, https://doi.org/10.1186/1471-2105-12-323 (2011).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE299063 (2025).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE297726 (2025).
Kim, A., Park, S. & Baek, S. GEO. https://identifiers.org/geo/GSE295069 (2025).
Kim, A. et al. Integration of Transcriptomic Analysis, Network Pharmacology, and Experimental Validation Demonstrates Enhanced Muscle-Protective Effects of Ethanol Extract of Jakyak-Gamcho-Tang. Antioxidants (Basel) 14, https://doi.org/10.3390/antiox14070795 (2025).
Cha, S. & Kim, N. GEO. https://identifiers.org/geo/GSE227494 (2023).
Baek, S., Cha, S. & Yi, J. GEO. https://identifiers.org/geo/GSE298414 (2025).
Baek, S. GEO. https://identifiers.org/geo/GSE289929 (2025).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545–15550, https://doi.org/10.1073/pnas.0506580102 (2005).
Ramilowski, J. A. et al. Functional annotation of human long noncoding RNAs via molecular phenotyping. Genome Res 30, 1060–1072, https://doi.org/10.1101/gr.254219.119 (2020).
Kirouac, D. C. et al. Deconvolution of clinical variance in CAR-T cell pharmacology and response. Nat Biotechnol 41, 1606–1617, https://doi.org/10.1038/s41587-023-01687-x (2023).
Ho, J. S. Y. et al. TOP1 inhibition therapy protects against SARS-CoV-2-induced lethal inflammation. Cell 184, 2618–2632 e2617, https://doi.org/10.1016/j.cell.2021.03.051 (2021).
Baek, S. Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures. figshare https://doi.org/10.6084/m9.figshare.30962618 (2026).

Acknowledgements

Total RNA isolation and RNA-seq were conducted by LAS Co. Ltd., South Korea. This study was supported by Grant number KSN2235120 from the Korea Institute of Oriental Medicine.

Author information

These authors contributed equally: Su-Jin Baek, Haeseung Lee, Sang-Min Park.

Authors and Affiliations

Korean Medicine (KM) Data Division, Korea Institute of Oriental Medicine, Daejeon, 34054, Republic of Korea
Su-Jin Baek, Eun-Hye Seo, A Yeong Lee, Musun Park & Seongwon Cha
College of Pharmacy and Research Institute for Drug Development, Pusan National University, Busan, 46241, Republic of Korea
Haeseung Lee
College of Pharmacy, Chungnam National University, Daejeon, 34134, Republic of Korea
Sang-Min Park & Kyu-Won Seo
KM Application Center, Korea Institute of Oriental Medicine, Daegu, 41062, Republic of Korea
Aeyung Kim
KM Convergence Research Division, Korea Institute of Oriental Medicine, Daejeon, 34054, Republic of Korea
No Soo Kim, Yu Ri Kim & Jin-Mu Yi
Herbal Medicine Resources Research Center, Korea Institute of Oriental Medicine, Naju, 58245, Republic of Korea
Wook Jin Kim

Contributions

All authors contributed to the study. Conceptualization, S.C., J.M.Y., M.P., S.J.B., H.L. and S.M.P.; Investigation, S.J.B., H.L., S.M.P., A.K. and N.S.K.; Formal analysis, S.J.B., H.L., S.M.P. and J.M.Y.; Data curation, S.J.B., S.M.P., H.L. and J.M.Y.; Writing–original draft preparation, S.J.B., H.L., S.M.P., M.P., J.M.Y. and S.C.; Writing–review and editing, A.K., N.S.K., E.H.S., A.Y.L., Y.R.K., Y.J.K. and J.Y.S.; Funding acquisition, S.C. All authors reviewed and edited the manuscript.

Corresponding authors

Correspondence to Musun Park, Jin-Mu Yi or Seongwon Cha.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Table S1 (XLSX)
Table S2 (XLSX)
Table S3 (XLSX)
Table S4 (XLSX)
Table S5 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Baek, SJ., Lee, H., Park, SM. et al. Multidimensional transcriptome dataset for systematic evaluation of Jakyakgamcho-tang-induced cell signatures. Sci Data 13, 367 (2026). https://doi.org/10.1038/s41597-026-06759-6

Received: 25 June 2025
Accepted: 29 January 2026
Published: 06 February 2026
Version of record: 13 March 2026
DOI: https://doi.org/10.1038/s41597-026-06759-6
