Abstract
PIWI-interacting RNAs (piRNAs) have been implicated in the biological processes of various cancers. This study aimed to investigate the diagnostic potential of circulating piRNAs in breast cancer (BC) using machine learning (ML) frameworks. A serum tri-piRNA signature (piR-139966, piR-2572505, piR-2570061) was selected via piRNA sequencing, validated by qPCR, and then analyzed in combination with related clinical factors. Predictive ML models for early diagnosis of BC combining piRNA expression with CA153 were constructed using 10 ML algorithms and evaluated by 8 performance metrics. Serum levels of piR-139966, piR-2572505, and piR-2570061 were significantly upregulated in early-stage BC patients compared to matched healthy controls. This tri-piRNA panel demonstrated enhanced diagnostic precision for BC detection and exhibited complementary value to CA153 measurements, whether used alone or combined. Through systematic ML optimization, we developed a stratified diagnostic model in which the XGBoost algorithm showed optimal performance in both training and validation cohorts for early-stage BC identification. By applying the XGBoost algorithm to piRNA expression together with CA153, we developed and validated a predictive ML model with superior diagnostic accuracy compared to conventional approaches.
Introduction
Breast cancer (BC) remains a significant global health burden, accounting for approximately 30% of female malignancies worldwide with steadily increasing incidence1. Early diagnosis is crucial for therapeutic success, as localized tumors exhibit > 90% 5-year survival rates compared to approximately 30% in metastatic disease2. Unlike traditional diagnostic approaches, circulating biomarkers overcome the limitations of tissue heterogeneity, enabling non-invasive serial sampling and dynamic tumor monitoring3. This capacity to resolve spatial-temporal heterogeneity provides unprecedented precision for early BC detection.
PiRNAs (Piwi-interacting RNAs), a class of 24–31 nucleotide non-coding RNAs (ncRNAs), interact with PIWI proteins to maintain genome stability through transposon silencing and epigenetic regulation4,5. Their dysregulation in tumors disrupts these protective mechanisms, and individual piRNAs can exert either oncogenic or tumor-suppressive effects on oncogenesis, metastasis, and therapy resistance6,7,8. For instance, piRNA-14633 acts as an oncogene by stabilizing METTL14 mRNA to enhance m6A methylation9. Conversely, the tumor-suppressive piRNA-36712 inhibits oncogenic SEPW1 signaling via competitive binding to SEPW1P pseudogene RNA, thereby restraining disease progression and chemoresistance10.
PiRNAs exhibit significant diagnostic potential in oncology due to their tumor-specific expression profiles and stability in biofluids11. They circulate as free molecules, protein-bound complexes, or exosome-encapsulated cargo12. Characteristic 3’ terminal 2’-O-methylation confers enhanced stability and superior resistance to enzymatic degradation compared to other ncRNAs13,14. For example, piR-57125 remained highly stable in serum/plasma samples even after repeated freeze-thaw cycles or prolonged room-temperature storage14. Clinically, serum exosomal levels of piR-019308 and piR-004918 are significantly elevated in gastric cancer patients, with further increases observed in metastatic disease, suggesting utility for both diagnosis and disease monitoring15. Most recently, a study established machine learning (ML) classifiers using computationally optimized piRNA sequence descriptors, achieving up to 90.7% accuracy (Logistic Regression, LR) in BC prediction16. Collectively, these findings highlight disease-specific piRNA expression patterns and their promising potential as non-invasive biomarkers for early cancer detection17.
ML learns statistical patterns from data to resolve classification challenges through iterative optimization of model parameters during training18. This artificial intelligence (AI)-driven methodology has demonstrated significant utility in predicting clinical outcomes across diverse medical conditions19,20. In this study, we leveraged ML frameworks to evaluate the diagnostic potential of circulating piRNAs in BC. A serum tri-piRNA signature (piR-139966, piR-2572505, piR-2570061) was identified through piRNA sequencing and validated in a large-scale cohort, exhibiting superior diagnostic accuracy for early-stage BC. Predictive ML models integrating piRNA expression with CA153 were developed and validated, demonstrating significant improvements over conventional biomarkers and advancing liquid biopsy applications for precision oncology.
Materials and methods
Study population
Serum samples were collected from treatment-naïve BC patients recruited at Shandong Cancer Hospital Affiliated to Shandong First Medical University and Shandong Academy of Medical Sciences (March 2023 to March 2024). Controls were selected from physical examination departments meeting the following criteria: no history of malignancy, benign breast neoplasms, or abnormal inflammatory/physiological indicators (e.g., heart rate, blood pressure, temperature). Controls were age- and gender-matched to BC patients. All BC diagnoses followed AJCC 8th edition TNM staging criteria. CA153 levels were measured using Roche cobas e801 immunoassay analyzer (Roche, Shanghai, China). This study was approved by the Ethics Committee of Shandong Cancer Hospital Affiliated to Shandong First Medical University and Shandong Academy of Medical Sciences, which granted a waiver of informed consent for the analysis of residual clinical samples remaining after routine diagnostic testing.
Serum preparation and RNA extraction
Blood samples were collected in serum separation tubes and centrifuged at 3,000 × g for 10 min to separate serum from cellular components, followed by another centrifugation at 12,000 × g for 10 min at 4 °C to remove any residual particulate matter. The resulting supernatant was harvested and stored at − 80 °C until use. Total serum RNA was extracted using 750 µl of TRIzol® LS Reagent (Thermo Fisher, Carlsbad, CA, USA) per 250 µl of serum according to the reagent instructions.
PiRNA sequencing
PiRNA sequencing libraries were constructed from serum RNA of 5 patients (3 at TNM I-IIA, 2 at TNM III-IV) and 5 healthy controls, followed by Illumina sequencing. Quality control was performed using FastQC, with subsequent adapter trimming via cutadapt. Sequencing reads were aligned to piRBase using NovoAlign (v2.07.11) with a maximum of 2 mismatches allowed. Differentially expressed piRNAs were identified as those exhibiting |log2(fold change)| >2 with p < 0.05 based on TPM-normalized counts.
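The filtering rule described above can be illustrated with a minimal Python sketch. It assumes that TPM here means tags per million mapped reads (raw counts scaled by library size, without length correction), that fold change is the ratio of group means with an optional pseudocount, and that the p-value comes from a separate test that the text does not specify. The function names and the piR-A/piR-B/piR-C identifiers are hypothetical.

```python
import math

def to_tpm(counts):
    """Scale the raw read counts of one library to tags per million."""
    total = sum(counts.values())
    return {name: 1e6 * c / total for name, c in counts.items()}

def log2_fold_change(case_tpm, ctrl_tpm, pseudo=1.0):
    """log2 ratio of mean case TPM to mean control TPM (pseudocount avoids log of zero)."""
    case_mean = sum(case_tpm) / len(case_tpm)
    ctrl_mean = sum(ctrl_tpm) / len(ctrl_tpm)
    return math.log2((case_mean + pseudo) / (ctrl_mean + pseudo))

def is_differential(log2fc, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Apply the |log2(fold change)| > 2 and p < 0.05 thresholds."""
    return abs(log2fc) > fc_cutoff and p_value < p_cutoff

# Hypothetical toy library with three piRNAs (not study data).
library = {"piR-A": 50, "piR-B": 150, "piR-C": 800}
tpm = to_tpm(library)

# Hypothetical TPM values for one piRNA in two cases and two controls.
lfc = log2_fold_change([40.0, 40.0], [4.0, 6.0], pseudo=0.0)
print(tpm["piR-A"], lfc, is_differential(lfc, p_value=0.01))
```

With only five samples per group, a rule of this kind is sensitive to outliers, which is one reason candidates are re-tested by qPCR in a larger cohort.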
Reverse transcription and quantification by real-time PCR
Reverse transcription was performed using the miRNA 1st strand cDNA synthesis kit (Accurate Biotechnology (Hunan) Co., Ltd., Changsha, China). Quantitative PCR (qPCR) analysis was conducted on the LightCycler 480 system (Roche, Basel, Switzerland) using SYBR Green Premix Pro Taq HS qPCR Kit (Accurate Biotechnology). U6 small nuclear RNA was used as the endogenous reference gene. Relative piRNA expression levels were calculated by the comparative cycle threshold (CT) method, ΔCT = CT(piRNA) − CT(U6), as previously described. All qPCR primer sequences are listed in Tables 1 and S1.
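The comparative CT calculation can be sketched in a few lines of Python. The ΔCT definition follows the text; converting ΔCT to relative expression as 2^(−ΔCT), and comparing groups through 2^(−ΔΔCT), is the conventional form of the method and is assumed here rather than stated in the source. The function names and CT values are hypothetical.

```python
def delta_ct(ct_pirna, ct_u6):
    """Delta CT: target piRNA CT minus the U6 reference CT."""
    return ct_pirna - ct_u6

def relative_expression(ct_pirna, ct_u6):
    """Relative expression as 2^(-delta CT); assumes ~100% PCR efficiency."""
    return 2.0 ** (-delta_ct(ct_pirna, ct_u6))

def fold_change(delta_ct_case, delta_ct_control):
    """Fold change between groups as 2^(-delta delta CT)."""
    return 2.0 ** (-(delta_ct_case - delta_ct_control))

# Hypothetical CT values (not study data).
print(delta_ct(28.0, 25.0))             # 3.0
print(relative_expression(28.0, 25.0))  # 0.125
print(fold_change(3.0, 5.0))            # 4.0
```

A lower ΔCT means higher expression, so a case ΔCT of 3 against a control ΔCT of 5 corresponds to a four-fold increase.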
ML model building strategy
A computational framework integrating piRNA expression and CA153 was developed to evaluate 10 ML algorithms for early breast cancer detection. The algorithms included K-nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Artificial Neural Networks (ANN), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost). All analyses were implemented in Python 3.10 using standard libraries, including scikit-learn (v1.5.2; https://scikit-learn.org), XGBoost (v1.7.4; https://xgboost.readthedocs.io), and LightGBM (v4.0.0; https://lightgbm.readthedocs.io). The dataset comprised 102 early-stage BC patients and 122 healthy volunteers. Because the two classes were approximately balanced, no artificial balancing techniques were employed during model construction. Model efficacy was systematically evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and other performance metrics (sensitivity, specificity, accuracy, positive predictive value [PPV], negative predictive value [NPV], and F1-score). Clinical utility was quantified via decision curve analysis (DCA), and model interpretability was assessed using SHAP (SHapley Additive exPlanations, v0.46.0; https://shap.readthedocs.io/en/latest/). For initial model screening, a 70/30 train-test split was used to compare classifier performance, which identified XGBoost as the top candidate. This selection was then validated using bootstrap-based cross-validation with 80/20 partitions to obtain more robust performance estimates.
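The threshold-based metrics listed above all derive from the four cells of a confusion matrix. The following Python sketch shows the standard definitions; the function name and the toy labels are hypothetical, and the study itself would have relied on library routines rather than this hand-rolled version.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, PPV, NPV and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Hypothetical toy example: 4 patients (1) and 6 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Unlike AUC, each of these values depends on the probability cut-off used to turn model scores into class labels, so the cut-off should be reported alongside them.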
Cross-validation was performed using Bootstrap methodology, which involved 1000 iterations of resampling with replacement to generate distinct training sets. Each resampled dataset was partitioned into 80% training and 20% validation subsets. For each iteration, ROC curve metrics were calculated on the validation set. AUC values were collected, and the aggregate True Positive Rate (TPR) was computed as the mean TPR across all iterations (np.mean(tprs)). Model discriminative performance was summarized by the mean AUC, and variability was quantified by its standard deviation.
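The resampling loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a simple centroid-difference scorer stands in for XGBoost so that the example needs only the Python standard library, the data are synthetic, and the averaging of interpolated TPR curves is omitted. All function names are hypothetical.

```python
import random
import statistics

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_scorer(X, y):
    """Stand-in model: project onto (positive centroid - negative centroid)."""
    dims = range(len(X[0]))
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [statistics.fmean([r[d] for r in pos]) - statistics.fmean([r[d] for r in neg])
         for d in dims]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def bootstrap_auc(X, y, n_iter=1000, train_frac=0.8, seed=0):
    """Resample with replacement, split 80/20, and collect validation AUCs."""
    rng = random.Random(seed)
    n = len(y)
    cut = int(train_frac * n)
    aucs = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        train, valid = idx[:cut], idx[cut:]
        y_tr = [y[i] for i in train]
        y_va = [y[i] for i in valid]
        if len(set(y_tr)) < 2 or len(set(y_va)) < 2:
            continue  # skip resamples that lack one of the classes
        score = fit_centroid_scorer([X[i] for i in train], y_tr)
        aucs.append(auc(y_va, [score(X[i]) for i in valid]))
    return statistics.fmean(aucs), statistics.stdev(aucs)

# Synthetic two-feature data (not study data): 60 cases, 60 controls.
data_rng = random.Random(1)
X = [[data_rng.gauss(1.0, 1.0), data_rng.gauss(1.0, 1.0)] for _ in range(60)]
X += [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(60)]
y = [1] * 60 + [0] * 60
mean_auc, sd_auc = bootstrap_auc(X, y, n_iter=200)
print(round(mean_auc, 3), round(sd_auc, 3))
```

One design point worth noting: when a dataset is resampled with replacement before being split, copies of the same sample can land in both the training and validation subsets, which may make the estimate optimistic for flexible models. Evaluating on the samples left out of each resample (out-of-bag) is a common alternative.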
Statistical analysis
Statistical analyses were performed using SPSS (Statistical Package for the Social Sciences, version 26.0; IBM, Armonk, NY, USA; https://www.ibm.com/products/spss-statistics) and GraphPad Prism (version 9.5; San Diego, CA, USA; https://www.graphpad.com). The Kolmogorov-Smirnov test was used to assess the normality of the data. For normally distributed data, the unpaired t-test was used to compare two groups and one-way analysis of variance (ANOVA) to compare multiple groups. Otherwise, the Mann-Whitney U test was used for two groups and the Kruskal-Wallis test for multiple groups. ROC curve analysis was performed to evaluate diagnostic efficacy, with calculation of the AUC, sensitivity, specificity, and 95% confidence intervals (CI). Statistical significance was defined as two-tailed p < 0.05.
Results
Differential PiRNAs identification
Small RNA sequencing was performed on libraries constructed from serum samples of 5 patients with BC and 5 healthy controls. Comparative analysis identified differentially expressed piRNAs between the two groups, as visualized using a heatmap (Fig. 1A), scatter plot (Fig. 1B) and volcano plot (Fig. 1C). Six piRNAs were initially selected for validation in an expanded cohort (24 BC patients vs. 24 healthy volunteers). Unexpectedly, piR-4468896, piR-1948986 and piR-4488028 did not exhibit significant differential expression. In contrast, piR-139966, piR-2572505, and piR-2570061 differed significantly between groups and were therefore selected for further large-scale validation. The sequences of these piRNAs are provided in Table S2.
Serum tri-piRNA signature as the biomarker for early diagnosis of BC
Next, the tri-piRNA signature (piR-139966/piR-2572505/piR-2570061) was validated in a large cohort comprising 161 BC patients and 129 healthy volunteers. Serum levels of all three piRNAs were significantly upregulated in BC patients (Fig. 2A). Detailed clinical characteristics of BC patients are summarized in Table 2. Notably, piR-139966 expression showed a specific correlation with lymph node metastasis (p = 0.0015). ROC analysis demonstrated individual diagnostic efficacy for BC detection, with AUCs of 0.7492 for piR-139966, 0.7894 for piR-2572505, and 0.7183 for piR-2570061, respectively (Fig. 2B). When combined, the tri-piRNA signature exhibited improved diagnostic performance with an AUC of 0.8125 (60.25% sensitivity and 93.02% specificity; Fig. 2C), confirming their potential as non-invasive BC biomarkers.
Serum tri-piRNA signature as biomarkers for BC diagnosis. (A) Differential expression of serum piR-139966, piR-2572505, and piR-2570061 between BC patients and healthy donors. (B) The AUCs of serum piR-139966, piR-2572505, and piR-2570061 were 0.7492, 0.7894, and 0.7183, respectively. (C) The combination of the three piRNAs achieved an AUC of 0.8125. Mann-Whitney U test; ****P < 0.0001.
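The ROC quantities reported above (AUC plus a sensitivity/specificity pair) can be reproduced in outline with the Python sketch below. The text does not state how the operating point was chosen; maximizing Youden's index (sensitivity + specificity − 1) is assumed here as a common convention. The function names and the scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def roc_points(labels, scores):
    """(threshold, sensitivity, specificity) for every distinct score used as a cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        # A sample is called positive when its score is >= the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points

def youden_cutoff(labels, scores):
    """Operating point that maximizes sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)

# Hypothetical marker scores (not study data): 3 controls (0), 5 patients (1).
labels = [0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
area = auc(labels, scores)
best_thr, best_sens, best_spec = youden_cutoff(labels, scores)
print(round(area, 4), best_thr, best_sens, best_spec)
```

For a multi-marker panel, the same functions apply once the markers have been combined into a single score, for example the predicted probability from a logistic regression.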
In early-stage BC (Tis + I + IIA, n = 109), serum levels of all three piRNAs were also significantly elevated compared with healthy controls (Fig. 3A). The association between piRNA expression and clinicopathological features in this early-stage cohort is detailed in Table 3. ROC analysis revealed diagnostic AUCs of 0.7777 for piR-139966, 0.7773 for piR-2572505, and 0.7213 for piR-2570061 (Fig. 3B). The combined piRNA panel achieved superior performance (AUC = 0.8098; 61.47% sensitivity, 89.92% specificity; Fig. 3C), supporting its utility for non-invasive early BC detection.
Serum tri-piRNA signature as biomarkers for early diagnosis of BC. (A) Differential expression of serum piR-139966, piR-2572505, and piR-2570061 in early-stage BC patients compared with healthy donors. (B) The AUCs of serum piR-139966, piR-2572505, and piR-2570061 were 0.7777, 0.7773, and 0.7213, respectively. (C) The combination of the three piRNAs achieved an AUC of 0.8098. Mann-Whitney U test; ****P < 0.0001.
Given the clinical significance of young-onset BC (≤ 45 years) characterized by distinct biology and the necessity for personalized management21, we further evaluated the tri-piRNA signature in this specific subgroup. Serum levels of all three piRNAs were upregulated in young-onset BC patients. ROC analysis showed that AUCs for individual piRNAs ranged from 0.79 to 0.83, while the combined panel yielded an AUC of 0.86 (Fig. S1). This performance was consistent with that observed in early-stage BC, primarily due to the overlap between early-stage and young-onset BC cohorts, as younger patients were predominantly diagnosed at earlier disease stages.
Serum tri-piRNA signature enhances the diagnostic accuracy of CA153
CA153, a cornerstone serum biomarker in BC management since the 1980s22, was evaluated for its diagnostic synergy with the tri-piRNA signature. Pairwise integration of CA153 with each piRNA increased the AUC relative to the corresponding piRNA alone (Fig. S2): from 0.7492 to 0.7882 for piR-139966, from 0.7894 to 0.7935 for piR-2572505, and from 0.7183 to 0.7445 for piR-2570061. However, adding CA153 to the tri-piRNA combination showed no additive benefit (Fig. 4A). In the early-stage subgroup analyses (Fig. 4B), adding CA153 increased the AUC from 0.7777 to 0.7892 for piR-139966 and from 0.7213 to 0.7302 for piR-2570061. In contrast, neither the piR-2572505/CA153 combination nor the combined panel of all three piRNAs with CA153 resulted in an increase in AUC.
Serum tri-piRNA signature enhances diagnostic accuracy of CA153. (A) The AUCs of CA153 combined with piR-139966, piR-2572505, piR-2570061, and their combination in BC patients relative to healthy individuals. (B) Serum piRNAs enhance the diagnostic accuracy of CA153 for early-stage BC: the AUCs of CA153 combined with piR-139966, piR-2572505, piR-2570061, and their combination in early-stage BC relative to healthy individuals.
ML model based on the piRNA expression with CA153
We developed an ML framework that integrates the aforementioned tri-piRNA signature and CA153 to evaluate the predictive performance of ten distinct ML algorithms. In the training cohort, GBDT, AdaBoost, and XGBoost achieved perfect discrimination (AUC = 1.0), followed by LGBM (AUC = 0.97) and random forest (AUC = 0.91). In the validation cohort, XGBoost emerged as the best-performing classifier (AUC = 0.84), marginally outperforming KNN (AUC = 0.81), SVM (AUC = 0.80) and LGBM (AUC = 0.79) (Fig. 5A). The Precision-Recall (PR) curve illustrated the precision-recall trade-off across classification thresholds (Fig. 5B), while calibration analysis demonstrated that the predicted probabilities of all models were well-calibrated (Fig. 5C). Based on these comprehensive evaluations, XGBoost was selected as the optimal algorithm because it ranked highest in both the training and validation cohorts.
ML Model based on the piRNA expression with CA153. (A) ROC curves for 10 ML algorithms in the training cohort (left) and validation cohort (right); (B) PR curves for 10 ML algorithms in the training cohort (left) and validation cohort (right); (C) Calibration curves for 10 ML algorithms in the training cohort (left) and validation cohort (right).
XGBoost model simplification
To streamline the prediction model, random forest analysis identified significant contributions from all three piRNAs and CA153, with piR-2570061 exhibiting the lowest impact (Fig. 6A). Pearson correlation analysis indicated collinearity between piR-2572505 and piR-2570061 (Fig. 6B). LASSO regression, employed to address collinearity, excluded piR-2570061, resulting in a final XGBoost model incorporating piR-139966, piR-2572505, and CA153 (Fig. 6C). The model achieved perfect performance in the training phase (AUC = 1.00, Fig. 6D), maintained robust diagnostic accuracy via bootstrap cross-validation (AUC = 0.89, Fig. 6E), and attained an AUC of 0.84 in the validation cohort (Fig. 6F). Three orthogonal validation analyses were conducted to verify the model logic. PR curves and DCA demonstrated favorable precision-recall trade-offs and substantial net benefit in both training and validation cohorts (Fig. 6G-H), while SHAP analysis identified piR-2572505 as the primary predictive determinant (Fig. 6I). Comprehensive diagnostic performance metrics for all algorithms are compared in Table 4.
XGBoost Model Simplification. (A) Random forest for significant contributions of three piRNAs and CA153 to model predictions; (B,C) Pearson correlation (B) and LASSO regression analysis (C) for collinearity analysis of three piRNAs and CA153; (D–F) ROC curves for XGBoost algorithm in the training cohort (D), cross-validation (E) and validation cohort (F); (G,H) PR curve (G) and DCA (H) for XGBoost algorithm in the training cohort (left) and validation cohort (right); (I) SHAP interpretation of the model constructed by the XGBoost algorithm.
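The net benefit plotted in a decision curve can be computed with the standard formula NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below illustrates it alongside the "treat all" reference strategy; the function names and the predicted probabilities are hypothetical and not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on predictions with probability >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every subject as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted probabilities for 4 patients (1) and 6 controls (0).
labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.4, 0.35, 0.05]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy); a decision curve repeats this comparison across a range of thresholds.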
Discussion
Accumulating evidence underscores the distinctive advantages of piRNAs as tumor biomarkers compared to conventional ncRNAs6,23,24,25. Their evolutionarily conserved 2’-O-methylated 3’ termini endow them with exceptional resistance to RNases, thereby enhancing their stability in bodily fluids26. This structural durability, combined with malignancy-specific expression signatures, enables superior clinical detectability. In the current study, we identified a serum tri-piRNA signature (piR-139966, piR-2572505, piR-2570061) that holds diagnostic potential for liquid biopsy-based detection of early-stage BC.
PiRNAs are a novel class of small ncRNAs first identified in animal germ cells in July 2006. The three piRNAs described above were characterized in mammalian ovarian tissues27; however, their expression patterns and biological functions remain unreported in the literature. Given that piRNAs typically bind to proteins of the PIWI family, conventional miRNA target prediction algorithms, such as those relying on seed sequence matching and 3’UTR enrichment analyses, are biologically inappropriate for mapping their targetome. Notably, our data revealed the upregulation of these three piRNAs in the sera of BC patients. This finding not only supports their stability as liquid biopsy biomarkers but also indicates their significant overexpression in BC, thereby suggesting potential driver roles in oncogenesis.
Early-stage BC frequently evades detection due to non-palpable manifestations in over 80% of cases, highlighting the need for hypersensitive biomarkers28. Our study confirms a circulating tri-piRNA panel (piR-139966, piR-2572505, piR-2570061) with utility for early BC detection. This panel exhibited superior diagnostic discriminative ability, and its synergistic integration with CA153 enhanced performance both in standalone and combined applications. Furthermore, we developed an XGBoost-based ML framework incorporating piRNA expression with CA153, which achieved perfect discriminative performance in the training cohort (AUC = 1.00) while maintaining diagnostic accuracy in validation (AUC = 0.84). This approach overcomes conventional diagnostic limitations by enabling multidimensional analysis of complex biomarker interactions28,29. Robust validation metrics confirmed the diagnostic credibility, while model interpretability was validated through PR curves, calibration plots, DCA, and SHAP value quantification. These findings establish ML-driven frameworks as pivotal tools for intercepting early tumorigenesis, with future validations poised to advance clinical translation.
Several limitations of the current study should be carefully considered. First, the cohort size (161 BC cases vs. 129 healthy controls) may have reduced the statistical power of the predictive modeling. Second, the piRNA-seq analysis was constrained by limited biological replicates (5 healthy volunteers vs. 5 BC cases), which may partially account for the discordance between the sequencing data and qPCR validation results. Finally, although we established an ML framework for early-stage BC detection, we were unable to develop a clinically applicable calculator due to technical constraints, which requires further investigation in the future.
Despite significant advancements in targeted oncology, a substantial portion of molecular targets, including kinases and non-coding RNAs, remain underexplored. For example, previous work has highlighted the untapped potential of the human kinome in cancer30, paralleling the emerging utility of piRNAs explored in our study. Notably, over 90% of piRNAs remain clinically unexploited, representing both a critical knowledge gap and an unprecedented translational opportunity. Collectively, we constructed and validated a predictive ML model that offers substantial improvements over traditional approaches by integrating serum piRNA expression with CA153, thereby enhancing liquid biopsy-driven precision oncology strategies.
Data availability
The data reported in this paper have been deposited in the OMIX, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (https://ngdc.cncb.ac.cn/omix; accession no. OMIX009645).
References
Siegel, R. L. et al. Cancer statistics, 2025. CA Cancer J. Clin. 75 (1), 10–45 (2025).
Giaquinto, A. N. et al. Breast cancer statistics 2024. CA Cancer J. Clin. 74 (6), 477–495 (2024).
Ohmura, H. et al. Liquid biopsy for breast cancer and other solid tumors: a review of recent advances. Breast Cancer. 32 (1), 33–42 (2025).
Iwasaki, Y. W., Siomi, M. C. & Siomi, H. PIWI-Interacting RNA: its biogenesis and functions. Annu. Rev. Biochem. 84, 405–433 (2015).
Wang, X. et al. Emerging roles and functional mechanisms of PIWI-interacting RNAs. Nat. Rev. Mol. Cell. Biol. 24 (2), 123–141 (2023).
Liu, Y. et al. The emerging role of the pirna/piwi complex in cancer. Mol. Cancer. 18 (1), 123 (2019).
Weng, W., Li, H. & Goel, A. Piwi-interacting RNAs (piRNAs) and cancer: emerging biological concepts and potential clinical implications. Biochim. Biophys. Acta Rev. Cancer. 1871 (1), 160–169 (2019).
Zhang, Q. et al. The epigenetic regulatory mechanism of piwi/pirnas in human cancers. Mol. Cancer. 22 (1), 45 (2023).
Xie, Q. et al. piRNA-14633 promotes cervical cancer cell malignancy in a METTL14-dependent m6A RNA methylation manner. J. Transl Med. 20 (1), 51 (2022).
Tan, L. et al. PIWI-interacting RNA-36712 restrains breast cancer progression and chemoresistance by interaction with SEPW1 pseudogene SEPW1P RNA. Mol. Cancer. 18 (1), 9 (2019).
Chen, S. et al. The biogenesis and biological function of PIWI-interacting RNA in cancer. J. Hematol. Oncol. 14 (1), 93 (2021).
Hong, Y. et al. Systematic characterization of seminal plasma PiRNAs as molecular biomarkers for male infertility. Sci. Rep. 6, 24229 (2016).
Kurth, H. M. & Mochizuki, K. 2’-O-methylation stabilizes Piwi-associated small RNAs and ensures DNA elimination in tetrahymena. RNA 15 (4), 675–685 (2009).
Yang, X. et al. Detection of stably expressed PiRNAs in human blood. Int. J. Clin. Exp. Med. 8 (8), 13353–13358 (2015).
Ge, L. et al. Circulating Exosomal small RNAs are promising non-invasive diagnostic biomarkers for gastric cancer. J. Cell. Mol. Med. 24 (24), 14502–14513 (2020).
Zhao, A. R. et al. Machine-learning diagnostics of breast cancer using PiRNA biomarkers. Biomarkers 30 (2), 167–177 (2025).
AmeliMojarad, M. & Amelimojarad, M. PiRNAs and PIWI proteins as potential biomarkers in breast cancer. Mol. Biol. Rep. 49 (10), 9855–9862 (2022).
Ong, M. E. et al. Prediction of cardiac arrest in critically ill patients presenting to the emergency department using a machine learning score incorporating heart rate variability compared with the modified early warning score. Crit. Care. 16 (3), R108 (2012).
Greener, J. G. et al. A guide to machine learning for biologists. Nat. Rev. Mol. Cell. Biol. 23 (1), 40–55 (2022).
Deo, R. C. Machine learning in medicine. Circulation 132 (20), 1920–1930 (2015).
Anastasiadi, Z. et al. Breast cancer in young women: an overview. Updates Surg. 69 (3), 313–317 (2017).
Li, X., Xu, Y. & Zhang, L. Serum CA153 as biomarker for cancer and noncancer diseases. Prog. Mol. Biol. Transl Sci. 162, 265–276 (2019).
Tamtaji, O. R. et al. PIWI-interacting RNAs and PIWI proteins in glioma: molecular pathogenesis and role as biomarkers. Cell. Commun. Signal. 18 (1), 168 (2020).
Deng, X. et al. The burgeoning importance of PIWI-interacting RNAs in cancer progression. Sci. China Life Sci. 67 (4), 653–662 (2024).
Qian, L. et al. Piwi-Interacting rnas: A new class of regulator in human breast cancer. Front. Oncol. 11, 695077 (2021).
Pastore, B. et al. pre-piRNA trimming and 2’-O-methylation protect PiRNAs from 3’ tailing and degradation in C. elegans. Cell. Rep. 36 (9), 109640 (2021).
Roovers, E. F. et al. Piwi proteins and PiRNAs in mammalian oocytes and early embryos. Cell. Rep. 10 (12), 2069–2082 (2015).
Swanson, K. et al. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186 (8), 1772–1791 (2023).
Tran, K. A. et al. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med. 13 (1), 152 (2021).
Essegian, D. et al. The clinical kinase index: A method to prioritize understudied kinases as drug targets for the treatment of cancer. Cell. Rep. Med. 1 (7), 100128 (2020).
Funding
This study was financially supported by Shandong Natural Science Foundation Innovation and Development Joint Fund (ZR2023LZL011), Taishan Youth Scholar Program of Shandong Province (No. tsqn202312366), Collaborative Academic Innovation Project of Shandong Cancer Hospital (TS-010) and Shandong Traditional Chinese Medicine Technology Project (M2023-013).
Author information
Contributions
XG S designed the experiments and guaranteed the integrity of the entire study. LM N, X L carried out experiments; WC Z contributed to the machine learning model construction; LM N and JM Z wrote the manuscript and prepared the figures and tables; L L analyzed the experimental data and conducted statistical analysis; All authors contributed to the article and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics statement
The study involving human participants was reviewed and approved by the Ethics Committee of Shandong Cancer Hospital Affiliated to Shandong First Medical University and Shandong Academy of Medical Sciences, in accordance with the 1964 Helsinki Declaration. All assays in the current study were carried out using residual samples remaining after routine clinical testing; the requirement for informed consent was therefore waived by the Ethics Committee of Shandong Cancer Hospital Affiliated to Shandong First Medical University and Shandong Academy of Medical Sciences in accordance with the relevant guidelines and regulations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Niu, L., Zhou, W., Li, X. et al. Machine learning model for early diagnosis of breast cancer based on PiRNA expression with CA153. Sci Rep 15, 30586 (2025). https://doi.org/10.1038/s41598-025-15431-9