Abstract
Extracellular vesicles (EVs) are emerging as promising noninvasive biomarkers, yet their clinical translation faces substantial hurdles, primarily because of the difficulty of identifying assay-compatible markers. In this Review, we outline computational frameworks, particularly those leveraging artificial intelligence, to bridge this gap. We detail the integration of diverse data resources, including disease-specific omics, EV, protein localization, tissue-specific, drug, model system and immune databases. We then describe computational selection strategies, from rule-based sequential filtering to machine learning for data fusion and deep learning for multi-omics integration. Crucially, we discuss the refinement of biomarker candidates using artificial-intelligence-driven predictions of protein structure and physicochemical properties to ensure compatibility with existing assay systems. By systematically evaluating biomarkers for predictive performance, biological plausibility and clinical utility, this framework aims to accelerate the transition of EV research from discovery to clinical application, thereby advancing precision medicine.
Introduction
Extracellular vesicles (EVs) are nanosized vesicles secreted by cells that carry diverse molecular cargo, including RNAs, proteins, lipids and DNA1,2. Owing to their molecular complexity and cell-type specificity, EVs have emerged as a promising source of noninvasive biomarkers for various diseases, including cancers3,4,5 and neurodegenerative disorders6,7,8. Over the past decade, advances in omics technologies and EV isolation methods have greatly expanded the landscape of potential EV-based biomarkers. Despite these advances, translating EV biomarkers into clinical practice remains challenging9. Conventional discovery strategies have focused primarily on identifying disease-associated molecules through differential expression or multi-omics integration. However, markers identified solely on the basis of disease relevance may be incompatible with available assay platforms because of their molecular properties10,11,12. Many promising candidates uncovered by conventional methods lack the accessibility required for antibody binding in assays or the structural stability required for consistent measurement, which hinders their transition from discovery to clinical utility. For instance, in 2021 alone, more than 1,000 research papers on EV-based biomarkers were published, yet only four EV-based biomarker assays have been clinically validated13,14. Identifying assay-eligible biomarkers is therefore increasingly recognized as a crucial step toward clinical utility, and these challenges highlight the need for computational frameworks that enable precise biomarker identification for clinical use15. In response, recent breakthroughs in artificial intelligence (AI), particularly in protein structure prediction and molecular interaction modeling16, have opened new avenues for biomedical research17,18,19.
These technologies may offer unprecedented opportunities to bridge the gap between computational EV marker discovery and clinical assay design by prioritizing markers on the basis of not only biological relevance but also molecular characteristics and interactions. In this Review, we introduce computational frameworks for identifying candidate EV biomarkers. We first summarize data resources that are important for EV biomarker discovery from a computational perspective, including disease-specific omics databases, EV databases, protein localization and tissue-specific databases, drug databases, model system databases and immune databases, highlighting their distinct applications. We then outline computational selection strategies, ranging from stepwise filtering based on sequential selection to advanced AI techniques for identifying informative EV biomarkers. Next, we discuss the critical step of integrating computational insights with assay systems, emphasizing how AI-based structural and physicochemical predictions can refine EV biomarker candidates for assay application, and describe how candidates can be evaluated computationally. Finally, we summarize future perspectives and challenges in this field. Harnessing the synergy between cutting-edge computational techniques and EV assays can accelerate the transition of EV research from promising findings into routine clinical practice (Fig. 1).
Data resources and their potential utilities
Identifying EV biomarkers suitable for clinical application is complex and requires integrating large, heterogeneous biological and clinical datasets. As EV research expands rapidly, so do the quantity and diversity of available resources, which span the molecular contents of EVs, comprehensive disease profiles, protein localization, drug interactions, model system characteristics and immune cell responses. Systematic knowledge and use of these resources are essential to expedite the discovery of robust, assay-eligible EV biomarkers. This section discusses the major types of databases for EV biomarker discovery and the role each plays in marker selection strategies (Table 1).
(1) Disease datasets. Large-scale, publicly available disease cohort datasets (for example, The Cancer Genome Atlas (TCGA), CPTAC, TARGET, the AD Knowledge Portal, the COPD Cell Atlas and the Prostate Cancer Transcriptome Atlas) are crucial for identifying key disease-associated molecules that can serve as candidate EV biomarkers. These datasets are invaluable in the candidate generation stage, enabling the identification of disease-specific genes or proteins20,21,22, and can also be used to validate detected EV biomarkers. For example, TCGA analysis was used to examine the role of a selected marker in tumors, revealing its consistent overexpression across multiple cancers compared with normal tissues23,24.
(2) EV databases. EV databases such as Vesiclepedia, EVpedia, ExoCarta, EV-TRACK and EV-COMM systematically document the molecular contents (proteins, RNAs and lipids) of EVs and the methodological details of thousands of EV studies. These resources are essential during the initial selection stage: they increase confidence in recurrently reported EV markers, support validation of isolation and extraction methods by enabling cross-comparison and help reconcile discrepancies across studies. For instance, studies of colorectal cancer and non-small-cell lung cancer confirmed detected proteins as EV-associated by matching them against these databases23,25, while ovarian cancer studies used them to filter out common EV proteins and prioritize lineage-specific EV biomarkers26.
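The recurrence and filtering logic described above can be sketched with simple set operations. The snippet below assumes that EV database records have already been exported as a mapping from study identifiers to reported proteins (a hypothetical format, not the actual Vesiclepedia or ExoCarta download schema); it keeps candidates reported in at least a minimum number of studies and drops proteins on a user-supplied list of ubiquitous EV proteins. All inputs are toy data.

```python
from collections import Counter


def recurrent_ev_markers(candidates, study_hits, min_studies=2, ubiquitous=frozenset()):
    """Map each retained candidate to the number of EV studies that report it.

    study_hits: {study_id: set of proteins reported in that study} (hypothetical export format).
    ubiquitous: proteins common to most EVs regardless of origin, removed to favor
    lineage-specific markers.
    """
    # Count each protein at most once per study.
    counts = Counter(p for hits in study_hits.values() for p in set(hits))
    return {
        p: counts[p]
        for p in candidates
        if counts[p] >= min_studies and p not in ubiquitous
    }


# Toy inputs for illustration only; not taken from any database.
study_hits = {
    "study_1": {"CD9", "CD63", "PROT_A", "PROT_B"},
    "study_2": {"CD63", "PROT_A", "TSG101"},
    "study_3": {"CD9", "PROT_A", "PROT_B", "PROT_C"},
}
kept = recurrent_ev_markers(
    ["PROT_A", "PROT_B", "PROT_C", "CD63"],
    study_hits,
    ubiquitous={"CD9", "CD63", "TSG101"},
)
print(kept)
```

Here PROT_C is dropped because it appears in only one study, and CD63 is dropped because it is on the ubiquitous list even though it recurs.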
(3) Protein localization and tissue-specific databases. Protein localization and tissue-specific databases (TCSA, HPA and GTEx) support EV biomarker discovery by helping refine candidates: they identify molecules likely to be localized on EV membranes or within EVs and assess their tissue specificity, supporting the selection of optimal targets for the intended assay20,27,28,29. For example, the HPA was used for the initial identification of 488 proteins exclusively expressed in the brain, contributing to the validation of brain-specific EV biomarker candidates such as APLP1, and TCSA was used to screen for surface proteins on pancreatic ductal adenocarcinoma EVs.
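Tissue specificity can be quantified in several ways; the HPA, for instance, applies its own expression-category definitions. As a swapped-in alternative, the sketch below computes the widely used tau (τ) index, which ranges from about 0 for broadly expressed genes to 1 for genes expressed in a single tissue. It assumes a genes × tissues matrix of log2(TPM + 1) values, such as one derived from GTEx; the example values are invented.

```python
import numpy as np


def tau_index(expr_by_tissue):
    """Tissue-specificity index tau for each gene (row).

    tau = sum_i (1 - x_i / max_j x_j) / (n_tissues - 1); genes with no expression get 0.
    """
    x = np.asarray(expr_by_tissue, dtype=float)
    n_tissues = x.shape[1]
    x_max = x.max(axis=1, keepdims=True)
    # Avoid division by zero for genes that are silent in every tissue.
    x_hat = np.divide(x, x_max, out=np.zeros_like(x), where=x_max > 0)
    tau = (1.0 - x_hat).sum(axis=1) / (n_tissues - 1)
    return np.where(x_max[:, 0] > 0, tau, 0.0)


# Invented log2(TPM + 1) values across four tissues (brain, liver, lung, muscle).
expr = np.array([
    [8.0, 0.0, 0.0, 0.0],  # restricted to brain
    [5.0, 5.0, 5.0, 5.0],  # uniform, housekeeping-like
    [6.0, 3.0, 0.0, 0.0],  # intermediate
])
print(tau_index(expr).round(3))
```

A cutoff on tau (for example, keeping genes above a chosen value whose maximum lies in the tissue of interest) could then serve as one filter in a brain- or tumor-specific EV marker screen; the cutoff itself would need to be chosen per study.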
(4) Drug databases. Drug databases enhance the translational potential of EV biomarker discovery. Pharos, DrugBank, ClinicalTrials.gov, Drugst.One, NeDRex and LINCS collectively provide data on drug–target relationships, ligand interactions, druggability tiers and ongoing clinical investigations. These databases can play a pivotal role in the target validation stage, allowing researchers to select EV biomarkers with therapeutic relevance or existing clinical interest. By filtering EV markers for known drug associations, these resources help focus efforts on targets that are more likely to be clinically actionable27.
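One simple way to encode this druggability-based prioritization is to sort candidates by their Pharos Target Development Level (TDL: Tclin, Tchem, Tbio and Tdark, from most to least developed) and then by the number of related clinical trials. The sketch assumes both annotations have already been retrieved into dictionaries; the gene names and counts are hypothetical, and no Pharos or ClinicalTrials.gov API is called.

```python
# Pharos Target Development Levels, from most to least clinically developed.
TDL_RANK = {"Tclin": 0, "Tchem": 1, "Tbio": 2, "Tdark": 3}


def prioritize_by_druggability(candidates, tdl, n_trials):
    """Sort by TDL, then by trial count (descending), then by name for stable ties.

    tdl: {gene: TDL label}; n_trials: {gene: number of matching trials}.
    Genes without a TDL annotation are placed after all annotated genes.
    """
    def sort_key(gene):
        level = TDL_RANK.get(tdl.get(gene), len(TDL_RANK))
        return (level, -n_trials.get(gene, 0), gene)

    return sorted(candidates, key=sort_key)


# Hypothetical annotations for illustration.
tdl = {"GENE_A": "Tbio", "GENE_B": "Tclin", "GENE_C": "Tclin", "GENE_D": "Tdark"}
n_trials = {"GENE_B": 3, "GENE_C": 12}
ranked = prioritize_by_druggability(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"], tdl, n_trials
)
print(ranked)
```

Ranking rather than hard filtering keeps less-developed targets visible, which may matter when a diagnostic marker does not need to be a drug target itself.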
(5) Model system databases. A single tissue-based study cannot capture EV markers associated with systemic responses to cancer. Databases of model organisms and cell lines are therefore invaluable for exploring and experimentally validating EV biomarkers. Resources such as the Cancer Cell Line Encyclopedia (CCLE), DepMap and the NCI’s Patient-Derived Models Repository (PDMR) provide comprehensive molecular characterization and pharmacological profiling of cell lines, xenografts and organoids. These resources provide crucial experimental context for preclinical selection based on the expression of identified EV biomarkers, and for discovering novel biomarkers across diverse tumor cell lines and organoid models30,31,32.
(6) Immune databases. Effective EV protein biomarker selection requires considering immune cell expression profiles to maintain disease specificity. DMAP, ImmGen and ImmPort curate gene expression profiles across human and mouse immune cells. These databases can be used to remove immune-associated markers during marker identification, ensuring that the selected biomarkers provide clearer, disease-specific information20,22,27. In addition, immune information can help identify EV biomarkers that predict responses to immunotherapy in precision immuno-oncology4.
Selection strategies
EV biomarker identification poses considerable analytical challenges because it involves multi-omics and other diverse data types generated on various experimental platforms with different methodologies. Identifying clinically and biologically useful EV biomarkers in these datasets requires advanced computational feature selection strategies. This section provides an overview of these strategies, highlighting their strengths and limitations.
(1) Rule-based sequential selection. Rule-based sequential selection for EV biomarker discovery is a stepwise filtering framework that systematically integrates biological knowledge and multi-omics data to identify disease-specific EV markers. The process first identifies disease-specific markers from disease datasets while removing broadly expressed genes, such as housekeeping or immune-related genes. It then verifies the association and expression of the remaining candidates using model system databases, EV databases and molecular localization databases, as appropriate for the assay method. Finally, it prioritizes targets on the basis of their functional and therapeutic relevance using drug databases (Fig. 2).
This sequential selection is a representative example of a biologically driven, multilayered integration strategy for EV marker discovery and can be adapted to different disease contexts. It also promotes reproducibility and robustness by reducing algorithmic bias and noise while focusing on features that are consistently salient. In addition, selecting targets with druggability or clinical trial relevance enhances the translatability of the prioritized markers. However, this strategy has limitations. Its effectiveness depends on the availability and quality of public multi-omics data, which are often lacking, especially for rare or understudied conditions. Furthermore, the filtering criteria may exclude novel EV markers that have not yet been described in current databases. Ultimately, computational prioritization cannot replace empirical validation, because functional assays are essential to confirm diagnostic or therapeutic utility. Rule-based selection has nonetheless been applied effectively in various cancer types to identify disease-specific EV markers. For instance, in pancreatic ductal adenocarcinoma, a sequential approach integrating CPTAC and CCLE proteomics with surface and EV association data (TCSA, Vesiclepedia) led to the selection of MUC1, EGFR and TROP2, which were validated as diagnostic markers in patient plasma27. Similarly, researchers studying osteosarcoma combined differential expression analysis with curated surfaceome lists and applied EV-related evidence to rank candidate EV surface proteins sequentially20. In brain-associated EV studies, the GTEx and HPA databases were used to evaluate brain specificity and select brain EV biomarkers across different datasets28,33. In hepatocellular carcinoma, this strategy involved identifying tumor-enriched genes while filtering out genes highly expressed in immune cells using DMAP, followed by confirmation of vesicle association and therapeutic relevance22.
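The stepwise logic of rule-based sequential selection can be expressed as an ordered list of filters, each of which records how many candidates survive. The Python sketch below is a generic illustration under stated assumptions: each stage is a predicate over gene names backed by a lookup set that, in practice, would be built from the databases discussed earlier, and the final drug-linked stage is written as a filter although it could equally be implemented as a ranking. All sets are toy data.

```python
from typing import Callable, Iterable

Stage = tuple[str, Callable[[str], bool]]


def sequential_filter(candidates: Iterable[str], stages: list[Stage]):
    """Apply filtering stages in order; return survivors and a per-stage count log."""
    current = list(candidates)
    log = [("input", len(current))]
    for name, keep in stages:
        current = [g for g in current if keep(g)]
        log.append((name, len(current)))
    return current, log


# Toy lookup sets standing in for database-derived annotations.
disease_up = {"G1", "G2", "G3", "G4", "G5", "G6"}  # from disease datasets
housekeeping_or_immune = {"G2", "G6"}              # broadly or immune-expressed genes
expressed_in_models = {"G1", "G3", "G4", "G5"}     # model system databases
ev_or_surface = {"G1", "G3", "G5"}                 # EV / localization databases
drug_linked = {"G3", "G5"}                         # drug databases

stages: list[Stage] = [
    ("not housekeeping/immune", lambda g: g not in housekeeping_or_immune),
    ("expressed in model systems", lambda g: g in expressed_in_models),
    ("EV- or surface-associated", lambda g: g in ev_or_surface),
    ("drug-linked", lambda g: g in drug_linked),
]
selected, log = sequential_filter(sorted(disease_up), stages)
for name, n in log:
    print(f"{name}: {n}")
print(selected)
```

Keeping the per-stage counts makes the pipeline auditable, which supports the reproducibility benefit noted above and shows which stage removes the most candidates.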
(2) Data fusion using machine learning (ML). ML methods have recently drawn attention as powerful tools for identifying novel, informative EV markers that may not be detectable through conventional statistical approaches. In this strategy, heterogeneous data (for example, mRNA, protein and clinical data) are fused into an ML model. ML algorithms process multi-omic profiles to implicitly weigh and select the most informative features, for example, to distinguish disease from control samples (supervised learning) or to discover underlying biology (unsupervised learning). ML-based data fusion can capture complex, nonlinear interactions between features. A primary advantage is that it provides a comprehensive view and can reveal biomarkers that contribute weakly on their own but act synergistically in combination. However, limitations remain, including complex model building due to heterogeneous data types and the risk of overfitting in the absence of robust cross-validation. Nevertheless, integrating multi-omics data with ML has shown promising results in disease-specific biomarker identification, especially when feature regularization or ensemble learning is used to reduce the risk of overfitting.
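A common pitfall in ML-based data fusion is selecting features on the full dataset before cross-validation, which leaks information and inflates performance estimates. The scikit-learn sketch below shows one way to avoid this under simple assumptions: two synthetic omics blocks are fused by concatenation (early fusion), and scaling, univariate feature selection and an L2-regularized logistic regression are wrapped in a single pipeline so that every step is refit within each training fold. Block sizes, k and C are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 120
y = np.repeat([0, 1], n // 2)
rna = rng.normal(size=(n, 300))   # synthetic mRNA block
prot = rng.normal(size=(n, 80))   # synthetic EV-protein block
prot[:, 0] += 1.5 * y             # a single informative protein
X = np.hstack([rna, prot])        # early (concatenation) fusion

model = make_pipeline(
    StandardScaler(),                            # scaling parameters learned per training fold
    SelectKBest(f_classif, k=20),                # feature selection refit inside every fold
    LogisticRegression(C=0.1, max_iter=1000),    # L2 penalty (the default) limits overfitting
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())
```

Because SelectKBest sits inside the pipeline, held-out folds never influence which features are chosen, so the cross-validated AUC is an honest estimate for this synthetic setting.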
Several supervised ML algorithms have been widely adopted in EV biomarker discovery pipelines. Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied to identify a combination of EV surface proteins (PLAU, ITGAX, ANXA1 and ITGA4) that distinguishes Alzheimer’s disease from healthy controls34. Random forest, an ensemble method combining multiple decision trees, has been shown, alongside LASSO regression and stepwise elimination, to effectively rank feature importance and build models for pan-cancer detection based on EV protein markers35. Support vector machines excel at complex classification tasks by using kernel functions to implicitly project data into higher-dimensional spaces, where nonlinear decision boundaries can be identified. One study applied support vector machine modeling to plasma-derived EV proteomics data to build a seven-protein signature for early pancreatic cancer detection36. Beyond individual algorithms, integrating multiple ML approaches and data types is often essential given the complexity of biological systems. Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO) maximizes correlation across multiple omics datasets while identifying key variables associated with disease phenotypes37. Similarly, AutoOmics provides an automated ML framework that selects and optimizes models for each omics layer before fusing latent features into final classifiers38, and iClusterBayes takes a Bayesian approach to integrative clustering, jointly modeling correlations across omics types to identify molecular subtypes in complex diseases39.
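To illustrate how an L1 penalty yields a compact marker panel, the sketch below fits an L1-regularized logistic regression (the classification analogue of LASSO) with scikit-learn and returns the proteins with non-zero coefficients, largest first. The data are synthetic, with only the first two of 50 simulated EV proteins carrying signal; in a real study, the penalty strength C would be tuned by cross-validation rather than fixed as here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def lasso_panel(X, y, feature_names, C=0.1):
    """Return (name, coefficient) for features with non-zero L1 coefficients, largest |coef| first."""
    Xs = StandardScaler().fit_transform(X)  # put features on a common scale before penalizing
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Xs, y)
    coef = clf.coef_.ravel()
    nonzero = np.flatnonzero(coef)
    order = nonzero[np.argsort(-np.abs(coef[nonzero]))]
    return [(feature_names[i], float(coef[i])) for i in order]


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))  # 50 synthetic EV proteins
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)
names = [f"P{i}" for i in range(50)]
panel = lasso_panel(X, y, names)
print(panel[:5])
```

Smaller C means a stronger penalty and a sparser panel, which trades some accuracy for a marker set small enough to implement in a targeted assay.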
(3) Data integration using deep learning (DL). As biological datasets grow in size and complexity, AI is increasingly used to discover clinically relevant biomarkers. DL models (for example, convolutional neural networks, autoencoders and graph neural networks) can combine multimodal data and incorporate known biological networks. However, these models often function as black boxes and offer limited interpretability without additional explainability tools (for example, SHapley Additive exPlanations (SHAP), a unified framework for attributing model predictions to input features). They also require large training datasets, complex model design and substantial computation. Although their use in EV biomarker research is still limited, DL tools that leverage multi-omics data for biomarker discovery are emerging, and this section reviews how DL models are being used in biomarker research. A major application is multi-omics data integration, where DL frameworks surpass conventional methods by capturing cross-modal relationships. For instance, MOGONET uses graph convolutional networks (GCNs) to integrate mRNA, microRNA (miRNA) and DNA methylation data, improving disease classification while identifying omics-specific biomarkers40. Similarly, GOAT, an attention-based graph neural network, combines multi-omics features with protein–protein interaction (PPI) networks to prioritize key regulatory factors, such as CTNNB1 and JUN, that were not detectable by traditional analysis41. DL approaches have also proven effective in disease diagnosis and classification. In chronic obstructive pulmonary disease (COPD), a GCN-based framework trained on gene expression and proteomic profiles mapped onto a PPI network distinguished diseased from normal samples, while explainable AI (xAI) techniques, including SHAP analysis, highlighted critical features such as CXCL11, IL-2 and CD4842.
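The core operation shared by GCN-based tools can be written in a few lines. The NumPy sketch below implements one layer of the widely used propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in which each protein's features are mixed with those of its network neighbors before a linear projection. It is not MOGONET's or GOAT's implementation: the four-protein PPI graph, the two omics features per node and the random weights W are assumptions, and in practice W would be learned by gradient descent.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph-convolution step: aggregate neighbor features, project with W, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Toy PPI path graph P0 - P1 - P2 - P3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
# Two omics features per protein (e.g. mRNA and protein z-scores; invented values).
h = np.array([[1.0, 0.5], [0.2, -1.0], [-0.3, 0.8], [1.5, 0.0]])
w = np.random.default_rng(0).normal(size=(2, 3))  # untrained weights

a_norm = normalized_adjacency(adj)
h_next = gcn_layer(a_norm, h, w)
print(h_next.shape)  # (4, 3)
```

Stacking several such layers lets information flow across multi-hop neighborhoods, which is how network context can elevate regulators that look unremarkable in any single omics layer.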
DL also supports pattern identification and clustering without labels, enabling the discovery of novel subgroups within disease multi-omics data, outlier detection and dimensionality reduction for biomarker discovery. Autoencoders, for example, can compress high-dimensional multi-omics data into lower-dimensional representations to identify subpopulations. Important features can then be identified by assessing which inputs most strongly influence the encoding, using analyses such as SHAP or gradient-based attribution. Multi-omics Autoencoder Integration (MAUI), which is based on a variational autoencoder (β-VAE), integrates various data types (for example, gene expression, mutations and copy-number alterations) and stratifies colorectal cancer subtypes43. DeepProg combines autoencoders with ML to predict survival subtypes and classify high-risk groups using multi-omics data (for example, RNA sequencing, methylation and miRNA)44. Finally, CrossPred, a deep multi-encoder model, links exosomal miRNAs and intracellular mRNAs through a shared embedding space, effectively denoising data and identifying cancer-linked miRNAs and genes45.
When labeled samples are scarce, DL models can leverage both labeled and unlabeled data to increase their power: they first learn general features from abundant unlabeled data and then refine these features using a small set of labeled samples. For example, MOSEGCN integrates mRNA, miRNA and methylation data using GCNs and attention, propagating label information across patient-sample graphs that contain both labeled and unlabeled data. It achieved high accuracy (~83% for Alzheimer’s disease and ~87% for breast cancer subtypes) and identified meaningful disease-related genes46.
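The idea of propagating labels from a few annotated patients to unlabeled ones can be illustrated without a neural network. The sketch below implements classic label spreading, a graph-based semi-supervised method, on a Gaussian patient-similarity graph; it is a simplified stand-in for, not a reimplementation of, the GCN-and-attention approach used by MOSEGCN. The two-dimensional "fused embedding", the kernel width sigma and alpha are assumptions chosen for illustration.

```python
import numpy as np


def label_spreading(features, labels, sigma=1.0, alpha=0.9):
    """Closed-form label spreading: F = (I - alpha * S)^-1 Y on a Gaussian similarity graph.

    labels: integer class per sample, -1 for unlabeled samples.
    """
    x = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-edges
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    classes = np.unique(labels[labels >= 0])
    y = (labels[:, None] == classes[None, :]).astype(float)  # unlabeled rows are all zero
    f = np.linalg.solve(np.eye(len(x)) - alpha * s, y)
    return classes[f.argmax(axis=1)]


# Two synthetic patient groups in a 2-D fused embedding; one labeled patient per group.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
labels = np.full(20, -1)
labels[0], labels[10] = 0, 1
pred = label_spreading(emb, labels)
print(pred)
```

With only two labeled patients, the graph structure alone assigns the remaining 18, which mirrors why semi-supervised methods help when clinical annotation is the bottleneck.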
The most appropriate approach for EV biomarker discovery depends on the specific needs of the project. Rule-based sequential approaches are advantageous when working with well-annotated datasets and when clear filtering criteria, robust noise reduction and high interpretability are required, especially for rapidly prioritizing assay-compatible or clinically relevant markers. Conversely, ML is adept at capturing complex interactions and subtle signals in diverse multi-omics data and can identify biomarkers missed by rule-based strategies; however, it requires large datasets and is susceptible to bias induced by noisy data. Emerging DL models can integrate heterogeneous multi-omics data and uncover hidden patterns in complex data, but they face challenges in interpretability, computational cost and design complexity. In addition, further extensive validation is needed because their application in the EV biomarker field is still limited (Fig. 3).
Integration with assay system
For computationally identified EV biomarkers to reach clinical use, they must be compatible with existing assay platforms and reliably detectable. Many EV-derived RNAs and proteins exhibit structural or physicochemical traits that hinder detection, so prioritizing markers with assay-friendly properties is critical. This section reviews existing EV assay systems for RNA and protein detection and then discusses how computational insights, particularly structural and physicochemical predictions, can further refine biomarker selection to improve assay performance.
(1) Existing assay systems for RNA or protein detection. Bridging computational discovery and clinical use requires understanding the currently available EV-based RNA and protein assays that underpin noninvasive liquid biopsy technologies. One prominent example is the ExoDx Prostate (IntelliScore) test, a noninvasive urine test that uses EVs (exosomes) to assess a man’s risk of having high-grade prostate cancer. By analyzing specific RNA markers (PCA3, ERG and SPDEF) within these exosomes, the test provides a personalized risk score to help determine whether an initial prostate biopsy is necessary, thereby potentially reducing unnecessary biopsies47,48. Other examples include the miR Sentinel test series, which analyzes small noncoding RNA and miRNA patterns in urinary exosomes for prostate and bladder cancer risk assessment49, and the blood-based ClarityDx Prostate test, which uses protein analysis to assess high-grade prostate cancer50. Beyond these established tests, novel assays continue to emerge, such as an EV mRNA Digital Assay for assessing treatment response in hepatocellular carcinoma22, an EV Surface Protein Assay for noninvasive pancreatic cancer detection27 and an OS EV MMP Activity Assay for osteosarcoma monitoring20, all of which show substantial potential for early detection and disease management.
(2) Computational refinement of biomarkers for EV detection assays: protein structure and physicochemical properties. After initial biomarker selection, understanding molecular properties such as accessibility, binding efficiency and behavior in assay environments is critical for diagnostic or therapeutic use. Recent advances in AI have produced functional genomics models, such as Evo2, AlphaGenome, LucaOne and ChatNT, that predict genome function from DNA sequences and are driving progress across the bioindustry51,52,53,54. In line with these advances, this section introduces AI tools that can be applied after biomarker selection, particularly those that predict protein structures and model their interactions, which are essential for assessing assay platform compatibility.
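Before turning to structure predictors, some physicochemical properties can be estimated directly from sequence. As a classic, non-AI illustration, the sketch below computes the grand average of hydropathy (GRAVY) on the Kyte–Doolittle scale and flags 19-residue windows whose mean hydropathy exceeds 1.6, a long-standing heuristic for putative transmembrane segments that may hint at membrane anchoring on EVs. The toy sequence is invented, and such heuristics are far cruder than dedicated topology predictors.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [
        i for i in range(len(seq) - window + 1)
        if gravy(seq[i:i + window]) > threshold
    ]


# Invented sequence: polar stretch, 20-residue hydrophobic core, polar stretch.
seq = "DEKRSTNQ" * 3 + "L" * 20 + "DEKRSTNQ" * 3
print(round(gravy("DEKRSTNQ"), 4))  # -2.9875
starts = hydrophobic_windows(seq)
print(min(starts), max(starts))
```

Negative GRAVY values indicate overall hydrophilic, more soluble sequences, whereas a contiguous run of flagged windows marks a candidate membrane-spanning stretch.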
AlphaFold3 (AF3) is a notable advance, capable of modeling diverse biomolecules (proteins, DNA and RNA) and their interactions at atomic resolution. Unlike traditional docking tools such as AutoDock Vina55, HADDOCK56, ClusPro57 or RosettaDock58, which require predetermined structures of the interacting partners, AF3 generates entire molecular complex structures end-to-end from sequence input. It outperforms traditional docking tools and shows superior accuracy across various biomolecular interactions59. RoseTTAFold All-Atom (RFAA) is another notable generalist method. RFAA can accurately model a diverse range of biomolecular assemblies and, through RFdiffusion All-Atom (RFdiffusionAA), uniquely enables de novo design of small-molecule-binding proteins with custom pockets, a capability that has been experimentally validated. However, AF3 outperforms RFAA on protein–ligand interactions, and RoseTTAFold2NA surpasses it in nucleic acid predictions60,61. Despite these advances, current AI structure prediction tools have limitations. AF3, for example, occasionally produces chirality errors and overlapping atoms, predicts static rather than dynamic structures and may require numerous predictions to achieve the highest accuracy for complex targets59.
Despite these limitations, these AI tools may prove crucial in the post-selection phase. By providing insights into biomarker accessibility and binding for EV detection assays, they could help bridge the gap between initial biomarker discovery and clinical application.
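Once a structure has been predicted, one way to turn it into an assay-oriented readout is to look for contiguous stretches that are both confidently modeled and solvent exposed, as rough candidates for antibody-accessible regions. The sketch below assumes per-residue pLDDT scores (which AlphaFold models report on a 0–100 scale, conventionally treating 70 or higher as confident) and per-residue relative solvent accessibility (RSA) computed separately, for example with DSSP or FreeSASA; the RSA cutoff of 0.25 and the minimum length of 8 residues are illustrative assumptions, not validated thresholds. Membrane topology, which determines whether a region faces outward on the EV surface, would need to be checked separately.

```python
def accessible_segments(plddt, rsa, min_plddt=70.0, min_rsa=0.25, min_len=8):
    """(start, end) index pairs (end exclusive) of runs of confident, exposed residues."""
    ok = [p >= min_plddt and r >= min_rsa for p, r in zip(plddt, rsa)]
    segments, start = [], None
    for i, flag in enumerate(ok + [False]):  # trailing False closes a run at the end
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Invented per-residue values for a 27-residue toy protein.
plddt = [95.0] * 10 + [40.0] * 5 + [85.0] * 12
rsa = [0.1] * 3 + [0.5] * 7 + [0.6] * 5 + [0.4] * 12
print(accessible_segments(plddt, rsa))  # [(15, 27)]
```

In this toy case, residues 3 to 9 are confident and exposed but too short a run, and residues 10 to 14 are exposed but poorly modeled, so only the final stretch is reported.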
Computational evaluation of biomarker candidates
After computational identification and refinement of optimal EV markers, as discussed in the previous sections, a comprehensive and multifaceted evaluation is important. This evaluation includes not only experimental validation but also a variety of computational strategies to predict and optimize the performance of candidate EV biomarkers. First, the predictive performance of diagnostic and prognostic biomarkers is assessed using standard classification metrics (for example, area under the receiver operating characteristic curve, accuracy, sensitivity and specificity). For prognostic markers, associations with patient outcomes are evaluated using Kaplan–Meier curves and hazard ratios from Cox models. Beyond statistical significance, the biological context of the selected biomarkers is crucial: mapping biomarkers onto known networks can confirm mechanistic plausibility, and validation in independent datasets can confirm generalizability. Finally, potential clinical utility can be assessed by correlating biomarkers with clinical parameters and by checking links to existing drugs through drug databases such as ClinicalTrials.gov, Pharos or DrugBank. This systematic evaluation refines candidates, increasing the success rate of experimental validation and accelerating clinical translation.
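Several of these metrics are simple enough to compute directly, which can help when cross-checking results from larger toolkits. The NumPy sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control), sensitivity and specificity at a chosen threshold, and a Kaplan–Meier product-limit survival curve. The toy scores and follow-up times are invented; Cox models are omitted because they are usually fitted with dedicated survival-analysis libraries.

```python
import numpy as np


def roc_auc(scores, labels):
    """AUC = P(score of random case > score of random control); ties count one half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and compare with the true labels."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pred = scores >= threshold
    sens = (pred & labels).sum() / labels.sum()
    spec = (~pred & ~labels).sum() / (~labels).sum()
    return float(sens), float(spec)


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    events: 1 if the event (e.g. progression) was observed, 0 if the patient was censored.
    """
    times, events = np.asarray(times, float), np.asarray(events, int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = (times >= t).sum()  # still under observation just before t
        d = ((times == t) & (events == 1)).sum()
        surv *= 1.0 - d / at_risk
        curve.append((float(t), float(surv)))
    return curve


scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))                       # 8 of 9 case-control pairs ordered correctly
print(sensitivity_specificity(scores, labels, 0.5))  # sensitivity 2/3, specificity 1.0
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Note that a patient censored at the same time as an event is still counted as at risk at that time, the usual Kaplan–Meier convention.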
Perspectives
To harness the potential of AI in EV biomarker discovery, several gaps that limit the clinical application of computational results must be addressed. First, EV research data are notably heterogeneous and sparse. To address this, it is vital to adhere to standardized protocols (for example, the MISEV guidelines) to generate comparable data and to deploy AI-driven integration methods that harmonize disparate sources. DL models are well suited to such integration: they can directly handle diverse, heterogeneous data and automatically learn hierarchical and complex interactions within and across data types. This allows them to detect hidden patterns across multiple omics datasets and reveal synergistic interactions among biomarker signals. However, their lack of interpretability can hinder biological insight. Developing interpretable AI models and incorporating xAI tools, such as SHAP or gradient-based methods, can clarify which EV components (for example, specific proteins or miRNAs) most strongly influence predictions, thereby enhancing both performance and biological understanding. Moving in this direction can ensure that AI-driven integration not only delivers high performance in EV biomarker identification but also provides the biological understanding necessary for clinical translation. Second, advanced AI tools such as AF3 and RFAA can strengthen EV biomarker selection by modeling protein structures and interactions. Incorporating these tools into the biomarker selection pipeline can yield valuable insights into candidate proteins’ accessibility, binding and structural stability under specific assay conditions, guiding the selection of optimal EV markers. This will be a pivotal step in translating computational biomarker findings into high-performance EV detection systems (Fig. 4).
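To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a single sample by enumerating every coalition of features and replacing "absent" features with baseline values (one of several possible value-function choices, assumed here). SHAP libraries approximate this quantity efficiently; brute-force enumeration is feasible only for small marker panels. The three-feature model is a hypothetical stand-in for a trained classifier, with one additive term and one pairwise interaction.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley attributions for one sample x relative to a baseline input.

    Features outside a coalition are set to their baseline values; cost grows as 2^n.
    """
    x, baseline = np.asarray(x, float), np.asarray(baseline, float)
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                z = baseline.copy()
                z[list(subset)] = x[list(subset)]
                without_i = predict(z)
                z[i] = x[i]
                phi[i] += weight * (predict(z) - without_i)
    return phi


def toy_model(z):
    """Hypothetical model: additive effect of marker 0 plus an interaction of markers 1 and 2."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # the interaction term (6.0) is split equally between markers 1 and 2
```

The attributions sum to the difference between the model output for the sample and for the baseline, which is the property that lets per-marker contributions be read as a decomposition of a single prediction.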
Finally, a major barrier to the clinical translation of EV biomarkers is the limited availability of precious clinical samples, which restricts the use of current multi-omic profiling technologies for EVs that typically require large input amounts. Advances in assay platforms capable of detecting low-abundance signals, together with increased support for EV multi-omics consortia and initiatives, could enable wider clinical adoption. Progress will also require algorithms that effectively reduce noise in both large-scale and sparse data.
This diagram presents an end-to-end AI pipeline for refined biomarker selection. It integrates GNNs for analyzing complex biological networks and explainable AI (XAI) for interpretable modeling. The framework also uses structural tools such as AlphaFold3 and RoseTTAFold All-Atom to predict molecular interactions, facilitating clinical validation. GNN, graph neural network; SHAP, SHapley Additive exPlanations.
Conclusion
AI has transformed biomarker studies by enabling the integration and analysis of complex multi-omics data. Although the application of AI to EV marker discovery is in its early stages, AI has great potential to identify clinically useful markers that cannot easily be detected using conventional methods. Realizing this potential will require common data practices, interpretable models and seamless integration with experimental validation, ultimately accelerating the transition of EV biomarkers from discovery to clinical utility and advancing precision medicine.
References
Kalluri, R. & LeBleu, V. S. The biology, function, and biomedical applications of exosomes. Science 367, eaau6977 (2020).
Yáñez-Mó, M. et al. Biological properties of extracellular vesicles and their physiological functions. J. Extracell. Vesicles 4, 27066 (2015).
Liu, S.-Y., Liao, Y., Hosseinifard, H., Imani, S. & Wen, Q.-L. Diagnostic role of extracellular vesicles in cancer: a comprehensive systematic review and meta-analysis. Front. Cell Dev. Biol. 9, 705791 (2021).
Asleh, K. et al. Extracellular vesicle-based liquid biopsy biomarkers and their application in precision immuno-oncology. Biomark. Res. 11, 99 (2023).
Kalluri, R. & McAndrews, K. M. The role of extracellular vesicles in cancer. Cell 186, 1610–1626 (2023).
Chen, J., Tian, C., Xiong, X., Yang, Y. & Zhang, J. Extracellular vesicles: new horizons in neurodegeneration. EBioMedicine 113, 105605 (2025).
Li, Z. et al. Research progress on the role of extracellular vesicles in neurodegenerative diseases. Transl. Neurodegener. 12, 43 (2023).
Wang, L. et al. Extracellular vesicles: biological mechanisms and emerging therapeutic opportunities in neurodegenerative diseases. Transl. Neurodegener. 13, 60 (2024).
Ghodasara, A., Raza, A., Wolfram, J., Salomon, C. & Popat, A. Clinical translation of extracellular vesicles. Adv. Healthc. Mater. 12, e2301010 (2023).
Van Dorpe, S., Tummers, P., Denys, H. & Hendrix, A. Towards the clinical implementation of extracellular vesicle-based biomarker assays for cancer. Clin. Chem. 70, 165–178 (2024).
Théry, C. et al. Minimal information for studies of extracellular vesicles 2018 (MISEV2018): a position statement of the International Society for Extracellular Vesicles and update of the MISEV2014 guidelines. J. Extracell. Vesicles 7, 1535750 (2018).
Yekula, A. et al. From laboratory to clinic: translation of extracellular vesicle based cancer biomarkers. Methods 177, 58–66 (2020).
Nieuwland, R., Enciso-Martinez, A. & Bracht, J. W. P. Clinical applications and challenges in the field of extracellular vesicles. Med. Genet. 35, 251–258 (2023).
Enderle, D. & Noerholm, M. Are extracellular vesicles ready for the clinical laboratory? J. Lab. Med. 46, 273–282 (2022).
Soekmadji, C. et al. The future of extracellular vesicles as theranostics—an ISEV meeting report. J. Extracell. Vesicles 9, 1809766 (2020).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Azenkot, T., Rivera, D. R., Stewart, M. D. & Patel, S. P. Artificial intelligence and machine learning innovations to improve design and representativeness in oncology clinical trials. Am. Soc. Clin. Oncol. Educ. Book 45, e473590 (2025).
Arango-Argoty, G. et al. AI-driven predictive biomarker discovery with contrastive learning to improve clinical trial outcomes. Cancer Cell 43, 875–890.e8 (2025).
Prelaj, A. et al. Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review. Ann. Oncol. 35, 29–65 (2024).
Ji, Y.-R. et al. Noninvasive assessment of protease activity in osteosarcoma via click chemistry-mediated enrichment of extracellular vesicles. Adv. Funct. Mater. https://doi.org/10.1002/adfm.202422469 (2025).
Wang, J. et al. Integrative proteomic profiling of tumor and plasma extracellular vesicles identifies a diagnostic biomarker panel for colorectal cancer. Cell Rep. Med. 6, 102090 (2025).
Zhao, C. et al. Extracellular vesicle digital scoring assay for assessment of treatment responses in hepatocellular carcinoma patients. J. Exp. Clin. Cancer Res. 44, 136 (2025).
Yuan, Y. et al. Identification of a biomarker panel in extracellular vesicles derived from non-small cell lung cancer (NSCLC) through proteomic analysis and machine learning. J. Extracell. Vesicles 14, e70078 (2025).
Greenberg, Z. F. et al. Nanomaterial isolated extracellular vesicles enable high precision identification of tumor biomarkers for pancreatic cancer liquid biopsy. J. Nanobiotechnol. 23, 467 (2025).
Mohamedali, A. et al. A proteomic examination of plasma extracellular vesicles across colorectal cancer stages uncovers biological insights that potentially improve prognosis. Cancers 16, 4259 (2024).
Trinidad, C. V. et al. Lineage specific extracellular vesicle-associated protein biomarkers for the early detection of high grade serous ovarian cancer. Sci. Rep. 13, 18341 (2023).
Zhao, C. et al. Identification of tumor-specific surface proteins enables quantification of extracellular vesicle subtypes for early detection of pancreatic ductal adenocarcinoma. Adv. Sci. 12, e2414982 (2025).
Choi, Y. et al. Blood-derived APLP1+ extracellular vesicles are potential biomarkers for the early diagnosis of brain diseases. Sci. Adv. 11, eado6894 (2025).
Muraoka, S. et al. Comprehensive proteomic profiling of plasma and serum phosphatidylserine-positive extracellular vesicles reveals tissue-specific proteins. iScience 25, 104012 (2022).
Huang, L. et al. PDX-derived organoids model in vivo drug response and secrete biomarkers. JCI Insight https://doi.org/10.1172/jci.insight.135544 (2020).
Turaga, S. M. et al. Identification of small extracellular vesicle protein biomarkers for pediatric Ewing sarcoma. Front. Mol. Biosci. 10, 1138594 (2023).
Fathi, M. et al. Identifying signatures of EV secretion in metastatic breast cancer through functional single-cell profiling. iScience 26, 106482 (2023).
Norman, M. et al. Toward identification of markers for brain-derived extracellular vesicles in cerebrospinal fluid: a large-scale, unbiased analysis using proximity extension assays. J. Extracell. Vesicles 14, e70052 (2025).
Cai, Y. et al. Surface protein profiling and subtyping of extracellular vesicles in body fluids reveals non-CSF biomarkers of Alzheimer’s disease. J. Extracell. Vesicles 13, e12432 (2024).
Min, Y. et al. Single extracellular vesicle surface protein-based blood assay identifies potential biomarkers for detection and screening of five cancers. Mol. Oncol. 18, 743–761 (2024).
Bockorny, B. et al. A large-scale proteomics resource of circulating extracellular vesicles for biomarker discovery in pancreatic cancer. eLife https://doi.org/10.7554/eLife.87369 (2024).
Singh, A. et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055–3062 (2019).
Xu, C. et al. AutoOmics: new multimodal approach for multi-omics research. Artif. Intell. Life Sci. 1, 100012 (2021).
Mo, Q. et al. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics 19, 71–86 (2018).
Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).
Jeong, D., Koo, B., Oh, M., Kim, T.-B. & Kim, S. GOAT: Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network for eosinophilic asthma subtype. Bioinformatics 39, btad582 (2023).
Zhuang, Y. et al. Deep learning on graphs for multi-omics classification of COPD. PLoS ONE 18, e0284563 (2023).
Ronen, J., Hayat, S. & Akalin, A. Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Sci. Alliance 2, e201900517 (2019).
Poirion, O. B., Jing, Z., Chaudhary, K., Huang, S. & Garmire, L. X. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 13, 112 (2021).
Athaya, T., Li, X. & Hu, H. A deep learning method to integrate extracelluar miRNA with mRNA for cancer studies. Bioinformatics 40, btae653 (2024).
Wang, J., Liao, N., Du, X., Chen, Q. & Wei, B. A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks. BMC Genomics 25, 86 (2024).
McKiernan, J. et al. A novel urine exosome gene expression assay to predict high-grade prostate cancer at initial biopsy. JAMA Oncol. 2, 882–889 (2016).
Tutrone, R. et al. Clinical utility of the exosome based ExoDx Prostate(IntelliScore) EPI test in men presenting for initial Biopsy with a PSA 2-10 ng/mL. Prostate Cancer Prostatic Dis. 23, 607–614 (2020).
Wang, W.-L. W. et al. Expression of small noncoding RNAs in urinary exosomes classifies prostate cancer into indolent and aggressive disease. J. Urol. 204, 466–475 (2020).
Hyndman, M. E. et al. Development of an effective predictive screening tool for prostate cancer using the ClarityDX machine learning platform. NPJ Digit. Med. 7, 163 (2024).
Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).
Avsec, Ž. et al. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Preprint at bioRxiv https://doi.org/10.1101/2025.06.25.661532 (2025).
He, Y. et al. Generalized biological foundation model with unified nucleic acid and protein language. Nat. Mach. Intell. 7, 942–953 (2025).
de Almeida, B. P. et al. A multimodal conversational agent for DNA, RNA and protein tasks. Nat. Mach. Intell. 7, 928–941 (2025).
Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).
Kozakov, D. et al. The ClusPro web server for protein–protein docking. Nat. Protoc. 12, 255–278 (2017).
Lyskov, S. & Gray, J. J. The RosettaDock server for local protein–protein docking. Nucleic Acids Res. 36, W233–W238 (2008).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Cancer Genome Atlas Research Network, Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Edwards, N. J. et al. The CPTAC Data Portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).
Greenwood, A. K. et al. The AD Knowledge Portal: a repository for multi-omic data on Alzheimer’s disease and aging. Curr. Protoc. Hum. Genet. 108, e105 (2020).
Sauler, M. et al. Characterization of the COPD alveolar niche using single-cell RNA sequencing. Nat. Commun. 13, 494 (2022).
You, S. et al. Integrated classification of prostate cancer reveals a novel luminal subtype with poor outcome. Cancer Res. 76, 4948–4958 (2016).
Chitti, S. V. et al. Vesiclepedia 2024: an extracellular vesicles and extracellular particles repository. Nucleic Acids Res. 52, D1694–D1698 (2024).
Kim, D.-K. et al. EVpedia: a community web portal for extracellular vesicles research. Bioinformatics 31, 933–939 (2015).
Keerthikumar, S. et al. ExoCarta: a web-based compendium of exosomal cargo. J. Mol. Biol. 428, 688–692 (2016).
EV-TRACK Consortium et al. EV-TRACK: transparent reporting and centralizing knowledge in extracellular vesicle research. Nat. Methods 14, 228–232 (2017).
Chen, J. et al. EV-COMM: a database of interspecies and intercellular interactions mediated by extracellular vesicles. J. Extracell. Vesicles 13, e12442 (2024).
Hu, Z. et al. The Cancer Surfaceome Atlas integrates genomic, functional and drug response data to identify actionable targets. Nat. Cancer 2, 1406–1422 (2021).
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Uhlén, M. et al. The human secretome. Sci. Signal. 12, eaaz0274 (2019).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Kelleher, K. J. et al. Pharos 2023: an integrated resource for the understudied human proteome. Nucleic Acids Res. 51, D1405–D1416 (2023).
Knox, C. et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 52, D1265–D1275 (2024).
Maier, A. et al. Drugst.One—a plug-and-play solution for online systems medicine and network-based drug repurposing. Nucleic Acids Res. 52, W481–W488 (2024).
Sadegh, S. et al. Network medicine for disease module identification and drug repurposing with the NeDRex platform. Nat. Commun. 12, 6848 (2021).
Keenan, A. B. et al. The Library of Integrated Network-Based Cellular Signatures NIH program: system-level cataloging of human cells response to perturbations. Cell Syst. 6, 13–24 (2018).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Nusinow, D. P. et al. Quantitative proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387–402 (2020).
Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
Heng, T. S. P., Painter, M. W. & Immunological Genome Project Consortium. The Immunological Genome Project: networks of gene expression in immune cells. Nat. Immunol. 9, 1091–1094 (2008).
Bhattacharya, S. et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci. Data 5, 180015 (2018).
Funding
This research was funded by National Institutes of Health grants R01CA277530 (H.-R.T., Y.Z., V.G.A., J.D.Y. and S.Y.), R01CA255727 (Y.Z., H.-R.T. and S.Y.), R01CA253651 (H.-R.T., V.G.A. and S.Y.), R01CA253651-04S1 (Y.Z., H.-R.T. and S.Y.), R01CA246304 (H.-R.T., V.G.A. and S.Y.) and P01CA278732 (S.Y.), and by the Samuel Oschin Comprehensive Cancer Institute (SOCCI) at Cedars-Sinai Medical Center through a 2024 Program Project Grant (PPG) Team Science Award (S.Y.).
Ethics declarations
Competing interests
The authors declare no conflicts of interest.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the author(s) used Google Gemini to check this manuscript for errors and improve language, such as corrections or rephrasing. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kim, J., Yang, J.D., Agopian, V.G. et al. Computational frameworks for enhanced extracellular vesicle biomarker discovery. Exp Mol Med (2026). https://doi.org/10.1038/s12276-025-01622-x
DOI: https://doi.org/10.1038/s12276-025-01622-x