Fig. 1 | Scientific Data

Fig. 1

From: TrialBench: Multi-Modal AI-Ready Datasets for Clinical Trial Prediction

Overview of TrialBench. (a) TrialBench comprises 23 AI-ready clinical trial datasets for 8 well-defined tasks: clinical trial duration forecasting, patient dropout rate prediction, serious adverse event prediction, all-cause mortality event prediction, trial approval outcome prediction, trial failure reason identification, eligibility criteria design, and drug dose finding. For each task, we extracted appropriate multi-modal variables and prediction targets from ClinicalTrials.gov, implemented evaluation metrics, and constructed a multi-modal model that both assesses dataset quality and serves as a baseline. We integrate drug SMILES strings, textual descriptions (e.g., eligibility criteria), Medical Subject Headings (MeSH) terms, disease ICD-10 codes, and other categorical or numerical features as up to five distinct modalities. The multi-modal model processes these modalities with message-passing neural networks (MPNNs), Bio-BERT, a MeSH embedding layer, the Graph-based Attention Model (GRAM), and DANet basic blocks, respectively. (b) We present the trial failure reason identification task (a classification task) as an illustrative example, showing the input features, prediction target, baseline model, and some of the evaluation metrics. TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative; Prec: Precision.
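
The abbreviations in panel (b) can be tied together with a small worked example. The following is a minimal sketch, not the TrialBench implementation: the names `ConfusionCounts`, `confusion_counts`, and `precision` are hypothetical, the labels are made up for illustration, and treating one failure reason as the positive class (one-vs-rest) is an assumption about how a multi-class task might be scored.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of the four outcomes for one positive class (hypothetical helper)."""
    tp: int  # true positives: predicted positive, actually positive
    fp: int  # false positives: predicted positive, actually negative
    tn: int  # true negatives: predicted negative, actually negative
    fn: int  # false negatives: predicted negative, actually positive


def confusion_counts(y_true, y_pred, positive):
    """Tally TP/FP/TN/FN, treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def precision(counts):
    """Prec = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    denom = counts.tp + counts.fp
    return counts.tp / denom if denom else 0.0


# Made-up labels: each entry is a failure reason for one trial.
y_true = ["enrollment", "safety", "enrollment", "enrollment", "efficacy", "safety"]
y_pred = ["enrollment", "enrollment", "enrollment", "safety", "efficacy", "safety"]

counts = confusion_counts(y_true, y_pred, positive="enrollment")
prec = precision(counts)
```

Repeating the tally with each failure reason as the positive class in turn gives per-class precision, which could then be averaged if a single score is wanted.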
