Fig. 3: Literature screen experiment results. | npj Digital Medicine

From: Accelerating clinical evidence synthesis with large language models

a Streamlining study screening with TrialMind and a human in the loop. The left panel lists the eligibility criteria suggested by TrialMind, which the user can edit. The right panel shows TrialMind’s criterion-level eligibility assessments for all identified studies. Red, green, and gray fields indicate the assessments “ineligible”, “eligible”, and “unknown”, respectively. b Ranking performance (Recall@20 and Recall@50) across therapeutic areas. The bars on the right show the fold change of TrialMind’s performance relative to the best baseline in that row. c Recall@20 and Recall@50 for TrialMind and selected baselines. d Effect of individual criteria on ranking results. To assess this effect, we remove one criterion at a time from the criteria set, re-rank the results, and measure the change in recall, which reflects that criterion’s impact on ranking quality. Most criteria positively influence the ranking, while a small portion have a negative effect. e Ranking performance for Recall@K with varying K in four topics. Shaded areas indicate 95% confidence intervals.
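The Recall@K metric and the leave-one-criterion-out analysis described in panel d can be illustrated with a minimal sketch. This is not TrialMind’s implementation: the score encoding (+1 eligible, 0 unknown, -1 ineligible), the sum-based aggregation in `rank_studies`, and all function and variable names are assumptions made here for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical numeric encoding of criterion-level assessments
# (an assumption for this sketch, not taken from the paper).
ELIGIBLE, UNKNOWN, INELIGIBLE = 1, 0, -1


def rank_studies(assessments: Dict[str, Dict[str, int]], criteria: List[str]) -> List[str]:
    """Rank study IDs by the sum of their scores over `criteria` (assumed aggregation)."""
    scores = {
        study_id: sum(per_criterion.get(c, UNKNOWN) for c in criteria)
        for study_id, per_criterion in assessments.items()
    }
    # Highest score first; ties are broken by study ID so the ranking is deterministic.
    return sorted(scores, key=lambda study_id: (-scores[study_id], study_id))


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant studies that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for study_id in ranking[:k] if study_id in relevant)
    return hits / len(relevant)


def criterion_effects(
    assessments: Dict[str, Dict[str, int]],
    criteria: List[str],
    relevant: Set[str],
    k: int,
) -> Dict[str, float]:
    """Leave-one-criterion-out change in Recall@k.

    A positive value means removing the criterion lowers recall (the criterion helps);
    a negative value means removing it raises recall (the criterion hurts).
    """
    baseline = recall_at_k(rank_studies(assessments, criteria), relevant, k)
    effects = {}
    for criterion in criteria:
        reduced = [c for c in criteria if c != criterion]
        ablated = recall_at_k(rank_studies(assessments, reduced), relevant, k)
        effects[criterion] = baseline - ablated
    return effects
```

Under these assumptions, calling `criterion_effects(assessments, criteria, relevant, k=20)` returns one value per criterion, whose sign corresponds to the positive or negative influence described in panel d.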
