Fig. 1: Question-level and concept-wise accuracy of different models and approaches.

a OncoLLM outperforms most of the prominent LLMs in criteria/question-level answering accuracy. The first column, All, shows question-level accuracy across all 720 questions in the Q&A dataset for oncology-related clinical trials. The second column, Without N/A samples, shows question-level accuracy after removing the questions whose answers were marked 'N/A' by medical experts. * Human accuracy was obtained on only 109 questions, which were annotated by two medical experts. ** Meditron and TrialLLAMA could process only ≈30% of samples because of their small context window of 4k tokens. b OncoLLM (in red) performs consistently well across all the relevant oncology-related concepts.