Fig. 8: RAG experiments with 3 passages injected.
From: Evaluating search engines and large language models for answering health questions

Accuracy of the LLMs as the number of correct passages among the three injected varies (0/3, 1/3, 2/3, or 3/3). On each panel, GPT-4 is represented by a blue line, Llama3 by an orange line, MedLlama3 by a green line, ChatGPT by a red line, and text-davinci-002 by a purple line. Panel a shows results for the TREC HM 2020 collection, panel b for the TREC HM 2021 collection, and panel c for the TREC HM 2022 collection.