Fig. 7: Retrieval-augmented LLMs, accuracy for “expert” prompt.
From: Evaluating search engines and large language models for answering health questions

Each “Top n” bar depicts the performance obtained by feeding the n-th result from Google, for n up to 5. Bars are colored light blue (top 1), navy blue (top 2), light green (top 3), green (top 4), and salmon (top 5). The red dashed line marks the baseline (no retrieval augmentation). Panels: a TREC HM 2020 collection; b TREC HM 2021 collection; c TREC HM 2022 collection.