Fig. 2: Scoring of experimental planning assistance questions with and without context.
From: Opportunities for retrieval and tool augmented large language models in scientific facilities

We score the models on relevance, absence of hallucination, and completeness of response.
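
As a minimal sketch of how such per-criterion scores could be tabulated for the with-context and without-context conditions, the snippet below averages each criterion per model and condition. The 0/1 rating scale, the field names, the model name `model_a`, and the example ratings are all illustrative assumptions, not values or methods taken from the paper.

```python
from statistics import mean

# The three criteria named in the caption; the identifiers are our own labels.
CRITERIA = ("relevance", "no_hallucination", "completeness")


def summarize(scores):
    """Average each criterion for every (model, with_context) pair."""
    grouped = {}
    for row in scores:
        key = (row["model"], row["with_context"])
        grouped.setdefault(key, []).append(row)
    return {
        key: {c: mean(r[c] for r in rows) for c in CRITERIA}
        for key, rows in grouped.items()
    }


# Hypothetical ratings on an assumed 0/1 scale (not data from the paper).
example = [
    {"model": "model_a", "with_context": False,
     "relevance": 1, "no_hallucination": 0, "completeness": 1},
    {"model": "model_a", "with_context": False,
     "relevance": 1, "no_hallucination": 1, "completeness": 0},
    {"model": "model_a", "with_context": True,
     "relevance": 1, "no_hallucination": 1, "completeness": 1},
]

summary = summarize(example)
for (model, with_context), averages in sorted(summary.items()):
    print(model, "with context" if with_context else "without context", averages)
```

Keeping the three criteria separate, rather than collapsing them into one number, makes it easier to see which aspect of a response changes when context is supplied.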