Extended Data Table 6 One-shot evaluation results for 4-billion-parameter LLMs

From: Medical large language models are vulnerable to data-poisoning attacks

  1. Complete results of the open-source benchmark suite for 4-billion-parameter language models in the one-shot setting (one example question/answer pair given as additional context). Results of multiple-choice benchmarks were obtained by aggregating all permutations of each question/answer.
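
The caption does not say how the permutations were aggregated, so the following is only a minimal sketch of one plausible reading: each multiple-choice question is scored over every ordering of its answer options, and the per-question score is the fraction of orderings answered correctly. The names `permuted_accuracy`, `predict`, and `always_first` are hypothetical and do not come from the paper; `predict` stands in for whatever model call returns the index of the chosen option.

```python
from itertools import permutations
from typing import Callable, Sequence


def permuted_accuracy(
    question: str,
    options: Sequence[str],
    correct_index: int,
    predict: Callable[[str, Sequence[str]], int],
) -> float:
    """Fraction of option orderings for which `predict` picks the correct option.

    Assumption: aggregation means averaging correctness over all orderings.
    """
    hits = 0
    total = 0
    for order in permutations(range(len(options))):
        # Present the options in this ordering.
        shuffled = [options[i] for i in order]
        # Position of the correct option after reordering.
        target = order.index(correct_index)
        if predict(question, shuffled) == target:
            hits += 1
        total += 1
    return hits / total


# Hypothetical stand-in for a model: always picks the first listed option.
def always_first(question: str, options: Sequence[str]) -> int:
    return 0


score = permuted_accuracy("Q?", ["A", "B", "C", "D"], 2, always_first)
print(score)  # 0.25: the correct option is listed first in 6 of 24 orderings
```

Under this reading, a position-biased predictor such as `always_first` scores no better than chance, which is the usual motivation for evaluating over all orderings rather than a single fixed one.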