Extended Data Table 2 Question answering evaluation datasets for human evaluation

From: Toward expert-level medical question answering with large language models
