Extended Data Table 3 Confidence assessment by GPT-4 versus human

From: Evaluation of large language models for discovery of gene set function

We asked a human reviewer to read GPT-4's proposed name and supporting analysis text for 25 gene sets and to independently assign each one a confidence level of high or medium; the reviewer was blinded to GPT-4's own confidence assessment. The table shows the agreement between the human and GPT-4 confidence assessments. Significance was determined using a two-sided Fisher's exact test.
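The significance test named in the legend can be sketched for a 2×2 agreement table as follows. This is a minimal illustration, not the authors' analysis code. It assumes a layout with rows for human confidence (high, medium) and columns for GPT-4 confidence (high, medium), and the function name `fisher_exact_two_sided` is hypothetical. No counts from the table are reproduced here. The two-sided p-value is computed the conventional way: it is the sum of the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 contingency table.

    table = [[a, b], [c, d]]. The assumed layout is: rows = human confidence
    (high, medium), columns = GPT-4 confidence (high, medium).
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # with row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    # Feasible range for the top-left cell given the fixed margins.
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

In practice, `scipy.stats.fisher_exact(table, alternative="two-sided")` performs the same test and also returns the odds ratio. The hand-written version above only makes the computation explicit.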