Extended Data Fig. 4: Supplemental analysis of the confidence score.
From: Evaluation of large language models for discovery of gene set function

a, Distribution of confidence scores (n = 300) assigned by GPT-4, with confidence-level thresholds set according to the shape of the distribution. “High confidence” (red): 0.87–1.00; “Medium confidence” (blue): 0.82–0.86; “Low confidence” (dark orange): 0.01–0.81; “Name not assigned” (gray): 0. b, Scatter plot of naming accuracy versus GPT-4 self-assessed confidence score for real gene sets drawn from GO (points, n = 100). Accuracy is estimated as the semantic similarity between the name proposed by GPT-4 and the real GO term name. The best-fit regression line is shown in dark gray. The correlation coefficient (R) is a Pearson correlation; the two-sided p-value is shown.
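
The binning in panel a and the correlation in panel b can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `confidence_level` and `pearson_r` are hypothetical, and the caption does not say how a score falling between two listed ranges (for example 0.865) would be labeled, so the sketch assumes each range extends down to its lower bound. The p-value is not computed here.

```python
from math import sqrt


def confidence_level(score: float) -> str:
    """Map a confidence score in [0, 1] to the four labels used in panel a."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score == 0:
        return "Name not assigned"
    if score >= 0.87:
        return "High confidence"
    # Assumption: values between listed ranges fall into the lower band's
    # neighbour above it, i.e. anything in [0.82, 0.87) is "Medium".
    if score >= 0.82:
        return "Medium confidence"
    return "Low confidence"


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

For panel b, `xs` would hold the self-assessed confidence scores and `ys` the semantic-similarity values for the same gene sets. A two-sided p-value could then be obtained from a statistics library rather than by hand.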