Fig. 2: LLM Cell type annotation performance.
From: Benchmarking cell type and gene set annotation by large language models with AnnDictionary

LLM cell type annotation quality as compared to manual annotation, rated by an LLM at three levels: 1) perfect, 2) partial, and 3) non-matching, and two resolutions: (A) cells and (B) by cell type. Inter-rater reliability measured as pairwise kappa between each LLM (C). Mean and (D) Standard deviation. All metrics are shown as mean and standard deviation across five replicates. Source data are provided as a Source Data file.