Fig. 2: Inter-rater reliability across scenarios for each LLM. | BDJ Open

From: Assessing the power of AI: a comparative evaluation of large language models in generating patient education materials in dentistry

Radar chart comparing Fleiss' kappa scores of four large language models (LLMs) across four scenarios. Each axis represents one scenario, with Fleiss' kappa plotted radially from the center. Each model is drawn as its own line. Values closer to the outer edge indicate stronger inter-rater agreement.
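
For readers unfamiliar with the statistic on the radial axis, the following is a minimal sketch of how Fleiss' kappa is conventionally computed from a table of rating counts. It is not the authors' analysis code. The function name `fleiss_kappa` and the input layout (one row per rated item, one column per rating category, each cell holding the number of raters who chose that category) are assumptions made for illustration, and the example counts are invented rather than taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings fall in a single category, so kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented example: 4 items, 3 raters, 2 categories (not data from the study).
example_counts = [
    [3, 0],
    [0, 3],
    [2, 1],
    [1, 2],
]
example_kappa = fleiss_kappa(example_counts)
```

A value of 1 corresponds to complete agreement among raters, while values near 0 indicate agreement no better than chance, which is why points nearer the outer edge of the chart represent stronger agreement.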
