Nature Medicine

Extended Data Table 7 Fleiss kappa between the 3 runs of each model for test-retest repeatability

From: Comparative benchmarking of the DeepSeek large language model on medical tasks and clinical reasoning
