Fig. 1: Comparison of accuracy between Moonshot-v1-128k and Claude-3.5-sonnet in extracting data and assessing ROB. | npj Digital Medicine

From: Language models for data extraction and risk of bias assessment in complementary medicine

This figure compares the accuracy of two language models, Claude-3.5-sonnet and Moonshot-v1-128k, in data extraction and risk-of-bias (ROB) assessment across multiple domains in 107 randomized controlled trials (RCTs). In data extraction, Claude-3.5-sonnet achieved higher overall accuracy (96.2%, 95% CI: 95.8% to 96.5%) than Moonshot-v1-128k (95.1%, 95% CI: 94.7% to 95.5%), a statistically significant difference of 1.1 percentage points (p < 0.001). In ROB assessment, Claude-3.5-sonnet also achieved slightly higher accuracy (96.9% vs. 95.7%), though this difference was not statistically significant. The largest domain-specific difference was in the Baseline Characteristics domain of data extraction, where Claude-3.5-sonnet outperformed Moonshot-v1-128k.
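
As a rough illustration of how an accuracy figure with a 95% CI and a p-value for a difference in accuracy can be obtained, the sketch below uses a normal-approximation (Wald) interval and a pooled two-proportion z-test. The caption does not state which interval or test the authors used, so both are assumptions, and the function names and item counts are hypothetical rather than taken from the study.

```python
import math


def wald_ci(correct, total, z=1.96):
    """Return (accuracy, lower, upper) using a normal-approximation 95% CI.

    Assumption: a Wald interval; the source does not say which method was used.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def two_proportion_p_value(correct_a, total_a, correct_b, total_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration, not the study's data.
    total_items = 10_000
    model_a = wald_ci(9_620, total_items)
    model_b = wald_ci(9_510, total_items)
    p_value = two_proportion_p_value(9_620, total_items, 9_510, total_items)
    print("model A: %.3f (%.3f to %.3f)" % model_a)
    print("model B: %.3f (%.3f to %.3f)" % model_b)
    print("p-value for the difference: %.5f" % p_value)
```

Because both models were evaluated on the same items, a paired test such as McNemar's may be a better fit than the unpaired test sketched here; the intent is only to show how the reported quantities relate to raw counts of correct items.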
