npj Digital Medicine

Table 2 Accuracy of LLM-only and LLM-assisted risk-of-bias assessments

From: Language models for data extraction and risk of bias assessment in complementary medicine

Domains of ROB assessment	LLM-only: No. of correct assessments	LLM-only: No. of total assessments	LLM-only: Correct assessment rate (95% CI)	LLM-assisted: Mean No. of correct assessments per reviewer	LLM-assisted: Mean No. of total assessments per reviewer	LLM-assisted: Correct assessment rate (95% CI)	Rate difference between LLM-assisted and LLM-only ROB assessments (95% CI)
Sequence generation	94	107	87.85% (80.12% to 93.37%)	103	107	96.26% (90.70% to 98.97%)	8.41% (1.25% to 15.57%)
Allocation sequence concealment	103	107	96.26% (90.70% to 98.97%)	105	107	98.13% (93.41% to 99.77%)	1.87% (-2.55% to 6.29%)
Blinding: patients	102	107	95.33% (89.43% to 98.47%)	104	107	97.20% (92.02% to 99.42%)	1.87% (-3.21% to 6.95%)
Blinding: healthcare providers	103	107	96.26% (90.70% to 98.97%)	105	107	98.13% (93.41% to 99.77%)	1.87% (-2.55% to 6.29%)
Blinding: data collectors	101	107	94.39% (88.19% to 97.91%)	103	107	96.26% (90.70% to 98.97%)	1.87% (-3.78% to 7.52%)
Blinding: outcome assessors	103	107	96.26% (90.70% to 98.97%)	103	107	96.26% (90.70% to 98.97%)	0.00% (-5.08% to 5.08%)
Blinding: data analysts	103	107	96.26% (90.70% to 98.97%)	104	107	97.20% (92.02% to 99.42%)	0.93% (-3.83% to 5.70%)
Missing outcome data	107	107	100.00% (96.61% to 100.00%)	106	107	99.07% (94.90% to 99.98%)	-0.93% (-3.49% to 1.62%)
Selective outcome reporting	106	107	99.07% (94.90% to 99.98%)	106	107	99.07% (94.90% to 99.98%)	0.00% (-2.58% to 2.58%)
Other bias	102	107	95.33% (89.43% to 98.47%)	103	107	96.26% (90.70% to 98.97%)	0.93% (-4.44% to 6.31%)
Overall	1024	1070	95.70% (94.31% to 96.84%)	1041	1070	97.29% (96.13% to 98.18%)	1.59% (0.03% to 3.15%)

LLM: large language model. ROB: risk of bias. CI: confidence interval.
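The rate differences in the last column can be approximated from the counts in the table. The sketch below is a minimal illustration, not the authors' documented method: it assumes an unpooled normal-approximation (Wald) interval for the difference of two independent proportions, and the function name `rate_difference_ci` is hypothetical. Under that assumption it reproduces the "Sequence generation" and "Overall" rows to two decimal places. It does not reproduce the "Missing outcome data" row, where one arm is 107/107, so that row appears to have been computed differently.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def rate_difference_ci(correct_only, total_only, correct_assisted, total_assisted, z=Z_95):
    """Return (difference, lower, upper) for assisted rate minus LLM-only rate.

    Assumption: unpooled Wald interval for two independent proportions.
    The source table does not state which interval method was used.
    """
    p_only = correct_only / total_only
    p_assisted = correct_assisted / total_assisted
    diff = p_assisted - p_only
    se = math.sqrt(
        p_only * (1 - p_only) / total_only
        + p_assisted * (1 - p_assisted) / total_assisted
    )
    return diff, diff - z * se, diff + z * se


def as_percent(value):
    """Format a proportion as a percentage with two decimals."""
    return f"{100 * value:.2f}%"


if __name__ == "__main__":
    # Counts taken from the table rows.
    rows = {
        "Sequence generation": (94, 107, 103, 107),
        "Overall": (1024, 1070, 1041, 1070),
    }
    for domain, counts in rows.items():
        diff, low, high = rate_difference_ci(*counts)
        print(f"{domain}: {as_percent(diff)} ({as_percent(low)} to {as_percent(high)})")
```

Only the "Overall" interval and the "Sequence generation" interval exclude zero in the table; every other per-domain interval spans zero, so the per-domain differences are not distinguishable from no difference at this sample size.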
