Table 2. Extraction accuracy by LLM (n = 50; prompt C).
| LLM | Accuracy (%) | 95% Confidence Interval |
|---|---|---|
| gpt-4.1 | 93.33% | [86.11%, 97.48%] |
| gpt-4.1-mini | 93.33% | [86.11%, 97.48%] |
| gpt-4o-mini | 91.67% | [84.03%, 96.34%] |
| o3 | 96.33% | [88.08%, 98.67%] |
| o4-mini | 95.00% | [88.08%, 98.67%] |