Fig. 2

Comparative validation of AI-generated and expert-assessed scores. (A) Pre–AI-assisted quality control (QC), showing the agreement between scores generated by an AI-based scoring model and expert-assessed scores. Cases in which the AI-generated score and the human expert score differed by more than ten points (highlighted in red) were re-examined by trained experts. (B) Post–AI-assisted QC, showing improved agreement between AI-generated scores and expert-corrected scores after the expert re-evaluation. In both panels, ‘predicted scores’ refer to scores generated by an AI-based RCFT scoring model, and ‘ground truth scores’ refer to expert-assessed scores in (A) and expert-corrected scores in (B).
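
The flagging rule described in the caption can be illustrated with a minimal sketch. It assumes the rule is a strict comparison on the absolute difference between the two scores; the function name `flag_for_review`, the `threshold` parameter, and the example values are hypothetical and are not taken from the study.

```python
def flag_for_review(predicted, ground_truth, threshold=10.0):
    """Mark cases whose absolute score gap exceeds the threshold.

    Assumption: "greater than ten points" is read as a strict inequality
    on |predicted - ground_truth|, so a gap of exactly ten is not flagged.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("score lists must have the same length")
    return [abs(p - g) > threshold for p, g in zip(predicted, ground_truth)]


# Hypothetical scores, for illustration only (not data from the study).
predicted = [30.0, 12.5, 25.0, 8.0]
ground_truth = [31.0, 24.0, 15.0, 8.5]

flags = flag_for_review(predicted, ground_truth)
flagged_indices = [i for i, flagged in enumerate(flags) if flagged]
print(flagged_indices)
```

Under this reading, only the flagged cases would be returned to trained experts for re-examination, and panel (B) would then compare the model's scores with the corrected values.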