Fig. 3: Clinical validation results (ex vivo).

a Examples of the AiFURS’s stone count detection performance. b Examples of the AiFURS’s stone size detection performance. c–j Correlation (top row) and Bland–Altman (bottom row) plots showing the concordance between the AiFURS’s predictions and manual reference measurements of kidney stone number and size under controlled ex vivo conditions. In the Bland–Altman plots, the upper and lower black dashed lines mark the 95% limits of agreement (LoAs), and the red dashed line marks the bias. Stone number: c Spearman’s r = 0.9698, 95% confidence interval (CI): 0.9597 to 0.9774, p < 0.0001; g bias = −0.1623, 95% LoA: −0.8869 to 0.5623. Stones with a maximum diameter >2 mm: d Spearman’s r = 0.8134, 95% CI: 0.7316 to 0.8721, p < 0.0001; h bias = −0.0011, 95% LoA: −0.7814 to 0.7792. Stones with a maximum diameter of 1–2 mm: e Spearman’s r = 0.3764, 95% CI: 0.1887 to 0.5376, p = 0.0001; i bias = 0.0272, 95% LoA: −0.5108 to 0.5652. Stones with a maximum diameter <1 mm: f Spearman’s r = 0.4728, 95% CI: 0.2994 to 0.6160, p < 0.0001; j bias = 0.0817, 95% LoA: −0.3474 to 0.5108. Abbreviations: AiFURS, artificial intelligence flexible ureteroscopy system.
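
For readers unfamiliar with the Bland–Altman quantities reported above, the following is a minimal sketch of how a bias and 95% LoAs are conventionally computed. It assumes the standard definition (bias = mean of the paired differences, LoA = bias ± 1.96 × sample standard deviation of the differences), which is consistent with the reported limits being symmetric about each bias but is not stated in the caption. The function name `bland_altman`, the sign convention (predicted minus reference), and the sample counts are all hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(predicted, reference, z=1.96):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    Assumption: differences are taken as predicted minus reference, and
    the limits of agreement are bias +/- z * sample SD of the differences.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if len(predicted) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")

    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)    # systematic offset between the two methods
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * spread, bias + z * spread


# Illustrative stone counts only; these values are made up.
predicted = [3, 5, 2, 7, 4, 6]
reference = [3, 6, 2, 7, 5, 6]

bias, lower, upper = bland_altman(predicted, reference)
print(f"bias = {bias:.4f}, 95% LoA: {lower:.4f} to {upper:.4f}")
```

Under this convention, a bias near zero with narrow limits indicates close agreement between the two methods, and the half-width of the interval is 1.96 times the standard deviation of the differences.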