Fig. 3: Prediction performance of model and human.

a Prediction performance of the model. The table reports the AUC, specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV) of LLNM-Net across the three datasets. Source data are provided as a Source Data file. b Prediction performance in the comparative test between radiologists and LLNM-Net. Radiologists performed well in the malignant classification test but poorly in the LLNM classification test. In contrast, LLNM-Net performed better on the same test dataset. Source data are provided as a Source Data file. c ROCs of LLNM-Net on the training set, validation set, and external test sets, as well as the predictive performance of senior and junior radiologists. The AUC results are presented as mean values, and 95% confidence intervals are derived from n = 100 experimental replicates for each task setting. In each replicate trial, patient input data are drawn by bootstrap sampling from the real dataset. We used a two-sample, two-sided, unadjusted Kolmogorov-Smirnov (KS) test for goodness of fit to compare the distributions of predicted values of radiologists and LLNM-Net. Raincloud plots with violin and box diagrams show the comparison of individual-level prediction probabilities between the radiologists (Doctor raincloud plot, mean accuracy of 108 radiologists, n = 200) and LLNM-Net (LLNM-Net raincloud plot, n = 200, KS = 0.385, P < 1 × 10⁻¹²). Each boxplot includes a box representing the median value and interquartile range (IQR). The whiskers extend from the box to the maximum and minimum values, with their length not exceeding 1.5 times the IQR. Red indicates LLNM-positive samples, and blue indicates LLNM-negative samples. Source data are provided as a Source Data file.
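
The statistical procedure described for panel c (bootstrap replicates for the AUC interval, and a two-sample KS statistic) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc`, `bootstrap_auc`, and `ks_statistic`, the percentile-based interval, the random seed, and the toy data are all assumptions made for illustration. Only the KS statistic is computed here; a P value could be obtained with an existing routine such as `scipy.stats.ks_2samp`.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_replicates=100, alpha=0.05, seed=0):
    """Mean AUC and a percentile interval over bootstrap replicates.
    The percentile method is an assumption; the caption does not say
    how the 95% interval was derived from the replicates."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_replicates:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip replicates that contain a single class
        aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    mean = sum(aucs) / len(aucs)
    lo = aucs[int((alpha / 2) * (len(aucs) - 1))]
    hi = aucs[int(round((1 - alpha / 2) * (len(aucs) - 1)))]
    return mean, lo, hi


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    # Toy data only (hypothetical, not from the study).
    rng = random.Random(1)
    labels = [rng.randint(0, 1) for _ in range(200)]
    model_scores = [0.3 * y + 0.7 * rng.random() for y in labels]
    reader_scores = [0.1 * y + 0.9 * rng.random() for y in labels]
    print("AUC mean and interval:", bootstrap_auc(labels, model_scores))
    print("KS statistic:", ks_statistic(reader_scores, model_scores))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a few hundred cases; a sort-based rank computation would be preferable for larger cohorts.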