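Table 1 Detailed breakdown of the full dataset, comprising the “SEED” and “EXT” datasets, by ground truth class and dataset characteristics (study, device, and geography), showing the number of images and the relative percentage. Within each dataset, percentages are relative to that dataset's column total. In the grand-total row, they are relative to all 41,874 images.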

| Study | Device | Geography | Low Quality (n = 6043), no. (%) | Intermediate (n = 10,877), no. (%) | High Quality (n = 24,954), no. (%) | Total by dataset characteristics (n = 41,874), no. (%) |
|---|---|---|---|---|---|---|
| “SEED” dataset (Model Development and Internal Validation) | | | | | | |
| NHS | Cervigram | Costa Rica | 508 (8.8%) | 0 (0.0%) | 9901 (40.4%) | 10,409 (25.7%) |
| ALTS | Cervigram | USA | 418 (7.2%) | 0 (0.0%) | 4027 (16.4%) | 4445 (11.0%) |
| CVT | DSLR | Costa Rica | 1160 (20.0%) | 1391 (13.6%) | 2104 (8.6%) | 4655 (11.5%) |
| Biop | DSLR | USA | 328 (5.7%) | 826 (8.1%) | 524 (2.1%) | 1678 (4.1%) |
| D Biop | DSLR | Europe | 0 (0.0%) | 423 (4.1%) | 749 (3.1%) | 1172 (2.9%) |
| Itoju | DSLR | Nigeria | 548 (9.5%) | 1139 (11.1%) | 3835 (15.7%) | 5522 (13.6%) |
| Itoju | J5 | Nigeria | 678 (11.7%) | 1469 (14.3%) | 2427 (9.9%) | 4574 (11.3%) |
| Itoju | S8 | Nigeria | 2150 (37.1%) | 4996 (48.8%) | 933 (3.8%) | 8079 (19.9%) |
| SEED total | | | 5790 (100.0%) | 10,244 (100.0%) | 24,500 (100.0%) | 40,534 (100.0%) |
| “EXT” dataset (External Validation) | | | | | | |
| PAVE | IRIS | Cambodia | 116 (45.8%) | 224 (35.4%) | 165 (36.3%) | 505 (37.7%) |
| PAVE | IRIS | DR | 137 (54.2%) | 409 (64.6%) | 289 (63.7%) | 835 (62.3%) |
| EXT total | | | 253 (100.0%) | 633 (100.0%) | 454 (100.0%) | 1340 (100.0%) |
| Grand total by ground truth | | | 6043 (14.4%) | 10,877 (26.0%) | 24,954 (59.6%) | 41,874 (100.0%) |

Overall, the “SEED” dataset comprised 40,534 images and the “EXT” dataset comprised 1340 images, for a combined total of 41,874 images.
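Within each dataset block, the relative percentages in Table 1 are column percentages: each row's count is divided by that dataset's total for the class. The grand-total row instead gives each class as a share of all images. The following minimal Python sketch reproduces those totals and percentages from the counts transcribed above. The dictionary layout and the helper names `column_totals` and `column_percentages` are hypothetical and are used here only for illustration.

```python
# Image counts per (study, device, geography) row, transcribed from Table 1.
# Tuple order: (Low Quality, Intermediate, High Quality).
SEED = {
    ("NHS", "Cervigram", "Costa Rica"): (508, 0, 9901),
    ("ALTS", "Cervigram", "USA"): (418, 0, 4027),
    ("CVT", "DSLR", "Costa Rica"): (1160, 1391, 2104),
    ("Biop", "DSLR", "USA"): (328, 826, 524),
    ("D Biop", "DSLR", "Europe"): (0, 423, 749),
    ("Itoju", "DSLR", "Nigeria"): (548, 1139, 3835),
    ("Itoju", "J5", "Nigeria"): (678, 1469, 2427),
    ("Itoju", "S8", "Nigeria"): (2150, 4996, 933),
}
EXT = {
    ("PAVE", "IRIS", "Cambodia"): (116, 224, 165),
    ("PAVE", "IRIS", "DR"): (137, 409, 289),
}


def column_totals(rows):
    """Hypothetical helper: per-class totals followed by the dataset total."""
    per_class = [sum(counts[i] for counts in rows.values()) for i in range(3)]
    return per_class + [sum(per_class)]


def column_percentages(rows):
    """Hypothetical helper: each row's share (%) of its dataset's column totals."""
    totals = column_totals(rows)
    result = {}
    for key, counts in rows.items():
        cells = list(counts) + [sum(counts)]  # append the row total
        result[key] = [round(100 * c / t, 1) for c, t in zip(cells, totals)]
    return result


seed_totals = column_totals(SEED)  # [5790, 10244, 24500, 40534]
ext_totals = column_totals(EXT)    # [253, 633, 454, 1340]
# Grand totals add the two datasets class by class.
grand = [s + e for s, e in zip(seed_totals, ext_totals)]  # [6043, 10877, 24954, 41874]
# Grand-total percentages are shares of all images, not column percentages.
grand_share = [round(100 * g / grand[-1], 1) for g in grand]  # [14.4, 26.0, 59.6, 100.0]
```

For example, `column_percentages(SEED)[("NHS", "Cervigram", "Costa Rica")]` gives `[8.8, 0.0, 40.4, 25.7]`, which matches the NHS row of the table.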
