Table 1 Detailed breakdown of the full dataset, comprising “SEED” and “EXT”, by ground truth class and dataset characteristics (study, device, and geography), showing both the number of images and the relative percentage.

From: Generalizable deep neural networks for image quality classification of cervical images

Dataset characteristics: Study, Device, Geography. Ground truth categories: Low Quality, Intermediate, High Quality. Each cell gives # images as no. (%).

| Study | Device | Geography | Low Quality (n = 5902) | Intermediate (n = 10,553) | High Quality (n = 24,704) | Grand total by dataset characteristics (n = 41,159) |
| --- | --- | --- | --- | --- | --- | --- |
| “SEED” dataset (Model Development and Internal Validation) | | | | | | |
| NHS | Cervigram | Costa Rica | 508 (8.8%) | 0 (0.0%) | 9901 (40.4%) | 10,409 (25.7%) |
| ALTS | Cervigram | USA | 418 (7.2%) | 0 (0.0%) | 4027 (16.4%) | 4445 (11.0%) |
| CVT | DSLR | Costa Rica | 1160 (20.0%) | 1391 (13.6%) | 2104 (8.6%) | 4655 (11.5%) |
| Biop | DSLR | USA | 328 (5.7%) | 826 (8.1%) | 524 (2.1%) | 1678 (4.1%) |
| D Biop | DSLR | Europe | 0 (0.0%) | 423 (4.1%) | 749 (3.1%) | 1172 (2.9%) |
| Itoju | DSLR | Nigeria | 548 (9.5%) | 1139 (11.1%) | 3835 (15.7%) | 5522 (13.6%) |
| Itoju | J5 | Nigeria | 678 (11.7%) | 1469 (14.3%) | 2427 (9.9%) | 4574 (11.3%) |
| Itoju | S8 | Nigeria | 2150 (37.1%) | 4996 (48.8%) | 933 (3.8%) | 8079 (19.9%) |
| Total | | | 5790 (100.0%) | 10,244 (100.0%) | 24,500 (100.0%) | 40,534 (100.0%) |
| “EXT” dataset (External Validation) | | | | | | |
| PAVE | IRIS | Cambodia | 116 (45.8%) | 224 (35.4%) | 165 (36.3%) | 505 (37.7%) |
| PAVE | IRIS | DR | 137 (54.2%) | 409 (64.6%) | 289 (63.7%) | 835 (62.3%) |
| Total | | | 253 (100.0%) | 633 (100.0%) | 454 (100.0%) | 1340 (100.0%) |
| Grand total by ground truth, no. (%) | | | 6043 (14.4%) | 10,877 (26.0%) | 24,954 (59.6%) | 41,874 (100.0%) |

  1. Overall, the “SEED” dataset comprised a total of 40,534 images, while the “EXT” dataset comprised a total of 1340 images.
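
The percentages in Table 1 appear to be column-wise: each count is divided by the total for its ground truth category within the same dataset ("SEED" or "EXT"). The following is a minimal sketch of that arithmetic, not the authors' code. The counts are copied from the table; the names `SEED`, `EXT`, `column_totals`, and `column_percentages` are hypothetical and chosen only for illustration.

```python
# Image counts per (study, device, geography) row, copied from Table 1.
# Tuple order: low quality, intermediate, high quality.
# All names below are illustrative assumptions, not from the source.
SEED = {
    ("NHS", "Cervigram", "Costa Rica"): (508, 0, 9901),
    ("ALTS", "Cervigram", "USA"): (418, 0, 4027),
    ("CVT", "DSLR", "Costa Rica"): (1160, 1391, 2104),
    ("Biop", "DSLR", "USA"): (328, 826, 524),
    ("D Biop", "DSLR", "Europe"): (0, 423, 749),
    ("Itoju", "DSLR", "Nigeria"): (548, 1139, 3835),
    ("Itoju", "J5", "Nigeria"): (678, 1469, 2427),
    ("Itoju", "S8", "Nigeria"): (2150, 4996, 933),
}
EXT = {
    ("PAVE", "IRIS", "Cambodia"): (116, 224, 165),
    ("PAVE", "IRIS", "DR"): (137, 409, 289),
}


def column_totals(rows):
    """Sum each ground truth column and append the overall total."""
    low, mid, high = (sum(col) for col in zip(*rows.values()))
    return low, mid, high, low + mid + high


def column_percentages(rows):
    """Express each row as a percentage of its column total (one decimal)."""
    totals = column_totals(rows)
    result = {}
    for key, counts in rows.items():
        # Append the row total so it is compared against the dataset total.
        with_total = (*counts, sum(counts))
        result[key] = tuple(
            round(100 * count / total, 1)
            for count, total in zip(with_total, totals)
        )
    return result


seed_totals = column_totals(SEED)
ext_totals = column_totals(EXT)
# Grand totals are the element-wise sum of the two datasets.
grand_totals = tuple(s + e for s, e in zip(seed_totals, ext_totals))

if __name__ == "__main__":
    print("SEED totals:", seed_totals)
    print("EXT totals:", ext_totals)
    print("Grand totals:", grand_totals)
    for key, pct in column_percentages(SEED).items():
        print(key, pct)
```

Under these assumptions, the computed totals reproduce the "Total" and "Grand total by ground truth" rows (40,534 and 1340 images, 41,874 combined). The n values printed in the column headers (5902; 10,553; 24,704; 41,159) do not equal these sums. The sketch makes no attempt to reconcile that difference.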