Scientific Data

Table 1 Summary of reviewed literature with a focus on dataset split and reported test classification performance.

From: Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

Ref.	OCT dataset	Data split strategy	Model performance on testing set
⁹	Thyroid, parathyroid, fat and muscle samples	per-image	97.12% accuracy
³⁵	Pituitary adenoma	per-image	0.96 AUC
¹⁸	Ophthalmology¹⁵* (version 2)	original split	95.55% accuracy
¹⁹	Ophthalmology¹⁵* (version 2)	original split	99.1% accuracy
²¹	Ophthalmology¹⁵* (version 3)	original split	98.7% accuracy
²²	Ophthalmology¹⁵* (version 3)	original split	96.6% accuracy
²⁷	Ophthalmology¹⁵* (train version 2, test version 3)	original split	99.6% accuracy
²³	(1) Ophthalmology¹⁵* (train version 2, test version 3) (2) Ophthalmology¹⁶*	(1) original split (2) per-volume/subject	(1) 99.80% accuracy (2) 100% accuracy
³⁶	Coronary artery	per-volume/subject	96.05% accuracy
³⁷	Kidney^†	per-volume/subject	82.6% accuracy
³⁸	High and low grade brain tumors	per-volume/subject	97% accuracy
³⁹	Colon**^†	per-volume/subject	88.95% accuracy on 2D images
⁴⁰	Breast tissue	per-volume/subject	91.7% specificity
²⁰	Ophthalmology¹⁵* (version 2)	per-volume/subject	98.46% accuracy
¹⁰	(1) Ophthalmology¹⁵* (version 3) (2) Ophthalmology¹⁶* (3) Breast tissue¹⁴*	(1) original split (2) per-volume/subject (3) per-image	(1) ~96% accuracy (2) >98.8% accuracy (3) 98.8% accuracy
²⁴	(1) Ophthalmology¹⁶* (2) Ophthalmology⁴¹* (3) Ophthalmology⁴²* (4) Ophthalmology¹⁵* (unclear version)	(1) per-volume/subject (2) per-volume/subject (3) per-volume/subject (4) original split	(1) 96.66% accuracy (2) 98.97% accuracy (3) 99.74% accuracy (4) 99.78% accuracy
⁴³	Dentistry	No description given	98% sensitivity 100% specificity
⁴⁴	Ophthalmology	No description given	99.19% accuracy

Open-access datasets and datasets available upon request are marked by * and **, respectively; unmarked datasets are not open-access. Datasets obtained from animal model samples are marked by ^†. Differences in performance between studies using the same dataset result from the different methods implemented.
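
As an illustration of the two split strategies named in the table, the following is a minimal sketch and is not taken from any of the reviewed studies. It assumes each image is stored with a subject (or volume) identifier. The toy data, the function names (split_per_image, split_per_subject, shared_subjects) and the 20% test fraction are hypothetical choices made for this example only.

```python
import random
from collections import defaultdict


def split_per_image(samples, test_fraction=0.2, seed=0):
    """Shuffle individual images, ignoring which subject they come from.

    Images of one subject can end up in both the training and the test set.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_per_subject(samples, test_fraction=0.2, seed=0):
    """Shuffle subjects, then assign all images of a subject to one set."""
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for subject_id, image_id in samples:
        by_subject[subject_id].append((subject_id, image_id))
    subjects = sorted(by_subject)
    rng.shuffle(subjects)
    n_test = max(1, int(round(len(subjects) * test_fraction)))
    train = [s for sid in subjects[n_test:] for s in by_subject[sid]]
    test = [s for sid in subjects[:n_test] for s in by_subject[sid]]
    return train, test


def shared_subjects(train, test):
    """Return the subject IDs that appear in both sets."""
    return {sid for sid, _ in train} & {sid for sid, _ in test}


# Hypothetical toy data: 10 subjects with 20 images each.
samples = [(f"subject_{s}", f"img_{s}_{i}") for s in range(10) for i in range(20)]

train_s, test_s = split_per_subject(samples)
train_i, test_i = split_per_image(samples)

# The per-subject split shares no subjects between sets by construction.
print("per-subject split, shared subjects:", len(shared_subjects(train_s, test_s)))
# The per-image split typically shares subjects between sets.
print("per-image split, shared subjects:", len(shared_subjects(train_i, test_i)))
```

A check like shared_subjects can be run on any existing split, including a dataset's original split, provided subject identifiers are available.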
