Table 2 Team scores for artefact class detection and out-of-sample generalization.

From: An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy

| Team name | mAP_d | IoU_d | score_d | mAP_g | IoU_g | dev_g |
|---|---|---|---|---|---|---|
| yangsuhui | 0.3235 | 0.4172 | 0.361 | 0.3187 | 0.0734 | 0.1018 |
| ZhangPY | 0.3117 | 0.4051 | 0.3491 | 0.3518 | 0.0889 | 0.0984 |
| Keisecker | 0.3087 | 0.3997 | 0.3451 | 0.2848 | 0.3902 | 0.0696 |
| VegZhang | 0.3371 | 0.3517 | 0.3429 | 0.3991 | 0.1783 | 0.101 |
| YWa | 0.3842 | 0.2368 | 0.3252 | 0.3746 | 0.1481 | 0.0424 |
| michaelqiyao | 0.3842 | 0.2368 | 0.3252 | 0.3746 | 0.1780 | 0.0742 |
| ilkayoksuz | 0.2719 | 0.3456 | 0.3014 | 0.2974 | 0.0688 | 0.0859 |
| swtnb | 0.2901 | 0.318 | 0.3013 | 0.2914 | 0.2547 | 0.0854 |
| Witt | 0.3148 | 0.2621 | 0.2937 | 0.2897 | 0.1854 | 0.1003 |
| akhanss | 0.2581 | 0.333 | 0.288 | 0.2187 | 0.2262 | 0.0770 |
| XiaokangWang | 0.2621 | 0.3205 | 0.2855 | 0.2515 | 0.2058 | 0.0728 |
| a545306097 | 0.2547 | 0.2719 | 0.2616 | 0.1122 | 0.2244 | 0.1298 |
| nqt52798669 | 0.3068 | 0.1222 | 0.233 | 0.3154 | 0.0871 | 0.0515 |
| ShufanYang | 0.2208 | 0.1955 | 0.2107 | 0.1931 | 0.1365 | 0.0478 |
| xiaohong1 | 0.2416 | 0.3482 | 0.2842 | 0.1764 | 0.2671 | 0.0555 |
| Faster R-CNN (baseline) | 0.2226 | 0.2751 | 0.2436 | 0.2172 | 0.1647 | 0.0893 |
| RetinaNet (baseline) | 0.2135 | 0.2270 | 0.2189 | 0.2499 | 0.1679 | 0.0665 |
| Merged (super baseline) | 0.3331 | 0.3793 | 0.3516 | 0.3433 | 0.2610 | 0.0610 |

  1. Off-the-shelf Faster R-CNN [20] and RetinaNet [16] are reported as baselines (as labeled) for comparison. We also include the performance of a super classifier, denoted 'Merged', constructed by merging the predicted bounding boxes of all participants. Performance evaluated on the detection or out-of-sample generalization dataset is differentiated by the subscript 'd' or 'g', respectively. The detection columns are mAP_d, IoU_d, and score_d; the generalization columns are mAP_g, IoU_g, and dev_g. Teams are listed in decreasing order of score_d. Better methods have higher mAP and IoU and lower dev_g. The top five values for each evaluation metric are shown in bold.
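
The caption does not specify the rule used to fuse the participants' predictions into the 'Merged' super baseline. The sketch below shows one plausible scheme, assuming the fusion works like an ensemble detector: pool every team's boxes and suppress near-duplicates with greedy non-maximum suppression (NMS). The function names, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are illustrative assumptions, not the authors' implementation; the `iou` helper also illustrates the overlap measure behind the IoU_d and IoU_g columns.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_team_predictions(team_boxes, team_scores, iou_thresh=0.5):
    """Pool boxes from all teams and remove duplicates with greedy NMS.

    team_boxes:  list with one (N_i, 4) array per team
    team_scores: list with one (N_i,) confidence array per team
    Returns the kept boxes and their scores.
    """
    boxes = np.concatenate(team_boxes, axis=0)
    scores = np.concatenate(team_scores, axis=0)
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        overlaps = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[overlaps < iou_thresh]   # drop near-duplicate boxes
    return boxes[keep], scores[keep]
```

Any detection-style fusion (e.g., weighted box averaging or per-class voting) could replace the NMS step; greedy NMS is shown only because it is the most common way to deduplicate pooled detections.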