Table 2 iESPer scores for RCC and KTX

From: Ecologically sustainable benchmarking of AI models for histopathology

RCC Task (iESPer)

| Model | CO2eq/slide (g) | AUROC | 95% CI | Accuracy | 95% CI | Precision | 95% CI | Recall | 95% CI | F1 | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TransMIL | 0.046 | 0.964 | 0.936–0.986 | 0.800 | 0.732–0.868 | 0.736 | 0.634–0.835 | 0.695 | 0.577–0.822 | 0.676 | 0.556–0.798 |
| CLAM | 0.048 | 0.937 | 0.908–0.960 | 0.815 | 0.753–0.874 | 0.833 | 0.764–0.897 | 0.540 | 0.442–0.661 | 0.583 | 0.453–0.717 |
| InceptionV3 | 0.073 | 0.665 | 0.511–0.788 | 0.410 | 0.238–0.580 | 0.451 | 0.258–0.640 | 0.410 | 0.234–0.592 | 0.394 | 0.218–0.584 |
| ViT | 0.065 | 0.693 | 0.593–0.784 | 0.643 | 0.571–0.718 | 0.448 | 0.270–0.654 | 0.393 | 0.316–0.485 | 0.370 | 0.272–0.484 |
| Prov-GigaPath | 0.229 | 0.349 | 0.336–0.360 | 0.303 | 0.279–0.325 | 0.261 | 0.206–0.318 | 0.214 | 0.177–0.259 | 0.219 | 0.175–0.267 |

KTX Task (iESPer)

| Model | CO2eq/slide (g) | AUROC | 95% CI | Accuracy | 95% CI | Precision | 95% CI | Recall | 95% CI | F1 | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TransMIL | 0.046 | 0.579 | 0.501–0.660 | 0.279 | 0.209–0.368 | 0.428 | 0.322–0.533 | 0.265 | 0.200–0.345 | 0.257 | 0.184–0.343 |
| CLAM | 0.048 | 0.547 | 0.472–0.627 | 0.234 | 0.169–0.312 | 0.294 | 0.216–0.385 | 0.236 | 0.171–0.306 | 0.223 | 0.157–0.295 |
| InceptionV3 | 0.073 | 0.391 | 0.334–0.451 | 0.062 | 0.038–0.097 | 0.007 | 0.004–0.011 | 0.093 | 0.093–0.093 | 0.017 | 0.012–0.024 |
| ViT | 0.065 | 0.439 | 0.373–0.511 | 0.068 | 0.037–0.105 | 0.008 | 0.004–0.012 | 0.100 | 0.100–0.100 | 0.019 | 0.011–0.026 |
| Prov-GigaPath | 0.229 | 0.188 | 0.160–0.222 | 0.065 | 0.044–0.089 | 0.110 | 0.062–0.168 | 0.055 | 0.041–0.072 | 0.042 | 0.025–0.062 |

  1. Mean iESPer scores for each metric, with corresponding 95% confidence intervals, for each examined model benchmarked on the RCC and KTX tasks, including the CO2eq measured per slide at inference.
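
The table page does not restate how the 95% confidence intervals were derived. As an illustration only, the sketch below shows one common way to obtain such intervals for a slide-level performance metric such as AUROC: a percentile bootstrap over per-slide predictions. The function name `bootstrap_ci`, the number of resamples, and the synthetic inputs in the usage comment are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical sketch: percentile-bootstrap 95% CI for a slide-level metric.
# This is one common approach, not necessarily the authors' procedure.
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_ci(y_true, y_score, metric=roc_auc_score,
                 n_boot=1000, alpha=0.05, seed=0):
    """Resample slides with replacement; return (mean, lower, upper)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        # Skip degenerate resamples that contain only one class.
        if len(np.unique(y_true[idx])) < 2:
            continue
        stats.append(metric(y_true[idx], y_score[idx]))
    stats = np.asarray(stats)
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stats.mean(), lower, upper


# Usage with placeholder labels/scores (not study data):
# mean, lo, hi = bootstrap_ci(np.array([0, 1, 1, 0, 1]),
#                             np.array([0.2, 0.8, 0.7, 0.4, 0.9]))
```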