Table 1 Results of model architectures for the different learning schemes and label types

From: Real world federated learning with a knowledge distilled transformer for cardiac CT imaging

| Training scheme | Model | HPs & COs [mm], Training | HPs & COs [mm], Other | MS [mm], Training | MS [mm], Other | Calc [Dice], Training | Calc [Dice], Other |
|---|---|---|---|---|---|---|---|
| Local | UNet | 3.48 ± 2.77 | 4.27 ± 2.94 | 3.01 ± 1.84 | 4.30 ± 1.82 | 0.708 ± 0.103 | 0.644 ± 0.290 |
| | ViT | 9.45 ± 11.87 | 14.85 ± 16.35 | 6.86 ± 11.14 | 37.88 ± 33.10 | 0.644 ± 0.184 | 0.474 ± 0.275 |
| | SWIN | 2.66 ± 1.79 | 4.89 ± 4.08 | 3.96 ± 2.19 | 4.06 ± 2.16 | 0.709 ± 0.190 | 0.466 ± 0.265 |
| Fed | UNet | 2.91 ± 2.54 | 3.75 ± 2.38 | 3.27 ± 2.02 | 3.75 ± 1.96 | 0.495 ± 0.209 | 0.391 ± 0.212 |
| | ViT | 4.75 ± 4.17 | 3.71 ± 1.88 | 3.82 ± 2.50 | 5.32 ± 4.98 | 0.671 ± 0.191 | 0.636 ± 0.285 |
| | SWIN | 3.53 ± 2.82 | 3.98 ± 2.05 | 3.03 ± 1.85 | 3.30 ± 1.60 | 0.683 ± 0.202 | 0.692 ± 0.232 |
| FedKD | UNet | 3.54 ± 2.85 | 4.25 ± 2.94 | 2.99 ± 1.81 | 3.40 ± 1.56 | 0.527 ± 0.209 | 0.526 ± 0.228 |
| | ViT | 4.70 ± 4.14 | 3.72 ± 1.88 | 3.28 ± 2.31 | 4.35 ± 2.34 | 0.562 ± 0.200 | 0.566 ± 0.240 |
| | SWIN | 3.04 ± 2.34 | 3.54 ± 2.12 | 2.95 ± 1.72 | 3.29 ± 1.45 | 0.646 ± 0.208 | 0.670 ± 0.231 |

  1. Three model types are investigated: a convolutional UNet, a vision transformer (ViT) for segmentation, and SWIN-UNETR, which uses a sliding-window attention approach that differs from a conventional ViT. All architectures are trained locally per location (Local), federated across labeled subsets (Fed), and with our federated knowledge distillation (FedKD) approach. Results are reported for the locations where the model was trained (Training) and for the remaining locations (Other). All values are presented as mean ± std. Bold values represent the best performing model for each task.
  2. HPs & COs = Hinge Points and Coronary Ostia, MS = Membranous Septum, Calc = Calcification, KD = Knowledge Distillation.
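
As an illustration of how a "mean ± std" Dice entry such as those in the Calc columns could be produced, the following is a minimal sketch. It is not the authors' evaluation code: the helper names `dice_score` and `summarize`, the toy masks, the choice of population standard deviation, and the convention that two empty masks score 1.0 are all assumptions made here for illustration.

```python
from statistics import mean, pstdev


def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def summarize(values):
    """Format per-case values as 'mean ± std' (population std, an assumption)."""
    return f"{mean(values):.3f} ± {pstdev(values):.3f}"


# Toy prediction/reference mask pairs, not data from the study.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 1, 0], [1, 1, 1, 0]),
    ([0, 1, 0, 0], [0, 1, 1, 0]),
]
scores = [dice_score(p, t) for p, t in pairs]
print(summarize(scores))
```

The distance-based columns (HPs & COs, MS) could be summarized with the same `summarize` helper applied to per-case distances in millimetres. Whether the table uses the sample or the population standard deviation is not stated in the source, so `pstdev` here could be swapped for `stdev`.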