Table 4 Summary of the IDH mutation status classification performance of different models trained on real or synthetic MRI data

From: Categorical and phenotypic image synthetic learning as an alternative to federated learning

| Experiments | Training data | Metrics | UCSF | NYU | UWM | UPenn | UTSWp2 | Overall | McNemar Test |
|---|---|---|---|---|---|---|---|---|---|
| Centralized Training | Real TCGA + Real UTSW + Real EGD | ACC | 96.2 | 94.5 | 95.9 | 97.9 | 94.5 | 96.2 | - |
| | | SEN (MT) | 88.4 | 87.5 | 77.8 | 72.7 | 88.0 | 86.5 | |
| | | SPE (WT) | 98.2 | 97.0 | 97.5 | 98.5 | 97.4 | 98.0 | |
| | | AUC | 0.980 | 0.966 | 0.969 | 0.978 | 0.992 | 0.979 | |
| FL using the FedAvg algorithm with 100 FL rounds | Real TCGA + Real UTSW + Real EGD | ACC | 95.6 | 92.8 | 95.4 | 97.9 | 95.1 | 95.8 | χ²(1) = 2.45, p = 0.1175 |
| | | SEN (MT) | 89.3 | 85.4 | 77.8 | 63.6 | 92.0 | 87.0 | |
| | | SPE (WT) | 97.2 | 95.5 | 97.0 | 98.8 | 96.5 | 97.4 | |
| | | AUC | 0.983 | 0.968 | 0.961 | 0.975 | 0.988 | 0.980 | |
| Centralized Training using Synthetic Samples | Synthetic TCGA + Synthetic UTSW + Synthetic EGD | ACC | 94.8 | 92.8 | 95.0 | 98.1 | 94.5 | 95.5 | χ²(1) = 1.31, p = 0.2531 |
| | | SEN (MT) | 83.5 | 91.7 | 88.9 | 72.7 | 88.0 | 86.1 | |
| | | SPE (WT) | 97.7 | 93.2 | 95.5 | 98.8 | 97.4 | 97.2 | |
| | | AUC | 0.972 | 0.952 | 0.962 | 0.960 | 0.962 | 0.966 | |

  1. Statistical comparisons of classification results among methods are performed using the McNemar test. All tests are two-sided, and no adjustments are applied.
  2. Sensitivity (SEN) and specificity (SPE) correspond to the accuracy of the mutated (MT) and wild-type (WT) classes, respectively.
  3. ACC accuracy, AUC area under the receiver operating characteristic curve, EGD Erasmus Glioma Database, NYU New York University, TCGA The Cancer Genome Atlas, UCSF University of California San Francisco Preoperative Diffuse Glioma MRI dataset, UPenn University of Pennsylvania glioblastoma cohort, UTSW University of Texas Southwestern Medical Center, UWM University of Wisconsin–Madison, UTSWp2 University of Texas Southwestern Medical Center part 2.
  4. Source data are provided as a Source Data file.
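
The quantities defined in the footnotes can be illustrated with a minimal Python sketch. It computes sensitivity, specificity and accuracy from binary labels (mutated = 1, wild-type = 0) and a McNemar statistic from the paired predictions of two models. The function names and toy labels are hypothetical and are not taken from the study. The table does not state whether a continuity correction was used, so the sketch makes it optional and leaves it off by default; this is an assumption, not the authors' procedure.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SEN, SPE, ACC) as fractions; `positive` marks the mutated class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sen = tp / (tp + fn)          # accuracy on the mutated (MT) class
    spe = tn / (tn + fp)          # accuracy on the wild-type (WT) class
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sen, spe, acc


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def mcnemar(y_true, pred_a, pred_b, continuity=False):
    """McNemar chi-square test on the discordant pairs of two classifiers.

    Assumption: the asymptotic chi-square form is used; `continuity`
    toggles the optional continuity correction.
    """
    b = c = 0
    for truth, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == truth, pb == truth
        if a_ok and not b_ok:
            b += 1                # only model A is correct
        elif b_ok and not a_ok:
            c += 1                # only model B is correct
    if b + c == 0:
        return 0.0, 1.0           # no discordant pairs, no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    return stat, chi2_1df_pvalue(stat)


# Toy example with made-up labels (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 1]
pred_b = [1, 0, 0, 0, 0, 0, 1, 1]

sen, spe, acc = sensitivity_specificity(y_true, pred_a)
stat, p_value = mcnemar(y_true, pred_a, pred_b)
print(f"SEN={sen:.3f} SPE={spe:.3f} ACC={acc:.3f}")
print(f"McNemar chi2(1)={stat:.2f}, p={p_value:.4f}")
```

As a consistency check on the p-value helper, `chi2_1df_pvalue(2.45)` evaluates to roughly 0.1175, which agrees with the value reported in the FL row of the table.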