Table 1 Multitask morphology assessment

From: Federated task-adaptive learning for personalized selection of human IVF-derived embryos

| Task | Cohort | Local | FedAvg | FedProx | FedDWA | FedEmbryo | Centralized |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Day 1 Pronuclear (AUC) | Internal test sets | 0.64 (0.54, 0.74) | 0.74 (0.64, 0.83) | 0.73 (0.65, 0.82) | 0.75 (0.68, 0.82) | 0.76 (0.68, 0.84) | 0.73 (0.69, 0.75) |
| Day 1 Pronuclear (AUC) | External test sets | 0.60 (0.56, 0.63) | 0.67 (0.63, 0.71) | 0.66 (0.61, 0.69) | 0.68 (0.65, 0.71) | 0.69 (0.65, 0.72) | 0.70 (0.66, 0.73) |
| Day 3 Symmetry (AUC) | Internal test sets | 0.75 (0.67, 0.81) | 0.79 (0.73, 0.86) | 0.80 (0.73, 0.85) | 0.82 (0.76, 0.87) | 0.87 (0.81, 0.91) | 0.84 (0.81, 0.86) |
| Day 3 Symmetry (AUC) | External test sets | 0.69 (0.65, 0.72) | 0.79 (0.75, 0.81) | 0.77 (0.74, 0.80) | 0.80 (0.77, 0.82) | 0.83 (0.81, 0.85) | 0.84 (0.82, 0.86) |
| Day 3 Fragmentation rate (PCC) | Internal test sets | 0.60 (0.27, 0.82) | 0.81 (0.57, 0.93) | 0.81 (0.54, 0.92) | 0.82 (0.57, 0.93) | 0.83 (0.61, 0.93) | 0.84 (0.61, 0.93) |
| Day 3 Fragmentation rate (PCC) | External test sets | 0.57 (0.20, 0.80) | 0.79 (0.51, 0.92) | 0.79 (0.51, 0.92) | 0.80 (0.53, 0.92) | 0.82 (0.58, 0.93) | 0.83 (0.57, 0.93) |
| Day 3 Number of cells (PCC) | Internal test sets | 0.41 (0.03, 0.72) | 0.75 (0.44, 0.90) | 0.74 (0.42, 0.89) | 0.75 (0.44, 0.90) | 0.80 (0.53, 0.92) | 0.81 (0.56, 0.92) |
| Day 3 Number of cells (PCC) | External test sets | 0.42 (0.03, 0.73) | 0.75 (0.43, 0.90) | 0.74 (0.43, 0.90) | 0.76 (0.47, 0.90) | 0.81 (0.56, 0.93) | 0.81 (0.55, 0.92) |
| Day 5 Blastocyst formation (AUC) | Internal test sets | 0.68 (0.65, 0.71) | 0.80 (0.77, 0.82) | 0.80 (0.77, 0.82) | 0.82 (0.79, 0.85) | 0.86 (0.84, 0.89) | 0.86 (0.84, 0.88) |
| Day 5 Blastocyst formation (AUC) | External test sets | 0.58 (0.52, 0.65) | 0.65 (0.57, 0.73) | 0.63 (0.56, 0.70) | 0.68 (0.60, 0.75) | 0.74 (0.67, 0.81) | 0.72 (0.65, 0.80) |

  1. The table compares the performance of different approaches across internal and external test sets for multitask morphology assessment. The approaches include the Local scenario (a model trained solely on a single client, representing lower-bound performance), federated baselines (FedAvg, FedProx, and FedDWA), our approach (FedEmbryo), and the Centralized scenario (a model trained centrally on all clients, representing upper-bound performance). The evaluation metrics are the AUC and PCC scores. Bold indicates the best results. The results are presented with the 95% confidence intervals.
  2. AUC, area under the curve; PCC, Pearson’s correlation coefficient.
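
The table reports each AUC and PCC score with a 95% confidence interval but does not state how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over test samples (an assumption for illustration, not the study's confirmed procedure), the two metrics and an interval of this kind could be computed as follows. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / math.sqrt(vx * vy)


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return float("nan")  # undefined when one class is missing
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(a, b), resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([a[i] for i in idx], [b[i] for i in idx])
        if not math.isnan(value):  # skip resamples where the metric is undefined
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Toy data, made up for illustration only (not from the study).
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5]
point = auc(labels, scores)
low, high = bootstrap_ci(auc, labels, scores)
print(f"AUC {point:.2f} ({low:.2f}, {high:.2f})")
```

The same `bootstrap_ci` helper accepts `pearson` in place of `auc` for the regression-style tasks (fragmentation rate and number of cells). Wide intervals, such as those in the PCC rows, are what a resampling approach would typically yield on small test sets, although the table itself does not report sample sizes.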