Extended Data Fig. 2: MedMNIST training convergence.

From: Overcoming data scarcity in biomedical imaging with a foundational multi-task model

a, Architecture comparison between ResNet-50 (ref. 70) and the Swin Transformer in its "tiny" variant (ref. 33), evaluated on combined 2D and 3D multi-task training. b, Comparison of training schemes for the Swin Transformer tiny architecture. "Traditional SGD" used the SGD optimizer without momentum and without gradient accumulation; "Traditional Adam" used the same setting but with the Adam optimizer. "Balanced" added 12 gradient accumulation steps to the traditional Adam setting. "Cyclic" systematically sampled each task exactly once per update step, identical to the method used to train UMedPT. Averaged across five independent experiments, the standard deviation of validation accuracy over the last 10 epochs was 1.81 ± 1.79% for balanced sampling and 1.17 ± 1.09% for cyclic sampling (mean ± s.d.).
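The distinction between the "balanced" and "cyclic" schemes can be illustrated with a minimal sketch. This is not the authors' code: the task names, toy model, and batch generator below are hypothetical placeholders, and only the sampling logic reflects the caption's description (random task draws with gradient accumulation versus exactly one batch per task per optimizer update).

```python
# Minimal sketch (assumed setup, not the UMedPT implementation) contrasting
# "balanced" sampling with gradient accumulation against "cyclic" sampling,
# where every task contributes exactly one batch per optimizer update.
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
random.seed(0)

# Toy multi-task setup: one shared trunk, one linear head per task.
# Task names are hypothetical placeholders.
tasks = ["task_2d_cls", "task_3d_cls", "task_seg"]
trunk = nn.Linear(16, 8)
heads = {t: nn.Linear(8, 2) for t in tasks}
params = list(trunk.parameters()) + [p for h in heads.values() for p in h.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def toy_batch():
    # Stand-in for a real per-task DataLoader batch.
    return torch.randn(4, 16), torch.randint(0, 2, (4,))

def balanced_step(accum_steps=12):
    # "Balanced": Adam plus 12 gradient-accumulation steps over randomly
    # drawn task batches, so per-update task coverage is stochastic.
    optimizer.zero_grad()
    for _ in range(accum_steps):
        task = random.choice(tasks)
        x, y = toy_batch()
        loss = loss_fn(heads[task](trunk(x)), y)
        (loss / accum_steps).backward()  # accumulate averaged gradients
    optimizer.step()

def cyclic_step():
    # "Cyclic": each task is sampled exactly once before the update,
    # as the caption describes for the UMedPT training scheme.
    optimizer.zero_grad()
    for task in tasks:  # deterministic pass over every task
        x, y = toy_batch()
        loss = loss_fn(heads[task](trunk(x)), y)
        (loss / len(tasks)).backward()
    optimizer.step()

for _ in range(3):
    balanced_step()
    cyclic_step()
```

Because the cyclic scheme guarantees every task appears in every update, the gradient signal per step is more uniform, which is consistent with the lower run-to-run variability reported for cyclic sampling (1.17 ± 1.09%) relative to balanced sampling (1.81 ± 1.79%).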
