Fig. 2: Comparison of state-of-the-art architectures applied to 3D aberration sensing.

a–d, Comparison of model variants ConvNeXt-T/S/B/L (blue), ViT/16-S/B (orange) and AOViFT-T/S/B/L/H (gray). a, Total number of trainable parameters. b, Maximum predictions per second, using a batch size of 1,024 on a single A100 GPU. c, Training time on eight H100 GPUs. d, Median λ RMS residuals over 10,000 test samples after one correction, with aberrations ranging from 0.2λ to 0.4λ, simulated with 50,000 to 200,000 integrated photons. e,f, Median λ RMS residuals for the Small variant of the AOViFT model on a single bead over a wide range of signal-to-noise ratios (SNR). g,h, Median λ RMS residuals for the Small variant of the AOViFT model on multiple beads (up to 150), simulated at photon levels from 50,000 to 200,000 per bead. Higher values are better in b; lower values are better for all other performance indicators shown. CDF, cumulative distribution function; KDE, kernel density estimation.