Fig. 4: Visualizing data distribution shift and dataset size with performance.

Scatter plot of the average central moment discrepancy (CMD) in label space and in feature space for each downstream task, taken across the top-n (i.e., “closest” n) extractors for n = 4. Other values of n were also explored and gave no appreciably different results. Marker size scales with the downstream dataset size, and marker color indicates MoE’s improvement in MAE over STL.
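
For readers unfamiliar with CMD, the following is a minimal sketch of how such a discrepancy could be computed between two sample sets. It assumes the commonly used definition (a scaled difference of means plus scaled differences of higher-order central moments, each measured in the L2 norm). The function name `cmd`, the number of moments, and the value bounds `a` and `b` are illustrative assumptions, not details taken from this work.

```python
import numpy as np

def cmd(x, y, n_moments=5, a=0.0, b=1.0):
    """Central moment discrepancy between samples x and y (hypothetical helper).

    x, y: arrays of shape (n_samples, n_dims); values assumed to lie in [a, b].
    n_moments: highest central moment order to compare (assumed, not from the source).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    span = abs(b - a)

    # First-order term: distance between the per-dimension means.
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    total = np.linalg.norm(mean_x - mean_y) / span

    # Higher-order terms: distance between per-dimension central moments.
    centered_x, centered_y = x - mean_x, y - mean_y
    for k in range(2, n_moments + 1):
        moment_x = (centered_x ** k).mean(axis=0)
        moment_y = (centered_y ** k).mean(axis=0)
        total += np.linalg.norm(moment_x - moment_y) / span ** k

    return float(total)
```

Under this sketch, the per-task value plotted on each axis would be the mean of `cmd(...)` over the n closest extractors, computed once on labels and once on features.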