Figure 1

Comparing EDeiTs to the previous SOTA. For each dataset, we show the error, i.e., the fraction of misclassified test images (\(1-\text{accuracy}\)). The error of the previous SOTA model is shown in orange. For the ensembles of DeiTs, we show two ways of combining the individual learners: arithmetic averaging (blue) and geometric averaging (purple).
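To make the two combination rules concrete, the following is a minimal sketch. It assumes each ensemble member outputs a vector of class probabilities per image. The caption does not say whether the averages are taken over probabilities or logits, or how they are normalized, so those choices are assumptions. The function names (`arithmetic_average`, `geometric_average`, `predict`, `error_rate`) and the toy numbers are hypothetical and not taken from the paper.

```python
import math


def arithmetic_average(probs):
    """Class-wise arithmetic mean of per-model probability vectors."""
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def geometric_average(probs, eps=1e-12):
    """Class-wise geometric mean, renormalized to sum to 1.

    Computed as exp(mean(log p)); eps (an assumption of this sketch)
    guards against log(0).
    """
    n_models = len(probs)
    n_classes = len(probs[0])
    unnormalized = [
        math.exp(sum(math.log(p[c] + eps) for p in probs) / n_models)
        for c in range(n_classes)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def predict(avg_probs):
    """Index of the class with the highest combined score."""
    return max(range(len(avg_probs)), key=avg_probs.__getitem__)


def error_rate(predictions, labels):
    """Fraction of misclassified images, i.e. 1 - accuracy."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


# Toy example: three models, one image, two classes (made-up numbers).
probs = [[0.99, 0.01], [0.20, 0.80], [0.20, 0.80]]
print(predict(arithmetic_average(probs)))  # class chosen by arithmetic averaging
print(predict(geometric_average(probs)))   # class chosen by geometric averaging
```

In this toy case the two rules disagree. The arithmetic mean follows the two models that moderately favor class 1, while the geometric mean strongly penalizes class 1 because one model assigns it a probability close to zero.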