Figure 2

Comparison between three-model ensembles based on CNNs and on DeiTs on the ZooLake test set. The bar heights indicate how often each combination (RRR, RRW, RWW, WWW) appeared. RRR indicates that all three models gave the right answer, RRW that exactly one model gave a wrong answer, and so on. The number below each bar gives the height of the bar. In panel (a) we consider the whole test set, in panel (b) only the examples that were correctly classified by the arithmetic ensemble average, and in panel (c) only those correctly classified by the geometric ensemble average.
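The counting behind the bars can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each model outputs a vector of class probabilities per example, that the arithmetic ensemble averages these probabilities, and that the geometric ensemble averages their logarithms. The function names, the `eps` smoothing constant, and the toy data are all hypothetical.

```python
import numpy as np

def combination_counts(probs, labels):
    """Count how many examples fall in each right/wrong combination.

    probs:  array of shape (n_models, n_examples, n_classes)
    labels: array of shape (n_examples,)
    """
    n_models = probs.shape[0]
    preds = probs.argmax(axis=2)                      # per-model predictions
    n_wrong = (preds != labels).sum(axis=0)           # wrong models per example
    counts = np.bincount(n_wrong, minlength=n_models + 1)
    names = ["R" * (n_models - k) + "W" * k for k in range(n_models + 1)]
    return dict(zip(names, counts.tolist()))

def arithmetic_prediction(probs):
    # Assumed definition: argmax of the mean probability over models.
    return probs.mean(axis=0).argmax(axis=1)

def geometric_prediction(probs, eps=1e-12):
    # Assumed definition: argmax of the geometric mean, computed via mean log.
    return np.log(probs + eps).mean(axis=0).argmax(axis=1)

# Toy data (hypothetical): 3 models, 4 examples, 2 classes.
labels = np.array([0, 0, 1, 1])
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3], [0.8, 0.2]],
    [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.6, 0.4]],
    [[0.7, 0.3], [0.4, 0.6], [0.6, 0.4], [0.9, 0.1]],
])

whole = combination_counts(probs, labels)             # analogue of panel (a)
arith_ok = arithmetic_prediction(probs) == labels
panel_b = combination_counts(probs[:, arith_ok], labels[arith_ok])
geom_ok = geometric_prediction(probs) == labels
panel_c = combination_counts(probs[:, geom_ok], labels[geom_ok])
```

Restricting to the examples that the ensemble classifies correctly, as in panels (b) and (c), shows how many ensemble successes come from cases where one or more individual models were wrong.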