Fig. 3: Performance gain of generalized forecast models as the size of the volcano ensemble is increased.

Median and 33rd-67th percentile range of AUC for models trained with ensemble sizes ranging from 1 to 20 volcanoes. Ensemble membership is selected randomly from the world pool, whereupon a 25-decision-tree model is trained and then tested on the remaining unselected (out-of-sample) volcanoes in the pool. Performance appears to saturate at an AUC > 0.8 for ensemble sizes larger than 8 volcanoes.
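
The resampling procedure behind this figure can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `ensemble_size_sweep`, the callable `train_and_score` (assumed to fit a 25-tree model on the selected volcanoes and return the AUC on the held-out ones), the number of random draws per ensemble size (`n_repeats`), and the seed are all assumptions, since the caption does not specify them.

```python
import random
import statistics
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def ensemble_size_sweep(
    pool: Sequence[str],
    train_and_score: Callable[[List[str], List[str]], float],
    sizes: Iterable[int] = range(1, 21),
    n_repeats: int = 50,  # assumed; the caption does not state the number of draws
    seed: int = 0,
) -> Dict[int, Tuple[float, float, float]]:
    """Return {ensemble size: (33rd percentile, median, 67th percentile) of AUC}."""
    rng = random.Random(seed)
    volcanoes = list(pool)
    results: Dict[int, Tuple[float, float, float]] = {}
    for k in sizes:
        if k >= len(volcanoes):
            break  # at least one volcano must remain for out-of-sample testing
        aucs = []
        for _ in range(n_repeats):
            # Randomly choose k volcanoes for training; test on all the others.
            train = rng.sample(volcanoes, k)
            test = [v for v in volcanoes if v not in train]
            aucs.append(train_and_score(train, test))
        # 99 cut points; index 32 is the 33rd percentile, index 66 the 67th.
        pct = statistics.quantiles(aucs, n=100, method="inclusive")
        results[k] = (pct[32], statistics.median(aucs), pct[66])
    return results
```

Plotting the median against ensemble size, with the percentile pair as a shaded band, would give a curve of the kind shown in the figure. Passing the model as a callable keeps the sketch independent of any particular tree library.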