Fig. 2: Relative performance of generalized and tailored machine learning forecasts on out-of-sample volcano data and benchmarked against seismic intensity (RSAM) models.

ROC (Receiver Operating Characteristic) curves and AUC (Area Under the ROC Curve) quantify a model’s ability to discriminate pre-eruptive signals from the non-eruptive background. Subplots (a–c) show the performance of the three generalized forecasters (magmatic, phreatic, and world pools). Subplots (d–f) compare the performance of tailored and generalized forecasters for Bezymianny, Whakaari, and Copahue. Diagonal dashed lines show a reference random model with no predictive skill (AUC = 0.5); models with higher AUC have greater predictive skill. Subplots (g–i) show the relative frequency of forecast values in pre-eruptive and non-eruptive windows over the 10-year record at Whakaari (Fig. S2); greater separation between the two distributions indicates better predictive skill. Subplot (j) shows the same frequency plot as (g–i) but for 14 volcanoes in the phreatic pool.
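How an ROC curve and its AUC relate to the separation of the two forecast-value distributions in (g–j) can be illustrated with a minimal sketch. It assumes each time window has a single scalar forecast value and a binary label (pre-eruptive or non-eruptive). The function names `roc_points`, `trapezoid_area`, and `roc_auc` are hypothetical and are not the authors' pipeline. Sweeping an alarm threshold traces the ROC curve. The area under that curve equals the probability that a randomly chosen pre-eruptive window receives a higher forecast than a randomly chosen non-eruptive one, with ties counted as one half. Identical distributions therefore give AUC = 0.5, which corresponds to the diagonal random model.

```python
def roc_points(pre_eruptive, non_eruptive):
    """(FPR, TPR) pairs from sweeping an alarm threshold over all observed values.

    Hypothetical helper: an alarm is raised when forecast value >= threshold.
    """
    thresholds = sorted(set(pre_eruptive) | set(non_eruptive), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every value: no alarms at all
    for t in thresholds:
        tpr = sum(v >= t for v in pre_eruptive) / len(pre_eruptive)  # hit rate
        fpr = sum(v >= t for v in non_eruptive) / len(non_eruptive)  # false-alarm rate
        points.append((fpr, tpr))
    return points  # final threshold is the minimum value, so the last point is (1, 1)


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def roc_auc(pre_eruptive, non_eruptive):
    """AUC as the probability that a pre-eruptive value beats a non-eruptive one.

    Ties count half. This equals the trapezoidal area under the ROC curve.
    """
    if not pre_eruptive or not non_eruptive:
        raise ValueError("both classes must be non-empty")
    wins = 0.0
    for p in pre_eruptive:
        for n in non_eruptive:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pre_eruptive) * len(non_eruptive))


if __name__ == "__main__":
    # Made-up forecast values, for illustration only.
    pre = [0.9, 0.8, 0.4]
    non = [0.1, 0.3, 0.5, 0.2]
    print(roc_auc(pre, non))                          # 11/12, about 0.917
    print(trapezoid_area(roc_points(pre, non)))       # same value via the curve
    print(roc_auc([0.2, 0.6], [0.2, 0.6]))            # identical distributions: 0.5
```

The pairwise loop is quadratic in the number of windows. It is chosen here for transparency rather than speed; a rank-based computation gives the same value more efficiently on long records.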