Fig. 2: Model evaluation on the validation dataset.

True vs. predicted values for the 3.8M event realizations in the validation subset. Here an event realization is a single log(RotD50) value at a single site for one hypothetical scenario. True values are the log(RotD50) values computed directly from the synthetic seismograms, and predicted values are the ML inferences. Panel a shows the results for the RF algorithm (blue) and panel b for the DNN algorithm (green). From left to right, the columns correspond to the four considered periods, T = 2, 3, 5, and 10 s. Because of the large number of data points, each plot is divided into 100 × 100 cells, and color intensity shows the data count in each cell, with darker hues indicating higher counts. The dashed black line marks a perfect prediction for reference. The predictions are consistent across all models. The averages of the six score metrics over all event realizations are summarized in Supplementary Table 1.
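The density shading described in this caption amounts to a two-dimensional histogram of (true, predicted) pairs. The following is a minimal pure-Python sketch, assuming linear bins over a single range shared by both axes so that perfect predictions fall on the grid diagonal. The function name `density_grid` and these binning choices are illustrative assumptions, not the authors' plotting code.

```python
from typing import List, Sequence, Tuple


def density_grid(true_vals: Sequence[float], pred_vals: Sequence[float],
                 n_bins: int = 100) -> Tuple[List[List[int]], float, float]:
    """Count (true, predicted) pairs in an n_bins x n_bins grid.

    Hypothetical helper: rows index the true value, columns the predicted
    value, and both axes share one range so the 1:1 line is the diagonal.
    """
    if len(true_vals) != len(pred_vals) or not true_vals:
        raise ValueError("need two equally long, non-empty sequences")
    lo = min(min(true_vals), min(pred_vals))
    hi = max(max(true_vals), max(pred_vals))
    # Assumed linear bin width; fall back to 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0
    grid = [[0] * n_bins for _ in range(n_bins)]
    for t, p in zip(true_vals, pred_vals):
        # Clamp so the maximum value lands in the last cell instead of overflowing.
        i = min(int((t - lo) / width), n_bins - 1)
        j = min(int((p - lo) / width), n_bins - 1)
        grid[i][j] += 1
    return grid, lo, hi


if __name__ == "__main__":
    grid, lo, hi = density_grid([0.0, 0.5, 1.0], [0.0, 0.5, 1.0], n_bins=4)
    on_diagonal = sum(grid[k][k] for k in range(len(grid)))
    print(on_diagonal, "of 3 points lie on the perfect-prediction diagonal")
```

Mapping each cell count to a color (darker for higher counts) is then a plotting step; with 3.8M points, a vectorized routine such as a library 2D-histogram function would be the practical choice over this loop.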