Fig. 5: Comparison statistics for satellite-based flood extents.
From: Modeling surge dynamics improves coastal flood estimates in a global set of tropical cyclones

a–c Three metrics (higher is better) express the overall performance of the inundation models in reproducing the observed flood extents for 71 storm events. d The ratio of true negative areas (TNR) expresses the share of areas correctly classified as dry (higher is better), an aspect that is not covered by the F1 and F2 performance scores (b and c). e The bias score expresses the tendency of a model to overpredict (positive values) or underpredict (negative values). Note the logarithmic scaling of the y-axis, with linear scaling between −1 and 1. The evaluation can be done either for each flood map separately, resulting in a range of score values for each model (map-by-map score; colored boxes), or across all grid cells contained in all flood maps, resulting in a single performance score for each model (total score; black stars). The boxes denote the interquartile ranges, with a horizontal black line for the median value, whiskers for the 95% intervals, and circles for the minimum and maximum values.
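
The difference between map-by-map and total scores can be illustrated with a minimal sketch. It assumes the textbook definitions of the true negative rate and the F-beta score, which may differ in detail from the definitions used in the article. The `Confusion` container, the function names, and all counts are hypothetical and serve only as illustration. The bias score is left out because the caption does not define it.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Grid-cell counts for one flood map (hypothetical container)."""
    tp: int  # flooded in both model and observation
    fp: int  # flooded in the model only
    fn: int  # flooded in the observation only
    tn: int  # dry in both

    def __add__(self, other):
        # Pooling two maps means adding their cell counts.
        return Confusion(self.tp + other.tp, self.fp + other.fp,
                         self.fn + other.fn, self.tn + other.tn)


def tnr(c):
    """True negative rate: share of observed-dry cells modeled as dry."""
    return c.tn / (c.tn + c.fp)


def f_beta(c, beta):
    """Standard F-beta score; beta=2 weights recall more than precision."""
    b2 = beta * beta
    return (1 + b2) * c.tp / ((1 + b2) * c.tp + b2 * c.fn + c.fp)


# Made-up counts for two flood maps, for illustration only.
maps = [Confusion(tp=80, fp=20, fn=20, tn=880),
        Confusion(tp=5, fp=15, fn=5, tn=75)]

# Map-by-map scores: one value per flood map (the boxes in the figure).
f1_per_map = [f_beta(c, 1) for c in maps]

# Total score: pool all grid cells first, then score once (the stars).
total = maps[0] + maps[1]
f1_total = f_beta(total, 1)
tnr_total = tnr(total)
```

Because the total score pools grid cells before scoring, large flood maps dominate it, whereas every map contributes one equally weighted value to the map-by-map distribution. This is one reason a star can sit away from the median of its box.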