Fig. 4
From: Scalable deep learning artificial intelligence histopathology slide analysis and validation

The graph shows the macro F-score versus overlap \(\varphi\), averaged over test runs across all data. Beyond an overlap of 30%, the model's predictions can become unstable because it overfits the minority classes. An overlap of \(\varphi = 0.3\) generally provides the best stability while maximizing the F-score.
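
For readers unfamiliar with the metric on the vertical axis, the following is a minimal sketch of how a macro F-score is commonly computed: the per-class F1 scores are averaged with equal weight, so minority classes count as much as majority ones. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the study, and the averaging details used for the figure may differ.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but the true class was t
            fn[t] += 1   # missed an instance of class t
    scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not data from the study)
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the mean, a drop in performance on a rare class lowers the macro F-score noticeably, which is why the metric is sensitive to the minority-class overfitting described in the caption.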