Fig. 1: Data preparation workflow.

a Haematoxylin and eosin (H&E)-stained whole slide images (WSIs) were de-stained, after which immunohistochemistry was performed using an anti-phosphorylated histone H3 (pHH3) antibody, which labels mitotic figures (MFs), to produce the initial dataset (STMF-V0). b An initial Mask R-CNN model trained on STMF-V0 was applied to new WSIs to detect candidate MFs, which six pathologists then labelled as MFs or false positives. This process enabled iterative refinement and expansion of the dataset, producing STMF. c The MF masks from STMF and the bounding boxes from four external datasets were refined with the Segment Anything Model (SAM) and integrated with ICPR to create the final dataset. The original and refined masks are shown in yellow and blue, respectively.