Fig. 2: System architecture.
From: Biological data annotation via a human-augmenting AI-based labeling system

a Data pre-processing. Digital images are first passed through a deep learning model (HoverNet, ref. 9), which segments each cell and generates a bounding box for it. The image and bounding boxes are then used in real time via a labeling interface outfitted with two AI models that augment and accelerate expert labeling (b). b Real-time AI augmentation. As annotators provide cellular labels (bright green/blue boxes denote different classes), the labels are stored in a data set and iteratively used to fine-tune a ResNet classification model (ref. 22), pre-trained on the PanNuke cellular data set but initially untrained on the task at hand. As the classifier learns from the annotations, it renders ever more accurate predictions (pastel green/blue boxes) to the annotator. Simultaneously, it feeds high-dimensional feature vector representations of the labeled and unlabeled data to the active learner, which determines the next best patch for an annotator to label. Together, the two models increase the speed of annotation and the quality of the labeled data set.
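
To make the active-learning step concrete, here is a minimal sketch of one way a learner could pick the next patch from feature vectors. It uses farthest-first selection: choose the unlabeled patch whose nearest labeled neighbor in feature space is farthest away. This rule is an assumption for illustration only, since the caption does not state which selection criterion the system uses. The function name `next_patch_to_label` and the toy two-dimensional vectors are hypothetical.

```python
import math


def next_patch_to_label(labeled_feats, unlabeled_feats):
    """Return the index (into unlabeled_feats) of the patch to label next.

    Assumed rule (not taken from the paper): farthest-first selection,
    i.e. the unlabeled patch with the largest distance to its nearest
    labeled patch in feature space.
    """
    if not unlabeled_feats:
        raise ValueError("no unlabeled patches left")
    if not labeled_feats:
        # Nothing labeled yet, so any starting patch is as good as another.
        return 0

    def distance_to_nearest_labeled(vec):
        return min(math.dist(vec, seen) for seen in labeled_feats)

    return max(
        range(len(unlabeled_feats)),
        key=lambda i: distance_to_nearest_labeled(unlabeled_feats[i]),
    )


# Toy feature vectors standing in for the classifier's embeddings.
labeled = [[0.0, 0.0]]
unlabeled = [[0.1, 0.0], [5.0, 5.0], [1.0, 1.0]]
chosen = next_patch_to_label(labeled, unlabeled)
```

In a full loop, the chosen patch would be shown to the annotator, its labels added to the labeled set, the classifier fine-tuned, and the feature vectors recomputed before the next selection.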