Fig. 2: A bootstrapping-driven pipeline for estimating the uncertainty in a machine-annotated categorical data variable V.

The crucial component is the test set, which compares human expert annotation (the ground truth) against machine predictions (or those of human coders). This comparison provides an estimate of annotator accuracy and class confusion within the variable, which can then be used to bootstrap confidence intervals for the statistic of interest.
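
One way such a pipeline might look in practice is sketched below; it is an illustration under stated assumptions, not necessarily the exact procedure in the figure. It assumes integer class labels 0..n_classes-1, takes the proportion of one class as the statistic of interest, and uses add-one smoothing on the confusion counts. The function name `bootstrap_ci` and all of its parameters are hypothetical.

```python
import numpy as np

def bootstrap_ci(test_true, test_pred, machine_labels, target, n_classes,
                 n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the share of class `target` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    test_true = np.asarray(test_true)
    test_pred = np.asarray(test_pred)
    machine_labels = np.asarray(machine_labels)

    # counts[p, t]: test items predicted as class p whose expert label is t
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (test_pred, test_true), 1)
    # Assumption: add-one smoothing, so every row is a valid distribution
    # even for classes that never appear among the test predictions.
    probs = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)

    n = len(machine_labels)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the machine-annotated data with replacement.
        sample = rng.choice(machine_labels, size=n, replace=True)
        pred_counts = np.bincount(sample, minlength=n_classes)
        # For each predicted class, draw plausible true labels
        # from the estimated P(true | predicted).
        true_counts = np.zeros(n_classes)
        for p in range(n_classes):
            true_counts += rng.multinomial(pred_counts[p], probs[p])
        stats[b] = true_counts[target] / n

    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stats.mean(), lo, hi

# Toy usage: 8 test items, 100 machine-annotated items, two classes.
est, lo, hi = bootstrap_ci(
    test_true=[0, 0, 0, 1, 1, 1, 0, 1],
    test_pred=[0, 0, 1, 1, 1, 1, 0, 0],
    machine_labels=[0] * 60 + [1] * 40,
    target=1, n_classes=2,
)
print(f"share of class 1: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

This sketch treats the estimated confusion probabilities as fixed; resampling the test set inside the loop as well would additionally propagate the uncertainty of the accuracy estimate itself.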