Fig. 6: Detailed dataset distributions for ten cases and the corresponding micro-F1 performance on HPH-50 without kNN-based inference. Across the cases, the number of samples in the incomplete raw EMR dataset, the proportion of noisy to clean-labelled data, and the noise ratio in the labelled dataset are each adjusted.
From: Optimising the paradigms of human AI collaborative clinical coding

The numbers and proportions at the top quantify the size and distribution of the three datasets with different noise levels. The textures indicate the noise level in each of the three dataset types.