Background & Summary

The White Blood Cell (WBC) differential test is the second most commonly conducted hematological examination, providing crucial clinical insights1. It involves the manual enumeration of 200 WBCs by two experts using a peripheral blood smear stained with the Romanowsky procedure2. However, this process is labor-intensive and time-consuming because the experts must confirm each WBC subtype manually3. Over the past few decades, automated digital microscopy systems, such as digital morphology analyzers, have been seamlessly integrated into laboratory workflows, significantly enhancing operational efficiency4. Nonetheless, verification and manual review remain necessary due to regulatory requirements and risk management in hematology laboratories3,5,6.

While machine learning research has successfully improved the accuracy and efficiency of automated digital microscopy7,8,9,10, further significant improvements depend heavily on the availability of large, high-quality datasets. Some authors have developed and published models based on small private datasets11,12. However, it has been argued that such small datasets cannot capture the complexities of real-world data13. To address this, large-scale datasets for WBC classification have recently been published and made publicly available10,11,12,13,14, aiming to support more robust and generalizable machine learning models. A summary comparing these datasets with ours is shown in Table 1. Some of the datasets were prepared and imaged manually11,12,13,14, while others, including ours10,15, were prepared and imaged automatically.

Table 1 WBC open dataset comparison.

Although not highlighted by other works, high-magnification imaging using brightfield microscopy often results in a limited depth of field (DoF)16,17,18. This constraint is challenging for automated digital microscopy because multiple images in a z-stack are needed to fully image a sample or cell. For example, a sample thickness of 3 to 4 µm requires at least seven focal planes at an axial step of 400 to 500 nm on a microscope with a DoF of 0.5 µm17. Multiple images are then needed to fully reconstruct the image using fusion techniques16,17,19,20, which lowers throughput and increases data storage, both of which can be problematic in automated digital microscopy. As highlighted in Table 1, publicly available WBC classification datasets do not contain focal-stack data, which could be useful in addressing the problems of limited DoF and multi-focus imaging.
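As a rough illustration of the plane-count arithmetic above, the following sketch assumes an endpoint-inclusive coverage formula (one plane at each end of the sample thickness); the cited works may count planes differently.

```python
import math

def planes_needed(thickness_um: float, step_um: float) -> int:
    """Minimum number of focal planes to cover a sample of the given
    thickness at the given axial step, with a plane at each end.
    This counting convention is an illustrative assumption."""
    return math.ceil(thickness_um / step_um) + 1

# 3 um thick sample, 0.5 um (500 nm) step -> 7 planes,
# consistent with the example in the text
print(planes_needed(3.0, 0.5))  # -> 7
```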

This paper presents the details of our newly released comprehensive, multi-focus dataset with granular labels for WBC classification. Our dataset contains 25,773 image stacks from 72 patients. The image labels consist of 18 classes encompassing normal and abnormal cells, with two experts reviewing all labels. This significant contribution aims to aid various research areas, primarily in improving classification models and focus stacking. With 10 z-stack images, researchers can leverage various techniques to fully utilize the information from each z-image. For example, our multi-focus dataset can help reduce labeling effort and mitigate human bias when using weakly supervised learning approaches21. Additionally, our dataset can be used to construct image fusion models based on machine learning17,19,20, test model robustness by simulating real application scenarios similar to the referenced study22, explore and research focus algorithms23,24,25,26, and augment model training with techniques akin to defocus blur augmentation, which is known to produce more robust models compared to those without augmentation27,28,29,30.

Methods

Peripheral blood samples were collected from 72 Asan Medical Center (AMC) patients in Seoul, Republic of Korea (Institutional Review Board approval number: S2019-2564-0004). The samples were stored in ethylenediaminetetraacetate (EDTA) tubes, and only residual samples remaining after testing were used to prepare the dataset. Because these residual samples did not contain identifiable personal information, the IRB waived the requirement for written informed consent. For the same reason, we could retrieve normal and abnormal cases based on WBC classes only. As shown in Table 2, 57 samples contained abnormal WBCs, while 15 samples contained only normal WBCs.

Table 2 The number of blood samples and corresponding prepared slides from patients.

Sample preparation, which included smearing and staining, was performed using miLab, a fully automated staining and digital imaging platform developed by Noul, Co., Ltd.31. A drop of 4 to 5 microliters of whole blood was inserted into the cartridge, equipped with a thin plastic film for smearing. After smearing, the blood cells were fixed, and the blood film was stained using a newly developed hydrogel-based staining method incorporated into the cartridge32. From 72 whole blood samples, a total of 214 slides were prepared to acquire the data (Table 2).

Data acquisition

Images were captured using miLab, which features a motorized stage, a 50X lens, and a digital camera. When a stained blood smear was ready for imaging, the device began imaging fields containing white blood cells. The motorized stage moved until the camera detected a WBC; once detected, a stack of ten multi-focus images was acquired with a step interval of approximately 400 nm. The device continued capturing WBCs until a specified count was reached.

Data annotation

Figure 1 schematically describes the data processing for annotation. Each field of view (FoV) containing WBCs was imaged at ten focal planes (a z-stack) using miLab. Bounding-box information was then generated to locate the WBCs, and each bounding box was used to crop the z-stack images. After cropping, the stack of 10 images was concatenated to form a single image for annotation. Two medical technologists primarily participated in determining the WBC subtypes by examining all z-stack images, guided by the Clinical & Laboratory Standards Institute (CLSI)2. Initially, a medical technologist with approximately two years of experience at a university hospital identified the WBC subtypes. These annotations were then reviewed by a technologist with around 16 years of experience. In cases of disagreement, the cells were reexamined by additional experts with over 20 years of experience. An image was excluded from annotation if the stain quality was too poor or the bounding box extraction was inadequate for confirmation.

Fig. 1
figure 1

The process of multi-focus image data acquisition. A stack of 10 images is captured, and a U-net-based segmentation model is used on the best-focus image determined by the Laplacian filter. Then, the bounding box is extracted to find the location of the WBC. Experts examine all z-stacks to confirm the subtype.

Data Records

The dataset is publicly available on the figshare data repository15. It comprises 257,730 cropped WBC images (10 × 25,773) and two CSV files (“labels.csv” and “slide_number.csv”). Each cropped image is 200 × 200 pixels. Image names include the stack number; for example, “100_4.jpg” refers to cell image number 100 with z-stack number 4, where z-stack numbers range from 0 to 9. The file “labels.csv” contains the bounding box location within each cropped image and the WBC subtype label for each image crop in the same folder. Each row includes the image name, the top-left x and y coordinates of the bounding box, its width and height, and a WBC subtype label. The second file, “slide_number.csv”, maps each cell image to the slide from which it was obtained; each slide contains multiple cells, and there are 214 slides in total, as shown in Table 2.
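The naming scheme above can be handled with two small helpers; the function names here are ours for illustration and not part of the released dataset.

```python
def stack_filenames(cell_id: int, n_planes: int = 10) -> list:
    """Filenames for one cell's z-stack, e.g. cell 100 -> 100_0.jpg .. 100_9.jpg."""
    return [f"{cell_id}_{z}.jpg" for z in range(n_planes)]

def parse_filename(name: str):
    """Split a name such as '100_4.jpg' into (cell_id, z_index)."""
    stem = name.rsplit(".", 1)[0]          # drop the .jpg extension
    cell, z = stem.split("_")              # cell number, z-stack number
    return int(cell), int(z)
```

With these, iterating over the rows of “labels.csv” (image name, bounding box, subtype label) and collecting the full z-stack for each cell is straightforward.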

The dataset contains 18 classes, including segmented neutrophil, banded neutrophil, eosinophil, basophil, lymphocyte, monocyte, abnormal lymphocyte, metamyelocyte, myelocyte, promyelocyte, blast, immature WBCs, giant platelet, aggregated platelet, smudge, broken cell, nucleated red blood cell, unknown WBC, and artifact. The detailed statistics of the dataset are listed in Table 3.

Table 3 Multi-focus WBC dataset information table.

Additional sets of images and CSV files are provided in a folder named “validation” to help readers reproduce the technical validation related to Fig. 2. These include label files similar to those in the open dataset; however, labels are only available for the WBC subtypes Neutrophil, Lymphocyte, Basophil, Monocyte, Eosinophil, and Others. Additionally, a CSV file named “cbc_result.csv” provides the ground truth described in the Technical Validation section.

Fig. 2
figure 2

Calculated R2 scores between miLab expert classification and manual microscope are shown. Five normal WBC subtypes (neutrophil, lymphocyte, monocyte, eosinophil, basophil) and other immature WBCs are listed. The x-axis corresponds to the percentage of WBC cells of a specific subtype that exist in a ground truth slide. The y-axis represents the percentage of WBCs that miLab detected and classified by our experts. The axes are in logarithmic scale to better visualize data points.

Technical Validation

Technical validation of our dataset consists of two components. In the first, we demonstrate that the distribution of miLab expert classifications corresponds to the distribution of cells provided by AMC as the ground truth. This validates our method by showing that the slide and annotation quality, including miLab preparation and expert classification, matches the gold-standard method employed at the hospital. In the second, we verify the accuracy of our labels by training a classification model on our dataset and re-examining the labels of the misclassified images.

To validate the annotation quality of the dataset, we obtained 40 additional blood samples from AMC that are not included in the main dataset. The ground truth for the distribution of WBC subtypes was provided by AMC. Samples underwent a Complete Blood Count (CBC) on an XN-Series analyzer (XN-20, Sysmex, Kobe, Japan), and additional manual differential counts were conducted if any flags appeared. These manual differential counts followed the CLSI guidelines2. The cell types included were Blast, Myelocyte, Metamyelocyte, Band neutrophil, Segmented neutrophil, Lymphocyte, Monocyte, Eosinophil, Basophil, Neoplastic lymphocyte, Reactive lymphocyte, Atypical lymphocyte, and nRBC. Certain cell types were combined for analysis: Band neutrophil and Segmented neutrophil were grouped as Neutrophil, while Blast, Myelocyte, Metamyelocyte, Neoplastic lymphocyte, Reactive lymphocyte, and Atypical lymphocyte were grouped as Others.
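The grouping described above can be expressed as a simple mapping; the dictionary and function names are ours, but the groups follow the text exactly (classes not listed, such as nRBC, pass through unchanged).

```python
# Fine-grained class -> grouped class, as described in the text
GROUPS = {
    "Band neutrophil": "Neutrophil",
    "Segmented neutrophil": "Neutrophil",
    "Blast": "Others",
    "Myelocyte": "Others",
    "Metamyelocyte": "Others",
    "Neoplastic lymphocyte": "Others",
    "Reactive lymphocyte": "Others",
    "Atypical lymphocyte": "Others",
}

def group_counts(counts: dict) -> dict:
    """Collapse fine-grained differential counts into the grouped classes."""
    out = {}
    for cls, n in counts.items():
        g = GROUPS.get(cls, cls)  # unmapped classes (e.g. nRBC) pass through
        out[g] = out.get(g, 0) + n
    return out
```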

Using the same blood samples, we then used miLab to automatically smear and stain slides and detect WBCs. The detected cells were classified by our two experts, and the distribution of the outputs was compared to the ground truth from AMC. Figure 2 shows that the miLab-prepared slides with expert classification and the manual microscopic ground truth from AMC generally correlate highly (R2 > 0.9). However, monocytes and basophils show lower values of 0.836 and 0.428, respectively. The low correlation for basophils is likely due to the low number of basophils in each sample. For monocytes, we currently have no clear explanation for the lower correlation; however, the same general trend has been observed in other published papers33,34.
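For readers reproducing the comparison from the “validation” folder, a minimal sketch of one common R2 definition is given below; it treats the identity line y = x as the model, which is a frequent choice for method-comparison plots. Whether this exact definition matches the one used for Fig. 2 is our assumption.

```python
import numpy as np

def r2_against_identity(ground_truth, measured) -> float:
    """Coefficient of determination of `measured` vs. `ground_truth`,
    using y = x as the model (an assumed, common method-comparison choice)."""
    x = np.asarray(ground_truth, dtype=float)
    y = np.asarray(measured, dtype=float)
    ss_res = np.sum((y - x) ** 2)          # residuals around the identity line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variance of the measurements
    return float(1.0 - ss_res / ss_tot)
```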

A classification model using only the single best-focused image per cell was developed to confirm overall labeling correctness. The best-focused image was selected from the stack of 10 images using a variance-based edge filter similar to the Laplacian filter35, under the assumption that an image is best viewed when its edges are strongest. Each input WBC was cropped from the context image and upsampled to 200 × 200 pixels; the resize ratio was also used as an input feature. The dataset was randomly split into training, validation, and test sets with a ratio of 70%, 20%, and 10%, respectively, and a fine-tuned ResNet34 model yielded an acceptable test accuracy of over 90%. To double-check label accuracy, we then used the trained model to find difficult examples and re-examined them with our domain experts to ensure they were not mislabeled.
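The best-focus selection step can be sketched as follows, using the classic variance-of-Laplacian sharpness score; this is a plain NumPy illustration of the idea, not our production implementation.

```python
import numpy as np

# 3x3 Laplacian kernel (symmetric, so correlation equals convolution)
LAPLACIAN = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

def laplacian_variance(img: np.ndarray) -> float:
    """Variance of the Laplacian response: higher means sharper edges."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for di in range(3):                    # valid 2-D convolution, no SciPy
        for dj in range(3):
            out += LAPLACIAN[di, dj] * img[di:di + h - 2, dj:dj + w - 2]
    return float(out.var())

def best_focus_index(stack) -> int:
    """Index of the sharpest image in a z-stack."""
    return int(np.argmax([laplacian_variance(im) for im in stack]))
```

Defocused planes blur edges and therefore lower this score, so the argmax picks the plane closest to focus.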

The label-correctness experiments revealed several common misclassifications when a single focused image per cell was evaluated. First, cells labeled as immature WBCs tended to be confused with blasts. Additionally, we found a high confusion rate between lymphocytes and variant lymphocytes. Finally, we observed some confusion among neutrophils, metamyelocytes, and myelocytes. Interestingly, these are similar to the mistakes junior microscopists tend to make.

Usage Notes

The dataset consists of images in JPEG format and label files in CSV format, so no special software is required to use it. The fine-grained class labels allow the dataset to be reformulated to fit users' specific needs; see Table 3 for the list of available classes. The multiple focal planes per WBC may also help improve classification performance in machine learning.