Fig. 4: Pathology and radiology imaging data processing pipelines implemented in HONeYBEE. | npj Digital Medicine


From: HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings


A Whole-slide image processing: (i) Tissue samples are digitized into gigapixel WSIs using slide scanners. (ii) Stain normalization is applied to reduce inter-slide variability caused by differences in hematoxylin, eosin, or DAB staining; methods include the Reinhard, Vahadane, and Macenko techniques. (iii) The source WSI undergoes preprocessing, including tissue segmentation and filtering. (iv) Tissue regions are divided into smaller, information-rich patches for analysis. (v) A pretrained tissue-detector model classifies regions as slide background, tissue, or noise to identify high-quality tissue areas. (vi) Grid-based patch extraction is performed over valid tissue regions. (vii) A pretrained embedding extraction model processes the tissue patches to generate fixed-length vectors. These embeddings, along with metadata, are stored in a structured database for downstream AI applications.

B Radiological image processing: (i) Radiology images are acquired and ingested in DICOM or NIfTI format. (ii) Spatial standardization and resampling are applied to harmonize voxel spacing and orientation. (iii) Denoising and artifact-reduction methods, such as non-local means or deep learning-based techniques, improve signal quality. (iv) Segmentation models isolate anatomical structures or lesions. (v) Intensity normalization ensures consistency across patients and scanners. (vi) Preprocessed images are passed through an embedding extraction model, and the resulting feature vectors are stored with metadata.

This standardized pipeline enables downstream analysis such as classification, retrieval, and prognosis modeling across both pathology and radiology imaging modalities.
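Step A(ii), stain normalization, can be illustrated with a minimal sketch of Reinhard-style mean/variance matching. The original Reinhard method operates in LAB color space; this dependency-free version matches statistics directly in RGB for brevity, so it is an approximation of the technique named in the caption, not HONeYBEE's implementation:

```python
import numpy as np

def reinhard_normalize_rgb(source, target):
    """Match per-channel mean/std of `source` to `target` (simplified
    Reinhard sketch: statistics are matched in RGB rather than LAB)."""
    src = source.astype(np.float64)
    tgt = target.astype(np.float64)
    out = np.empty_like(src)
    for c in range(3):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-8
        t_mu, t_sd = tgt[..., c].mean(), tgt[..., c].std()
        # Shift/scale the source channel to the target's statistics.
        out[..., c] = (src[..., c] - s_mu) / s_sd * t_sd + t_mu
    return np.clip(out, 0, 255).astype(np.uint8)
```

Production normalizers (Macenko, Vahadane) instead decompose optical density into stain-specific components before matching, which handles H&E/DAB mixtures more faithfully.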
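Steps A(iv)-(vi), tissue filtering and grid-based patch extraction, can be sketched as follows. The near-white background heuristic stands in for the pretrained tissue-detector model the caption describes, and the `patch_size`, `min_tissue_frac`, and `white_thresh` parameters are illustrative assumptions, not HONeYBEE defaults:

```python
import numpy as np

def extract_tissue_patches(slide, patch_size=64, min_tissue_frac=0.5,
                           white_thresh=220):
    """Grid-based patch extraction over valid tissue regions (sketch).

    A pixel counts as tissue when its mean RGB intensity falls below
    `white_thresh` (slide background on stained scans is near-white).
    Patches with tissue fraction below `min_tissue_frac` are discarded.
    """
    h, w, _ = slide.shape
    tissue = slide.mean(axis=2) < white_thresh  # boolean tissue mask
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            frac = tissue[y:y + patch_size, x:x + patch_size].mean()
            if frac >= min_tissue_frac:
                patches.append(slide[y:y + patch_size, x:x + patch_size])
                coords.append((y, x))
    return patches, coords
```

The retained patches, together with their `(y, x)` grid coordinates, are what a downstream embedding model would consume to produce the fixed-length vectors described in step A(vii).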
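Steps B(ii) and B(v) of the radiology pipeline can be sketched with nearest-neighbor resampling to a target voxel spacing plus per-volume z-score intensity normalization. Real pipelines typically resample with SimpleITK or scipy.ndimage using linear or spline interpolation; the nearest-neighbor index mapping here is a simplification:

```python
import numpy as np

def resample_volume(vol, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3-D volume to `target_spacing` (nearest-neighbor sketch).

    `spacing` is the source voxel size per axis, e.g. in millimeters.
    New voxel i along an axis sits at physical position i * target, which
    maps back to source index i * target / source.
    """
    spacing = np.asarray(spacing, dtype=float)
    target = np.asarray(target_spacing, dtype=float)
    new_shape = np.round(np.array(vol.shape) * spacing / target).astype(int)
    idx = [np.minimum((np.arange(n) * t / s).round().astype(int), d - 1)
           for n, t, s, d in zip(new_shape, target, spacing, vol.shape)]
    return vol[np.ix_(*idx)]

def zscore_normalize(vol, eps=1e-8):
    """Per-volume z-score normalization (step B(v)), so intensities are
    comparable across patients and scanners."""
    return (vol - vol.mean()) / (vol.std() + eps)
```

After these two steps the harmonized volume can be passed to the embedding extraction model of step B(vi).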
