Fig. 1 | Scientific Reports

From: Generating and evaluating synthetic data in digital pathology through diffusion models

The proposed pipeline. The graphical workflow of the proposed pipeline is shown in four panels. (A) The preprocessing step: the tissue in each slide is first detected through a series of operations, and the slide is then tiled to extract batches of tiles. (B) The model training step: the extracted tiles are fed into a denoising diffusion model, which first adds noise to the images. The perturbed images, together with the noise-level embedding and the class embedding (in our case, the tissue type), are then passed to a U-Net, which predicts the noise that was added to each image. The loss is computed from the added noise and the predicted noise and drives the training process. (C) The data generation step: pure noise and the desired class embedding are fed into the trained model, which predicts the noise component; a scaled version of this prediction is subtracted from the current noisy image to obtain a less noisy image, which becomes the input for the next step. After this process has been repeated for t steps, the final image is produced. (D) The generated dataset then undergoes a comprehensive evaluation in three categories. The quantitative assessment uses Inception Score (IS), Fréchet Inception Distance (FID), improved precision and recall (PRC, RCL), and density and coverage (DEN, COV) to measure the similarity between the generated and real datasets, together with IL-NIQE to assess image quality. The practical evaluation trains a ResNet classifier for tissue detection and compares the usability of the generated and real data in terms of performance and explainability. Finally, the biological realism of the generated data is assessed through a series of questionnaires administered to expert pathologists.
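Panels (B) and (C) follow the standard denoising diffusion (DDPM) recipe. The following is a minimal NumPy sketch of the training objective and the reverse sampling loop, assuming a linear noise schedule with T = 1000 steps and the variance choice σ_t² = β_t. The callable `model(x_t, t, label)` is a hypothetical stand-in for the class-conditioned U-Net, and the schedule values are assumptions, not the authors' settings.

```python
import numpy as np

# Assumed linear beta schedule (values are illustrative, not the paper's).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: fraction of signal kept at step t


def q_sample(x0, t, noise):
    """Forward process: noise a clean tile x0 directly to timestep t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def training_loss(model, x0, label, rng):
    """Loss for one tile: MSE between the added noise and the model's prediction."""
    t = int(rng.integers(0, T))           # random noise level
    noise = rng.standard_normal(x0.shape)  # the noise actually added
    x_t = q_sample(x0, t, noise)
    predicted = model(x_t, t, label)       # hypothetical class-conditioned U-Net
    return float(np.mean((noise - predicted) ** 2))


def sample(model, shape, label, rng):
    """Reverse process: start from pure noise and denoise for T steps."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps = model(x, t, label)
        # Subtract the scaled predicted noise from the current noisy image.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # re-inject noise
        else:
            x = mean  # last step: return the mean without extra noise
    return x
```

A quick sanity check is an "oracle" model that returns (x_t − √ᾱ_t·x0)/√(1 − ᾱ_t) for a known tile x0: the training loss is then zero and `sample` returns x0, which confirms the schedule arithmetic before a real network is plugged in.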

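The similarity metrics in panel (D) are computed on feature embeddings of real and generated tiles (typically from an Inception network). Below is a minimal NumPy sketch of two of them, FID and density/coverage as defined by Naeem et al. (2020), assuming the feature vectors have already been extracted. The function names and the default k = 5 are assumptions, not the authors' code; IS, precision/recall and IL-NIQE are not shown.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """FID between two sets of feature vectors (rows are samples)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr(sqrt(cov_r @ cov_f)) is the sum of square roots of the eigenvalues of
    # cov_r @ cov_f, which are real and non-negative for PSD covariances;
    # clip small negative round-off before taking the square root.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


def _pairwise(a, b):
    """Euclidean distance matrix of shape (len(a), len(b))."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def density_coverage(real, fake, k=5):
    """Density and coverage using k-NN balls around the real samples."""
    # Radius of each real sample's ball: distance to its k-th nearest real
    # neighbour (index k skips the zero self-distance at index 0).
    radii = np.sort(_pairwise(real, real), axis=1)[:, k]
    inside = _pairwise(real, fake) < radii[:, None]  # fake j inside ball of real i?
    density = inside.sum() / (k * fake.shape[0])      # avg. balls per fake sample / k
    coverage = inside.any(axis=1).mean()              # share of real balls hit at least once
    return float(density), float(coverage)
```

For two feature sets that differ only by a constant shift, the covariance terms cancel and FID reduces to the squared length of the shift, which makes a convenient check of the trace computation.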