Introduction

Advanced microscopy technologies have transformed high-throughput biomedical research, enabling large-scale experiments such as drug screening and high-throughput profiling. For instance, high-throughput time-lapse imaging assays for cell migration analysis and toxicity screening can produce vast amounts of data, often comprising hundreds or thousands of movies, each with several hundred time frames. The fast development of artificial intelligence (AI)-based microscopy image analysis has revolutionized quantitative studies at unprecedented accuracy and scale, but such analysis can still suffer from out-of-distribution (OOD)1 data (i.e., images with unexpected features, such as unusual lighting that significantly differs from the training data) and therefore degraded performance. Various quality issues, such as poor illumination, contamination, optical defects, and out-of-focus images, can compromise a dataset, despite the sophistication of modern imaging pipelines.

Ensuring data quality through effective quality control (QC) is critical for reliable downstream analysis. Traditional QC procedures typically rely on a combination of automated methods informed by prior knowledge or heuristics (e.g., detecting intensity values outside predefined ranges) and manual selective verification. However, these semi-automated approaches can still be labor-intensive and time-consuming, and often struggle to effectively address the wide diversity of potential issues in large-scale datasets (not only imaging issues, e.g., lighting, but also sample issues, e.g., contamination), leaving considerable errors undetected. Some pilot bioimaging AI models, such as a denoising foundation model2 or parts of Cellpose33, are able to deal with specific issues, e.g., restoring noisy microscopy images out-of-the-box to some extent, but may not be sufficient for comprehensive QC.

To address this problem, in this work, we developed a diffusion model-based automated QC toolbox (together with a new microscopy QC benchmark), which is agnostic to specific QC issues. Evaluated on both our new benchmark set and another real large-scale dataset from a public bioimaging AI challenge (LightMyCell, originally for a different task), we show that our proposed tool can effectively flag potential quality issues and avoid false alarms on natural biological variations (see examples in Fig. 1 and more details in Methods).

Fig. 1: Overview of the AutoQC-Bench (software and benchmark data) and qualitative and quantitative results.

a Schematic of the training and test workflow using a diffusion model. b Benchmark dataset constructed from bright-field images in ComplexEye, including training samples, positive test samples, and negative test samples with five anomaly types. c Quantitative performance of common methods (AE AutoEncoder, VAE Variational AutoEncoder, f-AnoGAN7, DDPM Denoising Diffusion Probabilistic Model9,10, pDDPM12) on classification and segmentation (ACC Accuracy, AUC Area Under the ROC Curve, DICE Dice similarity coefficient, AUPRC Area Under the Precision-Recall Curve). d Qualitative comparison of anomaly localization on two examples across methods (rows 1 and 3: anomaly score maps, rows 2 and 4: detected anomaly areas). e Additional experiments on fluorescence microscopy images from the LightMyCell dataset. The left section shows examples of the curated “normal” images. The middle and right sections show examples with low and high confidence of being “bad” images, respectively (top row: raw images, bottom row: the corresponding anomaly score maps).

The core methodology of our AutoQC tool is a reconstruction-based framework as depicted in Fig. 1a. Briefly, we train a specially designed diffusion model on a high-quality reference set with only “normal” images. When new images are collected, the model attempts to reconstruct them. If an image is “normal”, the reconstruction will closely resemble the original image. However, if an image is “abnormal”, the model will reconstruct it as if it were “normal”, since the model has only been trained on “normal” images. By comparing the discrepancy between the original and reconstructed images, we can automatically detect a wide range of potential issues, including unexpected errors. Note that “normal” images are not necessarily the best images from an imaging perspective; the definition varies across applications and may even include images containing moderate noise or artifacts. For example, a certain degree of flat-field issue due to a specific lens used in a study could have negligible effects on downstream quantitative analysis, e.g., quantifying the motility of bacteria.

While the reconstruction-based framework is not new and many different models could be used, such as classic auto-encoder models, our systematic evaluations show that their effectiveness varies significantly. To this end, we provide a dataset of “normal” images and a comprehensive collection of flawed images from five distinct categories of issues as a benchmark for the bioimaging community. Additionally, we release the code for our method and other state-of-the-art related methods as part of a benchmarking toolkit. We hope that our work will raise awareness of and foster further developments in automated QC, which is crucial for high-quality, large-scale downstream analyses. The overall benchmarking toolkit is presented in Fig. 1b.

We collected images for benchmarking purposes with the ComplexEye system4 (see details in Methods), a multi-lens array microscope designed for high-throughput immune cell migration analysis, enabling simultaneous time-lapse imaging of up to 64 wells in a 384-well plate with rapid frame acquisition. 8233 images (individual 2D frames extracted from time-lapse movies) of migrating neutrophils from human blood samples were manually curated and used as a reference set for model training. The set of images held out for testing and benchmarking was constructed from two sources: (1) 2D image frames with imaging artifacts from a real high-throughput compound screening experiment with human blood samples, and (2) 2D image frames from short movies specially collected for this benchmark with mouse bone marrow samples, to represent additional normal images from different assays and additional problematic images with potential issues not present in the screening experiments. In the end, the held-out evaluation set contains 100 positive samples (neutrophil images of acceptable quality from both human blood samples and mouse bone marrow samples) and 48 negative samples with five types of issues: air bubbles, artifacts, Z-shift, illumination issues, and contamination (Fig. 1b). To enable comprehensive quantitative benchmarking, we also roughly annotated a mask for the problematic areas in all negative samples, so that the sensitivity and specificity of the model can be estimated. It is worth emphasizing that such masks are only rough delineations, different from segmentation masks, and therefore only serve as a complementary quantitative metric besides the positive/negative classification labels.
In contrast to the existing dataset5, which provides image-level labels for supervised artifact classification in multispectral imaging patches, our dataset additionally includes pixel-level annotations of common artifacts in full-size brightfield microscopy imaging, enabling both classification and anomaly localization tasks. In practice, the accurate localization of the problematic areas could be combined with additional heuristics to further improve the QC workflow.

Quantitative results are shown in Fig. 1c with example images in Fig. 1d and additional visualizations in the Supplementary Information. It is evident that different models under the reconstruction-based framework have very different performance in identifying images with various issues. In particular, diffusion-based approaches achieved the strongest results, with pDDPM, a patch-based variant of DDPM that reconstructs each patch conditioned on its surrounding regions, performing the best. By incorporating global context while preserving local details, pDDPM improves reconstruction fidelity and enhances sensitivity to various anomalies. In our AutoQC software, this enables the model not only to detect all types of issues in the benchmark set (without prior knowledge of potential anomalies), but also to generalize well from human neutrophils to mouse neutrophils (Supplementary Fig. 2), with the problematic areas identified by the model aligning reasonably well with the areas highlighted by human experts.

In addition to this new benchmark set, we also tested our AutoQC software (using pDDPM) on another large-scale dataset from the public LightMyCell grand challenge (https://lightmycells.grand-challenge.org/), to evaluate the effectiveness of our method in different microscopy modalities. Briefly, the LightMyCell dataset contains over 2500 paired transmitted light images and fluorescence microscopy images of 4 different organelles in vertebrate cells to evaluate the performance of in-silico labeling models, collected from 30 different studies in the France Bioimaging Infrastructure. The scale of the data prohibited exhaustive manual or semi-automatic QC and therefore the dataset may contain various “noisy” samples in the training data. As a proof-of-concept, we took all 1289 fluorescence images of nuclei from the largest study (ID=25) and manually curated a set of 103 normal images to train our AutoQC model. Then, we applied the model to all remaining fluorescence images of nuclei. Sample images with the highest and lowest confidence of being normal are presented in Fig. 1e. Our AutoQC software can provide reliable assistance in curating large diverse datasets, which is crucial in establishing high-quality datasets for large-scale bioimaging AI competitions or for training large foundation models.

We are witnessing rapid progress in bioimaging AI and microscopy technology in tandem. High-quality large-scale datasets play a central role, permitting innovative biomedical studies that are only possible at scale and paving the path for AI researchers to establish more large foundation models for the bioimaging field. We hope that the AI-based AutoQC software introduced in this work can help biomedical researchers effectively reduce potential noise in their large-scale datasets, and that the QC benchmark dataset released with this work can further raise awareness of automated quality control and stimulate the development of more effective methods. We welcome contributions from the community to enlarge the benchmark set and eventually establish a community-driven standard for automated microscopy image QC.

Methods

Principles of reconstruction-based quality control methods

Let \({{\bf{X}}}^{n}={\{{{\bf{x}}}_{i}^{n}\in {{\mathcal{X}}}^{n}\}}_{i = 1}^{N}\) represent the set of N samples on a normal data space \({{\mathcal{X}}}^{n}\), where each \({{\bf{x}}}_{i}^{n}\) is a clean image without any abnormal regions. Reconstruction-based quality control methods usually train a model \({f}_{\theta }(\cdot )\) that reconstructs \({{\bf{x}}}_{i}^{n}\) from a corrupted version \({{\bf{x}}}_{i}^{n^{\prime} }\) by minimizing a reconstruction loss:

$$\mathop{\min }\limits_{\theta }\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{L}_{{\rm{train}}}\left({{\bf{x}}}_{i}^{n},{\hat{{\bf{x}}}}_{i}^{n}\right),\quad \,{\text{where}}\,\,{\hat{{\bf{x}}}}_{i}^{n}={f}_{\theta }\left({{\bf{x}}}_{i}^{n^{\prime} }\right).$$
(1)

Ltrain is a function that measures the reconstruction quality.

When deployed, we have a test dataset with anomalies \({{\bf{X}}}^{a}={\{{{\bf{x}}}_{j}^{a}\in {{\mathcal{X}}}^{a}\}}_{j = 1}^{M}\). For any test image \({{\bf{x}}}_{j}^{a}\in {{\bf{X}}}^{a}\), we first degrade it to \({{\bf{x}}}_{j^{\prime} }^{a}\), and then use the well-trained reconstruction model \({f}_{{\theta }^{* }}(\cdot )\) to get the reconstruction \({\hat{{\bf{x}}}}_{j}^{a}\). The pixel-wise anomaly score map Λj is defined by the reconstruction error:

$${\Lambda }_{j}={L}_{{\rm{test}}}\left({{\bf{x}}}_{j}^{a},{\hat{{\bf{x}}}}_{j}^{a}\right),\quad \,{\text{where}}\,\,{\hat{{\bf{x}}}}_{j}^{a}={f}_{{\theta }^{* }}\left({{\bf{x}}}_{j}^{a^{\prime} }\right).$$
(2)

Here, higher values in the score map correspond to larger reconstruction errors, indicating a higher probability of being abnormal. Ltest serves the same purpose as Ltrain in assessing the reconstructed image, though it may be a different function.
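For illustration, the anomaly score map of Eq. (2) can be sketched in a few lines; here we assume a simple pixel-wise squared error as Ltest, and `anomaly_score_map` is a hypothetical helper name, not the actual implementation in our software:

```python
import numpy as np


def anomaly_score_map(x, x_hat):
    # Pixel-wise squared reconstruction error as L_test: higher values
    # indicate larger reconstruction errors, i.e., likely abnormal pixels.
    return (x - x_hat) ** 2
```

A perfect reconstruction yields an all-zero score map, while regions that the model "repairs" (because it has only seen normal images) light up with large values.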

Once we obtain the anomaly score map Λj, we can determine whether the test image is normal or anomalous by calculating an overall anomaly score ASj. In this work, we define three methods for this calculation:

Maximum Value: The overall anomaly score is the maximum value in Λj:

$$A{S}_{j}=\max ({\Lambda }_{j}).$$
(3)

Mean Value: The overall anomaly score is the mean of all values in Λj:

$$A{S}_{j}=\,\text{mean}\,({\Lambda }_{j}).$$
(4)

Patch-Based Maximum: we use a sliding window to extract overlapping patches from Λj, compute the mean anomaly score for each patch, and select the maximum among these patch-wise mean scores as the overall anomaly score:

$$A{S}_{j}=\mathop{\max }\limits_{p\in \,\text{patches}\,}\,\text{mean}\,\left({\Lambda }_{j}^{p}\right).$$
(5)
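The three aggregation rules of Eqs. (3)–(5) can be sketched as follows (a minimal illustration; the function name, default patch size, and stride are our assumptions, not the exact settings used in the software):

```python
import numpy as np


def overall_anomaly_score(score_map, method="patch_max", patch=8, stride=4):
    # Collapse a pixel-wise anomaly score map into a single scalar score.
    if method == "max":           # Eq. (3): maximum value
        return float(score_map.max())
    if method == "mean":          # Eq. (4): mean value
        return float(score_map.mean())
    # Eq. (5): mean within sliding-window patches, then take the maximum.
    h, w = score_map.shape
    patch_means = [
        score_map[i:i + patch, j:j + patch].mean()
        for i in range(0, h - patch + 1, stride)
        for j in range(0, w - patch + 1, stride)
    ]
    return float(max(patch_means))
```

The patch-based maximum trades off between the two extremes: it is less sensitive to single-pixel outliers than the plain maximum, but does not dilute a small anomaly over the whole image like the mean.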

In practical applications, we can collect a small validation set containing both normal images and annotated abnormal images. To detect anomalies, we first determine an optimal threshold, denoted AS*, by performing a greedy search on the validation set to best separate normal and abnormal samples. At test time, the software raises an alarm if the anomaly score of a given frame exceeds this threshold, i.e., ASj > AS*. For frames flagged as abnormal, the software then generates a pixel-wise anomaly segmentation mask to further localize the anomalous regions. To determine the optimal binarization threshold λ*, we conduct another greedy search on the abnormal samples by iteratively calculating Dice scores across different threshold values. The best threshold found is then used to generate the final anomaly segmentation mask yj, defined as:

$${y}_{j}(i,k)={\mathbb{I}}({\Lambda }_{j}(i,k) > {\lambda }_{* })$$
(6)

where \({\mathbb{I}}(\cdot )\) is the indicator function:

$${\mathbb{I}}({\Lambda }_{j}(i,k) > {\lambda }_{* })=\left\{\begin{array}{ll}1,\quad &\,\text{if}\,\,{\Lambda }_{j}(i,k) > {\lambda }_{* },\\ 0,\quad &\,\text{otherwise}\,.\end{array}\right.$$
(7)
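The greedy search for the binarization threshold described above can be sketched as follows (an illustrative implementation; the function names and the candidate threshold grid are hypothetical choices, not the software's exact procedure):

```python
import numpy as np


def dice(pred, gt):
    # Dice similarity coefficient between two binary masks.
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0


def best_binarization_threshold(score_maps, gt_masks, candidates):
    # Greedy search for the threshold lambda* that maximizes the mean
    # Dice score over the annotated abnormal validation samples.
    best_t, best_d = None, -1.0
    for t in candidates:
        d = np.mean([dice(lam > t, gt) for lam, gt in zip(score_maps, gt_masks)])
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d
```

The final mask of Eqs. (6)–(7) is then simply `score_map > best_t` for each flagged frame.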

Baseline Methods Implemented for Benchmarking

We briefly introduce the baseline algorithms implemented in our software in this subsection.

AE

Autoencoders (AEs)6 are reconstruction-based models that consist of two main components: an encoder \({E}_{\phi }:{{\mathbb{R}}}^{H\times W\times C}\to {{\mathbb{R}}}^{d}\) and a decoder \({D}_{\theta }:{{\mathbb{R}}}^{d}\to {{\mathbb{R}}}^{H\times W\times C}\). The encoder compresses the input image \({\bf{x}}\in {{\mathbb{R}}}^{H\times W\times C}\) into a latent representation z = Eϕ(x), while the decoder reconstructs it from the latent code: \(\hat{{\bf{x}}}={D}_{\theta }({\bf{z}})\).

The model is trained by minimizing the reconstruction error between the input x and its reconstruction \(\hat{{\bf{x}}}\):

$${L}_{{\rm{AE}}}={{\mathbb{E}}}_{{\bf{x}} \sim {p}_{{\rm{data}}}}\parallel {\bf{x}}-\hat{{\bf{x}}}{\parallel }_{p},\quad \,\text{where}\,\,\hat{{\bf{x}}}={D}_{\theta }({E}_{\phi }({\bf{x}})).$$
(8)
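As a self-contained illustration of this reconstruction principle, consider the linear special case: with a d-dimensional latent space and no nonlinearities, the optimal autoencoder under the squared error of Eq. (8) is given by the top-d principal components (the deep AEs benchmarked here replace this closed form with learned convolutional encoders and decoders):

```python
import numpy as np


def pca_autoencoder(X, d):
    # Closed-form linear autoencoder: encode/decode with the top-d
    # principal directions of the training data X (rows are samples).
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:d].T  # (D, d) orthonormal basis of the latent space

    def encode(x):
        return (x - mu) @ W

    def decode(z):
        return z @ W.T + mu

    return encode, decode
```

Samples lying in the training subspace reconstruct almost perfectly, while samples off that subspace (anomalies) incur a large reconstruction error, which is exactly the property reconstruction-based QC exploits.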

VAE

Variational Autoencoders (VAEs)6 extend AEs by modeling the latent space probabilistically, constraining the encoder output to a Gaussian distribution in the latent space:

$${q}_{\phi }({\bf{z}}| {\bf{x}})={\mathcal{N}}({\mu }_{\phi }({\bf{x}}),{\sigma }_{\phi }^{2}({\bf{x}})),$$
(9)

where μϕ(x) and \({\sigma }_{\phi }^{2}({\bf{x}})\) are learned by the encoder. The decoder reconstructs an image from a sampled latent code:

$$\hat{{\bf{x}}}={D}_{\theta }({\bf{z}}),\quad {\bf{z}} \sim {q}_{\phi }({\bf{z}}| {\bf{x}}).$$
(10)

The VAE is trained to minimize the following loss:

$${L}_{{\rm{VAE}}}={{\mathbb{E}}}_{{q}_{\phi }({\bf{z}}| {\bf{x}})}\parallel {\bf{x}}-\hat{{\bf{x}}}{\parallel }_{p}+\,\text{KL}\,({q}_{\phi }({\bf{z}}| {\bf{x}})\parallel p({\bf{z}})),$$
(11)

where KL(·∥·) is the Kullback-Leibler divergence, and \(p({\bf{z}})={\mathcal{N}}({\bf{0}},{\bf{I}})\) is a standard Gaussian prior on the latent space. VAEs ensure a smooth and structured latent space, which can help improve the generation performance.

f-AnoGAN

f-AnoGAN7 is a fast GAN-based anomaly detection method that trains a Wasserstein GAN (WGAN)8, consisting of a generator G and a discriminator D, exclusively on normal images. Additionally, an encoder E maps input images to the GAN’s latent space. The training consists of two steps: 1) Training G and D following the standard WGAN procedure with gradient penalty. 2) Training E by minimizing the loss \({L}_{{{\rm{izi}}}_{f}}\), which accounts for both the residual between real and reconstructed images and the residual in the discriminator’s feature space:

$${L}_{{{\rm{izi}}}_{f}}={{\mathbb{E}}}_{{\bf{x}} \sim {p}_{{\rm{data}}}}\left(\frac{1}{n}\parallel {\bf{x}}-G(E({\bf{x}})){\parallel }^{2}+\frac{\kappa }{{n}_{d}}\parallel f({\bf{x}})-f(G(E({\bf{x}}))){\parallel }^{2}\right),$$
(12)

where n is the number of image pixels, κ is a weighting factor, nd is the dimensionality of the feature representation, and f(·) represents the discriminator’s intermediate layer features.

DDPM

Denoising Diffusion Probabilistic Models (DDPMs)9,10 are a class of generative models that have recently gained significant popularity. DDPM training consists of a Markovian forward process (diffusion process) and a reverse sampling procedure (reverse process). In the diffusion process guided by a noise schedule \({\{{\beta }_{t}\}}_{t = 1}^{T}\), the image x0 is degraded to the noisy image xt by

$$\begin{array}{l}q({{\bf{x}}}_{t}| {{\bf{x}}}_{0}):= {\mathcal{N}}\left({{\bf{x}}}_{t};\sqrt{{\bar{\alpha }}_{t}}{{\bf{x}}}_{0},(1-{\bar{\alpha }}_{t}){\bf{I}}\right),\\ {{\bf{x}}}_{t}=\sqrt{{\bar{\alpha }}_{t}}{{\bf{x}}}_{0}+\sqrt{1-{\bar{\alpha }}_{t}}{\boldsymbol{\epsilon }},\quad {\boldsymbol{\epsilon }} \sim {\mathcal{N}}({\bf{0}},{\bf{I}}),\end{array}$$
(13)

where \({\alpha }_{t}:= 1-{\beta }_{t}\) and \({\bar{\alpha }}_{t}:= \mathop{\prod }\nolimits_{s = 1}^{t}{\alpha }_{s}\).
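Because Eq. (13) gives q(xt∣x0) in closed form, a noisy sample at any step t can be drawn directly without simulating the full Markov chain. The sketch below assumes a linear β schedule, which is one common choice (the schedule and function name are illustrative assumptions):

```python
import numpy as np


def forward_diffuse(x0, t, betas, seed=None):
    # Sample x_t ~ q(x_t | x_0) in closed form (Eq. 13).
    rng = np.random.default_rng(seed)
    alpha_bar = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_s alpha_s
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


# Example: a linear noise schedule with T = 1000 steps (an assumption).
betas = np.linspace(1e-4, 0.02, 1000)
```

As t grows, alpha_bar[t] shrinks toward zero and xt approaches pure Gaussian noise, which is what the reverse process learns to undo.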

As shown in ref. 10, in the reverse process we can model the distribution \({p}_{\theta }({{\bf{x}}}_{t-1}| {{\bf{x}}}_{t})\) of xt−1 given xt as a diagonal Gaussian:

$${p}_{\theta }({{\bf{x}}}_{t-1}| {{\bf{x}}}_{t})={\mathcal{N}}({\mu }_{\theta }({{\bf{x}}}_{t},t),{\Sigma }_{\theta }({{\bf{x}}}_{t},t)),$$
(14)

where the mean μθ(xt, t) can be calculated as a function of ϵθ(xt, t), and the covariance Σθ(xt, t) can be fixed to a known constant, following ref. 10. Therefore, DDPMs usually use a UNet to estimate the noise added to xt and optimize the network with a simple loss function:

$${L}_{{\rm{DDPM}}}:= {{\mathbb{E}}}_{t \sim {\mathcal{U}}(1,T),{{\bf{x}}}_{0} \sim q({{\bf{x}}}_{0}),{\boldsymbol{\epsilon }} \sim {\mathcal{N}}({\bf{0}},{\bf{I}})}\left[\parallel {\boldsymbol{\epsilon }}-{{\boldsymbol{\epsilon }}}_{\theta }({{\bf{x}}}_{t},t){\parallel }_{2}^{2}\right].$$
(15)

In the context of reconstruction-based anomaly detection, our objective is not to create new images from pure noise but to reconstruct the normal image given a noisy input image. Therefore, following9,11,12, we train a denoising model fθ(xt, t) to directly estimate x0 instead of the noise ϵ. Another modification is that, at sampling time, we estimate x0 from xt at a single fixed time step ttest, rather than iterating over all T steps as in traditional DDPM. This simplification significantly decreases the sampling time without affecting the anomaly detection performance.
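This simplified single-step sampling can be sketched as follows; `denoise_fn` stands in for the trained model fθ(xt, t) that predicts x0 directly, and `t_test` is the fixed noising step (both are hypothetical placeholder names):

```python
import numpy as np


def reconstruct(x, denoise_fn, t_test, betas, seed=None):
    # Noise the input to a fixed step t_test, then let the trained model
    # predict x0 directly, instead of running all T reverse steps.
    rng = np.random.default_rng(seed)
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x.shape)
    xt = np.sqrt(alpha_bar[t_test]) * x + np.sqrt(1.0 - alpha_bar[t_test]) * eps
    return denoise_fn(xt, t_test)  # single-step estimate of x0
```

For a normal image, the predicted x0 resembles the input; for an abnormal image, the model "normalizes" it, and the discrepancy between the two forms the anomaly score map.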

pDDPM

pDDPM12 conducts the diffusion and reverse processes only on a patch p0 while using the remaining regions as a condition c. This encourages the incorporation of global context information and thereby yields better reconstructions. During evaluation, pDDPM reconstructs noisy patches sampled over the whole image and merges all patches into one final reconstruction.

Evaluation metrics

In this paper, we evaluate the performance of the methods using both classification (ACC and AUC) and segmentation (DICE and AUPRC) metrics.

ACC

Accuracy (ACC) measures the overall correctness of the classification predictions. It is defined as:

$$\,\text{ACC}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}\,},$$
(16)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. Higher ACC values indicate better overall classification performance.

AUC

The Area Under the ROC Curve (AUC) is used to measure the classification performance, indicating the model’s ability to distinguish between normal and anomalous samples. A higher AUC value signifies better classification performance.

DICE

The Dice Similarity Coefficient (DICE) evaluates the overlap between predicted segmentation and ground truth. It is defined as:

$$\,{\text{DICE}}\,=\frac{2| P\cap G| }{| P| +| G| },$$
(17)

where P and G are the predicted and ground truth segmentation masks, respectively. Higher DICE values indicate more accurate segmentation.

AUPRC

The Area Under the Precision-Recall Curve (AUPRC) measures the segmentation model’s performance in detecting anomalies at varying thresholds.

Benchmark data acquisition with high-throughput imaging system

The ComplexEye microscope is a high-throughput multi-lens array system designed for real-time imaging of migrating immune cells. It features 16 independently controlled aberration-corrected glass lenses, each paired with a CMOS image sensor and Köhler-optimized LED illumination, enabling simultaneous imaging of 16 wells in a 96-well plate or 64 wells in a 384-well plate. The system uses a motorized XYZ stage to move the optical assembly while keeping the sample stationary, preventing motion artifacts in non-adherent cells. Imaging is performed at 4.7× magnification with a numerical aperture (NA) of 0.3, providing a field of view of 825.8 × 512 μm per well. Time-lapse images are captured at one frame every 8 seconds over a one-hour period, ensuring sufficient temporal resolution for automated cell tracking. The system is housed in a temperature-controlled incubator set to 37 °C to maintain optimal physiological conditions. Data acquisition and autofocus adjustments are managed by an FPGA-based controller, which optimizes image sharpness across all lenses in real time, minimizing variability between wells. This setup enables robust, high-throughput analysis of cell migration at single-cell resolution, making it well suited for large-scale screening and clinical studies.

Sources of anomalies in benchmark dataset

Air bubbles

Air bubbles may be introduced during sample processing. Especially when pipetting small volumes, e.g., into a 384-well plate, unwanted bubbles can form within the well plate. When light passes through these air bubbles, it is reflected and refracted, making uneven exposure of the sample highly likely13.

Artifacts

Artifacts can occur when dirt, most commonly dust, gets into the light path; many parts of the microscope are not easily accessible or cleanable. Such dirt blocks light from passing through the sample and creates shadows and unwanted dots that might be misinterpreted as structures of interest during automated tracking14.

Z-shift

A Z-shift can result in a blurred, unsharp image. It can occur when cells move along the Z-axis, for example due to thermodynamic changes causing movement of the cell culture media. Small inaccuracies of the microscope in reproducibly finding the same Z-position in an imaging scene can also lead to a Z-shift and a blurred image15,16.

Illumination

Illumination problems such as overexposure or uneven exposure can be caused by poor calibration of exposure time or illumination intensity, most likely resulting from an incorrect Köhler illumination or a problem with the aperture14.

Contamination

When working with biological systems, even under great precautions, contamination can occur. Biological structures such as hair, dust, or mold can be introduced by the animal or the researcher into the sample itself or the microscope. This can lead to unsharp bright spots in the images14.