Introduction

The rapid advancement of artificial intelligence (AI) has catalyzed a profound transformation within the field of bioimaging. Currently, there are two prevailing strategies for developing AI-based bioimage analysis methods: the model-centric and the data-centric strategy (Fig. 1). In general, many endeavors align with the model-centric paradigm, where researchers iterate on models and algorithms against fixed benchmark datasets (Fig. 1a), such as the Cell Segmentation benchmark1, the Light My Cell challenge (https://lightmycells.grand-challenge.org/) and the CellMap challenge (https://cellmapchallenge.janelia.org/), to achieve higher evaluation scores than the state-of-the-art. This line of research has undeniably played a pivotal role in motivating the development of cutting-edge AI methods, propelling progress from AlexNet to ResNet and the subsequent evolution into Vision Transformers, as well as nnUNet2. In contrast, the data-centric paradigm3 focuses on systematic data engineering, i.e., carefully preparing and refining datasets (e.g., data cleaning, de-biasing, etc.), usually with a fixed model architecture, to achieve the desired performance.

Fig. 1: Model-centric AI vs. data-centric AI.

Suppose we roughly divide the developmental process of a project into three phases: preparation, development, and deployment (marked by three different colors). a, b illustrate a model-centric approach and a data-centric approach, respectively. The relative sizes of the Model block and the Data block indicate the amount of effort invested in the corresponding parts. In the end, the model-centric approach delivers a powerful model with an excellent evaluation score on seen data. For the data-centric approach, the deliverables include a high-quality dataset (re-usable for future models) with a suitable model that can produce reliable analysis in practice, even on new data. A prototypical BioData-Centric framework for AI-based bioimage analysis is outlined in (c).

In practice, the data-centric strategy is much more relevant to bioimaging than the model-centric strategy (yet unfortunately much less explored), as many practical challenges in bioimaging do not directly originate from readily available benchmark datasets. In this paper, we rethink existing deep learning methods for various bioimaging problems with a data-centric mindset, as illustrated in Fig. 1b, from which we distill a prototypical BioData-Centric framework as a conceptual guide for future tool and method development. Different from general data-centric AI, commonly referred to as a discipline studying how to systematically engineer data to build AI systems, BioData-Centric AI is a framework (1) that biologists can use as a template to conceptually design an AI solution for their real-world biological applications, and (2) that bioimaging AI method developers can use as a bridge to adapt state-of-the-art algorithms from general data-centric AI to the bioimaging field. In other words, the BioData-Centric framework uses or adapts data engineering techniques from general data-centric AI to solve biological problems, rather than studying how to perform systematic data engineering itself.

A prototypical BioData-Centric framework for bioimaging AI

Distilled from our experience solving various types of bioimage analysis problems in practice, a prototypical BioData-Centric framework for AI-based bioimage analysis is depicted in Fig. 1c. It includes four key phases: Pre-training, Assessing the dataset, Hunting for mistakes (development phase), and Monitoring the performance (deployment phase). The core concept is to leverage the large quantity of data effectively and to iteratively improve the model with minimal (but not zero) human intervention to obtain reliable analysis.

The initial step aims to give the model a robust starting point, either by utilizing an existing model for a related problem or by allowing the model to learn directly from the raw dataset in a self-supervised manner, yielding a pre-trained model. The pre-trained model then serves as an effective “probe” to assess the dataset, e.g., enabling the selection of an appropriate subset of data for initial fine-tuning of the pre-trained model in a supervised (or weakly supervised) manner. Such dataset assessment may involve identifying outliers (e.g., poor-quality images), detecting potential bias (e.g., severe imbalance in phenotypes), selecting the most representative images for expert annotation, etc. After the initial fine-tuning, it is important to efficiently identify potential errors, such as data samples with which the model struggles the most, so that human experts can curate these results with minimal effort and provide additional supervision to further fine-tune the model. This process can be repeated iteratively with a human in the loop until the desired performance is achieved. The model can then be deployed, e.g., for different experiments or high-throughput studies, with performance monitoring for quality control even in the absence of ground truth references. When potential issues are detected, further human-in-the-loop curation and fine-tuning can be performed to enhance the model continuously, embodying the concept of “life-long learning”4.

To illustrate this framework, we use a real microscopy image segmentation problem as an example, one that many computational biology groups may have encountered in a similar form. We use this simple example (1) to explain how bioimage analysis problems can be solved with the BioData-Centric framework and (2) to point out important directions that bioimaging AI method developers in our community could take for further investigation.

An illustrative example: a vascular structure segmentation problem

Problem description

Over 800 three-dimensional (3D) microscopy images were collected under different conditions to investigate the vascular effects and injuries that could occur in mouse models of diseases, after external influences, or upon genetic modifications. Each image is a z-stack of 1024 × 1024 pixels in XY (one pixel = 0.51 µm × 0.51 µm) with 7 Z-steps of step size 4.25 µm (see Supplementary for details). Our image analysis goal is to accurately segment all vessels under all conditions; the segmentations are then used for 3D spatial quantification of morphology (e.g., branch density, thickness, branching point density, etc.) and topology (e.g., loops, voids, etc.).

Considerable variations (e.g., different vessel morphology, different signal-to-noise ratios, etc., see Fig. 2a) in this large-scale data cohort make it impossible for a single classic vessel segmentation workflow (e.g., Frangi filter-based methods) to work robustly across all conditions. A deep learning-based solution holds great potential. Due to the limited number of z-slices in each z-stack, we chose to employ two-dimensional neural network models, which are applied to each z-stack slice by slice. Considering the manual annotation cost, we divided our images into small patches of 224 × 224 pixels for the subsequent pre-training and training processes. At the beginning, there is neither ground truth for training nor a ready-to-use model (such as a pre-trained model or a foundation model). A common practice in prevailing tools is to manually annotate a large number of images and then train a deep learning model.

Fig. 2: A vascular structure segmentation problem.

All grayscale images or patches are displayed after Auto-Contrast in ImageJ. a Various image samples showing high diversity in the dataset. The size of each image is 7 × 1024 × 1024 (ZYX). b 25 representative patches (i.e., a core set) are selected from 90,860 patches to build model M0, 6 of which are shown here. Manual annotations are marked in green. c The top row contains image patches with high uncertainty. The bottom row shows the corresponding predicted segmentation. Three patches (within the red boxes) are manually selected from these image patches as the critical set to fine-tune model M1. d Segmentation results of model M0 (blue boxes) and model M1 (red boxes). Fine-tuning with the critical set significantly reduced model hallucinations. e Segmentation results evaluated using reverse classification accuracy (RCA). Two random examples are shown, one with relatively high RCA (right) and the other with relatively low RCA (left). Visual inspection can confirm that a higher RCA corresponds to a better segmentation result.

Then, how can we approach this problem with the four-stage BioData-Centric framework? It is important to note that the exact algorithms used here are all basic algorithms from the literature, selected intentionally to illustrate the core concepts. Further variations or more sophisticated solutions will be discussed in the next section.

  • Pre-training: Roughly speaking, as long as the dataset to be analyzed contains more images than one can easily annotate manually, pre-training on the entire raw dataset (or even including additional similar external datasets) is usually beneficial. Technically, in this example, we employed a Masked Autoencoder5 for self-supervised pre-training. This model learns to reconstruct masked parts of a raw image from the unmasked portions, allowing it to capture general patterns and features without manual annotations. The trained encoder is later reused in a specific segmentation model (TransUNet6 in this example) and further fine-tuned with a small annotated dataset to detect the structures of interest. This reduces the amount of manual annotation needed by leveraging the knowledge gained during pre-training (a simplified sketch of the masking-and-reconstruction objective is given after this list).

  • Assessing the dataset: Not all data are equally important for a model to learn from. If we allocate a specific annotation budget, say ten images, to train a model, extensive studies have shown that randomly choosing ten images to annotate usually yields much worse performance than carefully selecting the ten most representative images (the so-called “core set”) to annotate. To select the core set in this example, we first applied the encoder of the pre-trained model to the entire dataset to map each raw microscopy image patch into a high-dimensional feature vector (e.g., dimension = 256), forming a so-called latent space. Two raw image patches are similar if their corresponding feature vectors lie close to each other in the latent space. By analyzing the latent space, we want to select K “hub” images, as the core set, which are (1) distant from one another but (2) close to the majority of the dataset in the latent space, so that the K images represent the characteristics of the entire dataset (see the selection sketch after this list). Technically, we employed the max K-cover algorithm as used in ref. 7 to select K = 25 image patches (from 90,860 patches in total) as the core set (see Fig. 2b), which were then manually annotated by human experts and used to fine-tune a TransUNet (with the pre-trained encoder) to obtain the initial segmentation model M0.

  • Hunting for mistakes and hard cases iteratively: Even though the core set of 25 samples is highly representative, the trained model may still not be robust to all potential variations in the dataset. We can iteratively find more samples (image patches) that are “most challenging” (the so-called “critical set”) for the current model, curate them (i.e., inspect the segmentation results and manually fix segmentation errors where necessary), and add the curated data to the training set to improve the model. Technically, in this example, we employed a simple dropout algorithm as a Bayesian approximation8, which allows the model to generate not only a segmentation but also an associated uncertainty value at each pixel (see the uncertainty sketch after this list). We then inspected the 30 patches with the highest uncertainty and selected and curated three patches (as the “critical set”, see Fig. 2c) to enrich the training set. We then further fine-tuned model M0 into M1 with the enriched training data. Figure 2d visualizes segmentation results from models M0 and M1, demonstrating the necessity of error hunting and the effectiveness of fine-tuning on the “critical set”. It is worth mentioning that the visual comparison in this toy example shows that visual inspection could be sufficient at certain stages in certain applications. In reality, remaining minor errors in the results of model M1 could be further curated, leading to M2, M3, etc., iteratively. Whether a hold-out subset with manually annotated segmentations is needed for evaluation depends on the downstream application (example quantitative evaluations on a benchmark can be found in the Supplementary). Here, we stopped at M1 for demonstration purposes only.

  • Monitoring the performance: In practice, we usually do not have a large hold-out set with ground truth to validate model performance, especially when new data are acquired, e.g., every month. From the data-centric point of view, we need to continuously monitor the performance of the model when applied to images without ground truth. Uncertainty, as in the previous stage, could be informative but is not necessarily correlated with accuracy in theory, due to potential mis-calibration9. Technically, in this example, we employed the reverse classification accuracy (RCA) technique10 to estimate the segmentation errors in an image without ground truth, as follows. For each new Z-stack (z = 7), say I0, we trained a simple Random Forest pixel classifier using the current segmentation of I0 (generated by the deployed model, i.e., M1 in this example) and then applied the pixel classifier to the core set. Since we know the ground truth for the samples in the core set, we can calculate the maximum accuracy of the pixel classifier on the core set, which represents the accuracy of the current segmentation of I0 (see the RCA sketch after this list). The rationale is that if the current segmentation of I0 is reasonable, then a pixel classifier trained with such a segmentation should yield reasonable results on those representative patches (with ground truth) in the core set. Sample results can be found in Fig. 2e, where a higher RCA value indicates better segmentation. If major issues are detected, additional annotations could be recruited to fine-tune the model.
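
The sketch below is a minimal, simplified illustration of the masking-and-reconstruction idea behind the pre-training step. The actual study used a ViT-based Masked Autoencoder5; here a tiny convolutional encoder-decoder stands in to keep the example short, and `unlabeled_loader` is a hypothetical iterator over raw, unannotated patches.

```python
import torch
import torch.nn as nn

def mask_patches(images, patch=16, mask_ratio=0.75):
    """Zero out a random subset of non-overlapping patches in each image."""
    b, _, h, w = images.shape
    keep = (torch.rand(b, h // patch, w // patch, device=images.device) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, dim=1).repeat_interleave(patch, dim=2)  # b x h x w
    return images * mask.unsqueeze(1), mask

# tiny stand-ins for the MAE encoder and reconstruction head used in the paper
encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(64, 1, 3, padding=1)
opt = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

for imgs in unlabeled_loader:                      # raw 224x224 grayscale patches, no labels
    masked, mask = mask_patches(imgs)
    recon = decoder(encoder(masked))
    loss = ((recon - imgs) ** 2 * (1 - mask.unsqueeze(1))).mean()  # loss on hidden pixels only
    opt.zero_grad(); loss.backward(); opt.step()
# the trained `encoder` is then plugged into the segmentation model and fine-tuned
```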
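
For the core-set selection step, the sketch below uses a greedy farthest-point (k-center) selection on the encoder features as a simple stand-in for the max K-cover procedure of ref. 7; the feature file path and array shapes are illustrative.

```python
import numpy as np

def greedy_core_set(features, k, seed=0):
    """Greedy k-center selection: repeatedly pick the patch farthest (in latent space)
    from everything chosen so far, so the selected 'hubs' spread over the whole dataset."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(features)))]             # arbitrary first hub
    dist = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                            # farthest from the current core set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return chosen

latents = np.load("latent_features.npy")                    # illustrative: (90860, 256) encoder features
core_idx = greedy_core_set(latents, k=25)                   # indices of the 25 patches to annotate
```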
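
The uncertainty estimate used for "hunting" can be obtained with Monte-Carlo dropout8, sketched below for a binary segmentation network in PyTorch; `segmentation_model` and `patch_tensor` are placeholders for the fine-tuned model and an input patch.

```python
import torch

def mc_dropout_uncertainty(model, image, n_samples=20):
    """Keep dropout active at inference, run several stochastic forward passes and
    use the per-pixel standard deviation of the predictions as an uncertainty map."""
    model.eval()
    for m in model.modules():                  # re-enable only the dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d, torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(image)) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)

pred, unc = mc_dropout_uncertainty(segmentation_model, patch_tensor)
patch_score = unc.mean().item()                # rank patches by this score to pick the critical set
```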
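
Finally, the reverse classification accuracy check can be sketched as below, assuming 2D arrays for images and masks; using a single intensity feature and Dice as the accuracy measure are simplifications of the original RCA formulation10.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dice(a, b, eps=1e-8):
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)

def rca_score(new_img, new_pred, core_imgs, core_gts, n_px=50000, seed=0):
    """Train a pixel classifier on the *predicted* segmentation of a new image, apply it
    to the annotated core set, and report the best Dice as a proxy for segmentation quality."""
    X = new_img.reshape(-1, 1).astype(np.float32)            # per-pixel feature: raw intensity
    y = (new_pred.reshape(-1) > 0).astype(int)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=min(n_px, len(y)), replace=False)   # subsample pixels
    clf = RandomForestClassifier(n_estimators=50, n_jobs=-1).fit(X[idx], y[idx])
    scores = []
    for img, gt in zip(core_imgs, core_gts):
        pred = clf.predict(img.reshape(-1, 1).astype(np.float32)).reshape(img.shape)
        scores.append(dice(pred > 0, gt > 0))
    return max(scores)        # a high value suggests the new segmentation is likely reasonable
```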

Beyond the simple example: further considerations and future directions

The aforementioned example aims to provide an intuitive illustration of our proposed four-stage BioData-Centric framework. We intentionally chose simple algorithms at each stage to simplify the demonstration; each can certainly be further improved with more advanced techniques. The same framework can be realized for different problems with different variations, some of which have been explored preliminarily in the literature, while others require further investigation from method developers in our bioimaging community. In this section, we revisit the four-stage data-centric framework from a broader perspective, comment on further technical considerations, and identify research gaps and future directions.

Effective fine-tuning

Fine-tuning is a fundamental building block in the entire data-centric framework. Making the fine-tuning process effective also requires data-centric considerations. For example, new training data are collected by curating the critical set. If the new data are much smaller than the previous training data, fine-tuning the model with only the new data may cause it to lose previously learned knowledge (so-called catastrophic forgetting, which could potentially be tackled with sophisticated algorithms11) and to overfit to the new data. On the other hand, simply merging old and new training data may lead to ineffective training due to the imbalance in their sizes. Weighted combination, data augmentation of the new data, or representative sub-sampling of the old training set might be necessary (see the sketch below).
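
As one illustration of the weighted-combination idea (not taken from the paper), the PyTorch sketch below resamples a merged dataset so that the small curated set and the large original training set contribute comparable sampling mass during fine-tuning; `old_set` and `new_set` are placeholders for existing annotated Datasets.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

merged = ConcatDataset([old_set, new_set])

# per-sample weights chosen so that each subset contributes roughly equal total
# probability mass, preventing the few new patches from being drowned out
weights = [1.0 / len(old_set)] * len(old_set) + [1.0 / len(new_set)] * len(new_set)
sampler = WeightedRandomSampler(weights, num_samples=len(merged), replacement=True)

loader = DataLoader(merged, batch_size=8, sampler=sampler)
# fine-tune the current model on `loader` as usual
```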

The fine-tuning process can also be conducted using different techniques. The first example is dataset distillation12 or dataset quantization13, where a small “stereotypical” dataset is generated when deploying the model to represent the knowledge the model was trained on; further fine-tuning can then easily be performed on a combination of this distilled dataset and the new training data. Second, AI-assisted labeling14 or weak labels could be a cost-efficient way to reduce human effort and therefore help generate the necessary supervision for fine-tuning efficiently. Weak labels refer to incomplete or imprecise labels, e.g., bounding boxes15, point annotations16 or scribbles17. Some pilot works on weakly supervised microscopy image segmentation have already shown promising results18,19,20.

Pre-training

Transfer learning (TL) and self-supervised learning (SSL) are two primary approaches for pre-training. They share the same spirit: make good use of a large-scale (as large as possible) set of data, regardless of the availability of ground truth and the similarity to the microscopy images to be analyzed. The two approaches are suitable for different tasks and different types of data21. There is no conclusive study yet on a one-size-fits-all solution. As a rule of thumb, the decision mainly depends on the specific problem and resource availability. For example, when dealing with a nuclei segmentation problem, say, on a special type of cell not widely studied in the literature, TL from a pre-trained Cellpose22 model could provide a quick solution. On the other hand, when dealing with an in silico labeling problem on brightfield images and without public pre-trained in silico labeling models, TL from a pre-trained brightfield image segmentation model may not be ideal, as the two tasks may require different perceptions of the brightfield images. In this case, SSL was recently demonstrated to be very effective in improving in silico prediction performance23,24.

Different pre-training techniques have been developed for different types of models and problems, e.g., pre-training for diffusion models for segmentation25, hierarchical pre-training26, and pre-training for video data27 (which could be applied to time-lapse microscopy). As a community, having a collection of large pre-trained models, as in the medical imaging field28, would be beneficial. Fortunately, platforms like the Bioimage Model Zoo are paving the path to this goal technically.

Finally, for certain problems, pre-training could also be done in a supervised fashion via pseudo-labels. Taking segmentation as an example, this refers to “ground truth” generated automatically by another procedure (e.g., a classic image processing algorithm), despite being error-prone. How to leverage the vast amount of pseudo-labels is still an active research area in computer vision29, but it is not yet well explored in bioimaging.

Assessing the dataset

Quality control of the dataset is important, especially for large-scale experiments. It might be necessary to clean the data before any analysis, e.g., by removing duplicate records, mislabeled samples, broken images, or poor-quality images. There are several data cleaning tools in the computer vision community, like CleanLab (https://cleanlab.ai/), but they are not optimized for bioimages.

There are many different ways of constructing the core set30, among which one important consideration is the optimal core set size. Possible solutions range from classic statistics, like Cohen’s d (effect size, see supplementary experiments and the sketch below), to more advanced refined core set selection31.
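
For reference, Cohen's d is computed as below; how exactly it is applied to decide the core-set size is detailed in the authors' supplementary experiments, so the two score arrays here (e.g., segmentation scores obtained with two candidate core-set sizes) illustrate only one hypothetical usage.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d effect size between two groups of scores."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

# e.g., a negligible effect size between scores_k and scores_k_plus would suggest
# that enlarging the core set beyond K brings little additional benefit
d = cohens_d(scores_k, scores_k_plus)
```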

Quantitatively assessing the quality of a dataset has been a long-standing problem in machine learning32. One key metric is diversity, as surveyed in ref. 33. A related metric on the opposite side of diversity is the bias in the dataset. Here, bias could refer to the amount of data from different sub-groups of the entire dataset (e.g., a rare phenotype in a large, diverse screening dataset). In this situation, data augmentation is particularly important in the BioData-Centric framework. This includes both basic operations, e.g., rotation or deformation (see the sketch below), and more sophisticated strategies to mitigate bias34,35. Other de-biasing approaches beyond data augmentation also exist, such as adversarial training in clinical AI applications36, but they are not yet widely explored in bioimaging.
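
A minimal sketch of the basic augmentation operations mentioned above (rotation plus a mild elastic deformation), used here to oversample a hypothetical rare phenotype; `rare_patches` and the deformation parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import rotate, gaussian_filter, map_coordinates

def augment(img, mask):
    """Random rotation plus a mild elastic deformation applied jointly to a 2D image
    patch and its mask; mask interpolation uses order=0 to keep labels discrete."""
    angle = np.random.uniform(-180, 180)
    img_r = rotate(img, angle, reshape=False, order=1, mode="reflect")
    msk_r = rotate(mask, angle, reshape=False, order=0, mode="reflect")
    # small smooth random displacement field
    dx = gaussian_filter(np.random.randn(*img.shape), sigma=8) * 4
    dy = gaussian_filter(np.random.randn(*img.shape), sigma=8) * 4
    yy, xx = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing="ij")
    coords = [yy + dy, xx + dx]
    return (map_coordinates(img_r, coords, order=1, mode="reflect"),
            map_coordinates(msk_r, coords, order=0, mode="reflect"))

# oversample a rare phenotype by generating several augmented copies of each of its patches
rare_augmented = [augment(im, gt) for im, gt in rare_patches for _ in range(5)]
```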

Hunting for mistakes and hard cases iteratively

Besides the simple pixel-wise uncertainty estimation method used for constructing the critical set, there are different types of potential errors, defined by different types of uncertainty9. For example, for a thin curve- or tube-like structure, e.g., a microtubule, errors in one or two pixels could be viewed as negligible by the model from a pixel-wise point of view, but could be critical in the topological sense when such “small” errors occur right along the curve (connecting or disconnecting it). Thus, a topological uncertainty estimation37 would be beneficial. In addition, errors may also be “soft”. For example, in a mitosis classification problem, the model could make very uncertain predictions on boundary cases, e.g., a cell near the end of mitosis can hardly be classified with a binary mitosis/interphase label. In this case, training and evaluation may need to be adjusted accordingly (e.g., training with soft labels38, as sketched below). In order to fix such mistakes in practice, especially when dealing with extremely large datasets, seamless integration of user-friendly tools and effective uncertainty estimation is critical to ensure the curation work remains practically manageable. It would therefore be valuable if common interactive platforms or tools, such as the human-in-the-loop component in Cellpose39 or the iterative curation in the Allen Cell and Structure Segmenter40, were equipped with sophisticated “hunting” algorithms.
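
A minimal sketch of soft-label training for the mitosis example, assuming a hypothetical `classifier` network and a batch of `cell_crops`; since PyTorch 1.10, `F.cross_entropy` accepts class-probability targets directly.

```python
import torch
import torch.nn.functional as F

logits = classifier(cell_crops)                    # (batch, 2) raw class scores
# soft targets encode expert uncertainty instead of forcing a hard 0/1 decision,
# e.g. a cell near the end of mitosis labeled as 70% mitosis / 30% interphase
soft_targets = torch.tensor([[0.3, 0.7],           # ambiguous, late-mitosis cell
                             [1.0, 0.0]])          # clearly interphase cell
loss = F.cross_entropy(logits, soft_targets)       # probabilities accepted as targets
```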

Monitoring the performance

Estimating model performance on data without ground truth is still an open topic in the AI community. In the bioimaging field, it is still common to use the performance on a random hold-out set (with manual annotation when necessary) to represent the performance, but this can be problematic. First, this is only an average estimation over the given dataset (i.e., the dataset from which the hold-out set is selected) and cannot ensure catching individual failures or “outliers”. The RCA method used in the illustrative example provides a possible solution, but it is still far from fully satisfactory in terms of computational efficiency and accuracy, and could be further improved (e.g., refs. 41,42,43,44). Second, it cannot represent the performance on newly acquired data (i.e., data outside the dataset from which the hold-out set is selected). This is especially common in biological studies (e.g., extending the study cohort during paper revision). In this scenario, it is important to analyze the domain gap between old and new data and to detect out-of-distribution samples45 (e.g., a study initially conducted on images of three cell types that later needs to include a new cell type with potentially major morphological differences, an image restoration model trained on synthetic data but applied to real microscopy images, or an in silico labeling model trained on images of fixed cells but applied to images of live cells), where special techniques called “domain adaptation” or “test-time training” can be used to improve model performance (a simple latent-distance sketch for flagging out-of-distribution images is given below). Domain adaptation is a widely studied topic in computer vision and medical imaging and has only recently started to be investigated in our bioimaging community46, while test-time training improves the model’s generalizability at test time with self-supervised training and has started to be tested on microscopy images47.
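
One simple heuristic (not specific to ref. 45) for flagging potential out-of-distribution images is the distance, in the latent space of the same pre-trained encoder, to the nearest training features; the sketch below assumes pre-computed feature matrices `train_latents` and `new_latents`.

```python
import numpy as np

def ood_scores(new_feats, train_feats, k=10):
    """Mean distance to the k nearest training features; large values suggest domain shift."""
    d = np.linalg.norm(new_feats[:, None, :] - train_feats[None, :, :], axis=-1)  # (new, train)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

scores = ood_scores(new_latents, train_latents)
suspects = np.argsort(scores)[-20:]       # most atypical new images, worth human inspection
```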

Discussions and conclusion

Model-centric AI vs. data-centric AI

As reflected in the BioData-Centric framework described above, we believe the data-centric and model-centric approaches are not mutually exclusive and can actually be complementary in practice48. For instance, training data quality and quantity accumulate over an iterative data-centric workflow; when the training set becomes considerable in size, model-centric concepts such as AutoML or nnUNet can be employed to further improve performance.

Validation

The methods for evaluation and validation could evolve at different stages in the BioData-Centric framework. In practice, having a large hold-out set with ground truth is rare. In this situation, for example, visual inspection could suffice in the iterative “hunting for mistakes” stage. For quantitative metrics, common pitfalls have recently been summarized49. We could consider collecting special experimental data to quantify biological validity50. When applying the model to answer different biological questions, different metrics might be appropriate51.

Requirements for efficient data versioning and management tools

Data-centric approaches generally involve many iterations of data addition, removal, or adjustment, therefore posing great challenges for proper data versioning and systematic management. In recent years, data version control tools have emerged, such as DVC (https://dvc.org/) and Git LFS (https://git-lfs.com/), making data versioning as easy as code versioning on GitHub. However, no product is tailored for bioimage data, and some existing data storage infrastructures may not even be compatible with data versioning (e.g., due to limited storage space). We would encourage consideration of integrating data version control when planning the next generation of major bioimaging data management platforms and core facility data storage infrastructures.

Foundation models

The rising trend of large foundation models has revolutionized, or sooner or later will radically change, how AI-based bioimage analysis works. We believe that if we view foundation models from a data-centric perspective, their power can be further amplified. For example, the core of the new Segment Anything Model52 lies in over a billion annotations obtained with a mix of data-centric strategies (e.g., assisted manual labels and pseudo-labels), which paves a viable path for building new foundation models, e.g., for general microscopy image restoration53 or universal in silico labeling. Another issue is that when applying foundation models for fully automatic bioimage analysis, especially on large collections of microscopy images, it will be important to automatically flag potential failures, as discussed above. Beyond vision foundation models, a Multimodal Large Language Model that generates descriptive information to capture the biological features in images could also be an effective way to help researchers identify key attributes of the dataset or even automatically identify potential errors54. In a nutshell, the synergy between foundation models and data-centric AI has the potential to redefine the bioimage analysis field.

In summary, we find that state-of-the-art data-centric AI algorithms in the broader machine learning community could shed light on improving bioimaging AI in practice. When adapting general data-centric AI algorithms into the prototypical BioData-Centric framework, there are two key high-level considerations:

  • The many facets of “BioData” make “systematic data engineering” a complex interdisciplinary task rather than a purely algorithmic engineering problem: for example, different biological validations are needed for different problems at different stages (e.g., special wet-lab experiments may be required to collect validation data), and the biological context must be considered (e.g., prior knowledge in biology or microscopy may affect implementation strategies).

  • The underlying biological application plays a decisive role in “BioData-Centric” AI: When should one start improving the machine learning model instead of focusing exclusively on data engineering? What types of mistakes are most critical to hunt for in the specific biological application? Which performance monitoring strategy is most suitable for the specific biological assay? The key is to make sure the solution is “application-appropriate”.

Within this prototypical framework, we have discussed major directions for consideration in practice and for further exploration. We believe that revisiting any current bioimaging AI work through a data-centric lens will reveal new opportunities for further improvement, from both the application and the method development points of view.