An annotated dataset of Gram stains from positive blood cultures

Yi, Qiaolian; Gou, Xiaoyan; Zhu, Renyuan; Xie, Xiuli; Hu, Mengting; Wang, Xing; Wang, Tai’e; Xu, Kaiwen; Xu, Ying-Chun

doi:10.1038/s41597-026-06651-3

Data Descriptor
Open access
Published: 23 January 2026

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Bloodstream infections (BSIs) cause high morbidity and mortality across all age groups and require urgent, accurate intervention. Gram stain interpretation of positive blood cultures (PBCs) is crucial for the early diagnosis of BSIs, yet this manual process is labor-intensive, time-consuming, and highly operator-dependent. Artificial intelligence (AI)-assisted microscopic interpretation of stained smears could benefit microbiology diagnostics. To support automated identification of blood-culture Gram stains, this study introduces a dataset of Gram-stain smears collected in clinical practice. The dataset includes 505 microscopic images, covering up to 57 species associated with BSIs, with a total of 7528 annotations. The annotations are categorized by staining characteristics and morphological features into cocci, bacilli, and fungi. We trained and validated an object detection model based on the YOLOv10 architecture on this dataset to automatically localize and classify these morphological categories in microscopic images. The publicly released dataset will support the development of AI methods that automatically interpret Gram stains from PBCs for routine clinical application.

Data availability

The dataset is available at the Figshare repository¹⁶.
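
As a minimal sketch of how the released annotations might be inspected, the snippet below tallies annotations per morphological category. It assumes the annotations are stored as COCO-style JSON, which is the format the annotation tool named under Code availability works with; the actual file layout in the repository is not described here. The helper names `load_coco` and `count_by_category`, the category names, and the toy record are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style annotation file (assumed format) into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_by_category(coco):
    """Count annotations per category name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


# Toy record for illustration only; values are invented, not dataset content.
toy = {
    "images": [{"id": 1, "file_name": "example.jpg"}],
    "categories": [
        {"id": 1, "name": "cocci"},
        {"id": 2, "name": "bacilli"},
        {"id": 3, "name": "fungi"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [50, 40, 18, 22]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [80, 15, 40, 12]},
    ],
}

counts = count_by_category(toy)
print(dict(counts))
```

With a real annotation file, `count_by_category(load_coco(path))` would give the per-category totals, which could be checked against the annotation count reported in the abstract.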

Code availability

The annotation tool used for dataset labelling is publicly available on GitHub at https://github.com/jsbroks/coco-annotator/. The customizable Image Annotation Tools used for the technical check of the dataset labelling are available from https://github.com/KeyOfSpectator/ImageAnnotationTools and include the Double Check IoU Annotation Tool and the COCO Json Merge/Split Tool.
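
The name of the Double Check IoU Annotation Tool refers to intersection over union (IoU), a standard overlap measure between two bounding boxes. How that tool computes or thresholds IoU is not stated here, so the following is only a generic sketch of the measure, assuming boxes in the COCO convention `[x, y, width, height]`; the function name `bbox_iou` is hypothetical.

```python
def bbox_iou(a, b):
    """IoU of two boxes given as [x, y, width, height] (COCO convention)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two annotators' boxes for the same object (invented values).
first_pass = [0, 0, 10, 10]
second_pass = [5, 0, 10, 10]
print(round(bbox_iou(first_pass, second_pass), 3))
```

A double-check workflow could flag pairs whose IoU falls below a chosen threshold for manual review; the threshold value would be a project decision and is not given in this article.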

References

1. Jin, L. et al. Clinical Profile, Prognostic Factors, and Outcome Prediction in Hospitalized Patients With Bloodstream Infection: Results From a 10-Year Prospective Multicenter Study. Front Med (Lausanne) 8 (2021).
2. Dubourg, G., Raoult, D. & Fenollar, F. Emerging methodologies for pathogen identification in bloodstream infections: an update. Expert Rev Mol Diagn 19, 161–173 (2019).
3. Adrie, C. et al. Attributable mortality of ICU-acquired bloodstream infections: Impact of the source, causative micro-organism, resistance profile and antimicrobial therapy. J Infect 74, 131–141 (2017).
4. Ikuta, K. S. et al. Global mortality associated with 33 bacterial pathogens in 2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet 400, 2221–2248 (2022).
5. Timsit, J. F., Ruppé, E., Barbier, F., Tabah, A. & Bassetti, M. Bloodstream infections in critically ill patients: an expert statement. Intensive Care Med 46, 266–284 (2020).
6. Cecconi, M., Evans, L., Levy, M. & Rhodes, A. Sepsis and septic shock. The Lancet 392, 75–87 (2018).
7. Kern, W. V. & Rieg, S. Burden of bacterial bloodstream infection—a brief update on epidemiology and significance of multidrug-resistant pathogens. Clinical Microbiology and Infection 26, 151–157 (2020).
8. Pien, B. C. et al. The clinical and prognostic importance of positive blood cultures in adults. American Journal of Medicine 123, 819–828 (2010).
9. Lamy, B., Sundqvist, M. & Idelevich, E. A. Bloodstream infections – Standard and progress in pathogen diagnostics. Clinical Microbiology and Infection 26, 142–150 (2020).
10. Ito, H. et al. The role of Gram stain in reducing broad-spectrum antibiotic use: A systematic literature review and meta-analysis. Infect Dis Now 53, 104764 (2023).
11. Thomson, R. B. One small step for the Gram stain, one giant leap for clinical microbiology. J Clin Microbiol 54, 1416–1417 (2016).
12. Smith, K. P. & Kirby, J. E. Image analysis and artificial intelligence in infectious disease diagnostics. Clinical Microbiology and Infection 26, 1318–1323 (2020).
13. Smith, K. P., Kang, A. D. & Kirby, J. E. Automated interpretation of blood culture Gram stains by use of a deep convolutional neural network. J Clin Microbiol 56 (2018).
14. Walter, C. et al. Performance evaluation of machine-assisted interpretation of Gram stains from positive blood cultures. J Clin Microbiol 62 (2024).
15. Makrai, L. et al. Annotated dataset for deep-learning-based bacterial colony detection. Sci Data 10, 497 (2023).
16. Yi, Q. et al. An annotated dataset of Gram stains from positive blood cultures. Figshare. https://doi.org/10.6084/m9.figshare.26004610

Acknowledgements

We thank Mr. Shichun Feng for his help with technical validation. This work was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project [No. 2024ZD0532800], the Peking Union Medical College Hospital Talent Cultivation Program-Category D [No. UHB12289], and the Young Elite Scientists Sponsorship Program of the Beijing High Innovation Plan.

Author information

Authors and Affiliations

Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China
Qiaolian Yi, Xiaoyan Gou, Renyuan Zhu, Xiuli Xie, Mengting Hu, Xing Wang, Tai’e Wang, Kaiwen Xu & Ying-Chun Xu

Contributions

Q.Y. and Y.X. conceived the concept of the work. X.G., M.H., X.W. and T.W. performed the microorganism identification. Q.Y., X.G., R.Z., K.X. and X.X. made and annotated the digital images. R.Z., K.X. and X.X. curated the digital images. Q.Y. drafted the manuscript and performed the technical validation. Q.Y. and Y.X. had overarching administrative responsibility for the project. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Qiaolian Yi or Ying-Chun Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yi, Q., Gou, X., Zhu, R. et al. An annotated dataset of Gram stains from positive blood cultures. Sci Data (2026). https://doi.org/10.1038/s41597-026-06651-3

Received: 05 May 2025
Accepted: 19 January 2026
Published: 23 January 2026
DOI: https://doi.org/10.1038/s41597-026-06651-3