Background & Summary

Non-sterile products (NSPs), which include pharmaceuticals, cosmetics, and medical devices, are extensively utilized. While these products are not mandated to be sterile, they are nevertheless required to adhere to specific microbiological purity standards to safeguard patient health1. Microbial contamination in NSPs has the potential to alter product effectiveness2,3. In particular, immunocompromised individuals may face severe infections due to contaminated NSPs4,5,6. The designation “objectionable microorganisms” (OMs) refers to microbial species that pose a risk of harm when present in NSPs7.

Pharmacopoeias, including the Chinese Pharmacopoeia (ChP), United States Pharmacopoeia (USP), and European Pharmacopoeia (EP), define microbial limit testing procedures and establish control requirements for a limited selection of microbial reference strains8,9,10. Notable examples include Pseudomonas aeruginosa, Staphylococcus aureus, Salmonella spp., and Candida albicans11,12. However, numerous other microbial species not included in these compendia may still be classified as OMs under certain conditions13,14. Whether a microorganism is objectionable depends on contextual factors, including the product’s dosage form, the route of administration, the target population, and the contamination level15,16. Currently, no comprehensive dataset systematically catalogs OMs along with their associated risks in NSPs. Furthermore, the absence of standardized decision-support tools complicates regulatory assessment of newly identified contaminants, while fragmented information across different datasets hinders comprehensive risk evaluation17,18. This regulatory uncertainty has contributed to significant economic losses through product recalls and market withdrawals, with microbial contamination being a leading cause of pharmaceutical and cosmetic product recalls worldwide19. Moreover, advances in molecular detection technologies have enabled identification of previously unrecognizable microorganisms20, yet the integration of genomic data, antimicrobial resistance profiles, and virulence factors into practical risk assessment frameworks remains inadequate21.

In this study, we present the release of the Non-sterile Product Objectionable Microorganism Dataset (NSPOMdb). NSPOMdb has three major features: (i) It curates a total of 1,360 recall events related to microbial contamination in NSPs. This includes 989 pharmaceutical products and 371 cosmetic products. Detailed information regarding dosage forms and contaminating microbial genera or species was manually extracted from original product descriptions and recall reasons. (ii) It provides a curated compilation of 89 potentially OMs with comprehensive information. Antimicrobial resistance genes and virulence factor genes were predicted from 39,426 genome sequences of these potentially OMs, revealing widespread multidrug resistance and diverse virulence factor profiles among common microbial contaminants. (iii) It offers a program to predict antimicrobial resistance genes (ARGs) and virulence factor genes (VFGs) in microbial genome sequences. Existing resources, such as the ARG database CARD22, the VFG database VFDB23, and the bacterial diversity database BacDive24, provide bacterial resistance, virulence, or strain-level phenotypic information individually; however, none establish direct links between microbial characteristics and contamination events in non-sterile products. NSPOMdb fills this gap by integrating recall event data, curated information on objectionable microorganisms, and genomic annotations into a unified, application-oriented dataset for microbial risk evaluation. NSPOMdb may assist manufacturers and regulators in managing microbial safety and serve as a foundation for future improvements in risk prediction and regulatory practices.

Methods

Data collection of NSP recall events

Recall events were collected from the publicly accessible websites of regulatory agencies: the U.S. Food and Drug Administration (FDA, https://www.fda.gov/safety/recalls-market-withdrawals-safety-alerts), the European Commission Safety Gate (https://ec.europa.eu/safety-gate-alerts/screen/search), the Medicines and Healthcare products Regulatory Agency (MHRA, https://www.gov.uk/drug-device-alerts), and the Therapeutic Goods Administration (TGA, https://apps.tga.gov.au/Prod/DRAC/arn-entry.aspx). The keywords “non-sterile”, “microorganism”, “biological”, “contaminated”, and “contamination” were used to filter for non-sterile product recall events related to microbial contamination. The contaminating microorganisms and product dosage form for each recall event were manually extracted from the product descriptions and recall notices. Recalls without a clearly identified contaminating microorganism or dosage form were kept in the dataset, with the corresponding fields left blank. As of June 15, 2025, NSPOMdb contains 1,360 non-sterile product recall events related to microbial contamination (Table 1).
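
As an illustration, the keyword filtering step could be sketched as follows. This is a minimal sketch and not the procedure used to build NSPOMdb: the record field names (product_description, recall_reason, microorganism, dosage_form) and the case-insensitive substring matching are assumptions made for the example.

```python
import re

# Keywords quoted in the text; case-insensitive substring matching is an assumption.
KEYWORDS = ("non-sterile", "microorganism", "biological", "contaminated", "contamination")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def filter_recalls(records):
    """Keep records whose description or recall reason mentions any keyword."""
    kept = []
    for rec in records:
        text = f"{rec.get('product_description', '')} {rec.get('recall_reason', '')}"
        if PATTERN.search(text):
            # Records with no identified organism or dosage form are kept,
            # with the corresponding fields left blank.
            kept.append({
                **rec,
                "microorganism": rec.get("microorganism") or "",
                "dosage_form": rec.get("dosage_form") or "",
            })
    return kept


# Hypothetical records, for illustration only.
example = [
    {"product_description": "Oral syrup 100 mL",
     "recall_reason": "Contamination with Burkholderia cepacia",
     "microorganism": "Burkholderia cepacia", "dosage_form": "syrup"},
    {"product_description": "Tablets", "recall_reason": "Labeling error"},
    {"product_description": "Hand cream", "recall_reason": "Microbial contamination",
     "microorganism": None},
]
filtered = filter_recalls(example)
```

In this toy input, the second record is dropped because it matches no keyword, and the third is retained with empty microorganism and dosage form fields.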

Table 1 The number of recall events from different sources.

Compilation of the potentially objectionable microorganism list

In the 2025 edition of the Chinese Pharmacopoeia, Burkholderia cepacia complex, Escherichia coli, Salmonella, Pseudomonas aeruginosa, Staphylococcus aureus, Clostridioides, and Candida albicans are listed as target organisms for microbial limit testing in NSPs. However, past NSP recall events have demonstrated that many other microorganisms can contaminate NSPs, leading to quality issues or health risks. Here, based on collected recall data and industry survey results, we compiled a list of 89 potentially OMs. The list was assembled from real-world data using three complementary sources: (i) microorganisms explicitly specified in international pharmacopeial standards and regulatory guidances, (ii) microorganisms documented in adverse event reports, recall notices, and regulatory warning letters, and (iii) microorganisms reported in the published literature as being associated with clinical adverse events related to pharmaceutical products. For genus-level records, several representative species were listed based on their frequency in contamination reports and their known pathogenicity. Moreover, to assist users in identifying and evaluating microorganisms present in their products, we collected information on the typical morphological, physiological, biochemical, genomic, and risk-related features of these organisms. This information was obtained from BacDive24, MicrobeWiki (https://microbewiki.kenyon.edu/index.php/MicrobeWiki), Microbe Canvas (https://microbe-canvas.com/), and FungalTraits25, as well as relevant biosafety regulatory and guidance documents26,27,28,29. We also provide genome data through direct links to BV-BRC30 and MycoBank31.

Prediction of antimicrobial resistance genes and virulence factor genes using ARG-VFG-finder

ARG-VFG-finder is an automated pipeline in NSPOMdb for annotating antimicrobial resistance genes and virulence factor genes in microbial sequences. The tool accepts bacterial DNA sequences or fungal protein sequences as input and follows a different workflow for each input type (Fig. 1). For bacterial genomes, proteins are first predicted with Prokka32 (Prokka v1.1.3 was used in this study, and future updates of NSPOMdb will adopt Bakta33 or other actively maintained annotation tools). ARGs are then identified with Abricate34 against ResFinder (v4.0)36, and VFGs with BLASTp35 against VFDB (2025 release)23. For fungal protein sequences, the tool uses HMMER37 and BLASTp searches against the ResFungi (2024 release)38 and DFVF (2012 release)39 databases. For BLASTp searches, detection thresholds were set at a minimum sequence identity of 80% and an alignment coverage of 80%. For HMMER searches, hits were retained only if the sequence score was ≥ 100 and the domain score was ≥ 50. Redundant or overlapping hits were filtered to retain the best-scoring matches based on bit score and coverage. The tool outputs the predicted ARGs and VFGs, giving users an assessment of the resistance and virulence profile of each microorganism.
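
The thresholds above can be illustrated with a short sketch. It is not the ARG-VFG-finder code: the six-column tabular layout (qseqid, sseqid, pident, length, slen, bitscore) and the definition of coverage as alignment length divided by subject length are assumptions made for the example, since the text does not specify either.

```python
import csv
import io

MIN_IDENTITY = 80.0   # minimum percent identity (from the text)
MIN_COVERAGE = 80.0   # minimum percent alignment coverage (from the text)
MIN_SEQ_SCORE = 100.0  # HMMER sequence score threshold (from the text)
MIN_DOM_SCORE = 50.0   # HMMER domain score threshold (from the text)


def passes_hmmer(sequence_score, domain_score):
    """Apply the two HMMER score thresholds to one hit."""
    return sequence_score >= MIN_SEQ_SCORE and domain_score >= MIN_DOM_SCORE


def best_hits(tsv_text):
    """Return the best passing BLASTp hit per query.

    Assumed columns: qseqid, sseqid, pident, length, slen, bitscore.
    Assumed coverage: 100 * alignment length / subject length.
    """
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, pident, length, slen, bitscore = row
        identity = float(pident)
        coverage = 100.0 * int(length) / int(slen)
        if identity < MIN_IDENTITY or coverage < MIN_COVERAGE:
            continue
        hit = {"query": qseqid, "subject": sseqid, "identity": identity,
               "coverage": coverage, "bitscore": float(bitscore)}
        current = best.get(qseqid)
        # Rank by bit score first, then by coverage.
        if current is None or (hit["bitscore"], hit["coverage"]) > (
                current["bitscore"], current["coverage"]):
            best[qseqid] = hit
    return best


# Hypothetical hits, for illustration only.
sample = "\n".join([
    "prot1\tgeneA\t95.0\t180\t200\t350",  # passes both thresholds
    "prot1\tgeneB\t99.0\t200\t200\t410",  # passes, higher bit score
    "prot2\tgeneC\t70.0\t200\t200\t150",  # identity below 80%
    "prot3\tgeneD\t90.0\t100\t200\t120",  # coverage below 80%
])
hits = best_hits(sample)
```

Here only prot1 retains a hit, and geneB is preferred over geneA because of its higher bit score.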

Fig. 1

The pipeline of ARG-VFG-finder provided by NSPOMdb. It takes bacterial DNA sequences or fungal protein sequences as input to predict antimicrobial resistance genes and virulence factor genes.

Genome sequences of the 89 potentially objectionable microorganisms were obtained from public NCBI repositories prior to downstream annotation. Accession numbers for all available bacterial genomes were first retrieved from the NCBI RefSeq FTP site (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/), and those for fungal genomes were retrieved from the GenBank FTP site (https://ftp.ncbi.nlm.nih.gov/genomes/genbank/fungi/). The accession lists were then filtered by species names corresponding to the 89 target microorganisms. The matched genome assemblies were downloaded using the NCBI command-line toolkit. Only assemblies labeled as “Complete Genome” or “Chromosome” for bacteria, and “Chromosome” or “Scaffold” for fungi, were retained.
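
The selection rules can be sketched as below. This is an illustrative sketch only: the row keys (accession, organism_name, assembly_level) are hypothetical names for parsed fields of an assembly listing, the accessions are invented, and matching on a species-name prefix is an assumption about how names were compared.

```python
# Assembly levels retained per kingdom, as stated in the text.
BACTERIAL_LEVELS = {"Complete Genome", "Chromosome"}
FUNGAL_LEVELS = {"Chromosome", "Scaffold"}


def select_assemblies(rows, target_names, kingdom):
    """Return accessions whose organism matches a target and whose level is allowed."""
    allowed = BACTERIAL_LEVELS if kingdom == "bacteria" else FUNGAL_LEVELS
    selected = []
    for row in rows:
        if row["assembly_level"] not in allowed:
            continue
        name = row["organism_name"]
        # A strain-level name such as "Pseudomonas aeruginosa PAO1" matches its
        # species, and a genus-level target such as "Salmonella" matches its members.
        if any(name == t or name.startswith(t + " ") for t in target_names):
            selected.append(row["accession"])
    return selected


# Hypothetical rows with made-up accessions, for illustration only.
rows = [
    {"accession": "ACC_0001", "organism_name": "Pseudomonas aeruginosa PAO1",
     "assembly_level": "Complete Genome"},
    {"accession": "ACC_0002", "organism_name": "Pseudomonas aeruginosa",
     "assembly_level": "Contig"},
    {"accession": "ACC_0003", "organism_name": "Salmonella enterica",
     "assembly_level": "Chromosome"},
    {"accession": "ACC_0004", "organism_name": "Bacillus subtilis",
     "assembly_level": "Complete Genome"},
]
bacterial = select_assemblies(rows, ["Pseudomonas aeruginosa", "Salmonella"], "bacteria")
```

The contig-level assembly and the non-target species are excluded, leaving ACC_0001 and ACC_0003.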

Data Records

The NSPOMdb dataset is available in two complementary formats: an Excel file on figshare (https://doi.org/10.6084/m9.figshare.30085240.v1)40. The dataset structure was designed using an entity-relationship (ER) model, shown in Fig. 2. The recall event dataset includes 1,360 historical recall events related to microbial contamination in non-sterile products. It provides details on product descriptions, dosage forms, and contaminating microorganisms. The objectionable microorganism dataset contains curated data on 89 potentially objectionable microorganisms. Each entry is annotated for taxonomy, morphology, physiological and biochemical characteristics, risk-related information, and sequence data. The genomic annotation dataset integrates predicted ARGs and VFGs from 39,626 microbial genome sequences. This offers comprehensive genomic insights into objectionable microorganisms.

Fig. 2

Entity-relationship diagram of NSPOMdb dataset structure. The connecting lines between tables illustrate the relational links and foreign key relationships among dataset fields.

In addition to these core datasets, NSPOMdb also provides structured reference datasets related to dosage forms, routes of administration, and patient populations, compiled from General Rule 1107 and Guideline Principle 9211 of the Chinese Pharmacopoeia (2025 Edition). The dosage form dataset includes information on common non-sterile product dosage forms, their typical water activity levels, and corresponding microbiological control standards. The route of administration dataset and patient population dataset summarize the relative risk levels associated with different administration routes and patient categories. Together, these datasets form a unified resource for exploring microbial contamination patterns, identifying high-risk organisms, and supporting data-driven microbial risk assessment in non-sterile products.

Data Overview

The NSPOMdb list of potentially OMs in NSPs contains 89 entries: 6 genera, 1 species complex, and 82 individual species. The compilation covers 59 bacterial and 30 fungal organisms. These microorganisms were systematically curated from NSP recall events, industry surveys, and clinical case reports, with selection criteria based on their prevalence in contamination incidents and pathogenic potential. The compiled list encompasses the most frequently reported microorganisms in NSP recall events, including Burkholderia cepacia complex (n = 162), Ralstonia pickettii (n = 48), Pseudomonas aeruginosa (n = 42), and Salmonella spp. (n = 28), which collectively account for about 78% (280 occurrences among 357 recalls with identified contaminating species) of quality failures and safety concerns in NSPs. For each listed microorganism, general features were collected covering taxonomic classification, morphological characteristics, physiological properties, and biochemical features, primarily sourced from the BacDive24 and FungalTraits25 databases. In practice, morphological, biochemical, and sequence features may aid in the identification of these microorganisms in NSPs. Physiological features and risk-related information can support microbial risk assessment in product-specific contexts.
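
The counts in this paragraph can be cross-checked with simple arithmetic. The snippet below uses only the figures quoted above; the variable names are illustrative.

```python
# Occurrence counts of the four most frequently reported organisms (from the text).
top_counts = {
    "Burkholderia cepacia complex": 162,
    "Ralstonia pickettii": 48,
    "Pseudomonas aeruginosa": 42,
    "Salmonella spp.": 28,
}
recalls_with_species = 357  # recalls with identified contaminating species

total_top = sum(top_counts.values())
share_percent = round(100 * total_top / recalls_with_species, 1)

# Composition of the 89-entry list, by taxonomic rank and by kingdom.
by_rank = 6 + 1 + 82   # genera + species complex + individual species
by_kingdom = 59 + 30   # bacterial + fungal organisms

print(total_top, share_percent, by_rank, by_kingdom)
```

The four organisms sum to 280 occurrences, or roughly 78% of the 357 recalls with identified species, and both breakdowns of the list sum to 89.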

Technical Validation

Among the 1,360 NSP recall events archived in NSPOMdb, only 378 (27.8%) specified the species or genus of the contaminating microorganisms in the report (Fig. 3a). At the species level, Burkholderia cepacia was the most common objectionable microorganism in NSPs, followed by Ralstonia pickettii and Pseudomonas aeruginosa (Fig. 3b). Notably, some NSP recall event reports specified only the genus of the contaminating microorganisms, such as Salmonella (Fig. 3c).

Fig. 3

Statistical analysis of non-sterile product recall events collected by NSPOMdb. (a) Distribution of the number of reported microorganisms per recall event. (b) Distribution of microbial species reported in recall events. (c) Distribution of microbial genera reported in recall events. (d) Distribution of dosage forms among recalled products. (e) Microbial species occurrence in powder and syrup products.

NSPOMdb categorizes the recalled NSPs into 16 common dosage forms (Fig. 3d) based on General Rule 1107 of the Chinese Pharmacopoeia (2025 edition)8, which provides a comprehensive and well-defined classification system for dosage forms. The relationship between NSP dosage forms and contaminating microorganisms is shown in Fig. 3e. Salmonella is the most frequently reported microorganism in powders, while Burkholderia cepacia is the most frequently reported in syrups. This information can guide users in determining which microorganisms should receive more attention based on the dosage form of their products.
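
A dosage form by microorganism tally of the kind summarized in Fig. 3e could be computed along the following lines. This is a sketch under assumptions: the field names (dosage_form, microorganism) are hypothetical, and the records are toy data rather than entries from NSPOMdb.

```python
from collections import Counter, defaultdict


def most_frequent_by_dosage_form(records):
    """Map each dosage form to its most frequently reported organism and count."""
    counts = defaultdict(Counter)
    for rec in records:
        form = rec.get("dosage_form")
        organism = rec.get("microorganism")
        # Records with a blank dosage form or organism cannot be cross-tabulated.
        if form and organism:
            counts[form][organism] += 1
    return {form: tally.most_common(1)[0] for form, tally in counts.items()}


# Toy records, for illustration only.
toy_records = [
    {"dosage_form": "powder", "microorganism": "Salmonella"},
    {"dosage_form": "powder", "microorganism": "Salmonella"},
    {"dosage_form": "powder", "microorganism": "Bacillus cereus"},
    {"dosage_form": "syrup", "microorganism": "Burkholderia cepacia"},
    {"dosage_form": "syrup", "microorganism": ""},
]
top = most_frequent_by_dosage_form(toy_records)
```

Records with blank fields are skipped, which mirrors how recalls without an identified organism or dosage form cannot contribute to this breakdown.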