Abstract
Quality defects in substandard medicines pose a threat to public health. Rapid and accurate identification of these defects is critical for prioritising cases and implementing regulatory measures. However, the development of surveillance systems that classify reports of health product defects remains a largely unmet need in medicine safety monitoring. The objective of this study is to implement an AI system that supports the classification and prioritisation of health product defect reports. We developed a deep learning classification system using 13,830 reports collected between 2010 and 2021, each labelled by a panel of pharmacovigilance experts into one of 21 categories following standardised medical terminology. Our system harnesses state-of-the-art language models that extract rich textual features to classify the reports, and its functionality is enhanced with explainable features that provide interpretability and actionable insights to decision-makers. The system achieves top-1, top-2, and top-3 accuracies of 86%, 93%, and 96%, respectively. There is a statistically significant positive correlation between sample size and top-1 (Pearson’s r: 0.643; 95% CI: [0.2921, 0.8411]; p-value: 0.0016), top-2 (Pearson’s r: 0.735; 95% CI: [0.4439, 0.8856]; p-value: 0.0001), and top-3 (Pearson’s r: 0.635; 95% CI: [0.2808, 0.8374]; p-value: 0.0020) accuracy. Likewise, model accuracy is positively correlated with confidence scores (Pearson’s r: 0.927; 95% CI: [0.8253, 0.9703]; p-value < 0.00001). A feature analysis reveals that the words most influential in the model’s decisions are conceptually and semantically related to their respective product defect categories. The model has also been validated on prospective data. The developed classification system standardises case triage and can improve case prioritisation and processing workflows, enabling a more prompt response to quality defects with high public health impact.
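For readers who wish to recompute the headline metrics above on their own labelled evaluation data, the sketch below shows one standard way to obtain top-k accuracy and a Pearson correlation with a 95% confidence interval. It is a minimal illustration, not the study's code: the function and variable names are hypothetical, the data are synthetic, and the confidence interval uses the conventional Fisher z-transform, which we assume matches how the reported intervals were constructed.

```python
# Minimal sketch (hypothetical names): top-k accuracy and Pearson's r with a
# Fisher-transform 95% CI, the conventional construction for intervals like
# those reported in the abstract.
import numpy as np
from scipy import stats

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of reports whose true category is among the k highest-scoring
    predicted categories. probs has shape (n_reports, n_categories)."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores per row
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson's r, its two-sided p-value, and a (1 - alpha) confidence interval
    via the Fisher z-transform (assumes approximate bivariate normality)."""
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)                          # Fisher z-transform of r
    se = 1.0 / np.sqrt(len(x) - 3)             # standard error of z
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, (lo, hi), p

# Synthetic example: 100 reports across 21 categories, as in the study's label set.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 21))            # stand-in for model output scores
labels = rng.integers(0, 21, size=100)
print(top_k_accuracy(scores, labels, k=3))
print(pearson_with_ci(rng.normal(size=21), rng.normal(size=21)))
```

With such helpers, the abstract's correlations (for instance, per-category accuracy versus per-category sample size, or accuracy versus model confidence) can be recomputed from any evaluation set that records predicted scores and true labels.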
Data availability
The data that support the findings of this study are not openly available due to the confidential and proprietary nature of the records, especially those obtained from companies’ product defect reports, information shared via international regulatory working groups, and good manufacturing practice inspections. Data are located in controlled-access data storage at the Singapore Health Sciences Authority. The data are, however, available from the corresponding author upon reasonable request.
Code availability
Open-source libraries used in this study are referenced in the Resources section of the Methods. Custom code developed in this study is available at the following GitHub repository: https://github.com/hytting/Product-defect.
Acknowledgements
We acknowledge Michelle Ng, Chih Tzer Choong, Doris Phuah, Dorothy Tan, Filina Tan, Huilin Huang, Maggie Tan and Jalene Poh for their expert opinion and assistance in this work. We thank Govindaraj Roshni Daksha for performing a thorough error analysis and providing valuable insights.
Funding
This initiative received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Desmond Teo, Sreemanee Dorajoo and Pei San Ang proposed the research idea. Vicente Sancenon, Yiting Huang and Lin Zou designed the models and analysed the data. Desmond Teo, Sreemanee Dorajoo and Pei San Ang provided the domain expertise for manual annotation of the data. Han Leong Goh and Andy Ta provided the thought leadership for the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available in the online version of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sancenon, V., Huang, Y., Zou, L. et al. Classification of health product defect reports by deep learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43961-3