Abstract
Quality defects in substandard medicines pose a threat to public health. Rapid and accurate identification of these defects is critical for prioritising cases and implementing regulatory measures. However, the development of surveillance systems that classify reports of health product defects remains a largely unmet need in medicine safety monitoring. The objective of this study is to implement an AI system that supports the classification and prioritisation of health product defect reports. We developed a deep learning classification system using 13,830 reports collected between 2010 and 2021, each labelled by a panel of pharmacovigilance experts into one of 21 categories following standardised medical terminology. Our system harnesses state-of-the-art language models that extract rich textual features to classify the reports, and its functionality is enhanced with explainable features that provide interpretability and actionable insights to decision-makers. The system achieves top-1, top-2, and top-3 accuracies of 86%, 93%, and 96%, respectively. There is a statistically significant positive correlation between sample size and top-1 (Pearson’s r: 0.643; 95% CI: [0.2921, 0.8411]; p-value: 0.0016), top-2 (Pearson’s r: 0.735; 95% CI: [0.4439, 0.8856]; p-value: 0.0001), and top-3 (Pearson’s r: 0.635; 95% CI: [0.2808, 0.8374]; p-value: 0.0020) accuracy. Likewise, model accuracy is positively correlated with confidence scores (Pearson’s r: 0.927; 95% CI: [0.8253, 0.9703]; p-value < 0.00001). A feature analysis reveals that the words most influential in the model’s decisions are conceptually and semantically related to their respective product defect categories. The model has also been validated on prospective data. The developed classification system standardises case triage and can improve case prioritisation and processing workflows, enabling a more prompt response to quality defects with high public health impact.
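For readers who wish to recompute the headline metrics above on their own labelled evaluation data, the sketch below shows one standard way to obtain top-k accuracy and a Pearson correlation with a 95% confidence interval. It is a minimal illustration, not the study's code: the function and variable names are hypothetical, the data are synthetic, and the confidence interval uses the conventional Fisher z-transform, which we assume matches how the reported intervals were constructed.

```python
# Minimal sketch (hypothetical names): top-k accuracy and Pearson's r with a
# Fisher-transform 95% CI, the conventional construction for intervals like
# those reported in the abstract.
import numpy as np
from scipy import stats

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of reports whose true category is among the k highest-scoring
    predicted categories. probs has shape (n_reports, n_categories)."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores per row
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson's r, its two-sided p-value, and a (1 - alpha) confidence interval
    via the Fisher z-transform (assumes approximate bivariate normality)."""
    r, p = stats.pearsonr(x, y)
    z = np.arctanh(r)                          # Fisher z-transform of r
    se = 1.0 / np.sqrt(len(x) - 3)             # standard error of z
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, (lo, hi), p

# Synthetic example: 100 reports across 21 categories, as in the study's label set.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 21))            # stand-in for model output scores
labels = rng.integers(0, 21, size=100)
print(top_k_accuracy(scores, labels, k=3))
print(pearson_with_ci(rng.normal(size=21), rng.normal(size=21)))
```

With such helpers, the abstract's correlations (for instance, per-category accuracy versus per-category sample size, or accuracy versus model confidence) can be recomputed from any evaluation set that records predicted scores and true labels.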
Data availability
The data that support the findings of this study are not openly available due to the confidential and proprietary nature of the records, especially those obtained from companies’ product defect reports, information shared via international regulatory working groups, and good manufacturing practice inspections. Data are located in controlled-access data storage at the Singapore Health Sciences Authority. The data are, however, available from the corresponding author upon reasonable request.
Code availability
Open-source libraries used in this study are referenced in the Resources section of the Methods. Custom code developed in this study is available at the following GitHub repository: https://github.com/hytting/Product-defect.
Acknowledgements
We acknowledge Michelle Ng, Chih Tzer Choong, Doris Phuah, Dorothy Tan, Filina Tan, Huilin Huang, Maggie Tan and Jalene Poh for their expert opinion and assistance in this work. We thank Govindaraj Roshni Daksha for performing a thorough error analysis and providing valuable insights.
Funding
This initiative received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Desmond Teo, Sreemanee Dorajoo and Pei San Ang proposed the research idea. Vicente Sancenon, Yiting Huang and Lin Zou designed the models and analysed the data. Desmond Teo, Sreemanee Dorajoo and Pei San Ang provided the domain expertise for manual annotation of the data. Han Leong Goh and Andy Ta provided the thought leadership for the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available in the online version of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sancenon, V., Huang, Y., Zou, L. et al. Classification of health product defect reports by deep learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43961-3