In an article recently accepted for publication in the International Journal of Impotence Research, Baturu et al. evaluated the accuracy of artificial intelligence (AI)-generated responses to frequently asked questions on erectile dysfunction (ED) [1]. Two expert urologists rated the responses using the Global Quality Score (GQS), with significant agreement across the AI platforms evaluated: BARD, ChatGPT 3.5, and ChatGPT 4 [1]. Specifically, Baturu's study found that ChatGPT 3.5 and ChatGPT 4 achieved a higher GQS than BARD in categories including causes (p < 0.001), treatment options (p < 0.001), protective measures (p = 0.013), relationships with other illnesses (p = 0.006), and treatment with herbal agents (p = 0.043). Moreover, the authors used the F1 metric to evaluate the models' accuracy in machine learning (ML) terms. With scores closer to 1 indicating better model performance, the authors found an overall F1 score of 0.58. Specific categories such as causes, diagnosis, treatment options, and protective measures showed excellent results, while others lacked reliability due to the absence of information, warranting improvement in the generated answers for those categories. The authors concluded that there was no significant difference between ChatGPT 3.5 and ChatGPT 4 in terms of the quality of answers, although both achieved a better GQS than BARD.
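For context, the F1 score is conventionally defined as the harmonic mean of precision and recall; the exact per-category variant the authors computed is not specified in their report, so the following is the standard formulation:

$$\mathrm{F1} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},$$

where TP, FP, and FN denote true positives, false positives, and false negatives in the raters' judgments of the AI answers. Because the harmonic mean cannot exceed the larger of its two terms and cannot fall below the smaller, an overall score of 0.58 implies that at least one of precision or recall was no greater than 0.58, leaving clear room for improvement.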

It is apparent that over the past ten years, the use of AI in medicine has evolved rapidly, encompassing machine learning, artificial neural networks (ANNs), deep learning (DL), robotics, and natural language processing (NLP) for massive data analysis. Specifically, AI chatbots have been increasingly used in various healthcare domains such as symptom detection to help patients manage their conditions appropriately, with or without a physician [2, 3]. With more than 2.9 million outpatient visits made for ED in the United States alone, it becomes crucial to provide patients with an impactful way to counsel, manage, and treat their symptoms without invasive methodologies or delays in treatment [4]. Moreover, many men are embarrassed by their ED symptoms and experience concomitant shame, preventing them from reaching medical providers for assistance [5]. According to previous studies, only 32.4% of men feel comfortable starting a conversation regarding ED with their providers [6]. Taken together, an impartial and unbiased entity such as ChatGPT may allow more men to seek help and care for their underlying ED, ultimately reducing the significant anxiety associated with their medical issues. However, the use of AI language models in this setting remains understudied [7].

Current evidence has shown that AI responses can provide valuable information regarding ED. Studies have found that publicly available AI language models such as Google BARD and ChatGPT produced responses that were significantly more accurate, robust, and unbiased than those of expert urologists [8]. Other studies have assessed the accuracy, readability, and reproducibility of ChatGPT's answers to commonly asked questions about ED. The results demonstrated a fair degree of repeatability and high accuracy (Cohen's kappa coefficient = 0.61) in providing thorough, or accurate but insufficient, answers about the epidemiology and dangers of ED [5]. On the other hand, responses about treatment and prevention were often too sophisticated for the average patient to read, and they were also less accurate, with poor reliability [5].
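For reference, Cohen's kappa corrects raw inter-rater agreement for agreement expected by chance:

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is the observed proportion of agreement between the two raters and $p_e$ is the proportion of agreement expected by chance alone. Interpretation conventions vary, but under the widely cited Landis and Koch benchmarks, a value of 0.61 sits at the lower boundary of "substantial" agreement (0.61 to 0.80), consistent with the fair-to-good reproducibility reported.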

It is important to note that GPT models are not specifically trained on medical knowledge, whereas specialized systems such as Med-PaLM 2 are designed for medical purposes [9]. Another notable criticism of ChatGPT's handling of medical inquiries is that its human-like delivery style encourages patients to become unduly dependent on its responses without question, creating a significant risk of mistakes. Because of the unique character of these outputs and the hidden sources that support them, physicians must take an active role in the development and evaluation of AI-powered chatbots, rather than merely accepting them or interacting with them at a later stage [10]. Another limitation of ChatGPT's medical ability lies in OpenAI's usage policy, which prohibits providing tailored medical or health advice without review by a qualified medical professional [11]. Thus, AI-generated medical information regarding one's ED must be interpreted cautiously to avoid serious complications.

Lastly, the authors should be commended for their great efforts, albeit at an early stage, in evaluating the role of AI in ED. The integration of AI such as ChatGPT and BARD into healthcare delivery remains a controversial topic and indeed warrants further study. However, to date, 1100 citations referencing "ChatGPT" appear on PubMed, showcasing the irreversible path to change [5]. Nonetheless, although these AI models demonstrated excellent readability and accuracy when addressing the epidemiology and hazards of ED, they had difficulty explaining the available alternatives for therapy and prevention, which may limit their usefulness in managing ED in real-world settings. Moreover, concerns about the lack of medical training and the human-like delivery style pose a risk of over-reliance and mistakes in medical queries, making physician participation in their creation and review necessary to guarantee readability and correctness [10]. Furthermore, it would be useful to compare the levels of proficiency measured by each study in a standardized way to truly assess the accuracy of AI language models for ED issues. Finally, there is no doubt that AI-based language models may in part advance urological care and the management of ED symptoms. However, physician oversight and further studies assessing the efficacy of ED symptom management in clinical practice are necessary before further endorsement of AI models such as ChatGPT 3.5, ChatGPT 4, and BARD.