Introduction

Lung cancer is one of the most common malignant tumors, with high morbidity and mortality1. The five-year survival rate for individuals diagnosed with lung cancer is typically reported to be between 10 and 20%2,3. As with many diseases, the internet is now a popular platform for accessing information on lung cancer4. Patients increasingly turn to the internet for answers to health-related questions. While search engines have traditionally served this purpose, artificial intelligence tools such as GPT (Generative Pre-trained Transformer) are, as technology develops, increasingly being used as well4.

Large language models (LLMs) are algorithms that can detect and analyze natural language and generate unique responses; they represent recent developments in artificial intelligence and neural networks5. OpenAI (San Francisco, CA) developed ChatGPT, one of the most well-known LLMs today6. Since its initial debut in November 2022, it had attracted an average of 25 million daily users by February 2023. ChatGPT's generative powers set it apart from other AI solutions7. ChatGPT is a promising technology with the potential to transform the healthcare industry, including pharmacy, by offering practitioners, students, and researchers the most up-to-date medical information and support in a conversational, interactive manner8.

ChatGPT's ability to provide accurate and fast answers to complex health questions has attracted the interest of many researchers, and numerous studies have been designed to examine its potential on medical topics. Previous research has shown that ChatGPT can be successful in medical exams9,10. Some researchers have noted the advantages of ChatGPT in medical article writing6,11,12. ChatGPT also provides diagnosis and treatment recommendations to patients and healthcare professionals on medical issues13,14,15. As a result, ChatGPT is being used, researched, and tested by more and more people in this field.

Undoubtedly, the accuracy and reliability of ChatGPT's answers to health-related questions are extremely important. Several studies in the academic literature have addressed this topic14,16,17. Nevertheless, the readability and comprehensibility of the responses generated by ChatGPT are equally significant. The aim of this study was to evaluate the readability of ChatGPT-generated responses in the context of lung cancer using different readability scales.

Material and methods

This article does not contain any studies with human or animal subjects, and ethical approval is not applicable for this article.

For this study, the most common questions in the lung cancer section of Medscape® (WebMD LLC, US) were reviewed, and 80 questions on the definition, etiology, risk factors, diagnosis, treatment, and prognosis of lung cancer (both NSCLC and SCLC) were selected. Medscape® is a leading online global destination for physicians and healthcare professionals, offering the latest medical news and expert perspectives; essential point-of-care drug and disease information; and relevant professional education and CME.

A Python script prepared specifically for this study was used to submit the questions to ChatGPT and retrieve the answers. The answers were obtained through the English version of the ChatGPT API, using the "gpt-3.5-turbo" model provided by OpenAI®. Each question was asked 10 times, yielding 10 answers per question. The script was run in a single session on October 1, 2023. The 800 answers obtained for the 80 questions were exported to a file (Supplementary Material 1) and analyzed for readability.
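The authors' script is not reproduced here; a minimal sketch of this kind of collection loop, assuming the pre-1.0 "openai" Python package (the interface available in October 2023) and a hypothetical questions.txt input file with one question per line, might look like the following.

import time
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; a real API key is required

# Hypothetical input file: one Medscape-derived question per line.
with open("questions.txt") as f:
    questions = [line.strip() for line in f if line.strip()]

records = []
for question in questions:
    for _ in range(10):  # each question is sent 10 times, as in the study
        start = time.time()
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": question}],
        )
        records.append({
            "question": question,
            "answer": response["choices"][0]["message"]["content"],
            "response_time_s": time.time() - start,
        })

The resulting records can then be written out (for example, with pandas) for the readability analysis described below.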

Readability formulas

Flesch Reading Ease (FRE) formula

Rudolph Flesch developed the Flesch Reading Ease (FRE) formula in 1948. The FRE score ranges from 1 to 100, where 100 is the highest level of readability. A score of 60 is considered standard for publications targeting a general audience, and a score of 70 or more is considered easy for the average adult to read18.
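For reference, the formula is commonly given as follows (constants as originally published; the textstat implementation used here may differ in minor details):

\text{FRE} = 206.835 - 1.015 \times \frac{\text{total words}}{\text{total sentences}} - 84.6 \times \frac{\text{total syllables}}{\text{total words}}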

Flesch-Kincaid grade level (FKGL)

Kincaid et al. built upon the FRE in 1975, developing a grade-level formula for the US Navy to assign a grade level to written material. It is commonly referred to as the Flesch-Kincaid Grade Level (FKGL). Both FRE and FKGL calculate readability from two variables: average sentence length (based on the number of words) and average word length (based on the number of syllables)19.
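The grade level is commonly computed as:

\text{FKGL} = 0.39 \times \frac{\text{total words}}{\text{total sentences}} + 11.8 \times \frac{\text{total syllables}}{\text{total words}} - 15.59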

Fog scale (Gunning FOG formula)

The Gunning Fog Index is a readability formula that estimates the years of formal education required to understand a piece of text on the first reading20. It is based on the average number of words per sentence and the percentage of complex words in the text. The formula calculates the grade level at which the text is written, with a higher grade level indicating more complex and difficult-to-understand text21.
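The index is commonly computed as follows, where complex words are those with three or more syllables:

\text{FOG} = 0.4 \times \left( \frac{\text{words}}{\text{sentences}} + 100 \times \frac{\text{complex words}}{\text{words}} \right)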

SMOG index

The Simplified Measure of Gobbledygook (SMOG) index is a readability formula used to assess the readability of a piece of text. It estimates the years of education required to understand the text on the first reading22. The SMOG index takes into account the number of polysyllabic words in a sample of text and uses a formula to calculate the grade level at which the text is written21.
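The commonly cited form of the formula, based on the count of words with three or more syllables, is:

\text{SMOG} = 1.0430 \times \sqrt{\text{polysyllable count} \times \frac{30}{\text{sentence count}}} + 3.1291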

Automated readability index (ARI)

The Automated Readability Index (ARI) is a readability formula used to assess the readability of a piece of text. It estimates the years of education required to understand the text on the first reading. The Automated Readability Index (ARI) considers the mean number of characters per word and the mean number of words per sentence within a given text sample. By employing a specific formula, the ARI determines the grade level at which the text is composed23.
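The formula is usually given as:

\text{ARI} = 4.71 \times \frac{\text{characters}}{\text{words}} + 0.5 \times \frac{\text{words}}{\text{sentences}} - 21.43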

Coleman-Liau index

The Coleman-Liau Index is a readability formula used to assess the readability of a piece of text. It estimates the years of education required to understand the text on the first reading. The Coleman-Liau Index is a metric that considers the mean number of characters per word and the mean number of sentences per 100 words within a given text sample. By employing a specific formula, this index determines the grade level at which the text is composed24.
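The index is commonly expressed as follows, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words:

\text{CLI} = 0.0588 \times L - 0.296 \times S - 15.8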

Linsear write formula

The Linsear Write Formula is a readability formula used to assess the readability of a piece of text. The metric provides an estimation of the number of years of formal education necessary to comprehend the content upon initial perusal. The Linsear Write Formula considers the presence of both simple and complex words within a given text sample, employing a specific formula to determine the grade level at which the text is written25.
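One commonly cited formulation, applied to a sample of roughly 100 words, counts words of two syllables or fewer as "easy" and words of three or more syllables as "hard":

r = \frac{1 \times \text{easy words} + 3 \times \text{hard words}}{\text{sentences}}, \qquad
\text{grade} = \begin{cases} r/2 & \text{if } r > 20 \\ r/2 - 1 & \text{if } r \le 20 \end{cases}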

Dale-Chall readability score

The Dale-Chall Readability Score is a widely used formula for assessing the readability of a text. The text's grade level is determined by analyzing the frequency of complex vocabulary employed within it. This method has been utilized in numerous research endeavors to assess the comprehensibility of diverse forms of literature, encompassing materials designed for patient education, survey inquiries, and internet health-related content26.
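The raw score is commonly computed as follows, where difficult words are those not on the Dale-Chall list of familiar words; 3.6365 is added when the percentage of difficult words exceeds 5%:

\text{Dale-Chall} = 0.1579 \times \text{(percentage of difficult words)} + 0.0496 \times \frac{\text{words}}{\text{sentences}}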

Spache readability formula

The Spache Readability Formula is a widely employed tool for evaluating the readability of written material, with a specific focus on children's literature. The text's grade level is determined from the average sentence length and the proportion of words that do not appear on a list of familiar words. The formula was developed by George Spache, after whom it is named27.
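The revised form of the formula is commonly given as follows (the constants differ slightly between the original 1953 version and the revised version):

\text{grade} = 0.121 \times \frac{\text{words}}{\text{sentences}} + 0.082 \times \text{(percentage of unfamiliar words)} + 0.659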

Statistical analysis

We used a custom script written in Python (v3.9.18) to obtain the responses from ChatGPT. Communication with ChatGPT was set up through the English version of the ChatGPT API (premium version) based on the "gpt-3.5-turbo" model provided by OpenAI®. The "textstat" (v0.7.3) Python library was used to calculate the readability formulas. Data analysis was performed in Python (v3.9.18) using the Pandas (v1.4.4) and NumPy (v1.24.3) libraries. The results were presented using descriptive statistical methods (mean, standard deviation, minimum, and maximum).
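The readability calculations themselves reduce to one textstat call per formula. A minimal sketch of this step, assuming the 800 responses have been exported to a hypothetical answers.csv file with an "answer" column (both names illustrative), might look like this:

import pandas as pd
import textstat

df = pd.read_csv("answers.csv")  # hypothetical export from the collection step

# One textstat function per readability formula used in the study.
metrics = {
    "FRE": textstat.flesch_reading_ease,
    "FKGL": textstat.flesch_kincaid_grade,
    "FOG": textstat.gunning_fog,
    "SMOG": textstat.smog_index,
    "ARI": textstat.automated_readability_index,
    "CLI": textstat.coleman_liau_index,
    "Linsear": textstat.linsear_write_formula,
    "DaleChall": textstat.dale_chall_readability_score,
    "Spache": textstat.spache_readability,
}

for name, fn in metrics.items():
    df[name] = df["answer"].apply(fn)

# Descriptive statistics (mean, SD, min, max) for each readability score.
print(df[list(metrics)].agg(["mean", "std", "min", "max"]).round(2))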

Results

The 80 questions (each asked 10 times) on the diagnosis, treatment, prognosis, and risk factors of lung cancer (both SCLC and NSCLC) were submitted to ChatGPT with the Python script specific to this study. Obtaining the 800 responses took approximately 4 h and 7 min. The mean response time per question was 18.52 ± 5.53 s; the fastest response took 4.26 s and the slowest 97.80 s.

The shortest response given by ChatGPT was to the question "How frequently is tobacco smoking the cause of non-small cell lung cancer?"; it contained 4 sentences, 33 words, and 328 characters. The longest response was to the question "How is lung cancer diagnosed?" and contained 23 sentences, 250 words, and 2579 characters. The mean response length was 12.95 ± 3.76 sentences, 144.25 ± 35.73 words, and 1428.76 ± 380.58 characters.

Considering the readability of all the responses given by ChatGPT, the mean Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning FOG Scale, SMOG Index, Automated Readability Index, Coleman-Liau Index, Linsear Write Formula, Dale-Chall Readability Score, and Spache Readability Formula scores all indicate a high reading level (mean ± standard deviation: 40.52 ± 9.81, 12.56 ± 1.66, 13.63 ± 1.54, 14.61 ± 1.45, 15.04 ± 1.97, 14.24 ± 1.90, 11.96 ± 2.55, 10.03 ± 0.63, and 5.93 ± 0.50, respectively). Descriptive statistics on the readability levels of all responses are presented in Table 1. Among the responses given by ChatGPT, the sample responses with the highest and lowest FRE scores are given in Table 2.

Table 1 Readability level of ChatGPT responses.
Table 2 Sample of the lowest and highest scores based on the FRE.

Discussion

Today, many people, whether they are patients or not, obtain information from sources other than face-to-face meetings with physicians and health professionals. With the development of technology, and especially the widespread use of the internet, studies have shown that a significant proportion of patients use the internet for health-related purposes, including seeking information about their conditions, treatment options, and medications28,29,30. In addition, exciting developments in artificial intelligence have given patients, and even health professionals, a new source of health information31,32.

ChatGPT is one of the most exciting technologies available today. Its use and potential in the field of health, given its ability to produce answers by understanding the prompts (questions) it is given, are being investigated more every day. Large language models, which belong to the natural language processing sub-branch of artificial intelligence, can analyze and make sense of questions asked in natural spoken language and produce original answers very quickly. In our study, the API script using the "gpt-3.5-turbo" model answered the questions relatively quickly (mean response time 18.52 ± 5.53 s). Improvements in processor, storage, and internet connection speeds could reduce this time even further.

The most important feature of artificial intelligence and natural language models is that they produce original responses using natural language. Although originality is inherent to these systems, the originality of the responses produced by ChatGPT has nonetheless been investigated in many studies in the literature33,34.

The fast and original responsiveness of ChatGPT is of little use if it cannot produce accurate and reliable answers. In the field of health especially, ChatGPT is expected to be far more reliable. False, incomplete, or misleading information provided by ChatGPT and similar artificial intelligence applications could significantly affect patients' health. For example, if patients are not given accurate information about lung cancer, diagnosis may be delayed and the chance of early diagnosis missed. Moreover, inaccurate information on treatment protocols may affect the decisions of healthcare professionals who rely on artificial intelligence applications such as ChatGPT when devising diagnosis and treatment strategies. For this reason, many studies have investigated how accurately ChatGPT can answer health-related questions35,36,37,38. Many studies have also examined how well ChatGPT performs on exams for medical students, physicians, and health professionals10,39,40,41. Although it has been suggested that ChatGPT can be successful in medical exams, some studies in the literature argue the opposite42.

ChatGPT's ability to produce fast and original answers that are also accurate and reliable is, of course, a great achievement. ChatGPT and many other artificial intelligence tools are used by people of very different ages and education levels. The fact that these tools require no cost beyond an internet connection and provide natural-sounding responses allows them to be used widely. For example, a smoker may want to investigate etiological issues related to lung cancer, or a person whose radiology report shows a nodule or mass may want to find out the stage of his or her cancer before consulting a physician. Medical students and other health sciences students, healthcare professionals, physicians, and those who provide professional healthcare services also benefit from this service offered by artificial intelligence. The result is a user group spanning widely different ages and education levels. Therefore, in a disease with a high mortality rate such as lung cancer, it is extremely important that ChatGPT not only provides correct answers but also provides readable and understandable ones. To address this aspect of ChatGPT, we investigated several readability scores accepted in the literature.

The most commonly used formulas for readability testing are the Flesch Reading Ease (FRE) and the Flesch-Kincaid Grade Level (FKGL). According to the FRE score, the most comprehensible response produced by ChatGPT was at the "standard" level, while the most incomprehensible response was at the "very confusing" level (69.52 and 6.95, respectively). For the FKGL, the lowest score was 7.1 and the highest was 18.7 (approximately seventh-grade and college-graduate level, respectively). A study of urology patients found that the readability level of ChatGPT responses was similarly low according to the FRE and FKGL formulas (median 18 and 15.8; IQR 21 and 3, respectively)4. These results show that the FRE score was highly variable in that study and that the ChatGPT responses were very difficult to read. In a study of radiology reports, the FRE and FKGL values also indicated difficult-to-read text, although they differed somewhat from those in our study (38.0 ± 11.8 vs. 40.52 ± 9.81 and 10.4 ± 1.9 vs. 12.58 ± 1.66, respectively)43. Consistent with the literature, the average FRE and FKGL scores in our study indicate that the responses generated by ChatGPT are very difficult to read and can only be understood by university graduates.

The responses were found to be at the "college freshman" level according to the Gunning Fog Index and at the "college student" level according to the Automated Readability Index (ARI) (13.63 ± 1.54 and 15.04 ± 1.97, respectively). According to the ARI, the answers can only be understood by those aged 18-22 and older (the maximum level). According to the other readability formulas, the Coleman-Liau Index and the Dale-Chall Readability Score, the responses given by ChatGPT were at the "college" level (not easy to read, difficult) (14.24 ± 1.90 and 10.03 ± 0.62, respectively). The mean SMOG Index, which is frequently used in the field of health, was 14.61 ± 1.45, indicating that the texts produced by ChatGPT are quite difficult to read. In another study on urology patients, the readability of the texts produced by ChatGPT was evaluated, and the mean SMOG index was found to be 8.7 ± 2.1 (8th or 9th grade). In the same study, the mean FRE and FKGL scores of the summary texts produced by ChatGPT also indicated difficult-to-read material (56.0 ± 13.7 and 10.0 ± 2.4, respectively)44.

Conclusions

This study has shown that the readability levels of the responses generated by ChatGPT are at the "college" level or above and that the responses are difficult to read. Of course, the fact that the subject we tested belongs to a specialized field such as medicine contributes to this result. However, considering that many people of different age groups and educational levels use ChatGPT to obtain information about lung cancer, it should be borne in mind that, in addition to the reliability of the answers, their demanding readability level means they may be misunderstood or not understood at all. Perhaps in the near future, ChatGPT could be programmed to produce responses appropriate for people of different educational levels and age groups. It is also clear that more extensive and advanced research on a wider range of medical topics is needed.