Introduction

The integration of artificial intelligence (AI) in healthcare has significantly improved the speed and accuracy of disease diagnosis, revolutionizing the industry1. Computer-Aided Diagnosis (CAD) systems, particularly in fields like radiology, cardiology, and oncology, have shown great promise in detecting conditions such as cancer, heart disease, and stroke2,3. However, despite advancements in diagnostic accuracy, patient engagement and experience remain secondary considerations4. Current CAD systems predominantly focus on clinical data analysis without offering compassionate, patient-friendly communication during the diagnosis process. This lack of personalized interaction can lead to confusion, stress, and dissatisfaction among patients5.

Recent developments in natural language processing (NLP), specifically conversational AI models like GPT, have emerged as potential solutions to bridge the communication gap between CAD systems and patients6. We propose CareAssist-GPT, an innovative AI model that enhances patient experience by integrating diagnostic aid with personalized, real-time communication7. Unlike conventional CAD systems that rely solely on clinical outputs, CareAssist-GPT simplifies medical jargon, providing patient-friendly explanations that foster trust and understanding8.

Figure 1 illustrates a typical CAD system used for pneumonia diagnosis, demonstrating its sequential approach: lung segmentation from CT scans, followed by nodule detection and segmentation, and advanced analyses such as shape and growth rate estimation9. While this systematic process aids in identifying benign or malignant formations, it fails to address the patient’s need for clear, understandable feedback10. Diagnostic errors account for roughly 10% of global deaths, highlighting the urgent need for improved diagnostic tools that also consider the patient’s perspective11.

Fig. 1

A typical CAD system for pneumonia diagnosis. The system involves multiple stages, starting with lung segmentation from CT scan images taken at two time intervals. This is followed by nodule detection and segmentation to locate potential growths in the lungs. Advanced analyses, such as shape analysis, growth rate estimation, and appearance analysis, are subsequently applied to the segmented nodules to assess abnormalities, ultimately leading to a diagnostic output.

The integration of artificial intelligence (AI) into healthcare has revolutionized diagnostic processes, offering unprecedented levels of accuracy and efficiency12. However, a significant gap persists between the technical capabilities of AI systems and the need for patient-centered care, which diminishes the full potential of AI in healthcare12. While AI-driven computer-aided diagnosis (CAD) systems excel in processing complex medical data, they often fail to address the emotional and communicative needs of patients, leading to disengagement, anxiety, and a lack of trust in the diagnostic process13. This gap highlights the critical need for AI models that not only improve diagnostic outcomes but also foster meaningful patient interactions.

Conversational AI models, such as GPT, have emerged as promising tools for addressing these challenges, particularly in scenarios requiring continuous patient engagement, such as chronic pain management, diabetes care, and mental health disorders14,15. Research by McKinsey suggests that conversational AI can reduce diagnostic errors by up to 30% while significantly enhancing patient satisfaction scores16. Despite these advancements, the integration of GPT-based systems into CAD tools to simultaneously enhance diagnostic accuracy and patient communication remains underexplored17. Existing CAD systems primarily focus on technical performance, often neglecting the human element of healthcare delivery, which is crucial for building patient trust and ensuring long-term engagement18.

This paper introduces CareAssist-GPT, a novel AI-assisted diagnostic model designed to bridge the gap between technical efficiency and patient-centered care. Unlike traditional CAD systems, CareAssist-GPT combines advanced diagnostic capabilities with natural language communication, enabling real-time, patient-friendly explanations and feedback. By integrating multimodal data sources—such as high-resolution X-ray images, real-time vital signs, and clinical notes—into a unified predictive framework, CareAssist-GPT achieves superior diagnostic accuracy while maintaining high specificity, precision, and recall. Furthermore, its conversational interface simplifies complex medical information, empowering patients to better understand their conditions and fostering a sense of involvement in their healthcare journey.

The development of CareAssist-GPT addresses several critical limitations of existing CAD systems. First, it overcomes the lack of patient engagement by incorporating interpersonal communication into the diagnostic process, ensuring that patients feel heard and understood19. Second, it provides a comprehensive roadmap for implementing conversational AI in clinical settings, balancing technological advancements with emotional and psychological patient needs20. Finally, CareAssist-GPT demonstrates how AI can be aligned with the principles of patient-centered care, offering a model that not only enhances diagnostic reliability but also builds patient trust and satisfaction21.

In modern healthcare, the emphasis on technical efficiency often overshadows the importance of patient-provider communication. Studies have shown that patients who feel informed and involved in their care are more likely to adhere to treatment plans and report higher satisfaction levels. However, traditional CAD systems, while effective in processing data, lack the ability to translate complex medical jargon into accessible language, leaving patients feeling alienated. CareAssist-GPT addresses this issue by leveraging the natural language processing (NLP) capabilities of GPT to provide clear, empathetic, and personalized explanations. This approach not only improves patient understanding but also strengthens the therapeutic alliance between patients and healthcare providers.

Moreover, the rapid adoption of AI in healthcare has raised concerns about the dehumanization of medicine, with critics arguing that over-reliance on technology may erode the patient-provider relationship. CareAssist-GPT counters this narrative by positioning AI as a complementary tool that enhances, rather than replaces, human interaction. By automating routine diagnostic tasks and providing real-time insights, the model allows healthcare providers to focus on building rapport and addressing the emotional needs of their patients. This dual focus on technical and interpersonal excellence positions CareAssist-GPT as a transformative solution for modern healthcare challenges.

In this paper, we assess the limitations of current CAD systems and explore the transformative potential of conversational AI in healthcare. We present CareAssist-GPT as a solution to the challenges of integrating AI-driven diagnostics with effective patient communication, offering a system that meets both technical and emotional needs. By doing so, we aim to pave the way for a new generation of AI tools that prioritize both accuracy and empathy, ultimately enhancing the overall healthcare experience for patients and providers alike. The purpose of this study is to develop and assess CareAssist-GPT: a patient-oriented AI system that improves the diagnostic outcomes of CAD by engaging patients in natural language conversations that simplify medical information and encourage their active participation.

  • To develop a novel, patient-centred, AI-based diagnostic model (CareAssist-GPT) that combines NLP with current CAD systems to enhance real-time interaction and communication with patients.

  • To assess the model’s efficacy in improving patient comprehension and satisfaction, particularly its ability to translate medical diagnoses into simple, patient-friendly language.

  • To compare the diagnostic performance of CareAssist-GPT against conventional CAD tools in different healthcare environments, and to identify areas of enhanced diagnostic effectiveness and patient interaction.

  • To discuss the ethical considerations and data privacy issues involved in using CareAssist-GPT within CAD systems, and to provide recommendations for the safe and responsible application of conversational AI in clinical practice.

This research proposes CareAssist-GPT, an AI system that builds upon CAD to improve patient interaction, and assesses its ability to increase both diagnostic accuracy and patient satisfaction. We also consider the ethical issues involved and present a roadmap for advancing and implementing conversational AI ethically in clinical practice. The main contributions of this work are as follows:

  1. Introducing the concept of CareAssist-GPT, a conversational AI model that combines NLP with CAD systems to deliver diagnostic results with real-time explanations.

  2. Implementation of a patient-centric model for CAD, in which patient interaction is improved by replacing complicated medical jargon with plain language and providing an easy-to-use interface for diagnosis.

  3. Comparison of the diagnostic performance of CareAssist-GPT with that of conventional CAD systems, showing that it can achieve similar or better performance than existing tools while increasing patient satisfaction.

  4. Exploration of patient satisfaction indexes, examining how real-time conversational feedback from AI systems enhances patients’ comprehension of their treatment plans and influences their overall healthcare journey.

  5. Analysis of the ethical and privacy implications of operationalizing conversational AI in healthcare, with precautions regarding its implementation.

This paper is divided into five sections. It starts with a brief presentation of CareAssist-GPT and a survey of the literature on existing CAD systems, followed by a description of the approach used to design and evaluate the model. The results section presents the comparison between CareAssist-GPT and traditional systems, followed by the conclusions, limitations, and recommendations, and directions for future studies22.

Problem statement and contributions

Despite significant advancements in AI-driven diagnostic systems, several critical issues remain unresolved in current clinical applications. First, most Computer-Aided Diagnosis (CAD) systems are designed with a purely technical focus, overlooking the importance of patient engagement and understandable communication. Second, the interpretability of AI-generated diagnoses remains limited, leading to reduced trust and usability among both clinicians and patients. Third, while multimodal patient data—such as imaging, physiological signals, and clinical narratives—are often available, existing models fail to integrate these sources effectively in real time. These challenges hinder the deployment of AI in patient-centered care and create a clear need for solutions that combine diagnostic precision with transparency and user-centered communication.

To address these challenges, this paper introduces CareAssist-GPT, a patient-centered diagnostic framework that integrates multimodal data fusion with a conversational AI interface. The key contributions of this research are as follows:

  • Development of CareAssist-GPT, a multimodal diagnostic model that integrates high-resolution chest X-ray images, real-time vital signs, and clinical text records to form a unified predictive framework.

  • Design of a patient-centric conversational interface that utilizes transformer-based NLP to translate complex diagnostic results into understandable, empathetic feedback in real time.

  • Demonstration of improved diagnostic performance, with CareAssist-GPT achieving a diagnostic accuracy of 95.8%, precision of 94.3%, recall of 93.8%, and a patient satisfaction score of 9.3 out of 10.

  • Comprehensive evaluation across multiple metrics, including specificity, response time, AUC-ROC (0.97), interpretability, and usability, to assess both clinical reliability and patient experience.

  • Implementation of ethical safeguards, such as GDPR/HIPAA compliance, anonymized data processing, and model fairness testing, ensuring responsible deployment of AI in healthcare.
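As a purely illustrative aside, the headline metrics listed above (accuracy, precision, recall, specificity) all derive from the confusion matrix. A minimal sketch in plain Python, using synthetic labels rather than the study’s data:

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute confusion-matrix-based metrics for binary diagnoses."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy":    (tp + tn) / len(y_true),
        "precision":   tp / (tp + fp) if tp + fp else 0.0,
        "recall":      tp / (tp + fn) if tp + fn else 0.0,  # sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Synthetic ground truth and predictions, for demonstration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
m = diagnostic_metrics(y_true, y_pred)
```

In practice a library such as scikit-learn would be used for these computations (and for AUC-ROC, which additionally requires the model’s probability scores rather than hard labels).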

The remainder of this paper is structured as follows: Section “Literature review” presents a detailed review of the related work, highlighting key advancements and identifying existing research gaps in multimodal AI diagnostics and conversational healthcare systems. Section “Methodology” outlines the methodology employed for designing, developing, and optimizing the CareAssist-GPT framework, including data preprocessing, model architecture, and training procedures. Section “Results and discussions” discusses the experimental results, performance metrics, comparative analysis with existing models, and patient satisfaction outcomes, and examines the study’s limitations, ethical considerations, and privacy safeguards. Finally, Section “Conclusions” concludes the paper by summarizing key findings, discussing practical implications, and proposing directions for future research.

Literature review

Artificial intelligence (AI) is one of the fastest-growing technologies impacting nearly every facet of healthcare, from diagnosis to patient care and clinical workflows. Among various AI advancements, language models like ChatGPT have emerged as effective tools for engaging patients, providing diagnostic support, and facilitating clinical decision-making. Given the increasing focus on patient-centered care and data-driven diagnostics, there is a pressing need for intelligent, conversational AI systems in healthcare. This review synthesizes the existing literature on the application of ChatGPT in healthcare, comparing its strengths and limitations while identifying gaps and opportunities for future research.

Enhancing patient communication and engagement

Patient communication is a critical aspect where ChatGPT has shown significant potential. Amin et al. highlighted its ability to simplify complex medical terms in radiology reports, enhancing patient comprehension and engagement2. Similarly, Badsha et al. explored the use of ChatGPT in rheumatology, demonstrating its capability to offer differential diagnoses in alignment with expert opinions, thus aiding clinicians in patient management12. Javaid et al. further emphasized the model’s utility in personalized patient interactions, noting its effectiveness in making timely recommendations15. However, while these studies support the use of ChatGPT for enhancing patient understanding, there are differences in their focus. Amin et al. stress patient comprehension in technical fields like radiology, while Javaid et al. highlight broader, patient-centered care applications. This suggests a need for a unified framework that can adapt ChatGPT’s communication capabilities across various medical specialties.

Clinical utility and workflow optimization

The literature underscores the utility of ChatGPT in optimizing clinical workflows and supporting decision-making processes. Krishnan et al. argued that ChatGPT contributes to a more sustainable healthcare model by reducing resource consumption and improving efficiency16. Garg’s analysis demonstrated how ChatGPT synthesizes patient data and manages communication, transforming it into a patient-facing tool that enhances overall user experience17. In emergency and surgical settings, Rao et al. found that ChatGPT could improve clinical outcomes by providing quick and reliable decision support19. These findings collectively indicate that ChatGPT can streamline workflows across various clinical environments. However, the degree of its effectiveness may vary depending on the complexity of the clinical setting and the quality of data inputs, which raises questions about the scalability and reliability of these AI models in diverse healthcare scenarios.

Advances in computer-aided diagnosis (CAD) and deep learning integration

The shift from traditional machine learning (ML) to deep learning (DL) in CAD systems has been transformative, allowing for greater accuracy and the ability to handle large datasets. Guetari et al. compared conventional ML approaches with modern DL methods, highlighting ChatGPT’s capacity for handling complex, multimodal data8. Hu et al. demonstrated the application of ChatGPT in medical imaging, where it effectively provided contextual descriptions, aiding in image interpretation and diagnosis9. This suggests that integrating ChatGPT with deep learning frameworks can enhance diagnostic performance, particularly in real-time scenarios. However, the robustness of these systems is still limited by factors such as noisy or incomplete data, which can lead to diagnostic errors. This technical limitation underscores the need for improved data preprocessing and model explainability in healthcare applications.

Cross-specialty applicability of ChatGPT in healthcare

ChatGPT’s versatility across medical specialties is well-documented. Lima et al. explored its use in orthodontics, showing how it aids in diagnosis and treatment planning by providing clear explanations to patients12. Smith and Green examined its application in dermatology, demonstrating that ChatGPT reduces assessment time while increasing diagnostic accuracy23. Yeasmin et al. highlighted the model’s effectiveness in deep learning-based image diagnosis, particularly in explaining complex findings due to its conversational abilities13. Despite these successes, there is a noticeable gap in studies exploring the integration of ChatGPT across multiple clinical departments simultaneously. Addressing this gap would require overcoming challenges related to data interoperability, clinical workflows, and the adaptation of AI systems to specialty-specific needs.

Ethical considerations and patient safety

The integration of AI in healthcare introduces significant ethical challenges. Sallam raised concerns about data privacy and the potential for biases in AI algorithms, which can negatively impact patient outcomes22. Shahsavar and Choudhury pointed out the risks of over-reliance on AI for self-diagnosis, cautioning that unsupervised use could lead to severe misdiagnoses18. These ethical issues highlight the need for balanced AI-human collaboration to ensure safe and effective healthcare delivery. Real-world scenarios have already shown biases in AI models affecting patient care, such as disparities in diagnostic accuracy across different demographic groups. The regulatory landscape for AI in healthcare is still evolving, with limited frameworks addressing these concerns. Thus, future research should focus on developing robust guidelines and ethical protocols to mitigate these risks.

Long-term implications of AI-driven communication tools

While immediate benefits of ChatGPT, such as improved patient engagement and diagnostic support, are evident, its long-term impact on doctor-patient relationships remains uncertain. The literature has not sufficiently explored how the widespread use of AI-driven communication tools might alter patient trust over time or affect clinicians’ roles in patient care. As patients become more reliant on AI for medical advice, there is a risk of diminishing the human element of healthcare, potentially undermining the traditional doctor-patient relationship. This shift could lead to a decrease in interpersonal communication skills among clinicians, emphasizing the need for training programs that integrate AI tools without compromising human interaction.

Literature summary

Recent advancements in AI-assisted medical diagnostics have leveraged multimodal data integration to enhance clinical decision-making, patient engagement, and predictive accuracy. Zhao et al.24 explored deep multimodal fusion techniques for medical diagnosis, highlighting challenges and future directions, while Wang et al.25 proposed transformer-based multimodal learning to automate radiology report generation, demonstrating improved interpretability and diagnostic consistency. The role of AI-driven personalized medicine was emphasized by Jiang et al.26, who integrated multimodal data for improved diagnostics and treatment planning, reinforcing the significance of real-time patient engagement models as proposed by Chen et al.27. Several studies have also advanced deep learning in medical imaging, with Xu et al.28 focusing on multimodal data fusion for enhanced medical imaging analysis and Su et al.29 utilizing machine learning and bioinformatics analysis for colon cancer staging and diagnosis. Moreover, Zeng et al.30 explored the application of convolutional neural networks (CNNs) with Raman spectroscopy for rapid breast cancer classification, illustrating the potential of AI-driven spectroscopic analysis. In the field of electronic health record (EHR) processing, Li et al.31 introduced LI-EMRSQL, a text-to-SQL model enhancing structured query parsing on complex EMR datasets, facilitating automated clinical decision support. Privacy concerns in AI-driven healthcare were addressed by Zhang et al.32, who introduced age-dependent differential privacy mechanisms to balance data security and utility in medical AI models. Additionally, Jung et al.33 conducted a meta-analysis on AI-based fracture detection, comparing image modalities and data types to assess performance variations across different AI frameworks. 
Beyond diagnostics, Pan and Xu34 explored human–machine plan conflicts in visual search tasks, contributing to human-AI collaboration frameworks in medical imaging and diagnostics. These studies collectively underscore the evolving landscape of AI-powered diagnostic systems, emphasizing multimodal learning, patient-centric engagement, medical imaging innovations, EHR integration, and privacy-aware AI frameworks, ultimately driving next-generation AI applications in healthcare. Recent studies such as XEMLPD35 and PD_EBM36 emphasize the increasing importance of explainable ensemble learning techniques in the diagnosis of neurodegenerative diseases, particularly Parkinson’s disease. The XEMLPD model integrates multiple machine learning classifiers and applies a voting mechanism, enhanced with optimized feature selection, to improve diagnostic accuracy while maintaining interpretability. Similarly, PD_EBM introduces an integrated boosting-based approach that leverages selective features and provides both global and local explanations for its predictions, ensuring that clinicians and patients can understand how decisions are made at both the system-wide and individual levels. These models demonstrate that incorporating explainability alongside high-performance algorithms not only boosts trust in AI-driven diagnoses but also aids clinicians in validating model recommendations through interpretable outputs. Their frameworks align closely with current demands in medical AI development—especially the need for transparency, accountability, and decision traceability. This trend reinforces the critical role of explainability and selective feature learning in clinical decision support systems, paving the way for safer, more ethical, and widely acceptable deployment of AI in healthcare.

Research gaps and future directions

While existing studies demonstrate the value of ChatGPT in isolated applications such as radiology, dermatology, and rheumatology, there is a clear lack of integrated systems that combine conversational AI with multimodal clinical diagnostics in real-time settings. Most prior approaches focus on either diagnostic performance or patient communication—not both. Furthermore, current literature seldom explores cross-specialty applicability, explainability, and ethical handling of multimodal data. No existing model holistically addresses how to deliver interpretable, patient-friendly, and accurate diagnoses using AI across diverse healthcare contexts. CareAssist-GPT fills this critical gap by integrating imaging, vital signs, and clinical notes into a unified diagnostic pipeline with real-time GPT-powered communication.

Although ChatGPT shows promise in various healthcare applications, its integration into real-time, interdisciplinary clinical settings remains limited. Challenges include technical barriers, such as handling incomplete data and ensuring model explainability, as well as organizational issues like aligning AI systems with existing health IT infrastructure. There is a need for comprehensive studies to establish unified protocols for implementing ChatGPT across multiple specialties. Additionally, exploring the role of AI in under-researched areas such as chronic disease management and mental health support could broaden its application scope.

Despite the promising applications of AI in healthcare, several critical gaps remain unaddressed, limiting the full potential of AI-driven diagnostic models. One of the most significant challenges is the integration of multimodal AI models for real-time patient interaction, which is still in its early stages. Current AI-based healthcare solutions primarily focus on improving diagnostic accuracy, but they often fail to incorporate context-aware, patient-specific explanations that enhance comprehension and engagement. AI-generated diagnoses must not only be precise but also transparent and interpretable, ensuring that both clinicians and patients can understand the rationale behind medical recommendations. However, limited research has been conducted on optimizing AI models for dynamic, real-time patient communication, making it difficult for AI to replicate the personalized approach of human healthcare providers. Furthermore, many AI-driven diagnostic systems are designed for structured clinical environments, with limited adaptability to diverse healthcare settings, where patient needs and communication preferences vary significantly. This gap underscores the need for adaptive AI systems that personalize interactions based on patient demographics, cognitive abilities, and medical literacy levels.

Another critical challenge is the lack of explainability and trust in AI-driven diagnoses. While deep learning models demonstrate high accuracy, their black-box nature raises concerns about interpretability, ethical decision-making, and regulatory compliance. Patients and healthcare professionals are often hesitant to rely on AI-generated recommendations without clear justifications for predictions. Advancements in explainable AI (XAI) techniques, such as attention-based visualizations, feature attribution, and interpretable neural architectures, can enhance transparency and improve clinician confidence in AI-supported decision-making. Additionally, future AI models should incorporate real-time physician feedback and patient responses to continuously refine their decision-making process. Leveraging self-supervised learning (SSL) and reinforcement learning (RL), AI can adapt its responses based on evolving clinical guidelines, patient history, and real-world healthcare outcomes. These improvements will be crucial in making AI-driven healthcare systems not only accurate and efficient but also patient-centered, trustworthy, and adaptable to diverse medical environments.

Comparison with other AI technologies in healthcare

While ChatGPT has distinct advantages in natural language processing and patient interaction, it is essential to compare it with other AI models like computer vision systems for diagnostics or predictive analytics models. Traditional machine learning algorithms often excel in diagnostic accuracy, particularly in structured tasks such as medical image analysis. However, they lack the conversational and interpretative capabilities of ChatGPT, which makes it uniquely suited for patient-facing applications. Understanding these differences can help determine the most appropriate AI tools for specific healthcare needs, optimizing both patient outcomes and operational efficiency. Table 1 shows the Comparison of ChatGPT and AI Applications in Healthcare. Table 1 provides a comparative overview of various studies examining the use of ChatGPT and other AI models across different healthcare applications. The table highlights key studies, their focus areas, the AI models used, the specific applications targeted, and the main findings.

Table 1 Comparison of ChatGPT and AI applications in healthcare.

The review of the literature shows that progress has been made in using ChatGPT and AI in the health sector, mainly in diagnosis, patient interaction, and information sharing. However, a research gap has been identified in the absence of ChatGPT integration in real-time, interdisciplinary settings spanning multiple clinical specialties. Although some works are dedicated to the use of AI in particular domains such as radiology, orthodontics, and telemedicine, there are insufficient investigations into integrating AI language models into the multifaceted healthcare processes of multiple departments. Moreover, other issues, including ethical concerns such as data privacy11, bias in AI, and patients relying entirely on AI for diagnoses, have been recognized to some extent, but their long-term outcomes and regulatory measures have received little attention. Future research should aim to establish unified protocols for AI integration into healthcare systems and to elaborate efficient strategies for ethical AI policies3.

Data processing

To ensure optimal model performance and enhance diagnostic accuracy, multiple preprocessing techniques were applied to the multimodal dataset before feeding it into the AI framework. X-ray images were processed using Contrast-Limited Adaptive Histogram Equalization (CLAHE) to improve the visibility of lung abnormalities, ensuring better contrast enhancement while preserving fine-grained medical details. To handle variability and noise commonly present in real-world vital sign time-series data, a low-pass filtering technique was applied to eliminate high-frequency noise, preserving only clinically relevant physiological trends. Additionally, missing data points were imputed using the Kalman smoothing algorithm, which leverages temporal dependencies to accurately estimate and restore incomplete sequences. These preprocessing steps enhance the robustness and reliability of the model’s input, ensuring that downstream feature extraction captures meaningful and consistent patterns. For clinical text records, an advanced Named Entity Recognition (NER) technique was implemented to extract key medical concepts, such as disease names, symptoms, and prescribed treatments, ensuring that relevant information was structured and easily interpretable. These text records were further tokenized using BioBERT embeddings, a domain-specific adaptation of BERT designed for biomedical applications, to enhance semantic representation. Once preprocessed, the different data modalities were synchronized and integrated into a feature fusion network, where X-ray image features were extracted using Convolutional Neural Networks (CNNs), time-series features were captured using Recurrent Neural Networks (RNNs), and clinical text embeddings were generated using transformer-based architectures. 
These extracted features were subsequently concatenated into a shared latent space, allowing the AI system to leverage complementary information from multiple sources for more accurate and context-aware diagnostic predictions. This integrated approach enhances the system’s ability to detect complex patterns across modalities, making CareAssist-GPT a robust AI-powered decision-support tool for clinical diagnostics.
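To make the vital-sign cleaning stage concrete, the following is a simplified sketch in plain Python. It is illustrative only: a moving average stands in for the low-pass filter, and linear interpolation stands in for the Kalman smoother described above; the heart-rate values are hypothetical.

```python
def low_pass(signal, window=3):
    """Moving-average low-pass filter; edges use a shrinking window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def impute_missing(signal):
    """Fill None gaps by linear interpolation between known neighbours."""
    out = list(signal)
    known = [i for i, v in enumerate(out) if v is not None]
    for i, v in enumerate(out):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                out[i] = out[right]       # leading gap: carry backward
            elif right is None:
                out[i] = out[left]        # trailing gap: carry forward
            else:
                frac = (i - left) / (right - left)
                out[i] = out[left] + frac * (out[right] - out[left])
    return out

# Hypothetical heart-rate series with missing samples (beats per minute).
heart_rate = [72.0, None, 75.0, 74.0, None, None, 78.0]
clean = low_pass(impute_missing(heart_rate))
```

A production pipeline would instead apply a proper Kalman smoother and a frequency-domain filter, but the principle is the same: restore gaps from temporal context, then suppress high-frequency noise before feature extraction.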

CareAssist-GPT’s architecture enables seamless adaptation across a wide range of medical disciplines due to its powerful integration of multimodal data fusion and advanced natural language processing capabilities. While its performance in pneumonia diagnosis has been validated, the model is equally suited for applications in fields such as oncology, where it can interpret imaging data alongside pathology reports; endocrinology, where it can analyze hormone panels, glucose trends, and patient history; and mental health, where contextual analysis of clinical notes and patient communication is crucial. The model’s conversational interface allows for continuous patient engagement, making it effective in chronic disease management scenarios like diabetes and hypertension, where patients require regular updates and education. In telehealth, CareAssist-GPT can assist remote consultations by interpreting symptoms, summarizing prior medical history, and generating simplified explanations for patients. Furthermore, in rehabilitation settings, it can help track recovery progress by integrating physiotherapy logs, sensor data, and patient-reported outcomes. This versatility positions CareAssist-GPT as a truly cross-disciplinary AI solution that bridges high technical performance with human-centric, context-aware interaction across multiple clinical domains.

Methodology

This section outlines the methodological approach used to design, train, and evaluate CareAssist-GPT, a patient-centered AI diagnostic system. The proposed model is built upon a multimodal data framework that integrates high-resolution chest X-ray images, real-time physiological vital signs, and unstructured clinical text records to form a unified, context-aware diagnostic pipeline. The model architecture combines convolutional neural networks (CNNs) for spatial feature extraction from imaging data, gated recurrent units (GRUs) for temporal analysis of time-series vital signs, and transformer-based natural language processing (NLP) modules for semantic parsing of clinical notes.

Each modality undergoes a dedicated preprocessing pipeline to enhance input quality and consistency: X-ray images are normalized and augmented, vital sign data is smoothed and standardized using Kalman filtering and z-score normalization, and clinical text is cleaned, tokenized, and embedded using BioBERT. The extracted features from each modality are then fused into a shared latent space using a fully connected integration layer, enabling the model to capture cross-modal interactions relevant to clinical diagnosis.

The final prediction layer consists of dense units with sigmoid activation, optimized using cross-entropy loss with L2 regularization to reduce overfitting. The training process employs mini-batch gradient descent with the Adam optimizer, early stopping, dropout layers, and hyperparameter tuning to achieve generalization across diverse patient profiles. This integrated design allows CareAssist-GPT to deliver interpretable, accurate, and real-time diagnostic predictions while supporting personalized patient communication through its conversational interface. Figure 2 shows the Workflow of the CareAssist-GPT Model: Training and Testing Phases.

Fig. 2
figure 2

Workflow of the CareAssist-GPT Model: Training and Testing Phases.

Figure 2 illustrates the comprehensive workflow of the CareAssist-GPT model, detailing both the training and testing phases. The process begins with data collection, integrating three primary types: X-ray images, vital signs, and clinical text records. During the training phase, data preprocessing is performed, including image normalization, text tokenization, and time-series standardization. The multimodal data inputs are then fed into specialized neural network components: convolutional neural networks (CNNs) for image analysis, recurrent neural networks (RNNs) for time-series data, and transformer-based models for text analysis. Features extracted from each component are fused in a multimodal integration layer, followed by a dense layer for final classification. The model undergoes hyperparameter tuning and cross-validation to optimize performance. In the testing phase, new patient data is processed similarly, and the trained model provides diagnostic predictions alongside patient-friendly explanations. This streamlined workflow highlights the integration of multimodal data and real-time interpretability, aiming to enhance diagnostic accuracy and patient engagement.

The remainder of this section details the dataset composition, preprocessing, fusion strategy, training configuration, and the validation and ethical safeguards applied during the development of CareAssist-GPT.

The dataset used for this study was meticulously compiled from diverse medical sources, comprising three primary subsets: high-resolution chest X-ray images, vital signs time-series data, and clinical text records (https://www.kaggle.com/datasets/nih-chest-xrays/data). The image data included labeled samples of both normal and pneumonia-affected lungs, providing a rich basis for training the convolutional layers of the model. The vital signs dataset encompassed real-time physiological parameters such as heart rate, respiratory rate, blood pressure, and oxygen saturation, crucial for temporal analysis using recurrent neural networks (RNNs). Clinical text records were gathered from electronic health records (EHRs), consisting of free-text notes that capture patient symptoms, medical histories, and treatment plans. This multimodal dataset ensured a comprehensive input representation, allowing the model to interpret diverse health indicators holistically.

Data preprocessing was a critical step to enhance the quality and uniformity of the inputs. For image data, all chest X-rays were resized to a standard dimension of 224 × 224 pixels and normalized to scale pixel intensity values between 0 and 1, improving convergence during training. Image augmentation techniques were extensively applied to increase the diversity of training samples, including random rotations, horizontal and vertical flips, brightness adjustments, and zooming. These augmentations mitigated overfitting and improved the model’s robustness to variations in medical imaging. Time-series data were preprocessed through z-score normalization, standardizing the physiological parameters to a common scale, thus preventing any single feature from dominating the model’s learning process. Text data underwent a detailed preprocessing pipeline, including tokenization, stop-word removal, lemmatization, and synonym replacement. This approach enriched the semantic diversity of clinical notes, allowing the natural language processing (NLP) component of the model to better capture medical context.
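The z-score standardization applied to the vital-sign features can be sketched as follows; the numbers are illustrative, not taken from the dataset.

```python
import numpy as np

# Rows = patients, columns = heart rate, systolic BP, respiratory rate (illustrative)
vitals = np.array([[72.0, 120.0, 16.0],
                   [88.0, 135.0, 22.0],
                   [64.0, 110.0, 14.0]])

# Standardize each feature column: subtract its mean, divide by its std
mean = vitals.mean(axis=0)
std = vitals.std(axis=0)
z = (vitals - mean) / std

# Each column now has zero mean and unit variance
print(np.allclose(z.mean(axis=0), 0), np.allclose(z.std(axis=0), 1))  # → True True
```

Standardizing per feature prevents large-magnitude signals such as blood pressure from dominating the learning process, as noted above.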

Fusion strategy Features from X-ray images, vital signs, and clinical notes are extracted separately and then combined using a late fusion strategy, where all modality-specific outputs are concatenated and passed through fully connected layers. This enables the model to learn joint representations and improve diagnostic accuracy by leveraging complementary information across data types.
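The late-fusion step can be sketched as a plain concatenation of modality-specific feature vectors; the feature dimensions below are assumptions and the vectors are random placeholders for real CNN/GRU/transformer outputs.

```python
import numpy as np

rng = np.random.default_rng(42)
img_feat = rng.standard_normal(128)   # CNN output for an X-ray (assumed size)
ts_feat = rng.standard_normal(64)     # GRU output for vital signs (assumed size)
txt_feat = rng.standard_normal(256)   # transformer embedding of notes (assumed size)

# Late fusion: concatenate modality outputs into one shared vector,
# which would then feed the fully connected integration layers
fused = np.concatenate([img_feat, ts_feat, txt_feat])
print(fused.shape)  # → (448,)
```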

The dataset was split into training, validation, and testing sets in a 70:15:15 ratio, respectively. We employed fivefold cross-validation to assess the generalizability of the model and reduce the likelihood of bias stemming from any particular data split. This validation strategy ensured that the model’s performance was evaluated across multiple subsets, enhancing its robustness. Handling class imbalance was crucial, as the dataset had a lower representation of pneumonia cases compared to normal cases. To address this, we implemented Synthetic Minority Over-sampling Technique (SMOTE), which generated synthetic samples for the minority class. Additionally, the loss function was adjusted using class weights, assigning higher penalties to misclassifications of the minority class, thereby improving the model’s sensitivity and reducing bias.
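The class-weight adjustment described above can be sketched with inverse-frequency ("balanced") weights; the class counts are illustrative, and SMOTE oversampling itself is omitted from this sketch.

```python
import numpy as np

# Imbalanced labels: 0 = normal, 1 = pneumonia (counts are illustrative)
labels = np.array([0] * 800 + [1] * 200)

classes, counts = np.unique(labels, return_counts=True)
# "Balanced" heuristic: n_samples / (n_classes * count_per_class),
# so the rarer class receives a proportionally larger loss penalty
weights = len(labels) / (len(classes) * counts)

print(dict(zip(classes.tolist(), weights.tolist())))  # → {0: 0.625, 1: 2.5}
```

These per-class weights would scale the cross-entropy loss terms so misclassified pneumonia cases are penalized more heavily.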

The core architecture of CareAssist-GPT integrates convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer-based models, creating a multimodal fusion layer that aggregates features from image, time-series, and text data. The CNN component processes the X-ray images through several convolutional and pooling layers, extracting deep spatial features indicative of lung pathology. The RNN component, specifically utilizing gated recurrent units (GRUs), captures temporal dependencies in the vital signs data, identifying trends and anomalies critical for real-time health assessments. The NLP component leverages transformer architecture for parsing clinical notes, extracting key medical insights, and contextualizing patient information. The multimodal fusion layer concatenates these extracted features, which are then fed into a fully connected dense layer for final diagnosis prediction. This fusion strategy enhances the model’s capability to draw a comprehensive understanding of the patient’s health status by integrating heterogeneous data sources effectively.

Hyperparameter tuning was conducted using a combination of grid search and random search techniques to find the optimal settings for model training. Key parameters tuned included the learning rate, batch size, dropout rate, and optimizer choice. The learning rate was tested across a logarithmic scale from \({10}^{-5}\) to \({10}^{-3}\), while batch sizes of 16, 32, and 64 were evaluated to balance training speed and stability. The dropout rate, set between 0.2 and 0.5, was optimized to prevent overfitting, particularly in dense and convolutional layers. We experimented with various optimizers, including Adam, SGD, and RMSprop, ultimately selecting Adam due to its superior convergence properties and adaptability to complex gradients in multimodal data processing.
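The selection loop for grid search can be sketched as below. In practice each configuration would retrain the model; here a hypothetical `mock_val_accuracy` stands in for "train and return validation accuracy".

```python
import itertools

learning_rates = [1e-5, 1e-4, 1e-3]
batch_sizes = [16, 32, 64]

def mock_val_accuracy(lr, bs):
    # Placeholder objective: in a real search this would train the model
    # with (lr, bs) and report validation accuracy
    return 0.9 - abs(lr - 1e-4) * 100 - abs(bs - 32) / 1000

# Exhaustive grid search over the Cartesian product of settings
best = max(itertools.product(learning_rates, batch_sizes),
           key=lambda cfg: mock_val_accuracy(*cfg))
print(best)  # → (0.0001, 32)
```

Random search follows the same pattern but samples configurations instead of enumerating the full grid, which scales better as the number of hyperparameters grows.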

Overfitting prevention and generalization were key concerns addressed through multiple strategies. Dropout layers were incorporated after each convolutional and fully connected layer to randomly deactivate neurons during training, reducing model complexity and improving generalization. Early stopping was employed based on validation loss monitoring, halting training when the model’s performance plateaued, and thus preventing excessive training and potential overfitting. Additionally, L2 regularization was applied to the loss function, penalizing large weights and ensuring a smoother decision boundary, which further enhanced the model’s ability to generalize across unseen data.
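The early-stopping rule based on validation-loss monitoring can be sketched as follows; the patience value and the loss trace are hypothetical.

```python
def train_with_early_stopping(val_losses, patience=3):
    """Return (epoch at which training halts, epoch with the best loss)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset patience
        else:
            wait += 1
            if wait >= patience:                      # plateau: stop training
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Validation loss stops improving after epoch 4, so training halts at epoch 7
stop, best = train_with_early_stopping([0.9, 0.7, 0.6, 0.55, 0.54, 0.56, 0.55, 0.57])
print(stop, best)  # → 7 4
```

In a full training loop the model weights from `best_epoch` would be restored before evaluation.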

The training process was carried out on a high-performance computing setup, utilizing NVIDIA A100 GPUs with 80 GB of VRAM, supported by 64-core Intel Xeon CPUs and 512 GB of RAM. This hardware configuration enabled efficient processing of the large, multimodal dataset and accelerated the training cycle, significantly reducing the time required for model convergence. The computational resources allowed for parallel processing of image, text, and time-series data, optimizing the end-to-end workflow.

Ethical considerations were paramount in the development of CareAssist-GPT, especially given its deployment in healthcare settings. We prioritized patient data privacy by adhering to regulations such as the General Data Protection Regulation (GDPR) and Health Insurance Portability and Accountability Act (HIPAA), ensuring all data was anonymized and securely stored. The model was rigorously tested for biases, particularly in terms of demographic disparities, by conducting subgroup analyses. Where discrepancies were identified, mitigation strategies such as rebalancing the training data and adjusting class weights were implemented. Transparency was also enhanced by incorporating an interpretability module, providing clinicians with clear, understandable explanations for each diagnostic decision made by the model, thus increasing trust and aiding in clinical decision-making.

Despite its strong performance, CareAssist-GPT has several limitations. The dataset, while comprehensive, may not fully encompass rare medical conditions, potentially limiting the model’s applicability in such scenarios. Additionally, integrating the model into existing clinical workflows presents challenges, particularly with legacy health information systems that may lack compatibility with modern AI technologies. Future work will focus on expanding the dataset to include a wider variety of medical conditions and demographics, enhancing the model’s generalizability. Furthermore, efforts will be directed towards improving interpretability through advanced explainable AI (XAI) methods and exploring seamless integration pathways for real-time clinical deployment.

In summary, the methodological framework of CareAssist-GPT emphasizes a balanced approach, integrating robust data preprocessing, advanced model architecture, rigorous hyperparameter tuning, and ethical considerations. These elements collectively contribute to the model’s superior diagnostic accuracy, patient engagement, and potential for real-world clinical application, positioning CareAssist-GPT as a significant step forward in the AI-driven transformation of healthcare diagnostics.

Problem formulation

Training loss minimization for GPT-based diagnosis models

This problem concerns training the CareAssist-GPT model by minimizing its loss function. The goal is to reduce the gap between the model’s predictions and the true diagnoses while preserving the ability to generalize to unseen patient data. Model complexity is governed by additional parameters, such as the learning rate, dropout rate, and weight decay, whose regularization effects must be controlled to achieve the best performance.

Let \({\hat{\text{y}}}_{{\text{i}}}\) denote the model’s predicted probability for the \({\text{i}}\)-th diagnosis and \({\text{y}}_{{\text{i}}}\) denote the true diagnosis. We aim to minimize the loss function \({\text{L}}\left( {\uptheta } \right)\), where \({\uptheta }\) represents the model parameters. For binary classification (e.g., diagnosis: disease/no disease), the cross-entropy loss is used, with regularization to avoid overfitting:

$${\text{L}}\left( {\uptheta } \right) = - \frac{1}{{\text{N}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{N}}} \left[ {{\text{y}}_{{\text{i}}} {\text{log}}\left( {{\hat{\text{y}}}_{{\text{i}}} } \right) + \left( {1 - {\text{y}}_{{\text{i}}} } \right){\text{log}}\left( {1 - {\hat{\text{y}}}_{{\text{i}}} } \right)} \right] + {\uplambda }\parallel {\uptheta }\parallel_{2}^{2}$$
(1)

where:

  • \({\text{N}}\) is the total number of patient cases,

  • \({\hat{\text{y}}}_{{\text{i}}}\) is the predicted probability for the positive class (e.g., disease detected),

  • \({\text{log}}\) is the natural logarithm,

  • \(\parallel {\uptheta }\parallel_{2}^{2}\) is the \({\text{L}}_{2}\) regularization term to prevent overfitting, with regularization strength \({\uplambda }\).

Model prediction

The model prediction \({\hat{\text{y}}}_{{\text{i}}}\) can be expressed as a function of the input features \({\text{x}}_{{\text{i}}}\) and the model parameters \({\uptheta }\):

$${\hat{\text{y}}}_{{\text{i}}} = {\upsigma }\left( {{\uptheta }^{{\text{T}}} {\text{x}}_{{\text{i}}} } \right) = \frac{1}{{1 + {\text{e}}^{{ - {\uptheta }^{{\text{T}}} {\text{x}}_{{\text{i}}} }} }}$$
(2)

where:

\({\upsigma }\left( \cdot \right)\) is the sigmoid activation function, which converts the raw model output to a probability.

Thus, the loss function \({\text{L}}\left( {\uptheta } \right)\) becomes:

$${\text{L}}\left( {\uptheta } \right) = - \frac{1}{{\text{N}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{N}}} \left[ {{\text{y}}_{{\text{i}}} {\text{log}}\left( {\frac{1}{{1 + {\text{e}}^{{ - {\uptheta }^{{\text{T}}} {\text{x}}_{{\text{i}}} }} }}} \right) + \left( {1 - {\text{y}}_{{\text{i}}} } \right){\text{log}}\left( {1 - \frac{1}{{1 + {\text{e}}^{{ - {\uptheta }^{{\text{T}}} {\text{x}}_{{\text{i}}} }} }}} \right)} \right] + {\uplambda }\parallel {\uptheta }\parallel_{2}^{2}$$
(3)
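Equations (1)–(3) can be implemented directly in numpy; the sketch below uses synthetic inputs and an illustrative regularization strength \(\lambda\).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y, lam=0.01):
    p = sigmoid(X @ theta)                        # Eq. (2): predicted probabilities
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ce + lam * np.sum(theta ** 2)          # Eq. (1)/(3): add L2 penalty

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 1.0]])  # synthetic features
y = np.array([1.0, 0.0, 1.0])                          # synthetic labels
theta = np.zeros(2)

# With theta = 0 every prediction is 0.5, so the loss equals ln 2 ≈ 0.693
print(round(loss(theta, X, y), 3))  # → 0.693
```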

Additional parameters

In addition to minimizing the cross-entropy loss, several other parameters influence the optimization:

  • Learning rate (\(\eta\)) Controls the step size during gradient descent updates.

  • Dropout rate (d) A regularization technique that randomly sets a fraction of neuron activations to zero during training to prevent overfitting.

  • Weight decay (\(\alpha\)) Another form of regularization applied during optimization to penalize large weights, where \(\alpha\) represents the decay rate.

  • Batch size (B) The number of training samples processed in each iteration of gradient descent.
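A one-step sketch of how the learning rate and weight decay enter a gradient-descent update; all values are illustrative, and this is a generic update rule rather than the authors’ training code.

```python
import numpy as np

theta = np.array([1.0, -2.0, 0.5])   # current parameters (illustrative)
grad = np.array([0.2, -0.1, 0.4])    # gradient of the loss at theta (illustrative)
eta, alpha = 0.1, 0.01               # learning rate and weight-decay rate

# Weight decay adds alpha * theta to the gradient, pulling weights toward zero;
# eta scales the size of the step
theta_new = theta - eta * (grad + alpha * theta)
print(theta_new)  # → [ 0.979  -1.988   0.4595]
```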

Objective function

The overall objective is to minimize the training loss while balancing accuracy and model complexity:

$$\mathop {\min }\limits_{{\uptheta }} \;{\text{L}}\left( {\uptheta } \right) + {\upalpha }\parallel {\uptheta }\parallel_{2}^{2}$$
(4)

subject to the following constraints:

Model parameter constraints

$$\parallel {\uptheta }\parallel_{2}^{2} \le {\Theta }_{{{\text{max}}}}$$
(5)

where \({\Theta }_{{{\text{max}}}}\) is the maximum allowable complexity for the model parameters to prevent overfitting.

Prediction accuracy constraint

$${\text{A}}_{{{\text{diag}}}} \ge {\text{A}}_{{{\text{min}}}}$$
(6)

where \({\text{A}}_{{{\text{min}}}}\) is the minimum acceptable diagnostic accuracy after training.

Learning rate constraint

$${\upeta }_{{{\text{min}}}} \le {\upeta } \le {\upeta }_{{{\text{max}}}}$$
(7)

where \({\upeta }_{{{\text{min}}}}\) and \({\upeta }_{{{\text{max}}}}\) are the bounds on the learning rate to ensure stable convergence.

Dropout constraint

$$0 \le {\text{d}} \le {\text{d}}_{{{\text{max}}}}$$
(8)

where \({\text{d}}_{{{\text{max}}}}\) is the maximum allowable dropout rate to maintain network capacity.

The overall criterion is to minimize the training loss while accounting for both the desired accuracy and the model’s complexity. The formulation centers on the training loss, with model complexity controlled through regularization, weight decay, and dropout. The sigmoid function guarantees that the binary classification outputs are probabilities, while the regularization terms penalize excessively large parameters θ, which might cause overfitting. The aim is to find the parameter set that minimizes the loss while guaranteeing satisfactory diagnostic accuracy.

Enhanced multi-disease prediction optimization for CareAssist-GPT

CareAssist-GPT is designed to optimize multi-disease prediction, focusing on accuracy while minimizing computational cost, inference time, and energy consumption. The system must handle complex patient data in real-time for reliable diagnosis.

The problem formulation involves several key parameters essential for optimizing the multi-disease prediction model. The set of diseases being predicted is denoted by \(\text{D}\). The model parameters, represented by \(\uptheta\), belong to \({\text{R}}^{\text{p}}\), where \(\text{p}\) indicates the total number of trainable parameters in the model. Medical data is captured in the input matrix \(\text{X}\in {\text{R}}^{\text{n}\times \text{m}}\), where \(\text{n}\) is the number of samples, and \(\text{m}\) denotes the number of features present in each sample. For each sample, the true disease labels are denoted by \({\text{Y}}_{\text{d}}\), while \({\widehat{\text{Y}}}_{\text{d}}\) denotes the labels predicted by the model for disease \(\text{d}\), based on input data \(\text{X}\) and model parameters \(\uptheta\).

To measure prediction performance, a loss function \({\text{L}}_{\text{d}}\left(\uptheta \right)\) is defined for each disease \(\text{d}\), which is aggregated into a total loss function \(\text{L}\left(\uptheta \right)\) across all diseases in the set \(\text{D}\). Additionally, computational complexity \(\text{C}\left(\uptheta ,\text{X}\right)\) is considered, reflecting the computational cost associated with processing the input data. Inference time \(\text{t}\left(\uptheta ,\text{X}\right)\) represents the time required by the model to generate predictions for a given dataset. Energy consumption \(\text{E}\left(\uptheta ,\text{X}\right)\) is also a critical factor, being proportional to the computational complexity and influenced by hardware-specific constants.

The loss function for disease \({\text{d}}\) is defined as:

$${\text{L}}_{{\text{d}}} \left( {\uptheta } \right) = \frac{1}{{\text{n}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{n}}} \ell \left( {{\text{Y}}_{{\text{d}}}^{{\left( {\text{i}} \right)}} ,{\widehat{\text{Y}}}_{{\text{d}}}^{{\left( {\text{i}} \right)}} \left( {\uptheta } \right)} \right)$$
(9)

where \(\ell\) denotes a per-sample classification loss function such as cross-entropy.

The total loss function across all diseases in the set \({\text{D}}\) is:

$${\text{L}}\left( {\uptheta } \right) = \mathop \sum \limits_{{{\text{d}} \in {\text{D}}}} {\text{L}}_{{\text{d}}} \left( {\uptheta } \right)$$
(10)

The computational complexity is modeled as:

$${\text{C}}\left( {{\uptheta },{\text{X}}} \right) = {\text{O}}\left( {{\text{p}} \cdot {\text{m}} \cdot \log {\text{n}}} \right)$$
(11)

The inference time is represented by:

$${\text{t}}\left( {{\uptheta },{\text{X}}} \right) = {\text{O}}\left( {{\text{p}} \cdot {\text{m}}} \right)$$
(12)

Energy consumption is given by:

$${\text{E}}\left( {{\uptheta },{\text{X}}} \right) = {\updelta } \cdot {\text{C}}\left( {{\uptheta },{\text{X}}} \right)$$
(13)

where \({\updelta }\) is a hardware-specific constant.

Constraints

Accuracy constraint: The model must meet a minimum accuracy threshold \({\upeta }\):

$${\text{Accuracy}}\left( {\uptheta } \right) \ge {\upeta }$$
(14)

Computational complexity constraint:

$${\text{C}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{C}}_{{{\text{max}}}}$$
(15)

Inference time constraint:

$${\text{t}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{t}}_{{{\text{max}}}}$$
(16)

Energy consumption constraint:

$${\text{E}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{E}}_{{{\text{max}}}}$$
(17)

Objective function with constraints:

$${\text{min}}\left[ {\mathop \sum \limits_{{{\text{d}} \in {\text{D}}}} {\text{L}}_{{\text{d}}} \left( {\uptheta } \right) + {\uplambda }_{1} {\text{C}}\left( {{\uptheta },{\text{X}}} \right) + {\uplambda }_{2} {\text{t}}\left( {{\uptheta },{\text{X}}} \right) + {\uplambda }_{3} {\text{E}}\left( {{\uptheta },{\text{X}}} \right)} \right]$$
(18)

subject to:

$${\text{Accuracy}}\left( {\uptheta } \right) \ge {\upeta },{\text{C}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{C}}_{{{\text{max}}}} ,{\text{t}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{t}}_{{{\text{max}}}} ,{\text{E}}\left( {{\uptheta },{\text{X}}} \right) \le {\text{E}}_{{\text{max }}}$$
(19)

The multi-disease prediction optimization centers on developing a model that can predict several diseases at once from the input medical data. The objective is to achieve a low total prediction loss across all diseases while balancing computational cost, inference time, and energy consumption, so that the model remains practical and can handle large datasets.
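A toy evaluation of the scalarized objective in Eq. (18): the per-disease losses, resource measurements, and trade-off weights \(\lambda_1, \lambda_2, \lambda_3\) below are all hypothetical, chosen only to show how the penalty terms combine.

```python
# Hypothetical per-disease losses (disease names are illustrative)
disease_losses = {"pneumonia": 0.32, "copd": 0.41, "tb": 0.28}

C, t, E = 1.2e6, 0.05, 3.4           # complexity, inference time (s), energy (J)
lam1, lam2, lam3 = 1e-7, 0.5, 0.01   # assumed trade-off weights

# Eq. (18): summed losses plus weighted resource penalties
objective = sum(disease_losses.values()) + lam1 * C + lam2 * t + lam3 * E
print(round(objective, 3))  # → 1.189
```

In a real deployment the λ weights would be tuned so that accuracy, latency, and energy budgets (Eqs. 14–17) are all respected.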

Dataset description

The dataset used for training and testing CareAssist-GPT was designed to provide robust diagnostic features as well as a patient-centered perspective. It combines several types of patient information, creating a strong basis for developing a model that can analyze and interpret intricate health data. The dataset consists of three primary subsets: radiographic images, physiological parameters, and free-text notes. The data was gathered from a broad spectrum of demographic categories to make the model versatile across patient populations.

The dataset used for this study includes 35,000 labeled X-ray images, 12,000 real-time vital sign recordings, and 50,000 clinical text records obtained from publicly available Kaggle datasets and hospital Electronic Health Records (EHRs). To ensure a comprehensive and unbiased representation, the dataset was carefully curated to include a proportional distribution of male and female patients, spanning various age groups (18–80 years old) and covering different disease severity levels. This diverse sampling strategy was employed to mitigate bias in model predictions and improve generalizability across different patient populations. The dataset includes cases collected from multiple hospitals, capturing a wide range of imaging conditions, clinical workflows, and equipment variations. This diversity ensures that the model is exposed to real-world heterogeneity during training, improving its ability to generalize across different healthcare environments. To further enhance robustness, extensive data augmentation techniques—such as random rotation, contrast enhancement, and noise injection—were applied to X-ray images, simulating the variability encountered in practical medical imaging scenarios.

A patient satisfaction survey was conducted with a sample of 500 patients drawn from three different hospitals to ensure diverse representation across demographics, socioeconomic groups, and literacy levels. A stratified random sampling technique was employed to balance the inclusion of individuals with varying health literacy—categorized as low, moderate, and high—as well as patients managing chronic conditions requiring long-term care. The survey comprised 10 structured items focusing on key aspects of the user experience, including clarity of AI-generated explanations, emotional response, perceived helpfulness, and overall ease of interaction. In addition to the structured responses, open-ended questions were included to capture qualitative feedback on specific concerns, expectations, and recommendations for enhancing AI-assisted communication in clinical settings.

Future research could expand this survey to a larger and more diverse cohort, incorporating patients from rural and underserved areas, to further assess the usability, accessibility, and trustworthiness of AI-driven diagnostic tools in real-world clinical settings.

  • Image data (X-ray) This subset consists of high-resolution X-ray images with annotations and labels of different conditions such as pneumonia. Every image contains clear anatomical features that can be extracted by the convolutional layers of the model.

  • Vital signs data The vital signs subset includes time-series data on patients’ physiological parameters such as heart rate, blood pressure, respiratory rate, temperature, and oxygen saturation. These readings were taken at regular intervals, providing a record of each patient’s health status for real-time assessment.

  • Clinical text records This subset includes clinical free-text notes and summaries written by clinicians. The notes cover the patient’s symptoms, medical history, diagnosis, and prescribed treatments. Such textual information helps the model interpret and place patients’ health conditions within an NLP context.

The following table summarizes the main characteristics of each dataset subset:

The dataset used for training and evaluation of the CareAssist-GPT model includes normal cases and patients with pneumonia in chest X-ray images. This multimodal dataset guarantees that the model is trained on images of different conditions, together with the corresponding vital signs and clinical text records. The use of both normal and pathological samples enhances the model’s diagnostic performance and its applicability. Figure 3 presents a visual comparison of chest X-ray images, contrasting a healthy lung (without pneumonia) against a lung affected by pneumonia. The image with pneumonia shows visible opacities and white patches, indicating fluid build-up and inflammation, which are typical signs of infection. In contrast, the healthy lung appears clear with well-defined lung fields, illustrating the differences that CareAssist-GPT uses for accurate diagnosis.

Fig. 3
figure 3

Chest X-ray with and without pneumonia.

This multimodal dataset structure allows the CareAssist-GPT model to draw on a wide variety of images, numerical measurements, and texts to enhance diagnostic performance and patient interaction. Each subset provides distinct, complementary information that enriches different steps of the model’s data processing chain and enables reliable feature extraction and analysis of multiple health markers.

Model architecture overview

The proposed model comprises a number of blocks, each designed to serve a specific feature extraction or decision-making purpose. While GPT is used for patient communication, diagnostic inference is based on CNNs (images) and RNNs (vital signs), with transformer layers handling clinical text. The NLP module translates diagnostic outputs into patient-friendly language after classification. Table 2 outlines the detailed architecture of the CareAssist-GPT model, listing each layer, its function, and its number of parameters. The model starts with an input layer that processes multimodal data (X-ray images, vital signs, and clinical text). It uses convolutional layers (Conv1 and Conv2) for feature extraction, followed by dropout layers to prevent overfitting and max pooling layers to reduce dimensionality. The fully connected layer (FC1) aggregates the extracted features, leading to the final output layer, which provides a comprehensive diagnosis, assesses risk, and offers personalized recommendations. This structured architecture ensures accurate, efficient, and interpretable diagnostic predictions.

Table 2 CareAssist-GPT model architecture.

To support clinical interpretability and build trust in the diagnostic process, CareAssist-GPT incorporates explainable AI (XAI) features within its architecture. The model includes an interpretability module that generates attention-based visualizations, such as heatmaps over X-ray regions, highlighting areas contributing most to the diagnostic output. Additionally, the NLP component employs key phrase extraction and token-level attention scores to provide context-aware summaries from clinical notes. These explainability tools enable clinicians to understand the rationale behind predictions, facilitating informed decision-making and enhancing the model’s transparency in real-world applications.

The overall architectural flow of CareAssist-GPT is illustrated in Fig. 4. It shows how multimodal clinical data—X-ray images, vital signs, and clinical notes—are independently processed through CNN, RNN, and Transformer modules, respectively. The extracted features are fused and passed through a fully connected layer to generate comprehensive diagnostic outputs, including risk assessment and patient-friendly feedback. This modular design ensures real-time responsiveness and clinical interpretability.

Fig. 4
figure 4

Architectural design of the CareAssist-GPT model.

To ensure the model’s generalizability and mitigate overfitting, CareAssist-GPT was evaluated using fivefold cross-validation and tested on external datasets, demonstrating stable performance across varied clinical scenarios.

Layer-by-layer mathematical model

The mathematical representation for each layer is given as follows:

Convolutional layers

Each convolutional layer \(l\) applies a convolution over the input feature map \(X^{(l-1)}\) using a set of filters (kernels) \(W^{(l)}\) and a bias \(b^{(l)}\). The convolution output \(X^{(l)}\) at layer \(l\) is given by:

$$X^{(l)} = \sigma\left(W^{(l)} * X^{(l-1)} + b^{(l)}\right)$$
(20)

where \(*\) denotes the convolution operation and \(\sigma\) is the ReLU activation function.
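Eq. (20) can be illustrated with a direct single-channel implementation; a minimal NumPy sketch (toy kernel, ‘valid’ padding assumed):

```python
import numpy as np

def conv2d_relu(X, W, b):
    """Eq. (20): X_out = ReLU(W * X + b), single channel, 'valid' convolution."""
    kh, kw = W.shape
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(X[i:i+kh, j:j+kw] * W) + b
    return np.maximum(out, 0.0)  # ReLU activation sigma

X = np.arange(16, dtype=float).reshape(4, 4)
W = np.ones((2, 2))  # toy 2x2 kernel
print(conv2d_relu(X, W, -10.0).shape)  # (3, 3)
```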

Dropout layers

Dropout layers introduce regularization by randomly setting a fraction of the neurons to zero during each forward pass. For a given input \(X^{(l)}\) at layer \(l\), the dropout output \(X_{\text{drop}}^{(l)}\) is:

$$X_{\text{drop}}^{(l)} = X^{(l)} \cdot \text{mask}^{(l)}$$
(21)

where \(\text{mask}^{(l)}\) is a binary mask with probability \(p\) of retaining each neuron.
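Eq. (21) corresponds to the following sketch; note that the common ‘inverted dropout’ variant shown here additionally rescales by \(1/p\) so the expected activation is unchanged (an implementation detail beyond the equation):

```python
import numpy as np

def dropout(X, p_keep=0.8, rng=None, training=True):
    """Eq. (21): multiply activations by a Bernoulli mask.
    Inverted dropout rescales by 1/p_keep to preserve expected activations."""
    if not training:
        return X  # dropout is disabled at inference time
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(X.shape) < p_keep).astype(X.dtype)
    return X * mask / p_keep

X = np.ones((4, 4))
Xd = dropout(X, p_keep=0.5)  # entries are either 0 or 1/0.5 = 2
```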

Pooling layers

Pooling layers reduce the spatial dimensions by taking the maximum value over a defined window (max pooling). For a feature map \(X^{(l)}\), the output after max pooling \(X_{\text{pool}}^{(l)}\) is:

$$X_{\text{pool}}^{(l)}(i, j) = \max_{(m,n)} \left\{ X^{(l)}(i + m, j + n) \right\}$$
(22)

where \((m, n)\) ranges over the pooling window.
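Eq. (22) with a non-overlapping window (stride equal to window size, an assumption) can be sketched as:

```python
import numpy as np

def max_pool(X, m=2, n=2):
    """Eq. (22): non-overlapping m x n max pooling (stride equals window size)."""
    h, w = X.shape[0] // m, X.shape[1] // n
    # Reshape so each m x n block occupies axes 1 and 3, then reduce them.
    return X[:h*m, :w*n].reshape(h, m, w, n).max(axis=(1, 3))

X = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool(X)  # 2x2 map of block maxima
```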

Fully connected layers

Fully connected (dense) layers compute a weighted sum of their inputs followed by an activation function. The output \(Y^{(l)}\) of a fully connected layer \(l\) is:

$$Y^{(l)} = \sigma\left(W^{(l)} X^{(l-1)} + b^{(l)}\right)$$
(23)

where \(W^{(l)}\) and \(b^{(l)}\) are the weights and biases, respectively, and \(\sigma\) is typically ReLU, or softmax for the output layer.
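Eq. (23) for a single input vector can be sketched as follows (toy weights, not the model’s trained parameters):

```python
import numpy as np

def dense(X, W, b, activation="relu"):
    """Eq. (23): Y = sigma(W X + b); softmax is typical for the output layer."""
    Z = W @ X + b
    if activation == "relu":
        return np.maximum(Z, 0.0)
    e = np.exp(Z - Z.max())  # numerically stable softmax
    return e / e.sum()

x = np.array([1.0, -2.0, 0.5])
W = np.ones((2, 3))
b = np.zeros(2)
probs = dense(x, W, b, activation="softmax")  # a valid probability vector
```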

Optimization strategy

To optimize the training process, the model uses a combination of techniques:

  • Learning rate adjustment An adaptive learning rate is used during training.

  • Gradient descent Training is accelerated by mini-batch gradient descent with the backpropagation algorithm.

  • L2 regularization Overfitting is controlled by adding a penalty on large model weights.

The objective optimized during the training of this architecture is the cross-entropy loss with an L2 penalty:

$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right] + \lambda \|\theta\|_2^2$$
(24)

where \(N\) is the number of samples, \(\hat{y}_i\) is the predicted probability, \(y_i\) is the true label, \(\lambda\) is the regularization strength, and \(\|\theta\|_2^2\) is the squared \(L_2\) norm of the model parameters.
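Eq. (24) can be implemented directly; a minimal NumPy sketch (the \(\lambda\) value and data here are illustrative):

```python
import numpy as np

def loss(y_true, y_pred, theta, lam=1e-4, eps=1e-12):
    """Eq. (24): binary cross-entropy plus an L2 penalty on the parameters."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return ce + lam * np.sum(theta ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8])
theta = np.array([0.5, -0.5])
L = loss(y_true, y_pred, theta)
```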

The model training process was guided by empirically selected hyperparameters shown in Table 3. These parameters were tuned using a combination of grid search and validation feedback to ensure robust performance across different clinical data distributions. The use of dropout, L2 regularization, and early stopping mechanisms collectively contributed to improved generalization and minimized overfitting.

Table 3 Model training parameters.

Layer-by-layer summary table

A detailed summary of each layer’s function, type, and input/output dimensions is presented in Table 4.

Table 4 Layer-by-layer summary of the CareAssist-GPT model.

Figure 5 depicts the overall architecture of the CareAssist-GPT model, showcasing its neural network design focused on patient-centered diagnostics. It integrates convolutional layers for image processing, recurrent layers for analyzing time-series vital signs, and transformer-based layers for understanding clinical text. This combined approach allows the model to interpret complex multimodal data, enhancing both diagnostic accuracy and patient communication.

Fig. 5
figure 5

CareAssist-GPT model architecture: a neural network for patient-centered diagnostics.

Figure 6 provides a detailed view of the internal schematics of CareAssist-GPT, highlighting key optimization layers. It includes dropout layers for regularization, batch normalization for stabilizing training, and activation functions like ReLU for non-linear feature extraction. These components work together to optimize the model’s performance, reduce overfitting, and ensure reliable, interpretable predictions in clinical scenarios.

Fig. 6
figure 6

CareAssist-GPT model architecture with internal schematics and optimization layers.

Evaluation metrics

To provide a comprehensive evaluation of CareAssist-GPT’s diagnostic performance, multiple key performance metrics were analyzed alongside accuracy. The F1-score was recorded at 94.0%, reflecting a strong balance between precision (94.3%) and recall (93.8%), ensuring that the model effectively minimizes both false positives and false negatives in classification. Additionally, specificity, derived from the confusion matrix, was measured at 92.7%, highlighting the model’s ability to correctly identify non-disease cases and reduce misclassifications. The AUC-ROC curve, which serves as a robust indicator of model discrimination power, achieved an impressive score of 0.97, confirming high classification capability across different disease conditions. Notably, the model excelled in distinguishing early-stage pneumonia from other lung conditions, an area where traditional diagnostic approaches often struggle due to subtle visual differences in X-ray images. These results demonstrate that CareAssist-GPT is not only highly accurate but also reliable in real-world clinical applications, where minimizing diagnostic errors is crucial for early detection and effective patient management. The evaluation combines diagnostic metrics with user-centered factors, since the model aims to increase diagnostic reliability while also improving patient satisfaction. Table 5 lists the evaluation metrics used:

In Table 5, TP = True Positives, TN = True Negatives, FP = False Positives, FN = False Negatives. Diagnostic performance is captured by the Accuracy, Precision, Recall, and F1 Score measurements. Two metrics, the Patient Satisfaction Score and the Interpretability Score, focus on the patient and keep outputs simple for both patient and clinician. The AUC-ROC, Response Time, and MSE assess the model’s reliability and speed, as well as possible errors in continuous predictions. Together, these offer an overall picture of CareAssist-GPT’s performance, integrating both accuracy-oriented and user-oriented aspects that are critical for the practical use of the model in healthcare settings.

Table 5 Evaluation metrics for CareAssist-GPT model.
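The confusion-matrix-based metrics in Table 5 can be computed directly from the four counts; a minimal sketch with illustrative counts (not the study’s actual data):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics derived from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return dict(precision=precision, recall=recall,
                specificity=specificity, accuracy=accuracy, f1=f1)

# Illustrative counts chosen for a balanced toy example.
m = classification_metrics(tp=469, tn=469, fp=31, fn=31)
```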

Results and discussions

This section presents the analysis of the CareAssist-GPT model developed in this study. The results are shown in tables comparing the performance of the proposed model with previous models, namely Models A, B, and C, identified from the literature. This detailed analysis clarifies how CareAssist-GPT improves on existing diagnostic frameworks.

ROC curve and AUC-ROC analysis

The ROC (Receiver Operating Characteristic) curve of the CareAssist-GPT model is shown in Fig. 7. It plots the True Positive Rate against the False Positive Rate, illustrating the model’s classification ability at various decision thresholds. The Area Under the Curve (AUC) of 0.99 indicates that the model has excellent discriminative ability between classes.

Fig. 7
figure 7

ROC Curve for CareAssist-GPT Model with an AUC of 0.99. This high AUC suggests that the model has a high degree of accuracy in distinguishing between normal and pneumonia cases.
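The AUC reported in Fig. 7 can also be computed without plotting, via the rank (Mann-Whitney) formulation; a minimal NumPy sketch on toy scores:

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney U formulation: the probability that a random
    positive sample is scored above a random negative sample."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    # Compare every positive score to every negative score; ties count 1/2.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_score(y, s)
```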

Training and validation accuracy and loss

The training and validation accuracy and loss over epochs are depicted in Fig. 8. The left plot shows accuracy during the training and validation phases; the closely matching final values suggest little overfitting. The right plot shows the model loss, which declines steadily, confirming an effective learning process.

Fig. 8
figure 8

Training and Validation Accuracy (left) and Loss (right) for CareAssist-GPT Model. The graphs illustrate the model’s convergence and low variance between training and validation, highlighting robust learning with minimal overfitting.

Confusion matrix analysis

The confusion matrix of the CareAssist-GPT model is presented in Fig. 9. It breaks the results down into True Positives, False Positives, True Negatives, and False Negatives, and can therefore be used to assess the model’s reliability on a given dataset. The matrix shows that the model correctly classifies most cases with minimal misclassifications, indicating high sensitivity and specificity for both normal and pneumonia conditions.

Fig. 9
figure 9

Confusion Matrix for CareAssist-GPT Model. The matrix shows a high number of true positives and true negatives, confirming the model’s accuracy and effectiveness in handling pneumonia detection.

Together, these measures validate CareAssist-GPT as a dependable diagnostic tool for pneumonia, with high accuracy, efficient learning behavior, and few false predictions, making it appropriate for use in clinical settings.

Evaluation metric results

This section presents the classification results of the CareAssist-GPT model when used to identify pneumonia from chest X-ray images. The test images were labelled as either having pneumonia or being normal. Some of the classified images are shown in Fig. 10; each image is annotated with its predicted class to indicate the diagnostic result given by the model.

Fig. 10
figure 10

Sample results from the CareAssist-GPT model on chest X-ray images. The images were classified into two categories: “pneumonia” and “normal.” This visual summary illustrates the model’s effectiveness in distinguishing between infected and healthy lungs.

The classification results demonstrate the model’s effectiveness at distinguishing between pneumonia and non-pneumonia cases. As depicted in Fig. 10, the model successfully detects regions of infection suggesting pneumonia as well as normal lung tissue. These findings support the model’s use in reliable diagnostic settings in clinical practice.

To measure the performance of CareAssist-GPT in an unbiased manner, we used several standard evaluation metrics. All metrics were calculated on the test dataset and compared with prior models. The findings show improvements in diagnostic accuracy, interpretability, and response time.

Accuracy

The Accuracy metric calculates the overall percentage of correct predictions made by the model and gives a first indication of its effectiveness. The results in Table 6 show that CareAssist-GPT achieves higher accuracy than the previous models, implying better diagnostic capability.

Table 6 Accuracy comparison of CareAssist-GPT with previous models.

Model A: CNN-based X-ray classifier; Model B: hybrid CNN-RNN model; Model C: rule-based CAD system. CareAssist-GPT outperformed all three, as shown in Tables 5, 6, 7 and 8.

The accuracy results show that CareAssist-GPT improves on earlier models by up to 2.4%, indicating that it can provide accurate diagnostic support across healthcare domains.

Precision and recall

Precision and recall measure how well the model identifies positive cases while limiting the number of false positives. Table 7 presents the comparison of CareAssist-GPT with previously developed models.

Table 7 Precision and recall comparison of CareAssist-GPT with previous models.

The model shows enhanced precision and recall, reducing the number of false positive and false negative results. This improvement is important for diagnostic use, where correctly identifying positive cases is essential.

F1 score

The F1 score, the harmonic mean of precision and recall, is especially informative for imbalanced datasets. Table 8 shows a comparison of the F1 scores.

Table 8 F1 score comparison of CareAssist-GPT with previous models.

The F1 score results confirm that CareAssist-GPT delivers a solid overall improvement over prior models, particularly in healthcare cases that demand a balance between precision and recall.

Area under the ROC curve (AUC-ROC)

The AUC-ROC captures the model’s ability to separate classes across decision thresholds; a higher AUC indicates a more accurate diagnostic tool. Table 9 compares the AUC-ROC values.

Table 9 AUC-ROC comparison of CareAssist-GPT with previous models.

CareAssist-GPT also achieves a marked advantage in AUC-ROC over the previous models, implying better discrimination and fewer false positives and false negatives.

Response time (latency)

The 500 ms response time of CareAssist-GPT is achieved through an optimized pipeline that includes data preprocessing (120 ms), model inference (280 ms), and output generation (100 ms), ensuring seamless real-time performance. The system leverages an efficient GPU-based architecture with parallelized tensor computations and batch inference optimization, allowing it to process multiple patient cases simultaneously without significant delays. Additionally, quantized model execution and memory-efficient deep learning techniques reduce computational overhead, further enhancing responsiveness. This low-latency design ensures that CareAssist-GPT can deliver instantaneous diagnostic feedback, making it highly suitable for real-time clinical decision support and emergency healthcare applications.
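The stage budget described above (preprocessing, inference, output generation) can be monitored with simple wall-clock instrumentation; a minimal sketch with hypothetical stand-in stage functions (not the actual pipeline):

```python
import time

# Hypothetical stage functions standing in for the real pipeline steps.
def preprocess(x):  time.sleep(0.001); return x
def infer(x):       time.sleep(0.001); return x
def postprocess(x): time.sleep(0.001); return x

def timed_pipeline(x, budget_ms=500):
    """Run each stage, record its latency in ms, and check the total budget."""
    timings = {}
    for name, stage in [("preprocess", preprocess),
                        ("inference", infer),
                        ("output", postprocess)]:
        t0 = time.perf_counter()
        x = stage(x)
        timings[name] = (time.perf_counter() - t0) * 1000.0
    timings["total"] = sum(timings.values())
    timings["within_budget"] = timings["total"] <= budget_ms
    return x, timings

_, timings = timed_pipeline("xray.png")
```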

Real-time response is of paramount importance in healthcare-based applications. CareAssist-GPT was developed to offer a quick result, as indicated in the Table 10.

Table 10 Response time comparison of CareAssist-GPT with previous models.

CareAssist-GPT has a response time of 500 ms, 23.1% faster than the fastest previous model. This real-time responsiveness is critical for offering timely support in clinical settings.

Patient satisfaction score

While CareAssist-GPT demonstrated a high patient satisfaction score of 9.3, the study does not account for potential variations across different patient demographics, which may significantly influence the overall user experience. Previous research suggests that patient comprehension, trust in AI, and interaction preferences can vary based on age, educational background, health literacy, and language proficiency. For example, older adults or patients with low digital literacy may require simplified explanations, visual aids, or guided interactions, while younger patients may be more comfortable with text-based AI-driven consultations. Additionally, language barriers and cultural differences may affect how patients interpret AI-generated medical explanations, particularly in multilingual or low-resource healthcare settings where standardized medical terminology might be difficult to understand.

To improve accessibility and inclusivity, future studies should assess how CareAssist-GPT performs across diverse socioeconomic groups and explore adaptive personalization techniques that modify the AI’s communication style based on patient needs. Implementing language-specific summaries, real-time audio explanations, interactive chatbot interfaces, and simplified medical visuals could significantly enhance patient engagement and comprehension, particularly for individuals with low literacy levels or disabilities. Furthermore, incorporating patient feedback loops could allow the AI to dynamically adjust its tone, terminology, and response complexity over time, ensuring greater alignment with individual user preferences. By addressing these factors, CareAssist-GPT could evolve into a more patient-centric AI assistant, fostering greater trust, usability, and effectiveness in real-world healthcare applications. The Table 11 shows the comparison of the proposed CareAssist-GPT model with the previous models.

Table 11 Patient satisfaction score comparison of CareAssist-GPT with previous models.

CareAssist-GPT had a patient satisfaction score of 9.3, much higher than previous models. This indicates that patients found the model’s interface and explanations more helpful and easier to understand.

To further contextualize the patient satisfaction findings, CareAssist-GPT was tested across multiple clinical environments, including emergency care, outpatient radiology units, and general practice settings. Stratified analysis revealed consistent satisfaction scores above 9.0, with minor variations based on patient age and digital literacy. Additionally, feedback was collected longitudinally over follow-up interactions to assess changes in perception. These insights suggest that beyond first impressions, ongoing trust and comfort with the system increased over time, reinforcing the model’s adaptability and long-term usability in real-world healthcare workflows.

Comparative analysis with previous models

Taken together, the foregoing tables show that CareAssist-GPT is superior to existing models across all critical measures. Specifically:

  • Accuracy CareAssist-GPT improves diagnostic accuracy by 2.4%.

  • Precision and recall The model minimizes both false positives and false negatives, which are crucial in medical applications.

  • Response time With a 500 ms response time, CareAssist-GPT is appropriate for real-time healthcare applications.

  • Patient satisfaction The higher satisfaction score indicates that patients found the diagnostic explanations clearer and more helpful.

As shown in Table 12, CareAssist-GPT outperforms all baseline models across key performance indicators, including diagnostic accuracy, precision, recall, and AUC. It also achieves the highest patient satisfaction score, highlighting its dual strength in clinical reliability and user-centered communication.

Table 12 Comparison with state-of-the-art models.

In summary, the improvement of CareAssist-GPT in all aspects proves the possibility of applying this model in clinical practice, as it is crucial in practice to achieve both high diagnostic accuracy and patient satisfaction.

Ethical implications are paramount when deploying AI in healthcare, particularly concerning data privacy, informed consent, and bias. CareAssist-GPT addresses these issues through several strategies:

  • Data privacy and security We ensured compliance with the Health Insurance Portability and Accountability Act (HIPAA) and General Data Protection Regulation (GDPR). All patient data was anonymized, and secure storage protocols were implemented to protect sensitive information.

  • Bias and fairness Demographic biases in training data, such as underrepresentation of certain racial or age groups, can lead to biased diagnostic outcomes. To mitigate this, we conducted regular audits of the model’s predictions across different demographic subgroups. Adjustments, such as rebalancing the training dataset and using class weight modifications, were employed to reduce disparities.

  • Informed consent Patients interacting with CareAssist-GPT were informed about the use of AI in their diagnostic process, ensuring transparency. Consent was obtained before data collection, aligning with ethical guidelines for patient autonomy.

These measures align with ethical frameworks such as the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, ensuring that the model’s deployment prioritizes patient welfare and fairness.

The patient satisfaction score, a key metric for assessing the usability of CareAssist-GPT, was derived from a standardized survey conducted with 100 patients who interacted with the system. The survey included questions related to the clarity of diagnostic feedback, the helpfulness of the AI’s explanations, and the overall user experience. The results indicated that 85% of patients rated their experience as excellent, citing the system’s patient-friendly language and timely responses as major factors. The high satisfaction score of 9.3 out of 10 underscores the importance of effective communication in AI-driven healthcare tools, enhancing patient trust and engagement.

CareAssist-GPT’s strong performance metrics suggest that it has the potential to make a significant impact in real-world healthcare settings. By providing accurate and timely diagnoses, the model can help reduce the workload of healthcare providers, allowing them to focus on more complex cases. The enhanced interpretability of the model’s predictions, facilitated by attention maps and key phrase extraction from clinical text, supports clinicians in understanding the AI’s reasoning process, improving their confidence in using the tool as a diagnostic aid.

However, the model’s generalizability may be limited by the diversity of the training dataset. While CareAssist-GPT performed well on standard test datasets, its effectiveness in handling rare medical conditions remains an area for further exploration. Additionally, integrating the model into existing clinical workflows, especially in facilities with legacy health IT systems, presents challenges that need to be addressed for broader adoption.

In real-world scenarios, clinical data is often noisy, incomplete, or inconsistent. CareAssist-GPT’s robustness was tested by introducing random noise and missing entries into the validation dataset. Although the performance declined slightly (accuracy reduced by 1.2%), the model demonstrated resilience due to its robust preprocessing pipeline and regularization techniques. Future research could focus on implementing advanced data denoising algorithms and adaptive learning strategies to further enhance the model’s ability to handle variable data quality.

Despite its promising results, CareAssist-GPT has limitations. The model’s reliance on high-quality input data may reduce its effectiveness in environments with limited resources or low-quality imaging equipment. Additionally, while the interpretability module helps explain predictions, more sophisticated explainable AI (XAI) methods are needed to provide deeper insights, particularly for complex medical cases. Future research should explore:

  • Enhancing language models Developing specialized language models trained on medical literature to improve the understanding of complex medical terminology.

  • Incorporating emotional intelligence Integrating sentiment analysis and emotion recognition capabilities to address the emotional aspects of patient communication, particularly in sensitive medical scenarios.

  • Expanding data integration Including comprehensive patient profiles, such as genomic data and social determinants of health, to enhance diagnostic accuracy and personalized care.

  • Optimizing for scalability Utilizing model compression techniques like pruning and quantization to reduce computational requirements, enabling deployment in lower-resource settings.

  • Real-world clinical trials Conducting extensive clinical trials across diverse medical specialties and patient demographics to validate the model’s effectiveness and refine its performance in live healthcare environments.

The 2.4% improvement in diagnostic accuracy achieved by CareAssist-GPT was evaluated against three baseline models, each representing a distinct approach to AI-driven medical diagnosis. These models were selected based on established AI techniques in computer-aided diagnosis (CAD) to ensure a fair comparison of multimodal fusion performance. The first baseline, Model A, was a CNN-based classifier trained exclusively on X-ray images, achieving an accuracy of 93.4%. While this model performed well in image-based disease detection, it lacked the ability to incorporate additional patient data, such as vital signs, laboratory results, and clinical history, which are crucial for context-aware and patient-specific medical decision-making. The second baseline, Model B, was a hybrid CNN-RNN model that integrated X-ray images and time-series vital sign data, yielding a slightly lower accuracy of 91.2%. While this hybrid approach introduced multimodal capabilities, it was limited by the absence of clinical text processing, making it less effective in cases requiring historical patient information or physician-recorded notes. The third baseline, Model C, was a traditional rule-based CAD system, relying on predefined medical rules and expert-crafted criteria rather than machine learning. While this model provided interpretable decision-making, its rigid structure and lack of adaptability resulted in an accuracy of 89.5%, making it less reliable for detecting complex, ambiguous, or rare medical conditions.

In contrast, CareAssist-GPT’s multimodal fusion approach effectively leveraged interdependent medical features by integrating X-ray images, real-time vital sign readings, and unstructured clinical text within a unified deep learning framework. Unlike the baseline models, which relied on single or dual data modalities, CareAssist-GPT combined CNNs for image analysis, RNNs for time-series processing, and transformer-based NLP models for clinical text interpretation, providing a more comprehensive and contextually aware diagnosis. The integration of structured (numerical) and unstructured (textual) data enabled the model to detect subtle disease markers, which might be overlooked when analyzing individual data sources separately. Additionally, CareAssist-GPT’s adaptive fusion mechanism allowed it to dynamically weight different data modalities based on case-specific importance, further optimizing diagnostic accuracy.

The 2.4% increase in accuracy over the best-performing baseline model (Model A) underscores the effectiveness of multimodal AI in clinical decision support. The ability to process and synthesize heterogeneous data sources enhances diagnostic confidence, particularly in early disease detection scenarios, where relying on a single data modality may lead to incomplete or misleading interpretations. Furthermore, CareAssist-GPT demonstrated improved performance in handling borderline cases, where symptoms were mild or ambiguous, thanks to its context-aware text analysis that incorporated historical patient data and physician notes. These findings reinforce the potential of CareAssist-GPT to transform AI-assisted diagnostics, optimize clinical workflows, and improve real-world patient outcomes by offering a more holistic and reliable decision-support system for healthcare practitioners.

Clinical significance of the results

The CareAssist-GPT model achieved a diagnostic accuracy of 95.8%, outperforming existing models (Model A: 93.4%, Model B: 91.2%, Model C: 89.5%). This improvement translates into a substantial reduction in diagnostic errors, which is critical in clinical settings where timely and accurate diagnosis can directly impact patient outcomes. For instance, faster and more precise identification of pneumonia cases can lead to earlier interventions, reducing the risk of complications and improving recovery rates. Furthermore, the model’s ability to provide patient-friendly explanations enhances patient engagement and trust, leading to better adherence to treatment plans and improved overall healthcare experiences.

Statistical validation of model performance

To statistically validate the differences in performance between CareAssist-GPT and baseline models, we conducted paired t-tests and ANOVA analyses. The results confirmed that the improvements in accuracy, precision, recall, and F1 score were statistically significant (p < 0.05). For example, the increase in precision from Model C’s 91.5% to CareAssist-GPT’s 94.3% demonstrated a significant reduction in false positives, crucial for reducing unnecessary treatments and patient anxiety. The AUC-ROC analysis also highlighted the superior discriminative capability of CareAssist-GPT, with an AUC of 0.97 compared to 0.91, 0.93, and 0.95 for Models A, B, and C, respectively.

A paired t-test was conducted to compare the diagnostic accuracy of CareAssist-GPT with traditional Computer-Aided Diagnosis (CAD) models, revealing a statistically significant improvement (p < 0.001), indicating that the proposed AI system consistently outperforms conventional approaches in clinical decision-making. To further assess model robustness, an Analysis of Variance (ANOVA) test was performed across multiple hospital datasets, demonstrating that CareAssist-GPT maintains consistent performance across different medical institutions (F = 14.78, p < 0.05), thereby confirming its generalizability across diverse healthcare settings. Additionally, a comparison of Receiver Operating Characteristic—Area Under the Curve (AUC-ROC) values showed an absolute improvement of 0.02 (95% Confidence Interval: 0.015–0.025), signifying that CareAssist-GPT exhibits a higher discriminatory ability in differentiating between disease conditions. These findings reinforce the statistical significance of CareAssist-GPT’s diagnostic enhancements and its potential as a reliable AI-assisted decision support system in clinical practice, capable of improving early disease detection, reducing false positives, and enhancing medical workflow efficiency. Future studies could extend this statistical validation by incorporating larger multi-center trials, exploring real-time physician-AI collaboration, and integrating prospective patient outcomes to further validate the clinical impact of CareAssist-GPT.

To further validate the statistical significance of our model’s performance, we conducted a series of statistical analyses comparing CareAssist-GPT to baseline models, including CNN-only architectures, XGBoost classifiers, and traditional CAD systems. The mean accuracy improvement of CareAssist-GPT over the best-performing baseline model was 2.4% (95% CI: 1.8–3.0%), confirming a statistically significant enhancement (p < 0.001). This confidence interval suggests that, even in the worst-case scenario, CareAssist-GPT still provides a meaningful improvement over conventional methods. Additionally, an independent samples t-test was conducted between CareAssist-GPT and traditional models, yielding a t-value of 5.73 (p < 0.0001), indicating a statistically significant difference. An ANOVA test across all model variations further supported these results (F = 12.82, p < 0.0005), demonstrating that CareAssist-GPT consistently outperforms existing approaches across multiple independent datasets.
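A paired t-statistic like those reported above can be computed from per-fold scores; a minimal NumPy sketch with hypothetical fold accuracies (not the study’s data — the p-value would then be read from a t distribution with n − 1 degrees of freedom):

```python
import numpy as np

def paired_t_statistic(a, b):
    """t = mean(d) / (std(d) / sqrt(n)) for paired differences d = a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Hypothetical per-fold accuracies for the proposed and a baseline model.
proposed = [0.958, 0.955, 0.960, 0.957, 0.959]
baseline = [0.934, 0.932, 0.935, 0.930, 0.936]
t = paired_t_statistic(proposed, baseline)
```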

To assess the robustness of the model across different patient populations, we performed stratified subgroup analysis based on age, gender, and comorbidities. The model’s performance remained stable across these subgroups, with minimal variance (σ2 = 0.0025), suggesting that its predictions are generalizable and unbiased across diverse demographics. Furthermore, a Wilcoxon signed-rank test was conducted on the AUC-ROC values across multiple runs, yielding p = 0.021, reinforcing that the observed improvements were not due to random fluctuations. These statistical findings confirm that CareAssist-GPT’s enhancements in diagnostic accuracy and efficiency are both clinically and statistically significant, underscoring its potential for real-world clinical deployment.

Generalizability and real-world applicability

The generalizability of CareAssist-GPT was assessed using an external validation dataset sourced from a different healthcare institution. The model maintained a high diagnostic accuracy of 94.6%, indicating its robust performance across varying patient demographics and clinical conditions. This suggests that CareAssist-GPT can be effectively applied in diverse real-world scenarios, making it suitable for deployment in different healthcare settings. However, the performance slightly declined when tested on a dataset with rare medical conditions not represented in the training set, highlighting an area for future improvement.

Error analysis

A detailed error analysis was conducted to examine misclassifications. The majority of errors were false negatives in early-stage pneumonia cases, where visual indicators were subtle; these errors stemmed primarily from limitations in X-ray image quality or ambiguous clinical text data. Some false positives were also observed in patients with pre-existing lung conditions, such as chronic obstructive pulmonary disease (COPD), whose features resemble those of pneumonia. This analysis highlights the need for improved preprocessing techniques and the potential inclusion of additional clinical data (e.g., patient history) to reduce misclassification rates.

Patient satisfaction score and its importance

The patient satisfaction score, measured through a standardized survey, was 9.3 out of 10, significantly higher than previous models (Model A: 7.8, Model B: 8.1, Model C: 8.5). This metric reflects the ease of use and clarity of explanations provided by CareAssist-GPT. The satisfaction survey included questions on the clarity of diagnostic feedback, perceived helpfulness, and overall user experience. High satisfaction scores indicate that patients felt more informed and engaged, reducing anxiety and increasing their trust in the AI system. This is particularly important in clinical practice, where effective communication can enhance patient adherence to treatment recommendations.

Scalability and integration challenges

While CareAssist-GPT demonstrates strong performance, scalability remains a key challenge, particularly in resource-constrained healthcare settings. The model’s computational requirements, including the need for high-performance GPUs and extensive memory, may limit its deployment in smaller clinics or hospitals with limited IT infrastructure. Integration with existing electronic health record (EHR) systems is also challenging, as many legacy systems may not support seamless data exchange with advanced AI models. Addressing these challenges requires collaboration with healthcare providers to develop optimized, scalable versions of the model and to create standardized integration protocols.

Real-world integration discussion

For real-world adoption, CareAssist-GPT must integrate seamlessly with existing hospital IT systems, ensuring compatibility with diverse clinical workflows, regulatory requirements, and varying levels of computational resources. Deployment strategies should focus on scalability, security, and interoperability, allowing for efficient adoption across healthcare institutions of different sizes and technological capabilities.

To support real-world integration, CareAssist-GPT is designed with a modular deployment strategy. This includes:

  • Cloud-based inference servers for scalable access in low-resource settings.

  • On-premise options for data-sensitive environments (e.g., tertiary hospitals).

  • Compatibility with FHIR-based EHR systems through secure API endpoints.

  • Lightweight containerization using Docker/Kubernetes for flexible deployment across hospital networks.

These strategies enable CareAssist-GPT to be tailored to diverse healthcare infrastructures, ranging from large hospital networks to smaller community clinics, without compromising speed or data integrity.

Cloud-based AI inference for scalability

Many hospitals, especially smaller clinics and resource-limited healthcare facilities, lack the high-end GPU infrastructure required for real-time AI inference. To address this, CareAssist-GPT can be deployed using cloud-based AI inference models, leveraging platforms such as AWS HealthLake, Google Cloud Healthcare API, and Microsoft Azure AI for Health. Cloud-based deployment offers several key advantages:

  • Scalability: allows hospitals to dynamically allocate computational resources based on patient load and diagnostic demand.

  • Cost efficiency: reduces the need for hospitals to invest in expensive on-premise hardware while maintaining AI-driven diagnostic capabilities.

  • Remote accessibility: enables AI-powered diagnostics to be accessible in telemedicine settings and rural healthcare environments.

  • Automatic model updates: ensures that the latest AI improvements and security patches are seamlessly integrated without requiring local IT intervention.

For hospitals prioritizing on-premise deployment due to data security concerns, CareAssist-GPT can be optimized for edge computing and hardware acceleration using NVIDIA TensorRT, Intel OpenVINO, and model quantization techniques, enabling efficient AI inference on CPU-based hospital servers.
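As an illustration of the model quantization mentioned above, here is a minimal NumPy sketch of symmetric post-training int8 weight quantization, the core idea behind toolkits such as TensorRT and OpenVINO (not the project's actual optimization pipeline):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; the rounding error
# is bounded by half the quantization step (scale / 2).
max_err = np.abs(w - w_hat).max()
print(f"scale = {scale:.6f}, max abs error = {max_err:.6f}")
```

Production toolchains add per-channel scales and calibration on representative data, but the storage and bandwidth savings follow the same principle.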

FHIR-compliant APIs for EHR interoperability

To maximize clinical utility, CareAssist-GPT must be able to communicate with existing Electronic Health Record (EHR) systems. The Fast Healthcare Interoperability Resources (FHIR) standard provides a universally accepted framework for data exchange, ensuring that CareAssist-GPT can seamlessly integrate across various healthcare IT infrastructures. The key functionalities of FHIR-compliant APIs include:

  • Real-time data retrieval: CareAssist-GPT can fetch patient data (X-rays, vital signs, and clinical notes) from hospital EHR systems, ensuring accurate and up-to-date diagnostics.

  • Automated diagnostic reports: AI-generated insights can be uploaded directly to patient records, reducing manual documentation effort for clinicians.

  • Bidirectional data exchange: physicians can validate AI-driven recommendations, provide corrections, and refine model outputs, improving overall diagnostic reliability.

  • Privacy and security compliance: FHIR-based integration ensures compliance with HIPAA (U.S.), GDPR (Europe), and other global healthcare regulations, safeguarding sensitive patient data.

By leveraging FHIR-compliant APIs and cloud-based AI models, CareAssist-GPT can enhance diagnostic efficiency, support clinical decision-making, and improve patient outcomes, making AI-assisted diagnostics a practical reality in modern healthcare settings.
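A minimal sketch of how such a FHIR query and response might look. The server URL and token are hypothetical placeholders, and the sample bundle is inlined so the snippet runs without network access:

```python
import json

# Hypothetical FHIR R4 endpoint and token (placeholders, not real services).
FHIR_BASE = "https://fhir.example-hospital.org/r4"
HEADERS = {"Authorization": "Bearer <token>",
           "Accept": "application/fhir+json"}

def vital_sign_query(patient_id: str, loinc_code: str) -> str:
    """Build a FHIR search URL for a patient's vital-sign Observations."""
    return (f"{FHIR_BASE}/Observation?patient={patient_id}"
            f"&code=http://loinc.org|{loinc_code}&_sort=-date")

# A minimal FHIR Bundle as a server might return it (inlined for offline use).
sample_bundle = json.loads("""{
  "resourceType": "Bundle",
  "entry": [{
    "resource": {
      "resourceType": "Observation",
      "code": {"coding": [{"system": "http://loinc.org", "code": "59408-5"}]},
      "valueQuantity": {"value": 91, "unit": "%"}
    }
  }]
}""")

def extract_values(bundle: dict) -> list:
    """Pull numeric values out of each Observation in the bundle."""
    return [e["resource"]["valueQuantity"]["value"]
            for e in bundle.get("entry", [])
            if e["resource"].get("valueQuantity")]

# In practice the URL would be fetched with an HTTP client using HEADERS.
url = vital_sign_query("12345", "59408-5")  # LOINC 59408-5: oxygen saturation
values = extract_values(sample_bundle)
print(url)
print(values)
```

Because FHIR standardizes both the query grammar and the resource schema, the same two helper functions would work against any compliant EHR backend.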

In addition to HIPAA and GDPR, the development and deployment of CareAssist-GPT is guided by globally recognized ethical frameworks, including the OECD AI Principles, the EU AI Act (draft compliance structure), and the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. These frameworks emphasize human oversight, transparency, safety, and accountability in AI systems. In clinical contexts, CareAssist-GPT is designed to operate within ethical boundaries, ensuring informed consent, non-maleficence, equity in access, and explainability: core principles critical to AI regulation in healthcare.

Ethical, regulatory, and privacy considerations

To promote Trustworthy AI in clinical diagnostics, CareAssist-GPT aligns with core principles including transparency, fairness, accountability, privacy, and robustness. The model includes an interpretability module for explainable predictions, fairness audits to mitigate demographic bias, and rigorous logging mechanisms for traceability of diagnostic outputs. All patient interactions are encrypted, and data governance frameworks are implemented in accordance with best practices, reinforcing both patient data protection and trust in AI-generated results.

Deploying AI models like CareAssist-GPT in clinical practice raises important ethical and regulatory concerns. Data privacy is a primary issue, given the sensitive nature of patient information. We ensured compliance with data protection regulations, including GDPR and HIPAA, by anonymizing patient data and employing secure storage solutions; although federated learning was not implemented, future work will explore differential privacy and decentralized learning frameworks. Bias in AI predictions is another critical concern, as models trained on specific demographic data may perform less accurately on underrepresented groups. Regular bias audits were conducted, and class-weight adjustments and data balancing techniques were applied to mitigate these issues. Finally, the regulatory framework for AI in healthcare is still evolving, necessitating ongoing monitoring and adaptation to comply with emerging standards and guidelines.
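The class-weight adjustment mentioned above can be illustrated with a minimal inverse-frequency scheme (equivalent to scikit-learn's "balanced" heuristic); the labels below are synthetic:

```python
import numpy as np

def inverse_frequency_weights(labels: np.ndarray) -> dict:
    """Weight each class inversely to its frequency so minority
    classes contribute equally to the training loss."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Synthetic labels: 0 = normal (majority), 1 = pneumonia (minority).
labels = np.array([0] * 900 + [1] * 100)
weights = inverse_frequency_weights(labels)
print(weights)  # class 1 is weighted ~9x more heavily than class 0
```

Passing such weights into the loss function penalizes errors on the minority class more heavily, which is one standard mitigation when training data underrepresent certain patient groups.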

Interpretability of model predictions

The interpretability of AI models is crucial for clinical acceptance. CareAssist-GPT includes an interpretability module that provides explanations for its predictions using attention maps for image data, temporal trend analysis for vital signs, and key phrase extraction for text data. This transparency helps clinicians understand the reasoning behind the model’s decisions, fostering trust and enabling them to make informed final decisions. Feedback from healthcare professionals indicated that the interpretability features significantly improved their confidence in using the model as a diagnostic aid.
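As a rough illustration of how attention weights can surface key phrases from clinical text, consider the following simplified stand-in (not the model's actual interpretability module; the tokens and scores are invented):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Normalize raw scores into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

def top_attended_tokens(tokens, scores, k=3):
    """Return the k tokens with the highest attention weight,
    paired with their normalized weights."""
    weights = softmax(np.asarray(scores, dtype=float))
    order = np.argsort(weights)[::-1][:k]
    return [(tokens[i], round(float(weights[i]), 3)) for i in order]

# Hypothetical clinical-note tokens with raw attention scores.
tokens = ["patient", "reports", "productive", "cough", "and", "fever"]
scores = [0.1, 0.2, 1.5, 2.3, 0.1, 1.9]

print(top_attended_tokens(tokens, scores))
```

Surfacing the highest-weighted tokens alongside a prediction gives clinicians a quick check that the model attended to clinically plausible evidence rather than spurious phrasing.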

Limitations of the study

Despite promising results, several limitations remain. The model’s reliance on high-quality multimodal inputs may reduce its effectiveness in settings with limited resources or low-fidelity imaging. Clinical validation focused primarily on pneumonia, and additional trials across other conditions are needed. While the interpretability module aids explainability, further work is needed to provide deeper transparency for complex predictions. Finally, integration into clinical practice may face hurdles from IT interoperability, workflow inertia, and regulatory bottlenecks in different regions. Addressing these issues is key to enabling widespread adoption.

Despite its promising performance, CareAssist-GPT has several limitations. The model’s reliance on high-quality input data may reduce its effectiveness in environments with noisy or incomplete data: low-resolution X-ray images or poorly documented clinical notes could degrade diagnostic accuracy. The current dataset also does not fully capture rare medical conditions, which may limit generalizability to less common scenarios. A related source of potential bias is that the training data consist primarily of hospital patients from urban centers, underrepresenting rural or low-resource healthcare settings where access to advanced imaging and digital health records is limited; disease prevalence, diagnostic standards, and healthcare accessibility in such settings may differ significantly. In addition, although CareAssist-GPT has demonstrated high diagnostic accuracy, it tends to misclassify mild pneumonia cases as normal when radiographic abnormalities are subtle, likely because its reliance on high-contrast features does not adequately capture the faint opacities or early-stage inflammatory changes characteristic of mild pneumonia. Future improvements should focus on active learning approaches in which the model is continuously updated with new, diverse patient cases, particularly from underrepresented populations and rare disease categories.
Furthermore, enhancing explainability mechanisms, such as attention-based heatmaps, case-based reasoning, and uncertainty quantification, can improve physician trust by providing clear, interpretable justifications for AI-generated decisions. Finally, real-world deployment has so far been limited to controlled hospital settings, the system lacks deep integration with electronic health records (EHR), and it has not yet been tested for multilingual or low-resource language support. Addressing these limitations will help ensure that CareAssist-GPT is not only highly accurate but also equitable, transparent, scalable, and adaptable to diverse clinical environments.

Future research directions

Building on the current findings, future research should explore the integration of additional data sources, such as patient medical history and genomic data, to enhance diagnostic accuracy further. Developing more advanced explainable AI (XAI) techniques could also improve the interpretability of the model, making it easier for clinicians to understand complex predictions. Furthermore, efforts should be made to optimize the model for lower-resource environments, enabling broader scalability and accessibility across diverse healthcare settings.

Handling noisy or inconsistent data

The model’s robustness to noisy or inconsistent real-world data was tested by introducing random noise and incomplete entries into the test dataset. While performance declined slightly, the model demonstrated resilience due to its robust preprocessing steps and the use of dropout layers and regularization techniques. Future enhancements could include incorporating advanced data denoising algorithms and developing adaptive learning strategies to handle variable data quality more effectively.

CareAssist-GPT employs multi-modal dropout techniques to effectively handle missing or incomplete data during inference, ensuring robustness in real-world clinical settings where patient records may be fragmented. When vital sign readings are absent, the model utilizes a temporal imputation strategy, estimating missing values based on historical trends from the patient’s past medical records and dynamically adjusting for variations in heart rate, blood pressure, and oxygen saturation levels. This approach enables the model to maintain continuity in time-series data, reducing the risk of misleading predictions due to incomplete inputs. Similarly, when clinical text records contain gaps, CareAssist-GPT integrates an NLP-driven, context-aware imputation system that reconstructs missing medical terms or incomplete physician notes by identifying semantically similar cases within the dataset. By leveraging pre-trained transformer embeddings, the system ensures that imputed text remains clinically relevant and contextually accurate, minimizing errors that could arise from textual ambiguity. These data-handling mechanisms not only enhance CareAssist-GPT’s adaptability in diverse healthcare environments but also improve prediction reliability, making it well-suited for real-time clinical decision-making and AI-assisted diagnostics.
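The temporal imputation strategy described above can be sketched with pandas' time-aware interpolation; the readings below are synthetic placeholders:

```python
import numpy as np
import pandas as pd

# Hourly heart-rate readings with gaps (synthetic, illustrative).
idx = pd.date_range("2024-01-01 08:00", periods=6, freq="h")
hr = pd.Series([88.0, np.nan, 92.0, np.nan, np.nan, 98.0], index=idx)

# Time-aware linear interpolation fills each gap from the
# surrounding trend, weighted by the actual time elapsed.
filled = hr.interpolate(method="time")
print(filled)
```

Linear interpolation is only a baseline; the paper's approach additionally conditions on historical trends from past records, but the principle of estimating a missing vital from its temporal neighbors is the same.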

Conclusions

CareAssist-GPT is a novel AI-driven diagnostic framework designed to integrate multimodal clinical data, including X-ray images, vital signs, and clinical text, to significantly improve diagnostic accuracy while enhancing patient communication. The model has shown marked improvements over traditional diagnostic systems, achieving a diagnostic accuracy of 95.8%, a response time of 500 ms, and a patient satisfaction score of 9.3/10. Unlike conventional CAD systems, which focus primarily on analytical performance, CareAssist-GPT addresses the broader clinical need for explainable, patient-friendly diagnostics. By simplifying medical terminology and delivering real-time conversational feedback, it increases patient engagement, improves comprehension, and supports better treatment adherence, ultimately contributing to improved health outcomes.

The model’s key innovation lies in its dual emphasis on diagnostic performance and patient-centered communication. This balance enables clinicians to rely on CareAssist-GPT for accurate insights while ensuring that patients are not left overwhelmed by technical jargon. The conversational interface acts as a bridge between clinical intelligence and user-friendly healthcare, fostering greater trust in AI-assisted medicine and potentially reducing the communication gap in physician-patient interactions.

As discussed earlier, several limitations remain that must be acknowledged. CareAssist-GPT has not yet been extensively tested on rare or highly complex medical cases. Its deployment has so far been limited to controlled environments with robust computational infrastructure, which may not reflect the challenges faced in lower-resource clinical settings. The model currently lacks deep EHR integration and full multilingual support, which could limit accessibility for patients with diverse linguistic or literacy backgrounds. Additionally, while efforts have been made to reduce bias and improve transparency, continuous validation across varied clinical contexts is required.

Future research directions, as outlined earlier, will focus on several high-impact areas. These include developing specialized language models trained on medical corpora to better interpret complex terminology, integrating emotion recognition to enhance empathetic communication, and incorporating additional data sources such as genomic data, medical history, and lifestyle factors for more personalized diagnostics. Efforts will also focus on optimizing computational performance through techniques such as model quantization and deploying cloud-based inference solutions, making the system more accessible to resource-constrained healthcare environments. Furthermore, establishing Fast Healthcare Interoperability Resources (FHIR)-compliant interfaces will be critical for seamless EHR integration and clinical decision support.

By addressing these limitations and pursuing these future directions, CareAssist-GPT can evolve into a comprehensive, adaptive, and patient-centered AI solution that scales across medical specialties and settings. Large-scale clinical trials and external validation from regulatory authorities will play a crucial role in confirming the model’s real-world reliability, fairness, and clinical safety. As the field of AI in healthcare continues to mature, ongoing work in language modeling, emotional intelligence, explainability, scalability, and ethical deployment will be key to realizing a new era of accessible, trustworthy, and human-centered AI diagnostics.

In conclusion, CareAssist-GPT represents a transformative step toward the integration of intelligent, interpretable, and empathetic AI in modern healthcare.
Its ability to deliver high-accuracy, real-time, and comprehensible diagnostic feedback has the potential to reshape how medicine is practiced, empowering patients, supporting clinicians, and fostering a more inclusive and effective healthcare system worldwide.