Table 3 Summary of unified evaluation constructs
From: A scoping review of large language models for generative tasks in mental health care
Step | Higher-order construct | Lower-order construct | Definition | Examples | Article references |
---|---|---|---|---|---|
1 | Safety, Privacy, and Fairness | Safety | Prevent worse outcomes for the patient, provider, or health system from occurring as a result of the use of an ML algorithm. | Outcome proxy appropriateness, Data provenance, Harm control, Reducing automation bias, Critical help, Ethics, etc. | |
1 | Safety, Privacy, and Fairness | Privacy | Protect privacy according to standards like HIPAA and GDPR, ensuring user autonomy and dignity. | Data exchange, Data collection and storage, Data usage, Privacy Policy, Data protection, etc. | |
1 | Safety, Privacy, and Fairness | Fairness and bias management | Ensure the chatbot operates with minimized and acknowledged biases so that outcomes are fair. | Systemic Bias, Computational and Statistical Bias, Human-cognitive biases, Population bias, etc. | |
2 | Trustworthiness and Usefulness | Beneficence | Ensure the chatbot positively impacts its intended outcomes, emphasizing measurable benefits over potential risks. | Health Outcomes, Clinical Evidence, User Behaviors, Intervention, Healthcare System, etc. | |
2 | Trustworthiness and Usefulness | Generalizability | Apply learned patterns to new, unseen data. | Contextual Adaptability, Novel Data Performance, etc. | |
2 | Trustworthiness and Usefulness | Reliability | Ensure that the chatbot consistently performs as intended under various conditions and maintains dependable operation over time. | Failure Prevention, Robustness, Workflow Integration, Reproducibility, Monitoring, Up-to-dateness, etc. | |
2 | Trustworthiness and Usefulness | Validity | Ensure the chatbot performs as expected in real-world conditions. | Data Relevance and Credibility, Language Understanding, Information Retrieval Accuracy, Outcome Accuracy, Task Completion, etc. | |
3 | Design and Operational Effectiveness | Accessibility | Ensure those involved in the chatbot’s lifecycle uphold standards of auditability and harm minimization. | Versatile access, User literacy required, User experience, User Interface Design, Simplicity/Ease of Use, etc. | |
3 | Design and Operational Effectiveness | Personalized Engagement | Tailor responses based on patient data and preferences. | Personalization, Anthropomorphism/relationship, User Adherence, Feedback Incorporation, Progress awareness, etc. | |
3 | Design and Operational Effectiveness | Cost-Effectiveness | Assess whether the chatbot delivers beneficial outcomes at a reasonable cost, providing a better or more economical solution compared to existing methods. | Comparative Effectiveness, Economical Viability, Environmental Viability, Task Efficiency, Workflow Considerations, etc. | |
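
The three-step hierarchy in the table could be used as a checklist when appraising a chatbot evaluation. The following is a minimal sketch of that idea, not a method from the review: the `Construct` class, the `CONSTRUCTS` list, and the `coverage_by_step` function are hypothetical names introduced here for illustration, and only the step numbers and construct names are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """One row of the table: step, higher-order and lower-order construct."""
    step: int
    higher_order: str
    lower_order: str


# Step numbers and construct names copied from the table; the data layout
# itself is an assumption made for this sketch.
CONSTRUCTS = [
    Construct(1, "Safety, Privacy, and Fairness", "Safety"),
    Construct(1, "Safety, Privacy, and Fairness", "Privacy"),
    Construct(1, "Safety, Privacy, and Fairness", "Fairness and bias management"),
    Construct(2, "Trustworthiness and Usefulness", "Beneficence"),
    Construct(2, "Trustworthiness and Usefulness", "Generalizability"),
    Construct(2, "Trustworthiness and Usefulness", "Reliability"),
    Construct(2, "Trustworthiness and Usefulness", "Validity"),
    Construct(3, "Design and Operational Effectiveness", "Accessibility"),
    Construct(3, "Design and Operational Effectiveness", "Personalized Engagement"),
    Construct(3, "Design and Operational Effectiveness", "Cost-Effectiveness"),
]


def coverage_by_step(addressed):
    """Return, per step, the fraction of lower-order constructs addressed.

    `addressed` is a set of lower-order construct names that a given
    evaluation reports on (hypothetical input for this sketch).
    """
    totals = {}
    hits = {}
    for c in CONSTRUCTS:
        totals[c.step] = totals.get(c.step, 0) + 1
        if c.lower_order in addressed:
            hits[c.step] = hits.get(c.step, 0) + 1
    return {step: hits.get(step, 0) / totals[step] for step in totals}


if __name__ == "__main__":
    # Example: an evaluation that reports on three of the ten constructs.
    print(coverage_by_step({"Safety", "Privacy", "Validity"}))
```

A per-step fraction is only one possible summary; it treats every lower-order construct as equally weighted, which the table itself does not state.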