Table 3 Summary of unified evaluation constructs

From: A scoping review of large language models for generative tasks in mental health care

Step	Higher-order construct	Lower-order construct	Definition	Examples	Article references
1	Safety, Privacy, and Fairness	Safety	Prevent worse outcomes for the patient, provider, or health system from occurring as a result of the use of an ML algorithm.	Outcome proxy appropriateness, Data provenance, Harm control, Reducing automation bias, Critical help, Ethics, etc.	20, 34
	Safety, Privacy, and Fairness	Privacy	Protect privacy according to standards like HIPAA and GDPR, ensuring user autonomy and dignity.	Data exchange, Data collection and storage, Data usage, Privacy Policy, Data protection, etc.	35
	Safety, Privacy, and Fairness	Fairness and bias management	Ensure the chatbot operates with minimized and acknowledged biases to produce fair outcomes.	Systemic Bias, Computational and Statistical Bias, Human-cognitive biases, Population bias, etc.	20
2	Trustworthiness and Usefulness	Beneficence	Ensure the chatbot positively impacts its intended outcomes, emphasizing measurable benefits over potential risks.	Health Outcomes, Clinical Evidence, User Behaviors, Intervention, Healthcare System, etc.	14, 15, 16, 18, 20, 21, 22, 35
	Trustworthiness and Usefulness	Generalizability	Apply learned patterns to new, unseen data.	Contextual Adaptability, Novel Data Performance, etc.	20, 34
	Trustworthiness and Usefulness	Reliability	Ensure that the chatbot consistently performs as intended under various conditions and maintains dependable operation over time.	Failure Prevention, Robustness, Workflow Integration, Reproducibility, Monitoring, Up-to-dateness, etc.	19, 48
	Trustworthiness and Usefulness	Validity	Ensure the chatbot performs as expected in real-world conditions.	Data Relevance and Credibility, Language Understanding, Information Retrieval Accuracy, Outcome Accuracy, Task Completion, etc.	20, 21, 26, 34
3	Design and Operational Effectiveness	Accessibility	Ensure those involved in the chatbot’s lifecycle uphold standards of auditability and harm minimization.	Versatile access, User literacy required, User experience, User Interface Design, Simplicity/Ease of Use, etc.	15, 16, 18, 20, 21, 26, 28, 32, 35
	Design and Operational Effectiveness	Personalized Engagement	Tailor responses based on patient data and preferences.	Personalization, Anthropomorphism/relationship, User Adherence, Feedback Incorporation, Progress awareness, etc.	18, 20, 23, 31, 32, 33, 34, 35
	Design and Operational Effectiveness	Cost-Effectiveness	Assess whether the chatbot delivers beneficial outcomes at a reasonable cost, providing a better or more economical solution compared to existing methods.	Comparative Effectiveness, Economical Viability, Environmental Viability, Task Efficiency, Workflow Considerations, etc.	20, 26, 34

Table 3 summarizes the primary (higher-order) and second-level (lower-order) constructs mapped across the reviewed studies. Examples of sub-constructs are included for each second-level construct to illustrate what it covers. Further details of the evaluation subjects, evaluation methods, sample sizes, scale names, original constructs, mapped second-level constructs, and levels associated with each article can be found in Supplementary Table 3. Practical evaluation questions related to each construct can be found in the original article.
Constructs were mapped to the second level to avoid excessive sparsity and granularity.
