Fig. 3: Overview of the four healthcare evaluation metric groups.
From: Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

Accuracy metrics are scored based on domain and task type; trustworthiness metrics are evaluated according to user type; empathy metrics consider patients' needs (within the user type); and performance metrics are evaluated based on the three confounding variables. Together, the metrics identify the listed problems of healthcare chatbots. The size of each circle reflects the number of metrics that contribute to identifying that problem.