Fig. 1: An overview of the metrics proposed in the literature.
From: Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

a Existing intrinsic metrics, categorized into general LLM metrics and dialog metrics. b Existing extrinsic metrics for both general-domain and healthcare-specific evaluations.