Table 1 Comparison of running time and cost for rationale assessment among text-similarity metrics, LLM-w-Rationale, and expert evaluation on the MedThink-Bench dataset
| Evaluation | Time (minutes) | Cost ($) |
|---|---|---|
| Text-similarity metric | 9.0 ± 0.6 | 0 |
| LLM-w-Rationale (GPT-4o-mini) | 51.8 ± 5.6 | 0.8 ± 0.1 |
| LLM-w-Rationale (HuatuoGPT-o1-70B) | 310.7 ± 14.1 | 0 |
| LLM-w-Rationale (MedGemma-27B) | 74.8 ± 8.9 | 0 |
| Expert evaluation | 3708.3 ± 175.3 | NA |