Fig. 5: Construction and demonstration of Link-Eval, an automated evaluation system for evidence-traceable text.

a Example of the citation-based text format generated by LINS. b Evaluation dimensions of Link-Eval. c Process for calculating the correct citation set, correct citations, and valid citations. d Example calculation of citation precision and citation recall scores. e Process for calculating statement correctness and statement fluency. f Test accuracy of the NLI models, a clinical physician, and a lay user on the MedNLI-mini dataset. g Cramér’s V correlation scores between the NLI models, the physician, the lay user, and the gold labels on the MedNLI-mini dataset. h Evaluation results of LINS on PubMedQA* and HealthSearchQA, obtained with Link-Eval. Source data are provided as a Source Data file.
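Panel g compares each rater's NLI labels with the gold labels using Cramér’s V. The caption does not state which variant of the statistic was used, so the following is only a minimal sketch of the standard, uncorrected Cramér’s V, V = sqrt(χ² / (n · (min(r, c) − 1))), computed from two equal-length sequences of categorical labels. The function name `cramers_v` and the example label strings are hypothetical. The convention of returning 0.0 when either variable takes only one value is an assumption made here, because V is undefined in that case.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramér's V between two equal-length sequences of categorical labels.

    Hypothetical helper for illustration; not taken from the Link-Eval code.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed counts for each (x label, y label) cell
    row = Counter(x)            # marginal counts of x labels
    col = Counter(y)            # marginal counts of y labels

    # Pearson chi-squared statistic over the full r x c contingency table.
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col))
    if k < 2:
        # Assumption: treat a constant variable as having no association.
        return 0.0
    return sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical labels: E = entailment, N = neutral, C = contradiction.
    gold = ["E", "N", "C", "E", "N", "C"]
    rater = ["E", "N", "C", "E", "C", "C"]
    print(cramers_v(rater, gold))
```

Because V depends only on the contingency table, it measures association rather than exact agreement: a rater who systematically swaps two labels would still score 1.0. Accuracy, as in panel f, is the complementary measure of exact agreement.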