Fig. 4: Comparative analysis on model ratings. | Nature Communications

From: Towards building multilingual language model for medicine

a Score bars represent ranked scores under different metrics. The BLEU score rating is the rating score computed from the ranking by BLEU score. The human rating is computed from rankings provided by humans, and the GPT-4 rating from rankings generated by GPT-4. b The fitted lines show the correlation between the human rating results and the different automatic metrics. τ is the Kendall rank correlation coefficient and k is the slope of the fitted line. Source data are provided as a Source Data file.
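
For readers unfamiliar with the two statistics named in panel b, the following is a minimal sketch of how a Kendall rank correlation coefficient and the slope of a least-squares line can be computed for a pair of rating lists. It is not the authors' code. The function names, the toy rating values, the choice of the tau-a variant (ties count as neither concordant nor discordant), and the choice of placing the human rating on the x-axis are all assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Assumption: the tau-a variant is used here for simplicity; the figure
    may have been produced with a tie-corrected variant such as tau-b.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it counts toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def fitted_slope(x, y):
    """Slope k of the ordinary least-squares line y = k * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical ratings for five systems (made-up values, not from the paper).
human_rating = [1, 2, 3, 4, 5]
metric_rating = [1, 3, 2, 4, 5]

tau = kendall_tau(human_rating, metric_rating)
k = fitted_slope(human_rating, metric_rating)
print(f"tau = {tau:.2f}, k = {k:.2f}")
```

A τ close to 1 indicates that the automatic metric orders the systems almost the same way the human raters do, while k describes how steeply the metric rating changes as the human rating changes.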
