Fig. 5: Average Kendall’s tau of various methods for six target diseases.

Kendall’s tau is calculated from cosine similarities for MUGS and the four benchmark methods, and from relatedness scores ranging from 0 to 1 for GPT-3.5 and GPT-4. The manually corrected survey labels use a three-level scale: 0 (not related), 0.5 (maybe related), and 1 (strongly related).
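
As an illustration of the rank comparison described here, the following minimal sketch computes Kendall's tau between one method's scores and the three-level survey labels. It assumes the tau-b variant, which corrects for ties; the text does not say which variant was used, and ties are unavoidable when labels take only the values 0, 0.5, and 1. The function name `kendall_tau_b` and all numeric values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(scores, labels):
    """Kendall's tau-b between two equal-length sequences (tie-corrected).

    Assumption: tau-b is used here only for illustration; the source does
    not specify the variant.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    concordant = discordant = 0
    tied_scores_only = tied_labels_only = 0

    # Compare every unordered pair of items.
    for (s1, l1), (s2, l2) in combinations(zip(scores, labels), 2):
        ds, dl = s1 - s2, l1 - l2
        if ds == 0 and dl == 0:
            continue  # tied in both rankings: excluded from both factors
        if ds == 0:
            tied_scores_only += 1
        elif dl == 0:
            tied_labels_only += 1
        elif ds * dl > 0:
            concordant += 1
        else:
            discordant += 1

    untied = concordant + discordant
    # Pairs not tied in scores, times pairs not tied in labels.
    denom = sqrt((untied + tied_labels_only) * (untied + tied_scores_only))
    if denom == 0:
        return float("nan")  # one ranking is constant, tau is undefined
    return (concordant - discordant) / denom


# Hypothetical data for one target disease: six candidate concepts.
survey_labels = [1, 1, 0.5, 0.5, 0, 0]            # 0 / 0.5 / 1 scale
method_scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.1]    # e.g. cosine similarities

tau = kendall_tau_b(method_scores, survey_labels)
print(round(tau, 3))
```

Because Kendall's tau depends only on the ordering of the values, cosine similarities and 0 to 1 relatedness scores can be compared against the same labels without rescaling.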