Table 2 Results of human evaluation, comparing our method with ChatGPT and GPT-4 in terms of factuality, completeness, safety and preference

From: A collaborative large language model for drug analysis

| Metric | ChatGPT¹ wins | Tie (vs ChatGPT) | DrugGPT wins (vs ChatGPT) | GPT-4² wins | Tie (vs GPT-4) | DrugGPT wins (vs GPT-4) |
|---|---|---|---|---|---|---|
| Factuality | 24.0 | 16.0 | 60.0 | 33.0 | 20.0 | 47.0 |
| Completeness | 15.0 | 22.0 | 63.0 | 21.0 | 28.0 | 51.0 |
| Safety | 19.0 | 12.0 | 69.0 | 23.0 | 18.0 | 59.0 |
| Preference | 13.0 | 11.0 | 76.0 | 18.0 | 14.0 | 68.0 |

  1. All values are reported as percentages (%).
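
The arithmetic behind reading Table 2 can be sketched in a few lines of Python. The snippet below copies the percentages from the table, checks that each pairwise comparison (baseline wins, tie, DrugGPT wins) sums to 100, and computes the difference between the DrugGPT win rate and the baseline win rate. The names `TABLE_2`, `summarize` and the "net margin" quantity are illustrative assumptions introduced here; they are not a metric or code from the paper.

```python
# Percentages copied from Table 2, stored as
# (baseline wins, tie, DrugGPT wins) for each metric and baseline.
TABLE_2 = {
    "Factuality":   {"ChatGPT": (24.0, 16.0, 60.0), "GPT-4": (33.0, 20.0, 47.0)},
    "Completeness": {"ChatGPT": (15.0, 22.0, 63.0), "GPT-4": (21.0, 28.0, 51.0)},
    "Safety":       {"ChatGPT": (19.0, 12.0, 69.0), "GPT-4": (23.0, 18.0, 59.0)},
    "Preference":   {"ChatGPT": (13.0, 11.0, 76.0), "GPT-4": (18.0, 14.0, 68.0)},
}


def summarize(table):
    """Return (metric, baseline, net margin) rows.

    "Net margin" is a hypothetical summary used only in this sketch:
    DrugGPT win rate minus baseline win rate, in percentage points.
    """
    rows = []
    for metric, comparisons in table.items():
        for baseline, (baseline_wins, tie, druggpt_wins) in comparisons.items():
            # Each comparison should account for all judgements.
            total = baseline_wins + tie + druggpt_wins
            assert abs(total - 100.0) < 1e-9, (metric, baseline, total)
            rows.append((metric, baseline, druggpt_wins - baseline_wins))
    return rows


if __name__ == "__main__":
    for metric, baseline, margin in summarize(TABLE_2):
        print(f"{metric:<13} vs {baseline:<8} net margin: {margin:+.1f} points")
```

Running the script prints one line per metric and baseline; a positive margin means DrugGPT was judged better more often than the baseline in that comparison.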