Table 1 Performance (in %) of general prompts, reported as precision/recall/F1-score (P/R/F).
Model / Prompt | LLL | IEPA | HPRD50 | AIMed | BioInfer | Avg_perf |
---|---|---|---|---|---|---|
GPT-3.5 | | | | | | |
Prompt 1 | 50.0/28.6/36.4 | 100.0/7.1/13.3 | 55.0/84.6/66.7 | 52.2/92.3/66.7 | 66.7/58.8/62.5 | 64.8/54.3/49.1 |
Prompt 2 | 43.8/33.3/37.8 | 50.0/50.0/50.0 | 66.7/61.5/64.0 | 50.0/76.9/60.6 | 60.0/52.9/56.3 | 54.1/54.9/53.7 |
Prompt 3 | 57.7/71.4/63.8 | 50.0/64.3/56.3 | 56.5/100.0/72.2 | 44.0/84.6/57.9 | 50.0/58.8/54.1 | 51.6/75.8/60.9 |
Prompt 4 | 50.0/42.9/46.2 | 28.6/14.3/19.0 | 38.5/76.9/51.3 | 31.6/46.2/37.5 | 51.7/88.2/65.2 | 40.1/53.7/43.8 |
Prompt 5 | 70.0/66.7/68.3 | 63.2/85.7/72.7 | 66.7/92.3/77.4 | 57.9/84.6/68.8 | 61.9/76.5/68.4 | 69.9/81.2/71.1 |
GPT-4 | | | | | | |
Prompt 1 | 41.7/23.8/30.3 | 40.0/14.3/21.1 | 46.4/100.0/63.4 | 41.9/100.0/59.1 | 55.0/64.7/59.5 | 45.0/60.6/46.7 |
Prompt 2 | 50.0/42.9/46.2 | 60.0/64.3/62.1 | 66.7/61.5/64.0 | 52.4/84.6/64.7 | 52.9/52.9/52.9 | 56.4/61.2/58.0 |
Prompt 3 | 60.7/81.0/69.4 | 56.0/100.0/71.8 | 50.0/92.3/64.9 | 44.8/100.0/61.9 | 46.2/70.6/55.8 | 51.5/88.7/64.8 |
Prompt 4 | 53.8/100.0/70.0 | 47.1/57.1/51.6 | 40.6/100.0/57.8 | 34.2/100.0/51.0 | 44.8/76.5/56.5 | 44.5/86.7/57.4 |
Prompt 5 | 71.4/95.2/81.6 | 56.0/100.0/71.8 | 54.2/100.0/70.3 | 52.0/100.0/68.4 | 51.5/100.0/68.0 | 57.0/99.0/72.0 |
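
The sketch below illustrates how the P/R/F triples in Table 1 relate to one another. It rests on two assumptions that the table itself does not state: that F is the balanced F1-score (the harmonic mean of P and R), and that Avg_perf is the unweighted mean of each metric over the five corpora. Both assumptions are checked here only against the GPT-3.5 Prompt 1 row; the helper names `f1`, `parse_cell`, and `macro_average` are hypothetical and chosen for illustration.

```python
def f1(p: float, r: float) -> float:
    """Balanced F-score (harmonic mean of precision and recall), in %."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def parse_cell(cell: str) -> tuple[float, float, float]:
    """Split a 'P/R/F' table cell into three floats."""
    p, r, f = (float(x) for x in cell.split("/"))
    return p, r, f


def macro_average(cells: list[str]) -> tuple[float, ...]:
    """Unweighted mean of P, R and F across corpora, rounded to one decimal."""
    scores = [parse_cell(c) for c in cells]
    return tuple(round(sum(col) / len(col), 1) for col in zip(*scores))


# GPT-3.5, Prompt 1: LLL, IEPA, HPRD50, AIMed, BioInfer (copied from Table 1)
row = [
    "50.0/28.6/36.4",
    "100.0/7.1/13.3",
    "55.0/84.6/66.7",
    "52.2/92.3/66.7",
    "66.7/58.8/62.5",
]

# Each reported F should match F1(P, R) up to rounding of the inputs.
for p, r, f in map(parse_cell, row):
    assert abs(f1(p, r) - f) < 0.1

avg = macro_average(row)
print(avg)  # (64.8, 54.3, 49.1), matching the Avg_perf cell for this row
```

Note that the average F is the mean of the per-corpus F values, not the F1 of the averaged P and R, which is why a row can show an average F below both its average P and average R.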