Table 1 Performance (in %) of general prompts, reported as precision/recall/F1-score (P/R/F).

	LLL	IEPA	HPRD50	AIMed	BioInfer	Avg_perf
GPT-3.5
Prompt 1	50.0/28.6/36.4	100.0/7.1/13.3	55.0/84.6/66.7	52.2/92.3/66.7	66.7/58.8/62.5	64.8/54.3/49.1
Prompt 2	43.8/33.3/37.8	50.0/50.0/50.0	66.7/61.5/64.0	50.0/76.9/60.6	60.0/52.9/56.3	54.1/54.9/53.7
Prompt 3	57.7/71.4/63.8	50.0/64.3/56.3	56.5/100.0/72.2	44.0/84.6/57.9	50.0/58.8/54.1	51.6/75.8/60.9
Prompt 4	50.0/42.9/46.2	28.6/14.3/19.0	38.5/76.9/51.3	31.6/46.2/37.5	51.7/88.2/65.2	40.1/53.7/43.8
Prompt 5	70.0/66.7/68.3	63.2/85.7/72.7	66.7/92.3/77.4	57.9/84.6/68.8	61.9/76.5/68.4	69.9/81.2/71.1
GPT-4
Prompt 1	41.7/23.8/30.3	40.0/14.3/21.1	46.4/100.0/63.4	41.9/100.0/59.1	55.0/64.7/59.5	45.0/60.6/46.7
Prompt 2	50.0/42.9/46.2	60.0/64.3/62.1	66.7/61.5/64.0	52.4/84.6/64.7	52.9/52.9/52.9	56.4/61.2/58.0
Prompt 3	60.7/81.0/69.4	56.0/100.0/71.8	50.0/92.3/64.9	44.8/100.0/61.9	46.2/70.6/55.8	51.5/88.7/64.8
Prompt 4	53.8/100.0/70.0	47.1/57.1/51.6	40.6/100.0/57.8	34.2/100.0/51.0	44.8/76.5/56.5	44.5/86.7/57.4
Prompt 5	71.4/95.2/81.6	56.0/100.0/71.8	54.2/100.0/70.3	52.0/100.0/68.4	51.5/100.0/68.0	57.0/99.0/72.0
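
The Avg_perf column reads as an unweighted mean of the five per-corpus scores, and each F value as the harmonic mean of its P and R. The source does not state either definition, so both are assumptions here. A minimal sketch under those assumptions, using the GPT-4 Prompt 5 row; the helper names `parse_prf`, `f1`, and `macro_average` are hypothetical and not taken from the source:

```python
def parse_prf(cell: str) -> tuple:
    """Split a 'P/R/F' cell such as '71.4/95.2/81.6' into three floats."""
    p, r, f = (float(x) for x in cell.split("/"))
    return p, r, f


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def macro_average(cells: list) -> tuple:
    """Unweighted mean of P, R and F over the corpora, rounded to 0.1
    (assumed definition of Avg_perf)."""
    rows = [parse_prf(c) for c in cells]
    n = len(rows)
    return tuple(round(sum(col) / n, 1) for col in zip(*rows))


# GPT-4, Prompt 5: LLL, IEPA, HPRD50, AIMed, BioInfer (values copied from Table 1)
gpt4_prompt5 = [
    "71.4/95.2/81.6",
    "56.0/100.0/71.8",
    "54.2/100.0/70.3",
    "52.0/100.0/68.4",
    "51.5/100.0/68.0",
]

print(macro_average(gpt4_prompt5))  # (57.0, 99.0, 72.0)
print(round(f1(71.4, 95.2), 1))     # 81.6
```

For this row the computed averages match the printed Avg_perf entry of 57.0/99.0/72.0, and the recomputed F for LLL matches the printed 81.6.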
