Table 1 Performance (in %) of general prompts, reported as precision/recall/F1 score (P/R/F).

From: The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature

 

Model    Prompt    LLL              IEPA             HPRD50           AIMed            BioInfer         Avg_perf
GPT-3.5  Prompt 1  50.0/28.6/36.4   100.0/7.1/13.3   55.0/84.6/66.7   52.2/92.3/66.7   66.7/58.8/62.5   64.8/54.3/49.1
GPT-3.5  Prompt 2  43.8/33.3/37.8   50.0/50.0/50.0   66.7/61.5/64.0   50.0/76.9/60.6   60.0/52.9/56.3   54.1/54.9/53.7
GPT-3.5  Prompt 3  57.7/71.4/63.8   50.0/64.3/56.3   56.5/100.0/72.2  44.0/84.6/57.9   50.0/58.8/54.1   51.6/75.8/60.9
GPT-3.5  Prompt 4  50.0/42.9/46.2   28.6/14.3/19.0   38.5/76.9/51.3   31.6/46.2/37.5   51.7/88.2/65.2   40.1/53.7/43.8
GPT-3.5  Prompt 5  70.0/66.7/68.3   63.2/85.7/72.7   66.7/92.3/77.4   57.9/84.6/68.8   61.9/76.5/68.4   69.9/81.2/71.1
GPT-4    Prompt 1  41.7/23.8/30.3   40.0/14.3/21.1   46.4/100.0/63.4  41.9/100.0/59.1  55.0/64.7/59.5   45.0/60.6/46.7
GPT-4    Prompt 2  50.0/42.9/46.2   60.0/64.3/62.1   66.7/61.5/64.0   52.4/84.6/64.7   52.9/52.9/52.9   56.4/61.2/58.0
GPT-4    Prompt 3  60.7/81.0/69.4   56.0/100.0/71.8  50.0/92.3/64.9   44.8/100.0/61.9  46.2/70.6/55.8   51.5/88.7/64.8
GPT-4    Prompt 4  53.8/100.0/70.0  47.1/57.1/51.6   40.6/100.0/57.8  34.2/100.0/51.0  44.8/76.5/56.5   44.5/86.7/57.4
GPT-4    Prompt 5  71.4/95.2/81.6   56.0/100.0/71.8  54.2/100.0/70.3  52.0/100.0/68.4  51.5/100.0/68.0  57.0/99.0/72.0

  1. Bold values indicate the highest performance scores in each comparison.
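The following is a minimal sketch, not the authors' evaluation code, of how the P/R/F cells relate to each other. It assumes that F is the standard F1 score (the harmonic mean of precision and recall) and that Avg_perf is the plain arithmetic mean of each metric over the five corpora, with F averaged directly rather than recomputed from the averaged P and R; both assumptions are consistent with the GPT-3.5 Prompt 1 row. The names `f1`, `avg_perf`, and `GPT35_PROMPT1` are hypothetical.

```python
from statistics import mean

def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (both given in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# GPT-3.5, Prompt 1, copied from Table 1 as (P, R, F) per corpus.
GPT35_PROMPT1 = {
    "LLL": (50.0, 28.6, 36.4),
    "IEPA": (100.0, 7.1, 13.3),
    "HPRD50": (55.0, 84.6, 66.7),
    "AIMed": (52.2, 92.3, 66.7),
    "BioInfer": (66.7, 58.8, 62.5),
}

def avg_perf(rows):
    """Assumed Avg_perf: average P, R and F separately over the corpora.

    F is averaged directly from the per-corpus cells, not recomputed
    from the averaged P and R (an assumption, not stated in the table).
    """
    ps, rs, fs = zip(*rows.values())
    return tuple(round(mean(col), 1) for col in (ps, rs, fs))

if __name__ == "__main__":
    for corpus, (p, r, f) in GPT35_PROMPT1.items():
        print(f"{corpus:8s} reported F={f:5.1f}  recomputed F={f1(p, r):5.1f}")
    print("Avg_perf (P, R, F):", avg_perf(GPT35_PROMPT1))
```

For this row the recomputed F values agree with the reported ones to one decimal place, and the averaged triple matches the Avg_perf cell 64.8/54.3/49.1. Averaging F directly explains why an Avg_perf F value can sit well below the F1 of its own averaged P and R.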