Table 4 Performance (in %) on PPI datasets, reported as precision/recall/F-score (P/R/F).

From: The influence of prompt engineering on large language models for protein–protein interaction identification in biomedical literature

| PPI dataset | Raw data: GPT-3.5 | Raw data: GPT-4 | Raw data: Gemini 1.5 flash | Raw data: Gemini 1.5 pro | Refined data: GPT-3.5 | Refined data: GPT-4 | Refined data: Gemini 1.5 flash | Refined data: Gemini 1.5 pro |
|---|---|---|---|---|---|---|---|---|
| LLL | 86.3/73.2/79.2 | 81.5/93.9/87.3 | 84.0/83.5/85.8 | 87.9/95.4/90.3 | - | - | - | - |
| IEPA | 53.0/86.0/65.6 | 47.2/99.4/64.0 | 50.3/92.7/65.1 | 55.8/99.2/68.2 | 83.5/86.0/84.7 | 76.4/99.4/86.4 | 80.4/92.7/86.9 | 85.8/99.4/89.7 |
| HPRD50 | 57.4/71.2/63.6 | 48.4/94.5/64.0 | 52.0/83.0/63.2 | 60.8/97.6/67.5 | 80.6/71.2/75.6 | 76.6/94.5/84.6 | 78.3/89.7/83.3 | 82.2/96.8/87.1 |
| AIMed | 26.6/79.2/39.8 | 23.8/93.4/37.9 | 25.1/86.3/38.1 | 28.9/96.7/41.3 | 44.0/79.2/56.6 | 41.5/93.4/57.5 | 43.7/86.4/58.3 | 46.5/96.2/60.8 |
| BioInfer | 36.9/58.6/45.3 | 40.1/76.7/53.0 | 38.6/67.9/49.2 | 42.6/80.9/55.3 | 57.3/58.6/58.0 | 56.6/76.7/65.2 | 57.7/67.8/61.1 | 59.5/80.4/68.7 |
| PEDD | 34.8/62.7/44.8 | 56.3/81.0/66.4 | 45.6/72.8/55.4 | 60.6/88.8/70.2 | 52.2/62.7/57.0 | 64.8/81.0/72.0 | 58.3/72.7/65.4 | 68.4/88.6/75.2 |

  1. Bold values indicate highest performance scores in each comparison.
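For readers who want to work with the P/R/F cells programmatically, the following is a minimal sketch of how one cell could be parsed and how an F-score could be recomputed from precision and recall as their harmonic mean. The function names `parse_prf` and `f_score` are hypothetical and are not part of the original study. The recomputed value need not match the reported F in every cell, for instance if the reported scores were averaged over several runs (an assumption; the table does not say).

```python
def parse_prf(cell):
    """Split a 'P/R/F' table cell into three floats; return None for '-'."""
    cell = cell.strip()
    if cell == "-":
        return None  # no result reported for this dataset/model pair
    p, r, f = (float(x) for x in cell.split("/"))
    return p, r, f


def f_score(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Example: LLL, raw data, GPT-3.5
p, r, f = parse_prf("86.3/73.2/79.2")
print(round(f_score(p, r), 1))  # prints 79.2
```

Here the recomputed value agrees with the reported F of 79.2 for that cell.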