Table 1 Performance of the open-source and closed-source base models using different prompting techniques to extract inclusion and exclusion biomarkers from free-text clinical trial documents

From: Enhancing biomarker based oncology trial matching using large language models

 

| Technique | Inclusion Precision | Inclusion Recall | Inclusion F2 | Exclusion Precision | Exclusion Recall | Exclusion F2 |
|---|---|---|---|---|---|---|
| GPT-3.5-Turbo (0S) | 0.61 | 0.42 | 0.45 | 0.02 | 0.18 | 0.06 |
| GPT-3.5-Turbo (PC) | 0.21 | 0.28 | 0.26 | 0.02 | 0.13 | 0.05 |
| GPT-3.5-Turbo (1S) | 0.46 | 0.60 | 0.56 | 0.06 | 0.21 | 0.14 |
| GPT-3.5-Turbo (2S) | 0.40 | 0.59 | 0.54 | 0.05 | 0.13 | 0.10 |
| GPT-4 (0S) | 0.55 | 0.56 | 0.56 | 0.47 | 0.41 | 0.42 |
| GPT-4 (PC) | 0.77 | 0.76 | 0.76 | **0.75** | 0.68 | **0.70** |
| Hermes-2-Pro-Mistral-7B (0S) | **1.00** | **0.97** | **0.98** | 0.42 | **0.77** | 0.66 |

  1. The prompting techniques are zero-shot (0S), where the prompt describes the input, output, and task; one-shot (1S), where one example is added to demonstrate the task; two-shot (2S), where two examples are given to illustrate the task; and prompt chaining (PC), which divides the task into subtasks, with the output of one prompt serving as input to the next. Here we chain two prompts: the first handles extraction, and its output is passed to the second, which pre-processes the biomarkers and structures them in the JSON output. We ran the Hermes-2-Pro-Mistral-7B model three times; the results in the table represent the consistent outcomes across these runs.
  2. Bold font indicates the best result across the different techniques.
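The two-prompt chain described in note 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `call_model` is a hypothetical stand-in for any chat-completion API, and the prompt wording is assumed.

```python
from typing import Callable

# Illustrative prompt templates; the actual study prompts are not reproduced here.
EXTRACT_PROMPT = (
    "Extract all inclusion and exclusion biomarkers mentioned in the "
    "following clinical trial eligibility text:\n\n{text}"
)
STRUCTURE_PROMPT = (
    "Pre-process the biomarkers below and return them as JSON with the keys "
    '"inclusion_biomarkers" and "exclusion_biomarkers":\n\n{biomarkers}'
)

def chain_prompts(call_model: Callable[[str], str], trial_text: str) -> str:
    """Run the two-step chain: extraction first, then structuring into JSON."""
    extracted = call_model(EXTRACT_PROMPT.format(text=trial_text))          # prompt 1
    structured = call_model(STRUCTURE_PROMPT.format(biomarkers=extracted))  # prompt 2
    return structured
```

Splitting extraction from structuring lets each prompt do one job, which is the rationale behind the PC rows in the table.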
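The F2 columns can be reproduced from the precision and recall columns with the standard F-beta formula at beta = 2, which weights recall more heavily than precision:

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 emphasizes recall over precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: GPT-4 (PC), inclusion biomarkers (precision 0.77, recall 0.76)
print(round(f_beta(0.77, 0.76), 2))  # → 0.76, matching the table
```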