Table 3 Examples of inconsistent, missing, and hallucinated responses
| Type | Example |
|---|---|
| 1. Inconsistency | Example 1. GPT-3 zero-shot on ChemProt<br>Required output: one of the six relation types: CPR:3, CPR:4, CPR:5, CPR:6, CPR:9, or false<br>Actual and expected output: CPR:9 [the expected output]<br>Inconsistent output 1: SUBSTRATE [it might refer to CPR:9]<br>Inconsistent output 2: SUBSTRATE PRODUCT OF [it might refer to CPR:9]<br><br>Example 2. LLaMA 2 zero-shot on MedQA<br>Required output: a multiple-choice option from A to E<br>Actual and somewhat expected output: I’m happy to help! Based on the information provided, the correct answer is E. Age above 50 years. Aspirin is contraindicated… [it directly provided the option at the start, which is similar to the expected output]<br>Inconsistent output 1: I’m happy to help! Based on the patient’s symptoms and physical examination findings, the most likely diagnosis is pyloric stenosis. … [it did not provide the answer option]<br>Inconsistent output 2: Great! Let’s break down the question and the options. The question is asking about an enzyme deficiency that is causing the symptoms described in the scenario. … Therefore, option E, …, is the correct answer. [it went through all the options and provided the answer option only at the end]<br>Inconsistent output 3: Great question! To confirm the most likely diagnosis of this patient, we need to consider the following options: Options A, C, and D … Therefore, the correct answer is: E. … It is important to consider all possible causes… |
| 2. Missingness | Example 1. GPT-3.5 one-shot on MedQA<br>Required output: a multiple-choice option from A to E<br>Actual and expected output: C [the expected output]<br>Missing output 1: [no answer] |
| 3. Hallucinations | Example 1. LLaMA 2 zero-shot on HoC<br>Required output: one or more document topic labels out of 10 in total, separated by semicolons<br>Actual and somewhat expected output: Based on the input article, the following topics are relevant:<br>* Sustaining proliferative signaling<br>* Resisting cell death<br>* Enabling replicative immortality<br>* Activating invasion and metastasis<br>* Genomic instability and mutation<br>* Tumor promoting inflammation<br>* Cellular energetics<br>* Avoiding immune destruction<br>Please note that the article is related to multiple topics, and the output is a semicolon-separated list of relevant topics. [it provides the topics in a form close to the expected format, though not separated by semicolons]<br>Hallucinated output 1: Please provide the list of topics that are relevant to the input article.<br><br>Example 2. LLaMA 2 zero-shot on MedQA<br>Required output: a multiple-choice option from A to E<br>Actual and expected output: an example is provided above<br>Hallucinated output 1: Great! You have selected the correct answer. Let me explain why. …<br>Hallucinated output 2: That’s correct! Tetralogy of Fallot is a congenital heart defect …<br>Hallucinated output 3: Great question! Based on the patient’s symptoms and physical examination findings, the most likely impaired structure is the ________________. … [it asks to fill in the blank]<br>Hallucinated output 4: Please select one of the options from A to E. |
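
The MedQA examples in Table 3 suggest how raw responses could be triaged automatically before manual review. The following is a minimal sketch, not the authors' procedure: the function name `categorize_response`, the category labels, and the regular expressions are all hypothetical, and the patterns are heuristics that can misfire (for instance on "the answer is A patient …"). It cannot detect hallucinations by itself; anything without a single recoverable option is only flagged for review.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns (assumptions for illustration only).
# A response that is just the option letter, e.g. "C", "(C)" or "C."
OPTION_ONLY = re.compile(r"^\s*\(?([A-E])\)?[.:]?\s*$")
# An option letter announced inside free text, e.g. "the correct answer is E."
# The cue words are case-insensitive; the option letter must be uppercase.
OPTION_IN_TEXT = re.compile(r"(?i:answer is|option)[:\s]*\(?([A-E])\b")


def categorize_response(response: str) -> Tuple[str, Optional[str]]:
    """Return (category, option) for one multiple-choice response.

    Categories (hypothetical labels loosely following Table 3):
      "expected"     - the response is exactly one option letter
      "inconsistent" - one option is recoverable from free text
      "missing"      - the response is empty
      "needs_review" - no single option found (possibly hallucinated)
    """
    text = response.strip()
    if not text:
        return ("missing", None)

    exact = OPTION_ONLY.match(text)
    if exact:
        return ("expected", exact.group(1))

    # Collect every distinct option letter announced in the text.
    found = {m.group(1) for m in OPTION_IN_TEXT.finditer(text)}
    if len(found) == 1:
        return ("inconsistent", found.pop())

    return ("needs_review", None)
```

Under these assumptions, a response such as "Please select one of the options from A to E." yields no recoverable option and is routed to manual review, while a verbose response that states one option is kept with a flag marking the format deviation.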