Table 1 Strategies to mitigate the impact of hallucinations in large language models (LLMs)

From: The long but necessary road to responsible use of large language models in healthcare research

| Strategy | Description |
| --- | --- |
| Pre-defined purpose | Where possible, LLMs should be tailored for specific use cases (e.g. information abstraction from pathology reports) to ensure that end-users clearly understand their intended applications and limitations. |
| High-quality data | LLMs should be trained on domain-specific, representative, and factually accurate data. These data should not be limited to publicly available sources; efforts should also be made to include content behind paywalls or member-only subscriptions. |
| Data templates | Data templates facilitate data consistency and clarity, which may mitigate the risk of generating incorrect outputs (a minimal sketch follows the table). |
| Chain-of-verification | Chain-of-verification incorporates a structured approach for LLMs to verify each output against a reliable data source (e.g. the original pathology report) before finalization. This process enables LLMs to detect and correct any inconsistencies in their initial outputs (a minimal sketch follows the table). |
| Degree of uncertainty | Indicating the LLM’s confidence in its output allows end-users to better assess the reliability of the information provided and determine whether additional verification is required (a minimal sketch follows the table). |
| Response restrictions | Establishing “safe” boundaries for possible LLM outputs may mitigate the risk of generating incorrect or biased responses (a minimal sketch follows the table). |
| Human in the loop | Human oversight contributes domain-specific knowledge and awareness of social constructs when assessing LLM outputs, and serves as the final safeguard against hallucinations prior to their intended use. |
| Updates | Processes should be established to continuously evaluate the accuracy and appropriateness of LLM responses, with updates provided as needed to ensure outputs remain aligned with current knowledge. |

  1. Adapted from refs. 4,11,14,15.
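
To make the “Data templates” row concrete, the sketch below shows one way a template might constrain what an LLM is allowed to return when abstracting a pathology report. It is a minimal Python illustration, and the field names, allowed values, and validation rule are hypothetical rather than taken from the article.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical pathology-report template: field names and allowed values are
# illustrative only, not drawn from the article.
ALLOWED_STAGES = {"pT1", "pT2", "pT3", "pT4", "unknown"}

@dataclass
class PathologyAbstraction:
    tumour_site: Optional[str] = None       # free text, copied verbatim from the report
    histologic_type: Optional[str] = None   # free text, copied verbatim from the report
    pt_stage: str = "unknown"               # must be one of ALLOWED_STAGES
    margins_involved: Optional[bool] = None # True / False / None if not stated

    def validate(self) -> list[str]:
        """Return a list of template violations instead of silently accepting them."""
        problems = []
        if self.pt_stage not in ALLOWED_STAGES:
            problems.append(f"pt_stage '{self.pt_stage}' is not in the allowed set")
        return problems

# An LLM prompted to fill this template can only return these fields, which makes
# missing or fabricated values easier to detect downstream.
record = PathologyAbstraction(tumour_site="sigmoid colon", pt_stage="pT3")
assert record.validate() == []
print(asdict(record))
```

Constraining outputs to a fixed schema does not prevent hallucinated field values, but it makes them detectable and auditable.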
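The “Chain-of-verification” row can be sketched as a four-step loop: draft an answer, plan verification questions, answer them strictly against the source report, and revise. The generic `llm(prompt) -> str` callable and the prompt wording below are assumptions for illustration, not a prescribed implementation.

```python
from typing import Callable

def chain_of_verification(llm: Callable[[str], str], source_report: str, task: str) -> str:
    """Minimal chain-of-verification loop around a hypothetical `llm(prompt) -> str` callable.

    The prompts and the single revision pass are illustrative; a production pipeline
    would add output parsing, retries, and logging.
    """
    # 1. Draft an initial answer from the source document.
    draft = llm(f"{task}\n\nSource report:\n{source_report}")

    # 2. Ask the model to plan verification questions about its own draft.
    questions = llm(
        "List short factual questions that would verify every claim in this draft, "
        f"one per line:\n{draft}"
    )

    # 3. Answer each question using only the original report.
    checks = [
        llm(
            "Answer using ONLY the source report below. If it is not stated, say 'not stated'.\n"
            f"Question: {q}\nSource report:\n{source_report}"
        )
        for q in questions.splitlines() if q.strip()
    ]

    # 4. Revise the draft so it is consistent with the verified answers.
    return llm(
        "Revise the draft so that it agrees with the verified answers and drops any "
        f"unsupported claims.\nDraft:\n{draft}\nVerified answers:\n" + "\n".join(checks)
    )
```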
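For the “Degree of uncertainty” row, one simple (and admittedly crude) way to attach a confidence signal is self-consistency: query the model several times and report how often the modal answer recurs. The `sample` callable, sample count, and review threshold below are illustrative assumptions; low-agreement outputs are flagged for the human-in-the-loop step described in the table.

```python
from collections import Counter
from typing import Callable

def answer_with_confidence(sample: Callable[[], str], n_samples: int = 5,
                           review_threshold: float = 0.8) -> dict:
    """Crude self-consistency confidence estimate.

    `sample` is a hypothetical zero-argument callable that re-queries the LLM
    with the same prompt at a temperature above zero.
    """
    answers = [sample().strip() for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    confidence = count / n_samples
    return {
        "answer": top_answer,
        "confidence": confidence,
        # Low-agreement outputs are routed to human review rather than used directly.
        "needs_human_review": confidence < review_threshold,
    }
```

Agreement across samples is not a calibrated probability, but it gives end-users a concrete signal for deciding when additional verification is required.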
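For the “Response restrictions” row, a lightweight guardrail can map free-text output onto a closed set of allowed labels and convert anything outside that boundary into an explicit abstention. The label set below is hypothetical.

```python
import re

ALLOWED_RESPONSES = {"positive", "negative", "not stated"}  # illustrative label set

def restrict_response(raw_output: str) -> str:
    """Map a free-text LLM output onto a closed set of allowed labels.

    Anything outside the 'safe' boundary is replaced with an explicit abstention
    rather than being passed downstream as fact.
    """
    cleaned = re.sub(r"[^a-z ]", "", raw_output.lower()).strip()
    return cleaned if cleaned in ALLOWED_RESPONSES else "out of scope - needs review"

print(restrict_response("Negative."))         # -> "negative"
print(restrict_response("Probably benign?"))  # -> "out of scope - needs review"
```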