Table 1 Strategies to mitigate the impact of hallucinations in large language models (LLMs)
From: The long but necessary road to responsible use of large language models in healthcare research
Strategy | Description |
---|---|
Pre-defined purpose | Where possible, LLMs should be tailored for specific use cases (e.g. information abstraction from pathology reports) to ensure that end-users clearly understand their intended applications and limitations. |
High-quality data | LLMs should be trained on domain-specific, representative, and factually accurate data. This data should not be limited to publicly available sources; efforts should also be made to include content behind paywalls or member-only subscriptions. |
Data templates | Data templates facilitate consistency and clarity in the data provided to and returned by the LLM, which may mitigate the risk of generating incorrect outputs (see the template-validation sketch after this table). |
Chain-of-verification | Chain-of-verification provides a structured approach in which the LLM verifies each output against a reliable data source (e.g. the original pathology report) before finalization. This process enables the LLM to detect and correct inconsistencies in its initial output (see the pipeline sketch after this table). |
Degree of uncertainty | Indicating the LLM’s confidence in its output allows end-users to better assess the reliability of the information provided and to determine whether additional verification is required (see the sampling-based sketch after this table). |
Response restrictions | Establishing “safe” boundaries for possible LLM outputs may mitigate the risk of generating incorrect or biased responses (see the guardrail sketch after this table). |
Human in the loop | Human oversight provides valuable domain-specific expertise and awareness of relevant social constructs when assessing LLM outputs, and serves as the final safeguard against hallucinations before outputs are put to their intended use. |
Updates | Processes should be established to continuously evaluate the accuracy and appropriateness of LLM responses. Updates should be provided as needed to ensure outputs remain aligned with current knowledge. |
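The data-template strategy can be made concrete with a small sketch. The Python snippet below is a minimal illustration under stated assumptions, not an implementation from the article: the `call_llm` helper, the pathology-report field names, and the prompt wording are all hypothetical. The idea shown is that the model is asked to fill a fixed template, and the output is validated against that template before it is accepted.

```python
import json

# Hypothetical stand-in for an LLM API call; replace with your own client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

# Fixed template: every extraction must return exactly these fields (illustrative names).
TEMPLATE_FIELDS = {
    "tumor_site": str,
    "histologic_grade": str,
    "margin_status": str,
}

PROMPT_TEMPLATE = """Extract the following fields from the pathology report.
Return ONLY a JSON object with exactly these keys: {keys}.
If a field is not stated in the report, use the value "not reported".

Report:
{report}
"""

def extract_with_template(report: str) -> dict:
    prompt = PROMPT_TEMPLATE.format(keys=list(TEMPLATE_FIELDS), report=report)
    raw = call_llm(prompt)
    data = json.loads(raw)  # fails loudly if the model did not return JSON

    # Enforce the template: correct keys and types, nothing extra.
    if set(data) != set(TEMPLATE_FIELDS):
        raise ValueError(f"Output keys {set(data)} do not match the template")
    for key, expected_type in TEMPLATE_FIELDS.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"Field '{key}' is not of type {expected_type.__name__}")
    return data
```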
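Chain-of-verification can likewise be sketched as a simple prompt pipeline: draft an answer, verify its claims against the source document, then revise. The code below is a hedged illustration of that general pattern only; the `call_llm` helper and the prompt wording are assumptions rather than a published implementation.

```python
# Minimal chain-of-verification sketch: draft -> verify against source -> revise.
# `call_llm` is a hypothetical single-prompt LLM call; swap in your own client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

def draft_answer(source_text: str, question: str) -> str:
    return call_llm(f"Source document:\n{source_text}\n\nQuestion: {question}\nAnswer:")

def verify_claims(source_text: str, answer: str) -> str:
    return call_llm(
        "List each factual claim in the answer below and state whether the "
        "source document supports it, contradicts it, or does not mention it.\n\n"
        f"Source document:\n{source_text}\n\nAnswer:\n{answer}"
    )

def revise_answer(question: str, answer: str, verification: str) -> str:
    return call_llm(
        "Rewrite the answer so that it only contains claims the verification "
        "report marks as supported by the source document.\n\n"
        f"Question: {question}\nDraft answer:\n{answer}\n"
        f"Verification report:\n{verification}"
    )

def chain_of_verification(source_text: str, question: str) -> str:
    draft = draft_answer(source_text, question)
    verification = verify_claims(source_text, draft)
    return revise_answer(question, draft, verification)
```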
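One simple, model-agnostic way to attach a degree of uncertainty to an output is to sample the model several times and report the level of agreement. The sketch below assumes a hypothetical `call_llm` that returns a short categorical answer (e.g. a grade or stage); the agreement fraction is only a rough proxy for confidence, not a calibrated probability.

```python
from collections import Counter

# Hypothetical LLM call returning a short categorical answer; replace with your own client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

def answer_with_confidence(prompt: str, n_samples: int = 5) -> tuple[str, float]:
    """Sample the model n_samples times and return the majority answer
    together with the fraction of samples that agree with it."""
    samples = [call_llm(prompt).strip().lower() for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples

# Example policy: route low-agreement answers to additional verification.
def needs_verification(agreement: float, threshold: float = 0.8) -> bool:
    return agreement < threshold
```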
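Response restrictions can be as simple as constraining the model to a closed set of allowed answers and rejecting anything outside it. The guardrail below is an assumed, minimal sketch around a hypothetical `call_llm` call, with an illustrative task and vocabulary; out-of-set answers fall back to a value that flags the output for human review.

```python
# Hypothetical LLM call; replace with your own client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

# Closed vocabulary of acceptable outputs for this (illustrative) task.
ALLOWED_RESPONSES = {"positive", "negative", "not reported"}
FALLBACK = "needs human review"

def restricted_response(report: str, finding: str) -> str:
    prompt = (
        f"Based only on the report below, is the finding '{finding}' positive, "
        f"negative, or not reported? Answer with exactly one of: "
        f"{', '.join(sorted(ALLOWED_RESPONSES))}.\n\nReport:\n{report}"
    )
    answer = call_llm(prompt).strip().lower()
    # Anything outside the allowed set is rejected rather than passed through.
    return answer if answer in ALLOWED_RESPONSES else FALLBACK
```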