Table 1 Recommendations for mitigating risks in LLM-based cancer decision-making
From: Large language model integrations in cancer decision-making: a systematic review and meta-analysis
Risk category | Risk description | Recommendations and sample approaches
---|---|---
Automation bias | Clinicians or patients may over-trust LLM-generated outputs without critical review. | Integrate LLMs within accountable clinical decision support systems that require human verification before model suggestions are adopted.
Lack of real-patient data | Many LLM evaluations rely on synthetic, curated, or small-sample datasets, limiting their relevance to real clinical practice. | Gather evaluation data through direct interaction with patients and clinical professionals.
Harm and safety monitoring | There is limited systematic evaluation of potential harms, adverse events, or unintended consequences resulting from LLM-generated clinical recommendations. | Develop robust safety and harm evaluation metrics and implement them in both pre-deployment validation and post-deployment monitoring. |
Data privacy and ethical oversight | Patient data may be exposed during model development, and many studies lack clear reporting on ethical review processes. | Strengthen ethical oversight by requiring institutional review board (IRB) review where applicable and adopting privacy-preserving machine learning practices.
Equity and representation | Non-representative training data risks reinforcing healthcare disparities. | Mandate demographic reporting and dataset audits in evaluation studies. |
Generalizability | Findings based on non-diverse datasets may not apply to broader clinical populations. | Validate LLM outputs across large and diverse patient populations before deployment.
Reproducibility | Limited sharing of datasets, LLM prompts, and evaluation protocols reduces transparency and prevents independent replication of LLM results. | Promote open access to datasets, model prompts, and evaluation benchmarks to strengthen reproducibility. |