Table 1 Recommendations for mitigating risks in LLM-based cancer decision-making

From: Large language model integrations in cancer decision-making: a systematic review and meta-analysis

| Risk category | Risk description | Recommendations and sample approaches |
| --- | --- | --- |
| Automation bias | Clinicians or patients may over-trust LLM-generated outputs without critical review. | Integrate LLMs within accountable clinical decision support systems that require mandatory human verification before adoption of model suggestions (see the verification-gate sketch below the table). |
| Lack of real-patient data | Many LLM evaluations rely on synthetic, curated, or small-sample datasets, limiting their relevance to real clinical practice. | Collect evaluation data through direct interaction with patients and clinical professionals. |
| Harm and safety monitoring | Potential harms, adverse events, and unintended consequences of LLM-generated clinical recommendations receive little systematic evaluation. | Develop robust safety and harm evaluation metrics and apply them in both pre-deployment validation and post-deployment monitoring (see the harm-rate sketch below the table). |
| Data privacy and ethical oversight | Patient data may be exposed during model development, and many studies lack clear reporting on ethical review processes. | Strengthen ethical oversight by requiring institutional review board (IRB) review where applicable and adopting privacy-preserving machine learning practices. |
| Equity and representation | Non-representative training data risks reinforcing healthcare disparities. | Mandate demographic reporting and dataset audits in evaluation studies (see the audit sketch below the table). |
| Generalizability | Findings based on non-diverse datasets may not apply to broader clinical populations. | Validate LLM outputs across large and diverse patient populations before deployment. |
| Reproducibility | Limited sharing of datasets, LLM prompts, and evaluation protocols reduces transparency and prevents independent replication of LLM results. | Promote open access to datasets, model prompts, and evaluation benchmarks to strengthen reproducibility. |
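
To make the automation-bias recommendation concrete, the following is a minimal sketch of a human-verification gate, assuming a hypothetical decision-support workflow in which every LLM suggestion must carry a documented clinician sign-off before it can be acted on. All class, field, and function names are illustrative and are not drawn from the review.

```python
# Minimal sketch of a human-verification gate for LLM suggestions; the workflow,
# class names, and fields are hypothetical illustrations, not a published design.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LlmSuggestion:
    patient_id: str
    suggestion: str                      # raw LLM output, e.g. a proposed regimen
    model_version: str
    reviewed_by: Optional[str] = None    # clinician identifier once verified
    reviewed_at: Optional[datetime] = None
    accepted: Optional[bool] = None


def record_clinician_review(s: LlmSuggestion, clinician_id: str, accepted: bool) -> LlmSuggestion:
    """Attach an explicit clinician decision to an LLM suggestion."""
    s.reviewed_by = clinician_id
    s.reviewed_at = datetime.now(timezone.utc)
    s.accepted = accepted
    return s


def release_to_chart(s: LlmSuggestion) -> str:
    """Refuse to act on any suggestion that lacks documented human verification."""
    if s.reviewed_by is None or s.accepted is not True:
        raise PermissionError("LLM suggestion not verified by a clinician; blocking adoption.")
    return f"{s.suggestion} (verified by {s.reviewed_by}, model {s.model_version})"
```

The design point is simply that `release_to_chart` raises on any unverified suggestion, so the human check cannot be bypassed silently.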
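
For the harm and safety monitoring row, one possible evaluation metric is the proportion of LLM recommendations graded as harmful by expert reviewers, reported with a binomial confidence interval and checked against a preset tolerance during post-deployment monitoring. The grading scheme, the 2% tolerance, and the function names below are assumptions for illustration only.

```python
# Illustrative sketch of one possible safety metric: the rate of LLM recommendations
# graded as harmful, with a 95% Wilson score interval; thresholds are assumed values.
from math import sqrt


def wilson_interval(harmful: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    if total == 0:
        return (0.0, 1.0)
    p = harmful / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def harm_alert(harmful: int, total: int, tolerance: float = 0.02) -> bool:
    """Flag for review if the lower bound of the harm rate exceeds a preset tolerance."""
    low, _high = wilson_interval(harmful, total)
    return low > tolerance


# Example: 7 recommendations graded harmful out of 250 reviewed cases.
low, high = wilson_interval(7, 250)
print(f"harm rate 95% CI: {low:.3f} to {high:.3f}, alert={harm_alert(7, 250)}")
```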
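
For the equity and representation row, a dataset audit could compare each demographic group's share of the evaluation dataset with its share in a reference population such as a cancer registry. The sketch below assumes evaluation records are available as a pandas DataFrame; the column names and reference proportions are hypothetical placeholders.

```python
# Minimal sketch of a demographic dataset audit; columns, reference shares, and the
# toy records are hypothetical examples, not data from the reviewed studies.
import pandas as pd

# Hypothetical reference distribution (e.g. from a cancer registry) to compare against.
REFERENCE_SEX = {"female": 0.50, "male": 0.50}

records = pd.DataFrame(
    {
        "patient_id": ["a1", "a2", "a3", "a4"],
        "sex": ["female", "female", "male", "female"],
        "age_group": ["40-59", "60+", "60+", "40-59"],
    }
)


def representation_gap(df: pd.DataFrame, column: str, reference: dict[str, float]) -> pd.DataFrame:
    """Report each group's share of the dataset next to its reference share."""
    observed = df[column].value_counts(normalize=True)
    rows = []
    for group, ref_share in reference.items():
        obs_share = float(observed.get(group, 0.0))
        rows.append({"group": group, "dataset_share": obs_share,
                     "reference_share": ref_share, "gap": obs_share - ref_share})
    return pd.DataFrame(rows)


print(representation_gap(records, "sex", REFERENCE_SEX))
```

Reporting the per-group gap alongside the raw shares makes under-represented groups visible at a glance, which is the point of mandating such audits in evaluation studies.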