Extended Data Fig. 4: Pseudocode for defense algorithm.
From: Medical large language models are vulnerable to data-poisoning attacks

First, knowledge triplets representing medical phrases are extracted from unstructured text using named entity recognition. Each triplet is flagged as invalid or harmful by default. Triplet components (origin, relation, target) are embedded and matched to the graph vocabulary to form candidate triplets. Each candidate triplet is cross-checked with the ground truth knowledge graph. Triplets that can be matched to the graph are marked as valid or non-harmful. A passage is scored non-harmful only if it contains no invalid triplets.
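The screening steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes triplets have already been extracted from the text, so the named entity recognition step is out of scope, and it replaces the embedding model with a toy character-trigram similarity. The names `embed`, `cosine`, `nearest`, `screen_passage` and the `threshold` value are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy stand-in for an embedding model (assumption): character-trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest(term, vocabulary, threshold):
    """Return the vocabulary entry most similar to `term`, or None if below threshold."""
    query = embed(term)
    best, best_score = None, -1.0
    for entry in vocabulary:
        score = cosine(query, embed(entry))
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


def screen_passage(raw_triplets, graph, threshold=0.6):
    """Return (passage_is_non_harmful, per-triplet validity flags).

    `graph` is a set of (origin, relation, target) tuples standing in for the
    ground truth knowledge graph; `threshold` is an arbitrary illustrative value.
    """
    entities = sorted({e for origin, _, target in graph for e in (origin, target)})
    relations = sorted({relation for _, relation, _ in graph})

    flags = []
    for origin, relation, target in raw_triplets:
        valid = False  # every triplet starts out flagged as invalid
        candidate = (
            nearest(origin, entities, threshold),
            nearest(relation, relations, threshold),
            nearest(target, entities, threshold),
        )
        # Mark valid only if all components map to the graph vocabulary
        # and the resulting candidate triplet exists in the graph.
        if None not in candidate and candidate in graph:
            valid = True
        flags.append(valid)

    # The passage is non-harmful only if it contains no invalid triplets.
    return all(flags), flags
```

With a graph containing only ("metformin", "treats", "type 2 diabetes") and ("aspirin", "treats", "headache"), a passage yielding the triplet ("metformin", "treats", "headache") is flagged: each component maps to the graph vocabulary, but the combined triplet is absent from the graph. A passage that yields no triplets at all passes in this sketch, since it contains no invalid triplets.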