Fig. 2: Overview of the ZETA framework.
From: Interpretable multimodal zero shot ECG diagnosis via structured clinical knowledge alignment

(1) ECG Observation Generation & Curation: Large language models (LLMs) generate candidate positive and negative observations for each diagnosis (e.g., 1AVB, first-degree atrioventricular block), which clinical experts then validate and refine. (2) Observation Comparison: A pre-trained multimodal model compares the ECG embedding against the embeddings of the expert-reviewed observations, and the resulting similarity scores are aggregated into a final likelihood score. Because each prediction can be traced back to the individual observations that contributed to it, the output is interpretable in terms of structured, clinically relevant diagnostic features.
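The Observation Comparison step in (2) can be illustrated with a minimal sketch. The caption does not specify the similarity measure or the aggregation rule, so everything here is an assumption made for illustration: cosine similarity, mean aggregation over the positive and negative observation sets, a sigmoid of the difference, and the `temperature` parameter. The function names `cosine` and `likelihood` are hypothetical, and the embeddings are toy vectors rather than outputs of a real multimodal model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def likelihood(ecg_emb, pos_embs, neg_embs, temperature=0.1):
    """Hypothetical aggregation: compare mean positive vs. mean negative similarity.

    Returns the likelihood score together with the per-observation
    similarities, so a prediction can be traced back to individual observations.
    """
    pos_sims = [cosine(ecg_emb, p) for p in pos_embs]
    neg_sims = [cosine(ecg_emb, n) for n in neg_embs]
    pos_score = sum(pos_sims) / len(pos_sims)
    neg_score = sum(neg_sims) / len(neg_sims)
    # Assumed rule: a sigmoid of the scaled difference maps the result into (0, 1).
    score = 1.0 / (1.0 + math.exp(-(pos_score - neg_score) / temperature))
    return score, pos_sims, neg_sims


# Toy embeddings (illustrative values only, not real model outputs).
ecg = [0.9, 0.1, 0.2]
positive_observations = [[1.0, 0.0, 0.1], [0.8, 0.2, 0.3]]
negative_observations = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]

score, pos_sims, neg_sims = likelihood(ecg, positive_observations, negative_observations)
print(f"likelihood: {score:.3f}")
print("positive similarities:", [round(s, 3) for s in pos_sims])
print("negative similarities:", [round(s, 3) for s in neg_sims])
```

Returning the per-observation similarities alongside the score is what makes the sketch interpretable: the observations with the highest similarity indicate which diagnostic features drove the prediction.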