Fig. 1: Overview of the KnowRare framework.

KnowRare operates in three steps. (a) Data extraction and graph construction: Structured EHR data, including demographic data, vital signs, laboratory tests, diagnoses, and drug records, are extracted and preprocessed through aggregation, imputation, and normalisation. Condition similarities are quantified from three perspectives: diagnosis co-occurrence (categorical), record variable distributions (continuous), and shared drug usage (categorical). These similarities are integrated into a heterogeneous condition knowledge graph (KG) that captures clinical relationships among conditions. (b) Condition-level representation learning: This step involves two modules. The condition-agnostic pre-training module trains a time-series encoder via self-supervision to learn general temporal patterns independent of specific conditions, providing robust initial latent representations. Concurrently, the condition KG embedding module uses KG embedding techniques to generate condition embeddings that represent clinical similarities among conditions. (c) Rare condition adaptation: This step optimises patient-level representations. First, the knowledge-guided domain selection module identifies the top-k source conditions most similar to the target rare condition by calculating the cosine similarity between condition embeddings. Subsequently, the joint adversarial domain adaptation module fine-tunes the pre-trained time-series encoder on the target rare condition and the selected top-k source conditions. The encoder produces patient-level latent representations (\(h_T\)) that integrate condition-agnostic knowledge with insights derived from similar source conditions. Based on these latent representations, a classifier predicts clinical outcomes (\(\widehat{y}\); e.g. mortality, readmission, length of stay). Concurrently, a discriminator network is trained adversarially to distinguish whether the latent representations and predicted outcomes originate from patients with the target rare condition or from the selected source conditions. This adversarial process pushes the encoder towards robust representations, which are intended to improve predictive performance for heterogeneous rare conditions in the ICU. The figure was created using Microsoft PowerPoint.
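
The knowledge-guided domain selection step in the caption (ranking source conditions by cosine similarity to the target rare condition and keeping the top k) can be illustrated with a minimal sketch. The function name `top_k_source_conditions`, the condition names, and the three-dimensional embeddings are all hypothetical and chosen for illustration; they are not taken from KnowRare, whose condition embeddings come from the KG embedding module. The sketch also assumes that no embedding is the zero vector.

```python
import numpy as np


def top_k_source_conditions(embeddings, target, k):
    """Rank all non-target conditions by cosine similarity to `target`.

    `embeddings` maps a condition name to its embedding vector.
    Returns the k best (name, similarity) pairs, most similar first.
    Assumes every embedding has a non-zero norm.
    """
    names = [name for name in embeddings if name != target]
    t = np.asarray(embeddings[target], dtype=float)
    mat = np.stack([np.asarray(embeddings[n], dtype=float) for n in names])

    # Cosine similarity: dot product divided by the product of the norms.
    sims = mat @ t / (np.linalg.norm(mat, axis=1) * np.linalg.norm(t))

    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in order]


# Toy embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "rare_target": [1.0, 0.0, 0.0],
    "cond_a": [0.9, 0.1, 0.0],
    "cond_b": [0.0, 1.0, 0.0],
    "cond_c": [0.5, 0.5, 0.0],
    "cond_d": [-1.0, 0.0, 0.0],
}

selected = top_k_source_conditions(toy_embeddings, "rare_target", k=2)
for name, sim in selected:
    print(f"{name}: {sim:.3f}")
```

In this toy setting the two selected source conditions are `cond_a` and `cond_c`; in the framework described above, patients with the selected conditions would then serve as source domains during adversarial fine-tuning.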