Fig. 5: The overview of our method. | Nature Communications

From: Large-scale long-tailed disease diagnosis on radiology images

The figure has three parts, which show the proposed visual encoders, the fusion module, and the knowledge enhancement strategy, respectively. a The three types of visual encoder: ResNet-based, ViT-based, and ResNet-ViT-mixing. b The architecture of the transformer-based fusion module, which enables case-level information fusion. c The knowledge enhancement strategy. We first pre-train a text encoder on extra medical knowledge (synonyms, descriptions, and hierarchy) using contrastive learning. We then treat the text embeddings as a natural classifier to guide diagnosis classification.
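Panel c describes using text embeddings as a classifier. A common way to do this is a cosine-similarity classifier whose per-class weights are the L2-normalized text embeddings of the disease names. The sketch below illustrates the idea under stated assumptions and is not the authors' implementation. It assumes a single case-level visual feature already projected into the text-embedding space. The temperature (0.07), the per-class sigmoid (multi-label) output, the toy vectors, and all function and disease names are hypothetical.

```python
import math
from typing import Dict, List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def text_embedding_logits(
    visual_feature: List[float],
    class_text_embeddings: Dict[str, List[float]],
    temperature: float = 0.07,  # hypothetical value, not taken from the paper
) -> Dict[str, float]:
    """Treat each class's text embedding as the weight vector of a cosine classifier.

    logit_c = cos(visual_feature, text_embedding_c) / temperature
    Assumes visual_feature and every text embedding share the same dimension.
    """
    v = l2_normalize(visual_feature)
    logits: Dict[str, float] = {}
    for name, emb in class_text_embeddings.items():
        t = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(v, t))
        logits[name] = cosine / temperature
    return logits


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (assumed multi-label output)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for encoder outputs (hypothetical data).
    case_feature = [0.9, 0.1, 0.0]
    disease_text = {
        "pneumonia": [1.0, 0.0, 0.0],
        "fracture": [0.0, 1.0, 0.0],
        "pleural effusion": [-1.0, 0.0, 0.2],
    }
    for disease, logit in text_embedding_logits(case_feature, disease_text).items():
        print(f"{disease}: p = {sigmoid(logit):.3f}")
```

Because the classifier weights come from the text encoder rather than being learned from scratch per class, rare diseases whose names or descriptions are close to well-represented ones in the text space can borrow that structure, which is one plausible reading of why this helps in a long-tailed setting.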
