
Fig. 2: Network architecture of IRENE.

From: A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics


a, Overall workflow of IRENE in the first task, that is, pulmonary disease identification. The input data consist of five parts: the chief complaint (ChiComp), laboratory test results (LabTest), the patient's sex and age (demographics), and a radiograph. Our MDT includes two bidirectional multimodal attention blocks and ten self-attention blocks. The training process is guided by pulmonary disease annotations provided by human experts. b, Encoding different types of clinical text in the free-form embedding. Specifically, IRENE accepts unstructured chief complaints as part of the input. c, Encoding a radiograph as a sequence of image patch tokens. d, Detailed design of a bidirectional multimodal attention block, which consists of two layer normalization layers (Norm), one bidirectional multimodal attention layer and one MLP. e, Detailed attention operations in the bidirectional multimodal attention layer, where representations across multiple modalities are learned and fused simultaneously. f, Detailed architecture of a self-attention block. PI, position injection.
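The sketch below illustrates, in PyTorch, the structure described in panels d and e: a pre-norm block containing two layer normalization (Norm) layers, one bidirectional multimodal attention layer and one MLP, in which image and text tokens attend both to themselves and to the other modality. All names (`BidirectionalBlock`, `img_tokens`, `txt_tokens`), the head count, dimensions and the averaging fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a bidirectional multimodal attention block (panels d, e).
# Assumptions: token dimensions, fusion by averaging and the per-modality MLPs
# are placeholders; they are not taken from the IRENE source code.
import torch
import torch.nn as nn


class BidirectionalMultimodalAttention(nn.Module):
    """Each modality attends to itself and to the other modality, so the two
    representations are learned and fused simultaneously (cf. panel e)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.img_self = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_self = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img, txt):
        # Image tokens attend to image tokens and to text tokens.
        img_out = 0.5 * (self.img_self(img, img, img)[0]
                         + self.img_cross(img, txt, txt)[0])
        # Text tokens attend to text tokens and to image tokens.
        txt_out = 0.5 * (self.txt_self(txt, txt, txt)[0]
                         + self.txt_cross(txt, img, img)[0])
        return img_out, txt_out


class BidirectionalBlock(nn.Module):
    """Pre-norm block (cf. panel d): Norm -> bidirectional multimodal
    attention -> residual, then Norm -> MLP -> residual, per modality."""

    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1_img, self.norm1_txt = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = BidirectionalMultimodalAttention(dim, num_heads)
        self.norm2_img, self.norm2_txt = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp_img = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                     nn.Linear(mlp_ratio * dim, dim))
        self.mlp_txt = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                     nn.Linear(mlp_ratio * dim, dim))

    def forward(self, img, txt):
        a_img, a_txt = self.attn(self.norm1_img(img), self.norm1_txt(txt))
        img, txt = img + a_img, txt + a_txt
        img = img + self.mlp_img(self.norm2_img(img))
        txt = txt + self.mlp_txt(self.norm2_txt(txt))
        return img, txt


if __name__ == "__main__":
    img_tokens = torch.randn(2, 196, 768)  # e.g. 14 x 14 radiograph patch tokens (panel c)
    txt_tokens = torch.randn(2, 40, 768)   # e.g. embedded clinical text and lab values (panel b)
    block = BidirectionalBlock(dim=768)
    img_tokens, txt_tokens = block(img_tokens, txt_tokens)
    print(img_tokens.shape, txt_tokens.shape)
```

In this reading, stacking two such blocks followed by self-attention blocks over the concatenated token sequence would mirror the overall MDT layout in panel a; the exact stacking and fusion details should be taken from the paper and its released code rather than from this sketch.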
