Fig. 1 | Scientific Reports

Fig. 1

From: Representation learning approach for understanding structured documents

Overview of the proposed DREEL (Document Relationship Entity Embedding Learner) framework. The architecture processes multi-modal inputs (images, bounding boxes, articles, and OCR text) through a dedicated Embedding Block. The Neighbourhood Encoding module utilizes self-attention mechanisms to capture contextual relationships between entities. These features are synthesized in the Feature Processing Block using a Vision Language Transformer and Feed-Forward Networks (FFN) to produce final outputs for detection and semantic structural congruence.
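The caption states that the Neighbourhood Encoding module applies self-attention across entities. As a rough illustration of that general mechanism only, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is not the DREEL implementation: the function names, the toy 4-dimensional embeddings, and the use of identity projections (queries, keys, and values are the raw embeddings, with no learned weights) are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities):
    """Scaled dot-product self-attention with identity projections.

    `entities` is a list of equal-length embedding vectors, one per
    document entity. Queries, keys, and values are the embeddings
    themselves (an assumption; a trained model would use learned
    projection matrices).
    """
    dim = len(entities[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in entities:
        # Similarity of this entity to every entity, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in entities
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Context vector: attention-weighted average of all embeddings.
        outputs.append([
            sum(a * value[i] for a, value in zip(attn, entities))
            for i in range(dim)
        ])
    return outputs, weights


# Hypothetical embeddings for three entities; the first two are similar.
entities = [
    [1.0, 0.0, 1.0, 0.0],
    [0.9, 0.1, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
contextual, weights = self_attention(entities)
```

Each row of `weights` sums to 1, and in this toy input the first entity attends more strongly to the second entity, which it resembles, than to the third. This is the sense in which self-attention can capture contextual relationships between neighbouring entities.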
