Fig. 5: Single modality and fusion model pipeline.
From: Leveraging multimodal machine learning for accurate risk identification of intimate partner violence

The tabular data include static demographic data and timestamped data such as diagnoses, medications, and vitals. The timestamped data are processed with time-series techniques that summarize how each feature varies over time (for example, its most recent value, mean, and maximum). The unstructured clinical texts are processed with a transformer-based clinical large language model, and the resulting fixed-length embedding vectors are used as inputs to the downstream models. For single-modality models, we train classifiers on either the tabular data or the embedding representations of the clinical texts. The HAIM fusion model takes as input the concatenation of the embeddings extracted from the tabular data and those of the clinical texts.
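
The feature construction described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `summarize_series` and `build_fused_input`, the example feature names, and all values are hypothetical, and the short list standing in for the text embedding is a placeholder for the output of the clinical language model. The sketch also assumes that the summary statistics themselves serve as the tabular representation and that every series has at least one reading.

```python
from statistics import mean


def summarize_series(readings):
    """Collapse (timestamp, value) pairs into [most recent, mean, max].

    Assumes at least one reading; these three statistics are the
    examples named in the caption, not an exhaustive list.
    """
    ordered = sorted(readings, key=lambda r: r[0])  # order by timestamp
    values = [value for _, value in ordered]
    return [values[-1], mean(values), max(values)]


def build_fused_input(static_features, series_by_name, text_embedding):
    """Concatenate tabular features with a text embedding (fusion input)."""
    tabular = list(static_features)
    # Iterate in sorted name order so the feature layout is deterministic.
    for name in sorted(series_by_name):
        tabular.extend(summarize_series(series_by_name[name]))
    return tabular + list(text_embedding)


# Hypothetical patient record (all values are illustrative).
static = [54.0, 1.0]  # e.g. age and an encoded demographic field
series = {
    "heart_rate": [(2, 88.0), (1, 72.0), (3, 80.0)],
    "systolic_bp": [(1, 120.0), (2, 130.0)],
}
text_embedding = [0.1, -0.3, 0.7, 0.2]  # placeholder for the LLM embedding

fused = build_fused_input(static, series, text_embedding)
print(len(fused))  # 2 static + 2 series * 3 statistics + 4 embedding dims
```

A single-modality model would be trained on only the tabular part or only the text embedding, while the fusion model would receive the full concatenated vector.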