Fig. 3
From: Leveraging multimodal large language model for multimodal sequential recommendation

Schematic diagram of the MLLM-SRec framework. (a) Construction of dynamic user multimodal interaction sequences using a specified window size and step. (b) Building on VQA-based item understanding, the Item Multimodal Summary Generator Unit produces a multimodal summary of each item. (c) The proposed MLLM-based sequential recommendation (SR) framework, built on the base model.
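
Panel (a) can be read as a sliding-window split of a user's interaction history. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `build_windows`, the parameters `window` and `step`, and the handling of histories shorter than one window are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_windows(interactions: Sequence[T], window: int, step: int) -> List[List[T]]:
    """Split a chronologically ordered interaction history into sub-sequences.

    Assumption: a plain sliding window of `window` items that advances by
    `step` items; the caption does not specify the exact scheme.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(interactions) <= window:
        # Assumption: a history shorter than one window is kept as a single sequence.
        return [list(interactions)] if len(interactions) > 0 else []
    last_start = len(interactions) - window
    return [list(interactions[s:s + window]) for s in range(0, last_start + 1, step)]


# Hypothetical item identifiers, ordered by interaction time.
history = ["i1", "i2", "i3", "i4", "i5"]
sequences = build_windows(history, window=3, step=2)
```

With a history of five items, a window of 3 and a step of 2, this yields two overlapping sub-sequences: the first three items and the last three items.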