Abstract
Multimodal Sentiment Analysis (MSA) aims to fuse information from multiple modalities to achieve precise sentiment classification. Recently, uncertain missing modalities have emerged as a new challenge in MSA. Previous studies attempt to address this issue by building information interactions over pairs of modalities to compensate for missing information. However, such pairwise representations struggle to reconstruct the true cross-modal semantics because they lack guidance from the third modality. In addition, existing approaches underuse the text modality, and their model complexity is relatively high. To tackle these issues, we propose a sequential translation-based MSA model (STMSA) with two key designs. First, a text-centric bidirectional translation mechanism exploits the dominant role of the text modality in affective tasks and sequentially establishes bidirectional mappings between text and the audio and video modalities. Guided by textual semantics, this mechanism explores the deep connections among the three modalities and yields cross-modal representations that align more closely with real affective distributions. Second, a low-complexity modality-completion architecture performs distribution fitting on joint representations in a shared space using only an encoder-decoder, thereby avoiding complex missing-modality generation processes. Extensive experiments on two public datasets, CMU-MOSI and IEMOCAP, show that the proposed model outperforms 10 state-of-the-art baselines.
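To make the two design ideas concrete, the following PyTorch-style sketch illustrates a text-centric bidirectional translation step and an encoder-decoder completion in a shared space. It is a minimal illustration assuming utterance-level feature vectors of equal dimension; the module names, the zero-vector convention for a missing modality, and the loss terms are hypothetical and do not reproduce the authors' STMSA implementation.

# Minimal sketch of the two ideas described in the abstract.
# All names, dimensions, and loss choices are illustrative assumptions,
# not the authors' released STMSA code.
import torch
import torch.nn as nn

class Translator(nn.Module):
    """Maps one modality's representation toward another modality's space."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class STMSASketch(nn.Module):
    def __init__(self, dim: int = 128, num_classes: int = 2):
        super().__init__()
        # Text-centric bidirectional translation: text <-> audio, text <-> video.
        self.t2a, self.a2t = Translator(dim), Translator(dim)
        self.t2v, self.v2t = Translator(dim), Translator(dim)
        # Low-complexity completion: a single encoder-decoder fits the
        # distribution of the joint representation in a shared space.
        self.encoder = nn.Linear(3 * dim, dim)
        self.decoder = nn.Linear(dim, 3 * dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, t, a, v):
        # Sequential bidirectional mappings guided by text semantics.
        a_hat, v_hat = self.t2a(t), self.t2v(t)        # text -> audio / video
        t_from_a, t_from_v = self.a2t(a), self.v2t(v)  # audio / video -> text
        # If a modality is missing (all zeros here, for simplicity),
        # substitute the text-translated estimate.
        a_used = torch.where(a.abs().sum(-1, keepdim=True) > 0, a, a_hat)
        v_used = torch.where(v.abs().sum(-1, keepdim=True) > 0, v, v_hat)
        joint = torch.cat([t, a_used, v_used], dim=-1)
        z = self.encoder(joint)      # shared-space representation
        recon = self.decoder(z)      # reconstruction used for distribution fitting
        logits = self.classifier(z)
        # Translation-consistency and reconstruction losses (illustrative).
        loss_trans = ((t_from_a - t) ** 2).mean() + ((t_from_v - t) ** 2).mean()
        loss_recon = ((recon - joint) ** 2).mean()
        return logits, loss_trans + loss_recon

# Toy usage with a batch of 4 utterance-level features; audio is "missing".
model = STMSASketch()
t = torch.randn(4, 128)
a = torch.zeros(4, 128)   # simulated missing audio modality
v = torch.randn(4, 128)
logits, aux_loss = model(t, a, v)
print(logits.shape, aux_loss.item())

Under these assumptions, the completion path never generates a missing modality from scratch: the encoder-decoder only fits the joint representation, which keeps the architecture light relative to generative approaches.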
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant no. 62273290), the Special Funding Program of the Shandong Taishan Scholars Project, the Key Technology Research and Development Program of Shandong (Grant no. 2025CXPT077), the National Cultural and Tourism Technology Innovation Research and Development Project, and the Henan Province Science and Technology Research Project (Grant no. 252102210138).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hai, Y., Lu, S., Liu, Z. et al. Sequential translation-based multimodal sentiment analysis under uncertain missing modalities. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46910-2