Abstract
Facial expressions in the wild are rarely discrete; they often manifest as compound emotions or subtle variations that challenge the discriminative capabilities of conventional models. While psychological research suggests that expressions are often combinations of basic emotional units, most existing facial expression recognition (FER) methods rely on deterministic point estimation and thus fail to model the intrinsic uncertainty and continuous nature of emotions. To address this, we propose POSTER-Var, a framework integrating a Variational Inference-based Classification Head (VICH). Unlike standard classifiers, VICH maps facial features into a probabilistic latent space via the reparameterization trick, enabling the model to learn the underlying distribution of expression intensities. Furthermore, we enhance feature representation by introducing layer embeddings and nonlinear transformations into the feature pyramid, facilitating the fusion of hierarchical semantic information. Extensive experiments on RAF-DB, AffectNet, and FER+ demonstrate that our method handles fine-grained expression recognition effectively and achieves state-of-the-art performance. The code is open-sourced at https://github.com/lg2578/poster-var.
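To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch of a variational classification head built around the reparameterization trick; the module names, dimensions, class count, and KL weight are illustrative assumptions, not the authors' implementation (see the repository above for the official code).

```python
# Hypothetical sketch of a variational classification head (VICH-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalClassificationHead(nn.Module):
    """Maps backbone features to a Gaussian latent space, then classifies a sample."""
    def __init__(self, feat_dim=512, latent_dim=128, num_classes=7):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)  # log-variance of q(z|x)
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, feats):
        mu, logvar = self.to_mu(feats), self.to_logvar(feats)
        if self.training:
            # Reparameterization trick: z = mu + sigma * eps keeps the sampling
            # step differentiable, so gradients flow through mu and logvar.
            eps = torch.randn_like(mu)
            z = mu + torch.exp(0.5 * logvar) * eps
        else:
            z = mu  # use the posterior mean at inference time
        logits = self.classifier(z)
        # KL(q(z|x) || N(0, I)), averaged over batch and latent dimensions,
        # regularizes the learned latent distribution toward a standard normal.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return logits, kl

# Usage: combine cross-entropy with a small (assumed) KL weight.
head = VariationalClassificationHead()
feats = torch.randn(8, 512)              # stand-in for backbone features
labels = torch.randint(0, 7, (8,))
logits, kl = head(feats)
loss = F.cross_entropy(logits, labels) + 1e-3 * kl
```

Averaging a sample of z during training while using the mean at inference is one common design choice for such heads; it trades stochastic regularization during learning for deterministic predictions at test time.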
Data availability
The RAF-DB dataset is available from the original authors upon request for non-commercial research purposes. Researchers affiliated with academic institutions may request access by contacting the authors as described at http://whdeng.cn/RAF/model1.html. The FER+ dataset is available at https://github.com/microsoft/FERPlus. The AffectNet dataset can be requested from the original authors at https://mohammadmahoor.com/pages/databases/affectnet/ by eligible researchers (e.g., Principal Investigators) subject to a signed license agreement.
References
Wang, C., Chen, L., Wang, L., Li, Z. & Lv, X. QCS: Feature refining from quadruplet cross similarity for facial expression recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, pp. 7563–7572 (2025).
Deng, J., Guo, J., Xue, N. & Zafeiriou, S. ArcFace: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699. http://openaccess.thecvf.com/content_CVPR_2019/html/Deng_ArcFace_Additive_Angular_Margin_Loss_for_Deep_Face_Recognition_CVPR_2019_paper.html (2019).
Savchenko, A. V. Facial expression and attributes recognition based on multi-task learning of lightweight neural networks. In: 2021 IEEE 19th International Symposium on Intelligent Systems and Informatics (SISY), pp. 119–124. IEEE. https://ieeexplore.ieee.org/abstract/document/9582508/ (2021).
Xue, F., Wang, Q. & Guo, G. TransFER: Learning relation-aware facial expression representations with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3601–3610. http://openaccess.thecvf.com/content/ICCV2021/html/Xue_TransFER_Learning_Relation-Aware_Facial_Expression_Representations_With_Transformers_ICCV_2021_paper.html (2021).
Zheng, C., Mendieta, M. & Chen, C. POSTER: A pyramid cross-fusion transformer network for facial expression recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 3146–3155. https://openaccess.thecvf.com/content/ICCV2023W/AMFG/html/Zheng_POSTER_A_Pyramid_Cross-Fusion_Transformer_Network_for_Facial_Expression_Recognition_ICCVW_2023_paper.html (2023).
Mao, J. et al. POSTER++: A simpler and stronger facial expression recognition network. Pattern Recognit. 157, 110951. https://doi.org/10.1016/j.patcog.2024.110951 (2025).
Chen, C. PyTorch Face Landmark: A fast and accurate facial landmark detector. Open-source software available at https://github.com/cunjian/pytorch_face_landmark (2021).
Ekman, P. & Friesen, W. V. Facial action coding system. Environmental Psychology & Nonverbal Behavior (1978).
Plutchik, R. A general psychoevolutionary theory of emotion. In: Theories of Emotion, pp. 3–33. Elsevier. https://www.sciencedirect.com/science/article/pii/B9780125587013500077 (1980).
Zhou, Y., Xue, H. & Geng, X. Emotion distribution recognition from facial expressions. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1247–1250. ACM, Brisbane, Australia. https://doi.org/10.1145/2733373.2806328 (2015).
Jia, X., Zheng, X., Li, W., Zhang, C. & Li, Z. Facial emotion distribution learning by exploiting low-rank label correlations locally. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9841–9850. http://openaccess.thecvf.com/content_CVPR_2019/html/Jia_Facial_Emotion_Distribution_Learning_by_Exploiting_Low-Rank_Label_Correlations_Locally_CVPR_2019_paper.html (2019).
Yang, S., Yang, X., Wu, J. & Feng, B. Significant feature suppression and cross-feature fusion networks for fine-grained visual classification. Sci. Rep. 14(1), 24051. https://doi.org/10.1038/s41598-024-74654-4 (2024).
Zhao, Z., Liu, Q. & Zhou, F. Robust lightweight facial expression recognition network with label distribution training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 3510–3519. https://ojs.aaai.org/index.php/aaai/article/view/16465 (2021).
Gildenblat, J. & contributors. PyTorch library for CAM methods. GitHub. https://github.com/jacobgil/pytorch-grad-cam (2021).
Zhang, C., Bütepage, J., Kjellström, H. & Mandt, S. Advances in variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 2008–2026. https://doi.org/10.1109/TPAMI.2018.2889774 (2019).
Van Den Oord, A. & Vinyals, O. Neural discrete representation learning. Advances in Neural Information Processing Systems 30 (2017).
Zhang, Z., Li, X., Guo, K. & Xu, X. Facial expression recognition based on multi-task self-distillation with coarse and fine grained labels. Expert Syst. Appl. 281, 127440. https://doi.org/10.1016/j.eswa.2025.127440 (2025).
Parthasarathy, S., Rozgic, V., Sun, M. & Wang, C. Improving emotion classification through variational inference of latent variables. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7410–7414. IEEE. https://ieeexplore.ieee.org/abstract/document/8682823/ (2019).
Chamain, L. D., Qi, S. & Ding, Z. End-to-end image classification and compression with variational autoencoders. IEEE Internet Things J. 9(21), 21916–21931. https://doi.org/10.1109/JIOT.2022.3182313 (2022).
Hashemifar, S., Marefat, A., Hassannataj Joloudari, J. & Hassanpour, H. Enhancing face recognition with latent space data augmentation and facial posture reconstruction. Expert Syst. Appl. 238, 122266. https://doi.org/10.1016/j.eswa.2023.122266 (2024).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.html (2018).
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, pp. 3–19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-030-01234-2_1 (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021).
He, J. et al. Micro_nest: Multi-scale attention enhanced micro-expression recognition framework. Expert Syst. Appl. 290, 128372. https://doi.org/10.1016/j.eswa.2025.128372 (2025).
Lu, Z., Lin, R. & Hu, H. Tri-level modality-information disentanglement for visible-infrared person re-identification. IEEE Trans. Multimed. 26, 2700–2714 (2023).
Lu, Z., Lin, R. & Hu, H. Disentangling modality and posture factors: Memory-attention and orthogonal decomposition for visible-infrared person re-identification. IEEE Trans. Neural Netw. Learn. Syst. 36(3), 5494–5508 (2024).
Hatamizadeh, A., Yin, H., Heinrich, G., Kautz, J. & Molchanov, P. Global context vision transformers. In: International Conference on Machine Learning, pp. 12633–12646. PMLR. https://proceedings.mlr.press/v202/hatamizadeh23a.html (2023).
Li, S., Deng, W. & Du, J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2852–2861. http://openaccess.thecvf.com/content_cvpr_2017/html/Li_Reliable_Crowdsourcing_and_CVPR_2017_paper.html (2017).
Mollahosseini, A., Hasani, B. & Mahoor, M. H. AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2017).
Barsoum, E., Zhang, C., Ferrer, C. C. & Zhang, Z. Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283. ACM, Tokyo, Japan. https://doi.org/10.1145/2993148.2993165 (2016).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In: International Conference on Learning Representations (ICLR) (2019).
Vo, T.-H., Lee, G.-S., Yang, H.-J. & Kim, S.-H. Pyramid with super resolution for in-the-wild facial expression recognition. IEEE Access 8, 131988–132001. https://doi.org/10.1109/ACCESS.2020.3010018 (2020).
Zeng, D., Lin, Z., Yan, X., Liu, Y., Wang, F. & Tang, B. Face2Exp: Combating data biases for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20291–20300. https://openaccess.thecvf.com/content/CVPR2022/html/Zeng_Face2Exp_Combating_Data_Biases_for_Facial_Expression_Recognition_CVPR_2022_paper.html (2022).
Xu, J., Li, Y., Yang, G., He, L. & Luo, K. Multiscale facial expression recognition based on dynamic global and static local attention. IEEE Trans. Affect. Comput. https://doi.org/10.1109/TAFFC.2024.3458464 (2024).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Funding
This work was supported by the Zhejiang Office Philosophy and Social Sciences Planning Project (24NDJC04Z), the 3rd Batch of Scientific Research Innovation Teams of Zhejiang Open University, and the Jinhua Science and Technology Bureau (2025-4-178). The funders had no role in the design of the study, the collection and analysis of data, the writing of the manuscript, or the decision to submit the manuscript for publication.
Author information
Authors and Affiliations
Contributions
Gang Lv: Conceptualization, Methodology, Writing - original draft preparation, Investigation, Software, Validation. Junling Zhang: Conceptualization, Writing - review and editing. Chiki Tsoi: Validation, and provided valuable guidance, particularly on improving the figures.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lv, G., Zhang, J. & Tsoi, C. Facial expression recognition via variational inference. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38734-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-38734-x