Abstract
Facial expressions in the wild are rarely discrete; they often manifest as compound emotions or subtle variations that challenge the discriminative capabilities of conventional models. While psychological research suggests that expressions are often combinations of basic emotional units, most existing facial expression recognition (FER) methods rely on deterministic point estimation and thus fail to model the intrinsic uncertainty and continuous nature of emotions. To address this, we propose POSTER-Var, a framework integrating a Variational Inference-based Classification Head (VICH). Unlike standard classifiers, VICH maps facial features into a probabilistic latent space via the reparameterization trick, enabling the model to learn the underlying distribution of expression intensities. Furthermore, we enhance feature representation by introducing layer embeddings and nonlinear transformations into the feature pyramid, facilitating the fusion of hierarchical semantic information. Extensive experiments on RAF-DB, AffectNet, and FER+ demonstrate that our method handles fine-grained expression recognition effectively and achieves state-of-the-art performance. The code is open-sourced at https://github.com/lg2578/poster-var.
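To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch of a variational classification head built around the reparameterization trick; the module names, dimensions, class count, and KL weight are illustrative assumptions, not the authors' implementation (see the repository above for the official code).

```python
# Hypothetical sketch of a variational classification head (VICH-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalClassificationHead(nn.Module):
    """Maps backbone features to a Gaussian latent space, then classifies a sample."""
    def __init__(self, feat_dim=512, latent_dim=128, num_classes=7):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)  # log-variance of q(z|x)
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, feats):
        mu, logvar = self.to_mu(feats), self.to_logvar(feats)
        if self.training:
            # Reparameterization trick: z = mu + sigma * eps keeps the sampling
            # step differentiable, so gradients flow through mu and logvar.
            eps = torch.randn_like(mu)
            z = mu + torch.exp(0.5 * logvar) * eps
        else:
            z = mu  # use the posterior mean at inference time
        logits = self.classifier(z)
        # KL(q(z|x) || N(0, I)), averaged over batch and latent dimensions,
        # regularizes the learned latent distribution toward a standard normal.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return logits, kl

# Usage: combine cross-entropy with a small (assumed) KL weight.
head = VariationalClassificationHead()
feats = torch.randn(8, 512)              # stand-in for backbone features
labels = torch.randint(0, 7, (8,))
logits, kl = head(feats)
loss = F.cross_entropy(logits, labels) + 1e-3 * kl
```

Averaging a sample of z during training while using the mean at inference is one common design choice for such heads; it trades stochastic regularization during learning for deterministic predictions at test time.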
Data availability
The RAF-DB dataset is available from the original authors upon request for non-commercial research purposes. Researchers affiliated with academic institutions may request access by contacting the authors as described at http://whdeng.cn/RAF/model1.html. The FER+ dataset is available at https://github.com/microsoft/FERPlus. The AffectNet dataset can be requested from the original authors at https://mohammadmahoor.com/pages/databases/affectnet/ by eligible researchers (e.g., Principal Investigators) subject to a signed license agreement.
References
Wang, C., Chen, L., Wang, L., Li, Z. & Lv, X. QCS: Feature refining from quadruplet cross similarity for facial expression recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, pp. 7563–7572 (2025).
Deng, J., Guo, J., Xue, N. & Zafeiriou, S. ArcFace: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699. http://openaccess.thecvf.com/content_CVPR_2019/html/Deng_ArcFace_Additive_Angular_Margin_Loss_for_Deep_Face_Recognition_CVPR_2019_paper.html (2019).
Savchenko, A. V. Facial expression and attributes recognition based on multi-task learning of lightweight neural networks. In: 2021 IEEE 19th International Symposium on Intelligent Systems and Informatics (SISY), pp. 119–124. IEEE. https://ieeexplore.ieee.org/abstract/document/9582508/ (2021).
Xue, F., Wang, Q. & Guo, G. TransFER: Learning relation-aware facial expression representations with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3601–3610. http://openaccess.thecvf.com/content/ICCV2021/html/Xue_TransFER_Learning_Relation-Aware_Facial_Expression_Representations_With_Transformers_ICCV_2021_paper.html (2021).
Zheng, C., Mendieta, M. & Chen, C. POSTER: A pyramid cross-fusion transformer network for facial expression recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 3146–3155. https://openaccess.thecvf.com/content/ICCV2023W/AMFG/html/Zheng_POSTER_A_Pyramid_Cross-Fusion_Transformer_Network_for_Facial_Expression_Recognition_ICCVW_2023_paper.html (2023).
Mao, J. et al. POSTER++: A simpler and stronger facial expression recognition network. Pattern Recognit. 157, 110951. https://doi.org/10.1016/j.patcog.2024.110951 (2025).
Chen, C. PyTorch Face Landmark: A fast and accurate facial landmark detector. Open-source software available at https://github.com/cunjian/pytorch_face_landmark (2021).
Ekman, P. & Friesen, W. V. Facial action coding system. Environmental Psychology & Nonverbal Behavior (1978).
Plutchik, R. A general psychoevolutionary theory of emotion. In: Theories of Emotion, pp. 3–33. Elsevier. https://www.sciencedirect.com/science/article/pii/B9780125587013500077 (1980).
Zhou, Y., Xue, H. & Geng, X. Emotion distribution recognition from facial expressions. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1247–1250. ACM, Brisbane, Australia. https://doi.org/10.1145/2733373.2806328 (2015).
Jia, X., Zheng, X., Li, W., Zhang, C. & Li, Z. Facial emotion distribution learning by exploiting low-rank label correlations locally. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9841–9850. http://openaccess.thecvf.com/content_CVPR_2019/html/Jia_Facial_Emotion_Distribution_Learning_by_Exploiting_Low-Rank_Label_Correlations_Locally_CVPR_2019_paper.html (2019).
Yang, S., Yang, X., Wu, J. & Feng, B. Significant feature suppression and cross-feature fusion networks for fine-grained visual classification. Sci. Rep. 14(1), 24051. https://doi.org/10.1038/s41598-024-74654-4 (2024).
Zhao, Z., Liu, Q. & Zhou, F. Robust lightweight facial expression recognition network with label distribution training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 3510–3519. https://ojs.aaai.org/index.php/aaai/article/view/16465 (2021).
Gildenblat, J. & contributors. PyTorch library for CAM methods. GitHub. https://github.com/jacobgil/pytorch-grad-cam (2021).
Zhang, C., Bütepage, J., Kjellström, H. & Mandt, S. Advances in variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 2008–2026. https://doi.org/10.1109/TPAMI.2018.2889774 (2019).
Van Den Oord, A. & Vinyals, O. Neural discrete representation learning. Advances in Neural Information Processing Systems 30 (2017).
Zhang, Z., Li, X., Guo, K. & Xu, X. Facial expression recognition based on multi-task self-distillation with coarse and fine grained labels. Expert Syst. Appl. 281, 127440. https://doi.org/10.1016/j.eswa.2025.127440 (2025).
Parthasarathy, S., Rozgic, V., Sun, M. & Wang, C. Improving emotion classification through variational inference of latent variables. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7410–7414. IEEE. https://ieeexplore.ieee.org/abstract/document/8682823/ (2019).
Chamain, L. D., Qi, S. & Ding, Z. End-to-end image classification and compression with variational autoencoders. IEEE Internet Things J. 9(21), 21916–21931. https://doi.org/10.1109/JIOT.2022.3182313 (2022).
Hashemifar, S., Marefat, A., Hassannataj Joloudari, J. & Hassanpour, H. Enhancing face recognition with latent space data augmentation and facial posture reconstruction. Expert Syst. Appl. 238, 122266. https://doi.org/10.1016/j.eswa.2023.122266 (2024).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.html (2018).
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, pp. 3–19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-030-01234-2_1 (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (ICLR) (2021).
He, J. et al. Micro_nest: Multi-scale attention enhanced micro-expression recognition framework. Expert Syst. Appl. 290, 128372. https://doi.org/10.1016/j.eswa.2025.128372 (2025).
Lu, Z., Lin, R. & Hu, H. Tri-level modality-information disentanglement for visible-infrared person re-identification. IEEE Trans. Multimed. 26, 2700–2714 (2023).
Lu, Z., Lin, R. & Hu, H. Disentangling modality and posture factors: Memory-attention and orthogonal decomposition for visible-infrared person re-identification. IEEE Trans. Neural Netw. Learn. Syst. 36(3), 5494–5508 (2024).
Hatamizadeh, A., Yin, H., Heinrich, G., Kautz, J. & Molchanov, P. Global context vision transformers. In: International Conference on Machine Learning, pp. 12633–12646. PMLR. https://proceedings.mlr.press/v202/hatamizadeh23a.html (2023).
Li, S., Deng, W. & Du, J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2852–2861. http://openaccess.thecvf.com/content_cvpr_2017/html/Li_Reliable_Crowdsourcing_and_CVPR_2017_paper.html (2017).
Mollahosseini, A., Hasani, B. & Mahoor, M. H. AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2017).
Barsoum, E., Zhang, C., Ferrer, C. C. & Zhang, Z. Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283. ACM, Tokyo, Japan. https://doi.org/10.1145/2993148.2993165 (2016).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In: International Conference on Learning Representations (ICLR) (2019).
Vo, T.-H., Lee, G.-S., Yang, H.-J. & Kim, S.-H. Pyramid with super resolution for in-the-wild facial expression recognition. IEEE Access 8, 131988–132001. https://doi.org/10.1109/ACCESS.2020.3010018 (2020).
Zeng, D., Lin, Z., Yan, X., Liu, Y., Wang, F. & Tang, B. Face2Exp: Combating data biases for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20291–20300. https://openaccess.thecvf.com/content/CVPR2022/html/Zeng_Face2Exp_Combating_Data_Biases_for_Facial_Expression_Recognition_CVPR_2022_paper.html (2022).
Xu, J., Li, Y., Yang, G., He, L. & Luo, K. Multiscale facial expression recognition based on dynamic global and static local attention. IEEE Trans. Affect. Comput. https://doi.org/10.1109/TAFFC.2024.3458464 (2024).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Funding
This work was supported by the Zhejiang Office Philosophy and Social Sciences Planning Project (24NDJC04Z), the 3rd Batch of Scientific Research Innovation Teams of Zhejiang Open University, and the Jinhua Science and Technology Bureau (2025-4-178). The funders had no role in the design of the study, the collection and analysis of data, the writing of the manuscript, or the decision to submit the manuscript for publication.
Author information
Authors and Affiliations
Contributions
Gang Lv: Conceptualization, Methodology, Writing - original draft preparation, Investigation, Software, Validation. Junling Zhang: Conceptualization, Writing - review and editing. Chiki Tsoi: Validation, and provided valuable guidance, particularly on improving the figures.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lv, G., Zhang, J. & Tsoi, C. Facial expression recognition via variational inference. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38734-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-38734-x