Explainable blind image quality assessment with closed-loop semantic guidance and distortion diagnosis

Song, Chenye; Yuan, Fujiang; Zhang, Zhiwang

doi:10.1038/s41598-026-51187-6

Article
Open access
Published: 09 May 2026

Chenye Song¹,
Fujiang Yuan¹ &
Zhiwang Zhang²

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

In large-scale visual systems, particularly in automated image triage and preprocessing pipelines, the reliability of downstream computer vision tasks depends on the quality of perceptual inputs and the interpretability of assessment outcomes. Existing blind image quality assessment methods are typically formulated as offline black-box predictors that produce a single scalar score. While such a score indicates degradation, it offers limited actionable guidance, little semantic awareness, and insufficient diagnostic information for system-level decision-making. To address these limitations, this work proposes an interpretable blind image quality assessment framework designed to serve as a perceptual monitoring component that provides diagnostic support for automated visual systems. The approach incorporates high-level semantic priors from a frozen vision–language model into a Vision Transformer-based assessment stream via feature-wise linear modulation, enabling content-aware quality evaluation. In parallel, a distortion diagnosis branch is jointly optimized to identify dominant degradation types and generate structured diagnostic cues that support adaptive restoration. Experiments on standard benchmarks demonstrate strong consistency with human subjective judgments, achieving Spearman’s rank correlation coefficients of 0.9509 on TID2013 and 0.9408 on KADID-10k. The model also operates at 65 frames per second, indicating a favorable balance between accuracy, interpretability, and efficiency.
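
As a rough illustration of the feature-wise linear modulation named in the abstract, the sketch below scales and shifts each channel of a set of token features using parameters predicted from a conditioning vector. It follows the generic formulation (output = gamma * x + beta per channel) and is not the authors' implementation: the function names, the single affine layer used to predict gamma and beta, and all dimensions and weights are assumptions chosen for a toy example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map: returns weights @ x + bias (weights is out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(tokens: Matrix, semantic: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Apply feature-wise linear modulation to every token.

    gamma and beta are predicted from the semantic vector; channel c of
    each token t then becomes gamma[c] * t[c] + beta[c].
    """
    gamma = linear(w_gamma, b_gamma, semantic)  # per-channel scale
    beta = linear(w_beta, b_beta, semantic)     # per-channel shift
    return [[g * x + b for g, x, b in zip(gamma, tok, beta)]
            for tok in tokens]


# Toy data (hypothetical): 2 tokens with 3 channels each,
# conditioned on a 2-dimensional semantic vector.
tokens = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
semantic = [1.0, 2.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 x 2
b_gamma = [0.0, 0.0, 0.0]
w_beta = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]  # 3 x 2
b_beta = [0.5, 0.0, 0.0]

modulated = film(tokens, semantic, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In this toy setup the same gamma and beta are shared by all tokens, so the conditioning vector changes how each feature channel is weighted without altering the token layout. In a real model the semantic vector would come from the frozen vision-language encoder and the projection weights would be learned.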

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62541611), the Natural Science Foundation of Zhejiang Province (Grant No. LQN26F020069), the Ningbo Public Welfare Science and Technology Program Project (Grant No. 2025S181), the Ningbo Science and Technology Special Projects (Grant Nos. 2025Z124 and 2024Z263), and the Ningbo Special Talent Support Program (Grant Nos. 2025 S-237-05).

Author information

Authors and Affiliations

School of Computer Science and Technology, Taiyuan Normal University, Jinzhong, 030619, Shanxi, China
Chenye Song & Fujiang Yuan
School of Computer Science and Data Engineering, NingboTech University, Ningbo, 315000, Zhejiang, People’s Republic of China
Zhiwang Zhang

Authors

Chenye Song
Fujiang Yuan
Zhiwang Zhang

Corresponding author

Correspondence to Zhiwang Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Song, C., Yuan, F. & Zhang, Z. Explainable blind image quality assessment with closed-loop semantic guidance and distortion diagnosis. Sci Rep (2026). https://doi.org/10.1038/s41598-026-51187-6

Received: 06 February 2026
Accepted: 27 April 2026
Published: 09 May 2026
DOI: https://doi.org/10.1038/s41598-026-51187-6