Abstract
In large-scale visual systems, particularly in automated image triage and preprocessing pipelines, the reliability of downstream computer vision tasks depends on the quality of perceptual inputs and the interpretability of assessment outcomes. Existing blind image quality assessment methods are typically formulated as offline black-box predictors that produce a single scalar score. Although such a score indicates degradation, it offers limited actionable guidance, little semantic awareness, and insufficient diagnostic information for system-level decision-making. To address these limitations, this work proposes an interpretable blind image quality assessment framework designed to serve as a perceptual monitoring component that provides diagnostic support for automated visual systems. The approach incorporates high-level semantic priors from a frozen vision–language model into a Vision Transformer-based assessment stream via feature-wise linear modulation, enabling content-aware quality evaluation. In parallel, a distortion diagnosis branch is jointly optimized to identify dominant degradation types and generate structured diagnostic cues that support adaptive restoration. Experiments on standard benchmarks demonstrate strong consistency with human subjective judgments, achieving Spearman’s rank correlation coefficients of 0.9509 on TID2013 and 0.9408 on KADID-10k. The model also operates at 65 frames per second, indicating a favorable balance between accuracy, interpretability, and efficiency.
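To make the feature-wise linear modulation step concrete, the following is a minimal NumPy sketch of how a semantic embedding could scale and shift Vision Transformer token features. It is an illustration under stated assumptions, not the authors' implementation: the function name `film_modulate`, the single linear projections, and all tensor sizes are hypothetical, and the abstract does not specify where in the assessment stream the modulation is applied.

```python
import numpy as np

def film_modulate(tokens, semantic, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation (FiLM) to token features.

    tokens:   (num_tokens, dim) features from the quality-assessment stream
    semantic: (sem_dim,) embedding from a frozen vision-language encoder
    The projection weights are hypothetical learnable parameters.
    """
    gamma = semantic @ w_gamma + b_gamma  # per-channel scale, shape (dim,)
    beta = semantic @ w_beta + b_beta     # per-channel shift, shape (dim,)
    return tokens * gamma + beta          # broadcast over all tokens

# Hypothetical sizes, chosen only for illustration.
num_tokens, dim, sem_dim = 197, 768, 512
rng = np.random.default_rng(0)
tokens = rng.standard_normal((num_tokens, dim))
semantic = rng.standard_normal(sem_dim)

w_gamma = rng.standard_normal((sem_dim, dim)) * 0.01
b_gamma = np.ones(dim)    # bias of 1 keeps the initial scale near identity
w_beta = rng.standard_normal((sem_dim, dim)) * 0.01
b_beta = np.zeros(dim)    # bias of 0 keeps the initial shift near zero

modulated = film_modulate(tokens, semantic, w_gamma, b_gamma, w_beta, b_beta)
```

In this sketch the same scale and shift are shared by every token, so the semantic prior conditions each feature channel on image content without changing the token count or feature dimension.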
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 62541611), the Natural Science Foundation of Zhejiang Province (Grant No. LQN26F020069), the Ningbo Public Welfare Science and Technology Program Project (Grant No. 2025S181), the Ningbo Science and Technology Special Projects (Grant Nos. 2025Z124 and 2024Z263), and the Ningbo Special Talent Support Program (Grant Nos. 2025 S-237-05).
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Song, C., Yuan, F. & Zhang, Z. Explainable blind image quality assessment with closed-loop semantic guidance and distortion diagnosis. Sci Rep (2026). https://doi.org/10.1038/s41598-026-51187-6
DOI: https://doi.org/10.1038/s41598-026-51187-6


