  • Article
  • Open access
  • Published: 11 March 2026

S2SWCLIP: semantic-optimized prompts with spatial-wavelet synergy for zero-shot anomaly detection

  • Huan Zhang1,2,
  • Chunlei Wu1,2,
  • Jing Lu1,2 &
  • Mengyuan Jing1,2 

Scientific Reports, Article number: (2026). Cite this article

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Zero-shot anomaly detection is crucial for privacy-sensitive scenarios with limited target data. However, prominent methods based on vision-language models suffer from semantic overlap due to simplistic generic prompts, while their reductive visual representations fail to capture crucial local details and global structures, leading to alignment deviation between text and visual embeddings. In this paper, we propose S2SWCLIP, which integrates semantic-optimized prompts with spatial-wavelet synergy, refining prompt learning, enriching visual representations, and optimizing cross-modal alignment. First, object-agnostic prompts, contrastive normal-anomaly prompts, and anomaly-referenced prompts are combined to delineate sharper semantic boundaries through strongly contrasting vocabulary, and their embeddings are integrated by a cross-informative adaptive fusion mechanism to preserve comprehensive semantic information. Next, a spatial-to-wavelet transformation module converts spatial features into frequency-domain representations, which work in synergy with hierarchically fused visual features to retain fine-grained and meaningful image details. Finally, an entropy-gain similarity adaptively quantifies information richness and emphasizes features with low entropy disparity, further optimizing image-text alignment. Extensive experiments on 14 real-world anomaly detection datasets show that S2SWCLIP outperforms numerous competing methods. The code is available at https://github.com/Huanzh111/S2SW.
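
To make the two visual-side ideas in the abstract concrete, the following is a minimal, self-contained PyTorch sketch of (a) a one-level Haar transform that maps a spatial feature map into frequency-domain sub-bands, in the spirit of the spatial-to-wavelet transformation module, and (b) an entropy-weighted cosine similarity that emphasizes image-text pairs with low entropy disparity, as one plausible reading of the entropy-gain similarity. All function names, tensor shapes, and the exact entropy weighting are illustrative assumptions, not the authors' implementation; see https://github.com/Huanzh111/S2SW for the official code.

```python
# Hedged sketch: function names, shapes, and formulas below are assumptions
# for illustration only, not the published S2SWCLIP implementation.
import torch
import torch.nn.functional as F


def haar_dwt2d(feat: torch.Tensor) -> torch.Tensor:
    """One-level 2-D Haar transform of a spatial feature map.

    feat: (B, C, H, W) with even H and W.
    Returns the four sub-bands (LL, LH, HL, HH) stacked on the channel axis,
    i.e. a (B, 4*C, H/2, W/2) frequency-domain representation.
    """
    a = feat[:, :, 0::2, 0::2]  # top-left of each 2x2 block
    b = feat[:, :, 0::2, 1::2]  # top-right
    c = feat[:, :, 1::2, 0::2]  # bottom-left
    d = feat[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return torch.cat([ll, lh, hl, hh], dim=1)


def entropy_weighted_similarity(img_emb: torch.Tensor,
                                txt_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity re-weighted by the gap in information content.

    Shannon entropy of each embedding's normalized magnitude distribution is
    used as a proxy for information richness; pairs with a small entropy
    disparity keep a weight near 1, large disparities are down-weighted.
    This is an illustrative reading of the "entropy-gain similarity", not
    the authors' exact formula.
    img_emb: (N, D) patch embeddings; txt_emb: (M, D) prompt embeddings.
    """
    def entropy(x: torch.Tensor) -> torch.Tensor:
        p = F.softmax(x.abs(), dim=-1)            # magnitude distribution per embedding
        return -(p * (p + 1e-12).log()).sum(-1)   # Shannon entropy, shape (N,) or (M,)

    sim = F.normalize(img_emb, dim=-1) @ F.normalize(txt_emb, dim=-1).T   # (N, M)
    gap = (entropy(img_emb)[:, None] - entropy(txt_emb)[None, :]).abs()   # (N, M)
    return sim * torch.exp(-gap)                  # low entropy disparity -> weight near 1


if __name__ == "__main__":
    patch_feat = torch.randn(2, 64, 16, 16)       # toy CLIP-like patch feature map
    freq_feat = haar_dwt2d(patch_feat)            # (2, 256, 8, 8)
    img_emb = torch.randn(16 * 16, 512)           # flattened patch embeddings
    txt_emb = torch.randn(2, 512)                 # normal / anomaly prompt embeddings
    scores = entropy_weighted_similarity(img_emb, txt_emb)
    print(freq_feat.shape, scores.shape)          # torch.Size([2, 256, 8, 8]) torch.Size([256, 2])
```

In the full method, such frequency-domain sub-bands would be fused back with hierarchical spatial features, and the weighted image-text similarity would drive the anomaly score and segmentation map; those steps are omitted from this sketch.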

Data availability

The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/Huanzh111/S2SW.

Funding

This work is supported by grants from the Natural Science Foundation of Shandong Province (ZR2024MF145), the National Natural Science Foundation of China (62072469), and the Qingdao Natural Science Foundation (23-2-1-162-zyyd-jch).

Author information

Authors and Affiliations

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), No. 66, Changjiang West Road, Qingdao, 266580, Shandong, China

    Huan Zhang, Chunlei Wu, Jing Lu & Mengyuan Jing

  2. Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, China University of Petroleum (East China), No. 66, Changjiang West Road, Qingdao, 266580, Shandong, China

    Huan Zhang, Chunlei Wu, Jing Lu & Mengyuan Jing


Contributions

Conceptualization, H.Z. and M.Y.J.; methodology, H.Z. and C.L.W.; investigation, H.Z. and C.L.W.; writing - original draft, H.Z. and C.L.W.; writing - review & editing, H.Z. and J.L.; funding acquisition, C.L.W.; resources, C.L.W.; supervision, M.Y.J. and J.L.

Corresponding author

Correspondence to Chunlei Wu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Zhang, H., Wu, C., Lu, J. et al. S2SWCLIP: semantic-optimized prompts with spatial-wavelet synergy for zero-shot anomaly detection. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43044-3

  • Received: 17 November 2025

  • Accepted: 28 February 2026

  • Published: 11 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-43044-3

Keywords

  • Image anomaly detection
  • Prompt learning
  • Feature enhancement
  • Zero-shot learning