
  • Research Briefing

Vision language models excel at perception but struggle with scientific reasoning

A benchmark — MaCBench — is developed for evaluating the scientific knowledge of vision language models (VLMs). Evaluation of leading VLMs reveals that they excel at basic scientific tasks such as equipment identification, but struggle with spatial reasoning and multistep analysis — a limitation for autonomous scientific discovery.


Fig. 1: Frontier vision language models demonstrate varying capabilities across scientific reasoning tasks.

References

  1. Durante, Z. et al. Agent AI: Surveying the horizons of multimodal interaction. Preprint at https://arxiv.org/abs/2401.03568 (2024). This preprint discusses agent-based multimodal intelligence.

  2. Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023). This paper reports a chemical agent that autonomously performs reactions.


  3. Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024). This paper reports an LLM agent that uses chemical tools.


  4. Chollet, F. On the measure of intelligence. Preprint at https://arxiv.org/abs/1911.01547 (2019). This preprint discusses how to evaluate the ‘intelligence’ of models and compare it with that of humans.

  5. Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025). This article describes a framework for the evaluation of the chemical capabilities of LLMs.


Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Alampara, N. et al. Probing the limitations of multimodal language models for chemistry and materials research. Nat. Comput. Sci. https://doi.org/10.1038/s43588-025-00836-3 (2025).


Cite this article

Vision language models excel at perception but struggle with scientific reasoning. Nat. Comput. Sci. 5, 852–853 (2025). https://doi.org/10.1038/s43588-025-00871-0
