A benchmark, MaCBench, is developed for evaluating the scientific knowledge of vision language models (VLMs). Evaluation of leading VLMs reveals that they excel at basic scientific tasks such as equipment identification, but struggle with spatial reasoning and multistep analysis, a limitation for autonomous scientific discovery.

References
Durante, Z. et al. Agent AI: Surveying the horizons of multimodal interaction. Preprint at https://arxiv.org/abs/2401.03568 (2024). This preprint discusses agent-based multimodal intelligence.
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023). This paper reports a chemical agent that autonomously performs reactions.
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024). This paper reports an LLM agent that uses chemical tools.
Chollet, F. On the measure of intelligence. Preprint at https://arxiv.org/abs/1911.01547 (2019). This preprint discusses how to evaluate the ‘intelligence’ of models and to compare with that of humans.
Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025). This article describes a framework for the evaluation of the chemical capabilities of LLMs.
This is a summary of: Alampara, N. et al. Probing the limitations of multimodal language models for chemistry and materials research. Nat. Comput. Sci. https://doi.org/10.1038/s43588-025-00836-3 (2025).
Cite this article
Vision language models excel at perception but struggle with scientific reasoning. Nat. Comput. Sci. 5, 852–853 (2025). https://doi.org/10.1038/s43588-025-00871-0