Large language models (LLMs) are rapidly being implemented in a wide range of disciplines, with the promise of unlocking new possibilities for scientific exploration. However, while the development of LLMs brings opportunities to science, it also comes with pressing challenges. This Focus discusses the current state of the art, highlights key obstacles, and examines some of the potential pitfalls and biases of implementing and using LLMs across different domains, including healthcare, urban planning, chemistry, linguistics, humanities, and computer science. In addition, the Focus explores emerging technologies – such as neuromorphic engineering – that show promise in enhancing the energy efficiency of LLM deployment on hardware platforms.
This issue of Nature Computational Science features a Focus that highlights both the promises and perils of large language models, their emerging applications across diverse scientific domains, and the opportunities to overcome the challenges that lie ahead.
While leading tech companies race to build ever-larger models, researchers in Brazil, India and Africa are using clever tricks to remix big labs’ LLMs to bring AI to billions of users.
This Perspective highlights the potential integrations of large language models (LLMs) in chemical research and provides guidance on the effective use of LLMs as research partners, noting the ethical and performance-based challenges that must be addressed moving forward.
Large language models remain largely unexplored in the design of cities. In this Perspective, the authors discuss the potential opportunities brought by these models in assisting urban planning.
Large language models are increasingly important in social science research. The authors provide guidance on how best to validate and use these models as rigorous tools to further scientific inference.
This Perspective argues that generative AI aligns with generative linguistics, showing that neural language models (NLMs) are formal generative models. Furthermore, generative linguistics offers a framework for evaluating and improving NLMs.
Many humanists are skeptical of language models and concerned about their effects on universities. However, researchers with a background in the humanities are also actively engaging with artificial intelligence — seeking not only to adopt language models as tools, but to steer them toward a more flexible, contextual representation of written culture.
The use of generative artificial intelligence (AI) in healthcare is advancing, but understanding its potential challenges for fairness and health equity is still in its early stages. This Comment investigates how to define fairness and measure it, and highlights research that can help address challenges in the field.
The adoption of generative artificial intelligence (AI) code assistants in scientific software development is promising, but user studies across an array of programming contexts suggest that programmers are at risk of over-reliance on these tools, leading them to accept undetected errors in generated code. Scientific software may be particularly vulnerable to such errors because most research code is untested and scientists are undertrained in software development skills. This Comment outlines the factors that place scientific code at risk and suggests directions for research groups, educators, publishers and funders to counter these liabilities.
Large language models (LLMs) are already transforming the study of individual cognition, but their application to studying collective cognition has been underexplored. We lay out how LLMs may be able to address the complexity that has hindered the study of collectives and raise possible risks that warrant new methods.
Strong barriers remain between neuromorphic engineering and machine learning, especially with regard to recent large language models (LLMs) and transformers. This Comment makes the case that neuromorphic engineering may hold the keys to more efficient inference with transformer-like models.
This study presents SciToolAgent, a large language model-based agent that orchestrates scientific tools via a knowledge graph, enabling automated and effective execution of scientific research workflows.
A comprehensive benchmark, called MaCBench, is developed to evaluate how vision language models handle different aspects of real-world chemistry and materials science tasks.
A physics-based training pipeline is developed to help tackle the challenges of data scarcity. The framework aligns large language models to a physically consistent initial state, which is then fine-tuned to learn polymer properties.
Language models show promise for encoding quantum correlations and learning complex quantum states. This Perspective discusses the advantages of employing language models in quantum simulation, explores recent model developments, and offers insights into opportunities for realizing scalable and accurate quantum simulation.
Generative artificial intelligence (GAI) is driving a surge in e-waste due to intensive computational infrastructure needs. This study emphasizes the necessity for proactive implementation of circular economy practices throughout GAI value chains.
Leveraging in-memory computing with emerging gain-cell devices, the authors accelerate attention—a core mechanism in large language models. They train a 1.5-billion-parameter model, achieving up to a 70,000-fold reduction in energy consumption and a 100-fold speed-up compared with GPUs.
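The operation being accelerated here is the standard scaled dot-product attention at the heart of transformers. As a rough, illustrative sketch only (plain NumPy, not the authors' gain-cell in-memory implementation), the computation looks like this:

# Minimal scaled dot-product attention; shapes and names are illustrative.
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (sequence_length, head_dim) arrays
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # weighted sum of value vectors

Q = np.random.randn(4, 8)
K = np.random.randn(4, 8)
V = np.random.randn(4, 8)
out = attention(Q, K, V)  # shape (4, 8)

The matrix products in this routine dominate inference cost at scale, which is why mapping them onto analog in-memory hardware can yield such large energy and latency gains.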
This study shows a viable pathway to the efficient deployment of state-of-the-art large language models using mixture of experts on 3D analog in-memory computing hardware.
Researchers replicated 156 psychological experiments using three large language models (LLMs) instead of human participants. LLMs achieved 73–81% replication rates but showed amplified effect sizes and challenges with socially sensitive topics.
Researchers show that large language models exhibit social identity biases similar to humans, showing favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.
Using registry data from Denmark, Lehmann et al. create individual-level trajectories of events related to health, education, occupation, income and address, and also apply transformer models to build rich embeddings of life-events and to predict outcomes ranging from time of death to personality.
The reasoning capabilities of OpenAI’s generative pre-trained transformer family were tested using semantic illusions and cognitive reflection tests that are typically used in human studies. While early models were prone to human-like cognitive errors, ChatGPT decisively outperformed humans, avoiding the cognitive traps embedded in the tasks.
A neural network-based language model of supra-word meaning, that is, the combined meaning of words in a sentence, is proposed. Analysis of functional magnetic resonance imaging and magnetoencephalography data helps identify the regions of the brain responsible for understanding this meaning.
Many AI companies implement safety systems to protect users from offensive or inaccurate content. Though well intentioned, these filters can exacerbate existing inequalities, and data shows that they have disproportionately removed LGBTQ+ content.
Artificial intelligence (AI) drives innovation across society, economies and science. We argue for the importance of building AI technology according to open-source principles to foster accessibility, collaboration, responsibility and interoperability.
Training foundation models often requires a costly budget and excessive computational resources. In this study, a low-cost instruction learning framework is proposed that could enable the rapid adoption of visual-language pathology applications.
Larger LLMs’ self-attention more accurately predicts readers’ regressive saccades and fMRI responses in language regions, whereas instruction tuning adds no benefit.
PandemicLLM adapts the large language model to predict disease trends by converting diverse disease-relevant data into text. It responds to new variants in real time, offering robust, interpretable forecasts for effective public health responses.
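As a hedged sketch of what "converting diverse disease-relevant data into text" could look like in practice (the field names and prompt template below are hypothetical and are not PandemicLLM's actual format):

# Hypothetical serialization of structured surveillance data into a text
# prompt for an LLM-based forecaster; all field names are illustrative only.
def to_prompt(record):
    return (
        f"Region: {record['region']}. Week: {record['week']}. "
        f"Reported cases: {record['cases']}. "
        f"Dominant variant: {record['variant']}. "
        f"Vaccination coverage: {record['vaccination_rate']:.0%}. "
        "Question: will hospitalizations rise, fall or stay stable next week?"
    )

example = {
    "region": "Region A",
    "week": "2023-W12",
    "cases": 1840,
    "variant": "XBB.1.5",
    "vaccination_rate": 0.71,
}
print(to_prompt(example))

Serializing heterogeneous inputs into natural language in this way is what lets a general-purpose language model ingest new signals, such as a newly emerging variant, without retraining a bespoke model for each data type.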
A multimodal computational framework is proposed to integrate single-cell RNA sequencing data with phenotypic information to map complex genotype–phenotype relationships. This approach helps to refine cellular heterogeneity analysis, identify cross-tissue biomarkers and reveal polyfunctional characteristics of genes with cellular resolution.
This study introduces the Protein Importance Calculator (PIC), a deep learning model designed to predict human essential proteins (HEPs) crucial for survival and development. Unlike conventional methods, PIC offers a comprehensive assessment of HEPs across three levels: humans, cell lines and mice.
The parallels between natural language and antibody sequences could serve as a stepping stone to using deep language models for analyzing antibody sequences. This Perspective discusses how issues in antibody language model rule mining could be addressed by linguistically formalizing the antibody language.
Signal peptides (SPs) are vital for directing proteins to and across cellular membranes. In this work, the authors introduce USPNet, a deep learning method based on a protein language model for SP prediction that shows both high sensitivity and efficiency, thereby contributing to the identification of novel SPs.
In this study, a supervised protein language model is proposed to predict protein structure from a single sequence. It achieves state-of-the-art accuracy on orphan proteins and is competitive with other methods on human-designed proteins.