Georeferencing translates textual location information, such as that found on herbarium specimen labels, into geographical coordinates, but traditional methods are labour intensive and costly. Large language models (LLMs), however, have the potential to facilitate the georeferencing of natural history collections. Under standardized testing, some available LLMs achieved a near-human level of accuracy quickly and affordably, such that their incorporation into current workflows will increase the efficiency of georeferencing.
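As an illustration of how agreement between a model-assigned coordinate and a human reference coordinate could be scored, the sketch below computes the great-circle (haversine) error in kilometres and summarises it. The choice of metric, the function names, the 10 km threshold and the sample coordinates are all assumptions made for illustration; the summary above does not state how accuracy was measured in the study.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(errors_km, threshold_km):
    """Share of records whose error is at or below the threshold."""
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


# Hypothetical (predicted, reference) coordinate pairs; not data from the study.
pairs = [
    ((35.91, -79.05), (35.90, -79.06)),
    ((36.10, -80.20), (36.00, -80.00)),
    ((34.20, -77.90), (35.00, -78.50)),
]

errors = [haversine_km(p[0], p[1], r[0], r[1]) for p, r in pairs]
print("median error (km):", round(statistics.median(errors), 2))
print("share within 10 km:", fraction_within(errors, 10.0))
```

A distance-based score like this treats each record as a single point; it ignores the uncertainty radius that georeferencing protocols often attach to a locality, so it is only a first approximation of agreement.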
Data availability
The database and code used for this study are available from https://doi.org/10.6084/m9.figshare.28904936.v1
Acknowledgements
We thank C. A. McCormick, curator of the University of North Carolina Chapel Hill Herbarium, for their valuable suggestions. D.S.P. was supported by a grant from Korea University.
Author information
Contributions
X.F. conceived the initial idea, which was inspired by discussions with D.S.P. X.F., Y.X. and D.S.P. designed the research. Y.X. performed the analysis, with help from J.H. and R.X. Y.X., X.F., D.S.P., M.A.S.-A., L.J.A.L., J.C., M.L., N.S. and C.H. performed the manual georeferencing. Y.X. and X.F. wrote the manuscript, with major input from D.S.P. and M.A.S.-A. All authors contributed to the interpretation of the results and the revision of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Gengchen Mai, Barbara Thiers and John Waller for their contribution to the peer review of this work.
Supplementary information
Supplementary Information. Details of the benchmark experiment, Supplementary Figures 1–4, Supplementary Tables 1–8
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, Y., Park, D.S., Sinnott-Armstrong, M.A. et al. Using large language models to address the bottleneck of georeferencing natural history collections. Nat. Plants 11, 2446–2450 (2025). https://doi.org/10.1038/s41477-025-02162-y
DOI: https://doi.org/10.1038/s41477-025-02162-y