Using large language models to address the bottleneck of georeferencing natural history collections

Xie, Yuyang; Park, Daniel S.; Sinnott-Armstrong, Miranda A.; Ho, Joyce; Chen, Tianlong; Weakley, Alan S.; Aguirre Lopez, Luis J.; Choi, Jaein; Laitinen, Marisa M.; Steeves, Nicholas A.; Huang, Chingyan H.; Xu, Ran; Feng, Xiao

doi:10.1038/s41477-025-02162-y

Comment
Published: 05 December 2025

Nature Plants volume 11, pages 2446–2450 (2025)

Georeferencing translates textual location information, for instance, on herbarium specimen labels, into geographical coordinates, but traditional methods are labour intensive and costly. Large language models (LLMs), however, have the potential to facilitate the georeferencing of natural history collections. Under standardized testing, some available LLMs achieved a near-human level of accuracy quickly and affordably, such that their incorporation into current workflows will increase the efficiency of georeferencing.
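This preview does not state how accuracy was scored. One common way to compare a georeferenced point against a reference point is the great-circle distance between the two coordinate pairs, and the sketch below illustrates that idea under stated assumptions: the haversine formula, the mean Earth radius, and the function names `haversine_km` and `georeferencing_errors` are illustrative choices, not the authors' method.

```python
import math

# Assumption of this sketch: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def georeferencing_errors(predicted, reference):
    """Per-record error in km for paired (lat, lon) tuples.

    `predicted` could hold coordinates returned by any georeferencing method
    (hypothetical input); `reference` holds the coordinates treated as truth.
    """
    return [haversine_km(p[0], p[1], r[0], r[1])
            for p, r in zip(predicted, reference)]
```

Summarizing the resulting list (for example, by its median) would give a single error figure per method, which is one plausible way to compare automated and manual georeferencing.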

**Fig. 1: Summary of georeferencing using different methods.**

**Fig. 2: The geographical distribution and factors related to georeferencing accuracy.**

Data availability

The database and code used for this study are available from https://doi.org/10.6084/m9.figshare.28904936.v1

Acknowledgements

We thank C. A. McCormick, curator of the University of North Carolina at Chapel Hill Herbarium, for their valuable suggestions. D.S.P. was supported by a grant from Korea University.

Author information

These authors contributed equally: Yuyang Xie, Daniel S. Park.

Authors and Affiliations

Department of Biology, University of North Carolina, Chapel Hill, NC, USA
Yuyang Xie, Alan S. Weakley, Luis J. Aguirre Lopez & Xiao Feng
Department of Life Sciences, Korea University, Seoul, Republic of Korea
Daniel S. Park
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
Daniel S. Park, Jaein Choi, Marisa M. Laitinen, Nicholas A. Steeves & Chingyan H. Huang
Department of Biology, Duke University, Durham, NC, USA
Miranda A. Sinnott-Armstrong
Department of Computer Science, Emory University, Atlanta, GA, USA
Joyce Ho & Ran Xu
Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
Tianlong Chen
University of North Carolina Herbarium (NCU), North Carolina Botanical Garden, University of North Carolina, Chapel Hill, NC, USA
Alan S. Weakley

Contributions

X.F. conceived the initial idea, which was inspired through discussions with D.S.P. X.F., Y.X. and D.S.P. designed the research. Y.X. performed the analysis, with help from J.H. and R.X. Y.X., X.F., D.S.P., M.A.S.-A., L.J.A.L., J.C., M.L., N.S. and C.H. performed the manual georeferencing. Y.X. and X.F. wrote the manuscript, with major input from D.S.P. and M.A.S.-A. All authors contributed to the interpretation of the results and the revision of the manuscript.

Corresponding author

Correspondence to Xiao Feng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Gengchen Mai, Barbara Thiers and John Waller for their contribution to the peer review of this work.

Supplementary information

Supplementary Information. Details of the benchmark experiment, Supplementary Figures 1–4, Supplementary Tables 1–8

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, Y., Park, D.S., Sinnott-Armstrong, M.A. et al. Using large language models to address the bottleneck of georeferencing natural history collections. Nat. Plants 11, 2446–2450 (2025). https://doi.org/10.1038/s41477-025-02162-y

Version of record: 05 December 2025
Issue date: December 2025
DOI: https://doi.org/10.1038/s41477-025-02162-y
