  • Comment

Using large language models to address the bottleneck of georeferencing natural history collections

Georeferencing translates textual location information, for instance, on herbarium specimen labels, into geographical coordinates, but traditional methods are labour intensive and costly. Large language models (LLMs), however, have the potential to facilitate the georeferencing of natural history collections. Under standardized testing, some available LLMs achieved a near-human level of accuracy quickly and affordably, such that their incorporation into current workflows will increase the efficiency of georeferencing.
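One common way to score a georeferencing result is the great-circle distance between the coordinates a method returns and a trusted reference point. The sketch below illustrates that idea only: the preview text does not state which accuracy metric the authors used, so the haversine metric, the function names `haversine_km` and `median_error_km`, and the sample coordinates are all assumptions made for illustration.

```python
import math
import statistics

# Mean Earth radius in kilometres (spherical approximation; an assumption
# of this sketch, not a value taken from the article).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_error_km(predicted, reference):
    """Median distance between paired (lat, lon) predictions and references."""
    errors = [haversine_km(p[0], p[1], r[0], r[1])
              for p, r in zip(predicted, reference)]
    return statistics.median(errors)


if __name__ == "__main__":
    # Hypothetical coordinates: (latitude, longitude) in decimal degrees.
    predicted = [(35.91, -79.05), (37.59, 127.03)]
    reference = [(35.90, -79.04), (37.58, 127.02)]
    print(f"median error: {median_error_km(predicted, reference):.2f} km")
```

A median is used here rather than a mean because a few badly misplaced records would otherwise dominate the summary; that choice is likewise an assumption of the sketch.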

Fig. 1: Summary of georeferencing using different methods.
Fig. 2: The geographical distribution and factors related to georeferencing accuracy.

Data availability

The database and code used for this study are available from https://doi.org/10.6084/m9.figshare.28904936.v1

Acknowledgements

We thank C. A. McCormick, curator of University of North Carolina Chapel Hill Herbarium, for their valuable suggestions. D.S.P. was supported by a grant from Korea University.

Author information

Contributions

X.F. conceived the initial idea following discussions with D.S.P. X.F., Y.X. and D.S.P. designed the research. Y.X. performed the analysis, with help from J.H. and R.X. Y.X., X.F., D.S.P., M.A.S.-A., L.J.A.L., J.C., M.L., N.S. and C.H. performed the manual georeferencing. Y.X. and X.F. wrote the manuscript, with major input from D.S.P. and M.A.S.-A. All authors contributed to the interpretation of the results and the revision of the manuscript.

Corresponding author

Correspondence to Xiao Feng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Gengchen Mai, Barbara Thiers and John Waller for their contribution to the peer review of this work.

Supplementary information

Supplementary Information: details of the benchmark experiment, Supplementary Figures 1–4 and Supplementary Tables 1–8.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, Y., Park, D.S., Sinnott-Armstrong, M.A. et al. Using large language models to address the bottleneck of georeferencing natural history collections. Nat. Plants 11, 2446–2450 (2025). https://doi.org/10.1038/s41477-025-02162-y

  • DOI: https://doi.org/10.1038/s41477-025-02162-y
