  • Review Article

The city as text

Preface

Urban researchers now have access to vast amounts of textual data, from social media and news to planning documents and property listings. These data provide important information about the activities of people and organizations in urban environments. Meanwhile, recent advances in computational tools, including large language models, have expanded our ability to analyze text. Here we explore how these tools are reshaping the ways we analyze, understand and theorize the city through text. By outlining key developments, applications and challenges, we argue that text is no longer a ‘fringe resource’ but a central component of urban analytics with the potential to connect quantitative and qualitative researchers.

Author information

Contributions

J.R. conceived and designed the experiments. All authors contributed materials and analysis tools and wrote the paper.

Corresponding author

Correspondence to Jonathan Reades.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cities thanks Julia Harten, Xinyu Fu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Reades, J., Hu, Y., Tranos, E. et al. The city as text. Nat Cities 2, 794–800 (2025). https://doi.org/10.1038/s44284-025-00314-x

  • DOI: https://doi.org/10.1038/s44284-025-00314-x
