Preface
Urban researchers now have access to vast amounts of textual data, from social media and news to planning documents and property listings. These data provide important information about the activities of people and organizations in urban environments. Meanwhile, recent advances in computational tools, including large language models, have expanded our ability to analyze text. Here we explore how these tools are reshaping the ways we analyze, understand and theorize the city through text. By outlining key developments, applications and challenges, we argue that text is no longer a ‘fringe resource’ but a central component of urban analytics, with the potential to connect quantitative and qualitative researchers.
Author information
Contributions
J.R. conceived and designed the experiments. All authors contributed materials and analysis tools and wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Cities thanks Julia Harten, Xinyu Fu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
About this article
Cite this article
Reades, J., Hu, Y., Tranos, E. et al. The city as text. Nat Cities 2, 794–800 (2025). https://doi.org/10.1038/s44284-025-00314-x
DOI: https://doi.org/10.1038/s44284-025-00314-x


