Bridging urban theory and artificial intelligence: a multi-agent recommendation system for sustainable city development

Tong, Jiawei; Wang, Shuihua; Wang, Guangyu; Wang, Yan; Moraros, John

doi:10.1038/s42949-026-00377-2

Article
Open access
Published: 23 March 2026

Jiawei Tong^1,2,
Shuihua Wang¹,
Guangyu Wang³,
Yan Wang⁴ &
…
John Moraros¹

npj Urban Sustainability volume 6, Article number: 77 (2026)

Abstract

As cities increasingly rely on AI for sustainability challenges, a critical gap emerges: AI applications in urban planning and safety predominantly proceed without explicit guidance from established urban theories that have guided sustainable development for decades. Our analysis reveals that technology-driven research dominates the field, while problem-driven approaches addressing genuine urban needs remain minimal. To bridge this theory-practice disconnect, we develop a Large Language Model (LLM)-based multi-agent recommendation system [publicly available online] that realigns AI development with sustainable city principles. The system employs specialized agents to recommend appropriate theoretical frameworks, AI methods, and data sources for urban challenges, drawing from classical urban theories. Through diverse case studies, we demonstrate how our approach transforms technology-focused solutions into theory-grounded interventions that address sustainability’s interconnected dimensions. Our framework fundamentally shifts the question from “what can algorithms do?” to “what does this urban challenge require for sustainable outcomes?”—ensuring AI amplifies rather than replaces the theoretical wisdom essential for creating resilient, equitable, and livable cities that contribute to global sustainability targets.

Introduction

Artificial intelligence (AI) offers unprecedented opportunities for optimizing urban systems as cities worldwide race to meet sustainability targets. Yet, our large-scale empirical analysis of 1123 AI applications in urban planning and safety across global contexts reveals a critical misalignment: current AI applications in urban safety and planning areas proliferate in a theoretical vacuum, prioritizing technical metrics over the social, environmental, and spatial dynamics essential for sustainable cities^1,2. Without grounding in established urban wisdom, even the most sophisticated AI system may inadvertently compromise the resilience, equity, and livability that future generations depend upon^3,4. Recent frameworks further demonstrate AI’s potential to operationalize sustainability targets through systemic approaches addressing emissions, inequality, and exclusion simultaneously⁵, yet such integration requires explicit theoretical grounding to avoid techno-solutionist pitfalls⁶.

This misalignment stems from AI development processes that bypass decades of urban theoretical knowledge. Established urban theories—from Jacobs’ social surveillance principles⁷ to Newman’s spatial security frameworks—demonstrate that sustainable cities emerge from integrated social-spatial-economic systems. But as AI proliferates in urban sustainability efforts—from carbon reduction to equitable resource distribution—few applications engage with this legacy. Comprehensive reviews of AI, IoT, and big data convergence in sustainable smart cities⁸ confirm this pattern: despite impressive technological sophistication, theoretical integration remains minimal. Consequently, these applications risk replicating a pattern that scholars have critiqued: optimising for efficiency while neglecting the holistic principles of sustainable urbanism^6,9,10.

Quantitative analysis confirms limited explicit theoretical integration. Among 1123 studies representing 151.6-fold growth since 2008, only 1.16% explicitly cite established urban theories by name. Nearly half (47.3%) are technology-driven, focused on algorithmic capabilities rather than urban crises (8.6%)¹¹. This imbalance demonstrates that urban AI research predominantly asks “what can AI do?” rather than “what do cities need for sustainable development?”—signaling a fundamental misalignment between how AI is being developed and how sustainable cities actually function^12,13. This misalignment has intensified dramatically over time, with particularly concerning implications for urban sustainability. Since 2020 alone, over 60% of all urban AI applications have emerged, coinciding with the proliferation of pre-trained models and accessible AI frameworks. The acceleration is striking: newer algorithms achieve near-instantaneous adoption. Contrastive Language-Image Pre-training (CLIP) and Generative Pre-trained Transformer 3 (GPT-3) were applied to urban problems within the same year of their release, reflecting intense pressure to demonstrate innovation regardless of sustainability outcomes¹⁴. Consequently, the field increasingly prioritizes algorithmic novelty over addressing genuine urban sustainability challenges, further widening the gap between technological capabilities and urban needs^15,16.

We address this gap through a theory-first AI recommendation framework that prioritizes urban sustainability requirements over algorithmic capabilities. This approach advances recent proposals for human-AI symbiosis in urban informatics¹⁷, positioning AI not as an autonomous solution but as a collaborative partner that amplifies human expertise. Specifically, our system employs five specialized agents working sequentially—from problem structuring to theory retrieval, algorithm matching, data selection, and integration validation. Through this process, the framework transforms the core question from “what can this algorithm do?” to “what does this urban challenge require?”¹⁸. To achieve this, the system maps alignments between 46 classical urban theories and contemporary AI methods, thereby generating recommendations that integrate sustainability’s social, environmental, and economic dimensions¹⁹. We validate this approach through three case studies spanning food security, urban heat mitigation, and disaster resilience. The results demonstrate how theory-grounded AI generates solutions aligned with Sustainable Development Goal (SDG) 11 targets, ultimately enabling cities to transcend technical performance metrics and achieve genuine sustainability outcomes^20,21.

Results

The technology-first paradigm

Our empirical investigation examines four key dimensions of AI adoption in urban planning and safety domains: algorithmic preferences, data source utilization, theoretical integration, and research motivations. Our systematic analysis reveals the mechanisms driving the technology-first paradigm. Such evidence establishes the foundation for understanding the theory-practice disconnect in urban AI applications.

We retrieved an initial corpus of 2797 papers from the Web of Science (WoS)²² and Scopus²³ databases using the targeted search query (“urban planning” OR “urban safety”) AND “AI”, with no time restriction, and performed initial data cleaning. After further data cleaning and duplicate removal, we conducted systematic screening to ensure quality. A BERT-based classifier performed domain-specific classification (see Supplementary Material 8, Table S10). We then filtered the results to peer-reviewed research articles²⁴, ultimately identifying 1123 genuine AI applications in the urban planning and safety domains. This BERT-based approach follows established practices in natural language processing²⁵, with similar methods recently applied to the classification of urban sustainability literature, achieving comparable accuracy levels²⁶. These 1123 validated papers form the foundation for examining the current state of AI integration in urban planning and safety management.

Four distinct algorithm groups and six data source categories emerged from clustering analysis. Dimensionality reduction using t-distributed Stochastic Neighbour Embedding (t-SNE) reveals distinct clustering patterns in both algorithmic approaches and data sources utilised in urban planning and safety AI research (Table 1, Fig. 1). The t-SNE method has proven effective for visualizing high-dimensional urban data structures^27,28, with recent theoretical advances supporting its reliability for cluster detection in complex datasets²⁹.

**Fig. 1: Algorithm and data source grouping visualisation using t-SNE dimensionality reduction.**

Table 1 Algorithm and data source grouping summary

Machine Learning (ML) and Deep Learning methods (Group 2) dominate the algorithmic landscape, comprising 42.7% of all methods employed and forming the densest cluster (Fig. 1a)—a finding consistent with recent systematic reviews documenting the proliferation of ML approaches in urban applications^30,31. Traditional optimization and statistical approaches (Groups 0–1: Optimization & Decision Methods, and Statistical & Spatial Analysis) demonstrate established methodological foundations, while the emergence of Generative & Pre-trained Models (Group 3) reflects recent advances in AI capabilities for urban applications.

Environmental & Climate Data and Sensor & Internet of Things (IoT) Data (Groups 0–1) exhibit the highest utilization among data sources, accounting for 68.3% of all data usage (Table 1, Fig. 1b). This pattern reflects the maturity and accessibility of environmental monitoring systems and sensor networks in urban contexts^32,33. Composite Data, Regional & Experimental Data, and Image & Remote Sensing Data (Groups 3–5) account for only 19.2% of combined usage (Fig. 1b), despite all 46 theories requiring at least one of these categories at the recommended relevance threshold (≥0.6), with 69.6% requiring two or more. Meanwhile, Environmental & Sensor data (Groups 0–1) is the only grouping where observed usage exceeds theory-derived demand (68.3% vs. 58.7%), consistent with technology-driven selection favoring readily available automated sources³⁴. Geographic Information & Social Survey Data (Group 2) occupies an intermediate position, bridging traditional geospatial approaches with contemporary data collection methods. The difference in utilization between established environmental/sensor data sources and emerging composite data approaches was statistically significant (χ² = 156.4, p < 0.001).
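The threshold-based demand figures can be illustrated with a short calculation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the theory names and relevance scores are hypothetical placeholders, and only the 0.6 threshold and the treatment of Groups 3–5 as the under-used categories are taken from the text.

```python
# Relevance threshold from the text: a theory "requires" a data group
# when its relevance score for that group is at least this value.
THRESHOLD = 0.6
EMERGING_GROUPS = {"G3", "G4", "G5"}  # composite, regional/experimental, image data

# Hypothetical relevance scores (theory -> data group -> score), for illustration only.
relevance = {
    "theory_a": {"G0": 0.8, "G1": 0.4, "G2": 0.7, "G3": 0.9, "G4": 0.6, "G5": 0.2},
    "theory_b": {"G0": 0.3, "G1": 0.7, "G2": 0.5, "G3": 0.1, "G4": 0.2, "G5": 0.8},
    "theory_c": {"G0": 0.9, "G1": 0.9, "G2": 0.2, "G3": 0.7, "G4": 0.7, "G5": 0.7},
    "theory_d": {"G0": 0.5, "G1": 0.6, "G2": 0.6, "G3": 0.3, "G4": 0.4, "G5": 0.5},
}


def required_groups(scores, threshold=THRESHOLD):
    """Return the data groups whose relevance meets the threshold."""
    return {group for group, score in scores.items() if score >= threshold}


def demand_summary(relevance_by_theory):
    """Share of theories requiring at least one, and two or more, emerging groups."""
    counts = [
        len(required_groups(scores) & EMERGING_GROUPS)
        for scores in relevance_by_theory.values()
    ]
    n = len(counts)
    return {
        "at_least_one": sum(c >= 1 for c in counts) / n,
        "two_or_more": sum(c >= 2 for c in counts) / n,
    }


summary = demand_summary(relevance)
print(summary)
```

Applied to the full set of 46 theories, the same counting would yield the theory-derived demand shares; the reported 69.6% is consistent with 32 of 46 theories, and 58.7% with 27 of 46.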

Only 13 of 1123 papers (1.16%) demonstrated explicit theoretical integration by citing established urban planning or safety theories by name. This analysis reveals limited visible theoretical communication between AI applications and established urban frameworks—a pattern that recent critiques have identified as problematic for sustainable urban development³⁵. None of the 63 Random Forest applications analyzing urban spatial patterns reference spatial interaction theories. Similarly, Convolutional Neural Network (CNN) - based urban image analyses (n = 47) contain no citations to established visual assessment frameworks from urban design literature³⁶. Among the 13 papers that did engage with theory, 10 were authored by interdisciplinary teams including at least one urban planning or safety expert, suggesting the importance of cross-disciplinary collaboration³⁷. This theoretical vacuum is particularly concerning given the diversity of algorithmic approaches identified (Table 1), each of which requires domain-specific adaptation.

The distribution of research motivations (see Supplementary Material 3) reveals a technology-first paradigm dominating urban planning and safety AI studies (Fig. 2a). Technology-driven papers constitute 47.3% of the corpus (531 papers), focusing on exploring new technological applications, algorithm development, and method transfer, a trend multiple reviews have identified as misaligned with urban planning needs^38,39. Method-driven research accounts for 14.0% (157 papers) and addresses the limitations of existing solutions through performance improvements or efficiency enhancements. In stark contrast, problem-driven research initiated by specific urban crises or real-world events comprises only 8.6% (97 papers); examples include Beijing’s traffic congestion incidents, New York City’s flooding events, and Tokyo’s earthquake evacuation crises. General needs-driven studies account for 17.0% (191 papers) and address broad urban requirements without specific triggering events. The remaining 147 papers (13.1%) exhibit unclear or mixed motivations (Table 2).
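The reported shares follow directly from the paper counts. As a quick arithmetic check (the counts are those given in the text; the variable names are ours):

```python
# Paper counts per research motivation, as reported in the text (N = 1123).
motivation_counts = {
    "technology-driven": 531,
    "general needs-driven": 191,
    "method-driven": 157,
    "unclear or mixed": 147,
    "problem-driven": 97,
}

total = sum(motivation_counts.values())

# Share of the corpus for each category, as a percentage rounded to one decimal.
shares = {
    name: round(100 * count / total, 1)
    for name, count in motivation_counts.items()
}

# How many technology-driven papers exist for each problem-driven paper.
tech_to_problem = (
    motivation_counts["technology-driven"] / motivation_counts["problem-driven"]
)

for name, share in shares.items():
    print(f"{name:22s} {motivation_counts[name]:5d} {share:5.1f}%")
print(f"technology-driven / problem-driven = {tech_to_problem:.1f}")
```

The five categories sum to 1123, and technology-driven papers outnumber problem-driven ones by roughly 5.5 to 1.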

**Fig. 2: Analysis of research paradigms in urban planning and safety AI studies.**

Table 2 Research motivation distribution in urban planning and safety AI studies (N = 1123)

Algorithm selection criteria analysis reveals a distinct hierarchy in researchers’ priorities (Fig. 2b): technical novelty (59.0%), computational efficiency (51.3%), algorithmic performance (48.7%), domain-specific applicability (12.8%), ease of implementation (10.3%), and geographical generalizability (5.1%). This distribution demonstrates a clear preference for technical metrics over practical relevance in urban planning applications. Domain validation analysis reveals significant gaps in verification practices: only 12.8% of studies conduct meaningful domain-specific verification, and comprehensive multi-method validation is completely absent (0.0%). Geographical generalizability represents another critical gap. Although essential for urban planning algorithms intended for diverse global contexts, it remains minimally addressed: only 5.1% of studies conduct rigorous cross-city validation, while 97.4% fail to address cultural adaptability.

Validation of theoretical integration measurement

Our 1.16% finding captures explicit theoretical citations, but may not reflect the full extent of theoretical engagement. To evaluate this measurement’s validity and explore implicit theoretical foundations, we conducted a three-method validation protocol on a stratified subsample (n = 200, see Section “Validation of Theoretical Integration Measurement” and Supplementary Material 10). Automated classification using GPT-4-turbo to detect theoretical engagement from introduction sections achieved only 43.0% accuracy (86/200 correct; Cohen’s κ = 0.35) against expert coding. Among 114 misclassifications, primary errors included over-interpretation of technical descriptions (36 cases, 31.6%), missing implicit theoretical cues (32 cases, 28.1%), and surface citation misjudgment (29 cases, 25.4%). For example, the model classified Xiang et al.’s water resource management study⁴⁰ as “theory-driven” by misinterpreting sustainability terminology, while missing implicit travel behavior theory in Xin et al.’s mobility analysis⁴¹. This poor performance demonstrates why explicit citation criteria became necessary—automated semantic analysis cannot reliably detect domain-specific theoretical foundations at scale.
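For readers unfamiliar with the agreement statistic used here, the following is a minimal sketch of Cohen’s κ in plain Python. The label sequences are a toy example, not the study’s data, and the sketch does not attempt to reproduce the reported κ = 0.35. The error-share check at the end uses the counts given in the text.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example: "T" = theory-driven, "N" = not theory-driven (hypothetical labels).
expert = ["T", "T", "T", "T", "N", "N", "N", "N", "N", "N"]
model = ["T", "T", "N", "N", "N", "N", "N", "N", "T", "T"]
kappa = cohens_kappa(expert, model)
print(f"toy kappa = {kappa:.3f}")

# Accuracy and error shares from the counts reported in the text.
accuracy = round(100 * 86 / 200, 1)
error_counts = {"over-interpretation": 36, "missed implicit cues": 32, "citation misjudgment": 29}
error_shares = {k: round(100 * v / 114, 1) for k, v in error_counts.items()}
print(accuracy, error_shares)
```

In the toy example the raters agree on 6 of 10 items, but chance alone would produce agreement of 0.52, so κ is only about 0.17. This is why a raw accuracy figure can overstate how well an automated classifier tracks expert judgment.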

Team composition analysis revealed paradoxical patterns. Contrary to expectations, disciplinary diversity showed no significant correlation with theory-driven intensity (Pearson’s r = 0.08, p = 0.267; Spearman’s ρ = 0.09, p = 0.221). However, pure urban planning teams scored highest (0.65 ± 0.24), followed by interdisciplinary teams (0.39 ± 0.23) and pure computer science teams (0.24 ± 0.18). To further validate these team-level patterns and explore implicit theoretical engagement, semantic theoretical density (STD) analysis using an 847-term theoretical lexicon revealed moderate correlation with expert assessment (r = 0.62), slightly higher than explicit citation (r = 0.58), though not significantly different (Steiger’s Z = 0.54, p = 0.59). Critically, all three measurement approaches demonstrated identical rank ordering across team types (Table 3), with semantic density showing Pure Planning teams scoring 65.7% higher than Pure CS teams (0.58 vs 0.35; t = 6.83, p < 0.001, Cohen’s d = 1.34). Among papers lacking explicit citations (n = 165, 82.5%), semantic density varied substantially (range: 0.18–0.74, SD = 0.19). Manual review of top-quartile cases (n = 31) revealed that 87.1% (27/31) demonstrated sophisticated domain knowledge through problem framing and methodological choices.
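The text does not give the exact formula for semantic theoretical density, so the following sketch rests on an assumption: it treats density as the fraction of a paper’s word tokens that appear in a theory lexicon. The two-term lexicon and the sample sentence are hypothetical stand-ins for the 847-term lexicon and real abstracts. The final lines verify the reported relative difference between team types.

```python
import re


def theoretical_density(text, lexicon):
    """Fraction of word tokens found in the lexicon (assumed definition)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in lexicon for token in tokens)
    return hits / len(tokens)


# Hypothetical mini-lexicon and sentence, for illustration only.
lexicon = {"surveillance", "territorial"}
sample = "Natural surveillance shapes territorial behaviour"
density = theoretical_density(sample, lexicon)
print(f"density = {density:.2f}")

# Relative difference between the reported group means (0.58 vs 0.35).
pure_planning, pure_cs = 0.58, 0.35
relative_gain = round(100 * (pure_planning - pure_cs) / pure_cs, 1)
print(f"Pure Planning scores {relative_gain}% higher than Pure CS")
```

A single-token lexicon match of this kind cannot capture multi-word theoretical terms or paraphrase, which may be one reason the reported correlation with expert assessment is moderate rather than strong.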

Table 3 Three-method validation results: measurement convergence analysis (N = 200)

Temporal evolution: from problem-pull to technology-push

To understand the drivers of AI adoption in urban planning and safety, we analyze the temporal gap between algorithm development and domain application across all 1123 papers. Our corpus captures self-labeled AI research (papers explicitly using “AI” terminology) rather than the complete history of computational methods in urban domains. The timeline visualization (Fig. 3) reveals an important distinction about algorithmic adoption. Rather than showing when methods first entered urban research, it captures when they became reframed under the AI label. Such reframing documents shifts in disciplinary discourse, consistent with Rogers’ diffusion of innovation theory⁴².

**Fig. 3: Temporal evolution of AI algorithm adoption in urban planning and safety research (2008–2025).**

Our analysis reveals a dramatic transformation in AI adoption patterns. Prior to 2008, merely 14 papers (0.7%) employed AI methods in urban planning and safety contexts. Following 2008, applications surge to 1109 papers (99.3%), a 151.6-fold increase visible in Fig. 3a. This transformation coincides with the broader AI renaissance, driven by computational advances and data availability⁴³, rather than with emerging urban challenges requiring AI solutions. Critically, this growth reflects the proliferation of AI terminology and framing: many algorithms shown in Fig. 3 (particularly Groups 0–1) were established in planning practice decades earlier^44,45, their appearance here marking terminological repackaging rather than initial methodological introduction.

Temporal evolution unfolds in three distinct phases clearly delineated in Fig. 3. Critically, the 2008 inflection point labeled “AI Terminology Adoption Surge” in panel (a) marks systematic reframing under AI terminology following deep learning breakthroughs, not algorithm development or latency occurrence. This reframing initiates Phase 1 (2008–2012), characterized by sparse, experimental adoption of existing algorithms. Subsequently, Phase 2 (2012–2017) witnesses a machine learning revolution penetrating urban domains following AlexNet’s breakthrough demonstration⁴⁶, exemplifying technology-push wherein innovations diffuse based on capability rather than domain need². Finally, Phase 3 (2018–2025) represents an unprecedented explosion visible in panel (b): 2020–2025 alone contributes over 60% of all applications, driven by pre-trained models and accessible AI frameworks.

The latency between algorithm development and urban application reveals fundamental patterns in technology transfer consistent with diffusion theory⁴². Rather than absolute latency values, what matters most is their dramatic compression across time: Group 3 algorithms demonstrate near-instantaneous adoption—CLIP achieves same-year development and application (2021), while GPT-3 enters urban planning within months of release⁴⁷. Such compression from decades (Groups 0–2) to essentially zero (Group 3), illustrated by converging lines in Fig. 3a, reflects intensifying pressure to demonstrate AI innovation regardless of domain relevance. Within-group heterogeneity (CV) decreases from 237% (G0) to 16% (G3) (Table 4), demonstrating convergence toward uniform, immediate adoption. Group-specific patterns shown in Fig. 3c–f further reveal mounting innovation pressure across the field.
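The two quantities discussed here, adoption latency and within-group heterogeneity, can be sketched as follows. This is an illustration under stated assumptions: latency is taken as the year of first urban application minus the year of development, and CV as the population standard deviation divided by the mean (the text does not say whether a sample or population estimate was used). Apart from the same-year CLIP case, all records are hypothetical placeholders.

```python
from statistics import mean, pstdev

# (algorithm, year developed, year first applied in urban research).
# Only the CLIP entry (2021, 2021) reflects the text; the rest are hypothetical.
records = {
    "older_group": [
        ("method_a", 1960, 2009),
        ("method_b", 1995, 2010),
        ("method_c", 2001, 2012),
    ],
    "recent_group": [
        ("CLIP", 2021, 2021),
        ("method_d", 2020, 2021),
        ("method_e", 2022, 2023),
    ],
}


def latencies(rows):
    """Years between development and first urban application for each row."""
    return [used - developed for _, developed, used in rows]


def coefficient_of_variation(values):
    """CV in percent: population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * pstdev(values) / m


for group, rows in records.items():
    lat = latencies(rows)
    print(group, "mean latency:", round(mean(lat), 1),
          "CV (%):", round(coefficient_of_variation(lat), 1))
```

Because CV divides by the mean, it becomes unstable as mean latency approaches zero, so CV values for groups with near-instantaneous adoption should be read alongside the absolute latencies rather than in isolation.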

Table 4 Algorithm adoption patterns in urban AI research

Building on the latency patterns identified above, the 2020–2025 period represents an inflection point in algorithm density and diversity (Fig. 3b). The AlexNet moment of 2012^46,48—when deep learning achieves a 15.3% top-5 error rate compared to 26.2% for traditional methods—catalyzes widespread AI adoption across domains. This “winner-take-all” performance gap creates what innovation theorists call a “technology imperative”⁴², in which adoption is driven by the fear of being left behind rather than genuine utility.

Group 2 algorithms dominate the landscape with Random Forest (RF)⁴⁹, Support Vector Machines (SVM)⁵⁰, eXtreme Gradient Boosting (XGBoost)⁵¹, and Convolutional Neural Networks (CNNs)⁴³ (28) applied across disparate urban challenges, as detailed in Fig. 3e. The algorithm distributions reveal concerning patterns of methodological misalignment: CNNs designed for image recognition⁴³ are repurposed for non-visual planning problems, while natural language models process numerical safety metrics.

Group 3’s emergence, visualized in Fig. 3f, exemplifies the innovation imperative driving the field. Generative Adversarial Networks produce urban designs (17 applications) without engaging design theory, BERT analyzes planning documents while ignoring planning discourse analysis frameworks²⁵, and GPT models automate report generation absent planning communication principles. From 2021 to 2025, virtually every newly released AI algorithm finds immediate application in urban contexts, irrespective of theoretical fit or practical necessity—a pattern Rogers⁴² identifies as characteristic of hype-driven diffusion.

Simultaneously, Group 1 shows distinct patterns (Fig. 3d) with statistical and spatial analysis methods, while Group 0 maintains a steady presence (Fig. 3c), primarily in optimization tasks where mathematical foundations naturally align with planning objectives. These patterns, consistent with the latency characteristics shown in Table 4, demonstrate differential adoption dynamics across algorithm groups.

Bridging theory and practice: a computational mapping framework

To address the theory-practice disconnect identified in our analysis, we develop a computational framework that systematically maps urban theories to appropriate AI methods and data sources. Our approach transforms the question from “what can this algorithm do?” to “what theoretical principles should guide this urban intervention?” Through systematic analysis of 46 classical urban theories, we establish explicit connections between theoretical frameworks and computational approaches, providing a foundation for theory-driven AI development in urban contexts.

We employ NLP (Natural Language Processing) techniques to systematically identify and categorize 46 classical theories in urban planning and safety (see Supplementary Material 5, Table S6). Table 5 presents the most influential theories based on citation frequency and temporal persistence in the literature. Our computational linguistics framework extracts 127 distinct computational principles from the 46 urban theories (see Supplementary Material 5), revealing how classical theories inherently establish connections to modern AI methods. We select 11 representative theories spanning all four clusters for detailed mapping analysis against algorithms and data sources identified in the 1123 AI application papers (Fig. 4).

**Fig. 4: Theory-algorithm-data mapping framework (instance-level).**

Table 5 Most influential theories in urban planning and safety fields

The mapping demonstrates how urban theories provide natural computational frameworks (Fig. 4). Taking CPTED as an illustrative example^52,53, its “natural surveillance” principle translates to G1 spatial analysis for viewshed calculations, G2 machine learning for activity pattern recognition, and G0 optimization for sight-line configurations, with alignments formalized through logical consistency constraints (Supplementary Material 4, Table S5). CPTED’s “territorial reinforcement” principle further requires diverse data sources—environmental data for lighting, geographic data for layouts, image data for surveillance, and experimental data from field observations. Yet current implementations reveal systematic simplification^54,55: at the instance level, G2 methods comprise only 36% of 89 algorithm applications from 47 CPTED papers, though paper-level analysis shows 72% employ at least one G2 method (Table S11)—indicating widespread but shallow adoption. More critically, only 18% of papers integrate three or more data types, substantially below the theory-derived demand of 78.3% for multi-modal integration. Neighborhood Unit theory^56,57 reveals complementary patterns: while Perry’s framework emphasizes service accessibility requiring socio-demographic data, current applications predominantly employ G1 statistical analysis (42% of papers) with environmental data (35%), reflecting research priorities focused on environmental assessment rather than population-centered optimization.
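The distinction between instance-level and paper-level shares can be made concrete with a small example. The sketch below uses a hypothetical four-paper corpus, not the 47 CPTED papers; it only shows how the two measures are computed and why they can diverge.

```python
def g2_shares(papers):
    """Return (instance-level share, paper-level share) of G2 methods.

    `papers` is a list in which each element lists the algorithm-group
    label of every method applied in one paper.
    """
    instances = [group for paper in papers for group in paper]
    instance_share = sum(group == "G2" for group in instances) / len(instances)
    paper_share = sum("G2" in paper for paper in papers) / len(papers)
    return instance_share, paper_share


# Hypothetical corpus: each inner list is one paper's algorithm applications.
toy_papers = [
    ["G2", "G1", "G0"],
    ["G2", "G1"],
    ["G1", "G0"],
    ["G2", "G1", "G1"],
]

instance_share, paper_share = g2_shares(toy_papers)
print(f"instance-level: {instance_share:.0%}, paper-level: {paper_share:.0%}")
```

In this toy corpus, G2 accounts for 3 of 10 method applications (30%) yet appears in 3 of 4 papers (75%). A large gap of this kind arises when many papers include a single G2 method alongside several non-G2 methods, which is the pattern the text describes as widespread but shallow adoption.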

Case A: problem-driven research—urban food waste crisis

As illustrated in Fig. 5, the multi-agent system processes research inputs through three steps: theory identification, algorithm matching, and data source selection. We evaluate recommendations using five metrics (Table 6) that assess scenario complexity, theory-practice alignment, implementation feasibility, and solution robustness. Three representative cases demonstrate the system’s application across different research motivations: Case A (problem-driven) exemplifies the complete transformation process, while Cases B (method-driven) and C (technology-driven) highlight key variations. Importantly, these cases represent scenarios where preliminary assessment suggested AI exploration was warranted. The system does not claim universal AI applicability; many urban problems—particularly those centered on trust-building, political negotiation, or community empowerment—may require non-technical interventions outside our system’s scope.
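The sequential hand-off between agents can be sketched as a simple pipeline. The five agent names follow those used in the paper; everything inside each function is a hypothetical placeholder, since the deployed system relies on LLM calls and knowledge bases that are not described in enough detail here to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """State passed from one agent to the next (field names are illustrative)."""
    scenario: str
    dimensions: list = field(default_factory=list)
    theories: list = field(default_factory=list)
    algorithms: list = field(default_factory=list)
    data_sources: list = field(default_factory=list)
    validated: bool = False
    trace: list = field(default_factory=list)


def scenario_analyzer(rec):
    # Placeholder: a real agent would structure the problem description.
    rec.dimensions = ["stakeholders", "temporal dynamics", "spatial considerations"]
    return rec


def theory_retriever(rec):
    rec.theories = ["theory_placeholder"]  # hypothetical retrieval result
    return rec


def algorithm_matcher(rec):
    rec.algorithms = [f"method_for_{t}" for t in rec.theories]
    return rec


def data_source_selector(rec):
    rec.data_sources = [f"data_for_{a}" for a in rec.algorithms]
    return rec


def integration_validator(rec):
    # Placeholder check: every earlier stage must have produced output.
    rec.validated = all([rec.dimensions, rec.theories, rec.algorithms, rec.data_sources])
    return rec


PIPELINE = [scenario_analyzer, theory_retriever, algorithm_matcher,
            data_source_selector, integration_validator]


def run_pipeline(scenario):
    """Run the agents in order, recording which agent handled each step."""
    rec = Recommendation(scenario=scenario)
    for agent in PIPELINE:
        rec = agent(rec)
        rec.trace.append(agent.__name__)
    return rec


result = run_pipeline("urban food waste")
print(result.trace, result.validated)
```

The ordering encodes the theory-first principle: algorithms are derived from the retrieved theories, and data sources from the chosen algorithms, so no method can enter the recommendation without a theoretical justification upstream.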

**Fig. 5: Multi-agent recommendation system architecture for urban scenario analysis (implemented as a publicly accessible web application at https://llm-based-multi-agent-recommendation-system-eljs4zovvgefco8gzj.streamlit.app/).**

Table 6 Key metrics for theory-driven recommendations

A major US city faces the paradox of discarding 10,000 tons of edible food annually⁵⁸ while significant populations experience food insecurity. The original approach employs continuous approximation methods to optimize vehicle routing, yet focuses solely on minimizing transportation costs. The Scenario Analyzer Agent first decomposes the food waste crisis into seven structured dimensions. This decomposition reveals multiple stakeholders (donors, recovery organizations, vulnerable populations), temporal dynamics, and spatial considerations. The resulting complexity score ξ(S) = 0.82 indicates a multi-faceted challenge requiring integrated solutions. Subsequently, the Theory Retriever Agent identifies Urban Metabolism Theory^59,60 (σ = 0.91), which conceptualizes food waste as systemic flow disruption, alongside Environmental Justice Theory⁶¹ (σ = 0.87), which reveals inequitable access patterns. Together, these complementary frameworks guide the design of equitable resource circulation.

Building on these theoretical foundations, the Algorithm Matcher Agent maps requirements to computational methods: the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D)⁶² for efficiently balancing multiple competing objectives through problem decomposition, Graph Neural Networks (GNN)⁶³ for spatial equity analysis, and Temporal LSTM⁶⁴ for demand prediction. The integrated framework achieves a capability score cap(A, r) = 0.89. In parallel, the Data Source Selector Agent prioritizes social vulnerability indices⁶⁵ (Q = 0.92), followed by food supply data (Q = 0.85), environmental data (Q = 0.81), and transportation networks (Q = 0.78). This expansion beyond routing data therefore encompasses equity-relevant sources critical to addressing food insecurity. Finally, the Integration Validator Agent achieves a robustness score $R = 0.91 > R_{\min} = 0.85$ through Monte Carlo simulations. Moreover, expert review confirms effective integration of efficiency and equity objectives, particularly highlighting the vulnerability-weighted distribution mechanism that ensures food reaches high-need populations. Table 7 summarizes this transformation from technical optimization to systemic solution.
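The text does not define how the Monte Carlo robustness score is computed, so the following sketch adopts one plausible reading as an explicit assumption: robustness is the fraction of random perturbations of the data quality scores under which the top-ranked data source stays the same. The Q values and the threshold of 0.85 come from the text; the noise model, trial count, and function names are hypothetical.

```python
import random

# Data quality scores reported for Case A.
Q_SCORES = {
    "social vulnerability indices": 0.92,
    "food supply data": 0.85,
    "environmental data": 0.81,
    "transportation networks": 0.78,
}
R_MIN = 0.85  # minimum acceptable robustness, from the text


def top_source(scores):
    """Name of the highest-scoring data source."""
    return max(scores, key=scores.get)


def ranking_robustness(scores, noise_sd, trials=2000, seed=0):
    """Share of noisy trials in which the top-ranked source is unchanged.

    Gaussian noise with standard deviation `noise_sd` is an assumed
    perturbation model, not one specified by the paper.
    """
    rng = random.Random(seed)
    baseline = top_source(scores)
    stable = 0
    for _ in range(trials):
        perturbed = {k: v + rng.gauss(0.0, noise_sd) for k, v in scores.items()}
        if top_source(perturbed) == baseline:
            stable += 1
    return stable / trials


robustness = ranking_robustness(Q_SCORES, noise_sd=0.03)
verdict = "passes" if robustness > R_MIN else "fails"
print(f"R = {robustness:.2f}; {verdict} threshold R_min = {R_MIN}")
```

Under this reading, the score depends heavily on the assumed noise level relative to the gap between the first- and second-ranked sources (here 0.07), so any reported R is only interpretable alongside the perturbation model that produced it.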

Table 7 From technical optimization to systemic solutions in urban food waste management

Case B: method-driven research—urban heat island prediction

Researchers addressing limitations in Urban Heat Island (UHI) prediction initially focus on improving accuracy from R² < 0.8 to R² = 0.95 using new stereoscopic urban morphology metrics. Recent studies have demonstrated the effectiveness of 3D urban morphological indicators⁶⁶ and building volume information⁶⁷ combined with XGBoost models, showing superior performance in predicting land surface temperature⁶⁸ and analyzing urban heat island drivers⁶⁹. Our system identifies two relevant theoretical frameworks to expand this approach. Urban Climate Theory⁷⁰ (σ = 0.93) provides a foundational understanding of heat island formation⁷¹, while Compact City Theory⁷² (σ = 0.82) reveals that compact cities, despite their sustainability goals, face environmental quality and heat stress challenges. This theoretical grounding leads to algorithmic integration. Physics-Informed Neural Networks (PINN) embed thermodynamic constraints by incorporating physical laws into neural network training⁷³, making them suitable for heat transfer problems⁷⁴ and scientific machine learning applications⁷⁵. Spatial-GCN captures neighborhood heat interactions through graph-based spatial-temporal modeling^76,77, while SHAP⁷⁸ provides interpretability through game-theoretic explanations.

Data sources expand significantly beyond basic morphology and meteorology to include social vulnerability data, recognizing that heat exposure disproportionately affects vulnerable populations^79,80, as well as dynamic urban factors. This transformation elevates a narrow focus on accuracy into a comprehensive planning tool that generates vulnerability maps and actionable design guidelines, shifting the focus from technical performance to practical urban interventions (see Supplementary Material 6, Table S7).

Case C: technology-driven research—AI applications in disaster management

Our system reframes a GRU-CNN architecture for urban applications by identifying Urban Resilience Theory (σ = 0.94) as the primary framework^81,82. This theory emphasizes systemic reconfiguration and collective agency, principles that guide our multi-method integration. Based on this framework, the system recommends augmenting GRU-CNN with three complementary approaches. Agent-based modeling (ABM) enables population response simulation⁸³, Network Analysis captures infrastructure interdependencies⁸⁴, and Reinforcement Learning supports adaptive strategies⁸⁵. To support these integrated methods, data requirements expand from basic time-series and hazard maps to include infrastructure networks, social communication patterns, and historical event data. The GRU-CNN architecture itself has shown promising results in urban environmental monitoring⁸⁶.

Beyond technical enhancements, this transformation fundamentally redefines the technology’s role in urban contexts. The system shifts from a capability demonstration to a community-centered resilience platform. More importantly, residents become active participants in resilience-building rather than passive recipients of alerts. Accordingly, system functions extend to cascading failure prediction⁸⁷, evacuation modeling, and resource optimization. Table 8 illustrates this comprehensive transformation (see Supplementary Material 7, Table S8 and Supplementary Material 6, Table S7).

Table 8 From technology showcase to community resilience platform


Discussion

Only 1.16% of papers in urban planning and safety AI applications demonstrate explicit theoretical integration by citing established urban theories (see Fig. 4). This striking gap is not merely a citation oversight but reflects three deeply interconnected systemic barriers that perpetuate the theory-practice disconnect. First, cross-domain expertise remains critically scarce. Among 948 papers with identifiable author backgrounds, merely 11.7% of first authors possess knowledge spanning both AI/computing and urban domains. Furthermore, only 33.3% of research teams include members from multiple disciplines³⁵. This expertise gap creates research environments where algorithmic capabilities dominate problem framing, often at the expense of urban theory considerations. Second, and closely related, data availability increasingly dictates research priorities rather than theoretical importance.

Traditional Urban and Geospatial Data (Groups 0–1) dominate at 68.3%, while emerging data sources remain largely untapped at 19.2%, substantially below the theory-derived demand, where all 46 theories require at least one such source. This imbalance creates what we term “data opportunism”: a phenomenon where research questions arise from measurability rather than theoretical significance⁸⁸. In other words, readily accessible datasets determine which urban problems researchers address, rather than theories driving data collection. For instance, abundant sensor data shapes traffic optimization studies, while harder-to-measure social equity dimensions receive less attention despite their theoretical significance. Third, specialized research communities develop implicit domain expertise that remains invisible to cross-disciplinary evaluation. Domain expertise becomes highly context-dependent, acquired through sustained professional practice rather than formal articulation^89,90. Travel behavior researchers, for example, may apply decades of accumulated knowledge about built environment-transport relationships without explicitly citing spatial interaction theories. Such sophisticated theoretical understanding remains invisible to automated detection or evaluation by researchers from other disciplines⁹¹.

These three barriers do not operate in isolation; rather, they collectively create a self-reinforcing cycle that significantly impedes interdisciplinary communication. Knowledge structures across disciplines exhibit substantial heterogeneity, leading to ambiguity and misunderstandings when computer scientists collaborate with urban planners^92,93. When domain expertise operates implicitly, the theoretical foundations underlying apparently technical work remain obscured⁹⁴. Indeed, our team composition analysis reveals a counterintuitive pattern that quantifies this challenge. Pure urban planning teams score highest on theoretical engagement (0.65), while interdisciplinary teams achieve only moderate scores (0.39). This paradox demonstrates that theoretical knowledge does not automatically transfer across disciplinary boundaries; diverse teams require intentional strategies to bridge epistemological gaps. Combined, these factors reinforce one another: scarce interdisciplinary expertise, data-driven research design, and invisible theoretical communication collectively ensure that technical accessibility determines research trajectories rather than theoretical importance⁹⁵. As a result, the field exhibits technical sophistication while remaining theoretically impoverished, failing to engage with complex socio-spatial realities that urban theories have long addressed and ultimately limiting both real-world impact and broader relevance⁹⁶.

Building on these structural barriers, we now examine how they manifest in researchers’ motivations and methodological choices. Two interconnected patterns explain why 67.8% of articles pursue technology-driven research while only 1.3% address concrete urban problems (Table 2). First, data accessibility creates a self-reinforcing cycle that privileges certain research directions. Traditional and Geospatial Data dominate at 68.3% (exceeding theory-derived demand of 58.7%), while Urban-generated Data accounts for only 19.2% (Fig. 1b), as readily available datasets enable shorter research cycles^97,98. In contrast, emerging sources–social media, mobile trajectories, crowdsourced platforms–require institutional partnerships, ethical approvals, and extended timelines that exceed standard project cycles^99,100. Consequently, analytical frameworks favor static administrative data while dynamic user-generated data are less frequently incorporated.

Second, the 2020–2025 period intensified this technology-first orientation through unprecedented adoption speed. During these years, 60% of all applications emerged (Fig. 3b), with algorithms like CLIP and GPT-3 being developed and applied within the same year. Notably, algorithm selection criteria reveal misaligned priorities: performance (37.4%) and novelty (26.0%) drastically outweigh domain applicability (8.2%). This misalignment leads to methodological mismatches: CNNs designed for image recognition process non-visual planning data, while NLP models analyze numerical safety metrics. Moreover, Group 3 algorithms demonstrate mean latency of merely 13.2 years compared to 48.1 years for traditional methods (Table 4), reflecting a rush to adopt cutting-edge techniques regardless of their appropriateness. These combined patterns create problematic selection effects favoring researchers skilled in rapid implementation over those focused on urban problem-solving¹⁰¹. In practice, this means researchers optimize for publishability rather than urban impact. They pursue datasets already available rather than those theoretically necessary. They adopt algorithms based on recency rather than appropriateness. This self-reinforcing cycle ultimately produces a field rich in technical demonstrations but poor in genuine urban problem-solving, where technical accessibility determines research trajectories rather than theoretical importance.

Having diagnosed the problem and its causes, we now turn to evidence that theoretical integration can fundamentally transform research outcomes. Our case studies reveal three fundamental transformations when theoretical grounding guides AI research, demonstrating practical pathways to break the technology-push cycle. First, theory expands the scope of solutions from narrow optimization to systemic intervention. Case A’s food waste routing initially minimized costs ($\min \sum {c}_{ij}{x}_{ij}$), but it ignored food insecurity, a critical urban challenge. When Urban Metabolism and Environmental Justice theories were integrated, this narrow objective was reframed into a comprehensive food security framework. Specifically, data sources expanded from routes to vulnerability indices (2 → 4 categories), algorithms evolved from single-method to integrated frameworks (MOGA + GNN + LSTM), and metrics shifted from efficiency-only to multi-dimensional measures including equity and emissions^102,103. Second, theory transforms technical gains into actionable planning tools. Case B initially pursued accuracy enhancement (R²: 0.8 → 0.95) until Urban Climate Theory introduced physical constraints via PINN. When combined with Compact City and Sustainable Design theories, outcomes evolved from temperature predictions to vulnerability-aware planning guidelines with interpretable interventions; theory added not accuracy but actionability^104,105. Third, theory reverses the innovation direction from technology-push to need-pull. Case C’s GRU-CNN seeking applications became a community resilience platform through Urban Resilience Theory. This transformation involved architecture expansion (adding ABM, Network Analysis, RL), data diversification (2 → 4 types), and critically, residents shifting from alert recipients to resilience co-creators^106,107.
These transformations share a common pattern: theoretical grounding fundamentally redefines problems rather than merely improving solutions. Across all cases, robustness scores exceeded 0.85, confirming that theoretical frameworks enhance rather than constrain performance. More importantly, theory-driven approaches generated value beyond technical metrics–Case A addressed food equity alongside efficiency, Case B produced planning prescriptions rather than predictions, and Case C enabled community participation instead of passive monitoring. This paradigmatic shift demonstrates that integrating urban theories into AI development produces outcomes simultaneously more comprehensive, actionable, and aligned with genuine urban needs than purely technical approaches^88,108.

While our study provides comprehensive evidence of the theory-practice disconnect, several limitations warrant acknowledgment. First, our 1.16% finding captures only explicit theory citations; semantic density analysis reveals that 87.1% of high-density papers demonstrate domain knowledge invisible to citation-based detection, suggesting our measurement prioritizes transparency over capturing tacit expertise. Second, our coverage has geographical boundaries—the literature search may miss non-English publications, and the 46-theory knowledge base underrepresents Global South perspectives¹⁰⁹. Third, the recommendation system operates conditionally, assuming users have determined AI approaches warrant exploration. Many urban problems involving community trust or political negotiation may be ill-suited to technical optimization; built-in safeguards (σ < 0.70, R < 0.85) can signal such cases, but recommendations indicate how AI could be applied, not that it should be. Fourth, technical constraints remain: the system’s keyword-based classification of research objectives may inadequately capture nuanced analytical goals, BERT classifier accuracy (80.85%) suggests potential misclassifications, and case studies represent conceptual demonstrations rather than real-world implementations¹¹⁰.

Despite these limitations, our research makes three significant contributions that advance both theoretical understanding and practical implementation. Empirically, we quantify barriers to theoretical integration (1.16% explicit rate, 47.3% technology-driven research) across 1123 papers, establishing measurable baselines for tracking progress^3,111. Methodologically, our multi-agent system demonstrates how LLMs can facilitate interdisciplinary knowledge integration, addressing expertise gaps affecting 88.3% of research teams^112,113. Theoretically, we extract 127 computational principles from urban theories, revealing their inherent algorithmic compatibility and challenging assumptions about incompatibility between traditional wisdom and computational methods. Looking forward, future research should pursue three interconnected directions to build on these foundations. First, expand theoretical coverage by incorporating Global South planning theories, indigenous urban knowledge, and emerging post-pandemic frameworks^8,17 while integrating diverse research databases beyond traditional academic sources. Concretely, this could involve partnering with institutions like UN-Habitat to establish Global South theory databases and conducting systematic reviews in non-English journals. Second, validate real-world impact through comparative studies of theory-grounded versus purely technical AI applications, tracking implementation success, sustainability outcomes, and community acceptance¹¹⁴. Such validation studies could follow pilot projects over multi-year periods to assess long-term adoption and adaptation patterns. Third, foster institutional change by working with funding agencies, journals, and educational institutions to develop evaluation criteria and training programs that value theory-practice integration^110,115. 
This might include establishing new review standards that explicitly assess theoretical grounding, creating interdisciplinary PhD programs, and incentivizing cross-sector collaborations. As cities confront unprecedented sustainability challenges, these efforts can ensure AI amplifies rather than replaces the accumulated wisdom of how cities actually work, and how they might work better for all their inhabitants.

Methods

Our methodological framework integrates diagnostic analysis with solution development. We first construct dual knowledge bases of urban theories and AI research papers, then measure theoretical integration and classify research motivations to quantify the theory-practice disconnect. We then develop a theory-algorithm-data mapping framework and a multi-agent recommendation system that bridges theoretical principles with AI implementations. Three case studies demonstrate practical applications across distinct research motivation patterns. We detail each component below.

Data on traditional urban theories

We construct a comprehensive knowledge base of urban planning and safety theories through a streamlined extraction process (Fig. 6), building on recent advances in theory formalization^35,116. Theory selection integrates three complementary criteria (see Supplementary Material 5): foundational theories with over 500 citations, emerging theories with over 50 citations annually, and expert-identified essential theories, regardless of metrics¹¹⁷, ensuring comprehensive coverage across established and culturally diverse perspectives. Guided by these criteria, our corpus includes: (1) highly-cited papers from Web of Science (WoS), Scopus, and Google Scholar using queries (“urban planning” OR “urban safety”) AND (“theory” OR “framework”), (2) works by seminal authors (Jacobs, Lynch, Newman, etc.) from 1960 to 2024, (3) 15 standard urban planning textbooks, and (4) professional guidelines from organizations like the American Planning Association (see Supplementary Table 6).

**Fig. 6: Workflow for constructing the urban theory knowledge base.**

NLP implementation employs spaCy’s en_core_web_lg model (v3.4) for Named Entity Recognition to identify theory names through capitalized noun phrases preceded by theory-indicating terms and author-attributed concepts. Core principle extraction combines BERT embeddings (bert-base-uncased) with TF-IDF scoring, ranking sentences by combined metrics, and applying dependency parsing for grammatical completeness. A multi-label tagging system^118,119 preserves theoretical complexity by maintaining associations across spatial, social, safety, and economic dimensions with relevance weights (0–1 scale). Complete taxonomy with semantic keyword clusters for each dimension is detailed in Supplementary Material 1, Table S1. The PostgreSQL-based knowledge base with flexible JSON storage enables semantic similarity matching between urban challenges and relevant theories^120,121.

Data on urban research literature

To capture the current landscape of AI applications in urban planning and safety domains, we conduct a comprehensive literature search through the WoS and Scopus databases. Our search strategy employs two primary queries—“Urban Safety” AND “AI” and “Urban Planning” AND “AI”—without temporal restrictions to ensure comprehensive coverage of the field’s evolution. This unrestricted temporal coverage aligns with recent systematic reviews of urban AI applications^16,122,123, enabling us to trace the full trajectory of AI adoption in urban planning and safety contexts. The initial search retrieved 2797 papers, which underwent systematic screening to establish the final corpus of 1123 validated AI applications (Fig. 7).

**Fig. 7: Literature Screening Process.**

This keyword-based approach introduces systematic temporal bias that warrants acknowledgment. The term “AI” gained widespread use primarily after 2010, particularly following deep learning breakthroughs that catalyzed the adoption of AI terminology across disciplines^43,46. Consequently, our corpus excludes earlier algorithmic applications in urban research–including regression models, optimization techniques, spatial statistics, and early neural networks–that were not explicitly labeled as “AI”^44,45. Our analysis, therefore, captures the phenomenon of AI adoption as disciplinary framing rather than a comprehensive computational history, documenting when established methods became repackaged under AI terminology. This intentional scope limitation enables our core contribution: revealing how contemporary AI-labeled research exhibits a technology-first orientation regardless of algorithmic age.

A rigorous two-stage screening process combines human expertise with machine learning capabilities to evaluate the collected literature (Fig. 7). Domain experts first assess each paper’s relevance to genuine AI urban applications, after which a BERT-based classifier performs automated screening with NLP pipeline configurations detailed in Supplementary Material 2, Table S3. The BERT-large-uncased model was fine-tuned with learning rate 2e-5, batch size 16, training epochs 5, using AdamW optimizer with binary cross-entropy loss, achieving 80.85% accuracy (see Supplementary Material 8, Table S10) across 5-fold cross-validation (mean accuracy: 0.8085 ± 0.0467, precision: 0.7825 ± 0.0841, recall: 0.7188 ± 0.0470). Building on established methods for automated article classification in systematic reviews²⁴, our approach incorporates domain-specific adaptations^124,125. Recent advances in machine learning for systematic reviews have demonstrated that such hybrid approaches can significantly reduce workload while maintaining high accuracy^126,127.

Following the screening process, a novel dual LLM expert framework systematically analyzes the refined corpus (Fig. 8). Two independent LLMs work in parallel to extract AI algorithms and data sources from each paper, with human experts subsequently validating the outputs to ensure accuracy. Drawing inspiration from recent advances in generative information extraction¹²⁸ and LLM applications in computational social science¹²⁹, this dual-model architecture addresses single-model biases and enhances extraction reliability, both critical considerations highlighted in recent LLM security and privacy literature¹³⁰. Semi-automated validation processes balance automation efficiency with accuracy requirements, creating a robust extraction pipeline that ensures replicability (see Supplementary Material 8, Table S9).

**Fig. 8: Dual LLM expert analysis framework for urban safety AI research.**

Algorithm and data source grouping

To organize the extracted algorithms and data sources into interpretable categories, we implement a four-stage transformation pipeline (Table 9, Fig. 1). First, we aggregate contextual information by concatenating descriptive text from all papers mentioning each item; for instance, “Random Forest” contexts from 84 papers produced approximately 500 tokens describing ensemble learning principles and urban applications. Building on this textual foundation, we employ sentence-transformers (all-mpnet-base-v2)¹³¹ to encode all contexts into 768-dimensional embeddings. Critically, BERT¹³² operates on aggregated text rather than directly on LLM JSON outputs: LLMs extract structured metadata while BERT encodes semantic meanings for clustering. To enable visualization and clustering, we then reduce embeddings to 2D coordinates via t-distributed Stochastic Neighbor Embedding²⁷ using standard parameters (perplexity = 30, learning rate = 200, iterations = 1000, random state = 42). Finally, DBSCAN¹³³ with ε = 0.5 and min_samples = 5 identifies initial clusters, which three domain experts (representing urban planning, computer science, and data science perspectives) iteratively refine through four consensus rounds based on methodological coherence and domain interpretability. This hybrid process produces four algorithm groups (G0–G3) and six data source groups (G0–G5, Table 1). External validation involves two independent researchers classifying stratified samples, with detailed agreement metrics and cluster quality measures documented in Supplementary Material 9 alongside complete technical specifications.
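To make the density-based clustering step concrete, the following is a minimal pure-Python sketch of DBSCAN applied to 2D coordinates, using the stated ε = 0.5 and min_samples = 5. It is an illustration rather than the authors' code: the function names and the toy points are assumptions, and in practice a library implementation (for example scikit-learn's `DBSCAN`) would normally be applied to the t-SNE output.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps=0.5, min_samples=5):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE          # may later be re-labelled as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)   # j is also a core point
    return labels

# Hypothetical 2D coordinates standing in for t-SNE output.
toy_points = (
    [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.2, 0), (0, 0.2)]
    + [(5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.2, 5), (5, 5.2)]
    + [(10, 10)]
)
toy_labels = dbscan(toy_points)
```

In this toy input the two tight groups form separate clusters and the isolated point is labelled as noise, which is the kind of initial grouping the expert consensus rounds would then refine.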

Table 9 Seven-stage pipeline: from LLM extraction to validated grouping


Validation of theoretical integration measurement

Our explicit citation-based measurement prioritizes transparency and replicability, yet may not capture domain knowledge operating without explicit citations⁸⁹. To evaluate whether this approach underestimates implicit theoretical engagement, we implement a three-method validation protocol on a stratified subsample (n = 200). The sample oversamples papers with explicit citations (35 papers, 17.5% vs. 1.16% in the full corpus) to enable robust statistical comparison while maintaining diversity across algorithm types, publication years, and team compositions.

We apply three complementary approaches. First, LLM-based classification using GPT-4-turbo (temperature = 0.2) analyzes introduction sections to test whether automated semantic analysis can detect theoretical engagement beyond explicit citations, with accuracy assessed against expert coding using Cohen’s kappa. Second, team composition analysis quantifies disciplinary diversity using Shannon’s entropy ${H}_{{\text{norm}}}=-{\sum }_{i=1}^{k}{p}_{i}{\text{ln}}{(}{p}_{i}{)}/{\text{ln}}{(}k{)}$ and tests its correlation with theory-driven intensity scores (0–1 scale) assigned by an interdisciplinary expert panel (n = 3). Third, semantic theoretical density (STD) measures implicit engagement through proximity to an 847-term theoretical lexicon derived from our 46-theory knowledge base. Using sentence-transformers (all-mpnet-base-v2)¹³¹, we compute:

$${{\rm{STD}}}_{i}=\frac{1}{N}\mathop{\sum }\limits_{j=1}^{N}\mathop{\max }\limits_{k\in L}\left[\,{\rm{sim}}\,({s}_{ij},{t}_{k})\times {w}_{k}\right]$$

(1)

where s_ij are sentence embeddings, t_k are lexicon term embeddings, w_k are category weights (ranging from 1.0 for canonical theory constructs to 0.6 for causal mechanisms), and sim(⋅) is cosine similarity. Lexicon construction, hierarchical weighting scheme, and validation protocol are detailed in Supplementary Material S10.
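The two quantitative measures above, normalized Shannon entropy and semantic theoretical density (Eq. 1), can be sketched as follows. This is an illustrative sketch under assumptions: `normalized_entropy` takes k as the number of discipline categories supplied, and the 2-dimensional vectors are hypothetical stand-ins for the 768-dimensional sentence-transformer embeddings.

```python
import math

def normalized_entropy(counts):
    """H_norm = -sum(p_i * ln p_i) / ln k for team-member counts per discipline."""
    total = sum(counts)
    k = len(counts)                      # assumption: k = number of categories given
    if k < 2 or total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(k)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_theoretical_density(sentence_vecs, lexicon):
    """Eq. (1): mean over sentences of the best weighted similarity to any term.

    lexicon is a list of (term_vector, category_weight) pairs.
    """
    if not sentence_vecs:
        return 0.0
    best = [max(cosine(s, t) * w for t, w in lexicon) for s in sentence_vecs]
    return sum(best) / len(best)

# Hypothetical toy inputs: two sentences, two lexicon terms with weights 1.0 and 0.6.
toy_sentences = [(1.0, 0.0), (0.0, 1.0)]
toy_lexicon = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.6)]
toy_std = semantic_theoretical_density(toy_sentences, toy_lexicon)
```

A team split evenly across two disciplines gives H_norm = 1, while a single-discipline team gives 0; in the toy STD input each sentence matches exactly one term, so the result is the mean of the two category weights.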

Measurement convergence is assessed through: (1) pairwise correlations between explicit citation (binary), semantic density (continuous), and expert assessment (continuous); (2) Steiger’s Z-test comparing correlation strengths; and (3) cross-team consistency examining rank ordering across team types (pure CS, pure Planning, interdisciplinary). This protocol addresses the challenge of balancing transparent measurement with sensitivity to context-dependent expertise^134,135. All analyses are conducted in R 4.3.1 (α = 0.05, two-tailed), with code available in our public repository.

Research motivation classification

To uncover the driving forces behind urban AI research, we employ automated pattern matching on abstracts, titles, and introductions^136,137 to classify papers into four categories: (1) problem-driven: studies initiated by specific real-world crises (e.g., “urban flooding occurred”); (2) method-driven: research motivated by limitations of current solutions (e.g., “existing methods are insufficiently accurate”); (3) technology-driven: studies exploring novel technological applications (e.g., “deep learning can be applied to flood prediction”); and (4) general needs-driven: research addressing broad urban requirements without specific triggers. This taxonomy extends beyond traditional technology-driven versus problem-driven dichotomies¹³⁸, capturing nuanced distinctions between crisis-responsive and solution-improvement research.

Classification employs case-insensitive regex matching with contextual analysis (see Supplementary Material 3). Problem-driven patterns include crisis indicators (“occurred,” “emerged”) and temporal urgency markers; method-driven patterns capture performance critiques (“low accuracy,” “computationally expensive”) and comparative language; technology-driven patterns identify innovation phrases (“recent advances in,” “newly developed”) and technology-subject grammatical structures. Manual validation on 300 sampled papers achieved 85% inter-rater agreement (see Supplementary Material 3). For papers exhibiting multiple patterns, a semantic dominance hierarchy (Problem ≻ Method ≻ Technology ≻ General) determines classification based on priority zone analysis of titles and opening sentences. Papers without clear matches are assigned to “Unclear/Mixed” (147 papers, 13.1%) rather than forced into categories. Complete protocols appear in Supplementary Material 3. This framework aligns with recent advances in automated systematic review classification^139,140 and has been validated against manual coding using established text classification methods^141,142.
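The matching logic described above can be illustrated with a short sketch. It uses only the example phrases quoted in the text; the “general” pattern is a hypothetical placeholder, and the priority-zone analysis of titles and opening sentences is simplified here to a search over the whole input string. The full pattern sets are those in Supplementary Material 3, not the ones shown.

```python
import re

# Illustrative patterns only. The first three use phrases quoted in the text;
# the "general" pattern is a hypothetical placeholder.
PATTERNS = {
    "problem": re.compile(r"\b(occurred|emerged)\b", re.IGNORECASE),
    "method": re.compile(r"\b(low accuracy|computationally expensive)\b", re.IGNORECASE),
    "technology": re.compile(r"\b(recent advances in|newly developed)\b", re.IGNORECASE),
    "general": re.compile(r"\b(growing need|increasing demand)\b", re.IGNORECASE),
}

# Semantic dominance hierarchy: Problem > Method > Technology > General.
HIERARCHY = ("problem", "method", "technology", "general")

def classify_motivation(text):
    """Return the highest-priority category whose pattern matches the text."""
    for category in HIERARCHY:
        if PATTERNS[category].search(text):
            return category
    return "unclear/mixed"   # no clear match: do not force a category
```

For example, a sentence containing both “recent advances in” and “low accuracy” is assigned to the method-driven category, because Method dominates Technology in the hierarchy.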

Theory-algorithm-data mapping construction

We develop a multi-method mapping framework that systematically connects urban theories, AI algorithms, and data sources by combining computational linguistics with knowledge engineering approaches^143,144. Building on recent advances in domain-specific knowledge graph construction^145,146, our framework adapts these methods for urban planning contexts through three core processes. First, NLP techniques extract computational requirements from theoretical principles^147,148; for instance, CPTED’s “maintain clear sightlines” translates to “visibility analysis” requirements. Second, algorithm capability profiles emerge from multiple sources, leveraging LLM-based knowledge extraction¹⁴⁹ with expert validation. Third, probabilistic co-occurrence models¹⁵⁰ identify data requirements by detecting algorithm-data source mention patterns within urban planning literature, thereby revealing implicit theory-implementation relationships^151,152.

The Stanford CoreNLP pipeline processes theory texts using PTBTokenizer, statistical sentence splitter, maximum entropy POS tagger, rule-based lemmatizer, SR parser, and neural dependency parser. Algorithm capability assessment evaluates seven dimensions (see Supplementary Material 4, Table S4): spatial analysis, temporal analysis, pattern recognition, prediction, classification, optimization, and real-time processing (scores 0–1). Association rule mining employs the Apriori algorithm with a minimum support of 0.05, a minimum confidence of 0.7, a maximum itemset size of 4, a lift threshold of >1.2, and a window size of 5 sentences. Pointwise Mutual Information quantifies theory-algorithm co-occurrence: PMI(theory, algorithm) = log[P(theory, algorithm)/(P(theory) × P(algorithm))]. To ensure mapping quality, a three-stage validation protocol combines statistical measures, expert annotation, and logical consistency checks^153,154, while acknowledging both technical and social assessment dimensions inherent to urban planning’s interdisciplinary nature¹⁵⁵. Final compatibility scores combine PMI (30%), expert scores (50%), and consistency checks (20%) through weighted averaging based on expert consensus¹⁵⁶, creating a robust bridge between theoretical principles and practical AI implementations.
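As a worked illustration of the PMI formula and the weighted compatibility score, consider the sketch below. It makes several assumptions not stated in the text: probabilities are estimated from co-occurrence counts over a fixed number of text windows, the logarithm is natural, and PMI has already been rescaled to [0, 1] before weighting (the rescaling method is not specified here, so it is taken as an input).

```python
import math

def pmi(n_joint, n_theory, n_algorithm, n_windows):
    """PMI(theory, algorithm) = log[P(t, a) / (P(t) * P(a))], from window counts."""
    p_joint = n_joint / n_windows
    p_theory = n_theory / n_windows
    p_algorithm = n_algorithm / n_windows
    return math.log(p_joint / (p_theory * p_algorithm))

def compatibility(pmi_norm, expert, consistency):
    """Final score: PMI (30%), expert scores (50%), consistency checks (20%).

    All three inputs are assumed to lie in [0, 1].
    """
    return 0.30 * pmi_norm + 0.50 * expert + 0.20 * consistency

# Hypothetical counts: 100 windows, theory in 20, algorithm in 25, both in 10.
example_pmi = pmi(n_joint=10, n_theory=20, n_algorithm=25, n_windows=100)
example_score = compatibility(pmi_norm=0.6, expert=0.8, consistency=1.0)
```

With these counts the pair co-occurs twice as often as independence would predict, so PMI = ln 2 ≈ 0.69; the example compatibility score is 0.18 + 0.40 + 0.20 = 0.78.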

For empirical analysis of the resulting mappings, we employ two complementary statistical approaches: instance-level distribution analyzes the proportion of algorithm instances across categories (Fig. 4), providing insights into implementation patterns, while paper-level distribution quantifies the proportion of papers employing each algorithm or data type (Supplementary Material 11, Table S11), revealing adoption patterns across the research community. This dual-level analysis enables a comprehensive understanding of both technical implementation choices and research practice trends.

Theory-driven multi-agent recommendation system

To bridge the theory-practice gap identified in our analysis, we propose a multi-agent recommendation system comprising five specialized agents that collaborate via asynchronous coordination^157,158,159. Each agent contributes domain-specific expertise through directed information flow, generating theory-grounded recommendations.

In terms of scope and entry conditions, the system is designed for users who have preliminarily determined that AI-based approaches merit exploration. It does not assess whether AI is the most appropriate intervention, a determination requiring consideration of stakeholder preferences, institutional capacity, and the fundamentally social nature of many urban problems. Rather, the system transforms the question from “what can this algorithm do?” to “what theoretical principles should guide this intervention if AI is pursued?” Users encountering persistent low-similarity scores (σ < 0.70), capability gaps, or robustness failures (R < 0.85) should interpret these as signals warranting reconsideration of AI appropriateness.

Our design relies on three assumptions: (i) theoretical requirements exhibit sufficient independence for additive aggregation (∣ρ∣ < 0.35 for 94% of requirement pairs); (ii) expert knowledge can be reliably elicited (ICC > 0.80); and (iii) scenario perturbations follow bounded uniform distributions reflecting real-world planning uncertainties.

Scenario Analyzer Agent transforms unstructured challenge descriptions into structured representations across seven dimensions: domain, objectives, constraints, stakeholders, temporal scope, spatial boundaries, and data characteristics. Critically, the agent also extracts research objective type (prediction, explanation, optimization, or classification), which shapes algorithm matching independently of theoretical alignment. Scenario complexity is quantified as:

$$\xi (S)=\mathop{\sum }\limits_{i=1}^{7}{w}_{i}\cdot {c}_{i}(S)$$

(2)

where w_i are importance weights and c_i(S) ∈ [0, 1] are normalized measures. Table 10 provides specifications; complete operationalization with worked examples across three case studies appears in Supplementary Material 12, Table S12.
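Equation (2) is a weighted sum over the seven scenario dimensions, which the following sketch makes explicit. The equal default weights are a placeholder assumption for illustration only; the actual weights are those specified in Table 10, and the dimension keys are hypothetical identifiers for the seven dimensions named in the text.

```python
DIMENSIONS = (
    "domain", "objectives", "constraints", "stakeholders",
    "temporal_scope", "spatial_boundaries", "data_characteristics",
)

def scenario_complexity(measures, weights=None):
    """xi(S) = sum_i w_i * c_i(S), with each c_i(S) required to lie in [0, 1]."""
    if weights is None:
        # Placeholder assumption: equal importance for all seven dimensions.
        weights = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}
    for d in DIMENSIONS:
        if not 0.0 <= measures[d] <= 1.0:
            raise ValueError(f"measure for {d!r} must be in [0, 1]")
    return sum(weights[d] * measures[d] for d in DIMENSIONS)
```

A scenario whose normalized measures are all 0.5 scores ξ = 0.5 under equal weights; any out-of-range measure is rejected rather than silently clipped.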

Table 10 Scenario complexity scoring specifications


Theory Retriever Agent employs BERT-based semantic matching (bert-base-uncased) to identify applicable theories^131,132. Theory-scenario alignment is quantified as:

$$\sigma ({T}_{i},S)=\cos ({{\bf{e}}}_{{T}_{i}},{{\bf{e}}}_{S})$$

(3)

where ${{\bf{e}}}_{{T}_{i}},{{\bf{e}}}_{S}\in {{\mathbb{R}}}^{768}$ are embeddings from the BERT [CLS] token. Theories with σ > 0.70 are relevant; σ > 0.85 indicates strong alignment. For complex challenges (ξ > 0.7), the agent identifies complementary theory combinations^160,161.
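The retrieval rule in Eq. (3) and its two thresholds can be sketched as below. The embeddings are assumed to be precomputed; the 2-dimensional vectors and theory names in the example are hypothetical stand-ins for 768-dimensional BERT [CLS] embeddings of real theory descriptions.

```python
import math

RELEVANT_THRESHOLD = 0.70   # sigma > 0.70: theory is relevant
STRONG_THRESHOLD = 0.85     # sigma > 0.85: strong alignment

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_theories(scenario_vec, theory_vecs):
    """Return (name, sigma, label) for relevant theories, best match first."""
    results = []
    for name, vec in theory_vecs.items():
        sigma = cosine(vec, scenario_vec)
        if sigma > RELEVANT_THRESHOLD:
            label = "strong" if sigma > STRONG_THRESHOLD else "relevant"
            results.append((name, sigma, label))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical toy embeddings.
toy_ranking = rank_theories(
    (1.0, 0.0),
    {"Theory A": (1.0, 0.1), "Theory B": (1.0, 1.0), "Theory C": (0.0, 1.0)},
)
```

In the toy example, Theory A (σ ≈ 0.995) is strongly aligned, Theory B (σ ≈ 0.707) is merely relevant, and Theory C (σ = 0) is filtered out.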

Algorithm Matcher Agent formulates selection as constrained optimization^162,163. Given candidates ${\mathcal{A}}=\{{A}_{1},\ldots ,{A}_{m}\}$ and requirements ${\mathcal{R}}=\{{r}_{1},\ldots ,{r}_{n}\}$:

$${A}^{* }=\arg \mathop{\max }\limits_{A\in {\mathcal{A}}}\left\{\mathop{\sum }\limits_{i=1}^{n}{w}_{i}\cdot {\rm{cap}}(A,{r}_{i})-\lambda \cdot {\rm{cost}}(A)\right\}$$

(4)

subject to ∑w_i = 1, w_i≥0, and cap(A, r_i)≥0.70 for critical requirements. Here, λ = 0.15 balances cost-performance trade-offs, and cost(A) ∈ [0, 1] combines computational time (60%) and memory (40%). The capability function:

$${\rm{cap}}(A,{r}_{i})=0.5\cdot {{\rm{PMI}}}_{{\rm{norm}}}(A,{r}_{i})+0.3\cdot {\rm{Expert}}(A,{r}_{i})+0.2\cdot {\rm{Consist}}(A,{r}_{i})$$

(5)

where PMI_norm is normalized pointwise mutual information, Expert scores derive from 8 specialists (ICC = 0.84), and Consist enforces logical consistency with mandatory thresholds for specific semantic tags (Supplementary Material 4, Table S5)^164,165.
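Eqs. (4) and (5) amount to scoring each candidate and discarding those that fail the critical-requirement threshold. The sketch below shows one way to do this by exhaustive search over a small candidate set. The candidate names, their component scores, the requirement weights, and the choice of which requirement is critical are hypothetical, and exhaustive search is an assumption rather than the solver the authors used.

```python
from typing import Dict, Iterable, List, Optional, Tuple

LAMBDA = 0.15        # cost-performance trade-off (from the text)
CRITICAL_MIN = 0.70  # minimum capability on critical requirements (from the text)


def cap(pmi_norm: float, expert: float, consist: float) -> float:
    """Capability of one algorithm for one requirement, Eq. (5)."""
    return 0.5 * pmi_norm + 0.3 * expert + 0.2 * consist


def cost(time_norm: float, memory_norm: float) -> float:
    """Cost in [0, 1]: 60% computational time, 40% memory."""
    return 0.6 * time_norm + 0.4 * memory_norm


def select_algorithm(
    candidates: Dict[str, Tuple[List[float], float]],
    weights: List[float],
    critical: Iterable[int],
) -> Optional[Tuple[str, float]]:
    """Return (name, score) maximizing Eq. (4), or None if none is feasible."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")
    critical = list(critical)
    best: Optional[Tuple[str, float]] = None
    for name, (caps, c) in candidates.items():
        # Enforce cap(A, r_i) >= 0.70 on every critical requirement.
        if any(caps[i] < CRITICAL_MIN for i in critical):
            continue
        score = sum(w * k for w, k in zip(weights, caps)) - LAMBDA * c
        if best is None or score > best[1]:
            best = (name, score)
    return best


# ASSUMPTION: two requirements, the first one critical; all numbers illustrative.
weights = [0.6, 0.4]
candidates = {
    "model_a": ([cap(0.9, 0.8, 1.0), cap(0.6, 0.7, 0.8)], cost(0.9, 0.8)),
    "model_b": ([cap(0.8, 0.8, 0.8), cap(0.8, 0.7, 0.9)], cost(0.3, 0.2)),
    "model_c": ([cap(0.6, 0.8, 0.75), cap(1.0, 1.0, 1.0)], cost(0.0, 0.0)),
}
best = select_algorithm(candidates, weights, critical=[0])
print(best)
```

In this toy setup `model_c` would have the highest unconstrained score, but its capability on the critical requirement falls just below 0.70, so it is excluded and `model_b` is selected.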

Data Source Selector Agent evaluates sources using multiplicative aggregation^166,167:

$$Q({D}_{i})=\mathop{\prod }\limits_{j=1}^{6}{q}_{j}{({D}_{i})}^{{w}_{j}}$$

(6)

where weights are: relevance (w₁= 0.25), quality (w₂= 0.20), temporal coverage (w₃= 0.18), accessibility (w₄= 0.15), reliability (w₅= 0.12), and compatibility (w₆= 0.10). Critically, relevance (q₁) is computed dynamically via BERT semantic similarity between data source descriptions and scenario requirements—identical sources receive different scores across urban challenges (Supplementary Material 12, Table S13). Accessibility scores range from 1.0 (public datasets) to <0.5 (restricted sources). The system recommends data categories with documented precedent in urban research rather than hypothetical sources, with final selection requiring human validation.
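Because Eq. (6) is a weighted product, a source that scores zero on any single criterion receives Q = 0, however well it does elsewhere. The sketch below uses the six weights stated in the text; the dictionary keys and the example scores are hypothetical, and relevance is passed in as a plain number rather than computed from BERT similarity.

```python
import math
from typing import Dict

# Weights w_1..w_6 as stated in the text; they sum to 1.
WEIGHTS: Dict[str, float] = {
    "relevance": 0.25,
    "quality": 0.20,
    "temporal_coverage": 0.18,
    "accessibility": 0.15,
    "reliability": 0.12,
    "compatibility": 0.10,
}


def source_quality(q: Dict[str, float]) -> float:
    """Return Q(D_i) = prod_j q_j(D_i) ** w_j, Eq. (6)."""
    if set(q) != set(WEIGHTS):
        raise ValueError("expected exactly the six criteria in WEIGHTS")
    if any(not 0.0 <= v <= 1.0 for v in q.values()):
        raise ValueError("criterion scores must lie in [0, 1]")
    return math.prod(q[name] ** w for name, w in WEIGHTS.items())


# ASSUMPTION: illustrative scores for one source under two different scenarios.
# Only relevance changes, mirroring the scenario-dependent q_1 in the text.
base = {
    "quality": 0.8, "temporal_coverage": 0.8, "accessibility": 1.0,
    "reliability": 0.8, "compatibility": 0.8,
}
q_scenario_1 = source_quality({"relevance": 0.9, **base})
q_scenario_2 = source_quality({"relevance": 0.4, **base})
print(f"{q_scenario_1:.3f} vs {q_scenario_2:.3f}")
```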

Integration Validator Agent implements two-stage validation^168,169. Monte Carlo simulations (N = 1000) assess stability:

(7)

where τ = 0.70 is the performance threshold and ${R}_{\min }$= 0.85 the approval threshold. The performance function:

$${\rm{perf}}(S)=0.40\cdot {{\rm{Theory}}}_{{\rm{align}}}+0.30\cdot \min \{{{\rm{Alg}}}_{{\rm{cap}}},{{\rm{Data}}}_{{\rm{qual}}}\}+0.30\cdot {{\rm{Expert}}}_{{\rm{assess}}}$$

(8)

Each iteration applies bounded perturbations (±0.15) to 2–3 randomly selected dimensions; this bound represents the 90th percentile of variations observed in 200 real-world projects. Solutions exceeding ${R}_{\min }$ advance to human validation through uncertainty sampling¹⁷⁰. Agents collaborate via structured protocols: initial analysis (Scenario → All), theory query (Theory → Algorithm), data requirements (Algorithm → Data), validation (All → Validator), and conflict resolution through iterative refinement.

BERT was fine-tuned on 1143 papers for 5 epochs (learning rate 3 × 10⁻⁵). Expert assessments used 7-point Likert scales; disagreements (ICC < 0.70) triggered facilitated discussion. Monte Carlo simulations ran in Python 3.10 with NumPy 1.24 (seed = 42). Code and data are available in our public repository.
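The Monte Carlo stability check can be sketched with the standard library alone. The sketch rests on several assumptions that the text does not confirm: robustness is taken to be the fraction of perturbed runs whose performance (Eq. 8) reaches τ, perturbations are drawn uniformly from [−0.15, 0.15], the perturbed dimensions are the four inputs of the performance function, and perturbed values are clipped to [0, 1]. The function and variable names are hypothetical.

```python
import random
from typing import Dict

TAU = 0.70     # performance threshold (from the text)
R_MIN = 0.85   # approval threshold (from the text)
N_RUNS = 1000  # number of Monte Carlo runs (from the text)
DELTA = 0.15   # perturbation bound (from the text)


def perf(theory_align: float, alg_cap: float,
         data_qual: float, expert_assess: float) -> float:
    """Performance function, Eq. (8)."""
    return (0.40 * theory_align
            + 0.30 * min(alg_cap, data_qual)
            + 0.30 * expert_assess)


def robustness(base: Dict[str, float], n_runs: int = N_RUNS, seed: int = 42) -> float:
    """ASSUMED definition: share of perturbed runs with perf >= TAU."""
    rng = random.Random(seed)
    keys = list(base)
    passed = 0
    for _ in range(n_runs):
        trial = dict(base)
        # Perturb 2 or 3 randomly chosen dimensions, then clip to [0, 1].
        for key in rng.sample(keys, rng.choice((2, 3))):
            shifted = trial[key] + rng.uniform(-DELTA, DELTA)
            trial[key] = min(1.0, max(0.0, shifted))
        if perf(**trial) >= TAU:
            passed += 1
    return passed / n_runs


# ASSUMPTION: illustrative component scores for one candidate solution.
solution = {"theory_align": 0.85, "alg_cap": 0.80,
            "data_qual": 0.75, "expert_assess": 0.80}
r = robustness(solution)
print(f"robustness = {r:.3f}, advances to human validation: {r > R_MIN}")
```

Fixing the seed makes repeated runs reproducible, which matters when the approval decision depends on comparing the estimate with a hard threshold.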

Case study development

To demonstrate the practical application of our theory-driven recommendation system, we construct three representative case studies based on the distinct research motivation patterns identified in our corpus analysis: problem-driven, method-driven, and technology-driven. These cases are synthesized from archetypal research patterns observed across our analyzed corpus (n = 1123), capturing essential characteristics of each motivation type while maintaining generalizability². Case archetype development follows a systematic protocol to ensure representativeness: identifying recurring patterns for each motivation type, synthesizing composite scenarios incorporating common problem domains, reflecting typical algorithmic approaches, maintaining realistic constraints, and avoiding direct replication of published work. For each case, we simulate the system’s recommendation process by inputting the original research framing and generating theory-algorithm-data suggestions based on our established mapping framework. The transformations are evaluated across multiple dimensions, including theoretical grounding, solution comprehensiveness, and practical value generation¹²³. Rather than serving as empirical validations, these proof-of-concept demonstrations illustrate the system’s capability to identify relevant theoretical frameworks and transform technical solutions into theory-informed, multi-dimensional approaches that generate value beyond algorithmic performance metrics¹⁷¹. This approach aligns with recent methodological advances in design science research for urban informatics¹⁷² and theory-driven AI system evaluation (see Supplementary Material 6, Table S7).

Data availability

All data supporting the findings of this study are accessible through the publicly available LLM-based multi-agent recommendation system at: https://frp-try.com:63105/. This system enables researchers, urban planners, and policymakers to input specific urban challenges and receive integrated recommendations for relevant theories, appropriate AI methods, and suitable data sources. To address data availability concerns, the system displays accessibility ratings (0–1 scale) for each recommended data source, links to public portals where applicable, and usage frequency statistics derived from our corpus analysis. Reviewers can directly test the recommendation logic using pre-loaded scenarios (e.g., “urban heat vulnerability assessment”) to observe how theoretical requirements translate to accessible data recommendations. Additionally, the underlying corpus of 1123 validated papers was compiled from the publicly available Web of Science and Scopus databases using the search queries described in the Methods section. Please note that the system uses a locally issued HTTPS certificate; browsers may display a security warning on first access, which can be safely dismissed. A comprehensive video demonstration of the full pipeline is provided in Supplementary Material 13.

Code availability

The source code for the multi-agent recommendation system is publicly available through our dedicated server at: https://frp-try.com:63105/. This includes the BERT fine-tuning pipeline, the theory-algorithm-data mapping framework, and the Monte Carlo validation scripts used to generate the results reported in this study. To ensure stable long-term availability, the system has been migrated from the original Streamlit prototype to a dedicated server with Qwen as the LLM backbone. All computational analyses were implemented in Python 3.10 with NumPy 1.24, and specific package versions and configuration parameters are documented in the “Methods” section and Supplementary Materials 2 and 8 to facilitate full reproducibility.

References

Cugurullo, F. et al. The rise of AI urbanism in post-smart cities: a critical commentary on urban artificial intelligence. Urban Stud. 61, 1–18 (2023).
Son, T. H. et al. Algorithmic urban planning for smart and sustainable development: systematic review of the literature. Sustain. Cities Soc. 94, 104562 (2023).
Yigitcanlar, T., Agdas, D. & Degirmenci, K. Artificial intelligence in local governments: perceptions of city managers on prospects, constraints and choices. AI Soc. 38, 1335–1349 (2023).
Caprotti, F. et al. Why does urban artificial intelligence (AI) matter for urban studies? Developing research directions in urban AI research. Urban Geogr. 45, 883–894 (2024).
Musa, M., Rahman, T., Deb, N. & Rahman, P. Harnessing artificial intelligence for sustainable urban development: advancing the three zeros method through innovation and infrastructure. Sci. Rep. 15, 23673 (2025).
Palmini, O. & Cugurullo, F. Design culture for sustainable urban artificial intelligence: Bruno Latour and the search for a different AI urbanism. Ethics Inf. Technol. 26, 1–12 (2024).
Jacobs, J. The Death and Life of Great American Cities (Random House, 1961).
Bibri, S. E., Alexandre, A., Sharifi, A. & Krogstie, J. Environmentally sustainable smart cities and their converging AI, IoT, and big data technologies and solutions: an integrated approach to an extensive literature review. Energy Inform. 6, 9 (2023).
Batty, M. Artificial intelligence and smart cities. Environ. Plan. B Urban Anal. City Sci. 45, 3–6 (2018).
Hollander, J., Hahn, S. & Reed, M. The ethical concerns of artificial intelligence in urban planning. J. Am. Plan. Assoc. 90, 1–15 (2024).
Caprotti, F., Messier, L. & Wilson, J. Artificial intelligence adoption in urban planning governance: a systematic review of advancements in decision-making, and policy making. Landsc. Urban Plan. 259, 105346 (2024).
Chen, J., Zhang, F., Fan, Z. & Liu, Y. Urban visual intelligence: studying cities with artificial intelligence and street-level imagery. Ann. Am. Assoc. Geogr. 114, 876–897 (2024).
Cugurullo, F., Caprotti, F. & Cook, M. New stories of urban AI: exploring the artificial intelligence-city nexus beyond Frankenstein Urbanism. Urban Geogr. 45, 1025–1048 (2024).
Laurière, M., Perrin, S., Geist, M. & Pietquin, O. Learning mean field games: a survey. Nat. Mach. Intell. 4, 423–439 (2022).
Li, X., Wang, Q., Zhang, Y. & Chen, L. The fundamental issues and development trends of AI-driven transformations in urban transit and urban space. Sustain. Cities Soc. 101, 105890 (2025).
Lartey, D. & Law, K. M. Artificial intelligence adoption in urban planning governance: a systematic review of advancements in decision-making, and policy making. Landsc. Urban Plan. 259, 105346 (2025).
Yue, Y. et al. Shaping future sustainable cities with AI-powered urban informatics: toward human-AI symbiosis. Comput. Urban Sci. 5, 31 (2025).
Xu, Y. & Cugurullo, F. When AIs become oracles: generative artificial intelligence, anticipatory urban governance, and the future of cities. Cities 145, 104666 (2024).
Shin, M., Kim, J., van Opheusden, B. & Griffiths, T. L. Superhuman artificial intelligence can improve human decision-making by increasing novelty. Proc. Natl. Acad. Sci. USA 120, e2214840120 (2023).
Almulhim, A. & Cobbinah, P. Charting sustainable urban development through a systematic review of SDG11 research. Nat. Cities 1, 117–131 (2024).
Titley, M., Butchart, S., Jones, V., Whittingham, M. & Willis, S. Global inequities and political borders challenge nature conservation under climate change. Proc. Natl. Acad. Sci. USA 118, e2011204118 (2021).
Clarivate Analytics. Web of Science. https://www.webofknowledge.com (2024).
Elsevier. Scopus. https://www.scopus.com (2024).
Aum, S. & Choe, S. Srbert: automatic article classification model for systematic review using BERT. Syst. Rev. 10, 285 (2021).
Rogers, A., Kovaleva, O. & Rumshisky, A. A primer in BERTology: what we know about how BERT works. Trans. Assoc. Comput. Linguist. 8, 842–866 (2020).
Moradi, Z., Moradi, M. & Ziari, K. Comparative analysis of sustainable urban development: unraveling challenges and dimensions in different continents and utilizing AI with BERT model for articles classification. Int. Rev. Spat. Plan. Sustain. Dev. D Plan. Assess. 13, 230–256 (2025).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Linderman, G. C. & Steinerberger, S. Clustering with t-SNE, provably. SIAM J. Math. Data Sci. 1, 313–332 (2019).
Cai, T. T. & Ma, R. Theoretical foundations of t-SNE for visualizing high-dimensional clustered data. J. Mach. Learn. Res. 23, 1–54 (2022).
Koumetio Tekouabou, S. C., Diop, E. B., Azmi, R., Jaligot, R. & Chenal, J. Reviewing the application of machine learning methods to model urban form indicators in planning decision support systems: potential, issues and challenges. J. King Saud. Univ. Comput. Inf. Sci. 34, 5943–5967 (2022).
Pathak, A. R., Pandey, M. & Rautaray, S. Machine learning for spatial analyses in urban areas: a scoping review. Appl. Soft Comput. 108, 107440 (2021).
Koumetio Tekouabou, C. S. et al. Identifying and classifying urban data sources for machine learning-based sustainable urban planning and decision support systems development. Data 7, 170 (2022).
Djokić, V., Djordjević, A. & Milovanović, A. Big data and urban form: a systematic review. J. Big Data 12, 17 (2025).
Tu, T., Zhang, E. & Long, Y. Profile and theoretical advances in urban big data studies: a systematic review of 57 representative journals (2013–2023). Environment and Planning B: Urban Analytics and City Science. Advance online publication. https://doi.org/10.1177/23998083251346582 (2025).
Cook, M. & Karvonen, A. Urban planning and the knowledge politics of the smart city. Urban. Stud. 61, 370–382 (2024).
Sanchez, T. W. Planning on the verge of AI, or AI on the verge of planning. Urban Sci. 7, 70 (2023).
Yigitcanlar, T., Agdas, D. & Degirmenci, K. Artificial intelligence in local governments: Perceptions of city managers on prospects, constraints and choices. AI Soc. 38, 1135–1150 (2023).
Koumetio Tekouabou, S. C., Diop, E. B., Azmi, R., Jaligot, R. & Chenal, J. Artificial intelligence based methods for smart and sustainable urban planning: a systematic survey. Arch. Comput. Methods Eng. 30, 1421–1438 (2023).
Wang, Z. & Ren, F. Developing a decision support system for sustainable urban planning using machine learning-based scenario modeling. Sci. Rep. 15, 13210 (2025).
Xiang, X., Li, Q., Khan, S. & Khalaf, O. I. Urban water resource management for sustainable environment planning using artificial intelligence techniques. Environ. Impact Assess. Rev. 86, 106515 (2021).
Xin, R., Ai, T., Ding, L., Zhu, R. & Meng, L. Impact of the COVID-19 pandemic on urban human mobility–a multiscale geospatial network analysis using New York bike-sharing data. Cities 126, 103667 (2022).
Rogers, E. M. Diffusion of Innovations 5th edn (Free Press, 2003).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Batty, M. Urban Modelling: Algorithms, Calibrations, Predictions. No. 3 (Cambridge University Press, 1976).
Chapin, F. S. Urban Land Use Planning 2nd edn (eds Stuart, F. & Chapin Jr.) (University of Illinois Press, 1965).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, vol. 25, 1097–1105 (NIPS, 2012).
Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, vol. 33, 1877–1901 (NeurIPS, 2020).
Alom, M. Z. et al. The history began from AlexNet: a comprehensive survey on deep learning approaches. Preprint at https://doi.org/10.48550/arXiv.1803.01164 (2018).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Chen, T. & Guestrin, C. Xgboost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (ACM, 2016).
Jeffery, C. R. Crime Prevention Through Environmental Design (SAGE Publications, 1971).
Newman, O. Defensible Space: Crime Prevention Through Urban Design (Macmillan, 1972).
Chaturvedi, V. & de Vries, W. T. Machine learning algorithms for urban land use planning: a review. Urban Sci. 5, 68 (2021).
Schirmer, P. M. & Axhausen, K. W. Machine learning for spatial analyses in urban areas: a scoping review. Sustain. Cities Soc. 85, 104050 (2022).
Perry, C. A. The neighborhood unit: a scheme of arrangement for the family-life community. In Regional Plan of New York and Its Environs, vol. VII of Neighborhood and Community Planning, 2–140 (Regional Plan Association, 1929).
Lawhon, L. L. The neighborhood unit: physical design or physical determinism?. J. Plan. Hist. 8, 111–132 (2009).
United States Department of Agriculture. Food waste FAQs (2019). https://www.usda.gov/foodwaste/faqs. In the United States, food waste is estimated at between 30-40 percent of the food supply, corresponding to approximately 133 billion pounds and $161 billion worth of food in 2010.
Wolman, A. The metabolism of cities. Sci. Am. 213, 179–190 (1965).
Kennedy, C., Cuddihy, J. & Engel-Yan, J. The changing metabolism of cities. J. Ind. Ecol. 11, 43–59 (2007).
Mohai, P., Pellow, D. & Roberts, J. T. Environmental justice. Annu. Rev. Environ. Resour. 34, 405–430 (2009).
Zhang, Q. & Li, H. Moea/d: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evolut. Comput. 11, 712–731 (2007).
Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32, 4–24 (2020).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Flanagan, B. E., Gregory, E. W., Hallisey, E. J., Heitgerd, J. L. & Lewis, B. A social vulnerability index for disaster management. J. Homel. Secur. Emerg. Manag. 8, 3 (2011).
Liu, B., Guo, X. & Jiang, J. How urban morphology relates to the urban heat island effect: a multi-indicator study. Sustainability 15, 10787 (2023).
Azizi, A. et al. A data-driven approach for urban heat island predictions: rethinking the evaluation metrics and data preprocessing. Algorithms 17, 151 (2024).
Tanoori, G., Soltani, A. & Modiri, A. Machine learning for urban heat island (UHI) analysis: predicting land surface temperature (LST) in urban environments. Urban Clim. 56, 101978 (2024).
Huang, C. et al. Analysis of the impact mechanisms and driving factors of urban spatial morphology on urban heat islands. Sci. Rep. 15, 1–15 (2025).
Stewart, I. D. & Oke, T. R. Local climate zones for urban temperature studies. Bull. Am. Meteorol. Soc. 93, 1879–1900 (2012).
Parker, D. E. Urban heat island effects on estimates of observed climate change. Wiley Interdiscip. Rev. Clim. Change 1, 123–133 (2010).
Bibri, S. E. Compact city planning and development: emerging practices and strategies for achieving the goals of sustainability. Dev. Built Environ. 4, 100021 (2020).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
Cai, S., Wang, Z., Wang, S., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks for heat transfer problems. J. Heat. Transf. 143, 060801 (2021).
Cuomo, S. et al. Scientific machine learning through physics–informed neural networks: where we are and what’s next. J. Sci. Comput. 92, 88 (2022).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://doi.org/10.48550/arXiv.1609.02907 (2017).
Ai, T. & Yan, X. A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS J. Photogramm. Remote Sens. 150, 259–273 (2019).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).
Uejio, C. K. et al. Intra-urban societal vulnerability to extreme heat: the role of heat exposure and the built environment, socioeconomics, and neighborhood stability. Health Place 17, 498–507 (2011).
Harlan, S. L., Declet-Barreto, J. H., Stefanov, W. L. & Petitti, D. B. Neighborhood effects on heat deaths: social and environmental predictors of vulnerability in Maricopa County, Arizona. Environ. Health Perspect. 121, 197–204 (2013).
Ribeiro, P. & Jardim Gonçalves, L. A. Urban resilience: a conceptual framework. Sustain. Cities Soc. 39, 685–697 (2019).
Esposito, D. A ladder of urban resilience: an evolutionary framework for transformative governance of communities facing chronic crises. Sustainability 17, 6010 (2025).
Felsenstein, D. & Grinberger, A. Y. Dynamic agent based simulation of welfare effects of urban disasters. Comput. Environ. Urban Syst. 59, 129–141 (2017).
Schweikert, A., L’Her, G. & Deinert, M. Simple method for identifying interdependencies in service delivery in critical infrastructure networks. Appl. Netw. Sci. 6, 1–23 (2021).
Zheng, Y. et al. Spatial planning of urban communities via deep reinforcement learning. Nat. Comput. Sci. 3, 748–762 (2023).
Faraji, M., Nadi, S., Ghaffarpasand, O., Homayoni, S. & Downey, K. An integrated 3d cnn-gru deep learning method for short-term prediction of PM2.5 concentration in urban environment. Sci. Total Environ. 834, 155324 (2022).
Logan, T. M., Aven, T., Guikema, S. D. & Flage, R. Understanding cascading risks through real-world interdependent urban infrastructure. Reliab. Eng. Syst. Saf. 241, 109652 (2023).
Cugurullo, F. et al. The rise of ai urbanism in post-smart cities: a critical commentary on urban artificial intelligence. Urban Stud. 61, 197–216 (2024).
Li, L. & Zhao, N. Explicit and tacit knowledge have diverging urban growth patterns. npj Urban Sustain. 3, 1–6 (2023).
van Lankveld, W. et al. Understanding disciplinary perspectives: a framework to develop skills for interdisciplinary research collaborations of medical experts and engineers. BMC Med. Educ. 24, 1015 (2024).
Fenoglio, E. et al. Tacit knowledge elicitation process for Industry 4.0. Discov. Artif. Intell. 2, 6 (2022).
Yang, J., Jiang, Z., Cheng, K. & Wu, L. Disciplinary barriers need communication: a behavioral and fnirs study under group decision-making paradigm shift based on cabin design. Front. Neurosci. 19, 1594111 (2025).
Smith, P., Callagher, L. J., Hibbert, P., Krull, E. & Hosking, J. Developing interdisciplinary learning: Spanning disciplinary and organizational boundaries. J. Manag. Educ. 48, 384–418 (2024).
McCance, K. R. & Blanchard, M. Measuring the interdisciplinarity and collaboration perceptions of U.S. scientists, engineers, and educators. AERA Open 10, 23328584231218952 (2024).
Dietl, A.-K., Derksen, C., Keller, F. M. & Lippke, S. Interdisciplinary and interprofessional communication intervention: how psychological safety fosters communication and increases patient safety. Front. Psychol. 14, 1128740 (2023).
Wang, S. et al. Artificial intelligence adoption in urban planning governance: a systematic review of advancements in decision-making, and policy making. Landsc. Urban Plan. 259, 105346 (2025).
Callaghan, M., Lamb, W. F. & Minx, J. C. Systematic global stocktake of over 50,000 urban climate change studies. Nat. Cities 2, 1–12 (2025).
Malički, M., Jeroncic, A., Aalbersberg, I. J., Bouter, L. & Ter Riet, G. The present and future of peer review: ideas, interventions, and evidence. Proc. Natl. Acad. Sci. USA 121, e2401232121 (2024).
Chen, B. et al. Contrasting inequality in human exposure to greenspace between cities of global north and global south. Nat. Commun. 13, 4636 (2022).
Morewedge, C. K., Jost, C. E., Herzenstein, M. & Park, J. People see more of their biases in algorithms. Proc. Natl. Acad. Sci. USA 121, e2317602121 (2024).
Bai, X., Nagendra, H., Shi, P. & Liu, H. Integration of urban science and urban climate adaptation research: opportunities to advance climate action. npj Urban Sustain. 3, 32 (2023).
Shrivastava, M. et al. Urban pollution greatly enhances formation of natural aerosols over the amazon rainforest. Nat. Commun. 10, 1046 (2019).
Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 114, 3521–3526 (2017).
Kashinath, K. et al. Physics-informed machine learning: case studies for weather and climate modelling. Philos. Trans. R. Soc. A 379, 20200093 (2021).
Bi, K. et al. Accurate medium-range global weather forecasting with 3d neural networks. Nature 619, 533–538 (2023).
Awad, E. et al. The moral machine experiment. Nature 563, 59–64 (2018).
Brown, C. F. et al. Dynamic world, near real-time global 10 m land use land cover mapping. Sci. Data 9, 251 (2022).
Rahwan, I. et al. Machine behaviour. Nature 568, 477–486 (2019).
Arun, C. Ai and the global south: designing for other worlds 3, 1–16 (2019).
Peng, Z.-R., Lu, K.-F., Liu, Y. & Zhai, W. The pathway of urban planning AI: from planning support to plan-making. J. Plan. Educ. Res. 44, 2285–2302 (2024).
Palermo, P. C. & Ponzini, D. Whatever is happening to urban planning and urban design? musings on the current gap between theory and practice. City, Territ. Archit. 1, 1–16 (2014).
Chen, W., Zhao, L., Kang, Q. & Di, F. Systematizing heterogeneous expert knowledge, scenarios and goals via a goal-reasoning artificial intelligence agent for democratic urban land use planning. Cities 101, 102703 (2020).
Gao, C. et al. Large language models empowered agent-based modeling and simulation: a survey and perspectives. Humanit. Soc. Sci. Commun. 11, 1–30 (2024).
Wang, J. et al. Large language models as urban residents: an LLM agent framework for personal mobility generation. In Proc. 38th Int. Conf. Neural Information Processing Systems Vol. 3957, 28 (NeurIPS, 2024).
Luusua, A., Ylipulli, J., Foth, M. & Aurigi, A. Urban ai: understanding the emerging role of artificial intelligence in smart cities. AI Soc. 37, 1335–1344 (2022).
Pries, J. Spatial theory in planning practice? on the concepts of space that made urban design a planning solution for segregation in malmö, Sweden. Antipode 56, 1024–1044 (2024).
Sanchez, T. W., Shumway, H., Gordner, T. & Lim, T. The prospects of artificial intelligence in urban planning. Int. J. Urban Sci. 27, 179–194 (2023).
Ouma, Y. O. et al. Urban land-use classification using machine learning classifiers: comparative evaluation and post-classification multi-feature fusion approach. Eur. J. Remote Sens. 56, 2173659 (2023).
Yan, X. et al. A multimodal data fusion model for accurate and interpretable urban land use mapping with uncertainty analysis. Int. J. Appl. Earth Observ. Geoinf. 129, 103805 (2024).
Yang, L. & Zhou, G. Dissecting the Analects: an NLP-based exploration of semantic similarities and differences across English translations. Humanit. Soc. Sci. Commun. 11, 50 (2024).
Tyagi, N. & Bhushan, B. Demystifying the role of natural language processing (NLP) in smart city applications: background, motivation, recent advances, and future research directions. Wireless Pers. Commun. 130, 857–908 (2023).
Abid, N. et al. Algorithmic urban planning for smart and sustainable development: systematic review of the literature. Sustain. Cities Soc. 94, 104562 (2023).
Peng, Z.-R., Lu, K.-F., Liu, Y. & Zhai, W. The Pathway of Urban Planning AI: From Planning Support to Plan-Making. J. Plan. Educ. Res. 44, 2263–2279 (2024).
Khadhraoui, M., Bellaaj, H., Ammar, M. B., Hamam, H. & Jmaiel, M. Survey of bert-base models for scientific text classification: Covid-19 case study. Appl. Sci. 12, 2891 (2022).
Li, X. & Jia, L. English text topic classification using BERT-based model. J. Comput. Methods Sci. Eng. 25, 669–684 (2025).
Bannach-Brown, A. et al. Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Syst. Rev. 8, 23 (2019).
Tóth, B., Berek, L., Gulácsi, L., Péntek, M. & Zrubka, Z. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst. Rev. 13, 174 (2024).
Xu, D. et al. Large language models for generative information extraction: a survey. Front. Comput. Sci. 18, 186357 (2024).
Thapa, S. et al. Large language models (LLM) in computational social science: prospects, current state, and challenges. Soc. Netw. Anal. Min. 15, 4 (2025).
Yao, Y. et al. A survey on large language model (LLM) security and privacy: the good, the bad, and the ugly. High.-Confid. Comput. 4, 100211 (2024).
Reimers, N. & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 3982–3992 (2019).
Kumar, A., Singh, J. P., Kumar, N. P. et al. BERT applications in natural language processing: a review. Artif. Intell. Rev. 58, 1–90 (2025).
Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 226–231 (AAAI Press, 1996).
Prager, E. M. et al. Improving transparency and scientific rigor in academic publishing. Cancer Rep. 2, e1150 (2019).
Tobi, H. & Kampen, J. K. Research design: the methodology for interdisciplinary research framework. Qual. Quant. 52, 1209–1225 (2017).
Gasparetto, A., Marcuzzo, M., Zangari, A. & Albarelli, A. A survey on text classification: from traditional to deep learning. Information 13, 83 (2022).
Li, Q. et al. A survey on text classification: From traditional to deep learning. ACM Trans. Intell. Syst. Technol. 13, 1–41 (2022).
Petit, N. & Teece, D. J. Innovating big tech firms and competition policy: favoring dynamic over static competition. Ind. Corp. Change 30, 1168–1198 (2021).
van den Bulk, L. M. et al. Automatic classification of literature in systematic reviews on food safety using machine learning. Curr. Res. Food Sci. 5, 84–95 (2022).
Lin, L., Zhou, D., Wang, J. & Wang, Y. A systematic review of big data driven education evaluation. SAGE Open 14, 1–18 (2024).
Saputra, N. A., Riza, L. S., Setiawan, A. & Hamidah, I. A systematic review for classification and selection of deep learning methods. Digit. Signal Process. 146, 104393 (2024).
Minaee, S. et al. Deep learning–based text classification: a comprehensive review. ACM Comput. Surv. 54, 1–40 (2021).
Schneider, P. et al. A decade of knowledge graphs in natural language processing: a survey. In Proc. 2nd Conf. Asia-Pacific Chapter Assoc. Comput. Linguistics and 12th Int. Joint Conf. Natural Language Processing, vol. 1, Long Papers, 601–614 (2022).
Pan, J. et al. Large language models and knowledge graphs: opportunities and challenges. Trans. Graph Data Knowl. 1, 2–1238 (2023).
Xuefeng, B. et al. Construction of a knowledge graph for framework material enabled by large language models and its application. npj Comput. Mater. 11, 217 (2025).
Venugopal, V. & Olivetti, E. Matkg: an autonomously generated knowledge graph in material science. Sci. Data 11, 217 (2024).
Mondal, I., Hou, Y. & Jochim, C. End-to-end construction of NLP knowledge graph. In Findings of the Association for Computational Linguistics: ACL-IJCNLP, vol. 2021, 1885–1895 (2021).
Zhong, L., Wu, J., Li, Q., Peng, H. & Wu, X. A comprehensive survey on automatic knowledge graph construction. ACM Comput. Surv. 56, 1–62 (2023).
Zhu, Y. et al. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web 27, 58 (2024).
Zhou, X., Zhou, M., Huang, D. & Cui, L. A probabilistic model for co-occurrence analysis in bibliometrics. J. Biomed. Inform. 128, 104047 (2022).
Yuan, C., Li, G., Kamarthi, S., Jin, X. & Moghaddam, M. Trends in intelligent manufacturing research: a keyword co-occurrence network based review. J. Intell. Manuf. 33, 425–439 (2022).
Wang, Y. et al. Exploring academic influence of algorithms by co-occurrence network based on full-text of academic papers. Aslib J. Inf. Manag. 77, 651–680 (2025).
Cook, D., Brydges, R., Ginsburg, S. & Hatala, R. Validation of educational assessments: a primer for simulation and beyond. Adv. Simul. 8, 27 (2023).
Liang, X. & Zhang, Y. A validity framework for accountability: educational measurement and language testing. Lang. Test. Asia 12, 3 (2022).
van Haastrecht, M. et al. Vast: a practical validation framework for e-assessment solutions. Inf. Syst. e-Bus. Manag. 21, 1–32 (2023).
Zhang, S. et al. Development and validation of an instrument for assessing scientific literacy from junior to senior high school. Discip. Interdiscip. Sci. Educ. Res. 5, 21 (2023).
Qian, K., Chen, Z., Jiao, H., Li, N. et al. AI agent as urban planner: steering stakeholder dynamics in urban planning via consensus-based multi-agent reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.2310.16772 (2023).
Zhou, Z., Lin, Y., Jin, D. & Li, Y. Large language model for participatory urban planning. Preprint at https://doi.org/10.48550/arXiv.2402.17161 (2024).
Budennyy, S. A., Voskresenskiy, A. V., Shichkin, A. V., Bekhtin, Y. et al. LLM agents for smart city management: enhancing decision support through multi-agent AI systems. Smart Cities 8, 19 (2025).
Wang, J., Huang, J. X. & Sheng, J. An efficient long-text semantic retrieval approach via utilizing presentation learning on short-text. Complex Intell. Syst. 10, 963–979 (2024).
Zhang, F., Khomiakov, M., Zhou, J., Noyman, A. & Duarte, F. Generative spatial artificial intelligence for sustainable smart cities: a pioneering large flow model for urban digital twin. Sustain. Cities Soc. 121, 106043 (2025).
Bottou, L., Curtis, F. E. & Nocedal, J. Optimization methods for large-scale machine learning. SIAM Rev. 60, 223–311 (2018).
Talbi, E.-G., Basseur, M., Nebro, A. J. & Alba, E. Hybrid approaches to optimization and machine learning methods: a systematic literature review. Mach. Learn. 113, 4055–4118 (2024).
Yu, B., Yin, H. & Zhu, Z. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proc. 27th Int. Joint Conf. Artificial Intelligence (IJCAI-18), 3634–3640 (2018).
Ma, W., Chu, Z., Chen, H. & Li, M. Spatio-temporal envolutional graph neural network for traffic flow prediction in UAV-based urban traffic monitoring system. Sci. Rep. 15, 1234 (2025).
Kahn, B. K., Strong, D. M. & Wang, R. Y. Information quality benchmarks: product and service performance. Commun. ACM 45, 184–192 (2002).
Wang, R. Y. & Strong, D. M. Beyond accuracy: what data quality means to data consumers. J. Manag. Inf. Syst. 12, 5–33 (1996).
Shashaani, S., Ng, S. H. & Eckman, D. Robust output analysis with Monte-Carlo methodology. Preprint at https://doi.org/10.48550/arXiv.2207.13612 (2023).
El-Horbaty, Y. S. & Hanafy, E. M. A Monte Carlo permutation procedure for testing variance components using robust estimation methods. Stat. Pap. 65, 335–356 (2024).
Kozlova, M. & Yeomans, J. S. Extending system dynamics modeling using simulation decomposition to improve the urban planning process. Front. Sustain. Cities 5, 1129316 (2023).
Choi, H. S. & Zhang, W. Artificial intelligence as research methods in urban design. J. Urban Des. 29, 182–203 (2024).
He, W. & Chen, M. Advancing urban life: a systematic review of emerging technologies and artificial intelligence in urban design and planning. Buildings 14, 835 (2024).


Acknowledgements

This research was supported by the Basic Research Program of Jiangsu (Grant No. BK20241815) and the Xi’an Jiaotong-Liverpool University Research Development Fund (Grant No. RDF-23-02-004). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Biosciences and Bioinformatics, Suzhou Municipal Key Lab AI4Health, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou, China
Jiawei Tong, Shuihua Wang & John Moraros
Institute of Systems, Molecular & Integrative Biology, University of Liverpool, Liverpool, UK
Jiawei Tong
Department of Computer Science, Data Science, and Engineering, New York University Shanghai, Pudong, China
Guangyu Wang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China
Yan Wang

Authors

Jiawei Tong
Shuihua Wang
Guangyu Wang
Yan Wang
John Moraros

Contributions

J.T. conceived and designed the study, conducted the literature review and data collection, developed the theoretical framework, designed and implemented the multi-agent recommendation system, performed all data analysis and technical validation, developed the case studies, and wrote the manuscript. S.W. provided supervision and guidance on urban theory integration, and contributed to manuscript revision. G.W. assisted with technical implementation and validation. Y.W. created some of the visual illustrations and graphics. J.M. provided research supervision and theoretical guidance, and contributed to manuscript revision. All authors participated in manuscript review and revision, and have read and approved the final manuscript.

Corresponding authors

Correspondence to Shuihua Wang or John Moraros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Tong, J., Wang, S., Wang, G. et al. Bridging urban theory and artificial intelligence: a multi-agent recommendation system for sustainable city development. npj Urban Sustain 6, 77 (2026). https://doi.org/10.1038/s42949-026-00377-2


Received: 17 July 2025
Accepted: 08 March 2026
Published: 23 March 2026
Version of record: 14 May 2026
DOI: https://doi.org/10.1038/s42949-026-00377-2
