Introduction

The interdisciplinary field of disaster resilience modeling, which generally aims to quantify the (uncertain) ability of a system to withstand, recover from, and adapt to stressors such as natural hazards, has seen exponential growth in scholarly work, framework and tool development, and even pilot-scale deployment to facilitate community-driven inquiries1,2,3,4,5,6. Narrowing the focus to assessing the resilience of an infrastructure system, a range of input models or algorithms are typically derived and coupled to simulate: (1) the hazards under current or future climate conditions, (2) the vulnerability or fragility of the exposed built environment, (3) the functionality, serviceability, or alternative measure of the system’s ability to perform its role, (4) the evolution of this performance over time as restoration and recovery processes unfold, and (5) the uncertainty propagation across the multiple scales considered when quantifying resilience and evaluating interventions (see refs. 2,5,7,8,9). The chained structure of these models (presented in Fig. 1) conveniently allows communication between different scientific disciplines to achieve common resilience modeling objectives, separating the simulation of processes by expert competence, and enabling what-if scenario exploration. For example: How would expected annual storm disruptions to power availability change if selected transmission towers were retrofitted? What level of hazardous material spill risk could be avoided if a major flood defense system were built around an industrial complex? Will equitable restoration of transportation access to critical facilities by vulnerable populations improve with prepositioned resources or advance recovery contracts?

Fig. 1: Components of current resilience modeling, with illustration of post-event system-level outcomes over time and social equity checks.

a Models for external stressors such as shock events (e.g., seismic or flood hazards) and progressive stressors (e.g., deterioration). b Infrastructure models used to represent physical components’ location, characteristics, and fragility conditioned on different stressors. The infrastructure systems’ service capacity and demand models may need to consider interdependencies. c Performance, response, or damage models to assess the condition of the exposed systems and their ability to perform their functions. d Recovery models to estimate the evolution of infrastructure condition and serviceability during recovery actions. The models in (d) may also consider cascading effects through the existing interdependencies. e Aggregated and disaggregated community resilience quantification, commonly obtained as a static metric from time-dependent post-event predicted functionality curves. Spatial visualization of communities’ resilience can be used to inspect disaster equity-related quantities of interest. Uncertainty propagation is considered across the different models in (a–d). For the sake of simplicity, the probabilistic nature is only depicted at the level of functionality curves in (e).

While many important gaps and questions still exist with respect to this infrastructure resilience modeling paradigm, recent research trends suggest that we will slowly chip away at some of these needs. For example, developing missing models for understudied hazards or infrastructures10,11,12,13; representing infrastructure systems as the socio-technical systems that they truly are when modeling their performance over time14,15,16,17,18; tightening model coupling across natural-built-socio-economic systems when evaluating broader system dynamics and metrics of resilience19,20,21,22,23,24; or integrating alternative decision algorithms with these resilience models to support interventions at different stages of the disaster life-cycle4,25,26,27,28. However, future cities demand more from our infrastructure resilience algorithms and models, posing heightened challenges and opportunities. First, real-world infrastructure and the demands placed on it present uncertain and time-evolving conditions. Not only are the chronic stressors and acute hazards changing and uncertain (e.g., due to climate change), but so are other conditions affecting infrastructure resilience, such as aging and deterioration, changing technology, or shifting societal demands and user requirements29,30,31,32. Thus, despite the expected improvements in modeling external stressors and the performance of cities’ infrastructure and communities, resilience models still generally lack the necessary agility: while they may predict temporally evolving system performance, the models themselves tend to remain unchanged over time or are inflexible30. Yet we sit on the precipice of a data revolution affording the opportunity to leverage emerging content from smart systems, to harness the capabilities of intelligent algorithms, and to steer future data collection efforts to improve model measurements and predictions.

Furthermore, when managing risks or investing in resilience enhancement, future cities demand equity considerations, often positioned in the urban resilience literature from the perspective of distributional equity33,34,35,36. For example, the inequitable impacts of natural hazards on communities or unequal disaster response actions are well-known and documented in the literature37,38,39,40,41 (i.e., disparate impacts on socially vulnerable groups, including minorities, the elderly, or low-income populations) along with the inequitable distribution and quality of infrastructure and its services. These inequities are the target of recent calls for just investments in hazard and climate risk mitigation42, with methods beginning to emerge to support equity-informed pursuits ranging from prioritization of critical components in a system to optimizing infrastructure upgrades43,44. However, the inequities of our ability to predict infrastructure resilience with confidence should also be scrutinized, calling for an examination of the data and algorithms underpinning resilience modeling in our current and future cities. Data quality and (un)availability, particularly in underserved communities, along with potential model biases or systemic errors that are not often quantified or well understood, can all affect our ability to equitably measure and predict infrastructure resilience. Neglecting this dimension may catalyze inequities when these models are used for decision-making, undermining pursuits of just resilience investments.

Considering the demands of future cities and limitations of current infrastructure resilience modeling approaches, we propose a perspective shift toward frameworks for developing algorithms and models that are “smart and equitable.” Herein we describe the characteristics of such a perspective, along with potential pathways toward realizing this transition via both conceptual and algorithmic approaches. We envision that infrastructure resilience models in future cities will be flexible and adaptable to cope with continually evolving conditions of real-world infrastructure systems; agile and intelligent to leverage diverse and emerging data from smart systems; and just in pursuing strategies to not only minimize model errors but do so in an equitable manner. As a result, measurements or predictions from such models are expected to help achieve heightened accuracy and reliability, improved timeliness and efficiency, and enhanced equity to support the quest for infrastructure resilience.

Smart and equitable resilience modeling perspective

The “smart” modeling perspective

In the envisioned perspective shift, we pursue “smart models,” defined as those flexible enough to cope with changing conditions, agile enough to leverage diverse data from smart systems, and autonomous enough to fuse, update, and learn from this content. Hence “smart” denotes not only the emphasis on rendering useful the growing IoT, data collection campaigns, and smart city pursuits of many modern communities. It also refers to the attributes of the algorithms expected to underpin future infrastructure resilience models, whereby intelligence is introduced in the process of learning, predicting, efficiently exploring, and probing infrastructure in hazard-prone urban settings. Figure 2 compares resilience assessment strategies, depicting infrastructure performance over time. In current approaches, even though time-dependent features are modeled, the infrastructure metrics are measured with respect to models fixed in time, usually unaware of changes or evolution in the knowledge we have about a system. Figure 2 also provides examples of how, in a smart resilience framework, different data sources update estimates such as damage, time to recovery, or time to technology shift throughout the infrastructure lifecycle. These continual feedback loops connect models (and their predictions) with observations and knowledge sensed from the physical infrastructure. Intelligent algorithms offer the opportunity to enhance efficiency in model development, model verification, and uncertainty propagation.

Fig. 2: Time-dependent resilience estimates and recovery modeling in traditional and smart approaches.

a The annual resilience estimates derived from current and smart modeling approaches. Current models can depict infrastructure performance under progressive and shock events. Also, maintenance actions can be posed in terms of a limiting minimum estimate value. Uncertainty bounds may or may not change over time. Smart modeling approaches take advantage of different information sources and intelligent algorithms to infuse validated knowledge into the resilience estimation. In this way, the uncertainty is temporal and conditional on the available information. The uncertainty in prediction may increase if the models are not updated; however, the need for updating may lessen as algorithms, models, and sensor technology improve. Corrective maintenance could be scheduled earlier, or delayed, with respect to the traditional approach, given that better information is at hand for decision-making. b Traditional and smart-based post-event functionality evolution. Current recovery modeling typically relies on pre-existing models adapted from other regions, with no improvement in inference even when new system conditions are sensed. Expert opinion, data assimilation, data fusion, and dynamic updating of current recovery models can be used to better inform the infrastructure recovery evolution in smart modeling approaches. (Icons © Microsoft).

The “equitable” modeling emphasis

While emerging literature focuses on measuring, modeling, and predicting the equitable outcomes of disasters, we take a distinct approach herein. “Equitable modeling” refers specifically to our infrastructure resilience modeling capabilities: whether model errors or biases systemically affect sub-populations, particularly socially vulnerable groups that may have limited capacity to adapt or respond to extreme events like natural hazards. We want to avert the situation where existing inequities are exacerbated by producing less confident or biased infrastructure resilience predictions in those same locales where social vulnerability is highest, undermining the ability to inform mitigation or adaptation with confidence, as shown schematically in Fig. 3. Such inequities in predictions may stem from the data that underpin resilience quantification, for example, the quality of available infrastructure inventory databases (which may be inadequate in resource-constrained communities), the distribution of field-deployed physical or social sensors used for condition updating (which may be biased toward particular segments of the population), the chained models for resilience assessment (whose measurement bias may not be properly established, harming the final estimates), or the assumptions for model deployment (which may suffer from representational bias, skewing predictions toward the characteristics of the population used to develop a particular model). That is, as underscored in Fig. 3, equitable modeling is affected not only by biased or limited data, but also by modeling choices and a limited understanding of propagated model errors, e.g., from model coupling. Biases may also emerge from (a lack of) model availability; for example, damage estimates biased toward a particular structural typology, or demand and usage patterns not fully representative of the constituency in a community.
Uncovering and overcoming any existing and future inequities in the data and algorithms that underpin our infrastructure resilience models is a precursor to steering just and equitable interventions founded on the model predictions. This perspective aims for comparable resilience modeling capability among communities with diverse economic resources. We emphasize equitable and need-based effort in guiding data collection and resilience model development until equality in model availability and quality is achieved.

Fig. 3: Spatial distribution of model errors or biases relative to socially vulnerable populations.

a Schematic example of data and model inequities in a transportation network exposure model. Data quality for roads and bridges shows a high correlation with geographical location, and subsequently with income level (used here to depict a social vulnerability metric). Model fidelity for road models appears fair everywhere, but individual bridges show a lack of fidelity in their model definition, which may occur given poor modeling choices (e.g., not considering aging conditions in certain locales). b Schematic example of data and model inequities in a building portfolio model. The damage data collected depict a bias in data availability or quality, with the potential to exacerbate inequities in census tracts (or other social units) with limited capacity to cope with disasters. Fragility model availability appears adequate for most of the region; however, the performance of some structures in a few census tracts may be over- or under-represented, as typically observed when common behavior is assumed across systems that do not necessarily share the same performance under similar hazard conditions.

Toward realizing smart and equitable infrastructure resilience modeling

The traditional modeling cycle in the infrastructure engineering field occurs in a sequential manner, starting from data collection and ending with model validation (and deployment). This sequential approach is applied by groups of diverse expertise to derive the different input models required for resilience estimation at a certain time (shown in Fig. 1). Hazard, damage, network performance, and system recovery modeling are complex, uncertain modeling tasks. Seldom are closed-form solutions built on physics-based analytical equations available, nor are they sufficient to characterize complex infrastructure resilience and its associated uncertainty. Instead, deriving input models for resilience prediction often hinges on probabilistic analysis of numerical models, expert judgment, experimental data, or empirical data from field reconnaissance, to name a few.

Given the complex interactions involved in infrastructure resilience modeling, researchers are leveraging algorithms from statistical learning, surrogate modeling, artificial intelligence (AI) and machine learning (ML)45,46,47. In a best-case scenario, analysts may be privy to \(l\) “fully labeled” observations \({\mathcal{L}}=\left\{\left({{\bf{x}}}_{1},{{\bf{y}}}_{1}\right),\ldots,\left({{\bf{x}}}_{l},{{\bf{y}}}_{l}\right)\right\}\), where \({\bf{X}}=\left({X}_{1},\ldots,{X}_{p}\right)\) represents the features of the problem to model, and \({\bf{Y}}=({Y}_{1},\ldots,{Y}_{q})\) the system’s measurable responses. Given these observations, the goal becomes to define the “best” model \(h^{*}\), from the set of possible models \({\mathcal{H}}\), that minimizes the error \(E\left(\cdot\right)\) between the model prediction \(\hat{{\bf{y}}}\) and the available observations \({\bf{y}}\); see Eq. (1)48,49. A set of algorithmic steps is performed during training to achieve a generalizable model, that is, one encouraging small prediction error on unseen data48.

$${h}^{*}\left({\bf{x}}\right)={\arg}\mathop{\min }\limits_{h\in {\mathcal{H}}}E\left(\hat{{\bf{y}}},{\bf{y}}\right)$$
(1)
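As a minimal illustration (not part of the original framework), Eq. (1) can be sketched by restricting the model family \(\mathcal{H}\) to linear models and choosing \(E\) as the mean squared error, in which case the minimizer has a closed-form least-squares solution. The feature dimensions, sample sizes, and coefficients below are illustrative assumptions.

```python
import numpy as np

# Sketch of Eq. (1): H restricted to linear models h(x) = x @ theta,
# E chosen as mean squared error, so h* is the ordinary least-squares fit.
rng = np.random.default_rng(0)

# l = 200 labeled observations (x_i, y_i): p = 3 features, q = 1 response
X = rng.normal(size=(200, 3))
true_theta = np.array([0.5, -1.2, 2.0])      # illustrative ground truth
y = X @ true_theta + 0.1 * rng.normal(size=200)

# h* = argmin_h E(y_hat, y): solved via the normal equations
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Generalization check on unseen data (the stated aim of training)
X_new = rng.normal(size=(50, 3))
y_new = X_new @ true_theta
mse = np.mean((X_new @ theta_hat - y_new) ** 2)
print(theta_hat.round(2), mse)
```

In realistic resilience settings \(\mathcal{H}\) would instead contain nonlinear learners (e.g., ensembles or neural networks) and \(E\) a task-appropriate loss, but the optimization template is the same.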

While this is the most common pathway for leveraging AI and ML to derive input models for resilience analysis, we foresee six actionable strategies to guide research efforts toward smart and equitable infrastructure resilience modeling (presented in this section). Such strategies build upon and extend the traditional infrastructure resilience modeling paradigm, while also leveraging emerging opportunities to harness AI and ML toward models that evolve, embody concepts of fairness and equity, and offer reliable predictions. Our target audience is researchers studying and modeling infrastructure resilience in the face of natural hazards. However, the proposed perspectives can also inform practice and policymaking, considering the growing need to promote resilient infrastructure under exacerbating climate impacts. Each strategy presents a conceptual discussion and a practical example depicting how current efforts are moving in this direction. Focus is given to modeling limitations and needs, overlooked research areas that can be embraced to improve infrastructure resilience modeling, and general illustrations of cases where smart and equitable models have recently been applied in the field of engineering resilience. Figure 4 presents schematically the different pathways to leverage recent advances in artificial intelligence, big data, algorithmic fairness, and resilience models to promote equitable community resilience.

Fig. 4: Pathways to promote smarter and more equitable infrastructure resilience modeling.

a Intelligent algorithms can model complex failure patterns without overfitting or underfitting, working also in settings where sparse observations require using diverse data sources and techniques, such as semi-supervised and transfer learning. b Combining observations from multiple data sources can overcome data scarcity and improve model prediction quality. c The smart resilience perspective focuses on continuous improvement of models’ performance by using observations that emerge over time related to different components of infrastructure resilience. d Guiding data collection by leveraging methods such as active learning will facilitate optimal and efficient resilience model development. e Biases in data or models must be assessed at different levels, for example at census tracts or individual households, to uncover whether these are fairly and equitably distributed. f Attention is steered toward addressing data or model errors in locales with populations with increased vulnerabilities, allowing models to self-guide error minimization to improve equity in resilience modeling. (Icons © Google Material Icons).

Pathway 1: Tackling data scarcity through knowledge transfer

Although we seek, and envision, a future data-rich environment for input model development, the reality is that the infrastructure resilience quantification problem often suffers from data scarcity. This can be attributed to the relative infrequency of extreme hazard events, the computational expense of physics-based models, and the challenges of systematic data collection (such as the cost of deploying and maintaining sensors). Here, we propose tackling data scarcity through algorithmic approaches to maximize knowledge extraction from existing data. One practical way would be exploring algorithms that exploit unlabeled observations \({\mathcal{U}}=\{{{\bf{x}}}_{l+1},\ldots,{{\bf{x}}}_{l+u}\}\), that is, data collected without information about the response feature, to enrich the representation of the data’s marginal distribution \(p({\bf{x}})\), thus improving the models’ inference, \(p\left(y|{\bf{x}}\right)\)50,51,52 (e.g., semi-supervised methods). For example, data collected from aerial imagery could complement the dataset of a model trained to predict damage or infrastructure conditions. Such imagery does not add information regarding the response of interest (i.e., “infrastructure condition”), but may help discern hidden clusters or low-dimensional manifolds by enlarging the feature information X, thus improving the ability to classify damage conditions. Other examples of algorithmic-centered learning methods include transfer learning techniques, where training a model for a target problem \({\mathcal{T}}\) is supported by the knowledge from a separate but similar (source) problem \({\mathcal{S}}\)53,54.
Sharing or “transferring” knowledge from a secondary source can support infrastructure resilience model development in cases where information in one domain is scarce but similar data is available from another domain (e.g., fragility estimates are limited for one region or structure type but can be complemented from similar others); data arrives from sources with different features (e.g., empirical and simulation-based recovery data); the marginal distribution of the data collected is outdated (e.g., policy or technological changes may impact the data acquired between two sequential inspections)53,54,55; or the conditional probability distributions are similar (e.g., learning different infrastructure performance metrics can benefit from knowledge regarding another metric). Note that the information from the source \({\mathcal{S}}\) is not directly used to reduce the data scarcity; rather, its shared similarities with the target context \({\mathcal{T}}\) are leveraged to improve (and speed up) the training of a model. Approaches envisioned in this pathway would only require the adoption, and possibly the adaptation, of learning algorithms to the context of dynamic resilience modeling, making this pathway readily applicable in the short term.
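To make the semi-supervised idea concrete, the following sketch (an illustrative assumption, not a method from the cited works) applies self-training: a simple nearest-centroid damage classifier is fit on a handful of labeled observations \(\mathcal{L}\) and then iteratively pseudo-labels its most confident predictions from the unlabeled pool \(\mathcal{U}\). The two “damage classes,” feature space, and confidence threshold are all hypothetical.

```python
import numpy as np

# Self-training sketch: grow the labeled set L by pseudo-labeling
# confident predictions from the unlabeled pool U.
rng = np.random.default_rng(1)

# Two synthetic damage classes in a 2-D feature space; 5 labels each known
X0 = rng.normal([0, 0], 0.5, size=(100, 2))   # class 0: "minor damage"
X1 = rng.normal([3, 3], 0.5, size=(100, 2))   # class 1: "severe damage"
X_lab = np.vstack([X0[:5], X1[:5]])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([X0[5:], X1[5:]])           # unlabeled pool U (190 points)

for _ in range(5):  # self-training rounds
    centroids = np.array([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_unl[:, None, :] - centroids[None, :, :], axis=2)
    pred = d.argmin(axis=1)                   # nearest-centroid prediction
    margin = np.abs(d[:, 0] - d[:, 1])        # confidence proxy
    keep = margin > 2.0                       # pseudo-label confident points
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, pred[keep]])
    X_unl = X_unl[~keep]

print(len(y_lab), "labeled after self-training")
```

In practice, richer base learners (e.g., CNNs over imagery) and calibrated confidence measures would replace the centroid distance used here.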

Example. Structural damage detection. Images from post-earthquake building inspections or region-wide aerial imagery can be considered “unlabeled” until experts have judged the “damage condition;” given limited resources, only a few hundred images may be fully classified in such cases. To circumvent the need for completely labeled datasets, different researchers have explored transfer and semi-supervised learning approaches56,57,58. For example, Gao and Mosalam57 explored the use of a pretrained model (the VGG-16 model, trained on the ImageNet dataset) to help train a convolutional neural network (CNN) to predict the structural component type, detect the type of damage, and predict the severity level of the damage. This example demonstrates the use of transfer learning to improve damage detection and prediction models, enabling faster post-event building tagging for communities.

Pathway 2: Harnessing multi-modal data to enrich data availability and improve resilience prediction

Approaches presented in Pathway 1 cope with data scarcity through an algorithmic-centered perspective. In situations where there are several related but diverse data sources, blending such information can enhance data availability and improve models’ prediction capabilities (that is, a data-centered approach). Through this lens, one can explore using more than one type of sensor to inform (or complement) disaster risk model estimates. While none of the sensors may be predefined to collect data related to disaster risk impacts, many of them are useful for understanding the processes related to disasters that impact infrastructure and citizens. We include in the sensors’ definition social sensors (crowdsourcing, social media, citizen service portals), physical sensors (water level sensors, traffic speed sensors, mobile trace data, structural health monitoring, and IoT devices), remote sensors (UAVs, satellite, and airborne platforms), mathematical models (e.g., OpenSafe Mobility59, CERA60), and authoritative data sources (DOTs, NOAA). Where such multi-modal data exist, few efforts have aimed to combine them, and most approaches tend to rely on the most trustworthy (or complete) source. However, predictions about the state \(y\) of a system or process can be improved by fusing the shared knowledge between the sources, as demonstrated by past authors59,61,62,63.

When using data fusion strategies, a careful characterization of the data sources (Qi for \(i=1,\ldots ,n\) sources) should be pursued in terms of accuracy, bias, fidelities, and time lag, among other characteristics, under various conditions to avoid harming the predictions’ reliability. Moreover, different workflows (Wi) may be needed for processing multi-modal data from diverse sources (i.e., Wi(Qi)). For example, natural language processing is needed to extract insights from text data (e.g., information on social media related to a hazard event or infrastructure recovery in progress), while deep learning-based image classifiers may be needed to glean information from aerial imagery (e.g., useful for prediction of spatial and temporal damage evolution across a region). The diverse sources are used to obtain the (imperfect) observations \({Z}_{i}^{t}\) of the state of the system y, and subsequently these are fused, as shown in Eqs. (2) and (3). Methods such as Kalman filters64, Dynamic Bayesian Networks65,66, and Particle filters67 can be leveraged to fuse resilience-related data.

$${Z}_{1:n}^{t}=\left\{{W}_{i}\left({Q}_{i}^{t}\right)\ \forall \,i=1,\ldots ,n\ \text{at time}\ t\right\}$$
(2)
$$P\left(y|{Z}_{1:n}^{t}\right)=\frac{P\left({Z}_{1:n}^{t}|y\right)p\left(y\right)}{P\left({Z}_{1:n}^{t}\right)}$$
(3)
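A minimal sketch of Eqs. (2) and (3), under the common simplifying assumption that sources are conditionally independent given the state: two imperfect sources (say, a physical gauge and a crowdsourced report) each emit a processed observation \(Z_i^t\) of a binary road state \(y\), and their likelihoods are fused via Bayes’ rule. The confusion matrices standing in for the source characterizations \(Q_i\) are illustrative, not calibrated values.

```python
import numpy as np

states = ["passable", "flooded"]
prior = np.array([0.8, 0.2])       # p(y): roads usually passable

# P(Z_i | y): rows = true state, cols = reported state (assumed values)
Q_gauge = np.array([[0.90, 0.10],  # relatively accurate physical sensor
                    [0.15, 0.85]])
Q_crowd = np.array([[0.70, 0.30],  # noisier social sensor
                    [0.30, 0.70]])

def fuse(prior, likelihoods):
    """Posterior p(y | Z_1..n), Eq. (3), assuming conditionally
    independent sources so the joint likelihood factorizes."""
    post = prior.copy()
    for lik in likelihoods:
        post = post * lik          # multiply in each source's likelihood
    return post / post.sum()       # normalize (the evidence term)

# Both sources report "flooded" (column index 1 of each Q_i)
posterior = fuse(prior, [Q_gauge[:, 1], Q_crowd[:, 1]])
print(dict(zip(states, posterior.round(3))))
```

Even with a prior favoring passable roads, two agreeing (if imperfect) sources shift the posterior strongly toward “flooded,” which is the essence of the multi-modal fusion advocated in this pathway; Kalman and particle filters extend the same recursion to continuous and temporal states.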

As explained above, this pathway assumes richer data contexts, improving resilience model development, verification, and validation by allowing the inclusion of multi-modal data sources. Cities need to have started their transition toward the technological conditions embedded in smart cities, as well as to have defined algorithmic approaches to deal with such data-rich settings; these tasks may require a medium- to long-term implementation horizon.

Example. Fusing real-time observations for predicting road conditions during flooding. Social sensors, physics-based models, physical sensors, and remote sensing can be used to improve data availability or model training to enhance inferences about ground conditions during a disaster, i.e., to inform emergency response. Developed and implemented as an online application, Panakkal and Padgett62 present an example of fusing multi-modal sources to estimate real-time road conditions during storms. Their framework uses data from gages operated by public agencies, physics-based models, social media data, citizen portals, traffic camera images, and traffic alerts, among other sources, together with source-specific processing methods to infer the condition of the road. They proposed a data fusion approach based on the discrete form of the Bayes filter. To scale their application, physics-guided augmentation techniques infer neighboring road conditions, and network analysis workflows were used to evaluate flood impacts at different scales. This fusion of existing data sources yields a situational awareness framework able to sense flooded roads in real time, exemplifying ways to reduce the burden of costly physical sensors for monitoring floods.

Pathway 3: Enabling continual learning for improved resilience quantification

Deterioration, variations in the frequency of hazardous events, and shifts in socio-physical interactions (for example, changes in infrastructure service demands or policies) impose additional challenges that prompt the need for resilience models to cope with ever-evolving conditions, i.e., to understand cities’ infrastructures as the complex dynamic systems they are14,29,30,68,69,70. Moreover, we envision that such variations require algorithms able not only to capture the non-stationarities of the processes that take place during the infrastructure service life but to learn, recursively, as new conditions of the system are experienced. In such settings, it is typical to describe a model \({h}_{{\boldsymbol{\Theta }}}\left(\cdot \right)\) by a set of parameters Θ, which can be treated as random variables, with distribution P(Θ), to reflect our uncertainty about the model itself. Letting X be the system characteristics, M the set of system stressors, and Y a response variable (for example, the performance of an aged concrete building), one could update the model parameters given new observations \({\bf{o}}=\{{\bf{x}},{\bf{m}},y\}\); see Eq. (4). This updated model, \(P\left({\boldsymbol{\Theta}}|{\bf{m}},\,{\bf{x}},\,{y}\right)\), incorporates the insights from observed data, such as the level of deterioration of structural components, enhancing model fidelity and performance throughout the system’s lifetime.

$$P\left({\mathbf{\Theta}}|{\bf{m}},\,{\bf{x}},{y}\right)\propto P\left({\mathbf{\Theta}}\right)P\left({\bf{m}},\,{\bf{x}},{y}|{\mathbf{\Theta}}\right)$$
(4)
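As a concrete (and deliberately simple) instance of Eq. (4), consider continual Bayesian updating of a single parameter Θ, here the failure probability of a deteriorating component. Choosing a conjugate Beta prior makes the posterior available in closed form after each inspection campaign; the prior and inspection counts below are illustrative assumptions, not data from the cited studies.

```python
# Continual learning via Eq. (4): P(Θ | o) ∝ P(Θ) P(o | Θ).
# Θ = component failure probability; Beta prior + binomial likelihood
# gives a closed-form posterior, so updating reduces to count updates.

alpha, beta = 2.0, 18.0    # prior P(Θ): mean failure probability 0.10

inspections = [            # observations o per campaign: (failures, survivals)
    (1, 9),
    (3, 7),                # deterioration: failures become more frequent
    (4, 6),
]

for failures, survivals in inspections:
    alpha += failures      # conjugate update: posterior Beta parameters
    beta += survivals
    print(f"updated mean P(failure) = {alpha / (alpha + beta):.3f}")
```

The posterior mean drifts upward as evidence of deterioration accumulates, illustrating how a model kept in recursive dialogue with observations stays faithful to evolving system conditions, exactly the agility argued for in this pathway.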

As shown in Eq. (4), Bayesian methods are a straightforward path to achieve continual learning; other methods, such as online and transfer learning schemes, can be exploited for continuous model updating as well53,71,72. Transfer learning could be adapted so that the knowledge embedded in a model at a given time supports re-training that model once new observations become available (aiming for better performance in the new setting), for example, using updated data on power grid outages to continuously re-train predictions of system failures in future years.

In general, existing surrogate modeling techniques based on Bayesian frameworks, statistical learning techniques, and AI and ML methods comprehensively studied in the past decade22,45,46,73 provide the theoretical foundation and methods to enable a prompt transition of current resilience models toward more agile and intelligent (i.e., autonomous) ones. Applications of such capabilities are often observed in the reliability engineering and structural monitoring realm (typically for a single structure), but improved technologies for large-scale data collection, monitoring, and processing will enable adoption of this pathway in the short-to-medium term. Self-driven model learning is proposed to adapt single- and multi-task models to the real-world dynamics and complexities of the infrastructure resilience problem31,74.

Example. Time-dependent resilience modeling and updating. Rather than estimating the (static and) traditional resilience metric following the occurrence of a hazardous event, Ouyang and Duenas-Osorio29 framed resilience as a time-dependent metric able to capture the evolution of systems and their conditions. In the midst of ever-changing socio-physical systems, they envisioned that resilience assessments must consider the time-dependency and (non-linear) relations that take place in the continuous history of interactions between infrastructure and its hazards, user demands, and post-event improvements, such as the integration of new technologies or improvements to operation standards29. To ensure that modeling not only predicts long-term evolution and improvement mechanisms, but can indeed measure and learn from observed fluctuations in system conditions (as new data, knowledge, and models emerge), Rincon and Padgett30 proposed a multi-scale modeling approach supported by recursive dynamic estimation of resilience estimates. With recursive Bayesian estimation methods, prior information (from models or past observations) is fused with new observations of features that describe system processes, impacting the multi-scale models’ estimations of the engineered systems.

Pathway 4: Steering efficient data collection and resilience computation

Due to the elevated challenges of modeling the resilience of complex dynamic systems (such as multi-scale interactions, feedback loops of information, and continual model learning), it is imperative to achieve more agile, flexible, and smarter approaches capable of handling big-data needs and high computational demands. In many situations, the problem of resilience-related data collection shifts from being able to observe the phenomenon or reliably fuse information from disparate sources (Pathways 1 and 2) to being tightly constrained by the complexity, in time and cost, of the labeling procedure (i.e., the process of assigning a tag to an observation or obtaining a model response). Algorithms should then become intelligent in the sense that they can steer efficient data collection and strategize computational simulations to reach fast, accurate, yet affordable estimates of infrastructure resilience. Modeling approaches with the ability to query responses from an algorithm or a human (-in-the-loop) are becoming powerful tools to embed intelligence in algorithms so that they optimize resources during the analysis of resilience in complex and uncertain environments75,76,77,78. That is, the model can enrich the training set \({\mathcal{L}}\) in an iterative manner by accessing the unlabeled samples \({\mathcal{U}}\) to find the most 'informative' sample x*; the response of that sample is then queried or targeted for collection and added to the training set \({\mathcal{L}}\). Informative samples are selected using Eq. (5), where a "value" function V(x) is defined to attain different modeling objectives, such as reducing the model uncertainty, inducing the largest model change, or minimizing the generalization error75,77.

$${{\bf{x}}}^{* }={\rm{arg}}\mathop{\min }\limits_{{\bf{x}}\in {\mathcal{U}}}V({\bf{x}})$$
(5)

For example, with such capabilities the algorithm can prompt evaluation (e.g., of a computational model) of a hazard scenario that contributes toward improving the overall accuracy of resilience-related estimates (total damaged buildings, fraction of closed bridges, total population dislocated, time to recover network functionality)79,80,81. This enrichment procedure is repeated until the model reaches a target accuracy or acceptable error, or until an acceptable labeling cost is surpassed. Models can also be guided to align budgetary constraints with goals of reducing data bias and minimizing modeling inequities, as explained later in this section.
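A minimal pool-based sketch of this enrichment loop, following the selection rule of Eq. (5): a generic value function V(x) and an expensive query function stand in for the problem-specific choices (e.g., a simulator run or field measurement); both are assumptions of this illustration.

```python
def select_informative(pool, value_fn):
    """Eq. (5): index of the unlabeled sample x* minimizing V(x) over the pool U."""
    return min(range(len(pool)), key=lambda i: value_fn(pool[i]))

def enrich(pool, labeled_x, labeled_y, value_fn, query_fn, budget):
    """Iteratively move the most informative samples from U into the training set L.

    pool      : unlabeled candidate samples (U)
    labeled_x : inputs already in the training set (L), extended in place
    labeled_y : responses already in L, extended in place
    value_fn  : V(x); lower value means more informative
    query_fn  : expensive labeling step (simulation, inspection, survey, ...)
    budget    : number of labels we can afford to acquire
    """
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        i = select_informative(pool, value_fn)
        x_star = pool.pop(i)                 # remove x* from U
        labeled_x.append(x_star)             # add x* to L ...
        labeled_y.append(query_fn(x_star))   # ... with its queried response
    return labeled_x, labeled_y
```

As a toy usage, a distance-based V(x) that favors samples far from the current training set mimics an uncertainty-reduction objective: with `pool = [0.0, 5.0, 10.0]`, `labeled_x = [4.0]`, and `value_fn = lambda x: -min(abs(x - l) for l in labeled_x)`, two queries pick the samples at 10.0 and then 0.0.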

Example. Models trained with less information to reduce data collection efforts. Borrowing ideas from the active learning field, which focuses on reducing the cost of "labeling" instances x (i.e., defining the class or response variable y), researchers working on reliability analysis and optimization of infrastructure performance have seen great advances in accuracy, rapidity, and computational cost savings during model development73,79,80,82,83,84. For example, Zhang et al.84 demonstrated the application of active-learning-trained surrogates, using adaptive Gaussian process regression, to enable value-of-information analysis for optimal decision-making involving load tests on truss bridges. However, most applications of active learning have focused on one-time (or static) model development, where instant infrastructure performance estimates are the final goal. We foresee the use of active learning beyond this scope, toward longer-term goals such as guiding continuous model development and updating, making the computation of time-dependent risk and resilience estimates affordable, and steering data collection to lessen biased outcomes or reduce the risk of future model obsolescence. Embedding such autonomous characteristics within infrastructure resilience models is envisaged in the short term for model simulations and resilience computation; longer horizons may be required to implement model-guided data collection in real-world settings.

Pathway 5: Uncovering biases and inequities in resilience models

As discussed above, resilience assessment at the community level requires numerous data sets and sub-models developed by independent researchers (as shown in Fig. 1), which are likely built with very different assumptions and modeling choices regarding how to represent the (unobservable) processes and their interactions in the face of disasters and infrastructure. Once translated into a tangible resilience estimate, decision-makers will try to ensure that mitigation of future impacts is achieved, but here we pose some questions regarding the type of support offered by the models. Can models developed ad hoc be confidently chained together for applications in real-world complex systems (without exhibiting unexpected coupling effects), and can they in fact be leveraged to guide just investments in resilience enhancement? Were the models equitably constructed, or do they inevitably produce biased estimates? Are we offering decision-makers an estimate of resilience whose errors and uncertainties are equitably distributed, particularly as they relate to socially vulnerable populations? Do these model biases and uncertainties drive decision outcomes that alleviate or exacerbate pre-existing inequities in infrastructure performance?

In this direction, understanding whether an algorithm systematically favors (or harms) a subpopulation is of paramount importance. This systematic, undesired behavior of an algorithm has recently been termed "algorithmic bias" in the field of AI and ML79,80. Concern with bias has brought diverse points of view about its definition, how it can be detected, what counts as a source of bias, and how it can be mitigated. Algorithmic bias, in this study, is assumed to be the root cause of inequities in infrastructure resilience models. Exploring and adopting methods for uncovering biases and inequities in resilience models is imperative, and we consider rapid implementation possible from a technical point of view. In practice, longer times may be needed to reach consensus on how to shift researchers' and stakeholders' proclivity toward demanding and posing equitable infrastructure resilience models and metrics, tailored to the individual characteristics of each society.

Modeling inequities should be examined at each stage of model development, taking proactive measures to quantify, control, and reduce data, modeling, and deployment bias, and ensuring an equitable computation of resilience. For example, data scarcity, beyond catalyzing model uncertainties, can further exacerbate a model's inequities. Inequity in resilience workflows (e.g., coupling models whose performance is only acceptable when used alone) could also exacerbate risk to the affected communities and perpetuate a cycle of injustice and vulnerability. First, identifying sources of bias ϵ is essential for mitigating their impacts; stages from data collection to model deployment are prone to different types of bias85,86,87,88. It is imperative to simultaneously investigate whether such biases are fairly and equitably distributed, that is, whether the cases where they exceed an unacceptable threshold ϵ0 are not conditional upon sensitive features Ci (which could include gender, ethnicity, age, or aggregate social vulnerability metrics that identify those disadvantaged by the system), as shown in Eq. (6).

$${\rm{P}}\left(\epsilon \ge {\epsilon }_{0}|{\bf{X}}={\bf{x}},C={C}_{i}\right)\equiv {\rm{P}}\left(\epsilon \ge {\epsilon }_{0}|{\bf{X}}={\bf{x}}\right)$$
(6)

While such a notion of fairness is important, the inspection of modeling inequities should also ensure that within-group inequities are avoided, guaranteeing that individuals who belong to individual categories Ci do not suffer exacerbated impacts89,90,91. Measuring disaster inequities, and steering equitable resilience enhancement, should rely on models that are themselves equitable; otherwise this important effort may be undermined, with inadvertent impacts on vulnerable groups. While uncovering modeling inequities may be a priority, focus should be placed on developing methods and resilience models (and metrics) that are robust against modeling errors, because it might be infeasible to completely overcome bias or inequity.
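The condition in Eq. (6) can be approximated empirically by comparing error-exceedance rates across sensitive groups: if P(ϵ ≥ ϵ0) differs markedly by group, errors are not equitably distributed. The function names and the tolerance below are illustrative choices for this sketch, not a recommended standard.

```python
import numpy as np

def error_exceedance_by_group(errors, groups, eps0):
    """Empirical estimate of P(eps >= eps0 | C = c) for each group c (cf. Eq. (6))."""
    errors, groups = np.asarray(errors), np.asarray(groups)
    return {c: float(np.mean(errors[groups == c] >= eps0))
            for c in np.unique(groups)}

def is_equitable(errors, groups, eps0, tol=0.05):
    """Flag inequity when exceedance rates differ by more than tol across groups.

    errors : per-observation model errors eps
    groups : sensitive feature label C for each observation
    eps0   : unacceptable error threshold
    tol    : illustrative tolerance on between-group rate differences
    """
    rates = error_exceedance_by_group(errors, groups, eps0)
    return max(rates.values()) - min(rates.values()) <= tol, rates
```

In practice one would condition on covariates X as in Eq. (6) (e.g., via stratification or a regression on exceedance indicators) rather than pooling over them; the marginal comparison above is the simplest diagnostic.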

Example. Exploring the presence of bias in the representation of social media during disasters. Concerned about possible bias in disaster informatics, Fan et al.92 analyzed the content of large datasets of social media posts during Hurricane Harvey and Hurricane Florence. From variable correlation analysis and statistical tests, they found that the number of damage claims (from the Federal Emergency Management Agency, FEMA) and social media attention were concentrated in (and strongly correlated with) population size. More importantly, they found that social media attention was not correlated with other socio-demographic factors such as education level, median income, and unemployment. To uncover other possible biases, they developed a deep-learning-based algorithm that classifies social media posts into humanitarian categories. They classified the social media data and found that more attention (independent of the population aggregation scale) was given to rescue and donation efforts.

Pathway 6: Equitably minimizing errors in infrastructure resilience estimation

The adverse effects of inequitably distributed modeling errors in resilience assessments can affect certain sub-populations, exacerbating pre-existing inequities if such outcomes are used to mitigate disaster impacts. Modeling resilience to inform different action plans (e.g., pre-event mitigation actions, situational awareness procedures, plans for recovery and prioritization, or post-event adaptation policies) should consider strategies to equitably minimize algorithmic errors throughout the modeling pipeline. With this purpose in mind, developed models could also be augmented with intelligent capabilities to equitably minimize model errors by themselves. For example, leveraging smart learning methods, such as the aforementioned active learning strategies, one could use measured errors to inform the search space for "informative and equitable" samples. In that regard, the training of sub-models using active learning can set aside the mere interest in attaining the highest prediction accuracy (e.g., minimizing the root mean square error, RMSE, of a regressor that predicts disaster impact on a validation set) to focus on mitigating the inequities posed by the models themselves (that is, guaranteeing that larger RMSE or other prediction-quality metrics are not concentrated on specific groups, locations, services, individuals, etc.). Such a strategy places heightened algorithmic training attention on populations with increased vulnerabilities to improve model quality. Other ways of minimizing inequities in modeling approaches require the use of multi-sensor data fusion, larger investments in model testing, calibration of the generalizability of our models to different contexts (e.g., exploiting transfer learning techniques), and other strategies, which leads us to argue that smart and equitable models should be inexorably tied together in the realm of disaster modeling.
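One hypothetical way to operationalize this equitable focus is to track prediction quality per group on a validation set and steer the next labeling queries toward the worst-served group, rather than minimizing a single pooled RMSE. Function names and the group encoding below are illustrative.

```python
import numpy as np

def groupwise_rmse(y_true, y_pred, groups):
    """RMSE computed separately for each sensitive group, not pooled."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: float(np.sqrt(np.mean((y_true[groups == g] - y_pred[groups == g]) ** 2)))
            for g in np.unique(groups)}

def next_query_group(y_true, y_pred, groups):
    """Direct the next data collection effort toward the group with the worst fit.

    Returns the group label with the largest validation RMSE, along with
    all per-group RMSEs for reporting.
    """
    rmse = groupwise_rmse(y_true, y_pred, groups)
    return max(rmse, key=rmse.get), rmse
```

This simple rule could replace (or weight) the value function V(x) of Eq. (5), so that "informative" samples are those that shrink the error of the group currently bearing the largest prediction errors.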

Example. Mitigating the impact of aggregation bias in fragility models. Decision-makers will use the available information from existing models to prompt actions seeking to protect against, absorb, recover from, or adapt to the negative impacts of disasters. In the context of disaster risk assessment, fragility functions represent the probabilistic performance (e.g., damage potential) of structures, typically conditioned on the disaster-imposed demands (noted as an intensity measure vector im), i.e., \(P({D|}{\bf{im}})\)93,94. However, developing fragility models for individual structures in large portfolios is commonly unfeasible (and impractical due to changing conditions). Hence, class-fragility models have been a common approach in regional-scale assessments95,96,97,98. A possible problem with this approach is that the aggregation scheme could bias the estimation of a system's response or performance toward the "representative" models used for fragility model development99, especially if the underrepresented structures are linked to populations with higher social vulnerabilities88. Although not necessarily connecting the level of resolution with modeling inequities, many researchers have embarked on mitigating unfair system representation by parameterizing the fragility models to better capture system complexities. Such models capture characteristics of the systems, such as geometrical parameters, materials, or aging conditions, condensed in a vector x9,100,101,102. These models offer more expressive and tailored fragility functions, i.e., \(P({D|}{\bf{im}}{\boldsymbol{,}}{\bf{x}})\), with the potential to assuage aggregation bias; however, their potential to do so in an equitable fashion has yet to be explored.
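As a sketch of the contrast above: a classical lognormal fragility curve \(P(D|{\bf{im}})\) versus a parameterized variant \(P(D|{\bf{im}},{\bf{x}})\) whose median capacity shifts with structure-specific features (e.g., a geometry index and an aging indicator). The coefficients and feature meanings are hypothetical placeholders, not fitted values from any of the cited studies.

```python
import math

def fragility(im, theta=0.5, beta=0.6):
    """Lognormal class fragility: P(D | im) with median capacity theta
    and dispersion beta (standard normal CDF evaluated via erf)."""
    z = (math.log(im) - math.log(theta)) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def parameterized_fragility(im, x, base_theta=0.5, beta=0.6, coeffs=(0.1, -0.2)):
    """Parameterized fragility P(D | im, x): the median capacity is shifted
    by structure-specific features x (hypothetical coefficients).

    Here a positive first feature (e.g., stronger geometry) raises capacity,
    while a positive second feature (e.g., aging) lowers it.
    """
    theta = base_theta * math.exp(sum(c * xi for c, xi in zip(coeffs, x)))
    return fragility(im, theta=theta, beta=beta)
```

With this form, two structures in the same "class" but with different feature vectors x receive different damage probabilities at the same demand, which is precisely how parameterization can reduce the aggregation bias of a single representative curve.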

Future pathways, opportunities and challenges

Reliable and effective functioning of infrastructure systems during and following a hazard event is essential to public safety, economic vitality, and quality of life in modern and future cities. The complex interplay of environmental stressors, infrastructure components, societal actors, and modeling frameworks in future cities not only challenges the prediction of such infrastructure performance over time but also renders it a dynamic and uncertain process. Still, accurate and timely information on multi-hazard conditions and cascading consequences is needed to aid resilience related decision-making and communication with the public before, during, and after a natural hazard like a flood, earthquake, or hurricane. Pursuing intelligence and equity as guiding principles of infrastructure resilience model and algorithm development is crucial, particularly as we sit on the precipice of smartening our infrastructure systems and transitioning technology, all the while pursuing just investments in hazard resilience.

A perspective shift is proposed in infrastructure resilience model and algorithm development toward one that infuses "intelligence" and is equitable. In this framework the models are equipped to enable agile, efficient, and high-confidence infrastructure resilience predictions, while overcoming potential biases and inequities to guide a transition to future smart and just systems. We highlight the potential to leverage techniques from the general domain of AI and ML, statistical learning, and Bayesian frameworks, but within the context of developing models, evolving models, intelligently steering model enhancements, and critically probing models for their equitable performance in the disaster resilience context. It should be noted that other pathways based on formal methods for complex systems modeling, efficient sampling, reduced-order and surrogate modeling, evaluation of modeling choices, value of information, among others47,103,104,105,106,107,108, are expected to complement and enhance those proposed in this paper, contributing to the field of modeling infrastructure resilience in future cities. The order of adoption of the perspectives in a region will depend on the availability of data, models, resources, and expertise. However, we recommend prioritizing measures to reduce data and model bias. Specifically, we envision prioritizing efficient data collection informed by modeling needs (Pathway 4; Fig. 4d) and bias reduction considerations (Pathways 5-6; Fig. 4e, f). Enhanced data availability can directly influence other pathways, including the community's ability to leverage intelligent algorithms and transfer knowledge (Pathway 1; Fig. 4a), use multimodal data to enhance resilience (Pathway 2; Fig. 4b), and continuously improve resilience models (Pathway 3; Fig. 4c).

While only a limited set of examples is identified from the literature, significant opportunities exist to overcome the lack of agility in resilience models, explore ways in which models can better guide actionable plans in real-world settings, and demonstrate how intelligent systems also require coordination among diverse stakeholders for enhanced resilience measurement. In addition, methods should be developed to verify model robustness against pre-existing inequities in data and modeling choices, or to quantify how model errors drive (or dissuade) investment in disaster mitigation plans that actually worsen (or improve) the overall wellbeing of the community. Other methods will be required for handling large amounts of data, processing multi-sensor data that arrives asynchronously, assimilating data and knowledge from infrastructure performance measured at different scales, addressing heterogeneity in data quality, and detecting outliers (a difficult task in the presence of extreme events), among other needs that emerge from a shift in the vision of future cities' infrastructure resilience models. Special attention will be needed to validate and verify multi-scale estimates, prioritizing those that will be used for decision-making.

Uncovering bias and overcoming inequities in the methods that underpin resilience quantification is critical to support just infrastructure transformations both now and in the future. While equitable modeling concerns are underscored here, ethical risks also need attention given that the envisioned smart models will depend on the constant feedback between infrastructure systems, models, and humans. Only once these challenges are met can we confidently approach the task of guiding solutions that advocate for equitable resilience outcomes amidst dynamic and uncertain conditions associated with climate impacts, infrastructure aging, demand shifts, or new technology. With the growing scholarly work in methods to support decision-making regarding disaster and infrastructure equity, and mounting programmatic propensity for pursuing just investments, this perspective shift in resilience modeling where intelligence and equity are inextricably linked (and where we take a hard look at our data and models) is timely if not absolutely necessary. The transformation will require a collective effort and investment, but our future cities demand and deserve it.