Introduction

Urban planning practices pursued after World War II led to the creation of urban areas mainly centred on the use of private motorised vehicles1. This practice has led to the proliferation of urban sprawl, i.e. low-density urban expansion, instead of compact urban forms. However, this kind of urban development is no longer sustainable in social, environmental or economic terms. Therefore, it is necessary to resort to urban planning centred on active mobility modes, like walking, which constitutes the primary mode of human movement, as well as being the most environmentally friendly, accessible, and affordable. Additionally, planning for pedestrian-friendly spaces and enhancing walkability not only improves environmental and health outcomes2 but also fosters more attractive, livable, and inclusive cities3,4, where diverse transportation methods are more efficient and better integrated with each other5.

Another outcome of post-World War II urban planning in Europe is the development of suburban university campuses, following the model pioneered by American cities. They derive, among other reasons, from the need for large, non-urbanised and affordable areas. Nonetheless, within these specialised urban enclaves, pedestrian movement and walkability appear to be of particular relevance. In fact, university campuses, as bustling centers of academic and social activities, represent microcosms of urban life where students primarily move by walking. However, the walkability of a campus is more than a matter of practical convenience, as it profoundly influences the overall well-being of its academic community6.

Similarly to other urban centers, university campuses, with their intricate layout and high concentration of streets and social facilities, are examples of complex networks, and more specifically, transportation networks7,8,9,10. Therefore, many of their features can be analysed using strategies commonly used for studying complex systems, like measures of effective temperature and entropy, testing how the system reacts to perturbations or studying the network efficiency11,12. More specifically, studying the transport properties of a university campus translates to placing pedestrians along paths in a confined geometric network.

In this context, WiFi data emerge as a powerful tool for detecting pedestrian movements. In contrast to mobile network data, WiFi data are easily available to public institutions and rely on relatively inexpensive infrastructure that is already present in public spaces. Other WiFi applications13 can be found in the context of epidemic control, which became particularly urgent in 2020 and 2021 due to the Covid-19 pandemic14, or for tracking people’s trajectories and occupancy to study pedestrian accessibility in outdoor locations, like cities15. Similar studies in indoor locations, like large buildings16, can be relevant for reducing energy consumption17 or testing mixed techniques18,19.

This research endeavors to shed light on the current state of pedestrian movements on the University Campus located in Northern Italy (Parma) and, especially, to test a set of methods for analyzing urban complex networks. To investigate the context of pedestrian movements, we first establish a realistic network for pedestrian paths in the campus. Afterward, using extensive WiFi data collected at the University, we measure the average occupancy in each building and the pedestrian flux between each pair of buildings. We derive the pedestrian traffic on each arc and the overall entropy of the network, with the aim of estimating the information gain due to WiFi data. Finally, we perform different tests to assess the robustness of the network, suggesting optimal strategies for reducing overcrowding and the excessive lengthening of pedestrian paths due to the removal of individual arcs. As we navigate the complex interplay of physical spaces and human dynamics, the insights gleaned from this study hold the potential to enhance walkability20 in urban and educational settings.

Apart from this study, other analyses have been conducted in recent years on the accessibility and walkability of the Parma Campus. For example, in 2023, the pedestrian accessibility of the Parma Campus was studied through the Space Syntax approach21. Furthermore, the WiFi data of the University of Parma have already been used in other analyses, especially regarding the control of epidemics22,23. Besides WiFi, other technologies have been used to track pedestrians’ movements, like Bluetooth24 and GPS25,26. Since each technology has its own limitations, analyses for tracking pedestrians are often characterized by the fusion of multiple techniques27,28. In this way, it is also possible to directly compare different technologies and approaches.

The paper is organized as follows: the next chapter outlines the strategies and models we adopted for distributing pedestrians on the possible paths, and ultimately, on the arcs of a realistic pedestrian network of the Campus of Parma. In Chapter 3, we present our results of the distribution of pedestrian traffic on the network. Chapter 4 is dedicated to the study of the entropy of the network and the information gain provided by WiFi data. In Chapter 5, we analyse the participation ratio of the arcs of the network in terms of the contributions from the pedestrian fluxes. In Chapter 6, considering different types of measures, we test how much our network is susceptible to changes like the removal of single arcs. The final chapter provides a summary of our findings, presents our conclusions and possible future applications of our analysis.

Methods

Pedestrian movements within the university campus will be described by first introducing a campus walking network. In this network, arcs represent pedestrian infrastructures, i.e. pavements, other footpaths and pedestrian crossings, with their physical length, but also the main lanes of parking areas and roadsides (in the absence of dedicated pavements), as pedestrians tend to occupy the road space as well as walkways. Nodes represent either the junctions between these infrastructures or the main accesses to the university buildings interconnected by the network. Within these buildings, pedestrians can connect their devices to the university WiFi system. This network was created using Geographic Information Systems (GIS) data available through the municipality GIS platform, reprocessed and supplemented as needed with additional sources and on-site observations, to accurately capture real pedestrian movements on the campus. Further details on the network construction can be found in Section A of Supplementary material.

The first goal of our study is to estimate the pedestrian traffic \(p_i\) in each arc i of the network by using the data obtained from the WiFi connections of the pedestrians’ devices. As we will show in Section 2.1, the WiFi dataset can be efficiently used to observe not only the occupation of the university buildings but also the travels between them. These can be quantified in terms of an average daily flux \(\Psi _{\alpha \beta }\) between buildings \(\alpha\) and \(\beta\). In Section 2.1 we also discuss some basic characteristics of the building occupancy as a function of the building location in the campus and of the different parts of the working day.

In Section 2.2 we outline a method to distribute the pedestrian fluxes between buildings \(\Psi _{\alpha \beta }\) within the campus walking network to obtain the arc pedestrian traffic, \(p_i\). This method prioritizes the shortest paths by introducing an effective inverse temperature related to the length of the path29. It not only provides a reliable description of movement within the campus but also allows us to estimate the network’s response to perturbations, such as adding or removing pedestrian infrastructure or changing the occupancy of a specific building.

Population density occupancy and fluxes between buildings

The WiFi data have been automatically collected by the University of Parma’s IT staff23. The structure of the dataset is described in detail in Section B of Supplementary material. In Section C we discuss some operations needed to clean up errors in the dataset due to occasional malfunctions of the WiFi systems, and to deal reliably with data relevant to an agent using multiple WiFi devices. After completing this preliminary process, we determine whether WiFi users were present on campus at any given time and in which building. Whenever a WiFi user connects to at least two buildings in a day, we can extrapolate a virtual trajectory of the relevant user. However, WiFi data display limitations, which include the unpredictability of periodic time checks for new connections, gaps in WiFi coverage in certain areas of the campus, limited WiFi usage and a limited precision in the location of the WiFi device connected to a certain antenna. Such limitations do not allow us to extrapolate the precise path associated with each movement, and only the flux between pairs of buildings can be obtained with reasonable precision. For this reason, in order to obtain an estimate of the pedestrian traffic in each arc of the network, we redistribute such fluxes over all paths that connect the relevant buildings according to a rule that we will discuss in the next section.

It is important to stress that approximately one third of the campus population connects to the WiFi network. However, assuming a uniform likelihood of connection across individuals, this does not affect the structure of the inferred mobility patterns and only results in a global rescaling of the measured fluxes. Therefore, this limitation introduces a systematic bias in the absolute values of traffic, but it is largely irrelevant for the relative comparisons and structural analyses performed in this study.

Our analysis covers three months, with 41 working days from October 9, 2023, to December 7, 2023. This period was selected for its regular daily activities. To calculate pedestrian fluxes, we count each time a person’s device connects to a different building’s WiFi network, and we then determine the average number of people moving between the two buildings per day. This approach identifies buildings where users connect to WiFi as common starting points or destinations within the campus environment.
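As an illustration, the counting rule described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the record format (user, day, timestamp, building) and the function name `average_daily_fluxes` are hypothetical, fluxes are counted here for ordered pairs (origin, destination), and the cleaning steps of the actual dataset are not reproduced.

```python
from collections import defaultdict


def average_daily_fluxes(records, n_days):
    """Average number of building-to-building moves per day.

    records: iterable of (user_id, day, timestamp, building) tuples
             (hypothetical format, one tuple per WiFi connection event).
    n_days:  number of working days in the observation period.
    Returns a dict mapping (origin, destination) to moves per day.
    """
    # Group events by user and day, so a trajectory never spans two days.
    events = defaultdict(list)
    for user, day, timestamp, building in records:
        events[(user, day)].append((timestamp, building))

    counts = defaultdict(int)
    for visits in events.values():
        visits.sort()  # chronological order
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:  # count only changes of building
                counts[(origin, destination)] += 1

    return {pair: c / n_days for pair, c in counts.items()}


# Toy data: two users observed over two days.
records = [
    ("u1", 1, 9.0, "A"), ("u1", 1, 11.0, "B"), ("u1", 1, 13.0, "A"),
    ("u2", 1, 9.5, "A"), ("u2", 1, 10.0, "A"),
    ("u2", 2, 9.0, "B"), ("u2", 2, 12.0, "A"),
]
fluxes = average_daily_fluxes(records, n_days=2)
```

In this toy example, repeated connections to the same building are ignored, and only the three changes of building contribute to the fluxes.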

Specifically, we measure the average number of connected people in 30-minute intervals between 8:00 a.m. and 7:00 p.m. for each building, as this time frame encompasses all academic activities. The occupancy results are shown in Fig. 1, where each building is depicted as a dot whose size is proportional to its average population density. As anticipated, the figure illustrates that the majority of people who remain indoors occupy only a few university buildings. The physical layout of all campus buildings, including those not covered by WiFi or located in peripheral areas, is shown in grey in the background of the pedestrian network.

In order to better understand these results, and especially those regarding pedestrian traffic which will be discussed later, it is important to know that certain regions within the campus are not covered by the WiFi connection. These are represented in grey in Fig. 1 and are the sports center (A) and the IMEM-CNR building (B). Furthermore, the WiFi connection is also absent at the accesses of the campus, shown in blue. The main one (D) is served by car lanes flanked by pavements on both sides; the other car access (C) is only served by car lanes; the third access (E) can only be traveled on foot or by bicycle.

Fig. 1
Analysis of building occupancy. We realised a scheme where buildings are highlighted as dots, whose size is proportional to their average population occupancy. The buildings with the highest occupancy are “Ingegneria Scientifica” and “Sede Didattica”, primarily used for engineering and architecture courses, as well as “Q02” and the “Pharmacy” building. The black lines represent the pedestrian paths that constitute the network. The buildings are colored according to their function. The classification used is shown in the legend at the bottom left. Moreover, we highlighted the regions that are not covered by the WiFi connection in grey and the accesses of the campus in light blue.

Pedestrian traffic on campus network

To assign a pedestrian traffic to each arc based on the pedestrian fluxes between buildings, since we cannot directly determine pedestrian trajectories, we make assumptions based on common human behavior29,30,31. For instance, people typically choose the shortest path when moving from one place to another, although this is not always the case32. As a matter of fact, secondary factors33,34 influencing path choice, some of which are highly subjective, include the number and intensity of angular deviations, perceived safety, the beauty of the surroundings, and weather conditions.

In the context of pedestrian decision-making, defining a rigorous framework is challenging due to the multitude of influencing factors35. Among the factors influencing pedestrian behavior, one of the most extensively researched and inherently intriguing phenomena is the impact of crowding on movement patterns and decision-making processes36,37.

Considering the factors we mentioned, the most straightforward approach for distributing pedestrian flows between each pair of buildings is to allocate them among all suitable paths, i.e. sequences of arcs connected by nodes that do not form loops. The distribution of weights depends on the lengths of the paths, with an intuitive expectation of an exponential decay pattern. Specifically, we opted to use the following law to assign the weight for each path:

$$\begin{aligned} W(\gamma _{\alpha \beta }) = \dfrac{\textit{N}}{1+ \exp \left( k \ \dfrac{D_{\gamma _{\alpha \beta }}-D_{\min }}{D_{\min }}\right) } \end{aligned}$$
(1)

where \(\gamma _{\alpha \beta }\) is a selected path between the buildings \(\alpha\) and \(\beta\), \(D_{\gamma _{\alpha \beta }}\) is the length of the path \(\gamma _{\alpha \beta }\), \(D_{\min }\) is the length of the shortest path, \(\textit{N}\) is a normalization factor, evaluated numerically, and k is a control parameter. This law was formulated in the context of algorithmic modeling and simulations based on Markov theory applied to the study of pedestrian dynamics in Venice29.

The value of the parameter \(k\) can be roughly estimated based on typical pedestrian behavior. Since we do not have access to the actual trajectories chosen by pedestrians, it is essential to test the effect of varying \(k\) within a realistic range. In particular, we considered three values: \(5\), \(20\), and \(50\). To provide an intuition behind these choices, we calculated, according to Eq. (1), how much more likely pedestrians are to choose the shortest path over one that is \(20\%\) longer. For the three values of \(k\), the corresponding probability ratios are approximately \(1.86\), \(27.8\), and \(11,\!014\), respectively. In the first case, pedestrians also explore paths that are considerably longer than the shortest one, even though these remain less likely. In the last case, pedestrians almost always choose the shortest path available, or at most slightly longer alternatives, as significantly longer paths become extremely unlikely.
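The three ratios quoted above can be checked with a short calculation. The sketch below assumes that the excess length in the exponent is measured relative to \(D_{\min }\), which is the normalization under which the quoted values are reproduced; the function names are ours and the normalization factor is set to 1, since it cancels in the ratio.

```python
import math


def relative_weight(length, shortest, k):
    """Unnormalized path weight (normalization factor set to 1).

    Assumption: the excess length is divided by the shortest length.
    """
    excess = (length - shortest) / shortest
    return 1.0 / (1.0 + math.exp(k * excess))


def shortest_to_longer_ratio(k, extra=0.20):
    """How many times likelier the shortest path is than one `extra` longer."""
    return relative_weight(1.0, 1.0, k) / relative_weight(1.0 + extra, 1.0, k)


ratios = {k: shortest_to_longer_ratio(k) for k in (5, 20, 50)}
for k, r in ratios.items():
    print(f"k = {k:2d}: ratio = {r:.2f}")
```

In closed form the ratio is \((1+e^{0.2k})/2\), so it grows roughly exponentially with \(k\).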

This parameter plays a role analogous to the inverse temperature in statistical systems, as it controls the intensity of fluctuations or perturbations in the system. In the limit \(k \rightarrow \infty\) (corresponding to zero temperature), the system becomes “frozen” in the configuration where pedestrians exclusively follow the shortest paths. Conversely, when \(k \rightarrow 0\) (corresponding to infinite temperature), all paths become equally likely, and the system behaves as if it were in a highly excited and strongly disordered state. A clear visualization of the effect of the parameter k on the pedestrian traffic is reported in Section G of the Supplementary Material by focusing on the flux between two single buildings.

In order to apply the previous formula, we needed to know all paths that connect each pair of buildings. However, depending on the complexity of the network, finding the shortest paths can be computationally demanding. The literature offers various strategies for solving the shortest path problem38,39.

In our case, we settled for finding only the majority of the paths (or at least the shortest ones) that connect each pair of buildings by using random walks. Specifically, for each pair of buildings, we generated a list of possible trajectories by allowing self-avoiding random walks to explore the paths between the two nodes. A complete introduction to self-avoiding random walks40 can be found in the literature, along with many applications in urban mobility41. Moreover, an explanation that places the role of self-avoiding random walks in the context of our project can be found in Section D of Supplementary material.
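To make the idea concrete, a generic path search based on self-avoiding random walks can be sketched as follows. This is not the procedure detailed in the Supplementary material, only a minimal illustration: the data structures (`graph`, `arc_length`), the function names, the number of walks and the toy network are all assumptions.

```python
import random


def self_avoiding_walk(graph, start, target, rng):
    """Run one self-avoiding random walk from start.

    Returns the list of visited nodes if the walk reaches target,
    or None if it gets trapped with no unvisited neighbour left.
    """
    path, visited = [start], {start}
    while path[-1] != target:
        options = [n for n in graph[path[-1]] if n not in visited]
        if not options:
            return None
        step = rng.choice(options)
        path.append(step)
        visited.add(step)
    return path


def sample_paths(graph, arc_length, start, target, n_walks=2000, seed=0):
    """Collect the distinct loop-free paths found by repeated walks.

    graph:      {node: [neighbouring nodes]} (undirected adjacency lists)
    arc_length: {frozenset({u, v}): length of the arc joining u and v}
    Returns {path as tuple of nodes: total length}.
    """
    rng = random.Random(seed)
    found = {}
    for _ in range(n_walks):
        path = self_avoiding_walk(graph, start, target, rng)
        if path is not None:
            length = sum(arc_length[frozenset(arc)] for arc in zip(path, path[1:]))
            found[tuple(path)] = length
    return found


# Toy network: a square 0-1-2-3 with one diagonal 0-2.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
arc_length = {frozenset(a): d for a, d in
              [((0, 1), 1.0), ((1, 2), 1.0), ((2, 3), 1.0),
               ((0, 3), 1.0), ((0, 2), 1.5)]}
paths = sample_paths(graph, arc_length, start=0, target=2)
```

Because the search is stochastic, rare or very long paths may be missed, which is acceptable here since such paths receive a negligible weight.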

The procedure for the assignment of the pedestrian weight \(p_{j}\) to each arc j can be summarized by the following expression:

$$\begin{aligned} p_{j} = \sum _{(\alpha , \beta ), \ \alpha \ne \beta } \Psi _{\alpha \beta } \sum _{\gamma _{\alpha \beta } \ni j}W(\gamma _{\alpha \beta }) = \sum _{(\alpha , \beta ), \ \alpha \ne \beta } p^{\alpha \beta }_{j} \end{aligned}$$
(2)

where the first sum runs over all building couples and the second over the paths \(\gamma _{\alpha \beta }\), between the buildings \(\alpha\) and \(\beta\), that pass through the arc j. Furthermore, \(\Psi _{\alpha \beta }\) is the pedestrian flux between buildings \(\alpha\) and \(\beta\) and \(W(\gamma _{\alpha \beta })\) is the weight associated with the path \(\gamma _{\alpha \beta }\).
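The assignment in Eq. (2) can be sketched in a few lines. The sketch assumes that the normalization factor makes the weights of all paths joining a given building pair sum to one, and that the excess length is measured relative to \(D_{\min }\); the function names, the input format and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def path_weights(path_lengths, k):
    """Normalized weights for the paths joining one building pair.

    path_lengths: {path: length}. Assumption: weights sum to 1 per pair.
    """
    d_min = min(path_lengths.values())
    raw = {p: 1.0 / (1.0 + math.exp(k * (d - d_min) / d_min))
           for p, d in path_lengths.items()}
    total = sum(raw.values())
    return {p: w / total for p, w in raw.items()}


def arc_traffic(fluxes, paths_by_pair, k):
    """Distribute building-to-building fluxes over the arcs.

    fluxes:        {(a, b): flux between buildings a and b}
    paths_by_pair: {(a, b): {path: length}}, each path a tuple of arc ids
    Returns (traffic per arc, per-pair contributions to each arc).
    """
    contributions = defaultdict(dict)  # arc -> {pair: contribution}
    for pair, psi in fluxes.items():
        for path, w in path_weights(paths_by_pair[pair], k).items():
            for arc in path:
                previous = contributions[arc].get(pair, 0.0)
                contributions[arc][pair] = previous + psi * w
    traffic = {arc: sum(c.values()) for arc, c in contributions.items()}
    return traffic, dict(contributions)


# Toy data: two equally long paths for (A, B), a single path for (A, C).
fluxes = {("A", "B"): 100.0, ("A", "C"): 10.0}
paths_by_pair = {
    ("A", "B"): {("e1",): 100.0, ("e2", "e3"): 100.0},
    ("A", "C"): {("e2",): 50.0},
}
traffic, contributions = arc_traffic(fluxes, paths_by_pair, k=20)
```

Keeping the per-pair contributions alongside the total traffic is convenient, since they are the quantities \(p^{\alpha \beta }_{j}\) appearing on the right-hand side of Eq. (2).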

In order to significantly decrease the computational cost of Eqs. (1) and (2), we neglect very long, unlikely paths from the summation in (2). In particular, we keep only paths that satisfy the following condition:

$$\begin{aligned} D_{\gamma _{\alpha \beta }} < \left( 1+\frac{A}{k}\right) D_{\min } \end{aligned}$$
(3)

where \(D_{\gamma _{\alpha \beta }}\) and \(D_{\min }\) are the lengths of the generic and of the shortest path that connect a fixed pair of buildings, respectively. By fixing \(A=10\) we skip paths whose weight is more than 11,014 times smaller than the one assigned to the shortest path. In Section E of Supplementary material, we show a test of robustness regarding the choice of this threshold.
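The cutoff in Eq. (3) and the associated weight ratio can be illustrated as follows. At the threshold the exponent equals \(A\) when the excess length is measured relative to \(D_{\min }\) (an assumption of this sketch), so the ratio between the weight of the shortest path and that of a path at the cutoff is \((1+e^{A})/2\). The function name and the toy lengths are hypothetical.

```python
import math


def prune_paths(path_lengths, k, A=10.0):
    """Keep only the paths shorter than (1 + A / k) times the shortest one."""
    d_min = min(path_lengths.values())
    cutoff = (1.0 + A / k) * d_min
    return {p: d for p, d in path_lengths.items() if d < cutoff}


# Toy lengths in metres; with k = 20 the cutoff is 1.5 * 100 = 150.
paths = {"direct": 100.0, "detour": 140.0, "long": 160.0}
kept = prune_paths(paths, k=20)

# Weight of the shortest path divided by the weight of a path at the cutoff.
ratio_at_cutoff = (1.0 + math.exp(10.0)) / 2.0
```

Note that the cutoff tightens as \(k\) grows: for \(k=50\) only paths less than 20% longer than the shortest one are retained.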

Results on pedestrian movements on the network

Pedestrian traffic on arcs and the k dependency

The results on pedestrian traffic are shown in Fig. 2 which contains three networks associated to values of k equal to 5, 20 and 50, respectively. As a reference, in Section F of Supplementary material, we reported the list of the highest pedestrian fluxes we detected.

We removed from the representations all arcs leading to building entrances, as they can be considered extensions of the buildings themselves. Moreover, we used a logarithmic scale for pedestrian traffic, since traffic values are highly concentrated on a small number of frequently used arcs, while the majority are significantly less used.

Upon examining the main features common to all networks, in Fig. 2d, we highlight in red the region of the network that carries most of the pedestrian flows; as expected, the buildings with the highest occupancy are located in this area. As a reminder, the grey and blue regions (entrances) indicate where there is no WiFi connection.

In Fig. 2e, we show the frequency distributions of pedestrian traffic for different values of k. It can be observed that decreasing k effectively imposes an upper cutoff on the pedestrian traffic: the highest traffic values on the arcs decrease, and simultaneously, the number of arcs with very low traffic is reduced. In other words, decreasing k leads to a more even redistribution of pedestrian traffic, with flow being distributed more regularly across the network arcs. The histograms of pedestrian traffic across the arcs are reported in Section G of the Supplementary Material.

Regarding the dependency on k, as this parameter increases, the distribution of pedestrian traffic on arcs becomes steeper and steeper along specific paths. In fact, in Fig. 2a, which represents a scenario where pedestrians are less influenced by path lengths, new arcs obtain a slightly higher traffic. On the contrary, Fig. 2c represents a scenario where pedestrians choose almost exclusively the shortest paths available. As a consequence, the distribution of pedestrian traffic becomes even steeper with few arcs standing out from the others. Although it is not possible to directly estimate the value of \(k\) from the available data, we showed that the resulting distribution of pedestrian traffic across the network remains qualitatively similar over a broad range of realistic \(k\) values. Therefore, we fix \(k = 20\) for all subsequent analyses, as it provides a reasonable balance and aligns with common assumptions about human movement patterns.

Furthermore, we assessed the robustness of our results with respect to measurement uncertainties in the pedestrian flows. In particular, we verified that the most trafficked paths remain stable when accounting for statistical fluctuations in the estimated fluxes. A detailed estimation of the statistical error on inter-building flows, and its propagation to other measures such as arc traffic, is provided in Section H of the Supplementary Material.

Fig. 2
Analysis of pedestrian traffic. We added a chromatic scale on the arcs of the network. The color indicates the daily average pedestrian traffic evaluated with our model. As colors go from blue to red the estimated traffic increases. As we can see from the chromatic representation at the bottom of the figure, the six chromatic intervals are not associated with equally long traffic intervals, as we utilized a logarithmic scale. This scale is the same in all networks and goes from 0 to the highest traffic found, 840.0. The networks were obtained by using a parameter k for the probability distribution equal to 5, 20 and 50, respectively. Moreover, the maximum intensities of daily pedestrian traffic among all arcs are 781.5, 794.3 and 840.0, respectively. In the fourth figure, we highlighted in red, grey and blue, respectively, the regions where most of the pedestrian flows are distributed, where there is no WiFi connection and where the entrances and exits of the campus are located. In the last figure, we show the frequency distributions of pedestrian traffic for different values of k.

Time distributions of pedestrian traffic on the arcs

We also conducted a study on how pedestrian traffic on the arcs varies at different times of the day by identifying five temporal phases based on the variations in average building occupancy throughout a typical workday that are represented in Fig. 3f. The details of this preliminary study can be found in Section G of Supplementary material.

Fig. 3
Analysis of pedestrian traffic during temporal phases and (f) time distribution of the total occupancy of buildings. The represented networks were obtained using the same procedure explained before and with \(k=20\). The color indicates the average pedestrian traffic in a 20-minute interval. The logarithmic scale is upper-bounded by the maximum pedestrian traffic we found, which corresponds to the “Lunch break” phase. In particular, the maximum intensities of pedestrian traffic among all arcs in the five phases are 5.57, 13.48, 18.02, 11.53 and 4.34, respectively.

The five phases are: the “early morning” from 6:00 a.m. to 9:00 a.m., the “late morning” from 9:00 a.m. to 11:40 a.m., the “lunch break” from 11:40 a.m. to 1:20 p.m., the “early afternoon” from 1:20 p.m. to 3:20 p.m. and finally, the “late afternoon and evening” period from 3:20 p.m. to 8:00 p.m. These are associated with the networks in Fig. 3, where, within each of these intervals, the pedestrian traffic on the arcs was averaged over 20-minute time intervals.

In Fig. 3f, we represented the total average occupancy calculated in each 20-minute time interval. In particular, for each of these time intervals, we detected the number of WiFi users connected in each building and summed these across all buildings to obtain the total average occupancy. Each of the five phases, delimited by vertical dashed lines, reflects different occupancy behaviors that align with the expected attendance on campus.

Firstly, we notice that the pedestrian fluxes are significantly low in the first and last phases of the working day. In fact, as far as the movement of people is concerned, the first and last intervals are characterized by a gradual arrival at and departure from the campus, respectively, and some of these movements are underestimated due to the lack of WiFi connection at the entrances of the campus. However, these movements typically occur by public transport or by private car, while most of the movements within the campus are performed by walking. During the second and fourth phases, there are peaks in occupancy due to class schedules, which commonly occur during these time intervals. The third phase shows a local minimum in occupancy because of the lunch break.

Aside from the previous general considerations, it is worth discussing specific traffic increases which are highlighted in Fig. 3. In particular, Fig. 3b,c show two regions highlighted in purple that experience high traffic, which is not so significant in the other phases. These two regions correspond to the two university dining halls, with (B) being the most frequented one. If we consider instead Fig. 3e, we can notice that the region (C) highlighted in brown is the one characterized by the highest traffic despite not being particularly relevant in the other phases. This can be explained by the fact that the north-western portion of the campus consists of the sporting area, where most activities are carried out during the evening.

As an additional analysis, we quantify the overall people occupancy and pedestrian flows in all five phases. The main results are that “Lunch break” and “Late afternoon and evening” are the two phases in which there is the largest intensity of pedestrian fluxes compared with the “stable” people occupancy of buildings. Furthermore, the phase “Early afternoon” is the one during which there are the fewest relative pedestrian movements. The numerical results on occupancy and pedestrian flows are reported in Section G of Supplementary material.

Entropy and information of the traffic network

The pedestrian traffic reconstructed in the previous sections was obtained from two types of information: the analysis of the pedestrian infrastructures connecting the buildings, i.e. the arcs of the network with their lengths, and the WiFi data providing the fluxes between the different buildings. It is now interesting to analyze how much information about pedestrian traffic comes from the two contributions. In this perspective, we replicate our previous study without utilizing WiFi data. Specifically, instead of using WiFi measurements to associate the flux between each building pair, we assigned a constant flux that is independent of the chosen pair while maintaining the overall average number of moving pedestrians, so that the flux in each link should be related only to its centrality in the network42,43. This approach allows us to isolate the impact of the network topology on pedestrian traffic, separate from the influence of the WiFi data. Our results are reported in Section H of Supplementary material. We observed that, without using WiFi data, pedestrians are more equally distributed among all arcs.

We now quantify how much information we gained from the pedestrian infrastructure and from the WiFi data, respectively, by evaluating the Shannon entropy of the pedestrian network fluxes44,45,46, which is defined as:

$$\begin{aligned} S = - \sum _{j=1}^{n}\tilde{p}_{j} \log \left( \tilde{p}_{j}\right) \ \ \ \ \ \ \ \ \ \ \ \ \tilde{p}_{j} = \dfrac{p_{j}}{\sum _{i=1}^{n} p_{i}} \end{aligned}$$
(4)

where \(n=521\) is the number of all arcs that compose the network and \(\tilde{p}_{j}\) is the probability that a generic walker is crossing the arc j. The maximum entropy is obtained when all the arcs are crossed with the same probability i.e. \(\tilde{p}_{j}=1/n\) and \(S_{\max }=\log (n) \simeq 6.256\), while the entropy is vanishing if all the pedestrian traffic occurs on a single arc. The natural logarithm is used for calculating the entropy.
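The two limiting cases mentioned above can be verified numerically. The following minimal sketch evaluates Eq. (4) with the natural logarithm and the usual convention that arcs with zero traffic contribute nothing to the sum; the function name and the test vectors are ours.

```python
import math


def shannon_entropy(traffic):
    """Shannon entropy (natural logarithm) of a list of arc traffic values."""
    total = sum(traffic)
    # Arcs with zero traffic are skipped: the limit of q * log(q) is 0.
    probs = [p / total for p in traffic if p > 0]
    return -sum(q * math.log(q) for q in probs)


n = 521
s_max = shannon_entropy([1.0] * n)                 # uniform traffic on all arcs
s_min = shannon_entropy([5.0] + [0.0] * (n - 1))   # all traffic on a single arc
print(f"S_max = {s_max:.3f}, S_min = {s_min:.3f}")
```

The information gain discussed in the text corresponds to the difference between `s_max` and the entropy of the estimated traffic.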

We calculate the entropy first by using only the pedestrian network and then by also introducing the WiFi data. The results are plotted in Fig. 4 as a function of k.

Fig. 4
Entropy of the network. We represented the entropy as a function of k both with (blue line) and without (red line) the WiFi data. We also represented the maximum value of entropy (dotted light-blue line). The error bars represent the standard deviation, calculated based on repeated simulations in which pedestrian flows between buildings were varied according to the expected statistical error. Further details are provided in Section H of the Supplementary Material.

The information gain in the two cases (difference with respect to \(S_{\max }\)) is comparable. Therefore, both the network structure and the WiFi data provide significant information to our estimate of the pedestrian traffic. As expected, the information gain is minimal for \(k=0\), which corresponds to the configuration in which pedestrians choose equally among all possible paths. Instead, as \(k \rightarrow \infty\), which corresponds to the case in which pedestrians choose exclusively the shortest path, the entropy variation asymptotically reaches a maximum. In particular, for our case of interest \(k=20\), the entropy difference is reasonably close to its value at \(k \rightarrow \infty\).

Participation ratios in the traffic network

Based on the analysis from the previous section, some arcs in the network experience heavy traffic primarily due to their central location, while other high-traffic arcs carry the traffic between buildings that exchange a significant flux, independently of their position.

In the first scenario, numerous pedestrian fluxes contribute to the overall traffic of an arc, whereas, in the second scenario, a single flux tends to dominate. Distinguishing between these two cases is essential for predicting which paths will be most affected by changes in a single flux between two buildings. For instance, it is valuable to determine whether traffic in a particular arc can be reduced by limiting the movement from just one building.

In order to quantify these different behaviors, we introduce the participation ratio of the traffic of arc j. This is a typical tool used in the physics of localization47 and it provides an estimate of the number of couples of buildings that contribute to the traffic of the considered arc. In particular we define:

$$\begin{aligned} L_{j} =\dfrac{\left( \sum _{\alpha \ne \beta } p_{j}^{\alpha \beta }\right) ^{2}}{\sum _{\alpha \ne \beta } \left( p_{j}^{\alpha \beta } \right) ^{2}} \end{aligned}$$
(5)

where according to Eq. (2), \(p_{j}^{\alpha \beta }\) is the portion of pedestrian flux between buildings \(\alpha\) and \(\beta\) that is assigned to arc j. The sums run over all building couples. The participation ratio \(L_{j}\) varies from 1 to the number of building pairs \(\mathfrak {N}=210\).
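A small numerical example may help to interpret Eq. (5). The sketch below computes the participation ratio for three hypothetical sets of contributions to a single arc; the function name and the numbers are illustrative only.

```python
def participation_ratio(contributions):
    """Participation ratio of the per-pair contributions to one arc."""
    values = list(contributions)
    return sum(values) ** 2 / sum(v * v for v in values)


# One dominant flux, four equal fluxes, and an intermediate case.
single = participation_ratio([40.0, 0.0, 0.0, 0.0])
even = participation_ratio([10.0, 10.0, 10.0, 10.0])
mixed = participation_ratio([30.0, 10.0])
print(single, even, mixed)
```

A value close to 1 signals an arc whose traffic is dominated by a single building pair, whereas equal contributions from m pairs give exactly m.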

Fig. 5
Analysis of the participation ratio. We added a chromatic logarithmic scale on the arcs of the network for \(k=20\). The color indicates the participation ratio, which corresponds to a measure of the number of building pairs that contribute to the traffic.

In Fig. 5, we depict the participation ratios of the 60 arcs with the highest pedestrian traffic, as it is most pertinent to examine the participation ratios for heavily trafficked arcs. The other arcs are shown in grey to clarify the network’s structure.

We observe that these high-traffic arcs exhibit a range of participation ratios, from those influenced by a single flux to arcs where pedestrian traffic originates from more than 10 different building pairs. Generally, the arcs with the highest participation ratios are found in the central areas of the network. In contrast, arcs with the lowest participation ratios usually connect buildings near the network’s boundaries, despite having significant pedestrian traffic.

Measures of robustness of the network

Whenever a network is used to model a system, its robustness42 is often a concern. This is certainly true for transportation networks, such as air routes48 and road networks49, which must maintain good connectivity even after being damaged. The same holds for pedestrian networks; we therefore studied how the network responds to perturbations such as the removal of single arcs.

As a first measure of network response to the removal of a single arc, we chose to sum the absolute values of all traffic variations in all arcs except the removed one; i.e.:

$$\begin{aligned} \Delta p_{u} = \sum _{j \ne u} \left| p_{j | u} - p_{j} \right| \end{aligned}$$
(6)

where \(p_{j | u}\) and \(p_{j}\) represent the pedestrian traffic on the arc j after and before the removal of arc u, respectively. The values of \(\Delta p_{u}\) are represented in the top network of Fig. 6.
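
A minimal sketch of Eq. (6) is given here, assuming that the traffic before and after the removal is already available as dictionaries keyed by arc identifier. The recomputation of the traffic after the removal is not shown, and the arc names and values are hypothetical.

```python
def total_traffic_variation(p_before, p_after, removed):
    """Eq. (6): sum of |p_{j|u} - p_j| over all arcs j except the removed arc u."""
    return sum(
        abs(p_after[j] - p_before[j])
        for j in p_before
        if j != removed
    )


# Hypothetical traffic values; arc "u" is the one being removed.
p_before = {"a": 120.0, "b": 80.0, "c": 40.0, "u": 60.0}
p_after = {"a": 160.0, "b": 90.0, "c": 30.0}  # traffic recomputed without "u"

print(total_traffic_variation(p_before, p_after, removed="u"))  # 60.0
```

Because absolute values are summed, both increases and decreases of traffic on the remaining arcs add to the total.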

Fig. 6

First analysis of the robustness of the network subjected to the removal of single arcs. The colors in the first linear scale indicate the total pedestrian traffic variation; as they go from blue to red, the effect of removing an arc increases. The networks were obtained using \(k=20\). The two bottom networks represent two opposite situations: (a) a localized and (b) a widespread traffic redistribution. The colors in the scale indicate the pedestrian traffic variation due to the removal of the arc u highlighted in black. The traffic originally assigned to the removed arc is \(p = 419.2\) in (a) and \(p = 103.7\) in (b).

Although useful for gaining insight into the importance of each arc in the network, this global measure cannot discriminate between cases where traffic shifts to many arcs and cases where it shifts to only a few. In terms of network impact, however, this difference is highly relevant. Two examples representing these opposite situations, despite having similar values of \(\Delta p_{u}\), are shown in Fig. 6.

In particular, in Fig. 6a, the removal of an arc produces a large increase in traffic in some nearby arcs, which can lead to problems such as overcrowding in those regions. In contrast, in Fig. 6b, the traffic increase is barely visible on the same scale, since the flux variation is dispersed over several arcs.

Therefore, if we are interested in predicting when large traffic increases may appear due to the closure of an arc of the network, it is better to define another quantity. To investigate the robustness of the pedestrian network, and specifically to predict possible crowding effects, we used the maximum traffic variation caused by the removal of an arc:

$$\begin{aligned} {\Delta p^{\max }_{u} = \max _{j \ne u} \left( p_{j | u} - p_{j} \right) } \end{aligned}$$
(7)

These results are represented in the top network of Fig. 7. From this figure, it is evident that some of the arcs adjacent to highly frequented buildings experience the greatest increase in traffic. Specifically, pedestrians are redirected towards other directions immediately after leaving the buildings. This effect is shown in the bottom panels of Fig. 7, which represent the traffic variation resulting from the removal of the two arcs that cause the highest maximum pedestrian traffic variation.
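
The difference between the total variation of Eq. (6) and the maximum variation of Eq. (7) can be illustrated with a toy comparison. The two scenarios below are hypothetical and are constructed to have the same total variation; the traffic after the removal is assumed to be given rather than recomputed.

```python
def total_traffic_variation(p_before, p_after, removed):
    """Eq. (6): sum of absolute traffic variations over all arcs except u."""
    return sum(abs(p_after[j] - p_before[j]) for j in p_before if j != removed)


def max_traffic_variation(p_before, p_after, removed):
    """Eq. (7): largest signed traffic increase over all arcs except u."""
    return max(p_after[j] - p_before[j] for j in p_before if j != removed)


# Hypothetical network: arc "u" carries 100 walkers and is removed.
p_before = {"a": 50.0, "b": 50.0, "c": 50.0, "d": 50.0, "u": 100.0}

# Localized redistribution: every displaced walker moves to arc "a".
p_localized = {"a": 150.0, "b": 50.0, "c": 50.0, "d": 50.0}

# Widespread redistribution: displaced walkers spread evenly over four arcs.
p_widespread = {"a": 75.0, "b": 75.0, "c": 75.0, "d": 75.0}

for label, p_after in (("localized", p_localized), ("widespread", p_widespread)):
    total = total_traffic_variation(p_before, p_after, "u")
    peak = max_traffic_variation(p_before, p_after, "u")
    print(f"{label}: total variation = {total}, maximum variation = {peak}")
```

Both scenarios give a total variation of 100, while the maximum variation is 100 in the localized case and 25 in the widespread one, so only the second measure flags the potential crowding.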

Fig. 7

Second analysis of the robustness of the network subjected to the removal of single arcs. The colors in the first linear scale indicate the maximum pedestrian traffic variation; as they go from blue to red, the effect of removing an arc increases. The networks were obtained using \(k=20\). The bottom figures represent examples of high traffic redistribution. In particular, they show the change in traffic after removing the two arcs whose removal causes the highest maximum pedestrian traffic variation. The removed arcs u are marked in black. The chromatic scale is based on the largest variations that we found. The traffic variations of figure (a) fall within the range \([-443.2, 793.8]\), while those of figure (b) fall within the range \([-327.1, 640.2]\).

The previous measures of robustness focus mainly on the network structure and on possible overcrowding. In this framework, we showed that some arc removals may disperse the variation of pedestrian fluxes over several arcs; in this case, some pedestrians may face a significant lengthening of their path, which is a different aspect of network stability under arc removal. This new measure can be viewed in terms of travel cost, specifically as the additional time required for the journey, as has also been explored in studies of vehicular traffic50. In this context, we evaluate how many meters, on average, the displaced people need to travel after the removal of an arc u; i.e.:

$$\begin{aligned} \Delta L_{u} = \frac{1}{p_{u}}\sum _{j \ne u} D_{j} (p_{j | u} - p_{j}) \end{aligned}$$
(8)

where \(D_{j}\) is the length of arc j and \(p_{u}\), according to Eq. (2), is the number of displaced walkers when arc u is removed (i.e. the flux on the arc).

In Fig. 8 we show \(\Delta L_{u}\) for each arc of the network.

Fig. 8

Analysis of the additional meters needed per person by the displaced people after the removal of single arcs. The colors in the linear scale indicate the length variation, in meters, of the new chosen paths for \(k=20\). The three arcs with the highest values are highlighted in order.

Contrary to what one might initially expect, many arcs have a negative \(\Delta L_{u}\). This is because, for a not excessively large value of k, such as \(k=20\), a non-negligible fraction of pedestrians are distributed on paths that are not the absolute shortest. In fact, we found that as k increases, all the negative \(\Delta L_{u}\) converge to 0.
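
Both signs of \(\Delta L_{u}\) can be reproduced on toy cases with a minimal sketch of Eq. (8). The arc names, lengths, and traffic values are hypothetical, and the traffic after the removal is assumed to be given. In the second case, the displaced walkers were originally on a 150 m route through arc u although a 100 m alternative existed, which is the kind of situation that a finite k allows.

```python
def extra_length_per_displaced(lengths, p_before, p_after, removed):
    """Eq. (8): length-weighted traffic variation on all arcs except u,
    divided by the traffic p_u originally carried by the removed arc."""
    p_u = p_before[removed]
    if p_u <= 0.0:
        raise ValueError("the removed arc carries no traffic")
    weighted_variation = sum(
        lengths[j] * (p_after[j] - p_before[j])
        for j in p_before
        if j != removed
    )
    return weighted_variation / p_u


# Case 1 (hypothetical): 10 walkers on arc "u" are rerouted onto the
# previously unused arcs "a" (40 m) and "b" (60 m).
lengths_1 = {"u": 50.0, "a": 40.0, "b": 60.0}
before_1 = {"u": 10.0, "a": 0.0, "b": 0.0}
after_1 = {"a": 10.0, "b": 10.0}
print(extra_length_per_displaced(lengths_1, before_1, after_1, "u"))  # 100.0

# Case 2 (hypothetical): 10 walkers used a long route made of "u" (10 m) and
# "c" (140 m); after the removal they switch to the shorter arc "s" (100 m).
lengths_2 = {"u": 10.0, "c": 140.0, "s": 100.0}
before_2 = {"u": 10.0, "c": 10.0, "s": 30.0}
after_2 = {"c": 0.0, "s": 40.0}
print(extra_length_per_displaced(lengths_2, before_2, after_2, "u"))  # -40.0
```

In the second case the removal pushes walkers onto a route that is shorter than the one they had chosen, so the measure comes out negative.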

We highlighted the three arcs whose removal causes the longest alternative paths for the displaced people. The additional meters \(\Delta L_{u}\) that displaced people must travel to reach their destinations, and the pedestrian traffic \(p_{u}\), of the three arcs highlighted in Fig. 8 are: \(\Delta L_{1}=457.8\), \(p_{1}=74.4\), \(\Delta L_{2}=364.3\), \(p_{2}=192.8\), \(\Delta L_{3}=358.1\) and \(p_{3}=81.5\). All other arcs have \(\Delta L_{u} < 250\).

The most critical arcs are located close to a building, so that their removal significantly lengthens the path connecting this building to the rest of the network. Clearly, this may have a relatively small impact on the variation of the pedestrian traffic if the relevant building is not involved in large fluxes.

Conclusions

We studied pedestrian movements on a university campus using WiFi data, which proved to be a valuable source of insight into pedestrian dynamics. WiFi data are readily available through public institutions, and they rely on relatively inexpensive infrastructure that is typically already present in public spaces.

To distribute pedestrian flows across all the different paths, we employed a law (Eq. (1)) characterized by an exponential decay with respect to path length and a free parameter k. We assumed that this “temperature” parameter remains constant across all regions of the network. However, this assumption does not hold in general, as k governs how people tend to choose one path over another and may therefore vary depending on the environment.

Considering these factors, the parameter k could be promoted to a vector, with each component associated with an observable of a given path. In addition to length, an example of a relevant observable is the angular variation of a path. However, Eq. (1) may also be influenced by factors that are either unrelated to the paths themselves or highly subjective. For instance, weather conditions and decreased visibility during nighttime can significantly impact pedestrian choices. More generally, the perceived aesthetic quality of the surroundings of a path can also influence pedestrian choices.

Besides the study of building occupancy and of the time distribution of pedestrian flows, which helps in understanding which buildings serve primarily as pass-through structures and how pedestrian flows vary throughout a typical working day on campus, one of the primary outcomes of our research is the assignment of a pedestrian traffic value to each arc of the network. These values allow us to define a list of medium- and high-traffic arcs that need more attention in the walkability assessment to ensure an overall improvement of pedestrian accessibility to campus spaces and buildings. The analysis identified the central area of the pedestrian network, delineated by the red perimeter in Fig. 2d, as the zone with the highest concentration of pedestrian flows. This trend remains consistent across all temporal phases of the day, even though overall values decrease. The predominance of flows in this area aligns with the spatial distribution of the buildings exhibiting the highest occupancy rates. Additionally, this portion of the network includes the primary stops of the local public transport service connecting the campus to the city centre and the railway station. These insights suggest that this area should be prioritized in future interventions aimed at enhancing walkability conditions, ensuring that infrastructure quality and accessibility standards are adequately met. A careful walkability analysis assessing the quality of urban environments and infrastructures for pedestrians, combined with the flow analysis reported in this study, could effectively support the identification of intervention priorities along the most critical segments of the network.

Moreover, this approach could prove useful in monitoring the effects of temporary adaptations to the pedestrian infrastructure, such as maintenance work involving construction sites that alter the layout of the walking network, and in anticipating the resulting redistribution of flows, while ensuring that safe and accessible routes are always available for all pedestrians, including those with disabilities.

Taking inspiration from other fields, we also studied the participation ratio of the most trafficked arcs. This quantity allows us to highlight the arcs where pedestrian traffic is most susceptible to the closure of a building.

Using the evaluation of entropy, we were able to estimate the amount of information gained from WiFi data. Furthermore, the analysis of the effect of removing an arc from the network could serve as a fundamental tool for predicting new pedestrian traffic patterns whenever a region of the campus is closed. This study was conducted both in terms of maximum traffic variation, considering the potential crowding effects that could arise, and in terms of the additional meters displaced people would need to travel to reach their destinations, focusing on the direct impact on individual pedestrians.

Finally, as previously mentioned, this work can be further extended by correlating the pedestrian traffic with quality assessments of footpaths. Indeed, numerous features of a footpath could be considered when evaluating its overall quality. Some examples include the footpath's practicability and inclusivity, assessing whether regulatory requirements guaranteeing accessibility for all are met; safety, evaluating the protection from vehicular traffic; and comfort, resulting from the suitability of the flooring, the attractiveness of the urban environment and the correct climatic design of spaces51. These quality assessments can be performed either in situ or by 3D mapping of the campus, possibly using image recognition. In fact, the use of big data and, especially, deep learning is becoming increasingly relevant in walkability studies52,53,54.