Introduction

Transport network design is a key determinant of travel accessibility. At the same time, accessibility is a key determinant of transport use1. It is therefore paramount to understand how network structures influence transport accessibility. In particular, urban sustainability development goals call for improving the accessibility and therefore access to opportunities provided by public means of transport.

The classical definition of accessibility is the Hansen definition as travel impedance between spatially dispersed opportunities for activity2. In contrast to other more technical measures, such as robustness or vulnerability, common across various fields of engineering3, accessibility is specific to the spatial movements of people, and thus to disciplines such as geography and transport science, and is often perceived as a “soft” measure. The Hansen definition conceptually captures these characteristics and, while it is qualitatively well-founded, it lacks quantitative rigour to guide formal analysis. Consequently, concrete operationalisation and the formulation of related indicators differ significantly across analyses, often relying on ad-hoc interpretations of the devised indicators. This dispersion has made it challenging to devise a unified and generalisable framework for accessibility analysis, leading to the recent recognition of the need to unify accessibility studies4,5,6.

Accessibility is essentially about connectedness - the ability to reach various destinations within a certain time budget. A complex network perspective arguably offers a promising framework to establish a rigorous and unified analysis. Public transport systems can naturally be modelled as networks, and the methods of network science have been previously applied to study their topological structure7,8,9. Past works have often focused on network science applications for studying the vulnerability and robustness of public transport networks10,11,12,13,14.

Recently, a complex network approach has been gaining importance also in accessibility studies. Previous studies employ standard topological measures as accessibility indicators. In particular, topological measures such as average path length or network (in)efficiency at the global level15,16, or metrics related to closeness centrality at the node level17 have been proposed as indicators of accessibility. All of the aforementioned proposed network-level measures are based on node-level averages. In addition, analyses often adopt a strictly topological approach, discarding the infrastructural and operational properties of public transport systems, such as in-vehicle and waiting times, which ultimately influence their use. These properties are direct outcomes of network design decisions such as line configuration, service frequencies and travel speeds. Moreover, there are several standard graph representations, reflecting either the underlying infrastructure or operational characteristics, since reconciling all aspects in a single representation is non-trivial.

In this work, we approach the problem from a complementary perspective, asking first how accessibility can be qualitatively redefined in terms of reachability within well-connected parts of the network. Specifically, our objective is to identify parts of the network where (almost) all nodes are reachable from one another within a given maximum allowed travel time. To this end, we adopt the notion of an access graph \(\mathcal {G}_A\)18. Its node set represents the public transport stops, and its edge set is constructed by calculating the generalised travel time (GTT) matrix and connecting all pairs of nodes i and j for which GTT(i, j) is within a certain cut-off time \(t_c\). GTT comprises in-vehicle and waiting time, together with a time-equivalent transfer penalty. The edge set of the access graph is thus \(t_c\)-dependent, and \(\mathcal {G}_A\) gradually evolves from the empty graph at \(t_c=0\) to a complete graph when \(t_c\) reaches the maximum travel time in the network.
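The construction of the access graph can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: it assumes a symmetric GTT matrix given as a list of lists, and the function name `access_graph` and the numerical values are hypothetical.

```python
from itertools import combinations

def access_graph(gtt, t_c):
    """Return the access graph as adjacency sets for cut-off time t_c.

    gtt is assumed to be a symmetric matrix of generalised travel times;
    two stops are connected when their GTT is within t_c.
    """
    n = len(gtt)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if gtt[i][j] <= t_c:
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Hypothetical GTT values (minutes) between four stops.
gtt = [
    [0, 5, 12, 20],
    [5, 0, 8, 15],
    [12, 8, 0, 6],
    [20, 15, 6, 0],
]

# Number of edges at increasing cut-off times: the graph grows from
# empty to complete as t_c approaches the maximum travel time.
edges_at = {
    t: sum(len(nbrs) for nbrs in access_graph(gtt, t).values()) // 2
    for t in (0, 6, 12, 20)
}
```

In this toy case the edge count grows from zero at \(t_c=0\) to all six possible edges at \(t_c=20\), the maximum travel time in the matrix.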

To formalise this view of accessibility, we examine several maximal subgraphs of the access graph and their evolution with increasing cut-off time. The strictest condition on connectedness is imposed by the maximal clique, where each pair of nodes shares an edge. A looser condition is that of a k-core, defined as a maximal subgraph where each node has a degree of at least k. Note that the access graph and all its subgraphs used in our analysis are unweighted and undirected. k-cores allow us to define reachability levels by setting predetermined values of k, for example observing the part of the network where at least \(x\%\) of all the stops (of the original network) can be reached from each node. Specifically, we observe the kx-cores for \(x=25,50,75\), in increasing order of stringency. Each such k-core emerges at a specific GTT threshold \(\tau _k\) in the evolution of \(\mathcal {G}_A\).
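Both subgraph types can be extracted from an adjacency-set representation with standard algorithms. The sketch below uses iterative peeling for the k-core and Bron-Kerbosch with pivoting for the largest clique; these are textbook techniques chosen for illustration, not necessarily those used in the study. The rule \(k=\lceil xN/100\rceil \) for the kx-core is an assumption derived from the description above, and the helper names are hypothetical.

```python
import math

def k_core(adj, k):
    """Nodes of the k-core: repeatedly peel off nodes with degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive

def max_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential worst case)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

def kx_threshold(n_nodes, x):
    """Assumed rule: k equals x percent of all stops, rounded up."""
    return math.ceil(x * n_nodes / 100)

# Toy access graph: a 4-clique {0, 1, 2, 3} with a pendant node 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
core_50 = k_core(adj, kx_threshold(len(adj), 50))  # k = 3
core_75 = k_core(adj, kx_threshold(len(adj), 75))  # k = 4
clique = max_clique(adj)
```

Here the k50-core and the largest clique coincide, while the k75-core has not yet emerged, which mirrors the increasing stringency of the conditions.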

The analysis of (maximal) cliques in transport networks has hitherto been relatively scarce in the literature. In ref. 19, the authors consider urban statistical areas as nodes in a graph connected by commuting flows; the maximal cliques in this case correspond to clusters of employment nodes. In ref. 20, contact networks of public transport passengers are constructed, and cliques of passengers are identified based on similarity in journey and transfer patterns in order to find the high-risk journeys in epidemic spreading. In addition, the authors of ref. 21 consider the topological characteristics of the Chinese airline network, using cliques to identify densely connected parts of the network.

We note that the analysis of k-cores, especially k-core decomposition methods, is often used to examine node importance in networks (e.g.22) or the hierarchical structure of networks (e.g.23). k-core decomposition methods have been used in transport research, e.g. for studying network vulnerability24, finding the backbone structure of the network25, and airline network evolution26. The methods used are based on the main core, i.e. the k-core with the largest k. In contrast to the existing analyses, in our approach we consider fixed values of k that reflect the desired levels of connectedness.

We trace the evolution of the selected subgraphs by progressively increasing the cut-off time. The process is visualised in Fig. 1c and d, where the access graph and the considered subgraphs are shown for six increasing values of \(t_c\). The growth of the maximal clique, as the most stringent condition on connectedness, is particularly illustrative (nodes and edges shown in red). We introduce two complementary sets of accessibility indicators related to the evolution observed for each network: cumulative accessibility indicators \(S^{(H)}\) and threshold indicators \(\tau _{(H)}\). Cumulative indicators are defined as the integral under the curve \(N^{(H)}(t_c)\), where \(N^{(H)}\) is the dimension of the subgraph H at \(t_c\). Threshold indicators \(\tau _{(H)}\) reflect a complementary perspective and measure the cut-off time at which the k-cores for fixed values of k first occur. Intuitively, networks in which the subgraphs grow quickly at short cut-off times can be regarded as more accessible. The cumulative and threshold indicators correspond to the notions of primal and dual accessibility, respectively27.

To allow for the construction of indicators that enable meaningful comparison across networks, we introduce as a reference the concept of an idealised network on the same set of nodes, each with given geographical coordinates, similar to ref. 28. This allows us to incorporate geographic and topological scaling into the indicator construction. The introduced measures are then normalised to their idealised counterparts to yield reliable accessibility indicators.

We perform the analysis for 42 metro networks worldwide and compare their accessibility levels. We illustrate our approach in Fig. 1 for the cases of the Paris and London metro networks. The networks are similar in size and are among the largest networks in the dataset. Panels (a) and (b) show the dimensions of the studied subgraphs versus \(t_c\). The evolution of the access graphs and their maximal cliques and kx-cores is visualised in Fig. 1c, d. The curves for the maximal clique of the actual and idealised networks are shown in panels (e) and (f). Visually, the networks where the curves are closer to the idealised curves, corresponding to smaller grey areas in the plots, are more accessible. In this example, the London metro network has significantly higher accessibility than the Paris metro.

The accessibility indicators proposed in this study are compared to commonly used topological indicators, as well as to traditional indicators from transport geography. The implications of this approach for transport planners are illustrated on a case study of the Stockholm metro, where different developments of the network are simulated and the changes in accessibility levels are measured. We discuss the implications of the proposed framework for future accessibility studies.

Fig. 1

Illustration of the subgraphs and related indicators for the London and Paris metro networks. Plots (a) and (b) show the dimension of the subgraph with respect to \(t_c\). The red markers show the dimension of the maximal clique vs. cut-off time \(t_c\). Dark, medium and light blue markers show the dimensions of the k25, k50, and k75 cores, respectively. Plots (c) and (d) show the evolution of the access graph (grey), k25-core (dark blue), k50-core (medium blue), k75-core (light blue) and maximal clique (red), together with the idealised maximal clique (yellow) for London and Paris networks, respectively. The cumulative accessibility indicators used in the analysis are shown in plots (e) and (f) for the respective networks. Dimension of the idealised maximal clique is shown in black markers. We set the upper integral boundary to \(\min (t_{max}, 2\cdot t_{max}^{ideal})\), where \(t_{max}\) is the maximum travel time in the network and \(2\cdot t_{max}^{ideal}\) is twice the maximum idealised travel time. The minimum is taken, as in some networks \(t_{max} < 2\cdot t_{max}^{ideal}\). This happens in the London network, while the \(2\cdot t_{max}^{ideal}\) threshold is shown in (f) for Paris. The cumulative accessibility indicator represents the shaded red area relative to the shaded grey area. The latter represents the idealised cumulative accessibility against which the actual cumulative accessibility is measured. The values of selected indicators for both networks are: \(S^{(MC)}=0.76\), \(S^{(k50)}=0.77\), \(\tau _{(k50)}=0.34\) for London; and \(S^{(MC)}=0.53\), \(S^{(k50)}=0.57\), \(\tau _{(k50)}=0.50\) for Paris.

Results

We construct access graphs of 42 metro networks worldwide and examine the evolution of their maximal subgraphs with progressively increasing cut-off time. For each network, we observe four characteristic subgraphs H: the maximal clique and the kx-cores for \(x\in \{25,50,75\}\). For accessibility measurement, we follow the same procedure for the idealised network to give a meaningful reference for actual accessibility levels. Sect. 2.1 presents the results for the idealised scenario and sets the scene for the results of the existing networks. The results for the actual networks and their comparison to their idealised counterparts are the subject of Sect. 2.2. Comparative assessment of the cities based on the proposed accessibility indicators is presented in Sect. 2.3. In Sect. 2.4, the proposed indicators are compared to the conventional accessibility indicators.

Idealised network

To be able to meaningfully assess the level of accessibility and compare networks of differing topological and geographical dimensions, a notion of an ideally accessible network is needed. In its absence, one implicitly compares against zero travel times between all origins and destinations, which is not only unrealistic but also distortive when comparing networks of different scales. Public transport networks are spatial networks, and the physical constraints of distance coupled with allowed travel speed pose an upper limit to possible accessibility levels. The problem is formally stated as: given a set of N nodes, each with geographical coordinates (latitude and longitude), what is the theoretical upper limit of accessibility levels?

The problem is nontrivial and even in the idealised case requires a set of assumptions. The maximum speed must be set so that velocity and acceleration are safe for passengers. Physical constraints also determine the maximum capacity, so a finite frequency must be set.

Even with unlimited resources, designing a realistic (i.e. satisfying the physical constraints) optimally accessible network requires several stages of planning, such as line planning and frequency setting, rendering this an NP-hard problem. We therefore approach the problem from a strictly theoretical perspective. We assume that each origin-destination pair can be travelled directly and in a straight line. We then set typical values of average speed and waiting time (proportional to inverse frequency) for all connections in all networks. The construction of the idealised network is described in more detail in the Methods Sect. 4.2.

The maximum and mean Haversine distances, together with the corresponding idealised travel times, are shown in Table 1. Maximum distances vary by an order of magnitude, spanning from \(\approx 7.5\) km (Marseille) to \(\approx 75\) km (San Francisco), with an average maximum distance of \(\approx 26\) km across all networks. Mean distances are highly linearly correlated with maximum distances (\(r=0.94\)). Figure 2 shows the scatter plot of maximum Haversine distance \(\ell ^{max}\) and number of nodes N. The relationship is approximately linear with a few outliers that we comment on here. On one end, the most notable outlier is the San Francisco network, where 50 stops cover a maximum distance of more than 75 km. The Valencia network is another outlier in the same direction. On the other end, the New York and Paris networks cover a comparatively small geographic area relative to the number of nodes in the network. The difference stems from the metro systems covering different areas in different cities: in some cases only the urban area (e.g. Paris, whereas the RER extends further into parts of Île-de-France), while in other cases they reach far into the outskirts (e.g. San Francisco and parts of the Bay Area).
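The idealised travel time between two stops, as described above, combines straight-line travel at a fixed speed with a fixed waiting time. A minimal sketch follows; the speed of 40 km/h and the waiting time of 3 min are placeholder assumptions, not the values used in the study (those are given in Methods Sect. 4.2), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def ideal_time_min(dist_km, speed_kmh=40.0, wait_min=3.0):
    """Idealised travel time: waiting time plus direct travel at constant speed.

    speed_kmh and wait_min are assumed placeholder values.
    """
    return wait_min + 60.0 * dist_km / speed_kmh

# One degree of longitude along the equator, and a 20 km direct trip.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
trip_min = ideal_time_min(20.0)
```

Applying `ideal_time_min` to every pair of stops yields the idealised travel time matrix from which the idealised access graph is built.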

Table 1 Key statistics of geographical distances and idealised travel times for all cities. N: number of nodes; \(\ell ^{max}\): maximum Haversine distance in the network; \(\bar{\ell }\): average distance; \(t^{max}_I\): maximum idealised travel time; \(\bar{t}_I\): average idealised travel time.
Fig. 2

Scatter plot of maximum Haversine distance \(\ell ^{max}\) vs. number of nodes N for all cities.

Comparison of real and idealised network behaviour

We construct the access graphs for the metro networks and their idealised counterparts. The growth of the examined subgraphs in both cases is shown separately for each city in Fig. 3. Similar growth patterns are observed across all networks. The maximal clique dimension is a monotonically non-decreasing function of \(t_c\) that grows from \(N^{(MC)}=1\) at \(t_c=0\) to \(N^{(MC)}=N\) at \(t_c=t^{max}\). Conversely, the kx-core dimension \(N^{(kx)}\) is 0 until a certain threshold time \(t_c=\tau _{kx}\), emerges with \(N^{(kx)}\ge kx\) at \(\tau _{kx}\), and increases monotonically after that. The difference between the growth of the actual and idealised cliques, as visually demonstrated by the relative difference of the areas under the violet (idealised) and orange (actual) curves, is examined in more detail in Sect. 2.3.

Fig. 3

Comparison of kx-core and maximal clique dimension growth vs. cut-off time \(t_c\) for the idealised and actual metro networks for all cities. Green (kx-cores) and violet (maximal clique) curves represent the idealised network, and blue and orange curves represent the actual network in each subplot.

To understand the differences in maximal clique and k-core growth behaviour between realised and idealised networks, we examine possible sources of discrepancy. In the idealised network, all distances can be travelled directly. In real networks this is generally not the case, owing either to the network infrastructure or to line planning, and the large majority of connections have a larger actual distance. The network indicator encapsulating this behaviour is circuity, which measures the pair-wise discrepancy between the network distance \(d_G(i,j)\) and the straight-line distance \(d_E(i,j)\) in terms of their ratio (Eq. 2 in Methods)29,30. The average circuity over all node pairs is the corresponding global metric (Eq. 3 in Methods).

The average circuity of the idealised network is \(C=1\) by design since it means that all connections are direct without requiring any detours. The average circuity of the realised network is thus a measure of discrepancy from the ideal case. Since we only have data on travel times (and not on the exact shape of physical paths), we consider two measures of distance: in-vehicle time for infrastructural circuity \(C^I\), and waiting time for operational circuity \(C^O\). The values of \(C^I\) and \(C^O\) for all networks are given in Table 2. The relationship of infrastructural and operational circuity to accessibility measures is presented and discussed in Sect. 2.3.
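As an illustration of the averaging step, the sketch below computes the mean pair-wise ratio between an actual and an idealised time matrix. It assumes that circuity is taken as the ratio of actual to idealised time for each node pair and averaged over unordered pairs; the exact forms of Eqs. 2 and 3 are given in Methods and may differ. The function name and the numbers are hypothetical.

```python
def average_circuity(actual, ideal):
    """Mean ratio of actual to idealised time over all unordered node pairs."""
    n = len(actual)
    ratios = [
        actual[i][j] / ideal[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(ratios) / len(ratios)

# Hypothetical in-vehicle times (minutes) for three stops.
actual_ivt = [[0, 6, 15], [6, 0, 8], [15, 8, 0]]
ideal_ivt = [[0, 5, 10], [5, 0, 8], [10, 8, 0]]

c_infra = average_circuity(actual_ivt, ideal_ivt)   # pair ratios: 1.2, 1.5, 1.0
c_ideal = average_circuity(ideal_ivt, ideal_ivt)    # 1 by construction
```

A value below one, as reported later for San Francisco, arises under this reading whenever actual times are shorter than the idealised ones for enough pairs.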

Table 2 Circuity metrics across metro networks.

For all networks, the growth of the idealised network subgraphs consistently occurs significantly earlier in the time interval than that of the actual network. In addition, consistent patterns in the shapes of kx-core and maximal clique growth for each network point to universal behaviour of accessibility. A quantitative examination of the discussed behaviour will provide the means for comparative assessment of networks in terms of accessibility levels in the following section.

Accessibility measures

Cumulative accessibility indicators are defined as integrals under the subgraph dimension curves of the actual network, relative to the idealised network (see Sect. 4.4 for more detail). To relax the idealised accessibility assumption, the integrals are evaluated up to twice the maximum idealised travel time, \(2t_I^{max}\). For some networks, this value is larger than the actual maximum travel time. Specifically, this happens for the Cairo, Copenhagen, Dubai, Hyderabad, Lille, London, Milan, Prague, San Francisco, Santiago, Valencia, Warsaw and the Washington DC networks. The upper bound T of the integral is thus determined as \(T = \min (t^{max}, 2\cdot t_I^{max})\). This is illustrated in Fig. 4. Similarly, the threshold indicators \(\tau _{(kx)}\) are defined relative to T.
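The normalisation described above can be sketched as follows. This is an illustrative reading, not the formulation in Sect. 4.4: it assumes that the actual and idealised curves are sampled at the same cut-off times, that the areas are approximated with the trapezoidal rule, and that the threshold indicator is the first emergence time divided by T. All names and numbers are hypothetical.

```python
def area_under(times, sizes, upper):
    """Trapezoidal area under a sampled curve from times[0] up to `upper`."""
    total = 0.0
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        n0, n1 = sizes[k], sizes[k + 1]
        if t0 >= upper:
            break
        if t1 > upper:  # clip the last segment by linear interpolation
            n1 = n0 + (n1 - n0) * (upper - t0) / (t1 - t0)
            t1 = upper
        total += 0.5 * (n0 + n1) * (t1 - t0)
    return total

def cumulative_indicator(times, actual, ideal, t_max, t_max_ideal):
    """Return (T, S): integration bound and actual area relative to idealised area."""
    T = min(t_max, 2 * t_max_ideal)
    return T, area_under(times, actual, T) / area_under(times, ideal, T)

def relative_threshold(times, core_sizes, T):
    """First cut-off time at which the core is non-empty, relative to T."""
    for t, n in zip(times, core_sizes):
        if n > 0:
            return t / T
    return None

# Hypothetical curves for a 10-node network (times in minutes).
times = [0, 10, 20, 30, 40]
clique_actual = [1, 2, 4, 7, 10]
clique_ideal = [1, 6, 10, 10, 10]
k50_actual = [0, 0, 6, 8, 10]

T, S_mc = cumulative_indicator(times, clique_actual, clique_ideal, t_max=40, t_max_ideal=20)
tau_k50 = relative_threshold(times, k50_actual, T)
```

With these toy curves, T equals 40 min and \(S^{(MC)}\) is the ratio of the two areas, 185/315; the k50-core first appears at half of T.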

Fig. 4

Comparison of maximal clique growth for the ideal (black) and original (red) networks for each city. For indicator construction, we integrate up to twice the maximum idealised time (grey markers). If the maximum travel time in the original network is smaller than twice the idealised value, we integrate to maximum travel time (e.g. Cairo, Dubai, London). This is the case for 11 of 42 networks, i.e. about a quarter.

The rankings with the color-coded normalised indicator values for the full dataset are shown in Fig. 5. Note that higher values of the cumulative accessibility indicators mean higher accessibility, and vice versa for the threshold indicators. The five most accessible networks are London, San Francisco, Valencia, Bilbao and Dubai. Note that all of the networks, except Bilbao, fall in the \(T=t_{max}\) category, i.e., have relatively low actual maximum travel times compared to the idealised case, and also in the idealised case the maximal clique growth has a distinctively slow convergence compared to most other networks (Figs. 3 and 4). At the lower end, Buenos Aires, Marseille, Philadelphia, Kobe and Oslo score the lowest on the proposed accessibility indicators. All five networks are characterised by a steep growth of the maximal clique dimension in the idealised case and a slow growth in the actual network in the beginning of the interval, resulting in larger discrepancies between the existing network behaviour and that of its idealised version (Fig. 4).

Fig. 5

Color-coded city rankings for the proposed indicators. The left-hand side shows the results for the four cumulative accessibility indicators \(S^{(MC)}, S^{(k25)}, S^{(k50)}, S^{(k75)}\), and the right-hand side shows the three threshold indicators \(\tau _{(k25)}, \tau _{(k50)}, \tau _{(k75)}\). For cumulative indicators, the city is ranked higher (i.e. has lower ranking score) in terms of accessibility for higher values of indicators. Conversely, lower values of time variables indicate higher accessibility. \(\tau _{(kx)}\) values were multiplied by \((-1)\) to achieve consistent ranking.

We observe that for most of the networks, the rankings based on different indicators are broadly in agreement. This is also confirmed by the finding that all indicators are strongly correlated (Fig. 6), with positive correlations within each type of indicator (cumulative and threshold-based) and strong negative correlations between the two types, as expected. Nevertheless, for a subset of networks, the rankings vary significantly between indicators. Among those is the Paris network, which the kx-core indicators rank significantly higher than the maximal clique indicator does, whereas the pattern is reversed for the Copenhagen network. As can be seen in Fig. 6, the correlations are the strongest for the maximal clique and k50-core cumulative indicators (\(r=0.96\)). Linear correlations are the highest for pairs of complementary indicators \(S^{(kx)}\) and \(\tau _{(kx)}\) at the same values of x (\(r=-0.90\) for \(x=25\); \(r=-0.96\) for \(x=50\); \(r=-0.92\) for \(x=75\)). The correlation strength remains high across all combinations, but decreases for pairs of cumulative and threshold indicators with differing values of x (\(r=-0.82\) to \(r=-0.89\) for consecutive values of x, and \(r=-0.73\) to \(r=-0.76\) for pairs with \(x=25\) and \(x=75\)). The cross-correlations of the cumulative indicators at different values of x are the lowest for \(S^{(k25)}\) and \(S^{(k75)}\) (\(r=0.81\)). Compared to \(x=50\), the two subgraphs for \(x=25\) and \(x=75\) are more sensitive to specific conditions at the lower and upper boundaries of the time interval, which might differ for each network. The k50-based indicators are therefore deemed the most stable. The overall results suggest high regularity in maximal clique and k-core growth behaviour across all networks, pointing to universal patterns in network accessibility.

Fig. 6

Pearson correlation matrix of the indicators. \(S^{(H)}\): cumulative accessibility indicators; \(\tau _{(H)}\): threshold accessibility indicators. Blue and red shades represent positive and negative correlations, respectively.

To further examine the relationship between accessibility measures and circuity (Sect. 2.2), we observe their correlations (Fig. 7).

Fig. 7

Pearson correlation matrix of the indicators and circuity measures. \(S^{(H)}\): cumulative accessibility indicators; \(\tau _{(H)}\): threshold accessibility indicators. \(C^I\): infrastructural circuity; \(C^O\): operational circuity. Blue and red shades represent positive and negative correlations, respectively.

The correlations imply that the more circuitous the network is, the less accessible it is, in agreement with expectations. The correlations are stronger for infrastructural circuity, implying a greater impact of infrastructure on accessibility levels compared to operational characteristics. Specifically, we note that the networks with the highest values of \(C^I\) (greatest circuity), namely Naples, Buenos Aires and Oslo (Table 2), rank among the lowest few in Fig. 5. Conversely, the cities with the lowest values of \(C^I\) (San Francisco, Atlanta and Copenhagen) do not, with the exception of San Francisco, correspond to the highest ranked cities. For operational circuity, an interesting observation is that London has by far the lowest average waiting time and is also overall the most accessible network according to the indicators introduced here. On the other hand, San Francisco has one of the highest values of \(C^O\). At the same time, it is an outlier in terms of infrastructural circuity, with \(C^I=0.9\), i.e. less than one. In the construction of idealised networks, the speed on links is taken to be constant across all networks. San Francisco is an outlier in its geographical diameter (approx. 75 km), and the actual speeds on its (long) rail segments can be significantly higher than this assumed value.

The access graph and three of the subgraphs (maximal clique, k25-core and k50-core) at a characteristic value of \(t_c=\tau _{k50}\) are visualised in Fig. 8. The networks are ranked and their subplot borders are color-coded based on the \(S^{(MC)}\) indicator. The most accessible networks have maximal cliques of comparable dimensions to k50-cores at the latter’s emergence (compare also with the plots in Fig. 3). Moreover, the cliques are geographically relatively dense, while the spatial density is visibly low for the least accessible networks. Compare also the L-space layout with circuity values in Table 2.

Fig. 8

Visualisation of subgraphs at \(\tau _{k50}\) for each metro network. Dark blue: k25-core; light blue: k50-core; red: maximal clique. L-space infrastructural representation is shown in a figure insert for each city. Cities are sorted based on their maximal clique cumulative accessibility indicator \(S^{(MC)}\) and the subplot borders are color-coded based on the value of this indicator.

The geographical distribution of the cities included in the analysis is shown in Fig. 9.

Fig. 9

Geographical distribution of the studied networks with color-coded accessibility levels based on the \(S^{(MC)}\) indicator. The ranking and the scale are the same as in Fig. 5. The two areas with the highest density of cities included in the analysis are shown enlarged on the map.

Relation with conventional indicators

Topological indicators

Accessibility in general, and the topological indicators we consider in particular, rely on the identification of shortest paths. We therefore investigate their relation with commonly used topological indicators which are closely related to the notion of shortest paths: network efficiency E; average shortest paths \(\langle SP\rangle ^L\) for in-vehicle time-weighted L-space representation, \(\langle SP\rangle ^P_w\) and \(\langle SP\rangle ^P\), for weighted and unweighted P-space representation, respectively. Similarly, we consider the network diameter d in all three representations. All measures are normalised to their counterparts in the idealised network (see Sect. 4.5 for definitions).

The correlation matrix is shown in Fig. 10. The correlations are the highest for efficiency and weighted P-space indicators. Note that the definition of distance in the calculation of efficiency is GTT, Generalised Travel Time, i.e. the same metric that we use in the construction of the access graph.

Similarly, correlations with average shortest paths are moderate, especially for the weighted representations. The correlations with shortest paths and diameter are negative since longer shortest paths reflect lower accessibility. The correlations are generally the highest for the weighted P-space, followed by weighted L-space and unweighted P-space, reflecting the influence of the service properties contained in the definition of GTT, used to construct the access graph (Eq. (4)). Insignificant to low correlations with network dimension N are expected due to the scaled construction of indicators using the notion of idealised accessibility.
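For reference, the commonly used form of global network efficiency is the mean of inverse pair-wise distances. The sketch below uses that standard form with GTT as the distance and normalises it to the idealised network; whether Sect. 4.5 uses exactly this form is an assumption, and the matrices are hypothetical.

```python
def efficiency(dist):
    """Global efficiency: mean of 1/d(i, j) over all ordered pairs i != j."""
    n = len(dist)
    total = sum(1.0 / dist[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Hypothetical GTT matrices (minutes) for the actual and idealised networks.
gtt_actual = [[0, 10, 20], [10, 0, 10], [20, 10, 0]]
gtt_ideal = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]

# Efficiency normalised to its idealised counterpart.
e_norm = efficiency(gtt_actual) / efficiency(gtt_ideal)
```

Because every actual travel time in this toy case is twice the idealised one, the normalised efficiency is one half.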

Fig. 10

The Pearson correlation matrix for accessibility indicators: cumulative indicators \(S^{(H)}\) and threshold indicators \(\tau _{(kx)}\). Blue and red shades represent positive and negative correlations, respectively.

Geographical indicators

Accessibility is a mature research area within transport geography. The classical indicators are based on the Hansen definition as the ease of travel between spatially dispersed opportunities for activity2. A standard operationalisation of this notion is the cumulative opportunities measure of accessibility which measures the number of reachable opportunities within a given threshold time (definitions in Eq. 9 and Eq. 10 in Methods).

In the access graph, this corresponds exactly to node degree \(k_i\) at the threshold time \(t_c\). To compare and contrast the introduced indicators with classical measures, we examine the average node degree in the access graph (i.e. the cumulative opportunities accessibility) for selected values of \(t_c\): \(t_c=\{30, 45, 60\}\) minutes. Additionally, we consider values of \(t_c\) relative to the network-specific maximum GTT, \(t_{max}\): \(t_c=\{\frac{1}{3}, \frac{1}{2}, \frac{2}{3}\}t_{max}\).
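The correspondence between cumulative opportunities and node degree can be made concrete with a short sketch. It assumes that every stop counts as one opportunity and that the GTT matrix is symmetric; the function name and the values are hypothetical.

```python
def average_degree(gtt, t_c):
    """Average number of other stops reachable within t_c (mean access-graph degree)."""
    n = len(gtt)
    degrees = [
        sum(1 for j in range(n) if j != i and gtt[i][j] <= t_c)
        for i in range(n)
    ]
    return sum(degrees) / n

# Hypothetical GTT matrix (minutes) for four stops.
gtt = [
    [0, 20, 40, 70],
    [20, 0, 25, 50],
    [40, 25, 0, 35],
    [70, 50, 35, 0],
]
t_max = max(max(row) for row in gtt)

# Absolute time budgets and a budget relative to the network maximum.
k_abs = {t: average_degree(gtt, t) for t in (30, 45, 60)}
k_half = average_degree(gtt, t_max / 2)
```

The absolute budgets treat all networks alike regardless of their extent, whereas the relative budget scales with each network's own maximum travel time.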

The correlations of the introduced indicators to absolute-time values are low and negative (Fig. 11). This is in line with expectations since the geographical areas and thus the physical constraints on travel ability vary significantly between cities and the introduced indicators are designed to allow for meaningful comparisons across networks of different scales. The correlations with time points relative to maximum time are moderately positive which is in agreement with our formulation of accessibility as the one realised by the infrastructure network and the super-imposed service plans compared to the system’s geographical constraints.

Fig. 11

The Pearson correlation matrix for accessibility indicators: cumulative indicators \(S^{(H)}\) and threshold indicators \(\tau _{(kx)}\). Geographical indicators are the average degrees of the access graph at different time points: \(t_c=\{30, 45, 60\}\) minutes and \(t_c=\{\frac{1}{3}, \frac{1}{2}, \frac{2}{3}\}t_{max}\). \(\langle k\rangle _{30}\) is the average degree at \(t_c=30\) min, and \(\langle k\rangle _{t_{max}/3}\) is the average degree at \(t_c=\frac{1}{3}t_{max}\) (and similarly for others). Blue and red shades represent positive and negative correlations, respectively.

We stress however that the indicators introduced in this work reflect the temporal evolution of the accessibility levels, and are thus qualitatively different from the cumulative opportunities definition. The cumulative indicators \(S^{(H)}\) proposed in this work can be considered to represent “cumulative cumulative accessibility”, i.e., at a higher level of aggregation than cumulative accessibility in its traditional sense.

Illustration

For an illustration of the practical meaning of the introduced accessibility indicators, we investigate the impacts of different developments for the case of the Stockholm metro. We then compare accessibility measures of the modified networks with the original network. The Stockholm metro currently has \(N=101\) nodes and operates \(L=3\) lines and \(R=7\) routes (Fig. 12). It scores relatively low in accessibility rankings (30 to 37 out of 42 across all measures). It has large infrastructural circuity as can be seen from its topology: a central core to the east, and cross-center radial routes spread out to the west, the nodes on which are geographically relatively close but topologically far apart.

Fig. 12

Stockholm L-space with colour-coded routes.

For the sake of illustration, we consider the following three scenarios: (i) doubling the frequencies on all lines in the network (referred to as 2F); (ii) introducing a new line (Line X, Fig. 13); and (iii) combining the previous two (2F + Line X).

Fig. 13

Stockholm L-space with added “Line X” (shown in red).

We perform the complete access graph analysis on the modified networks and compare the accessibility measures in each case with the values obtained for the original network. The comparison of maximal clique growth in all cases is shown in Fig. 14.

Fig. 14

Comparison of maximal clique growth with simulated improvements to the metro network.

The improvements in the \(S^{(MC)}\) indicator are comparable for the 2F and Line X scenarios, and both show considerable improvement in overall network connectedness for values of GTT above 20 minutes. This coincides with the time needed for the maximal clique “outskirts” to reach the nodes that are crossed by the new line. After that the improvement is rapid and the Line X scenario outperforms the 2F scenario considering the maximum GTT. The combination of both modifications to the network amounts to another significant improvement in accessibility levels. The values of all indicators in all scenarios are summarised in Table 3.

Table 3 Values of indicators for different scenarios.

Discussion

This study presented a topological formulation of public transport accessibility by focusing on several characteristic subgraphs of the access graph. The proposed approach studies global accessibility levels beyond averaged node-level measures by tracing the emergence and evolution of well-connected regions of the network with increasing maximum allowed travel time. This framework amounts to a re-conceptualization of accessibility that offers several novel insights and opens promising ways for future analysis. Our approach is consistent with the popular notion of the X-minute city31 and thereby facilitates related investigations of accessibility using different travel modes and in relation to various amenities.

The maximal clique represents the most stringent condition on connectedness and is at the same time the most interpretable measure. The high correlations between the maximal clique and k-core measures suggest that k-core analysis suffices for the presented accessibility analysis, especially the k50-core indicator. This is crucial when applying the analysis to larger networks, since finding the largest clique is an NP-hard problem, whereas algorithms for finding k-cores have polynomial time complexity32. However, this correlation should first be established when applying the analysis to other modalities. Where clique computations are still required, highly effective methods for finding cliques should be adopted for scalability to larger networks (e.g.33).

Moderate to strong correlations with efficiency and shortest paths strengthen the relevance of the proposed indicators, which are in line with current measures. The main strengths of our approach compared to traditional approaches are twofold. First, identifying the subgraphs offers a detailed view of accessibility going beyond simple averages. This offers an interpretable operationalisation of accessibility and allows for informed planning and optimisation of public transport networks. Second, by using the notion of the idealised network on the same set of (geographically positioned) nodes, it offers an approach that scales with the network’s geographic and topological size.

These findings warrant a rethinking of accessibility through the lens of complex networks. More importantly, and with broader implications, the observations support the need for a methodological shift: rather than adapting existing network metrics to transport concepts, relevant transport properties should guide the development of suitable graph-theoretic measures.

Next to these theoretical implications, the introduction of the access graph and its subgraph-based accessibility measures can serve transport planners in designing highly accessible networks. The proposed methodology allows for simulating changes in infrastructure (e.g. adding new stops or links) as well as service planning (e.g. introducing new lines and modifying service frequencies). The illustration of this approach on the Stockholm metro network offers a preliminary confirmation of its feasibility, and such simulations of alternative developments can guide decision making when planning public transport developments.

In the following, we outline the limitations of this study and suggest avenues for future research. We first note that in transport geography, accessibility is understood in relation to different amenities. In the present framework, we focus on the travel impedance component of accessibility, and simply consider the number of reachable nodes in the PT network as a proxy for the number of opportunities for different activities. Relatedly, actual land use or register data are not used in the model, and there is no estimate of the actual number of opportunities or their differentiation. This view of accessibility thus only addresses the efficiency of the existing system. Including coverage and the number of different opportunities as node weights seems a promising direction for future work.

In terms of urban PT studies, to gain a more complete picture of accessibility, it will be necessary to consider other PT modes. In this work, we only consider metro systems that have a specific role in the urban PT system: compared to bus or tram networks, metro has high speed, large capacity and direct connections, and carries a significant percentage of passenger flows in multi-modal urban PT systems. However, due to its infrastructure, it has comparatively fewer stations and less spatial coverage than bus networks.

On the travel impedance side, the construction of the access graph is based on the cut-off function and is an undirected unweighted graph. Adding edge weights and using directed edges would give a more realistic view of accessibility. The maximal weighted clique problem would be interesting to apply (e.g.34). In addition, a detailed analysis of the impacts of each component of GTT on accessibility levels will offer a deeper understanding of the interplay between PT network infrastructure and operation on access.

On the methodological side, the maximal clique is the strictest constraint and defines a single core part of the network. It is therefore oblivious to any other well-connected parts of the network. A complementary perspective would be the analysis of all k-cliques, or a deeper analysis of network geometry and the forces shaping the network topology by examining smaller cliques and their interactions (similar to e.g.35). Especially in larger cities, there might exist several smaller well-connected parts, which would also be interesting to examine with respect to the idea of the X-minute city. Community detection is another promising alternative to predefined subgraphs.

Beyond the directions mentioned above, this framework opens several lines of further inquiry. As a promising avenue for future research, we propose the identification of realistic maximally accessible networks, given the physical constraints, incorporating the cost of resources, emissions and energy consumption of operating the network, and accounting for (expected) passenger demand. The consideration of metro systems within multi-modal public transport systems is another promising direction, and a similar analysis for multilayer networks should be performed to assess the accessibility of integrated PT networks. In addition, an integrated framework for analysing accessibility and robustness will reveal the interplay between the two properties. Specifically, measuring the difference in accessibility levels in terms of the proposed subgraph indicators under failures of either nodes or links emerges as a promising direction.

Methods

Datasets and software

The access graphs are constructed using the curated datasets of L- and P-space representations of 51 metro networks36,37. In the L-space representation (also known as the Infrastructure space,38), there is an edge if the corresponding stops are consecutive stops on a route. In P-space (also known as the Service space,38), each route is a clique, effectively counting the legs of a journey based on transfers. The edges in L- and P-space are weighted with in-vehicle and waiting times, respectively. Of the 51 networks included in the datasets, the 42 networks that have more than one operating line are considered for analysis. The networks included in our empirical analysis vary significantly in dimension, from \(N=26\) (Kobe) to \(N=421\) (New York).

The code is written in Python. The NetworkX library39 is used for most of the analysis and the NetworKit library40 is used for intensive clique computations.

Idealised network

For a reliable and meaningful comparison across networks of varying dimensions, we use the notion of the idealised network on the given set of nodes, each with geographical coordinates (latlon)28. The idealised network is the complete graph on the given set of nodes that connects each pair of stops in the network using a straight line.

The idealised network is constructed as follows. First, we calculate the matrix of Haversine distances \([\ell ]_{ij}\) between all pairs of nodes. The distances are divided by the average speed \(\bar{v}\), and the average waiting time, equal to half the headway, is added to obtain the idealised generalised travel time \(GTT^{ideal}\):

$$\begin{aligned} GTT^{ideal}_{ij} = \ell _{ij} / \bar{v} + t^{wait}, \end{aligned}$$
(1)

We set fixed values of \(\bar{v} = 45\) km/h and \(t^{wait}=5\) min (corresponding to a frequency of 12 vehicles per hour) for all networks. The values are chosen based on the typical specifications of the real-world metro systems included in our dataset. The edge weight \(w_{ij}\) between the nodes i and j is thus the value of \(GTT^{ideal}_{ij}\).
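The construction in Eq. (1) can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than the code used in the study; the function names, the Earth radius constant and the example coordinates are assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ideal_gtt(coords, speed_kmh=45.0, wait_min=5.0):
    """Idealised GTT matrix in minutes following Eq. (1); the diagonal is left at 0."""
    n = len(coords)
    gtt = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # straight-line travel time in minutes plus the fixed waiting time
            t = haversine_km(coords[i], coords[j]) / speed_kmh * 60.0 + wait_min
            gtt[i][j] = gtt[j][i] = t
    return gtt


# Hypothetical stop coordinates (lat, lon), for illustration only
stops = [(59.33, 18.06), (59.36, 17.97), (59.27, 18.05)]
ideal = ideal_gtt(stops)
```

The resulting matrix is symmetric because both the Haversine distance and the waiting time are independent of travel direction.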

Comparison with realised networks

In the idealised network, connections of all pairs of nodes can be travelled directly by design. To quantify the deviations from the idealised case and measure the pair-wise discrepancy between the network distance \(d_G(i,j)\) and the straight-line distance \(d_E(i,j)\), we consider network circuity, defined as:

$$\begin{aligned} C_{ij} = \frac{d_G(i,j)}{d_E(i,j)}. \end{aligned}$$
(2)

The value of circuity \(C=1\) means that the connection is direct and larger values of circuity mean larger detours. The average circuity of the network is the average over all pairs:

$$\begin{aligned} C = \frac{1}{N(N-1)}\sum _{i,j}C_{ij}. \end{aligned}$$
(3)

Since the exact shapes of paths and the actual distances travelled are not given in our data, we consider two measures of distance and their corresponding circuities:

  • Infrastructural circuity \(C^I\), where the distances \(d_G(i,j)\) and \(d_E(i,j)\) represent in-vehicle times in the realised and idealised network, respectively.

  • Operational circuity \(C^O\), where the distances \(d_G(i,j)\) and \(d_E(i,j)\) represent waiting times in the realised and idealised network, respectively.
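As a sketch of Eqs. (2) and (3), the average circuity can be computed from two matrices of pairwise "distances" (in-vehicle times for \(C^I\), waiting times for \(C^O\)). The function name and the toy matrices below are illustrative assumptions, and all off-diagonal entries are assumed to be positive and finite.

```python
def average_circuity(d_real, d_ideal):
    """Mean of C_ij = d_G(i,j) / d_E(i,j) over all ordered pairs i != j."""
    n = len(d_real)
    total = sum(d_real[i][j] / d_ideal[i][j]
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Toy in-vehicle times (minutes): the pair (0, 2) requires a detour in the realised network
d_real = [[0, 10, 30], [10, 0, 20], [30, 20, 0]]
d_ideal = [[0, 10, 15], [10, 0, 20], [15, 20, 0]]
c_infra = average_circuity(d_real, d_ideal)
```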

Access graph construction

The access graph \(\mathcal {G}_A\) is constructed by calculating generalised travel times (GTT) between all pairs of nodes (stops) in the network and directly connecting the nodes that are reachable within a given cut-off time \(t_c\). The edge set of \(\mathcal {G}_A\) is thus dependent on the external cut-off parameter \(t_c\), gradually increased from 0 to maximum GTT. We use steps of 2 minutes in the analysis.
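The thresholding step described above can be sketched as follows, assuming a symmetric GTT matrix in minutes. The adjacency-set representation and the function names are choices made for this illustration, not the data structures used in the study.

```python
import math


def access_graph(gtt, t_c):
    """Adjacency sets of the access graph: i and j are linked iff GTT_ij <= t_c."""
    n = len(gtt)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if gtt[i][j] <= t_c:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def cutoff_steps(gtt, step=2.0):
    """Cut-off times from 0 up to at least the maximum GTT, in fixed steps."""
    t_max = max(max(row) for row in gtt)
    return [k * step for k in range(math.ceil(t_max / step) + 1)]


# Toy GTT matrix (minutes), for illustration only
gtt = [[0, 5, 12], [5, 0, 8], [12, 8, 0]]
graphs = {t_c: access_graph(gtt, t_c) for t_c in cutoff_steps(gtt)}
```

Since edges are only ever added as \(t_c\) grows, each access graph in the sequence contains the previous one.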

Generalised travel times are calculated as follows, similar to the methodology presented in41. The k-shortest paths, with \(k=3\), in the weighted L- and P-space representations are calculated between each pair of nodes i and j using Yen’s algorithm42. For each of the paths obtained in a given graph representation, the path in the remaining graph representation is matched. For each path l of the resulting 2k paths, the generalised travel time is calculated as:

$$\begin{aligned} GTT_{ij}(l) = d_{ij}^L(l) + w^{\textrm{wait}}d_{ij}^P(l) + w^{\textrm{transfer}}d_{ij}^{P_u}(l) \end{aligned}$$
(4)

where \(d_{ij}^L(l)\) and \(d_{ij}^P(l)\) are the weights of the L- and P-space shortest paths, respectively, and \(d_{ij}^{P_u}(l)\) is the path length in unweighted P-space. \(w^{\textrm{wait}} = 2\) is the waiting time weight and \(w^{\textrm{transfer}}=5\) min is the transfer penalty. The values are determined from the literature and reflect average passengers’ valuation of time43. The generalised travel time between nodes i and j is then determined as the minimal GTT over the 2k paths:

$$\begin{aligned} d_{ij} = \min _l GTT_{ij}(l) \end{aligned}$$
(5)
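Equations (4) and (5) can be illustrated with a short sketch that takes the candidate paths as already computed. The path search itself (Yen’s algorithm and the matching between L- and P-space) is not reproduced here; each candidate is assumed to be summarised by a hypothetical tuple of in-vehicle time, waiting time and unweighted P-space path length.

```python
def gtt_of_path(in_vehicle, waiting, p_hops, w_wait=2.0, w_transfer=5.0):
    """Eq. (4): in-vehicle time + weighted waiting time + transfer penalty term (minutes)."""
    return in_vehicle + w_wait * waiting + w_transfer * p_hops


def min_gtt(candidates):
    """Eq. (5): minimum GTT over candidate paths (in_vehicle, waiting, p_hops)."""
    return min(gtt_of_path(*c) for c in candidates)


# Two hypothetical candidates: a direct but slow route, and a faster route with a transfer
candidates = [(20.0, 3.0, 1), (12.0, 5.0, 2)]
d_ij = min_gtt(candidates)
```

In this toy case the direct route gives 20 + 2·3 + 5·1 = 31 min and the transfer route gives 12 + 2·5 + 5·2 = 32 min, so the direct route determines \(d_{ij}\).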

Topological indicators of accessibility

Topological indicators in this study are based on several subgraphs of the access graph:

  • Maximal clique: a clique is a complete subgraph; maximal clique is the largest such subgraph.

  • k-core: the largest subgraph where each node has a degree of at least k (within the subgraph). Note that for a given k, the dimension of the k-core might be zero. In the case of the access graph, a k-core for a given k will first occur at a sufficient cut-off time \(\tau _{k}\).

At each \(t_c\)-step, the k-cores for \(k = \{\big \lceil 0.25N\big \rceil , \big \lceil 0.50N\big \rceil , \big \lceil 0.75N\big \rceil \}\) and maximal clique are computed. For \(k=\big \lceil x\%N\big \rceil\), the respective subgraph is termed the kx-core. The motivation for this choice follows the goals of public transit service providers where accessibility should reach a certain level of coverage, e.g. 75%. The time-dependent growth of the subgraph H is examined, where \(H \in\) {maximal clique, k25-core, k50-core, k75-core}. The dimensions of the subgraphs are increasing functions of \(t_c\) in the case of access graphs. The dimension of a k-core for a given k becomes non-zero only when a sufficient number of edges are added, i.e. at a certain threshold value of \(t_c\).
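The two subgraph types can be sketched without external libraries as below. This is an illustrative, unoptimised implementation (iterative peeling for k-cores, Bron–Kerbosch with pivoting for the largest clique), not the NetworkX and NetworKit routines used in the study; the function names and the toy graph are assumptions.

```python
import math


def core_orders(n):
    """k values for the k25-, k50- and k75-cores of a graph with n nodes."""
    return [math.ceil(f * n) for f in (0.25, 0.50, 0.75)]


def k_core(adj, k):
    """Node set of the k-core: repeatedly remove nodes with fewer than k remaining neighbours."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def largest_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential time in the worst case)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy access graph: a triangle (0, 1, 2) with a tail 2-3-4
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
sizes = {k: len(k_core(adj, k)) for k in core_orders(len(adj))}
clique = largest_clique(adj)
```

An empty result from `k_core` corresponds to the case noted above in which the dimension of the k-core is zero at the given cut-off time.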

For accessibility indicator construction, the evolution of the subgraphs with increasing \(t_c\) is observed for both the actual and the idealised network. The value for the actual network is normalised to that of the idealised network. A set of cumulative accessibility indicators \(S^{(H)}\) is defined as the integrals under the respective curves \(N^{(H)}(t_c)\):

$$\begin{aligned} S^{(H)}=\frac{\int _0^{T}N^{(H)}(t_c) dt_c}{\int _0^{T}N^{(H^{ideal})}(t_c) dt_c} \end{aligned}$$
(6)

The upper boundary T of the integral is defined as \(T=\max (t^{max}, 2\cdot t^{max}_{ideal})\), where \(t^{max}\) is the maximum GTT in the actual network and \(t^{max}_{ideal}\) the maximum travel time in its idealised counterpart.

A set of threshold indicators represents the values of the cut-off times \(\tau _{kx}\) at which the kx-cores first appear, relative to T. By definition, \(S^{(H)} \in [0,1]\) and \(\tau _{kx}\in [0, t^{max}/T]\).
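Given subgraph sizes sampled on a common \(t_c\) grid from 0 to T, the indicators can be approximated numerically. The sketch below assumes equally spaced samples and uses the trapezoidal rule; the integration scheme, function names and toy curves are assumptions of this illustration and may differ from the implementation used in the study.

```python
def area(sizes, step):
    """Trapezoidal area under a curve sampled at equal steps."""
    return step * (sum(sizes) - 0.5 * (sizes[0] + sizes[-1]))


def cumulative_indicator(sizes_real, sizes_ideal, step=2.0):
    """Eq. (6): ratio of the areas under the realised and idealised growth curves."""
    if len(sizes_real) != len(sizes_ideal):
        raise ValueError("both curves must be sampled on the same t_c grid")
    return area(sizes_real, step) / area(sizes_ideal, step)


def threshold_indicator(sizes, step, T):
    """First cut-off time at which the subgraph is non-empty, relative to T (None if never)."""
    for idx, size in enumerate(sizes):
        if size > 0:
            return idx * step / T
    return None


# Toy growth curves for a 4-node network, sampled every 2 minutes on [0, 8]
sizes_real = [0, 0, 2, 4, 4]
sizes_ideal = [0, 4, 4, 4, 4]
s_h = cumulative_indicator(sizes_real, sizes_ideal, step=2.0)
tau = threshold_indicator(sizes_real, step=2.0, T=8.0)
```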

Comparison with conventional topological indicators

Topological indicators

We investigate how the proposed accessibility indicators relate to standard topological measures. Alongside the number of nodes, the measures related to the shortest paths are chosen since they are closely related to the standard notion of accessibility.

  • Average shortest paths \(\langle SP\rangle\).

  • Network diameter d: the longest of the shortest paths.

  • Network efficiency E: averaged sum of inverse shortest paths \(d_{ij}\) between all pairs of nodes i and j:

    $$\begin{aligned} E = \frac{\sum _{\{i,j\ne i\}}\frac{1}{d_{ij}}}{N(N-1)}. \end{aligned}$$
    (7)

All conventional measures are normalised to their idealised network counterparts. For network efficiency, this idea was proposed in28 and elaborated in44,45 where the authors considered efficient spread of information in diverse networks, from social and communication to transport networks. The normalised efficiency is defined as:

$$\begin{aligned} \overline{E} = \frac{E}{E^{ideal}}, \end{aligned}$$
(8)

where \(E^{ideal}\) is the efficiency of the complete weighted graph of the idealised network.
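Equations (7) and (8) translate directly into code once the shortest-path matrices are available. The sketch below assumes connected networks, so that all off-diagonal entries are positive and finite; the function names and toy matrices are illustrative.

```python
def efficiency(d):
    """Eq. (7): mean of 1/d_ij over all ordered pairs i != j."""
    n = len(d)
    total = sum(1.0 / d[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def normalised_efficiency(d_real, d_ideal):
    """Eq. (8): efficiency of the realised network relative to the idealised one."""
    return efficiency(d_real) / efficiency(d_ideal)


# Toy two-node example: the realised connection takes twice as long as the idealised one
e_norm = normalised_efficiency([[0, 10], [10, 0]], [[0, 5], [5, 0]])
```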

Geographical indicators

Traditional measures of accessibility within transport geography rely on several models, such as cumulative opportunities and gravitational models46,47. The cumulative opportunities accessibility of node i is defined as:

$$\begin{aligned} A_i=\sum _{j=1}^N O_j f(c_{ij}), \end{aligned}$$
(9)

where \(O_j\) is the number of opportunities at location j, \(c_{ij}\) is the generalised cost (impedance) of travel between nodes i and j, and \(f(c_{ij})\) is a function of the cost that can take many different forms, generally decreasing in the cost (see e.g.6). The simplest, and often used, such function is the cut-off function, defined as:

$$\begin{aligned} f(c_{ij}) = {\left\{ \begin{array}{ll} 1; t_{ij}\le t_c,\\ 0; \text {otherwise}. \end{array}\right. } \end{aligned}$$
(10)
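A minimal sketch of Eq. (9) with the cut-off function of Eq. (10) is given below. The travel-time matrix and the uniform opportunity counts are hypothetical; the function name is an assumption of this illustration.

```python
def cumulative_opportunities(travel_time, opportunities, t_c):
    """A_i = sum over j of O_j for all locations j with t_ij <= t_c."""
    return [sum(o for t, o in zip(row, opportunities) if t <= t_c)
            for row in travel_time]


# Toy travel times (minutes) and one opportunity per node
travel_time = [[0, 10, 40], [10, 0, 25], [40, 25, 0]]
access = cumulative_opportunities(travel_time, [1, 1, 1], t_c=30)
```

With one opportunity per node, \(A_i\) counts node i itself plus its neighbours in the access graph at the same \(t_c\), which links this definition to the average-degree geographical indicators used in the comparison.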