Network accessibility as the emergence of cliques

Šfiligoj, Tina; Peperko, Aljoša; Cats, Oded

doi:10.1038/s41598-026-35542-1

Article
Open access
Published: 13 January 2026

Scientific Reports volume 16, Article number: 5089 (2026)

Abstract

We propose a topological formulation of accessibility based on the notion of an Access Graph, in which two nodes are connected if they are reachable within a given travel time. We trace the emergence and evolution of its subgraphs with imposed levels of connectedness, specifically the maximal clique and k-cores. We propose two complementary sets of accessibility indicators, cumulative and threshold, based on integral measures of subgraph growth and on the times at which k-cores emerge, respectively. For a meaningful comparison of networks of different scales, we contrast the realised accessibility with that of an idealised network on the same set of nodes. The proposed measures offer a view of accessibility that extends beyond the commonly used node-averaged indicators. An empirical analysis of 42 metro networks worldwide reveals universal patterns of accessibility behaviour. We illustrate the practical application of this approach with a case study examining the accessibility impacts of alternative infrastructure and service developments. Our results amount to a reconceptualisation of accessibility within the complex network framework.

Introduction

Transport network design is a key determinant of travel accessibility. At the same time, accessibility is a key determinant of transport use¹. It is therefore paramount to understand how network structures influence transport accessibility. In particular, urban sustainable development goals call for improving accessibility, and therefore access to opportunities, by public transport.

The classical definition of accessibility, due to Hansen, relates it to the travel impedance between spatially dispersed opportunities for activity². In contrast to more technical measures such as robustness or vulnerability, which are common across various fields of engineering³, accessibility is specific to the spatial movements of people, and thus to disciplines such as geography and transport science, and is often perceived as a “soft” measure. The Hansen definition captures these characteristics conceptually, but while it is qualitatively well founded, it lacks the quantitative rigour needed to guide formal analysis. Consequently, the concrete operationalisation and the related indicators differ significantly across analyses, which often rely on ad-hoc interpretations of the devised indicators. This dispersion has made it challenging to devise a unified and generalisable framework for accessibility analysis, and has led to the recent recognition of the need to unify accessibility studies^4,5,6.

Accessibility is essentially about connectedness: the ability to reach various destinations within a certain time budget. A complex network perspective arguably offers a promising framework for a rigorous and unified analysis. Public transport systems can naturally be modelled as networks, and the methods of network science have previously been applied to study their topological structure^7,8,9. Past works have often applied network science to study the vulnerability and robustness of public transport networks^10,11,12,13,14.

Recently, complex network approaches have also been gaining importance in accessibility studies, where standard topological measures are employed as accessibility indicators. In particular, topological measures such as average path length or network (in)efficiency at the global level^15,16, or metrics related to closeness centrality at the node level¹⁷, have been proposed as indicators of accessibility. All of the aforementioned network-level measures are based on node-level averages. In addition, analyses often adopt a strictly topological approach, discarding the infrastructural and operational properties of public transport systems, such as in-vehicle and waiting times, which ultimately influence their use. These properties are direct outcomes of network design decisions such as line configuration, service frequencies and travel speeds. Moreover, several standard graph representations exist, each reflecting either the underlying infrastructure or the operational characteristics, since reconciling all aspects in a single representation is non-trivial.

In this work, we approach the problem from a complementary perspective, first asking how accessibility can be qualitatively redefined in terms of reachability within well-connected parts of the network. Specifically, our objective is to identify parts of the network where (almost) all nodes are reachable from one another within a given maximum allowed travel time. To this end, we adopt the notion of an access graph $\mathcal {G}_A$¹⁸. Its node set represents the public transport stops, and its edge set is constructed by calculating the generalised travel time (GTT) matrix and connecting all pairs of nodes i and j for which GTT(i, j) is within a cut-off time $t_c$. GTT comprises in-vehicle and waiting times, together with a time-equivalent transfer penalty. The edge set of the access graph thus depends on $t_c$, and $\mathcal {G}_A$ gradually evolves from the edgeless graph at $t_c=0$ to the complete graph once $t_c$ reaches the maximum travel time in the network.
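The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the GTT matrix is given as a nested list `gtt` (a hypothetical input format), reads "within $t_c$" as $\mathrm{GTT}(i,j)\le t_c$, and, because $\mathcal {G}_A$ is undirected, joins $i$ and $j$ only when both directions satisfy the cut-off (our assumption for asymmetric GTT values). The function name `access_graph` is hypothetical.

```python
from itertools import combinations


def access_graph(gtt, t_c):
    """Build the access graph G_A as an adjacency dict {stop: set of stops}.

    gtt[i][j] is the generalised travel time from stop i to stop j.
    Assumption: an undirected edge {i, j} exists when both directions
    are within the cut-off time t_c.
    """
    n = len(gtt)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if max(gtt[i][j], gtt[j][i]) <= t_c:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Toy 3-stop GTT matrix in minutes (illustrative values only).
gtt = [[0, 5, 12],
       [5, 0, 8],
       [12, 8, 0]]
print(access_graph(gtt, 0))   # edgeless graph
print(access_graph(gtt, 8))   # edges {0,1} and {1,2}
print(access_graph(gtt, 12))  # complete graph at the maximum GTT
```

Raising `t_c` can only add edges, which is why every subgraph considered below grows monotonically with the cut-off time.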

To formalise this view of accessibility, we examine several maximal subgraphs of the access graph and their evolution with increasing cut-off time. The strictest condition on connectedness is imposed by the maximal clique, in which every pair of nodes shares an edge. A looser condition is that of a k-core, defined as a maximal subgraph in which each node has degree at least k. Note that the access graphs and all their subgraphs used in our analysis are unweighted and undirected. The k-cores allow us to define reachability levels by setting predetermined values of k, for example by observing the part of the network in which at least $x\%$ of all stops (of the original network) can be reached from each node. Specifically, we observe the kx-cores for $x=25,50,75$, in increasing order of stringency. Each such k-core emerges at a specific GTT threshold $\tau _k$ in the evolution of $\mathcal {G}_A$.
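For a fixed k, the k-core can be extracted by repeatedly deleting nodes whose degree is below k. The sketch below does this on an adjacency dict; the helper `k_for_percentage` turns "at least $x\%$ of the N stops" into an integer k by rounding up, which is our assumption since the rounding rule is not stated in this section. Both function names are hypothetical.

```python
def k_for_percentage(x, n):
    """Smallest integer k with k >= x% of n, for integer x and n (ceiling is an assumption)."""
    return (x * n + 99) // 100


def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph {node: set of neighbours}.

    Nodes with remaining degree < k are deleted repeatedly; what is left is
    the maximal subgraph in which every node has degree >= k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:          # already deleted via another neighbour
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


# Triangle {0, 1, 2} with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(k_core(adj, 2))                                  # the triangle survives
print(k_core(adj, 3))                                  # empty: no 3-core
print([k_for_percentage(x, 42) for x in (25, 50, 75)])
```

Sweeping $t_c$ upwards and recording the first value at which `k_core` returns a non-empty set gives the threshold $\tau _k$.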

Analyses of (maximal) cliques in transport networks remain relatively scarce in the literature. In one example¹⁹, the authors consider urban statistical areas as nodes of a graph connected by commuting flows; maximal cliques then correspond to clusters of employment nodes. In another²⁰, contact networks of public transport passengers are constructed, and cliques of passengers with similar journey and transfer patterns are identified to locate high-risk journeys for epidemic spreading. In addition, the authors of ref.²¹ study the topological characteristics of the Chinese airline network, using cliques to identify its densely connected parts.

We note that the analysis of k-cores, especially k-core decomposition methods, is often used to examine node importance in networks (e.g.²²) or the hierarchical structure of networks (e.g.²³). k-core decomposition methods have been used in transport research, e.g. for studying network vulnerability²⁴, finding the backbone structure of the network²⁵, and airline network evolution²⁶. The methods used are based on the main core, i.e. the k-core with the largest k. In contrast to the existing analyses, in our approach we consider fixed values of k that reflect the desired levels of connectedness.
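To make the contrast with decomposition-based studies concrete, the following sketch computes every node's core number by min-degree peeling (a simple quadratic-time variant chosen for readability; linear-time bucket-based algorithms exist) and returns the main core. The function names are illustrative, not taken from the study.

```python
def core_numbers(adj):
    """Core number of every node of an undirected graph via min-degree peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])  # current minimum-degree node
        k = max(k, degree[v])                        # the core level never decreases
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def main_core(adj):
    """Nodes of the k-core with the largest k for which it is non-empty, and that k."""
    core = core_numbers(adj)
    k_max = max(core.values())
    return {v for v, c in core.items() if c == k_max}, k_max


# Triangle {0, 1, 2} with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(core_numbers(adj))
print(main_core(adj))
```

A fixed-k core as used in this work is simply `{v for v, c in core_numbers(adj).items() if c >= k}`, whereas the main core lets the data choose k.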

We trace the evolution of the selected subgraphs by progressively increasing the cut-off time. Fig. 1c and d show the access graph and the considered subgraphs for six increasing values of $t_c$; the growth of the maximal clique, the most stringent condition on connectedness, is particularly illustrative (nodes and edges shown in red). We introduce two complementary sets of accessibility indicators related to the evolution observed for each network: cumulative accessibility indicators $S^{(H)}$ and threshold indicators $\tau ^{(H)}$. Cumulative indicators are defined as the integral under the curve $N^{(H)}(t_c)$, where $N^{(H)}$ is the dimension (number of nodes) of subgraph H at $t_c$. Threshold indicators $\tau ^{(H)}$ reflect a complementary perspective and measure the cut-off time at which the k-cores for fixed values of k first occur. Intuitively, networks in which the subgraphs grow quickly, i.e. already at short cut-off times, can be regarded as more accessible. The cumulative and threshold indicators correspond to the notions of primal and dual accessibility, respectively²⁷.
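The curve $N^{(MC)}(t_c)$ can be traced by adding access-graph edges in order of GTT and recomputing the largest clique at each new cut-off value. The sketch below uses Bron-Kerbosch enumeration with pivoting, a standard exact method whose worst-case cost is exponential; it is our choice for illustration, not necessarily the algorithm used in the study. As before, the undirected edge time is taken as the larger of the two directional GTT values (an assumption), and `max_clique_size` and `clique_curve` are hypothetical names.

```python
def max_clique_size(adj):
    """Size of a largest clique in an undirected graph {node: set of neighbours}."""
    best = 0

    def expand(size, p, x):
        # size: nodes in the current clique R; p: candidates; x: already-explored nodes
        nonlocal best
        if not p and not x:              # R cannot be extended: it is maximal
            best = max(best, size)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):   # branch only on non-neighbours of the pivot
            expand(size + 1, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(0, set(adj), set())
    return best


def clique_curve(gtt):
    """Breakpoints t_c and the maximal clique dimension N^(MC)(t_c) for a GTT matrix."""
    n = len(gtt)
    edges = sorted((max(gtt[i][j], gtt[j][i]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    adj = {i: set() for i in range(n)}
    times, dims = [0], [max_clique_size(adj)]
    k = 0
    while k < len(edges):
        t = edges[k][0]
        # add every edge that becomes admissible at this cut-off time
        while k < len(edges) and edges[k][0] == t:
            _, i, j = edges[k]
            adj[i].add(j)
            adj[j].add(i)
            k += 1
        times.append(t)
        dims.append(max_clique_size(adj))
    return times, dims


gtt = [[0, 5, 12],
       [5, 0, 8],
       [12, 8, 0]]
times, dims = clique_curve(gtt)
print(times, dims)
```

The returned pairs describe a right-continuous step function, so $S^{(MC)}$ is the area under these steps. Running the same loop with a k-core routine in place of `max_clique_size` yields $N^{(kx)}(t_c)$, and the first breakpoint with a non-zero value is $\tau _{kx}$.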

To construct indicators that enable meaningful comparison across networks, we introduce the concept of an idealised network on the same set of nodes, each with given geographical coordinates, similar to ref.²⁸. This allows us to incorporate geographic and topological scaling into the indicator construction. The introduced measures are then normalised by their idealised counterparts, yielding accessibility indicators that are comparable across networks.

We perform the analysis for 42 metro networks worldwide and compare their accessibility levels. Fig. 1 illustrates the approach for the Paris and London metro networks, which are similar in size and among the largest networks in the dataset. Panels (a) and (b) show the dimensions of the studied subgraphs versus $t_c$, and panels (c) and (d) visualise the evolution of the access graphs, their maximal cliques and kx-cores. The maximal clique curves of the actual and idealised networks are shown in panels (e) and (f). Networks whose curves lie closer to the idealised curves, i.e. with smaller gray areas in the plots, are more accessible. In this example, the London metro network is considerably more accessible than the Paris metro.

The accessibility indicators proposed in this study are compared to commonly used topological indicators, as well as to traditional indicators from transport geography. The implications of this approach for transport planners are illustrated on a case study of the Stockholm metro, where different developments of the network are simulated and the changes in accessibility levels are measured. We discuss the implications of the proposed framework for future accessibility studies.

Results

We construct access graphs of 42 metro networks worldwide and examine the evolution of their maximal subgraphs with progressively increasing cut-off time. For each network, we observe four characteristic subgraphs H: the maximal clique and the kx-cores for $x\in \{25,50,75\}$. For accessibility measurement, we follow the same procedure for the idealised network to obtain a meaningful reference for actual accessibility levels. Sect. 2.1 presents the results for the idealised scenario, which set the scene for the results for the existing networks. The results for the actual networks and their comparison to their idealised counterparts are the subject of Sect. 2.2. A comparative assessment of the cities based on the proposed accessibility indicators is presented in Sect. 2.3. In Sect. 2.4, the proposed indicators are compared to conventional accessibility indicators.

Idealised network

To meaningfully assess the level of accessibility and compare networks of differing topological and geographical dimensions, a notion of an ideally accessible network is needed. Without it, the reference would be zero travel time between all origins and destinations, which is not only unrealistic but also distorts comparisons between networks of different scales. Public transport networks are spatial networks, and the physical constraints of distance, coupled with the allowed travel speed, impose an upper limit on attainable accessibility levels. The problem is formally stated as follows: given a set of N nodes, each with geographical coordinates (lat, lon), what is the theoretical upper limit of accessibility?

The problem is non-trivial and, even in the idealised case, requires a set of assumptions. The maximum speed must be set so that velocity and acceleration are safe for passengers. Physical constraints also determine the maximum capacity, so a finite frequency must be set.

Even with unlimited resources, designing a realistic (i.e. physically feasible) optimally accessible network requires several planning stages, such as line planning and frequency setting, rendering it an NP-hard problem. We therefore approach the problem from a strictly theoretical perspective. We assume that each origin-destination pair can be travelled directly and in a straight line, and set typical values of average speed and waiting time (proportional to inverse frequency) for all connections in all networks. The construction of the idealised network is described in more detail in Methods Sect. 4.2.

The maximum and mean Haversine distances, together with the corresponding idealised travel times, are shown in Table 1. Maximum distances vary by an order of magnitude, from $\approx 7.5$ km (Marseille) to $\approx 75$ km (San Francisco), with an average maximum distance of $\approx 26$ km across all networks. Mean distances are highly linearly correlated with maximum distances ($r=0.94$). Fig. 2 shows the scatter plot of the maximum Haversine distance $\ell ^{max}$ against the number of nodes N. The relationship is approximately linear, with a few outliers. At one end, the most notable outlier is the San Francisco network, where 50 stops span a maximum distance of more than 75 km; the Valencia network is another outlier in the same direction. At the other end, the New York and Paris networks cover a comparatively small geographic area relative to their number of nodes. The difference stems from metro systems covering different areas in different cities: in some cases only the urban area (e.g. Paris, where the RER extends further into Île-de-France), while in other cases they reach far into the outskirts (e.g. San Francisco, reaching into parts of the Bay Area).
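The idealised travel times can be sketched as a direct straight-line ride at a common average speed plus a fixed waiting time. The parameter values used in the study are given in Methods Sect. 4.2 and are not reproduced here; `speed_kmh` and `wait_min` below are hypothetical placeholders, the function names are illustrative, and the Haversine formula uses a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (Haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idealised_travel_times(coords, speed_kmh=40.0, wait_min=2.0):
    """Idealised travel-time matrix in minutes: direct straight-line ride plus a fixed wait.

    speed_kmh and wait_min are hypothetical placeholders, not the study's parameters.
    """
    n = len(coords)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = haversine_km(*coords[i], *coords[j])
                t[i][j] = 60.0 * d / speed_kmh + wait_min
    return t


coords = [(0.0, 0.0), (0.0, 1.0)]  # (lat, lon) pairs, illustrative only
t = idealised_travel_times(coords, speed_kmh=60.0, wait_min=2.0)
print(haversine_km(0.0, 0.0, 0.0, 1.0), t[0][1])
```

With these hypothetical values, one degree of longitude at the equator (about 111.2 km) corresponds to roughly 113 minutes. The quantities $\ell ^{max}$ and $t^{max}_I$ in Table 1 are the maxima of the corresponding distance and time matrices.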

Table 1 Key statistics of geographical distances and idealised travel times for all cities. N: number of nodes; $\ell ^{max}$: maximum Haversine distance in the network; $\bar{\ell }$: average distance; $t^{max}_I$: maximum idealised travel time; $\bar{t}_I$: average idealised travel time.

Comparison of real and idealised network behaviour

We construct the access graphs for the metro networks and their idealised counterparts. The growth of the examined subgraphs in both cases is shown separately for each city in Fig. 3. Similar growth patterns are observed across all networks. The maximal clique dimension is a non-decreasing step function of $t_c$ that increases from $N^{(MC)}=1$ at $t_c=0$ to $N^{(MC)}=N$ at $t_c=t^{max}$. In contrast, the kx-core dimension $N^{(kx)}$ is 0 until a threshold time $t_c=\tau _{kx}$, at which the kx-core emerges with $N^{(kx)}\ge kx$, and increases monotonically thereafter. The difference between the growth of the actual and idealised cliques, visually reflected in the relative difference of the areas under the violet (idealised) and orange (actual) curves, is examined in more detail in Sect. 2.3.

To understand the differences in maximal clique and k-core growth between realised and idealised networks, we examine possible sources of discrepancy. In the idealised network, all distances can be travelled directly. In real networks this is generally not the case, owing to either the network infrastructure or the line planning, and the large majority of connections are longer than the straight-line distance. The network indicator capturing this behaviour is circuity, which measures the pairwise discrepancy between the network distance $d_G(i,j)$ and the straight-line distance $d_E(i,j)$ as their ratio (Eq. 2 in Methods)^29,30. The average circuity over all node pairs is the corresponding global metric (Eq. 3 in Methods).

The average circuity of the idealised network is $C=1$ by design, since all connections are direct and require no detours. The average circuity of the realised network thus measures its discrepancy from the ideal case. Since we only have data on travel times (and not on the exact shape of physical paths), we consider two measures of distance: in-vehicle time for infrastructural circuity $C^I$, and waiting time for operational circuity $C^O$. The values of $C^I$ and $C^O$ for all networks are given in Table 2. The relationship of infrastructural and operational circuity to the accessibility measures is presented and discussed in Sect. 2.3.
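A minimal sketch of the average circuity, assuming the pairwise form $C_{ij}=d_G(i,j)/d_E(i,j)$ averaged over ordered node pairs $i\neq j$ (Eqs. 2 and 3 in Methods are not reproduced in this section, so the averaging convention is an assumption). For $C^I$, we assume the two matrices would hold the actual in-vehicle times and their idealised counterparts; the function name is hypothetical.

```python
def average_circuity(d_network, d_straight):
    """Mean pairwise circuity <d_G(i,j) / d_E(i,j)> over ordered pairs i != j.

    Both arguments are square matrices (nested lists) on the same set of stops.
    """
    n = len(d_network)
    ratios = [d_network[i][j] / d_straight[i][j]
              for i in range(n) for j in range(n) if i != j]
    return sum(ratios) / len(ratios)


# Illustrative values: the 0-2 connection makes a 50% detour, the others are direct.
d_E = [[0, 10, 20], [10, 0, 10], [20, 10, 0]]
d_G = [[0, 10, 30], [10, 0, 10], [30, 10, 0]]
print(average_circuity(d_G, d_E))  # > 1: the network is circuitous
print(average_circuity(d_E, d_E))  # = 1: the idealised case
```

Because the idealised speed is fixed, a network whose actual rides are faster than that speed can yield $C^I<1$, as observed for San Francisco.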

Table 2 Circuity metrics across metro networks.

For all networks, the subgraphs of the idealised network consistently grow significantly earlier in the time interval than those of the actual network. In addition, consistent patterns in the shapes of kx-core and maximal clique growth for each network point to universal behaviour of accessibility. The following section quantifies this behaviour, providing the means for a comparative assessment of networks in terms of accessibility.

Accessibility measures

Cumulative accessibility indicators are defined as the integrals under the subgraph dimension curves of the actual network, relative to those of the idealised network (see Sect. 4.4 for details). To relax the idealised accessibility assumption, the integrals are evaluated up to twice the maximum idealised travel time, $2t_I^{max}$. For some networks, this value exceeds the actual maximum travel time; this is the case for the Cairo, Copenhagen, Dubai, Hyderabad, Lille, London, Milan, Prague, San Francisco, Santiago, Valencia, Warsaw and Washington DC networks. The upper bound T of the integral is thus $T = \min (t^{max}, 2\cdot t_I^{max})$, as illustrated in Fig. 4. The threshold indicators $\tau _{(kx)}$ are likewise defined relative to T.
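The normalisation can be sketched as follows, assuming the cumulative indicator is the ratio of the actual to the idealised area under the step curve $N^{(H)}(t_c)$, both integrated up to $T=\min(t^{max}, 2t_I^{max})$, and that the threshold indicator is $\tau/T$. The exact definitions are in Sect. 4.4 and may differ in detail; all names and the toy curves are hypothetical.

```python
def step_integral(times, dims, upper):
    """Integrate a right-continuous step curve N(t) from times[0] up to `upper`.

    times are increasing breakpoints; dims[i] holds on [times[i], times[i+1]),
    and the last value is held until `upper`.
    """
    total = 0.0
    for i, (t, d) in enumerate(zip(times, dims)):
        if t >= upper:
            break
        t_next = times[i + 1] if i + 1 < len(times) else upper
        total += (min(t_next, upper) - t) * d
    return total


def normalised_indicators(actual, ideal, t_max, t_ideal_max, tau):
    """Cumulative indicator relative to the idealised network, and tau relative to T.

    actual, ideal: (times, dims) step curves for the same subgraph type H.
    Assumption: S is the ratio of the two integrals and the threshold indicator is tau / T.
    """
    T = min(t_max, 2 * t_ideal_max)
    S = step_integral(*actual, T) / step_integral(*ideal, T)
    return S, tau / T, T


# Toy curves: the idealised network reaches each dimension twice as fast.
S, tau_rel, T = normalised_indicators(([0, 10, 20], [1, 2, 3]),
                                      ([0, 5, 10], [1, 2, 3]),
                                      t_max=20, t_ideal_max=10, tau=15)
print(S, tau_rel, T)
```

Under these assumptions, values of $S$ closer to 1 and smaller values of $\tau/T$ both indicate higher accessibility, which matches the opposite signs of the two indicator families discussed below.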

The rankings with the color-coded normalised indicator values for the full dataset are shown in Fig. 5. Note that higher values of the cumulative indicators indicate higher accessibility, whereas for the threshold indicators lower values do. The five most accessible networks are London, San Francisco, Valencia, Bilbao and Dubai. All of these networks except Bilbao fall in the $T=t^{max}$ category, i.e. their actual maximum travel times are relatively low compared to the idealised case; moreover, their idealised maximal clique growth converges distinctively slowly compared to most other networks (Figs. 3 and 4). At the lower end, Buenos Aires, Marseille, Philadelphia, Kobe and Oslo score lowest on the proposed accessibility indicators. All five networks are characterised by steep growth of the maximal clique dimension in the idealised case and slow growth in the actual network at the beginning of the interval, resulting in larger discrepancies between the existing network and its idealised version (Fig. 4).

We observe that for most networks, the rankings based on different indicators are broadly in agreement. This is confirmed by the finding that all indicators are strongly correlated (Fig. 6), with positive correlations within each type of indicator (cumulative and threshold) and, as expected, strong negative correlations between the two types. Nevertheless, for a subset of networks the rankings vary significantly between indicators. Among those is the Paris network, which the kx-core indicators rank significantly higher than the maximal clique indicator does, whereas the pattern is reversed for the Copenhagen network. As Fig. 6 shows, the correlations are strongest for the maximal clique and k50-core cumulative indicators ($r=0.96$). Among pairs of cumulative and threshold indicators, the correlations are strongest in magnitude for the complementary pairs $S^{(kx)}$ and $\tau _{kx}$ with the same value of x ($r=-0.90$ for $x=25$; $r=-0.96$ for $x=50$; $r=-0.92$ for $x=75$). The correlation strength remains high across all combinations, but decreases for pairs of cumulative and threshold indicators with differing values of x (from $r=-0.82$ to $r=-0.89$ for consecutive values of x, and from $r=-0.73$ to $r=-0.76$ for pairs with $x=25$ and $x=75$). Among the cumulative indicators, the cross-correlation is lowest for $S^{(k25)}$ and $S^{(k75)}$ ($r=0.81$). Compared to $x=50$, the subgraphs for $x=25$ and $x=75$ are more sensitive to network-specific conditions at the lower and upper ends of the time interval. The k50-based indicators are therefore deemed the most stable. Overall, the results suggest high regularity in maximal clique and k-core growth across all networks, pointing to universal patterns in network accessibility.

To further examine the relationship between accessibility measures and circuity (Sect. 2.2), we compute their correlations (Fig. 7).

The correlations show that, as expected, more circuitous networks are less accessible, i.e. higher circuity indicates lower accessibility. The correlations are stronger for infrastructural circuity, implying that infrastructure has a greater impact on accessibility levels than operational characteristics. Specifically, the networks with the highest values of $C^I$ (greatest circuity), Naples, Buenos Aires and Oslo (Table 2), rank among the lowest in Fig. 5. Conversely, of the cities with the lowest values of $C^I$, San Francisco, Atlanta and Copenhagen, only San Francisco is among the highest-ranked cities. For operational circuity, an interesting observation is that London has by far the lowest average waiting time and is also the most accessible network overall according to the indicators introduced here. On the other hand, San Francisco has one of the highest values of $C^O$. At the same time, it is an outlier in terms of infrastructural circuity, with $C^I=0.9$, i.e. less than one. In the construction of idealised networks, the speed on links is taken to be constant across all networks. San Francisco is an outlier in its geographical diameter (approx. 75 km), and the actual speeds on its (long) rail segments can be significantly higher than this assumed speed, which allows its in-vehicle times to fall below their idealised counterparts.

The access graph and three of the subgraphs (maximal clique, k25-core and k50-core) at a characteristic value of $t_c=\tau _{k50}$ are visualised in Fig. 8. The networks are ranked and their subplot borders are color-coded based on the $S^{(MC)}$ indicator. The most accessible networks have maximal cliques of comparable dimensions to k50-cores at the latter’s emergence (compare also with the plots in Fig. 3). Moreover, the cliques are geographically relatively dense, while the spatial density is visibly low for the least accessible networks. Compare also the L-space layout with circuity values in Table 2.

The geographical distribution of the cities included in the analysis is shown in Fig. 9.

Relation with conventional indicators

Topological indicators

Accessibility in general, and the indicators introduced here in particular, rely on the identification of shortest paths. We therefore investigate their relation with commonly used topological indicators that are closely related to the notion of shortest paths: network efficiency E; the average shortest path length $\langle SP\rangle ^L$ for the in-vehicle time-weighted L-space representation; and $\langle SP\rangle ^P_w$ and $\langle SP\rangle ^P$ for the weighted and unweighted P-space representations, respectively. Similarly, we consider the network diameter d in all three representations. All measures are normalised by their counterparts in the idealised network (see Sect. 4.5 for definitions).

The correlation matrix is shown in Fig. 10. The correlations are highest for efficiency and the weighted P-space indicators. Note that efficiency is computed with GTT as the distance, i.e. the same metric that we use to construct the access graph.
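For reference, the sketch below computes all-pairs shortest paths with the Floyd-Warshall algorithm and then the standard global efficiency of Latora and Marchiori, $E=\frac{1}{N(N-1)}\sum_{i\neq j} 1/d_{ij}$, together with the average shortest path length. The study's own definitions are in Sect. 4.5 and are not reproduced here, so this is an illustration under that assumption; how the waiting and transfer components of GTT enter the link weights depends on the graph representation and is not shown. Function names are hypothetical.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest-path lengths; weights[i][j] = math.inf if there is no direct link."""
    n = len(weights)
    d = [row[:] for row in weights]
    for i in range(n):
        d[i][i] = 0.0
    for k in range(n):
        for i in range(n):
            d_ik = d[i][k]
            if d_ik == math.inf:
                continue
            for j in range(n):
                if d_ik + d[k][j] < d[i][j]:
                    d[i][j] = d_ik + d[k][j]
    return d


def global_efficiency(d):
    """E = 1/(N(N-1)) * sum over i != j of 1/d_ij; unreachable pairs contribute 0."""
    n = len(d)
    return sum(1.0 / d[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


def average_shortest_path(d):
    """Mean finite shortest-path length over ordered pairs i != j."""
    n = len(d)
    finite = [d[i][j] for i in range(n) for j in range(n) if i != j and d[i][j] < math.inf]
    return sum(finite) / len(finite)


# Path graph 0 - 1 - 2 with unit link weights (illustrative).
INF = math.inf
w = [[0, 1, INF], [1, 0, 1], [INF, 1, 0]]
d = floyd_warshall(w)
print(global_efficiency(d), average_shortest_path(d))
```

Dividing each value by the same quantity computed on the idealised network gives the normalised measures used in the comparison.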

Correlations with average shortest paths are moderate, and stronger for the weighted representations. The correlations with shortest paths and diameter are negative, since longer shortest paths reflect lower accessibility. The correlations are generally highest for the weighted P-space, followed by the weighted L-space and the unweighted P-space, reflecting the influence of the service properties contained in the definition of GTT used to construct the access graph (Eq. (4)). The insignificant to low correlations with network size N are expected, given that the indicators are scaled using the notion of idealised accessibility.

Geographical indicators

Accessibility is a mature research area within transport geography. The classical indicators are based on the Hansen definition as the ease of travel between spatially dispersed opportunities for activity². A standard operationalisation of this notion is the cumulative opportunities measure of accessibility which measures the number of reachable opportunities within a given threshold time (definitions in Eq. 9 and Eq. 10 in Methods).

In the access graph, this corresponds exactly to the node degree $k_i$ at cut-off time $t_c$. To compare and contrast the introduced indicators with classical measures, we examine the average node degree in the access graph (i.e. the cumulative opportunities accessibility) for selected values $t_c\in \{30, 45, 60\}$ minutes. Additionally, we consider values of $t_c$ relative to the network-specific maximum GTT, $t_{max}$: $t_c\in \{\frac{1}{3}, \frac{1}{2}, \frac{2}{3}\}\,t_{max}$.
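The cumulative opportunities comparison reduces to the mean degree of the access graph at a chosen cut-off. The sketch evaluates it at the absolute and relative cut-off values listed above. As in the access-graph construction, a pair counts as connected when both directional GTT values are within $t_c$ (our assumption), and the function names and toy matrix are hypothetical.

```python
def average_access_degree(gtt, t_c):
    """Mean node degree of the access graph at cut-off t_c (cumulative opportunities).

    Assumption: stops i and j are connected when both directions satisfy GTT <= t_c.
    """
    n = len(gtt)
    total = sum(1 for i in range(n) for j in range(n)
                if i != j and max(gtt[i][j], gtt[j][i]) <= t_c)
    return total / n


def cumulative_opportunities(gtt, absolute=(30, 45, 60), fractions=(1 / 3, 1 / 2, 2 / 3)):
    """Average degree at fixed cut-offs (minutes) and at fractions of the maximum GTT."""
    t_max = max(max(row) for row in gtt)
    result = {f"{t} min": average_access_degree(gtt, t) for t in absolute}
    result.update({f"{f:.2f} t_max": average_access_degree(gtt, f * t_max) for f in fractions})
    return result


# Toy 3-stop GTT matrix in minutes (illustrative values only).
gtt = [[0, 5, 12],
       [5, 0, 9],
       [12, 9, 0]]
r = cumulative_opportunities(gtt)
print(r)
```

In a small network, every absolute cut-off beyond the maximum GTT saturates at degree $N-1$, which illustrates why absolute-time values compare poorly across networks of different geographic scale.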

The correlations of the introduced indicators with the absolute-time values are low and negative (Fig. 11). This is in line with expectations, since the geographical areas, and thus the physical constraints on travel, vary significantly between cities, and the introduced indicators are designed to allow meaningful comparisons across networks of different scales. The correlations with the time points relative to the maximum time are moderately positive, in agreement with our formulation of accessibility as that realised by the infrastructure network and the superimposed service plans relative to the system’s geographical constraints.

We stress, however, that the indicators introduced in this work reflect the temporal evolution of accessibility levels, and are thus qualitatively different from the cumulative opportunities definition. The cumulative indicators $S^{(H)}$ proposed in this work can be regarded as “cumulative cumulative accessibility”, i.e. accessibility at a higher level of aggregation than cumulative accessibility in its traditional sense.

Illustration

To illustrate the practical meaning of the introduced accessibility indicators, we investigate the impacts of different developments of the Stockholm metro and compare the accessibility measures of the modified networks with those of the original network. The Stockholm metro currently has $N=101$ nodes and operates $L=3$ lines and $R=7$ routes (Fig. 12). It scores relatively low in the accessibility rankings (30th to 37th out of 42 across all measures). It has large infrastructural circuity, as can be seen from its topology: a central core to the east, and cross-center radial routes spread out to the west whose nodes are geographically relatively close but topologically far apart.

For illustration, we consider three scenarios: (i) doubling the frequencies on all lines in the network (referred to as 2F); (ii) introducing a new line (Line X, Fig. 13); and (iii) combining the two (2F + Line X).
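As a back-of-the-envelope reading of the 2F scenario, assume (this is not stated in the text) that passengers arrive at random and services run at regular headways $h=1/f$, so that the expected waiting time is half the headway. Since waiting time is proportional to inverse frequency, doubling the frequency then halves every waiting-time component of GTT, while in-vehicle times and transfer penalties are unchanged:

$$\bar{w} = \frac{h}{2} = \frac{1}{2f}, \qquad f \mapsto 2f \;\Rightarrow\; \bar{w} \mapsto \frac{1}{2\cdot 2f} = \frac{\bar{w}}{2}.$$

For example, under this assumption a 10-minute headway implies a 5-minute expected wait, which 2F would reduce to 2.5 minutes per boarding.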

We perform the complete access graph analysis on the modified networks and compare the accessibility measures in each case with the values obtained for the original network. The comparison of maximal clique growth in all cases is shown in Fig. 14.

The improvements in the $S^{(MC)}$ indicator are comparable for the 2F and Line X scenarios, and both show a considerable improvement in overall network connectedness for GTT values above 20 minutes. This coincides with the time needed for the “outskirts” of the maximal clique to reach the nodes crossed by the new line. Beyond this point, the improvement is rapid, and the Line X scenario outperforms the 2F scenario in terms of the maximum GTT. Combining both modifications yields a further significant improvement in accessibility levels. The values of all indicators in all scenarios are summarised in Table 3.

Table 3 Values of indicators for different scenarios.

Discussion

This study presented a topological formulation of public transport accessibility by focusing on several characteristic subgraphs of the access graph. The proposed approach studies global accessibility levels beyond averaged node-level measures by tracing the emergence and evolution of well-connected regions of the network with increasing maximum allowed travel time. This framework amounts to a reconceptualisation of accessibility that offers several novel insights and opens promising avenues for future analysis. Our approach is consistent with the popular notion of the X-minute city³¹ and thereby facilitates related investigations of accessibility using different travel modes and in relation to various amenities.

The maximal clique represents the most stringent condition on connectedness and is at the same time the most interpretable measure. The high correlations between maximal clique and k-core measures suggest that a k-core analysis, especially the k50-core indicator, suffices for the presented accessibility analysis. This is crucial when applying the analysis to larger networks because of tractability: finding a largest clique is an NP-hard problem, whereas algorithms for finding k-cores have polynomial time complexity³². However, this correlation should first be verified when applying the analysis to other modalities. For scalability to larger networks, efficient clique-finding methods should be adopted (e.g.³³).

Moderate to strong correlations with efficiency and shortest paths indicate that the proposed indicators are consistent with established measures. The main strengths of our approach compared to traditional approaches are twofold. First, identifying the subgraphs offers a detailed view of accessibility that goes beyond simple averages; this provides an interpretable operationalisation of accessibility and allows for informed planning and optimisation of public transport networks. Second, by using the notion of the idealised network on the same set of (geographically positioned) nodes, it offers an approach that scales with the network's geographic and topological size.

These findings warrant a rethinking of accessibility through the lens of complex networks. More importantly, and with broader implications, the observations support the need for a methodological shift: rather than adapting existing network metrics to transport concepts, relevant transport properties should guide the development of suitable graph-theoretic measures.

Beyond these theoretical implications, the access graph and its subgraph-based accessibility measures can help transport planners design highly accessible networks. The proposed methodology allows for simulating changes in infrastructure (e.g. adding new stops or links) as well as in service planning (e.g. introducing new lines and modifying service frequencies). The illustration of this approach on the Stockholm metro network offers preliminary confirmation of its feasibility, and such simulations of alternative developments can guide decision-making in public transport planning.
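An infrastructure scenario can be sketched by editing the network, recomputing shortest travel times and comparing a subgraph indicator before and after. In the example below, the 6-stop line, its 2-minute segments, the new link and the 6-minute threshold are all hypothetical. A service change, such as a new frequency, would instead alter the waiting-time weights; that case is not shown.

```python
import networkx as nx


def largest_clique_at(net, tau, weight="time"):
    # Shortest travel times on the network, then the access graph at threshold tau
    tt = dict(nx.all_pairs_dijkstra_path_length(net, weight=weight))
    A = nx.Graph()
    A.add_nodes_from(net)
    A.add_edges_from((i, j) for i in tt for j, t in tt[i].items() if i != j and t <= tau)
    return max(len(c) for c in nx.find_cliques(A))


# Base case: a hypothetical 6-stop line with 2-minute segments
base = nx.path_graph(6)
nx.set_edge_attributes(base, 2.0, "time")

# Infrastructure scenario: a new 2-minute link turns the line into a loop
scenario = base.copy()
scenario.add_edge(0, 5, time=2.0)

before = largest_clique_at(base, tau=6.0)
after = largest_clique_at(scenario, tau=6.0)
print(before, after)  # 4 6
```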

In the following, we outline the limitations of this study and suggest avenues for future research. First, in transport geography, accessibility is understood in relation to different amenities. In the present framework, we focus on the travel impedance component of accessibility and simply use the number of reachable nodes in the PT network as a proxy for the number of opportunities for different activities. Relatedly, no actual land-use or register data are used in the model, so there is no estimate of the actual number of opportunities or their differentiation. This view of accessibility therefore addresses only the efficiency of the existing system. Incorporating coverage and the number of different opportunities as node weights is a promising direction for future work.
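One way to attach opportunities as node weights is to search for the clique with the most opportunities instead of the most stops. NetworkX provides `nx.max_weight_clique` for node-weighted graphs, which requires integer node weights. The small graph and opportunity counts below are hypothetical and show that the two criteria can select different parts of the network.

```python
import networkx as nx

# Stand-in access graph at some threshold (hypothetical)
A = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
# Hypothetical opportunity counts per stop (max_weight_clique needs integer weights)
nx.set_node_attributes(A, {0: 1, 1: 1, 2: 1, 3: 10, 4: 10}, "opportunities")

largest = max(nx.find_cliques(A), key=len)                         # most stops
richest, total = nx.max_weight_clique(A, weight="opportunities")  # most opportunities
print(sorted(largest), sorted(richest), total)  # [0, 1, 2] [3, 4] 20
```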

To gain a more complete picture of accessibility in urban PT studies, other PT modes will need to be considered. In this work, we consider only metro systems, which have a specific role in the urban PT system: compared with bus or tram networks, metro offers higher speed, larger capacity and more direct connections, and it carries a significant share of passenger flows in multi-modal urban PT systems. However, because of its infrastructure requirements, it has comparatively fewer stations and less spatial coverage than bus networks.

On the travel impedance side, the access graph is constructed using a cut-off function and is therefore an undirected, unweighted graph. Adding edge weights and using directed edges would give a more realistic view of accessibility, and the maximal weighted clique problem would be interesting to apply (e.g.³⁴). In addition, a detailed analysis of how each component of GTT affects accessibility levels would offer a deeper understanding of the interplay between PT network infrastructure and operations in shaping access.
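A possible extension of the cut-off construction is sketched below. It keeps arcs directed when travel times are asymmetric and weights each retained arc with a decaying impedance instead of a plain 0/1 indicator. The negative-exponential form, `beta` and the travel times are assumptions for illustration, not the paper's specification. `DiGraph.to_undirected(reciprocal=True)` recovers the pairs that are mutually reachable within the threshold.

```python
import math
import networkx as nx


def weighted_access_digraph(tt, tau, beta=0.1):
    """Directed access graph with impedance-weighted arcs (illustrative sketch).

    An arc i -> j exists iff tt[i][j] <= tau, i.e. the same hard cut-off as in the
    unweighted case; its weight exp(-beta * t) decays with travel time. The
    negative-exponential form and beta are assumptions.
    """
    D = nx.DiGraph()
    D.add_nodes_from(tt)
    for i, row in tt.items():
        for j, t in row.items():
            if i != j and t <= tau:
                D.add_edge(i, j, weight=math.exp(-beta * t))
    return D


# Hypothetical asymmetric travel times in minutes
tt = {"A": {"B": 5, "C": 12}, "B": {"A": 7, "C": 6}, "C": {"A": 12, "B": 11}}
D = weighted_access_digraph(tt, tau=10)
mutual = D.to_undirected(reciprocal=True)  # keep only pairs reachable both ways

print(sorted(D.edges()), [sorted(e) for e in mutual.edges()])
```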

On the methodological side, the maximal clique is the strictest constraint and defines a single core part of the network, so it is blind to any other well-connected parts. A complementary perspective would be the analysis of all k-cliques, or a deeper analysis of network geometry and of the forces shaping the network topology by examining smaller cliques and their interactions (similar to e.g.³⁵). Especially in larger cities, several smaller well-connected parts may exist, which would be relevant to the idea of the X-minute city. Community detection is another promising alternative to predefined subgraphs.
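To illustrate why a single largest clique can miss other well-connected regions, the sketch below contrasts it with k-clique percolation communities, available in NetworkX as `k_clique_communities`. The access graph is hypothetical: two dense clusters joined by one link. The largest clique picks only one cluster, while k-clique communities with k = 3 recover both.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Stand-in access graph (hypothetical): two dense clusters joined by one link
A = nx.complete_graph(4)                                 # stops 0-3
A.add_edges_from(nx.complete_graph(range(4, 8)).edges)   # stops 4-7
A.add_edge(3, 4)

largest = max(nx.find_cliques(A), key=len)               # picks only one cluster
communities = sorted(sorted(c) for c in k_clique_communities(A, 3))
print(len(largest), communities)  # 4 [[0, 1, 2, 3], [4, 5, 6, 7]]
```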

Beyond the limitations mentioned above, this framework opens several further lines of inquiry. One promising avenue for future research is the identification of realistic maximally accessible networks that respect physical constraints, incorporate the resource costs, emissions and energy consumption of operating the network, and account for (expected) passenger demand. The consideration of metro systems within multi-modal public transport systems is another promising direction, and a similar analysis for multilayer networks should be performed to assess the accessibility of integrated PT networks. In addition, an integrated framework for analysing accessibility and robustness would reveal the interplay between the two properties. In particular, measuring the change in the proposed subgraph indicators under node or link failures emerges as a promising direction.
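A minimal sketch of such a failure analysis follows. It removes a station (node failure) or a segment (link failure) from a hypothetical circle line, recomputes shortest travel times and compares the largest clique of the access graph at a fixed threshold. The network, segment times and threshold are assumptions for illustration.

```python
import networkx as nx


def largest_clique_at(net, tau, weight="time"):
    # Shortest travel times on the (possibly damaged) network, then the access graph
    tt = dict(nx.all_pairs_dijkstra_path_length(net, weight=weight))
    A = nx.Graph()
    A.add_nodes_from(net)
    A.add_edges_from((i, j) for i in tt for j, t in tt[i].items() if i != j and t <= tau)
    return max(len(c) for c in nx.find_cliques(A))


# Hypothetical circle line: 6 stops, 2-minute segments
net = nx.cycle_graph(6)
nx.set_edge_attributes(net, 2.0, "time")

node_failure = net.copy()
node_failure.remove_node(0)          # station 0 closed
link_failure = net.copy()
link_failure.remove_edge(0, 1)       # segment 0-1 out of service

tau = 6.0
results = [largest_clique_at(g, tau) for g in (net, node_failure, link_failure)]
print(results)  # [6, 4, 4]
```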

Methods

Datasets and software

The access graphs are constructed using the curated datasets of L- and P-space representations of 51 metro networks³⁶,³⁷. In the L-space representation (also known as the Infrastructure space³⁸), two stops are connected by an edge if they are consecutive stops on a route. In P-space (also known as the Service space³⁸), each route forms a clique, so that path lengths count the legs of a journey and hence its transfers. The edges in L- and P-space are weighted with in-vehicle and waiting times, respectively. Of the 51 networks in the datasets, the 42 that have more than one operating line are considered for analysis. These networks vary considerably in size, from $N=26$ (Kobe) to $N=421$ (New York).
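The study uses pre-curated L- and P-space datasets. The sketch below only illustrates how the two representations are typically derived from route data. The route names, segment times and headways are hypothetical. The expected waiting time of half the headway is a common simplifying assumption, not necessarily the one used in the datasets.

```python
import itertools
import networkx as nx

# Hypothetical route data: ordered stops, in-vehicle minutes per segment, headway in minutes
routes = {
    "red": {"stops": ["a", "b", "c", "d"], "segment_times": [2, 3, 2], "headway": 4},
    "blue": {"stops": ["e", "b", "f"], "segment_times": [3, 2], "headway": 6},
}

L_space, P_space = nx.Graph(), nx.Graph()
for route in routes.values():
    stops = route["stops"]
    # L-space: consecutive stops on a route, weighted with in-vehicle time
    for (u, v), t in zip(zip(stops, stops[1:]), route["segment_times"]):
        L_space.add_edge(u, v, in_vehicle=t)
    # P-space: the route's stops form a clique, weighted with an assumed expected
    # waiting time of half the headway (if routes share a pair, keep the shorter wait)
    wait = route["headway"] / 2
    for u, v in itertools.combinations(stops, 2):
        wait_uv = min(wait, P_space[u][v]["waiting"]) if P_space.has_edge(u, v) else wait
        P_space.add_edge(u, v, waiting=wait_uv)

print(L_space.number_of_edges(), P_space.number_of_edges())  # 5 9
```

In P-space, a journey from `a` to `f` takes two legs (red to `b`, then blue), which is how this representation counts transfers.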

The code is written in Python. The NetworkX library³⁹ is used for most of the analysis and the NetworKit library⁴⁰ is used for intensive clique computations.