Introduction

With the rapid development of globalization, the integration of economies, societies, and civilizations all over the world has increased (Hjalager 2007). Travel and tourism has been identified as the ‘world’s largest industry’, and the growth in its consumers and products is more visible than in most other large industries (Middleton and Hawkins 1998). Simultaneously, a substantial portion of global passenger transportation is intertwined with tourism, and the sector, especially international tourism, acts as a barometer for global mobility trends and economic development trajectories (Dubois et al. 2011; Wang et al. 2022).

Tourist flows represent the movement of travelers from their places of origin to various destinations worldwide. The interplay of these movements forms a global tourism network, intricately linking various destinations, transportation entities, lodging establishments, attractions, and other stakeholders within the tourism industry (Valeri and Baggio 2022). Tourist flows are essential to sustaining the global tourism network, driving economic advancement, fostering cultural interchange, facilitating knowledge dissemination, and promoting sustainable tourism practices worldwide (Albrecht 2013; Gan et al. 2021; Moulin and Boniface 2001; Nguyen et al. 2019). These complex tourist flows connect countries on a global scale, with nations and regions (serving as either destinations or source markets) acting as nodes, and the movements between them forming the network’s edges.

Complex network theory provides a crucial framework for systematic research, demonstrating maturity in system deconstruction and relational analysis. Much of tourism development relies on the successful alignment of organizations through partnerships or networks (Ramayah et al. 2011; Zhou et al. 2023). Meanwhile, the tourism industry, characterized by substantial population movements, financial exchanges, business transactions, knowledge sharing and diverse stakeholder relations, gives rise to various intricate network structures (McLeod et al. 2024). Consequently, the use of network methodologies to examine tourism phenomena has become increasingly prevalent.

Over the past two decades, numerous studies have employed network analysis methods to explore international tourism flows and global tourism networks (Morrison et al. 2004; Seok et al. 2021; Shao et al. 2020). These studies have revealed a distinct small-world characteristic of global tourism networks, with structures that are constantly changing (Lozano and Gutiérrez 2018; Morrison et al. 2004). Previous research has mainly focused on changes in network scale and on the individual attributes of major countries (such as degree and centrality), discussing the evolution of international tourism networks at the macro and micro levels. However, there is still room for improvement in research at the meso-level of global tourism networks, particularly in terms of community analysis.

The global tourism network exhibits a community structure (Zhu and Liu 2022), implying that it can be partitioned into several clusters. Countries within the same community (cluster) generally maintain closer interactions, which facilitates communication and exchange, accelerates resource flows, and promotes innovative cooperation (Xu et al. 2021). Regarding community divisions, the World Tourism Organization (UNWTO) proposes that the global tourism network can be divided into five parts, namely, Europe, Asia Pacific, Americas, Africa, and Middle East. Meanwhile, the World Travel & Tourism Council (WTTC) divides the global tourism network into eleven clusters based on geographical location, including North America, Caribbean, Latin America, Europe, Africa, Middle East, and Asia Pacific. However, due to the complexity of international geopolitics, the uniqueness of regional culture, and the dynamic nature of national relations, the formation of tourism clusters cannot be explained solely by geographical location. Furthermore, the scale and composition of tourism clusters are not static. Therefore, it is essential to conduct more detailed research on the formation of global tourism clusters based on a comprehensive analysis.

Typically, communities are defined by strong internal connectivity compared with weaker external connections (Kaiser et al. 2022). In the global tourism network, community structures feature tight internal connections and looser external ones. These structural relationships exhibit geographic homophily, meaning that more links exist within continents than between them (Lozano and Gutiérrez 2018). However, there is ongoing debate over the number and composition of these communities. For example, Chung et al. (2020) indicated that the global tourism network could be segmented into eight communities from 2000 to 2012, and two communities from 2011 to 2013, while Seok et al. (2021) found that the global network could be divided into eight communities.

While existing studies have examined the members of these communities, the differences in structure, function, and effects among these communities have not been adequately addressed. Moreover, few appropriate methods are currently available for investigating the characteristics of this meso-level network structure, the community.

Accordingly, this paper draws on complex network theory to focus on the meso-level of the global tourism network, specifically the community. By constructing a global tourism network for each year from 1995 to 2021, this paper explores the changes in the communities of the global tourism network, constructs theories and methods for measuring community efficiencies, and investigates the factors influencing these efficiencies. Broadly, this paper addresses three primary questions:

  1. Which communities/clusters exist within the global tourism network, and how have they evolved over time?
  2. How can we measure the characteristics and efficiencies of communities within real-world complex networks?
  3. What factors impact the efficiencies of communities, and in particular how do individual characteristics affect communities within the tourism network?

The remainder of the paper is structured as follows: Section “Related works” provides a summary of the related literature on international tourist flow and networks. Section “Methodology” outlines the methodology employed in this study. The empirical findings are presented in Section “Results”, followed by Section “Conclusion and Discussion”, which includes the conclusion and discussion of the study’s results.

Related works

International tourism is one of today’s fastest-growing industries, driven by globalization, necessitating increasing academic attention (Witt et al. 2013; Hanni et al. 2024). Since the 1990s, much research on international tourist flows has primarily focused on tourism demand (Adeola et al. 2018; Martins et al. 2017; Song et al. 2018; Yang et al. 2019). Early studies emphasized economic factors and predominantly employed gravity models to analyze these flows (Lim 2006). These models are based on the premise that the volume of tourist flow between two regions is directly proportional to the size of their populations and inversely proportional to the distance between them (Crouch 1992).
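The proportionality described above can be written compactly. The expression below is a generic illustrative form, not the specification used by any of the cited studies; the symbols \(T_{ij}\), \(P_i\), \(P_j\), \(d_{ij}\) and the constant \(k\) are notation assumed here for illustration only:

$$T_{ij}=k\,\frac{P_{i}\,P_{j}}{d_{ij}}$$

where \(T_{ij}\) is the tourist flow from region i to region j, \(P_i\) and \(P_j\) are the populations of the two regions, \(d_{ij}\) is the distance between them, and \(k\) is a proportionality constant.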

Various factors have been identified as determinants of international tourist flows in different contexts. For instance, geographical distance and the exchange rate can explain tourism demand in Brazil (Tavares and Leitao 2017), while tourist income and infrastructure development are the primary determinants of international tourism in Tanzania (Wamboye et al. 2020). Other studies have identified global income, state-level transportation costs, the stock of foreign-born residents, the exchange rate, and international and domestic price levels as key influencers of international visits to the Australian states and territories (Shafiullah et al. 2019). The number of tourists from neighboring countries also affects the international tourism flow of a country (Park et al. 2023). Some scholars have also explored the factors influencing tourism flows from both the origin and destination perspectives, using variables such as the income and population of origin countries, tourism attractions, special events, labor, and capital availability (Eilat and Einav 2004). In addition, social factors such as crime, ethnic tensions, external conflicts, and geopolitical risks have significant negative impacts on international tourism arrivals (Altindag 2014; Gillen and Mostafanezhad 2019; Hailemariam and Ivanovski 2021).

With advances in computational methods and graph theory, an increasing number of scholars have tended to examine international tourism through network science (Baggio and Cooper 2008; Baggio et al. 2010). This paradigm shift has provided novel insights into the complex and evolving relationships within tourism systems, expanding the scope beyond pairwise interactions to include indirect relationships and the overall network structure (D’Agata et al. 2013), and it has also been shown to help predict tourist flows (Sáenz et al. 2023). For example, Baggio (2008) utilized network analysis to investigate the structure and dynamics of tourism destinations, suggesting that the development of destinations could be augmented through network structure optimization. Williams and Baláž (2013) delved into the world tourism network, identifying a highly uneven distribution of tourism flows, predominantly centralized in Europe and North America. Seok et al. (2021) found a gradual decentralization in outbound and inbound tourism networks, while Shao et al. (2020) examined the structure and progression of international tourist flows, as well as the roles of countries and regions within the global tourism network. Chung et al. (2020) combined cluster analysis and social network models to delineate the configurations of global tourism networks and pinpoint factors driving changes in international tourist flows. Okafor et al. (2022) observed that higher migration rates contribute to increased international tourism flows in destination countries, particularly when the destination possesses a larger linguistic network. Moreover, researchers have examined the impact of network structure on international tourism performance (Zhu and Liu 2022).

As tourist flows become increasingly complex, changes in the tourism network need to be detected from more perspectives (Zhu 2021). More recent research has moved beyond structural analysis to explore the interplay between network structures and tourist behaviors. For instance, Halu et al. (2013) combined mobile phone data with network analysis to uncover patterns in human mobility, while Gao et al. (2022) used travel notes to map China’s tourism network structure and reveal the cities most popular with tourists. Liu et al. (2023) developed a complex tourist itinerary visualization model that reveals the spatial structure of multi-layer tourist flows. In Korea, Jeon and Yang (2021) examined the structural changes in the international tourism network based on mobile phone data, particularly in response to increased demand following the COVID-19 outbreak.

Most of the above research discusses the overall structure or node characteristics of the tourism network, with less attention given to the meso-level structure of network communities. Communities, understood as densely connected subgroups of nodes, can highlight important clusters of tourism activity and reveal cooperative or competitive relationships between destinations (Zhu 2021). Researchers employ various methods to measure and identify communities within tourism networks, including modularity analysis, hierarchical clustering, and community detection algorithms (Hoffman et al. 2018; Newman 2006). By identifying communities within networks, scholars have also conducted research on community composition and relationships. Provenzano and Baggio (2020) detected community changes in inbound tourism in Sicily. Based on community identification of the network, Tsiotas et al. (2021) revealed four groups of seasonality, which are described by distinct seasonal, geographical, and socio-economic profiles. Combining online tourist search data and official tourism statistics, Xu et al. (2022) found that tourism attractions, traffic, policy, and information may affect the tourism communities in the network.

While discussions have taken place within network science, research endeavors concerning communities predominantly concentrate on their delineation. In contrast, there is a lack of research methodologies focusing on communities, which limits the potential insights that can be gained. Future research could explore the mechanisms driving community evolution, incorporate more diverse data sources for a comprehensive understanding, and investigate how insights into community structures can inform tourism management and policy-making.

Overall, while significant advancements have been made, there remains a notable gap in research on the global tourism network, particularly at the mesoscale. Expanding research in this area is both necessary and urgent.

Methodology

Research framework

Figure 1 shows our research framework, which consists of five key steps. First, we collect data from public databases and construct the foundation of this research: the global tourism network for each year. Second, we conduct community detection based on the modularity algorithm and assign codes to the resulting communities. Third, combining the similarity index calculation with community identification, we divide the evolution of the global tourism network into several stages; the patterns and roles of communities in this evolution are shown through visualization. Fourth, we measure the communities’ efficiencies in terms of structure, performance, and function, using the new methods proposed in this research, which allows us to track changes in community efficiencies over time. Fifth, we use a mixed effects model to test whether 11 factors (covering geographical relationship, population, economy and export development, air traffic, attractions, and social stability) affect the communities’ efficiencies within the global tourism network. Through this framework, we aim to make theoretical and practical contributions to complex network theory and to the organization of global tourism networks.

Fig. 1 Research framework.

Data source

The dataset on international tourist flows was sourced from the United Nations World Tourism Organization (UNWTO) (https://www.e-unwto.org/toc/unwtotfb/current), covering over 200 countries or regions from 1995 to 2021. This dataset provides detailed information on both outbound and inbound tourism for each country, tracking tourist movements between nations. Specific indicators include: a) “TFR” for arrivals of non-resident tourists at national borders; b) “VFN” for non-resident visitor arrivals; c) “TCER” for non-resident tourist arrivals in all accommodation types; and d) “THSR” for non-resident tourist arrivals in hotels and similar establishments. To standardize the data, we prioritized the “TFR” and “VFN” indicators, resorting to “TCER” and “THSR” when the former were unavailable. Additionally, where the name of a country or region changed, we applied the current name uniformly. For countries with significant territorial changes (e.g., Serbia and Montenegro), segmented processing was employed. After thorough data filtering, cleaning, and verification, we compiled a comprehensive dataset containing 322,877 records of international tourist statistics from 1995 to 2021.
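The indicator fallback described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the record layout (a dictionary keyed by indicator name), the function name `pick_flow`, and the relative order of “TFR” before “VFN” and “TCER” before “THSR” are assumptions made for the example.

```python
# Priority order assumed from the text: TFR and VFN first, then TCER and THSR.
# The relative order inside each pair is an assumption of this sketch.
INDICATOR_PRIORITY = ("TFR", "VFN", "TCER", "THSR")


def pick_flow(record):
    """Return (indicator, value) for the first indicator that has a value, else None."""
    for name in INDICATOR_PRIORITY:
        value = record.get(name)
        if value is not None:
            return name, value
    return None


# Hypothetical record for one origin-destination pair in one year.
example = {"TFR": None, "VFN": None, "TCER": 12500, "THSR": 9800}
chosen = pick_flow(example)
print(chosen)
```

Keeping the chosen indicator name alongside the value makes it possible to audit later how many records relied on the accommodation-based fallbacks.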

Based on these data, we construct a directed weighted network \(G=\{N,w\}\), where \(N=\{N_{1},N_{2},\ldots ,N_{n}\}\) represents the nodes in the network, corresponding to the n countries or regions involved. \(w=\{w_{ij}\}\) denotes the weighted edges from node i to node j, where i represents the outbound country or region, j represents the inbound country or region, and \(w_{ij}\) represents the total number of tourists traveling from i to j in a given year.
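As an illustration of this construction, the sketch below builds the weighted edge set for one year from a list of flow records. The tuple layout `(origin, destination, tourists)`, the country codes, and the numbers are hypothetical; skipping self-loops and non-positive flows is an assumption of the sketch rather than a rule stated in the text.

```python
from collections import defaultdict


def build_network(records):
    """Build {origin: {destination: weight}} from (origin, destination, tourists) tuples."""
    w = defaultdict(dict)
    for origin, destination, tourists in records:
        if origin == destination or tourists <= 0:
            continue  # assumption: ignore self-loops and empty flows
        # Sum repeated records for the same directed pair.
        w[origin][destination] = w[origin].get(destination, 0) + tourists
    return dict(w)


# Hypothetical records for a single year.
records = [("FRA", "ESP", 100), ("ESP", "FRA", 60), ("FRA", "ITA", 40), ("FRA", "ESP", 20)]
w = build_network(records)
nodes = set(w) | {j for targets in w.values() for j in targets}
print(w, sorted(nodes))
```

A nested dictionary keeps the direction of each flow explicit: `w[i][j]` is the flow from outbound country i to inbound country j.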

Other indicators, shown in Table 1, were used to analyze the factors influencing community efficiencies. GDP, GDP per capita, total population, urban population, exports of goods and services, and air passengers carried were obtained from the World Bank; the number of World Heritage sites and the number of representative intangible cultural heritage elements in each country were obtained from UNESCO; war and conflict data were obtained from Wars in the World (https://www.warsintheworld.com); and political violence and protest events were obtained from ACLED (https://acleddata.com/). Furthermore, the geographical proximity of each country or region was calculated by the authors using Google Earth.

Table 1 Factors used in the analysis.

Network community/cluster detection algorithm

Numerous approaches have been explored to identify communities within networks, with the community/cluster detection algorithm by Blondel et al. (2008) being particularly notable. This algorithm is widely used for partitioning networks into clusters and is based on the principle of modularity, a measure that varies from −1 to 1. Modularity assesses the partitioning quality by comparing the density of links within communities to the density of links between different communities. The degree of modularity is influenced by the allocation of nodes to communities within the network. A modularity value close to 1 indicates a well-defined community structure resulting from the network partitioning, signifying a higher quality of partition. The calculation method is as follows:

$$Q=\frac{1}{2m}\sum _{i,j}\left[{w}_{ij}-\frac{{A}_{i}{A}_{j}}{2m}\right]\delta ({c}_{i},{c}_{j})$$
(1)
$$\delta ({c}_{i},{c}_{j})=\left\{\begin{array}{ll}1, & {c}_{i}={c}_{j}\\ 0, & {\rm{otherwise}}\end{array}\right.,\qquad m=\frac{1}{2}\sum _{i,j}{w}_{ij}$$
(2)

where \({w}_{ij}\) is the weight of the edge between nodes i and j. In the unweighted network model, the value is 1 if i and j are connected (that is, they co-occur at least once) and 0 otherwise. \({A}_{i}={\sum }_{j}{w}_{ij}\) is the sum of the weights of the edges attached to node i, and \({c}_{i}\) is the community to which node i is allocated. Q is a scalar value between −1 and 1 that evaluates the density of links inside communities relative to the links between communities.
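Equations (1) and (2) can be evaluated directly for a given partition. The sketch below is a minimal, unoptimized illustration that assumes a symmetric weight dictionary in which every node appears as a key; the helper `undirected`, the toy graph (two triangles joined by one edge), and the community labels are hypothetical.

```python
def undirected(edges):
    """Store each (i, j, weight) edge in both directions so that w is symmetric."""
    w = {}
    for i, j, weight in edges:
        w.setdefault(i, {})[j] = weight
        w.setdefault(j, {})[i] = weight
    return w


def modularity(w, community):
    """Q from Eq. (1); w is symmetric {i: {j: weight}}, community is {node: label}."""
    strength = {i: sum(nbrs.values()) for i, nbrs in w.items()}  # A_i
    two_m = sum(strength.values())  # 2m, the sum of w_ij over all ordered pairs
    q = 0.0
    for i in w:
        for j in w:
            if community[i] == community[j]:  # delta(c_i, c_j) = 1
                q += w[i].get(j, 0.0) - strength[i] * strength[j] / two_m
    return q / two_m


# Toy graph: two triangles joined by a single edge, all weights equal to 1.
edges = [("a", "b", 1), ("b", "c", 1), ("c", "a", 1),
         ("d", "e", 1), ("e", "f", 1), ("f", "d", 1), ("c", "d", 1)]
w = undirected(edges)
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in w}
q_split = modularity(w, split)
q_merged = modularity(w, merged)
print(q_split, q_merged)
```

Placing all nodes in one community gives Q = 0, while separating the two triangles gives a positive value, which is the behavior the text describes for a well-defined community structure.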

Initially, each node i is taken as its own community, so the number of communities equals the number of nodes. For node i, the gain of modularity ΔQ obtained by moving i into the community of a neighboring node j is measured. Taking every neighboring community of node i into consideration, i joins the community with the maximum ΔQ if that gain is positive; otherwise, it stays in its original community. The gain of modularity ΔQ is calculated by the following equation:

$$\Delta Q=\left[\frac{{\Sigma }_{in}+{A}_{i,in}}{2m}-{\left(\frac{{\Sigma }_{tot}+{A}_{i}}{2m}\right)}^{2}\right]-\left[\frac{{\Sigma }_{in}}{2m}-{\left(\frac{{\Sigma }_{tot}}{2m}\right)}^{2}-{\left(\frac{{A}_{i}}{2m}\right)}^{2}\right]$$
(3)

In Eq. (3), \({\Sigma }_{in}\) represents the total weight of links within community C, \({\Sigma }_{tot}\) denotes the total weight of links to all nodes in community C, \({A}_{i}\) is the sum of the weights of links incident to node i, \({A}_{i,in}\) is the total weight of links from node i to all nodes in C, and m signifies the total weight of all links across the network.
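The local move rule can be illustrated with a literal transcription of Eq. (3). This is a sketch under stated assumptions, not a full implementation of the algorithm: the function names and the candidate statistics are hypothetical, and the four quantities per candidate community are assumed to have been computed beforehand.

```python
def modularity_gain(sigma_in, sigma_tot, a_i, a_i_in, m):
    """Literal transcription of Eq. (3) for moving node i into community C."""
    two_m = 2.0 * m
    after = (sigma_in + a_i_in) / two_m - ((sigma_tot + a_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (a_i / two_m) ** 2
    return after - before


def best_community(candidates, a_i, m):
    """Pick the neighboring community with the largest positive gain.

    candidates maps a label to (sigma_in, sigma_tot, a_i_in).
    Returns (None, 0.0) when no move improves modularity.
    """
    best_label, best_gain = None, 0.0
    for label, (sigma_in, sigma_tot, a_i_in) in candidates.items():
        gain = modularity_gain(sigma_in, sigma_tot, a_i, a_i_in, m)
        if gain > best_gain:
            best_label, best_gain = label, gain
    return best_label, best_gain


# Hypothetical statistics for two neighboring communities of a node with A_i = 3.
candidates = {"X": (4.0, 6.0, 2.0), "Y": (2.0, 10.0, 1.0)}
label, gain = best_community(candidates, a_i=3.0, m=10.0)
print(label, gain)
```

Because the gain depends only on a handful of community-level totals, each candidate move can be evaluated without recomputing the modularity of the whole network.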

Subsequently, a restructured network is formed in which the nodes represent the identified communities. The weight of the link connecting two of these new nodes is recalculated by summing the weights of the links between the nodes of the two corresponding communities. Links between nodes of the same community become self-loops in the new network configuration. This process is repeated iteratively until no further changes are observed, indicating that maximum modularity has been reached.
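The aggregation step can be sketched as below. The function name `aggregate`, the toy graph, and the labels are hypothetical; the sketch assumes a symmetric weight dictionary, so each internal link is counted once per direction in the resulting self-loop weight.

```python
def aggregate(w, community):
    """Collapse each community into one node; internal links become self-loops."""
    coarse = {}
    for i, nbrs in w.items():
        ci = community[i]
        row = coarse.setdefault(ci, {})
        for j, weight in nbrs.items():
            cj = community[j]
            # Links inside a community accumulate on the self-loop (ci == cj).
            row[cj] = row.get(cj, 0) + weight
    return coarse


# Toy symmetric graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
w = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1},
    "f": {"d": 1, "e": 1},
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
coarse = aggregate(w, community)
print(coarse)
```

The total weight of the coarse network equals that of the original one, so modularity can be evaluated on the aggregated network in the next pass.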

This approach reveals the hierarchical structure of the network by merging nodes and clusters from the original network, enabling the exploration of the network’s structure at different levels of detail. This method overcomes resolution limits, is straightforward to implement, and is efficient, as improvements in modularity are easily computed. A positive modularity value indicates the presence of distinct communities within the network, with higher values suggesting clearer and more pronounced divisions, thereby suggesting a more cohesive network structure.

Network community efficiencies measurement

Theoretical deduction

A network can be divided into several communities with closer internal connections, where each community (comprising more than one node) is a sub-network. Thus, following community detection, the network topology can be further organized into intra-community and inter-community relationships (Biswas and Biswas 2015). Certain attributes of these communities resemble those of the entire network, including: 1) the number of nodes and edges in the sub-network, which together reflect the scale of the community, while changes in the nodes and edges of a dynamic sub-network indicate shifts in the community and thus capture its similarity over time; 2) the way in which nodes within the sub-network connect to each other, which reveals the community’s structure; and 3) the volume and direction of edges within the community, which reflect its function. Merging the relationships of individual members can reveal the interactions between communities, allowing the role and characteristics of the entire cluster in the network to be calculated (Fig. 2).

Fig. 2 The foundation of network community efficiencies in this study.

Moreover, the formation of communities effectively improves network efficiency; however, few studies have focused on measuring cluster efficiency. In a network, connections between members of the same community are relatively tight, with the cooperative development of members serving as a common goal of community formation and growth (Newman and Girvan 2004). Guided by this objective, it is necessary, on the one hand, to enhance connectivity within the intra-network of a community, fostering the development of a more efficient community structure and improving community outcomes. On the other hand, it is essential to expand the community’s influence and increase its embeddedness within the broader network (Ke et al. 2013).

Therefore, this paper aims to measure the efficiency of communities in networks from three aspects: structure, performance, and function. It introduces three indices: structural efficiency, performance efficiency, and functional efficiency. Structural efficiency measures the connectivity within a community, performance efficiency assesses the actual benefits within a community, and functional efficiency evaluates the external influence of a community. Thus, the paper provides a comprehensive evaluation of the structure, performance, and function of communities within networks.

Similarity measurement

As the network evolves, the members of a community may shift as well, but there is currently no standardized method to quantify the extent of these member fluctuations. Thus, we designed a new method inspired by the Jaccard similarity index from computer science. The index quantifies the similarity between the member compositions of two sets, as follows:

$$J\left({G}_{i},{G}_{i+t}\right)=\frac{\left|{G}_{i}\cap {G}_{i+t}\right|}{\left|{G}_{i}\cup {G}_{i+t}\right|}=\frac{{M}_{0}}{{M}_{0}+{M}_{1}+{M}_{2}}$$
(4)

where \(J\left({G}_{i},{G}_{i+t}\right)\) is the similarity index between \({G}_{i}\) and the community \({G}_{i+t}\) into which it changes after time t, \({M}_{0}\) is the number of nodes that are in both \({G}_{i}\) and \({G}_{i+t}\), \({M}_{1}\) is the number of nodes that are in \({G}_{i}\) but not in \({G}_{i+t}\), and \({M}_{2}\) is the number of nodes that are in \({G}_{i+t}\) but not in \({G}_{i}\). The value of \(J\left({G}_{i},{G}_{i+t}\right)\) lies in the range [0,1]; the closer the index is to 1, the more similar the two sub-networks are. The overall index \(\bar{{J}_{t}}\) is the average J value over all communities at time t.
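Equation (4) reduces to a few set operations. The sketch below is a minimal illustration; the function name, the placeholder member labels, and the convention of returning 1 for two empty sets are assumptions of the example.

```python
def jaccard(members_before, members_after):
    """Similarity index J of Eq. (4) between two member sets of a community."""
    before, after = set(members_before), set(members_after)
    union = before | after            # M0 + M1 + M2
    if not union:
        return 1.0                    # assumption: two empty sets count as identical
    return len(before & after) / len(union)   # M0 / (M0 + M1 + M2)


# Hypothetical membership of one community in two consecutive years.
g_prev = {"n1", "n2", "n3", "n4"}
g_next = {"n1", "n2", "n3", "n5", "n6"}
j = jaccard(g_prev, g_next)
print(j)
```

In this example three members are retained, one leaves and two join, so J = 3 / (3 + 1 + 2) = 0.5.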

Structure efficiency measurement

Structural efficiency refers to the effectiveness of connections within a community, essentially capturing the tightness of internal links in the sub-network. In complex network research, the average shortest path length (ASPL) and the clustering coefficient (CC) are commonly used to reflect the tightness of the network structure (Watts 2003). A short ASPL combined with a high CC signifies that the network is more tightly connected and that its connection efficiency is higher. However, given variations in community size, it is necessary to eliminate the impact of scale on the structure of the sub-network.

For a given scale (number of nodes and edges) of a random network (ER network), the ASPL and CC are only related to the number of nodes and edges. Consequently, by referencing the ASPL and CC of an ER network of equivalent scale, we can adjust for the scale’s impact on the structure of the target sub-network (Humphries and Gurney 2008). Thus, it reflects the structural efficiency of the actual connected sub-network of the community. The calculation method is as follows:

$${S}_{{Gi}}=\frac{{C}_{{Gi}}/{C}_{{random}}}{{L}_{{Gi}}/{L}_{{random}}}$$
(5)

where network G contains n nodes and forms x clusters \({G}_{1},{G}_{2},\ldots ,{G}_{i},\ldots ,{G}_{x}\); \({C}_{Gi}\) and \({L}_{Gi}\) are the CC and ASPL of community \({G}_{i}\); and \({C}_{random}\) and \({L}_{random}\) are the CC and ASPL of an ER network of equivalent scale. The structural efficiency index S is greater than or equal to 1; the larger its value, the fewer edges are needed to achieve connections among cluster members, indicating a higher structural efficiency in the community.
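A rough sketch of Eq. (5) is given below. It makes several simplifying assumptions that are not taken from the text: the community sub-network is treated as undirected, unweighted, and connected; the ER reference values are approximated analytically (clustering equal to the edge density, path length equal to ln n / ln of the mean degree) instead of being measured on generated random networks; and the ring-lattice test graph is purely hypothetical.

```python
import math
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient; adj is {node: set(neighbors)}."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient taken as 0
        # Ordered pairs of neighbors that are themselves connected.
        linked = sum(1 for u in nbrs for v in nbrs if u != v and v in adj[u])
        total += linked / (k * (k - 1))
    return total / len(adj)


def average_shortest_path_length(adj):
    """Mean hop distance over all reachable ordered pairs, via breadth-first search."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def structural_efficiency(adj):
    """S of Eq. (5) with analytic ER approximations (an assumption of this sketch)."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    c_random = mean_degree / (n - 1)               # edge density of the ER graph
    l_random = math.log(n) / math.log(mean_degree)
    c_ratio = average_clustering(adj) / c_random
    l_ratio = average_shortest_path_length(adj) / l_random
    return c_ratio / l_ratio


# Hypothetical test graph: ring of 20 nodes, each linked to its 2 nearest neighbors per side.
n = 20
adj = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
s = structural_efficiency(adj)
print(average_clustering(adj), average_shortest_path_length(adj), s)
```

The ring lattice is far more clustered than a random graph of the same size while its paths are only moderately longer, so the ratio comes out above 1.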

Performance efficiency measurement

Performance efficiency denotes the extent to which a community realizes its developmental objectives in terms of the outcomes achieved. The primary objective of the global tourism network is to enhance the flow of tourists between various countries or regions. Consequently, a community with a substantial relative tourist volume among its members can be deemed to have high performance efficiency. Hence, this paper introduces performance efficiency P, calculated as follows:

$${P}_{{Gi}}=\mathop{\sum }\limits_{i\in {Gi},j\in {Gi}}^{I}{w}_{{ij}}/\mathop{\sum }\nolimits_{i\in {Gi}}^{I}{A}_{i}$$
(6)

where \({G}_{i}\subseteq G\), \({P}_{Gi}\) is the performance efficiency of \({G}_{i}\), nodes i and j are members of \({G}_{i}\), and I is the total number of nodes in \({G}_{i}\). \({w}_{ij}\) is the weight of the edge between i and j, and \({A}_{i}\) is the weighted sum of the edges incident to i, that is, \({A}_{i}={\sum }_{j}{w}_{ij}\). P ranges from 0 to 1; the higher the value of P, the higher the performance efficiency of the community.
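Equation (6) can be sketched as the share of members' flows that stay inside the community. The sketch follows the formula \({A}_{i}={\sum }_{j}{w}_{ij}\) literally, that is, it uses the outgoing weight of each member; the function name and the flow values are hypothetical.

```python
def performance_efficiency(w, members):
    """P of Eq. (6): internal flow divided by the members' total outgoing flow."""
    members = set(members)
    internal = 0.0
    total = 0.0
    for i in members:
        for j, weight in w.get(i, {}).items():
            total += weight          # contributes to A_i
            if j in members:
                internal += weight   # flow that stays inside the community
    return internal / total if total else 0.0


# Hypothetical directed flows w[i][j] from origin i to destination j.
w = {
    "a": {"b": 30, "c": 10, "x": 20},
    "b": {"a": 25, "x": 15},
    "c": {"a": 5, "y": 45},
    "x": {"a": 50},
}
p = performance_efficiency(w, {"a", "b", "c"})
print(p)
```

Here 70 of the 150 tourists leaving members a, b and c travel to another member, giving P of about 0.47.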

Functional efficiency measurement

Functional efficiency refers to the role of a community within the entire network, especially its impact on other parts. He et al. (2017) proposed a method for calculating the diffusion coefficient of a network community, which serves as a valuable reference for this study. Because the global tourism network is constituted by tourist flows, the efficiency with which a community facilitates the movement of tourists to other parts of the network can be indicative of its functional efficiency. Therefore, the calculation method for functional efficiency is as follows:

$${F}_{Gi}=\frac{N\sum _{i\in {G}_{i},\,j\notin {G}_{i}}{w}_{ij}}{I\left(N-I\right)\sum _{i\in G}{A}_{i}}$$
(7)

where \({F}_{Gi}\) is the functional efficiency of \({G}_{i}\). The sum of \({w}_{ij}\) over \(i\in {G}_{i},\,j\notin {G}_{i}\) is the weighted sum of the edges that connect nodes inside community \({G}_{i}\) with nodes outside it, and the sum of \({A}_{i}\) over \(i\in G\) is the weighted sum of all the edges in G. N is the number of nodes in G and I is the number of nodes in \({G}_{i}\); because the number of members differs between communities, these terms adjust for the scale of the community. F ranges over [0,1]. The larger the value of F, the higher the efficiency of the community’s impact on other parts of the network.
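Equation (7) can be sketched in the same way. The sketch reads the formula as a single fraction with N times the outgoing cross-community flow in the numerator and I(N − I) times the total network flow in the denominator, which is an interpretation of the printed expression; the function name and flow values are hypothetical.

```python
def functional_efficiency(w, members):
    """F of Eq. (7): scale-adjusted share of flow sent from the community to outsiders."""
    members = set(members)
    nodes = set(w) | {j for nbrs in w.values() for j in nbrs}
    n, size = len(nodes), len(members)          # N and I
    if size == 0 or size == n:
        return 0.0                              # no outside nodes to influence
    outgoing = sum(
        weight
        for i in members
        for j, weight in w.get(i, {}).items()
        if j not in members
    )
    total = sum(weight for nbrs in w.values() for weight in nbrs.values())
    return n * outgoing / (size * (n - size) * total) if total else 0.0


# Hypothetical directed flows w[i][j] from origin i to destination j.
w = {
    "a": {"b": 30, "c": 10, "x": 20},
    "b": {"a": 25, "x": 15},
    "c": {"a": 5, "y": 45},
    "x": {"a": 50},
}
f = functional_efficiency(w, {"a", "b", "c"})
print(f)
```

With N = 5, I = 3, an outgoing cross-community flow of 80 and a total flow of 200, the example gives F = 5 × 80 / (3 × 2 × 200) = 1/3.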

Mixed effects model to explore the influencing factors

In addition to analyzing communities within the global tourism network, this research utilizes a mixed-effects model, encompassing both random and fixed effects, to identify which factors influence community efficiencies. This model is particularly beneficial in scenarios where repeated measurements are made on identical statistical units (longitudinal study), or when measurements are conducted on clusters of related statistical units (Gomes 2022). An additional merit of this approach lies in its flexibility when handling missing values and uneven spacing of repeated measurements (Yang et al. 2014).

In our model, the three community efficiencies are the dependent variables, and the 11 factors (Table 1 in Section “Data source”) are the independent variables. The formula is as follows:

$$\begin{array}{l}{Y}_{{it}}={\beta }_{1t}{{GDP}}_{{it}}+{\beta }_{2t}{{pGDP}}_{{it}}+{\beta }_{3t}{{TP}}_{{it}}+{\beta }_{4t}{{UP}}_{{it}}+{\beta }_{5t}{{EGS}}_{{it}}\\\qquad\,\,\,+\,{\beta }_{6t}{{APC}}_{{it}}+{\beta }_{7t}{{WH}}_{{it}}+{\beta }_{8t}{{ICH}}_{{it}}+{\beta }_{9t}{{WC}}_{{it}}\\\qquad\,\,\,+\,{\beta }_{10t}{{PVPE}}_{{it}}+{\beta }_{11t}{{GP}}_{{it}}+{s}_{{it}}+{\varepsilon }_{{it}}\end{array}$$
(8)

where \({Y}_{{it}}\) is one of the three efficiencies of community i at time t; \({\beta }_{1t}\) to \({\beta }_{11t}\) are the coefficients of the selected independent variables in year t; each independent variable is the average of the members’ values within the same community; \({s}_{{it}}\) is the random effect of community i at time t; and \({\varepsilon }_{{it}}\) is the residual error of the regression model.

Results

Communities detection within global tourism network

Between 1995 and 2021, the global tourism network expanded rapidly, with the total number of international tourists worldwide increasing from 540 million in 1995 to 6.6 billion in 2022, a 12.2-fold increase. Over these 26 years, the scale of the global tourism network generally expanded, as indicated by the growing number of nodes and edges (Fig. 3). This suggests that as globalization progresses, more nations and regions are involved in cross-border tourism, and their connections are also increasing. By 2019, the global tourism network encompassed 234 countries or regions (nodes), forming 17,048 direct connections (edges), the largest scale recorded to date. However, due to the impact of COVID-19, the size of the global tourism network contracted sharply in 2020.

Fig. 3 The scale of the global tourism network in 1995–2021.

Results based on the community detection method described in the Methodology section reveal fluctuating trends in both modularity and the number of communities within the global tourism network over the 26 years (Fig. 4). From 1995 to 2001, there were 8 communities; from 2002 to 2004, 7 communities were identified; and from 2005 to 2021, 6 communities were consistently found, except for 2020, which saw 7 communities because of the COVID-19 pandemic. Modularity exhibits cyclical changes, reflecting fluctuations in community cohesion. Specifically, 1995–1996, 2001–2004, 2009–2013, and 2020 are the periods with relatively high modularity, implying stronger internal cohesion within the communities.

Fig. 4 Modularity and communities in 1995–2021.

The results of this study differ from the findings of Chung et al. (2020) and Seok et al. (2021). The main reason for these discrepancies lies in the scope of the data. Previous research focused exclusively on the inbound and outbound data from top-ranking tourist destination countries to construct a global tourism network. However, this study incorporates data from all available countries and regions with statistical indicators, making its results potentially more comprehensive and reliable.

The communities are labeled A, B, C, …, I in the order of identification. If the turnover of a community's members exceeds 30%, renaming the community is considered. The number of members in each community from 1995 to 2021 is summarized in Fig. 5. Communities C, D, F, and H consistently maintain a high number of members, whereas Communities G and E have fewer members and were merged into H and D in 2002 and 2004, respectively. In 2020, Community I was carved out of Community F, and in 2021, Community C disappeared, with the related countries and regions primarily integrating into Communities F and I.

Fig. 5

Number of members in each community during 1995–2021.

Community evolution within the global tourism network

We conducted a statistical analysis of the member count in each community (Fig. 5) and calculated the structural similarity between each community in the current year and the previous year, based on the similarity index J (Table 2). This further facilitated the determination of evolution stages in the global tourism network communities.

Table 2 The similarity index (J) of all communities during 1995–2021.

By setting \(M=\bar{J}_{t}-\sigma\), where \(\bar{J}_{t}\) is the mean similarity index in year t and σ is the standard deviation, we observed that the \(\bar{J}_{t}\) values for 1999, 2000, and 2001 were lower than M, which means these years were discrete within a 95% confidence interval. Taking the changes in the number of communities into account as well, we divided the evolution into four stages, 1995–1998, 1999–2003, 2004–2019, and 2020–2021, and named them stages I to IV.
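The thresholding step described above can be sketched as follows. This is a hedged illustration, not the study's implementation: it assumes that J is a Jaccard-type overlap between a community's member sets in consecutive years, that σ is the population standard deviation of the yearly means, and all function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def jaccard(members_prev, members_curr):
    """Jaccard overlap of two member sets (assumed form of the index J)."""
    a, b = set(members_prev), set(members_curr)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def low_similarity_years(mean_j_by_year):
    """Return M = mean - sigma and the years whose mean J falls below M."""
    values = list(mean_j_by_year.values())
    threshold = mean(values) - pstdev(values)  # population sigma is an assumption
    flagged = [
        year for year, j in sorted(mean_j_by_year.items()) if j < threshold
    ]
    return threshold, flagged


# Hypothetical yearly mean similarity values, for illustration only.
toy_mean_j = {1996: 0.90, 1997: 0.92, 1998: 0.88, 1999: 0.50, 2000: 0.91}
threshold, flagged = low_similarity_years(toy_mean_j)

# Overlap of two hypothetical member sets sharing two of four countries.
overlap = jaccard({"X", "Y", "Z"}, {"Y", "Z", "W"})
```

Years returned in `flagged` are candidates for stage boundaries; in the study these were then combined with the changes in the number of communities.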

Taking into account the location of countries and regions, we visualized the distribution of community members for the years 1995, 2002, 2004, 2010, and 2020 to represent community evolution across the four stages, as shown in Fig. 6. Community A predominantly consists of former Soviet Union countries such as Russia, Belarus, and Ukraine. Community B is centered on African Union countries, mainly central and southern African countries such as South Africa, Tanzania, and Zimbabwe. Middle Eastern countries such as Saudi Arabia and Qatar form Community C, while Pacific and Indian Ocean island nations and coastal countries in Oceania such as Australia and New Zealand constitute Community D.

Fig. 6

The patterns of communities in four stages.

Community E includes China, Hong Kong, Macau, Taiwan, and North Korea. Since 2004, likely as a result of China’s accession to the World Trade Organization, Community E has been merged into Community D. Community F represents Western European countries, especially those bordering the Mediterranean Sea, with some Northern European countries occasionally moving between Communities F and A, reflecting their role in bridging these two communities. Community G includes South American countries such as Brazil and Chile, and Community H is composed of North and Central American countries. In 2002, Community G merged into Community H, strengthening tourism ties across the Americas.

Community I emerged from Community F in response to the pandemic, comprising countries such as Germany, Poland, Italy, and Slovakia, highlighting the significant impact of the pandemic on European tourism organizations.

Additionally, fluctuations in tourist flows between communities reflect the relationships between them. The analysis of inter-community tourist flows (Fig. 7) shows that, during the first phase (1995–1998), the largest tourist flow was from Community F to Community A, averaging 59.513 million tourists, which made Community F the biggest sender in the global tourism network. The second-largest flow was from Community H to Community F, averaging 14.349 million tourists.

Fig. 7

The relationships between communities in 1995–2021.

During the second phase (1999–2003), the volume of tourists from Community A to Community F increased rapidly and then receded somewhat, but remained at a high level. In the third phase (2004–2019), the largest tourist flow was from Community A to Community F, averaging 47.062 million, followed by the flow from Community H to Community F, averaging 32.954 million. This indicates that the Mediterranean region is an important tourist destination for the Americas, Eastern Europe, Russia, and the Nordic countries, among others, and is the biggest receiver of international tourists.

Under the impact of the pandemic, the fourth phase (2020–2021) saw significant changes in international tourist flows. The main flow was from Community I to Community F, while the flows from Community A to Communities H and I also increased significantly. The total flow from Community A to Communities E, H, and I accounted for 49.86% of all flows from Community A.
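The inter-community flows and shares discussed above can be obtained by aggregating country-level origin–destination flows over community labels. The sketch below shows one way to do this under stated assumptions: the function names, country codes, and figures are hypothetical, and flows within a single community are excluded from the totals, which may not match the study's exact accounting.

```python
from collections import defaultdict


def community_flows(flows, community):
    """Aggregate origin-destination flows into inter-community totals.

    flows: iterable of (origin, destination, tourists) tuples.
    community: dict mapping each country to its community label.
    Flows whose origin and destination share a community are skipped.
    """
    totals = defaultdict(float)
    for origin, destination, tourists in flows:
        src, dst = community[origin], community[destination]
        if src != dst:
            totals[(src, dst)] += tourists
    return dict(totals)


def outflow_share(totals, source, targets):
    """Share of a community's outbound flow that goes to the target communities."""
    outbound = sum(v for (src, _), v in totals.items() if src == source)
    to_targets = sum(
        v for (src, dst), v in totals.items() if src == source and dst in targets
    )
    return to_targets / outbound if outbound else 0.0


# Hypothetical countries and flows (in millions), for illustration only.
toy_community = {"x1": "A", "x2": "A", "y1": "F", "z1": "H"}
toy_flows = [
    ("x1", "y1", 10.0),
    ("x2", "y1", 20.0),
    ("x1", "z1", 30.0),
    ("x1", "x2", 5.0),   # intra-community flow, ignored
]

totals = community_flows(toy_flows, toy_community)
share_to_h = outflow_share(totals, "A", {"H"})
```

Averaging such yearly totals over the years of a phase would give phase-level figures comparable to those reported for Fig. 7.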

Changes in community efficiencies

The statistical data and trends regarding the structural, performance, and functional efficiency of each community from 1995 to 2021 are presented in Table 3 and Fig. 8.

Table 3 Summary statistics of all communities’ efficiencies.

Fig. 8

Changes in communities’ efficiencies during 1995–2021.

The structural efficiency results show that Community C, with an average efficiency of 3.297, exhibits the highest structural efficiency, but with a large variance, indicating significant annual changes. Community E ranks next, while Community B has the largest variance. From an evolutionary perspective, Community B had the highest structural efficiency in stage I, reflecting efficient internal communication among African countries, but it declined rapidly, possibly in connection with the transformation of the former Organization of African Unity into the African Union. Community E was present for only nine years but consistently demonstrated high structural efficiency. In stages II and III, the structural efficiency of Community C surged, especially during 2001–2008, with an average value of 4.333, likely due to the rapid development of China’s inbound and outbound tourism and the integration of Community E into Community C. In stage II, Community D ranked second in structural efficiency, with an average value of 2.904. In stage III, Community B’s structural efficiency increased again, with an average value of 1.867, and it ranked first in stage IV. Across all four stages, the variances of Communities A, F, G, and I are relatively small, indicating relatively stable structural efficiency. Overall, the structural efficiency of each community shows a downward trend.

The performance efficiency results indicate that Community E has the highest performance efficiency, which might be associated with the very large international tourist flow between mainland China and Hong Kong. Conversely, Community A has relatively low performance efficiency. In terms of evolutionary trends, Communities D and F have larger standard deviations, 0.077 and 0.056 respectively, suggesting considerable changes in performance efficiency. Detailed analysis reveals that the merger of Community E into Community D led to a rapid increase in the latter's performance efficiency after 2004. The sudden decreases in performance efficiency for Community F in 2014, 2015, and 2020 may reflect a decline in its membership.

The functional efficiency results show that Community F has the highest functional efficiency, significantly surpassing the other communities, reflecting that mainland China, Hong Kong, Macau, and so forth are important sources of international tourists for other regions of the world. In stages III and IV, the merger of Community E into Community D also drove the increase in functional efficiency for Community D, followed by Community F, indicating that the region centered on Mediterranean countries is a crucial market in international tourism. Community B’s functional efficiency is the lowest, suggesting a smaller scale of transcontinental tourism activity in African countries. Except for Communities A and D, the standard deviations of the communities are below 0.005, with minor variations and an annual upward trend (except in stage IV), which aligns well with international tourism development trends.

Influencing factors of community efficiencies

To determine the factors that influence community efficiency, we used a mixed-effects model. Building on prior studies of tourist flows, potential influencing factors include economic level, population, transportation, attractions, and security environment. Given data availability and representativeness, we established a set of 11 independent variables (Table 1). First, multicollinearity between variables must be excluded as far as possible, so variance inflation factor (VIF) tests were conducted on all selected variables before constructing the model. A VIF value below 10 indicates that there is no serious multicollinearity between the variables. As shown in Table 4, the VIFs of variables UP and TP are higher than 10; thus, the variable UP was deleted. After this adjustment, all VIFs are below 10, which means there is no serious multicollinearity among the 10 remaining variables, and they can be used for model construction.

Table 4 The VIF of all variables.
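For reference, the standard definition of the VIF, and what the threshold of 10 implies, can be written as below. Here \(R_j^{2}\) denotes the coefficient of determination obtained by regressing the j-th independent variable on all the other independent variables; the symbol names are introduced for illustration and do not appear in the original tables.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}},
\qquad
\mathrm{VIF}_j < 10 \iff R_j^{2} < 0.9
```

In other words, a variable passes the test when less than 90% of its variance is explained by the remaining variables.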

Acknowledging the time-series nature of the data, we employed a mixed-effects model with year as the random-effect variable to account for temporal variation. The results of the mixed-effects models are shown in Tables 5–7.

Table 5 Estimation results for structural efficiency.
Table 6 Estimation results for performance efficiency.
Table 7 Estimation results for functional efficiency.
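One common way to write a model of this kind is sketched below, assuming a random intercept for each year. This specification is an assumption made for illustration: the notation is hypothetical, and the exact random-effects structure used for Tables 5–7 is not restated here. \(y_{c,t}\) is one of the three efficiencies of community c in year t, \(x_{k,c,t}\) is the k-th of the K = 10 retained variables, \(u_t\) is the year-level random effect, and \(\varepsilon_{c,t}\) is the residual.

```latex
y_{c,t} = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{k,c,t} + u_t + \varepsilon_{c,t},
\qquad
u_t \sim \mathcal{N}(0, \sigma_u^{2}),
\qquad
\varepsilon_{c,t} \sim \mathcal{N}(0, \sigma_\varepsilon^{2})
```

Under this form, the fixed-effect coefficients \(\beta_k\) correspond to the variable effects interpreted in the text, while \(u_t\) absorbs year-specific shocks shared by all communities.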

Firstly, geographical proximity (GP), air passengers carried (APC), and exports of goods and services (EGS) significantly influence community structural efficiency in the global tourism network at the 0.05 level, while political violence and protest events (PVPE) have a significant impact at the 0.1 level. Among these, the effects of geographical proximity and exports of goods and services are positive, while air passengers carried and political violence and protest events have negative effects. The coefficient of political violence and protest events is −6.132, the largest in absolute value among all factors, which means that political violence and protest events can greatly damage a community’s structural efficiency.

Secondly, air passengers carried (APC) and GDP significantly influence the performance efficiency of communities in the global tourism network. Notably, the effect of air passengers carried on performance efficiency is positive, the opposite of its effect on structural efficiency. In addition, GDP has a negative effect, suggesting that a higher average GDP among a community’s members may be unfavorable for the community’s performance efficiency.

Thirdly, total population (TP), air passengers carried (APC), and GDP influence community functional efficiency, with GDP significant at the 0.001 level. The effects of total population and air passengers carried are positive, whereas the effect of GDP is negative.

Conclusion and discussion

Conclusion

This study constructs global tourism networks based on international tourist statistics from 1995 to 2021 and analyzes these networks using complex network methods. The main conclusions are as follows.

Firstly, from 1995 to 2019, the scale of the global tourism network expanded continuously, but it contracted significantly due to the COVID-19 pandemic. The global network can be partitioned based on modularity calculations. The results show that there were 8 communities in the global tourism network from 1995 to 2001, 7 communities from 2002 to 2004, and 6 communities from 2005 to 2021 (excluding 2020). These findings partially differ from those of Chung et al. (2020) and Seok et al. (2021), possibly due to differences in data collection and calculation methods.

After coding each community, the similarity index was used to calculate the similarity of community structure between consecutive years, revealing a significantly lower similarity index in 1999, 2000, and 2001. Combining these results with community detection, the evolution of the global tourism network can be divided into four stages: 1995–1998, 1999–2003, 2004–2019, and 2020–2021. Mapping the communities in each stage shows that Community A is dominated by former Soviet Union countries; Community B by central and southern African countries; Community C by Middle Eastern Islamic countries; Community D by Oceanian countries (in some periods also including countries bordering the Indian Ocean and East and Southeast Asian countries); Community E by China and the Democratic People’s Republic of Korea, which merged into Community D in 2004; Community F by countries surrounding the Mediterranean and most European countries; Community G by South American countries, which merged into Community H in 2002; and Community H by most countries in the Americas. Community I emerged due to the pandemic, mainly consisting of Eastern European countries partitioned from Community F. Calculations of tourist flows between communities show that Community F is the main source and destination of tourists, with close connections between Communities A and F and between Communities H and F.

Efficiency calculations show that Community C has the highest structural efficiency, Community E has the highest performance efficiency, and Community F has the highest functional efficiency. From 1995 to 2021, the structural efficiencies of communities showed a slight decline, performance efficiencies remained stable overall but with strong fluctuations, and functional efficiencies showed an upward trend. This reflects the increasingly prominent interdependence between the individual communities and the network as a whole.

Previous studies indicate that many factors, including socio-cultural, geographical, international mobility, demographic, technological, health risk, psychological and lifestyle, and peace and security factors, can significantly affect international tourist flows (Gidebo 2021; Zeng and He 2019). Building on these studies, this study further examined the factors influencing community efficiencies using mixed-effects models. The average geographical proximity and exports of goods and services of members have significant positive effects on the structural efficiency of communities, while air passengers carried and political violence and protest events have negative effects. Meanwhile, air passengers carried have a positive effect on both the performance and functional efficiencies of communities. However, the average GDP of members has a negative effect on both performance and functional efficiencies. These results differ from previous studies on international tourist flows (from source to target), especially regarding per capita GDP, which has shown the opposite effect (Chung et al. 2020), indicating that individual characteristics act differently at the community level than at the individual level.

Implications and limitations

This research contributes both theoretically and practically to the field of international tourism. Theoretically, international tourism has long been a focal point for scholars, and with the recent introduction of network science methods, there is growing interest in deconstructing the global tourism network. However, most studies have focused on the macro level of the global tourism network or on individual-level characteristics, with less emphasis on the community level. This oversight may be due to two factors: a lack of methods for measuring the community level within networks, and limited awareness of the importance of communities. Real-world complex networks are dynamic, and the evolution of their communities reflects the evolution of the entire network. Therefore, the first theoretical innovation of this study is to investigate the global tourism network at the community scale, focusing on community evolution within the global tourism network and establishing a basic paradigm for network community research.

Secondly, through network structure analysis, we have developed indices to measure community similarity, as well as methods to calculate community structural efficiency, performance efficiency, and functional efficiency. These provide theoretical and methodological support for future research and help us understand the structure and evolution of the global tourism network. These indicators highlight the network’s changing characteristics while avoiding the uncertainty caused by changes in community members, a factor not previously addressed. The method reflects the efficiencies of communities at multiple levels, a perspective not explored in previous research, and aids a comprehensive understanding of community evolution in networks.

Thirdly, this paper analyzed factors influencing community structure and efficiency within the network, exploring the relationship between individual-level characteristics and community traits. By examining factors influencing community efficiencies in the global network, this paper found that GDP, air passengers carried, exports of goods and services, political violence and protest events, and geographical proximity significantly affect community efficiencies, each with varying effects. Previous studies have not explored the impacts of individual member characteristics on community efficiencies, representing another theoretical contribution of this study. Analyzing individual characteristics will also help to understand community features and the overall network structure.

Practically, at the community level, we have delineated the evolution of the global tourism network into four distinct phases, a first in empirical research on global tourism. We found that pivotal historical events such as the dissolution of the Soviet Union, China’s accession to the WTO, and the COVID-19 pandemic might serve as watersheds affecting the phase transitions within the global tourism network. This research describes the development pattern of and changes in the global tourism industry from the perspective of network community evolution, providing insights for studies on the history of global tourism. It also shows that major international events can affect the composition of communities within the global tourism network. Previous research has primarily focused on the impact of events on individual countries and regions. Our research therefore offers a new perspective for understanding the impact of major events at the global level.

Secondly, this research proposes a feasible method for evaluating community efficiencies. The analysis of community efficiencies can be applied to various tourism networks, helping to assess the rationality of community organization and to guide the development of tourism cooperation. This research also provides insights into the organization and management of global tourism. The findings on the factors influencing community efficiencies point to the need to consider economic strength, export capacity, political security, air transport capacity, and geographical relationships in the organization of international tourism clusters. Therefore, these factors should be taken into account when considering strengthening tourism connections with other countries or regions.

Additionally, the delineation of global tourism communities remains contentious, as evidenced by the differing methods of organizations such as WTTC and UNWTO. Our use of network research methods for community delineation offers practical guidance for international tourism cluster statistics. For instance, we found that North African countries have closer tourism connections with European and Middle Eastern countries, while East Asian countries have closer ties with Pacific Island countries and Oceania, patterns that cannot be delineated solely on the basis of continental divisions.

However, our study also has certain limitations. Firstly, the methods developed here, including the similarity index and the community efficiency measurements, require further empirical testing. Secondly, factors such as language, visa policies, and cultural differences among community members may also affect community structure and efficiencies; these variables could be examined further as data become available. Thirdly, our study used only the mixed-effects model, which, although suitable for our research application, provides no contrasting results from other methods, and different methods may yield different results. Moreover, changes in global tourism after COVID-19, beyond 2021, warrant further exploration.