Introduction

In the dynamic landscape of urban development, urban vitality is a cornerstone of sustainable growth and a key element in the construction of smart cities. Encompassing society, economy, and culture1, urban vitality is central to urban public spaces. It is not merely the hustle and bustle of human activities; it is a force that breathes life into cities. By enhancing the urban spatial environment, it raises residents’ quality of life and acts as a catalyst for social activities and interactions. It also attracts investment and high-end talent, supports intelligent urban construction, and strengthens a city’s regional competitiveness2,3,4,5. Existing research indicates that the urban built environment, including land-use conditions, development intensity, and service-facility density, plays a decisive role in urban development and is closely linked to urban vitality.

As the world’s largest developing country, China has witnessed an urban transformation on an unprecedented scale over the past three decades. Many cities have burgeoned into urban centers marked by high population density and intense development6,7. However, in recent years, the well-spring of urban vitality in central urban areas has faced a significant ebb. The depletion of new construction-land resources and the aging of existing urban infrastructure have set off a chain reaction. Traffic congestion has become a daily headache, infrastructure-maintenance costs have soared, the ecological environment has taken a hit, and economic vitality has waned8,9,10. Against this backdrop, unlocking the driving mechanisms of built-environment factors on urban vitality, breathing new life into urban spaces, fueling smart-city development, and charting a course for sustainable and inclusive urban growth have emerged as the burning questions at the forefront of current urban-planning research.

Measuring urban vitality is no simple feat, owing to its multi-dimensional nature. Early attempts at vitality measurement relied heavily on traditional data collection methods such as questionnaire-survey data11, telephone-interview data12, and manual-counting data13. These methods, while ensuring data accuracy, were like trying to capture a complex ecosystem with a limited-aperture lens. They propelled the quantitative study of urban vitality, but the small sample size and high acquisition costs were formidable barriers. Gathering enough data to measure urban vitality from all its diverse dimensions was a Herculean task.

The digital age has ushered in a new dawn for urban vitality measurement. The convergence of information and communication technology with large-sample, high-resolution spatio-temporal location big data has been a game-changer. These new technologies have not only improved measurement efficiency and accuracy but have also opened up possibilities for multi-dimensional measurement and characterization of vitality. Mobile-signaling data14, Internet Location-Based Service (LBS) location data15, and subway smart-card data16 have become invaluable tools for gauging population density. Meanwhile, identifying the urban spatial distribution of nighttime-light data17, Dianping data18, and Point of Interest (POI) data19 has emerged as a key to unlocking the economic attributes of urban vitality. In the information age, where our lives are increasingly intertwined with digital spaces, social-media data like Weibo check-ins are changing the way we evaluate urban social vitality, bridging the gap between our physical and virtual movements3,4.

The advent of multi-source big data, paired with the rapid evolution of spatial-analysis technology, has been a boon for researchers studying the relationship between urban vitality and the urban built environment. It has provided abundant data sources and a rich toolkit of quantitative methods, moving research in this field forward. Key areas of investigation include: whether, and to what degree, functional agglomeration and mixed land use enhance vitality19,20; the impact of urban public transportation, slow traffic, and other factors on urban vitality12; and the direct link between urban development intensity, as represented by plot ratio and building density, and vitality, along with the question of whether high-density urban-form construction might stifle vitality18,21,22.

Despite the extensive research efforts by scholars, the picture remains muddled. Some studies have painted high-density urban construction as a harbinger of scale effects, leading to functional concentration and a flourishing of human activities and interactions in space, thereby enhancing spatial vitality3,4,23. However, through cross-city comparisons, other scholars have uncovered that the relationship between high density and urban vitality is not so straightforward. Excessive urban density can create a suffocating living-space environment, putting the brakes on urban vitality24,25,26. Clearly, the same factors can have wildly different outcomes in different built environments. This calls for a deep-dive into heterogeneity, to uncover the precise conditions under which each environmental factor can stoke the fires of urban vitality. Such insights are invaluable for guiding refined urban design and management in the context of urban renewal.

In this study, we explore urban vitality using multi-source big data. We build a built-environment indicator system grounded in four key dimensions (location, diversity, density, and traffic) and blend various data sources to calculate the relevant indicators. By employing the Multiscale Geographically Weighted Regression (MGWR) model, we set out to unravel the spatial heterogeneity in how the built environment impacts urban vitality. This research offers three major contributions. First, by integrating multi-source big data, our findings can deepen the multi-dimensional understanding of urban vitality, shedding light on the heterogeneous nature of the built environment’s impact based on a detailed exploration of local-scale heterogeneity features. Second, introducing the MGWR model into the study of the relationship between urban vitality and the built environment sharpens the accuracy of our research conclusions, and comparing its fitting results with those of the Ordinary Least Squares (OLS) model and the Spatial Lag Model (SLM) further enriches our understanding of methodological applicability. Third, by clarifying the conditions under which the built environment can enhance urban vitality, this study can guide spatial remodeling and vitality-boosting efforts during urban renewal and construction. It provides both theoretical underpinnings and practical references for the precise transformation of smart-city construction and land use.

Data and method

Research area

Xiamen, one of the earliest-established special economic zones in China, holds a significant position as an important central city within the Guangdong-Fujian-Zhejiang coastal urban agglomeration as outlined in the "14th Five-Year Plan." It also serves as a port city and a popular coastal tourist destination. Xiamen Island, the central urban area of Xiamen, is characterized by intense urban activities, diverse activity types, and high urban vitality. In 2021, the urbanization rates in Huli District and Siming District on Xiamen Island comprehensively reached 100%. Xiamen Island features well-developed basic service facilities, diverse urban spatial construction, a mature urban road network system, and a vibrant urban spatial environment, presenting a rich and diverse urban built environment.

Therefore, we select Xiamen Island as a case study to explore the spatial heterogeneity of the influence of the built environment on urban vitality. Given that Gulangyu Islet is an independent island and has significant disparities in urban vitality and the built environment compared to Xiamen Island, it is excluded from the scope of this study (Fig. 1). Xiamen Island’s distinctive urban development trajectory and rich built-environment characteristics make it a suitable sample for analyzing the complex relationship between urban vitality and the built environment. Through the research on Xiamen Island, we expect to identify general patterns and provide references for the development of other cities.

Fig. 1

Location and land use of Xiamen Island. Note: The satellite images are from OpenStreetMap.

Data source and preprocessing

LBS big data

LBS big data, originating from mobile internet positioning services, meticulously records various events within mobile applications (Apps), such as user logins, searches, message exchanges, and push notifications27. This data is highly effective in measuring people’s behavioral characteristics in large-scale spatial environments. As a result, LBS big data is extensively utilized for the quantitative expression of urban vitality and the description of its spatial distribution characteristics. The LBS data employed in this study is sourced from Aurora Mobile’s anonymous geospatial data. This data is collected by providing SDK positioning development environments for mainstream Apps in China, enabling the accurate recording of mobile users’ spatial location information.

The data collection spanned two weeks from October 17 to 30, 2020, covering 10 working days and 4 rest days without holidays. During this period, the impact of the COVID-19 pandemic in Xiamen was gradually waning. The temperature ranged from 19.8 to 26.9 degrees Celsius, with clear or mostly cloudy weather and no rainfall. These favorable conditions allowed residents to freely engage in activities and daily travel within the city.

Because population vitality patterns differ between working days and rest days, the collected data were segmented and processed separately for each. The original data were preprocessed as follows. First, all LBS records within the Xiamen city area from October 17 to 30, 2020, were selected, amounting to 130,817,778 records from 6.02 million identified users. Second, the location of each user at every whole hour during the daytime (8:00 to 22:00) was determined for each day: the location information of each user ID was pushed back by one hour, and the record closest to that whole hour was regarded as the user’s location at that time point. The highest number of users identified in the Xiamen city area was 329,800 at 19:00 on working days and 309,200 at 18:00 on rest days. Third, the Xiamen Island data from 8:00 to 22:00 over the 14 days were summarized, and the multi-day average values of the whole-hour data were calculated separately for working days and rest days. Finally, the sum of these multi-day average values over the whole hours from 8:00 to 22:00 was used to calculate spatial vitality.
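The whole-hour assignment step can be illustrated with a minimal Python sketch. It rests on one reading of the description, namely that each whole hour looks back over the preceding 60 minutes and keeps the record nearest in time to that hour. The function name `hourly_snapshots`, the tuple layout, and the sample records are hypothetical and are not taken from the original processing pipeline.

```python
from datetime import datetime, timedelta

def hourly_snapshots(records, start_hour=8, end_hour=22):
    """Return {(user_id, whole_hour): (lon, lat)}.

    records: iterable of (user_id, timestamp, lon, lat) tuples.
    Assumed rule: a record is attributed to the next whole hour at or
    after its timestamp, and for each user and whole hour only the
    record closest in time to that hour is kept.
    """
    best = {}  # (user_id, whole_hour) -> (time gap, lon, lat)
    for user_id, ts, lon, lat in records:
        floor = ts.replace(minute=0, second=0, microsecond=0)
        target = floor if ts == floor else floor + timedelta(hours=1)
        if target.date() != ts.date():
            continue  # the next whole hour falls on the following day
        if not (start_hour <= target.hour <= end_hour):
            continue  # outside the 8:00 to 22:00 daytime window
        gap = target - ts
        key = (user_id, target)
        if key not in best or gap < best[key][0]:
            best[key] = (gap, lon, lat)
    return {key: (lon, lat) for key, (gap, lon, lat) in best.items()}

# Hypothetical sample records for two users on one day.
records = [
    ("u1", datetime(2020, 10, 19, 8, 10), 118.10, 24.48),
    ("u1", datetime(2020, 10, 19, 8, 55), 118.11, 24.49),
    ("u1", datetime(2020, 10, 19, 9, 0), 118.12, 24.50),
    ("u2", datetime(2020, 10, 19, 22, 30), 118.13, 24.47),
]
snapshots = hourly_snapshots(records)
```

Counting the snapshot locations that fall in each spatial unit, and then averaging those counts over working days and rest days, would give the multi-day hourly averages described above.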

Weibo check-in data

Social media check-in data can vividly describe individual spatial behavioral characteristics and, when combined with specific activity types, can effectively reflect individual lifestyles. In China, the primary source of social media check-in data is Sina Weibo. Sina Weibo, a social media platform similar to Twitter, has hundreds of millions of users in China and is highly popular. Its check-in data can mirror social hot topics and public opinion, often serving as a proxy indicator for urban vitality.

The method for obtaining Weibo check-in data is as follows: Using a Python web crawler program, Weibo check-in data from Xiamen from October 1, 2019, to October 31, 2020 (a total of 13 months) was scraped via the Sina API. The data included 75,131 user IDs and 329,132 records, with fields such as "username," "time," "check-in location," "longitude and latitude," and "Weibo content." First, invalid data, including records without location information, with a check-in location labeled as "none," or with longitude and latitude of "0,0," was removed. Second, since some data had a check-in location of "Xiamen" with longitude and latitude at "118.03394, 24.48405" (all located at a specific point in Haicang District), this data was considered invalid and removed. Subsequently, if the same user ID had more than two check-in records at the same location between adjacent whole hours on the same day, only the record closest to the next whole hour was retained as valid data. Finally, the remaining data was corrected for coordinate errors, resulting in 28,195 valid records on Xiamen Island.
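The filtering rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the crawler or cleaning code used in the study: the record layout (dicts with `user`, `time`, `place`, `lon`, `lat`), the function name `clean_checkins`, and the sample rows are hypothetical, and the coordinate-correction step is omitted.

```python
from collections import defaultdict
from datetime import datetime

# Generic "Xiamen" point quoted in the text (located in Haicang District).
DEFAULT_CITY_POINT = (118.03394, 24.48405)

def clean_checkins(rows, max_repeats=2):
    """Drop invalid check-ins, then thin repeated check-ins within an hour."""
    valid = []
    for r in rows:
        if r["place"] in (None, "", "none"):
            continue  # no usable check-in location
        if r["lon"] is None or r["lat"] is None:
            continue  # no coordinates
        if (r["lon"], r["lat"]) == (0, 0):
            continue  # placeholder coordinates
        if r["place"] == "Xiamen" and (r["lon"], r["lat"]) == DEFAULT_CITY_POINT:
            continue  # city-level default point, not a real position
        valid.append(r)

    # Group by user, place, day, and clock hour (between adjacent whole hours).
    groups = defaultdict(list)
    for r in valid:
        t = r["time"]
        groups[(r["user"], r["place"], t.date(), t.hour)].append(r)

    kept = []
    for group in groups.values():
        if len(group) > max_repeats:
            # Keep only the record closest to the next whole hour.
            kept.append(max(group, key=lambda r: r["time"]))
        else:
            kept.extend(group)
    return sorted(kept, key=lambda r: r["time"])

# Hypothetical rows covering each rule.
rows = [
    {"user": "u1", "time": datetime(2019, 10, 1, 10, 5), "place": "Zhongshan Road", "lon": 118.08, "lat": 24.45},
    {"user": "u1", "time": datetime(2019, 10, 1, 10, 20), "place": "Zhongshan Road", "lon": 118.08, "lat": 24.45},
    {"user": "u1", "time": datetime(2019, 10, 1, 10, 50), "place": "Zhongshan Road", "lon": 118.08, "lat": 24.45},
    {"user": "u2", "time": datetime(2019, 10, 1, 11, 0), "place": "none", "lon": 118.09, "lat": 24.46},
    {"user": "u3", "time": datetime(2019, 10, 1, 12, 0), "place": "Xiamen", "lon": 118.03394, "lat": 24.48405},
    {"user": "u4", "time": datetime(2019, 10, 1, 13, 0), "place": "Somewhere", "lon": 0.0, "lat": 0.0},
    {"user": "u5", "time": datetime(2019, 10, 1, 14, 10), "place": "Xiamen University", "lon": 118.10, "lat": 24.44},
]
cleaned = clean_checkins(rows)
```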

Dianping data

Commercial establishments typically choose densely populated areas for their locations to ensure their sustainable operations. Additionally, the spatial clustering of businesses, such as restaurants, is conducive to promoting various activities and facilitating interactions among people28. Therefore, regions with prosperous businesses are often areas with vibrant urban vitality. Dianping, an independent third-party consumer review software, enables merchants to publish product information and allows consumers to post reviews on their dining experiences. The evaluation information in Dianping data can effectively reflect the foot traffic of merchants and serves as an effective representation of urban vitality. We utilized the commercial establishment review data from Xiamen Island on the Dianping website in November 2021. The original data included information such as store ID, store name, number of reviews, and location coordinates, with a total of 113,935 data records. After removing duplicate data, a total of 100,172 data records from Xiamen Island were extracted.

Baidu heat map data

Baidu, a well-known Chinese search engine company with hundreds of millions of users in China, offers mobile application services such as Baidu Maps, Baidu Cloud Drive, and Baidu Search Engine. Baidu’s heat map data is derived from the spatial location information of active users within Baidu’s mobile application services. The spatial distribution of this data is in a grid form, with each data point having a spatial range of 91.63 m × 91.63 m. Each data point contains information such as location coordinates and heat values. The heat values of data points reflect the level of user activity in the area, although the values do not directly correspond to the number of users. Baidu uses different colors to render the heat values of data points, symbolizing the spatial intensity of user activity. Using a product application programming interface, we extracted cross-sectional data for Xiamen Island for one week in November 2021, with a sampling interval of 2 h and 12 samples per day, amounting to 84 sampling time points and 177,255 data records.

POI

POIs were crawled from the Amap Open Platform in March 2022. The original data included information such as point of interest names, categories, and location coordinates, covering 17 major categories, including catering, leisure, healthcare, and others, with a total of 117,253 data records. Based on the research scope and requirements, the data was trimmed by excluding natural features and administrative landmarks, leaving 116,788 data records.

Due to the overlapping functionalities and diverse categories of POI, which may impede specific quantitative research, we classified the POI into four major categories: life service facilities, leisure and entertainment facilities, business and office facilities, and public service facilities. Life service facilities include dining and life services; leisure and entertainment facilities encompass leisure activities, hotel accommodations, tourist attractions, and shopping consumption; business and office facilities cover companies, financial institutions, commercial residences, and government agencies; public service facilities include education, sports, healthcare, and cultural services.

Street view image

Street view image data is sourced from Baidu Street View Maps. First, street view capture points were generated at 40 m intervals along the central axis of roads on Xiamen Island, and the coordinates of each point were obtained. For short road segments, the midpoint of the central axis was selected. A total of 21,486 capture points were obtained. Second, using the API of the Baidu Maps Open Platform, panoramic photos were captured at each point. The street view images were captured in June 2021, with a resolution of 4096 × 1380 pixels and a camera angle covering a 360° panoramic view at each point. The captured images carry unique identifiers and latitude-longitude coordinates, with a total of 2768 streets and 21,486 valid images.

Since approximately one-third of the area in the lower part of the panoramic photos is occupied by the street view capture vehicle and the street ground, which may affect the recognition results, the lower one-third of each photo was uniformly removed. Subsequently, the DeepLab-V3+ semantic segmentation tool developed by Google was used to identify features in the captured street view images. The results included 5 major categories and 19 subcategories, covering factors such as buildings, roads, sky, plants, vehicles, and people. The results were then matched with the capture point data based on the unique ID of each image. Finally, the capture point data was spatially linked to the corresponding streets, and the average of the street view recognition results for all capture points on each street was calculated.
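The cropping and street-level averaging can be illustrated with a small Python sketch. It assumes that segmentation has already produced a per-pixel label map for each panorama, represented here as a list of rows of class names; the function names, the toy 6 x 4 label maps, and the street identifier `S1` are hypothetical, and classes absent from a point are assumed to count as a zero share.

```python
from collections import Counter, defaultdict

def crop_lower_third(label_rows):
    """Drop the bottom third of the rows (capture vehicle and ground)."""
    keep = len(label_rows) - len(label_rows) // 3
    return label_rows[:keep]

def class_shares(label_rows):
    """Share of pixels belonging to each class in a label map."""
    counts = Counter(label for row in label_rows for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def street_means(point_shares, point_to_street):
    """Average the class shares of all capture points on each street."""
    sums = defaultdict(lambda: defaultdict(float))
    n_points = Counter()
    for point_id, shares in point_shares.items():
        street = point_to_street[point_id]
        n_points[street] += 1
        for label, share in shares.items():
            sums[street][label] += share
    return {
        street: {label: total / n_points[street] for label, total in labels.items()}
        for street, labels in sums.items()
    }

# Two toy label maps, 6 rows by 4 columns; the last two rows are "road".
p1 = [["sky"] * 4, ["sky"] * 4, ["building"] * 4, ["plants"] * 4, ["road"] * 4, ["road"] * 4]
p2 = [["sky"] * 4, ["building"] * 4, ["building"] * 4, ["plants"] * 4, ["road"] * 4, ["road"] * 4]
point_shares = {pid: class_shares(crop_lower_third(rows)) for pid, rows in {"p1": p1, "p2": p2}.items()}
means = street_means(point_shares, {"p1": "S1", "p2": "S1"})
```

For a 1380-pixel-high panorama, the same rule removes the bottom 460 rows.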

Road network data

The road network data is sourced from the OpenStreetMap (OSM) open-source website, specifically the data from 2021. The data was processed by classifying and modifying the topology, dividing the road system into four levels: expressways, main roads, secondary roads, and local roads.

Building vector data

Building vector data is sourced from the OpenStreetMap (OSM) website for 2021, including information such as building area, floor height, etc. The data underwent topological checks and refinements based on actual conditions.

Traffic data

Traffic data is sourced from the Amap Open Platform for 2021, including data on subway stations, Bus Rapid Transit (BRT) stations, and bus stops on Xiamen Island. After processing duplicate data, 40 subway stations, 35 BRT stations, and 864 bus stops were obtained.

Resident data

Based on the addresses of permanent residents recorded by the Public Security Bureau in December 2019, the latitude and longitude of each resident’s address were obtained using the Baidu API interface. After processing through ArcGIS, the spatial distribution of the city’s permanent residents was obtained. According to the calculation results, the population of Xiamen City is 5.32549 million, and the population of Xiamen Island is 2.32207 million. This data is in basic agreement with the 7th National Population Census in 2020, which reported a population of 5.16397 million for Xiamen City and 2.11029 million for Xiamen Island. The advantage of this data is that it allows for the localization of the residential addresses of each resident, thereby obtaining the population quantity and density of each residential area.

Research methods

Division of spatial units

The division of spatial units adopts a regional grid approach, which involves converting the study area into numerous grids of uniform shape and size. Commonly used grid types include triangular, square, and hexagonal grids. Among them, the hexagonal grid is more suitable for spatial heterogeneity analysis due to its unique characteristics. Hexagonal grids expand in a six-neighborhood direction, with each grid connected to six adjacent units. The expansion distance from the central unit to the neighboring units remains consistent29,30. In contrast to the eight-neighborhood expansion of square grids, this six-neighborhood expansion ensures distance control while enabling multi-directional extension30. As a result, hexagonal grids not only achieve efficient space-filling in terms of coverage but also possess isotropic advantages in neighborhood expansion. Moreover, the hexagonal shape more realistically mimics the spatial structure of urban commercial and service industries31. Therefore, we utilize hexagonal grids as the unit cells and divide Xiamen Island into 2,553 hexagonal grids with a side length of 100 m. This division method provides a precise and systematic framework for our subsequent analysis, enabling us to capture the fine-grained spatial variations within the study area.
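A hexagonal tessellation of this kind can be generated as in the following sketch. The pointy-top orientation, the use of planar coordinates in metres, and the rectangular extent are assumptions made for illustration; clipping the grid to the island boundary, which a GIS tool would normally handle, is omitted.

```python
import math

def hex_centers(xmin, ymin, xmax, ymax, side=100.0):
    """Centres of pointy-top hexagons covering a rectangular extent."""
    dx = math.sqrt(3) * side   # spacing between centres within a row
    dy = 1.5 * side            # spacing between rows
    centers = []
    row = 0
    y = ymin
    while y <= ymax:
        # Odd rows are shifted by half a cell so the hexagons interlock.
        x = xmin + (dx / 2 if row % 2 else 0.0)
        while x <= xmax:
            centers.append((x, y))
            x += dx
        y += dy
        row += 1
    return centers

def hex_vertices(cx, cy, side=100.0):
    """Six vertices of the pointy-top hexagon centred at (cx, cy)."""
    return [
        (cx + side * math.cos(math.radians(60 * k + 30)),
         cy + side * math.sin(math.radians(60 * k + 30)))
        for k in range(6)
    ]

def hex_area(side=100.0):
    """Area of a regular hexagon with the given side length."""
    return 1.5 * math.sqrt(3) * side ** 2

centers = hex_centers(0.0, 0.0, 1000.0, 1000.0, side=100.0)
```

With a side length of 100 m, every interior cell has six neighbours whose centres all lie at the same distance (about 173.2 m), which is the isotropy property noted above.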

Indicator calculation

Dependent variable: urban vitality

Incorporating the multi-dimensional and complex nature of the urban vitality definition proposed by scholars such as Jacobs32,33,34, we define urban vitality as the intensity of diverse urban activities that occur when people interact with urban spaces based on their communication needs. Based on this definition, we select LBS location service data, Sina Weibo check-in data, and Dianping (similar to Yelp) data to comprehensively assess urban vitality. Different from previous studies that relied on a single source of vitality data, the data used in this study reflects urban vitality attributes from three distinct dimensions: spatial activities, social activities, and economic activities.

LBS location service data records the triggering information of various operations in users’ mobile apps and real-time user spatial locations, thus reflecting the spatial activity attributes of urban vitality. Social media check-in data captures the spatial coordinates of users when they check in on online social platforms, mirroring the social activity attributes that contribute to urban vitality. Additionally, customer review data from platforms like Dianping records the online reviews of various businesses, effectively representing the economic activity attributes that contribute to urban vitality. These datasets offer rich and valuable insights into the dynamics of city life, allowing us to analyze the patterns of social and economic engagement across different areas.

Firstly, kernel density analysis (after repeated verification, the bandwidth selections are as follows: LBS big data 800 m, Weibo check-in data 1000 m, Dianping data 800 m) is employed to generate continuous and smooth raster surfaces. Zonal statistics tools are then utilized to calculate the vitality values for each grid unit. Secondly, factor analysis is used to determine the weight of each vitality component and generate the comprehensive urban vitality. Factor analysis allows for the identification of underlying latent dimensions among the observed indicators, which enables us to construct a more theoretically meaningful urban vitality index based on the shared variance among the variables. Finally, the Baidu heat map data is used as validation data for the overall urban vitality. The heat values in the Baidu heat map data are relative numerical values calculated by Baidu based on the actual distribution of product users and may not accurately reflect the population in specific areas. However, they do reflect the comprehensive characteristics of urban vitality and are positively correlated with it. Therefore, this study conducts a correlation analysis between the heat values and the comprehensive urban vitality data to verify the accuracy of the urban vitality assessment framework. The Pearson correlation coefficient between the comprehensive urban vitality obtained through factor analysis and the Baidu heat map data is 0.612 and is significant at the 0.01 level, indicating that the comprehensive measurement of urban vitality using multiple data sources is generally consistent with the actual situation. This validation step adds a layer of reliability to our research findings.
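The weighting and validation steps can be sketched as follows. Note the substitution: this sketch takes weights from the first principal component of the correlation matrix as a simple stand-in for factor analysis, so it is not the exact procedure of the study. It assumes the kernel density and zonal statistics steps have already produced one value per grid unit for each source, and that the indicators are positively correlated with each other. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def composite_index(indicators):
    """Combine standardized indicators into one score per grid unit.

    indicators: (n_units, n_sources) array. Weights come from the leading
    eigenvector of the correlation matrix (a stand-in for factor analysis).
    """
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    loading = eigvecs[:, -1]                  # eigenvector of the largest one
    if loading.sum() < 0:                     # the sign is arbitrary; make it positive
        loading = -loading
    weights = loading / loading.sum()
    return z @ weights, weights

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins for the three vitality sources and the validation layer.
rng = np.random.default_rng(3)
latent = rng.normal(size=500)
indicators = np.column_stack(
    [latent + rng.normal(scale=0.5, size=500) for _ in range(3)]
)
heat = latent + rng.normal(scale=0.5, size=500)

scores, weights = composite_index(indicators)
r = pearson_r(scores, heat)
```

A clearly positive `r` on real data would play the role of the validation check described above; the value obtained here reflects only the synthetic inputs.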

Explanatory variables: built-environment factors

Measuring the built environment from the dimensions of location, diversity, density, and transportation can effectively characterize the environmental factors related to urban vitality11,12,14. We selected the most common indicators of these four dimensions and quantitatively studied the explanatory variables of the built environment. Given that our research focuses on the spatial heterogeneity of the influence of built-environment factors on urban vitality variables, each variable will be quantitatively calculated based on spatial grid units. The calculation method is shown in Table 1, following the same approach as descriptive statistics. This quantitative approach allows us to precisely analyze the relationship between the built environment and urban vitality at a fine-scale spatial level.

Table 1 Explanatory variable and explained variable.

The spatial distributions of urban vitality and built-environment indicators are detailed in Fig. 2, which visually presents the data patterns and relationships, facilitating a more intuitive understanding of the research results.

Fig. 2

Spatial distribution of urban vitality and built environment indicators.

Model construction

By introducing the Ordinary Least Squares (OLS) model and the Spatial Lag model (SLM), and comparing them with the Multiscale Geographically Weighted Regression (MGWR) model, we aim to verify the performance of local models in handling spatial heterogeneity issues.

Global regression model

The OLS model is founded on the least-squares method for parameter estimation. It stipulates the relationship between the explained variable and the explanatory variable, assuming that this relationship is spatially stable and invariant35. Taking urban vitality as the explained variable and built-environment factors as the explanatory variables, a fitting curve is constructed on a global scale based on the OLS model. The model is expressed as follows:

$${y}_{i}={\beta }_{0}+\sum_{j=1}^{k}{\beta }_{j}{x}_{ij}+{\varepsilon }_{i}$$
(1)

where \({y}_{i}\) represents the actual value of the explained variable of urban vitality in the \(i\)-th grid cell. \({\beta }_{0}\) represents the intercept. \(k\) represents the number of explanatory variables of the built environment. \({x}_{ij}\) represents the value of the \(j\)-th explanatory variable in the \(i\)-th grid cell. \({\beta }_{j}\) represents the regression coefficient describing the change of urban vitality \({y}_{i}\) when \({x}_{ij}\) changes by one unit, and \({\varepsilon }_{i}\) represents the random error term.
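As a minimal illustration of Eq. (1), the following Python sketch estimates one global set of coefficients by least squares using NumPy. The helper name `fit_ols` and the synthetic data are hypothetical; no values from the study are reproduced.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R squared)."""
    design = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta, r2

# Synthetic example: two explanatory variables with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta, r2 = fit_ols(X, y)
```

The single coefficient vector returned here applies to every grid cell, which is exactly the spatial stationarity assumption of the global model.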

Although the OLS model can demonstrate the correlation between explanatory variables and explained variables, it assumes that the relationship between the two remains spatially constant and does not consider the spatial dependence between variables. Consequently, the spatial relationship between variables cannot be adequately described36,37. In the context of urban studies, where the characteristics of different areas can vary significantly, this assumption may lead to inaccurate results. For example, in a city with distinct urban cores and suburban areas, the impact of a built-environment factor like road density on urban vitality may be different. The OLS model would overlook these differences, treating the entire city as a homogeneous entity. To address this limitation, we introduce the spatial lag model in the spatial autoregressive model, which integrates spatial interactions into the model and considers spatial weights and dependencies38. The SLM takes into account the spatial autocorrelation of the dependent variable. It assumes that the value of the dependent variable at a particular location is not only influenced by the explanatory variables at that location but also by the values of the dependent variable in neighboring locations. The SLM can be written as:

$$y=\lambda Wy+X\beta +\varepsilon$$
(2)

where urban vitality \(y\) represents an observation vector of \(n\times 1\). \(n\) represents the number of grid units. \(\lambda\) represents a spatial autoregressive parameter. \(W\) is a spatial weight matrix of \(n\times n\), and \(X\) is a data matrix of \(n\times k\), which includes \(k\) columns of built environment explanatory variables. \(\beta\) represents the corresponding regression coefficient, a regression coefficient vector of \(k\times 1\), and \(\varepsilon\) is a random error term. The model can be further written as follows:

$$y={\left(I-\lambda W\right)}^{-1}X\beta +{\left(I-\lambda W\right)}^{-1}\varepsilon$$
(3)

where \(I\) is the \(n\times n\) identity matrix.
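The equivalence between the structural form in Eq. (2) and the reduced form in Eq. (3) can be checked numerically. The sketch below uses a toy row-standardized weight matrix in which each unit has two neighbours on a ring; this matrix, the parameter values, and the function names are assumptions for illustration only and do not reflect the weights used in the study.

```python
import numpy as np

def row_standardized_ring(n):
    """Toy spatial weight matrix: each unit's neighbours are the two adjacent units on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def slm_reduced_form(W, X, beta, eps, lam):
    """Solve (I - lam * W) y = X beta + eps for y."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - lam * W, X @ beta + eps)

rng = np.random.default_rng(2)
n = 50
W = row_standardized_ring(n)
X = rng.normal(size=(n, 2))
beta = np.array([1.5, -0.8])
eps = rng.normal(scale=0.1, size=n)
lam = 0.4

y = slm_reduced_form(W, X, beta, eps, lam)
```

Solving the linear system avoids forming the matrix inverse explicitly, and the resulting `y` satisfies the structural equation with the spatially lagged term on the right-hand side.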

Local regression model

The Geographically Weighted Regression (GWR) model assumes a non-stationary relationship between explained variables and explanatory variables, which allows it to capture local spatial non-stationarity. In contrast to the OLS and SLM models, GWR acknowledges that the relationship between variables can change from one location to another. It estimates local regression coefficients, which allows for a more nuanced understanding of how explanatory variables influence the dependent variable in different parts of the study area. However, this model assigns a single bandwidth to all explanatory variables. This means that it treats all variables as if they have the same spatial scale of influence, and cannot account for the spatial interactions of variables at different scales. For instance, the impact of a local coffee shop (a small-scale variable) on urban vitality in its immediate vicinity may be different from the impact of a large shopping mall (a larger-scale variable) on a broader area, but GWR may not capture these differences effectively.
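The idea of local coefficients under a single shared bandwidth can be sketched with NumPy as follows. The fixed-distance bi-square kernel, the bandwidth value, and the synthetic data are illustrative assumptions, and the function names are hypothetical; the sketch performs a weighted least-squares fit at every location and does not include bandwidth selection.

```python
import numpy as np

def bisquare(dist, bandwidth):
    """Bi-square kernel: weight 1 at distance 0, falling to 0 at the bandwidth."""
    w = (1 - (dist / bandwidth) ** 2) ** 2
    return np.where(dist < bandwidth, w, 0.0)

def gwr_coefficients(coords, X, y, bandwidth):
    """Weighted least squares at every location; returns (n, k + 1) local coefficients."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = bisquare(dist, bandwidth)      # the same bandwidth for all variables
        XtW = design.T * w
        betas[i] = np.linalg.solve(XtW @ design, XtW @ y)
    return betas

# Synthetic 12 x 12 grid where the true slope grows from west to east.
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(12.0), np.arange(12.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = rng.normal(size=len(coords))
true_slope = 1.0 + coords[:, 0] / 11.0
y = 0.5 + true_slope * x + rng.normal(scale=0.05, size=len(coords))

betas = gwr_coefficients(coords, x[:, None], y, bandwidth=4.0)
west = coords[:, 0] < 6
```

A global model would return one slope for the whole grid, whereas the local estimates in `betas[:, 1]` follow the west-to-east gradient.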

To overcome the limitations of the GWR model, Fotheringham’s team proposed a multiscale geographically weighted regression (MGWR) model to handle explanatory variables at different spatial scales. This model enables the conditional relationship between explained and explanatory variables to vary across different spatial scales39. The MGWR model can effectively reduce parameter estimation bias, mitigate variable collinearity, and describe spatial heterogeneity more accurately40. In this study, the bi-square kernel function is selected as the spatial weight kernel function. The corrected Akaike Information Criterion (AICc) is chosen as the bandwidth optimization criterion, and the score of change of the additive terms (SOC-f) is adopted as the termination criterion for the iterative backfitting. The formula is presented as follows:

$${y}_{i}={\beta }_{bw0}\left({u}_{i},{v}_{i}\right)+\sum_{j=1}^{m}{\beta }_{bwj}\left({u}_{i},{v}_{i}\right){x}_{ij}+{\varepsilon }_{i}$$
(4)

where \({y}_{i}\) represents the vitality value of the \(i\)-th grid cell. \(\left({u}_{i},{v}_{i}\right)\) represents the centroid coordinates of the \(i\)-th grid cell. \({\beta }_{bw0}\left({u}_{i},{v}_{i}\right)\) represents the constant term. \({x}_{ij}\) represents the value of the \(j\)-th explanatory variable in the \(i\)-th grid cell. \(bwj\) in \({\beta }_{bwj}\) represents the bandwidth used to calibrate the \(j\)-th explanatory variable. \({\beta }_{bwj}\left({u}_{i},{v}_{i}\right)\) represents the regression coefficient corresponding to \({x}_{ij}\), and \({\varepsilon }_{i}\) represents the random error term. The graphical user interface (GUI) developed by Oshan et al.41 is used for the MGWR analysis. The GUI is available at https://sgsup.asu.edu/sparc/multiscale-gwr.
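To make the role of the variable-specific bandwidths in Eq. (4) concrete, the following sketch runs a simplified backfitting loop in which each term is re-estimated with its own bi-square bandwidth. It is not the algorithm of the MGWR GUI: the bandwidths are fixed by hand instead of being searched with AICc, distances are fixed rather than adaptive, and the stopping rule is an assumed form of a change-in-additive-terms score. All names and data are hypothetical.

```python
import numpy as np

def bisquare_weights(coords, bandwidth):
    """(n, n) bi-square weights between all pairs of locations."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (1 - (dist / bandwidth) ** 2) ** 2
    return np.where(dist < bandwidth, w, 0.0)

def mgwr_backfit(coords, X, y, bandwidths, tol=1e-6, max_iter=200):
    """Backfitting with one fixed bandwidth per column of X.

    X must already contain a column of ones if an intercept is wanted.
    Returns an (n, k) array of local coefficients.
    """
    n, k = X.shape
    weights = [bisquare_weights(coords, bw) for bw in bandwidths]
    betas = np.zeros((n, k))
    fitted = betas * X                      # additive terms f_j = beta_j * x_j
    for _ in range(max_iter):
        previous = fitted.copy()
        for j in range(k):
            # Partial residual: what the other terms leave unexplained.
            partial = y - fitted.sum(axis=1) + fitted[:, j]
            xj = X[:, j]
            # Local one-variable weighted least squares at every location.
            betas[:, j] = (weights[j] @ (xj * partial)) / (weights[j] @ (xj * xj))
            fitted[:, j] = betas[:, j] * xj
        # Assumed stopping score based on the change of the additive terms.
        change = ((fitted - previous) ** 2).sum() / n
        scale = (fitted.sum(axis=1) ** 2).sum()
        if np.sqrt(change / scale) < tol:
            break
    return betas

# Synthetic 15 x 15 grid with a spatially constant, noiseless relationship.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(15.0), np.arange(15.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = rng.normal(size=len(coords))
X = np.column_stack([np.ones(len(coords)), x])
y = 2.0 + 3.0 * x

betas = mgwr_backfit(coords, X, y, bandwidths=[6.0, 4.0])
residuals = y - (betas * X).sum(axis=1)
```

Here the intercept is smoothed over a wider neighbourhood than the slope, which is the kind of scale difference a single-bandwidth model cannot express.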

Model tests

It is crucial to conduct a series of tests on the MGWR model to ensure the reliability and accuracy of the model results. Specifically, we focus on examining the multicollinearity of variables, spatial autocorrelation, and local parameter estimation uncertainty within the MGWR model.

Multicollinearity test

Multicollinearity among variables can severely affect the stability and interpretability of the regression model. To address this issue, the variance inflation factor (VIF) is chosen to test for multicollinearity. The VIF measures the degree of linear correlation between an explanatory variable and the other explanatory variables in the model. A high VIF value for a particular variable indicates that it has a strong linear relationship with other variables, which may lead to distorted parameter estimates and unreliable model results. By calculating the VIF for each explanatory variable, we can identify variables with high collinearity. Conventionally, a variable whose VIF value exceeds 10 is considered to have serious multicollinearity problems42,43,44. Once such variables are detected, we take appropriate measures, such as removing the variable or combining it with other related variables. This process is essential to prevent distortion in model calculations and ensure that the regression coefficients accurately represent the relationship between the independent and dependent variables.
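The VIF screening described above can be sketched as follows. The sketch relies on the standard identity that the VIF of the \(j\)-th variable equals the \(j\)-th diagonal element of the inverse correlation matrix; the function names and the toy columns are hypothetical and are not taken from the study's data.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical columns: x1 and x2 are uncorrelated, x3 is nearly x1 + x2.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
x3 = [2.1, 0.1, -0.1, -1.9]

print(vif([x1, x2]))      # both values equal 1: no collinearity
print(vif([x1, x2, x3]))  # x3 is far above the threshold of 10
```

Under the threshold used in this study, a variable such as the hypothetical `x3` would be flagged for removal or combination with related variables.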

Spatial autocorrelation test

Spatial autocorrelation is a key characteristic in spatial data analysis, especially when studying the relationship between built-environment factors and urban vitality. The Moran’s I index is employed to measure spatial autocorrelation. Its value generally ranges from -1 to 1. A value between 0 and 1 indicates positive spatial autocorrelation: high values of the variable are clustered with other high values, and low values with other low values. For example, in the context of urban vitality, areas with high population density may be surrounded by other areas with high population density, suggesting positive spatial autocorrelation in the population density distribution. A value between -1 and 0 indicates negative spatial autocorrelation: high values tend to be adjacent to low values. The closer the absolute value is to 1, the stronger the spatial autocorrelation. When the value is close to 0, the variable shows a spatially random distribution. The formula for the Moran’s I index is as follows:

$$I=\frac{{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}{w}_{ij}({x}_{i}-\overline{x })({x}_{j}-\overline{x })}{{S}^{2}{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}{w}_{ij}}$$
(5)

where \(n\) represents the number of urban vitality grid units, \({x}_{i}\) and \({x}_{j}\) represent the values of the variable (here, urban vitality) in grid units \(i\) and \(j\), and \(\overline{x }\) represents their mean. \({S}^{2}=\frac{{\sum }_{i=1}^{n}{\left({x}_{i}-\overline{x }\right)}^{2}}{n}\) represents the sample variance, \({w}_{ij}\) represents the \((i,j)\) element of the urban vitality spatial weight matrix, and \({\sum }_{i=1}^{n}{\sum }_{j=1}^{n}{w}_{ij}\) represents the sum of all spatial weights.
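Equation (5) can be checked on a small example. The sketch below is a direct transcription of the formula; the four-cell layout, the binary contiguity weights, and the values are hypothetical and serve only to show the two extremes of clustering and alternation.

```python
def morans_i(values, weights):
    """Global Moran's I following Eq. (5); weights is an n x n matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(d * d for d in dev) / n
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return num / (s2 * w_sum)

# Hypothetical layout: four grid cells in a row, neighbours share an edge.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # high values always border low values

print(morans_i(clustered, W))    # positive (1/3 for this layout)
print(morans_i(alternating, W))  # -1.0
```

The clustered arrangement yields a positive index and the alternating arrangement yields the minimum value, matching the interpretation given for the two sign ranges.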

Local parameter estimation uncertainty

Since the MGWR model estimates local regression coefficients at different spatial locations, the uncertainty of these estimates can affect the reliability of the model’s conclusions. To assess this uncertainty, a t-test is used to eliminate local parameter estimates that are not statistically different from 0. According to the conventional t-test, an estimate is considered different from 0 at the 95% confidence level (1 − α) when the absolute value of its t-statistic exceeds 1.96. However, because multiple hypothesis tests in MGWR inflate the number of false positives, it is advisable to use more conservative α values and corresponding critical t-values to maintain an overall 95% confidence level45. A revised hypothesis testing framework has been proposed for the local parameter estimates of the MGWR model, in which an adjusted α is calculated for the j-th variable and the resulting critical t-value is compared with the t-value of that variable in the i-th grid cell41,45. The adjusted significance level is calculated as follows:

$${\alpha }_{j}=\frac{\xi }{EN{P}_{j}}$$
(6)

where \(\xi\) represents the desired overall significance level (for example, 0.05), \(EN{P}_{j}\) represents the effective number of parameters of the \(j\)-th model term, and \({\alpha }_{j}\) denotes the adjusted significance level used to calculate the critical \(t\)-value of a specific covariate.
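As a worked illustration of Eq. (6), the sketch below computes the adjusted significance level and a corresponding two-sided critical value. Two assumptions are made: the overall level \(\xi\) is taken as 0.05, and the critical value uses a normal approximation to the t-distribution, which is reasonable only when the number of grid cells is large. The \(ENP_j\) values are hypothetical.

```python
from statistics import NormalDist

def adjusted_alpha(xi, enp_j):
    """Eq. (6): covariate-specific significance level."""
    return xi / enp_j

def critical_value(alpha):
    """Two-sided critical value, normal approximation to the t-distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

xi = 0.05  # assumed overall significance level

# ENP_j = 1 corresponds to a global term and recovers the usual 1.96.
print(round(critical_value(adjusted_alpha(xi, 1.0)), 2))   # 1.96

# A more local term (hypothetical ENP_j = 10) needs a stricter threshold.
print(round(critical_value(adjusted_alpha(xi, 10.0)), 2))  # 2.81
```

The more locally a coefficient varies, the larger its effective number of parameters and the higher the bar a local estimate must clear to be retained.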

The framework of this study is shown in Fig. 3.

Fig. 3 The framework of this study.

Results

Validity test results

The built-environment data are standardized to ensure uniform data dimensions, so that the coefficients of the built-environment variables can be compared. Multicollinearity tests are conducted on the built-environment explanatory variables, and all variance inflation factor (VIF) values are less than 10, indicating no serious multicollinearity. For the spatial distribution pattern of urban vitality, the Moran’s I value is 0.945 and the standardized test statistic Z(I) is 81.545, which is statistically significant (P < 0.001). The proximity of Moran’s I to 1 indicates strong positive spatial autocorrelation and clustering of urban vitality, making it suitable for constructing the MGWR model.

When comparing the goodness-of-fit of the models, the adjusted R-squared value of the OLS model is 0.641, indicating a relatively poor fit. In contrast, the SLM model has an adjusted R-squared of 0.986, and the MGWR model has an adjusted R-squared of 0.913 (Table 2). These higher values for the SLM and MGWR models suggest that they can explain a larger proportion of the variance in the data, outperforming the OLS model in terms of fit.

Table 2 Comparison of OLS, SLM and MGWR model parameters.

Furthermore, we used Moran’s I to analyze the spatial autocorrelation of the residuals of the three models (Fig. 4). The Moran’s I values for the residuals of the SLM and MGWR models are lower than that of the OLS model. A lower Moran’s I for the residuals implies that the residuals of these two models are distributed more randomly in space, with less residual clustering. This indicates that the SLM and MGWR models are effective in handling the spatial non-stationarity of the variables. Among them, the MGWR model, which constructs a regression model at the local scale and generates separate coefficient values for each spatial location, is particularly conducive to analyzing the spatial heterogeneity of the variables. This feature of the MGWR model enables a more detailed and accurate exploration of how built-environment factors influence urban vitality across different spatial areas.

Fig. 4 Spatial distribution of residuals of OLS, SLM and MGWR models.

In summary, the validity test results demonstrate the superiority of the SLM and MGWR models over the OLS model in handling spatial data related to urban vitality and built-environment factors, especially highlighting the MGWR model’s ability to capture spatial heterogeneity.

Before the MGWR model results are analyzed, a t-test is carried out to screen them. This process divides the local coefficients into significant and non-significant values, and the non-significant values are removed. As depicted in Fig. 5, before the t-test, multiple types of built-environment factors exhibit both positive and negative impacts on urban vitality. The presence of a large number of bidirectional influencing factors contradicts previous research conclusions, which indirectly signals the existence of unreasonable values within the unscreened results. After the t-test, the built-environment factors display a more consistent direction of influence. This outcome suggests that the retained coefficients reflect the direction of the influence of built-environment factors on urban vitality more clearly, rendering them suitable for the subsequent analysis of spatial heterogeneity characteristics.
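The screening step can be sketched as follows for one built-environment factor. The coefficient and t-value lists and the critical value are hypothetical; in practice the critical value would be the adjusted one obtained from Eq. (6).

```python
def screen_coefficients(betas, t_values, t_crit):
    """Keep local coefficients with |t| > t_crit; others become None."""
    return [b if abs(t) > t_crit else None
            for b, t in zip(betas, t_values)]

def sign_shares(betas):
    """Shares of positive and negative values among retained coefficients."""
    kept = [b for b in betas if b is not None]
    if not kept:
        return 0.0, 0.0
    pos = sum(1 for b in kept if b > 0) / len(kept)
    neg = sum(1 for b in kept if b < 0) / len(kept)
    return pos, neg

# Hypothetical local estimates for one factor across five grid cells.
betas = [0.4, -0.1, 0.3, -0.05, 0.2]
t_values = [3.1, -1.2, 2.9, -0.8, 2.5]

before = sign_shares(betas)
after = sign_shares(screen_coefficients(betas, t_values, t_crit=2.0))
print(before, after)
```

In this toy case the negative coefficients are all non-significant, so the factor shows mixed signs before screening and a single direction afterwards, which mirrors the pattern reported for Fig. 5.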

Fig. 5 The positive and negative ratio of built environment factor coefficients before and after the t-test.

The MGWR model takes into account the conditional relationship between urban vitality and urban built-environment factors at different spatial scales. This characteristic enables a more accurate representation of the spatial non-stationarity of their relationship. Considering the results of the validity tests, the processing of the model results, and the unique capabilities of the MGWR model in handling spatial heterogeneity, this study ultimately adopts the MGWR model to explore the spatial heterogeneity of the influence of urban built environment on urban vitality. This choice ensures that our analysis can comprehensively and precisely capture the complex relationships between built-environment factors and urban vitality across various spatial contexts.

Heterogeneity analysis of the influence of built environment on urban vitality

The higher the proportion of significant coefficient values for each factor of the built environment is, the more correlated the variable is with urban vitality across multiple grid units. In Fig. 5a, except for urban road density and average number of floors, all other factors contain both significant and non-significant coefficient values, indicating that the correlation between most built environment factors and urban vitality varies across different spatial locations. This variability highlights the importance of considering spatial heterogeneity when studying the relationship between the built environment and urban vitality.

Compared with the proportion of significant coefficient values reflecting the correlation between each factor and urban vitality, the magnitude of the coefficients of each factor indicates the strength of the influence of the built environment factors on urban vitality. In Fig. 5b, the distance to the urban commercial center has the strongest impact on urban vitality, followed by the distance to subway and BRT stations and the distance to bus stops. This ranking of influence strength provides insights into which built-environment factors have a more pronounced effect on urban vitality, guiding urban planners and policymakers to prioritize relevant factors in urban development.

Building on this, by utilizing the spatial distribution of the coefficient values for each variable, a multidimensional exploration of the built environment factors is conducted in Fig. 6 to clarify the spatial heterogeneity characteristics of the influence of each built environment factor on urban vitality. For example, in areas closer to the urban commercial center, the positive impact on urban vitality may be more significant, while in remote areas, the same factor may have a different or even negative impact. This kind of spatial-specific analysis can help us understand why certain areas have high or low urban vitality and how to optimize the built environment to enhance urban vitality in different regions. Through this in-depth analysis, we can better formulate targeted urban development strategies to meet the diverse needs of different spatial areas.

Fig. 6 The proportion of significant coefficients of built environment factors.

Location

The low coefficient values of the location indicator are spatially coincident with the city’s commercial center, indicating that the closer a grid unit is to the city’s commercial center, the stronger the influence of location on urban vitality (Fig. 7a). Wu et al.46 proposed the “suction effect” of street vitality related to the commercial center and urban vitality analysis on Xiamen Island, and the results of this study confirm this conclusion. In grid units close to the city’s commercial center, the locational advantage effectively enhances the clustering effect on urban vitality. Additionally, as shown in Fig. 5b, the distance to the city’s commercial center has the most significant influence on urban vitality among all built environment factors. The city’s commercial center influences the vitality pattern of surrounding areas and guides the spatial aggregation of the population. Therefore, its spatial location should be carefully considered in urban planning and layout. For instance, in the central business district of Xiamen Island, the high-density gathering of commercial activities, office buildings, and entertainment facilities, all attributed to its prime location, attracts a large number of people during both working and leisure hours. This not only boosts economic activities but also enriches social and cultural interactions, significantly enhancing urban vitality in the area. In the central business district, the diversity of functions and activity types is a crucial factor in stimulating vitality. Such real-world examples further illustrate the crucial role of location, especially the proximity to the commercial center, in determining urban vitality.

Fig. 7 Spatial distribution of coefficients of built environment indicators.

Diversity

The influencing factors of diversity comprise functional mix, public service facilities, leisure and entertainment facilities, business and office facilities, and life service facilities. In Fig. 6b, the significant coefficient values of functional mix on Xiamen Island are mainly concentrated in areas such as parks, university campuses, and central business districts, where urban functional mix plays a stimulating role in urban vitality. By comparing Fig. 4 and Fig. 7b, it is apparent that the common feature of areas like the central business district is the high functional mix along with high densities of various types of infrastructure, forming a mixed overlay based on a certain facility density. This indicates that the stimulating effect of functional mix on urban vitality on Xiamen Island is conditioned by the presence of diverse facilities with high densities. The high-density distribution of facilities represents economies of scale, leading to functional concentration and aggregation of people. Meanwhile, the diversified distribution of facilities provides a richer variety of activities for people, contributing to enhancing the urban vitality of the respective areas.

The significant coefficient values of most facility densities show a consistent pattern with facility distribution. The significant coefficient values of public service facilities are mainly distributed in concentrated areas such as parks and schools (Fig. 7c); significant coefficient values of leisure and entertainment facilities are located in surrounding areas with cultural and sports functions as the core (Fig. 7d); areas where business and office facilities have a significant stimulating effect on urban vitality correspond to zones with business and office functions as the core (Fig. 7e). This indicates that the concentration of dominant functions in an area leads to economies of scale, thereby stimulating urban vitality.

However, the spatial distribution of coefficients for life service facilities does not completely conform to the aforementioned patterns (Fig. 7f). In the northern part of Xiamen Island, where facilities are concentrated and community development is mature, the density of life service facilities is not significant. On the contrary, significant units are distributed in the southwestern part of Xiamen Island, where there are fewer residential communities, and high coefficient values are found near Dongping Mountain Park. The main reason is that the construction of life service facilities in the northern part of Xiamen Island is already well-developed and spatially rational, resulting in minimal long-distance, directional spatial movements for basic life needs and limited cross-regional population agglomeration. In contrast, the southwestern part of Xiamen Island is a tourist area, including Xiamen University and Xiamen Botanical Garden. The service recipients of life service facilities here are not limited to local residents but also include a wider range, and even external tourists to some extent, enhancing the stimulating effect of life service facilities on urban vitality in the area. It is obvious that a concentrated distribution of service facilities does not necessarily lead to an increase in urban vitality. The stimulating effect of concentrated distribution on urban vitality is subject to specific conditions and requires a comprehensive consideration of factors such as the current construction status and land use.

Density

According to the data from the Seventh Population Census, the population density on Xiamen Island is approximately 13,400 people per square kilometer. Xiamen Island features a high construction density and crowded living spaces, making it a typical high-density urban environment. Previous studies have demonstrated that there is a threshold effect of population density on urban vitality. When the population density exceeds a certain value, the urban living environment becomes congested, leading to a decline in residents’ quality of life. At this point, the positive influence of population density on urban vitality begins to weaken and may even turn into a negative impact2,35. From Fig. 7g, it can be observed that the stimulating effect of permanent population density on urban vitality is not significant in most grid units on Xiamen Island, and in a few grid units, it shows a negative impact on urban vitality. The negative impact of permanent population density on urban vitality implies that the population density on Xiamen Island has reached a threshold, and issues such as environmental congestion and decreased quality of life due to high-density population have emerged in some areas. According to the calculations, when the permanent population density on Xiamen Island exceeds 3,917.43 people per square kilometer, the coefficient turns negative.

The influence of floor area ratio on urban vitality is only apparent in some areas (Fig. 7h). This might be due to the high level of development on Xiamen Island, where the negative effects of a high floor area ratio, such as crowded living spaces and low environmental quality, counteract its positive effects on urban vitality. A comparative analysis of two typical areas, Area A, a core business district with high functional mix and a dense distribution of various facilities, and Area B, a high-density residential area with comprehensive facilities such as schools, shopping malls, and parks, reveals that the floor area ratio of Area B is significantly lower than that of Area A. Yet, both areas exhibit a stimulating effect on urban vitality. This indicates that there is no absolute relationship between floor area ratio and its positive influence on urban vitality, and that a diverse and high-density distribution of urban functions is a basic requirement for enhancing the positive influence of floor area ratio on urban vitality. When the floor area ratios are similar, the same level of diversity can also increase vitality.

Compared with floor area ratio, building density has a higher proportion of significant coefficient values and shows a positive correlation with urban vitality over a larger range (Fig. 7i). The higher significance of building density is due to the fact that building density reflects the three-dimensional spatial interface of buildings. Higher building density implies more street interfaces, providing pedestrians with more spatial experiences and facilitating pedestrian flow. The significant units are mainly distributed in the southern part of Xiamen Island, showing a gradual increasing trend from north to south at the spatial level. The high building density in the southern region effectively stimulates urban vitality, leading to a positive influence on urban vitality. To explore this spatial difference, two high-density and similar spatial texture urban villages, C and D, are compared. Despite many similarities in construction intensity and form between the two urban villages, their building density coefficients show contrasting significance. The main reason is that while Village C is a residential area with a single function, Village D has a more diverse function mix and organized structure. Under the premise of functional mix and organization, high density contributes to stimulating urban vitality, which is consistent with the analysis results of the floor area ratio factor.

Transportation

The city road density on Xiamen Island shows a significant positive correlation with urban vitality across the entire island. The coefficient values gradually decrease from north to south, and higher values appear in the densely distributed city road areas in the northern region of Xiamen Island (Fig. 7j). This demonstrates that city road traffic has a universal stimulating effect on urban vitality. Accessible and convenient city road traffic enhances the positive influence on urban vitality. However, compared to other factors, the coefficient values for city road density are relatively small, indicating that the enhancing effect of city road traffic on urban vitality is weaker than that of other factors. Additionally, the range of variation in its coefficient values is minimal, with the difference between the highest and lowest values being approximately 0.006. This means that even more accessible city road traffic only has a minor enhancing effect on urban vitality. In conclusion, while improved city road traffic can enhance urban vitality, its influence is limited. Urban planning should not overly rely on road traffic planning to enhance urban vitality.

The significant coefficients for the distance to subway and BRT stations are mainly distributed in the eastern business district and the western and southern peripheral areas of the island (Fig. 7k). The choice of transportation mode by residents is closely related to the purpose of travel, and different types of transportation are often associated with specific travel purposes. Subways and BRT systems are known for their fast, punctual, and efficient travel characteristics and are often linked to long-distance travel. Therefore, the areas where the distance to subway and BRT stations significantly correlates with urban vitality are far from residential areas and represent long-distance destinations in residents’ daily travel. Moreover, the high coefficient values in the eastern business district indicate that the stimulating effect of subways and BRT systems on urban vitality in this area is further enhanced. This is mainly due to the presence of numerous companies and enterprises in the region, with a large number of commuters using subway and BRT transportation, leading to the spatial aggregation of people. In areas with more diverse functions, being closer to subway and BRT stations can more effectively stimulate vitality.

The proportion of significant coefficients for the distance to bus stops stands at merely 13.87%, and these are primarily clustered around Xiamen Railway Station (as shown in Fig. 7l). In the vast majority of units, bus stops exhibit no significant correlation with urban vitality. This can be attributed to the fact that traditional public transportation, when compared to subways and BRT systems, has inherent limitations. For instance, its service range is restricted, and its passenger capacity is relatively low. As a result, it is more suitable for short-distance trips and struggles to efficiently handle long-distance or cross-regional passenger transport. Consequently, its stimulating effect on urban vitality is rather limited.

In contrast, the area around the railway station witnesses a high demand for transfers, with a wide variety of complex transfer destinations. Relying solely on subway and BRT transportation is incapable of fully handling the passenger flow from the railway station. Thus, the flexible and diverse routes of traditional public transportation come into play. They attract passenger flows, contributing to the formation of population aggregation in the surrounding area.

Regarding sidewalk coverage, all significant coefficients are positive, and they are mainly distributed in the areas around gardens and botanical gardens (Fig. 7m). In residential areas, the regions with high-density and high-sidewalk coverage are mainly located in the northern part of the island. Nevertheless, within this scope, sidewalk coverage does not demonstrate a significant correlation with urban vitality. This clearly indicates that in the residential areas on Xiamen Island, sidewalk coverage does not directly boost urban vitality.

Discussion

Urban vitality emerges as the result of the comprehensive interplay of built-environment factors. In the face of diverse built-environment conditions, it is crucial to effectively identify the applicability of different factors. This identification helps to clarify the influencing factors of urban vitality and its spatial heterogeneity. By doing so, we can offer scientific and precise spatial resource allocation plans for the development of smart cities, thus elevating the level of urban sustainable development. In the era of high-quality development, the concentrated and mixed distribution of diverse urban functions must be coordinated with land-use conditions. This coordination can fully exploit the leading functions of regions, enhance urban vitality, and prevent the disorderly arrangement of various functions.

Most related studies have typically relied on linear regression models (e.g.,3,4,11,12) or global spatial econometric models (e.g.,46) to determine whether built-environment indicators have a significant impact on urban vitality. However, the physical environment is complex. Global linear relationships often struggle to explain certain regional and detailed heterogeneous influence characteristics. In this study, we utilized a multiscale geographically weighted regression model and discovered that built-environment indicators with a significant overall impact may not necessarily have the same effect locally. Besides urban road density, which is related to overall vitality, the impact of other built-environment indicators is spatially heterogeneous. For example, while Ye et al.18 posited that density can promote urban vitality, our findings suggest that excessive population density and urban development intensity do not necessarily lead to an improvement in urban vitality. Although some studies have contended that diversity can enhance urban vitality18,21,22, we found that diversity not only boosts urban vitality but also serves as a prerequisite for other indicators, such as density and traffic, to promote urban vitality.

Nevertheless, this study has several limitations. First, the heterogeneity characteristics of the built environment’s impact on urban vitality encompass not only spatial differences but also temporal changes. Although the cross-sectional nature of some data and the relatively short cycle of LBS data may not fully capture the evolutionary nature of these relationships, the data contain massive records that can relatively truthfully reflect the actual conditions of urban vitality and built environments, thereby mitigating the impact of temporal instability on calculation results. In follow-up research, we will accumulate longer-term datasets to validate the accuracy of the research findings. By incorporating temporal dynamics, we can reveal patterns and relationships that may be overlooked in static cross-sectional analyses, providing a more comprehensive understanding of the complex interactions between urban environments and their vitality47,48. Second, this study was based on a case study of Xiamen Island. Further research is required to expand the scope and sample size to enhance the generalizability and universality of the research findings.

Conclusion

This study harnessed multi-source big-data analysis to explore the spatial-heterogeneity characteristics of the built environment’s influence on urban vitality. By integrating LBS big data, Weibo check-in data, and Dianping data, which represent spatial activities, social activities, and economic activities respectively, we constructed a multidimensional urban-vitality index reflecting comprehensive attributes. Built-environment indicators were developed from the aspects of location, diversity, density, and transportation and measured using integrated multi-source heterogeneous spatiotemporal big data. The MGWR model was employed to investigate the spatial-heterogeneity characteristics of the influence of built-environment factors on urban vitality. The main conclusions are as follows:

Overall, road density has a significant positive correlation with urban vitality, while the influence of other indicators shows spatial heterogeneity. This indicates that different factors have varying effects on urban vitality under different built-environment conditions. Strategies to stimulate urban vitality should be tailored to specific urban blocks instead of adopting a one-size-fits-all approach. Different planning measures should be proposed for different urban areas.

Although the influence of diversity factors on urban vitality is not the most pronounced, location, density, and transportation are closely linked to the diversity of functional layouts. For instance, units characterized by lower density indicators demonstrate a significant positive association with urban vitality, attributable to their diverse and high-density distribution of urban functions. The agglomeration and mixed distribution of diverse functional factors are essential conditions for stimulating urban vitality.

Most indicators have specific applicable conditions for enhancing vitality. There is a certain interaction and synergy among the indicators. High-density development needs to take into account factors such as functional layout and transportation conditions, as excessive population density and urban development intensity may not enhance urban vitality. Life-service facilities should increase diversity and service range. The improvement of long-distance transportation facilities has a more significant impact on urban vitality than conventional transportation. Areas with excellent natural environments and high sidewalk coverage can significantly boost urban vitality.