Background & Summary

Over half of the global population currently resides in urban areas, and this proportion is projected to reach 70% by 2050 (ref. 1). The unprecedented urbanization worldwide has significantly impacted the environment, contributing to at least 70% of anthropogenic greenhouse gas emissions, 80% of natural habitat loss and over 3.4 million km² of wetland degradation in the past centuries2,3,4, posing great challenges to the achievement of the Sustainable Development Goals (SDGs)5. Therefore, elucidating the implications of future urbanization for both the environment and human well-being is essential for advancing sustainable development. Recent decades have witnessed many studies that project future urban land expansion throughout the 21st century based on cellular automata (CA) and derived models6,7,8,9,10,11,12, which have significantly enhanced the understanding of the future evolution of urban systems. The CA framework constitutes a discrete spatial modeling approach wherein complex macroscopic system patterns emerge from microscopic-level cell interactions following predefined transition rules13,14,15. However, despite the extensive application of CA models in urban simulation, an important limitation is that they cannot simultaneously simulate both horizontal and vertical expansion. Therefore, existing large-scale projections have predominantly focused on horizontal 2D expansion, while vertical 3D growth has been overlooked. In the context of global change, urban 3D expansion is closely related to sustainable land use16,17,18, public health19,20,21, energy consumption22,23, transport efficiency24, urban heat island25,26,27, natural hazards28,29,30,31 and biodiversity32,33,34,35,36. Despite the importance of vertical building space and 3D spatial capacity in accommodating the land use demands of the burgeoning urban population, they have often been overlooked. Thus, a comprehensive understanding of future urban 3D expansion is imperative to provide essential information for urban planning in coping with these challenges.

Recent studies have begun to address this gap by simulating future urban 3D expansion. For example, Zhao et al.37 projected future urban 2D expansion while employing a random forest algorithm to forecast the vertical growth of building height in Shenzhen, China. He et al.38 combined the CA model with a backpropagation artificial neural network to simulate future urban height in Wuhan, China. Lin et al.39 employed a predefined set of “IF-THEN” rules within a CA model to simulate built-up height growth in Guangzhou, China. Koziatek et al.40 proposed an “iCity 3D” model for forecasting vertical urban development. Chen et al.41 proposed an extended patch-based CA model to simulate horizontal and vertical urban growth under the SSPs. Despite the progress made in these preliminary studies on 3D urban simulation, most of them simply combined horizontal urban simulation with vertical height estimation, so that the 3D built-up space is inadequately considered for macro control. To address this limitation, recent studies employed built-up 3D volume as the top-down macro demand rather than the conventional land use demand. For example, Xu et al.42 extended the traditional FLUS model to the FLUS-3D model to simulate future urban 3D expansion in three metropolitan regions of China. Wang et al.43 proposed an enhanced CA model and used built-up volume as macro control to project future urban 3D expansion in China at 1-km resolution. However, for the bottom-up (spatially explicit) projections, existing studies mainly used two separate models to estimate urban land suitability and vertical height. For example, land use suitability is mainly estimated by an Artificial Neural Network (ANN) model, while vertical height is projected by the widely used random forest model15,37,41,42,43,44. This decoupled modeling paradigm inevitably introduces structural redundancy and computational inefficiency, particularly considering the significant overlap in the socioeconomic and environmental explanatory variables (e.g., population density, economic level, transportation accessibility) required for both urban land suitability and vertical height. The recent advancement of deep learning techniques presents new opportunities to overcome these limitations through multi-task learning architectures45. Multi-task learning has demonstrated superior performance across various domains by leveraging shared representations and task correlations while reducing model complexity through parameter sharing46,47,48,49. The horizontal urban land-use suitability and vertical height prediction tasks inherently exhibit strong interdependencies: urban horizontal expansion patterns influence vertical intensification potential, while existing building height in turn constrains future horizontal development possibilities. This interdependence creates ideal conditions for multi-task learning frameworks that can capture both dynamics simultaneously. Therefore, our study addresses this research gap by proposing an end-to-end multi-task deep learning model that integrates urban land suitability estimation and vertical height prediction within a unified CA modeling architecture.

Moreover, most existing urban 3D projections focus on specific cities, including cities in the United States, Europe and China38,39,41,44,50, and the knowledge gap is particularly pronounced for low- and middle-income countries, which are expected to host the majority of global population growth and urbanization throughout the 21st century51,52. To the best of our knowledge, only the Global Human Settlement Layer (GHSL) provides a global-scale future built-up volume map53, but this projection, generated by simple historical trend extrapolation, extends only to 2030. Moreover, it does not provide SSP-consistent projections and thus does not meet the needs of the SSP framework. Therefore, it can hardly be used in longer-term analyses and cannot support the development of societal development pathways. In short, there is still a lack of SSP-consistent and spatially explicit 3D urban projections at a global scale that extend over the whole 21st century.

In general, the absence of global-scale, long-term and SSP-consistent 3D urban expansion products significantly limits the understanding of future urban expansion under anticipated global change and impedes advancements in urban planning. To bridge this gap, we provide the first global dataset of projected future urban 3D expansion (named FU3D) from 2020 to 2100 under the five shared socioeconomic pathways (SSPs). Our study, based on the MECA-3D model developed here (see Methods), links the localized SSPs to urban 3D expansion and highlights how the socioeconomic and environmental factors embedded in the SSP scenarios (such as demographic trends, economic conditions, technological advancements, and policy decisions) affect urban 3D expansion and influence urban form. The five SSPs (i.e., SSP1: sustainability, SSP2: middle of the road, SSP3: regional rivalry, SSP4: inequality, and SSP5: fossil-fueled development) outline five potential pathways for the future world based on demographics and economic growth54,55,56 (Table 1). The SSP-consistent FU3D dataset with 1-km resolution can extend the applicability of urban 3D information to future scenario-based climate, environment and sustainability-related research.

Table 1 Description of the SSPs6,7,8.

Methods

Data preparation

The projection framework in this study starts in 2015. Therefore, we first collect and fuse (by averaging) the global gridded built-up volume datasets from Li et al.57, Zhou et al.58, and the WSF3D dataset59, because they are the currently available global-scale 3D building products with high precision and fine resolution in (or near) 2015. The average built-up height is then calculated by dividing the fused built-up volume by the urban area in each cell (i.e., 1 km²). Consequently, the built-up height in this study can be considered a “flattened” height representing the average height of all built-up areas within a 1 km² pixel. Thus, the value of built-up height in this study may appear lower than the actual “building height”42. We then mask the fused 3D maps with the urban land product from the European Space Agency Climate Change Initiative Land Cover products (ESA-CCI-LC, http://maps.elie.ucl.ac.be/CCI). The ESA-CCI-LC product has been widely used in many studies. For example, Chen et al.6 used ESA-CCI-LC as the baseline of their future urban land projections. We also provide a comparison with other urban land products and find that the difference is minor (Fig. 2). Historical gridded population data are retrieved from WorldPop60. Future gridded population data are provided by Li et al.61. Historical and future GDP products are collected from Chen et al.62 and Wang et al.63. The global road database is collected from the Global Roads Inventory Project (GRIP) dataset, which was generated from many different publicly available sources64. Global city centers are collected from Melchiorri et al.65. Terrain data are collected from the Global Land One-kilometer Base Elevation (GLOBE) project66. We also collect global ecological reserve areas from the World Database on Protected Areas project, which provides a comprehensive global database of marine and terrestrial protected areas67.
Global historical gridded built-up volume data from 1980 to 2010 (at 5-year intervals) are collected from GHSL68,69, which are derived from long-term Landsat and Sentinel imagery70,71 and widely used in many studies72,73,74. Note that they are not directly used in the spatial projection but are used to calculate historical built-up volume amounts in 31 subregions, which are defined based on geographic location and development level. Historical and future scenario-based GDP, population, and urbanization rate for these subregions are provided by the SSP database version 2 (https://tntcat.iiasa.ac.at/SspDb). See Fig. 11C for the spatial extent of these 31 subregions.
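The fusion and “flattened” height steps described above can be illustrated with a minimal sketch. It assumes three co-registered volume grids expressed in m³ per 1 km² cell and a boolean urban mask; the function names and the toy values are hypothetical and are not taken from the FU3D code.

```python
import numpy as np

def fuse_volume(*volumes):
    """Average several co-registered built-up volume grids cell by cell."""
    stack = np.stack([np.asarray(v, dtype=float) for v in volumes])
    return stack.mean(axis=0)

def flattened_height(volume_m3, cell_area_m2=1_000_000.0, urban_mask=None):
    """'Flattened' height in metres: fused volume divided by the cell area."""
    height = np.asarray(volume_m3, dtype=float) / cell_area_m2
    if urban_mask is not None:
        # Keep heights only where the urban land product marks the cell as urban.
        height = np.where(urban_mask, height, 0.0)
    return height

# Toy 2 x 2 grids of built-up volume (m^3 per 1-km^2 cell); values are invented.
v_a = np.array([[3.0e6, 0.0], [6.0e6, 1.5e6]])
v_b = np.array([[2.0e6, 0.0], [5.0e6, 1.5e6]])
v_c = np.array([[4.0e6, 0.0], [7.0e6, 3.0e6]])
urban = np.array([[True, False], [True, False]])  # stand-in for an urban mask

fused = fuse_volume(v_a, v_b, v_c)
height = flattened_height(fused, urban_mask=urban)
print(height)
```

Because the volume is spread over the whole 1 km² cell, a cell holding 3 × 10⁶ m³ of buildings yields a flattened height of only 3 m, which is why these values sit below typical building heights.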

Top-down component: determining urban built-up volume demand

A panel data regression model is used to determine the future urban built-up volume demand, which provides 3D space for population and socioeconomic activities. Panel data regression has been widely used in estimating macro demand in urban CA models for its efficiency with limited data, model transparency and interpretability6,43,75. We use socioeconomic data obtained from the SSP database from 1990 to 2010 for training, and the data in 2015 for validation. We also provide a comparison with the geographically and temporally weighted regression (GTWR) model76 and the random forest model77. The GTWR model is mainly designed to address spatiotemporal heterogeneity, while the random forest is known for its capacity to capture non-linear relationships. However, the results show that the panel regression model outperforms the other two models (Fig. 1), mainly due to the limited training data, which indicates that panel data regression is suitable for this study. Specifically, using the GHSL dataset, historical built-up volume amounts can be calculated for each subregion and then used as the dependent variable in the panel data regression. However, there is an obvious difference between the GHSL built-up volume amount (\({D}_{2015}^{{GHSL}}\)) and the observed global amount in 2015 (\({D}_{2015}\)), because \({D}_{2015}\) only includes buildings in urban areas while GHSL includes buildings in both urban and rural areas. To reconcile this difference, we employ a simple but effective harmonization strategy motivated by a previous study estimating historical urban land amount10. Specifically, we use the ratio \(\frac{{D}_{2015}}{{D}_{2015}^{{GHSL}}}\) to calibrate the GHSL built-up volume amounts in historical years (1980–2010) (denoted as \({D}_{t}^{{GHSL}}\)). The harmonization strategy can be expressed as follows:

$${D}_{t}=\frac{{D}_{2015}}{{D}_{2015}^{{GHSL}}}\times {D}_{t}^{{GHSL}},$$
(1)

where \({D}_{t}\) is the calibrated urban built-up volume demand in historical year t. Then, historical statistics of GDP per capita (GDPC) and urbanization rate (UR) from 1980 to 2010 (at 5-year intervals) are used as predictors to establish the built-up volume demand model at a global scale, as shown below:

$${{DC}}_{r,t}={\beta }_{{\rm{r}}}+{\beta }_{1}\times {{GDPC}}_{r,t}+{\beta }_{2}\times {{UR}}_{r,t}+\varepsilon ,$$
(2)

where \({{DC}}_{r,t}\) indicates per capita urban built-up volume demand in year t and region r. \(\varepsilon \) refers to the error term. \({\beta }_{{\rm{r}}}\) is a region-fixed effect term which controls for time-invariant regional differences such as geography. This term can also reflect spatial heterogeneity among the regions. Once the panel regression is established in Eq. (2), future built-up volume demand can be predicted in 31 subregions (the top-down component presented in Fig. 3). The result of the regression shows the model effectively accounts for over 98% of the variation in the data, with a statistically significant p-value < 0.001, indicating the robustness of the panel regression models and the general reliability of the projected future built-up volume demands (Table 2).
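As an illustration of Eqs. (1) and (2), the following sketch harmonizes a GHSL volume with the 2015 ratio and fits a region fixed-effects regression by ordinary least squares with one dummy column per region. The function names, the two toy regions and all numbers are hypothetical; the estimator actually used for FU3D may differ (for example, a dedicated panel regression package).

```python
import numpy as np

def harmonize(d_ghsl_t, d_2015, d_2015_ghsl):
    """Eq. (1): rescale a GHSL volume by the 2015 ratio of observed to GHSL volume."""
    return d_2015 / d_2015_ghsl * d_ghsl_t

def fit_fixed_effects(region_ids, gdpc, ur, dc):
    """Least-squares fit of Eq. (2) with one intercept (dummy column) per region."""
    regions, idx = np.unique(region_ids, return_inverse=True)
    dummies = np.eye(len(regions))[idx]            # region fixed effects beta_r
    X = np.column_stack([dummies, gdpc, ur])
    coef, *_ = np.linalg.lstsq(X, dc, rcond=None)
    intercepts = {str(r): float(c) for r, c in zip(regions, coef)}
    return intercepts, float(coef[-2]), float(coef[-1])

# Synthetic panel: two regions, four time steps each (values are invented).
region = np.array(["A"] * 4 + ["B"] * 4)
gdpc = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 5.0, 7.0])
ur = np.array([0.30, 0.35, 0.45, 0.50, 0.60, 0.62, 0.70, 0.71])
dc = np.where(region == "A", 10.0, 20.0) + 2.0 * gdpc + 50.0 * ur

intercepts, beta_1, beta_2 = fit_fixed_effects(region, gdpc, ur, dc)
calibrated = harmonize(d_ghsl_t=100.0, d_2015=80.0, d_2015_ghsl=160.0)
print(intercepts, beta_1, beta_2, calibrated)
```

Future per capita demand for a region then follows from its intercept plus the two slope terms evaluated at the scenario GDPC and UR, and total demand from multiplying by the projected population.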

Fig. 1

Validation of predicted macro urban built-up volume demand in 2015 using panel regression, GTWR18 and random forest model19.

Fig. 2

An illustration of different urban land products in the Greater Bay Area, China, in 2015. (a) Building volume product from Li et al.1. (b) ESA-CCI-LC. (c) CLCD from Yang et al.2. (d) CNLULC from Liu et al.3. (e) GlobeLand30 from Chen et al.4. (f) MCD12Q1 product. Red pixels refer to urban land, while the black lines refer to the urban boundary from Li et al.5.

Fig. 3

The overall framework of the MECA-3D model proposed in this study.

Table 2 Performance of panel regression model for fitting historical built-up volume amounts in 31 subregions.

Bottom-up component: estimating urban land suitability and height

Urban land suitability characterizes how suitable a non-urban pixel is for conversion to an urban pixel6,15,42. Unlike previous studies that employed separate models for predicting urban land suitability and urban height, we introduce an end-to-end multi-task deep learning model, SE-ResUNet, to address the two tasks simultaneously. Compared with single-task learning, multi-task learning shows higher learning efficiency and a lower risk of over-fitting, since it guides the model towards more general feature representations preferred by multiple related tasks, which can be considered an inductive bias for the regularization of deep neural networks45. Specifically, the SE-ResUNet model adopts a residual neural network (ResNet)78 backbone integrated with Squeeze-and-Excitation (SE) blocks79 as its encoder, while the decoder follows the symmetric U-Net structure80. The model employs a shared encoder but uses two independent decoders, one for urban land suitability prediction and another for urban height estimation. The ResNet backbone adopts the framework of residual learning and inserts shortcut connections into plain neural networks, which eases the training of much deeper models without degradation of performance78. Within ResNet, SE blocks compute channel-wise attention with two fully connected layers applied after global average pooling of the input feature maps.
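The channel attention performed by an SE block can be sketched in a few lines of NumPy. This is only an illustration of the squeeze (global average pooling) and excitation (two fully connected layers) steps; the ReLU and sigmoid activations and the reduction ratio `r` follow the common SE design and are assumptions here, as are the random weights, which stand in for trained parameters.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Channel attention on a feature map x of shape (C, H, W)."""
    squeezed = x.mean(axis=(1, 2))                        # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)          # first fully connected layer + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # second fully connected layer + sigmoid
    return x * weights[:, None, None], weights            # rescale every channel

rng = np.random.default_rng(0)
C, r = 8, 4                                               # channels and assumed reduction ratio
x = rng.normal(size=(C, 16, 16))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)

y, w = se_block(x, w1, b1, w2, b2)
print(y.shape, w.round(3))
```

In the full model such a block would sit inside each residual unit of the shared encoder, so that both decoders receive the same re-weighted features.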

We use urban land data and urban built-up height in 2015 as ground truth data, along with a total of 7 socioeconomic and environmental variables as explanatory variables (Table 3). Specifically, population and GDP are used as they have been found to have a strong relationship with urban 3D expansion41,52,53,81,82. For example, Frolking et al.82 found that building volume and GDP are positively related with an R² of 63%, while the GHSL building volume mapping53 found an R² of nearly 58%. In addition, road density is included, which is calculated as the ratio of the total length of roads to the total area of each 1 km² pixel. Distance to rivers and distance to city centers of each cell are calculated with the Euclidean Distance tool in ArcGIS Pro. Except for population and GDP, the variables are held static during long-term projections, as we assume that they will not change substantially over the rest of the 21st century. These global 1-km resolution input data are split into tensors of shape 7 × 64 × 64 for model training. We take 80% of the data for training and the remaining 20% for testing. The output of the model includes the urban land suitability and urban height, both of which are tensors of shape 1 × 64 × 64. As for the loss function, the Binary Cross-Entropy (BCE) loss is used for land suitability prediction (Eq. (3)), while the Mean Squared Error (MSE) loss is used for height prediction (Eq. (4)). We apply an adaptive weighting scheme based on task uncertainty to combine the two loss functions (Eq. (5)). The Adaptive moment estimation (Adam) optimizer is employed83; the initial learning rate is set to 0.001 and then decreases following the cosine annealing schedule84. The batch size is set to 16 and the number of epochs to 100.
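The tiling and splitting of the input rasters can be sketched as follows. The sketch assumes non-overlapping 64 × 64 tiles, dropped edge remainders and a random 80/20 split with a fixed seed; none of these details are specified in the text, and the function names are hypothetical.

```python
import numpy as np

def tile_raster(stack, size=64):
    """Cut a (C, H, W) raster stack into non-overlapping (C, size, size) tiles."""
    c, h, w = stack.shape
    rows, cols = h // size, w // size              # edge remainders are dropped
    trimmed = stack[:, :rows * size, :cols * size]
    tiles = trimmed.reshape(c, rows, size, cols, size).transpose(1, 3, 0, 2, 4)
    return tiles.reshape(rows * cols, c, size, size)

def train_test_split(n, train_frac=0.8, seed=0):
    """Random split of tile indices into training and testing subsets."""
    order = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_frac * n))
    return order[:cut], order[cut:]

# Toy stack with 7 explanatory variables on a 320 x 640 grid.
stack = np.arange(7 * 320 * 640, dtype=float).reshape(7, 320, 640)
tiles = tile_raster(stack)
train_idx, test_idx = train_test_split(len(tiles))
print(tiles.shape, len(train_idx), len(test_idx))
```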

$${{loss}}_{{BCE}}=-\left[y\log \left(\hat{y}\right)+\left(1-y\right)\log \left(1-\hat{y}\right)\right],$$
(3)
$${{loss}}_{{MSE}}={(\hat{y}-y)}^{2},$$
(4)
$${{loss}}_{{total}}=\frac{1}{2{\sigma }_{1}^{2}}\times {{loss}}_{{BCE}}+\frac{1}{2{\sigma }_{2}^{2}}\times {{loss}}_{{MSE}}+\log {\sigma }_{1}+\log {\sigma }_{2},$$
(5)

where \(\hat{y}\) is the predicted value and y is the reference value. \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\) are uncertainty measures for corresponding tasks and they can be treated as trainable parameters along with the parameters of the SE-ResUNet model.
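A minimal NumPy sketch of Eqs. (3) to (5) is given below. Averaging the per-pixel losses over all pixels and parameterizing the uncertainty terms by log σ (so that σ stays positive during optimization) are assumptions of this sketch; in training, the two log σ values would be updated together with the network weights.

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-7):
    """Eq. (3), averaged over pixels; predictions are clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))))

def mse_loss(y, y_hat):
    """Eq. (4), averaged over pixels."""
    return float(np.mean((y_hat - y) ** 2))

def total_loss(loss_bce, loss_mse, log_sigma1, log_sigma2):
    """Eq. (5) with sigma_k = exp(log_sigma_k)."""
    w1 = 0.5 * np.exp(-2.0 * log_sigma1)   # 1 / (2 * sigma_1^2)
    w2 = 0.5 * np.exp(-2.0 * log_sigma2)   # 1 / (2 * sigma_2^2)
    return float(w1 * loss_bce + w2 * loss_mse + log_sigma1 + log_sigma2)

# Toy targets and predictions (invented values).
urban_true = np.array([1.0, 0.0, 1.0, 0.0])
urban_pred = np.array([0.9, 0.2, 0.8, 0.1])
height_true = np.array([3.0, 5.0])
height_pred = np.array([2.0, 7.0])

bce = bce_loss(urban_true, urban_pred)
mse = mse_loss(height_true, height_pred)
print(bce, mse, total_loss(bce, mse, 0.0, 0.0))
```

With both log σ values at zero the total reduces to half the sum of the two losses; a larger σ for one task down-weights that task while the log σ term penalizes inflating the uncertainty without bound.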

Table 3 The input variables used for spatially explicit prediction of urban land suitability and built-up height.

According to previous studies6,10,15,42,43, the conversion probability of a non-urban pixel changing to urban pixel (termed CP) is typically determined by four components and can be calculated as:

$${CP}=S\times N\times I\times P,$$
(6)

where \(S\) refers to the urban land suitability predicted by the SE-ResUNet model. \(N\) is the neighbor effect, which can be calculated as the percentage of existing urban land pixels within a 3 × 3 neighbor window of each pixel. \(I\) is the inertia coefficient, which is calculated based on the discrepancy between the projected built-up volume demand and the existing developed built-up volume amount (Eq. (7)). \(P\) is set to 0 if a pixel is prohibited from conversion and to 1 otherwise, so that prohibited pixels have zero conversion probability. In this study, water pixels, pixels with slopes greater than 15° and pixels within ecological protection areas are considered prohibited from being converted to urban land.

$${I}_{t}=\left\{\begin{array}{cc}{I}_{t-1}, & \left|{D}_{t-1}\right|\le \left|{D}_{t-2}\right|\\ {I}_{t-1}\times \left({D}_{t-2}/{D}_{t-1}\right), & {D}_{t-1} < {D}_{t-2} < 0\\ {I}_{t-1}\times \left({D}_{t-1}/{D}_{t-2}\right), & {D}_{t-1} > {D}_{t-2} > 0\end{array}\right.,$$
(7)

where \({I}_{t}\) is the inertia coefficient at time t. \({D}_{t-1}\) is the difference between the projected built-up volume demand at t and the actual volume at time t-1.
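Eqs. (6) and (7) can be sketched as follows. The sketch assumes that the neighbor effect excludes the centre cell, that \(P\) acts as a 0/1 mask that vanishes on prohibited cells, and that the inertia is left unchanged when the sign of the discrepancy flips (a case not covered by Eq. (7)); the function names are hypothetical.

```python
import numpy as np

def neighbourhood_effect(urban):
    """Share of urban cells among the 8 neighbours in a 3 x 3 window (centre excluded)."""
    padded = np.pad(urban.astype(float), 1)
    h, w = urban.shape
    total = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return (total - urban) / 8.0

def update_inertia(prev_inertia, d_prev, d_prev2):
    """Eq. (7) with d_prev = D_{t-1} and d_prev2 = D_{t-2}."""
    if abs(d_prev) <= abs(d_prev2):
        return prev_inertia
    if d_prev < d_prev2 < 0:
        return prev_inertia * d_prev2 / d_prev
    if d_prev > d_prev2 > 0:
        return prev_inertia * d_prev / d_prev2
    return prev_inertia  # sign change between steps: kept unchanged (assumption)

def conversion_probability(suitability, urban, inertia, prohibited):
    """Eq. (6): CP = S * N * I * P, with P = 0 on prohibited cells and 1 elsewhere."""
    p = np.where(prohibited, 0.0, 1.0)
    return suitability * neighbourhood_effect(urban) * inertia * p

urban = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
suitability = np.full((3, 3), 0.8)
prohibited = np.zeros((3, 3), dtype=bool)
prohibited[0, 2] = True                      # e.g. a water or protected cell

cp = conversion_probability(suitability, urban, inertia=1.0, prohibited=prohibited)
print(cp)
print(update_inertia(1.0, 30.0, 20.0), update_inertia(1.0, -30.0, -20.0))
```

A growing shortfall (30 after 20) raises the inertia to 1.5 and speeds up conversion, whereas a growing surplus (−30 after −20) lowers it to about 0.67.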

MECA-3D model: projecting urban 3D expansion iteratively

Urban expansion can be simulated by the dynamic interplay between bottom-up and top-down forces43,85. The overall framework of the MECA-3D model proposed in this study to project future urban 3D expansion is presented in Fig. 3. The MECA-3D model consists of two main components: a top-down component to determine the macro built-up volume demand, and a bottom-up component to implement the spatially explicit projection. The procedure to simulate urban 3D expansion from year t to t + 10 is as follows. First, the trained SE-ResUNet is used to predict urban land suitability and urban height in year t + 10, and the built-up volume demand in year t + 10 is predicted by the established panel regression model. Second, we update the conversion probability (CP) in year t + 10. Third, based on the updated conversion probability, we employ the random roulette selection scheme15,42,86 to determine whether or not a non-urban pixel converts to urban land in year t + 10. Fourth, the urban pixels in year t + 10 are multiplied by the corresponding predicted vertical height and added to the existing built-up volume amount. The random roulette selection is repeated until the accumulated built-up volume amount reaches the projected built-up volume demand in year t + 10.
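The iterative allocation in the third and fourth steps can be sketched as a loop that draws cells in proportion to their conversion probability until the volume demand is met. Weighted sampling is used here as one common form of roulette-wheel selection and may differ from the exact scheme in MECA-3D; the function name, the unit cell area and the toy inputs are hypothetical.

```python
import numpy as np

def allocate_volume(cp, height, urban, volume_demand, cell_area=1.0, seed=0, max_draws=100_000):
    """Convert non-urban cells by roulette-wheel draws until the volume demand is met."""
    rng = np.random.default_rng(seed)
    urban = urban.copy()
    volume = float((height * cell_area)[urban].sum())     # volume of existing urban cells
    for _ in range(max_draws):
        weights = np.where(urban, 0.0, cp).ravel()        # only non-urban cells can be drawn
        if volume >= volume_demand or weights.sum() == 0.0:
            break
        cell = rng.choice(weights.size, p=weights / weights.sum())
        urban.flat[cell] = True                           # the selected cell becomes urban
        volume += height.flat[cell] * cell_area           # add its predicted volume
    return urban, volume

urban0 = np.zeros((3, 3), dtype=bool)
urban0[1, 1] = True
height = np.full((3, 3), 10.0)          # predicted flattened height per cell
cp = np.full((3, 3), 0.5)               # conversion probability per cell

new_urban, volume = allocate_volume(cp, height, urban0, volume_demand=35.0)
print(int(new_urban.sum()), volume)
```

In this toy case each cell contributes 10 volume units, so three additional cells are converted before the demand of 35 is reached; which cells are drawn depends on the random seed.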

Data Records

The FU3D dataset is publicly available at Zhao et al.87. It contains global gridded data of future urban land, urban built-up height and urban built-up volume at 1-km resolution, covering the years 2020 to 2100 at a 10-year interval under five SSPs. The dataset is formatted in GeoTIFF and uses the WGS1984 coordinate system. Specifically, urban built-up land is measured by a binary value (0: non-urban; 1: urban). Built-up volume is measured in km³ and built-up height is measured in meters. Data files are named according to a standardized format: “ff_yyyy_SSPx.tif,” where “ff” represents urban built-up land, urban built-up height and urban built-up volume (denoted as “urban”, “height” and “volume”, respectively); “x” represents the future SSP scenario, ranging from 1 to 5; “yyyy” represents the future year, ranging from 2020 to 2100, at a 10-year interval. The FU3D dataset was generated and validated with Python (3.7) and ArcGIS Pro (3.0). Figure 4(A) provides a comprehensive view of the long-term evolution of urban built-up height in five global metropolises from 2020 to 2100 under SSP2. The five metropolises are: San Francisco in the USA, Shanghai in China, Tokyo in Japan, Buenos Aires in Argentina, and Paris in France. Our FU3D dataset allows for a comprehensive 3D perspective on how these cities expand and develop over time. Under SSP2, the macro built-up volume demand in Japan (JPN) will stop increasing and even decrease after 2030 (Fig. 11B). This means that the cities in Japan are expected to stop growing (Fig. 4A). For China (CHN), built-up volume demand will increase until 2060 and then stabilize. On the other hand, for the USA and Europe, as built-up volume demands are projected to keep increasing, cities will keep growing until the end of the century.
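The naming convention above can be handled with a small helper. The sketch below builds and parses file names of the form “ff_yyyy_SSPx.tif”; it assumes that the variable part is written in lowercase exactly as “urban”, “height” or “volume”, and the function names are hypothetical.

```python
import re

VARIABLES = ("urban", "height", "volume")
YEARS = range(2020, 2101, 10)
SCENARIOS = range(1, 6)
PATTERN = re.compile(r"^(urban|height|volume)_(\d{4})_SSP([1-5])\.tif$")

def fu3d_filename(variable, year, ssp):
    """Build a file name following the 'ff_yyyy_SSPx.tif' convention."""
    if variable not in VARIABLES or year not in YEARS or ssp not in SCENARIOS:
        raise ValueError(f"not a valid FU3D layer: {variable}, {year}, SSP{ssp}")
    return f"{variable}_{year}_SSP{ssp}.tif"

def parse_fu3d_filename(name):
    """Split a file name into (variable, year, SSP number)."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    variable, year, ssp = match.groups()
    return variable, int(year), int(ssp)

all_files = [fu3d_filename(v, y, s) for v in VARIABLES for y in YEARS for s in SCENARIOS]
print(len(all_files), all_files[0], parse_fu3d_filename("volume_2100_SSP5.tif"))
```

Three variables, nine decadal time steps and five scenarios give 135 expected layers under this convention.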

Fig. 4

Projected urban built-up height in the future from 2020 to 2100 under SSP2. (A) Long-term dynamics of urban built-up height in five metropolises from 2020 to 2100 under SSP2. The five metropolises are San Francisco in USA, Shanghai in China, Tokyo in Japan, Buenos Aires in Argentina, and Paris in France. (B) Urban built-up height in China, USA, and Europe in 2100 under SSP2. Note that the built-up height here can be considered to be a “flattened” height representing the average height of all built-up areas within a 1 km2 pixel. Thus, the value of built-up height in this study may appear to be lower than the actual “building height”.

Technical Validation

Validation measurements

The receiver operating characteristic (ROC) curve is used to evaluate the overall performance of the SE-ResUNet. The area under the curve (AUC) ranges from 0 to 1. An AUC of 1 indicates a perfect model, while an AUC of 0.5 indicates a completely random model88. The performance of 2D urban expansion is quantified by overall accuracy (OA), Kappa coefficient (Kappa), and figure of merit (FoM). OA measures the proportion of pixels that are accurately simulated. Kappa ranges from −1 to 1, with a value of 0 indicating that model performance is equivalent to that of a random model and a value approaching 1 indicating that model performance is perfect. FoM is designed to evaluate how many changes are correctly simulated by a cellular automaton-based model. It ranges from 0 to 1, with a value of approximately 0.2 considered a favourable accuracy according to previous studies15,89. FoM can be calculated as follows:

$${FoM}=\frac{A}{A+B+C}$$
(8)

where A represents the non-urban pixels correctly simulated as changing to urban pixels, B denotes the observed changed pixels that fail to be projected as changed, and C denotes the observed non-changed pixels that are incorrectly projected as changed. We also use an error decomposition-based method. Specifically, the overall disagreement between the actual and simulated urban land can be decomposed into the quantity disagreement (QD) and the allocation disagreement (AD)90,91,92,93,94, which can be calculated as follows:

$${QD}=\left|{\sum }_{i}^{N}{p}_{i}-{\sum }_{i}^{N}{r}_{i}\right|,$$
(9)
$$AD=2\min ({FP},{FN}),$$
(10)

where \({p}_{i}\) and \({r}_{i}\) represent the binary value (urban: 1; non-urban: 0) of pixel i in the projected and actual urban land data, respectively. \({FP}\) (false positive) and \({FN}\) (false negative) are the commission and omission errors in the confusion matrix, respectively.
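The 2D metrics above can be computed from three binary maps (start year, observed end year, simulated end year), as in the following sketch. QD and AD are expressed in pixel counts as in Eqs. (9) and (10); the function names and the six-pixel toy maps are hypothetical.

```python
import numpy as np

def figure_of_merit(start, actual, simulated):
    """Eq. (8): FoM = A / (A + B + C) over pixels that were non-urban at the start."""
    obs_change = (start == 0) & (actual == 1)
    sim_change = (start == 0) & (simulated == 1)
    a = np.sum(obs_change & sim_change)       # correctly simulated change
    b = np.sum(obs_change & ~sim_change)      # observed change that was missed
    c = np.sum(~obs_change & sim_change)      # simulated change that did not occur
    return a / (a + b + c)

def agreement_metrics(actual, simulated):
    """OA, Kappa, QD (Eq. 9) and AD (Eq. 10) for two binary urban maps."""
    fp = int(np.sum((simulated == 1) & (actual == 0)))
    fn = int(np.sum((simulated == 0) & (actual == 1)))
    oa = float(np.mean(actual == simulated))
    pa, ps = actual.mean(), simulated.mean()
    p_e = pa * ps + (1 - pa) * (1 - ps)       # agreement expected by chance
    kappa = float((oa - p_e) / (1 - p_e))
    qd = abs(int(simulated.sum()) - int(actual.sum()))
    ad = 2 * min(fp, fn)
    return {"OA": oa, "Kappa": kappa, "QD": qd, "AD": ad}

start = np.array([1, 0, 0, 0, 0, 0])
actual = np.array([1, 1, 1, 0, 0, 0])
simulated = np.array([1, 1, 0, 1, 1, 0])

fom = figure_of_merit(start, actual, simulated)
metrics = agreement_metrics(actual, simulated)
print(fom, metrics)
```

In this toy case QD (1 pixel) and AD (2 pixels) add up to the 3 mismatched pixels, which illustrates how the total disagreement splits into a quantity part and an allocation part.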

The performance of 3D urban expansion is evaluated by three metrics, i.e., the coefficient of determination (R²), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), which are expressed as follows:

$${R}^{2}=1-\frac{\sum _{i=1}^{n}{\left({\hat{y}}_{i}-{y}_{i}\right)}^{2}}{\sum _{i=1}^{n}{\left(\bar{y}-{y}_{i}\right)}^{2}}$$
(11)
$${RMSE}=\sqrt{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}}$$
(12)
$${MAE}=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}\left|{\hat{y}}_{i}-{y}_{i}\right|$$
(13)

where \({y}_{i}\) and \({\hat{y}}_{i}\) are the reference and predicted values of sample i, \(\bar{y}\) is the mean of the reference values, and n is the number of samples.
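Eqs. (11) to (13) translate directly into NumPy. The sketch below takes \(\bar{y}\) as the mean of the reference values, which is the usual convention for the coefficient of determination and is treated as an assumption here; the sample values are invented.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """Return (R^2, RMSE, MAE) for reference values y and predictions y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    residual = y_hat - y
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y.mean() - y) ** 2)
    rmse = np.sqrt(np.mean(residual ** 2))
    mae = np.mean(np.abs(residual))
    return float(r2), float(rmse), float(mae)

reference = [2.0, 4.0, 6.0, 8.0]     # e.g. heights from a validation dataset (m)
predicted = [3.0, 4.0, 5.0, 9.0]     # e.g. heights sampled from the projection (m)

r2, rmse, mae = regression_metrics(reference, predicted)
print(r2, rmse, mae)
```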

Model evaluation

The AUC value for land use suitability exceeds 0.96 (Fig. 5A), suggesting that the estimated urban land suitability is well accounted for by the chosen input variables. Moreover, the validation conducted on the independent test set shows that the overall accuracy of predicting urban land suitability reaches 0.97, while the RMSE and MAE of the predicted height reach 0.2 m and 0.06 m (Table S2). These results indicate that the SE-ResUNet achieves satisfactory performance after 100 training epochs (Fig. 5B).

Fig. 5

Model evaluation information of the SE-ResUNet. (A) The ROC curves. (B) The training and testing loss during 100 epochs.

Performance of the FU3D dataset in 2020

We compare the spatial patterns of urban land projected in the FU3D dataset with the ESA-CCI-LC 2020 product and three similar global 2D urban projections, i.e., Chen et al.6, Li et al.10, and Gao et al.95. As shown in Fig. 6, the projected urban land of FU3D is the most similar to the ESA-CCI-LC product. Further validations, including spatial agreement analysis and error decomposition, show that the OA of the four datasets is similar (nearly 0.99), while FU3D outperforms the other three existing products in terms of Kappa, FoM, QD and AD (Table 4). Moreover, the other three existing products have only horizontal attributes, while FU3D also has vertical height attributes. Generally, the methodological advancements of our study lie in two aspects: (1) we use built-up volume as the proxy of macro demand, while the compared benchmark studies only used urban land. Built-up volume can better represent the 3D space that supports human activities and thus provides more accurate macro control for urban expansion in a CA model. (2) The MECA-3D model proposed in this study leverages a state-of-the-art multi-task residual deep learning technique, enhancing the performance of the spatially explicit projection.

Fig. 6

Comparison of urban land between ESA-CCI 2020 product, FU3D dataset, and three similar global 2D urban projection datasets under SSP2 in 2020 in five metropolises. The products are from Chen et al.6, Li et al.10, and Gao et al.95. The background image is from Google Earth.

Table 4 Global-scale assessment of the projected urban land from FU3D dataset and three similar global 2D urban projection datasets under SSP2 in 2020 against the ESA-CCI-LC 2020 product.

For the projected urban built-up height, we collect three global-scale built-up height datasets, i.e., GHSL53, GLAMOUR96 and GBH97. The GLAMOUR dataset is derived from Sentinel imagery and captures the average building height and footprint at a resolution of 0.0009° across urbanized areas worldwide96. The GBH dataset provides a 150-m global urban building height map around 2020 by combining spaceborne lidar (GEDI), multi-source data (Landsat-8, Sentinel-2, and Sentinel-1), and topographic data97. The 1-km resolution GHSL built-up volume map in 2020 is generated by integrating multi-source satellite imagery (Landsat, Sentinel-2) and DEM datasets (AW3D30/SRTM), while the built-up volume map in 2030 is generated by extrapolating the historical trends53. To avoid possible misunderstanding, it should be noted that the gridded GHSL volume datasets (both historical and future) are not directly used to train the bottom-up component (the SE-ResUNet). In this study, the validation procedure is conducted in ten regions determined by the World Bank. Within each region, 5000 sample points are randomly allocated. The results demonstrate that our projections generally align with the selected datasets, as shown in Figs. 7–10. We find that the R² between FU3D and the GHSL dataset is around 80%, while the overall R² with the GLAMOUR and GBH datasets is approximately 45% and 60%, respectively. It should be noted that since the urban built-up height projected in FU3D is a “flattened height” (as mentioned in the Data preparation section), it is relatively underestimated compared to the vertical heights in the three compared datasets, similar to Xu et al.42.

Fig. 7

Comparison of urban built-up height between GHSL dataset and FU3D dataset in 2020 under SSP2.

Fig. 8

Comparison of urban built-up height between GHSL dataset and FU3D dataset in 2030 under SSP2.

Fig. 9

Comparison of urban built-up height between GLAMOUR dataset and FU3D dataset in 2020 under SSP2.

Fig. 10

Comparison of urban built-up height between GBH dataset and FU3D dataset in 2020 under SSP2.

In addition to global-scale datasets, we also provide comparisons against several regional datasets. For example, we collect building footprints of 68 Chinese cities with height information (expressed as floor numbers) in 2020 from Amap (https://ditu.amap.com/). The height of each floor is assumed to be 3 m according to previous studies98,99. The results show that the overall R² is 74% with an RMSE of 0.93 m (Table S3). For Europe, the EUBUCCO v0.1 dataset is collected100, which includes nearly 202 million buildings across the 27 European Union countries by combining the newest multi-source government archives and records from OpenStreetMap100. Because the EUBUCCO v0.1 dataset is too large to use all the records for validation, 8 representative countries are chosen. In each country, 10,000 sample points are randomly generated for validation. Results show that the FU3D height in 2020 shows good agreement with the EUBUCCO dataset, with an R² of nearly 78% (Table S4). Note that since both datasets are building vector datasets, we calculate the “flattened height” before the comparison with FU3D. Specifically, we compute the total building volume within each 1 km² pixel and then divide it directly by the pixel area to derive the so-called “flattened height”.
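The conversion from building vector data to a “flattened height” can be illustrated as follows. The sketch assumes that each building record carries a footprint area and either a height or a floor count (converted at 3 m per floor, as stated above); the record layout and the function name are hypothetical.

```python
def flattened_height_from_buildings(buildings, cell_area_m2=1_000_000.0, floor_height_m=3.0):
    """Total building volume in a 1 km^2 pixel divided by the pixel area (metres)."""
    volume = 0.0
    for b in buildings:
        # Use the recorded height if present, otherwise floors times the assumed floor height.
        height = b["height_m"] if "height_m" in b else b["floors"] * floor_height_m
        volume += b["footprint_m2"] * height
    return volume / cell_area_m2

cell = [
    {"footprint_m2": 2000.0, "floors": 10},      # 2000 m^2 * 30 m = 60,000 m^3
    {"footprint_m2": 5000.0, "floors": 4},       # 5000 m^2 * 12 m = 60,000 m^3
    {"footprint_m2": 1000.0, "height_m": 80.0},  # 1000 m^2 * 80 m = 80,000 m^3
]
h = flattened_height_from_buildings(cell)
print(h)
```

Although the tallest building in this toy pixel is 80 m, the flattened height is only 0.2 m because the volume is averaged over the full pixel area, which explains the low RMSE magnitudes reported for this comparison.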

Uncertainty analysis

Uncertainty in the FU3D dataset can be categorized into three types: spatial heterogeneity uncertainty, parameter uncertainty and stochastic uncertainty. To mitigate spatial heterogeneity uncertainty, a region-fixed effect is included in the panel data regression for projecting future built-up volume demand. This approach accounts for spatial heterogeneity by assigning distinct intercepts to different subregions. However, while panel regression with individual fixed effects improves model fit, as reflected in a higher R² value, it relies on the assumption that regional heterogeneity remains time-invariant, which may be untenable for long-term projections spanning decades to 2100, as socioeconomic and environmental conditions typically evolve nonlinearly. Future work could address this limitation through time-varying fixed effects or hybrid modeling approaches101. Parameter uncertainty primarily stems from variations in per capita GDP and urbanization rates. We quantify it by calculating the 95% confidence intervals of the regression coefficients in the panel data regression, which are shown in Table S1. How parameter uncertainty affects the projected demand is also depicted in Fig. 11. To quantify stochastic uncertainty, we perform 100 simulations for each SSP during the future projections. Since we employ the random roulette selection method in the MECA-3D model, the results from repeated experiments can effectively capture the model’s stochastic uncertainty. By overlaying the results of these simulations6, stochastic uncertainty can be clearly quantified (Figs. 13, 14). Moreover, by comparing the spatial patterns across the SSPs, we can see how variations in macro socioeconomic variables across SSPs ultimately affect the projected outcomes in 2100. The resulting urban spatial patterns align well with those reported in previous studies6,10,95.
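Overlaying repeated simulations amounts to counting, for each pixel, the share of runs in which it ends up urban. The sketch below uses randomly generated binary maps as a stand-in for 100 model runs; the per-cell chances and the function name are invented for illustration.

```python
import numpy as np

def urban_likelihood(simulations):
    """Fraction of runs in which each cell is urban; input shape (runs, H, W)."""
    return np.asarray(simulations, dtype=float).mean(axis=0)

rng = np.random.default_rng(42)
base_prob = np.array([[0.0, 0.2], [0.8, 1.0]])   # invented per-cell conversion chances
runs = rng.random((100, 2, 2)) < base_prob       # stand-in for 100 simulated urban maps
likelihood = urban_likelihood(runs)
print(likelihood)
```

Cells with a likelihood near 0 or 1 are insensitive to the stochastic selection, whereas intermediate values mark locations where individual runs disagree.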

Fig. 11

Global and regional built-up volume demand during the historical period (1980–2010) and in the future under five SSPs (2020–2100). The shaded areas represent the 95% confidence intervals estimated by panel data regression. (A) Global built-up volume demand. (B) Built-up volume demands of the 31 subregions. (C) Distribution of the 31 subregions.
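As a rough illustration of how coefficient confidence intervals of this kind can be obtained, the following Python sketch fits a regression with regional fixed effects using the within (demeaning) estimator. It rests on several assumptions that are not taken from the paper: the function name `fixed_effects_ci`, the linear functional form, the synthetic data, and the normal approximation (z = 1.96) for the 95% interval. The two columns of `X` merely stand in for predictors such as per capita GDP and urbanization rate.

```python
import numpy as np

def fixed_effects_ci(y, X, region, z=1.96):
    """Within estimator with region-specific intercepts and approximate 95% CIs."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(region), return_inverse=True)
    counts = np.bincount(idx)
    # Subtract each region's mean, which removes the region-specific intercepts.
    y_w = y - (np.bincount(idx, weights=y) / counts)[idx]
    X_w = np.empty_like(X)
    for k in range(X.shape[1]):
        X_w[:, k] = X[:, k] - (np.bincount(idx, weights=X[:, k]) / counts)[idx]
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ beta
    # Degrees of freedom account for the absorbed intercepts and the slopes.
    dof = y.size - labels.size - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X_w.T @ X_w)
    se = np.sqrt(np.diag(cov))
    return beta, beta - z * se, beta + z * se

# Synthetic panel: 3 hypothetical subregions with 20 observations each.
rng = np.random.default_rng(1)
region = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 2))
y = np.array([5.0, -3.0, 10.0])[region] + X @ np.array([2.0, -1.0])
y = y + rng.normal(scale=0.01, size=60)
beta, lower, upper = fixed_effects_ci(y, X, region)
print(beta, lower, upper)
```

Propagating the lower and upper coefficient bounds through the demand equation would yield a band around the projected demand, comparable in spirit to the shaded areas in Fig. 11.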

Fig. 12

Global and continental dynamics of future population, GDP and urban land area from 2020 to 2100 under five SSPs. The future population dataset is from Li et al.61. The future GDP dataset is from Wang et al.63. The future urban land area is from Chen et al.6.

Fig. 13

Uncertainty of projected urban land in five metropolises in 2100 under five SSPs. The five metropolises are San Francisco in the USA, Shanghai in China, Tokyo in Japan, Buenos Aires in Argentina, and Paris in France. This figure shows the likelihood of each non-urban pixel becoming urban, estimated by overlaying the results of 100 simulations.

Fig. 14

Uncertainty of projected urban land in China, the USA, and Europe in 2100 under SSP2 and SSP5. This figure shows the likelihood of each non-urban pixel becoming urban, estimated by overlaying the results of 100 simulations.

Usage Notes

Understanding the environmental impacts of future urbanization and human activities requires global-scale, long-term projections of urban 3D expansion, which are currently missing. Our research addresses this gap by presenting the first comprehensive global-scale projections of urban 3D growth for the 21st century. The projected urban 3D growth patterns under the SSPs are consistent with the narratives of the five SSPs (Fig. 11). For example, by 2100, global urban built-up volume is projected to increase to nearly 184%–409% under the five SSPs, with the largest 3D expansion, exceeding 4000 km³, projected under SSP5, which reflects the dramatic global economic growth. SSP3 and SSP4 show minimal changes, owing to the projected economic deceleration6 (Fig. 12). Despite severe global population decline8,56, SSP5 anticipates continued urbanization due to ongoing economic development, which maintains high demand for built-up space throughout the 21st century.

With the support of the FU3D dataset, researchers can further understand how cities will grow and which patterns they will follow (e.g., outward or upward102) under different scenarios. Moreover, the FU3D dataset is valuable for advancing research in several areas, such as urban public health103,104, infrastructure resilience to natural hazards105,106,107, urban heat island effects108,109, CO₂ emission estimates110,111, and local climate zone mapping108,112,113. In urban planning, the FU3D dataset is a critical resource for policymakers. It helps refine development strategies for sustainable growth by evaluating factors such as energy consumption, urban heat islands, and biodiversity under different SSP scenarios. This enables the creation of policies that mitigate negative impacts and enhance sustainability. For example, understanding future vertical development patterns helps cities improve resilience to climate change impacts, such as sea-level rise and extreme weather events, promoting adaptive and sustainable urban development. For climate modelling, recent studies have demonstrated the importance of spatially explicit 3D urban information. For example, Kamath et al.114 found that incorporating 3D urban information can significantly improve the accuracy (55% in RMSE) of the urban Weather Research and Forecasting (WRF-Urban) model compared to the traditional table-based local climate zone approach. Melissa et al.115 found that incorporating high-resolution 3D urban morphological parameters into the WRF model reduced extreme precipitation simulation errors by nearly 18%. Therefore, we believe the long-term future 3D information provided by the FU3D dataset can also serve as a crucial input for future urban climate modelling. 
The incorporation of FU3D can help more accurately capture future urban climate and weather extremes under different SSPs, thereby enabling local governments to develop targeted adaptation strategies for sustainable urban development.

The FU3D dataset also has several limitations. First, to ensure that the projections are consistent with the SSP storylines, the macro urban built-up volume demand is calculated for each subregion, because country-level (or finer-scale) urban built-up and socioeconomic data are not available for all countries. Second, the impact of future climate change on the evolution of urban 3D forms is not considered. This limitation will be addressed in future work by incorporating projections based on integrated scenarios of the SSPs and Representative Concentration Pathways (RCPs). Third, we neglect urban renewal during long-term urbanization. Although this is reasonable for a 10-year interval, our future work is expected to produce annual datasets using more sophisticated rules that include urban renewal. Fourth, the MECA-3D model is trained and calibrated only with data for 2015, owing to the lack of long-term and accurate historical data. In further research, long-term historical 3D urban expansion data can be considered for the calibration and validation of urban 3D growth simulations.