Introduction

The acceleration of urbanization has led to rapid economic growth and continuous improvement in the living standards of residents. However, long-term extensive spatial expansion and the disorderly spread of cities have triggered a series of social, economic, and ecological problems1, such as air pollution2, environmental degradation3, traffic congestion4, and a reduction in agricultural land5. Urban functional areas, as the basic units of urban development, play a crucial role in the urbanization process. Their scientific delineation and rational allocation can not only help solve urban problems but also have considerable importance in optimizing urban spatial structures, rationally arranging spatial elements, and promoting sustainable development6,7,8. Conventional studies of urban functional areas have been mainly based on remote sensing imagery, subjective assessments, surveys, statistics, and land-use data, which typically consume substantial human and material resources and can still yield inaccurate outcomes9. In the context of informatization, point-of-interest (POI) data can describe the spatial attributes and geographic location information of cities and are characterized by large information content, easy acquisition, and convenient analysis, providing new perspectives for research on urban functional areas. Many researchers have investigated urban functional areas using POI data10,11,12,13,14,15. Huang et al. identified urban functional areas by integrating the POI density and spatial heterogeneity16. Gao et al. extracted urban functional areas from POI and crowd activities based on social network locations17. Li et al. identified urban functional areas and their mixed-use degrees based on POI data7. Xu et al. determined the characteristics of urban functional distribution by integrating remote sensing images and POI18,19. In these studies, urban functional areas were usually identified and characterized based on POI data from a single time period, thereby facilitating sustainable urban development.

However, with the transition from incremental to stock planning, the configuration of urban functional areas has undergone unprecedented changes. These changes directly affect urban spaces and economic structures, leading to changes in resident lifestyles and urban service functions20. For example, traditional industrial areas may be transformed into innovation and technology parks, whereas residential areas may integrate more commercial and public service facilities to meet the needs of urban residents. Changes in urban functional areas require urban planners and decision-makers to accurately grasp the development trends of these areas to effectively allocate them and devise rational planning strategies. Research that merely identifies urban functional areas is therefore not comprehensive, and some scholars have turned to investigating changes in urban functional areas. Shen et al. explored the evolution of urban functions in the center of Shanghai21, whereas Schiavina et al. examined the evolution of land use efficiency in urban functional areas22. Wang et al. investigated urban functional areas and their dynamic changes based on years of traffic smart card data23, whereas Zhou et al. took Changchun as an example to explore the changes in urban land use functions and expansions against the backdrop of urbanization24. Overall, the current research on urban functional areas primarily focuses on the identification, distribution characteristics, and dynamic changes of these areas, with less attention paid to their simulation and prediction. Changes in urban functional areas directly impact urban planning and development; thus, predicting these changes can provide urban planners with a more forward-looking perspective. The prediction of urban functional areas is essential for solving urban problems, addressing future challenges, and promoting sustainable development.

In recent years, advancements in Geographic Information Systems (GIS), big data analytics, and the emergence of new data sources such as point of interest (POI) data have enabled the prediction of urban functional areas. However, despite the support of these technologies and data, the prediction of urban functional areas still faces many challenges, including the accuracy of data, applicability of models, and reliability of prediction results. In this context, the CA–Markov model, which has high international recognition, combines the spatiotemporal prediction capabilities of Cellular Automata (CA) with the quantity prediction capabilities of the Markov model. Based on multi-objective, multi-criteria allocation principles, it has achieved high urban prediction accuracy and gained widespread attention. Numerous studies have shown that the CA–Markov model can effectively coordinate with GIS to create suitable methods for dynamic modeling of spatial and temporal changes in cities25,26,27. This model has been widely applied to predict changes in urban land use28,29,30,31,32. Compared with land use, changes in urban functions are more rapid and complex. Therefore, as a mature model widely used in urban predictions, how capable is the CA–Markov model in predicting urban functional areas?

Hence, this study attempted to assess the capability of the CA–Markov model to predict changes in urban functional areas based on the identification of these areas over several years and the integration of POI data. This study specifically focused on the following: first, identifying the functional areas of the Nanjing old city district over several years based on POI data; second, exploring and evaluating the effectiveness and accuracy of the CA–Markov model in predicting changes in urban functional areas; and finally, discussing the potential of this model in the application of urban functional area predictions. This study aimed to contribute to the sustainable development of cities.

Materials and methods

Study area

Nanjing is the capital of Jiangsu Province and serves as its political, economic, cultural, technological, educational, and financial hub. By the end of 2022, the permanent population of Nanjing had reached 9.49 million, and its GDP had increased to 1.69 trillion yuan, placing it tenth among Chinese cities. In addition, Nanjing is a crucial nexus within the Yangtze River Delta area and is representative of large cities in China. Nanjing old city was selected as the research area, and the scope of the study is shown in Fig. 1.

Figure 1

Study area. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Data sources and processing

The research data for this study were POI locations in Nanjing old city from 2015 to 2023, sourced from the Amap Open Platform (https://lbs.amap.com/). Each POI record included attributes such as name, category, address, longitude, and latitude. In the original data crawled from the Amap Open Platform, POIs are categorized at three levels. The primary level comprises major categories such as catering services, accommodation services, government agencies and social organizations, shopping services, and corporate enterprises. The secondary level subdivides each primary category; public services, for example, include government agencies, hospitals, schools, and other intermediate categories. The tertiary level subdivides each secondary category; schools, for example, include primary schools, secondary schools, universities, and other subcategories. This multi-level categorization leads to redundant and overlapping data. In addition, the classification standards for POI data do not align with the urban land-use classification standards, which makes the raw categories inconvenient for research and discussion. Consequently, we removed points with lower public recognition and reclassified the POI data to clearly represent the characteristics of urban functional areas. Referencing the “Urban Land Classification and Planning Construction Land Standards (GB50137-2011)” and the “National Economic Industry Classification (GB/T4754-2017)” and following the principles of universality and consistency in POI classification, we reclassified POI data into six major functional categories: business, traffic, industrial, public services, green space, and residential, thereby obtaining POI functional type data for each year33. The specific classification is shown in Table 1.

Table 1 Functional classification of POI.

Research methodology

Study process

This study was divided into four steps, as shown in Fig. 2. First, POI data spanning several years were obtained from the Amap Open Platform and categorized into six functional types: traffic, industrial, business, public services, residential, and green space. Second, the TF-IDF method was utilized to identify urban functional areas over multiple years. Third, after the identification of urban functional areas over several years, the CA–Markov model was utilized to conduct multiple predictions. Finally, after the completion of predictions, the predicted distribution of functional areas was compared with the actual distribution, comprehensively studying the model’s predictive capability and accuracy in predicting functional areas from various perspectives.

Figure 2

Study process.

TF-IDF model

Because functional area data could not be directly obtained, this study employed the TF-IDF model to calculate the importance of POI functional points both locally and globally and used the actual contribution of POI functional points to delineate urban functional areas. The TF-IDF model is a statistical method commonly used in information retrieval and data mining to evaluate the importance of a word to a document in a collection or corpus and has been proven to have good classification capabilities for urban functional area identification13,14,34,35,36. TF represents term frequency, whereas IDF represents the inverse document frequency index. In this study, each spatial unit was considered a separate document, with the category of each POI within a document regarded as a word, thereby converting the analysis of the functions of spatial units into calculating the weight of each word in the collection of documents. After multiple experiments, this study empirically adopted a 100 m grid to homogenize and divide the research area, resulting in 4316 spatial units. Our goal was to use the TF-IDF algorithm to measure the contribution of POI functions, thereby identifying the functions within each grid. Drawing on the TF-IDF model, the actual contribution (AC) of POI functions can be calculated to comprehensively reflect the scale and usage characteristics of POI functions. The calculation formula is as follows:

$${f}_{i,j}=\frac{{n}_{i,j}}{\sum_{k} {n}_{k,j}}$$
(1)
$${b}_{i}=\text{log}\frac{|D|}{|\{j:{t}_{i}\in {d}_{j}\}|}$$
(2)
$$A{C}_{i}={f}_{i,j}\times {b}_{i}$$
(3)

where i represents the type of POI function; j represents the functional area (the spatial unit treated as a document) in which the POI function type is located; \({n}_{i,j}\) represents the frequency of occurrence of POI function type i in functional area j; \(\sum_{k}{n}_{k,j}\) represents the total frequency of occurrence of all POI function types in functional area j; \({f}_{i,j}\) represents the proportion of type i in that total; \(|D|\) represents the total number of functional areas; \(|\{j:{t}_{i}\in {d}_{j}\}|\) represents the number of functional areas containing POI function type i; \({b}_{i}\) represents the general importance of POI function type i; and \(A{C}_{i}\) represents the actual contribution of POI function type i in functional area j.
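Equations (1) to (3) can be illustrated with a minimal Python sketch. The input layout (a dictionary from grid id to a list of POI function labels), the function name `actual_contribution`, and the use of the natural logarithm are assumptions made for illustration; the text does not specify the logarithm base or the data structures actually used.

```python
import math
from collections import Counter


def actual_contribution(units):
    """Compute the actual contribution (AC) per grid following Eqs. (1)-(3).

    units: dict mapping a grid id to the list of POI function labels
    found in that grid (hypothetical input layout).
    """
    # Grids without POIs are treated as no-data and left out of the corpus.
    docs = {uid: labels for uid, labels in units.items() if labels}
    n_docs = len(docs)  # |D|

    # Number of grids that contain each function type at least once.
    doc_freq = Counter()
    for labels in docs.values():
        doc_freq.update(set(labels))

    result = {}
    for uid, labels in docs.items():
        counts = Counter(labels)      # n_{i,j}
        total = sum(counts.values())  # sum_k n_{k,j}
        result[uid] = {
            func: (n / total) * math.log(n_docs / doc_freq[func])  # f_{i,j} * b_i
            for func, n in counts.items()
        }
    return result


# Toy data, not taken from the study.
grids = {
    "g1": ["business", "business", "residential"],
    "g2": ["residential"],
    "g3": ["green space", "residential"],
    "g4": [],
}
ac = actual_contribution(grids)
```

In this toy corpus the residential type occurs in every non-empty grid, so its inverse document frequency, and hence its AC, is zero everywhere; this is a general property of TF-IDF weighting rather than a result of the study.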

CA–Markov model

Cellular Automata (CA) is a dynamic system characterized by discrete spaces and states. The states of all cells are updated simultaneously according to the transition rules. The state of a cell at any given moment depends on that cell and its neighborhood at the previous moment. The formula is as follows:

$${S}_{t+1}=f\left({S}_{t},N\right)$$
(4)

where S represents the cell state, N represents the cell neighborhood, t and t + 1 represent successive moments in time, and f is the local transition rule of the cells.
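The synchronous update in Eq. (4) can be sketched with a toy transition rule. The majority rule over a 3 × 3 Moore neighborhood used below is an assumption chosen only to make the idea concrete; it is not the transition rule applied by the CA–Markov software, and the function name `ca_step` is hypothetical.

```python
from collections import Counter


def ca_step(grid):
    """One synchronous CA update with a toy majority rule.

    Each cell adopts the most frequent state in its 3x3 Moore
    neighborhood (itself included). If there is no unique most
    frequent state, the cell keeps its current state.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # all cells read the old grid
    for r in range(rows):
        for c in range(cols):
            neighborhood = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(neighborhood)
            best = max(counts.values())
            winners = [state for state, k in counts.items() if k == best]
            if len(winners) == 1:
                new_grid[r][c] = winners[0]
    return new_grid


# Toy map: one green-space cell (G) surrounded by business cells (B).
start = [
    ["B", "B", "B"],
    ["B", "G", "B"],
    ["B", "B", "B"],
]
after_one_step = ca_step(start)
```

Under this toy rule the isolated green-space cell is converted to business after one step, which shows how strongly a neighborhood-driven rule can pull a cell toward the state of its surroundings.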

The Markov model is a stochastic model in the time domain in which the state of an event at time t + 1 depends solely on its state at time t. Its essence is to predict the probability of events. The transition matrix is a numerical representation of the likelihood of an event transitioning from its state at time t to each possible state at time t + 1 and serves as an important quantitative basis for the simulation and prediction results under the Markov model. This formula is expressed as follows:

$${S}_{t+1}={P}_{ij}\times {S}_{t}$$
(5)

Pij is a state transition matrix, which reflects the probability of each type of urban functional area transforming into other types, calculated through historical data. The formula is as follows:

$${P}_{ij}=\left[\begin{array}{cccc}{P}_{11}& {P}_{12}& \dots & {P}_{1n}\\ {P}_{21}& {P}_{22}& \dots & {P}_{2n}\\ \dots & \dots & \dots & \dots \\ {P}_{n1}& {P}_{n2}& \dots & {P}_{nn}\end{array}\right]$$
(6)

Here \(0<{P}_{ij}<1\) (i, j = 1, 2, …, n), where n represents the number of functional types, \({P}_{ij}\) represents the probability of initial type i transitioning to type j, i is the row of the matrix, and j is the column of the matrix. Each row of the matrix gives the probabilities that functional type i transitions to each functional type, so the entries of each row sum to 1.
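As a minimal sketch of how a transition matrix of the form in Eq. (6) can be estimated from two maps of the same grids, the following Python code cross-tabulates the labels at the two dates and normalizes each row. The function names, the flat-list input layout, and the row-vector convention used in `project` are assumptions for illustration; the study obtained its matrix from the Markov module of IDRISI.

```python
def transition_matrix(before, after, states):
    """Estimate P[i][j] = share of cells in state i at the first date
    that are in state j at the second date (rows sum to 1)."""
    index = {state: k for k, state in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(before, after):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state absent at the first date gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def project(shares, matrix):
    """Apply Eq. (5) with a row-vector convention:
    next_share[j] = sum_i shares[i] * P[i][j]."""
    n = len(matrix)
    return [sum(shares[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Toy labels for five grids at two dates (B = business, R = residential,
# G = green space); not data from the study.
states = ["B", "R", "G"]
map_t0 = ["B", "B", "R", "R", "G"]
map_t1 = ["B", "R", "R", "R", "B"]

P = transition_matrix(map_t0, map_t1, states)
shares_t1 = [map_t1.count(s) / len(map_t1) for s in states]
shares_t2 = project(shares_t1, P)
```

The projected shares describe only how much of each functional type is expected at the next step; where those types are located is decided by the CA component.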

The CA–Markov model combines the two models: the Markov chain estimates how much of each functional type is expected at the next time step, and the CA allocates those quantities in space. It thus leverages the ability of CA to simulate spatial changes in complex systems and the ability of the Markov model to make quantitative predictions over long time series. This study employed the CA–Markov model to predict changes in functional areas. After identifying the functions of the corresponding grids using the POI data, the data were rasterized, reclassified, and ultimately imported into the IDRISI software for predictive analysis (https://clarklabs.org/)37. The prediction process is illustrated here with one experimental set. Using urban functional area data from 2017 and 2020, we calculated the probability of each functional type transforming into other types, which yielded the functional area transition matrix for 2017–2020. Using the 2020 functional area data as the base map, we then predicted the distribution of functional areas for 2023. The predicted results were compared with the actual functional area distribution of 2023 to obtain the kappa coefficient, which indicates the accuracy of the CA–Markov model in predicting functional areas.

Designing three sets of experiments

After identifying the distribution of urban functional areas within the study area from 2015 to 2023, three sets of experiments were designed (Table 2) to comprehensively test the predictive ability of the CA–Markov model for urban functional areas. Initially, the forecasted outcomes for the functional areas in 2023 were juxtaposed with actual data to determine the overall accuracy, followed by the application of an identical methodology to determine the predictive accuracies for 2022 and 2021. The robustness of the CA–Markov model was confirmed based on the predictive accuracy obtained for the three experimental groups. Subsequently, the forecast data were imported into ArcGIS Pro for comparative spatial accuracy analysis.

Table 2 Three sets of experiments to verify the model’s ability.

Results

Results of urban functional areas identification

Due to the mixed phenomena of functional areas within the study region, there were different functional distributions within the grid units. For instance, in some grids, one function’s actual contribution is absolutely dominant; in others, two functions’ contributions are dominant; and in some cases, the differences in the actual contributions among multiple functions are not significant, presenting a comprehensive distribution phenomenon. Based on this, the study first filtered out areas without POI, designating them as no-data zones. In the remaining grid units, when the actual contribution value of a certain function within a grid exceeded 50%, the functional attribute was designated as the attribute of the functional area. When the actual contribution of no function within a grid exceeded 50%, two functions with actual contributions between 20 and 50% were designated as a combined functional area. For example, if the actual contributions of commercial and residential functions within a grid were between 20 and 50%, the grid was classified as a commercial-residential functional area. The other cases were classified as integrated functional areas6,38. Following this method, we identified the distribution of mixed functional areas within the Nanjing old city for the years 2017, 2020, and 2023, as specifically shown in Fig. 3.
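The thresholding rule described above can be written as a short Python sketch. It assumes that the actual contributions within a grid have already been normalized to shares that sum to 1, that the 20% and 50% bounds are inclusive for the mixed class, and that a grid with three or more functions in the 20 to 50% band is treated as integrated; none of these details is stated explicitly in the text, and the function name `classify_unit` is hypothetical.

```python
def classify_unit(shares):
    """Assign a functional label to one grid.

    shares: dict mapping a function name to its share of the grid's
    actual contribution (assumed to sum to 1); empty for grids
    without POIs.
    """
    if not shares:
        return "no data"
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    top_function, top_share = ranked[0]
    if top_share > 0.5:
        return top_function  # single dominant function
    mid_range = [name for name, share in ranked if 0.2 <= share <= 0.5]
    if len(mid_range) == 2:
        return "-".join(mid_range)  # mixed area of two functions
    return "integrated"


# Toy shares, not taken from the study.
label_single = classify_unit({"business": 0.7, "residential": 0.3})
label_mixed = classify_unit({"business": 0.45, "residential": 0.40, "traffic": 0.15})
label_integrated = classify_unit({"business": 0.3, "residential": 0.3,
                                  "traffic": 0.3, "industrial": 0.1})
label_empty = classify_unit({})
```

The mixed label lists the two functions in descending order of share, which is a presentation choice of this sketch only.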

Figure 3

Distribution of mixed functional areas in Nanjing old city. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Verification of identification results

After identifying the distribution of functional areas, this paper took the identification results of the functional areas in 2023 as an example, referencing the studies by Ding to validate the accuracy of identification39. The study randomly selected 40 grids, judged the real attributes of functional areas based on Amap, and used a conformity scoring method to evaluate the accuracy of the identification results. A perfect score of 3 points indicates full compliance, and 0 points indicate complete non-compliance. If a single functional area is identified as a mixed functional area, it scores 2; if it is marked as a comprehensive functional area, it scores 1. If a mixed functional area is identified as containing a single functional area from that mix, or another mixed functional area containing a function from that mix, it scores 2; if it is marked as a comprehensive functional area, it scores 1. If a comprehensive functional area is identified as a single functional area or a mixed functional area, it scores 1.

The formula for calculating accuracy is:

$$a=\frac{\sum_{i=1}^{n} {x}_{i}}{\sum_{i=1}^{n} {X}_{i}}\times 100\text{\%}$$
(7)

In the formula, n represents the sample size, \({X}_{i}\) is the full score of sample i (3 points), and \({x}_{i}\) is the actual score of sample i. The verification results of functional area accuracy are shown in Table 3. Ultimately, by calculation, the overall accuracy of identifying the functional areas in the old city of Nanjing reached 80.83%, indicating the effectiveness of this method in identifying urban functional areas.
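Equation (7) can be checked with a few lines of Python. With 40 sampled grids and a full score of 3 points each, the denominator is 120, so the reported 80.83% corresponds arithmetically to a total score of 97. The individual score distribution below is hypothetical and chosen only to reach that total; it is not the content of Table 3.

```python
def identification_accuracy(scores, full_score=3):
    """Eq. (7): sum of actual scores over sum of full scores, in percent."""
    return 100.0 * sum(scores) / (full_score * len(scores))


# Hypothetical distribution of conformity scores for 40 sampled grids:
# 25 grids score 3, 9 score 2, 4 score 1, and 2 score 0 (total 97).
sample_scores = [3] * 25 + [2] * 9 + [1] * 4 + [0] * 2
accuracy = identification_accuracy(sample_scores)
```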

Table 3 Accuracy evaluation of urban functional area.

Because the distribution of functional areas is complex and the CA–Markov model simulates each cell as having a single use, we also converted each mixed functional area into a single functional area by assigning it the function with the highest proportion. For example, if 60% of a business-public service functional area was business, the area was allocated as a business functional area. Using this method, we identified the distribution of single functional areas in Nanjing old city, namely the primary function within each grid, as shown in Fig. 4.
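The reduction of a mixed area to its primary function amounts to taking the function with the largest share. A minimal sketch follows, assuming per-grid shares stored in a dictionary and a hypothetical function name `dominant_function`; how ties are broken is not described in the text, and this sketch simply keeps the first of the tied functions.

```python
def dominant_function(shares):
    """Return the function with the highest share in a grid.

    shares: dict mapping a function name to its share; an empty dict
    marks a grid without POIs.
    """
    if not shares:
        return "no data"
    return max(shares, key=shares.get)


# The example from the text: a business-public service area that is 60% business.
primary = dominant_function({"business": 0.6, "public services": 0.4})
```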

Figure 4

Distribution of single functional areas in Nanjing old city. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Verification of overall accuracy and robustness in urban functional area prediction

After the functional areas were identified, they were imported into IDRISI for reclassification to facilitate further processing. Based on the functional area data from 2017 and 2020, a functional area transition matrix for 2017–2020 was generated using the Markov module to obtain the probability of converting each function type into other functions. Next, we used the functional area data from 2020 combined with the transition matrix to predict the functional area distribution in 2023. This study evaluated the precision of the predictions using the kappa coefficient, which typically ranges from 0 to 1 (negative values, indicating agreement worse than chance, are possible but rare). The higher the value, the more precise the results. A value between 0 and 0.2 indicates the lowest accuracy, a value between 0.4 and 0.6 indicates an acceptable result, a value between 0.61 and 0.8 indicates higher precision, and a value greater than 0.8 indicates very high accuracy. We compared the predicted results for 2023 with the actual situation to obtain a kappa value representing the overall accuracy.
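For readers unfamiliar with the kappa coefficient, the following sketch computes the standard Cohen's kappa from two equally long lists of grid labels. It is an illustration under that assumption; the software may report additional kappa variants, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[k] * predicted_counts.get(k, 0) for k in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps consist of one identical class
    return (observed - expected) / (1.0 - expected)


# Toy labels for six grids (B = business, R = residential, G = green space).
actual_map = ["B", "B", "R", "R", "G", "G"]
predicted_map = ["B", "B", "R", "G", "G", "G"]
kappa = cohen_kappa(actual_map, predicted_map)
```

In this toy case five of six grids agree (observed agreement 5/6) and chance agreement is 1/3, giving a kappa of 0.75, which would fall in the "higher precision" band described above.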

Overall accuracy and robustness of mixed functional areas

This study predicted the mixed functional areas for 2023 based on the distributions of 2017 and 2020, with the results shown in Fig. 5. By comparing this with the actual situation of mixed functional areas in 2023, we obtained the overall accuracy of the prediction. We predicted the mixed functional areas for 2022 and 2021 using the same method and obtained their respective predictive accuracies, as shown in Table 4. Overall, the accuracy of the CA–Markov model’s predictions for mixed functional areas in 2021, 2022, and 2023 tended to be stable, confirming its robustness for predicting mixed functional areas.

Figure 5

Prediction of mixed functional areas in Nanjing old city in 2023. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Table 4 Prediction accuracy of three groups of years in mixed functional areas.

Overall accuracy and robustness of single functional areas

This study predicted the single functional areas for 2023 based on the distributions in 2017 and 2020, with the results shown in Fig. 6. We obtained the overall accuracy of the prediction by comparing this with the actual situation of single functional areas in 2023. We predicted the single functional areas for 2022 and 2021 using the same method and obtained their respective predictive accuracies, as shown in Table 5. Overall, the accuracy of the CA–Markov model’s predictions for single functional areas in 2021, 2022, and 2023 tended to be stable, confirming its robustness in predicting single functional areas.

Figure 6

Prediction of single functional areas in Nanjing old city in 2023. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Table 5 Prediction accuracy of three groups of years in single functional areas.

Spatial accuracy of model

In addition to validating the kappa value and robustness of the model, this study further verified its spatial accuracy. Taking the predicted distribution of functional areas for 2023, based on the functional areas from 2017 and 2020, as an example, we imported the predicted results into ArcGIS Pro and conducted a differential analysis between the actual distribution of functional areas in 2023 and 2020 to obtain the actual changes in functional areas. Subsequently, a differential analysis was performed between the predicted distribution of functional areas in 2023 and the actual distribution in 2020, comparing the actual changes with the predicted changes to further analyze the accuracy of the CA–Markov model in predicting functional areas.

We first compared the mixed functional areas, with specific results shown in Fig. 7. Yellow blocks represent grids that are identical in the two maps being compared, indicating areas where the functions have not changed, and blue blocks represent grids that differ, showing the changes in the functional areas between the 2 years. It is visually apparent from the figure that the blue area in the right panel is much smaller than that in the left panel, indicating that the predicted changes in the mixed functional areas from 2020 to 2023 are much smaller than the actual changes over the same period. Further analysis gave a coincidence rate of 0.52 between the actual distributions of mixed functional areas in 2023 and 2020, whereas the coincidence rate between the predicted distribution for 2023 and the actual distribution in 2020 was 0.81. In other words, 48% of the mixed functional areas actually changed from 2020 to 2023, whereas the model predicted a change of only 19%. The prediction therefore relies heavily on the original mixed-use pattern and substantially underestimates the actual change, which suggests that the model has low accuracy for changes in mixed functional areas and that such predictions are unreliable.
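The coincidence rate used here is the share of grids whose label is identical in two maps, and the change rate is its complement. A minimal sketch, with hypothetical function and variable names and toy labels that are not data from the study:

```python
def coincidence_rate(map_a, map_b):
    """Share of grids carrying the same label in both maps."""
    pairs = list(zip(map_a, map_b))
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy labels for ten grids (B = business, R = residential,
# G = green space, T = traffic).
actual_2020 = ["B", "B", "B", "R", "R", "R", "G", "G", "T", "T"]
actual_2023 = ["B", "R", "B", "R", "B", "R", "G", "B", "T", "R"]
predicted_2023 = ["B", "B", "B", "R", "R", "R", "G", "B", "T", "T"]

actual_change = 1 - coincidence_rate(actual_2020, actual_2023)
predicted_change = 1 - coincidence_rate(actual_2020, predicted_2023)
```

Comparing `actual_change` with `predicted_change` in this way indicates whether a model reproduces the amount of change; it does not by itself show that the changed grids are the right ones, which is why the agreement between predicted and actual maps is assessed separately.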

Figure 7

Comparison between the actual results of mixed functional areas in 2023 and 2020 (a) and comparison between the predicted results of mixed functional areas in 2023 and the actual results of mixed functional areas in 2020 (b). The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Using the same method, we compared the single functional areas, with specific results shown in Fig. 8. As in the previous analysis, yellow blocks represent grids that are identical in the two maps being compared, indicating areas where the functions have not changed, while blue blocks represent grids that differ, showing the changes in functional areas between the 2 years. The difference between the two panels cannot be discerned by eye; further analysis showed that the coincidence rate between the actual distributions of single functional areas in 2023 and 2020 was 0.69, whereas the coincidence rate between the predicted distribution for 2023 and the actual distribution in 2020 was 0.73. In other words, 31% of the single functional areas actually changed from 2020 to 2023, and the model predicted a change of 27%; the two values are close, indicating that the model can approximate the magnitude of change in single functional areas.

Figure 8

Comparison between the actual results of single functional areas in 2023 and 2020 (a) and comparison between the predicted results of single functional areas in 2023 and the actual results of single functional areas in 2020 (b). The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

In order to explore the specific situation of model prediction of changes in single functional areas, we calculated the proportion of major function prediction errors in single-function areas, as shown in Fig. 9. We found that the highest accuracy in prediction was for business function areas, with an accuracy of around 0.75, while the lowest was for industrial function areas, with an accuracy of around 0.60; the accuracies for the other four types of function areas were around 0.7, and overall, there was no obvious trend of dispersion in the proportion of prediction errors for the six types of function areas.
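A per-function accuracy of this kind can be computed by grouping grids by their actual function and counting how many were predicted correctly. The sketch below assumes two flat lists of labels and a hypothetical function name; the toy labels are not data from the study.

```python
from collections import defaultdict


def per_class_accuracy(actual, predicted):
    """For each actual function type, the share of its grids that the
    prediction labeled correctly."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        hits[a] += int(a == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels (B = business, I = industrial, R = residential).
actual_labels = ["B", "B", "B", "B", "I", "I", "R", "R"]
predicted_labels = ["B", "B", "B", "R", "I", "B", "R", "R"]
accuracy_by_function = per_class_accuracy(actual_labels, predicted_labels)
```

The prediction error ratio for each function is then one minus the corresponding accuracy.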

Figure 9

Prediction error ratio of single functional area.

To further investigate the prediction errors for each type of function, we mapped the spatial prediction results for 2023 in ArcGIS Pro, as shown in Fig. 10. In this figure, gray grids represent cases where the prediction for a functional area was incorrect, while other colors represent correct predictions. Taking the commercial function area as an example, gray grids indicate instances where the true situation was a commercial function area but the prediction was incorrect, whereas red grids indicate instances where the true situation was a commercial function area and the prediction was correct. The results show that the incorrectly predicted grids of each function type did not exhibit obvious spatial clustering.

Figure 10

Single functional area prediction errors in 2023. The figure was created using ArcGIS 10.2 (https://www.esri.com/en-us/home).

Discussion

Research results compared to other studies

Many scholars have currently focused on the prediction of urban public areas, mainly covering aspects such as urban public facilities, the utilization of urban public spaces40, the vitality of urban public spaces41, urban traffic flow42, urban population growth43, urban land use changes32, and urban green space demand27. In contrast, urban function is a key element that reflects urban morphology and influences urban structure, and the development of urban functional area is closely related to urban spatial structure. This study focused on the analysis of functional area identification and distribution characteristics using POI data, attempting to explore the predictive ability of the CA–Markov model for functional areas, and to make reasonable functional zoning and layout decisions, including guiding the layout of various businesses such as catering and entertainment, optimizing urban public spaces, open spaces, and green spaces, and assisting land use planning to improve spatial efficiency. Through this, reasonable urban spatial structures can be created to address urban issues and achieve sustainable development44,45.

Currently, scholars mainly rely on the CA–Markov model to study the evolution patterns and trends of geographic spatial systems, predict future spatial patterns and states, and apply it primarily in areas such as land use change simulation28, ecosystem evolution analysis46, urban expansion simulation2, resource management and environmental protection47. However, fewer scholars have paid attention to the model’s prediction of urban functional areas. The CA–Markov model combines the spatial self-organizing properties of cellular automata with the time series forecasting ability of Markov chains, enabling the model to simultaneously handle complex dynamic changes in space and time30. This characteristic makes the model suitable for addressing issues such as urban functional areas, where spatial distribution evolves over time. Based on identifying functional areas, this study delved into the CA–Markov model’s ability to predict functional areas, thus expanding the model’s applicability.

This study took the old city area of Nanjing as an example, using POI data to identify the distribution of urban functional areas. On this basis, it examined the capability of the CA–Markov model to predict functional areas over several years. The results show that the model reliably predicts the main functions within the functional areas, which is consistent with the findings of Zheng’s study38. Unlike previous studies, this research provided a more comprehensive explanation of the model’s capability to predict urban functional areas through model accuracy, robustness validation, comparative analysis of predicted changes, and prediction outcomes for specific functional areas. We found that the model is robust in predicting functional areas; however, its predictions for mixed functional areas are inaccurate, and it currently cannot support predictions for mixed functional areas. Additionally, this study specifically analyzed the probability of prediction errors for each type of functional area and the spatial occurrences of these errors.

Feasibility analysis

The methodology of this study utilized POI data from multiple time points, covering a span of 9 years from 2015 to 2023; validation with multi-year data ensures the universality of the research findings, enabling a more comprehensive explanation of the CA–Markov model’s predictive capabilities for functional areas and facilitating the rational allocation of urban functions.

For the identification of urban functional areas, this study conducted accuracy verification and obtained an identification accuracy of 0.81 after comparison with the real functional areas. We believe that the identified functional area data can support the exploration of the model’s predictive capabilities.

To assess the applicability of the model in predicting urban functional areas, this study first verified it through robustness and kappa values, both of which reflect the accuracy of the model predictions to some extent but do not fully explain the strengths of the model. Consequently, this study further analyzed the spatial accuracy of the CA–Markov model predictions by comparing the actual distribution of functional areas with the predicted results and by examining the predictions for each type of functional area in detail, thereby offering a more comprehensive interpretation of the model’s ability to predict urban functional areas.
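For readers unfamiliar with the kappa value mentioned above, the following minimal sketch computes Cohen's kappa, which measures agreement between predicted and actual class labels after correcting for agreement expected by chance. The function name and the six-cell example are hypothetical and are not data from this study.

```python
from collections import Counter

def cohen_kappa(actual, predicted):
    """Cohen's kappa for two equally long sequences of class labels."""
    n = len(actual)
    # Observed agreement: share of cells where prediction matches reality.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: based on how often each class occurs in each sequence.
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    expected = sum(actual_freq[k] * predicted_freq[k] for k in actual_freq) / (n * n)
    if expected == 1.0:
        # Both sequences contain a single identical class; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels for six cells: "B" business, "G" green space, "R" residential.
actual = ["B", "B", "G", "R", "R", "G"]
predicted = ["B", "B", "B", "R", "B", "G"]
kappa = cohen_kappa(actual, predicted)
print(round(kappa, 3))
```

In this toy case four of six cells agree, but one third of the agreement would be expected by chance, so kappa is about 0.5. This shows why kappa is typically lower than raw accuracy and why, as noted above, it summarizes overall agreement without showing where the errors occur.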

Limitations and future directions

To explore the scientific validity and integrity of the CA–Markov model in predicting urban functional areas, this study set different time spans for validation and conducted three sets of tests for each of the 1-year, 2-year, and 3-year spans, as shown in Table 6. We calculated the average accuracy for the three time spans and found the values to be similar; we therefore selected a 3-year time span for the prediction study. Additionally, because the data for this study spanned the 9 years from 2015 to 2023, a time span of 4 years or more would not allow three sets of control experiments to be formed, making the validation results less representative. In future research, as more POI data are collected, it will be possible to explore predictions with longer time spans.

Table 6 Model accuracy results for different time spans.
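The constraint on the time span can be checked with a short enumeration. The sketch below assumes that each validation set consists of two base years and one target year, all equally spaced, which is the usual CA–Markov setup but is an assumption about this study's design; the function name `validation_sets` is hypothetical.

```python
def validation_sets(first_year, last_year, span):
    """List (base_1, base_2, target) year triples spaced `span` years apart."""
    return [
        (y, y + span, y + 2 * span)
        # The target year y + 2 * span must not exceed the last available year.
        for y in range(first_year, last_year - 2 * span + 1)
    ]

for span in (1, 2, 3, 4):
    sets = validation_sets(2015, 2023, span)
    print(span, len(sets), sets)
```

Under this assumption, a 3-year span yields exactly three sets (2015, 2018, 2021; 2016, 2019, 2022; 2017, 2020, 2023), whereas a 4-year span yields only one (2015, 2019, 2023), consistent with the reasoning given above.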

Furthermore, when predicting single functional areas, we found that the predicted results still required improvement compared with the actual situation. In reality, a green space function is sometimes surrounded by business functions. In the model’s prediction, when the surrounding area is entirely business, the green space in the middle also has a high probability of being converted into a business function, implying that the surrounding business functions threaten the existence of the green space. This follows the rules of the model but does not fully match reality: if a green space within a business area is a park or plaza, it is likely to persist rather than be converted merely because it is surrounded by business functions. Therefore, there is room for further improvement in the accuracy of the CA–Markov model’s predictions for functional areas.
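The neighborhood effect described above can be reproduced with a deliberately simplified cellular automaton rule. The sketch below is a toy illustration, not the rule used in this study or in any particular software: it scores each candidate class by multiplying a Markov transition probability by the share of the eight surrounding cells that already belong to that class. The function names, the transition probabilities, and the 3x3 grid are all hypothetical.

```python
def neighbour_share(grid, r, c, cls):
    """Fraction of the 8 surrounding cells (Moore neighbourhood) that are cls."""
    hits = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            # Skip the centre cell itself and anything outside the grid.
            if (dr or dc) and 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                total += 1
                hits += grid[rr][cc] == cls
    return hits / total if total else 0.0

def next_state(grid, r, c, P, classes):
    """Toy rule: pick the class with the highest probability-times-share score."""
    current = grid[r][c]
    scores = {k: P[current][k] * neighbour_share(grid, r, c, k) for k in classes}
    best = max(scores, key=scores.get)
    # Keep the current class if no candidate has any support.
    return best if scores[best] > 0 else current

# Hypothetical transition probabilities: green space usually stays green.
P = {"B": {"B": 0.9, "G": 0.1}, "G": {"B": 0.2, "G": 0.8}}
# A single green cell ("G") enclosed by business cells ("B").
grid = [["B", "B", "B"], ["B", "G", "B"], ["B", "B", "B"]]
print(next_state(grid, 1, 1, P, ["B", "G"]))
```

Even though the hypothetical probability of green space remaining green is 0.8, the enclosed cell is converted to business because none of its neighbors is green. A rule of this kind has no way of knowing that the cell is a park or plaza, which is the mismatch with reality noted above.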

The urban spatial structure is shaped by factors such as population distribution, land use, and transportation systems48,49,50, which interact with each other to determine the form and function of the city. Urban functional areas, as an essential component of urban spatial structure, reflect the primary uses and characteristics of different regions within a city14, such as business, residential, and industrial areas. Therefore, research on urban functional areas must consider information on population dynamics, land use patterns, and traffic flow. Future studies could use models incorporating factors like population shifts, employment changes, traffic volume, road accessibility, and changes in land use to comprehensively analyze the characteristics of changes in urban functional areas, develop suitability atlases that affect the transformation of each functional area, and make more reasonable and reliable predictions on the development of urban functional areas to promote sustainable and efficient city operations.

This study has certain limitations. First, some POI that exist in reality, such as sculptures in parks that reflect urban functions, were omitted from the Amap data in some years but recorded in others. Because POI in green park spaces exist objectively every year, such omissions could affect the judgment of POI functional weight and thereby influence the identification and prediction results of functional areas for each year. Nevertheless, Amap is China’s leading provider of digital map content, navigation, and positioning services and is widely recognized and used in the industry, which provides a certain guarantee for the reliability of its data. Given that such omissions are unlikely and that the overlooked POI account for a negligible proportion of the total number of POI across multiple years, we believe that they have little impact on the identification of functional areas and the prediction results. In future research, as technology advances, more accurate POI data could further improve the conclusions of this study.

Conclusion

Research on urban functional areas is crucial for promoting sustainable urban development. Identifying urban functional areas and predicting their changes help in understanding the development of urban functions, thereby enabling the rational allocation of urban functional areas, advancing the rationalization of urban spatial structures, and helping to solve urban problems. This study first identified the distribution of urban functional areas in the old city area of Nanjing over several years and then used the CA–Markov model to predict their changes. The results show that, for both mixed and single functional areas, the model’s predictions over several years tend to be stable, and the accuracy of the predictions over many years indicates the robustness of the model in predicting urban functional areas. For mixed functional areas, however, the model’s predictions rely largely on the distribution in the base years used for prediction, leading to inaccurate results; the model is therefore not yet applicable to simulating and predicting mixed functional areas. For single functional areas or the primary function within an area, the degree of change predicted by the model was close to the actual degree of change, so the results can serve as a reference.

In summary, this study proposed the identification of functional areas based on POI data and explored the capacity of the CA–Markov model to predict changes in these areas, presenting a standardized and scalable workflow. This approach addresses the challenge of functional area identification in urban functional area research and facilitates the understanding and rational arrangement of urban spatial structures and functional layouts to enhance spatial utilization efficiency. Moreover, within the context of sustainable development, the approach provides fresh perspectives for forecasting and researching future urban functional areas.