Introduction

As cities continue to grow in size, density, and population, urban green spaces play an increasingly critical role in ensuring urban sustainability and livability. At the same time, as global climate change severely affects urban environments, urban greening policies are becoming increasingly important for improving the quality of urban ecosystems and living environments1,2. Effective greening policies can not only improve urban air quality and mitigate the urban heat island effect, but also increase public green space, thereby significantly improving residents’ quality of life3. The functions, benefits, and protection of urban green spaces have received great attention worldwide2. Many countries have issued vegetation protection policies, such as the Biodiversity Offset Policy in Australia4 and the Urban and Community Forestry Act in the United States5. Since the “reform and opening up”, China has enacted a series of policies and regulations to strengthen urban greening and ecological construction and management, such as the Regulations on Urban Greening in China and the Shanghai Municipal Greening Regulations6. However, the design and implementation of urban greening policies face numerous challenges, such as policies being disconnected from reality, insufficient consideration of economic feasibility, and poor cross-departmental coordination in practice7,8. These obstacles not only limit the achievement of policy objectives but also highlight deficiencies in the scientific rigor of policy design.

In-depth analysis of the content and objectives of greening policies is crucial to improving the rationality and adaptability of these policies. Greening policy is a subset of policies designed to enhance the ecological environment, address climate change, and ensure ecological security7. Its assessment adopts the fundamental methodologies and frameworks of policy assessment; therefore, policy evaluation research provides theoretical and practical guidance for assessing greening policies. Policy evaluation is a systematic analysis of the policy system through scientific methods9, and can reveal the formulation motivation, design framework, and implementation process of policies, providing an empirical basis for theoretical policy research10,11. According to Varone et al.’s Handbook of Public Policy Evaluation, policy evaluation research now extends beyond traditional analyses of content, implementation, and effects to an integrated framework that includes goal achievement, social impacts, and democratic accountability9,12,13. For instance, policy evaluation now emphasizes not only policy effects but also resource allocation efficiency and long-term impacts on social well-being14. These advancements provide theoretical and methodological support for analyzing policies, including greening policies.

Policy content analysis is an important part of policy evaluation, and helps understand the intention, scope, and expected outcomes of a policy through in-depth interpretation of the policy documents, which is crucial for the evaluation of greening policies. Traditional approaches for policy content analysis mainly rely on qualitative interviews, literature reviews, or expert reviews to analyze the content and impacts of policies15,16. For example, Schwartz et al.16 delved into the green infrastructure policies of Ambos Nogales through policy categorization, stakeholder interviews, and focus group discussion. They also conducted a content analysis and an objective review of the qualitative data collected to identify the barriers for the implementation and regulatory opportunities, and explored the strategies to enhance urban greening. However, these approaches are often time-consuming and heavily dependent on personal experiences, while policy documents often involve complex legal language and multi-layered information structures, which together pose great challenges to traditional approaches when dealing with large-scale policy texts.

With the development of technology, text mining has been widely used in the field of policy analysis, providing an automated and efficient way to process and parse large amounts of textual data15. With Natural Language Processing (NLP) techniques, text mining allows quick identification of topic distribution17,18,19 and extraction of keywords20,21, as well as the determination of affective tendencies22,23 and structural patterns, thereby revealing the trends and potential impacts of policy changes. It is currently widely used in public policy evaluation13,24,25,26, social science studies27,28,29, and market research30,31,32, and is employed to capture the main topics of policy texts and the public’s attitudes toward policies. For example, Park and Yong15 assessed the congruence between South Korea’s energy policy and its official press releases using word frequency analysis and latent semantic analysis, revealing a direct link between government communiqués and energy policy. Li et al.17 analyzed the policy documents related to new energy vehicles by applying Latent Dirichlet Allocation (LDA) thematic modeling and econometric methods, revealing their significant impact on the market promotion of new energy vehicles. Dai et al.33 examined the policy texts of the bike-sharing industry by using word frequency and clustering analysis, identified three primary and nine secondary indicators, and revealed the governance challenges and government responses. Wu et al.22 analyzed Chinese residents’ opinions on mandatory waste separation by sentiment analysis and LDA thematic modeling, providing a solid basis for improving policies associated with waste separation. These approaches have allowed researchers to effectively extract the core information from complex policy documents and improve their understanding of the policy content, thereby providing robust analytical tools for the analysis of greening policies.

Current research on greening policies has been mostly focused on the management practices of urban green spaces, policy evaluation of green infrastructure, and impacts of policies on greening effectiveness3,16,34,35,36. For example, Baycan-Levent and Nijkamp34 compared green space policies in European cities through an extensive expert questionnaire. Feng et al.36 employed trend estimators and Mann–Kendall tests to analyze the changes in vegetation cover in China to explore how government policies promote greening. Bush3 systematically evaluated local government greening policies in Melbourne, Australia, by using a transition management approach. Kwartnik-Pruc and Trembecka35 analyzed the data provided by Central Statistical Office of Poland to understand the urban distribution and types of public green spaces, focusing on the status of public green space policy implementation in the city of Krakow. However, most greening-related studies still mainly rely on traditional content analysis methods, such as questionnaires and case studies, which may yield biased interpretations due to researchers’ varying experiences or perspectives, thereby constraining the scope and depth of policy analysis. In addition, there is still a noticeable gap in research on how policies guide urban greening efforts. More importantly, while significant progress has been made in text mining to deal with content analysis of policy texts in other fields, these studies have failed to provide a comprehensive and systematic analysis. Current research on policy content assessment is mainly limited to some aspects such as topic classification and word frequency analysis, and there have been no systematic approaches that can comprehensively assess the deeper meaning of policy texts. In addition, there is a lack of real-time tracking and in-depth interpretation of policy dynamics, as well as intuitive and dynamic interactive visualization tools for policy makers.

In order to fill the research gap and address these challenges, this study constructs a dynamic, multi-dimensional policy analysis framework by introducing AI big models and text mining techniques, with the aim to systematically analyze the content of urban greening policies. The framework was then innovatively used to analyze different levels and spatiotemporal dimensions of greening policies by taking Wuhan City as an example: exploring the historical thematic evolution of greening policies at the macro level, mining the annual topic distribution and hot keywords at the meso level, and assessing regional greening indicators at the micro level. This study contributes to greening policy evaluation by: (1) Providing policymakers with a quantitative foundation for decision-making by identifying priority areas for greening resource allocation through thematic classification and indicator extraction. (2) Supporting dynamic policy adjustments to address emerging environmental needs by tracking real-time policy trends. (3) Offering visualization tools, such as word cloud analysis and geospatial displays of greening indicators, to enhance policymakers’ and the public’s understanding of policy impacts, thereby improving decision-making transparency and public engagement. This framework not only enhances policymakers’ understanding of policy design and objectives but also improves the timeliness and precision of policy adjustments, ultimately fostering the effective implementation and continuous improvement of urban greening policies.

To achieve these objectives, this study first details the framework’s design and technical implementation using the Wuhan case study in the “Materials and methods” section. It then evaluates the application outcomes in the “Results” section and discusses limitations and future directions in the “Discussion” section, thereby offering a comprehensive analysis from theory to practice.

Materials and methods

Framework for intelligent analysis of greening policy texts based on text mining and AI big models

In this study, we developed an intelligent analysis framework for greening policy texts based on text mining and AI big models, aiming to improve the rationality and practicality of policy analysis. The framework consists of the following seven main components (Fig. 1). 1) Automated timed data collection and preprocessing, which automatically collects greening policy texts from government gazettes and related agencies to ensure real-time data updates and consistency for subsequent analysis. 2) Policy keyword extraction, which employs NLP techniques to extract keywords and phrases from the texts to reveal the core content of the policy. 3) Policy topic categorization, which automatically identifies and classifies the main topics in policy documents with topic modeling techniques so that the policy focus can be understood quickly. 4) Extraction of greening core indicators, which identifies and extracts key greening planning indicators from the policy documents, such as green space area and greenway construction, and is key to assessing the effectiveness of policy implementation. 5) Policy AI interpretation, which uses AI big models to interpret the policy text in depth, analyze its main goals, and predict the possible outcomes of policy implementation. 6) Real-time policy tracking, which collects dynamic data related to greening policies to provide real-time feedback to policy makers. 7) Visualization of the intelligent analysis results, which displays all analysis results through user-friendly interfaces, including charts, timelines, and maps, to help policy makers and researchers intuitively understand policy trends and specific objectives.
Overall, the framework integrates seven functions to create a multi-level system for analyzing urban greening policies across macro, meso, and micro dimensions, addressing the time-consuming nature and interpretive biases of traditional methods in processing large-scale policy texts. These functions are interconnected through multidimensional analysis logic: (1) automated data collection and preprocessing establish the data foundation; (2) keyword extraction and thematic categorization identify macro trends and meso-level priorities of greening policies; (3) core indicator extraction and AI-driven interpretation assess micro-level indicators and policy outcomes; and (4) visualization integrates multi-level results into intuitive decision-making insights. This multi-layered design ensures comprehensive and systematic analysis while addressing specific policy needs, thereby enhancing the scientific rigor, timeliness, and practicality of policy formulation.

Fig. 1 Framework for intelligent analysis of greening policy texts based on AI big models and text mining.

Study area

Wuhan was chosen as the study area because of its distinctive urban characteristics and environmental challenges. As an important industrial and commercial center in central China, Wuhan is characterized by abundant water resources and vast green spaces, but it also faces great environmental pressure due to rapid urbanization37. In recent years, the urban heat island effect and air pollution have become increasingly serious38,39. Official data indicate that Wuhan’s average annual temperature increased significantly from 1951 to 2018, with a warming rate of 0.30 °C per decade40,41. The average annual PM2.5 concentration in 2023 was 38 μg/m³, exceeding the national ambient air quality secondary standard by 9% and rising 8.6% from 202242, highlighting the urgent need for effective greening policies. In this context, the Wuhan government has implemented several greening and ecological policies43,44,45,46 to increase the green space area, improve the ecological environment, and enhance the sustainable development of the city. These policies make Wuhan an ideal city in which to study the impact and effectiveness of urban greening policies and provide a good basis for this study. Therefore, this study explores the rationality, design effectiveness, and core content of these policies to facilitate informed urban greening policymaking in the future.

Automatic data collection and pre-processing

In the automatic data collection and data pre-processing stage, this study focuses on systematic collection and organization of urban greening policy texts in Wuhan. The data were mainly obtained from two official platforms: Wuhan Municipal People’s Government Portal (https://www.wuhan.gov.cn/) and the official website of the Wuhan Municipal Bureau of Landscape and Forestry (https://ylj.wuhan.gov.cn/). By searching these websites using the keywords “greening program”, “greening”, and “parks”, the relevant policy documents can be precisely located. The specific steps are as follows.

(1) Automated data collection. This study develops automated crawler scripts in Python, which automatically crawl the corresponding text data from the specified websites. The captured text data include 12 greening policy texts (2009–2024) for thematic analysis, and 10 additional greening-related activity records from the past six months for real-time information delivery (Table 1). The acquired textual data were subsequently stored in a MySQL database, with records including document source, year, and textual content, to ensure that the data are organized and searchable.

Table 1 Policy documents related to greening in Wuhan.
(2) Data pre-processing. The following steps are taken to process the data in order to improve the accuracy of the analysis.

  • Text cleaning: Python’s standard library, including the re (regular expression) module, is used to remove HTML tags, spaces, special characters, and numbers, as these are irrelevant in most text analysis.

  • Segmentation processing: Text segmentation splits continuous text into individually manageable lexical units. Segmentation is particularly critical for Chinese text because, unlike English, Chinese writing does not separate words with spaces. This study uses a specialized Chinese word segmentation tool, Jieba, to ensure efficient and accurate recognition of Chinese words33,47.

  • Removal of stop words: Common stop words that do not contribute to the analysis, such as “and” and “is”, are removed. In addition, a customized stop-word list for the greening domain is used to exclude words that are domain-specific but carry little discriminative meaning, such as “hectare” and “area”.

These preprocessing steps aim to optimize the dataset and provide clean and accurate input data for subsequent topic modeling and keyword extraction.
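The three preprocessing steps can be illustrated with a minimal Python sketch. It is not the authors' script: the stop-word sets, the regular expressions, and the function names are assumptions made for illustration, and whitespace splitting stands in for Jieba so that the example runs with the standard library only (for Chinese text, a segmenter such as `jieba.lcut` would be passed as `tokenize`).

```python
import re

# Hypothetical stop-word lists: a few general function words plus the two
# domain words the text gives as examples ("hectare", "area").
GENERAL_STOP_WORDS = {"and", "is", "the", "of"}
DOMAIN_STOP_WORDS = {"hectare", "area"}

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags
NOISE_RE = re.compile(r"[^\w\s]|\d|_")   # special characters, digits, underscores


def clean_text(raw):
    """Strip HTML tags, special characters, digits, and extra whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = NOISE_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw, tokenize=str.split,
               stop_words=GENERAL_STOP_WORDS | DOMAIN_STOP_WORDS):
    """Clean, segment, and filter one document.

    `tokenize` is pluggable: whitespace splitting is used here only so the
    sketch has no third-party dependency.
    """
    tokens = tokenize(clean_text(raw).lower())
    # Drop whitespace-only tokens (some segmenters emit them) and stop words.
    return [tok for tok in tokens if tok.strip() and tok not in stop_words]


doc = "<p>Greening  area is 120 hectare, and the parks expand!</p>"
tokens = preprocess(doc)
print(tokens)
```

Keeping the segmenter as a parameter separates the language-specific step from cleaning and filtering, so the same pipeline can be reused for other corpora.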

Policy keyword extraction

TF-IDF (Term Frequency-Inverse Document Frequency) is a common weighting technique used in information retrieval and text mining to assess the importance of a word to a set of documents or one document in a corpus48. In this study, the TF-IDF model was used to extract keywords from policy texts. The algorithm involves two concepts: term frequency (TF) and inverse document frequency (IDF).

TF represents how frequently a word occurs in a document, normalized by the total number of words in that document. For word \(t_{i}\) in document \(d_{j}\), TF is calculated as follows49:

$$TF_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}} \tag{1}$$

where \(n_{ij}\) is the number of occurrences of word \(t_{i}\) in document \(d_{j}\), and the denominator is the total number of occurrences of all words in document \(d_{j}\).

IDF measures the general importance of a word across the document collection. To calculate the IDF value of a particular word, the total number of documents in the collection is divided by the number of documents containing the word, and the natural logarithm of the quotient is then taken:

$$IDF_{i} = \log \frac{\left| D \right|}{\left| \left\{ j : t_{i} \in d_{j} \right\} \right|} \tag{2}$$

where \(\left| D \right|\) is the total number of documents in the collection and \(\left| \left\{ j : t_{i} \in d_{j} \right\} \right|\) is the number of documents containing word \(t_{i}\). To prevent the denominator from being zero when \(t_{i}\) does not appear in any document, one is usually added to the denominator, which keeps the computation stable and avoids a division-by-zero error.

Finally, the TF-IDF score is obtained by multiplying the two:

$$\text{TF-IDF}_{ij} = TF_{ij} \times IDF_{i} \tag{3}$$

The TF-IDF score can indicate the importance of the word in the document, with a higher value indicating higher uniqueness of the word in the document.

The TF-IDF model was chosen because it effectively distinguishes words that are merely frequent from words that are genuinely characteristic of a document. It is particularly well suited to policy texts containing specialized terminology, in which it can accurately identify highly indicative words. The model improves the accuracy and depth of text analysis by attenuating the influence of common words and emphasizing key thematic vocabulary50. Specifically, this study calculates the TF-IDF value of each word in a policy document and selects the 30 words with the highest TF-IDF values as the keywords of the document, which are used to reveal the policy content and tendency in depth. This method not only improves the accuracy of keyword extraction, but also makes the analysis of policy texts more systematic.
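As an illustration of Eqs. (1)–(3) and the top-30 selection, the following is a minimal pure-Python sketch. It assumes documents have already been tokenized; the function names, the `smooth` flag (which adds one to the denominator of Eq. 2), the alphabetical tie-breaking, and the toy documents are illustrative assumptions rather than details reported in the study.

```python
import math
from collections import Counter


def tf_idf(docs, smooth=False):
    """Return one {word: score} dict per tokenized document (Eqs. 1-3)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        doc_scores = {}
        for word, n in counts.items():
            tf = n / total                                  # Eq. (1)
            denom = df[word] + 1 if smooth else df[word]
            idf = math.log(n_docs / denom)                  # Eq. (2)
            doc_scores[word] = tf * idf                     # Eq. (3)
        scores.append(doc_scores)
    return scores


def top_keywords(doc_scores, k=30):
    """Pick the k highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Toy corpus (hypothetical tokens, not taken from the Wuhan policy texts).
docs = [
    ["greenway", "park", "park", "city"],
    ["wetland", "park", "city"],
    ["greenway", "forest", "city"],
]
scores = tf_idf(docs)
keywords = top_keywords(scores[0], k=2)
print(keywords)
```

In this toy corpus the word "city" appears in every document, so its IDF is log(3/3) = 0 and it drops out of the keyword ranking, which is the attenuation of common words described above.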

Policy topic classification

Latent Dirichlet Allocation (LDA) is a statistical model that contains a three-layer structure of words, topics, and documents, and is designed to identify implicit topics in large-scale document sets. In this study, we used LDA as a core tool for policy topic classification. The model assumes that each document is generated by a set of implicit topics, with each being defined by a probability distribution of words51. It can reveal the statistical patterns of texts, and deeply parse the structure and topic content of policy texts26. The LDA model can be applied to a wide range of policy documents. Although the framework supports dynamic analysis, we selected traditional LDA with topic similarity analysis (Step 4) over Dynamic Topic Modeling (DTM) to track policy topic evolution, as LDA is more computationally efficient and balances complexity and dynamism effectively52,53. Additionally, LDA reduces computational costs with limited data, minimizing the overfitting risk associated with DTM51.

Topic modeling with LDA has significant advantages over traditional text analysis methods such as frequency statistics or manual classification. First, it can automatically extract topics through probability distribution, which significantly reduces human error and subjectivity. Second, the model can handle unstructured text and reveal deep semantic connections by simulating the text generation process. In addition, LDA supports categorization of multi-theme documents, which can accurately reflect the complexity of documents and provide a dynamic and precise analytical perspective for complex policy texts26.

The operation of the LDA model can be divided into the following steps. First, the text is preprocessed (e.g., word segmentation and stop-word removal) to construct a document-term matrix. Next, topics are randomly assigned to the words in each document, and based on these initial assignments, iterative computations are performed to adjust the probability distribution of each word across topics and the distribution of each topic across documents. This iterative process relies on two key Dirichlet distributions: for each topic \(k\), the word distribution \(P\left( {w\left| k \right.} \right)\) is governed by the prior distribution \(Dir\left( \beta \right)\), while for each document \(d\), the topic distribution \(P\left( {k\left| d \right.} \right)\) is governed by the prior distribution \(Dir\left( \alpha \right)\), where \(\alpha\) and \(\beta\) are model hyperparameters that affect how dispersed the topics are within a document and how dispersed the words are within a topic, respectively54.
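The iterative procedure can be sketched with collapsed Gibbs sampling, one common way to fit LDA. The text does not state which inference algorithm was used, so the sampler, the function name `lda_gibbs`, the toy documents, and the iteration count below are assumptions for illustration only.

```python
import random


def lda_gibbs(docs, k, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (theta, phi, vocab): theta[d][t] estimates P(t|d) and
    phi[t][v] estimates P(vocab[v]|t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * k for _ in docs]              # topic counts per document
    topic_word = [[0] * n_vocab for _ in range(k)]   # word counts per topic
    topic_total = [0] * k                            # tokens assigned per topic
    assign = []                                      # topic of every token

    # Random initial topic for each word occurrence.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            t = rng.randrange(k)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assign.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = index[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][v] -= 1
                topic_total[t] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][v] += 1
                topic_total[t] += 1

    theta = [
        [(c + alpha) / (len(doc) + k * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[t] + n_vocab * beta) for c in topic_word[t]]
        for t in range(k)
    ]
    return theta, phi, vocab


# Toy corpus (hypothetical tokens); alpha = 50/k and beta = 0.01 mirror the
# hyperparameter convention mentioned in the text.
docs = [
    ["park", "greenway", "park", "tree"],
    ["wetland", "lake", "wetland", "water"],
    ["park", "tree", "greenway"],
    ["lake", "water", "wetland"],
]
theta, phi, vocab = lda_gibbs(docs, k=2, alpha=50 / 2, beta=0.01)
```

Each row of `theta` is a document's topic distribution and each row of `phi` is a topic's word distribution, so both sum to one; with such short toy documents a large \(\alpha\) dominates the counts, and the sketch is meant to show the mechanics rather than produce meaningful topics.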

By iteration until model convergence, the LDA model can output the topic distribution of each document and the lexical distribution of each topic. The results not only reveal the underlying topic structure in the document set, but also improve understanding of the implicit semantic hierarchy in the text. By analyzing these probability distributions, we can characterize each topic in detail, and thus further explore the key messages and trends of the policy texts. The specific steps are as follows.

(1) Data categorization: According to the needs of the study, the greening policy texts are organized into two datasets for different analysis purposes. The first dataset is used for topic evolution analysis, and the policy texts are segmented primarily according to how the texts are distributed over time. Specifically, the data are segmented into four phases: 2009–2012, 2013–2015, 2016–2021, and 2022–2024, with each phase containing three policy texts. This arrangement not only ensures data consistency and a sufficient sample size within each phase, but also allows the changes and development trends of the policies to be compared between phases. The second dataset is used to reveal the distribution of annual topics in recent years, with separate analyses of the data for 2022, 2023, and 2024 to explore in detail the policy priorities and changes in each year.

(2) Determination of the optimal number of topics \(k\): To determine the optimal \(k\), the consistency score (commonly called the topic coherence score) and the perplexity of the topic model are calculated, both of which are effective in evaluating model performance. Perplexity measures the model’s ability to predict new text, reflecting its generalization ability, whereas the consistency score assesses how well the generated topics are differentiated, i.e., whether the vocabulary of each topic is distinctive and internally relevant. A lower perplexity typically indicates that the model has better predictive accuracy and internal consistency, whereas a higher consistency score indicates that there are clear boundaries between topics and that the vocabulary contained in each topic is highly relevant and unambiguous26,55. Although both measures can be used to determine \(k\), we found the consistency score to be more effective because it reflects the optimal number of topics more directly and clearly, which better meets our need to analyze policy texts and reveal policy topics. Therefore, the \(k\) with the highest consistency score was chosen as the basis for topic categorization in this study (Fig. 2).

Fig. 2 Thematic consistency scores of greening policies in Wuhan.

(3) Topic classification with LDA: After \(k\) was determined, the LDA model was parameterized. Supported by prior validations and research26,56,57, we adopted the default settings of α = 50/k and β = 0.01 for LDA. The parameter α controls topic sparsity per document, with a higher value (e.g., 50/k) promoting more topics per document, which is ideal for diverse policy texts. Conversely, a lower β (e.g., 0.01) concentrates the vocabulary distribution of each topic, improving interpretability51. Such settings help the model better learn the associations between documents and topics as well as between words and topics. The model was iterated 1000 times to ensure adequate learning and stable results.

(4) Calculation of topic similarity: After the topic model was trained for each stage, the topic similarity between successive stages was calculated to identify the evolution and persistence of topics over time. Specifically, the topics at each stage are first converted into text strings built from the vocabulary they contain. For two consecutive stages, we transformed the textual data using the TF-IDF vectorization method and computed the cosine similarity between the TF-IDF vectors, a metric that assesses the directional similarity between two vectors, with the following equation15:

$$\text{Cosine Similarity}\left( A, B \right) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|} \tag{4}$$

where \(A\) and \(B\) are two TF-IDF vectors, \(A \cdot B\) is their dot product, and \(\left\| A \right\|\) and \(\left\| B \right\|\) are the norms (i.e., the lengths) of the vectors. The result of this formula ranges from −1 to 1. Values of 1, −1, and 0 indicate that the two vectors point in exactly the same direction, in opposite directions, and in perpendicular directions, respectively.
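Eq. (4) can be computed directly on sparse term-weight vectors, as in the short sketch below. The two weight dictionaries are hypothetical TF-IDF vectors for one topic in two consecutive stages, and returning 0 for an empty vector is a convention chosen here, not something specified in the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Hypothetical TF-IDF weights for one topic in two consecutive stages.
stage_1_topic = {"park": 0.6, "greenway": 0.8}
stage_2_topic = {"park": 0.6, "wetland": 0.8}

similarity = cosine_similarity(stage_1_topic, stage_2_topic)
print(round(similarity, 2))
```

Only the shared term "park" contributes to the dot product, so topics that keep more of their vocabulary from one stage to the next score closer to 1.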

(5) Calculation of the share of each topic in each year: For the annual data, we calculated the share of each topic in each year. This step provides data support for policy development and adjustment by analyzing the occurrence frequency of each topic in each year’s documents and proportionally assigning the topics to each document, thereby revealing the focus and changes in policy concerns in each year. The specific calculation process is as follows.

  • Topic occurrence frequency: The number of times each topic appears in each document is counted to obtain the absolute frequency \(f_{t,d}\) of each topic, where \(t\) represents a specific topic and \(d\) represents a specific document.

  • Total topic frequency: The total occurrence frequency \(F\) of all topics in all documents, i.e., the sum of all \(f_{t,d}\), is calculated.

  • Topic percentage calculation: For each topic, the percentage is derived from the ratio of its occurrence frequency in all documents to the total frequency of all the topics:

$$P\left( t \right) = \left( \frac{\sum_{d} f_{t,d}}{F} \right) \times 100\% \tag{5}$$

where \(P\left( t \right)\) denotes the percentage of topic \(t\), reflecting the importance of that topic relative to all topics in the year.

In this way, the percentage of each topic can be used to indicate its relative weight in the annual policy, providing a quantitative basis for analyzing the policy focus in each year.
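The share calculation in Eq. (5) reduces to summing per-document topic frequencies and normalizing by the grand total. The sketch below assumes the frequencies \(f_{t,d}\) are already available as one mapping per document; the topic labels and counts are hypothetical.

```python
from collections import Counter


def topic_shares(doc_topic_counts):
    """Percentage share of each topic across one year's documents (Eq. 5).

    `doc_topic_counts` holds one {topic: f_td} mapping per document.
    """
    totals = Counter()
    for counts in doc_topic_counts:
        totals.update(counts)                  # sum of f_td over documents
    grand_total = sum(totals.values())         # F in Eq. (5)
    if grand_total == 0:
        return {}
    return {t: 100.0 * f / grand_total for t, f in totals.items()}


# Hypothetical per-document topic frequencies for a single year.
year_counts = [
    {"park construction": 6, "greenway": 2},
    {"park construction": 2, "wetland protection": 6, "greenway": 4},
]
shares = topic_shares(year_counts)
print(shares)
```

Because every frequency is divided by the same total \(F\), the shares of all topics in a year sum to 100%, which makes the yearly results directly comparable.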

AI extraction of greening core indicators

In policy text analysis, extraction of core indicators is a key step to understand and quantify the effects of policy implementation. The common approaches include named entity recognition and entity relationship extraction, which are mainly used to identify and classify key entities in a text (such as locations, names of people, organizations, or other proper names) and their interrelationships in an unstructured text. Named entity recognition aims at tagging out specific entities in the text, while entity relationship extraction further analyzes the semantic connections between these entities, such as attribution and location relationships, thereby enabling more accurate and data-driven analysis and decision support58,59.

In this study, we chose Baidu’s AI Qianfan Big Model Platform (https://qianfan.cloud.baidu.com/) to extract the core metrics of greening efforts. This platform offers one-stop big model development and service operation for enterprise developers, providing multi-functional natural language processing capabilities, including Wenxin Yiyan’s underlying model and third-party open-source big models. The platform is based on Baidu Intelligent Cloud and uses the PaddlePaddle deep learning framework as its underlying support, which makes it possible to obtain high-precision and high-performance model output with only a small amount of data adjustment. Using the Qianfan Big Model avoids the complex training process that traditional deep learning models require for entity relationship extraction, which greatly reduces development and training time. In addition, the platform provides AI development tools and a complete development environment, making it fast and accurate to handle large-scale and diverse policy documents60. The use of the Baidu AI Qianfan Big Model enables researchers to focus on building and validating a methodological framework for policy analysis without investing excessive resources in developing and optimizing technical details. In this way, researchers can concentrate on parsing and applying the policy data, thereby promoting the integration of theoretical research and practical application.

The module leverages a large language model and prompt engineering to generate structured greening metrics (e.g., year, region, metric value) in JSON format, enabling subsequent visualization and analysis (Supplementary Table S1). The specific operational process includes the following steps.

(1) Prompt design: Prompts are instructions that guide a large model. An instruction can be a question or a text description with multiple parameters, and the large model generates corresponding texts or images based on the prompts provided60. Because prompts act as instructions, they directly affect the quality of the output. We carefully designed and debugged the prompts to ensure that they correctly guide the model in extracting key information from the text, such as “year”, “region”, “type of indicator”, “value”, and “unit” (Supplementary Table S1).

(2) Interface call: The API of the Baidu AI Qianfan model is called programmatically, with the pre-processed policy text and the prompts as input.

(3) Data storage: The results extracted by the model are formatted and stored directly in a MySQL database, which facilitates subsequent data analysis and visualization and provides structured data support for policy evaluation and decision-making.
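The three steps above can be sketched end to end as follows. Everything specific in this example is an assumption: the prompt wording is not the prompt from Supplementary Table S1, `fake_model` is a stand-in that returns a canned answer instead of calling the Qianfan API, the table layout is hypothetical, and an in-memory SQLite database replaces MySQL so that the sketch runs with the standard library only.

```python
import json
import sqlite3

FIELDS = ("year", "region", "indicator", "value", "unit")

# Hypothetical prompt wording; the study's actual prompts are in Table S1.
PROMPT_TEMPLATE = (
    "Extract every greening indicator from the policy text below. "
    "Answer with a JSON array only; each item must have the keys "
    "year, region, indicator, value, unit.\n\nPolicy text:\n{text}"
)


def build_prompt(policy_text):
    """Step 1: combine the fixed instruction with the pre-processed text."""
    return PROMPT_TEMPLATE.format(text=policy_text)


def fake_model(prompt):
    """Step 2 stand-in: a real system would send `prompt` to the model API."""
    return json.dumps([
        {"year": 2023, "region": "District A", "indicator": "new green space",
         "value": 50, "unit": "hectare"},
        {"year": 2023, "region": "District B"},   # incomplete, gets dropped
    ])


def parse_indicators(model_output):
    """Keep only well-formed records from the model's JSON answer."""
    try:
        items = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        tuple(item[f] for f in FIELDS)
        for item in items
        if isinstance(item, dict) and all(f in item for f in FIELDS)
    ]


def store_indicators(conn, records):
    """Step 3: persist the structured records (SQLite stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indicators "
        "(year INTEGER, region TEXT, indicator TEXT, value REAL, unit TEXT)"
    )
    conn.executemany("INSERT INTO indicators VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
records = parse_indicators(fake_model(build_prompt("...policy text...")))
store_indicators(conn, records)
rows = conn.execute("SELECT region, value, unit FROM indicators").fetchall()
print(rows)
```

Validating the model output before storage matters because a language model may return malformed JSON or omit fields; in this sketch such records are dropped rather than written to the database.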

Policy AI interpretation

This module uses the Baidu AI Qianfan Big Model for in-depth analysis and understanding of policy texts, following the AI processing workflow outlined in “AI extraction of greening core indicators” section, to produce standardized policy interpretations for decision-making support. Its advanced semantic understanding capability is especially suitable for dealing with complex or large-scale policy documents, which can greatly improve the standardization and objectivity of interpretation. For instance, using tailored prompt engineering, the model generates a summary of policy objectives, key terms, and potential impacts (Supplementary Table S1), enabling policymakers to quickly understand core policy elements.

The operation process is similar to that of the “Extraction of greening core indicators” module. First, the interpretation needs are determined and appropriate prompts are designed to guide the AI model to focus on the key contents or issues of the policy text. Then, the pre-processed policy text is submitted to the large model through the programming interface, with parameters configured for in-depth semantic analysis (Supplementary Table S1). Finally, the model outputs form policy interpretations that provide insights and support to policy makers and analysts.

Real-time policy tracking

Real-time policy tracking is a key component of this study, designed to enable continuous monitoring of policy releases. While current data sources remain limited, the tracking module lays the foundation for integrating multi-source data (e.g., greening policies from government platforms, social media evaluations) to provide policymakers with comprehensive policy dynamics for timely adjustments and optimized designs. The specific operation process is as follows.

  (1) Setting of a monitoring module: An automated policy monitoring module is set up to scan the official websites of the government and related departments daily and identify newly released or updated policy information. The module combines keyword search (e.g., “greening” and “parks”) with automated crawling so that relevant policies are captured in time.

  (2) Real-time push: Once a new policy is identified, the module automatically extracts key metadata, including the policy’s publication source, date, title, and link, and stores them in the database. The data are pushed to the front-end display interface in real time through a customized interface, ensuring that decision makers and analysts can instantly receive the latest policy changes and information.
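The filtering logic behind these two steps can be illustrated with a short Python sketch. It is only an assumption-laden outline: the `PolicyItem` fields mirror the metadata listed above, but the class, the function names, and the idea of de-duplicating by link are hypothetical, and the crawling itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Keywords named in the text; a production list would likely be longer.
KEYWORDS = ("greening", "parks")


@dataclass(frozen=True)
class PolicyItem:
    """Metadata stored for each policy (hypothetical structure)."""
    source: str
    date: str   # ISO date string
    title: str
    link: str


def is_relevant(title: str, keywords: tuple[str, ...] = KEYWORDS) -> bool:
    """True if the title contains at least one tracked keyword."""
    lowered = title.lower()
    return any(k in lowered for k in keywords)


def find_new_policies(scraped: list[PolicyItem], seen_links: set[str]) -> list[PolicyItem]:
    """Return relevant items not stored before, and mark them as seen."""
    fresh = []
    for item in scraped:
        if item.link in seen_links or not is_relevant(item.title):
            continue
        seen_links.add(item.link)
        fresh.append(item)
    return fresh
```

A daily scheduler would call `find_new_policies` on each crawl result and push the returned items to the front end; the items returned by a second call on the same input are empty, since their links are already recorded.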

Visualization of intelligent analysis results

In this study, a set of front-end and back-end technologies and tools was used to visualize the results of intelligent analysis. The front end is built with HTML, CSS, and JavaScript, combined with the Vue.js framework, to provide a responsive and interactive user interface. For data visualization, we chose the AntV G2Plot charting library and the AntV L7 geographic information visualization tool, which together support the graphical representation of complex data and the dynamic display of geographic information. The back end uses Python and the Flask framework to provide APIs that process front-end requests and exchange data with the MySQL database, thereby ensuring real-time data processing and feedback.
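As a rough illustration of the data exchange between the back end and the charts, the sketch below reshapes database rows into a flat list of records, the general shape that charting libraries such as G2Plot take as input. The function name `to_chart_records`, the row layout, and the field names are assumptions for illustration; in a Flask application the returned list would typically be serialized to JSON in a route handler.

```python
import json


def to_chart_records(rows, indicator):
    """Keep rows for one indicator and return chart-ready records.

    rows: iterable of (year, region, indicator, value) tuples (assumed layout).
    """
    records = [
        {"year": str(year), "region": region, "value": value}
        for (year, region, ind, value) in rows
        if ind == indicator
    ]
    # Sort so that each region's series is ordered by year.
    records.sort(key=lambda r: (r["region"], r["year"]))
    return records


if __name__ == "__main__":
    # Values taken from the planning figures reported for Jiang'an and Jiangxia.
    rows = [
        (2024, "Jiang'an", "renovated green space", 5.0),
        (2022, "Jiang'an", "renovated green space", 26.1),
        (2021, "Jiangxia", "newly constructed green space", 220.0),
    ]
    print(json.dumps(to_chart_records(rows, "renovated green space"), indent=2))
```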

The whole visualization provides not only a convenient tool for researchers to observe and analyze the impact of policies, but also an intuitive platform for policy makers and the public to understand policy directions and trends, which greatly enhances the transparency and public participation in policy research, thus promoting the open sharing of information and democratization of the decision-making process.

Results

Topic evolution of urban greening policies

This study utilizes Sankey diagrams to analyze the topic evolution of urban greening policy texts in Wuhan between 2009 and 2024 (Fig. 3). The policy texts are explicitly divided into four phases in order to explore the focus shifts and development trends of policies in each phase.

Fig. 3

Topic evolution of urban greening policies in Wuhan (2009–2024). The figure shows the evolution of topics and correlation analysis of greening policies in Wuhan. The streamlines of different colors represent different topics related to greening policies; the thickness of the streamlines indicates variations in the attention of each topic over time; and the intersections between streamlines reveal the correlations and interactions between different topics. By tracing individual streamlines, it is possible to understand how a particular greening topic has evolved from the past to the present as well as anticipate the future trend.

As shown in Fig. 3, the policies in the early phase (2009–2012) were mainly focused on basic urban greening construction and ecological protection, specifically including greening projects on urban rooftops and community parks, as well as the planning and layout of urban green spaces and roads. Moreover, the policies were intended to promote three-dimensional greening and ecological gardens, aiming to improve the life quality of residents and the urban ecological environment through urban park construction and ecological development.

In the middle phase (2013–2015), the topics were expanded to the integration of landscaping and culture, highlighting the interplay between nature and culture, and focusing on the greening renewal and maintenance of community parks. In addition, the policies continued to promote three-dimensional urban greening, plan and renovate urban green spaces to meet new development needs, and strengthen the landscape effect of road greening. A new nodal management approach was added to systematically promote urban and rural greening, while ecological development was focused on specific directions and began to pay attention to the protection of forest greenways and wetlands.

In the later phase (2016–2021), the focus of policies further shifted to in-depth ecological protection and publicity, including urban park and wetland protection, urban garden ecological publicity and greenway construction, resulting in enhanced stability of urban ecosystems, landscape afforestation, and neighborhood greening. In addition, this phase also emphasized the management of road greening, reflecting greater attention to the details and quality of urban greening.

For the latest phase (2022–2024), the policies were mostly focused on urban green ecological planning, which was aimed to comprehensively build an urban green ecological system and continuously optimize urban green space planning and management. The focus of construction was wetland and forest protection, while investment in urban park infrastructure was increased to enhance the service function of parks and the citizens’ environmental awareness through public participation activities.

Annual topic distribution and hot keywords

In this study, an annual topic distribution analysis was carried out to provide a time-dynamic view of the focus of the greening policies in Wuhan, which can reveal the evolution and adjustment of policy priorities over time. From 2022 to 2024, the focus of greening policies in Wuhan shifted clearly from year to year. In 2022, the policies were mainly focused on “flower planting and flower city construction” (36%) and “national greening and garden renovation” (32%), reflecting the high priority given to urban beautification and citizen participation. However, forestry industry development and requirements only accounted for about 10%, indicating that relatively little importance was attached to forestry development in that year. The policy focus in 2023 shifted to “park greening and facility construction” (27%) and “wetland conservation and roadside greening” (32%), indicating increased attention to urban green space infrastructure and wetland protection. “Afforestation and forestry development” increased markedly relative to the previous year and accounted for 19% of the total, indicating a gradual increase in interest in forestry development. By 2024, “park green space and garden construction” (34%) and “wetland flowering and ecological restoration” (29%) became the main topics, indicating that growing importance was attached to ecological restoration and garden construction. At the same time, new topics such as “green tree planting and forest manager system” (6%) and “forest fire prevention and national land greening” (15%) emerged, revealing the refinement of policies on forest management and ecological protection.

The results of the hot keyword analysis for the last three years also demonstrate how the focus of greening policies in Wuhan shifted from year to year (Fig. 4, Supplementary Table S2). In 2022, “greening”, “green space”, and “wetland” were the most dominant keywords, highlighting the emphasis on urban greening, green space construction, and wetland protection. By 2023, although “greening” was still a high-frequency keyword, “station” and “greenway” increased markedly in frequency, indicating a focus on improving the quality of urban life. In 2024, the most important keywords changed significantly: “restoration” and “resource recycling” became the dominant keywords, and new concerns such as “low temperature”, “rain and snow”, and “frost” emerged. Moreover, the appearance of keywords such as “children” and “friendly” indicates an emphasis on a child-friendly environment. Overall, “golden autumn”, “flowers”, and “sea of flowers” remained a focus of attention throughout the three years, suggesting a continued interest in seasonal landscapes and urban beautification.

Fig. 4

Distribution of annual topics and hot keywords of greening policies. In (b), the font size of the keywords is adjusted according to their TF-IDF values to indicate the importance and occurrence frequency of the keywords. The users can view the specific TF-IDF value of each keyword in real time through mouse interaction to further understand the statistical weight of each keyword in the policy text.
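How TF-IDF weights can drive keyword font sizes is sketched below. The caption does not state which TF-IDF variant or size mapping was used, so both are assumptions: the sketch uses the plain form tf × log(N/df) and a linear mapping onto a hypothetical 12 to 48 pixel range.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def font_size(weight, max_weight, smallest=12, largest=48):
    """Map a weight linearly onto a font-size range (assumed mapping)."""
    if max_weight <= 0:
        return smallest
    return round(smallest + (largest - smallest) * weight / max_weight)


if __name__ == "__main__":
    # Toy token lists built from keywords mentioned in the text.
    docs = [["greening", "wetland", "greening"], ["greening", "restoration"]]
    for year_weights in tfidf(docs):
        top = max(year_weights.values())
        print({t: font_size(w, top) for t, w in year_weights.items()})
```

Under this variant, a term that appears in every document receives a weight of zero, which is why a ubiquitous word can shrink in a keyword cloud even when it is frequent.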

Spatial and temporal changes in annual greening indicators

According to the data from 2021 to 2024, Wuhan’s planning for the four core indicators (newly constructed green space, renovated green space, greenways, and tree planting) shows significant spatial differences and temporal changes (Fig. 5). The planning data show different development focuses and adjustment trends between the central and distant urban areas. In central urban areas, such as Jiang’an and Jianghan Districts, the planning data for newly constructed and renovated green spaces peaked in 2022 and were then adjusted. For example, in Jiang’an District, newly constructed green space increased to 10 ha and renovated green space reached 26.1 ha in 2022, after which the newly constructed green space remained almost unchanged and the renovated green space declined to 5 ha in 2024. Tree planting and greenway construction showed a relatively stable trend. Greenway construction increased in 2022 and 2024: Jiang’an District planned three kilometers in each of these years, and Jianghan District planned three and two kilometers, respectively. Tree planting generally remained stable despite some fluctuations in central urban areas. For example, the number of trees planted in Jianghan District between 2021 and 2024 remained at 5,000 to 10,000, reflecting an emphasis on optimizing the quality of existing green space and increasing ecological facilities in central urban areas.

Fig. 5

Core greening indicator values and temporal and spatial changes in Wuhan (2021–2024). The greenway plan map of 2021 in (b) is missing because all zone plans have a value of zero. The legend uses a gradient from “low” to “high” rather than specific numbers to standardize the range of data across maps.

In distant urban areas, especially Wuhan Economic & Technological Development Zone (Hannan District) and Jiangxia District, the scale of newly constructed green space and tree planting significantly exceeds that of central urban areas with a steadily growing trend. For example, the newly constructed green space in Jiangxia District increased from 220 ha in 2021 to 230 ha in 2024, and the number of trees planted in the same district increased from 550,000 to 600,000. Newly constructed green space in Hannan District increased from 160 ha in 2021 to 200 ha in 2022, and reached 220 ha in 2023, followed by a decrease to 150 ha in 2024. The number of trees planted remained between 100,000 and 300,000 in the region, showing a continued large-scale investment in greening. In addition, the focus of greenway construction has shifted from central to distant urban areas, especially in 2024, with distant urban areas demonstrating positive growth, reflecting more investment in connecting new facilities and improving the life quality of residents. For example, Wuhan Airport Economic & Technological Development Zone (Dongxihu District) and Hannan District have greenway construction plans of five and ten kilometers in 2024, respectively. This change marks a geographical shift in policy focus, highlighting strategic adjustment to expand urban boundaries and enhance ecological functions.

Discussion

Greening policy evolution and strategy analysis

First, the analysis of topic evolution in this study reveals a shift of policy topics from infrastructure to integrated ecological management, which reflects the response of policy makers to the emerging needs of urban development and their adaptation to global trends in environmental protection. In early phases, the policies were mainly focused on increasing the area and ecological quality of urban green space and enhancing the ecological functions of cities by promoting three-dimensional greening and ecological gardens. Over time, the focus of policies gradually shifted to the integration of nature and culture, greening renewal and maintenance of community parks, and adaptation of urban green spaces to meet new urban development needs, reflecting a response to national strategies for urban renewal and ecological civilization61.

Second, the analysis of annual topic distribution and hot keywords further confirms the sensitivity and responsiveness of Wuhan in addressing environmental challenges and meeting the needs of citizens. The evolution of topics from “flower planting and flower city construction” to “wetland flowering and ecological restoration”, as well as the emergence of the new topic “green tree planting and forest manager system” indicates that policy adjustments have become more focused on ecological civilization and forest construction. The changes in keywords reflect not only the public’s increased attention to the quality of greening, but also their rising awareness of environmental protection, which drives the policy to develop in a more integrated and sustainable direction. For example, with increasing impacts of global climate change, the keywords “restoration” and “resource recycling” appeared in 2024, signifying that the city is beginning to strengthen the restoration and sustainable use of ecosystem services and natural capital besides traditional greening efforts62. In addition, the appearance of keywords such as “low temperature”, “rain and snow”, and “frost” reflects the policy makers’ concern and response strategies for extreme weather events, and indicates that Wuhan has integrated disaster risk management and sustainable utilization into its urban planning63. These keywords not only reflect the policy responses, but also indicate the dynamic adjustment and balance between social needs and environmental goals.

Finally, strategies differ greatly between regions, indicating that urban planning fully considers the functional positioning and development needs of different regions, as well as the city’s strategic adjustment in expanding urban boundaries and enhancing ecological functions. An in-depth analysis of regional greening indicators shows that Wuhan’s greening policies are clearly differentiated between central and distant urban areas. Greening policies in central urban areas focus on optimizing and maintaining existing green spaces to enhance the residential and commercial attractiveness of the urban core, which matches the high-density urban layout and complex socio-economic activities in these areas. For example, in central urban areas such as Jiang’an and Wuchang districts, the policy strives to improve the quality and accessibility of public spaces by constructing new greenways and renovating green spaces to promote the health of residents and social interaction64. In distant urban areas such as Jiangxia District and Hannan District, on the other hand, the policies are more focused on large-scale newly constructed green space and tree-planting projects to support the rapid urbanization and expansion needs of these areas65, while providing enough greenery to alleviate the environmental pressures that may result from industrial development.

Overall, the analysis of topic evolution, annual topic distribution, and regional indicators reveals the strategies of policy makers in resource allocation and urban greening efforts. These strategies not only reflect a precise grasp of the stage of urban development and regional characteristics, but also provide valuable experiences for other cities, especially in promoting sustainable urban development and ecological civilization. Supported by existing research and relevant official policies61,62,63,64,65, these findings are further validated, reinforcing their scientific rigor and reliability.

Challenges for policy makers and relevant strategies

This study reveals that despite the remarkable achievements of greening efforts in Wuhan, progress in intelligent construction and collaborative management still lags. In an era of information technology, it is particularly important to improve the level of intelligence in greening construction. Currently, there are more than 1,000 parks in Wuhan, but the number of smart parks is still insufficient to meet the needs of a modern city, which limits the effective collection, monitoring, and management of data. Therefore, it is crucial to strengthen the information infrastructure and improve the level of intelligence in greening policies. The government should invest more in the construction of smart parks, wetlands, and scenic spots, and introduce smart technologies such as the Internet of Things (IoT), visual recognition, and big data analytics to enhance the efficiency of greening management and the visitor experience66,67. In addition, the government should establish a comprehensive urban greening information platform68, which can not only integrate GIS and remote sensing technologies to improve the efficiency of planning and execution of greening projects, but also enhance the public’s interaction and sense of participation.

In May 2024, Wuhan merged different planning, management, and supervisory responsibilities to form an integrated agency, the Natural Resources and Urban–Rural Development Bureau69. This initiative provides a good example for landscape greening, since successful greening planning and construction not only relies on the landscape construction department, but also requires synergistic support from various departments such as urban planning, environmental protection, and finance. Currently, information silos and inadequate coordination mechanisms hinder the effective use of greening resources and the overall implementation of policies. Therefore, it is crucial to further reform the landscaping and greening construction agencies to enhance interdepartmental information sharing and decision-making consistency. It is also critical to establish a cross-departmental platform for collaborative work, which can improve communication, collaboration, and work efficiency among departments, so as to synergistically promote urban greening. For example, the urban planning department can release planning updates through the platform; the landscape construction department can adjust the design and implementation of greening projects based on such information; and the financial department can monitor the fund flow to ensure the rational use of funds. These strategies can not only improve the management efficiency of urban greening and public satisfaction with it, but also enhance the sustainability and adaptability of greening policies, thus promoting the sustainable development of cities.

Advantages of the proposed policy analysis framework

This study combines AI big models and text mining and proposes an intelligent spatiotemporal analysis framework for greening policies, which utilizes advanced natural language processing techniques to parse policy texts, and extracts and analyzes the key information in policies through a systematic approach to reveal the intrinsic connections and evolutionary trends of policy documents.

First, compared with traditional methods of policy analysis, such as questionnaires and expert reviews, this framework utilizes machine learning algorithms and AI big models to enable real-time, intelligent policy text analysis, which can more accurately identify the trend of policy changes, automatically extract the topics and core metrics, and readily analyze the dynamic development of metrics over time and geographic location. Second, the framework supports AI interpretation and real-time policy tracking to help relevant departments respond quickly to policy requirements and improve the accuracy of policy implementation. In addition, AI interpretation can analyze policy perspectives more objectively and reduce the subjectivity of expert analysis, which not only improves the efficiency of policy analysis and interpretation, but also enhances the transparency and predictability of policy outcomes. Furthermore, the framework has a certain degree of universality and can be extended to other cities or fields, such as urban construction and transportation planning, by designing targeted prompts and configuring the corresponding AI agent to extract and interpret the core indicators suited to the specific policy environment and data characteristics. For example, in urban construction, by customizing prompts around topics such as “land use change”, “park development”, and “old neighborhood renovation”, the AI model can automatically extract policy planning objectives and implementation goals from the relevant policy texts, thus supporting more accurate decision making and policy evaluation.

In summary, this study develops a multidimensional policy analysis framework that integrates AI big models and text mining techniques. The framework covers data collection, topic modeling, keyword extraction, policy AI interpretation, real-time tracking, and application scenario visualization, providing an innovative method and tool for policy analysis. This framework is not only suitable for building an intelligent policy analysis platform (Fig. 6), but can also significantly improve the rationality and precision of policy formulation and implementation, thus supporting decision makers and researchers in complex policy environments and enhancing the effectiveness of policy responses.

Fig. 6

Visualization of intelligent analysis results for greening policies in Wuhan. As shown in the animation (Supplementary Animation S1), we constructed a visualization webpage for the intelligent analysis of greening policies in Wuhan. The website provides real-time policy monitoring and AI-driven interpretation functions, as well as multi-dimensional analysis visualization and dynamic interaction, which can be effectively integrated into the policy evaluation system to provide scientific decision support for policy makers.

Limitations and future research

Despite the many advantages of the policy analysis framework proposed in this study, there are still some limitations, which also suggest possible directions for future research. First, the Baidu AI Qianfan Big Model still faces challenges in processing longer policy texts, especially highly complex ones, which may affect the processing efficiency and output accuracy of the model and thus limit its generalization to a wider range of application scenarios. In addition, although the use of AI big models in the framework significantly saves the time cost of model training, the model is a paid service, which increases the cost and restricts the wider promotion and application of the framework. Second, the framework mainly focuses on the analysis of textual data and fails to fully integrate other types of data, such as green space implementation and monitoring data, which may limit its ability to understand the impacts of policies from multiple dimensions. Indeed, the effects of greening policies often need to be assessed through real-time monitoring data such as actual green space conditions and ecological changes, and a comprehensive assessment of policy impacts needs to take a variety of data sources into account. Third, although the framework achieves multi-level analysis through seven functions, its modules lack sufficient depth to fully address specific policy analysis needs, limiting its responsiveness.

Future research could expand and deepen the current study in several directions. First, research could explore how to combine multiple data types, including quantitative and qualitative data, to construct more comprehensive policy analysis models. In addition, it could examine how to use AI big models to enhance the interpretive power and transparency of greening policy texts. To address the limited depth of individual modules, future research could enhance customization for targeted policy requirements, thereby improving the framework’s utility and depth. Finally, future research should also validate and adjust the present framework in more regions and in cities of different sizes to assess its adaptability and effectiveness in different policy environments. Through cross-regional experimentation and feedback, the framework can be continuously optimized to better serve diverse policymaking needs.

Conclusion

Urban green space plays an important role in sustainable urban development. In-depth analysis and assessment of the content and objectives of greening policies are crucial to understanding the intent of the policies and improving the rationality and adaptability of the policies. This study for the first time applies AI big models and text mining to create a multidimensional and dynamic policy analysis framework for systematic assessment of urban greening policies. The framework can analyze policy texts in real-time, intelligently and precisely track policy change trends, automatically extract topics and core indicators, and analyze the dynamic changes of indicators in time and space. In addition, through integrated visualization display, the framework not only enhances the policy makers’ understanding of policy effects, but also improves the timeliness and accuracy of policy adjustment, which can promote the effective implementation and continuous improvement of urban greening policies. This methodology is not limited to greening policy analysis, and can also be widely applied to other policy areas, especially in urban planning and sustainable development policymaking. Future research could explore how to further optimize the analytical model, as well as include more data types and more complex policy environments, to provide policy makers with more comprehensive and accurate decision support. Overall, this study breaks through the limitations of traditional policy analysis and paves a new way to utilize AI big models and text mining techniques for policy evaluation, providing powerful tools and methodologies for policy analysis and application in smart cities.