Introduction

Power generation companies, as major sources of energy use and carbon emissions, must undergo a low-carbon transformation. During power production, carbon emissions are closely linked to power output. Accurately measuring the relationship between the two and optimizing their balance is a crucial task for achieving sustainable development, and one that relies heavily on advanced quantitative computing techniques. Providing technical and methodological support for such quantitative calculation is therefore highly important for power generation enterprises.

The traditional methods for measuring carbon emissions mainly include the mass balance method, the emission factor method, and the continuous monitoring method, which differ in principle, scope of application, advantages, and disadvantages. The mass balance method is based on the law of conservation of mass: carbon emissions are determined by mass accounting of the materials input to and output from the production process. However, because of the wide variety of substances involved, the measurement is challenging and prone to large errors; it is therefore no longer used in thermal power enterprises1. The emission factor method is the primary method used for carbon emission statistics in China2. Carbon emission factors are determined for each energy type and activity type and then combined with energy consumption or activity data to calculate carbon emissions. However, the accuracy of the emission factors has a significant impact on the results, and it is difficult to accurately reflect the actual carbon emissions of a specific enterprise or process. The continuous monitoring method uses specialized equipment to monitor carbon emission sources continuously in real time. It can obtain real-time carbon emission data with high accuracy; however, the equipment cost is high, and the technical and maintenance requirements are also substantial3.

Traditional carbon monitoring methods have significant limitations. The material balance method requires comprehensive tracking of all input and output materials, making real-time implementation difficult; moreover, even minor measurement errors in individual components can accumulate into a total emission deviation of ± 8% to 15%, making it hard to adapt to the rapid load changes of modern power generation. The emission factor method relies on static preset factors and cannot reflect real-time changes in fuel quality, combustion efficiency, and so on; its regionally averaged nature ignores plant-specific conditions, and its annual update cycle delays improvement feedback. Although continuous monitoring systems can provide direct emission data, an installation cost of 200,000 to 500,000 US dollars per set and annual maintenance costs of 50,000 to 100,000 US dollars restrict their application in small and medium-sized power plants; they also require professional maintenance and frequent calibration, and their single-point measurements can neither predict emission trends nor support operational optimization. These fundamental flaws highlight the necessity of developing new monitoring methods.

However, in the actual operational scenarios of power generation enterprises, carbon emissions are influenced by various factors, including fuel type, unit operating conditions, ambient temperature, and humidity. These factors are intertwined in a complex, nonlinear relationship, which further complicates the accurate measurement and prediction of carbon emissions4. Traditional methods struggle to handle such a complex coupled relationship, leading to discrepancies between the measured or predicted results and the actual situation. In addition, there are errors in the calculation of emission factors: Lin Yueting noted that because coals of different types and properties are blended in certain proportions and burned together in the boiler, theoretically calculated emission factors deviate from the actual situation5.

Most traditional electric-carbon coupling models are based on empirical formulas and static assumptions, which cannot fully capture the complex dynamics and spatial-temporal correlations of the power generation process, limiting their accuracy in prediction and optimization. With the rise of data-driven technology, machine learning models are increasingly applied to carbon emission prediction. With their powerful feature mining and complex pattern capture capabilities, and by combining multiple big-data sources from thermal power plants, they achieve prediction accuracy far exceeding that of traditional methods. This has brought breakthroughs to the field of carbon emission prediction, greatly improving the reliability and forward-looking nature of predictions6.

While established energy system planning tools such as PLEXOS, HOMER, and EnergyPLAN have proven valuable for medium- and long-term capacity planning and system-wide emissions estimation, these tools primarily operate at aggregate system levels with predetermined emission factors. PLEXOS, for instance, excels in economic dispatch optimization and strategic planning but relies on static emission factors that cannot capture plant-specific operational variations such as fuel quality fluctuations, equipment aging effects, or real-time efficiency changes. Furthermore, these planning tools typically work with hourly or daily dispatch decisions for system optimization rather than providing the granular, real-time emission tracking required for enterprise-level carbon management and compliance reporting.

The gap between strategic energy planning and operational carbon management necessitates a complementary approach that can bridge system-level optimization with plant-specific emission prediction. This requirement becomes particularly critical as carbon markets demand increasingly precise emission quantification and as power generation enterprises require real-time feedback for operational carbon management decisions.

The purpose of this paper is to develop a comprehensive machine learning-based carbon measurement methodology that complements existing monitoring systems while providing enhanced predictive and optimization capabilities. Rather than replacing direct measurement systems like CEMS, our approach offers several distinct advantages: (1) cost-effective carbon monitoring using readily available operational data, making precision carbon management accessible to enterprises with limited capital resources; (2) predictive modeling capabilities that enable proactive operational planning and carbon trading strategies; (3) direct integration of carbon emissions with operational parameters, facilitating real-time optimization for emission reduction; and (4) scalable deployment across multiple units and integration with existing plant management systems. By comparing and analyzing various machine learning algorithms, this study identifies optimal models suitable for carbon measurement in power generation enterprises, providing a scientific basis for comprehensive carbon management, energy conservation, and emission reduction decisions that support global carbon neutrality goals.

Literature review

For the issue of carbon emission accounting, the emission factor method and the mass balance method are two commonly used calculation methods7. Guo Cheng defined the boundaries of carbon accounting for the entire life cycle of buildings and constructed an estimation model8. Zhanhe Li et al., based on carbon emission flow (CEF) theory, established a carbon responsibility accounting framework suitable for DC traction power supply systems9. Li Yaowang et al. clarified various carbon emission calculation methods in the power industry10. Li Zhijun et al. proposed a near real-time carbon emission accounting framework for ships, which calculates carbon emissions from fuel consumption and emission factors11.

PLEXOS, a professional energy system modeling software package, is widely used to simulate and optimize low-carbon energy transition pathways at high resolution. Alhadhrami et al. utilized its P2X module to study energy-saving production strategies for green hydrogen12. Ahmad Amiruddin applied the software to simulate Indonesia's national power grid and pointed out that higher renewable energy targets would increase total installed capacity demand, with the optimal energy storage duration varying across periods13. Rebeka Beres et al. took South Africa as an example and used PLEXOS to optimize and analyze green hydrogen production prospects for 2030 at hourly resolution14. Ville Sihvonen et al. focused on district heating systems and simulated the crucial role of electric heating and energy storage in decarbonizing district heating in the Aland Islands of Finland15. However, for carbon emission prediction, such software often suffers from disadvantages such as difficulty in handling multi-dimensional data and poor dynamic adaptability.

The concept of machine learning (ML) is broad, ranging from classic linear regression models to the deep neural networks that have emerged in the 21st century. At present, many researchers are using machine learning techniques to track and measure carbon emissions16. Hu Yuyi et al. integrated an extended long short-term memory network (xLSTM), quantile regression (QR), and a sparse attention mechanism (SA) to achieve high-precision interval prediction of CO₂ emissions at the quantile level17. Cao et al. dynamically quantified the proportions of total CO₂ and green CO₂ through feature selection and linear regression18. Wang et al. combined ResNet-152, a bidirectional gated recurrent unit (BiGRU), and a temporal pattern attention mechanism (TPA) to achieve precise anomaly detection in the carbon capture process19. Jazmin Mota-Nieto et al. proposed a result-oriented time series carbon accounting method and applied it to assess the emission changes of the CCUS system in Mexico10. Oladeji et al. utilized NLP technology to mine non-traditional data for optimized accounting20, and Yingqi Xia et al. developed a two-stage deep learning estimation method by combining electrical-carbon correlation and NILM technology21. In this field, a variety of advanced deep learning models (such as ResNet-BIGRU-TPA22, SSA-Attention-BIGRU23, GDCNN-GRU24, SSA-Attention-BiGRU25, and DPRNS-NINJA26) have been proposed to solve problems such as anomaly detection in carbon sequestration, prediction of carbon neutrality trends, and prediction of CO₂ emissions.

In addition, model selection and optimization are important considerations. Different machine learning algorithms suit different types of data and problems, and selecting the appropriate model and tuning it properly are the keys to accurate assessment. Facing these challenges, researchers have explored a variety of solutions. Gleb Prokhorskii et al. selected four algorithms, namely K-nearest neighbors (KNN), random forest regression (RFR), multi-layer perceptron (MLP), and Lasso regression, to predict the main steam flow of thermal power units, determining the optimal hyperparameters by repeatedly training with different settings so as to better reflect the complexity of power plants under frequent load changes27. Ye Qiang et al. took electricity consumption as the independent variable and used three models, namely a BP neural network, random forest, and SVR, together with a Shapley ensemble learning prediction model, to predict the carbon emissions of the steel industry28. Zheng Lin, Markus Mueller et al. used multiple machine learning algorithms to predict the carbon emissions of buildings throughout their life cycle; they performed comparative analysis and feature selection through principal component analysis (PCA), Spearman correlation analysis, and cluster heat maps, finally settling on 28 features for modeling29.

In addition to calculating carbon emissions, machine learning can also guide energy conservation and emission reduction. Chen et al. proposed a multimodal AI model integrating multiple algorithms to enhance the accuracy of carbon neutrality decision support through cross-modal information alignment30. Regarding corporate carbon performance, multiple studies have confirmed that AI technology and related policies can significantly enhance carbon performance by improving productivity, optimizing resource allocation, and promoting green innovation31,32,33,34,35. The DeepGreen-Opt framework by Bai H et al.36 and the D-OPF framework based on CEF and PPO algorithms by Sangyoon Lee et al.37 have demonstrated the powerful capabilities of AI in optimizing complex industrial processes and multi-energy systems. Kege Sun et al. analyzed the impact of the carbon market (ETS) on technological progress in the power generation industry using econometric methods38. Chen et al. proposed a multimodal artificial intelligence framework based on VQA, integrating ALBEF, CLIP, and large language models (LLMs) to enhance monitoring and decision support in carbon-neutral environments39.

As a high-carbon-emission industry, thermal power has attracted numerous studies on efficiency assessment, emission reduction costs, and specific technical optimization plans. Fangrong Ren et al. took 21 listed thermal power enterprises in China as samples and adopted a two-stage parallel AR dynamic SBM-DEA model to analyze the relationship between multiple variables and CO₂ emission reduction40. Xiaoxue Wei et al. calculated carbon emission reduction (CERA) and emission reduction efficiency (CERE) at the provincial level, from the perspectives of coal savings in thermal power (TPER) and clean energy generation (CEER), using a three-stage DEA model41. The study by Abhinav Jindal et al. on 129 thermal power plants in India shows that the marginal cost of CO₂ reduction is significantly affected by factors such as plant age and geographical location42. Xiaopan Liu et al. combined LCA and MLP models and proposed a strategy for reducing emissions by optimizing combustion and control for a 1000 MW ultra-supercritical unit43. Kairui Li et al. innovatively utilized remote sensing data to construct the TACEE model, achieving high-precision estimation of carbon emissions from thermal power plants and providing a new monitoring tool44. H. Ding et al. proposed an improved IP2HH method, achieving a 35.5% carbon reduction in the test system through an electric-thermal synergy mechanism45. Muhammad Amir Raza et al. used the SARIMAX model to predict the development trend of global coal-fired power and formulated corresponding emission reduction policies46. Zhao X et al. studied changes in carbon emissions in an electro-hydrogen coupling system from the perspective of carbon emission flow and distribution13. Xiaopan Liu et al. focused on the impact of deep peak shaving on carbon emissions from coal-fired power plants, using a multi-layer perceptron (MLP) regression model to predict carbon emissions under dynamic loads and determining the nonlinear mapping relationship between carbon emissions and multi-input features15.

Methods

Carbon measurement modeling workflow

Figure 1 below presents the overall workflow of carbon measurement modeling for power generation enterprises proposed in this study. The method consists of three core stages. The first is the data preparation stage, in which actual operational data from power generation enterprises is analyzed: eighteen characteristic parameters, including power generation volume, fuel type, and load rate, are integrated, and the most predictive key factors are selected using a feature selection strategy. The second is the model construction stage, in which three different machine learning algorithms are compared and targeted hyperparameter optimization is applied to each model to enhance its performance. The final stage is evaluation and output, in which the models' prediction accuracy is comprehensively verified using RMSE, MAE, R², and MAPE, and the optimal model is selected to support the enterprise's carbon management, energy conservation, and emission reduction decisions. This structured workflow ensures the traceability and reproducibility of the entire process from raw data to decision support.

Fig. 1

The overall workflow of carbon measurement modeling for power generation enterprises.

Feature selection strategy

(1) Correlation-driven feature selection.

By analyzing the statistical correlation between features and target variables, this method selects the features that contribute significantly to the prediction target. Its core methods include:

Pearson Correlation Coefficient: It primarily measures the linear correlation between features and target variables, and is suitable for scenarios where the data distribution is close to normal and the relationship exhibits a linear trend. For example, in power load forecasting, a linear correlation often exists between temperature and electricity consumption, and key features can be quickly identified using the Pearson coefficient. Its limitation is that it cannot capture non-linear or non-monotonic relationships.

Spearman Rank Correlation: By comparing the rank differences between features and target variables, the monotonic non-linear correlation between the two is assessed. For example, equipment ageing and failure rate may show a non-linear but monotonically increasing trend, in which case Spearman's coefficient is more advantageous than Pearson's. This method is robust to outliers but cannot identify complex non-monotonic (oscillating) relationships.
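To make the two correlation criteria concrete, the following sketch implements the Pearson and Spearman coefficients in plain NumPy and filters features by a correlation threshold. The helper names (`pearson`, `spearman`, `select_by_correlation`), the threshold value, and the tie-free ranking are illustrative assumptions, not part of the original method; in practice `scipy.stats` would normally be used.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation between two 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks.
    (Ties are not averaged here, unlike scipy.stats.spearmanr.)"""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(np.asarray(x)), rank(np.asarray(y)))

def select_by_correlation(X, y, names, threshold=0.3):
    """Keep features whose |Spearman| correlation with y exceeds threshold."""
    return [n for j, n in enumerate(names)
            if abs(spearman(X[:, j], y)) >= threshold]
```

A monotone nonlinear target such as y = x³ yields a Spearman coefficient of 1 while its Pearson coefficient is below 1, which is exactly the case the text distinguishes.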

(2) Model-based feature importance assessment.

The contribution of features to the output variable is quantified by constructing a prediction model. The specific methods include:

Tree-based ensemble learning importance score:

XGBoost, random forest, and other algorithms are used to calculate the contribution of each feature across tree splits, including the split gain, the split frequency (weight), and the cover (the number of samples affected by the splits).

Permutation Importance:

The importance of a feature is evaluated by randomly shuffling its values and measuring the resulting degradation in model performance. This method is particularly suitable for identifying key predictors in time series data.
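A minimal sketch of permutation importance, assuming a generic `predict` callable and RMSE as the performance measure (both our choices for illustration):

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance of each feature = increase in RMSE after shuffling that column."""
    rng = np.random.default_rng(seed)
    base = np.sqrt(np.mean((predict(X) - y) ** 2))   # baseline error
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])                    # destroy the feature-target link
            scores.append(np.sqrt(np.mean((predict(Xp) - y) ** 2)))
        imp[j] = np.mean(scores) - base              # RMSE degradation
    return imp
```

A feature the model ignores scores exactly zero, while shuffling a feature the model relies on inflates the RMSE.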

(3) System implementation of recursive feature elimination (RFE).

Recursive Feature Elimination (RFE) is a top-down feature selection method that gradually removes the least important features through an iterative optimization process. In each iteration, the model’s performance is evaluated using k-fold cross-validation, and a curve of feature number versus model performance is constructed to determine the optimal feature subset. The early stop mechanism is introduced in the feature screening process to prevent excessive screening from degrading model performance.
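The RFE procedure described above might be sketched as follows, using ordinary least squares as the base model, k-fold cross-validated RMSE as the performance curve, coefficient magnitude as the importance signal, and a patience counter as the early-stop mechanism. These concrete choices (and the assumption of standardized features, which the |coefficient| ranking requires) are illustrative, not the paper's exact implementation.

```python
import numpy as np

def cv_rmse(X, y, k=5):
    """k-fold cross-validated RMSE of an ordinary least-squares fit."""
    n = len(y)
    errs = []
    for fold in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), fold)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(fold)), X[fold]]) @ beta
        errs.append(np.sqrt(np.mean((pred - y[fold]) ** 2)))
    return float(np.mean(errs))

def rfe(X, y, names, min_features=1, patience=2):
    """Drop the smallest-|coefficient| feature each round; stop early if CV RMSE
    has not improved for `patience` rounds (guards against over-pruning)."""
    active = list(range(X.shape[1]))
    best_rmse, best_set, stall = np.inf, list(active), 0
    while len(active) > min_features:
        score = cv_rmse(X[:, active], y)
        if score < best_rmse:
            best_rmse, best_set, stall = score, list(active), 0
        else:
            stall += 1
            if stall >= patience:
                break
        A = np.column_stack([np.ones(len(y)), X[:, active]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        weakest = active[int(np.argmin(np.abs(beta[1:])))]
        active.remove(weakest)                # eliminate least important feature
    return [names[j] for j in best_set]
```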

In summary, through this series of systematic feature selection methodologies, the carbon emission prediction model can extract the most predictive feature subset from the massive number of features, significantly improving the prediction accuracy and model generalization ability.

Machine learning method selection and its principle

This study selects three machine learning algorithms, namely multiple linear regression, XGBoost, and long short-term memory networks (LSTM), to construct carbon emission prediction models. The selection is based on the following considerations: to cover the complete methodological spectrum from basic linear models, through complex nonlinear ensemble models, to deep time series models, so as to systematically evaluate the applicability of different modeling paradigms. Specifically, multiple linear regression provides a benchmark for understanding the potential linear relationship between features and carbon emissions and offers high interpretability; XGBoost, with its strong ability to capture nonlinear patterns and handle feature interactions efficiently, is suited to analyzing the complex relationships formed by multiple factors in the power generation process; and LSTM is used to explore possible time series dependencies in carbon emission data, with a gating mechanism that can effectively model dynamic processes. This diversified model combination reflects the mainstream methods in carbon emission prediction and directly serves the core goal of this study: identifying the optimal prediction model to support enterprises in precise carbon management and emission reduction decisions. The specific principles of each method are introduced below.

Multiple linear regression.

Multiple linear regression describes the relationship between independent variables (explanatory variables) and a dependent variable (response variable) by establishing a linear equation. It is suitable for cases where the relationships between features are approximately linear, and it is computationally simple and easy to interpret.

$${\mathbf{Y}}={\mathbf{X}}\beta +\varepsilon$$
(1)

where Y = [y1, y2, …, yn]T represents the carbon emission vector (n = 334);

\({\mathbf{X}}=\left[ {\begin{array}{*{20}{c}} {{x_{11}}}& \cdots &{{x_{1p}}} \\ \vdots & \ddots & \vdots \\ {{x_{n1}}}& \cdots &{{x_{np}}} \end{array}} \right]\), is the feature matrix (p = 18);

β = [β0, β1, …, βp]T is the vector of regression coefficients, and ε is the error term.
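Eq. (1) can be solved directly by least squares. The following minimal NumPy sketch (the function names are our own) prepends an intercept column and estimates β:

```python
import numpy as np

def fit_mlr(X, y):
    """Estimate beta = [b0, b1, ..., bp] for Y = X*beta + eps by least squares."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes ||A beta - y||^2
    return beta

def predict_mlr(beta, X):
    """Apply the fitted linear model to a feature matrix X."""
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

On noise-free linear data the true coefficients are recovered exactly (up to floating-point precision), which makes this a convenient correctness check before fitting the real 334 × 18 dataset.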

XGBoost

XGBoost (eXtreme Gradient Boosting) is an ensemble learning algorithm. Its objective function comprises a loss function (here, mean squared error) and a regularization term that balances model complexity against generalization ability. The negative gradient of the loss function with respect to the current model's predictions is used as an approximation of the residual, and a new decision tree is fitted to predict this residual. The process iterates until a preset stopping condition is met, gradually improving the model's performance.

$$L(\theta )=\mathop \sum \limits_{{i=1}}^{n} l\left( {{y_i},{{\hat {y}}_i}} \right)+\mathop \sum \limits_{{k=1}}^{K} \Omega \left( {{f_k}} \right)$$
(2)

where \(l({y_i},{\hat {y}_i})\) is the loss function and \(\Omega ({f_k})\) is the regularization term.
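To illustrate the boosting principle behind Eq. (2), here is a deliberately simplified gradient-boosting sketch with depth-1 regression stumps, squared loss, and shrinkage. It omits XGBoost's regularization term Ω and second-order approximation, so it is a toy stand-in for the idea, not the actual XGBoost algorithm.

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split (depth-1) regression tree for residuals r."""
    best = (np.inf, 0, 0.0, 0.0, 0.0)        # (sse, feature, thr, left, right)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # keep both sides non-empty
            m = X[:, j] <= thr
            lv, rv = r[m].mean(), r[~m].mean()
            sse = ((r[m] - lv) ** 2).sum() + ((r[~m] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]

def boost(X, y, n_trees=50, eta=0.1):
    """Gradient boosting for squared loss: each stump fits the current residual
    (the negative gradient), scaled by the learning rate eta."""
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_trees):
        j, thr, lv, rv = fit_stump(X, y - pred)
        pred += eta * np.where(X[:, j] <= thr, lv, rv)
        trees.append((j, thr, lv, rv))
    return y.mean(), trees

def boost_predict(base, trees, X, eta=0.1):
    """Sum the base prediction and the shrunken contributions of all stumps."""
    pred = np.full(len(X), base)
    for j, thr, lv, rv in trees:
        pred += eta * np.where(X[:, j] <= thr, lv, rv)
    return pred
```

Each round shrinks the residual by roughly a factor of (1 − η), which is why more trees with a small learning rate typically generalize better than few trees with a large one.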

LSTM

LSTM (Long Short-Term Memory) is a special type of recurrent neural network (RNN). It learns long-term dependencies through memory and forgetting mechanisms: given input sequence data (such as carbon emissions and electricity demand from the past few days), the forget gate, input gate, and output gate of the LSTM unit update the hidden state, which is ultimately used to predict future carbon emissions.

Forget gate:

$${{\mathbf{f}}_t}=\sigma ({{\mathbf{W}}_f} \cdot [{{\mathbf{h}}_{t - 1}},{{\mathbf{x}}_t}]+{{\mathbf{b}}_f})$$
(3)

Input gate:

$${{\mathbf{i}}_t}=\sigma ({{\mathbf{W}}_i} \cdot [{{\mathbf{h}}_{t - 1}},{{\mathbf{x}}_t}]+{{\mathbf{b}}_i})$$
(4)

Candidate memory unit:

$${{\mathbf{\tilde {C}}}_t}=\tanh ({{\mathbf{W}}_C} \cdot [{{\mathbf{h}}_{t - 1}},{{\mathbf{x}}_t}]+{{\mathbf{b}}_C})$$
(5)

Cell state update:

$${{\mathbf{C}}_t}={{\mathbf{f}}_t} \odot {{\mathbf{C}}_{t - 1}}+{{\mathbf{i}}_t} \odot {{\mathbf{\tilde {C}}}_t}$$
(6)

Output gate:

$${{\mathbf{o}}_t}=\sigma ({{\mathbf{W}}_o} \cdot [{{\mathbf{h}}_{t - 1}},{{\mathbf{x}}_t}]+{{\mathbf{b}}_o})$$
(7)

Final hidden state:

$${{\mathbf{h}}_t}={{\mathbf{o}}_t} \odot \tanh ({{\mathbf{C}}_t})$$
(8)

where \(\sigma\) is the sigmoid activation function (with output in [0, 1]), \({{\mathbf{h}}_{t - 1}}\) is the hidden state at the previous time step, \({{\mathbf{x}}_t}\) is the current input feature vector, \({{\mathbf{W}}_f}\), \({{\mathbf{W}}_i}\), \({{\mathbf{W}}_C}\), \({{\mathbf{W}}_o}\) are weight matrices, \({{\mathbf{b}}_f}\), \({{\mathbf{b}}_i}\), \({{\mathbf{b}}_C}\), \({{\mathbf{b}}_o}\) are bias vectors, \({{\mathbf{C}}_{t - 1}}\) is the cell state at the previous time step, tanh is the hyperbolic tangent activation function, \({{\mathbf{\tilde {C}}}_t}\) is the candidate cell state, \({{\mathbf{i}}_t}\) controls how strongly the new information \({{\mathbf{\tilde {C}}}_t}\) is stored, \({{\mathbf{o}}_t}\) is the proportion of the cell state that is output, and \({{\mathbf{h}}_t}\) is the final temporal feature representation.

To address the temporal dependencies in carbon emissions of power generation enterprises, the LSTM architecture employs the forget gate \({{\mathbf{f}}_t}\) to preserve long-term factors such as fuel quality stability, while the input gate \({{\mathbf{i}}_t}\) captures short-term fluctuations such as load variations. The cell state \({{\mathbf{C}}_t}\) integrates the full operational cycle status of the generating units. The output gate \({{\mathbf{o}}_t}\) then generates a hidden representation adapted to the key dynamic features (X1, X3, X10, X11), which is finally mapped to daily carbon emission predictions through a linear transformation layer.
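Eqs. (3)-(8) can be traced step by step in a single-cell NumPy sketch. The initialization scheme, dimensions, and function names below are arbitrary illustrative choices; a real implementation would use a trained deep learning framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step implementing Eqs. (3)-(8)."""
    Wf, bf, Wi, bi, Wc, bc, Wo, bo = params
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f = sigmoid(Wf @ z + bf)               # forget gate, Eq. (3)
    i = sigmoid(Wi @ z + bi)               # input gate, Eq. (4)
    c_tilde = np.tanh(Wc @ z + bc)         # candidate memory, Eq. (5)
    c = f * c_prev + i * c_tilde           # cell state update, Eq. (6)
    o = sigmoid(Wo @ z + bo)               # output gate, Eq. (7)
    h = o * np.tanh(c)                     # hidden state, Eq. (8)
    return h, c

def init_params(d_in, d_h, seed=0):
    """Small random weights for Wf, Wi, Wc, Wo and zero biases (toy init)."""
    rng = np.random.default_rng(seed)
    ps = []
    for _ in range(4):
        ps += [0.1 * rng.standard_normal((d_h, d_h + d_in)), np.zeros(d_h)]
    return ps
```

Because the output gate lies in (0, 1) and tanh is bounded, every component of the hidden state stays strictly inside (−1, 1), regardless of the input scale.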

Indicators for model evaluation

Root mean square error (RMSE)

RMSE represents the standard deviation of the error between predicted and actual values and has the same unit as the original data. It directly reflects the absolute scale of deviation between the predicted and actual values.

Mean absolute error (MAE)

MAE measures the average absolute difference between predicted and actual values. It is less sensitive to outliers and is suitable for intuitively evaluating the average error level: averaging the absolute errors reduces the interference of outliers, so the model's overall average error level can be reflected stably.

Coefficient of determination (R2)

R² measures the goodness of fit of the model, indicating the proportion of the total variance explained by the model. R² typically ranges over [0, 1]; the closer R² is to 1, the better the fit. By quantifying the proportion of the total variance of carbon emissions that the model explains, it directly reflects the model's ability to capture the relationship between the complex factors of the power generation process and carbon emissions.

Mean absolute percentage error (MAPE)

MAPE measures the average percentage difference between predicted and actual values. Because the result is expressed as a percentage, it facilitates comparison across different models or data sets, and it can convey the error level directly to enterprise managers, providing a clear reference for carbon management decisions.
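The four indicators can be computed in a few lines. This sketch assumes all actual values are nonzero (required for MAPE); the function name is our own.

```python
import numpy as np

def metrics(y, yhat):
    """RMSE, MAE, R^2 and MAPE for model evaluation."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    err = y - yhat
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    mape = float(100.0 * np.mean(np.abs(err / y)))   # assumes y != 0
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "MAPE": mape}
```

For example, predictions [110, 190, 300] against actuals [100, 200, 300] give MAE = 20/3, MAPE = 5%, and R² = 0.99, illustrating how the percentage-based MAPE reads directly as an error level.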

Model optimization

Considering that the models' accuracy may be further optimized, this section discusses the hyperparameter tuning strategy for each model. The specific strategies are as follows:

Linear regression model: its parameters are directly solved by the least squares method, which does not involve the iterative optimization process.

XGBoost model: Three critical parameters, denoted as the hyperparameter set θ = {η, d, n}, are optimized via grid search: the learning rate (η), with candidate values {0.01, 0.1, 0.2}; the maximum tree depth (d), with candidate values {3, 5, 7}; and the number of trees (n), with candidate values {50, 100, 200}. The optimization objective is to minimize the validation loss (RMSE) over the candidate space Θ:

$${\theta ^ * }=\mathop {argmin}\limits_{{\theta \in \Theta }} {L_{val}}({f_\theta };{D_{val}})$$
(9)

where \({D_{val}}\) is the validation dataset, and \({f_\theta }\) is the XGBoost model parameterized by θ.
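The grid search of Eq. (9) can be sketched as an exhaustive loop over the 3 × 3 × 3 candidate grid. The `val_rmse` function below is a toy surrogate standing in for training and validating an actual XGBoost model (which would wrap, e.g., an `xgboost.XGBRegressor`); only the search structure mirrors the procedure described here.

```python
import itertools
import numpy as np

def val_rmse(theta, D_train=None, D_val=None):
    """Toy surrogate for 'train model with theta, score RMSE on D_val'.
    The formula below is purely for demonstration: loss shrinks with
    more/deeper trees and rises as eta drifts from 0.1."""
    eta, depth, n_trees = theta
    return 1.0 / (n_trees * depth) + (eta - 0.1) ** 2

grid = {"eta": [0.01, 0.1, 0.2],
        "depth": [3, 5, 7],
        "n_trees": [50, 100, 200]}

def grid_search(score, grid, D_train=None, D_val=None):
    """Exhaustive search for theta* = argmin of the validation loss (Eq. 9)."""
    best_theta, best_loss = None, np.inf
    for theta in itertools.product(*grid.values()):
        loss = score(theta, D_train, D_val)
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta, best_loss
```

Swapping `val_rmse` for a real train-and-score callable turns this sketch into a working tuner; the 27-point grid keeps the cost of exhaustive search trivial.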

LSTM model: The optimization process for the LSTM model primarily targets the input sequence length, denoted as s, which represents the number of consecutive historical days used to predict the next day's carbon emissions. Other model parameters, such as the hidden dimension, number of layers, dropout rate, and training hyperparameters (e.g., learning rate, batch size), remain fixed at preset values. The candidate values for the sequence length are s ∈ {5, 7, 10}, selected to capture short-term (5 days), weekly (7 days), and medium-term (10 days) temporal patterns.

Input sequence: Given a time step t, the input sequence Xt(s) is a matrix comprising feature vectors from day t − s + 1 to day t:

$${\mathbf{X}}_{t}^{{(s)}}=\{ {{\mathbf{x}}_{t - s+1}},{{\mathbf{x}}_{t - s+2}}, \ldots ,{{\mathbf{x}}_t}\}$$
(10)

where \({{\mathbf{x}}_\tau } \in {\mathbb{R}^d}\) is the feature vector at day τ, and s is the sequence length.

Prediction target: the LSTM model fLSTM maps the input sequence Xt(s) to the predicted carbon emission \({\hat {y}_{t+1}}\)for the next day (t + 1):

$${\hat {y}_{t+1}}={f_{{\text{LSTM}}}}\left( {{\mathbf{X}}_{t}^{{(s)}};\phi } \right)$$
(11)

Here, ϕ denotes the fixed parameters of the LSTM architecture.

The optimal sequence length s is determined by minimizing the root mean square error (RMSE) on the validation set via time-series cross-validation:

$${s^ * }=\mathop {argmin}\limits_{{s \in \{ 5,7,10\} }} RMS{E_{val}}({f_{LSTM}};s)$$
(12)

where RMSEval(s) is computed as:

$${\text{RMS}}{{\text{E}}_{{\text{val}}}}(s)=\sqrt {\frac{1}{{|{\mathcal{D}_{{\text{val}}}}|}}\mathop \sum \limits_{{t \in {\mathcal{D}_{{\text{val}}}}}} {{\left( {{y_{t+1}} - \hat {y}_{{t+1}}^{{(s)}}} \right)}^2}}$$

where \({\mathcal{D}_{{\text{val}}}}\) is the validation set, and \(\hat {y}_{{t+1}}^{{(s)}}\) is the prediction for day t + 1 generated by the LSTM model trained with sequence length s.
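The sliding-window construction of Eq. (10) and the sequence-length selection of Eq. (12) can be sketched as follows. The simple holdout split and the pluggable `fit_predict` callable (a persistence baseline in the usage example) are illustrative simplifications of the time-series cross-validation described above.

```python
import numpy as np

def make_windows(X, y, s):
    """Build (window of days t-s+1..t, target y_{t+1}) pairs per Eq. (10)."""
    seqs = np.stack([X[t - s + 1:t + 1] for t in range(s - 1, len(X) - 1)])
    targets = y[s:]                       # y_{t+1} for each window ending at t
    return seqs, targets

def select_seq_length(X, y, fit_predict, candidates=(5, 7, 10), train_frac=0.8):
    """Pick s* minimizing validation RMSE (Eq. 12). `fit_predict(Xtr, ytr, Xva)`
    trains a sequence model and returns predictions on Xva."""
    best_s, best_rmse = None, np.inf
    for s in candidates:
        seqs, tgt = make_windows(X, y, s)
        cut = int(train_frac * len(tgt))  # chronological holdout split
        pred = fit_predict(seqs[:cut], tgt[:cut], seqs[cut:])
        rmse = float(np.sqrt(np.mean((pred - tgt[cut:]) ** 2)))
        if rmse < best_rmse:
            best_s, best_rmse = s, rmse
    return best_s, best_rmse
```

Usage: with a hypothetical persistence-style predictor `lambda Xtr, ytr, Xva: Xva[:, -1, 0] + 1.0` on a trending series, the same loop runs end to end; replacing that callable with LSTM training yields the selection procedure of Eq. (12).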

Results

Data experiment

A power plant in Haikou City, Hainan Province, is primarily coal-based, supplemented by a small amount of diesel power generation, and has a very low dependence on purchased electricity. In the annual fuel consumption, the proportion of medium and high volatile bituminous coal is more than 99%. Among them, the annual coal consumption of Units 8 and 9 is 1.015 million tons and 1.041 million tons, respectively, accounting for 99.3% of the total coal consumption of the whole plant. Diesel accounts for only 0.7% of the plant’s total fuel consumption, which is primarily used for auxiliary operations. Electricity purchases occurred only in September, accounting for 0.04% of total electricity consumption, with a negligible impact on carbon emissions. The high calorific value of coal and stable carbon content parameters support high power generation efficiency; however, the high carbon content of coal results in the total carbon emission of the entire plant reaching 3.329 million tons of CO2, with coal accounting for more than 99.7%.

In this paper, the carbon emission factors are calculated using the consumption of each energy source and real-time monitoring data of thermal power generation at the plant from January 1, 2024, to November 30, 2024. Temperature, precipitation, and wind speed are used as climate information for the subsequent model testing and prediction research.

In this study, to accurately model and analyze the carbon emissions of a unit within the enterprise, a series of relevant parameters was carefully selected. X1 represents the power generation of the unit, and X2 its power consumption. X10 represents the monthly operating hours of the unit. X3 represents the power structure, reflected in the proportions of coal, natural gas, and nuclear power, calculated from the monthly power generation shares output by the calculation system. The parameters X4 (fuel type), X5 (whether biomass is co-fired), X6 (product type), and X7 (installed capacity) remain unchanged and are derived from the system. X8 represents the running time of the unit, calculated from the number of production days specified in the enterprise research report. X9 (heat supply), X11 (load rate), X12 (coal consumption of power generation), X13 (as-received carbon content), X14 (carbon content per unit calorific value), and X15 (biomass co-combustion) are recorded at monthly resolution and exported from the system. For the environment-related parameters, X16 (temperature), X17 (precipitation), and X18 (wind speed) are hourly data averaged to daily values, with the temperature data taken from Haikou. The dependent variable Y, the carbon emissions of a unit within the enterprise, is converted to daily data using the unit's daily coal or natural gas consumption obtained from the enterprise research. The comprehensive selection of these parameters enables a more complete and accurate exploration of the factors affecting the unit's carbon emissions.

The study collects daily operational data from January 1 to November 30, 2024, yielding n = 334 samples (days), with February 2024's leap-year length taken into account. Each sample comprises the daily carbon emissions yi (derived from fuel consumption) and an 18-dimensional feature vector xi (all features aggregated to daily resolution).
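The hourly-to-daily aggregation applied to the meteorological variables (X16–X18) can be sketched as follows; the function name and the 24-readings-per-day assumption are illustrative, not taken from the paper's pipeline.

```python
import numpy as np

def hourly_to_daily(hourly_values):
    """Average hourly readings into daily values (assumes 24 readings per day)."""
    arr = np.asarray(hourly_values, dtype=float)
    if arr.size % 24 != 0:
        raise ValueError("series length must be a multiple of 24")
    # Reshape to (days, 24) and average each row.
    return arr.reshape(-1, 24).mean(axis=1)

# Two days of hourly temperatures collapse into two daily means.
daily = hourly_to_daily([10.0] * 24 + [20.0] * 24)
```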

Variation characteristics of carbon emission factors of power generation

The carbon emission intensity of power generation at this power plant shows differences among units and monthly fluctuations. The monthly fluctuation diagram of the carbon emission intensity of power generation from January to November 2024 is shown in Fig. 2.

Fig. 2 Monthly fluctuation diagram of the carbon emission factor of power generation.

As illustrated in Fig. 2, Unit 8 exhibits a higher carbon emission intensity than Unit 9, which could be attributed to differences in load rates and heat supply ratios between the units. The monthly carbon emission intensity ranges from 0.84 to 0.953 tCO2/MWh. The lowest plant-wide average occurs in February (0.84 tCO2/MWh), while the highest is observed in August (0.894 tCO2/MWh). These fluctuations are closely associated with variations in the carbon content per unit calorific value of coal, seasonal changes in the heat supply ratio, and the operational efficiency of the units. Notably, the stability of the carbon oxidation rate and of the lower calorific value of coal provides a robust data foundation for constructing models. However, the carbon emission intensity demonstrates a strong positive correlation with the carbon content per unit calorific value of coal, underscoring the critical importance of fuel quality control in achieving carbon reduction targets. Overall, although control of coal consumption has limited intra-annual fluctuations in carbon emission intensity to within ±4.3%, the heavy reliance on coal remains a fundamental barrier to deep decarbonization. It is therefore imperative to address this challenge through technical breakthroughs such as unit flexibility retrofitting and multi-energy collaborative dispatching.
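The intensity figures quoted above follow from a simple ratio of emissions to generation, and the ±x% fluctuation form can be expressed as the half-range of a monthly series relative to its mean. A minimal sketch (the numeric inputs below are illustrative, not the plant's data):

```python
def emission_intensity(co2_tonnes, generation_mwh):
    """Carbon emission intensity in tCO2/MWh."""
    return co2_tonnes / generation_mwh

def half_range_pct(values):
    """Half the min-max spread of a series, as a percentage of its mean
    (the +/- x% form used in the text)."""
    lo, hi = min(values), max(values)
    mean = sum(values) / len(values)
    return 100.0 * (hi - lo) / (2.0 * mean)
```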

Prediction effect of different models before tuning

The processed datasets are fed into the linear regression, XGBoost, and LSTM models, respectively.

Table 1 Error comparison of prediction results of each model before optimization.

The error comparison of the prediction results of each model is shown in Table 1.
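The error metrics compared in Table 1 (RMSE, MAE, and R2) can be computed as below; this is a generic sketch of the standard definitions, not the authors' evaluation script.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```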

Prediction effect after optimization of different models

In this section, we have explored various combinations of inputs X. The results indicate that selecting the combination of X1, X3, X10, and X11 yields the best results for the output Y. To systematically analyze the multidimensional relationships among power generation (X1), operating hours (X10), load factor (X11), and energy structure (X3) in coal-fired power plants, a scatter matrix was employed to visualize the joint distribution patterns and covariance relationships of the four variables, as shown in Fig. 3. The prediction error comparisons of the tuned models are presented in Table 2.
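One simple way to arrive at a subset such as {X1, X3, X10, X11} is to rank features by the magnitude of their correlation with Y; the sketch below illustrates only this correlation step (the study also uses model-based importance and RFE, which are not reproduced here).

```python
import numpy as np

def top_k_by_abs_correlation(X, y, k):
    """Return indices of the k features most correlated (in magnitude) with y."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return list(np.argsort(-corrs)[:k])

# Toy data: column 0 tracks y exactly; column 1 is uncorrelated with y.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([y, np.array([0.0, 1.0, 0.0, 1.0, 0.0])])
selected = top_k_by_abs_correlation(X, y, 1)
```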

Fig. 3 Correlation and distributional characterization of X1, X3, X10, and X11.

Table 2 Error comparison of prediction results of each model after optimization.

Discussion

Model performance analysis and interpretation

This study systematically evaluated three distinct machine learning approaches for carbon emission prediction in thermal power generation facilities, revealing significant insights into the comparative performance of different modeling paradigms for carbon management in the energy sector. The exceptional performance of the XGBoost model, which achieved 90.39% prediction accuracy after optimization, can be attributed to its ensemble learning framework that effectively captures the complex, multi-factorial relationships inherent in power generation processes. The algorithm’s gradient boosting framework excels at modeling nonlinear interactions between operational parameters, including power generation volume (X1), energy structure (X3), operating hours (X10), and load rate (X11), which correlation analysis identified as the most predictive feature subset. The model’s capacity to handle feature interactions proves particularly crucial in the power generation context, where carbon emissions result from multiplicative rather than additive effects of operational variables. XGBoost’s tree-based architecture inherently captures these conditional dependencies, thereby explaining its superior performance relative to linear approaches. The optimal hyperparameter configuration achieved through grid search represents a balanced compromise between model expressiveness and generalization capability, as evidenced by the substantial improvement from 45.3% to 90.39% accuracy following optimization.

The performance of the linear regression model reveals both the existence of underlying linear relationships and the inherent limitations of linear assumptions in complex industrial systems. The model’s relatively strong R2 value (0.94 before optimization) indicates that substantial linear relationships exist between operational features and carbon emissions, particularly for dominant operational parameters. However, the model’s degraded performance following optimization (accuracy declining to 89.25%) demonstrates that although linear relationships exist at the aggregate level, they prove insufficient when the analysis focuses on the most predictive features. This observation suggests that the carbon emission process involves significant nonlinear interactions that linear models cannot adequately capture, though the model’s interpretability advantage remains valuable for regulatory compliance and operational transparency.

The limited performance of the LSTM model (R2 approaching zero, 89.24% accuracy) indicates that carbon emissions in the investigated power plant do not exhibit strong temporal dependencies that LSTM’s memory mechanisms can effectively exploit. This finding suggests that daily carbon emissions are predominantly determined by contemporaneous operational conditions rather than historical operational patterns, aligning with the physical nature of power generation where emissions are directly correlated with real-time fuel consumption and combustion efficiency. The constrained model performance may also be attributed to the relatively limited dataset duration (334 days) and daily data aggregation level, which may have obscured shorter-term temporal patterns that could be more relevant for LSTM modeling applications.
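The sequence construction underlying the LSTM experiments (a sliding window over the daily series, where each window predicts the next day) can be sketched as follows. The 7-day window matches the 5–7 day range reported as optimal later in the paper, but the function itself is an illustrative reconstruction.

```python
import numpy as np

def make_sequences(series, window):
    """Turn a daily series into (window -> next-day) training pairs for an LSTM."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# 334 daily samples with a 7-day window yield 327 training pairs.
X_seq, y_seq = make_sequences(np.arange(334.0), window=7)
```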

The proposed framework addresses critical limitations of conventional emission monitoring systems through an integrated machine learning approach. Through simultaneous processing of 18 operational parameters, the framework achieves real-time adaptability to dynamic operational conditions, including fluctuations in fuel quality (X13, X14), load variations (X11), and efficiency metrics (X10), which static emission factors inherently fail to capture. This approach demonstrates significant cost-effectiveness through the utilization of existing operational data streams, thereby eliminating dependence on specialized monitoring hardware while maintaining high predictive accuracy. Crucially, the framework facilitates a paradigm shift in carbon management from retrospective calculation to proactive strategic deployment: its predictive capabilities enable optimized dispatch decisions and preemptive emission reduction planning. Furthermore, the framework’s continuous learning mechanism ensures progressive refinement as new operational data accumulates, fundamentally contrasting with static methodologies that require manual recalibration for accuracy maintenance.

Complementary role with direct measurement systems

The high prediction accuracy achieved by the optimized XGBoost model demonstrates the viability of ML-based approaches as valuable complements to direct measurement systems such as CEMS. While CEMS provides the gold standard for real-time carbon emission monitoring with measurement uncertainties typically below 5%, the ML framework offers distinct operational advantages that enhance comprehensive carbon management capabilities. The predictive nature of this approach addresses a critical gap in current carbon management practices. Traditional CEMS installations provide historical and real-time data but cannot anticipate future emissions based on planned operational modifications. The developed models enable power plant operators to evaluate the carbon implications of different operational scenarios prior to implementation, thereby supporting proactive rather than reactive carbon management strategies.

Furthermore, the cost-effectiveness of this approach renders comprehensive carbon monitoring feasible for enterprises with constrained capital resources. The total implementation cost of the ML framework, including data infrastructure and model development, represents approximately 5–10% of a single CEMS installation cost while providing monitoring coverage across multiple operational parameters and extended time horizons. The integration capabilities of this methodology with existing plant management systems create synergies with CEMS data. When both systems operate concurrently, CEMS provides high-accuracy calibration benchmarks for ML model validation, while ML predictions inform operational decisions that CEMS data alone cannot directly support. This complementary deployment maximizes the value of both monitoring approaches while minimizing overall carbon management costs.

Methodological significance and limitations

From a methodological perspective, this investigation demonstrates the superior performance of ensemble learning approaches compared to traditional linear methods and recurrent neural networks for carbon emission prediction in thermal power plants. The comprehensive feature selection strategy, which combines correlation analysis with recursive feature elimination, effectively identified the most predictive variables while reducing model complexity and computational requirements.
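The recursive feature elimination step can be sketched with a linear base model, repeatedly dropping the feature with the smallest standardized coefficient. The paper does not specify its base estimator, so the least-squares choice and standardization below are assumptions for illustration.

```python
import numpy as np

def rfe_linear(X, y, n_keep):
    """Drop the feature with the smallest standardized |coefficient| until n_keep remain."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        Z = X[:, keep]
        Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)           # standardize columns
        coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
        keep.pop(int(np.argmin(np.abs(coef))))             # eliminate the weakest feature
    return keep

# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
x0 = np.arange(20.0)
x1 = np.tile([0.0, 1.0], 10)
y = 2.0 * (x0 - x0.mean()) / x0.std() + 0.1 * (x1 - 0.5) / x1.std()
kept = rfe_linear(np.column_stack([x0, x1]), y, n_keep=1)
```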

Nevertheless, several limitations warrant acknowledgment. The investigation’s focus on a single coal-fired power plant in Hainan Province may constrain the generalizability of findings to different plant configurations, fuel types, and operational environments. The relatively limited observation period (11 months) restricts the model’s capability to capture seasonal variations and long-term operational trends that may influence carbon emission patterns. Furthermore, the daily aggregation of operational data may mask important intra-day patterns that could potentially enhance prediction accuracy and provide more granular insights into emission dynamics.

Conclusion

Primary research achievements

This research establishes a systematic methodology for predicting carbon emissions that integrates dynamic training set selection mechanisms with hybrid feature selection models. Through rigorous analysis of 18 operational parameters collected from a coal-fired power plant in Hainan Province over 11 months (January to November 2024), we developed a comprehensive feature selection strategy that combines correlation-driven analysis, model-based importance assessment, and recursive feature elimination (RFE). This multi-tiered approach successfully identified the four most predictive variables: power generation volume (X1), energy structure proportion (X3), monthly operating hours (X10), and load rate (X11), demonstrating the complex interplay between operational efficiency and environmental impact.

The study conducted an extensive comparative evaluation of three distinct machine learning paradigms: multi-source linear regression, XGBoost ensemble learning, and Long Short-Term Memory (LSTM) networks. Each model was systematically optimized through targeted hyperparameter tuning strategies. The linear regression model, although providing excellent interpretability and baseline performance (R2= 0.94), revealed the limitations of linear assumptions in capturing the complexity of industrial processes. The LSTM model, despite its theoretical advantages in temporal pattern recognition, demonstrated poor performance (R2≈0.017) due to the predominantly contemporary nature of carbon emission dependencies rather than historical patterns. Most significantly, the XGBoost model achieved exceptional performance after optimization, reaching 90.39% prediction accuracy with the lowest RMSE (661.1) and MAE (526.22) values, representing a substantial improvement from its pre-optimization accuracy of 45.3%. The implementation of systematic hyperparameter tuning through grid search algorithms proved crucial for enhancing model performance. For XGBoost, the optimal configuration, including learning rate (η = 0.1), maximum tree depth (d = 5), and number of trees (n = 100), was determined through comprehensive validation across 27 parameter combinations. The LSTM sequence length optimization revealed that short-term temporal patterns (5–7 days) provided optimal performance, confirming that carbon emissions are primarily driven by immediate operational conditions rather than extended historical dependencies. The optimized XGBoost model’s 90.39% prediction accuracy represents a significant advancement over traditional carbon measurement methods, which typically rely on static emission factors and simplified calculations. 
This improvement is particularly crucial for power generation enterprises operating under increasingly stringent carbon regulations and market-based emission trading systems. The model’s robust performance across different evaluation metrics demonstrates its reliability for real-world deployment in industrial carbon management systems.
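The grid search over 27 combinations described above is consistent with three candidate values per hyperparameter. The candidate lists below are assumptions for illustration; only the reported optimum (learning rate 0.1, depth 5, 100 trees) is taken from the study.

```python
from itertools import product

# Assumed 3x3x3 grid (27 combinations); only the optimum is reported in the study.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200],
}

def grid_combinations(grid):
    """Enumerate every hyperparameter combination in the grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

combos = grid_combinations(param_grid)
```

In practice, each combination would be fitted and scored on a validation split, and the configuration with the lowest error retained.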

Unlike traditional carbon accounting methods that provide retrospective assessments, the developed machine learning framework enables near real-time carbon emission monitoring and prediction. This capability supports dynamic decision-making processes, including optimal load dispatch, fuel scheduling, and maintenance planning, all of which are critical for minimizing the carbon footprint while maintaining operational efficiency. The model's ability to process daily operational data and provide accurate predictions makes it invaluable for both regulatory compliance and carbon trading strategies. While validated on a specific coal-fired power plant, the methodological framework demonstrates strong potential for scalability across different power generation technologies and operational contexts.

The method provides power generation companies with a precise and efficient solution for real-time carbon monitoring and forecasting, enabling data-driven decision-making in carbon management, energy conservation, and emission reduction. Furthermore, it lays a scalable foundation for improving carbon accounting systems and supporting low-carbon dispatch strategies within the energy sector.

Future research directions

Looking ahead, several promising research directions emerge:

1. The most pressing research priority involves expanding the validation scope to encompass diverse power generation enterprises across different geographical regions, fuel types, and operational scales. Future studies should systematically evaluate the framework’s performance across coal-fired, natural gas, biomass, and hybrid power plants operating in various climatic conditions and regulatory environments. This expansion is essential for developing universally applicable carbon prediction models that can support sector-wide decarbonization efforts.

2. While this study incorporated basic meteorological variables, future research should systematically integrate additional external factors, including electricity market prices, renewable energy availability, grid stability requirements, and regional air quality constraints. These factors are increasingly influencing operational decisions in modern power systems and could significantly enhance prediction model accuracy and practical applicability.

3. Our ML-based framework exhibits considerable potential for integration with well-established energy planning tools, paving the way for the creation of a holistic carbon management ecosystem. The precise emission factors generated by our framework have the capacity to significantly enhance the accuracy of system-level optimization models employed in tools such as PLEXOS. This integration would facilitate several key advancements: Firstly, it would enable dynamic updates of emission factors, grounded in real operational performance data. This ensures that emission predictions remain current and accurately reflect actual plant conditions, thereby enhancing the reliability of optimization models. Secondly, the integration would enhance the accuracy of system-wide emission forecasting for long-term energy planning. By incorporating plant-specific emission factors, planners can achieve a more nuanced understanding of future emission trends, enabling more informed decision-making. Thirdly, operational emission tracking would provide real-time validation of planning assumptions. This feedback loop allows planners to adjust their strategies promptly in response to actual emission data, ensuring that plans remain aligned with operational realities.