Introduction

Well production prediction plays a critical role in reservoir engineering, significantly influencing financial decisions, development optimization, production operations, and uncertainty management. Accurate short-term forecasts ensure daily operational targets are met while optimizing resource allocation and costs1. Robust long-term forecasts guide financial planning, reserve evaluations, and economic assessments of reservoirs, informing budgets and investment strategies2. By quantifying forecast uncertainty, engineers can better manage risks and develop flexible plans that adapt to evolving reservoir conditions3. Moreover, optimizing well placement and production strategies maximizes reservoir recovery and economic benefits4.

Decline curve analysis (DCA) methods are widely used in oil and gas forecasting due to their simplicity and computational efficiency, particularly when detailed geological models are unavailable or when analyzing large datasets5,6,7. However, the accuracy of DCA is limited by its reliance on model selection, data quality, and assumptions about driving mechanisms. Different models, such as the Arps and exponential decline models, may fit the same dataset well but produce divergent predictions, leading to inaccuracies if the wrong model is chosen8. Additionally, traditional DCA often requires manual intervention for outlier detection and data fitting, introducing subjectivity and limiting standardization5.

Numerical simulation (NS) offers detailed modeling of reservoir conditions, providing critical insights into flow behavior and informing optimized production strategies9,10. NS often achieves higher accuracy in production forecasts than DCA, particularly when integrated with machine learning (ML) techniques11. However, NS is computationally intensive, requiring significant time and resources for model development, particularly in large-scale reservoirs with complex physical phenomena12,13. Additionally, uncertainties in model parameters can reduce forecast reliability, and changes in operating conditions may render previously insignificant parameters critical14. These challenges limit the broad applicability and efficiency of NS15,16.

ML techniques have demonstrated significant potential in production forecasting by processing large datasets and capturing complex nonlinear relationships. Support Vector Regression (SVR), enhanced by metaheuristic optimization for hyperparameter tuning, achieves high precision17,18,19. Random Forest (RF) excels in ranking variable importance and forecasting cumulative production20, while Gradient Boosting Trees (GBT) effectively model variable interactions and nonlinearity21. However, ML models are prone to overfitting and often struggle with generalization across reservoirs with differing geological and operational conditions20,22.

Deep learning (DL) techniques, particularly Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), have advanced production forecasting by capturing long-term dependencies and temporal patterns23,24. Despite these improvements, traditional DL models like RNN, LSTM, and GRU still face limitations in capturing the nonlinear relationships and long-term temporal dependencies present in oil production data. These models also lack the capacity to fully address the dynamic interactions between operational behaviors and reservoir conditions, leading to suboptimal predictive performance in complex scenarios. Recent studies, such as Zhen et al.‘s work on attention-enhanced temporal convolutional networks (TCNs), have demonstrated that incorporating attention mechanisms into TCNs can effectively extract temporal features and improve the performance of well production forecasting models25. By dynamically focusing on critical time steps and key operational factors, attention mechanisms enhance the model’s ability to learn intricate patterns and dependencies in oil production data, thereby addressing some of the limitations of traditional DL models.

Hybrid DL models, such as CNN-LSTM, Attention-CNN-LSTM, and CNN-GRU, have further improved forecasting accuracy by combining CNN’s feature extraction capabilities with recurrent networks’ temporal modeling strengths26,27,28. Bruyker et al.29,30,31 integrate data-driven approaches with physics-based principles, offering significant potential for reservoir management strategies. Recent advancements, such as the causal-based temporal graph convolutional network (CGTCN) developed by Bin et al., demonstrate the value of incorporating causal knowledge into deep learning prediction methods for CCUS-EOR. By identifying dynamic mechanisms and causal pathways in CCUS-EOR systems, CGTCN provides a template for causal discovery and prediction in energy engineering, highlighting the importance of domain knowledge integration in improving interpretability and model robustness32. However, a limitation in existing studies remains the lack of effective integration of domain knowledge into the feature selection process, often resulting in reduced interpretability, increased data complexity, and limited model robustness. Moreover, the substantial parameter interdependence in hybrid DL models complicates hyperparameter tuning, making it challenging to achieve robust performance across varied production data distributions. These models also often require extensive re-training to adapt to changing conditions, increasing maintenance costs and limiting their industrial application33,34.

To address these challenges, metaheuristic algorithms, such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Whale Optimization Algorithm (WOA), have emerged as effective tools for parameter optimization in hybrid models. WOA, in particular, has demonstrated stronger global search capability, faster convergence, and better diversity preservation compared to traditional metaheuristic methods, making it well-suited for optimizing high-dimensional hybrid DL models35.

In this context, this study develops a novel hybrid neural network model (TCN-KAN) to address these challenges. The key contributions of this research are as follows: (1) Feature Selection Strategy: A feature selection strategy guided by reservoir engineering expertise and Spearman correlation analysis to reduce input dimensionality while retaining critical production behaviors. (2) Optimization Framework: Validation of Enhanced Adaptive Learning WOA’s superior global search capability for hyperparameter optimization in hybrid neural networks. (3) Hybrid Model Design: Development of the TCN-KAN model, which effectively captures long-term temporal dependencies and nonlinear relationships in oil production data, demonstrating improved forecasting accuracy and robustness.

Methodology

Proposed model

The structural design of the proposed TCN-KAN neural network, as illustrated in Fig. 1, outlines the key components and data flow of the model. The TCN layer first extracts the spatio-temporal features affecting the well production from the input layer, and then inputs them into the KAN layer. The KAN neurons then perform an approximation and recombination to predict the production rate. The entire model is optimized during the training process using the improved Whale Optimization Algorithm.

Fig. 1
figure 1

Structure of TCN-KAN.

TCN-KAN model

Figure 2 illustrates the transition from traditional Multi-Layer Perceptrons (MLP) to Kolmogorov–Arnold networks (KAN). Rooted in the Kolmogorov–Arnold representation theorem, this evolution represents a fundamental shift in neural network design. The theorem states that any multivariable continuous function can be represented as a finite sum of continuous single-variable functions and their compositions. In simple terms, it decomposes complex nonlinear problems into a linear combination of simpler nonlinear functions.

Fig. 2
figure 2

Comparison of KAN and MLP structures36.

The Kolmogorov–Arnold representation theorem is as follows:

$$f\left( x \right) = \mathop \sum \limits_{q = 1}^{2n + 1} \phi_{q} \left( {\mathop \sum \limits_{p = 1}^{n} \phi_{q,p} \left( {x_{p} } \right)} \right)$$
(1)

where \({\phi }_{q}\) and \({\phi }_{q,p}\) are continuous functions learned during training. This decomposition reduces complex multivariable interactions into manageable one-dimensional functions, forming the backbone of KAN. Nonlinear layers define the complexity of these functions, while linear combination layers determine the number of summation terms.
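A toy example makes the decomposition concrete (illustrative only: in KAN the univariate functions are learned during training, whereas here they are fixed by hand). The two-variable product can be written exactly as a composition of single-variable functions:

```python
import math

# Kolmogorov–Arnold-style decomposition (hand-picked, not learned):
# f(x1, x2) = x1 * x2 = exp(log(x1) + log(x2)) for positive inputs,
# i.e. an outer univariate function applied to a sum of inner univariate functions.
def phi(x):
    """Inner univariate function phi_{q,p}."""
    return math.log(x)

def Phi(s):
    """Outer univariate function phi_q."""
    return math.exp(s)

def f_decomposed(x1, x2):
    return Phi(phi(x1) + phi(x2))

print(f_decomposed(3.0, 4.0))  # ≈ 12.0, matching x1 * x2
```

KAN generalizes this idea by parameterizing each univariate function (e.g., with B-splines) and learning them from data.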

Unlike traditional neural networks, which use neuron-specific activation functions prone to redundancy and overfitting, KAN employs shared B-Spline activation functions. B-Splines are smooth, flexible piecewise polynomial functions that approximate any continuous function with high precision while maintaining computational efficiency. Their shared mechanism minimizes parameter count, and their piecewise smooth nature ensures efficient function approximation without extensive parameter learning37,38.
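The B-spline basis functions underlying KAN's activations can be evaluated with the Cox–de Boor recursion. The sketch below is a minimal pure-Python illustration (practical KAN implementations use vectorized, grid-based evaluations); it also demonstrates the partition-of-unity property that keeps the approximation well behaved:

```python
def bspline_basis(i, k, t, knots):
    """Cox–de Boor recursion: value of the i-th B-spline basis of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

# Degree-2 basis functions on uniform knots: inside the valid interval,
# the basis values at any point t sum to 1 (partition of unity).
knots = [0, 1, 2, 3, 4, 5, 6]
vals = [bspline_basis(i, 2, 2.5, knots) for i in range(4)]
print(sum(vals))  # ≈ 1.0
```

A spline activation is then a learned linear combination of these fixed, smooth basis functions, which is what keeps the parameter count low.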

Since the activation functions are fitted layer by layer, KAN directly addresses the need for global approximation of high-dimensional problems. However, high-dimensional inputs can still pose challenges to KAN’s performance, as the complexity of the input space may exceed the network’s approximation capacity.

The oil production forecasting task involves multiple features with both time-series dependencies and complex interactions. With the help of causal and dilated convolutions, the coupled TCN can efficiently extract temporal patterns, complementing the time-series fitting capability of KAN, while feature engineering incorporating reservoir engineering experience can greatly reduce the input dimensions, preserving the nonlinear approximation capability of KAN. The capability of TCN-KAN is further optimized by metaheuristic algorithms.

Enhanced adaptive learning whale optimization algorithm

The Whale Optimization Algorithm (WOA) is a heuristic algorithm proposed by Seyedali Mirjalili and Andrew Lewis in 2016, simulating the foraging behavior of humpback whales to optimize objective functions. Figure 3 illustrates a schematic diagram of a humpback whale using a bubble attack to gather prey. WOA is known for its simplicity, minimal parameter settings, and strong optimization performance. However, it also has drawbacks, such as low convergence precision and a tendency to get trapped in local optima39.

Fig. 3
figure 3

A humpback whale using a bubble attack to round up prey39.

To address these limitations, an Enhanced Adaptive Learning Whale Optimization Algorithm (EALWOA) is introduced. EALWOA aims to improve the search capability and convergence speed of the traditional WOA by incorporating adaptive parameter adjustment, a learning mechanism, and a hybrid strategy. Specifically, EALWOA includes the following enhancements:

  1. Adaptive Parameter Adjustment: This mechanism helps in broad exploration during the initial stages and focuses on fine-tuned exploitation in later stages. It adjusts parameters dynamically to balance exploration and exploitation throughout the optimization process.

  2. Learning Mechanism: By introducing a mechanism where individual solutions learn from randomly selected peers, EALWOA increases randomness and enhances global search capabilities. This mechanism diversifies the search paths and helps escape local optima40.

  3. Hybrid Strategy: EALWOA combines random search strategies with precise optimization techniques. During the global search phase, it employs random search to explore the solution space extensively. In the local search phase, it uses strategies such as prey encirclement and bubble net attack to refine the search around promising solutions, thus improving the speed and precision of convergence41.
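The three mechanisms above can be sketched as a single position-update step. This is an illustrative 1-D sketch, not the authors' exact EALWOA implementation: the coefficients a, A, and C and the bubble-net spiral follow the standard WOA formulation, and learning from a randomly selected peer stands in for the learning mechanism:

```python
import math
import random

def woa_step(positions, best, t, T, rng):
    """One WOA-style iteration over a 1-D population (illustrative sketch)."""
    a = 2.0 - 2.0 * t / T              # adaptive parameter: explores early, exploits late
    new_positions = []
    for x in positions:
        A = 2.0 * a * rng.random() - a
        C = 2.0 * rng.random()
        if rng.random() < 0.5:
            if abs(A) < 1:             # encircle the current best solution (exploitation)
                target = best
            else:                      # learn from a randomly selected peer (exploration)
                target = rng.choice(positions)
            D = abs(C * target - x)
            new_x = target - A * D
        else:                          # bubble-net spiral around the best solution
            D = abs(best - x)
            l = rng.uniform(-1, 1)
            new_x = D * math.exp(l) * math.cos(2 * math.pi * l) + best
        new_positions.append(new_x)
    return new_positions

# Toy usage: minimize f(x) = x^2 with 20 whales over 50 iterations.
rng = random.Random(0)
pop = [rng.uniform(-10, 10) for _ in range(20)]
T = 50
for t in range(T):
    best = min(pop, key=lambda x: x * x)
    pop = woa_step(pop, best, t, T, rng)
best = min(pop, key=lambda x: x * x)
print(f"best fitness after {T} iterations: {best * best:.6f}")
```

In the actual hyperparameter search, each "position" is a vector of TCN-KAN hyperparameters and the fitness is the validation error of the trained network.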

To better understand the improved algorithm used in this study, the pseudo-code of the EALWOA is provided in Algorithm 1, and Fig. 4 shows the workflow of the whale algorithm to optimize the KAN neural network. The pseudo-code outlines the step-by-step procedure of EALWOA, detailing how it initializes whale positions, updates parameters, and incorporates adaptive learning mechanisms to improve convergence and accuracy.

Algorithm 1
figure a

Enhanced adaptive learning whale optimization algorithm (EALWOA)

Fig. 4
figure 4

The workflow of EALWOA optimized hyper-parameters of KAN.

Reference models

The proposed WOA-TCN-KAN network is compared with several classical deep learning networks to evaluate its performance and advantages. A brief description of the deep learning networks used for comparison in this study is provided below.

RNN network

Recurrent Neural Networks (RNNs) are a class of Artificial Neural Networks (ANNs) specifically designed for modeling temporal dependencies in sequential data42,43,44. They have found wide applications in fields such as time series analysis, natural language processing, and medical research44,45,46. Figure 5a illustrates the basic structure of an RNN, where hidden states are connected across time steps to capture sequential patterns.

Fig. 5
figure 5

Comparison of different RNN structures: (a) Simple RNN, (b) LSTM, (c) GRU.

Its operations can be described by the following equations:

$$\begin{array}{*{20}c} {h_{t} = \sigma \left( {W_{xh} x_{t} + W_{hh} h_{t - 1} + b_{h} } \right),} \\ \end{array}$$
(2)
$$\begin{array}{*{20}c} {y_{t} = W_{hy} h_{t} + b_{y} .} \\ \end{array}$$
(3)

Here, \(h_t\) represents the hidden state at time \(t\), updated from the input \(x_t\) and the previous hidden state \(h_{t-1}\). \(W_{xh}\) and \(W_{hh}\) are the input-to-hidden and hidden-to-hidden weight matrices, respectively, while \(b_h\) is the bias term. The activation function \(\sigma\), commonly tanh or ReLU, introduces non-linearity. The output \(y_t\) is computed from the hidden state using the output weight matrix \(W_{hy}\) and the bias \(b_y\).
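Equations (2) and (3) amount to the following per-step update, shown here as a pure-Python sketch with small hypothetical weights (real implementations use optimized tensor libraries):

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """Eq. (2): h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    n = len(h_prev)
    h_t = []
    for i in range(n):
        s = b_h[i]
        s += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        s += sum(W_hh[i][j] * h_prev[j] for j in range(n))
        h_t.append(math.tanh(s))
    return h_t

# Toy example: 2-dim input, 2-dim hidden state, unrolled over two time steps.
W_xh = [[0.5, -0.1], [0.2, 0.3]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:
    h = rnn_step(x, h, W_xh, W_hh, b_h)

# Eq. (3): linear readout y_t = W_hy h_t + b_y.
W_hy, b_y = [[1.0, -1.0]], [0.0]
y = [b_y[0] + sum(W_hy[0][j] * h[j] for j in range(2))]
print(len(h), len(y))  # 2 1
```

The same hidden state is threaded through every time step, which is exactly why gradients can vanish or explode over long sequences.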

Despite their simplicity and effectiveness in capturing short-term dependencies, RNNs suffer from significant drawbacks. Specifically, they are prone to vanishing and exploding gradient problems during training, particularly when modeling long sequences47,48. These issues hinder the network’s ability to learn long-term dependencies, as gradients decay or explode during backpropagation through time49,50. To address these limitations, advanced architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks have been developed, offering improved capabilities for managing long-term dependencies and ensuring training stability. In this study, RNN serves as a baseline model for predicting oil production.

LSTM and GRU networks

To address the gradient-related issues in RNNs and capture patterns in long sequences of production data, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks were developed.

The LSTM network introduces three gating mechanisms: the forget gate (\(f_t\)), input gate (\(i_t\)), and output gate (\(o_t\)), which enable effective management of long-term dependencies while mitigating the vanishing gradient problem. The key equations are as follows:

$$\begin{array}{*{20}c} {f_{t} = \sigma \left( {W_{f} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right),i_{t} = \sigma \left( {W_{i} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right)} \\ \end{array}$$
(4)
$$\tilde{C}_{t} \begin{array}{*{20}c} { = tanh\left( {W_{C} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{C} } \right),C_{t} = f_{t} *C_{t - 1} + i_{t} *\tilde{C}_{t} } \\ \end{array}$$
(5)
$$\begin{array}{*{20}c} {o_{t} = \sigma \left( {W_{o} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right),h_{t} = o_{t} *tanh\left( {C_{t} } \right)} \\ \end{array}$$
(6)

The GRU network simplifies the LSTM architecture by merging the forget and input gates into a single update gate (\(z_t\)) and introducing a reset gate (\(r_t\)). This reduces the model’s complexity while maintaining competitive performance. The GRU equations are as follows:

$$z_{t} = \sigma \left( {W_{z} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{z} } \right),r_{t} = \sigma \left( {W_{r} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{r} } \right)$$
(7)
$$\begin{array}{*{20}c} {\tilde{h}_{t} = \tanh \left( {W_{h} \cdot \left[ {r_{t} *h_{t - 1} ,x_{t} } \right] + b_{h} } \right),h_{t} = \left( {1 - z_{t} } \right)*h_{t - 1} + z_{t} *\tilde{h}_{t} } \\ \end{array}$$
(8)

While both LSTM and GRU can capture long-term dependencies in sequential data, GRU offers computational efficiency due to its simpler structure. Figure 5b, c illustrate their respective architectures. GRUs are particularly advantageous in scenarios with limited computational resources or shorter input sequences51. This study utilizes LSTM and GRU networks as improved baseline models for RNN networks.
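For concreteness, Eqs. (7) and (8) reduce to the following update in the scalar case (a simplified sketch with hypothetical toy weights, not a production implementation):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step on scalar input/state; each W is (hidden weight, input weight)."""
    z = sigmoid(W_z[0] * h_prev + W_z[1] * x_t + b_z)        # update gate, Eq. (7)
    r = sigmoid(W_r[0] * h_prev + W_r[1] * x_t + b_r)        # reset gate, Eq. (7)
    h_tilde = math.tanh(W_h[0] * (r * h_prev) + W_h[1] * x_t + b_h)  # candidate, Eq. (8)
    return (1 - z) * h_prev + z * h_tilde                    # convex blend, Eq. (8)

# Run three time steps; the state stays bounded in (-1, 1) because it is a
# convex combination of the previous state and a tanh output.
h = 0.0
for x in [0.5, -0.2, 0.8]:
    h = gru_step(x, h, (0.4, 0.6), (0.3, 0.5), (0.7, 0.9), 0.0, 0.0, 0.0)
print(-1.0 < h < 1.0)  # True
```

Compared with the LSTM update in Eqs. (4)–(6), there is no separate cell state, which is where the GRU's efficiency comes from.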

TCN network

Temporal Convolutional Networks (TCNs) are specifically designed for sequence modeling tasks and employ causal convolutions and dilated convolutions to capture temporal dependencies efficiently. The structure of TCN is illustrated in Fig. 6. In TCN, causal convolutions ensure that the output at time step t depends only on inputs from time t and earlier, maintaining a unidirectional flow of information. This property aligns with the temporal nature of sequential data and prevents information leakage from future time steps52,53.

Fig. 6
figure 6

Structure of TCN.

Dilated convolutions play a critical role in TCN by enabling the network to capture long-term dependencies with fewer layers. By introducing gaps between input samples during the convolution operation, the effective receptive field grows exponentially with the number of layers, allowing TCN to model long-range dependencies efficiently54,55. This makes TCN particularly suitable for tasks like time series forecasting, where capturing both short- and long-term temporal patterns is essential. In this study, TCN serves as an advanced baseline model to capture timing dependencies.
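A causal dilated convolution can be sketched in a few lines (illustrative only; framework implementations realize it with padded standard convolutions). With kernel size k and dilations 1, 2, 4, ..., 2^(L-1), the receptive field grows as 1 + (k - 1)(2^L - 1), i.e., exponentially in the number of layers:

```python
def causal_dilated_conv(x, kernel, dilation):
    """Causal 1-D convolution: y[t] depends only on x[t], x[t-d], x[t-2d], ..."""
    k = len(kernel)
    y = []
    for t in range(len(x)):
        s = 0.0
        for j in range(k):
            idx = t - j * dilation      # look back j * dilation steps
            if idx >= 0:                # zero-pad the causal (left) side
                s += kernel[j] * x[idx]
        y.append(s)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = causal_dilated_conv(x, [0.5, 0.5], dilation=4)
print(y)  # [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]: from t=4, y[t] = (x[t] + x[t-4]) / 2
```

No output depends on a future input, which is the causality property described above.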

CNN-LSTM network

Convolutional Neural Networks (CNN) are powerful deep learning models designed for extracting spatial features from high-dimensional data. They consist of convolutional layers, pooling layers, and fully connected layers. The convolutional layers apply filters to identify patterns such as edges and textures, while pooling layers reduce spatial dimensions to decrease computational complexity and prevent overfitting. Fully connected layers integrate extracted features to produce the final output, typically for classification or regression tasks56. CNNs are widely recognized for their ability to capture spatial hierarchies in data and their computational efficiency through weight sharing and local connectivity57 (Fig. 7).

Fig. 7
figure 7

Structure of CNN-LSTM.

The CNN-LSTM model combines the spatial feature extraction capability of Convolutional Neural Networks (CNN) with the temporal sequence modeling strength of Long Short-Term Memory (LSTM) networks. In this hybrid architecture, CNN first extracts spatial features from the input data, which are then fed into the LSTM component to capture temporal dependencies. This design enables CNN-LSTM to effectively model spatiotemporal correlations, making it a classical architecture widely used in tasks involving both spatial and temporal information. In this study, CNN-LSTM is used as a baseline hybrid model for production forecast.

Well production forecasting

Data description

The Volve oil field, operated by Equinor, is located in the North Sea, approximately 200 km west of Stavanger, Norway. Discovered in 1993, the field started production in 2008 and ceased operations in 2016. The Volve field was developed using a floating production storage and offloading (FPSO) unit and included both production wells and injection wells. The field is known for its challenging reservoir conditions, which provided a valuable opportunity to test advanced reservoir management and production optimization techniques. Equinor released a comprehensive dataset from the Volve field in 2018, aimed at fostering research and development in reservoir engineering and data analytics. This dataset includes seismic data, well logs, reservoir simulation models, and actual production data, making it an invaluable resource for academic and industry researchers alike58.

The public dataset released by Equinor (2018) for research purposes includes a variety of data, such as seismic, logging, and reservoir simulation. The actual production data comprise five production wells and two injection wells: NO15/9-F-1C, NO15/9-F-11H, NO15/9-F-12H, NO15/9-F-14H, NO15/9-F-15D, NO15/9-F-4AH, and NO15/9-F-5AH. The data for each well are shown in Table 1. To investigate the effect of the injection and extraction relationship on reservoir production, data from three wells—NO15/9-F-14H, NO15/9-F-4AH, and NO15/9-F-5AH—were selected, resulting in a dataset of 1903 data points for this study. The daily production profile for the NO15/9-F-14H production well is shown in Fig. 8.

Table 1 Data provided for each well in the Volve field.
Fig. 8
figure 8

Oil production of the well NO15/9-F-14H.

Data preprocessing and feature engineering

Spearman correlation analysis is a feature selection method frequently employed in industrial applications to enhance decision-making processes, improve model performance, and effectively manage data complexity59,60,61,62,63. In Fig. 9, we use a feature selection strategy that combines Spearman correlation analysis with reservoir engineering expertise. Spearman correlation was utilized as a preliminary screening tool to evaluate nonlinear relationships between features and the target production rate. Subsequently, reservoir engineering knowledge was employed to design and select features capable of capturing operational behaviors and reservoir dynamics, as well as to eliminate redundant features.
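The Spearman screening step is the Pearson correlation of rank-transformed series. A minimal pure-Python sketch is shown below (in practice a library routine such as scipy.stats.spearmanr would be used); the example illustrates why Spearman is suited to monotonic but nonlinear relationships:

```python
def _ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation computed on the ranks of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# A monotonic but nonlinear relation yields rho = 1, where Pearson would be < 1.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # y = x^3
print(spearman(x, y))     # 1.0
```

Features whose |rho| with the production rate falls below a chosen threshold are flagged for review, and the engineering screen then decides whether to keep, transform, or drop them.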

Fig. 9
figure 9

Feature selection based on Spearman Correlation.

By integrating Spearman analysis with domain-specific knowledge, this approach not only minimizes input dimensionality but also ensures the statistical and physical relevance of the selected features, aligning with the objectives of model simplification and robust performance. Compared to post hoc importance ranking methods like Random Forest or Lasso regression, which rely on specific model structures and hyperparameters, the proposed method offers stronger interpretability and robustness, particularly in scenarios requiring a balance between dimensionality reduction and feature relevance.

To determine the relationship between injection and production wells, the data from injection wells were first processed to calculate the total injection volume and average running time for NO15/9-F-4AH and NO15/9-F-5AH. Additionally, a time lag term was introduced to capture the delayed effect of choke size and runtime on production. The details of the interaction item design are shown in Table 2.

Table 2 Interaction item design.

In Fig. 9, the feature effect_onNext_oilRate represents the interaction of choke size and runtime at the previous time step. This interaction term was specifically designed to capture the delayed influence of operational behaviors, such as well shut-in or production parameter adjustments, on subsequent oil production. Notably, the correlation analysis shows a significant increase in correlation for the feature effect_onNext_oilRate after combining choke size and runtime, effectively capturing the impact of pressure recovery following shut-in operations on production rates. Moreover, the average annular pressure (AVG_ANNULUS_PRESS) is influenced by the choke size percentage (AVG_CHOKE_SIZE_P), whose information is already incorporated through the interaction term in the selected features. To avoid feature redundancy and potential interference between correlated inputs, AVG_ANNULUS_PRESS was excluded from the final feature set.
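Based on our reading of Table 2, a lagged interaction feature of this kind can be constructed as follows (a sketch; the feature name and a lag of one time step are taken from the description above, and the numeric values are invented for illustration):

```python
def lagged_interaction(choke, runtime, lag=1):
    """effect_onNext_oilRate-style feature: choke size x runtime at time t - lag,
    aligned to the current time step to capture delayed operational effects."""
    n = len(choke)
    feat = [None] * n
    for t in range(lag, n):
        feat[t] = choke[t - lag] * runtime[t - lag]
    return feat  # leading entries have no history and would be dropped or imputed

choke = [40.0, 0.0, 45.0, 50.0]   # choke size (%); the 0.0 models a shut-in day
hours = [24.0, 0.0, 24.0, 24.0]   # daily runtime (hours)
print(lagged_interaction(choke, hours))  # [None, 960.0, 0.0, 1080.0]
```

The zero entry after the shut-in day is exactly the signal that lets the model associate a shut-in with the production response on the following day.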

Additionally, the two water injection-related features, AVG_WI_ON_STREAM_HRS and TOT_BORE_WI_VOL, are intended to reflect the impact of water injection in flooding and replenishing reservoir pressure to enhance production. However, their correlation coefficients with the production rate are not significant. A possible explanation is that the pressure recovery induced by water injection does not immediately manifest in daily oil production, and the heterogeneity of the reservoir may lead to the establishment of preferential flow pathways during the flooding process, causing partial pressure dissipation. For these reasons, injection-related variables were excluded from the final model despite their theoretical relevance.

The other three selected features are AVG_DOWNHOLE_TEMPERATURE, BORE_GAS_VOL, and BORE_OIL_VOL. The feature AVG_DOWNHOLE_TEMPERATURE reflects fluid properties such as oil viscosity, gas-oil ratio (GOR), and oil–water relative permeability. In field operations, maintaining a controlled and low GOR is a common practice to stabilize reservoir pressure and enhance oil recovery. By incorporating GOR-related inputs, the model indirectly captures the mechanisms through which operational adjustments control production parameters and optimize production. Changes in GOR indicate long-term reservoir pressure trends, while the pressure effects of Enhanced Oil Recovery (EOR) operations are effectively controlled by the interaction term, reducing the need for pressure-related features in the final model. The final selection of input features is shown in Table 3.

Table 3 Selected input and output data for data-driven modeling.

To eliminate the effect of data scale on the model, the dataset was normalized to the interval [0,1]. Normalization improves the speed of convergence and reduces training error. The formula for normalization is:

$$X_{i,normalized} = \frac{{x_{i} - x_{min} }}{{x_{max} - x_{min} }}$$
(9)

The normalized data was then processed into a sliding window format, transforming the prediction from a time series problem to a supervised learning problem. A schematic of the data processing using the sliding window is provided in Fig. 10. The normalized and reconstructed data were divided into training and test sets in an 8:2 ratio.
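The preprocessing pipeline, Eq. (9) followed by the sliding-window reconstruction and an 8:2 chronological split, can be sketched as follows (window length and the toy series are illustrative, not the values used in the study):

```python
def min_max_normalize(series):
    """Eq. (9): scale the series to [0, 1]."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

def sliding_windows(series, window):
    """Turn a series into (input window, next value) supervised pairs."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    return X, y

data = min_max_normalize([10.0, 12.0, 11.0, 15.0, 14.0, 20.0])
X, y = sliding_windows(data, window=3)

# 8:2 chronological split: no shuffling, so the test set is strictly later in time.
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
print(len(X), len(X_train), len(X_test))  # 3 2 1
```

Keeping the split chronological matters for production data: shuffling would leak future operating conditions into training.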

Fig. 10
figure 10

Sliding window mechanism.

Evaluation criteria

In this study, the coefficient of determination (R2) and the root mean square error (RMSE) were used as indicators to evaluate prediction precision, with the following expressions:

$$\begin{array}{*{20}c} {R^{2} = \frac{SSR}{{SST}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\hat{y}_{i} - \overline{y}} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} }}} \\ \end{array}$$
(10)
$$\begin{array}{*{20}c} {RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} } } \\ \end{array}$$
(11)

In the above equations, n denotes the number of predicted points, \(y_i\) denotes the actual value, and \(\hat{y}_{i}\) denotes the predicted value at the i-th point.

Seven deep learning networks were built to predict well production, and the predictive performance of the models was evaluated using predefined metrics. The significance of the two metrics is as follows: R2 indicates the model’s ability to explain the variance of the data. Its value ranges from 0 to 1, with values closer to 1 indicating a better fit of the model. RMSE measures the square root of the mean squared difference between predicted and actual values. Due to the squaring operation, RMSE is more sensitive to larger errors, making it effective in penalizing large deviations.
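Both metrics can be computed directly from Eqs. (10) and (11). Note that Eq. (10) uses the SSR/SST form (explained over total sum of squares) rather than the more common 1 − SSE/SST; the sketch below follows the paper's form:

```python
def r2_score(y_true, y_pred):
    """Eq. (10): R^2 = SSR / SST, with SSR taken over the predictions."""
    mean = sum(y_true) / len(y_true)
    ssr = sum((p - mean) ** 2 for p in y_pred)   # explained sum of squares
    sst = sum((t - mean) ** 2 for t in y_true)   # total sum of squares
    return ssr / sst

def rmse(y_true, y_pred):
    """Eq. (11): root mean square error."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5

# Perfect predictions give R^2 = 1 and RMSE = 0.
y_true = [100.0, 120.0, 90.0, 110.0]
y_pred = [100.0, 120.0, 90.0, 110.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))  # 1.0 0.0
```

The two forms of R2 agree when the regression is unbiased; for arbitrary predictors they can differ, so the definition in use should always be stated, as Eq. (10) does here.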

Oil production prediction

The development of the TCN-KAN network involved careful hyperparameter selection and optimization to ensure robust and accurate predictions. The hyperparameter ranges for the model were determined based on prior methodologies and recommendations from relevant literature64,65,66,67,68, as well as domain expertise. Table 4 summarizes the hyperparameter search space for the TCN-KAN network, while Table 5 details the parameter settings for the three metaheuristic optimization algorithms employed in this study. Fairness across the three algorithms was ensured by using consistent settings for population size, iteration limits, and convergence thresholds.

Table 4 Search space for hyperparameters of the TCN-KAN network.
Table 5 Parameter settings for different optimization algorithms.

For the TCN-KAN model, the kernel size specifies the receptive field dimensions, determining the size of the local features captured by the network. The number of kernels represents the total number of convolutional filters applied in a layer, which governs the depth of feature extraction. Additionally, ‘Hidden layers’ and ‘Number of neurons’ refer to the number of layers and neurons in the KAN component, respectively. These parameters influence KAN’s nonlinear approximation capacity, allowing it to model complex relationships in the input data effectively.

The performance evaluation of the TCN-KAN model was conducted using two key metrics, R2 and RMSE, which provide a comprehensive assessment of predictive accuracy and error magnitude. Additionally, visual analyses, including predicted vs. true production profiles, intersection plots, and residual plots, were employed to investigate the model’s behavior and its ability to generalize across various regions of the data distribution. These evaluation methods aim to identify the model’s strengths and limitations in capturing the underlying patterns of oilfield production data.

Results and discussion

The study developed 10 neural network models to predict the daily oil production of the F-14 well, including multiple variants of the proposed TCN-KAN framework, alongside traditional and hybrid deep learning models. The objectives of this section are: (1) to validate the effectiveness of the feature engineering approach by comparing the performance of models using all features versus selected features; (2) to evaluate the effectiveness of the proposed Whale Optimization Algorithm (EALWOA) by comparing fitness curves with GA and PSO in optimizing the proposed models; (3) to analyze the proposed model’s ability to capture temporal dependencies and nonlinear relationships, as well as its robustness, through production profiles, cross plots, and residual plots. This section presents the experimental results to assess the proposed methods in detail. It provides a comprehensive performance analysis, highlighting the advantages of the WOA-TCN-KAN model. Finally, the section discusses potential applications in reservoir management and explores future research directions.

Table 6 validates the effectiveness of the proposed feature selection method, which integrates Spearman correlation analysis with reservoir engineering expertise to identify key variables relevant to production. Across all models, the performance with selected features is comparable to or slightly lower than that with all features, indicating that the selected features effectively retain critical production information while significantly simplifying the model by reducing dimensionality. For instance, TCN-KAN achieved R2 = 0.9407 and RMSE = 12.45 with selected features, which is very close to its performance with all features (R2 = 0.9487, RMSE = 12.21). This demonstrates that the selected features can effectively capture operational sensitivities and the impact of production behaviors on output.

Table 6 Performance of models on the test set for different feature sets.

The comparative performance analysis of all developed models, as shown in Table 6, highlights the superiority of the hybrid models in terms of R2 and RMSE compared to traditional architectures. This suggests their ability to capture the complex nonlinear relationships and temporal dependencies critical to production forecasting (a more detailed discussion is provided in the production profile analysis section). Among these, the WOA-TCN-KAN model exhibits the best performance, achieving R2 = 0.9815 and RMSE = 9.93 with selected features, and R2 = 0.9850 and RMSE = 8.95 with all features. These results reflect the model’s strong generalization capability and adaptability to reduced-dimensional features.

Table 6 also highlights the performance improvements achieved by employing optimization algorithms (GA, PSO, and WOA). The hyperparameters identified by each algorithm are summarized in Table 7. Compared to the PSO-TCN-KAN and GA-TCN-KAN models, the WOA-TCN-KAN model demonstrated superior performance, with R2 improved by 1.04% and RMSE reduced by 6.23% relative to PSO-TCN-KAN, and R2 improved by 1.76% and RMSE reduced by 14.82% relative to GA-TCN-KAN. The fitness curves shown in Fig. 11 further emphasize WOA’s adaptive capabilities. Compared to GA and PSO, WOA exhibits faster convergence and more stable iterative performance, indicating its high search efficiency in high-dimensional search spaces. Notably, WOA achieves a significant reduction in MSE during the early iterations and consistently maintains a lower MSE throughout the optimization process. In contrast, PSO and GA exhibit slower convergence and higher variability in performance across iterations. This demonstrates WOA’s superior ability to balance exploration and exploitation, leading to more effective parameter optimization and overall model improvement.
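For readers unfamiliar with the algorithm, the exploration/exploitation balance described above can be sketched with the canonical WOA update rules (Mirjalili and Lewis, 2016). This is a minimal, unmodified WOA in numpy, not the paper’s enhanced EALWOA variant, and the function names and test objective are illustrative:

```python
import numpy as np

def woa_minimize(f, bounds, n_whales=20, n_iter=200, seed=0):
    """Minimal canonical Whale Optimization Algorithm (sketch, not EALWOA)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    X = rng.uniform(lo, hi, size=(n_whales, dim))
    fit = np.array([f(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter            # linearly decreases 2 -> 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2.0 * a * r1 - a, 2.0 * r2
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1.0):   # exploitation: encircle the prey (best)
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                         # exploration: move toward a random whale
                    rand = X[rng.integers(n_whales)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                             # spiral (bubble-net) update around best
                l = rng.uniform(-1.0, 1.0, dim)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
            fi = f(X[i])
            if fi < best_fit:
                best_fit, best = fi, X[i].copy()
    return best, best_fit
```

In hyperparameter tuning, `f` would wrap a full train-and-validate cycle of the TCN-KAN model and return the validation MSE, which is why convergence speed (Fig. 11) matters so much in practice: each fitness evaluation is expensive.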

Table 7 Hyperparameters of TCN-KAN.
Fig. 11

Fitness curves for GA, PSO, and WOA.

Figure 12 indicates that neither the TCN model (Fig. 12e) nor the KAN model (Fig. 12f) exhibits significant advantages over traditional models in terms of alignment and scatter reduction. However, the integrated TCN-KAN model (Fig. 12g) shows a marked improvement over both individual components. The WOA-TCN-KAN model (Fig. 12h) achieves the closest alignment to the diagonal line across all production levels, reflecting its superior accuracy and generalization capabilities.

Fig. 12

Cross plot of the actual and predicted oil production.

The residual plots in Fig. 13 compare the prediction errors across all models. Residuals close to the zero line indicate minimal error and unbiased predictions. For traditional DL models (RNN, LSTM, GRU, CNN-LSTM) and baseline models such as TCN and KAN (Fig. 13a–f), residuals are more dispersed, particularly in low- and high-production regions, reflecting their limitations in capturing the nonlinear and temporal complexities of production dynamics. Patterns of overestimation and underestimation are noticeable, suggesting challenges in generalizing across varying production levels. In contrast, the WOA-TCN-KAN model (Fig. 13h) demonstrates the most compact and symmetrical residual distribution, confirming its robustness relative to the other models.
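For reference, the residuals plotted in Fig. 13 and the two metrics reported in Table 6 follow directly from the standard definitions (a straightforward numpy sketch; the function name is illustrative):

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """R^2 and RMSE computed from the residuals, as reported in Table 6.
    R^2 = 1 - SS_res / SS_tot; RMSE = sqrt(mean squared residual)."""
    resid = y_true - y_pred                      # these are the values in Fig. 13
    rmse = float(np.sqrt(np.mean(resid**2)))
    ss_res = float(np.sum(resid**2))
    ss_tot = float(np.sum((y_true - y_true.mean())**2))
    return 1.0 - ss_res / ss_tot, rmse
```

A compact, symmetrical residual cloud around zero thus translates directly into a low RMSE and an R2 close to 1, which is why the cross plots, residual plots, and Table 6 tell a consistent story.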

Fig. 13

Residual plots of the models: (a) RNN (b) LSTM (c) GRU (d) CNN-LSTM (e) TCN (f) KAN (g) WOA-KAN (h) WOA-TCN-KAN.

The production forecast profiles are shown in Fig. 14. Traditional DL models (RNN, LSTM, GRU, CNN-LSTM) perform well during steady production phases but struggle with early-phase dynamics (e.g., rapid water cut increases) and late-stage complexities introduced by operational adjustments and EOR activities. These limitations highlight their inability to fully capture nonlinear and temporal dependencies in production data.

Fig. 14

Comparison of actual and predicted daily oil production using various models.

The TCN model (Fig. 14e) addresses these issues by extracting temporal dependencies more effectively, as shown by its accurate predictions during the early production phase. However, it struggles to model nonlinear interactions, especially those driven by operational adjustments. In contrast, KAN (Fig. 14f) excels at modeling nonlinear relationships, such as the delayed effects of choke size, runtime, and well shut-ins on production.

By integrating TCN’s temporal feature extraction capabilities with KAN’s nonlinear modeling strengths, the WOA-TCN-KAN model achieves superior accuracy and robustness across the entire production lifecycle (Fig. 14h). It captures both localized production adjustments, such as the timing and effects of well shut-ins, and long-term trends, including production declines during late development stages. WOA optimization further enhances parameter tuning, improving the model’s generalization and stability, and enabling it to outperform baseline models across all production phases. These results validate the efficacy of combining metaheuristic optimization with hybrid neural networks for production forecasting.

Although the results demonstrate that the WOA-TCN-KAN model achieves high accuracy in forecasting oil well production, the use of data from a single oilfield for model development and validation remains a limitation of this study. The applicability of the proposed model to different geological conditions and operational contexts requires further investigation.

Theoretically, the temporal feature extraction capability of TCN and the nonlinear modeling strength of KAN suggest that the proposed approach could be generalized to reservoirs with diverse characteristics and fluid properties. By leveraging these strengths, the model can provide accurate production forecasts and support decision-making in reservoir management. Potential applications include, but are not limited to:

  • Injection rate optimization: The model can quantify the short- and long-term impacts of EOR techniques, such as water and gas injection, on production rates, providing a basis for optimizing injection strategies.

  • Production decline prediction: By capturing trends in oil production over time, the TCN-KAN model can help identify potential production decline points, enabling operators to adjust development strategies proactively.

  • Dynamic production adjustments: Based on model predictions, operators can implement more precise production adjustments, such as optimizing shut-in timing and duration or adjusting choke sizes, to balance short-term production targets with long-term recovery goals.

  • Production anomaly detection: By monitoring residual changes between predicted and actual values, the model can identify potential production anomalies, such as equipment failures or water breakthrough.
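
The residual-monitoring idea in the last application can be sketched as a simple z-score rule on the gap between predicted and actual production (the threshold `k` and function name are illustrative choices, not from the paper; a production system would likely use a rolling window and operator review):

```python
import numpy as np

def flag_anomalies(y_true, y_pred, k=3.0):
    """Flag time steps whose residual deviates more than k standard
    deviations from the mean residual -- a simple proxy signal for
    events such as equipment failure or water breakthrough."""
    resid = y_true - y_pred
    z = (resid - resid.mean()) / resid.std()
    return np.flatnonzero(np.abs(z) > k)
```

An accurate forecaster makes this rule sharper: the tighter the residual distribution under normal operation, the more clearly a genuine anomaly stands out.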

These applications highlight the model’s predictive capabilities and its potential practical value in complex oilfield development scenarios. Future research will shift focus from improving predictive performance on benchmark datasets to quantifying the economic impact of various EOR techniques. This will involve using real-world datasets from multiple oilfields or synthetic data generated through numerical reservoir simulations. Combined with the domain-knowledge-guided feature engineering and hybrid modeling framework, the objective is to provide reliable insights into reservoir management and serve as a tool for optimizing production strategies and economic outcomes.

Conclusion

This study proposed a novel hybrid model, TCN-KAN, integrating Temporal Convolutional Networks (TCN) and Kolmogorov–Arnold Networks (KAN) to address the challenges of oil well production forecasting. By combining TCN’s temporal feature extraction capabilities with KAN’s nonlinear approximation strengths, the TCN-KAN model effectively captured spatio-temporal dynamics and nonlinear interactions in production data, achieving superior predictive accuracy (R2 = 0.9815, RMSE = 9.93) with fewer input features.

Feature selection guided by reservoir engineering expertise and Spearman correlation analysis ensured that the selected features retained essential production information while reducing dimensionality. The inclusion of a modified Whale Optimization Algorithm (WOA) enhanced hyperparameter tuning, further improving model robustness. Comparative experiments and ablation studies validated the effectiveness of the proposed architecture and its individual components.

Beyond predictive accuracy, the TCN-KAN model offers a degree of interpretability by separating temporal dynamics from nonlinear interactions, providing actionable insights into operating parameters and reservoir characteristics. This makes it a practical tool for optimizing reservoir management and decision-making.

While promising, this study’s scope is limited to a single oilfield, and future work should validate the model’s generalizability to diverse geological and operational conditions. Further exploration of the economic implications of the model could also strengthen its value in oilfield development strategies.

In summary, the TCN-KAN model demonstrates a powerful and interpretable approach to oil production forecasting, balancing accuracy, computational efficiency, and practical applicability.