NERHF: a hybrid machine learning-driven efficient credit risk control framework

Wei, Lin; Dong, Jiyang; Yu, Hanyue

doi:10.1038/s41598-025-30905-6


Article
Open access
Published: 01 December 2025


Scientific Reports volume 16, Article number: 1170 (2026)


Abstract

As a core part of the financial industry, credit operations carry significant risk. Accurate credit risk control is therefore crucial for financial institutions' lending decisions and overall risk management. In this paper, we propose a hybrid machine learning framework, NERHF (Neural network-Ensemble learning-Reinforcement learning Hybrid Framework), for efficient credit risk control. The framework uses neural network algorithms to extract features from credit data, enhancing the accuracy and robustness of credit risk prediction. Ensemble learning algorithms then predict credit risk from the extracted features. Finally, Pre-DDQN, an improved deep reinforcement learning algorithm, generates optimal credit risk control strategies for different combinations of key credit indicators, with the aim of mitigating default risk. Experimental results show that NERHF has significant advantages in credit risk prediction, especially when a recurrent neural network is used for feature extraction together with the Light Gradient Boosting Machine (LightGBM). In addition, Pre-DDQN outperforms the comparison algorithms in credit risk control, highlighting its potential for practical application.


Introduction

Credit operations are a crucial part of the financial sector, enabling individuals and businesses to access funds for a variety of needs. However, they also carry substantial risk, and defaults can cause significant financial losses for financial institutions. Accurate credit risk management, comprising credit risk prediction and credit risk control, is therefore essential for making lending decisions and managing an institution's overall risk.

Traditional credit risk prediction methods typically rely on statistical approaches such as time-company and time-bank fixed effects¹ and Bayesian updating² to analyze financial statements and credit histories, and they implement credit risk control based on credit ratings³. In recent years, the rise of machine learning has substantially changed credit risk prediction and control. Neural networks, ensemble learning and deep reinforcement learning have shown considerable potential for improving the accuracy and efficiency of credit risk prediction and control⁴,⁵.

Classical neural network algorithms such as Deep Neural Networks (DNN), Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) have demonstrated strong feature extraction capabilities. This paper explores their application to credit datasets. Their ability to handle high-dimensional data and to extract nonlinear discriminative features applies directly to credit data. With neural networks, we can identify patterns and relationships in credit data that may be difficult to observe with traditional methods. For credit risk prediction itself, ensemble learning algorithms such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Categorical Boosting (CatBoost) combine the predictions of multiple models and can therefore perform better than any of the individual models. Ensemble methods provide a powerful framework for credit risk prediction by improving accuracy, reducing overfitting, enhancing robustness, handling imbalanced data and maintaining interpretability. By harnessing the collective intelligence of multiple models, they can provide more reliable and comprehensive credit risk predictions.

However, although neural networks and ensemble learning can predict credit risk accurately, a suitable mechanism is still needed to control that risk when the predicted risk is high. Deep reinforcement learning is a branch of machine learning that combines the advantages of reinforcement learning and deep neural networks. Through algorithms such as the Double Deep Q Network (DDQN), it offers a promising approach to this problem. Reinforcement learning agents are trained to optimize their credit risk control strategies through trial and error, so deep reinforcement learning can generate strategies that minimize default risk.

This paper proposes a hybrid machine learning framework named NERHF (Neural network-Ensemble learning-Reinforcement learning Hybrid Framework), shown in Fig. 1. It combines a neural network algorithm for credit data feature extraction, an ensemble learning algorithm for credit risk prediction and a deep reinforcement learning algorithm for credit risk control. NERHF first uses neural networks to extract meaningful features from credit datasets and then applies an ensemble learning algorithm to predict credit risk from those features. Finally, based on the prediction results, we propose an improved DDQN algorithm, Pre-DDQN (Prediction Double Deep Q Network), to formulate an optimal credit risk control strategy that reduces default risk.

The main contributions of this paper are as follows. (1) We propose the NERHF framework, based on hybrid machine learning algorithms, which covers the credit risk control process from risk prediction to risk control. (2) We develop a personal credit risk prediction scheme with an RNN as the feature extractor and LightGBM as the classifier. (3) We propose Pre-DDQN, an improved deep reinforcement learning algorithm for the personal credit risk control problem, and build a mechanism for formulating credit risk control strategies for different combinations of key credit indicators.

Relevant literature

Feature extraction based on neural networks

Neural networks have transformed feature extraction, providing powerful tools for dimensionality reduction and information extraction. This section reviews their application to feature extraction.

Several neural network architectures for feature extraction and data projection have been proposed⁶. These networks use adaptive learning, which makes them suitable for environments with changing pattern distributions. Literature⁷ describes a generalized discriminant analysis method (GerDA) that uses DNNs to learn nonlinear transformations that extract optimal discriminative features. Literature⁸ evaluated pre-trained CNNs as feature extractors for iris recognition, highlighting the effectiveness of deep learning. Literature⁹ proposed a regularized deep feature extraction method for hyperspectral image classification using CNNs, which addresses the imbalance between high data dimensionality and limited training samples. Literature¹⁰ proposed an RNN-based deep neural network architecture for network traffic classification that significantly improves accuracy over traditional machine learning methods.

These studies show that neural networks are highly effective for feature extraction across domains. However, their use for feature extraction from credit datasets has received little attention, and their success in other domains suggests potential for extracting credit risk information.

Credit risk prediction based on ensemble learning

Credit risk prediction is a key aspect of credit business management, and ensemble learning has become a powerful method for improving prediction accuracy. This section reviews the application of ensemble learning to credit risk prediction and discusses the methodology and practical implications of these models.

Literature¹¹ describes a deep learning ensemble model combining LSTM and AdaBoost that performs well in credit risk prediction compared with other models. Literature¹² compares the performance of ensemble algorithms for credit scoring and emphasizes that ensembles typically outperform individual learners, with RF and XGBoost performing particularly well. Literature¹³ proposes a model that combines oversampling with ensemble learning to address class imbalance in small business credit risk prediction. Literature¹⁴ compared ensemble learning methods with neural networks in risk analysis and found that boosting outperformed the other methods, including neural networks with various activation functions. Literature¹⁵ applied RF to micro-enterprise credit risk modelling and emphasized its efficiency and interpretability. The study also highlighted the importance of non-traditional variables for improving classification accuracy.

These studies show that ensemble learning significantly improves the predictive accuracy of credit risk models. However, a gap remains for complex credit datasets that contain a large number of variables.

In recent years, machine learning models have grown more complex, and their “black box” character has become a major obstacle to deployment in high-risk fields such as finance. Interpretable machine learning has therefore received increasing attention in credit risk assessment. Methods such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) reveal the basis on which complex models such as ensembles and deep networks make decisions. They help risk managers understand which features drive the predictions, which enhances the transparency and credibility of the model¹⁶. This article focuses on improving prediction and control performance through hybrid models. Introducing interpretability techniques is an important direction for future work, to ensure that the model's decision-making process is reasonable, trustworthy and compliant with regulatory requirements.

Credit risk control based on reinforcement learning

Credit risk control is another key aspect of managing credit operations. Since the advent of reinforcement learning, several studies have applied it to credit analysis.

Literature¹⁷ proposes a deep Q-network algorithm based on a Dynamic Reward Function with Confusion Matrix (DQN-CMDRF) for customer credit scoring. By dynamically adjusting the rewards to adapt to environmental changes, the model outperforms traditional classification models. Literature¹⁸ explores reinforcement learning for optimizing credit limit adjustment, balancing revenue maximization against risk minimization. The authors used offline learning on data from a super-app and showed that double Q-learning with optimized hyperparameters can produce effective strategies. Literature¹⁹ uses deep reinforcement learning (DRL-risk) to address imbalanced credit risk prediction for SMEs in supply chain finance. The model prioritizes learning from instances that may cause significant financial losses and outperforms the baseline approaches on various performance metrics. Literature²⁰ provides a comprehensive review of deep reinforcement learning in economics, emphasizing the scalability and robustness of DRL for complex economic systems, including credit risk prediction. Literature²¹ uses reinforcement learning to optimize the acceptance threshold of credit scoring models. Its dynamic system adjusts the threshold in real time and outperforms traditional cost-sensitive optimization methods.

These studies show that reinforcement learning has been applied to credit analysis, but its use for credit risk control remains largely unexplored. This paper aims to provide new insights into the theory and practical application of reinforcement learning in credit risk control.

Research methodology

Dataset processing

The dataset used in this paper comes from the Lending Club platform, which is widely used and highly recognized in credit research. The dataset contains 145 variables, and the loan status takes two values: performing and default. After rigorous data cleaning and preprocessing, 17 variables were selected for the feature extraction experiments. Spearman correlation tests and Relief weight evaluation were then performed on the dataset. On that basis, five key credit decision metrics were selected for the credit risk control experiments: years of employment, loan amount, instalment amount, revolving credit utilization, and the proportion of accounts with credit card utilization over 75%.
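The screening step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code. It uses the original two-class Relief algorithm, because the paper does not say which Relief variant it applied. Spearman's ρ is computed as the Pearson correlation of average ranks. The helper `screen_features`, the toy data and the rule of ranking features by Relief weight are all hypothetical.

```python
import numpy as np

def average_ranks(a):
    """1-based ranks; tied values share the average of their ranks."""
    _, inverse, counts = np.unique(np.asarray(a, dtype=float),
                                   return_inverse=True, return_counts=True)
    start = np.cumsum(counts) - counts           # ranks used before each tie group
    return (start + (counts + 1) / 2.0)[inverse.ravel()]

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def relief_weights(X, y, n_iter=None, seed=0):
    """Basic two-class Relief: reward features that differ on the nearest miss
    (other class) and penalise features that differ on the nearest hit."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    Xs = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)  # scale to [0, 1]
    n, d = Xs.shape
    m = n if n_iter is None else n_iter
    w = np.zeros(d)
    for i in rng.integers(0, n, size=m):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)    # Manhattan distance to every row
        dist[i] = np.inf                         # never pick the row itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w

def screen_features(X, y, names, k):
    """Hypothetical helper: top-k features by Relief weight, with Spearman rho."""
    w = relief_weights(X, y)
    rho = [spearman_rho(X[:, j], y) for j in range(X.shape[1])]
    return [(names[j], float(w[j]), rho[j]) for j in np.argsort(-w)[:k]]

# Toy data: only the first column carries signal about the binary label.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)
print(screen_features(X, y, ["signal", "noise_a", "noise_b"], k=2))
```

In practice, one would keep the variables that score well on both criteria. The text does not state how the two measures were combined.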

Credit feature extraction

Neural networks offer several advantages for complex data and tasks: powerful feature extraction, strong representational capacity, independence from human expertise and strong generalization. DNNs have more hidden layers than traditional shallow neural networks, and adding layers makes the model's representations richer and more powerful. RNNs introduce recurrent connections that pass information through the network. The hidden state at each time step is passed on to the next step, which allows the network to capture complex, structured nonlinear dependencies between features. CNNs extract features by sliding a set of convolution kernels over the input in each convolutional layer.

DNN algorithm

Deep Neural Networks (DNNs)²² can be used to extract informative features from credit datasets and predict the likelihood of default. In this paper, a DNN is constructed with one input layer, three hidden layers and one output layer. The input layer contains five neurons, so the model receives a 5-dimensional input feature vector matching the number of key credit decision metrics screened in Sect. 2.1. The target variable is the binary outcome of credit behavior, with 0 indicating repayment and 1 indicating default.

The three hidden layers of the DNN are designed to capture a progressively more hierarchical representation of the input data. The first hidden layer has 100 neurons, which allows the extraction of high-level features and patterns. The following layers shrink the feature space, with 50 and 25 neurons in the second and third layers respectively, which helps abstract increasingly complex representations. Each hidden layer uses the ReLU activation function to introduce nonlinearity and let the network learn complex relationships. ReLU also alleviates the vanishing gradient problem and accelerates convergence.

In addition, dropout regularization with a rate of 0.2 is applied during training to prevent overfitting and improve generalization. Dropout randomly deactivates a portion of the neurons in each training iteration, making learning less dependent on specific features. This keeps the model from memorizing noise in the training data, which improves its generalization and hence its credit risk prediction performance.
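The architecture above can be sketched as a NumPy forward pass. This is an untrained illustration, and several choices are assumptions: the He initialisation, the sigmoid output unit, and treating the 25-unit third hidden layer as the extracted feature vector. Training, for example with a deep learning framework, is omitted. Dropout is applied only when `training=True`, using the common inverted-dropout scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """He-initialised weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)

# 5 inputs -> 100 -> 50 -> 25 hidden ReLU units -> 1 sigmoid output unit
HIDDEN = [dense(5, 100), dense(100, 50), dense(50, 25)]
OUTPUT = dense(25, 1)

def forward(x, p_drop=0.2, training=False):
    """Return (25-d extracted features, default probability) for a batch x."""
    h = np.asarray(x, dtype=float)
    for W, b in HIDDEN:
        h = np.maximum(h @ W + b, 0.0)                 # ReLU
        if training:                                   # inverted dropout
            keep = rng.random(h.shape) >= p_drop
            h = h * keep / (1.0 - p_drop)              # keeps E[h] unchanged
    W, b = OUTPUT
    return h, 1.0 / (1.0 + np.exp(-(h @ W + b)))

x = rng.normal(size=(8, 5))             # 8 borrowers, 5 key credit metrics
features, prob = forward(x)
print(features.shape, prob.shape)       # (8, 25) (8, 1)
```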

RNN algorithm

The Long Short-Term Memory (LSTM) network, a type of Recurrent Neural Network (RNN)²³, can be used to extract features. This paper proposes a business-logic-driven feature serialization method for static credit data. The method reorganizes a borrower's multidimensional characteristics (such as income status, credit history and account structure) into an ordered sequence that follows the inherent logic of credit approval. Each sample thereby becomes a structured multivariate sequence. Sequence models such as LSTM can then capture complex nonlinear dependencies between features and extract more discriminative higher-order feature representations.
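A minimal sketch of such serialization follows, assuming each record is a dict of already-cleaned numeric variables. The stage names, their order and the variable names in `APPROVAL_ORDER` are hypothetical, because the paper does not publish its ordering. The sketch only shows how one static record is reshaped into a (steps × variables) sequence that an LSTM can read.

```python
import numpy as np

# Hypothetical stages and variable names; the paper does not publish its ordering.
APPROVAL_ORDER = [
    ("applicant", ["years_employed", "annual_income"]),
    ("request", ["loan_amount", "instalment"]),
    ("credit_use", ["revolving_util", "pct_cards_over_75"]),
]

def serialize(records, order=APPROVAL_ORDER):
    """Turn static records (dicts) into an (n_samples, n_steps, n_vars) array.

    Each time step is one stage of the assumed approval logic, so every stage
    must list the same number of variables."""
    if len({len(cols) for _, cols in order}) != 1:
        raise ValueError("every stage needs the same number of variables")
    return np.array([[[float(r[c]) for c in cols] for _, cols in order]
                     for r in records])

record = {"years_employed": 5, "annual_income": 60000, "loan_amount": 12000,
          "instalment": 390.5, "revolving_util": 48.2, "pct_cards_over_75": 25.0}
seq = serialize([record, record])
print(seq.shape)   # (2, 3, 2): 2 borrowers, 3 stages, 2 variables per stage
```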

The first layer of the LSTM model designed in this paper is an LSTM layer with 50 memory units. This layer outputs a hidden state for every time step in the sequence rather than only for the last step, so that the next LSTM layer receives the full sequence. It is followed by a dropout layer with a rate of 0.2 to reduce the risk of overfitting during training.

A second LSTM layer follows. It outputs only the hidden state of the last time step, which is typical when a layer summarizes a sequence for classification. This layer has 25 memory units, which further condenses the representation for the classification task. It is again followed by a dropout layer to avoid overfitting.

After the LSTM layers, a fully connected layer with 25 neurons and ReLU activation is added. This layer further compresses and nonlinearly transforms the features output by the LSTM layers, helping the model learn more abstract features.

The final layer of the model is the output layer, with one neuron for each of the two types of credit behavior.
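The stacked LSTM described above can be sketched as an untrained NumPy forward pass. The gate layout (i, f, g, o), the uniform initialisation and the softmax on the two output units are assumptions. The 0.2 dropout layers are omitted because dropout is the identity at inference time. The 25-unit ReLU layer is treated as the extracted feature vector, which is one plausible reading of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_params(n_in, units):
    """Input weights W, recurrent weights U and bias b for the gates (i, f, g, o)."""
    s = 1.0 / np.sqrt(units)
    return (rng.uniform(-s, s, (n_in, 4 * units)),
            rng.uniform(-s, s, (units, 4 * units)),
            np.zeros(4 * units))

def lstm(x, params, return_sequences):
    """Forward pass of one LSTM layer over x of shape (batch, steps, features)."""
    W, U, b = params
    h = np.zeros((x.shape[0], U.shape[0]))
    c = np.zeros_like(h)
    outputs = []
    for t in range(x.shape[1]):
        i, f, g, o = np.split(x[:, t] @ W + h @ U + b, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
        h = sigmoid(o) * np.tanh(c)                    # new hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

def dense(n_in, n_out):
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def init_model(n_vars):
    return {"lstm1": lstm_params(n_vars, 50),   # 50 units, full sequence out
            "lstm2": lstm_params(50, 25),       # 25 units, last step only
            "dense": dense(25, 25),             # 25 ReLU units
            "out": dense(25, 2)}                # one unit per credit outcome

def predict(model, x):
    """Inference pass; the two 0.2 dropout layers act as identity here."""
    h = lstm(x, model["lstm1"], return_sequences=True)
    h = lstm(h, model["lstm2"], return_sequences=False)
    W, b = model["dense"]
    feats = np.maximum(h @ W + b, 0.0)
    W, b = model["out"]
    logits = feats @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return feats, e / e.sum(axis=1, keepdims=True)     # softmax (assumed)

x = rng.normal(size=(4, 3, 2))     # 4 borrowers, 3 stages, 2 variables per stage
feats, probs = predict(init_model(2), x)
print(feats.shape, probs.shape)    # (4, 25) (4, 2)
```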

CNN algorithm

Convolutional Neural Networks (CNN)²⁴ can be used to extract features from credit data. Because credit datasets usually contain many features, CNNs provide an efficient way to identify complex patterns and spatial dependencies in the data.

Convolutional layers are the basic components of the CNN architecture and automatically extract local features from the input data. For credit data analysis, the CNN designed in this paper begins with two convolutional layers, with 32 and 64 filters respectively and a kernel size of 3. Each convolutional layer uses a ReLU activation function, which introduces nonlinearity and enhances the model's expressive power.

Each convolutional layer is immediately followed by a MaxPooling layer with a pool size of 2, which downsamples the data. Pooling reduces the spatial dimensionality of the data, lowering the computational load while preserving key features. For sequential data such as credit data, pooling helps extract the essential information while suppressing noise, enhancing the robustness and generalization of the model.

Because the fully connected (dense) layer requires one-dimensional input, a Flatten layer is placed after the convolution and pooling layers to convert the resulting 3D data into one-dimensional vectors. The Flatten layer turns the spatial feature maps into flat feature vectors and prepares the data for the fully connected layer.

After the Flatten layer, a fully connected layer of 25 neurons with ReLU activation is added. This layer further refines the extracted features and supports learning at higher levels of abstraction. Finally, a fully connected layer with a single neuron and a Sigmoid activation function serves as the output layer.
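Below is a NumPy sketch of this CNN's forward pass. It treats the credit features as a length-n sequence with one channel. The paper does not state the padding, so 'same' padding is assumed here. With 'valid' padding, a 5-feature input would shrink to length 1 after the first pooling and could not feed a second kernel of size 3. Weights are random (untrained), and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, W, b):
    """ReLU(conv) with zero 'same' padding. x: (batch, length, c_in),
    W: (kernel, c_in, c_out). Like DL frameworks, the kernel is not flipped."""
    k, length = W.shape[0], x.shape[1]
    left = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (left, k - 1 - left), (0, 0)))
    out = sum(xp[:, j:j + length, :] @ W[j] for j in range(k))
    return np.maximum(out + b, 0.0)

def maxpool1d(x, size=2):
    """Non-overlapping max pooling; a leftover trailing position is dropped."""
    length = (x.shape[1] // size) * size
    return x[:, :length].reshape(x.shape[0], -1, size, x.shape[2]).max(axis=2)

def init_cnn(n_features):
    flat = (n_features // 2 // 2) * 64            # length after two poolings x 64
    def w(*shape, fan_in):
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), shape)
    return {"conv1": (w(3, 1, 32, fan_in=3), np.zeros(32)),
            "conv2": (w(3, 32, 64, fan_in=96), np.zeros(64)),
            "dense": (w(flat, 25, fan_in=flat), np.zeros(25)),
            "out": (w(25, 1, fan_in=25), np.zeros(1))}

def cnn_forward(model, x):
    """x: (batch, n_features); each feature is one position with one channel."""
    h = np.asarray(x, dtype=float)[:, :, None]
    h = maxpool1d(conv1d_same(h, *model["conv1"]))
    h = maxpool1d(conv1d_same(h, *model["conv2"]))
    h = h.reshape(h.shape[0], -1)                         # Flatten
    W, b = model["dense"]
    feats = np.maximum(h @ W + b, 0.0)                    # 25 ReLU features
    W, b = model["out"]
    return feats, 1.0 / (1.0 + np.exp(-(feats @ W + b)))  # sigmoid output

x = rng.normal(size=(6, 5))                  # 6 borrowers, 5 key metrics
feats, prob = cnn_forward(init_cnn(5), x)
print(feats.shape, prob.shape)               # (6, 25) (6, 1)
```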

Credit risk prediction

Based on the features extracted by the neural network models, ensemble learning algorithms such as RF, XGBoost, LightGBM and CatBoost are then used to predict credit risk.

RF algorithm

The Random Forest (RF) algorithm²⁵ is robust in handling complex datasets and feature interactions.

RF combines a bagging strategy with a random subspace approach to build multiple CART decision trees by recursive splitting. It draws bootstrap samples from the original dataset to create n training sets, so each tree is trained on a different random subset of the data. The trees are trained independently and their predictions aggregated. This reduces variance and overfitting, improves generalization and makes RF more robust to noise. Overall, RF improves prediction accuracy, stability and adaptability.
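The bagging and random-subspace mechanics can be illustrated with a toy ensemble. To stay short, this sketch uses depth-1 trees (stumps) instead of full CART trees. It also draws the feature subset once per tree, as in the random subspace method. Breiman's random forest, like scikit-learn's `RandomForestClassifier`, instead redraws the subset at every split. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 tree: best (feature, threshold, direction) by training accuracy."""
    best = (-1.0, 0, 0.0, True)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            above = X[:, j] > thr
            for positive_above in (True, False):
                pred = above if positive_above else ~above
                acc = np.mean(pred == y)
                if acc > best[0]:
                    best = (acc, j, thr, positive_above)
    return best[1:]

def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagging (bootstrap rows) plus a random feature subset for each tree."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max_features or max(1, int(np.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)               # bootstrap sample
        cols = rng.choice(d, size=m, replace=False)     # random subspace
        j, thr, positive_above = fit_stump(X[rows][:, cols], y[rows])
        forest.append((cols[j], thr, positive_above))
    return forest

def predict_forest(forest, X):
    """Majority vote of the individual trees."""
    votes = [(X[:, j] > thr) == positive_above for j, thr, positive_above in forest]
    return (np.mean(votes, axis=0) > 0.5).astype(int)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)               # only feature 0 is informative
forest = fit_forest(X, y, max_features=2)
print((predict_forest(forest, X) == y).mean())
```

Trees that happen to miss the informative feature vote almost at random, but the majority vote still recovers the signal. This is the variance-reduction effect of aggregation described above.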

XGBoost algorithm

Extreme Gradient Boosting (XGBoost)²⁶ is an optimized, distributed gradient boosting algorithm widely used for regression, classification and ranking problems. XGBoost extends the gradient boosting framework by regularizing model complexity and speeding up the implementation. It adds decision trees to the ensemble sequentially, and each new tree corrects the residual errors of the current ensemble. It uses both the first- and second-order derivatives of the loss function, which improves the accuracy and efficiency of the approximation. L1 and L2 regularization penalize complex models and control overfitting. Row subsampling (similar to bagging) and column subsampling train each tree on only part of the samples and features. Finally, shrinkage scales down the weight of each new tree so that each iteration moves gradually toward the true value.
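The second-order objective behind XGBoost's split finding can be written in a few lines. For the log loss, each sample contributes a gradient $g_i = p_i - y_i$ and a Hessian $h_i = p_i(1-p_i)$. A leaf's optimal weight is $-G/(H+\lambda)$, and a split's gain compares the regularized scores of the two children with that of the parent. The sketch covers only the L2 term $\lambda$, the per-leaf penalty $\gamma$ and shrinkage $\eta$. It omits the L1 term and the library's actual split search, and the parameter values are illustrative.

```python
import numpy as np

def logistic_grad_hess(y, raw_score):
    """Gradient g and Hessian h of the log loss w.r.t. the raw (logit) score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value from the second-order expansion: -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, left, lam=1.0, gamma=0.0):
    """Gain = 1/2 [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[left], h[left]) + score(g[~left], h[~left])
                  - score(g, h)) - gamma

y = np.array([0.0, 0.0, 1.0, 1.0])
raw = np.zeros(4)                          # initial prediction: logit 0 (p = 0.5)
g, h = logistic_grad_hess(y, raw)
left = np.array([True, True, False, False])
print(split_gain(g, h, left))              # about 0.667 for this perfect split
eta = 0.3                                  # shrinkage (learning rate)
raw[left] += eta * leaf_weight(g[left], h[left])
raw[~left] += eta * leaf_weight(g[~left], h[~left])
```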

LightGBM algorithm

Light Gradient Boosting Machine (LightGBM)²⁷ is a gradient boosting decision tree algorithm that stands out for its lightweight design. LightGBM grows trees leaf-wise. At each step it splits the leaf with the largest gain, instead of expanding all leaves at the same depth, which produces deep, asymmetric trees. For the same number of leaves, this usually reduces the loss more than level-wise growth and improves computational efficiency. Because deep leaf-wise trees can overfit small datasets, tree depth is typically limited. On credit datasets, the leaf-wise strategy lets LightGBM handle complex feature interactions and large datasets efficiently. It maintains high accuracy while improving scalability and speed, which is critical for real-time or large-scale financial risk assessment.
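Leaf-wise (best-first) growth can be sketched with a priority queue. Every candidate leaf is stored with its best split gain, and the leaf with the largest gain is split next until a leaf budget is reached. This is a one-feature, squared-error toy, not LightGBM's implementation. The budget plays the role of LightGBM's `num_leaves` parameter, and a depth cap such as `max_depth` could be added to the same loop.

```python
import heapq
import numpy as np

def best_split(x, r, min_gain=1e-12):
    """Best threshold on one feature for residuals r (squared-error reduction)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    csum, n, total = np.cumsum(rs), len(rs), rs.sum()
    best_gain, best_thr = min_gain, None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue                      # cannot separate equal values
        left = csum[i - 1]
        gain = left**2 / i + (total - left)**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i - 1] + xs[i]) / 2
    return best_gain, best_thr

def grow_leaf_wise(x, r, num_leaves):
    """Best-first growth: always split the current leaf with the largest gain.
    Returns the final leaves as arrays of sample indices."""
    heap, leaves, tie = [], [], 0
    def push(idx):
        nonlocal tie
        gain, thr = best_split(x[idx], r[idx])
        heapq.heappush(heap, (-gain, tie, idx, thr))   # max-heap on gain
        tie += 1                                       # avoids comparing arrays
    push(np.arange(len(x)))
    n_leaves = 1
    while heap:
        _, _, idx, thr = heapq.heappop(heap)
        if thr is None or n_leaves >= num_leaves:
            leaves.append(idx)                         # stays a leaf
            continue
        push(idx[x[idx] <= thr])
        push(idx[x[idx] > thr])
        n_leaves += 1
    return leaves

x = np.linspace(0.0, 1.0, 64)
r = np.where(x < 0.5, -1.0, 1.0) + np.where(x > 0.8, 2.0, 0.0)  # toy residuals
for budget in (2, 3, 10):
    print(budget, [len(leaf) for leaf in grow_leaf_wise(x, r, budget)])
```

A level-wise learner would split both halves of the first split before going deeper. The best-first loop instead spends its second split on the right half, where the gain is.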

CatBoost algorithm

Categorical Boosting (CatBoost)²⁸ can handle categorical features directly, converting them seamlessly into numerical representations. CatBoost randomly permutes the samples. For each sample, it computes the average target value of the preceding samples with the same category value, smoothed with a weighted prior, which mitigates the influence of noise and low-frequency categories. Each encoding uses only earlier samples, so a sample's own target never leaks into its features, which improves robustness and accuracy. For complex datasets such as credit data, CatBoost's handling of categorical features improves predictive performance while simplifying preprocessing.
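Below is a simplified sketch of the ordered target statistic, assuming a binary target and a single random permutation. CatBoost itself combines several permutations and offers other prior settings. Each row sees only the targets of rows that precede it, so its own label never enters its encoding. The function name and parameters are hypothetical.

```python
import numpy as np

def ordered_target_stats(categories, y, prior_weight=1.0, order=None, seed=0):
    """Simplified, single-permutation ordered target statistic.

    Rows are visited in a random order; each row is encoded using only the
    targets of earlier rows with the same category, smoothed towards the
    global prior p with weight a:
        TS = (sum of earlier targets + a * p) / (number of earlier rows + a)
    """
    y = np.asarray(y, dtype=float)
    if order is None:
        order = np.random.default_rng(seed).permutation(len(y))
    prior = y.mean()
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for i in order:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + prior_weight * prior) / (k + prior_weight)
        sums[c], counts[c] = s + y[i], k + 1   # update only after encoding row i
    return encoded

cats = ["a", "a", "b", "a"]
labels = [1, 0, 1, 1]
print(ordered_target_stats(cats, labels, order=[0, 1, 2, 3]))
```

With the fixed order shown, the first "a" and the only "b" both receive the prior 0.75, because no earlier row shares their category.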

Credit risk control

The credit risk prediction model based on neural network feature extraction and ensemble learning provides relatively accurate predictions. Even so, default risk still needs to be reduced further in actual credit business. We therefore introduce a deep reinforcement learning algorithm that generates strategies for different credit indicator states to guide credit decision-making. These strategies can be adjusted as the state of the credit indicators changes, which further improves the performance of the overall credit risk assessment system.

Reinforcement learning is a branch of machine learning that studies how agents interact with complex and uncertain environments. An agent perceives the state of the environment and the environment's response to its actions, and continually adjusts its behavior to maximize returns. Unlike supervised and unsupervised learning, reinforcement learning learns by interacting with the environment, not only from a fixed dataset. An agent observes the state of the environment, takes an action and receives a reward based on the outcome. In the process, the agent learns a policy that specifies which action to take in each state to maximize the long-term cumulative reward.

Deep Reinforcement Learning (DRL) combines the perceptual capabilities of deep learning with the decision-making capabilities of reinforcement learning. This lets agents handle higher-dimensional state and action spaces and learn and make decisions in more complex tasks and environments. DRL is therefore suited to learning from credit datasets and making credit decisions based on different states of the credit decision metrics.

DRL modelling

This paper investigates DRL-based decision-making for credit risk control. Of the five key credit indicators, four are used as state indicators and the remaining one as the decision indicator. Each state indicator is discretized into three levels according to the distribution of its values: 0 denotes the low level, 1 the medium level and 2 the high level. For convenience, we label years of employment, loan amount, instalment amount, revolving credit utilization and the proportion of accounts with credit card utilization above 75% as A, B, C, D and E. Taking A, B, C, D and E in turn as the decision indicator, the corresponding state spaces are $S_{BCDE}$, $S_{ACDE}$, $S_{ABDE}$, $S_{ABCE}$ and $S_{ABCD}$. Each state combines four indicators with three levels each, so every state space $S=\{0,1,2,\dots,80\}$ has $3^4=81$ states.
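The state construction can be sketched as follows. Two details are assumptions, since the text says only that the levels follow the value distribution. First, tertile cut points define low, medium and high. Second, the four state levels are combined into a single index by base-3 encoding, which yields exactly the 81 states $0,\dots,80$. All function names are hypothetical.

```python
import numpy as np

INDICATORS = "ABCDE"   # the paper's labels for the five key credit metrics

def tertile_cuts(values):
    """Assumed cut points: the 1/3 and 2/3 quantiles of the indicator."""
    return np.quantile(values, [1 / 3, 2 / 3])

def to_levels(values, cuts):
    """0 = low, 1 = medium, 2 = high."""
    return np.digitize(values, cuts)

def state_index(levels):
    """Encode the levels of the four state indicators as one integer in 0..80."""
    idx = 0
    for level in levels:
        idx = idx * 3 + int(level)        # base-3 positional encoding
    return idx

def state_space_name(decision):
    """E.g. decision indicator 'A' -> state space 'S_BCDE'."""
    return "S_" + "".join(k for k in INDICATORS if k != decision)

vals = np.arange(9.0)
print(to_levels(vals, tertile_cuts(vals)))                  # [0 0 0 1 1 1 2 2 2]
print(state_space_name("C"), state_index([1, 0, 2, 1]))     # S_ABDE 34
```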

Discretizing continuous credit indicators into three levels (low, medium and high) is a necessary simplification of real-world complexity, for two main reasons. (1) It controls the size of the state space. With finer granularity, the state space would grow rapidly and cause a severe “curse of dimensionality”, making it difficult for the agent to learn and converge with limited samples. (2) It improves the robustness and explainability of the strategy. Discrete states yield more general and stable strategies, and the decision logic (such as “reducing” the loan limit in a “high” debt state) is easier for business staff to understand. Exploring continuous state spaces is an important direction for future work.

Our goal is to find strategies for adjusting each credit decision indicator in different states. To quantify changes in the decision indicators and derive the corresponding actions, we use a difference-based approach. Each value $x_t$ of a key credit decision metric is compared with its previous value $x_{t-1}$, and the change is $\Delta x_t = x_t - x_{t-1}$. This difference defines three types of action. If $\Delta x_t > 0$, the value has increased, and we denote this action “2” (“increase”). If $\Delta x_t < 0$, the value has decreased, and we denote this action “1” (“decrease”). If $\Delta x_t = 0$, the value is unchanged, and we denote this action “0” (“no change”). The action space is therefore $A=\{0,1,2\}$.

The agent's goal is to find a strategy that maximizes the long-term cumulative reward. In reinforcement learning, the reward is the criterion for evaluating the agent's actions. An action that earns a high reward is considered “good”, and one that earns a low or negative reward is considered “bad”. A good reward function should accurately reflect the agent's goals. In this paper, the agent aims to maximize the likelihood of repayment and reduce the risk of default. The reward is therefore 1 if the loan is repaid and 0 if it defaults.
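Here is a minimal sketch of the action and reward definitions above. It assumes the values of the decision indicator are already arranged in a meaningful order, since the text does not say how $x_{t-1}$ is chosen for static loan records. The constant names, the `defaulted` flag and the loan amounts are hypothetical.

```python
import numpy as np

NO_CHANGE, DECREASE, INCREASE = 0, 1, 2   # action codes from the text

def actions_from_series(x):
    """Action a_t for t = 1..T-1 from the difference dx_t = x_t - x_{t-1}."""
    dx = np.diff(np.asarray(x, dtype=float))
    return np.where(dx > 0, INCREASE, np.where(dx < 0, DECREASE, NO_CHANGE))

def reward(defaulted):
    """1 if the loan was repaid, 0 if it defaulted."""
    return 0 if defaulted else 1

loan_amounts = [10000, 12000, 12000, 8000]   # hypothetical successive values
print(actions_from_series(loan_amounts))       # [2 0 1]
print(reward(False), reward(True))             # 1 0
```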

Pre-DDQN algorithm

Credit decision-making usually requires jointly considering many complex factors, such as the five key credit indicators in this paper. These indicators form high-dimensional data, which challenges traditional credit decision-making methods. To process such data more effectively and make accurate credit decisions, this paper adopts the DDQN algorithm from deep reinforcement learning.

Reinforcement learning is based on the Bellman equation, which describes the recursive relationship satisfied by the state-action function $Q(s,a)$:

$$Q(s,a) = E\left[R_{t+1} + \gamma \max_{a'} Q(s_{t+1},a')\right]$$

(1)

where $R_{t+1}$ is the reward obtained when moving from state $s$ to $s_{t+1}$ after performing action $a$, and $\gamma$ is a discount factor indicating the importance of future rewards. Q-learning, a basic reinforcement learning algorithm, uses the Bellman equation to update the state-action function values:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[R_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s,a)\right]$$

(2)

where $\alpha$ is the learning rate. The update is greedy with respect to the next state: the target uses the action $a_{t+1}$ that maximizes $Q(s_{t+1},\cdot)$ when updating the state-action function.
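In the tabular case, the update in Eq. (2) takes only a couple of lines. The sketch below assumes an 81 × 3 table, matching the state and action spaces defined earlier. The values of $\alpha$ and $\gamma$ are illustrative, not the paper's settings.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """In-place Q-learning step: move Q[s, a] a fraction alpha of the way
    towards the target r + gamma * max_a' Q[s_next, a']."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q[s, a]

Q = np.zeros((81, 3))              # 81 credit states x 3 actions
Q[5] = [0.0, 0.5, 1.0]             # pretend state 5 already has estimates
print(q_learning_update(Q, s=0, a=2, r=1.0, s_next=5))   # approximately 0.19
```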

The DDQN algorithm couples deep representation learning with reinforcement learning through a dual-network architecture and targets sequential decision problems with high-dimensional continuous state spaces. Its core mechanism is to decouple action selection from action evaluation. The online network uses the current parameters to select the greedy action. The target network uses lagged parameters to estimate the value of that action and produce the temporal-difference target. This “selection-evaluation separation” suppresses the overestimation bias of Q-values and reduces the errors that accumulate as the Bellman operator is iterated. In addition, the frozen target network with slow parameter updates alleviates training instability in non-stationary environments. It also damps value function oscillations caused by sampling noise and function approximation errors, which improves the convergence accuracy and robustness of the learned strategy.
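The selection-evaluation separation is easiest to see by comparing the two bootstrap targets on the same batch. In this sketch, the online network picks $a^* = \arg\max_a Q(s_{t+1},a;\theta)$ and the target network scores it, whereas standard DQN takes the target network's maximum directly. The Q-values are made-up numbers, and `done` marks terminal transitions. The soft update with rate `tau` is one common way to implement slow target updates, the other being a periodic hard copy. The paper does not say which one it uses.

```python
import numpy as np

def dqn_targets(r, q_target_next, gamma, done):
    """Standard DQN: the target network both selects and evaluates a'."""
    return r + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_targets(r, q_online_next, q_target_next, gamma, done):
    """Double DQN: online network selects a', target network evaluates it."""
    a_star = q_online_next.argmax(axis=1)                        # selection
    evaluated = q_target_next[np.arange(len(r)), a_star]         # evaluation
    return r + gamma * (1.0 - done) * evaluated

def soft_update(target_params, online_params, tau=0.01):
    """Slow tracking: theta^- <- tau * theta + (1 - tau) * theta^-, in place."""
    for t, o in zip(target_params, online_params):
        t *= 1.0 - tau
        t += tau * o

r = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])                     # second transition is terminal
q_online_next = np.array([[1.0, 2.0, 0.0], [0.5, 0.1, 0.2]])
q_target_next = np.array([[3.0, 1.0, 0.0], [0.0, 0.4, 0.9]])
print(dqn_targets(r, q_target_next, 0.9, done))                        # [3.7 0. ]
print(double_dqn_targets(r, q_online_next, q_target_next, 0.9, done))  # [1.9 0. ]
```

On these numbers, the target network's own maximum pulls the DQN target up to 3.7. The Double DQN target (1.9) instead evaluates the action that the online network prefers, which illustrates the overestimation effect described above.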

DDQN uses two Q-networks with the same structure but different parameters: an online network $Q(s,a;\theta)$ for decision making and a target network $Q'(s,a;\theta^{-})$ for evaluation. As a reinforcement learning algorithm, DDQN aims to learn an optimal policy that maximizes the expected future return. It does so by learning a Q-function that estimates the expected future return of taking action $a$ in state $s$ and following policy $\pi$ thereafter:

$$Q^{\pi}(s,a) = E\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t,a_t) \,\middle|\, s_0=s,\ a_0=a,\ \pi\right]$$

(3)

This paper combines credit data feature extraction and individual credit risk prediction with the DDQN algorithm to propose the Pre-DDQN algorithm. Its framework, shown in Fig. 2, contains four main parts: credit risk prediction, the credit risk control problem, offline training and online decision making.

(1)
The credit risk prediction part builds on the results in Chap. 4 of this paper. Features are first extracted from the credit data with a neural network model to uncover implicit patterns and relationships, and an ensemble learning algorithm then predicts credit risk from the extracted features. The input of this part is the credit data, and the output is the predicted outcome (repayment or default) for each credit state.
(2)
The credit risk control problem part uses the credit data to construct a realistic credit risk control scenario: for each combination of credit states, a credit risk control strategy is applied, and the loan ultimately ends in repayment or default. The credit data are divided into a training set and a validation set. The training set supplies the states (credit states) and rewards (repayment or default) used in the offline deep reinforcement learning training. The validation set supplies real-world scenarios for validating the algorithm in the online decision-making part, i.e., the different credit risk states for which the online decision-making part generates credit control strategies.
(3)
The offline training part uses the output of the credit risk prediction part to design an improved DDQN algorithm, Pre-DDQN. Pre-DDQN uses the credit risk prediction results to adjust the learning rate and the exploration rate of the deep reinforcement learning algorithm. These are two of the most important parameters in training a deep reinforcement learning agent: the learning rate determines how much new information the agent incorporates each time it updates its policy, and the exploration rate determines whether the agent prefers to explore new strategies or to exploit existing ones when making decisions. Pre-DDQN also employs prioritized experience replay so that the algorithm learns more from the experiences that matter most for improving performance. For offline training, we assume that the historical data were generated by an unknown but fixed behavioral policy $\pi_b$, which reflects the credit approval rules in force at the time and assigns a credit decision to each credit state. Because the probability of each decision is not recorded in the historical data, we assume that the behavioral policy is uniformly random: in every state, the three actions “increase”, “decrease”, and “keep unchanged” are each selected with probability 1/3.

The first step of Pre-DDQN is to collect experience. At each time step $t$, the agent takes an action $a_t$ in state $s_t$; the environment then transitions to the next state $s_{t+1}$ and returns a reward $R_{t+1}$. The resulting transition tuple $(s_t, a_t, R_{t+1}, s_{t+1})$ is stored in the experience replay buffer.

The second step is experience replay: a batch of transitions $(s_t, a_t, R_{t+1}, s_{t+1})$ is randomly sampled from the experience replay buffer, where $t$ now denotes the index of a sample within the batch. Sampling in this way breaks the temporal correlation between consecutive experiences and helps stabilize learning.
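The first two steps can be sketched as a fixed-capacity replay buffer. This is a minimal illustration, assuming a hypothetical `ReplayBuffer` class that draws batches uniformly at random; in Pre-DDQN the prioritized probabilities of Eqs. (8)-(10) would replace this uniform draw.

```python
import random
from collections import deque, namedtuple

# One stored transition: (s_t, a_t, R_{t+1}, s_{t+1})
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity experience replay buffer with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        """Step 1: store one transition."""
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        """Step 2: draw a batch without replacement, ignoring insertion order."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.push(state=t, action=t % 3, reward=1.0, next_state=t + 1)
print(len(buf), [tr.state for tr in buf.buffer])  # 3 [2, 3, 4]
```

A bounded `deque` keeps memory constant and silently discards the oldest experience once the buffer is full.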

The third step is to compute the Q-values. For each sample in the batch, the online Q-network is first used to find the action with the largest Q-value in the next state:

$$a^{\max}(s_{t+1}, \theta) = \arg\max_{a_{t+1}} Q(s_{t+1}, a_{t+1}, \theta_t) \tag{4}$$

The selected action is then evaluated by the target network to obtain the target Q-value, and the temporal-difference error $TD_{error}$ is computed as the gap between this target and the online estimate:

$$Y_t^{\text{Pre-DDQN}} = R_{t+1} + \gamma\, Q'\left(s_{t+1}, \arg\max_{a_{t+1}} Q(s_{t+1}, a_{t+1}, \theta_t), \theta^{-}\right) \tag{5}$$

$$TD_{error} = Y_t^{\text{Pre-DDQN}} - Q(s_t, a_t, \theta_t) \tag{6}$$

$R_{t+1}$ is the reward received at time step $t+1$, $\gamma$ is the discount factor, $Q'$ is the target network, $Q$ is the online network, $s_{t+1}$ is the next state, $\theta_t$ denotes the parameters of the online network, and $\theta^{-}$ denotes the parameters of the target network.
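The following sketch walks through Eqs. (4)-(6) for a single transition. It assumes small hypothetical lookup tables `q_online` and `q_target` in place of the two neural networks, and it omits terminal-state handling because the text does not describe one. It also shows the larger target that a vanilla DQN would produce by letting the target table both select and evaluate the action.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward, next_state, q_online, q_target, gamma):
    """Eqs. (4)-(5): the online table selects the action, the target table scores it."""
    a_max = argmax(q_online[next_state])                  # Eq. (4): selection
    return reward + gamma * q_target[next_state][a_max]   # Eq. (5): evaluation


def td_error(state, action, target, q_online):
    """Eq. (6): target minus the online estimate of the action actually taken."""
    return target - q_online[state][action]


# Hypothetical 2-state, 3-action tables standing in for Q(., ., theta_t) and Q'(., ., theta^-)
q_online = {0: [0.2, 0.5, 0.1], 1: [0.9, 0.3, 0.4]}
q_target = {0: [0.3, 0.4, 0.0], 1: [0.6, 0.8, 0.2]}

y = double_dqn_target(reward=1.0, next_state=1, q_online=q_online, q_target=q_target, gamma=0.9)
vanilla = 1.0 + 0.9 * max(q_target[1])  # DQN-style target: max of the target table
print(y, td_error(0, 1, y, q_online), vanilla)  # approximately 1.54, 1.04, 1.72
```

Because the online table prefers action 0 in state 1, the target table's optimistic value 0.8 for action 1 is never used, which is exactly the selection-evaluation separation that curbs overestimation.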

Here, Pre-DDQN adjusts the learning rate according to the credit risk prediction result. When the predicted outcome is repayment, the learning rate is increased so that the agent adapts faster to a new, possibly more lenient, risk environment; when the predicted outcome is default, the learning rate is decreased so that the agent updates its policy more cautiously and tends to keep the existing, possibly more stringent, risk control strategy.

$$\alpha_t = \begin{cases} \alpha_0 \cdot \eta_{inc}, & \text{if } \hat{y} = \text{repayment} \\ \alpha_0 \cdot \eta_{dec}, & \text{if } \hat{y} = \text{default} \end{cases} \tag{7}$$

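$\alpha_0$ is the initial learning rate, and $\eta_{inc}$ and $\eta_{dec}$ are the learning-rate adjustment coefficients (with $\eta_{inc} > 1$ and $0 < \eta_{dec} < 1$, so that the rate rises after a predicted repayment and falls after a predicted default). $\hat{y}$ is the credit risk prediction result.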
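A minimal sketch of Eq. (7), assuming illustrative default coefficients (`eta_inc=1.2`, `eta_dec=0.8`) that are not taken from the paper. Because a network update is hard to show compactly, the hypothetical `tabular_q_step` uses a tabular stand-in in which the adjusted rate $\alpha_t$ scales how far $Q(s,a)$ moves toward the TD target; in Pre-DDQN the same rate would scale the optimizer step on $\theta_t$.

```python
def adjusted_learning_rate(alpha0, y_hat, eta_inc=1.2, eta_dec=0.8):
    """Eq. (7): scale the initial learning rate by the predicted credit outcome.
    The default coefficients are illustrative placeholders, not the paper's values."""
    if y_hat == "repayment":
        return alpha0 * eta_inc
    if y_hat == "default":
        return alpha0 * eta_dec
    raise ValueError(f"unknown prediction: {y_hat!r}")


def tabular_q_step(q, state, action, td_error, alpha):
    """Tabular stand-in for one learning step: move Q(s, a) by alpha_t * TD_error."""
    q[state][action] += alpha * td_error
    return q[state][action]


q = {0: [0.0, 0.0, 0.0]}
lr = adjusted_learning_rate(0.1, "default")  # smaller step after a predicted default
print(lr, tabular_q_step(q, 0, 1, td_error=1.0, alpha=lr))
```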

Furthermore, a prioritized experience replay mechanism assigns higher priority to samples with larger absolute $TD_{error}$, so that they are sampled more often, which speeds up the convergence of the model. The priority of sample $u$ is defined as:

$$p(u) = \left|TD_{error}\right| + \zeta \tag{8}$$

$\zeta$ is a small positive constant that prevents any sample from having zero priority. From these priorities, the sampling probability $P(u)$ and the importance-sampling weight $\omega_u$ of sample $u$ are obtained as follows:

$$P(u) = \frac{p(u)^{\delta}}{\sum_{u'=1}^{U} p(u')^{\delta}} \tag{9}$$

$$\omega_u = \frac{\left(\frac{1}{U} \cdot \frac{1}{P(u)}\right)^{\beta}}{\max_{u'} \left(\frac{1}{U} \cdot \frac{1}{P(u')}\right)^{\beta}} \tag{10}$$

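$U$ is the number of samples, $\delta$ controls how strongly $TD_{error}$ determines the priority ($\delta = 0$ recovers uniform sampling), and $\beta$ controls the strength of the importance-sampling correction that compensates for the bias prioritized sampling introduces into the convergence result.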
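Eqs. (8)-(10) can be computed directly from a list of TD errors, as in the sketch below. The values `zeta=1e-6`, `delta=0.6`, and `beta=0.4` are illustrative assumptions rather than the paper's settings, and `random.Random.choices` is used here for the prioritized draw.

```python
import random


def priorities(td_errors, zeta=1e-6):
    """Eq. (8): p(u) = |TD_error(u)| + zeta, so no sample ends up with zero priority."""
    return [abs(e) + zeta for e in td_errors]


def sampling_probs(prios, delta=0.6):
    """Eq. (9): P(u) = p(u)**delta / sum over u' of p(u')**delta."""
    scaled = [p ** delta for p in prios]
    total = sum(scaled)
    return [s / total for s in scaled]


def is_weights(probs, beta=0.4):
    """Eq. (10): (1/U * 1/P(u))**beta, divided by the largest such value."""
    U = len(probs)
    raw = [(1.0 / (U * p)) ** beta for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]


td = [0.5, -2.0, 0.1, 1.0]
probs = sampling_probs(priorities(td))
weights = is_weights(probs)
batch = random.Random(0).choices(range(len(td)), weights=probs, k=2)  # prioritized draw
print([round(p, 3) for p in probs], [round(w, 3) for w in weights], batch)
```

The sample with the largest |TD error| is drawn most often but receives the smallest weight, so its larger share of updates is partly offset in the loss of Eq. (11).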

The fourth step is gradient descent, which updates the online network parameters $\theta_t$ by minimizing a loss function. The DDQN loss is based on the mean squared error (MSE) between the Q-value $Q(s_t, a_t, \theta_t)$ produced by the online network and the target value $Y_t^{\text{Pre-DDQN}}$; in Pre-DDQN, each sample's squared error is weighted by its importance-sampling weight $\omega_u$, with $u$ indexing the $U$ samples in the batch:

$$L(\theta_t) = \frac{1}{U} \sum_{u=1}^{U} \omega_u \left(Y_u^{\text{Pre-DDQN}} - Q(s_u, a_u, \theta_t)\right)^{2} \tag{11}$$
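A minimal sketch of Eq. (11) and its gradient with respect to each predicted Q-value, assuming the TD targets are treated as constants (the usual convention). The function names are hypothetical; in a deep learning framework the gradient would be propagated automatically from these per-sample values into $\theta_t$.

```python
def weighted_mse(targets, predictions, weights):
    """Eq. (11): (1/U) * sum_u w_u * (Y_u - Q(s_u, a_u))**2 over a batch of U samples."""
    U = len(targets)
    return sum(w * (y - q) ** 2 for y, q, w in zip(targets, predictions, weights)) / U


def weighted_mse_grad(targets, predictions, weights):
    """dL/dQ(s_u, a_u) for each sample, with the targets Y_u held constant;
    backpropagation would carry these values on to theta_t."""
    U = len(targets)
    return [-2.0 * w * (y - q) / U for y, q, w in zip(targets, predictions, weights)]


Y = [1.0, 2.0]   # hypothetical targets Y_u
Q = [0.0, 0.0]   # hypothetical online estimates Q(s_u, a_u)
W = [1.0, 0.5]   # importance-sampling weights from Eq. (10)
print(weighted_mse(Y, Q, W), weighted_mse_grad(Y, Q, W))  # 1.5 [-1.0, -1.0]
```

Halving the weight of the second sample cancels the effect of its twice-as-large error on the gradient, illustrating how $\omega_u$ tempers updates from frequently replayed samples.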

The fifth step is the target network update. Pre-DDQN periodically transfers the parameters of the online network to the target network, which keeps the parameters of the two networks somewhat different and helps stabilize learning. This transfer is a “soft update”: instead of completely replacing the target network's parameters, it moves them only a small fraction of the way toward the online parameters, which limits the performance degradation caused by abrupt parameter changes. In addition, Pre-DDQN introduces a “hard update” mechanism: after a fixed number of iterations (e.g., 10,000), the online parameters are copied in full to the target network to ensure the stability and convergence speed of the learning process.
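The soft/hard update schedule can be sketched on plain parameter lists standing in for network weights. The soft-update rate `tau` is a hypothetical parameter (the text does not give its value), and `hard_every=10_000` mirrors the example interval mentioned above; the demo uses tiny values so that the result is easy to trace.

```python
def soft_update(target, online, tau):
    """Move every target parameter a fraction tau of the way toward the online value."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def hard_update(online):
    """Copy the online parameters into the target network unchanged."""
    return list(online)


def update_target(target, online, step, tau=0.01, hard_every=10_000):
    """Soft update at every step, replaced by a full copy every `hard_every` steps."""
    if step > 0 and step % hard_every == 0:
        return hard_update(online)
    return soft_update(target, online, tau)


target, online = [0.0, 0.0], [1.0, -1.0]
for step in range(1, 4):
    target = update_target(target, online, step, tau=0.5, hard_every=3)
print(target)  # [1.0, -1.0]: two soft steps, then a hard copy at step 3
```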

The sixth step is to select credit risk control actions for the credit risk control problem with an $\varepsilon$-greedy strategy. With probability $1-\varepsilon$ the action with the largest Q-value is selected, and with probability $\varepsilon$ (the exploration rate) an action is selected at random. This retains the algorithm's ability to explore and helps it avoid local optima while still making full use of the best solution found so far.

Here, the Pre-DDQN algorithm adjusts the exploration rate based on the credit risk prediction result. When the prediction result is repayment, the exploration rate is increased so that the agent tries new and possibly more lenient risk control strategies more frequently; when the prediction result is default, the exploration rate is decreased so that the agent prefers to choose the optimal and possibly more stringent risk control strategies that are already available.

$$\varepsilon_t = \begin{cases} \min\left(\varepsilon_0 \cdot \phi_{inc},\ \varepsilon_{max}\right), & \text{if } \hat{y} = \text{repayment} \\ \varepsilon_0 \cdot \phi_{dec}, & \text{if } \hat{y} = \text{default} \end{cases} \tag{12}$$

$\varepsilon_0$ is the initial exploration rate, $\phi_{inc}$ and $\phi_{dec}$ are the exploration-rate adjustment coefficients, and $\varepsilon_{max}$ is the upper bound on the exploration rate.
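Eq. (12) and the $\varepsilon$-greedy rule of the sixth step fit together as in the sketch below. The coefficient defaults and the order of the three actions in `ACTIONS` are hypothetical choices for illustration, not values reported in the paper.

```python
import random

ACTIONS = ("increase", "decrease", "keep unchanged")  # assumed action order


def adjusted_epsilon(eps0, y_hat, phi_inc=1.5, phi_dec=0.5, eps_max=0.3):
    """Eq. (12): raise exploration (capped at eps_max) after a predicted repayment,
    lower it after a predicted default. Default coefficients are illustrative only."""
    if y_hat == "repayment":
        return min(eps0 * phi_inc, eps_max)
    if y_hat == "default":
        return eps0 * phi_dec
    raise ValueError(f"unknown prediction: {y_hat!r}")


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the arg-max action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


rng = random.Random(42)
eps = adjusted_epsilon(0.1, "default")
action = epsilon_greedy([0.2, 0.7, 0.1], eps, rng)
print(eps, ACTIONS[action])
```

Passing an explicit `random.Random` instance keeps the exploration reproducible across runs without touching the global random state.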

Through these steps, Pre-DDQN learns the Q-function effectively, uses the credit risk prediction results to bias training dynamically toward stricter or more lenient strategies, and reduces Q-value overestimation, which together make it perform well in the credit risk control task.

(4)
The online decision part takes the optimal control strategy model trained in the offline training part as input, and adaptively formulates optimal credit risk control actions for different combinations of credit states.

Experimental analysis

In this section, the experimental design, assessment metrics and experimental results are described in detail to verify the effectiveness of the NERHF credit risk control framework proposed in this paper.

Experimental design

The dataset used in this paper comes from the LendingClub platform and covers 277,856 loan records from 2019 to 2020. Because the dataset is imbalanced, with defaults forming the minority class, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to the training set to mitigate potential bias in the classifiers. This resampling step was performed before all feature extraction and model training to improve the models' ability to identify defaults.
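The core idea of SMOTE is sketched below in pure Python: each synthetic default is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration with hypothetical 2-feature data, not the implementation used in the study; in practice a library implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) would normally be applied to the training split only.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Hypothetical 2-feature default cases (minority class) taken from the training split
defaults = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(defaults, n_synthetic=5, k=2)
balanced_minority = defaults + new_points
```

Oversampling only the training split keeps synthetic points out of the data used for validation, so reported metrics are not inflated by interpolated copies of evaluation samples.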

In the credit risk prediction stage, three neural network architectures (DNN, RNN, and CNN) are used to extract features from the credit indicators. These features are then fed into four ensemble learning algorithms (RF, XGBoost, LightGBM, and CatBoost) to predict credit risk. To tune the hyperparameters of all models, we combined grid search with ten-fold cross-validation. Table 1 summarizes the optimized values of the key parameters for each algorithm.
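The tuning procedure (grid search scored by ten-fold cross-validation) can be sketched without any ML library. As an assumption made purely for illustration, a hypothetical 1-D k-nearest-neighbour classifier stands in for the ensemble models and its `n_neighbors` plays the role of the tuned hyperparameter; the selection logic is the same as in library tools such as scikit-learn's `GridSearchCV` with `cv=10`.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split it into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote of the n_neighbors closest 1-D training points (0/1 labels)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def grid_search_cv(xs, ys, grid, k=10):
    """Return the grid value with the best mean k-fold validation accuracy."""
    folds = kfold_indices(len(xs), k)
    best_param, best_score = None, float("-inf")
    for n_neighbors in grid:
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(xs)) if i not in held_out]
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            hits = sum(knn_predict(train_x, train_y, xs[j], n_neighbors) == ys[j] for j in fold)
            fold_scores.append(hits / len(fold))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_param, best_score = n_neighbors, mean_score
    return best_param, best_score


xs = [i / 10 for i in range(40)]
ys = [1 if x >= 2.0 else 0 for x in xs]
print(grid_search_cv(xs, ys, grid=[1, 3, 5]))
```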

Table 1 Best parameters of ensemble algorithms.


In the credit risk control stage, strategies are formulated with the Pre-DDQN algorithm, which incorporates prioritized experience replay. Three experiments are designed in this paper.

Experiment 1: Four ensemble learning algorithms and three neural network models were tested, and the prediction performance of 16 combinations was compared. Four baseline combinations applied RF, XGBoost, LightGBM, and CatBoost to the original dataset, and the other 12 combinations applied the four ensemble learning algorithms to features extracted by the DNN, RNN, and CNN, respectively. The first performance metric is accuracy (Acc), the ratio of correctly predicted instances to all instances. The second is precision (Pre), the ratio of correctly predicted positive instances to all instances predicted as positive, which measures the model's ability to avoid false positives. The third is recall (Rec), the ratio of correctly predicted positive instances to all actual positive instances, which measures the model's ability to find all positive instances. The fourth is the F1 score, the harmonic mean of precision and recall, which balances the two. This article also uses several more in-depth evaluation metrics and visualization tools. ROC curve and AUC-ROC: the receiver operating characteristic curve depicts the trade-off between the true positive rate and the false positive rate, and the area under it (AUC-ROC) measures the model's overall ability to distinguish default from non-default. PR curve and AUC-PR: the precision-recall curve focuses on predictive performance for defaults, and the area under it (AUC-PR) measures the overall balance the model achieves between precision and recall. Brier score: the mean squared difference between the predicted default probability and the actual outcome, which measures how well the predicted probabilities agree with the observed default frequencies; smaller values indicate more accurate and better-calibrated probability predictions.
Reliability curve: a visual complement to the Brier score that plots the mean predicted probability against the corresponding observed frequency. Cost curve: shows the normalized misclassification cost of the model at different classification thresholds.
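For reference, the sketch below computes the scalar metrics of Experiment 1 from first principles on a hypothetical five-loan example. It assumes label 1 (the positive class) denotes default and uses average precision as a step-wise estimate of AUC-PR; these are illustrative choices, not necessarily the exact estimators used for the reported tables.

```python
def confusion_counts(y_true, y_pred):
    """TP, FP, FN, TN with label 1 as the positive class (assumed here to be default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true, y_pred):
    """Acc, Pre, Rec and F1 (harmonic mean of Pre and Rec)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return acc, pre, rec, f1


def brier_score(y_true, probs):
    """Mean squared gap between predicted default probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise estimate of AUC-PR: mean precision at the rank of each positive
    (ties in score are broken by sort order)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical labels (1 = default) and predicted default probabilities
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(threshold_metrics(y_true, y_pred), roc_auc(y_true, scores),
      average_precision(y_true, scores), brier_score(y_true, scores))
```

Here Acc = 0.6 and Pre = Rec = F1 = 2/3 at the 0.5 threshold, while AUC-ROC = 5/6 and the Brier score is 0.162, showing how threshold-free metrics can rank a model differently from threshold-based ones.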

Experiment 2: Credit risk control strategies were generated by Pre-DDQN. Five key credit decision indicators were tested separately: years of employment, loan amount, instalment amount, revolving credit utilization, and the proportion of accounts with credit card utilization above 75%. For each indicator, Pre-DDQN chooses, according to the state, a credit risk control action that adjusts the indicator (increase, decrease, or keep unchanged). However, some states have too few samples to train Pre-DDQN reliably. To ensure that the generated strategies rest on sufficient statistical evidence, we set a minimum sample size threshold and train and evaluate Pre-DDQN only on states whose sample size in the training set exceeds this threshold. Although this limits the coverage of the strategy, it ensures the robustness and practical reference value of the reported strategies. Achieving full coverage of the state space is an important direction for future research.
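The minimum-sample-size rule can be sketched as a simple filter over training-set state counts. The threshold value and the `(state, action, reward)` layout of logged transitions are hypothetical; the text does not report the threshold actually used.

```python
from collections import Counter


def eligible_states(train_states, threshold):
    """States whose training-set sample count exceeds `threshold` (strictly greater)."""
    counts = Counter(train_states)
    return {state for state, count in counts.items() if count > threshold}


def keep_eligible(transitions, allowed):
    """Drop logged (state, action, reward) triples whose state is not eligible."""
    return [tr for tr in transitions if tr[0] in allowed]


train_states = [0, 0, 0, 1, 27, 27, 40]
allowed = eligible_states(train_states, threshold=1)
print(sorted(allowed))  # [0, 27]
```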

To evaluate the reinforcement learning strategies accurately, we use inverse propensity scoring (IPS), also known as inverse probability weighting, to perform counterfactual evaluation of Pre-DDQN, DDQN, and Q-Learning:

$$V_{IPS}(\pi_e) = \frac{1}{N} \sum_{i=1}^{N} \frac{\pi_e(a_i \mid s_i)}{\pi_b(a_i \mid s_i)} \cdot r_i \tag{13}$$

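Here, $\pi_e$ is the policy being evaluated (e.g., Pre-DDQN), $\pi_b$ is the behavioral policy that generated the data, $N$ is the number of logged samples, and $r_i$ is the reward actually obtained for sample $i$. Reweighting each logged reward by the ratio $\pi_e/\pi_b$ corrects for the mismatch between the action distribution in the historical data and that of the evaluated policy, so the strategies can be compared in counterfactual situations without the bias caused by this distribution shift or by selection bias, provided that $\pi_b$ is correctly specified (here, the uniform policy assumed for offline training).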
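A minimal sketch of Eq. (13), assuming a greedy evaluation policy derived from a hypothetical learned Q-table and the uniform behavioral policy (probability 1/3 per action) described for offline training. The logged data and the ±1 reward coding are illustrative assumptions.

```python
def ips_value(logged, pi_e, pi_b):
    """Eq. (13): mean over logged (s, a, r) of pi_e(a|s) / pi_b(a|s) * r."""
    total = 0.0
    for s, a, r in logged:
        total += pi_e(a, s) / pi_b(a, s) * r
    return total / len(logged)


def greedy_policy(q_table):
    """Deterministic evaluation policy: probability 1 on the arg-max action of each state."""
    def pi_e(a, s):
        best = max(range(len(q_table[s])), key=lambda i: q_table[s][i])
        return 1.0 if a == best else 0.0
    return pi_e


def uniform_policy(n_actions=3):
    """Behavioral policy assumed in the text: each action with probability 1/n_actions."""
    return lambda a, s: 1.0 / n_actions


# Hypothetical learned Q-table and logged data; rewards +1 = repayment, -1 = default
q_table = {0: [0.1, 0.9, 0.0], 1: [0.5, 0.2, 0.1]}
logged = [(0, 1, 1.0), (0, 0, -1.0), (1, 0, 1.0), (1, 2, -1.0)]
print(ips_value(logged, greedy_policy(q_table), uniform_policy()))  # approximately 1.5
```

The greedy policy agrees with the logged action only in the two repayment cases, each reweighted by 1/(1/3) = 3, giving (3 + 3)/4 = 1.5. As a sanity check, evaluating the uniform policy against itself returns the plain average logged reward, which is 0 here.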

Experiment 3: The performance of the credit risk control strategies generated by Pre-DDQN, DDQN, Q-Learning, and a Random baseline is compared. Q-Learning is a basic reinforcement learning algorithm that selects actions by estimating the expected utility of taking each action in a given state. The Random baseline selects an action at random for each state.

Experimental results and analyses

Table 2 Performance of the four ensemble learning algorithms.


The results of Experiment 1 are shown in Tables 2, 3, 4 and 5 and Figs. 3, 4, 5 and 6. Table 2 presents the performance of the baseline ensemble learning algorithms on the resampled data: all four models achieve recall above 0.77. LightGBM and CatBoost lead in overall performance, with LightGBM achieving the highest accuracy (0.903), precision (0.980), F1 score (0.895), ROC-AUC (0.945), and PR-AUC (0.961), together with the lowest Brier score (0.077). The high ROC-AUC and PR-AUC values indicate strong discrimination between the repayment and default classes, and the low Brier score indicates well-calibrated probability predictions. The corresponding curves in Fig. 3 support these findings: the ROC curves approach the top-left corner, the PR curves enclose a large area, the reliability curves lie close to the diagonal, and the cost curves remain low over a wide range of thresholds.

Table 3 Performance of four ensemble learning algorithms combined with DNN.


Table 3 shows the performance of ensemble learning models combined with DNN-based feature extraction. The DNN-LightGBM combination continues to demonstrate excellent performance, with a recall of 0.828 and high AUC values (ROC-AUC = 0.944, PR-AUC = 0.960). Figure 4’s visualizations confirm that the model maintains discrimination and calibration after DNN feature extraction.

Table 4 Performance of four ensemble learning algorithms combined with RNN.


Table 4 shows that combining RNN-based feature extraction with ensemble learning gives the best overall performance. The RNN-LightGBM model attains a recall of 0.828, an F1 score of 0.898, and the highest ROC-AUC (0.945) and PR-AUC (0.961) among the combinations in this category. The corresponding curves in Fig. 5 reflect this advantage, suggesting that RNNs are particularly effective at capturing complex, nonlinear dependencies within the pseudo-sequential credit data and thus produce more discriminative feature representations.

Table 5 Performance of four ensemble learning algorithms combined with CNN.


Table 5 presents the results for CNN-based feature extraction. While the CNN-LightGBM model performs admirably with a recall of 0.828 and strong AUC metrics (ROC-AUC = 0.942, PR-AUC = 0.959), it slightly trails the RNN-based approach. The performance curves in Fig. 6 remain robust, indicating that CNNs are also a viable method for feature extraction from credit data, though potentially less suited than RNNs for capturing the specific relational structures present.

Taking Acc, Pre, Rec, F1, AUC, and the Brier score together, LightGBM consistently delivers the best performance, whether it uses raw features or features extracted by neural networks, and the ensemble models that use RNN-extracted features yield the most robust results. This is mainly because the RNN architecture (especially LSTM) is naturally suited to capturing complex, nonlinear dependencies between features. Although the credit data in this study are not strictly a time series, the credit indicators have strong inherent logical correlations and dynamic interaction effects that form a “pseudo-sequence”. The recurrent connections of an RNN can model long-range dependencies and non-independent relationships between features, and thereby extract more discriminative representations than a DNN (which processes features independently) or a CNN (which captures local spatial correlations). This finding suggests that modeling the dynamic interactions between features as sequential dependencies is effective for credit risk assessment.

Among all 16 combinations, the RNN-LightGBM combination achieved the best performance on all metrics. In practical applications, combining the LightGBM model with RNN-based feature extraction can therefore be considered to achieve better classification performance.

The results of Experiment 2 are shown in Fig. 7. For brevity, years of employment, loan amount, instalment amount, revolving credit utilization, and credit card utilization above 75% are labelled A, B, C, D, and E. For indicator A, if the borrower's other four indicators are in state 1 (low B, C, and D, medium E), 2 (low B, C, and D, high E), 4 (low B and C, medium D and E), or 5 (low B and C, medium D, high E), we should keep the requirement for indicator A unchanged. If they are in state 0 (low B, C, and D) or 40 (medium B, C, D, and E), we should reduce the requirement for indicator A. If they are in state 36 (medium B and C, low D and E), 37 (medium B, C, and E, low D), or 41 (medium B, C, and D, high E), we should increase the requirement for indicator A.

For indicator B, if the borrower's other four indicators are in states 1, 54, 55, or 68, we should keep the requirement for indicator B unchanged; if they are in states 5, 58, or 59, we should raise the threshold for indicator B. For indicator C, if the other four indicators are in states 1, 54, or 55, we should keep the requirement for indicator C unchanged; if they are in states 0 or 27, we should raise the threshold for indicator C. For indicator D, if the other four indicators are in states 2, 55, or 67, we should raise the threshold on indicator D for loan approval; if they are in states 0 or 27, we should lower it. For indicator E, if the other four indicators are in states 1, 28, 54, or 55, we should leave the threshold for indicator E unchanged; if they are in states 0 or 27, we should raise the threshold on indicator E for loan approval.
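The state numbers reported for indicator A are consistent with a base-3 encoding of the other four indicators (B, C, D, E, most significant first, with low = 0, medium = 1, high = 2), which gives 3^4 = 81 possible states; for example, state 40 = 27 + 9 + 3 + 1 is "medium B, C, D, and E". This encoding is our reading of the examples above, not a scheme stated by the authors, and its use for indicators B to E (over their respective remaining four indicators) is likewise an assumption. The hypothetical helpers below decode state numbers under that assumption.

```python
LEVELS = ("low", "medium", "high")


def encode_state(levels):
    """Base-3 index of the other four indicators' levels, most significant first."""
    index = 0
    for level in levels:
        index = index * 3 + LEVELS.index(level)
    return index


def decode_state(index, n_indicators=4):
    """Inverse of encode_state for 0 <= index < 3**n_indicators."""
    digits = []
    for _ in range(n_indicators):
        index, digit = divmod(index, 3)
        digits.append(LEVELS[digit])
    return tuple(reversed(digits))


# For indicator A the other indicators are B, C, D, E (in that order)
for s in (0, 5, 36, 40, 41):
    print(s, dict(zip("BCDE", decode_state(s))))
```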

The results of Experiment 3 are shown in Fig. 8, which compares the credit risk control strategies generated by the four algorithms: Pre-DDQN, DDQN, Q-Learning, and Random. We use the IPS-corrected value estimate as the performance measure (see Sect. 3.1); the higher the value, the better the strategy performs in counterfactual scenarios. Each subplot in Fig. 8 compares performance when a single key credit indicator is adjusted to control credit risk. Pre-DDQN achieved the highest IPS value estimate on all five key credit indicators. As expected, the Random algorithm performed worst of the four. Q-Learning obtained a higher average reward than Random but did not perform as well as DDQN, possibly in part because of the overestimation problem of traditional Q-Learning. Pre-DDQN, which additionally takes the credit risk prediction results into account, performed best.

These results demonstrate the effectiveness of Pre-DDQN for credit risk control compared with DDQN, Q-Learning, and the Random algorithm. Its ability to mitigate the overestimation problem of Q-Learning and to achieve higher reward values makes Pre-DDQN a promising strategy for real-world credit risk management. Q-Learning and DDQN, although well-established algorithms, lagged behind Pre-DDQN in this experiment; their performance was still acceptable, but their lower final reward values suggest that they may not be the best choice for complex credit risk scenarios. The Random algorithm underperformed, highlighting the importance of intelligent, adaptive strategies in credit risk control.

In summary, the experimental results show that the NERHF framework, which consists of a credit risk prediction model based on neural networks and ensemble learning and a credit risk control method based on deep reinforcement learning, has significant advantages and potential in practical applications: it improves the accuracy of credit risk prediction and provides more accurate and efficient support for credit risk control decisions.

Conclusions and future work

The NERHF framework proposed in this paper combines the feature extraction capability of neural networks, the predictive strength of ensemble learning, and the decision-making capability of deep reinforcement learning to provide an innovative solution for credit risk management. A series of experiments verifies the effectiveness of NERHF in improving credit risk prediction accuracy and reducing default risk. Specifically, extracting features with an RNN and combining them with the LightGBM algorithm demonstrated excellent performance on all evaluation metrics. In addition, Pre-DDQN achieved higher average returns than DDQN, Q-Learning, and the Random algorithm when developing credit risk control strategies, demonstrating its advantage in handling complex credit risk scenarios.

Despite the positive results of the NERHF framework in the current study, there is still room for further improvements and extensions. Future work can be carried out in the following areas:

(1) Model generalization: although NERHF performs well on the Lending Club dataset, its generalization ability still needs to be validated on other datasets. In future work, the model will be tested on different types of credit datasets to evaluate how well it generalizes.
(2) Algorithm optimization: although Pre-DDQN performs well in credit risk control, its computational efficiency and parameter tuning still need improvement, especially for large state spaces or high-dimensional features. Future work will focus on increasing the algorithm's running speed and tuning its hyperparameters for different credit risk assessment scenarios.
(3) Interdisciplinary applications: exploring potential applications of the NERHF framework in other financial fields, such as stock market analysis and insurance risk assessment.

Data availability

The datasets generated and/or analysed during the current study are available at https://www.kaggle.com/datasets/beatafaron/loan-credit-risk-and-population-stability/data, or from the corresponding author on reasonable request.

Code availability

The custom code used in the current study is available at https://doi.org/10.5281/zenodo.17678788; access is granted to users approved by the corresponding author on reasonable request.

References

Jiménez, G. et al. Hazardous times for monetary policy: what do Twenty-Three million bank loans say about the effects of monetary policy on credit Risk-Taking?. Econometrica 82 (2), 463–505 (2014).
Article MathSciNet Google Scholar
Chatterjee, S. et al. A quantitative theory of the credit Score. Econometrica 91 (5), 1803–1840 (2023).
Article MathSciNet Google Scholar
Yu, Z. J. & Guo, Y. J. Credit level based optimization model for bank Loan. Control Decis. 21 (12), 1429–1431 (2006).
Google Scholar
Liu, X., Zhou, R. X. & Li, Y. R. Default prediction of credit bond in China based on stacking algorithm integrated Model. Oper. Res. Manage. Sci. 32 (3), 163–170 (2023).
Google Scholar
Zhang, T. & Fan, B. Loan risk prediction method based on CLPSO-CatBoost. Comput. Syst. Appl. 30 (4), 222–226 (2021).
ADS Google Scholar
Mao, J. & Jain, A. K. Artificial neural networks for feature extraction and multivariate data Projection. IEEE Trans. Neural Networks. 6 (2), 296–317 (1995).
Article ADS PubMed CAS Google Scholar
Stuhlsatz, A., Lippel, J. & Zielke, T. Feature extraction with deep neural networks by a generalised discriminant Analysis. IEEE Trans. Neural Networks Learn. Syst. 23 (4), 596–608 (2012).
Article Google Scholar
Alaslani, M. G. Convolutional neural network based feature extraction for Iris Recognition. Int. J. Comput. Sci. Inform. Technol. (IJCSIT). 10 (2), 65–78 (2018).
Google Scholar
Chen, Y. et al. Deep feature extraction and classification of hyperspectral images based on convolutional neural Networks. IEEE Trans. Geosci. Remote Sens. 54 (10), 6232–6251 (2016).
Article ADS Google Scholar
D’ Angelo, G. & Palmieri, F. Network traffic classification using deep convolutional recurrent autoencoder neural networks for Spatial -Temporal features Extraction. J. Netw. Comput. Appl. 173, 102890 (2021).
Article Google Scholar
Shen, F. et al. A new deep learning ensemble credit risk evaluation model with an improved synthetic minority oversampling Technique. Appl. Soft Comput. 98, 106852 (2021).
Article Google Scholar
Li, Y. & Chen, W. A comparative performance assessment of ensemble learning for credit Scoring. Mathematics 8 (10), 1756 (2020).
Article Google Scholar
Abedin, M. Z. et al. Combining weighted SMOTE with ensemble learning for the Class-Imbalanced prediction of small business credit Risk. Complex. Intell. Syst. 9 (4), 3559–3579 (2023).
Article Google Scholar
Hamori, S. et al. Ensemble learning or deep learning? Application to default risk Analysis. J. Risk Financial Manage. 11 (1), 12 (2018).
Article Google Scholar
Uddin, M. S. et al. Leveraging random forest in Micro-Enterprises credit risk modelling for accuracy and Interpretability. Int. J. Finance Econ. 27 (3), 3713–3729 (2022).
Article Google Scholar
Li, Y. & Yan, K. Prediction of bank credit customers churn based on machine learning and interpretability analysis. Data Sci. Finance Econ. 5 (1), 19–34 (2025).
Article Google Scholar
Wang, Y. et al. Deep reinforcement learning with the Confusion-Matrix-Based dynamic reward function for customer credit Scoring. Expert Syst. Appl. 200, 117013 (2022).
Article Google Scholar
Alfonso-Sánchez, S. et al. Optimising credit limit adjustments under adversarial goals using reinforcement learning. Eur. J. Operational Res. (2024).
Zhang, W. et al. Deep reinforcement learning imbalanced credit risk of SMES in supply chain Finance. Ann. Oper. Res. 1–31. (2024).
Mosavi, A. et al. Comprehensive review of deep reinforcement learning methods and applications in Economics. Mathematics 8 (10), 1640 (2020).
Article MathSciNet Google Scholar
Herasymovych, M., Märka, K. & Lukason, O. Using reinforcement learning to optimize the acceptance threshold of a credit scoring Model. Appl. Soft Comput. 84, 105697 (2019).
Article Google Scholar
Bayraci, S. & Susuz, O. A deep neural network (DNN) based classification model in application to loan default Prediction. Theoretical Appl. Econ. 4 (621), 75–84 (2019).
Google Scholar
Lee, H. S. & Oh, S. LSTM-based deep learning for time series forecasting: the case of corporate credit score Prediction. J. Inform. Syst. 29 (1), 241–265 (2020).
Google Scholar
Hoseinzade, E., Haratizadeh, S. & CNNpred: CNN-based stock market prediction using a diverse set of Variables. Expert Syst. Appl. 129, 273–285 (2019).
Article Google Scholar
Speiser, J. L. et al. A comparison of random forest variable selection methods for classification prediction Modeling. Expert Syst. Appl. 134, 93–101 (2019).
Article PubMed PubMed Central Google Scholar
Li, H. et al. XGBoost model and its application to personal credit Evaluation. IEEE. Intell. Syst. 35 (3), 52–61 (2020).
Article CAS Google Scholar
Sun, X., Liu, M. & Sima, Z. A novel cryptocurrency price trend forecasting model based on LightGBM. Finance Res. Lett. 32, 101084 (2020).
Article Google Scholar
Prokhorenkova, L. et al. CatBoost: unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems 6639–6649. (2018).

Author information

Authors and Affiliations

School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian, 116025, Liaoning, China
Lin Wei & Hanyue Yu
School of Data Science and Artificial Intelligence, Dongbei University of Finance and Economics, Dalian, 116025, Liaoning, China
Jiyang Dong


Contributions

The manuscript was the result of a collaborative effort by Lin Wei, Jiyang Dong, and Hanyue Yu, with each author contributing distinct and essential components to the research. Lin Wei: Led the conceptual development of the study, formulating the research questions and hypotheses that guided the investigation; managed the data collection process and performed the primary analysis, interpreting the results in light of the existing literature; and drafted the manuscript, incorporating feedback from the co-authors throughout the iterative writing process. Jiyang Dong: Provided essential expertise in data science, particularly in credit data analysis, and reviewed and edited the manuscript to ensure the accuracy of the technical details and the overall coherence of the presentation. Hanyue Yu: Supervised the research design, optimized the ensemble learning prediction module, revised the manuscript for academic rigor, and ensured alignment with the journal's scope. All authors participated in the revision process to improve the clarity, precision, and persuasiveness of the arguments presented. The research team reviewed and approved the final version of the manuscript and jointly takes responsibility for the veracity of the data and the accuracy of the research findings. We look forward to the peer review process and welcome opportunities to refine the manuscript based on constructive feedback.

Corresponding author

Correspondence to Jiyang Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Wei, L., Dong, J. & Yu, H. NERHF: a hybrid machine learning-driven efficient credit risk control framework. Sci Rep 16, 1170 (2026). https://doi.org/10.1038/s41598-025-30905-6


Received: 16 September 2025
Accepted: 27 November 2025
Published: 01 December 2025
Version of record: 09 January 2026
DOI: https://doi.org/10.1038/s41598-025-30905-6

