Introduction

The escalating global concern over financial crises stems from their profound and far-reaching impacts on enterprises. Against this backdrop, the development of efficient bankruptcy prediction tools has become increasingly critical, as they provide essential decision-making support for businesses1,2. Currently, bankruptcy prediction is primarily treated as a binary classification problem, and numerous novel classification models have emerged through the application of diverse technological approaches3,4,5. Existing prediction models can generally be categorized into two types: those based on traditional statistical methods, which feature relatively simple structures, and those grounded in advanced artificial intelligence (AI) techniques. Due to their superior capability in handling nonlinear data, AI-based models often outperform conventional analytical approaches in various performance evaluations6. Moreover, the rapid advancement of big data and AI technologies has significantly propelled progress in the field of bankruptcy prediction. In the context of financial decision-making and bankruptcy forecasting, a wide array of AI models has garnered significant attention from both academia and industry. Representative examples include Artificial Neural Networks (ANN)7, Backpropagation Neural Networks (BPNN)8, Support Vector Machines (SVM)9,10, Decision Trees (DT)11, Extreme Learning Machines (ELM)4, Kernel Extreme Learning Machines (KELM)12, as well as numerous hybrid approaches derived from these models.

Additionally, Liao et al. employed the Nonlinear Activated Beetle Antennae Search (NABAS) algorithm for fraud detection in listed companies, demonstrating higher accuracy and efficiency than competing methods on public datasets13. Jabeur et al. proposed an improved XGBoost algorithm based on feature-importance selection (FS-XGBoost), providing an effective approach to bankruptcy prediction14. Papík built one-, two-, and three-year-ahead predictions across three periods using CatBoost, LightGBM, and XGBoost; the results indicate that the models performed significantly worse during crisis periods than in non-crisis periods, quantifying the impact of crises on the performance of bankruptcy prediction models15. Yamanaka introduced a bankruptcy prediction method that uses features independent of financial statements; empirical results on a sample of Japanese companies showed that some of the proposed features carry useful information for bankruptcy prediction16.

In the domain of financial forecasting, Artificial Neural Networks (ANN)17 and Support Vector Machines (SVM)9 have become two of the most widely adopted models, owing to their powerful self-learning capabilities and their effectiveness in capturing complex nonlinear relationships within data. However, both approaches exhibit notable limitations. ANN relies on the backpropagation algorithm, which is prone to becoming trapped in local optima, and its performance is highly sensitive to the tuning of hyperparameters related to network architecture18. Although SVM is theoretically well-founded, its computational efficiency is hindered by the iterative optimization of support vectors, resulting in a marked decrease in training speed when applied to large-scale datasets. In contrast, the Extreme Learning Machine (ELM) demonstrates superior overall performance. Its single-hidden-layer feedforward network architecture enables significantly faster training compared to traditional ANN and SVM, while often achieving higher classification accuracy than many advanced classifiers, including SVM19. An advanced variant of ELM, the Kernel Extreme Learning Machine (KELM), introduces kernel function techniques to effectively address the precision instability caused by the random assignment of input weights in the original ELM. This enhancement improves the model’s ability to extract high-dimensional nonlinear features and significantly boosts its generalization performance20. As a result, KELM has been increasingly applied across a wide range of application domains21.

Despite the widespread application of Kernel Extreme Learning Machines (KELM) across various domains, extensive research has shown that the performance of KELM classifiers is highly sensitive to two critical parameters: the kernel penalty parameter \(C\) and the kernel bandwidth \(\gamma\). Therefore, it is essential to appropriately determine these parameters before deploying the classifier in real-world scenarios22. Traditionally, grid search has been employed for this purpose; however, this method is prone to local optima and often struggles to define suitable parameter search ranges21,23.
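To make the role of these two parameters concrete, the following minimal sketch (an illustrative reconstruction following the standard KELM formulation, not the implementation used in this study; all class and variable names are ours) shows where \(C\) enters the regularized kernel system and where \(\gamma\) enters the RBF kernel:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel: K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Minimal KELM classifier: output weights beta = (Omega + I/C)^{-1} T."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        T = np.eye(int(y.max()) + 1)[y] * 2.0 - 1.0   # one-hot targets in {-1, +1}
        omega = rbf_kernel(X, X, self.gamma)           # n x n kernel matrix
        # C regularizes the diagonal of the kernel system solved here
        self.beta = np.linalg.solve(omega + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Xq):
        return np.argmax(rbf_kernel(Xq, self.X, self.gamma) @ self.beta, axis=1)
```

Because \(C\) reshapes the solved linear system and \(\gamma\) reshapes the kernel geometry, the two parameters interact and are best tuned jointly rather than independently.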

To overcome these limitations, a range of optimization techniques has been introduced into the field. Optimization is inherently suited for discovering optimal or near-optimal solutions for unknown variables. For instance, Yu et al.20 proposed a hybrid forecasting model that combines Complementary Ensemble Empirical Mode Decomposition (CEEMD), Whale Optimization Algorithm (WOA), and KELM for short-term wind power prediction. The WOA was utilized to optimize the two key parameters of KELM, thereby enhancing prediction accuracy and stability. Li et al.24 applied the Butterfly Optimization Algorithm (BOA) to tune KELM hyperparameters for modeling and analyzing reservoir gas relative permeability. Similarly, Li et al.25 introduced a Biogeography-Based Optimization (BBO) enhanced KELM approach (BBO-KELM) for ultra-short-term wind power forecasting across different regions. Chen et al.26 developed an online photovoltaic power prediction model based on chaos analysis, an improved Particle Swarm Optimization (PSO) algorithm, and KELM, aiming to improve forecasting accuracy. Hai et al.27 leveraged the generalization capability and learning efficiency of KELM, incorporating a multi-strategy enhanced dung beetle optimization (MIDBO) algorithm to refine the model structure for frost heave evaluation in coarse-grained soils—leading to improved convergence accuracy, faster convergence, and enhanced robustness. Zhao et al.28 proposed a KELM-based regression model for ship motion dynamics, optimized via the Grey Wolf Optimizer (GWO). This approach incorporated a sliding time window for dynamic model updates, effectively combining KELM’s fast learning mechanism with GWO’s global search capabilities to improve prediction accuracy and timeliness. Li et al.29 developed a hybrid model based on KELM for predicting the interval time of intermittent pumping wells, where the parameters were optimized using an Improved Brain Storm Optimization algorithm (IBSO-KELM). 
Han et al.30 proposed a Short-Term Power Load Forecasting (STPLF) model based on KELM, optimized via an Improved Whale Optimization Algorithm (IWOA), to address the high randomness and low prediction accuracy in electricity load forecasting.

Although existing studies have applied various optimization algorithms—such as Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), and Whale Optimization Algorithm (WOA)—to parameter tuning of machine learning models in financial risk prediction tasks such as bankruptcy forecasting, these algorithms still suffer from shortcomings that are difficult to avoid. As noted in related research, most optimization algorithms remain plagued by parameter randomness: in GWO, for example, the coefficient vectors that control search direction and step size are randomly initialized, which often leads to dispersed search trajectories in the early iterations and makes it difficult to quickly focus on promising regions; in later iterations, the overly rapid linear decay of the coefficient vectors causes the algorithm to lose its exploration capability prematurely, converging to a local optimum31,32,33. These limitations highlight the continued need for more effective and robust optimization strategies. Moreover, the "No Free Lunch" theorem34 asserts that no single optimization algorithm performs best across all problems, which further motivates the development of improved, problem-specific optimization algorithms.

The Northern goshawk Optimization (NGO) algorithm, inspired by the predatory behavior of northern goshawks, has garnered attention in the field of optimization due to its simplicity and ease of implementation35. Existing studies have applied NGO to parameter optimization of models such as Support Vector Machines (SVM)36,37,38. However, its application to Kernel Extreme Learning Machine (KELM) classification problems remains relatively limited. Like many other metaheuristic algorithms, NGO also suffers from common issues such as an imbalance between exploration and exploitation, slow convergence speed, entrapment in local optima, and suboptimal convergence accuracy. To address these shortcomings, this study proposes an enhanced version of NGO—referred to as TIS_NGO—by incorporating three improvement strategies: (1) a thinking innovation strategy, (2) a prey attack strategy based on differential evolution operations, and (3) a centroid reverse learning boundary control strategy. The objective is to develop an efficient optimization algorithm tailored for KELM parameter tuning, thereby improving the accuracy and stability of bankruptcy prediction models.

Specifically, the thinking innovation strategy enables the algorithm to effectively exploit valuable information within the search space during the exploitation phase, while maintaining stability and population diversity. The prey attack strategy based on differential evolution operations enhances the diversity of search directions, allowing the algorithm to explore broader regions and thus avoid premature convergence to local optima. Meanwhile, the centroid reverse learning boundary control strategy mitigates ineffective searches and accelerates convergence speed. The effectiveness of the proposed TIS_NGO algorithm is validated through experiments conducted on the CEC2017 and CEC2022 benchmark test suites. In practical bankruptcy prediction tasks, TIS_NGO is integrated with the KELM model to form a hybrid TIS_NGO-KELM framework. To ensure the model’s generalization capability, ten-fold cross-validation and stratified sampling techniques are employed on the bankruptcy dataset. Experimental results demonstrate that the proposed model significantly outperforms other KELM variants optimized by alternative intelligent algorithms in terms of key metrics such as classification accuracy and Matthews correlation coefficient. The ability of the model to distinguish between bankrupt and non-bankrupt firms is notably enhanced. This deep integration of an improved metaheuristic algorithm with a machine learning model not only overcomes the limitations of traditional parameter tuning approaches, but also offers a novel methodological reference for modeling complex nonlinear problems in the financial domain.
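As one illustration of the validation protocol, the stratified splitting mentioned above can be sketched as follows (a hypothetical NumPy helper, not the paper's code): each class is shuffled separately and its indices are dealt round-robin into folds, so the bankrupt/non-bankrupt ratio of the full dataset is preserved in every fold.

```python
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Shuffle each class separately and deal its indices round-robin into
    k folds, so every fold keeps the class ratio of the full dataset."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for j, sample in enumerate(idx):
            folds[j % k].append(int(sample))
    return [np.array(sorted(f)) for f in folds]
```

With a heavily imbalanced bankruptcy dataset, this guarantees that every held-out fold contains bankrupt firms, which plain random k-fold splitting cannot promise.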

The main contributions of this study are as follows:

  1. A novel improved algorithm, termed TIS_NGO, is proposed by integrating three enhancement strategies into the original Northern Goshawk Optimization (NGO): the thinking innovation strategy, the prey attack strategy based on differential evolution operations, and the centroid reverse learning boundary control strategy;

  2. The optimization performance of TIS_NGO is validated through comprehensive comparisons with other state-of-the-art algorithms on the CEC2017 and CEC2022 benchmark test suites;

  3. A bankruptcy prediction model, TIS_NGO-KELM, is developed by combining the proposed TIS_NGO with Kernel Extreme Learning Machine (KELM). Compared to other KELM-based methods, this model demonstrates superior accuracy in identifying bankrupt firms.

The remainder of this paper is organized as follows: Section "Northern goshawk optimization and the proposed methodology" introduces the original NGO algorithm and the proposed improvements in TIS_NGO. Section "Numerical experiments" presents the validation of TIS_NGO using the CEC2017 and CEC2022 benchmark suites. Section "Evaluate the proposed TIS_NGO for bankruptcy prediction problem" details the development of the TIS_NGO-KELM-based bankruptcy prediction model and comparative evaluations. Finally, Section "Summary and Prospect" concludes the study and discusses future research directions.

Northern goshawk optimization and the proposed methodology

Northern goshawk optimization (NGO)

Initialization

The Northern goshawk optimization (NGO) algorithm is a population-based metaheuristic method, similar in nature to other heuristic optimization algorithms. NGO begins by randomly generating a population of candidate solutions within the defined search space, as represented by:

$$\begin{array}{c}X={\left[\begin{array}{c}{X}_{1}\\ \vdots \\ {X}_{i}\\ \vdots \\ {X}_{n}\end{array}\right]}_{n\times d}={\left[\begin{array}{ccccc}{x}_{1,1}& \cdots & {x}_{1,j}& \cdots & {x}_{1,d}\\ \vdots & \ddots & \vdots & \ddots & \vdots \\ {x}_{i,1}& \cdots & {x}_{i,j}& \cdots & {x}_{i,d}\\ \vdots & \ddots & \vdots & \ddots & \vdots \\ {x}_{n,1}& \cdots & {x}_{n,j}& \cdots & {x}_{n,d}\end{array}\right]}_{n\times d}\end{array}$$
(1)

where \(X\) denotes the population of northern goshawks, \({X}_{i}\) represents the position of the \({i}^{th}\) individual, \({x}_{i,j}\) denotes the value of the \({j}^{th}\) decision variable of the \({i}^{th}\) goshawk, \(n\) is the population size, and \(d\) is the dimensionality of the optimization problem.

The initial positions of the goshawks are randomly initialized using the following equation:

$$\begin{array}{c}{X}_{i,j}=\left({ub}_{j}-{lb}_{j}\right)\times {r}_{1}+{lb}_{j}\end{array}$$
(2)

where \({X}_{i,j}\) is the initial value of the \({j}^{th}\) decision variable for the \({i}^{th}\) candidate solution, \({ub}_{j}\) and \({lb}_{j}\) are the upper and lower bounds, respectively, and \({r}_{1}\) is a uniformly distributed random number in the range [0,1].
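Eq. (2) amounts to scaling uniform random numbers into the box constraints; a minimal NumPy sketch (function name ours) is:

```python
import numpy as np

def init_population(n, d, lb, ub, seed=None):
    """Eq. (2): X[i, j] = (ub_j - lb_j) * r1 + lb_j, with r1 ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    lb = np.broadcast_to(np.asarray(lb, float), (d,))   # scalar or per-dimension bounds
    ub = np.broadcast_to(np.asarray(ub, float), (d,))
    return (ub - lb) * rng.random((n, d)) + lb
```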

Attack prey (exploration stage)

In the first stage of hunting, each northern goshawk randomly selects a prey and swiftly attacks it. Due to the random selection of prey within the search space, this phase enhances the exploration capability of the NGO algorithm. During this phase, the goshawks perform a global search with the goal of identifying the promising regions of the search space. The corresponding mathematical model is described by Eqs. (3) and (4)35:

$$\begin{array}{c}{X}_{i,j}^{new}=\left\{\begin{array}{ll}{X}_{i,j}+{r}_{2}\times \left({P}_{i,j}-I\times {X}_{i,j}\right), & {F}_{{P}_{i}}<{F}_{i}\\ {X}_{i,j}+{r}_{2}\times \left({X}_{i,j}-{P}_{i,j}\right), & {F}_{{P}_{i}}\ge {F}_{i}\end{array}\right.\end{array}$$
(3)
$$\begin{array}{c}{X}_{i}=\left\{\begin{array}{ll}{X}_{i}^{new}, & \text{if } {F}_{i}^{new}<{F}_{i}\\ {X}_{i}, & \text{else}\end{array}\right.\end{array}$$
(4)

where \({ X}_{i,j}^{new}\) denotes the updated position of the \({i}^{th}\) northern goshawk in the \({j}^{th}\) dimension during the first phase. The term \({r}_{2}\) is a randomly generated vector of dimension \(1\times d\) with values uniformly distributed in the range [0, 1]. \(I\) is a random integer value, either 1 or 2. \(P\) represents the position of the prey, \({F}_{{P}_{i}}\) and \({F}_{i}\) denote the fitness values of the prey and the goshawk, respectively. If the prey has a better fitness, the goshawk moves toward it; otherwise, it adjusts its position away from the prey. This mechanism promotes global exploration and reduces the risk of premature convergence.
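Under this minimization convention, the exploration phase of Eqs. (3) and (4) can be sketched as follows (an illustrative reconstruction; the in-place greedy update and the way the prey index is sampled are our assumptions):

```python
import numpy as np

def attack_prey(X, fit, objective, rng):
    """NGO exploration phase (Eqs. 3-4): move toward a randomly selected prey
    if it is fitter (minimization), otherwise away; accept only improvements."""
    n, d = X.shape
    for i in range(n):
        p = int(rng.integers(n - 1))
        p = p + 1 if p >= i else p                 # random prey index, p != i
        r2 = rng.random(d)                         # r2 ~ U(0, 1)^d
        I = int(rng.integers(1, 3))                # I in {1, 2}
        if fit[p] < fit[i]:
            x_new = X[i] + r2 * (X[p] - I * X[i])  # prey better: move toward it
        else:
            x_new = X[i] + r2 * (X[i] - X[p])      # otherwise: move away
        f_new = objective(x_new)
        if f_new < fit[i]:                         # Eq. (4): greedy selection
            X[i], fit[i] = x_new, f_new
    return X, fit
```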

Chase prey (exploitation stage)

In the prey pursuit phase, after the northern goshawk attacks the prey, the prey attempts to escape while the goshawk chases it. Due to the fast chasing speed of the goshawk, it can almost always pursue and eventually capture the prey under any circumstances. This behavior simulation enhances the local exploitation capability of the algorithm within the search space. The mathematical model for this phase is described by Eqs. (5) to (7)35:

$$\begin{array}{c}{X}_{i,j}^{new}={X}_{i,j}+R\times \left(2\times {r}_{3}-1\right){\times X}_{i,j}\end{array}$$
(5)
$$\begin{array}{c}R=0.02\times \left(1-\frac{t}{T}\right)\end{array}$$
(6)
$$\begin{array}{c}{X}_{i}=\left\{\begin{array}{ll}{X}_{i}^{new}, & \text{if } {F}_{i}^{new}<{F}_{i}\\ {X}_{i}, & \text{else}\end{array}\right.\end{array}$$
(7)

where \({r}_{3}\) is a vector of dimension \((1\times d)\) sampled from a normal distribution, \(t\) denotes the current iteration number, and \(T\) is the maximum number of iterations. \({X}_{i}^{\text{new}}\) represents the updated position of the \({i}^{th}\) northern goshawk after the prey pursuit phase, and \({F}_{i}^{new}\) is the corresponding fitness value. The parameter \(R\) decreases linearly over iterations to gradually reduce the step size, thus enabling the algorithm to perform fine-grained local search as it converges.
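The pursuit phase of Eqs. (5)-(7) can be sketched in the same style (again an illustrative reconstruction; following the text above, \(r_3\) is drawn from a normal distribution per dimension):

```python
import numpy as np

def chase_prey(X, fit, objective, t, T, rng):
    """NGO exploitation phase (Eqs. 5-7): local perturbation with radius
    R = 0.02 * (1 - t/T) shrinking linearly over the iterations."""
    R = 0.02 * (1 - t / T)                     # Eq. (6)
    for i in range(len(X)):
        r3 = rng.standard_normal(X.shape[1])   # normally distributed, per the text
        x_new = X[i] + R * (2 * r3 - 1) * X[i] # Eq. (5)
        f_new = objective(x_new)
        if f_new < fit[i]:                     # Eq. (7): keep only improvements
            X[i], fit[i] = x_new, f_new
    return X, fit
```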

Proposed Northern Goshawk optimization

Thinking innovation strategy (TIS)

In Metaheuristic Search (MHS), the algorithm evaluates performance by optimizing randomly generated positions within the search space and computing their fitness values, which guide subsequent optimization. However, MHS faces several challenges during design and implementation. On one hand, balancing algorithmic performance and computational complexity is critical, requiring a trade-off between optimal performance and solution accuracy. On the other hand, MHS algorithms generally have limited capability to intelligently recognize the current positional demands, lacking mechanisms to avoid redundant calculations. This can lead to multiple fitness evaluations at the same location, thereby increasing computational time39. Although the NGO algorithm exhibits strong exploratory capabilities, the complexity of the search space still poses risks of premature convergence to local optima.

To address these issues, this paper introduces a novel Thinking Innovation Strategy (TIS) to enhance NGO. TIS improves the overall performance and individual knowledge of the algorithm by balancing exploration and exploitation through enhanced exploratory capabilities. Specifically, TIS leverages creative thinking to boost the algorithm’s ability to identify the demands of the current position, enabling more accurate assessments of the positional value and improving search efficiency. Additionally, TIS periodically updates information of successful individuals to reduce repeated fitness evaluations at the same locations, effectively preventing redundant computations and reducing computational complexity. Moreover, TIS employs random critical thinking to ensure efficient utilization of information within the search range during exploitation, while maintaining optimization stability and diversity. Individuals within TIS primarily rely on their Depth of Knowledge (DOK) to search for solutions40. DOK acts as a comprehensive cognitive resource pool supporting critical thinking, and consists of two main layers:

  1. \(DOK1\): knowledge and experience accumulated through personal growth.

  2. \(DOK2\): continuously acquired and stored natural phenomena information.

Formally, DOK can be expressed as \(DOK=\{DOK1,DOK2\}\). This structured representation highlights how DOK enables individuals, through strong cognitive abilities, to effectively solve both traditional simple problems and novel complex challenges. The calculation of DOK is given by Eq. (8):

$$\begin{array}{c}DOK1=C+{\left(\frac{FEs}{MaxFEs}\right)}^{0.5}\\ DOK2={FEs}^{10}\\ DOK=DOK1+DOK2\end{array}$$
(8)

where \(DOK1\) represents knowledge and experience accumulated over time, \(DOK2\) denotes continuously acquired information, \(C\) is a constant equal to 0.5, \(FEs\) is the current number of fitness evaluations, and \(MaxFEs=T\times n\) denotes the maximum number of evaluations (with \(T\) as maximum iterations and \(n\) as population size).

Imagination (IM) is defined as the ability to coordinate perceived materials in novel ways to create new mental images. Individual imagination is crucial for innovative thinking and stems from critically evaluating prior knowledge and experience, alongside comprehensive observation and understanding of the subject. Cognitively, it integrates multiple dimensions through intuition, imagery, or combination, unconstrained by existing conclusions, viewpoints, frameworks, or theories, thereby generating new ideas. The computation of IM is expressed in Eq. (9)40:

$$\begin{array}{c}IM=\pi \times IE\times rand\end{array}$$
(9)

where \(rand\) is a random number in the interval [0,1], and \(IE\) (Information Event) serves as an important basis for identifying successful individuals. During the initial phase of the algorithm, a successful individual is randomly selected and stored in \(IE\). As the algorithm progresses, a survival-of-the-fittest mechanism retains outstanding successful individuals, which provide vital guidance for subsequent innovative thinking.

Innovative thinking is rooted in the interplay of imagination, heredity, talent, IQ, and cognition, evolving into a form of original thought emphasizing novelty, divergence, and appropriateness. Critical thinking encourages questioning of current information, fostering new scientific cognition, imagination, and inspiration. When individuals aim to generate novel and valuable outcomes, they often combine or transform their depth of knowledge with rich imagination to innovate. This process effectively guides individuals toward multidirectional thinking, producing diverse outcomes and updated perspectives. The formula for innovative thinking is shown in Eq. (10)40:

$$\begin{array}{c}{X}_{new}=\text{tan}\left(IM-0.5\times \pi \right)+\left(\frac{X}{DOK}+IE\right)\end{array}$$
(10)
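Putting Eqs. (8)-(10) together, a candidate TIS position update can be sketched as follows (our interpretation of the formulas above; how the successful individual \(IE\) is selected and refreshed over the run is left outside this sketch):

```python
import numpy as np

def tis_position(X_i, IE, FEs, MaxFEs, rng, C=0.5):
    """Eqs. (8)-(10): combine depth of knowledge (DOK) and imagination (IM)
    with a stored successful individual IE to propose a new position."""
    DOK1 = C + (FEs / MaxFEs) ** 0.5       # Eq. (8): accumulated experience
    DOK2 = float(FEs) ** 10                # Eq. (8): stored information
    DOK = DOK1 + DOK2
    IM = np.pi * IE * rng.random()         # Eq. (9): imagination term
    return np.tan(IM - 0.5 * np.pi) + (X_i / DOK + IE)   # Eq. (10)
```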

Prey attack strategy based on differential evolution operation

In the prey-attacking phase of the Northern Goshawk Optimization (NGO) algorithm, the population update mechanism is relatively simple and fails to fully exploit population diversity and search capabilities. This simplistic update strategy may limit the adaptability of search direction and step size, making the algorithm prone to premature convergence and reducing overall search efficiency41,42,43. To address this limitation, a prey-attacking strategy based on the Differential Evolution (DE) mechanism is proposed, as illustrated in Fig. 1. This strategy significantly enhances the diversity of search directions by randomly selecting three distinct individuals from the population to perform differential operations. Unlike the traditional update mechanism that relies solely on the current position \(X(t)\) and a fixed parameter, the proposed method utilizes information from different individuals within the population44,45,46. By introducing differential operations, the algorithm is capable of exploring a broader search space and effectively avoiding local optima. The proposed strategy is implemented via Eq. (11):

$$\begin{array}{c}{X}_{i,j}^{new}={X}_{r1}+F\times \left({X}_{r2}-{X}_{r3}\right)\end{array}$$
(11)

where \({X}_{r1}\), \({X}_{r2}\) and \({X}_{r3}\) are three randomly selected and mutually distinct individuals, and \(F\) denotes the scaling factor (step size) that controls the magnitude of the differential variation. This allows for localized search around individual members and enables the algorithm to dynamically adjust the search step size based on individual differences within the population. As a result, the strategy enhances global exploration ability, increases the likelihood of identifying superior solutions in complex search spaces, and accelerates convergence. This improved mechanism substantially enhances the robustness and efficiency of NGO when tackling complex optimization problems.
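Eq. (11) is the classical DE/rand/1 mutation; a minimal sketch (function name ours) is:

```python
import numpy as np

def de_attack(X, i, F=0.5, rng=None):
    """Eq. (11): DE/rand/1 mutation X_new = X_r1 + F * (X_r2 - X_r3), with
    r1, r2, r3 mutually distinct and different from the current index i."""
    rng = np.random.default_rng(rng)
    candidates = np.array([k for k in range(len(X)) if k != i])
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return X[r1] + F * (X[r2] - X[r3])
```

Note that when the population collapses to a single point the difference term vanishes, so the step size adapts automatically to the population's spread.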

Fig. 1 Illustration of differential evolution operation.

Centroid reverse learning boundary control strategy

In swarm intelligence algorithms, boundary control plays a critical role by defining the limits of the search space, thereby preventing inefficient searches, premature convergence, and entrapment in local optima45. These issues often arise when the search process deviates from feasible solutions or exceeds problem constraints. If the boundaries of the search space are set too narrowly, the algorithm’s ability to thoroughly explore the solution space may be hindered, increasing the risk of falling into local optima47. Therefore, the design of an appropriate boundary correction mechanism is essential to enhance the algorithm’s capability to discover the global optimum. Such a mechanism helps prevent ineffective exploration within the search space and reduces time and resource consumption on infeasible solutions. In certain situations, individuals that exceed the boundaries may restrict the search process and limit the utilization of valuable information carried by other members of the population48.

To address this issue and maximize the use of positional information from all individuals, a boundary control strategy based on centroid opposition-based learning (COBL) is proposed, as illustrated in Fig. 2. This strategy corrects the positions of out-of-bound individuals by mapping them back into the valid search space, thus avoiding ineffective exploration and accelerating convergence. The correction process is formulated by Eq. (12):

$$\begin{array}{c}{X}_{i,j}^{new}=\left\{\begin{array}{ll}\frac{lb+{X}_{C}}{2}, & \text{if } {X}_{i,j}^{new}<lb\\ \frac{ub+{X}_{C}}{2}, & \text{if } {X}_{i,j}^{new}>ub\end{array}\right.\end{array}$$
(12)

where \({X}_{C}\) denotes the centroid of the current population.
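Because the centroid lies inside the feasible box, the midpoint in Eq. (12) is always feasible as well; a minimal component-wise sketch (names ours) is:

```python
import numpy as np

def cobl_repair(x, lb, ub, X_pop):
    """Eq. (12): replace each out-of-bound component by the midpoint between
    the violated bound and the population centroid X_C."""
    Xc = X_pop.mean(axis=0)                              # centroid of the population
    lb = np.broadcast_to(np.asarray(lb, float), x.shape)
    ub = np.broadcast_to(np.asarray(ub, float), x.shape)
    out = np.where(x < lb, (lb + Xc) / 2, x)             # repair lower violations
    out = np.where(out > ub, (ub + Xc) / 2, out)         # repair upper violations
    return out
```

Unlike simple clipping to the bound, this mapping reuses positional information from the whole population instead of piling repaired individuals on the boundary.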

Fig. 2 Schematic diagram of the centroid reverse learning boundary control operation.

In summary, the pseudocode of TIS_NGO is presented in Algorithm 1.

Algorithm 1: Pseudo-code of TIS_NGO.

Computational time complexity

Computational efficiency matters as much as solution quality: for many optimization tasks, a method must deliver strong results within acceptable runtime. Time complexity describes how the running time grows with the input size, and analyzing it helps estimate the computational cost of an optimizer on large-scale problems. In the standard NGO, initializing the control parameters and the population requires \(O(N\times dim)\) operations, where \(N\) is the population size and \(dim\) is the problem dimensionality. Over \(T\) iterations, updating the positions of all individuals requires \(O(T\times N\times dim)\) operations, so the overall complexity of NGO is \(O(T\times N\times dim)\). The proposed TIS_NGO modifies only the position-update strategy and the objective-function evaluation scheme, introducing no additional complexity terms, so its time complexity remains \(O(T\times N\times dim)\).

Numerical experiments

In this section, the effectiveness of the proposed TIS_NGO is validated through comparative experiments with other algorithms on the CEC2017 benchmark functions49 and the CEC2022 test suite.

Ablation experiment

To evaluate the individual contributions and synergistic effects of three enhancement strategies—Thinking Innovation Strategy (S1), Prey Attack Strategy Based on Differential Evolution Operation (S2), and Centroid Reverse Learning Boundary Control Strategy (S3)—on the baseline NGO algorithm, this section conducts ablation experiments using the CEC 2017 test benchmark (with dimension d = 30). Four comparative variants are designed: NGO_S1 (with S1 only), NGO_S2 (with S2 only), NGO_S3 (with S3 only), and TIS_NGO, which integrates all three strategies. The experimental results are presented in Figs. 3 and 4.

Fig. 3 Convergence curves of NGO with different strategy improvements.

Fig. 4 Average rankings of different NGO variant algorithms on test functions.

As observed from the convergence curves in Fig. 3, on typical test functions such as CEC2017-F1, F5, and F7, all single-strategy enhanced variants (NGO_S1, NGO_S2, NGO_S3) outperform the original NGO in convergence performance. Among them, NGO_S2 (with S2 only) demonstrates stronger global exploration capability in the early iterations, converging faster than NGO_S1 (S1 only) and NGO_S3 (S3 only). In contrast, TIS_NGO, through the synergy of the three strategies, not only rapidly approaches the optimal solution region in the early stages but also maintains stable local search accuracy in later phases. Its convergence curve remains the lowest throughout, significantly outperforming both the single-strategy variants and the original NGO, validating that the multi-strategy integration enhances the algorithm’s balance between exploration and exploitation.

The average ranking results in Fig. 4 further quantify this advantage: the original NGO has an average ranking of 3.93. Among the three single-strategy variants, NGO_S3 achieves the best average ranking (2.73), followed by NGO_S1 (3.13) and NGO_S2 (3.70). In comparison, TIS_NGO achieves a markedly lower average ranking of only 1.50, far surpassing all other variants. This indicates a significant synergistic effect among the three strategies: S1 improves information utilization efficiency through Depth of Knowledge (DOK) and Imagination (IM), S2 enhances search direction diversity via differential evolution, and S3 reduces ineffective search through boundary control. Their combination effectively addresses the slow convergence and tendency to fall into local optima observed in the original NGO, thereby establishing a solid foundation for the superior performance of TIS_NGO.

Algorithm parameter settings

This section evaluates the performance of the proposed TIS_NGO using the most challenging benchmark suite from the CEC2017 numerical optimization competition, and compares it with several state-of-the-art algorithms. The comparison algorithms include Particle Swarm Optimization (PSO)50, Differential Evolution (DE)51, Grey Wolf Optimizer (GWO)52, Whale Optimization Algorithm (WOA)53, Dung Beetle Optimizer (DBO)54, Black-winged Kite Algorithm (BKA)55, Animated Oat Optimization (AOO)56, and Northern Goshawk Optimization (NGO)35. The specific parameter settings for each algorithm are listed in Table 1. To ensure fair comparison and reduce the effect of randomness, the population size was uniformly set to 30 and the maximum number of iterations was fixed at 500 for all algorithms, and each algorithm was independently executed 30 times. The performance metrics, including the average value (Ave), standard deviation (Std), and ranking (Rank), were recorded, with the best results highlighted in bold. All experiments were conducted on a computer running Windows 10 with a 13th Gen Intel(R) Core(TM) i5-13400 CPU @ 2.5 GHz and 16 GB of RAM, using MATLAB R2022b as the software platform.

Table 1 Parameter settings of the compared algorithms.
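For reference, the per-function "Rank" values and their averages recorded in the experiments can be reproduced from the mean errors with a simple helper (a hypothetical sketch; tie handling is omitted):

```python
import numpy as np

def average_ranks(mean_errors):
    """Rank the algorithms on each function by mean error (1 = best, ties not
    handled) and average the per-function ranks, as in the 'Rank' columns."""
    mean_errors = np.asarray(mean_errors)      # shape: (n_functions, n_algorithms)
    ranks = mean_errors.argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)
```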

Experimental results and analysis of CEC2017 and CEC2022 test suite

The performance of TIS_NGO is evaluated in this section using two benchmark suites: CEC2017 and CEC2022. To comprehensively assess its effectiveness, experiments were conducted on the CEC2017 benchmark with 30 dimensions (d = 30) and the CEC2022 benchmark with 20 dimensions (d = 20). The experimental results are summarized in Tables 2 and 3, which report the average value (Ave), standard deviation (Std), and the ranking (Rank) for each algorithm. The best results are highlighted in bold. The convergence curves are illustrated in Fig. 5, demonstrating the convergence speed of each algorithm.

Table 2 Experimental results of CEC2017 test suite.
Table 3 Experimental results of CEC2022 test suite.
Fig. 5

Iteration convergence curves for a subset of the test functions.

As illustrated in the tables, TIS_NGO demonstrates superior performance in terms of both average fitness values and standard deviations across the CEC2017 and CEC2022 benchmark sets, outperforming the standard NGO and other comparative algorithms. Specifically, TIS_NGO achieves the best results on 23 out of 30 functions in the CEC2017 test suite, with only 7 functions (F7, F15, F17, F19, F22, F26, and F27) not being the best. In contrast, the standard NGO yields the best performance on only 5 functions (F15, F17, F19, F22, and F27). On the CEC2022 benchmark suite, which includes 12 functions, TIS_NGO obtains the best average fitness values on 7 functions (F2–F5, F8, F9, and F11), while the standard NGO achieves the best results on only F6, F10, and F12. In addition, TIS_NGO consistently achieves the lowest standard deviations (highlighted in bold), further demonstrating its superior robustness and stability over the standard NGO and other peer algorithms.

As shown in the average convergence curves (Fig. 5), the average fitness values of all algorithms tend to decrease as the number of iterations increases, indicating their ability to converge toward better solutions. TIS_NGO exhibits exceptional performance on various types of optimization problems, including unimodal, multimodal, hybrid, and composite functions.

Specifically, on unimodal functions such as CEC2017-F1, TIS_NGO demonstrates a significantly faster convergence rate in the early stages compared to the baseline algorithms, and it continues to improve solution quality in later iterations, reflecting its high search efficiency. In contrast, standard NGO, GWO, WOA, and PSO exhibit slower convergence behavior. For multimodal functions with numerous local optima (e.g., CEC2017-F9 and CEC2022-F5), TIS_NGO achieves rapid global search, effectively approaching the optimal region in early iterations with large steps, while avoiding premature convergence—a common issue in PSO and WOA.

In the case of hybrid and composite functions, such as CEC2017-F30, TIS_NGO also demonstrates outstanding global exploration and local exploitation capabilities. While comparison algorithms often suffer from stagnation or getting trapped in local optima, TIS_NGO, empowered by its three synergistic enhancement strategies, dynamically adjusts its search behavior throughout the optimization process, enabling continuous progress toward global optima. This advantage allows TIS_NGO to maintain high optimization efficiency, even when tackling complex, multimodal, or high-dimensional problems.

Furthermore, to better visualize the performance of each algorithm across the test functions, radar charts were employed to illustrate the ranking distribution of each algorithm on individual test functions, as shown in Fig. 6. Additionally, the average rankings of all algorithms on the CEC2017 and CEC2022 benchmark suites were computed and are presented in Fig. 7.

Fig. 6

Ranking distribution of different algorithms.

Fig. 7

Average ranking of different algorithms.

As shown in Fig. 6, the proposed TIS_NGO algorithm exhibits the smallest enclosed area on the radar charts for both the CEC2017 and CEC2022 test functions, with rankings predominantly concentrated within the top two positions and no worst-case rankings. This clearly demonstrates the effectiveness of the TIS_NGO algorithm, indicating strong competitiveness across the tested functions, significantly outperforming recently proposed algorithms such as DBO and AOO. In contrast, the ranking distribution of the standard NGO algorithm mainly falls between the third and eighth positions, substantially lower than that of TIS_NGO. This further validates the effectiveness of the three improvement strategies proposed in this study, which enhance both the convergence speed and accuracy of the algorithm, establishing it as a formidable contender in the field of optimization algorithms.

According to the average rankings shown in Fig. 7, the TIS_NGO algorithm achieved average ranks of 1.30 and 1.42 on the CEC2017 and CEC2022 test suites, respectively, far surpassing the standard NGO algorithm's averages of 3.30 and 3.17. Notably, the recently proposed AOO algorithm attained average ranks of 3.23 and 3.25, outperforming most previously introduced algorithms, yet still falling short of TIS_NGO's performance.

The comparative analysis of experimental data strongly corroborates the significant effectiveness of the three major improvement strategies proposed herein. These strategies not only enable the TIS_NGO algorithm to approach the global optimum more rapidly during iterations but also substantially improve solution accuracy in tackling complex problems, reduce result variability, and ensure stable and efficient algorithm performance under diverse conditions.

Runtime comparison analysis of TIS_NGO and NGO

Building on the preceding experimental results, which show that TIS_NGO delivers markedly better overall performance than the original NGO, this subsection examines the difference in runtime between the two algorithms on the CEC2017 suite. For a fair comparison, TIS_NGO and NGO used identical parameter settings, the same as those adopted in the previous analyses, and runtimes were averaged over 30 independent runs. Figure 8 shows the average time (in seconds) consumed by each algorithm on the test functions.

Fig. 8

The average runtime of TIS_NGO and NGO on the test functions.

As can be observed from the data distribution in the figure, the runtime of TIS_NGO is slightly higher than that of the original NGO on most test functions. For instance, on some functions, the average runtime of NGO ranges between approximately 0.29–0.36 s, while that of TIS_NGO reaches 0.30–0.49 s. In a few specific cases (e.g., where NGO takes 0.30 s and TIS_NGO requires 0.64 s), the difference in runtime is more noticeable. Nevertheless, the overall time cost remains within the same order of magnitude, with no exponential increase observed.

This marginal increase in computational time is primarily attributed to the three enhanced strategies integrated into TIS_NGO: the Thinking Innovation Strategy (TIS), which relies on calculating Depth of Knowledge (DOK) and Imagination (IM) to improve information utilization efficiency; the Differential Evolution-based Prey Attack Strategy (S2), which involves differential operations among three randomly selected individuals to diversify search directions; and the Centroid Reverse Learning Boundary Control Strategy (S3), which requires computing the population centroid and correcting out-of-bound individuals. Although these additional operations introduce extra computational overhead, a theoretical time complexity analysis shows that TIS_NGO does not introduce higher-order complexity factors; its time complexity remains the same as that of the original NGO, namely \(O(T \times N \times dim)\). Thus, the runtime difference between the two algorithms remains within an acceptable range.

From a practical standpoint, the slight increase in runtime is well justified by the significant performance improvements offered by TIS_NGO. Previous experiments have demonstrated that TIS_NGO substantially outperforms the original NGO in convergence speed (e.g., approaching the optimal solution within 20 iterations on CEC2017-F1, compared to over 40 iterations for NGO), solution accuracy (achieving optimal results in 23 out of 30 CEC2017 test functions), and robustness (exhibiting significantly lower standard deviation). Moreover, the modest additional computational cost does not exceed the real-time requirements of practical applications such as bankruptcy prediction in the financial domain. Therefore, the results in Fig. 8 further confirm the comprehensive advantage of TIS_NGO: it achieves a leap in optimization performance through multi-strategy integration while maintaining computational efficiency, effectively balancing both practical applicability and high performance.

Friedman mean rank test

In this subsection, the Friedman test57 is used to determine the overall ranking of the TIS_NGO algorithm relative to other methods. As a nonparametric approach, the Friedman test is suitable for comparing median performance differences across three or more matched groups. It is particularly well-suited to repeated-measures or block designs and is often employed as a robust alternative to ANOVA when the normality assumption is violated. The Friedman test statistic is calculated according to Eq. (13).

$$\begin{array}{c}Q=\frac{12}{nk\left(k+1\right)}\sum_{j=1}^{k}{R}_{j}^{2}-3n\left(k+1\right)\end{array}$$
(13)

where \(n\) is the number of blocks, \(k\) is the number of groups, and \({R}_{j}\) is the rank sum for the \(j\)-th group. When \(n\) and \(k\) are large, \(Q\) approximately follows a \({\chi }^{2}\) distribution with \(k-1\) degrees of freedom.
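As a concrete illustration of Eq. (13), the statistic can be computed directly from a matrix of per-block results. The following Python sketch (the experiments themselves used MATLAB) works on a hypothetical score matrix; ties are not handled.

```python
import numpy as np

def friedman_q(scores):
    """Friedman statistic of Eq. (13).

    scores[i, j] = result of algorithm j on block i (lower is better).
    Ties are not handled in this sketch.
    """
    n, k = scores.shape
    # Rank algorithms within each block: 1 = best, k = worst.
    ranks = np.argsort(np.argsort(scores, axis=1), axis=1) + 1
    rank_sums = ranks.sum(axis=0)                 # rank sums R_j per algorithm
    return 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3.0 * n * (k + 1)

# Hypothetical results: 3 blocks (test functions), 3 algorithms.
scores = np.array([[0.1, 0.5, 0.9],
                   [0.2, 0.6, 0.8],
                   [0.3, 0.4, 0.7]])
q = friedman_q(scores)   # compare against chi^2 with k - 1 degrees of freedom
```

Here every block ranks the algorithms identically, so `q` attains its maximum for \(n=3\), \(k=3\).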

Table 4 presents the Friedman average rank test results of TIS_NGO and other algorithms on the CEC2017 (30-dimensional) and CEC2022 (20-dimensional) test suites, which quantify the overall performance differences among multiple algorithms.

Table 4 Friedman mean rank test result.

The results show that TIS_NGO ranks first on both test sets, with an average rank of 1.30 on CEC2017 and 1.42 on CEC2022, significantly outperforming the second-ranked AOO (average ranks 3.23/3.25) and the third-ranked original NGO (3.30/3.17). In contrast, other algorithms such as WOA rank last (average ranks 8.93/8.83), while classical algorithms including PSO, DE, and GWO, as well as newer algorithms like DBO and BKA, all exhibit average ranks above 4.60, far behind TIS_NGO. These results statistically confirm that TIS_NGO’s optimization performance advantage is significant and robust across varying problem dimensions and complexities, providing a reliable foundation for its subsequent application in optimizing KELM parameters and improving bankruptcy prediction performance.

Evaluate the proposed TIS_NGO for bankruptcy prediction problem

KELM mathematical model

Kernel Extreme Learning Machine (KELM) is an extension of the traditional Extreme Learning Machine (ELM) based on kernel functions, designed to address classification and regression tasks. It improves upon the conventional ELM by incorporating kernel methods, thereby extending its applicability to nonlinear data and further enhancing the model’s generalization capability and learning speed58.

The essence of ELM lies in determining a unique optimal solution by randomly initializing the input weights and hidden layer biases of a single-hidden-layer feedforward neural network (SLFN), rather than through iterative tuning. Given \(n\) training samples \(({x}_{i},{t}_{i})\), where \({x}_{i}\in {R}^{n}\) is a training feature vector and \({t}_{i}\in {R}^{m}\) is the corresponding target vector, together with \(K\) hidden neurons and an activation function \(g\left(x\right)\), the output of ELM is determined by a linear system. The mathematical model of SLFNs is defined as follows59,60:

$$\begin{array}{c}{O}_{j}= \sum_{i=1}^{K}{\beta }_{i}g\left({w}_{i}\times {x}_{j}+{b}_{i}\right), \quad j=\text{1,2},\dots ,n \end{array}$$
(14)

where \({O}_{j}\) is the output vector for the \({j}^{th}\) input, \({\beta }_{i}\) is the output weight vector connecting the \({i}^{th}\) hidden neuron to the output layer, \({w}_{i}\) is the input weight vector connecting the input layer to the \({i}^{th}\) hidden neuron, \({w}_{i}\times {x}_{j}\) denotes the inner product of \({w}_{i}\) and \({x}_{j}\), and \(g\left({w}_{i}\times {x}_{j}+{b}_{i}\right)\) is the output of the \({i}^{th}\) hidden neuron for the \({j}^{th}\) input.

The training of ELM aims to minimize the training error. When the SLFN can fit the \(n\) training samples with zero error, i.e., \({\sum }_{j=1}^{n}\Vert {t}_{j}-{o}_{j}\Vert =0\), there exist suitable \({\beta }_{i}\), \({w}_{i}\), and \({b}_{i}\) such that:

$$\begin{array}{c}{t}_{j}={o}_{j}= \sum_{i=1}^{K}{\beta }_{i}g\left({w}_{i}\times {x}_{j}+{b}_{i}\right), \quad j=\text{1,2},\dots ,n \end{array}$$
(15)
$$\begin{array}{c}T=h\left(x\right)\beta =H\times \beta \end{array}$$
(16)

where \(T = [t_{1} ,\;t_{2} ,\;\ldots,\;t_{n} ]^{T}\), \(b = [b_{1} ,\;b_{2} ,\;\ldots,\;b_{K} ]^{T}\), and \(H\) is the hidden layer output matrix defined by the feature mapping function \(h(x)\), which maps data from the input space to a \(K\)-dimensional feature space:

$$\begin{array}{c}H=h\left(x\right)={\left[\begin{array}{ccc}g\left({w}_{1}\times {x}_{1}+{b}_{1}\right)& \cdots & g\left({w}_{K}\times {x}_{1}+{b}_{K}\right)\\ \vdots & \ddots & \vdots \\ g\left({w}_{1}\times {x}_{n}+{b}_{1}\right)& \cdots & g\left({w}_{K}\times {x}_{n}+{b}_{K}\right)\end{array}\right]}_{n\times K}\end{array}$$
(17)

In ELM training, the input weights \({w}_{i}\) and biases \({b}_{i}\) are randomly assigned and fixed without adjustment, and the output weights \(\beta\) are obtained analytically. To compute \(\beta\), the following rule is applied:

$$\begin{array}{c}{\beta }{\prime}={H}^{+}\times T\end{array}$$
(18)

where \({H}^{+}\) denotes the Moore–Penrose generalized inverse of \(H\), calculated by orthogonal projection as \({H}^{+} = {H}^{T}\left(H{H}^{T}\right)^{-1}\). The Moore–Penrose inverse ensures the minimum-norm least-squares solution, leading to significantly enhanced learning speed and strong generalization.
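The training procedure of Eqs. (14)-(18) can be sketched in a few lines; the Python code below (the paper's experiments used MATLAB) assumes synthetic data and a sigmoid activation purely for illustration.

```python
import numpy as np

# Sketch of ELM training: random, fixed input weights and biases, then
# analytic output weights via the Moore-Penrose pseudoinverse (Eq. 18).
rng = np.random.default_rng(0)
n, d, K = 50, 5, 20                      # samples, input dim, hidden neurons
X = rng.standard_normal((n, d))          # training inputs x_j (synthetic)
T = rng.standard_normal((n, 1))          # targets t_j (synthetic)

W = rng.standard_normal((d, K))          # input weights w_i (never tuned)
b = rng.standard_normal(K)               # hidden biases b_i (never tuned)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden layer output matrix (n x K)

beta = np.linalg.pinv(H) @ T             # beta' = H^+ T: minimum-norm LS solution
O = H @ beta                             # network outputs o_j
```

Because `beta` is the least-squares solution, the residual `T - O` is orthogonal to the columns of `H`, which is easy to verify numerically.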

Kernel-based ELM (KELM), proposed by Huang et al.61, further enhances the generalization capability of ELM by incorporating kernel methods, outperforming least-squares-based ELM. In KELM, a regularization term \(I/C\) is added to the diagonal of \(H{H}^{T}\) when calculating the output weights, given by:

$$\begin{array}{c}\beta ={H}^{T}{\left(\frac{I}{C}+H{H}^{T}\right)}^{-1}T\end{array}$$
(19)

where \(C\) is a regularization parameter and \(I\) is the identity matrix. Accordingly, the output function can be defined as:

$$\begin{array}{c}F\left(x\right)={h\left(x\right)H}^{T}{\left(\frac{I}{C}+H{H}^{T}\right)}^{-1}T\end{array}$$
(20)

The kernel matrix for ELM is obtained by:

$$\begin{array}{c}{\Omega }_{ELM}=H{H}^{T}:{\Omega }_{{ELM}_{i,j}}=h\left({x}_{i}\right)\cdot h\left({x}_{j}\right)=K\left({x}_{i},{x}_{j}\right)\end{array}$$
(21)

where \(K\left({x}_{i},{x}_{j}\right)\) is the kernel function. The output function can then be rewritten as:

$$\begin{array}{c}F\left(x\right)={\left[\begin{array}{c}K\left(x,{x}_{1}\right)\\ \dots \\ K\left(x,{x}_{n}\right)\end{array}\right]}^{T}\times {\left(\frac{I}{C}+{\Omega }_{ELM}\right)}^{-1}T\end{array}$$
(22)

KELM leverages kernel methods to deliver higher stability and generalization performance compared to traditional ELM. Figure 9 briefly illustrates the KELM structure, where the kernel function replaces the conventional feature mapping function to transform inputs into the feature space. This implies that the neural network output no longer depends on explicit hidden-layer feature mappings; rather, it is explicitly determined by the kernel function, and the dimensionality of the feature space is no longer pre-defined.

Fig. 9

The schematic architecture of the KELM model.

In this study, the Gaussian Radial Basis Function (RBF) is employed as the kernel function for KELM, which is defined as follows62:

$$\begin{array}{c}K\left(u,v\right)=exp\left(-\gamma {\Vert u-v\Vert }^{2}\right)\end{array}$$
(23)

The penalty parameter \(C\) and the kernel parameter \(\gamma\) play crucial roles in the model construction. The penalty parameter \(C\) controls the trade-off between minimizing the fitting error and the complexity of the model. The kernel parameter \(\gamma\) defines the nonlinear mapping from the input space to a specific high-dimensional hidden feature space. Typically, to enhance the performance of KELM, these two key parameters can be appropriately selected through the use of suitable optimization algorithms.
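Combining Eqs. (19)-(23), a minimal KELM classifier can be sketched as follows. This Python sketch (not the paper's MATLAB implementation) uses toy two-class data and the illustrative, untuned choices \(C=10\) and \(\gamma=1\).

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2), Eq. (23)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C=10.0, gamma=1.0):
    """Solve alpha = (I/C + Omega_ELM)^(-1) T, cf. Eqs. (19) and (22)."""
    omega = rbf_kernel(X, X, gamma)          # Omega_ELM = H H^T, Eq. (21)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(Xnew, X, alpha, gamma=1.0):
    """F(x) = [K(x, x_1) ... K(x, x_n)] alpha, Eq. (22)."""
    return rbf_kernel(Xnew, X, gamma) @ alpha

# Toy two-class problem (labels +1 / -1); predict by the sign of F(x).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
T = np.r_[np.ones(20), -np.ones(20)]
alpha = kelm_fit(X, T, C=10.0, gamma=1.0)
pred = np.sign(kelm_predict(X, X, alpha, gamma=1.0))
```

Note that the fit step solves one \(n \times n\) linear system, so no hidden-layer size has to be chosen; only \(C\) and \(\gamma\) remain to be tuned, which is exactly what TIS_NGO is used for in the next subsection.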

KELM improved by TIS_NGO

The penalty parameter \(C\) and kernel parameter \(\gamma\) are central to KELM's bankruptcy prediction performance. \(C\) balances fitting error against model complexity: values that are too small (\(C<10\)) cause underfitting on high-dimensional financial data (e.g., the 30 ratios in the Wieslaw dataset), while values that are too large (\(C>50\)) lead to overfitting on financial noise. \(\gamma\) governs the strength of the nonlinear mapping: too small (\(\gamma<1\)) weakens the distinction between similar enterprises, while too large (\(\gamma>10\)) overreacts to financial outliers. For tuning, a reasonable starting point is \(C\in[10,50]\) and \(\gamma\in[1,10]\); \(C\) can be widened to \([10,80]\) for noisy data, or \(\gamma\) narrowed to \([1,5]\) for high-dimensional data (e.g., with non-financial indicators), to match the characteristics of the financial data63,64.

In this section, we propose an improved KELM model named TIS_NGO-KELM. This model incorporates the previously proposed TIS_NGO algorithm to adaptively determine the two key parameters of KELM: the penalty parameter \(C\) and the kernel parameter \(\gamma\). The ranges of the optimization parameters used in this study are listed in Table 5. The model mainly consists of two procedures: the first procedure optimizes parameters within the internal loop, while the second procedure evaluates the classification performance in the external loop. During the parameter optimization process, TIS_NGO is applied to dynamically adjust the KELM parameters, and the optimal parameters obtained are integrated into the KELM classification model to perform the classification task. The \(fitness\) is designed by considering the classification accuracy as follows:

$$\begin{array}{c}fitness=avgAcc=\frac{1}{K}\sum_{i=1}^{K}test{Acc}_{i}\end{array}$$
(24)

where \(avgAcc\) represents the average testing accuracy obtained by the KELM classifier through the inner cross-validation performed during the parameter optimization process.

Table 5 Parameter range setting.

The overall framework of the bankruptcy prediction model based on TIS_NGO-KELM is illustrated in Fig. 10. To prevent information leakage and ensure reproducibility, this study adopts a nested cross-validation scheme (tenfold outer, fivefold inner), where the outer loop evaluates model generalization performance and the inner loop optimizes hyperparameters (see Fig. 10). First, the Wieslaw dataset (240 samples) and the JPNdata dataset (Japanese firm samples) are partitioned using stratified sampling to maintain the original class proportions of bankrupt vs. non-bankrupt firms (e.g., the Wieslaw dataset preserves its 112:128 ≈ 0.875:1 ratio), thereby avoiding class-distribution shift caused by random splitting.

In the outer tenfold cross-validation (performance evaluation layer), each dataset is divided into 10 mutually exclusive subsets (Fold 1–Fold 10), with each Wieslaw subset containing 24 samples. In each outer iteration, one subset is used as the test set while the remaining nine subsets form the outer training set. The test set is used solely for final performance evaluation and does not participate in any model training or hyperparameter tuning, fundamentally preventing information leakage.

For each outer training set, the inner fivefold cross-validation (hyperparameter optimization layer) further partitions it into five non-overlapping sub-folds (Subfold 1–Subfold 5). In each inner iteration, one sub-fold is treated as the validation set and the remaining four as the inner training set. Models such as TIS_NGO-KELM, XGBoost, and others are trained on the inner training set, and hyperparameters (such as the number of iterations in TIS_NGO and the penalty parameter \(C\) and kernel parameter \(\gamma\) of KELM) are adjusted based on validation performance. After completing the five validation rounds, the optimal hyperparameter configuration for that outer loop is determined. The model is then retrained on the full outer training set using the selected hyperparameters and evaluated on the corresponding outer test set.
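The nested scheme can be sketched as follows; this Python skeleton replaces TIS_NGO-KELM with a hypothetical stand-in `score_fn` (a one-feature threshold classifier), and implements stratification as a simple per-class round-robin assignment.

```python
import numpy as np

def stratified_folds(y, n_splits, seed=0):
    """Assign each sample to a fold while preserving class proportions."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % n_splits   # round-robin per class
    return folds

def nested_cv(X, y, param_grid, score_fn, outer=10, inner=5):
    outer_folds = stratified_folds(y, outer)
    results = []
    for k in range(outer):
        tr, te = outer_folds != k, outer_folds == k   # outer split
        inner_folds = stratified_folds(y[tr], inner, seed=k)
        # Inner loop: pick hyperparameters on validation folds only.
        best = max(param_grid, key=lambda p: np.mean(
            [score_fn(X[tr][inner_folds != j], y[tr][inner_folds != j],
                      X[tr][inner_folds == j], y[tr][inner_folds == j], p)
             for j in range(inner)]))
        # Retrain on the full outer training set, evaluate once on the test fold.
        results.append(score_fn(X[tr], y[tr], X[te], y[te], best))
    return float(np.mean(results))

# Hypothetical stand-in model: threshold feature 0 at a tunable value `thr`.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 2))
y = (X[:, 0] > 0).astype(int)

def score_fn(Xtr, ytr, Xte, yte, thr):
    return float(((Xte[:, 0] > thr).astype(int) == yte).mean())

acc = nested_cv(X, y, param_grid=[-0.5, 0.0, 0.5], score_fn=score_fn)
```

The key property mirrored here is that each outer test fold is touched exactly once, after all hyperparameter decisions for that fold have been made on inner validation data only.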

Fig. 10

Overall prediction model structure based on TIS_NGO-KELM.

It is worth noting that a stratified sampling strategy is adopted to partition the dataset so that the proportion of non-bankrupt and bankrupt companies is preserved in each fold. Because sampling randomness may still affect the results, a single repetition of tenfold CV is not sufficiently reliable. Therefore, each method was run 30 times with tenfold cross-validation, and the average of the 30 runs was taken as the final result.

The proposed TIS_NGO-KELM model was evaluated on real-world classification problems using financial datasets such as the Wieslaw dataset and JPNdata dataset. These problems are well-known in the literature and serve to validate the potential of optimization-based core models. The two experimental datasets originate from different sources. The Wieslaw dataset65 contains 30 financial ratios and 240 samples, collected from 112 bankrupt Polish companies and 128 non-bankrupt companies. The data spans five years, covering approximately three years prior to bankruptcy occurrences.

Classification prediction of bankruptcy prediction problem

Model performance evaluation indicators

Commonly used metrics for evaluating multi-class classification quality were employed to assess the performance of the proposed method. These metrics include Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity, and Specificity19,66,67. Based on the confusion matrix, these four metrics are defined as follows:

$$\begin{array}{c}ACC=\frac{1}{n}\sum_{i=1}^{n}\frac{{TP}_{i}+{TN}_{i}}{{TP}_{i}+{FP}_{i}+{FN}_{i}+{TN}_{i}}\times 100\%\\ MCC=\frac{1}{n}\sum_{i=1}^{n}\frac{{TP}_{i}\times {TN}_{i}-{FP}_{i}\times {FN}_{i}}{\sqrt{\left({TP}_{i}+{FP}_{i}\right)\left({TP}_{i}+{FN}_{i}\right)\left({TN}_{i}+{FP}_{i}\right)\left({TN}_{i}+{FN}_{i}\right)}}\times 100\%\\ Sensitivity=\frac{1}{n}\sum_{i=1}^{n}\frac{{TP}_{i}}{{TP}_{i}+{FN}_{i}}\times 100\%\\ Specificity=\frac{1}{n}\sum_{i=1}^{n}\frac{{TN}_{i}}{{TN}_{i}+{FP}_{i}}\times 100\%\end{array}$$
(25)

where \({TP}_{i}\) represents the number of true positives, i.e., the cases correctly classified as the positive class; \({TN}_{i}\) denotes the number of true negatives, i.e., the cases correctly classified as the negative class; \({FP}_{i}\) is the number of false positives, i.e., negative cases incorrectly classified as positive; and \({FN}_{i}\) is the number of false negatives, i.e., positive cases incorrectly classified as negative. \(MCC\) is a well-known metric representing the correlation coefficient between observed and predicted classifications, and it is particularly informative for evaluating models on imbalanced datasets. Sensitivity measures the model's ability to correctly identify positive cases, while specificity similarly measures the ability to identify negative cases68,69,70.
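In the binary bankruptcy case (bankrupt firms as the positive class), the metrics in Eq. (25) reduce to the familiar single-confusion-matrix formulas. The Python sketch below uses illustrative counts and returns fractions rather than percentages.

```python
import math

def metrics(tp, tn, fp, fn):
    """ACC, MCC, Sensitivity, Specificity from one binary confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    sensitivity = tp / (tp + fn)     # recall on bankrupt (positive) firms
    specificity = tn / (tn + fp)     # recall on non-bankrupt (negative) firms
    return acc, mcc, sensitivity, specificity

# Illustrative counts, not results from the paper's experiments.
acc, mcc, sen, spe = metrics(tp=50, tn=45, fp=5, fn=10)
```

Note how MCC uses all four cells at once, which is why it stays informative when one class dominates, whereas accuracy alone can look good for a model that ignores the minority class.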

Compared with different baseline models

For the bankruptcy prediction task on the above-mentioned dataset, the proposed TIS_NGO-KELM model was compared against several benchmark models, including XGBoost, LightGBM, CatBoost, and SVM. All methods were implemented in a unified environment under MATLAB 2023b, as described in Section "Ablation experiment", and no fixed random seed was set during the experiments, so as to avoid biasing the results. Figure 11 presents box-plot comparisons of the classification performance of TIS_NGO-KELM and these baseline models on the Wieslaw dataset.

Fig. 11

Boxplot of TIS_NGO-KELM and different baseline models in terms of evaluation indicators on Wieslaw dataset.

As shown in Fig. 11, TIS_NGO-KELM exhibits a "high median + narrow dispersion" pattern in ACC: its median reaches approximately 92%, with an interquartile range (IQR) of only 3.2% and no outliers. In contrast, SVM shows the lowest median ACC (78%) with a large IQR of 9.5%, indicating substantially higher dispersion. The ACC medians of XGBoost, LightGBM, and CatBoost fall within the 85%–88% range, higher than SVM, but their box plots contain several outliers below 80%, reflecting insufficient stability on some data subsets.

For the MCC metric, TIS_NGO-KELM achieves a median above 0.85, significantly outperforming SVM (around 0.7) and the 0.75–0.8 range of XGBoost, LightGBM, and CatBoost. Its IQR remains within 3%–4%, and no outliers appear. This result indicates that TIS_NGO-KELM excels at balancing the prediction accuracy between bankrupt and non-bankrupt firms, effectively mitigating classification bias arising from class imbalance. Its prediction stability markedly surpasses that of the benchmark models. By contrast, SVM exhibits a much more dispersed MCC distribution, with an IQR exceeding 8%, revealing its limited generalization capacity in high-dimensional financial data scenarios.

Regarding the Sensitivity metric (the ability to correctly identify bankrupt firms), TIS_NGO-KELM achieves the highest median value, close to 90%. XGBoost, LightGBM, and CatBoost fall within the 80%–85% range, whereas SVM performs the worst, with a median of only around 75%. This performance gap highlights TIS_NGO-KELM’s superior sensitivity to financial distress signals, allowing for more accurate detection of firms at risk of bankruptcy—an essential feature for financial institutions conducting risk warning and decision support. Furthermore, the relatively narrow bounds of its box plot show that the model maintains stable risk identification performance across different data subsets without notable degradation.

In terms of Specificity (the correct classification rate for non-bankrupt firms), TIS_NGO-KELM again maintains a leading position, with a median of 93% and values concentrated within the 90%–95% interval. CatBoost and LightGBM follow (medians around 88%–90%), although both exhibit outliers below 80%. SVM’s median Specificity is only 85% with a broader dispersion. This indicates that TIS_NGO-KELM not only identifies distressed firms accurately but also effectively minimizes false alarms on healthy firms, thereby reducing unnecessary credit tightening or investment avoidance—achieving a balanced approach between risk control and market vitality.

Overall, TIS_NGO-KELM consistently outperforms traditional machine learning models on the Wieslaw dataset. This superiority stems from the ability of the TIS_NGO algorithm to precisely optimize key KELM parameters. Through the synergistic effects of the thinking-innovation strategy, differential-evolution predation strategy, and centroid reverse-learning boundary-control strategy, the model adapts more effectively to the high-dimensional, nonlinear characteristics of financial data while avoiding pitfalls such as local optima and insufficient generalization commonly encountered in single baseline models. These results validate the practicality and superiority of TIS_NGO-KELM for bankruptcy prediction tasks and provide strong support for its application in financial risk early-warning scenarios.

Compared with the KELM optimized by different algorithms

Before the experimental analysis, all attributes in the dataset were scaled to the range [− 1, 1] to minimize numerical disparities among features and reduce computational complexity. In the bankruptcy prediction problem on the aforementioned dataset, the effectiveness of the proposed TIS_NGO-KELM model was compared against several benchmark models: PSO-KELM, DE-KELM, GWO-KELM, WOA-KELM, DBO-KELM, BKA-KELM, AOO-KELM, and NGO-KELM. All techniques were implemented under the same environment on the MATLAB 2023b platform (as described in Section "Ablation experiment"). The maximum number of iterations and population size for each algorithm were set to 100 and 30, respectively. Figures 12 and 13 depict the convergence curves of the fitness function over iterations for KELM optimized by different algorithms on the two datasets. Figures 14 and 15 present boxplot comparisons of the classification prediction results of these optimized KELM models on the two bankruptcy datasets.

Fig. 12

Iteration curves of KELM parameters optimized by different algorithms on Wieslaw dataset.

Fig. 13

Iteration curves of KELM parameters optimized by different algorithms on JPNdata dataset.

Fig. 14

Boxplot of TIS_NGO-KELM and its competitors in terms of evaluation indicators on Wieslaw dataset.

Fig. 15

Boxplot of TIS_NGO-KELM and its competitors in terms of evaluation indicators on JPNdata dataset.

From the convergence curves (Figs. 12 and 13), it can be observed that TIS_NGO significantly outperforms the comparison algorithms in terms of parameter optimization efficiency on both datasets. On the Wieslaw dataset, the convergence curve of TIS_NGO declines rapidly within the first 20 iterations and approaches the optimal fitness value (in the range of 0.18–0.19), with no significant fluctuations in the later stages. In contrast, standard NGO, PSO, GWO, and other algorithms not only converge more slowly (e.g., standard NGO requires over 40 iterations to gradually stabilize) but also achieve significantly higher final fitness values than TIS_NGO. Among them, the DE-KELM curve remains consistently high and exhibits considerable volatility, highlighting its inferior optimization capability. On the JPNdata dataset, TIS_NGO's advantage is further demonstrated: it reaches a stable state after approximately 10 iterations, with the final fitness value consistently remaining within the range of 0.122–0.123. This performance is superior to that of standard NGO (0.125–0.126 range) and other algorithms, and the smoother curve further confirms its robustness across different data distributions. These benefits can be attributed to the synergistic integration of three strategies in TIS_NGO: the Thinking Innovation Strategy (enhancing search efficiency and information utilization), the Differential Evolution-based Prey Attack Strategy (increasing search direction diversity to avoid local optima), and the Centroid Opposition-Based Learning with Boundary Control Strategy (reducing ineffective search and accelerating convergence).

The classification performance boxplots (Figs. 14 and 15) further validate the superiority of TIS_NGO-KELM in terms of practical predictive performance. On the Wieslaw dataset, TIS_NGO-KELM exhibits the characteristic of "high median + narrow box" across all four metrics: the median ACC is approximately 92%, median MCC is around 0.85, and median Sensitivity and Specificity are about 90% and 93%, respectively. The interquartile ranges are narrow (only 3%–5%) with no outliers, indicating not only high classification accuracy but also exceptionally stable performance across 30 repeated trials (each using tenfold cross-validation). In contrast, the comparison model DE-KELM shows the lowest median values across all four metrics (ACC ≈ 78%, MCC ≈ 0.5), wide boxes (fluctuations of 8%–10%), and numerous outliers. Other models such as GWO-KELM and PSO-KELM also display noticeable weaknesses in certain metrics; for example, GWO-KELM has a median MCC of only 0.65, reflecting its weak balance in distinguishing between bankrupt and non-bankrupt companies. On the JPNdata dataset, although the overall data discriminability is higher (all models show slightly narrower boxes compared to the Wieslaw dataset), TIS_NGO-KELM maintains its leading performance: it achieves a median MCC of approximately 0.88, making it the only model exceeding 0.85. It also ranks highest in median ACC, Sensitivity, and Specificity, with no outliers in any of the four metrics. These results further demonstrate its generalization capability and stability across regionally diverse financial data with varying distributions, a critical advantage for financial institutions in real-world decision-making. The high Sensitivity ensures accurate identification of companies at risk of bankruptcy (reducing missed alarms), while the high Specificity prevents misclassification of healthy companies (protecting corporate reputation), aligning well with the practical requirements of bankruptcy prediction.

In summary, the TIS_NGO-KELM bankruptcy prediction model outperforms the comparative models, a result attributable to the strong optimization capability of the proposed TIS_NGO algorithm. The model achieves higher stability and accuracy in the binary classification problem of bankruptcy prediction, and the three introduced improvement strategies effectively enhance the overall performance of the standard NGO. This study therefore offers an effective tool for research on corporate bankruptcy prediction.

Summary and prospect

Metaheuristic algorithms have become increasingly popular for optimizing machine learning models in classification problems. Because metaheuristics excel at tuning the hyperparameters of machine learning algorithms, the resulting optimized models often outperform their unoptimized counterparts. This paper proposes a robust and precise machine learning method, TIS_NGO-KELM, designed to effectively address the bankruptcy prediction problem. The main innovation of this model lies in applying an improved algorithm, TIS_NGO, to the Kernel Extreme Learning Machine (KELM) classifier, enhancing its generalization ability by searching for optimal parameter settings.
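To make concrete what the optimizer tunes, the sketch below implements a minimal KELM with an RBF kernel under the standard closed-form formulation β = (I/C + Ω)⁻¹T; the regularization coefficient C and kernel width γ are the two hyperparameters a metaheuristic such as TIS_NGO would search over. Class and variable names are illustrative, not the paper's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K(a, b) = exp(-gamma * ||a - b||^2), computed pairwise
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

class KELM:
    """Minimal Kernel Extreme Learning Machine for binary classification.
    C (regularization) and gamma (kernel width) are the hyperparameters
    an outer metaheuristic would optimize."""
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        T = np.where(np.asarray(y) == 1, 1.0, -1.0)  # +/-1 targets
        omega = rbf_kernel(self.X, self.X, self.gamma)
        n = len(self.X)
        # Closed-form KELM solution: beta = (I/C + Omega)^{-1} T
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, Xq):
        scores = rbf_kernel(np.asarray(Xq, float), self.X, self.gamma) @ self.beta
        return (scores > 0).astype(int)

# Toy usage: two separable clusters standing in for healthy/bankrupt firms
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = KELM(C=10.0, gamma=1.0).fit(X, y)
train_acc = (model.predict(X) == y).mean()
```

Because training reduces to a single linear solve, each candidate (C, γ) pair evaluated by the optimizer is cheap, which is what makes metaheuristic tuning of KELM practical.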

The improved TIS_NGO integrates three mechanisms into the original NGO algorithm: the Thinking Innovation Strategy, the Prey Attack Strategy based on Differential Evolution, and the Centroid Opposition-Based Learning with Boundary Control Strategy. The Thinking Innovation Strategy effectively exploits information within the search space during the exploitation phase while maintaining optimization stability and diversity. The Differential Evolution-based Prey Attack Strategy increases the diversity of search directions, enabling exploration over a broader region and effectively avoiding local optima. Finally, the Centroid Opposition-Based Learning with Boundary Control Strategy prevents ineffective searches and accelerates convergence.
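The latter two population-level mechanisms can be illustrated in a few lines. This is a schematic sketch under the common DE/rand/1 mutation and centroid-opposition definitions; the exact update rules and parameter values in TIS_NGO may differ:

```python
import numpy as np

rng = np.random.default_rng(42)

def de_mutation(pop, F=0.5):
    """DE/rand/1 mutation: each individual is replaced by a combination of
    three distinct random peers, diversifying the search direction (the idea
    behind the DE-based prey attack)."""
    n = len(pop)
    mutants = np.empty_like(pop)
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutants[i] = pop[r1] + F * (pop[r2] - pop[r3])
    return mutants

def centroid_opposition(pop, lb, ub):
    """Centroid opposition-based learning: reflect each individual through
    the population centroid, then clamp to the search bounds (boundary
    control), avoiding wasted evaluations outside the feasible region."""
    centroid = pop.mean(axis=0)
    opposed = 2 * centroid - pop
    return np.clip(opposed, lb, ub)

pop = rng.uniform(-5, 5, size=(10, 3))  # 10 candidates, 3 dimensions
mutants = de_mutation(pop)
opposed = centroid_opposition(pop, lb=-5, ub=5)
```

In a full optimizer, the opposed (or mutated) candidate would replace the original only if its fitness is better, preserving greedy selection.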

In the experimental section, to verify the superiority of TIS_NGO in global optimization of specific functions, we compare it with several advanced optimization algorithms—including Particle Swarm Optimization (PSO), Differential Evolution (DE), Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Dung Beetle Optimizer (DBO), Black Kite Algorithm (BKA), Animated Osprey Optimizer (AOO), and the original Northern Goshawk Optimization (NGO)—on the CEC2017 and CEC2022 benchmark test sets. Statistical results across all optimization tasks confirm that TIS_NGO, which integrates three enhancement strategies, achieves faster convergence and superior overall performance: it attains better fitness values on most test functions of both benchmark suites, with average rankings significantly lower than those of the original NGO and other competing algorithms. The Friedman test further supports its statistical superiority, as all p-values across the benchmark functions are far below the significance level, demonstrating that TIS_NGO outperforms all counterparts in global optimum reliability, statistical consistency, and robustness. Runtime analysis also shows that, despite the integration of three enhancement strategies, TIS_NGO maintains the same time complexity as the original NGO, with only a slight increase in computational cost that remains acceptable for practical applications.
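The Friedman test used for the statistical comparison can be reproduced with SciPy. The fitness values below are invented for illustration only (rows would normally be the CEC2017/CEC2022 benchmark functions), not the paper's results:

```python
from scipy.stats import friedmanchisquare

# Illustrative per-function best fitness values for three algorithms;
# each position corresponds to one benchmark function.
tis_ngo = [0.01, 0.02, 0.10, 0.05, 0.03, 0.08, 0.02, 0.04]
ngo     = [0.03, 0.05, 0.15, 0.09, 0.06, 0.12, 0.05, 0.07]
pso     = [0.04, 0.07, 0.20, 0.11, 0.08, 0.15, 0.06, 0.09]

# Null hypothesis: all algorithms have the same median performance.
stat, p = friedmanchisquare(tis_ngo, ngo, pso)
```

A small p-value rejects the hypothesis that the algorithms perform equally; per-algorithm average ranks (lower is better) then show which one leads, which matches how the rankings in the paper are reported.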

Finally, by applying various intelligent optimization algorithms to optimize KELM models—including PSO-KELM, DE-KELM, GWO-KELM, WOA-KELM, DBO-KELM, BKA-KELM, AOO-KELM, and NGO-KELM—on real bankruptcy datasets, the study verifies the outstanding performance of TIS_NGO in financial distress prediction. To prevent information leakage and ensure reproducibility, nested cross-validation is employed, stratified sampling is used to preserve the bankruptcy/non-bankruptcy ratio in each fold, and feature normalization is strictly applied only to the training set. Results across multiple samples show that TIS_NGO-optimized KELM surpasses numerous advanced bankruptcy prediction techniques—including the aforementioned KELM variants as well as XGBoost, LightGBM, CatBoost, SVM, and regularized logistic regression—on key performance indicators such as Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity, Specificity, ROC-AUC, and PR-AUC. The model not only achieves higher mean values on all metrics but also demonstrates superior stability, with lower standard deviations across repeated runs. Statistical significance analysis, including the Wilcoxon signed-rank test with Holm correction, confirms that these performance gains are not due to random variation but reflect the inherent advantages of the proposed model.
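The leakage-safe evaluation protocol described above (stratified folds, normalization fit only on training data) can be sketched with scikit-learn. Synthetic data stands in for the bankruptcy datasets, and an SVC replaces KELM purely for self-containment; placing the scaler inside the pipeline is what guarantees it is fit on training folds only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for a bankruptcy dataset
# (about 20% positive class, 30 features like the datasets in this study)
X, y = make_classification(n_samples=300, n_features=30,
                           weights=[0.8, 0.2], random_state=0)

# Scaler inside the pipeline: refit on each training fold -> no leakage
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", SVC(C=1.0, gamma="scale"))])

# Stratified folds preserve the bankrupt/non-bankrupt ratio in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
mean_acc = scores.mean()
```

For hyperparameter search, the same pipeline would sit inside an inner loop (e.g. `GridSearchCV` passed to `cross_val_score`), giving the nested cross-validation structure the text describes.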

Despite expanding the dataset and optimizing the algorithm, this study still presents several clear limitations that are closely tied to the characteristics of the research object and the experimental scenario. First, although the three datasets used in this work cover both European and East Asian regions, they lack bankruptcy data from emerging markets such as Southeast Asia or Latin America, which may restrict the model’s generalizability and adaptability across diverse economic environments. Second, the scalability of the proposed model in ultra-high-dimensional feature spaces (with more than 50 financial indicators) has not yet been fully verified, as the current evaluation mainly focuses on datasets containing around 30 indicators. Third, the TIS_NGO-KELM framework relies primarily on firm-level financial indicators, without incorporating macroeconomic and market-level factors (e.g., GDP growth rate, inflation, or interest rate fluctuations), which could influence prediction accuracy during periods of economic instability.

Furthermore, a theoretical analysis of the model’s convergence behavior and a quantitative sensitivity analysis of parameters were not included in this study, which represents another limitation and an opportunity for future enhancement.

From a practical perspective, the proposed TIS_NGO-KELM model provides a novel and effective technical solution for financial institutions seeking to improve early bankruptcy risk warning systems. By combining multi-strategy optimization with an interpretable learning structure, the model not only enhances prediction accuracy and stability but also satisfies the real-world requirements of financial risk management in terms of interpretability, computational efficiency, and data compliance. The results demonstrate that integrating adaptive optimization strategies within a kernel learning framework can achieve a balanced trade-off between performance and explainability, offering practical value for auditors, credit rating agencies, and financial regulators.

In terms of future research, three specific directions will be prioritized. First, integrating the TIS_NGO-KELM model with federated learning frameworks to address financial data privacy and enable secure, cross-institutional bankruptcy prediction. Second, extending the feature space by incorporating macroeconomic and market indicators to enhance the comprehensiveness and robustness of prediction outcomes. Third, conducting extensive scalability experiments on ultra-high-dimensional financial datasets (exceeding 50 indicators) and developing a formal convergence analysis to strengthen the theoretical foundation of the proposed method.

Finally, future studies will also explore the comparison of TIS_NGO-KELM with advanced ensemble and deep learning-based predictors to further validate its competitiveness and practical potential.