Introduction

Off-road agricultural traction is generated at the soil-tire contact during field operations to develop drawbar pull, tillage draft, and tractor-implement locomotion. However, generating traction in field conditions is a power-demanding process that consumes ~50% of the total energy required in mechanized agricultural systems1,2. Although 90% of the tractor engine power is transmitted to tractive devices via the axle torque3, the generated tractive force (FTr) is largely affected by heterogeneous and dynamically changing factors at the soil-tire and soil-implement interface, where 20–55% of tillage energy can be lost4,5. The FTr is dynamically influenced by trafficability conditions and tractor-implement settings, locomotion configurations, and causal-response effects arising from wheel load (WLoad), tire inflation pressure (PTire), rut depth (Rdepth), and the operating tillage depth (Tdepth). Multivariate causal-response effects, such as vertical soil reactions, dynamic weight transfer, rolling resistance force (FRr), tire deflections, and wheel slip (SWheel), as well as fuel-torque throttling, all influence the resultant tractive thrust force6. Therefore, the net FTr generated is a non-linear function of multivariate and dynamically complex soil-machine variables, as well as tractor-implement configuration and settings. Accordingly, accurately generating and utilizing the desired FTr through conventional gear-up and throttle-down adjustments to correct power/load mismatch, while maximizing energy-use efficiency under the intricate soil-machine conditions in situ, becomes challenging. Approaches that respond precisely to dynamically variable soil conditions by quickly adjusting the multiple machinery parameters and settings that influence FTr are not documented in the scientific literature. Thus, optimized generation and utilization of FTr for improved energy-use efficiency in tillage operations relies solely on the traditional “gear-up and throttle-back” methods adopted by conventional “proficient operators”. However, maximum FTr does not necessarily correspond to maximum engine power, WLoad, tillage speed, or minimum or maximum fuel consumption rate (ØFuel) and SWheel. As a result, optimizing soil-tool-wheel and tractor-implement parameters for accurate generation and utilization of FTr from such a complex system of dynamic variables in tillage is challenging.

Recent developments in intelligent autonomous tractors offer promise for reducing energy and fuel use in mechanized tillage. This is an important consideration for improving energy-use efficiency, as human-operated tractors often make inaccurate traction optimization judgments, which can consume up to 30% of production energy costs7,8. Therefore, accurate decisions in real time are needed to leverage the dynamic and causal-response effects of multivariate soil-machine variables for optimal rate generation and effective utilization of FTr during tillage. Well-developed models can assist in optimizing FTr by considering the numerous soil-tool, soil-tire, and tractor-implement variables and adjustments required to respond to highly heterogeneous and dynamically variable (in both time and space) soil characteristics in situ. However, the numerical and semi-empirical models currently available rely on a limited number of parameters (e.g., wheel load, angle of internal friction (ϕ), and soil cohesion (c)) to develop traction prediction models, suggesting suboptimal generation and utilization of FTr during tillage9.
Similarly, several studies have previously adopted deficient theoretical approaches or classical soil mechanics methods and combined a limited number of variables (e.g., soil cone index (CI), forward speed, soil bulk density (γSoil), c, ϕ, and rake angle) with semi-empirical terrain parameters to quantify tillage and tractive thrust forces10,11. Such approaches neglect dynamic point-specific variations of soil parameters, and they neither provide exhaustive fundamental insights into the compounded direct and indirect influence of field soil-machine variables nor account for cumulative multi-pass and dynamic load transfer effects on FTr in situ. Efforts spent on integrating numerical and classical soil mechanics approaches have enabled the prediction of horizontal and vertical forces influencing traction, but with relatively high average errors (up to ± 33% and ± 50%, respectively)12. Furthermore, previous research has developed numerical and empirical traction models based on studies conducted under controlled laboratory conditions using indoor soil bins with uniform soil conditions13. Although soil bin studies have been instrumental in developing a fundamental understanding of the engineering principles governing soil-machine interactions, the resultant models had limitations inherent to the artificial conditions under which they were developed, most importantly, a lack of soil heterogeneity. Furthermore, soil bin models do not account for the dynamic causal-response effects of arable soil loading in situ, as commonly encountered during tractive locomotion in tillage.

The laboratory traction rigs and dynamometric traction test benches used for single-wheel testing14,15, which are employed to predict traction, do not account for the off-road tractive dynamics of the entire tractor-implement drive train. Furthermore, previous soil-bin and traction rig models relied heavily on accuracy metrics; however, their robustness, reliability, and generalized adaptability to other soil conditions and unseen datasets were never fully demonstrated. Approved tractor testing authorities provide comparisons of tractive performance and test data from standardized experiments conducted on concrete tracks16, which do not accurately represent the heterogeneous nature of agricultural soils. Although such approaches establish a basis for verifying the generated FTr, machine learning soft computing approaches can provide more accurate and robust forecasting of FTr using multivariate datasets obtained from the soil-machine interface in situ. Multivariate dynamic systems are best evaluated using large training datasets in a soft computing environment such as machine learning17,18. The reliability and robustness of machine learning techniques can be assessed in terms of generalization compared to numerical and regression methods. Soft computing machine learning approaches utilize artificial intelligence (AI) algorithms that learn and analyze complex dataset patterns and their intricate dependencies to reveal underlying multicollinearities and to provide accurate predictions18,19. Deep neural network (DNN) and artificial neural network (ANN) algorithms can replicate the intelligent thinking of human brain neurons to simulate complex soil-machine non-linearities and associated causal-response effects for accurate, real-time, and evidence-based neurocognitive forecasting of FTr in situ. Furthermore, DNN and ANN algorithms enable smart agricultural technologies to utilize excellent and robust modeling domains with adaptability to diverse datasets from dynamic environments, such as arable soils during tillage20. Dynamic soil-machine variables at the soil-tire and soil-tool interfaces are highly multivariate, complex, and nonlinear, rendering soil processing in tillage a prime target for intelligent modeling, automation, and robotization using AI18,21. Certain neurocognitive algorithms perform best with specific neuro-activation functions and learn various datasets at different levels of accuracy, computational time, and neuro-perceptron epoch sizes. However, the adoption of dynamic soil-machine variables for in-situ modeling of FTr using ANN and DNN, as well as the resultant systems of neurocognitive equations, is missing from the scientific and engineering literature. Previous machine learning studies connected to tillage have not established the AI equations associated with the developed ML and neurocomputing models, thus limiting the scope of their adoption, generalization, and utilization.

Gap identification in available studies

The literature review demonstrates that researchers have utilized several soft computing approaches with different features and datasets to predict FTr. It has also been observed that no researchers have utilized the entire set of soil-machine features, viz. wheel rut depth (Rdepth), implement draft (FD), fuel consumption rate (ØFuel), four levels of wheeling load (Wload), five levels of tire inflation pressure (Ptire), four tillage depths (Tdepth), soil cone index (CIsoil), shear strength (τShear), water content (θSoil), soil bulk density (γSoil), plasticity index (IPSoil), theoretical and actual wheel slippage (SWheel), rolling resistance force (FRr), and soil-tire contact patch area (AStc), as input features in predicting FTr. Interestingly, previous researchers utilized artificial neural network models but did not compare the backpropagation algorithms. Further, recent hybrid optimization algorithms have not been adopted in traction prediction. In addition, it has been found that DNN models have not been implemented with the Levenberg-Marquardt (trainlm), Scaled conjugate gradient (trainscg), Quasi-Newton (trainbfg), Powell-Beale conjugate gradient (traincgb), One-step secant (trainoss), Gradient descent momentum (traingdm), Fletcher-Reeves conjugate gradient (traincgf), Gradient descent (traingd), Polak-Ribiére conjugate gradient (traincgp), Bayesian regularization (trainbr), Learning rate gradient descent (traingdx), and Resilient backpropagation (trainrp) algorithms and compared in predicting FTr.

Novelty of the present investigation

Considering the gap identified in the literature, the present investigation has the following novelty:

  • This study employs artificial neural network and deep neural network models configured with the Levenberg-Marquardt (trainlm), Scaled conjugate gradient (trainscg), Quasi-Newton (trainbfg), Powell-Beale conjugate gradient (traincgb), One-step secant (trainoss), Gradient descent momentum (traingdm), Fletcher-Reeves conjugate gradient (traincgf), Gradient descent (traingd), Polak-Ribiére conjugate gradient (traincgp), Bayesian regularization (trainbr), Learning rate gradient descent (traingdx), and Resilient backpropagation (trainrp) backpropagation algorithms, and compares their capabilities in predicting FTr for the first time. Further, the ANN and DNN models have been optimized with the new Spider Wasp Optimization (SWO), Puma Optimizer (PO), and Walrus Optimization (WO) algorithms to predict FTr in situ for the first time.

  • This investigation uses Rdepth, FD, ØFuel, Wload, Ptire, Tdepth, CIsoil, τShear, θSoil, γSoil, IPSoil, SWheel, FRr, and AStc as features for predicting FTr in situ for the first time. In addition, the cosine amplitude method reveals the sensitivity of each feature in predicting FTr.

Research methodology

Determining the required FTr before tillage operations would guide the optimization of tractor-implement forces and tillage energy utilization efficiency. However, the literature indicates a lack of accurate in-situ traction prediction models for wheeled vehicular applications in tillage, due to the multivariate and complex nonlinearities at the soil-tool and soil-tire interfaces. Dynamic soil-machine variables at the soil-tire and soil-tool interfaces in situ necessitate the adoption of advanced machine-learning algorithms for accurate, robust, and reliable predictions of FTr. Furthermore, operating under diverse and heterogeneous field conditions demands site-specific tractor-implement configurations, rendering the tillage process a prime target for intelligent automation and robotization using AI. This research utilizes artificial neurocomputing algorithms and neuro-activation functions to develop neurocognitive machine learning models for predicting FTr using in-situ soil-machine variables during tillage. While certain neurocognitive algorithms may learn from specific datasets and perform best with certain neuro-activation functions, the adoption of dynamic soil-machine variables for in situ prognostication of FTr using neurocomputing is currently lacking in the literature. In this study, the neurocognitive intelligence of human brain neurons is simulated to develop DNN and ANN models to accurately determine the required FTr using the dynamic parameters of the soil-tire and soil-tool interface in-situ. The developed models are useful for intelligent control and optimized traction utilization. The models will be practically utilized by machinery managers to properly match soil conditions with tractor-implement configurations for optimal rate generation and efficient utilization of FTr from wheeled agricultural tractors, thereby conserving tillage energy, reducing fuel wastage, and minimizing CO2 emissions at reduced operational costs. Further, the models can be implemented in programmable logic controllers of wheeled autonomous robots for accurate decision-making and in-field operational adjustments of tillage robots. Figure 1 presents the research flow for assessing the FTr in this investigation, utilizing deep learning and neural networks.

Fig. 1
figure 1

Illustration of the flow of the present investigation.

Data insights and analysis

Soil and tractor-implement data acquisition in-situ

Tillage experiments were conducted in Ferralsols22 of the maize-growing region in North Rift, Kenya (0°34’16.50” N, 35°18’31.70” E, elevation: 2150 m above sea level). Eighty randomized and triplicated sites were each sampled at five profile pits for soil sampling and testing to establish the average field soil water content (θSoil, %), soil cone index (CI), soil bulk density (γSoil), plasticity index (IPSoil), angle of internal friction (ϕ), cohesion (c), and shear strength (τShear) in situ, at four tillage depth (Tdepth) intervals, namely 0–100, 100–200, 200–300, and 300–400 mm. The 240 completely randomized experimental sites were delineated for triplicated tractive locomotion of the research tractor (CASE IH JXM 90 HP), dynamometrically coupled to the auxiliary tractor (John Deere 5503) and remotely telemetered to an MSI 8000-paired laptop device to sequentially transmit tractive parameters at five levels of tire inflation pressure (PTire: 110.4, 151.8, 193.2, 234.6, and 275.8 kPa) and four levels of wheeling load (Wload: 11.3, 11.8, 12.3, and 12.8 kN) at the four levels of Tdepth. Before engagement, the wheel rut depth (Rdepth) was measured at the centerline of the tire path, and the soil-tire contact patch area (AStc) of the research tractor was obtained at the five levels of PTire and four levels of Wload using a geometric-image-pixel-colour correlation and segmentation tool in MATLAB. Further, aided by the auxiliary tractor, the rolling resistance force (FRr) and the theoretical and actual wheel slippage (SWheel) of the research tractor were measured at all Wload, PTire, and Tdepth levels. Thereafter, the draft dynamometer was connected to the drawbar of the research tractor and hitched to the front-end tow of the auxiliary John Deere 5503, which carried a 3-point-hitch strip-till subsoiler. The tractive locomotion was sequentially engaged in triplicate at all five levels of PTire and the four levels of Wload and Tdepth. The instantaneous FTr generated by the research tractor at the various Tdepth, Wload, and PTire settings was recorded by the digital dynamometer and transmitted remotely via an MSI 8000 datalogger, telemetric with a laptop device (Fig. 2). At the same time, the fuel consumption rate (ØFuel) of the research tractor (at all Tdepth, Wload, and PTire levels) was remotely relayed from the fuel tank, digitally instrumented with Teltonika FMB920 smartphone telemetry. The dynamometer was then decoupled, and the implement draft force (FD) was obtained by deducting the auxiliary FRr from the total FTr. The theoretical and actual forward speeds were then used to determine the corresponding wheel slippage (SWheel) during tillage at each tractor-implement setting and Tdepth under the prevailing conditions in situ. All 14 soil-machine variables (Rdepth, FD, ØFuel, Wload, Ptire, Tdepth, CIsoil, τShear, θSoil, γSoil, IPSoil, SWheel, FRr, AStc) were utilized in the neurocognitive modeling and prediction of FTr.
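To make the derivation of the dynamic variables explicit, the following minimal MATLAB sketch (with hypothetical example values) shows how wheel slippage and implement draft were obtained from the theoretical and actual forward speeds and the dynamometer readings described above.

```matlab
% Minimal sketch with hypothetical values: deriving SWheel and FD from the
% telemetered measurements (speeds in m/s, forces in kN).
v_theoretical = 1.95;   % theoretical forward speed (no-slip wheel speed)
v_actual      = 1.62;   % actual forward speed measured over the run
F_total       = 18.4;   % total tractive force recorded by the dynamometer
F_rr_aux      = 2.1;    % auxiliary rolling resistance force

S_wheel = (v_theoretical - v_actual) / v_theoretical * 100;  % wheel slip, %
F_draft = F_total - F_rr_aux;        % implement draft: FD = FTr - FRr

fprintf('Wheel slip = %.1f %%, draft = %.1f kN\n', S_wheel, F_draft);
```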

Fig. 2
figure 2

Illustration of tractive locomotion and digital telemetry setup.

Data analysis

To analyze the database, descriptive statistics, frequency distribution, and the Pearson product-moment correlation coefficient method were utilized in this investigation. The descriptive summary statistics of the experimental database are presented in Table 1, which indicates the range of statistical parameters for all 14 experimental variables in the database. The highest and lowest FTr values were 24.3 kN and 4.01 kN, respectively, which lie within the range of tractive force generally developed by agricultural tractors during tillage. The frequency distribution of the experimental FTr and all the database variables is shown in Fig. 3(a-o). All variables exhibited approximately Gaussian distributions, as evidenced by the bell-shaped curves of their frequency distributions, indicating a well-conditioned traction modeling database composed of all the variables.

Table 1 Descriptive statistics of the database.
Fig. 3
figure 3

Illustration of frequency distribution curves for (a) FTr and its neurocognitive variables, (b) Rdepth, (c) FD, (d) ØFuel, (e) Wload, (f) PTire, (g) Tdeth, (h) FRr, (i) τShear, (j) θSoil, (k) γSoil, (l) IPSoil, (m) SWheel, (n) CIsoil, (o) AStc.

Pearson correlation coefficients mapped the strength of dependence among the soil-machine variables, their bivariate relationships, and multicollinearity among the FTr modeling variables (Fig. 4). The correlation analysis indicated that all variables were correlated with FTr. Coefficients range from −1 to 1; their absolute values indicate the presence and strength of a linear relationship, while 0 indicates a lack of linearity between the variables. FD (0.9971), IPSoil (0.9426), and Tdepth (0.9426) exhibited the highest positive correlations with FTr. In contrast, Rdepth (0.1206) showed the weakest correlation with FTr but was highly correlated with PTire (0.8062). Among the other variables, γSoil and θSoil (−0.997), θSoil and τShear (−0.9476), and θSoil and CIsoil (−0.9189) were the most negatively correlated pairs. At the same time, FD (0.9971), IPSoil (0.9912), and IPSoil (0.9694) were the most positively correlated with FTr, Tdepth, and ØFuel, respectively. Variables with absolute correlation values greater than 0.6 exhibit a strong linear relationship at the 5% significance level23. While a correlation index of ±0.00 represents the absence of a relationship between variables, positively correlated variables increase or decrease together, whereas negatively correlated variables show one variable increasing as the other decreases, and vice versa, as shown in Fig. 4. All variables in the database exhibited either positive or negative multicollinearity with FTr at different levels. Correlation indices of ±0.01 to ±0.20, ±0.21 to ±0.40, ±0.41 to ±0.60, ±0.61 to ±0.80, and ±0.81 to ±1.00 indicate very weak, weak, moderate, strong, and very strong relationships between variables, respectively, as reported by Khatti et al.24,25.
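As an illustration of how such a correlation matrix can be produced, the following minimal MATLAB sketch computes the pairwise Pearson coefficients; the table name dataTbl and the assumption that FTr occupies the last column are hypothetical.

```matlab
% Minimal sketch (assumed layout): Pearson correlation among the 14 features
% and FTr, with dataTbl a table whose last column is FTr.
X = table2array(dataTbl);                 % rows = observations, columns = variables
R = corrcoef(X);                          % pairwise Pearson coefficients
heatmap(dataTbl.Properties.VariableNames, ...
        dataTbl.Properties.VariableNames, round(R, 2));  % matrix view as in Fig. 4
rFTr = R(1:end-1, end);                   % correlation of each feature with FTr
```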

Fig. 4
figure 4

Illustration of the correlation matrix for features and labels.

Furthermore, the dataset was normalized before neurocognitive modeling using Eq. 1 to equalize and balance the scale and range of input features, thereby reducing bias and improving computational speed, accuracy, and neurocognitive generalization.

$$X_{n_i}=\frac{X_{r_{vi}}-X_{v_i(\min)}}{X_{v_i(\max)}-X_{v_i(\min)}}\left(X_{h2}-X_{h1}\right)+X_{h1},\qquad 0<X_{n_i}<1$$
(1)

where Xni, Xrvi, Xvi(min), and Xvi(max) are the normalized, raw, lowest, and highest values of input variables, while Xh1 and Xh2 are set to 0 and 1, respectively26,27.
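A minimal MATLAB sketch of the scaling in Eq. 1 is shown below; X is assumed to be an n-by-14 matrix of raw feature values, and the built-in mapminmax call is an equivalent alternative.

```matlab
% Minimal sketch of Eq. (1): min-max normalization of each feature to (0, 1).
Xmin = min(X, [], 1);                        % per-feature minimum
Xmax = max(X, [], 1);                        % per-feature maximum
Xh1  = 0;  Xh2 = 1;                          % target range limits
Xn   = (X - Xmin) ./ (Xmax - Xmin) * (Xh2 - Xh1) + Xh1;

% Equivalent built-in (mapminmax scales row-wise, hence the transposes):
Xn2 = mapminmax(X', 0, 1)';
```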

Cosine amplitude sensitivity analysis

The nonlinear cosine amplitude sensitivity indexing and analysis approach was adopted to assess the relative influence of the nonlinear soil-machine variables on the dynamic traction responses for the most accurate ANN and DNN models. The cosine amplitude nonlinear sensitivity indices were established using Eq. 228.

$$SA_{ij}=\frac{\sum_{k=1}^{m} a_{ik}\, a_{jk}}{\sqrt{\sum_{k=1}^{m} a_{ik}^{2}\;\sum_{k=1}^{m} a_{jk}^{2}}}$$
(2)

where SAij is the parametric sensitivity strength, aik is the model input variable, and ajk is the predicted output. The nonlinear cosine amplitude sensitivity (SAij) is illustrated in Fig. 5(a-f), and the analysis is summarized in Table 2. Results showed that all variables had a strong and explicative influence on FTr (SAij ≥ 0.85) when considering the entire database for the best ANN trainbr and DNN trainlm models (Fig. 5e-f). However, FTr was most significantly influenced by draft force (FD) (SAij = 0.9960 training, 0.9968 testing), followed by Tdepth (SAij = 0.9840 training, 0.9850 testing) and ØFuel (SAij = 0.9838 training, 0.9796 testing), while PTire had the lowest effect (SAij = 0.8540 training, 0.8160 testing) for the best ANN trainbr (14-72-1) model (Fig. 5a-b). A similar trend was observed (Fig. 5c-d) for the multi-layered DNN trainlm (14-7-5-1) model: FD (SAij = 0.9961 training, 0.9970 testing) followed by Tdepth (SAij = 0.9853 training, 0.9837 testing), while PTire had the least effect (SAij = 0.8713 training, 0.7798 testing). A similar trend was also observed for the entire database in both the ANN trainbr (0.99610, 0.98417, 0.84835) and DNN trainlm (0.99611, 0.98419, 0.84836) models, as shown in Fig. 5(e) and (f), respectively. The SAij values range from 0 to 1, indicating the extent to which variables influenced the predicted FTr; values approaching 1 demonstrate the highest strength, as reported in previous studies23,29.
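The sensitivity indices in Table 2 can be reproduced with a few lines of MATLAB; the sketch below assumes X holds the normalized input features (one column per variable) and y the predicted FTr, both non-negative.

```matlab
% Minimal sketch of Eq. (2): cosine amplitude sensitivity of each feature.
nFeat = size(X, 2);
SA = zeros(nFeat, 1);
for i = 1:nFeat
    SA(i) = sum(X(:, i) .* y) / sqrt(sum(X(:, i).^2) * sum(y.^2));
end
[~, order] = sort(SA, 'descend');   % rank features by sensitivity strength
```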

Fig. 5
figure 5

Illustration of the feature importance for ANN (a, b, e for training, testing, and overall databases) and DNN (c, d, f for training, testing, and overall databases) models.

Table 2 Summary of sensitivity strength of the best ANN and DNN models.

Development of computational approaches

Dataset variables were imported into MATLAB (R2024a), running on macOS (version 14.7.4) with an Intel Core i7 processor at 2.5 GHz and 32 GB of RAM. All neurocognitive algorithms for both the DNN and ANN models were executed using custom MATLAB code commanding the neurocomputing architectures to predict FTr. The 14 experimental variables were sequentially subjected to all 72 DNN and ANN architecture topologies, learning on the logsig, tansig, and purelin neuro-activation functions for all 12 neurocognitive algorithms to predict FTr, with the results logged for each configuration. The algorithms included trainlm, trainscg, trainbfg, traincgb, trainoss, traingdm, traincgf, traingd, traincgp, trainbr, traingdx, and trainrp. For each case, the number of hidden layers and neurons was meta-heuristically tuned while targeting 50,000 epochs. The algorithms were left to optimize the remaining hyperparameters, including the optimal epoch size, training time, convergence rate, learning rate, gradient, weights, biases, and the Marquardt weight update parameter (µ). During tuning, neuron weights and biases were updated according to each algorithm's criterion, learning sequentially on the logsig, tansig, and purelin neuro-activation functions to achieve neurocognitive convergence. For every architecture, the number of hidden layers and neurons that predicted the output FTr with the lowest mean square error of convergence and epoch optimality was identified for further evaluation. The optimal DNN and ANN neurocognitive models were then established using a broad set of evaluation metrics covering accuracy, reliability, and robustness.
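A minimal sketch of this architecture search, assuming the normalized feature matrix Xn (n-by-14) and target vector FTr (n-by-1), is given below; the candidate neuron counts and the use of fitnet with a single hidden layer are illustrative rather than the exact code used in the study.

```matlab
% Minimal sketch (assumed setup): train a feed-forward network for each
% backpropagation algorithm and hidden-layer size, keeping the lowest-MSE model.
algs = {'trainlm','trainbr','trainscg','trainbfg','traincgb','trainoss', ...
        'traingdm','traincgf','traingd','traincgp','traingdx','trainrp'};
bestMSE = inf;
for a = 1:numel(algs)
    for h = 8:8:72                              % illustrative hidden-neuron counts
        net = fitnet(h, algs{a});               % one hidden layer
        net.layers{1}.transferFcn = 'tansig';   % hidden activation
        net.layers{2}.transferFcn = 'purelin';  % linear output
        net.trainParam.epochs = 50000;          % epoch target used in the study
        net.trainParam.showWindow = false;
        net = train(net, Xn', FTr');            % columns = samples
        mseAll = perform(net, FTr', net(Xn'));  % MSE over the whole dataset
        if mseAll < bestMSE
            bestMSE = mseAll;  bestNet = net;  bestCfg = {algs{a}, h};
        end
    end
end
```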

Neurocomputing intelligence of deep neural networks and artificial neural network algorithms

Deep neural network and ANN models utilize computational layers of structured units called neurons to learn complex nonlinearities and establish weighted relationships among large datasets. Neurocomputing algorithms iteratively adjust the weighting of each hidden neuron, and the output of the previous layer serves as input to the succeeding neuron layers, thereby predicting the new output30. In DNN models, the number of hidden layers and neurons is incremented heuristically until the desired output is predicted at the optimal epoch size (number of training cycles) with the lowest convergence error. Whereas DNN models have multiple hidden neuron layers through which the data is propagated, ANN models transform the data through one input layer, one hidden layer, and an output layer. Further, DNN models integrate more sophisticated ANNs with complex architectures to achieve higher levels of backpropagation inference and abstraction within datasets. The neurocognitive architecture of DNN models contains at least two hidden layers, whereas an ANN has at most three layers (input, one hidden, and output), as depicted in Fig. 6. In this study, tillage datasets obtained from 80 triplicated (240) experimental sites were used to develop DNN and ANN models for predicting FTr in situ using the 14 soil-machine variables.

Fig. 6
figure 6

Illustration of the neurocognitive architecture of (a) deep neural network and (b) artificial neural network models.

In recent years, AI algorithms have been deployed to train DNN and ANN models to solve complex, nonlinear agricultural problems. These algorithms learn from the available data using neuro-activation functions and predict the targeted output through iterative feed-forward backpropagation inference23. Some of the most common artificial neurocognitive algorithms include Levenberg-Marquardt (trainlm), Scaled conjugate gradient (trainscg), Quasi-Newton (trainbfg), Powell-Beale conjugate gradient (traincgb), One-step secant (trainoss), Gradient descent momentum (traingdm), Fletcher-Reeves conjugate gradient (traincgf), Gradient descent (traingd), Polak-Ribiére conjugate gradient (traincgp), Bayesian regularization (trainbr), Learning rate gradient descent (traingdx), and Resilient backpropagation (trainrp). These algorithms iteratively adjust the weighted neuron connections to minimize the differences between predictions and their targeted outputs in each neuron layer. For instance, traingd updates the weights and bias values in the direction of the negative gradient according to gradient descent learning, calculating the change dX in the weights and biases from the performance derivative dperf using the following equation31:

$$dX = lr \times \frac{dperf}{dX}$$
(3)

Where lr is the learning rate and dperf is the performance derivative. traingdx combines adaptive learning rates with momentum training32 and the previous change (dXprev) in weight or bias33, using Eq. 4:

$$dX = mc \times dX_{prev} + lr \times mc \times \frac{dperf}{dX}$$
(4)

Where mc is the momentum constant. In contrast, traingdm updates the weights and biases (Eq. 5) according to gradient descent with momentum34:

$$dX = mc \times dX_{prev} + lr\,(1 - mc) \times \frac{dperf}{dX}$$
(5)

Further, traincgf iteratively searches along the steepest descent (negative gradient) direction and determines the step size that minimizes the function along the conjugate search direction35:

$$X_{w+1} = X_{w} + lr \times d_{k}$$
(6)

Where Xw+1 and Xw are the updated and current neuron weight vectors, respectively, and dk is the current search direction. A new search direction, dk+1, conjugate to the previous one, is then determined by combining the new steepest descent direction with the previous search direction:

$$d_{k+1} = -dX_{k} + \beta_{k}\, d_{k-1}$$
(7)

Where dXk is the kth gradient, and βk is the Fletcher-Reeves update constant of traincgf, obtained as35:

$$\beta_{k} = \frac{dX_{k}^{T}\, dX_{k}}{dX_{k-1}^{T}\, dX_{k-1}}$$
(8)

Where βk is the ratio of the current squared gradient to the previous squared gradient. Each variable is then updated by the traincgf function33:

$$X = X + a \times d_{k}$$
(9)

Where the parameter a is selected to control performance along dk. The traincgp algorithm obtains the constant βk by dividing the inner product of the previous gradient change and the current gradient by the square of the previous gradient36:

$$\beta_{k} = \frac{\Delta dX_{k-1}^{T}\, dX_{k}}{dX_{k-1}^{T}\, dX_{k-1}}$$
(10)

traincgb periodically resets the search direction to the negative gradient whenever the number of iterations equals the number of network parameters, and restarts training if the current and previous gradients lack orthogonality34. This condition improves the training efficiency of conjugate gradient algorithms and is tested using the inequality in Eq. 11, which, if satisfied, resets the search direction to the negative gradient37:

$$\left|d_{k-1}^{T}\, d_{k}\right| \ge 0.2\, \left\| d_{k} \right\|^{2}$$
(11)

trainscg was developed to avoid the computationally expensive and time-consuming line searches performed at every iteration of conventional conjugate gradient training. It combines the model-trust-region and conjugate gradient approaches, thereby establishing a quadratic approximation Eqw(y) of the error E within the vicinity of a point w, together with its critical points38:

$$E_{qw}(y) = E(w) + E'(w)^{T} y + \frac{1}{2}\, y^{T} E''(w)\, y$$
(12)

Compared with trainscg, trainbfg provides less time-consuming optimization and faster convergence using the quasi-Newton weight update method39:

$$X_{w+1} = X_{w} - A_{k}^{-1}\, dX_{k}$$
(13)

Where Ak is the Hessian matrix of the performance index at the current values of the weights and biases. Meanwhile, trainoss updates the neuron weights and biases according to the one-step secant method, which does not store the complete Hessian matrix and attempts to bridge the gap between the conjugate gradient and quasi-Newton algorithms32. It assumes the previous Hessian at every iteration to be the identity matrix and calculates the new search direction without computing the matrix inverse. trainoss computes the weight and bias change (dM) using the gradient (gM), the previous iteration step (Mstep), the change in gM over the prior iteration (dgM), and their respective scalar products Ac and Bc33:

$$d_{M} = -gM + Ac \times M_{step} + Bc \times dgM$$
(14)

Although trainlm is the fastest backpropagation algorithm, it requires high computational memory40 to compute the Jacobian matrix J, which contains the first derivatives of the network errors with respect to the weights and biases and is less complex to compute than the Hessian matrix. The gradient is obtained using Eq. 15:

$$dX = J^{T} \times e$$
(15)

Where e is the vector of network errors. trainlm also uses an approximation to the Hessian matrix, with the identity matrix I and damping factor µ, to update the neuron weights41:

$$X_{w+1} = X_{w} - \left(J^{T} J + \mu I\right)^{-1} J^{T} e$$
(16)

trainbr updates the neuron weights and biases through Bayesian regularization, which determines and minimizes the optimal combination of squared errors and squared weights to produce a well-generalizing network. It computes the Jacobian jX of the performance with respect to the weight and bias variables, with the regularization parameters treated as random variables with assumed prior distributions35:

$$jj = jX \cdot jX;\qquad je = jX \cdot E;\qquad dX = -\left(jj + I \cdot mu\right)^{-1} je$$
(17)

Where E is the total error. trainrp updates the neuron weights and biases through resilient backpropagation, which eliminates the harmful effect of the magnitude of the partial derivatives, so that the sign of the derivative determines the direction of the weight update (dwjk) for each connection42:

$$dw_{jk}(m) = a \times X_{j}(m) \times \delta_{k}(m)$$
(18)

Where a is the learning rate, Xj(m) is the backpropagated input at the jth neuron at time step m, and δk(m) is the error gradient. These updates remain unchanged if the derivatives converge to zero. The DNN and ANN training algorithms reviewed in the literature and adopted in the present study are summarized in Table 3.
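For reference, the hyperparameters appearing in Eqs. 3–17 map directly onto fields of the MATLAB network object; the sketch below is illustrative, with arbitrary example values rather than the tuned settings of this study.

```matlab
% Minimal sketch: setting the hyperparameters of Eqs. (3)-(17) before training
% (values are illustrative, not those tuned in the study).
net = feedforwardnet([7 5], 'trainlm');   % two hidden layers, Levenberg-Marquardt
net.trainParam.mu     = 1e-3;             % initial damping factor mu in Eq. (16)
net.trainParam.mu_dec = 0.1;              % mu decrease factor
net.trainParam.mu_inc = 10;               % mu increase factor

net2 = feedforwardnet(72, 'traingdx');    % gradient descent with momentum and
net2.trainParam.lr = 0.01;                % adaptive learning rate lr in Eq. (4)
net2.trainParam.mc = 0.9;                 % momentum constant mc in Eq. (4)
```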

Table 3 Comparison and advantages of backpropagation algorithms used in neurocognitive modeling.

Neuro-transfer functions of deep learning and artificial neural networks

Artificial Intelligence (AI) algorithms train DNN and ANN models on empirical data supplied to the input layer to generate prediction outputs. Neurons in the first and last layers represent the inputs and outputs, respectively, and are interconnected through one or more hidden layers of neurons (nodes). A neuron output (zt) is defined by the relationship between inputs and outputs through an activation function18. Considering an activation function φ(t) in Eq. 19, this relationship can be expressed using Eq. 2043.

$$\:{\varvec{z}}_{\varvec{t}}=\varvec{\phi\:}\left({\varvec{t}}_{\varvec{x}}\right)$$
(19)
$$\:{\varvec{t}}_{\varvec{x}}={\sum\:}_{\varvec{k}=1}^{\varvec{n}}{\varvec{w}}_{\varvec{x}\varvec{k}}{\varvec{y}}_{\varvec{k}}+{\varvec{b}}_{\varvec{x}}$$
(20)

Where n is the number of inputs, w is the weighted connection between neurons x and k, y is the input from neuron node k, and bx is the bias, respectively. This summation is processed through the neuron transfer function, φ(tx), to generate the output43:

$$\:\varvec{\phi\:}\left({\varvec{t}}_{\varvec{x}}\right)=\varvec{\phi\:}\left[\left({\sum\:}_{\varvec{k}=1}^{\varvec{n}}{\varvec{w}}_{\varvec{x}\varvec{k}}{\varvec{y}}_{\varvec{k}}\right)+{\varvec{b}}_{\varvec{x}}\right]$$
(21)

Activation functions, expressed by φ(t), define the output of a neuron in terms of the induced local field. The input data is processed by the neuron activation functions associated with the weighted connections, which adjust iteratively to reduce the differences between predicted and target values by optimizing the weights. The feed-forward weight adjustments proceed until the maximum number of predefined epochs is reached or the specified error limits are met. In this study, a combination of the logsig (log-sigmoid), purelin (linear), and tansig (hyperbolic tangent sigmoid) neuron activation functions was deployed (Eqs. 22–24):

$$\phi(t)=\frac{1}{1+e^{-t}}\qquad \text{for } 0 \le z_x \le 1$$
(22)
$$\phi(t)=t\qquad \text{for } -\infty \le z_x \le +\infty$$
(23)
$$\phi(t)=\frac{2}{1+e^{-2t}}-1\qquad \text{for } -1 \le z_x \le +1$$
(24)
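The three neuro-activation functions of Eqs. 22–24 can be written as simple anonymous functions that reproduce the toolbox implementations, as in the following minimal sketch.

```matlab
% Minimal sketch of the activation functions in Eqs. (22)-(24).
logsig_f  = @(t) 1 ./ (1 + exp(-t));        % Eq. (22): output in (0, 1)
purelin_f = @(t) t;                         % Eq. (23): unbounded linear output
tansig_f  = @(t) 2 ./ (1 + exp(-2*t)) - 1;  % Eq. (24): output in (-1, 1)

t = linspace(-5, 5, 11);
max(abs(tansig_f(t) - tansig(t)))           % agrees with the built-in tansig
```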

Feed-forward backpropagation in deep learning and artificial neural networks

Artificial neurocognitive algorithms based on feed-forward backpropagation (FFBP) perform computations through the network with an error backpropagation function44:

$$P_{Error}=\frac{1}{x}\sum_{x}\sum_{p}\left(n_{xk}-z_{pk}\right)$$
(25)

Where PError is the propagated error, x is the indexed training set, p is the index of various neuron outputs, nxk is the kth element of the desired xth model, and zpk is the kth element of the predicted neuron outputs. Upon determining errors, backpropagation algorithms adjust the neuron weights iteratively using an expression that minimizes the total error to the lowest acceptable levels. Each neuron-weighted factor changes throughout the FFBP training process until the error function reaches a minimum. Considering the ith iteration, the weights can be adjusted44:

$$w_{ij}(t+1)=w_{ij}(t)+\mu\,\Delta w-\eta\left(\frac{\partial E}{\partial w_{ij}}\right)\qquad \text{for } 0<\mu<1 \;\&\; 0<\eta<1$$
(26)

Where µ, Δw, and η are the momentum, previous layer weight change, and learning rate, respectively. Considering the layer number and output vector, the weight adjustment can be expressed45:

$$\:{\varvec{w}}_{{\varvec{J}}_{(\varvec{L}-1)}}{\varvec{h}}_{\varvec{L}}(\varvec{t}+1)={\varvec{w}}_{{\varvec{j}}_{(\varvec{L}-1)}}{\varvec{h}}_{\varvec{L}}\left(\varvec{t}\right)+\varvec{\mu\:}\left[{\varvec{w}}_{{\varvec{j}}_{(\varvec{L}-1)}}{\varvec{h}}_{\varvec{L}}\left(\varvec{t}\right)-{\varvec{w}}_{{\varvec{j}}_{(\varvec{L}-1)}}{\varvec{h}}_{\varvec{L}}(\varvec{t}-1)\right]+\varvec{\eta\:}{\varvec{\delta\:}}_{{\varvec{h}}_{\varvec{L}}}^{\varvec{k}}{\varvec{x}}_{{\varvec{j}}_{(\varvec{L}-1)}}^{\varvec{k}}$$
(27)

Where L and xk are the layer number and output vector, respectively. During training, the algorithms adjust the biases and neuron weights to minimize the errors between the actual input data and the neuron prediction output. Once the neuron architecture is trained and the neuron weights are specified, it can be validated on new datasets. Although DNN tends to provide more accurate results with large datasets, certain neurocognitive algorithms may learn better from specific datasets and perform better with certain neuro-activation functions in ANN than in DNN, and vice versa. A comparison of neurocognitive models and the corresponding algorithms utilized in previous agricultural operations is presented in Table 4.
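A minimal sketch of the momentum-based weight update in Eq. 26 for a single weight matrix is shown below; the weight matrix W, its error gradient dEdW, and the chosen learning rate and momentum values are hypothetical.

```matlab
% Minimal sketch of Eq. (26): one FFBP weight update with momentum.
W    = randn(5, 3);                       % hypothetical weight matrix
dEdW = randn(5, 3);                       % hypothetical error gradient dE/dW
eta  = 0.01;  mu = 0.9;                   % learning rate and momentum (illustrative)

dW_prev = zeros(size(W));                 % previous weight change
dW = mu * dW_prev - eta * dEdW;           % momentum term plus negative-gradient step
W  = W + dW;                              % updated weights for this iteration
dW_prev = dW;                             % stored for the next iteration
```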

Table 4 Summary of tillage studies associated with deep learning and artificial neural network modeling.

ANN model

The MATLAB simulation of the neurocognitive architecture for the most accurate and optimal ANN model is presented in Fig. 7. Metaheuristic evaluation of the ANN models indicated that the optimal ANN model comprised a single-layered architecture with 72 neurons learning on the tansig transfer function in the hidden layer and purelin in the output layer, as shown in Fig. 7. The hyperparameter details of the neurocomputing architecture of the ANN model are presented in Table 5, and the neurocognitive model equation takes the form of Eq. 28.

$$\:{\varvec{F}}_{\varvec{T}\varvec{r}}=\varvec{p}\varvec{u}\varvec{r}\varvec{e}\varvec{l}\varvec{i}\varvec{n}\left\{\varvec{t}\varvec{a}\varvec{n}\varvec{s}\varvec{i}\varvec{g}\left(\varvec{W}\bullet\:\varvec{X}+\varvec{b}\right)\right\}$$
(28)

Where W is the neuron weight, while X and b represent the input and bias, respectively.
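Once trained, Eq. 28 can be evaluated directly from the weights and biases stored in the network object, as in the minimal sketch below; it assumes a trained 14-72-1 network net (e.g., from the earlier search sketch) and an input x already scaled to the range used during training, since the toolbox otherwise applies its own pre- and post-processing inside net(x).

```matlab
% Minimal sketch of Eq. (28): manual forward pass of the 14-72-1 ANN.
x  = rand(14, 1);                     % hypothetical normalized input vector
IW = net.IW{1,1};  b1 = net.b{1};     % 72-by-14 input weights, hidden biases
LW = net.LW{2,1};  b2 = net.b{2};     % 1-by-72 layer weights, output bias

h    = tansig(IW * x + b1);           % hidden-layer activations
FTrP = purelin(LW * h + b2);          % predicted tractive force per Eq. (28)
```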

Fig. 7
figure 7

Illustration of artificial neural network model: (a) MATLAB simulator and (b) traction prediction neurocognitive architecture.

Table 5 Summary of hyperparameter configuration of the ANN model.

The neurocognitive model equation of the single-layered ANN model (14-72-1) takes into account all the input variables, neuron weights, hidden neurons, layers, and biases to predict the FTr output, as shown in Eq. 29. The characteristic values of the neuron weights and biases associated with the ANN model equation are shown in Table 6.

$$F_{Tr}=\sum_{i=1}^{72} v_{i}\,\tanh\big(w_{i1}R_{depth}+w_{i2}{\mathcal{O}}_{Fuel}+w_{i3}F_{D}+w_{i4}W_{Load}+w_{i5}P_{Tire}+w_{i6}T_{depth}+w_{i7}CI_{Soil}+w_{i8}\tau_{Shear}+w_{i9}\theta_{Soil}+w_{i10}\gamma_{Soil}+w_{i11}I_{PSoil}+w_{i12}S_{Wheel}+w_{i13}F_{Rr}+w_{i14}A_{Stc}+b_{i}\big)+k$$
(29)
Table 6 Neurocognitive bias and neuron weights associated with the ANN model.

DNN model

The optimal deep learning model utilized the tansig and logsig neuro-activation functions in its multiple hidden layers, while the output layer was trained on the purelin function. Figure 8 shows the MATLAB simulation of the best DNN model architecture. The DNN model utilized the neurocomputing form of Eq. 30 within the neurocognitive architecture to predict FTr.

$$\:{\varvec{F}}_{\varvec{T}\varvec{r}}=\varvec{p}\varvec{u}\varvec{r}\varvec{e}\varvec{l}\varvec{i}\varvec{n}\left[\varvec{l}\varvec{o}\varvec{g}\varvec{s}\varvec{i}\varvec{g}\left\{\varvec{t}\varvec{a}\varvec{n}\varvec{s}\varvec{i}\varvec{g}\left({\varvec{W}}_{1}\bullet\:\varvec{X}+{\varvec{b}}_{1}\right)\bullet\:{\varvec{W}}_{2}+{\varvec{b}}_{2}\right\}\right]$$
(30)

Where W1 and W2 are the neuron weights of the first and second hidden layers, respectively, while b1 and b2 are the corresponding neuron biases. The model architecture utilized four layer interconnections with two hidden layers, comprising seven neurons in the first hidden layer and five in the second, to predict the output layer (FTr). The DNN model was meta-heuristically configured (Fig. 8), and the hyperparameters of the model architecture are shown in Table 7.
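A minimal sketch of the corresponding forward pass through the 14-7-5-1 architecture is given below; the weight matrices, biases, and input are filled with random stand-ins here purely for illustration, in place of the trained values reported in Table 8.

```matlab
% Minimal sketch of Eq. (30): forward pass of the 14-7-5-1 DNN.
x  = rand(14, 1);                     % hypothetical normalized input vector
W1 = randn(7, 14);  b1 = randn(7, 1); % stand-ins for the first hidden layer
W2 = randn(5, 7);   b2 = randn(5, 1); % stand-ins for the second hidden layer
W3 = randn(1, 5);   b3 = randn;       % stand-ins for the output layer

h1   = tansig(W1 * x + b1);           % first hidden layer, tansig activation
h2   = logsig(W2 * h1 + b2);          % second hidden layer, logsig activation
FTrP = purelin(W3 * h2 + b3);         % linear output layer: predicted FTr
```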

Fig. 8
figure 8

Illustration of deep learning model: (a) MATLAB simulator and (b) traction prediction neurocognitive architecture.

Table 7 Summary of hyperparameter configuration of deep neural network model.

The neurocognitive equation of the DNN trainlm (14-7-5-1) model predicts the FTr output by propagating the respective tillage variable inputs through their corresponding neurons, adjustable neuron weights, hidden layers, and biases, as shown in Eq. 31. The corresponding values of the neuron weights and biases for each layer are shown in Table 8.

$$F_{Tr}=\sum_{j=1}^{7}\sum_{k=1}^{5}\sum_{L=1}^{1} w_{kL}\left\{\frac{2}{1+e^{-2\left(w_{i1}R_{depth}+w_{i2}{\mathcal{O}}_{Fuel}+w_{i3}F_{D}+w_{i4}W_{Load}+w_{i5}P_{Tire}+w_{i6}T_{depth}+w_{i7}CI_{Soil}+w_{i8}\tau_{Shear}+w_{i9}\theta_{Soil}+w_{i10}\gamma_{Soil}+w_{i11}I_{PSoil}+w_{i12}S_{Wheel}+w_{i13}F_{Rr}+w_{i14}A_{Stc}+b_{j}\right)}}+\left(w_{jk}x_{2}+b_{k}\right)\right\}-1+b_{L}$$
(31)
Table 8 Neurocognitive biases and neuron weights associated with the deep neural network model.

Modeling status and neurocognitive error zeroing

The neurocognitive modeling states of ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1 provide insights into their operational dynamics, as shown in Fig. 9(a-d). First, neurocognitive training of ANN trainbr 14-72-1 converged at a lower perceptron optimality (51 epochs) and mean square error of convergence (2.933e-11), but a higher damping factor (mu) of 500 (Fig. 9a, c), than DNN trainlm (14-7-5-1), as shown in Fig. 9(b, d). The model demonstrated smooth neuron transition states and convergence, likely due to its ability to adjust weight updates effectively through Bayesian priors, ensuring stability in the training process. This characteristic has been validated in studies by Keshun et al.58 and Zhang et al.59, emphasizing the importance of robust priors in neural network optimization for system reliability. On the other hand, the DNN trainlm (14-7-5-1) model exhibited a convergence gradient of 7.849e-8 and a much lower mu factor (1.0e-9) than the ANN trainbr 14-72-1 (2.933e-11 and 500, respectively), indicating precise weight updates and neuron transitions (Fig. 9b and d). These values demonstrate the efficacy of the second-order optimization techniques of the trainlm algorithm in achieving a balanced neurocognitive exploration-exploitation trade-off during training.

Fig. 9
figure 9

Illustration of performance plots for (a, c) ANN trainbr 14-72-1 and (b, d) DNN trainlm 14-7-5-1.

Conversely, the error-zeroing histograms demonstrate distinctive performance strengths of both the ANN trainbr 14-72-1 and DNN trainlm (14-7-5-1) models, as shown in Fig. 10. The ANN 14-72-1 architecture, utilizing Bayesian regularization, converged all instances in a more uniformly distributed manner and stabilized very close to the zero-error line, between −2.1e5 and 1.28e5, showcasing its efficiency in minimizing overfitting. Bayesian approaches, as highlighted by Baumgartner et al.60, emphasize the role of Bayesian regularization in enhancing model robustness, particularly in noisy or complex datasets. The DNN (14-7-5-1) model, based on the trainlm optimization technique, distributed the errors between −0.00094 and 0.000869. Despite its higher convergence error, trainlm excels in handling deep multilayered architectures such as the 14-7-5-1, which require intricate optimization strategies, underscoring the adaptability of gradient-based optimization techniques for DNN applications. Furthermore, unlike the widely distributed error instances in ANN trainbr 14-72-1 and its higher training time (29 s), as reported earlier, almost all the modeling instances in DNN trainlm 14-7-5-1 converged at a single error value (1.44e-05), close to the zero-error line, in a short neurocomputing time (2 s). These findings align with the comprehensive overview by Mienye and Swart61. Furthermore, we hypothesize that models utilizing the trainbr algorithm require more robust and adequate training time to yield accurate results. In contrast, quick results can be obtained from a DNN trained on trainlm, albeit with compromised error limits unless denoising and overfitting costs are incurred.

Fig. 10
figure 10

Illustration of error histogram plots for (a) ANN trainbr 14-72-1 and (b) DNN trainlm 14-7-5-1.

Results and discussion

The neurocognitive performance of the models in predicting FTr was evaluated using broad statistical criteria comprising accuracy and reliability metrics. In addition to the accuracy and reliability metrics, this study employed Taylor analysis, Monte Carlo uncertainty, and the Anderson-Darling test (AD-test) to establish the neurocognitive robustness required for the generalized adoption of the DNN and ANN models. Moreover, nonlinear cosine amplitude sensitivity indexing was employed to determine the relative influence of each soil-machine variable on FTr for the most accurate DNN and ANN models during training and testing, as well as for the entire database. Accuracy metrics were used to evaluate the prediction performance of the DNN and ANN models and were implemented in the source coding environment and console execution interface of the statistical software R, version 4.4.2. These metrics included the Mean Squared Error (MSE), coefficient of determination (R2), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Sum Square Error (SSE), prediction scatter (Tscatter), Coefficient of Variation (CV), and Prediction Accuracy (PA)23:

$$\:\varvec{R}=\frac{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}\varvec{P}}\right)\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}\varvec{A}}\right)}{\sqrt{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}{\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}\varvec{P}}\right)}^{2}\varvec{x}{\sum\:}_{\varvec{i}=1}^{\varvec{n}}{\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}\varvec{A}}\right)}^{2}}}$$
(32)
$$\:\varvec{M}\varvec{S}\varvec{E}={\sum\:}_{\varvec{i}=1}^{\varvec{n}}({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}{)}^{2}$$
(33)
$$\:{\varvec{R}}^{2}=\frac{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{p}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}{)}^{2}}{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}}{)}^{2}}$$
(34)
$$\:\varvec{R}\varvec{M}\varvec{S}\varvec{E}=\sqrt{\frac{1}{\varvec{n}}{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}\right)}^{2}}$$
(35)
$$\:\varvec{S}\varvec{S}\varvec{E}={{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}\right)}^{2}$$
(36)
$$\:\varvec{T}\varvec{S}\varvec{S}\varvec{E}={\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}}\right)$$
(37)
$$\:{\varvec{T}}_{\varvec{S}\varvec{c}\varvec{a}\varvec{t}\varvec{t}\varvec{e}\varvec{r}}=1-\frac{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}\varvec{P}}{)}^{2}}{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}}{)}^{2}}$$
(38)
$$\:\varvec{M}\varvec{A}\varvec{E}=\frac{1}{\varvec{n}}{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left|{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}\right|$$
(39)
$$\:\varvec{M}\varvec{A}\varvec{P}\varvec{E}=\frac{\left({\sum\:}_{\varvec{i}=1}^{\varvec{n}}\frac{\left|{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}\right|}{{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}}\right)}{\varvec{n}}\varvec{x}100$$
(40)
$$\:\varvec{C}\varvec{V}=\frac{\sqrt{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\left(\frac{{\left({\varvec{F}}_{\varvec{T}\varvec{r}\varvec{p}}-{\overline{\varvec{F}}}_{\varvec{T}\varvec{r}}\right)}^{2}}{\varvec{n}-1}\right)}}{\left(\frac{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}}{\varvec{n}}\right)}$$
(41)
$$\:\varvec{P}\varvec{A}=\left[1-\left(\frac{1}{\varvec{n}}{\sum\:}_{\varvec{i}=1}^{\varvec{n}}\frac{\left|{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}-{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{P}}\right|}{{\varvec{F}}_{\varvec{T}\varvec{r}\varvec{A}}}\right)\right]\varvec{x}100$$
(42)

Where n is the number of data points, FTrP and \(\:{\stackrel{-}{F}}_{Tr}\) are the predicted FTr and its corresponding mean, FTrA and \(\:{\stackrel{-}{F}}_{TrA}\) represent the actual experimental FTr and its mean, and FTri is the ith FTr, respectively. Moreover, the neurocognitive reliability of the model predictions was assessed using the a20-index (a20), Willmott's index of agreement (IOA), index of scatter (IOS), variance accounted for (VAF), and performance index (PI)29,62:

$$\:\varvec{a}20-\varvec{i}\varvec{n}\varvec{d}\varvec{e}\varvec{x}=\frac{\varvec{m}20}{\varvec{M}}$$
(43)
$$\:\varvec{I}\varvec{O}\varvec{S}=\frac{\varvec{R}\varvec{M}\varvec{S}\varvec{E}}{{{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}}$$
(44)
$$\:\varvec{I}\varvec{O}\varvec{A}=1-\left[\frac{{\sum\:}_{\varvec{i}=1}^{\varvec{n}}{\left({{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{P}}-{{\varvec{F}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}\right)}^{2}}{\sum\:_{\varvec{i}=1}^{\varvec{n}}{\left\{\left|\left({{\varvec{F}}_{\varvec{T}\varvec{r}}}_{\varvec{P}}-{{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}\right)\right|+\left|\left({{\varvec{F}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}-{{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}\right)\right|\right\}}^{2}}\right]$$
(45)
$$\:\varvec{V}\varvec{A}\varvec{F}=\left[1-\frac{\varvec{v}\varvec{a}\varvec{r}\left({{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}-{{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{P}}\right)}{\varvec{v}\varvec{a}\varvec{r}\left({{\stackrel{-}{\varvec{F}}}_{\varvec{T}\varvec{r}}}_{\varvec{A}}\right)}\right]\varvec{x}100$$
(46)
$$\:\varvec{P}\varvec{I}={\varvec{R}}^{2}+0.01\varvec{x}\varvec{V}\varvec{A}\varvec{F}-\varvec{R}\varvec{M}\varvec{S}\varvec{E}$$
(47)

Where m20 is the number of samples whose actual-to-predicted ratio lies between 0.8 and 1.2, M is the total number of samples, and n and p indicate the total number of data samples and inputs, respectively. Further, Wilcoxon rank-sum indexing was adopted to establish the effective reliability of the DNN and ANN models by comparing a nonparametric Wilcoxon rank-sum test score index, obtained by ranking the accuracy and reliability metrics of each ANN and DNN model in order of increasing value and assigning a rank number (ϑscore) to every metric. The sum of ranks for each model was calculated to establish the respective total Wilcoxon rank-sum test score (ξtotal) for comparison. The individual ϑscore ranks for the accuracy and reliability metrics were obtained for both the training and testing phases, and ξtotal was then established by summing the metric ranks over the two modeling phases. The model with the highest accuracy and reliability metrics received a greater ξtotal, yielding a larger Wilcoxon rank-sum statistic index (Eq. 48). Consequently, the highest value of ξtotal represents the optimal ANN and DNN model architecture.

$$\:{\varvec{\xi\:}}_{\varvec{t}\varvec{o}\varvec{t}\varvec{a}\varvec{l}}=\left[\sum\:_{\varvec{i}=1}^{\varvec{m}}{\varvec{\xi\:}}_{\varvec{i}}+\sum\:_{\varvec{j}=1}^{\varvec{n}}{\varvec{\xi\:}}_{\varvec{j}}\right]$$
(48)

Where ξi and ξj are the metric rank scores during training and testing, respectively, while m and n are the corresponding numbers of ϑscore values in the respective modeling phases. Table 9 presents the ideal values of the performance metrics.
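The rank-sum scoring of Eq. 48 can be sketched in MATLAB as follows; metricTable is a hypothetical models-by-metrics matrix in which larger entries are assumed to indicate better performance (error-type metrics would first be negated or inverted).

```matlab
% Minimal sketch of Eq. (48): Wilcoxon rank-sum scoring of the candidate models.
metricTable = rand(12, 9);             % hypothetical 12 models x 9 metrics
[nModels, nMetrics] = size(metricTable);
ranks = zeros(nModels, nMetrics);
for j = 1:nMetrics
    [~, order] = sort(metricTable(:, j), 'ascend');  % lowest value gets rank 1
    ranks(order, j) = (1:nModels)';
end
xiTotal = sum(ranks, 2);               % total rank-sum score per model
[~, best] = max(xiTotal);              % the highest score indicates the optimal model
```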

Table 9 Ideal values of the performance metrics.

The selection of multiple statistical and error-based performance metrics, such as Mean Squared Error (MSE), Coefficient of Determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Sum Square Error (SSE), Prediction Scatter (Tscatter), Coefficient of Variation (CV), Prediction Accuracy (PA), a20-index (a20), Willmott’s Index of Agreement (IOA), Index of Scatter (IOS), Variance Accounted For (VAF), and Performance Index (PI), is crucial for comprehensive model evaluation and validation. Each metric captures different aspects of model behavior: MSE, RMSE, MAE, and MAPE quantify the magnitude and type of prediction errors; R² and VAF assess how well the model explains the variance in observed data; SSE measures the total deviation from actual values; CV standardizes the error relative to the mean; PA and a20-index evaluate classification and proximity-based accuracy; IOA and IOS focus on agreement and dispersion between predicted and observed values; while PI combines multiple error components into a single composite score.
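For completeness, a minimal MATLAB sketch of several of these metrics is given below, using their conventional definitions (note that Eq. 33 in the text defines MSE without the 1/n factor, i.e., as the SSE); FTrA and FTrP are the actual and predicted tractive force vectors.

```matlab
% Minimal sketch of selected accuracy metrics (conventional forms).
err  = FTrA - FTrP;
MSE  = mean(err.^2);
RMSE = sqrt(MSE);
MAE  = mean(abs(err));
MAPE = mean(abs(err) ./ FTrA) * 100;
SSE  = sum(err.^2);
R2   = 1 - SSE / sum((FTrA - mean(FTrA)).^2);
PA   = 100 - MAPE;                    % prediction accuracy, cf. Eq. (42)
```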

Simulation of results

Tables 10 and 11 present the accuracy metrics for all 72 DNN and ANN models, including their training and testing performance and a comparison between them. The most accurate models possessed the lowest MSE, RMSE, SSE, TSSE, MAE, MAPE, and CV, together with the highest R, R², Tscatter, and PA, as reported by Jierula et al.63. Considering the overall accuracy metrics, the ANN trainbr (14-72-1) and DNN trainlm (14-7-5-1) models were the most accurate and were therefore adopted for further evaluation. However, DNN trainlm (14-7-5-1) required more epochs to reach optimality (55) but less neurocomputing time (2 s) than ANN trainbr (14-72-1), which converged in 51 epochs and 29 s. Although the optimal number of epochs did not scale with training time, the number of hidden layers scaled with epoch size in DNN but not in ANN. The number of hidden layers was also unrelated to prediction accuracy in ANN but scaled with it in DNN, whereas the number of hidden neurons scaled with accuracy in ANN but not in DNN across the hidden layers. Increasing the number of hidden layers and neurons reduced neurocomputing accuracy in ANN but improved it in DNN. As such, the trainlm and trainbr algorithms were the most accurate for DNN and ANN modeling, respectively.

Table 10 Summary of accuracy metrics of the ANN model in predicting FTr.
Table 11 Summary of accuracy metrics of deep learning models in predicting FTr.

Neurocognitive prediction performance during training, testing, and validation, and with the entire database, is shown in Fig. 11(a-f). The regressed correlation was strong for both models, with ANN trainbr 14-72-1 achieving a marginally stronger correlation than DNN trainlm 14-7-5-1, likely owing to its regularization capabilities, which enhance generalization. Predictive regression in neurocomputing networks is crucial for accuracy and neurocognitive reliability60. Despite a slightly higher MSE, the DNN trainlm 14-7-5-1 model maintained adequate input-output regression (> 0.999) owing to its precise neuron transition dynamics. This observation aligns with previous studies in which second-order methods have been shown to excel in predictive tasks that require complex data relationships, as reported by Mienye and Swart61, Huo et al.64, and Bassiouni et al.65. The complementary data-fitting strengths of the trainbr and trainlm algorithms highlight their potential for complex predictive modeling in diverse agricultural tasks using ANN and DNN models, respectively. The rapid convergence and robust generalization of trainbr make it ideal for scenarios with limited or less noisy data, whereas trainlm suits deeper, more complex architectures that deliver quick results, subject to denoising and a higher backpropagation memory cost. Nonetheless, both the ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1 models exhibited input-output prediction correlations greater than 99.9% (Fig. 11), reaffirming their neurocognitive accuracy in forecasting FTr during tillage.

Fig. 11
figure 11

Illustration of regression of Artificial Neural Network (ANN) trainbr 14-72-1 and Deep Neural Network (DNN) trainlm 14-7-5-1 model predictions during (a, b) training, (c, d) testing, and (e, f) with entire database modeling phases, respectively.

Accuracy metrics approached ideal unity for the most accurate ANN and DNN models, as shown in Fig. 12, indicating that trainbr performed best in the ANN (14-72-1) model, while trainlm was best in the DNN (14-7-5-1) model. During the training and testing phases and for the entire database, all the ANN trainbr models achieved the highest overall unity performance for R, R², and Tscatter (Fig. 12a-c). However, the corresponding metrics of the trainbr algorithm in DNN modeling were less than unity and varied substantially. The best-performing trainbr model (14-72-1) achieved R, R², and Tscatter values of unity in all modeling phases in ANN (Fig. 12a-c). In contrast, all the trainlm neurocomputing models outperformed their trainbr counterparts in DNN for all modeling phases and for the entire database (Fig. 12d-f). In DNN, only the trainlm 14-7-5-1 architecture maintained values close to unity for the tripartite parameters R, R², and Tscatter across all modeling phases, making it the most error-tolerant and superior DNN model (Fig. 12d-f). As such, trainlm algorithms are best suited for training multi-layered DNN models, while trainbr performs best in ANN models. Other studies have demonstrated that ideal neurocomputing models have R, R², and Tscatter values equal to unity66.

Fig. 12
figure 12

Illustration of comparison of ideal accuracy unity for ANN (a, c, e for training, testing, and overall) and DNN (b, d, f for training, testing, and overall) models.

Accuracy metrics for the DNN and ANN models that approach the ideal value of zero are shown in Fig. 13, which depicts the MSE, RMSE, MAE, and MAPE values of the best models trained with the most accurate algorithms (trainbr and trainlm) across all modeling phases. The ANN and DNN models of trainbr and trainlm exhibited the lowest MSE, RMSE, MAE, and MAPE values, which lay close to the zero-error line for all modeling phases. The best neurocomputing models have MSE, RMSE, MAE, and MAPE values closest to zero63,67. Considering all modeling phases and datasets (training, testing, and the entire database) in tandem, trainbr was the most accurate algorithm for ANN modeling, while trainlm was the most precise for DNN models. The ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1 architectures were thus the most accurate in forecasting FTr in situ.

Fig. 13
figure 13

Illustration of convergence error zeroing for ANN (a, c, e for training, testing, and overall) and DNN (b, d, f for training, testing, and overall) models.

The prediction accuracy (PA) of the best ANN and DNN models is shown in Fig. 14. The analysis indicated that three ANN trainbr models (14-72-1, 14-39-1, and 14-13-1) and three DNN trainlm models (14-7-5-1, 14-9-5-1, and 14-9-7-1) achieved PA values close to unity (> 0.95) in all modeling phases (Fig. 14). The PA of the single-layered ANN trainbr models (14-13-1, 14-39-1, 14-72-1) lagged only marginally from unity, whereas the multi-layered DNN trainbr counterparts (14-7-5-1, 14-9-5-1, and 14-9-7-1) diverged widely during training, testing, and for the entire database. The multi-layered DNN trainlm models (14-7-5-1, 14-9-5-1, and 14-9-7-1) performed best in all modeling phases compared with the corresponding DNN trainbr models, which diverged widely from unity. Likewise, the PAs of the single-layered ANN trainbr models lagged only marginally from unity compared with their ANN trainlm counterparts, which diverged widely during testing and for the entire database. The PA of trainlm was thus higher in multi-layered DNN architectures than in single-layered ones, while trainbr performed best in single-layered ANN architectures. Nevertheless, the best single-layered trainbr (14-72-1) model was marginally superior to the best multi-layered trainlm (14-7-5-1) model during training, testing, and with the entire database. These findings are consistent with connectome signalling, which occurs most accurately via the shortest path lengths of highly clustered neurons in the human brain, free from extraneous artifacts such as noise or aliasing68,69.

Fig. 14
figure 14

Illustration of comparison of PA in terms of Radar chart for DNN and ANN modeling phases using trainlm and trainbr algorithms.

The maximum error differences between predictions and actual values for the range of datasets are shown in Fig. 15; Table 12. The residual error characteristic curve generalizes the data points falling within the zero-error tolerance during prediction, with error residuals on the y-axis and the data point error sources on the x-axis. The ANN trainbr model 14-39-1 achieved the smallest magnitude of error residuals during training, testing, and for the entire database (Fig. 15; Table 12). The maximum error limits of ANN trainbr (14-72-1) were closest to the actual zero error line in predicting FTr compared to the best DNN trainlm (14-7-5-1) model.

Fig. 15
figure 15

Illustration of residuals for ANN (a, c, e for training, testing, and overall) and DNN (b, d, f for training, testing, and overall) models in predicting FTr.

Table 12 Maximum error residuals for the best neurocognitive models.

A comparative assessment of ANN trainbr and DNN trainlm model learning and prediction errors during training, testing, and with the entire database is illustrated in Fig. 16. The ANN trainbr 14-72-1 model reported the lowest errors (0.0054, 0.0117, and 0.006), while the DNN trainlm 14-7-5-1 model achieved the lowest DNN errors (0.0065, 0.2226, and 0.065) during training, testing, and with the entire database, respectively.

Fig. 16
figure 16

Illustration of prediction errors for (a) trainbr and (b) trainlm in Artificial Neural Network (ANN) and Deep Learning (DNN) neurocognitive architectures.

Reliability analysis

The reliability metrics of the ANN and DNN models are shown in Fig. 17; Table 13. All ANN trainbr models achieved equal and optimal values of a20 (100%), VAF (100%), PI (2.0), and IOA (1.0). However, ANN trainbr 14-72-1 achieved the lowest IOS values, indicating its superior reliability. Although all the DNN models had equal a20 (100%) and VAF (100%), the DNN trainlm 14-7-5-1 model achieved the highest and ideal PI (2.0) and IOA (1.0) at the lowest IOS (0.0017), making it the most reliable. Thus, the most reliable models were ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1. The overall reliability metrics indicated that the ANN models achieved the highest reliability with trainbr, whereas the DNN models proved most reliable with trainlm (Fig. 17). All corresponding reliability indices were close or equal to the ideal values, confirming the reliability of both DNN and ANN models in forecasting agricultural traction.

Table 13 Reliability metrics of ANN and DNN models.
Fig. 17
figure 17

Illustration of comparison of reliability indices for ANN and DNN in trainbr (a, c) and trainlm (b, d).

Wilcoxon rank analysis

The ξtotal computed for the ANN and DNN models is shown in Fig. 18(a), indicating that the ANN models achieved a higher ξtotal with single-layered neuron architectures, whereas the DNN models achieved the highest ξtotal with multi-layered architectures. The DNN models scored higher with trainlm than with trainbr, where single-layered ANN models were superior (Fig. 18b and c; Table 14). Nonetheless, the best single-layered ANN trainbr model (14-72-1) achieved an overall score of 240, outperforming the best DNN trainlm model, which scored 196 (Table 14). Figure 18b and c illustrate that ANN trainbr 14-72-1 achieved the highest score ranking during training and testing, while trainlm 14-7-5-1 achieved the highest scores in DNN. Hence, considering the two best-performing algorithms and neurocognitive architectures, the single-layered trainbr 14-72-1 and multi-layered trainlm 14-7-5-1 models proved the most reliable for predicting FTr in tillage using ANN and DNN strategies, respectively.

Table 14 Score rank indices for the best-performing models.
Fig. 18
figure 18

Illustration of score analysis in terms of a radar plot for (a) combined ANN and DNN and ξtotal of (b) trainbr and (c) trainlm models.

Visual interpretation of model capabilities

Taylor plot

The Taylor method was employed to simultaneously assess multiple statistical metrics and quantify the extent to which the ANN and DNN neurocomputing predictions matched the experimental datasets. The statistical software R (version 4.4.2) was used to generate the Taylor diagrams owing to its cross-platform flexibility and support for visualizing multiple model performance metrics. Taylor plots were constructed by integrating the goodness-of-fit metrics R and RMSE with the standard deviation (σ) of the best ANN and DNN models during training, testing, and for the entire database. During the Taylor analysis, RTaylor values, which describe the degree of correspondence between the model simulations (s) and the reference data (r), were evaluated using Eq. 49 (ref. 70).

$$R_{Taylor}=\frac{\frac{1}{N-1}\sum_{n=1}^{N}\left(s_{n}-\bar{s}\right)\left(r_{n}-\bar{r}\right)}{\sigma_{s}\sigma_{r}}$$
(49)

where \(\bar{s}\) and σs are the mean and standard deviation of the simulated data s, and \(\bar{r}\) and σr are those of the reference data r, with sn and rn denoting the nth simulated and reference values. The corresponding RMSETaylor was defined by Eq. 50.

$$RMSE_{Taylor}={\left[\frac{1}{N}\sum_{n=1}^{N}\left(s_{n}-r_{n}\right)^{2}\right]}^{0.5}$$
(50)

Further, the relationship between RMSETaylor and the standard deviations of s and r was used to formulate the centered root-mean-square error (cRMSE), as indicated in Eq. 51 (ref. 71); expanding it yields the relationship between cRMSE, σs, σr, and RTaylor for the simulated and reference data (Eq. 52). Eq. 52 was therefore adopted graphically to evaluate the DNN and ANN models, relating standard variability, correlation, and cRMSE between predictions and reference datasets in a Taylor diagram72,73.

$$cRMSE={\left[\frac{1}{N}\sum_{n=1}^{N}{\left[\left(s_{n}-\bar{s}\right)-\left(r_{n}-\bar{r}\right)\right]}^{2}\right]}^{0.5}$$
(51)
$$cRMSE^{2}=\sigma_{s}^{2}+\sigma_{r}^{2}-2\,\sigma_{s}\,\sigma_{r}\,R_{Taylor}$$
(52)

Taylor’s analysis combined three statistical metrics of the neurocomputing models (R, σ, and cRMSE), thereby correcting for any underlying offsets or biases in the predictions and providing a more robust representation of prediction errors that better reflects model robustness.
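A minimal NumPy sketch of the quantities plotted in a Taylor diagram (Eqs. 49–52) is shown below; it is not the R plotting code used in this study, and it adopts population (1/N) normalization throughout so that the Eq. 52 identity holds exactly.

```python
import numpy as np

def taylor_stats(s, r):
    """Statistics underlying a Taylor diagram (Eqs. 49-52).
    s: simulated (predicted) values, r: reference (observed) values."""
    s, r = np.asarray(s, float), np.asarray(r, float)
    sigma_s, sigma_r = s.std(), r.std()
    R = np.mean((s - s.mean()) * (r - r.mean())) / (sigma_s * sigma_r)   # Eq. 49 (1/N form)
    crmse = np.sqrt(np.mean(((s - s.mean()) - (r - r.mean())) ** 2))     # Eq. 51
    # Eq. 52 identity: cRMSE^2 = sigma_s^2 + sigma_r^2 - 2*sigma_s*sigma_r*R
    assert np.isclose(crmse ** 2, sigma_s ** 2 + sigma_r ** 2 - 2 * sigma_s * sigma_r * R)
    return R, sigma_s, sigma_r, crmse
```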

A comprehensive visual-metric assessment of the ANN and DNN models’ performance was obtained by portraying the extent to which their predictions differed from the reference dataset while considering the corresponding σ, as shown in the Taylor diagram (Fig. 19). Both the ANN and DNN architectures demonstrated satisfactory relationships between R, σ, and RMSE during the modeling phases and for the entire database. The ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1 models demonstrated superior prediction compared with their respective counterparts, with their R, σ, and RMSE values positioned at the corresponding reference data points (Fig. 19). The ANN trainbr (14-72-1) clustered closest to the reference data points, albeit with slightly higher values, although its margin over the DNN trainlm 14-7-5-1 model was not significant (Fig. 19). The compact, intuitive Taylor diagram summarized the multi-statistical aspects of the models, depicting the degree to which the experimental datasets and predicted values were similar or dissimilar, with the best models exhibiting the closest congruence. These findings align with similar observations in previous research66,70,74.

Fig. 19
figure 19

Illustration of the Taylor plots for ANN (a, b, c for training, testing, and overall) and DNN (d, e, f for training, testing, and overall) models.

Monte Carlo uncertainty simulation

A Monte Carlo simulation was performed to quantify the uncertainties associated with the predictions of the best ANN and DNN models. During the analysis, randomly resampled datasets were used to retrain the DNN and ANN models over 1,000 cycles, generating a corresponding number of outputs at constant train-validation-test ratios without replacement. Monte Carlo-based cumulative distribution functions were then constructed to determine the proportion of true data bracketed by the 95% prediction uncertainty (PPU95%) interval, using the degree of neurocognitive uncertainty \(\overline{dU}_x\) evaluated at the 2.5th (XL) and 97.5th (XU) percentiles, as depicted in Eq. 53.

$$\overline{dU}_{x}=\frac{1}{n}\sum_{i=1}^{n}\left(X_{U}-X_{L}\right)$$
(53)

where n is the number of experimental observations. An ideal model yields \(\overline{dU}_x\) of zero with 100% of the observations bracketed by PPU95%, although this is rarely achieved in practice owing to modeling uncertainty75,76. A normalized equivalent of \(\overline{dU}_x\) was therefore expressed as the dfactor and computed using Eq. 54.

$$d_{factor}=\frac{\overline{dU}_{x}}{\sigma_{x}}$$
(54)

where σx is the standard deviation of the observed output variable. Larger dfactor values indicate greater uncertainty, and vice versa; values less than unity are desirable, provided that a high proportion of the true data remains bracketed by PPU95%, as computed in Eq. 55 (ref. 46).

$$PPU_{95\%}=\frac{1}{n}\,\mathrm{count}\left(X\mid X_{L}\le X\le X_{U}\right)\times 100$$
(55)
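A minimal NumPy sketch of these uncertainty indices (Eqs. 53–55) is given below; the array shapes are assumptions (one row of predictions per Monte Carlo retraining cycle), and the 1,000 retraining cycles themselves are not shown.

```python
import numpy as np

def uncertainty_indices(mc_predictions, observed):
    """Monte Carlo uncertainty indices of Eqs. 53-55.
    mc_predictions: array of shape (n_cycles, n_obs) from repeated retraining.
    observed: array of shape (n_obs,)."""
    mc_predictions = np.asarray(mc_predictions, float)
    observed = np.asarray(observed, float)
    x_low = np.percentile(mc_predictions, 2.5, axis=0)    # X_L per observation
    x_up = np.percentile(mc_predictions, 97.5, axis=0)    # X_U per observation
    d_ux = np.mean(x_up - x_low)                          # Eq. 53
    d_factor = d_ux / observed.std()                      # Eq. 54
    inside = (observed >= x_low) & (observed <= x_up)
    ppu95 = 100.0 * np.count_nonzero(inside) / observed.size   # Eq. 55
    return d_ux, d_factor, ppu95
```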

The Monte Carlo uncertainty simulation generated 95% confidence intervals (CI), shown in the uncertainty plots of both ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1, which encapsulate the range within which the predicted outcomes fall relative to the observed data (Fig. 20). A plot with relatively narrower uncertainty bands reflects higher confidence in the predictions of the best model77,78,79. Compared with trainlm, the CI band of the ANN trainbr model (Fig. 20a) suggests that the ANN would outperform the DNN trainlm model, mainly when the dataset includes patterns that can be efficiently learned through regularization. This is typical of ANN 14-72-1, where the regularization process adjusts the network’s complexity by penalizing excessive weights, allowing a closer fit to the observed data while avoiding overfitting. This observation indicates that the trainbr algorithm excelled at producing reliable and robust predictions in ANN (14-72-1), as apparent from the close alignment between observed and predicted data (Fig. 20). Nevertheless, both the DNN and ANN models exhibited high adaptability to the observed data while maintaining a balance between accuracy and generalization. Specifically, ANN 14-72-1 effectively reduces overfitting by regularizing the weights, thereby enhancing the robustness of its predictions. This improvement is achieved by iteratively and efficiently adjusting the network parameters, providing a better response to the underlying complex patterns in the dataset (Fig. 20a). The trainbr ANN 14-72-1 model penalizes large weights, allowing an effective trade-off between model complexity and PA while balancing bias and variance. An improved generalization with minimal overfitting is thus achieved, which is crucial when dealing with complex and noisy datasets or where prediction uncertainty is critical (Fig. 20c); such improvements are not attainable with the DNN trainlm model.

In contrast, the DNN trainlm neuro-optimization technique yields a different pattern of prediction and uncertainty intervals, with a smoother and more generalized mean prediction that does not follow the observed data points as closely (Fig. 20b). This can be attributed to the nature of the trainlm algorithm, which focuses on optimizing speed and accuracy through a combination of gradient descent and Gauss-Newton methods, as reported in the literature33,80. While this method is generally faster and converges effectively, it may struggle to align predictions as closely with the data when significant variability or noise is present in the dataset (Fig. 20b); the uncertainty interval therefore widens (Fig. 20d) compared with the trainbr model (Fig. 20c). In some instances, this trade-off may be advantageous when a fast, computationally efficient solution is needed. However, where data variability is crucial, trainlm may require additional tuning, such as regularization, as reported by Ying81. Table 15 summarizes the Monte Carlo uncertainty indices corresponding to the PPU95% intervals obtained from the ANN and DNN output predictions. The ideal model delivers \(\overline{dU}_x\) approaching zero with 100% of observations bracketed by PPU95%, as indicated in Badgujar et al.46 and Noori et al.75. Most ANN and DNN models had 100% of their predictions bracketed at PPU95%. However, the trainbr 14-72-1 ANN had the lowest \(\overline{dU}_x\) and dfactor values, closest to zero, and was a marginally better model, while trainlm 14-7-5-1 was best among the DNNs (Table 15). These results agree with the earlier findings, in which the trainbr algorithm performed best in ANN architectures and trainlm in DNN architectures.

Fig. 20
figure 20

Illustration of the Monte Carlo uncertainty analysis for ANN (a, c) and DNN (b, d) models.

Table 15 Uncertainty evaluation of the best ANN and DNN models.

Anderson Darling (AD) test

The AD-test was used to determine whether a sampled prediction was drawn from the hypothesized normal distribution of the observed data. This was achieved by evaluating the AD-test statistic based on the cumulative distribution function F(x; θ) and the empirical distribution function Fn(x) for n ordered observations x1 ≤ x2 ≤ … ≤ xn of a particular sample. If x1 ≤ x2 ≤ … ≤ xn follow the distribution F(x; θ), then H0 holds; otherwise, the alternative H1 is true. The AD statistic (A²), defined by Eq. 56 and computed using Eq. 57, was compared with its corresponding critical values (α)82,83.

$$A^{2}=n\int_{-\infty}^{\infty}\frac{{\left[F_{n}\left(x\right)-F\left(x;\theta\right)\right]}^{2}}{F\left(x;\theta\right)\left[1-F\left(x;\theta\right)\right]}\,dF\left(x;\theta\right)$$
(56)
$$A^{2}=-N-\frac{1}{N}\sum_{j=1}^{N}\left(2j-1\right)\left[\ln u_{j}+\ln\left(1-u_{N-j+1}\right)\right]$$
(57)

where N is the total number of sample data points, uj equals F(xj), and xj is the jth ordered sample value. The Anderson-Darling test results are shown in Table 16, together with the AD-test at a 95% confidence interval in Fig. 21. The P-value (< 0.0012) of the actual database was lower than the significance level (P < 0.05), implying rejection of the stated null hypothesis and acceptance of a normally distributed database (Table 16). Compared with the other models, the AD-test statistics of both ANN trainbr (14-72-1) and DNN trainlm (14-7-5-1), i.e., 1.3913 and 1.3922, respectively, were the closest to that of the actual database (1.3912), as shown in Table 16. This indicates a normal distribution in the neurocognitive predictions of ANN trainbr (14-72-1) and DNN trainlm (14-7-5-1), as shown by the closeness of the AD-test statistic values (i.e., ADmodel ≡ ADactual). However, the AD-test statistics of ANN trainbr (14-72-1) were closer to the actual values than those of DNN trainlm (14-7-5-1), possibly owing to the ingress of extraneous artifacts, such as noise and long backpropagation memory, from a more complex neurocognitive architecture. Moreover, the P-values of both the ANN and DNN predictions were less than the ideal P-value (i.e., Pmodel < Pideal), further justifying rejection of the null hypothesis and acceptance of normality in the FTr predictions of both models (Fig. 21). Hence, their neurocognitive robustness can be generalized. The AD-test statistics of neurocognitive predictions must be close to the value of the actual database for normally distributed predictions to be accepted, and this condition must be met whenever the Pmodel values are less than the significance level (P < 0.05) as cogent evidence of model robustness; these findings agree with the literature84,85.
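For illustration, the sketch below computes A² from the computational form in Eq. 57 with the normal CDF fitted to the sample and cross-checks it against SciPy’s anderson routine; it is a simplified stand-in for the statistical software used in this study, and the sample shown is synthetic.

```python
import numpy as np
from scipy.stats import norm, anderson

def ad_statistic(x):
    """Anderson-Darling A^2 for normality via Eq. 57, with u_j = F(x_j; theta)
    taken from a normal CDF whose parameters are estimated from the sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    u = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))   # u_j = F(x_j; theta)
    j = np.arange(1, n + 1)
    return -n - np.mean((2 * j - 1) * (np.log(u) + np.log(1 - u[::-1])))

# Cross-check against SciPy's built-in test, which also reports critical values
sample = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=200)
print(ad_statistic(sample), anderson(sample, dist='norm').statistic)
```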

Fig. 21
figure 21

Illustration of the Anderson-Darling test for (a) actual data, and predicted FTr using (b) DNN trainlm 14-7-5-1 and (c) ANN trainbr 14-72-1 models.

Table 16 Anderson-Darling test results for the ANN and DNN model predictions.

Discussion on results

The present investigation shows that the ANN model configured with Bayesian Regularization (trainbr) outperformed the ANN models configured with Levenberg-Marquardt (trainlm), Scaled Conjugate Gradient (trainscg), Quasi-Newton (trainbfg), Powell-Beale Conjugate Gradient (traincgb), One-Step Secant (trainoss), Gradient Descent with Momentum (traingdm), Fletcher-Reeves Conjugate Gradient (traincgf), Gradient Descent (traingd), Polak-Ribière Conjugate Gradient (traincgp), Adaptive Learning Rate Gradient Descent (traingdx), and Resilient Backpropagation (trainrp) owing to its exceptional generalization capabilities. Unlike traditional methods such as trainlm or trainscg, trainbr automatically incorporates regularization, preventing overfitting, especially on noisy or small datasets. While algorithms such as trainlm and trainbfg may converge faster, they risk memorizing the training data rather than learning the underlying patterns. In contrast, trainbr balances data fitting with model complexity, adjusting weights to ensure smoother outputs, which makes it highly reliable for function approximation and regression tasks. Unlike gradient-based methods such as traingd, traingdx, or traingdm, trainbr is less sensitive to learning-rate tuning and local minima. It also outperforms the conjugate gradient variants (traincgf, traincgp, traincgb) and resilient backpropagation (trainrp) in producing stable models across varied input distributions. Furthermore, the probabilistic framework of trainbr enhances robustness against outliers. By contrast, the DNN model trained using Levenberg-Marquardt (trainlm) often outperforms other backpropagation algorithms owing to its exceptionally fast and accurate convergence, especially on moderate-sized datasets. As a hybrid of gradient descent and Gauss-Newton methods, trainlm effectively handles nonlinear error surfaces with high precision. Compared with Bayesian Regularization (trainbr), trainlm generally trains faster and requires fewer epochs, making it ideal for tasks demanding quick optimization. It also outpaces conjugate gradient methods such as traincgb, traincgf, and traincgp in both speed and solution quality, especially when the network is well initialized. Unlike the basic gradient descent variants (traingd, traingdx, traingdm), trainlm is far more resilient to poor learning-rate settings. While trainbr adds robustness through regularization, it is computationally intensive and slower for deeper networks. Algorithms such as trainoss and trainbfg approximate second-order information but lack the adaptive damping feature of trainlm. Additionally, resilient backpropagation (trainrp) performs well in shallow networks but struggles with deeper architectures. Finally, the robustness of the DNN and ANN models was compared with that of other machine learning models used to predict agricultural traction (Table 17). As highlighted earlier, some of these models were developed under controlled experimental conditions in laboratory soil bins, relying on a limited number of input parameters and accuracy evaluation metrics. By contrast, the ANN trainbr 14-72-1 and DNN trainlm 14-7-5-1 models in our study were developed under in situ conditions. Consequently, neurocognitive learning and modeling were subjected to complex and dynamic soil and environmental conditions involving a large number of input variables, which enhanced their robustness.
The neurocognitive models presented in our study can simulate traction force under real-world field conditions with high accuracy, reliability, and robustness, facilitating their generalized adoption (Table 17).

Table 17 Performance comparison of published and present study models.

The literature demonstrates that optimization algorithms enhance the performance of soft-computing models. Therefore, the BR_ANN (trainbr; 14-72-1) and LM_DNN (trainlm; 14-7-5-1) models were optimized using three metaheuristic algorithms, i.e., Spider Wasp Optimization (SWO), the Puma Optimizer (PO), and the Walrus Optimizer (WO). The reasons for selecting these algorithms are as follows: (a) SWO effectively balances exploration and exploitation by mimicking the hunting and paralyzing strategies of spider wasps, enabling it to avoid premature convergence91; it is particularly efficient in handling high-dimensional search spaces and provides robust global search capability. (b) PO, inspired by the cooperative and predatory behavior of pumas, emphasizes adaptive hunting strategies that enhance convergence speed while maintaining solution diversity92; its flexibility makes it well suited to both continuous and discrete optimization tasks. (c) WO, modeled after the social and survival behaviors of walruses, incorporates herd-based communication and leadership mechanisms to improve local exploitation93; it shows strong stability and resilience against local optima owing to its collective decision-making process. Together, these algorithms provide superior accuracy, scalability, and adaptability across engineering design, machine learning, and real-world optimization applications. The SWO algorithm was configured with a population size of 30, 500 iterations, a crossover rate of 0.9, and a mutation random factor of 0.5. The PO algorithm was tuned with a population size of 30, 500 iterations, a phase weight of 1.3, a mega exploration and exploitation ratio of 0.99, and a phase-switch threshold of 0.5. Similarly, the WO algorithm was configured with a population size of 30, 500 iterations, a leader (alpha) fraction of 0.20, a step size of 0.5, a decay rate of 0.99 per iteration, a social communication probability of 0.70, and a random perturbation of 0.1. Thus, six hybrid models, i.e., SWO_ANN, PO_ANN, WO_ANN, SWO_DNN, PO_DNN, and WO_DNN, were developed with the same hyperparameter configurations. Figure 22 compares the conventional (BR_ANN; trainbr, 14-72-1) and hybrid ANN models in estimating FTr. The optimization algorithms enhanced the prediction capability of the conventional ANN model, and the optimized ANN models outperformed it, with the SWO_ANN model attaining the highest performance in both phases. The SWO_ANN model estimated FTr with the least residuals (RMSE = 1.38E-11 in training and 8.38E-03 in testing; MAE = 7.05E-12 in training and 6.57E-03 in testing) and the highest accuracy (R = 1 in training and 0.9965 in testing), followed by the WO_ANN (R = 1 in training and 0.9948 in testing) and PO_ANN (R = 1 in training and 0.9943 in testing) models.
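For orientation only, the sketch below shows how a population-based metaheuristic of this kind can tune a trained network’s weight vector by minimizing a prediction-error fitness function with the reported population size (30) and iteration budget (500); the update rule is a generic placeholder and does not reproduce the actual SWO, PO, or WO equations, and the network-evaluation (fitness) function is assumed to be supplied by the caller.

```python
import numpy as np

def metaheuristic_tune(fitness, dim, pop_size=30, iterations=500, seed=0):
    """Generic population-based search loop standing in for SWO/PO/WO:
    each candidate is a full vector of network weights and biases, and
    fitness(weights) returns the prediction error (e.g., RMSE) to minimize."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    best = min(pop, key=fitness).copy()
    for _ in range(iterations):
        for i in range(pop_size):
            # Move each candidate toward the current best with a small random
            # perturbation; the real optimizers apply their own bio-inspired rules here.
            trial = pop[i] + rng.uniform() * (best - pop[i]) + 0.1 * rng.normal(size=dim)
            if fitness(trial) < fitness(pop[i]):
                pop[i] = trial
        best = min(pop, key=fitness).copy()
    return best
```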

Fig. 22
figure 22

Illustration of comparison of conventional (BR_ANN) and hybrid models (SWO_ANN, PO_ANN, and WO_ANN).

Conversely, Fig. 23 compares the conventional (LM_DNN; tuned by trainlm, 14-7-5-1) and hybrid deep neural network (DNN) models in predicting FTr. Figure 23 shows that the SWO, PO, and WO algorithms improved the performance of the traditional DNN (trainlm; 14-7-5-1) model. The SWO_DNN model outperformed the conventional LM_DNN model in both the training (RMSE = 9.31E-12, MAE = 4.93E-12, R = 1.0000) and testing (RMSE = 2.57E-03, MAE = 2.56E-03, R = 1.0000) phases with the least residuals and highest performance, followed by the PO_DNN (RMSE = 1.38E-11, MAE = 7.18E-12, R = 1.0000 in the training phase; RMSE = 7.87E-02, MAE = 7.25E-03, R = 0.9984 in the testing phase) and WO_DNN (RMSE = 1.41E-11, MAE = 7.63E-12, R = 1.0000 in the training phase; RMSE = 1.21E-02, MAE = 1.16E-02, R = 0.9983 in the testing phase) models. Interestingly, both the ANN and DNN models attained their highest performance with the SWO algorithm. The fundamental study of SWO reveals that it outperforms PO and WO in training ANN and DNN models owing to its strong balance between exploration and exploitation. Its multi-phase strategy, comprising searching, escaping, paralyzing, and mating, prevents premature convergence and improves convergence speed. SWO adapts better to the high-dimensional weight spaces common in deep networks, and it maintains diversity through crossover and random motion, reducing the risk of overfitting.

Fig. 23
figure 23

Illustration of comparison of conventional (LM_DNN) and hybrid models (SWO_DNN, PO_DNN, and WO_DNN).

Summary and conclusions

This study simulated the neurocognitive intelligence of human brain neurons to develop Deep Neural Network (DNN) and Artificial Neural Network (ANN) models for predicting the tractive force (FTr) of farm vehicles from dynamic soil-machine variables in situ. The model development process relied on 12 artificial neurocomputing algorithms and three activation functions to sequentially train 72 DNN and ANN neurocognitive architectures using 14 input neurons. The prediction performance for FTr was subsequently evaluated using various metrics, including training time, epoch size, architecture complexity, accuracy, robustness, and reliability, from which the optimal DNN and ANN models were identified. The novel set of DNN and ANN neurocognitive equations developed in this study will enable intelligent prediction of FTr for applications in wheeled tractors and autonomous systems used for tillage operations. The main conclusions derived from this research are summarized below:

  • The neurocognitive accuracy, reliability, and robustness of the DNN and ANN models depend on the training algorithm, network size, activation function, convergence time, and epoch size. The number of layers, hidden-layer neurons, convergence time, and epoch optimality do not vary proportionally with neurocomputing accuracy in either ANN or DNN models. The trainbr algorithm performs best in single-layered ANN architectures but consumes more convergence time at lower epoch optimality. In contrast, trainlm provides better predictions in multi-layered DNN models with less convergence time but at the expense of higher epoch optimality. Increasing the number of neurons in the hidden layer of an ANN trained with trainbr improves prediction performance, whereas increasing the number of layers in a trainlm DNN does not necessarily improve accuracy without modifying the algorithm.

  • Although the optimal number of epochs did not scale with training time in ANN, convergence time and the number of hidden layers scaled with epoch size in DNN but not in ANN. The number of hidden layers scaled with modeling accuracy in trainlm DNN but not in trainbr ANN, while the number of neurons scaled with neurocomputing accuracy in ANN but not with the accuracy metrics in DNN models. In ANN, accuracy declined as hidden layers were added, whereas additional neurons improved neurocomputing accuracy; in DNN, increasing the number of hidden layers and neurons improved neurocognitive accuracy, while reducing them decreased it.

  • The performance of DNN models can be optimized by increasing the number of layers and neurons, although this can be limited by early stopping due to overfitting and complex neurocognitive artifacts, such as noise and long-term memory. The trainbr ANN model optimized neurocomputing performance with 72 hidden neurons in a single layer (14-72-1), while the DNN optimized prediction with 7 and 5 neurons in the first and second hidden layers (14-7-5-1). Draft force and tillage depth had the highest explanatory importance for the predicted FTr among all the soil-machine variables used in neurocognitive modeling, while tire inflation pressure was the least sensitive parameter.

  • The SWO, PO, and WO algorithms enhanced the performance of the ANN (trainbr; 14-72-1) and DNN (trainlm; 14-7-5-1) models in both phases. The SWO_ANN and SWO_DNN models outperformed the BR_ANN, LM_DNN, PO_ANN, WO_ANN, PO_DNN, and WO_DNN models with higher accuracy and the least residuals.

Based on the modeling results, future work should focus on designing and developing intelligent, programmable logic control platforms to accurately implement the models in decision-making for in-field operational adjustments, as well as on optimizing wheeled autonomous tractors and tillage robots for sustainable smart farming. The present investigation may be extended by optimizing the DNN (trainlm) and ANN (trainbr) models with the Ant Lion Optimizer (ALO), Information Acquisition Optimizer (INFO), Enhanced Remora Optimization Algorithm (EROA), Enhanced Runge Kutta Optimizer (ERUN), and Improved Randomized Firefly Optimization (IMRFO) algorithms to understand the impact of optimization techniques on model accuracy. In addition, the real-world application of the best-performing models, particularly the DNN (trainlm) and ANN (trainbr), holds significant potential for enhancing autonomous tractor operations in precision agriculture. These models can be integrated into real-time control systems to facilitate intelligent decision-making for tasks such as traction control, path planning, and terrain adaptation. However, successful deployment requires careful consideration of several practical factors. First, reliable and continuous data collection from onboard sensors (e.g., GPS, IMU, torque sensors) is critical, but such data are often affected by noise, drift, and environmental variability; preprocessing techniques such as filtering, normalization, and sensor fusion must therefore be applied to ensure model robustness. Second, computational constraints on embedded systems demand lightweight, optimized versions of these models to maintain real-time responsiveness without compromising accuracy; model pruning, quantization, or edge computing solutions may be required. Lastly, the system should adapt to dynamic field conditions by incorporating online learning or periodic retraining strategies using updated field data to maintain consistent performance over time. The developed models also need to be packaged into onboard commercial software to optimize the generation and utilization of tractive forces in wheeled robots under diverse soil and field conditions.