Introduction

Inverse kinematics (IK) is a fundamental problem in robotics, where the goal is to determine the joint parameters of a robotic manipulator for a given end-effector position. Traditional analytical and numerical methods, though widely used, often struggle with real-time constraints and workspace singularities. This makes machine learning (ML)-based approaches, particularly neural networks (NNs), a promising alternative for efficiently solving the IK problem. Recent studies have demonstrated the ability of NNs to approximate non-linear mappings and generalize across various workspaces when trained on randomly sampled workspace points, but few explore how these models generalize to continuous trajectories.

Several studies have leveraged artificial neural networks (ANNs) and deep learning (DL) to improve IK computation accuracy and efficiency. Adar1 proposed a real-time IK solution for a 5-degrees of freedom (DOF) manipulator using a multi-layer perceptron (MLP) combined with a proportional-integral (PI) control system, achieving root mean square error (RMSE) < 0.85. Wagaa et al.2 compared analytical and DL methods, including ANN, convolutional neural network (CNN), long short-term memory (LSTM), gated recurrent unit (GRU), and bidirectional long short-term memory (BiLSTM), for solving IK and trajectory tracking of a 6-DOF robotic arm, achieving RMSE between 0.0042 and 0.0149 with a position error < 1 mm. Vu et al.3 developed a ML-based framework for real-time IK computation of a 7-DOF redundant manipulator, achieving RMSE < 0.05 for the KUKA LBR iiwa 14 R820. Ma et al.4 applied a backpropagation neural network (BPNN) for soft actuators, reducing average IK errors to 2.46%.

Other works have focused on enhancing convergence speed and generalization. Sharkawy5 utilized a multilayer feedforward neural network (MLFNN) for forward kinematic (FK) and IK of a 2-DOF manipulator, achieving zero approximation error. Pang et al.6 introduced an improved BPNN for solving IK in a 7-DOF rehabilitation robot, achieving position errors < 1 mm and posture errors < 0.1 mm. Wang & Deng7 applied deep reinforcement learning (DRL) for multi-robot coordination in dynamic environments, ensuring robust task completion but lacking real-world validation. Gao8 optimized BPNN for 6-DOF robots, improving convergence and accuracy but limiting validation to simulations. Semwal & Gupta9 compared NNs with analytical methods for 3-DOF manipulators, finding limitations in sparsely trained regions but highlighting potential scalability to higher-DOF robots.

Several hybrid approaches have been introduced to enhance performance. Shareef10 implemented a deep artificial neural network (DANN) with 10 hidden layers for solving IK on the 6-DOF PUMA 260, achieving a maximum error of 1.579% and R² ≈ 0.99981, though limited to a single robot model. Shastri et al.11 hybridized ANN with particle swarm optimization (PSO), simplified particle swarm optimization (SPSO), and modified simplified particle swarm optimization (MSPSO) for a 3-DOF robot, improving convergence speed and accuracy but at high computational cost. Aggogeri et al.12 optimized a 3-DOF ANN with genetic algorithms (GAs), reducing trajectory error by 97%. Tammishetty et al.13 developed a multimodal input ANN for a 3-DOF manipulator, achieving 99% accuracy but with high computational overhead. Jiménez-López et al.14 combined quaternion algebra with ANN for a 3-DOF robot, achieving < 1 mm position error but focusing only on planar configurations.

Toquica et al.15 compared analytical IK with MLP, LSTM, and GRU models for a 3-DOF IRB360 robot, finding GRU to be the most stable and MLP to converge fastest. Gholami et al.16 applied an MLP with online retraining for real-time IK control of a 3-DOF Delta robot, improving tracking precision but requiring retraining for dynamic tasks. Zhu et al.17 introduced a hybrid artificial bee colony (ABC)-based BPNN and quaternion-multilayer Newton (QMn-M) algorithm for FK of a 6-DOF Gough–Stewart platform, improving accuracy near singular configurations. Tagliani et al.18 developed a GA-optimized sequential ANN for 6-DOF IK, reducing errors by 42.7–56.7% compared to global methods.

Recent work has further improved accuracy and speed. Lu et al.19 proposed an MLP-based IK solution for 6-DOF robots using joint space segmentation, classification models, and Newton–Raphson refinement, achieving position errors < 0.001 mm and orientation errors < 0.01°. Wang et al.20 proposed a Gaussian-damped least squares (GDLS) IK solver for 7-DOF redundant robots, integrating ANNs with optimization principles. Their approach achieved RMSE < 0.01 mm and a convergence accuracy of 96.23%. Wu et al.21 introduced OTDPP-Net, a deep neural network (DNN)-based path planner using CNNs for value iteration. Sharkawy & Khairullah22 proposed an MLFFNN-based approach using the Levenberg–Marquardt (LM) algorithm to solve the FK and IK of a 3-DOF manipulator. Their model achieved high accuracy, with mean squared error (MSE) values of \(4.59\times10^{-8}\) for FK and \(9.07\times10^{-7}\) for IK, ensuring minimal error and fast computation. Gadringer et al.23 proposed a hybrid robot calibration approach that combines a kinematic model with an ANN and geometric calibration using a laser tracker, achieving a maximum position error of 0.605 mm and a maximum orientation error of 3.753 mrad, ultimately reducing positioning and orientation errors by 93% and 92%, respectively, compared to the uncalibrated model. Shah et al.24 developed and experimentally validated a DANN model for a 5-DOF manipulator. Their network, trained for 500 epochs with the LM algorithm, demonstrated an MSE of \(8.926\times10^{-7}\) and positional deviations within \(\pm0.05\) mm.

Hamarsheh et al.25 developed an ANN approach to solve the IK of a 6-DOF KUKA industrial manipulator using a Non-Linear Autoregressive NN with Exogenous Inputs and an Adaptive Feedforward NN, trained on FK data using MATLAB, achieving the best MSE of 0.005 with Bayesian Regularization and 250 neurons. Dalmedico et al.26 proposed an ANN approach to solve the IK of a 4-DOF robotic arm in 3D using an MLP trained on FK data with the LM algorithm, achieving Euclidean errors of 0.112 cm in simulation 1 and 0.219 cm in simulation 2. Yang et al.27 developed a novel IK algorithm for a 7-DOF manipulator with offset, combining analytical and numerical methods while incorporating joint and position constraints to enhance accuracy, and generating multiple solutions using the gradient method. Guo et al.28 developed an analytical IK computation method for a 7-DOF manipulator with an S-R-S configuration, utilizing FK based on the Denavit-Hartenberg (DH) modelling approach and decoupling redundancy through arm angle parameterization, with the method verified using a robotics toolbox and ROS simulation. Shaar and Ghaeb29 developed a Recurrent Neural Network (RNN) model to solve the IK of a 6-DOF industrial manipulator, training it on 100,000 FK-generated samples with a single hidden layer of 12 neurons, achieving an MSE of 0.0013 and a regression factor (RF) of 0.99. Bouzid et al.30 analysed the performance of ANNs in solving IK for a 2-DOF robotic arm with varying arm lengths, training the model on FK data using three dataset types (fixed step, random step, and sinusoidal) and evaluating three optimization algorithms (LM, Bayesian regularization, and scaled conjugate gradient), achieving the best MSE of 2.1573 using the random step dataset with the LM algorithm. Joshi et al.31 applied deep feed-forward neural networks (DFNNs), CNNs, RNNs, and LSTMs with Bayesian optimization and SHAP analysis for 6-DOF anthropomorphic robots, achieving real-time IK prediction with MSE of \(1.934\)–\(3.522\times10^{3}\) and a latency of ~1.25 ms/sample. Zhao et al.32 proposed MAPPO-IK, a reinforcement learning-based algorithm using Gaussian and cosine distance rewards for real-time, unique IK solutions; it demonstrated superior generalization, computational efficiency, and dynamic adaptability. Palliwar et al.33 integrated GANs with computer vision to replicate human hand motions via robotic joints, achieving high motion accuracy and efficiency. Khaleel et al.34 applied NNGA and PSO to solve the IK of a 3-DOF redundant arm, with PSO outperforming NNGA in trajectory accuracy. Bouzid et al.35 utilized ANNs for 4-DOF SCARA robots using LM, Bayesian regularization (BR), and scaled conjugate gradient (SCG) training algorithms on diverse datasets.

This study focuses on a prismatic and revolute joint system, which is commonly used in industrial and medical robots. The main contributions of this work include a novel quadrant-based and full workspace learning and testing approach for IK; a comparative analysis of DFNN, LSTM, and GRU architectures on continuous paths; the introduction of path-based validation for real-world applicability; improved generalization testing across different workspace regions; and a data-efficient, computationally viable methodology that enhances real-time feasibility without constant retraining or excessive computational resources.

Proposed methodology

We consider a robot with a prismatic joint and a rotational joint, as shown in Fig. 1, designed specifically for evaluating IK performance using NNs. The goal is to generate the workspace for different values of prismatic joint extension (d) and rotation angle (θ) and then derive the IK to find the required joint parameters for a given end-effector position. To evaluate the accuracy and generalization of NN-based IK models, we employed a structured dataset division and K-fold cross-validation technique. Two training formats were used: Quadrant-Based Training and Full Workspace Training. To ensure robust evaluation and prevent overfitting, K-fold cross-validation was applied, allowing the model to be trained and validated on different subsets of data. Additionally, we compared a standard DFNN with recurrent models such as LSTM and GRU to determine their effectiveness in learning IK relationships, particularly for continuous motion prediction. This comprehensive methodology ensures a thorough comparison between analytical and DL-based IK solutions, providing a detailed assessment of model robustness and accuracy.

Fig. 1

Kinematic structure of revolute-prismatic joint robot.

Forward kinematics

FK establishes the relationship between joint parameters (d, θ) and the Cartesian coordinates (x, y) of the end-effector. The robot consists of:

(1) A prismatic joint that allows linear motion along a fixed direction.

(2) A rotational joint providing angular rotation around the z-axis.

Using trigonometry, the position of the end-effector is given by Eqs. (1) and (2):

$$x = d\cos\theta$$
(1)
$$y = d\sin\theta$$
(2)

where d represents the prismatic extension and θ the rotational angle. The workspace is formed by varying d and θ. For each \(d \in [200, 400]\) mm and \(\theta \in [0^\circ, 360^\circ]\), the end-effector traces circular paths of increasing radii, creating a ring-shaped workspace as shown in Fig. 2. These equations are validated across the full range of prismatic joint extensions, from the inner radius (200 mm) to the outer radius (400 mm), ensuring consistent applicability across the entire reachable workspace.
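As an illustrative sketch (the implementation framework is not stated in the paper; NumPy is assumed here), the FK mapping of Eqs. (1) and (2) and the circles traced at fixed d can be written as:

```python
import numpy as np

def forward_kinematics(d, theta_deg):
    """Eqs. (1)-(2): joint parameters (d in mm, theta in degrees) -> (x, y) in mm."""
    theta = np.deg2rad(theta_deg)
    return d * np.cos(theta), d * np.sin(theta)

# Sweeping theta at a fixed prismatic extension traces one circle of the
# ring-shaped workspace; varying d over [200, 400] mm fills the ring.
theta = np.arange(0.0, 360.0, 0.5)           # illustrative angular sampling
for d in (200.0, 300.0, 400.0):              # inner, middle, outer radii
    x, y = forward_kinematics(d, theta)      # points on a circle of radius d
```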

Inverse kinematics

IK determines the joint parameters (\(\theta\), \(d\)) from a given end-effector position (x, y). The rotation angle \(\theta\) is calculated by Eq. (3), while the prismatic extension \(d\) is obtained using the Euclidean distance formula, as shown in Eq. (4). This provides the radial distance from the origin, corresponding to the prismatic joint extension.

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
(3)
$$d = \sqrt{x^{2}+y^{2}}$$
(4)
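Note that Eq. (3) alone is quadrant-ambiguous: \(\tan^{-1}(y/x)\) returns the same value for (x, y) and (−x, −y). A minimal NumPy sketch (an assumed implementation, not taken from the paper) that uses the two-argument arctangent to recover the correct quadrant over the full 0°–360° range:

```python
import numpy as np

def inverse_kinematics(x, y):
    """Eqs. (3)-(4): end-effector (x, y) in mm -> (theta in degrees, d in mm)."""
    d = np.hypot(x, y)                            # Eq. (4): radial distance
    theta = np.degrees(np.arctan2(y, x)) % 360.0  # atan2 resolves the quadrant
    return theta, d

# Round-trip check against FK for a point in the third quadrant:
x, y = 300.0 * np.cos(np.deg2rad(210.0)), 300.0 * np.sin(np.deg2rad(210.0))
theta, d = inverse_kinematics(x, y)               # -> (210.0, 300.0)
```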

Although FK and IK are conceptually inverse processes, they are not directly reversible. IK is more complex due to issues such as non-uniqueness, singularities, and the possibility of infeasible solutions. Therefore, learning-based methods and optimization techniques are essential for achieving stable and accurate inverse solutions across the manipulator’s workspace. As mentioned earlier, two strategies were employed:

Fig. 2

Visual representation of the robot’s reachable workspace.

(1) Quadrant-Based Training

The dataset was divided into four quadrants. For each quadrant, training was performed using the respective quadrant’s data, while testing involved evaluating the model on square and circular paths within that quadrant. Error analysis focused on the deviations between predicted and actual paths, and performance was assessed by comparing the path errors between the desired trajectories and those generated by the DL models. The dataset generation process involved systematically computing the robot’s workspace coordinates (x, y) for a given range of prismatic joint values d (200–400 mm, in increments of 0.15 mm) and rotational joint angles θ (0–90°, in increments of 0.15°). This forms the dataset for the first quadrant. To generate datasets for the remaining three quadrants, coordinate transformations were applied by swapping and negating x and/or y, and adjusting θ by 90°, 180°, and 270°, respectively. Given 1,334 prismatic joint values and 601 rotational angle values, the total dataset size per quadrant is 801,734 data points. The step size of 0.15 mm (linear) and 0.15° (angular) was selected based on resolution sensitivity experiments to ensure smooth trajectories and accurate model learning without introducing unnecessary redundancy. Although these angular and linear increments share the same numerical value, they are not spatially equivalent: a change in angle θ produces a Cartesian displacement that depends on the current radius d, whereas the linear increment from the prismatic joint translates directly along a fixed axis. Nonetheless, this balanced sampling strategy in polar coordinates results in a uniformly distributed Cartesian workspace, thereby improving learning consistency across all quadrants without assuming equivalence between angular and linear units. Figure 3 illustrates the quadrant-based workspace, and Table 1 summarizes the transformations used for generating each quadrant.
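A sketch of this dataset construction (assuming NumPy; the Q2 transformation shown is one illustrative instance of the swap-and-negate rules summarized in Table 1):

```python
import numpy as np

# Q1 sampling: d in [200, 400] mm at 0.15 mm steps (1,334 values),
# theta in [0, 90] degrees at 0.15 degree steps (601 values).
d_vals = np.arange(200.0, 400.0 + 1e-9, 0.15)
theta_vals = np.arange(0.0, 90.0 + 1e-9, 0.15)
D, T = np.meshgrid(d_vals, theta_vals, indexing="ij")
X, Y = D * np.cos(np.deg2rad(T)), D * np.sin(np.deg2rad(T))

# Rows of (x, y, theta, d): 1,334 x 601 = 801,734 samples per quadrant.
q1 = np.column_stack([X.ravel(), Y.ravel(), T.ravel(), D.ravel()])

# Q2 via a 90-degree rotation: (x, y) -> (-y, x), theta -> theta + 90.
q2 = np.column_stack([-q1[:, 1], q1[:, 0], q1[:, 2] + 90.0, q1[:, 3]])
```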

Table 1 Quadrant-based coordinate and angle transformations.
Fig. 3

Quadrant-based partitioning of the robot workspace for individual training.

(2) Full Workspace Training

In this approach, the model was trained on the entire dataset and tested on continuous paths (square and circle) spanning the full workspace. Error analysis examined the deviations between predicted and actual paths, while performance was assessed by comparing path errors between the desired trajectories and those generated by the DL models. To generate the workspace, the joint parameters are varied as follows:

(i) Prismatic Joint Extension (d): 200 mm to 400 mm in increments of 0.25 mm.

(ii) Rotation Angle (θ): 0° to 360° in increments of 0.25°.

The number of discrete values for each parameter is:

for \(d\): \(\frac{400-200}{0.25}+1=801\)

for \(\theta\): \(\frac{360-0}{0.25}+1=1441\)

Since each (d, θ) pair corresponds to a unique (x, y) coordinate, the total number of workspace data points is 801 × 1441 = 1,154,241. Thus, the dataset contains over 1.15 million points, covering the entire reachable workspace of the robot’s end-effector, as shown in Fig. 2.
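The grid sizes can be verified directly (NumPy assumed):

```python
import numpy as np

d_vals = np.arange(200.0, 400.0 + 1e-9, 0.25)     # 801 prismatic values
theta_vals = np.arange(0.0, 360.0 + 1e-9, 0.25)   # 1,441 angular values
assert d_vals.size == 801 and theta_vals.size == 1441
print(d_vals.size * theta_vals.size)              # 1,154,241 (d, theta) pairs
```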

Deep learning models and results

This section presents an in-depth evaluation of DL models trained using both quadrant-wise and full workspace strategies for solving the IK problem of a revolute-prismatic (RP) robot. The primary objective is to identify the most effective model in terms of generalization, precision, and adaptability to spatial and geometric variations. Challenges such as ambiguity in joint configurations, singularities, and the presence of multiple feasible solutions make data-driven IK solutions highly non-trivial. To address overfitting and enhance model stability, k-fold cross-validation (CV) with K = 5 was applied during the training of DFNN models. This technique reduces variance associated with specific train-test splits and ensures consistent learning across varying data distributions. Although validation loss, measured using MSE, provides insight into how well the model fits the training data, it is not sufficient on its own to evaluate real-world usability. For each path, the predicted joint parameters were used to compute the end-effector position via FK, as defined in Eqs. (1) and (2). A more practical and application-relevant metric is the deviation error, defined as the Euclidean distance between the predicted and actual end-effector positions, as shown in Eq. (5):

$$e = \sqrt{(x_{d}-x_{p})^{2}+(y_{d}-y_{p})^{2}}$$
(5)

where \(e\) is the error between the predicted and desired end-effector positions, \((x_{d}, y_{d})\) are the desired coordinates, and \((x_{p}, y_{p})\) are the predicted coordinates. This metric directly quantifies the spatial accuracy of the robotic arm and reflects how closely it tracks the desired paths. Model performance was evaluated using the deviation error on both circular and square trajectories across different quadrants. These paths were chosen to represent different geometric complexities: square paths contain sharp turns, testing the model’s ability to generalize across discontinuities, while circular paths test smooth interpolation capability. All models (DFNN, LSTM, and GRU) demonstrated successful learning during training and achieved low MSE values. However, significant differences in performance emerged when tested on unseen paths, particularly in terms of deviation error. This highlights the fact that low validation loss does not guarantee strong spatial generalization or practical trajectory accuracy. The same models were used for both quadrant-wise and full workspace training, with appropriate hyperparameter tuning for each case. The architectures of the DL models are illustrated in Fig. 4.
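A minimal sketch of this evaluation metric (NumPy assumed; the array names are illustrative):

```python
import numpy as np

def deviation_error(theta_pred_deg, d_pred, x_des, y_des):
    """Eq. (5): Euclidean distance between desired and FK-predicted positions."""
    t = np.deg2rad(theta_pred_deg)
    x_p, y_p = d_pred * np.cos(t), d_pred * np.sin(t)   # FK, Eqs. (1)-(2)
    return np.sqrt((x_des - x_p) ** 2 + (y_des - y_p) ** 2)

# Per-point errors along a test trajectory, then summary statistics:
# e = deviation_error(theta_hat, d_hat, x_path, y_path)
# print(e.mean(), e.max())
```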

Quadrant-wise training and evaluation

The workspace was divided into four quadrants (Q1, Q2, Q3, and Q4), and all models were trained separately for each quadrant. This allows localized evaluation and also reduces complexity by constraining the range of outputs, thereby improving learning stability. The DFNN models were trained with and without k-fold CV for both single-output (\(\theta\) or \(d\)) and dual-output (\(\theta\), \(d\)) configurations.

(1) First Quadrant (Q1)

The single-output DFNN consisted of four hidden layers, each with 128 neurons and ReLU activation. It was trained using 5-fold CV, with L2 regularization (strength 0.0005) applied to prevent overfitting. The model was trained using the Adam optimizer and MSE loss for 150 epochs, with early stopping (patience = 10) to avoid unnecessary training. A batch size of 32 was used to ensure efficient learning. This model achieved an average CV loss of 0.0150 for \(\theta\) and 0.0722 for \(d\). The dual-output DFNN, with three hidden layers, was also trained for 150 epochs and achieved a CV loss of 0.03742. The DFNN model without CV used 80% of the data for training and 20% for testing. It had five densely connected hidden layers, each containing 128 neurons with ReLU activation, and was trained for 500 epochs, achieving a validation loss of 0.0427. The performance of the DFNN model with k-fold CV is shown in Table 2, and without k-fold CV in Table 3. The LSTM model, comprising three stacked LSTM layers, was trained for 1000 epochs and achieved a validation loss of 0.0866. The GRU model, trained for 250 epochs, achieved a validation loss of 0.00415. Table 4 summarizes the performance of both the LSTM and GRU models. However, the real differentiator was the deviation error, where the single-output DFNN model with CV produced the lowest errors of 0.289 mm on the square path and 0.312 mm on the circular path. The GRU model also performed competitively on the circular path (0.677 mm) but showed a larger error on the square path (0.957 mm). These results highlight that square paths are more sensitive to error accumulation, requiring the model to handle abrupt geometric changes effectively. The generalization performance of the models across different paths is summarised in Table 5. Figure 5 compares the (a) square and (b) circular paths in the X–Y coordinate plane. The same strategy was applied to the other three quadrants, with different hyperparameters tailored for each case.
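A minimal sketch of the single-output DFNN with 5-fold CV described above (the paper does not name its framework; TensorFlow/Keras and scikit-learn are assumed, with the stated hyperparameters: four 128-neuron ReLU layers, L2 = 0.0005, Adam, MSE, 150 epochs, early stopping with patience 10, batch size 32):

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras

def build_dfnn():
    """Single-output DFNN: (x, y) -> theta or d."""
    layers = [keras.Input(shape=(2,))]
    layers += [keras.layers.Dense(128, activation="relu",
                                  kernel_regularizer=keras.regularizers.l2(5e-4))
               for _ in range(4)]
    layers += [keras.layers.Dense(1)]
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model

def cross_validate(X, y, k=5):
    """Average validation MSE over k folds, with early stopping per fold."""
    fold_losses = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = build_dfnn()
        stop = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
        hist = model.fit(X[train_idx], y[train_idx],
                         validation_data=(X[val_idx], y[val_idx]),
                         epochs=150, batch_size=32, callbacks=[stop], verbose=0)
        fold_losses.append(min(hist.history["val_loss"]))
    return float(np.mean(fold_losses))   # average CV loss
```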

Fig. 4

Proposed DL model architectures for (a) DFNN with k-fold cross-validation (1 o/p), (b) DFNN with k-fold cross-validation (2 o/p), (c) DFNN without k-fold cross-validation, (d) LSTM, and (e) GRU.

Table 2 DFNN model performance in Q1 with K-fold.
Table 3 DFNN model performance in Q1 without k-fold.
Table 4 LSTM and GRU model performance in Q1.
(2) Second Quadrant (Q2)

The models reused the Q1 architecture with slight modifications in training parameters. The single-output DFNN consisted of five hidden layers, each with 128 neurons and ReLU activation. It was trained using 5-fold CV with a batch size of 64 for 150 epochs, achieving a CV loss of 0.0208 for \(\theta\) and 0.3953 for \(d\). The dual-output DFNN shares the same architecture as in Q1 and achieved an average CV loss of 0.00415. Similarly, the DFNN without CV reused the Q1 model and achieved a validation loss of 0.0018. Table 6 shows the performance of the DFNN models with k-fold CV, and Table 7 shows the results for models without k-fold CV.

Table 5 Deviation errors on circular and square paths in Q1.

The LSTM model, using the same architecture as in Q1, was trained for 200 epochs and achieved a validation loss of 0.012, while the GRU model obtained a validation loss of 0.01146. Table 8 summarizes the performance of the LSTM and GRU models. On trajectory-based evaluation, the dual-output DFNN slightly outperformed the other models, achieving a deviation error of 0.378 mm on the square path, while the GRU and LSTM models showed comparable performance. These small differences underscore the importance of trajectory-based evaluation, where spatial performance differences become more evident beyond what MSE alone can capture. The generalization performance across different paths is summarised in Table 9. Figure 6 compares the (a) square and (b) circular paths in the X–Y coordinate plane.

(3) Third Quadrant (Q3)

For the single-output case, the same DFNN model architecture used in Q1 was employed, but with three hidden layers, each containing 128 neurons, and a batch size of 32. This model achieved an average CV loss of 0.067 for \(\theta\) and 0.053 for \(d\). For the dual-output case (\(\theta\), \(d\)), the same model architecture as used in Q1 was applied, resulting in an average CV loss of 0.249. The DFNN without CV, sharing the same architecture as in the previous two quadrants, achieved a validation loss of 0.0067. Table 10 presents the performance of the DFNN models with k-fold CV, and Table 11 summarizes the results for models without k-fold CV. The LSTM and GRU models used the same architectures as in Q2. The LSTM model achieved a validation loss of 0.00307, while the GRU model achieved 0.0137. Although both models achieved low validation losses, the GRU underperformed in terms of deviation error (1.295 mm vs. 0.521 mm on the square path). This highlights that learning temporal dependencies does not always ensure accurate joint-space projection unless spatial consistency is also effectively learned. Table 12 shows the performance of the LSTM and GRU models. Generalization performance was evaluated by testing the models on both circular and square paths, as summarised in Table 13. Comparatively, the DFNN with CV (2 i/p and 1 o/p) outperformed the other models, achieving the lowest deviation errors of 0.508 mm on the square path and 0.438 mm on the circular path. Figure 7 compares the (a) square and (b) circular paths in the X–Y coordinate plane. The square and circle paths were intentionally sized differently to assess the model’s adaptability to varying geometries. Despite differences in path length and curvature, model accuracy remained consistent. Errors were measured relative to each shape using Cartesian deviation, ensuring fair and comparable evaluation across trajectories.

Table 6 DFNN model performance in Q2 with K-fold.
Table 7 DFNN model performance in Q2 without k-fold.
Table 8 LSTM and GRU model performance in Q2.
Table 9 Deviation errors on circular and square paths in Q2.
Table 10 DFNN model performance in Q3 with K-fold.
Table 11 DFNN model performance in Q3 without k-fold.
Table 12 LSTM and GRU model performance in Q3.
Fig. 5

Desired and predicted trajectory coordinates of proposed DL models on Q1 for (a) square path, (b) circle path.

Fig. 6

Desired and predicted trajectory coordinates of proposed DL models on Q2 for (a) square path, (b) circle path.

Fig. 7

Desired and predicted trajectory coordinates of proposed DL models on Q3 for (a) square path, (b) circle path.

Table 13 Deviation errors on circular and square paths in Q3.
(4) Fourth Quadrant (Q4)

For the single-output case, the same DFNN model architecture used in Q3 was employed. It achieved a CV loss of 0.4792 for \(\theta\) and 0.6102 for \(d\). For the dual-output case (\(\theta\), \(d\)), the model attained an average CV loss of 0.2877. The DFNN without CV yielded a validation loss of 0.0067. Table 14 shows the performance of DFNN models trained with k-fold CV, while Table 15 summarizes the results for the models trained without k-fold CV.

The LSTM and GRU models, sharing the same architecture used in previous quadrants, achieved validation losses of 0.0148 and 0.0304, respectively. Table 16 shows the performance metrics for both models.

Table 14 DFNN model performance in Q4 with K-fold.
Table 15 DFNN model performance in Q4 without k-fold.

The difference in the number of hidden layers between the LSTM (3 × 64) and GRU (4 × 64) was determined empirically through hyperparameter tuning. The GRU model required an additional layer to ensure stable convergence and maintain performance in Q4 without overfitting. Interestingly, although the GRU had a higher validation loss than the LSTM, it exhibited better spatial generalization (0.689 mm on the circular path compared to 1.103 mm for the LSTM). This suggests that GRU’s gating mechanisms support stable convergence on smoother trajectories but may be less effective for abrupt geometric transitions. The generalization capability of all models was further evaluated on both circular and square trajectories, as detailed in Table 17. Figure 8 illustrates the predicted versus actual paths in X–Y coordinates for (a) square and (b) circular shapes. Comparatively, the DFNN with CV (2 i/p and 1 o/p) outperformed the other models, achieving the lowest deviation errors of 0.715 mm for the square path and 0.662 mm for the circular path. Across all quadrants, the DFNN models with k-fold CV and single-output architecture consistently delivered the lowest deviation errors. This reinforces the advantages of modular learning and validation-aware training over monolithic or untuned approaches.

Table 16 LSTM and GRU model performance in Q4.
Table 17 Deviation errors on circular and square paths in Q4.

Full workspace training

The same process and models used in the quadrant-wise training were applied here to evaluate global generalization, with models trained on the entire workspace covering the full rotational range (0°–360°). The architecture consisted of two separate DFNNs for the single-output case (\(\theta\) or \(d\)). Each model had an input layer with 2 neurons (for \(x\), \(y\)), followed by 3 fully connected hidden layers, each containing 128 neurons with ReLU activation. L2 regularization (strength = 0.0005) was applied to prevent overfitting. Models were trained using the Adam optimizer and MSE loss for 150 epochs, employing early stopping (patience = 10) to avoid unnecessary training. A batch size of 32 was used. The output layer contained 1 neuron predicting either \(\theta\) or \(d\). This single-output DFNN model achieved an average CV loss of 2.1397 for \(\theta\) and 0.09308 for \(d\). It also yielded a deviation error of 1.594 mm on the square path, the lowest among all full-workspace models, but had a slightly higher error of 2.084 mm on the circular path. For the dual-output case (\(\theta\), \(d\)), the same architecture was used but with an L2 regularization strength of 0.0001 and an early stopping patience of 25. When trained for 150 epochs, this model achieved an average CV loss of 0.375, with deviation errors of 4.861 mm for the square path and 1.907 mm for the circular path. The performance of the DFNN models trained with k-fold CV for both single- and dual-output configurations is summarised in Table 18.

Among the evaluated models, the DFNN trained without CV used 80% of the data for training and 20% for testing. Its architecture consisted of five fully connected hidden layers: the first four layers with 128 neurons each, followed by a fifth layer with 256 neurons, all using ReLU activation. The model was trained for 750 epochs with a batch size of 32 using the Adam optimizer and MSE loss. Despite achieving a validation loss of 0.4263, the deviation errors were significantly higher: 8.41 mm for the square path and 5.214 mm for the circular path. This result, detailed in Table 19, underscores that trajectory deviation is a more meaningful metric than validation loss.

Fig. 8

Desired and predicted trajectory coordinates of proposed DL models on Q4 for (a) square path, (b) circle path.

To evaluate sequence models, various LSTM architectures with different depths (3 to 6 stacked layers) were tested for IK prediction. An initial baseline model with three stacked LSTM layers, each with 64 neurons and ReLU activation, followed by a dense output layer with two neurons, was trained for 300 epochs using an 80:20 train-test split. This model achieved a validation loss of 0.02603 but performed poorly on trajectory tracking, with a deviation error of 25.088 mm on the square path and 1.911 mm on the circular path. To study the impact of depth, models with 4, 5, and 6 LSTM layers were evaluated. The 4-layer model performed best on the circular path with a deviation error of 0.873 mm, but failed to meet accuracy requirements on the square path (5.775 mm). Increasing depth further degraded performance (a construction sketch follows the list):

  • 5-layer model: 4.66 mm (square), 3.411 mm (circle).

  • 6-layer model: 17.668 mm (square), 25.823 mm (circle).
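A sketch of how such variable-depth stacked LSTMs can be constructed (TensorFlow/Keras assumed; the paper does not specify the input windowing, so a single (x, y) time step per sample is assumed here):

```python
from tensorflow import keras

def build_stacked_lstm(n_layers):
    """n_layers stacked LSTMs (64 units, ReLU) ending in a (theta, d) output."""
    model = keras.Sequential([keras.Input(shape=(1, 2))])  # one (x, y) step
    for i in range(n_layers):
        model.add(keras.layers.LSTM(64, activation="relu",
                                    return_sequences=(i < n_layers - 1)))
    model.add(keras.layers.Dense(2))                        # predicts (theta, d)
    model.compile(optimizer="adam", loss="mse")
    return model

# Depths 3-6 were compared in the text, e.g.:
model = build_stacked_lstm(4)
```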

These results suggest that increasing model depth may initially improve performance but can lead to overfitting or training instability, particularly for continuous trajectory tasks. Since the application requires high-precision predictions with deviation errors below 1 mm, only the 4-layer LSTM model met the criterion for the circular path, while none satisfied it for the square path. The GRU model, sharing the same architecture as the LSTM baseline, was trained for 1000 epochs and achieved a validation loss of 0.05792. Table 20 compares the performance of the LSTM and GRU models. To assess global trajectory accuracy, all models were tested on continuous paths spanning the full workspace, as shown in Fig. 9. The results confirmed that low training loss alone does not ensure functional usability; trajectory-based metrics are essential for reliable evaluation. These findings highlight that model capacity, regularization, and architectural balance are crucial. Overly deep models may achieve low training loss but perform poorly under geometric constraints, whereas well-regularized, shallower DFNNs with k-fold CV provide robust and predictable performance. The generalization performance across different paths is presented in Table 21. Figure 10 illustrates the X–Y coordinate comparisons for (a) square and (b) circular paths.

Table 18 DFNN model performance in full workspace with K-fold.
Table 19 DFNN model performance in full workspace without k-fold.
Table 20 LSTM and GRU model performance in full workspace.

Conclusion

This study evaluated the generalization capability of DL models (DFNN with and without k-fold CV, LSTM, and GRU) for solving IK using both quadrant-based and full workspace training strategies. Although all models achieved low MSE during training, their performance on predefined square and circular trajectories revealed notable differences in practical accuracy, highlighting the limitations of MSE as the sole evaluation metric.

Across all four quadrants (Q1–Q4), the DFNN with k-fold CV (single-output configuration) consistently outperformed the others, achieving the lowest deviation errors and demonstrating strong generalization across both sharp (square) and smooth (circular) paths. The lowest deviation error observed was 0.289 mm on the square path in Q1, while the highest, 1.295 mm, was produced by the GRU model on the square path in Q3. On circular paths, the same GRU model also performed poorly, with an error of 1.269 mm, reaffirming its limited stability in local generalization. In contrast, the DFNN with k-fold CV (single-output) also performed well on circular paths, with a low error of 0.312 mm in Q1, confirming its consistency across path types and workspace zones.

In Q1, the best-performing model was DFNN with k-fold CV (single-output), achieving 0.289 mm on the square path and 0.312 mm on the circular path. The worst performance in Q1 was by DFNN without CV, with 1.301 mm on the square path and 1.785 mm on the circular path.

In Q2, the DFNN with k-fold CV (dual-output) achieved the best square-path result (0.378 mm), and the DFNN with k-fold CV (single-output) was best on the circular path (0.366 mm); the worst results were 0.511 mm on the square path (DFNN without CV) and 0.708 mm on the circular path (DFNN with k-fold CV, dual-output).

In Q3, the DFNN with k-fold CV (single-output) remained the best with 0.508 mm (square) and 0.438 mm (circle), whereas the GRU model performed worst with 1.295 mm (square) and 1.269 mm (circle).

In Q4, the DFNN with k-fold CV (single-output) had the best performance with 0.715 mm (square) and 0.662 mm (circle), while DFNN without CV showed the poorest results with 1.076 mm (square) and 1.174 mm (circle), respectively.

Under full workspace training, which evaluates global generalization, the DFNN with k-fold CV (single-output) again delivered the best results, with 1.594 mm on the square path and 2.084 mm on the circular path. The GRU model exhibited the highest errors (24.13 mm on the square path and 4.076 mm on the circular path), indicating significant instability when exposed to a wider spatial distribution.

The shape of the trajectory significantly influenced model performance. Square paths, due to their abrupt transitions, challenged models with insufficient depth or regularization, while circular paths allowed smoother and more stable tracking. The use of k-fold CV was found to enhance model robustness and generalization. Moreover, single-output DFNN architectures were more reliable across diverse trajectory types and spatial domains. Importantly, deviation error was shown to be a more meaningful performance metric than MSE in applications where spatial precision is essential. This trajectory-aware evaluation framework offers practical insights for the future design and deployment of DL-based IK systems, particularly in high-precision tasks such as robotic welding and painting, where accuracy and consistency are critical.

Fig. 9

Full robot workspace with (a) square path, (b) circle path visualization.

Table 21 Deviation errors on circular and square paths in full workspace.
Fig. 10

Desired and predicted trajectory coordinates of proposed DL models on full workspace for (a) square path, (b) circle path.