Abstract
This study examines how machine learning model optimization improves production efficiency in footwear manufacturing. Systematic refinement of the logistic regression model increased predictive accuracy from 94.12% to 97.06% and achieved complete specificity (100%), indicating a stronger capability to correctly classify defect-free outputs. When benchmarked against other supervised learning algorithms, including Support Vector Machines, Naïve Bayes, and multinomial classifiers, the optimized model exhibited superior sensitivity, F1-score, and balanced accuracy, demonstrating its robustness across diverse operational conditions. From an industrial performance perspective, these predictive gains translated into measurable process improvements: a 7.2% increase in production throughput, a 9% reduction in equipment downtime, and a 5.3% decrease in overall energy consumption. These improvements emphasize the practical relevance of integrating tuned classification models with real-time manufacturing analytics. Collectively, the results underscore the potential of data-driven optimization frameworks to enhance productivity, energy efficiency, and sustainability on intelligent footwear production shop floors.
Introduction
The rapid evolution of intelligent manufacturing has increased the demand for predictive models that can optimize real-time decision making in complex production environments1. Automation and cyber-physical integration have enabled continuous data acquisition, requiring analytical methods that support operational adaptability and energy-efficient scheduling2. Regression-based methods such as multivariate adaptive regression splines (MARS) are widely used because they capture nonlinear interactions while maintaining interpretability, which makes them effective for production reliability and process optimization3. These models have performed strongly in precision-driven manufacturing tasks and in reducing operational uncertainty4. In parallel, machine learning techniques such as support vector machines are applied to classification and fault detection because of their robustness in handling nonlinear boundaries5. Logistic regression has been adopted for predictive quality analytics in categorical systems6, while Naive Bayes is used for fast probabilistic decision making under highly variable production conditions7.
Optimization of takt time has emerged as a key strategy for aligning production capacity with demand while controlling cost and variability8. Studies demonstrate the use of MARS models to enhance precision machining performance and reduce energy fluctuations in process-intensive applications9. Empirical investigations further show how regression-based forecasting improves schedule adherence in industries with dynamic takt time requirements10. MARS-driven prediction models have also been effective in optimizing production cycle time under varying operational conditions11. Regression-based hybrid methodologies continue to gain prominence in modelling nanofluid processing and predicting precision manufacturing behaviour12. Moreover, genetic algorithms have proven effective in reducing energy consumption by optimizing machine allocation sequences13. Other studies have applied hybrid metaheuristics to minimize job tardiness and energy waste in flow shop scheduling14. Integrated regression and fuzzy models have also been used for energy forecasting and environmental impact prediction in industrial operations15.
Machine learning has contributed significantly to throughput optimization through predictive monitoring of manufacturing systems16. Big data models have enhanced predictive maintenance and enabled real-time production analytics17. Regression techniques continue to be applied to predict process parameters and improve production conformance18. Downtime has been reduced through models that minimize sequence-dependent setups19, and job scheduling solutions integrating genetic rules have further improved changeover management20. Time-sensitive operational environments have benefited from dispatching strategies based on machine-specific dependencies21. Support vector machines have demonstrated high accuracy in predicting operational readiness22. Neural network regression architectures have improved surface quality prediction in complex machining23, and combined regression and neural network models have shown potential for estimating material porosity in precision manufacturing24. However, multicollinearity may reduce the accuracy of logistic models, which necessitates careful variable selection25. With the rise of digital manufacturing, privacy-preserving regression models using encryption techniques are increasingly deployed to protect operational datasets26. Energy-aware scheduling is further supported by multi-objective optimization models focused on emissions reduction27. Grey relational analysis provides an effective multi-criteria decision support framework for sustainable manufacturing targets28. Gradient boosting techniques have enhanced predictive robustness in flow shop optimization29. Machine learning models such as logistic regression and Naive Bayes continue to be applied to the adaptive analysis and prediction of production behaviour30. Metaheuristics are increasingly used to solve complex scheduling and resource allocation problems across manufacturing sectors31.
Regression-based fuzzy logic models have shown applicability in energy forecasting and smart material selection32. Predictive analytics has also contributed to resilient supply chain optimization through scenario-driven modelling33. Recent studies demonstrate the relevance of hybrid optimization models in real-world industrial systems for balancing throughput, cost, and environmental objectives34. Integrated approaches have further strengthened supply chain stability and process resilience under uncertainty35. Regression-driven models are now being applied to optimize energy usage while improving operational flexibility in manufacturing systems36.
Addressing the research gaps set out below, this study makes a novel contribution by integrating regression and machine learning models on an industry-specific dataset to simultaneously optimize takt time, reduce energy consumption, enhance throughput, and minimize downtime. Unlike studies that use generic or simulated datasets, this work validates its findings against real operational dependencies, demonstrating practical applicability in intelligent and energy-efficient manufacturing.
Research gap
Despite considerable progress in applying machine learning and regression techniques to optimize manufacturing processes, key limitations persist, particularly for multifaceted, industry-specific datasets such as those encountered in footwear production. Most existing studies isolate individual performance metrics, such as energy efficiency or cycle time, without incorporating interconnected operational dimensions such as takt time, changeover intervals, and downtime into a unified analytical framework37. This narrow focus overlooks the synergies and trade-offs that emerge when these indicators are optimized collectively. Recent advances in energy-aware flow shop scheduling and controllable setup time modelling highlight the importance of integrating such parameters within a comprehensive decision-making structure38,39,40.
Moreover, although predictive models ranging from linear regression and support vector machines to neural networks have achieved notable success in specific use cases41,42,43, their combined and comparative deployment in complex, real-world production settings remains largely underexplored. Particularly in domains such as smart footwear manufacturing, where process delays, operator-machine interactions, and sequence dependencies play a pivotal role, the current literature offers limited evidence on the practical integration of hybrid predictive approaches tailored to dynamic production environments44,45,46,47.
Current research overlooks how optimizing one factor, such as changeover time, affects others such as energy use or takt time, owing to limited integrated data and a lack of standardized benchmarking. Furthermore, while sustainability and throughput optimization have been studied independently, empirical validation of multi-model frameworks, particularly those that leverage both classification and regression algorithms for strategic forecasting and scheduling, remains scarce48,49. This study introduces a hybrid model that integrates statistical regression and machine learning classification to improve production forecasting accuracy while simultaneously optimizing productivity, energy consumption, and changeover time, bridging both methodological and practical gaps50,51 in data-driven manufacturing analytics.
Through empirical validation using contextual production data, the study demonstrates how advanced analytical modelling can significantly strengthen operational intelligence in modern manufacturing.
This study distinctly advances prior work by:
1. Integrating multi-model predictive analytics (regression + classification) to handle both continuous and categorical production metrics.
2. Simultaneously optimizing throughput, downtime, takt time, changeover, and energy efficiency within a single hybrid analytical architecture.
3. Providing empirical validation using real manufacturing data, bridging the gap between theoretical optimization and operational performance.
Table 1 below presents a structured comparative summary of prior research and the present study.
Methodology
This study adopts a hybrid analytical framework, integrating statistical and machine learning models to optimize multiple manufacturing performance indicators using the footwear production dataset. The approach consists of the following phases.
Data preprocessing and feature engineering
The dataset is first cleaned to handle missing values and inconsistencies. Variables related to processing time, energy usage, workstation changeover, and job sequences are then extracted. Features are transformed to normalized scales, and new features such as takt time deviation, downtime rate, and setup overlap index are derived.
Dataset description
The dataset provides an extensive overview of sequence-dependent flow shop operations in a footwear manufacturing setup, emphasizing the production flow of upper assembly and finishing operations. It consists of 34 recorded production instances, each representing a distinct batch configuration observed across multiple workstations. Data were collected from five major machine categories (stitching, skiving, lasting, pressing, and trimming), which represent the core stages of shoe upper manufacturing, where task dependencies along the line and setup time variations critically affect cycle time and energy usage. Ten operators participated in the sessions and were assigned to workstations based on skill proficiency, task familiarity, and ergonomic suitability. Parameters related to these skilled operators, including task duration, idle intervals, and error occurrences, were tracked in detail to enable manpower performance evaluation.
To ensure data consistency before model development, comprehensive preprocessing was carried out. Missing or inconsistent time-series entries were corrected using linear interpolation, and categorical variables such as machine type and operator ID were numerically encoded to enable regression and classification analyses. Continuous features such as cycle time, energy consumption, and changeover duration were normalized via min–max scaling to mitigate model bias. Outliers lying more than 1.5 times the interquartile range beyond the quartiles were reviewed and then adjusted or excluded based on process validation records.
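The preprocessing steps above can be sketched as follows. This is a minimal, dependency-free Python illustration, not the authors' R code. The list-of-floats layout with `None` marking a missing entry, the helper names, and the nearest-value rule for gaps at either end are all assumptions made for illustration.

```python
import statistics


def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest observed neighbours.

    Gaps at either end are filled with the nearest observed value (an assumption;
    the text does not say how edge gaps were handled).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def label_encode(categories):
    """Map each distinct category (e.g. machine type, operator ID) to an integer code."""
    mapping = {c: code for code, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def min_max_scale(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] for review against process records."""
    # The "inclusive" method uses the same linear interpolation as R's default quantile (type 7).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


cycle_time_s = interpolate_linear([30.0, None, 34.0, 40.0])  # hypothetical readings (s)
codes, mapping = label_encode(["stitching", "skiving", "stitching"])
print(cycle_time_s, codes, min_max_scale(cycle_time_s))
```

Flagged values are only returned, not dropped, mirroring the text's review-then-adjust-or-exclude step.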
Descriptive Analytics and Correlation Analysis
Initial statistical analysis is conducted to identify significant patterns and dependencies among variables. Pearson correlation and variance inflation factor (VIF) analyses are applied to detect multicollinearity among predictors.
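For a model with exactly two predictors, as in the MP and T1 regressions used later, the two diagnostics are directly linked: each predictor's VIF equals \(1/(1-r^{2})\), where r is their Pearson correlation. The Python sketch below shows this link. It is not the authors' R code. The 5 and 10 cut-offs follow the text, while the "moderate" label for values between them is an assumption. With more than two predictors, each VIF instead requires regressing that predictor on all the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2), identical for both."""
    r = pearson_r(x1, x2)
    return math.inf if abs(r) >= 1.0 else 1.0 / (1.0 - r * r)


def vif_band(vif):
    """Cut-offs from the text: < 5 negligible, > 10 significant ('moderate' is assumed)."""
    if vif < 5:
        return "negligible"
    if vif <= 10:
        return "moderate"
    return "significant"


mp = [1, 2, 3, 4]   # hypothetical manpower values
t1 = [1, 3, 2, 4]   # hypothetical cycle times
vif = vif_two_predictors(mp, t1)
print(round(pearson_r(mp, t1), 3), round(vif, 3), vif_band(vif))
```

A VIF never falls below 1. It reaches that minimum only when the predictors are uncorrelated (r = 0).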
Workflow of Production Process
1. Develop a Mathematical Model for the Time Study.
Formulate a time study model (Table 6 in Appendix 1) for the flow shop upper shoe manufacturing process that accounts for sequence dependency and job scheduling constraints.
2. Apply Optimization Techniques.
Use feature selection within linear regression, or a similar optimization approach, to identify the key categorical and numerical variables that significantly influence production performance.
3. Identify Production Bottlenecks.
Analyse process data to detect workstations or operations that cause delays or imbalances in the production line.
4. Analyse Influential Variables and Outliers.
Examine influential factors and outlier patterns to support data-driven decisions aimed at improving process efficiency, resource allocation, and overall productivity.
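Step 3 can be illustrated with a simple capacity comparison. The sketch below assumes a serial line in which every piece visits every station and each station's MP operators work in parallel. Under those assumptions the station with the lowest pieces-per-hour capacity limits the whole line, and the other stations' spare capacity appears as idle time. The station data are hypothetical, and the helper names are illustrative only.

```python
def station_capacity_pph(manpower, cycle_time_s):
    """Pieces per hour a station can deliver: MP operators, each taking T1 seconds per piece."""
    return manpower * 3600.0 / cycle_time_s


def find_bottleneck(stations):
    """Return (bottleneck name, line capacity in PPH, idle share per station).

    Assumes a serial line, so the line cannot run faster than its
    lowest-capacity station.
    """
    capacity = {name: station_capacity_pph(mp, t1) for name, (mp, t1) in stations.items()}
    bottleneck = min(capacity, key=capacity.get)
    line_pph = capacity[bottleneck]
    idle_share = {name: 1.0 - line_pph / cap for name, cap in capacity.items()}
    return bottleneck, line_pph, idle_share


# Hypothetical (manpower, cycle time in seconds) for each machine category
stations = {
    "stitching": (3, 90.0),
    "skiving": (1, 40.0),
    "lasting": (2, 75.0),
    "pressing": (1, 30.0),
    "trimming": (1, 45.0),
}
name, line_pph, idle = find_bottleneck(stations)
print(name, line_pph, {k: round(v, 2) for k, v in idle.items()})
```

The idle shares link this step back to the machine idle time and makespan objectives discussed in the algorithm analysis.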
Regression Driven Mathematical Modelling Approach
Multiple regression techniques, including linear regression, are used to model and predict continuous performance indicators such as energy consumption and takt time. The predictive accuracy of these models is assessed using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination (\(R^{2}\)), reported in Table 2. A regression-based analytical framework is then developed to estimate production rate (PPH) as a function of key operational parameters. In line with manufacturing theory, the study applies a Linear Inverse-Based Production Model, together with the mathematical derivations in Appendix 8 for computing the throughput, downtime, and energy consumption improvements and production scaling under optimized cycle time constraints, to quantify the influence of operational factors on PPH.
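The model equation is not reproduced in this section. The reconstruction below matches the baseline calculation the text describes, in which inverse cycle times are weighted by manpower, and should be read as an assumption rather than the authors' exact formulation. Here \(\mathrm{MP}_i\) is the manpower at operation i and \(T_{1,i}\) its cycle time in hours per piece. If T1 is recorded in seconds, use \(3600/T_{1,i}\) instead. The symbol \(\alpha = 1.072\) is the enhancement factor and \(H = 8\) the shift length in hours. The full derivation is in Appendix 8.

```latex
\begin{aligned}
\mathrm{PPH}_{\text{base}} &= \sum_{i=1}^{N} \frac{\mathrm{MP}_i}{T_{1,i}} \\
T_{1,i}^{\text{opt}} &= r\,T_{1,i}, \qquad r = \frac{1}{\alpha} \\
\mathrm{PPH}_{\text{opt}} &= \sum_{i=1}^{N} \frac{\mathrm{MP}_i}{r\,T_{1,i}}
  = \frac{1}{r}\,\mathrm{PPH}_{\text{base}} = \alpha\,\mathrm{PPH}_{\text{base}} \\
\frac{H\,\mathrm{PPH}_{\text{opt}} - H\,\mathrm{PPH}_{\text{base}}}{H\,\mathrm{PPH}_{\text{base}}}
  &= \alpha - 1 = 0.072
\end{aligned}
```

Because the same factor r scales every cycle time, the relative gain depends only on \(\alpha\), not on the shift length H or on how manpower is distributed across operations.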
The baseline throughput of the production line was calculated as the sum of the inverse cycle times, each multiplied by the corresponding manpower allocation, giving a total output of 1987.97 pieces per hour. The optimized model applies a throughput enhancement factor of 1.072, increasing the production rate to 2131.10 pieces per hour. Extrapolated to an 8-h shift, total output rises from 15,903.78 to 17,048.86 pieces, a 7.20% increase in productivity (Appendix 8).
This improvement is achieved by proportionally reducing the cycle time of every operation by a scaling factor \(r = 1/1.072 \approx 0.933\). Because pieces per hour is inversely proportional to cycle time, multiplying every cycle time by r multiplies throughput by \(1/r = 1.072\). The optimized condition therefore not only increases hourly output but also demonstrates the effectiveness of cycle time compression in improving overall line performance.
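The arithmetic in the two preceding paragraphs can be checked with a short Python sketch. The reported figures are taken from the text. The cycle times are assumed to be in seconds, and the three-station line at the end is hypothetical, included only to show that compressing every cycle time by r raises the manpower-weighted PPH by exactly 7.2%.

```python
def line_throughput_pph(manpower, cycle_times_s):
    """Linear inverse model: PPH = sum_i MP_i * 3600 / T1_i (T1_i in seconds, assumed)."""
    return sum(mp * 3600.0 / t1 for mp, t1 in zip(manpower, cycle_times_s))


def compress_cycle_times(cycle_times_s, enhancement):
    """Scale every cycle time by r = 1 / enhancement."""
    r = 1.0 / enhancement
    return [r * t1 for t1 in cycle_times_s]


# Figures reported in the text
BASE_PPH = 1987.97
ENHANCEMENT = 1.072
SHIFT_HOURS = 8

opt_pph = BASE_PPH * ENHANCEMENT
shift_base = BASE_PPH * SHIFT_HOURS
shift_opt = opt_pph * SHIFT_HOURS
print(f"optimized PPH     : {opt_pph:.2f}")
print(f"8-h shift output  : {shift_base:.2f} -> {shift_opt:.2f}")
print(f"productivity gain : {100 * (shift_opt / shift_base - 1):.2f}%")
print(f"scaling factor r  : {1 / ENHANCEMENT:.4f}")

# Hypothetical three-station line
mp, t1 = [2, 1, 3], [30.0, 45.0, 60.0]
before = line_throughput_pph(mp, t1)
after = line_throughput_pph(mp, compress_cycle_times(t1, ENHANCEMENT))
print(before, round(after, 2))
```

Using the rounded 1987.97 gives shift totals a few hundredths below the reported 15,903.78 and 17,048.86. The text presumably used the unrounded hourly figure, and the 7.20% gain is unaffected.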
Classification and Predictive Modelling
For categorical performance outcomes such as job delay (on-time vs. late), models including logistic regression, support vector machines, multinomial regression, linear discriminant analysis, and Naive Bayes are applied. Cross-validation is performed to avoid overfitting and ensure model generalizability.
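The cross-validation step follows the same pattern whichever classifier is being tested. The Python sketch below splits the data into k folds, trains on k-1 of them, scores the held-out fold, and pools the accuracy. It does not reproduce the authors' R tuning code. The folds are shuffled but not stratified, and `fit_midpoint` is a hypothetical stand-in model that places a one-feature threshold halfway between the two class means.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Hold out each fold once, train on the rest, pool accuracy over all held-out points."""
    correct = 0
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def fit_midpoint(xs, ys):
    """Hypothetical model: threshold halfway between the class means.

    Assumes both classes appear in every training split and class 1 has the larger mean.
    """
    mean0 = sum(x for x, c in zip(xs, ys) if c == 0) / ys.count(0)
    mean1 = sum(x for x, c in zip(xs, ys) if c == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_midpoint(threshold, x):
    return int(x >= threshold)


X = list(range(10)) + list(range(30, 40))   # hypothetical, well-separated feature
y = [0] * 10 + [1] * 10
print(cross_val_accuracy(fit_midpoint, predict_midpoint, X, y, k=10))
```

Any model exposing the same `fit`/`predict` pair can be substituted, for example a logistic regression or SVM.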
Model Integration and Optimization
The results from the regression models (Table 2) and the classification models (Tables 3 and 5) are integrated into a decision support tool that recommends optimal schedules and resource allocations. This step draws on ensemble strategies to leverage the strengths of the individual models.
Experimental Procedure
The flowchart in Fig. 1 describes a methodical procedure for maximizing production efficiency and achieving the desired results. After the input data are collected, key parameters such as cycle time (T1), process time (PT), production per hour (PPH), and standard minute value (SMV) are calculated. These computations reveal the efficiency and capability of the manufacturing process. A decision point then assesses whether the computed metrics meet the intended targets. If they do, the outcomes are recorded and examined for future improvements. If not, real-time adjustments are made to the procedures to bring them in line with the objectives. The procedure is iterative and ends once the intended results are obtained.
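The compute, compare, and adjust loop of the flowchart can be written as a small function. In the Python sketch below, `improve` is a hypothetical callback standing in for the real-time process adjustment. The actual adjustments are made on the shop floor, not by a formula. The 5% cycle-time reduction per round in the usage example is also an assumption.

```python
def pph(manpower, cycle_time_s):
    """Production per hour for MP operators at T1 seconds per piece."""
    return manpower * 3600.0 / cycle_time_s


def iterate_to_target(manpower, cycle_time_s, target_pph, improve, max_iter=20):
    """Compute PPH, compare it with the target, adjust, and repeat.

    `improve` maps the current cycle time to an adjusted one (hypothetical).
    Returns (target_met, history of (cycle time, PPH) pairs).
    """
    history = []
    t1 = cycle_time_s
    for _ in range(max_iter + 1):
        current = pph(manpower, t1)
        history.append((t1, current))
        if current >= target_pph:
            return True, history
        t1 = improve(t1)
    return False, history


met, history = iterate_to_target(1, 72.0, 55.0, improve=lambda t: 0.95 * t)
for t1, rate in history:
    print(f"T1 = {t1:6.2f} s  ->  PPH = {rate:5.2f}")
print("target met:", met)
```

The `max_iter` cap keeps the loop finite when adjustments stop helping. In that case the history is returned with `target_met` set to False, which corresponds to escalating the issue rather than looping forever.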
Algorithm analysis
This study (Table 6 in Appendix 1) addresses two critical aspects of the production line: reducing tardiness and machine idle time to minimize makespan, and treating energy use as an objective constraint. The categorical variables in the mathematical time study model, recorded across different time slabs of daily output under a sequence-dependent scheduling policy, are derived from real-time data collected in the production unit. Work measurement is critical to improving the production line, and time study specifically is the work measurement approach required to compute standard minutes from operators' consistent performance. Understanding and analysing the structure of the manufacturing unit shows how the various processes relate to one another and how they ultimately affect the production line. This is the main goal of the mathematical time study.
After close observation of the real-time manufacturing unit, the mathematical time study model was used to compute the table displayed below. Before launching the manufacturing line, the production department must determine the hourly target and set the daily target. Investigating time studies and taking appropriate actions to increase production to meet demand are important ways to gauge how well the production unit is performing.
One way to improve quality is to use models built on enhanced local data. The analysis focuses on the individual factors with the greatest effect on processing quality. Nonetheless, several variables can affect the acquisition of high-quality data tuples, and further research is needed to determine whether their properties change over time. Pre-training the models on data segments and assessing the characteristics of each resulting segment makes it possible to assign the most effective algorithm and classification model to each subset. By assigning the required number of workers to the right workstation with the best qualitative attributes, the quality indicators for each classifier can be raised to a higher level. The ensemble models show similar quality measures. However, they require advanced aggregation functions and enough processing capacity to run the data processing models in parallel, which departs from the recommended technique.
In footwear manufacturing, real-time decision making on the shop floor is often based on intuition rather than data-driven methods, resulting in inconsistent performance and underutilized installed capacity. Because of variations in machine speed, operator efficiency, and sequence-dependent tasks, production frequently falls below optimal output levels. To address these issues, this study adopts advanced regression and classification models, including Naïve Bayes and the PKS algorithm, chosen for their relevance to footwear production dynamics. These methods incorporate standard industrial parameters such as cycle time, manpower, and processing rates, enabling accurate classification and optimization of production performance. By integrating multiple algorithms into a unified framework, the proposed approach improves decision making, adapts to real-time data variability, and increases overall operational efficiency.
Pseudo Code
The pseudocode in Appendix 6 outlines a step-by-step data analysis process using R. It begins by loading essential libraries and visualizing initial data distributions. A linear regression model is then built and evaluated for multicollinearity using VIF, followed by calculating confidence intervals for its coefficients. A Generalized Linear Model (GLM) is also fitted, and odds ratios with confidence intervals are visualized. The process is repeated for a second dataset. Model performance is evaluated using a confusion matrix. Finally, a Support Vector Machine (SVM) is trained, and its parameters are optimized for better classification results.
PKS Algorithm 1

In regular practice, the production firm consistently operates below the installed capacity of its manufacturing plant. Consequently, there is a considerable need for effective planning of sequence-dependent operations and scheduling to keep production under control.
The basic steps in this study are:
a) Ensure that the supervisor and the operator have a detailed plan for the task.
b) Observe and record the required number of cycles.
c) Calculate the cycle time (T1) and note the performance of the operators.
d) Compute the standard times carefully.
e) Fix the target per hour.
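Steps b) to e) can be sketched with conventional work-measurement formulas. This Python example does not show the PKS algorithm itself, which is given in Table 6. The sample-size rule (95% confidence, ±5% accuracy), the performance rating of 1.10, the 15% allowance, and the stop-watch readings are textbook conventions or hypothetical values, because the text does not state which rule or which values were used.

```python
import math
import statistics


def required_cycles(observations):
    """Step b), textbook rule (95% confidence, +/-5% accuracy):
    N' = (40 * sqrt(N * sum(x^2) - (sum x)^2) / sum x)^2, rounded up."""
    n = len(observations)
    s = sum(observations)
    s2 = sum(x * x for x in observations)
    return math.ceil((40.0 * math.sqrt(max(0.0, n * s2 - s * s)) / s) ** 2)


def standard_minute_value(observations_s, rating=1.0, allowance=0.0):
    """Steps c)-d): T1 = mean observed cycle; normal time = T1 * rating;
    standard time = normal time * (1 + allowance); returned in minutes (SMV)."""
    t1 = statistics.fmean(observations_s)
    normal_s = t1 * rating
    return normal_s * (1.0 + allowance) / 60.0


def hourly_target(smv_min, manpower=1, efficiency=1.0):
    """Step e): pieces per hour for MP operators at the given efficiency."""
    return manpower * efficiency * 60.0 / smv_min


cycles = [58.0, 62.0, 60.0, 61.0, 59.0]   # hypothetical stop-watch readings (s)
print("cycles needed:", required_cycles(cycles))
smv = standard_minute_value(cycles, rating=1.10, allowance=0.15)
print(f"SMV = {smv:.3f} min, target = {hourly_target(smv):.1f} pcs/h per operator")
```

If `required_cycles` returns more than the number of readings taken, further cycles should be observed before computing T1.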
Evaluation and Results
Performance Evaluation
The final framework is assessed on its ability to reduce takt time deviation using PKS Algorithm 1 (the authors' own algorithm), improve throughput, minimize changeover time, and lower energy usage. These outcomes are validated using both baseline comparisons and sensitivity analysis. The methodology offers a comprehensive, data-driven strategy for improving manufacturing system performance, particularly in energy-intensive and sequence-sensitive environments such as shoe production. By applying this framework to real production data, the study bridges theoretical modelling techniques and practical industrial application. Figure 10 (Appendix 2) presents the manpower (MP) distribution across categorical variables, while Fig. 11 illustrates processing time (PS) grouped by production per hour (PPH).
To identify the causes behind the time study modifications that produced the controlled process improvement, Figs. 14 and 15 (Appendix 4) should be examined closely. These bar graphs compare production per hour (PPH) before and after the modifications to the sequence-dependent flow shop time study, and they show clearly that an increase in T1 results in a decrease in PPH. The graphs break the categorical production data down by group or interval, using bars to depict the production and counts for each category variable.
Production output ranges from low to high levels, with 36 identified as a low-performing outlier in Fig. 2, indicating a potential production bottleneck. This bottleneck is attributed to the variability in operator skill observed in the time study mathematical model.
Correlation measures how two sets of data move together, and it helps in evaluating engineered features. In current research, correlation is frequently used alongside regression and other machine learning techniques to analyse a variety of engineering features (e.g., wind power prediction based on meteorological data and local terrain). The Pearson correlation coefficient (r) quantifies the linear relationship between two variables, and a correlation plot depicts the strength of association among several variables at once. The correlation diagram produced in R gives r = 0.86, indicating a strong positive association among Manpower (MP), Cycle Time (T1), and Standard Minutes Value (SMV).
The p-values in Table 2 indicate that the associations among MP, T1, and SMV are highly significant. The correlation analysis in Figs. 12 and 13 (Appendix 3) shows a strong negative association between processing time and production per hour (PPH), suggesting that operators performed their tasks as expected. The p-value for this association is likewise highly significant.
Proposed system implementation
The authors' PKS algorithm and the dataset description are detailed in Table 6 in Appendix 1, in the context of optimizing upper shoe production through machine learning-driven, sequence-dependent flow shop scheduling.
Based on Figs. 12 and 13 in Appendix 3, the sequence-dependent flow shop time study was evaluated in R for potential multicollinearity in the regression model using Variance Inflation Factor (VIF) analysis. VIF and tolerance statistics were applied to determine the degree of linear dependence among the independent variables in the time study sequence. In regression diagnostics, a VIF of 1 (its minimum possible value) indicates no collinearity. In practical manufacturing applications, VIF values below 5 are considered negligible, while values above 10 indicate significant multicollinearity. Applying this criterion to the upper shoe sequence-dependent production data, Figs. 12 and 13 confirm that multicollinearity is not present, as all VIF values fall within the acceptable range, supporting the reliability of the regression model used in the R-based analysis.
Problem analysis
Analysing the observed sequence-dependent time study in R reveals a pattern consistent with the one seen on the production line among Manpower (MP), Cycle Time (T1), and Production Per Hour (PPH). R fits a best-fit multiple linear regression to these data, and from this regression the important categorical variable Manpower (MP) was identified. This variable can act as a threshold value, affecting PPH directly and proportionally up to a new threshold. Holding T1, the other significant variable in the regression equation, constant, PPH rises with each one-unit increase in MP. The residual standard error (RSE) is also considered. This time study regression can guide decision making on increasing production line capacity and help identify potential areas for improvement. Although it is not a direct method for increasing production capacity, it provides insights into various aspects of production, such as identifying influential factors and estimating future performance.
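A multiple linear regression of PPH on MP and T1, together with the residual standard error, can be sketched from first principles by solving the least-squares normal equations. The Python example below is illustrative and is not the authors' R `lm` call. The data are hypothetical and noise-free, generated from PPH = 20 + 8·MP − 0.3·T1, so the fit should recover those coefficients. The MP coefficient is the change in PPH per additional operator with T1 held constant.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(predictors, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefs, RSE, R^2)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rse = math.sqrt(rss / (n - p))   # residual standard error, as in R's summary(lm)
    return beta, rse, 1.0 - rss / tss


mp = [1, 2, 3, 1, 2, 3]                       # hypothetical manpower
t1 = [30.0, 40.0, 35.0, 50.0, 45.0, 60.0]     # hypothetical cycle times (s)
pph = [20 + 8 * m - 0.3 * t for m, t in zip(mp, t1)]
beta, rse, r2 = ols([mp, t1], pph)
print([round(b, 4) for b in beta], round(rse, 6), round(r2, 6))
```

With real, noisy time-study data the RSE would be positive, and it is one of the criteria used below to compare Model 1 and Model 2.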
Figure 3 shows, through performance evaluation metrics, how variance inflation changes as the workforce grows over the study period. Because insufficient staffing leads to multicollinearity, efficient staffing allocation is essential for the sequence-dependent time study. Forward selection was therefore applied to the sequence-dependent upper shoe time study to assign the appropriate workforce, preventing multicollinearity while lowering the residual standard error. The production strategy is likewise centred on responding to increasing customer demand while monitoring cost and revenue.
Applied to the observed time study, this regression technique shows how the variables influence production capacity and quantifies the strength of those relationships. Variables with a significant effect on production capacity are of primary interest: positive coefficients indicate factors that increase production capacity, while negative coefficients indicate factors that decrease it. Through this R-based analysis, hourly shop floor production was raised to a new level, and new goals were set in response to customer demand, as the two distinct regression lines in Figs. 16 and 17 (Appendix 5) demonstrate. This optimizes costs, because fewer production days and lower subcontracting fees are incurred. In conclusion, Manpower (MP) and Cycle Time (T1) are important predictors in both statistically significant models. With a higher \({\text{R}}^{2}\), a smaller residual standard error, and a higher F-statistic, Model 2 appears to fit the data better than Model 1 (Table 2), and both independent variables (MP and T1) are significant predictors of the dependent variable (PPH).
Evaluation of the Performance using Logistic Regression
Using logistic regression in R with a threshold of 50 PPH to evaluate model performance, it was initially observed that most categorical variables fell below the 55% level, as illustrated in Fig. 4. Following adjustments to the production sequence, operator efficiency improved significantly, with 75% of the variables exceeding the threshold, as shown in Fig. 5.
A threshold-based binary response variable (Y1) was modelled against pieces per hour (PPH) using logistic regression and SVM, to compare statistical interpretability with margin-based classification accuracy. Logistic regression on the modified dataset (n = 34) confirmed PPH as a significant predictor (β = 0.3415, p = 0.0114), with an odds ratio of 1.41 (95% CI: 1.14–1.95), meaning that the odds of Y1 = 1 increase by about 41% for each one-unit rise in PPH. The model outperformed the null model (AIC = 27.56) and showed strong predictive power, with 97.06% accuracy, 88.89% sensitivity, and 100% specificity.
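The reported figures can be cross-checked with a few lines of Python. With n = 34, one 2×2 confusion matrix consistent with 97.06% accuracy, 88.89% sensitivity, and 100% specificity is 8 true positives, 1 false negative, 25 true negatives, and 0 false positives. This matrix is inferred from the percentages, not taken from the paper, and it assumes Y1 = 1 is the positive class. Labelling observations at or above 50 PPH as Y1 = 1 is likewise an assumption, since the text gives only the threshold value. The odds ratio is simply \(e^{\beta}\).

```python
import math


def threshold_labels(pph_values, threshold=50.0):
    """Y1 = 1 when PPH reaches the threshold (>= is an assumption)."""
    return [1 if v >= threshold else 0 for v in pph_values]


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, balanced accuracy and F1 from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


# A 2x2 table consistent with the reported n = 34 results (inferred, not from the paper)
metrics = classification_metrics(tp=8, fn=1, tn=25, fp=0)
for name, value in metrics.items():
    print(f"{name:18s} {100 * value:6.2f}%")

print("odds ratio:", round(math.exp(0.3415), 2))   # reported coefficient for PPH
print(threshold_labels([45.0, 50.0, 62.5]))
```

The same function gives the 94.12% baseline accuracy with 32 of 34 observations classified correctly.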
The linear SVM achieved comparable accuracy (≈97.06%) using 18 support vectors, while the polynomial kernel variant used fewer (12) without any performance gain. Hyperparameter tuning with tenfold cross-validation selected cost = 4 (ε = 0.1) with a minimum cross-validated error of 0.0513, yielding about 94.12% accuracy. Given the small sample and single predictor, the logistic model is prioritized for interpretation, with SVM serving as a confirmatory approach. Cross-validation analyses were further used to check generalization reliability. A confidence interval gives a range within which the true population parameter is likely to fall at a specified level of confidence. For the PPH effect, expressed as an odds ratio, the analysis indicates with 95% confidence that the true value lies between 1.1717 and 2.0801. After the modifications to the production line, the interval narrows to 1.3957 to 1.9521, indicating improved precision. This narrower range reinforces the reliability of the model's estimates and reflects the positive impact of the operational adjustments.
Residual plots in R
Figures 18 and 19 (Appendix 7) plot the residuals of the fitted linear models against the predictor variable. This graphical analysis of the residuals looks for any remaining structure or pattern that could indicate whether the model is appropriate.
The box plot in Fig. 6 compares production per hour (PPH) before and after the changes to the time study. The whisker lengths in Fig. 6 show that, in the originally observed time study, 25% of the observations were below 50 PPH. After the changes, the interquartile range shifts upward, from below 50 to 65. The median and the outliers also rise after the modifications to the time study.
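The box-plot reading above rests on five numbers per group: minimum, first quartile, median, third quartile, and maximum. The Python sketch below computes these numbers and the share of observations under 50 PPH. The before and after samples are hypothetical, and the quartiles use the "inclusive" method, which matches R's default quantile type 7 used by R's box plots only approximately, since `boxplot()` draws hinges rather than type-7 quartiles.

```python
import statistics


def five_number_summary(values):
    """(min, Q1, median, Q3, max) with 'inclusive' (R type 7 style) quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def share_below(values, threshold=50.0):
    """Fraction of observations strictly below the PPH threshold."""
    return sum(v < threshold for v in values) / len(values)


before = [40, 45, 48, 52, 55, 58, 60, 62]   # hypothetical PPH, original study
after = [50, 54, 58, 63, 65, 68, 70, 75]    # hypothetical PPH, modified study
for label, data in (("before", before), ("after", after)):
    print(label, five_number_summary(data), share_below(data))
```

An upward shift of Q1, the median, and Q3 between the two groups is what the box plot in Fig. 6 displays graphically.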
Check model before and after changes in time study
The model checks performed in R and shown in Figs. 20 and 21 (Appendix 9) indicate that Model 2 performs better than Model 1 in terms of linearity, keeps variance inflation low, and adheres more closely to the normality assumption.
The feature selection optimization technique, applied to the Manpower (MP) categorical variable chosen on the basis of Production Per Hour (PPH) in the mathematical model of the sequence-dependent time study, makes it possible to improve both the process (PS) and the response variable (PPH). After the forward selection process, the newly significant categorical variables can be examined to guide improvements to the production line and to determine the adjustments to the production sequence required in the newly constructed multiple linear regression model.
Model 2 outperforms Model 1 on every key metric in Table 3. Its lower AIC, AICc, BIC, RMSE, RSE, and MAE values indicate smaller prediction errors and a better fit to the data. Model 2 also has a higher adjusted \({R}^{2}\), which accounts for the number of predictors, and explains a larger percentage of the variation (\({R}^{2}\) = 0.947). Its reduced sigma and Cp values provide further evidence that Model 2 is more reliable and accurate. Overall, Model 2 is the more accurate and dependable of the two models.
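The metrics compared in Table 3 can all be computed from a model's residuals. The Python sketch below follows the Gaussian log-likelihood convention that R's `logLik()` uses for `lm` fits, which counts the residual variance as one extra parameter; RSE is the residual standard error sqrt(RSS / (n − p)). The residuals are hypothetical, and the sketch does not reproduce the study's Table 3.

```python
import math

def fit_metrics(residuals, n_coef):
    """Error and information-criterion metrics for a Gaussian linear model.

    n_coef counts the regression coefficients including the intercept; the
    likelihood-based criteria add one parameter for the residual variance.
    """
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    k = n_coef + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    return {
        "RMSE": math.sqrt(rss / n),
        "MAE": sum(abs(e) for e in residuals) / n,
        "RSE": math.sqrt(rss / (n - n_coef)),
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": -2 * log_lik + k * math.log(n),
    }

# Hypothetical residuals from two competing models fitted to the same 6 points
model_1 = fit_metrics([1.2, -0.8, 0.5, -1.1, 0.9, -0.7], n_coef=2)
model_2 = fit_metrics([0.4, -0.3, 0.2, -0.5, 0.3, -0.1], n_coef=3)
for name in model_1:
    print(f"{name}: model 1 = {model_1[name]:.3f}, model 2 = {model_2[name]:.3f}")
```

Lower values are better for every metric here; the penalty terms in AIC, AICc, and BIC keep a model with extra coefficients from winning on fit alone.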
To translate the model results into actionable insights, the changes in three key operational metrics (energy consumption, throughput, and downtime) are consolidated in Table 4. Model 2 achieved noticeable improvements in throughput and energy efficiency, indicating that predictive optimization directly supported process reliability. Integrating the optimized logistic regression and SVM models thus yielded higher predictive reliability, lower energy intensity, and smoother takt-time alignment across workstations.
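The entries in Table 4 are relative changes against a pre-modification baseline, and takt time is the production pace needed to meet demand. The sketch below shows both calculations. The baseline and post-change values are hypothetical, chosen only to reproduce changes of the size reported in the abstract (7.2% more throughput, 9% less downtime, 5.3% less energy); they are not the study's measurements.

```python
def percent_change(before, after):
    """Relative change in percent; negative values are reductions."""
    return (after - before) / before * 100.0

def takt_time(available_seconds, demand_units):
    """Takt time: available production time divided by customer demand."""
    return available_seconds / demand_units

# Hypothetical shift-level values (not the study's measurements)
metrics = {
    "throughput (pairs/shift)": (500.0, 536.0),
    "downtime (min/shift)": (50.0, 45.5),
    "energy (kWh/shift)": (1000.0, 947.0),
}
for name, (before, after) in metrics.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")

# A 480-minute shift that must deliver 960 pairs needs one pair every 30 s
print(takt_time(480 * 60, 960))
```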
Synthesis and interpretation
Regression diagnostics and the AIC/BIC results indicate that Model 2 offers the best balance between accuracy and generalizability. The post-tuning improvement of logistic regression, consistent with the SVM outcomes, indicates that well-regularized hybrid models outperform single-model approaches. Integrating energy, throughput, and downtime data strengthens the analytical foundation for intelligent production forecasting in smart footwear manufacturing.
Discussion
Combining regression and machine learning (ML) models improved predictive accuracy and efficiency: Model 2 achieved \({R}^{2}\) = 0.94, and the tuned classifiers reached 97% accuracy. These results align with prior studies emphasizing energy-efficient scheduling. However, the limited dataset may affect generalizability, so future work should use larger or real-time datasets for broader applicability.
Model validation using ROC curve
The relationship between Y1 and pieces per hour was analyzed using logistic regression and a support vector machine (SVM). Logistic regression showed a significant positive effect (β = 0.3415, p = 0.0114; odds ratio = 1.41) and achieved 97.06% accuracy, while the ROC curves in Figs. 7 and 8 indicated near-perfect discrimination (AUC ≈ 1.0). After cross-validation, the SVM models produced comparable accuracy, supporting the robustness and generalizability of the predictive relationship.
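The reported odds ratio follows from the coefficient: in a logistic model the log-odds are linear in the predictor, so a one-unit increase multiplies the odds by exp(β). The sketch below reproduces exp(0.3415) ≈ 1.41 and, for illustration, the multipliers for hypothetical increases of 5 and 10 units; it says nothing about the intercept, which is not reported here.

```python
import math

beta = 0.3415  # reported logistic-regression coefficient for pieces per hour

def odds_multiplier(beta, delta):
    """Factor by which the odds of Y1 = 1 change when the predictor rises by delta."""
    return math.exp(beta * delta)

print(f"odds ratio per unit: {odds_multiplier(beta, 1):.2f}")
for delta in (5, 10):
    print(f"+{delta} units -> odds x {odds_multiplier(beta, delta):.2f}")
```

Because the effect is multiplicative on the odds, a change of several units compounds: the multiplier for k units is the per-unit odds ratio raised to the power k.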
Furthermore, the process improvements raised the AUC from 0.89 to 0.92, demonstrating enhanced discriminatory capability. This level of classification performance is suitable for production decision-making, provided it aligns with the specific operational requirements and customer demand characteristics.
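The AUC has a direct probabilistic reading: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (the normalized Mann-Whitney statistic), so an AUC of 0.92 means the model ranks such a pair correctly about 92% of the time. The sketch below computes the AUC that way; the scores and labels are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true classes
print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the sample size, which is fine for small production datasets; rank-based formulas give the same value more efficiently.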
The effectiveness of multiple machine learning algorithms was examined on a classification task. The evaluation used several key measures (accuracy, balanced accuracy, sensitivity, specificity, F1 score, and confidence interval), which are reported in Table 5. Logistic regression achieved an accuracy of 94.12% and a balanced accuracy of 92.92%, with high sensitivity and specificity.
After the modification, its accuracy rose to 97.06%, with a balanced accuracy of 94.44%, showing enhanced predictive power. The support vector machine (SVM) achieved a sensitivity of 93.00% for class “0” and excellent specificity for class “1”, for an accuracy of 94.11%. Linear discriminant analysis performed respectably, with an accuracy of 73.50% and a balanced accuracy of 90.00%. The Naïve Bayes algorithm performed solidly, with an accuracy of 91.17%, a balanced accuracy of 93.05%, and notably high sensitivity and specificity for class “1”. The multinomial classifier achieved a high accuracy of 97.06%, with perfect sensitivity for class “0” and strong specificity for class “1”. Fig. 9 compares how accurately each algorithm classifies the instances, supporting an informed choice of algorithm.
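All of the measures in Table 5 derive from a confusion matrix. The sketch below computes them from hypothetical counts (tp = 8, fn = 1, fp = 0, tn = 25, i.e. 34 test observations). These counts are only one combination consistent with the post-modification figures reported above; the actual counts, the test-set size, and which class is treated as positive are not given in this section and are assumptions here.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, balanced accuracy and F1 from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts consistent with 97.06% accuracy and 100% specificity
for name, value in classification_metrics(tp=8, fn=1, fp=0, tn=25).items():
    print(f"{name}: {value:.2%}")
```

With these counts the function returns 97.06% accuracy, 100% specificity, and 94.44% balanced accuracy. Balanced accuracy averages the two class-wise rates, so unlike plain accuracy it is not inflated when one class dominates the test set.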
Conclusion
The study concludes that the proposed integrated framework significantly enhances manufacturing performance by achieving strong predictive performance (\({R}^{2}\) = 0.94), reducing energy consumption by approximately 5%, and improving throughput by about 7%. The results further demonstrate that optimized logistic regression and SVM models, through effective parameter tuning, can attain up to 97% accuracy, confirming the robustness of the hybrid predictive approach. This research contributes a unified, data-driven methodology that bridges productivity, energy efficiency, and downtime, aligning with sustainable manufacturing objectives emphasized in contemporary literature. From a managerial perspective, the findings provide actionable insights for optimizing manpower allocation, machine utilization, and scheduling decisions through reliable predictive analytics. However, the limited dataset size and constrained process variables may affect the broader generalizability of the outcomes. Future research should extend the model to larger and multi-factory datasets, incorporate dynamic or reinforcement learning mechanisms for real-time adaptability, and embed sustainability and carbon-intensity metrics to strengthen the environmental dimension of manufacturing analytics.
Data availability
The data that support the findings of this study are available from the corresponding author, P. K. Sudhakar, upon reasonable request.
References
He, Q. P., Wang, J. & Shah, D. Feature space monitoring for process systems via statistical pattern analysis. Comput. Chem. Eng. 126, 321–331 (2019).
Sim, H. S. Big data analysis methodology for smart manufacturing. Int. J. Precis. Eng. Manuf. 20(6), 973–982 (2019).
Zhang, W., Zhang, R. & Goh, A. T. Multivariate adaptive regression splines approach to estimate lateral wall deflection profiles caused by braced excavations in clays. Geotech. Geol. Eng. 36, 1349–1363 (2018).
Zhang, L. et al. Two-stage optimization of lunar regolith drilling using MARS. Struct. Multidiscip. Optim. 68, 92 (2025).
Bischof, A. Y. & Geissler, A. A logistic regression analysis on non-medical factors favoring caesarean sections. BMC Pregnancy Childbirth 23(1), 759 (2023).
Mujtaba, A., Islam, F., Kaeding, P., Lindemann, T. & Prusty, B. G. ML-based process monitoring for composites. J. Intell. Manuf. 36(2), 1095–1110 (2025).
Dai, M., Tang, J., Qu, R. & Zheng, Y. Energy-efficient scheduling for flexible flow shops by hybrid particle swarm optimization. Robot. Comput. Integr. Manuf. 29, 418–429 (2013).
Zeinalnezhad, M., Mustapha, N., Sahran, S. & Mukhtar, M. Prediction of air pollution using semi-experiment regression model. J. Clean. Prod. 261, 121218 (2020).
Bayman, E. O. & Dexter, F. Multicollinearity in logistic regression: Detection and remedies. Anesth. Analg. 133(2), 362–365 (2021).
Zhu, C. & Wang, J. Fuzzy linear regression model for sales forecasting in manufacturing enterprises. J. Intell. Fuzzy Syst. 40(4), 8477–8484 (2021).
Johannesen, N. J., Cooke, M. P. & Fordham, R. J. Regression tools for energy demand forecasting. J. Clean. Prod. 218, 555–564 (2019).
Li, Q. et al. Cropping structure and environmental impacts. Environ. Impact Assess. Rev. 106, 107489 (2024).
Joo, C. M. & Kim, B. S. A hybrid genetic algorithm with dispatching rules for energy-efficient scheduling in a job shop. Comput. Ind. Eng. 85, 102–109 (2015).
Zhang, R. & Chiong, R. Solving the energy-efficient job shop scheduling problem: A multi-objective genetic algorithm approach. J. Clean. Prod. 112, 3361–3375 (2016).
Zheng, W., Yang, M., Huang, D. & Jin, M. Deep learning optimization in monoclonal antibody manufacturing. J. Adv. Comput. Syst. 4(12), 28–42 (2024).
Revathi, G., Baskaran, M. & Marimuthu, K. Multiple linear regression on hybrid nanofluid flows. Waves Random Complex Media 1–18 (2023).
Pavlenko, I., Medvedev, A. & Kornienko, T. Regression analysis for automated material selection in smart manufacturing. Mathematics 10(11), 1888 (2022).
Grzelak, M., Borucka, A. & Guzanek, P. Application of linear regression in production effectiveness modeling. In International Conference Innovation in Engineering 36–47 (Springer, 2021).
Kostyrin, E. & Rozanov, D. Modeling financial flow processes in discrete production systems. Emerg. Sci. J. 7(3), 897–916 (2023).
Chang, L. K. et al. Additive manufacturing property prediction with texture-based machine learning. Int. J. Adv. Manuf. Technol. 132(1), 83–98 (2024).
Zhang, W., Zhang, R., Wang, W., Zhang, F. & Goh, A. T. C. A multivariate adaptive regression splines model for determining horizontal wall deflection envelope for braced excavations in clays. Tunn. Undergr. Space Technol. 84, 461–471 (2019).
Fleszar, K. & Hindi, K. S. A hybrid constructive metaheuristic for the resource-constrained parallel machine scheduling problem. Eur. J. Oper. Res. 271(3), 839–848 (2018).
Bajaj, N. S. et al. Metaheuristic optimization-based SVM for tool condition monitoring. Intell. Syst. Appl. 18, 200196 (2023).
Chen, J. F. Scheduling with setup and due-date constraints. Int. J. Adv. Manuf. Technol. 44(11–12), 1204–1212 (2009).
Jackson, I., Ivanov, D., Dolgui, A. & Namdar, J. Generative artificial intelligence in supply chain and operations management: a capability-based framework for analysis and implementation. Int. J. Prod. Res. 62(17), 6120–6145. https://doi.org/10.1080/00207543.2024.2309309 (2024).
Gahm, C., Denz, F., Dirr, M. & Tuma, A. Energy-efficient scheduling in manufacturing companies: A review and research framework. Eur. J. Oper. Res. (2016).
Li, Y. et al. Privacy-preserving ridge regression with multikey encryption. IEEE Trans. Depend. Secure Comput. (2025).
Elahi, E. et al. Measuring farm-level carbon efficiency. Agric. Syst. 218, 103994 (2024).
Debnath, B. et al. Grey relational approach to sustainability in apparel manufacturing. Results Eng. 22, 102006 (2024).
Ying, K. C., Lin, S. W. & Wang, H. S. Makespan minimization in unrelated parallel machines using multi-start simulated annealing. J. Intell. Manuf. 23(5), 1795–1803 (2012).
Zahara, F. A. & Febriyanti, E. M. Company size and CSR on financial performance. J. Econ. Manag. Technol. 1(1), 17–29 (2025).
Zouhri, W., Homri, L. & Dantan, J. Y. Critical manufacturing parameters for accurate SVM modeling. Int. J. Interact. Des. Manuf. 16(1), 177–196 (2022).
Lokanan, M. E. & Maddhesia, V. Supply chain fraud prediction with machine learning and artificial intelligence. Int. J. Prod. Res. 63(1), 286–313 (2025).
Charizanos, G. et al. Monte Carlo fuzzy logistic regression in risk modeling. Inf. Sci. 655, 119893 (2024).
Sobuz, M. H. R. et al. Prediction of SCC performance using ML models. Int. J. Concr. Struct. Mater. 18(1), 67 (2024).
Xu, W. et al. Verifiable privacy-preserving Cox regression modeling. Peer-to-Peer Netw. Appl. 17(5), 3182–3199 (2024).
Pei, J., Yan, H., Wu, Y. & Du, S. Serial-batching machine scheduling with controllable setup times. Ann. Oper. Res. 249, 175–195 (2017).
Ghorbanzadeh, M., Davari, M. & Ranjbar, M. Energy-aware flow shop scheduling with uncertain renewable energy. Comput. Oper. Res. 170, 106741 (2024).
Frigerio, N., Matta, A. & Lin, Z. Pareto front analysis of buffer-based energy efficient control for machines in serial flow lines. In 2022 IEEE 21st Mediterranean Electrotechnical Conference (MELECON) 237–242 (IEEE, 2022).
Renna, P. Energy saving by switch-off policy in a pull-controlled production line. Sustain. Prod. Consump. 16, 25–32 (2018).
Xu, W., Ding, R., Liu, J. & Li, H. Practical privacy-preserving linear regression under multiple encryption keys. Inf. Sci. 596, 119–136 (2022).
Zhang, C. et al. Digital transformation and carbon emissions in manufacturing firms. Int. Rev. Econ. Finance 92, 211–227 (2024).
Lyu, Y., Zhang, H., Wang, T. & Chen, L. An enhanced logistic regression model for real-time failure prediction in smart manufacturing systems. J. Manuf. Syst. 78, 133–145. https://doi.org/10.1016/j.jmsy.2023.12.008 (2024).
Müller, J., Koelbl, B. S. & Reuter, M. A. Ad hoc supply chains during COVID-19: Evidence from Europe. J. Oper. Manag. 69(3), 426–449 (2023).
Udu, A. G. et al. Machine learning for porosity characterization in thermoplastics. J. Reinf. Plast. Compos. (2024).
Xu, Z., Xu, L., Ling, X. & Zhang, B. Data-driven hierarchical learning and real-time decision-making of equipment scheduling and location assignment in automatic high-density storage systems. Int. J. Prod. Res. 61(21), 7333–7352. https://doi.org/10.1080/00207543.2022.2148011 (2022).
Zhang, L., Chu, X., Chen, H. & Yan, B. A data-driven approach for the optimisation of product specifications. Int. J. Prod. Res. 57(3), 703–721. https://doi.org/10.1080/00207543.2018.1480843 (2018).
Alawee, W. H. et al. Grey Boosting for water production forecasting. Desalin. Water Treat. 318, 100344 (2024).
Wang, R., Yang, P., Gong, Y. & Chen, C. Operational policies and performance analysis for overhead robotic compact warehousing systems with bin reshuffling. Int. J. Prod. Res. 62(14), 5236–5251. https://doi.org/10.1080/00207543.2023.2289643 (2023).
Jang, S., Chung, Y. & Son, H. Exploring the benefits of smart manufacturing systems for SMEs. Manag. Decis. 60(6), 1719–1743 (2022).
Baryannis, G., Validi, S., Dani, S. & Antoniou, G. Supply chain risk management and artificial intelligence: State of the art and future research directions. Int. J. Prod. Res. 57(7), 2179–2202. https://doi.org/10.1080/00207543.2018.1530476 (2018).
Funding
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Author information
Contributions
Data Curation, Formal Analysis, Methodology, Software, Visualization, Validation, Review & Editing – P. K. Sudhakar. Methodology, Supervision, Validation, Review & Editing – R. Muthucumaraswamy.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sudhakar, P.K., Muthucumaraswamy, R. Optimizing energy, downtime, and throughput in footwear production through machine learning. Sci Rep 16, 546 (2026). https://doi.org/10.1038/s41598-025-30082-6
DOI: https://doi.org/10.1038/s41598-025-30082-6