Abstract
Hybrid models, often referred to as gray-box models, offer a promising approach by combining the flexibility of data-driven techniques with the accuracy and physical interpretability of first-principles models. This study evaluates a range of mathematical modeling techniques in the context of chemical reaction engineering, with a focus on the production of dimethyl ether (DME) from bio-methanol in a fixed-bed reactor. A comprehensive case study was conducted, beginning with the development of a first-principles model that solves the system of governing equations and was used to generate 7,000 synthetic data points with added noise. Three black-box machine learning algorithms, K-Nearest Neighbors (KNN), Gradient Boosting Regressor (GBR), and Extreme Gradient Boosting (XGB), were employed for predictive modeling. In parallel, hybrid modeling approaches were developed to estimate reaction rates and correct reactor outputs. Model performance was assessed with metrics such as mean squared error (MSE) and the coefficient of determination (R²), computed over key variables including the inlet molar flow rate, initial temperature, pressure, and the outlet concentrations of methanol, DME, and water, as well as overall conversion. Results indicated that the data-driven models performed exceptionally well, with hybrid models offering comparable accuracy while maintaining interpretability. Finally, process optimization was performed using the XGB model integrated with a Differential Evolution algorithm. The optimized operating conditions achieved a high dimethyl ether conversion rate of 84.3%, with a minimal temperature rise of 84.9 K.
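The workflow summarized above (noisy synthetic data, a black-box surrogate scored by MSE and R², then Differential Evolution over the surrogate) can be illustrated with a minimal, standard-library sketch. Everything specific in it is an assumption made for illustration: `toy_conversion` is a hypothetical smooth response standing in for the first-principles reactor model, the two inputs are normalized to [0, 1], a plain KNN regressor replaces the XGB surrogate used in the study, and the DE settings (`pop_size`, `generations`, `f`, `cr`) are arbitrary.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def toy_conversion(x):
    """Hypothetical response (NOT the paper's model), peaking at (0.6, 0.4)."""
    t, p = x
    return 0.8 - (t - 0.6) ** 2 - 0.5 * (p - 0.4) ** 2


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k nearest training points (Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def differential_evolution(objective, bounds, rng,
                           pop_size=20, generations=60, f=0.7, cr=0.9):
    """Minimise `objective` with a basic DE/rand/1/bin scheme."""
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, none of them the current individual.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    gene = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    gene = min(max(gene, lo), hi)  # clip to the bounds
                else:
                    gene = pop[i][d]
                trial.append(gene)
            score = objective(trial)
            if score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


rng = random.Random(0)

# Synthetic data with added Gaussian noise (sizes are illustrative).
X = [[rng.random(), rng.random()] for _ in range(400)]
y = [toy_conversion(x) + rng.gauss(0.0, 0.01) for x in X]
train_x, train_y, test_x, test_y = X[:300], y[:300], X[300:], y[300:]

# Score the surrogate on held-out points.
pred = [knn_predict(train_x, train_y, x) for x in test_x]
test_mse, test_r2 = mse(test_y, pred), r2(test_y, pred)

# Maximise predicted conversion by minimising its negative.
best_x, best_score = differential_evolution(
    lambda x: -knn_predict(train_x, train_y, x), [(0.0, 1.0), (0.0, 1.0)], rng
)
print(f"MSE={test_mse:.5f}  R2={test_r2:.3f}  best inputs={best_x}")
```

Because DE only needs objective values, it works directly on a tree-based or nearest-neighbor surrogate whose predictions are piecewise constant and therefore offer no useful gradients. In practice a library routine such as `scipy.optimize.differential_evolution` would likely be preferable to this hand-rolled loop.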
Data availability
All data generated or analysed during this study are included in supplementary information files.
Funding
No funding was received for this study.
Author information
Contributions
M.M.: Writing-Original Draft, Visualization, Software, Modeling, Data curation, Methodology. M.R.: Methodology, Validation, Supervision, Writing-Review & Editing, Conceptualization. S.A.: Writing-Review & Editing, Methodology, Conceptualization, Investigation, Visualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mokari, M., Rahmani, M. & Atashrouz, S. Interpretable machine learning for optimized dimethyl ether production from bio-methanol. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38090-w