Abstract
Chemical transfer from packaging materials, and the dissolution of these chemicals in food and water, can create health risks. Because experimental measurements are costly and time-intensive, artificial intelligence (AI) methods offer a useful alternative. This research uses five well-known AI-based techniques (long short-term memory, gradient boosting regressor, multi-layer perceptron, Random Forest, and convolutional neural networks) to predict chemical migration from packaging materials into food/water, considering variables such as temperature, chemical characteristics, and packaging/food types. A relevance analysis was employed to examine how these explanatory variables affect chemical migration from packaging materials into foods and water. Hyperparameter optimization, evaluation of prediction accuracy, and comparison of these AI models show that the gradient boosting regressor (GBR) is the best method for this simulation. The proposed GBR model accurately predicts 1847 experimental data points, with a mean squared error, mean absolute error, root mean squared error, relative absolute error percent, and correlation coefficient of 0.06, 0.15, 0.24, 6.46%, and 0.9961, respectively. Additionally, a leverage algorithm for outlier detection further supports the reliability of this modeling study.
Introduction
Food packaging materials1, storage containers2, and processing facilities3 come into direct contact with food/water and can influence its safety. More than 6500 chemicals are currently used in the production of food contact materials (FCMs)4. Food packaging aims to extend the shelf-life of food by protecting it against external contamination5. In addition, packaging materials can affect the filling and transportation of food. Zhang et al. automated a gas distribution system for the atmosphere packaging of fresh food6, combining modified atmosphere preservation technology, a fuzzy-based controller, and network communication technology.
The packaging material is a key factor influencing chemical migration into food7. This migration must be carefully monitored to evaluate the risks associated with chemicals in contact with food. Polymeric materials, including low-density polyethylene (LDPE), polyethylene terephthalate (PET), high-density polyethylene (HDPE), polypropylene, polystyrene, and bioplastics8, are widely used in food packaging9.
Chemical migration from FCMs into food is governed mainly by diffusion and is influenced by various factors, including temperature, time, and the composition of both the food and the FCM10,11. Under refrigeration, chemical migration from packaging into food is both slower and smaller than at higher temperatures12. The packaging's surface-area-to-volume ratio also influences the amount of migrated chemicals: packages with a higher ratio experience more chemical migration into the food13, which is especially relevant for foods intended for children (e.g., small packages)14. Zhang et al. proposed an innovative methodology for the oyster packaging process15, combining deep learning, machine vision, and mechanical arm control to automate it.
Food contact chemical (FCC) migration is often evaluated using the packaging-food partition coefficient (Kpf), a key parameter for estimating chemical transfer from packaging materials into food10,11. It is calculated by dividing the chemical concentration in the food contact material by that in the food11. The impact of influential variables on Kpf (e.g., temperature, material type, and food characteristics) is usually investigated experimentally. The limited migration information for major FCCs is likely related to the cost and time required for such experiments16. These complexities have motivated the use of intelligent10,11 or mathematical17 models for Kpf estimation, which can predict chemical migration from packaging materials into foods from several explanatory variables.
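To make the definition concrete, the following Python sketch computes logKpf from a pair of concentrations. The function name log_kpf and the example concentrations are hypothetical, and both concentrations are assumed to be expressed in the same units (and measured at equilibrium, as is usual for partition coefficients) so that Kpf is dimensionless. A positive logKpf means the chemical concentration is higher in the packaging than in the food; a negative value means the reverse.

```python
import math

def log_kpf(c_packaging: float, c_food: float) -> float:
    """log10 of Kpf = C_packaging / C_food (same units assumed, so Kpf is dimensionless)."""
    if c_packaging <= 0 or c_food <= 0:
        raise ValueError("concentrations must be positive to take a logarithm")
    return math.log10(c_packaging / c_food)

# Hypothetical concentrations, not taken from the dataset:
print(log_kpf(500.0, 5.0))   # Kpf = 100, so logKpf = 2.0
print(log_kpf(1.0, 10.0))    # Kpf = 0.1: the chemical favours the food phase
```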
Only a few correlations exist for determining Kpf; they are based on a chemical's octanol-water partition coefficient (Kow) and on the ethanol equivalency (EtOH-eq), which describes the food's polarity10,11,18. The solid material-water partition coefficient (Kmw) is also used as an input variable when the food is water11.
Understanding FCC migration requires model predictions that cover a wide range of scenarios. Some computational studies have developed regression models to evaluate chemical migration using the Food and Drug Administration (FDA) migration database19. Because this database is no longer publicly accessible, advancing migration models requires new data acquisition.
Artificial intelligence (AI) methods can effectively simulate many phenomena in diverse contexts; consequently, simulating the partition coefficient with machine learning (ML) techniques has gained significant attention. A novel ML-based weight-of-evidence (WoE) model for assessing the carcinogenicity of FCCs merged different computational approaches and improved the accuracy of identifying potentially carcinogenic chemicals20. This WoE model achieved an 8% improvement in the area under the receiver operating characteristic curve and a 19.7% increase in chemical coverage compared with the best single method, identifying 44 high-risk carcinogenic chemicals; it therefore helps prioritize chemicals for further analysis. Another study introduced a genetic algorithm-based WoE model21 that combined structural alerts, quantitative structure-activity relationships, and in silico toxicogenomic models. It prioritized 623 food packaging chemicals and identified 26 as highly toxic; 13 of these were linked to developmental or reproductive issues, and the remaining 13 require further investigation. Recently, Huang and Jolliet compiled an extensive dataset of over 1800 data points and developed a regression model to predict food contact chemical migration11.
Our main objective is to develop a reliable ML-based correlation to estimate logKpf for various food-packaging combinations. A relevancy analysis is used to examine the impact of packaging materials, food characteristics, and temperature on the partition coefficient. Then, five different ML models are designed to estimate the target variable, and their performances are compared to identify the most accurate approach.
Problem description
Tables 1 and 2 summarize a dataset of 1847 experimental logKpf values for different migration scenarios, obtained from a prior study11. The dataset includes the packaging material type, temperature, logKow at 298 K, and EtOH-eq. It covers 232 distinct compounds and 19 types of packaging material. Ethanol equivalency ranges from 0 to 100% and temperature from 275 to 313 K, while logKpf ranges from −3.509 to 7.724. Among the nineteen packaging materials, LDPE, HDPE, and silicone rubber have the most experimental data, with 804, 307, and 285 data points, respectively.
Each type of food was assigned a specific ethanol equivalency value that indicates its polarity22. Ethanol equivalency is the volume/volume percentage of ethanol in an ethanol-water mixture, so it ranges from 0 (water) to 100 (pure ethanol). The maximum, average, minimum, and standard deviation (SD) values of the numerical features are also reported. The SD of each variable (V) can be calculated using Eq. (1)23.
Here, NDP is the number of data points and Vave is the average value of the variable.
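Eq. (1) is not reproduced in this section. Assuming it is the usual sample standard deviation, SD = sqrt(Σ(V − Vave)² / (NDP − 1)), it can be computed as in the Python sketch below; if Eq. (1) instead divides by NDP, replace NDP − 1 with NDP. The function name and the example values are illustrative only.

```python
import math

def standard_deviation(values: list[float]) -> float:
    """Sample SD, assuming Eq. (1) uses the NDP - 1 divisor."""
    ndp = len(values)                       # NDP: number of data points
    if ndp < 2:
        raise ValueError("need at least two data points")
    v_ave = sum(values) / ndp               # Vave: average value of the variable
    squared_dev = sum((v - v_ave) ** 2 for v in values)
    return math.sqrt(squared_dev / (ndp - 1))

# Hypothetical temperatures in K, not taken from Table 2:
print(standard_deviation([275.0, 296.0, 313.0]))
```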
Results and discussion
This section covers feature importance analysis, hyperparameter tuning of the AI-based models, model selection by statistical inspection, outlier identification by the leverage method, and trend monitoring.
Feature importance analysis
Pearson's correlation coefficient is a well-established statistical measure of the direction and relative strength of the linear relationship between two variables24. Its value always lies between −1 and +1. A negative value indicates an inverse relationship, a positive value indicates a direct relationship, and a larger absolute value indicates a stronger relationship. Figure 1 presents Pearson's correlation coefficients between all input/output variables in the current study. The last column and last row of this figure show the direction and relative importance of the relationship between logKpf and the explanatory features (EtOH-eq, material type, logKow, and temperature). In summary, logKpf has its strongest direct dependence on logKow and its strongest inverse dependence on EtOH-eq.
The heatmap of the Pearson’s correlation coefficients between all variables in the present study.
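Pearson's r can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² Σ(y − ȳ)²). The sketch below (Python with NumPy, assumed available) does this and also builds a small correlation matrix of the kind shown in the Fig. 1 heatmap; all values are toy numbers, not the study's data. Note that r captures only linear association, and for a categorical input such as material type its value depends on how the categories are numerically encoded.

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson correlation coefficient from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Toy values only (hypothetical, not from the dataset)
log_kow = [1.0, 2.0, 3.0, 4.0, 5.0]
etoh_eq = [90.0, 70.0, 50.0, 30.0, 10.0]
log_kpf = [0.5, 1.4, 2.6, 3.4, 4.6]

print(pearson_r(log_kow, log_kpf))
# Full matrix (rows = variables), as visualized in a correlation heatmap
corr = np.corrcoef(np.vstack([log_kow, etoh_eq, log_kpf]))
print(np.round(corr, 3))
```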
Model design
Each AI-based method has structural settings, known as hyperparameters, that must be determined appropriately. The first and second columns of Table 3 list the hyperparameters of the MLP, CNN, LSTM, RF, and GBR models and their investigated ranges. A grid search is used to find the best hyperparameters for each AI-based tool. This search builds a parameter grid from the specified ranges/values of the hyperparameters; every combination is then used to build a model, which is trained and validated on the cross-validation subset. The last column of Table 3 reports the optimum value of each hyperparameter.
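A minimal sketch of grid search with cross-validation for the GBR, assuming scikit-learn's GridSearchCV and GradientBoostingRegressor. The data are synthetic, and the grid values are hypothetical placeholders rather than the ranges in Table 3. The held-out fraction of 20% mirrors the proportion of the study's 1478 CV and 369 testing samples.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the four inputs; NOT the study's dataset.
X = np.column_stack([
    rng.uniform(0, 100, n),       # EtOH-eq (%)
    rng.integers(0, 19, n),       # material type, integer-coded (19 classes)
    rng.uniform(-1, 8, n),        # logKow (made-up range)
    rng.uniform(275, 313, n),     # temperature (K)
])
# Made-up target: rises with logKow, falls with EtOH-eq, plus noise
y = 0.8 * X[:, 2] - 0.03 * X[:, 0] + rng.normal(0, 0.2, n)

# Hold out 20% as unseen testing data; the rest is the CV subset
X_cv, X_test, y_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical grid; Table 3's actual ranges are not reproduced here
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,                                 # 5-fold cross-validation on the CV subset
    scoring="neg_mean_squared_error",     # scikit-learn maximizes, so MSE is negated
)
search.fit(X_cv, y_cv)                    # tries all 2 x 2 x 2 = 8 combinations
# The best combination is refit on the whole CV subset (refit=True by default)
test_mse = mean_squared_error(y_test, search.predict(X_test))
print(search.best_params_, -search.best_score_, test_mse)
```

Exhaustive grid search grows multiplicatively with each added hyperparameter, which is why the grid is kept small here.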
Model selection
Studies that apply machine learning to classification25 or regression10 tasks evaluate model accuracy using various statistical indices. This study likewise measures the deviation between actual (act) and predicted (pred) values of chemical migration from packaging materials into food/water using the mean absolute error (MAE), relative absolute error percent (RAE%), mean squared error (MSE), root MSE (RMSE), and correlation coefficient (R). The MAE, RAE%, MSE, RMSE, and R are computed from Eqs. (2)–(6), respectively26.
The average value of the experimental data can be obtained from Eq. (7).
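Eqs. (2)–(7) are not reproduced in this section. The Python sketch below (NumPy assumed) implements common definitions of these indices. Two choices are assumptions that should be checked against the original equations: RAE% is taken as the total absolute error relative to that of a predictor that always returns the mean of the experimental data (Eq. (7)), and R is taken as Pearson's correlation between actual and predicted values.

```python
import numpy as np

def regression_metrics(act, pred) -> dict:
    """MAE, RAE%, MSE, RMSE and R under commonly used (assumed) definitions."""
    act = np.asarray(act, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = act - pred
    act_ave = act.mean()                                   # mean of experimental data
    mae = np.mean(np.abs(err))
    # Assumed RAE%: total |error| relative to a mean-only predictor
    rae = 100.0 * np.sum(np.abs(err)) / np.sum(np.abs(act - act_ave))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r = np.corrcoef(act, pred)[0, 1]                       # assumed: Pearson's r
    return {"MAE": float(mae), "RAE%": float(rae), "MSE": float(mse),
            "RMSE": float(rmse), "R": float(r)}

# Toy example (hypothetical values)
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```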
All these performance metrics measure the agreement between actual and predicted values of the target variable. From a modeling perspective, a model with smaller MAE, RAE, MSE, and RMSE values and a higher R value is preferred. Table 4 presents the performance of the designed AI-based methods in the cross-validation (CV) and testing stages in terms of the MAE, RAE%, MSE, RMSE, and R indices, together with their overall performance.
Because five statistical indices are reported for both the CV and testing stages, the best method is not easy to identify by visual inspection. Figure 2 therefore shows the ranking of the developed AI-based methods separately for the CV and testing stages. The GBR ranks first in both stages and is thus the most accurate model for computing the amount of chemicals migrating from packaging materials into food/water.
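The section does not state how the rankings in Fig. 2 were derived. One simple scheme, assumed here for illustration, ranks the models on each index (lower is better for MAE, RAE%, MSE, and RMSE; higher is better for R) and orders them by mean rank. The model names and scores are invented placeholders, not the values in Table 4, and ties are not handled.

```python
# Invented placeholder scores; NOT the values reported in Table 4.
scores = {
    "model_A": {"MAE": 0.10, "RAE%": 5.0, "MSE": 0.03, "RMSE": 0.17, "R": 0.990},
    "model_B": {"MAE": 0.20, "RAE%": 8.0, "MSE": 0.09, "RMSE": 0.30, "R": 0.995},
    "model_C": {"MAE": 0.35, "RAE%": 14.0, "MSE": 0.25, "RMSE": 0.50, "R": 0.980},
}
HIGHER_IS_BETTER = {"R"}

def mean_rank(scores: dict) -> list:
    """Rank models on every index, then sort them by their mean rank (1 = best)."""
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {m: 0 for m in models}
    for metric in metrics:
        ordered = sorted(models, key=lambda m: scores[m][metric],
                         reverse=metric in HIGHER_IS_BETTER)
        for place, model in enumerate(ordered, start=1):
            totals[model] += place
    return sorted((totals[m] / len(metrics), m) for m in models)

print(mean_rank(scores))
```

Here model_B wins on R but loses on every error index, so model_A still ranks first on average; a weighted scheme could order them differently.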
The GBR method predicts the 1478 CV samples with MAE = 0.13, RAE = 5.58%, MSE = 0.04, RMSE = 0.19, and R = 0.9977. It also provides MAE = 0.24, RAE = 10.13%, MSE = 0.16, RMSE = 0.39, and R = 0.9895 for the 369 unseen testing samples.
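As a rough consistency check, the overall figures in the abstract can be recovered from these stage-wise values by weighting each stage by its number of samples: MAE and MSE pool linearly, and RMSE pools through MSE. Because the inputs are already rounded, the agreement is only approximate; RAE% and R cannot be pooled this way because they depend on the spread of each subset.

```python
import math

n_cv, n_test = 1478, 369
mae = {"cv": 0.13, "test": 0.24}
mse = {"cv": 0.04, "test": 0.16}
rmse = {"cv": 0.19, "test": 0.39}

n_all = n_cv + n_test
overall_mae = (n_cv * mae["cv"] + n_test * mae["test"]) / n_all
overall_mse = (n_cv * mse["cv"] + n_test * mse["test"]) / n_all
# RMSE is pooled via squared RMSE (i.e. MSE), then square-rooted
overall_rmse = math.sqrt((n_cv * rmse["cv"] ** 2 + n_test * rmse["test"] ** 2) / n_all)
print(n_all, round(overall_mae, 3), round(overall_mse, 2), round(overall_rmse, 3))
# → 1847 0.152 0.06 0.243
```

These match the overall MAE of 0.15, MSE of 0.06, and RMSE of 0.24 reported in the abstract.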
In contrast, the RF model, ranked fifth in the CV stage and fourth in the testing stage, is the weakest model for predicting the target variable. Its prediction performance is still acceptable from a modeling point of view, but among the five designed methods it provided the least reliable predictions.
Comparing the performance ranking order of the utilized AI-based models.
The relative error (RE) between actual and predicted values of a target variable can be computed using Eq. (8).
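Eq. (8) is not reproduced here. A common definition, assumed in the Python sketch below, is RE = 100 × (act − pred) / act. Because logKpf spans negative and positive values (−3.509 to 7.724) and can be close to zero, RE computed on logKpf can become very large for near-zero actual values, and its sign depends on the sign of the actual value, so wide RE ranges should be read with this in mind.

```python
def relative_error_percent(act: float, pred: float) -> float:
    """Assumed form of Eq. (8): RE = 100 * (act - pred) / act."""
    if act == 0:
        raise ZeroDivisionError("RE is undefined when the actual value is zero")
    return 100.0 * (act - pred) / act

# Hypothetical logKpf values
print(relative_error_percent(2.0, 1.8))    # underprediction of a positive value: RE > 0
print(relative_error_percent(-1.0, -1.5))  # same-direction error on a negative value: RE < 0
```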
Figure 3 presents the relative errors between the experimental data and the predictions of the GBR, CNN, LSTM, RF, and MLP models. It confirms that the GBR model predicts the actual data with the smallest relative errors, ranging from −27 to 11, whereas the RF model produces the largest, ranging from −84 to 152.
Comparing the models’ prediction accuracy by the relative error.
Assessment of the GBR performance
Figure 4a shows the cross-plot of GBR predictions against experimental measurements of chemical migration from packaging materials into food/water. The clustering of both CV and testing samples around the diagonal line indicates that the constructed GBR model simulates this migration precisely. The correlation coefficients of 0.9977 (CV stage) and 0.9895 (testing stage) further confirm the excellent performance of the proposed GBR model.
Figure 4b shows the histogram of residual errors (actual − predicted) in the CV and testing stages. Approximately 800 CV and 100 testing records were estimated with an error close to zero, and the residual errors in both stages range from −1 to +1. The mean and standard deviation of the overall error are −0.0050569 and 0.24314, respectively.
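These residual statistics are consistent with the overall RMSE of 0.24 reported in the abstract, because for any set of residuals the mean squared residual equals the squared mean plus the population variance. A quick Python check, assuming the reported SD is close to the population SD (at NDP = 1847, the N versus N − 1 divisor changes it negligibly):

```python
import math

mean_err = -0.0050569   # reported mean of the overall residuals
sd_err = 0.24314        # reported SD of the overall residuals

# mean(e^2) = mean(e)^2 + var(e)  =>  RMSE = sqrt(mean^2 + SD^2)
rmse = math.sqrt(mean_err ** 2 + sd_err ** 2)
print(round(rmse, 4))   # → 0.2432
```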
Figure 4c depicts the kernel density estimation (KDE) versus magnitude profiles for the experimental chemical migration data and the GBR predictions. The actual and simulated KDE-magnitude profiles agree closely.
Checking the compatibility between actual and predicted values of the target variable by (a) cross-plot, (b) histogram of residual errors, and (c) KDE-magnitude profile.
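A KDE profile like that in Fig. 4c can be obtained by placing a kernel on every sample and summing. The NumPy sketch below assumes a Gaussian kernel with Scott's-rule bandwidth (the study's kernel and bandwidth are not stated) and uses synthetic data in place of the experimental and predicted logKpf values.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Gaussian KDE with Scott's-rule bandwidth (an assumed choice)."""
    x = np.asarray(samples, dtype=float)
    g = np.asarray(grid, dtype=float)
    n = x.size
    h = x.std(ddof=1) * n ** (-1 / 5)           # Scott's rule for 1-D data
    z = (g[:, None] - x[None, :]) / h           # shape (len(grid), n)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
actual = rng.normal(2.0, 1.5, 300)                  # synthetic stand-in for logKpf
predicted = actual + rng.normal(0.0, 0.24, 300)     # synthetic model predictions
grid = np.linspace(-6, 10, 801)
dens_act = gaussian_kde_1d(actual, grid)
dens_pred = gaussian_kde_1d(predicted, grid)

dx = grid[1] - grid[0]
area = float(dens_act.sum() * dx)                   # should be close to 1
l1_gap = float(np.abs(dens_act - dens_pred).sum() * dx)  # 0 means identical profiles
print(area, l1_gap)
```

The L1 gap between the two density curves gives a single number for how closely the profiles overlap, complementing the visual comparison.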
Validity domain inspection
The leverage algorithm provides a practical basis for separating valid records from outliers and out-of-leverage samples27. It requires the standardized residual (StR), the Hat index (HI), and the critical Hat index (CHI) to separate normal from abnormal measurements; Eqs. (9)–(11) give the formulas for these parameters.
Here, N is the number of input variables (i.e., 4) and M is the matrix of their numerical values.
Samples inside the region bounded by StR = ±3 and HI < CHI are valid. Samples located elsewhere on the StR-HI plot are either outliers (|StR| > 3) or out-of-leverage (HI > CHI).
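Eqs. (9)–(11) are not reproduced in this section. The Python sketch below (NumPy assumed) uses the standard Williams-plot forms: HI is the diagonal of the hat matrix M(MᵀM)⁻¹Mᵀ, CHI = 3(N + 1)/NDP, and StR is the residual standardized by its mean and standard deviation. With N = 4 and NDP = 1847, this CHI is about 0.0081. Whether M includes an intercept column or uses scaled inputs is not stated and is an assumption here; the data are synthetic.

```python
import numpy as np

def leverage_analysis(M, residuals, str_limit=3.0):
    """Classify samples for a Williams plot (assumed standard formulas)."""
    M = np.asarray(M, dtype=float)            # NDP x N matrix of input values
    e = np.asarray(residuals, dtype=float)    # act - pred for each sample
    ndp, n_inputs = M.shape
    xtx_inv = np.linalg.pinv(M.T @ M)
    # Diagonal of M (M^T M)^-1 M^T, without forming the NDP x NDP hat matrix
    hi = np.einsum("ij,jk,ik->i", M, xtx_inv, M)
    chi = 3.0 * (n_inputs + 1) / ndp          # critical Hat index
    st_r = (e - e.mean()) / e.std(ddof=1)     # standardized residual
    outlier = np.abs(st_r) > str_limit
    out_of_leverage = (hi > chi) & ~outlier   # outliers take precedence here
    valid = ~outlier & ~out_of_leverage
    return hi, chi, st_r, valid, outlier, out_of_leverage

rng = np.random.default_rng(2)
M = rng.normal(size=(200, 4))                 # synthetic inputs, N = 4
res = rng.normal(0.0, 0.25, 200)
res[0] = 5.0                                  # planted large residual
hi, chi, st_r, valid, outlier, out_lev = leverage_analysis(M, res)
print(chi, bool(outlier[0]), int(valid.sum()))
print(3 * (4 + 1) / 1847)                     # CHI for the study's N and NDP
```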
As Fig. 5 illustrates, the leverage algorithm detected 56 problematic samples (outliers plus out-of-leverage samples) among the 1847 experimental measurements of chemical migration from packaging materials into food and water. This small fraction (~3%) of problematic data is too small to affect the GBR performance negatively, so these abnormal data were not removed from the analyses.
Identifying outlier, valid, and out of leverage samples.
Parametric analysis
Figure 6 presents the influence of logKow on logKpf at four polarity levels (EtOH-eq = 40.6, 55.9, 79.2, and 100) from both experimental and simulation perspectives. The figure confirms that the proposed GBR model captures the effect of EtOH-eq and logKow on logKpf observed in the actual data. Both the experimental and simulation results show that logKpf increases as EtOH-eq decreases or as logKow increases.
Effect of ethanol equivalency on chemical dissolution in food (HDPE, 298 K).
Figure 7 plots the experimental and simulated logKpf versus logKow profiles at three temperatures (283, 296, and 313 K). The actual and predicted values of chemical migration from the packaging materials into food/water agree acceptably. The figure also shows that logKpf depends on temperature in a complex way: as temperature rises, chemical migration from the packaging materials into the food first decreases and then increases.
Effect of temperature on chemical dissolution in foods (average HDPE/LDPE, EtOH-eq = 100).
Limitations of the current study
The AI-based models proposed in this study apply only to predicting chemical migration from the packaging materials covered by the dataset (see Table 1) into food/water, within the investigated ranges of temperature and logKow (see Table 2).
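A guard that enforces this limitation before calling a trained model could look like the following Python sketch. The function name domain_violations is hypothetical; the temperature (275 to 313 K) and EtOH-eq (0 to 100%) bounds come from the dataset description, whereas the full material list and the logKow bounds must be taken from Tables 1 and 2 and are passed in by the caller. The material set and logKow bounds in the usage lines are placeholders.

```python
def domain_violations(material, temperature_k, etoh_eq, log_kow,
                      known_materials, log_kow_range):
    """Return reasons why an input lies outside the studied domain (empty list if inside)."""
    reasons = []
    if material not in known_materials:
        reasons.append(f"packaging material {material!r} not covered by the dataset")
    if not 275.0 <= temperature_k <= 313.0:          # studied range, K
        reasons.append("temperature outside 275-313 K")
    if not 0.0 <= etoh_eq <= 100.0:                  # ethanol equivalency, %
        reasons.append("EtOH-eq outside 0-100 %")
    low, high = log_kow_range                        # to be supplied from Table 2
    if not low <= log_kow <= high:
        reasons.append("logKow outside the studied range")
    return reasons

# Placeholder material subset and logKow bounds (hypothetical, for illustration only)
materials = {"LDPE", "HDPE", "silicone rubber"}
print(domain_violations("LDPE", 298.0, 50.0, 3.0, materials, (-1.0, 9.0)))
print(domain_violations("PLA nanocomposite", 330.0, 50.0, 3.0, materials, (-1.0, 9.0)))
```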
Conclusions
The objective of this study was to simulate chemical migration from packaging materials into foods and to analyze how this process is influenced by temperature and by the properties of foods and packaging materials. Multi-layer perceptron neural networks, convolutional neural networks, long short-term memory, Random Forest, and gradient boosting regressor (GBR) models were applied to uncover the relationship between chemical migration and its explanatory variables. The key numerical and qualitative findings of our study are:
- logKow and ethanol equivalency (EtOH-eq) are the most important features for simulating the considered problem.
- The GBR showed superior predictive performance compared with the multi-layer perceptron, convolutional neural network, long short-term memory, and Random Forest models.
- The GBR showed remarkable accuracy in estimating the 1478 cross-validation samples, with a mean absolute error (MAE) of 0.13, relative absolute error (RAE) of 5.58%, mean squared error (MSE) of 0.04, root mean squared error (RMSE) of 0.19, and correlation coefficient (R) of 0.9977.
- The GBR performance remained robust when evaluated on 369 unseen testing samples, i.e., MAE = 0.24, RAE = 10.13%, MSE = 0.16, RMSE = 0.39, and R = 0.9895.
- Only 56 of the 1847 experimental measurements reported in the literature may be outliers or out-of-leverage samples.
- Chemical migration decreases notably as the food's polarity increases, i.e., as EtOH-eq decreases and logKpf rises.
- Chemical migration into food depends on temperature in a complex way, initially decreasing with increasing temperature and then increasing.
- It was also observed that logKpf increases continuously with increasing logKow.
Future work in this field may use more complex modeling techniques and hybrid AI approaches to estimate chemical migration from emerging packaging materials, such as bioplastics or nanocomposites, into foods, water, and even drugs.
Data availability
All the collected data from the literature is available in the Supplementary Materials.
References
Oldring, P. K. T. et al. Development of a new modelling tool (FACET) to assess exposure to chemical migrants from food packaging. Food Addit. Contam. Part. A 31, 444–465 (2014).
Manoli, E. & Voutsa, D. Food containers and packaging materials as possible source of hazardous chemicals to food. Hazard. Chem. Assoc. Plast. Mar. Environ. 19–50 (2019).
Pakdel, M., Olsen, A. & Bar, E. M. S. A review of food contaminants and their pathways within food processing facilities using open food processing equipment. J. Food Prot. 86, 100184 (2023).
Ncube, L. K., Ude, A. U., Ogunmuyiwa, E. N., Zulkifli, R. & Beas, I. N. Environmental impact of food packaging materials: A review of contemporary development from conventional plastics to polylactic acid based materials. Materials 13, 4994 (2020).
Han, J., Ruiz-Garcia, L., Qian, J. & Yang, X. Food packaging: A comprehensive review and future trends. Compr. Rev. Food Sci. Food Saf. 17, 860–877 (2018).
Zhang, H. et al. Fuzzy-PID-based atmosphere packaging gas distribution system for fresh food. Appl. Sci. 13, 2674 (2023).
Arvanitoyannis, I. S. & Kotsanopoulos, K. V. Migration phenomenon in food packaging. Food–package interactions, mechanisms, types of migrants, testing and relative legislation—a review. Food Bioproc. Tech. 7, 21–36 (2014).
Jabeen, N., Majid, I. & Nayik, G. A. Bioplastics and food packaging: A review. Cogent Food Agric. 1, 1117749 (2015).
Zhang, M. et al. Recent advances in polymers and polymer composites for food packaging. Mater. Today 53, 134–161 (2022).
Wang, S. S., Lin, P., Wang, C. C., Lin, Y. C. & Tung, C. W. Machine learning for predicting chemical migration from food packaging materials to foods. Food Chem. Toxicol. 178, 113942 (2023).
Huang, L. & Jolliet, O. A combined quantitative property-property relationship (QPPR) for estimating packaging-food and solid material-water partition coefficients of organic compounds. Sci. Total Environ. 658, 493–500 (2019).
Cozzini, P., Cavaliere, F., Spaggiari, G., Morelli, G. & Riani, M. Computational methods on food contact chemicals: big data and in silico screening on nuclear receptors family. Chemosphere 292, 133422 (2022).
Groh, K. J., Geueke, B., Martin, O., Maffini, M. & Muncke, J. Overview of intentionally used food contact chemicals and their hazards. Environ. Int. 150, 106225 (2021).
Muncke, J. et al. Impacts of food contact chemicals on human health: a consensus statement. Environ. Health 19, 1–12 (2020).
Zhang, R., Chen, X., Wan, Z., Wang, M. & Xiao, X. Deep learning-based oyster packaging system. Appl. Sci. 13, 13105 (2023).
Geueke, B. et al. Systematic evidence on migrating and extractable food contact chemicals: most chemicals detected in food contact materials are not listed for use. Crit. Rev. Food Sci. Nutr. 63, 9425–9435 (2023).
Tehrany, E. A., Mouawad, C. & Desobry, S. Determination of partition coefficient of migrants in food simulants by the PRV method. Food Chem. 105, 1571–1577 (2007).
Seiler, A. et al. Correlation of foodstuffs with ethanol–water mixtures with regard to the solubility of migrants from food contact materials. Food Addit. Contam. Part. A 31, 498–511 (2014).
Turley, A. E. et al. Incorporating new approach methodologies in toxicity testing and exposure assessment for tiered risk assessment using the RISK21 approach: case studies on food contact chemicals. Food Chem. Toxicol. 134, 110819 (2019).
Wang, C. C., Liang, Y. C., Wang, S. S., Lin, P. & Tung, C. W. A machine learning-driven approach for prioritizing food contact chemicals of carcinogenic concern based on complementary in silico methods. Food Chem. Toxicol. 160, 112802 (2022).
Tung, C. W., Cheng, H. J., Wang, C. C., Wang, S. S. & Lin, P. Leveraging complementary computational models for prioritizing chemicals of developmental and reproductive toxicity concern: an example of food contact materials. Arch. Toxicol. 94, 485–494 (2020).
Ozaki, A., Gruner, A., Störmer, A., Brandsch, R. & Franz, R. Correlation between partition coefficients polymer/food simulant, KP,F, and octanol/water, log POW: a new approach in support of migration modeling and compliance testing. Dtsch. Lebensm. Rundsch. 106, 203–208 (2010).
El-Kenawy, E. S. M. et al. Greylag goose optimization: nature-inspired optimization algorithm. Expert Syst. Appl. 238, 122147 (2024).
Meng, X., Lee, K., Kang, T. Y. & Ko, S. An irreversible ripeness indicator to monitor the CO2 concentration in the headspace of packaged Kimchi during storage. Food Sci. Biotechnol. 24, 91–97 (2015).
Shams, M. Y., Hussien, A., Atiya, A., Medhat, L. & Bhatnagar, R. Food item recognition and calories estimation using YOLOv5. In International Conference on Computer & Communication Technologies 241–252 (Springer, 2023).
Elshewey, A. M. et al. Weight prediction using the hybrid stacked-LSTM food selection model. Comput. Syst. Sci. Eng. 46, 765–781 (2023).
Rezaei, T. et al. A universal methodology for reliable predicting the non-steroidal anti-inflammatory drug solubility in supercritical carbon dioxide. Sci. Rep. 12, 1–12 (2022).
Author information
Contributions
B.V.: Writing-original draft, Conceptualization, Project Administration, Validation, Supervision. M.D.: Writing-review and editing, Data curation, Software, Methodology. R.Y.: Writing-original draft, Software, Methodology. A.H.A.: Writing-review and editing, Visualization. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Vaferi, B., Dehbashi, M., Yousefzadeh, R. et al. Prediction of the packaging chemical migration into food and water by cutting-edge machine learning techniques. Sci Rep 15, 7806 (2025). https://doi.org/10.1038/s41598-025-92459-x
DOI: https://doi.org/10.1038/s41598-025-92459-x









