Abstract
Molecularly imprinted polymers (MIPs) are artificial polymers in which specific recognition sites for a definite target are created during synthesis. Owing to characteristics such as stability, ease of synthesis, reproducibility, reusability, high accuracy, and selectivity, these polymers have many applications. However, the variety of functional monomers, templates, and solvents, and synthesis conditions such as pH, temperature, stirring rate, and time, limit the selectivity of imprinting. Practical optimization of the synthesis conditions has many drawbacks, including chemical consumption, equipment requirements, and time costs. Using machine learning (ML) to predict the imprinting factor (IF), which indicates the quality of imprinting, is an attractive way to overcome these problems. ML offers many advantages, for example freedom from human error, high accuracy, high repeatability, and prediction over large amounts of data in minimal time. In this research, ML was used to predict the IF using non-linear regression algorithms, including classification and regression tree (CART), support vector regression (SVR), and k-nearest neighbors (KNN), and ensemble algorithms such as gradient boosting (GB), random forest (RF), and extra trees (ET). The data sets were obtained practically in the laboratory, and the inputs included pH, the type of template, the type of monomer, the solvent, the distribution coefficient of the MIP (KMIP), and the distribution coefficient of the non-imprinted polymer (KNIP). The mutual information feature selection method was used to select the important features affecting the IF. The results showed that the GB algorithm had the best performance in predicting the IF; with this algorithm, the maximum R2 value (R2 = 0.871) and the minimum mean absolute error (MAE = − 0.982) and mean squared error (MSE = − 2.303) were obtained.
Introduction
In the last decade, the desire to use ML has grown owing to the large volumes of data that are now accessible, together with cheap computational processing and fast data storage. It is therefore possible to quickly and naturally create models that provide researchers with much larger, more complex, and more detailed information1. Figure 1 shows the steps of modeling with ML: data gathering, pre-processing, applying algorithms, improving the model, and creating a final model. ML has received attention in various sciences, including chemistry2, chemical engineering3, physics4, geology5, pharmacy6, medicine7, and computer science8. It has also found applications in various fields, for example disease diagnosis such as cancer9, sensor design10, drug design11, drug delivery12, geography13, weather prediction14, polymers15, road control16, and traffic detection17.
Modeling steps using ML algorithms. These steps include collecting data, preprocessing the data, applying different algorithms, and improving models by cleaning data or using pipelines.
MIPs are synthetic receptors consisting of a polymeric network with selective nano-cavities formed according to the shape, size, and functional groups of the template molecule18. These polymers have many advantages, such as low cost, stability, easy preparation, and reproducibility19. However, the variety of functional monomers, templates, and solvents, and synthesis conditions such as pH, temperature, and the rate and time of stirring, limit the selectivity of imprinting20. To overcome these problems, ML can be used to predict the factors that represent imprinting quality. Using ML to predict the IF has many advantages: it is fast and inexpensive, uses no chemicals or equipment, and eliminates the need to control synthesis conditions such as pH, temperature, stirring rate, and time. Indeed, using ML removes the disadvantages of practical imprinting and increases the precision of imprinting without human error21.
Supervised learning is a popular and widely used ML approach. A supervised learning algorithm finds the relationship between training data and their corresponding outputs, then uses the learned relationship to predict outputs for new inputs22. Supervised learning has many advantages: it allows data outputs to be generated from previous experience, helps AI developers optimize performance metrics using expertise, and helps solve many different types of real-world computational problems23.
Regression is a supervised ML technique used to predict continuous output values from the inputs. There are three main types of regression algorithms: simple linear regression, multiple linear regression, and polynomial regression24. This method is mainly used to predict outputs and to find the relationships between features. Regression techniques differ in the number of independent features and in the relationship between the independent and dependent features25. Different regression algorithms have been used to model imprinted-polymer problems. For example, Sunil K. Jha et al. (2014) used the SVR algorithm to predict the response of an advanced MIP-based odor filtering and sensing system for the detection of volatile organic compounds (VOCs); according to their results, the accuracy of the model was excellent26. Zhenhe Wang and colleagues (2020) designed a quartz crystal microbalance gas sensor based on an imprinted polymer for the detection of volatile organic compounds; they used algorithms such as LDA, KNN, PNN, and SVR to predict the sensor response, and the SVR algorithm produced the model with the highest accuracy27. Emma Van de Reydt et al. (2022) successfully used a ridge regression algorithm to build a model predicting the diffusion rate coefficients of various monomers in radical polymerization; the variables studied included boiling point, molecular weight, and dipole moment. Their model covers monomers such as styrene and acrylonitrile, as well as acrylates and methacrylates in general, and also accurately predicts the Arrhenius activation parameters and absolute rate coefficients28. Recently, Ahmed Elsonbaty and colleagues successfully optimized the fabrication conditions of four sensors consisting of MIP-based PVC membranes using innovative self-validated ensemble modeling (SVEM).
These sensors were prepared using four different experimental designs, including SVEM-LASSO, central composite, SVEM-PFWD, and SVEM-FWD. Their proposed sensor had a suitable Nernstian response29.
Ensemble algorithms are among the most powerful learning algorithms for classification and regression problems. Their purpose is to combine multiple weak outputs into a final strong output30. Using these algorithms significantly reduces modeling errors and improves the accuracy of the model. Unfortunately, despite the improved accuracy, these algorithms are not widely used in polymer development because of their computational complexity31. Chenxi Liu et al. (2022) developed a highly sensitive fluorescence sensor based on molecularly imprinted dual-emitting polymers (dual-em-MIPs) for the detection of pretilachlor in fish and water samples. They used an RF algorithm to predict sensor responses and analyze fluorescence images of the samples; their model predicted the sensor response very accurately32.
In this paper, we developed a minimal-error model to predict the IF for various MIPs. Input data, including pH, the type of template, the type of functional monomer, the solvent, KMIP, and KNIP, were obtained in the laboratory, and IF values for 100 different MIPs were calculated. Given the type, distribution, and range of the data, linear algorithms showed a large prediction error and were therefore not investigated further. Non-linear regression algorithms (KNN, CART, and SVR) and ensemble algorithms (GB, RF, and ET) were used to create a comprehensive model.
Experimental and methods
The reagents and materials
All chemical compounds were analytical grade, and all solutions were prepared with double distilled water (DDW). Tetraethyl orthosilicate (TEOS, 98.00%) was purchased from Merck (Darmstadt, Germany). Acrylamide (AA, ≥ 99.00%), Ethanol, Ethylene glycol dimethacrylate (EGDMA, 99.00%), hexadecyltrimethylammonium bromide (CTAB, ≥ 99.00%), hydrochloric acid, and 2, 2′-Azobis(2-methyl propionitrile) (AIBN, 99.00%) were obtained from Sigma-Aldrich (www.sigmaaldrich.com, St. Louis, MO, USA). All templates of pure powder were purchased from RazakPharma Company (www.RazakPharma.com, Tehran, Iran).
The apparatus
A scanning electron microscope (Model: GeminiSEM 460) was used to observe the morphology of the surface of MIP before and after washing with a suitable solvent. The absorbance spectra were obtained using a UV-1600 spectrometer.
The synthesis of MIP
The polymers were synthesized in a typical process as follows. First, 1 mL of TEOS, 0.7 mL of DDW, 1 mL of ethanol, and 1 mL of hydrochloric acid (0.2 mol L−1) were mixed and stirred for 22 min. Then, 0.1 mg of AA (functional monomer), 2 mg of AIBN, and 100 mg of EGDMA were added, and the mixture was stirred for three hours at room temperature. Subsequently, the template molecule was dissolved in 25 mL of hydrochloric acid (2 mol L−1) in a beaker to reach pH = 6.2, and the solution was stirred for 15 min at 42 °C to prepare the preassembly solution. At the end of the procedure, 5 mL of ethanol and 0.32 g of CTAB were added, and the mixture was stirred for one hour at room temperature. For comparison, the non-imprinted polymer (NIP) was prepared using the same procedure but without the addition of a template molecule in the polymerization process.
MIPs were synthesized for one hundred templates, such as naproxen, nicotinamide, ibuprofen, cholesterol, bisphenol A, (S)-nilvadipine, d-chlorpheniramine, cinchonine, nicotine, salicylaldehyde, folic acid, etc. Different functional monomers, such as acrylic acid, acrylamide, methyl methacrylate, allylthiourea, 4-VBA, L-Val, 4-VP, 2-HEMA, APTES, and 2-MAOEP, were also used. During synthesis, the pH was optimized using buffers (H3PO4/NaH2PO4), and the optimal pH was considered an essential factor in the synthesis of the MIP33,34.
Dataset
The main components of MIPs are the functional monomer, template, cross-linker, and solvent. If the pH changes during synthesis, the structure of compounds bearing proton or hydroxyl groups changes, and this structural change affects the imprinting quality; pH is therefore an essential factor in the synthesis of MIPs. The solvent dissolves the other reagents, so the type and volume of the solvent are also significant factors. The distribution coefficients of the NIP and MIP are related to the imprinting quality: a high selectivity coefficient indicates strong imprinting, and a low selectivity coefficient indicates poor imprinting.
The distribution coefficient (K) is determined from the difference between the concentrations of the template molecules in solution before and after absorption by the MIP, as per Eq. (1):

K = ((Ci − Cf)/Cf) × (V/m) (1)

where m is the mass of the polymer, V is the volume of the solution, Ci is the initial concentration of the template molecules, and Cf is the equilibrium concentration of the template molecules. The monomer–template complex stability criterion (IF) was calculated using Eq. (2):

IF = KMIP/KNIP (2)
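The two quantities above can be sketched as small helpers, assuming the standard definitions K = ((Ci − Cf)/Cf)·(V/m) and IF = KMIP/KNIP; the function names are illustrative, not from the original work:

```python
def distribution_coefficient(c_i, c_f, volume, mass):
    """Eq. (1): K = ((Ci - Cf) / Cf) * (V / m).

    c_i, c_f: initial and equilibrium template concentrations;
    volume: solution volume; mass: polymer mass (consistent units).
    """
    return ((c_i - c_f) / c_f) * (volume / mass)


def imprinting_factor(k_mip, k_nip):
    """Eq. (2): IF = K_MIP / K_NIP."""
    return k_mip / k_nip
```

For example, a MIP that lowers the template concentration from 10 to 2 units while the NIP only lowers it to 5 (same 25 mL solution and 0.5 g polymer) gives an IF of 4, i.e. the imprinted cavities bind the template four times more strongly than the non-imprinted matrix.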
The dataset plays a vital role in modeling. In this study, the dataset was obtained experimentally in the laboratory. Different features, including the type of template, KMIP, KNIP, pH, the type of functional monomer, and the type and volume of solvent, were used as inputs, and the IF as output. The descriptors of the template, functional monomer, and solvent were, respectively, the topological polar surface area, the average dipole moment, and XLogP3, obtained from the PubChem site. The number of samples was one hundred, and to obtain a comprehensive model, we used different template molecules and various algorithms. Python 3.11.1 was used for modeling.
The modeling using non-linear regression, and ensemble algorithms
The performance of the model is influenced by three steps: feature selection, algorithm selection, and cross-validation. These steps are critical; if they are not performed carefully, the resulting model will be erroneous. The chosen features have a significant effect on the performance, accuracy, and efficiency of the model. Feature selection is perhaps the most crucial part of data mining and modeling, because unrelated or weakly related features reduce system performance, so implementing a feature selection method is an important first step in designing intelligent learning systems. When the dimension of the feature space is vast and we face the dimensionality problem, an appropriate feature selection method also reduces the computational cost of optimal system training. Feature selection reduces dimensionality by removing irrelevant and redundant features. Because we face a regression problem, the mutual information feature selection method was used. This method applies the information gain of the dataset (typically used in the construction of decision trees) to feature selection: the mutual information between two features measures the reduction in uncertainty in one feature given a known value of the other.
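Mutual-information feature scoring of the kind described above can be sketched with scikit-learn's `mutual_info_regression`; the synthetic data here stands in for the paper's inputs (the real dataset is not public):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
# Four synthetic stand-in features (e.g. K_MIP, K_NIP, pH, solvent volume).
X = rng.normal(size=(100, 4))
# Make the output depend strongly on column 0 only.
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# One non-negative score per feature; higher means more information about y.
scores = mutual_info_regression(X, y, random_state=1)
```

The informative feature receives a clearly higher score than the noise columns, mirroring how KMIP and the solvent volume stood out in Fig. 3.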
Regression is a predictive modeling technique that examines the relationship between dependent and independent variables. For accurate modeling, different algorithms (SVR, CART, KNN, GB, ET, and RF) were used. The SVR algorithm plots the raw data as points in an n-dimensional space, where n is the number of features; each feature is tied to a specific coordinate, making the data easy to separate. The KNN algorithm is a non-parametric statistical method used for classification and regression; in the regression case, the output is the average of the values of the K nearest neighbors. The CART algorithm builds a decision tree based on the Gini impurity index; conclusions are drawn by progressing from observations on the samples, represented by the branches, to the target value, represented by the leaves. The GB algorithm consists of three elements: a loss function to optimize, a weak learner for predictions, and an additive model that adds weak learners to minimize the loss function. The GB algorithm is greedy and can quickly overfit the training data set; therefore, regularization methods can penalize different parts of the boosting algorithm. The RF algorithm builds several decision trees on different subsets of the data set and averages them to improve prediction accuracy; instead of relying on a single decision tree, RF collects the prediction from each tree and takes the majority vote (or average) as the final output. The ET algorithm is an ensemble supervised method that, like RF, builds many decision trees, but the sampling for each tree is random and without replacement, which can make it faster. To obtain a model that predicts the IF value with minimum error, all of these algorithms were applied to the data.
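A comparison of the six algorithms of the kind reported in Table 2 can be sketched as follows; the synthetic regression data is a stand-in for the laboratory dataset, and default hyperparameters are assumed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 100-sample, 6-feature laboratory dataset.
X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)

models = {
    "CART": DecisionTreeRegressor(random_state=1),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "GB": GradientBoostingRegressor(random_state=1),
    "RF": RandomForestRegressor(random_state=1),
    "ET": ExtraTreesRegressor(random_state=1),
}
# Mean cross-validated R^2 for each candidate algorithm.
results = {name: cross_val_score(m, X, y, cv=cv, scoring="r2").mean()
           for name, m in models.items()}
```

The winner on the real dataset need not match this toy run; the point is the uniform cross-validated comparison that precedes hyperparameter tuning.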
Another essential part of any ML modeling is model validation. Data separation divides a data set into two or three subsets, and evaluating the model provides essential information about its performance. Shuffle-split cross-validation was used to divide the dataset into test and training data (n_splits = 10, test_size = 0.3, random_state = 1). Also, to increase the accuracy of the model and prevent leakage of the test data into the training data, pipelines were used. After determining the performance of the algorithms, the hyperparameters of the best algorithm (the GB algorithm) were tuned.
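The shuffle-split validation with a leakage-safe pipeline can be sketched like this; the scaler inside the pipeline is an assumption about the preprocessing (the text mentions standardization later), and the data is again synthetic:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)

# Scaling happens inside the pipeline, so each CV fold fits the scaler on
# its own training split only -- statistics never leak from the test split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("gb", GradientBoostingRegressor(random_state=1)),
])
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
```

Scaling outside the pipeline, before splitting, would let test-set statistics influence training, which is exactly the leakage the pipeline prevents.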
Ethics approval and consent to participate
Authors consciously assure that for the manuscript, the following is fulfilled:
(1) This material is the authors' original work, which has not been previously published elsewhere.
(2) The paper is not currently being considered for publication elsewhere.
(3) The paper reflects the authors' research and analysis in a truthful and complete manner.
(4) All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.
I agree with the above statements and declare that this submission follows the guidelines for authors and the Ethical Statement.
The results and discussions
The surface evaluation
Figure 2 shows SEM images of the surface of the MIP before (a) and after (b) washing with a suitable solvent. It is clear that after washing the MIP, the template molecules leave the polymeric network, and cavities remain based on the functional groups, shape, and size of the template molecule. Optimal synthesis conditions give the MIP excellent stability, and it can be reused for the extraction of template molecules.
SEM images of the surface of the MIP before (a) and after (b) washing with a suitable solvent. After the MIP was washed with a solvent capable of removing the template molecules from the network, cavities the size of the template molecule remained.
The data recognition
Data recognition is the first and most important step in ML. Descriptive statistics give great insight into each feature and allow its summaries to be reviewed in minimal time. Using descriptive statistics, eight characteristics (count, mean, maximum, minimum, standard deviation, and the 25%, 50%, and 75% percentiles) were obtained for each feature. The results of the statistical analysis of the data are shown in Table 1.
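A descriptive summary like Table 1 can be produced in one line with pandas; the columns and value ranges below are illustrative stand-ins for the real features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Illustrative stand-in columns for three of the paper's features.
df = pd.DataFrame({
    "pH": rng.uniform(2.0, 9.0, 100),
    "K_MIP": rng.uniform(10.0, 500.0, 100),
    "IF": rng.uniform(1.0, 20.0, 100),
})

# count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
summary = df.describe()
```

The `describe` output covers exactly the eight characteristics listed above, with the median appearing as the 50% percentile.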
The feature selection
The mutual information feature selection method was used to determine the important features affecting the IF. This method assigns a score to each feature; a higher score shows that the feature has more effect on the output. The results of using the mutual information feature selection method are shown in Fig. 3. KMIP, the volume of the solvent, the type of functional monomer, and the type of template had a significant impact on the IF, whereas features such as KNIP, pH, and the type of solvent had less effect.
The result of using the mutual information feature selection method to determine the important features affecting the IF. KMIP, the volume of the solvent, the type of functional monomer, and the type of template had a significant impact on the IF; features such as KNIP, pH, and the type of solvent had less effect.
The non-linear regression, and ensemble algorithms
After standardizing the data and applying feature selection and cross-validation, the KNN, SVR, CART, GB, ET, and RF algorithms were used. A pipeline was used to prevent leakage of training data into the test data, and vice versa. The results of using the different algorithms are shown in Table 2. Using the ensemble algorithms improved the performance of the model. Among the regression algorithms, the CART algorithm had the highest R2 value (R2 = 0.645) and the lowest errors (MAE = − 1.966, MSE = − 13.811); the minimum accuracy among the regression algorithms was obtained with SVR (R2 = 0.045, MSE = − 38.735, MAE = − 3.183). The GB algorithm performed better than the other algorithms in predicting the IF, with the highest R2 (R2 = 0.859893) and the lowest errors (MAE = − 0.980446, MSE = − 7.133007). Therefore, the GB algorithm was selected as the best algorithm for predicting the IF and was tuned in the next step.
The gradient boosting algorithm tuning
The n_estimators parameter is a suitable candidate for tuning to increase the performance of the GB algorithm. The default number of boosting steps is 100; a larger number of boosting steps can improve the performance of the algorithm, although it increases the training time. To tune n_estimators, values from 50 to 500 in steps of 50 were examined. The results are shown in Table 3. With n_estimators set to 250, the best performance of the model was obtained: the maximum R2 value (R2 = 0.870021) and the minimum errors (MAE = − 0.982562, MSE = − 2.303335).
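The sweep over n_estimators described above can be sketched with a grid search over the same 50-to-500 range; the scoring choice and synthetic data are assumptions (the paper reports negative MSE values, consistent with scikit-learn's `neg_mean_squared_error` convention):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)

# Try n_estimators = 50, 100, ..., 500 and keep the best by CV error.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid={"n_estimators": list(range(50, 501, 50))},
    scoring="neg_mean_squared_error",
    cv=cv,
)
grid.fit(X, y)
best_n = grid.best_params_["n_estimators"]
```

On the paper's dataset this procedure selected 250 estimators; on other data the optimum will differ.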
The residual and prediction error plots
After setting the n_estimators value, residual and prediction error plots were used to assess the error of the model. A residual is the measurable error between an observed value and the estimate obtained from the model. A residual plot shows the relationship between a particular independent feature and its output, given the presence of the other independent features in the model; this plot is shown in Fig. 4. A regression prediction error plot is a qualitative measure of how well a model predicts a response; in this plot, the x_train variable acts as the independent feature and the y_train variable as the dependent feature. This plot is shown in Fig. 5. After tuning the GB algorithm and the n_estimators value, the maximum R2 value of the model was obtained (R2 = 0.861), and the model has a remarkable ability to predict the IF. Both Figs. 4 and 5 were drawn with the matplotlib library of Python.
The residual plot of the model under optimal conditions, using the GB algorithm after tuning n_estimators (n_estimators = 250). The closer the training and test data, the more accurately the output is predicted.
The prediction error plot of the model using the GB algorithm after tuning n_estimators (n_estimators = 250). The closer the best-fit line and the identity line, the more accurate the prediction; in the absence of errors, the two lines almost coincide.
Application of the model
The final criterion for evaluating the performance of the model is its applicability in predicting outputs whose real values are known. Therefore, the model was used to predict the IF for thirty different samples, and the results are shown in Table 4. In Table 4, the actual and predicted IF values are close, with only slight differences, showing the excellent performance of the model in predicting the IF with minimum error.
Conclusion
Using ML to predict the IF and determine optimal synthesis conditions reduces costs, improves accuracy, increases reproducibility, decreases human error, increases speed, saves time, eliminates the consumption of chemical reagents, and removes the need for devices and equipment. To create an accurate model, different algorithms (CART, KNN, SVR, RF, ET, and GB) together with pipelines were used. The ensemble algorithms predicted the IF more accurately than the non-linear regression algorithms, because they combine several weak outputs to achieve an accurate output. The minimum accuracy among the non-linear regression algorithms belonged to the SVR algorithm (R2 = 0.045, MAE = − 3.183, MSE = − 38.735), and the CART algorithm had the maximum prediction accuracy among them (R2 = 0.645, MAE = − 1.966, MSE = − 13.811). The results showed that the GB algorithm, after tuning the n_estimators hyperparameter, gave the maximum accuracy (R2 = 0.871, MAE = − 0.982, MSE = − 2.303). The change in model accuracy across algorithms depends on the algorithm type and the data distribution; other important factors are the correlation between the features and the output and the type of feature selection method used. The most important challenge researchers face in modeling is the preparation of literature or experimental data sets, so the only limitation of this work was the preparation of the experimental data set. It is worth noting that once these data have been produced in the laboratory and modeling has been performed carefully, there is no further need for practical work or the associated expenditure of time and money, and the resulting models can be used for different samples and different syntheses.
In fact, using this model, the IF of different template molecules can be predicted with high accuracy and minimum error. Although the use of GB algorithms in chemistry prediction problems is still new, the results presented in this study are very encouraging.
Data availability
The data and materials that support the findings of this study are available from the corresponding author, upon reasonable request.
References
Cerezo, M., Verdon, G., Huang, H. Y., Cincio, L. & Coles, P. J. Challenges and opportunities in quantum machine learning. Nat. Comput. Sci. 2(9), 567–576. https://doi.org/10.1038/s43588-022-00311-3 (2022).
Panteleev, J., Gao, H. & Jia, L. Recent applications of machine learning in medicinal chemistry. Bioorganic Med. Chem. Lett. 28(17), 2807–2815. https://doi.org/10.1016/j.bmcl.2018.06.046 (2018).
Paruzzo, F. M. et al. Chemical shifts in molecular solids by machine learning. Nature Commun. 9(1), 1–10. https://doi.org/10.1038/s41467-018-06972-x (2018).
Willard, J., Jia, X., Xu, S., Steinbach, M., & Kumar, V. Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919, 1(1), 1–34 (2020). https://doi.org/10.1145/1122445.1122456.
Merembayev, T., Yunussov, R., & Yedilkhan, A. Machine learning algorithms for classification geology data from well logging. In 2018 14th International Conference on Electronics Computer and Computation (ICECCO) (pp. 206–212). IEEE (2018). https://doi.org/10.1109/ICECCO.2018.8634775.
Kolluri, S., Lin, J., Liu, R., Zhang, Y. & Zhang, W. Machine learning and artificial intelligence in pharmaceutical research and development: A review. AAPS J. 24(1), 1–10. https://doi.org/10.1208/s12248-021-00644-3 (2020).
Sidey-Gibbons, J. A. & Sidey-Gibbons, C. J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 19(1), 1–18. https://doi.org/10.1186/s12874-019-0681-4 (2019).
Khan, A. I. & Al-Habsi, S. Machine learning in computer vision. Procedia Comput. Sci. 167, 1444–1451. https://doi.org/10.1016/j.procs.2020.03.355 (2020).
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17. https://doi.org/10.1016/j.csbj.2014.11.005 (2015).
Ballard, Z., Brown, C., Madni, A. M. & Ozcan, A. Machine learning and computation-enabled intelligent sensor design. Nat. Mach. Intell. 3(7), 556–565. https://doi.org/10.1038/s42256-021-00360-9 (2021).
Ferraro, M. et al. Multi-target dopamine D3 receptor modulators: Actionable knowledge for drug design from molecular dynamics and machine learning. Eur. J. Med. Chem. 188, 111975–112016. https://doi.org/10.1016/j.ejmech.2019.111975 (2020).
Gonzalez-Cava, J. M., Arnay, R., Méndez Pérez, J. A., León, A., Martín, M., Jove-Perez, E. & Cos Juez, F. J. D., A machine learning based system for analgesic drug delivery. In International Joint Conference SOCO’17-CISIS’17-ICEUTE’17 León, Spain, September 6–8, 2017, Proceeding (pp. 461–470). Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67180-2_45.
Lavallin, A. & Downs, J. A. Machine learning in geography–past, present, and future. Geogr. Compass 15(5), e12563. https://doi.org/10.1111/gec3.12563 (2021).
Haupt, S. E., Cowie, J., Linden, S., McCandless, T., Kosovic, B. & Alessandrini, S. Machine learning for applied weather prediction. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 276–277). IEEE (2018). https://doi.org/10.1109/eScience.2018.00047.
Gormley, A. J. & Webb, M. A. Machine learning in combinatorial polymer chemistry. Nat. Rev. Mater. 6(8), 642–644. https://doi.org/10.1038/s41578-021-00282-3 (2021).
Ata, A., Khan, M. A., Abbas, S., Ahmad, G. & Fatima, A. Modelling smart road traffic congestion control system using machine learning techniques. Neural Netw. World 29(2), 99. https://doi.org/10.1431/NNW.2019.29.008 (2019).
Pacheco, F., Exposito, E., Gineste, M., Baudoin, C. & Aguilar, J. Towards the deployment of machine learning solutions in network traffic classification: A systematic survey. IEEE Commun. Surv. Tutor. 21(2), 1988–2014. https://doi.org/10.1109/COMST.2018.2883147 (2018).
Beyazit, S., Bui, B. T. S., Haupt, K. & Gonzato, C. Molecularly imprinted polymer nanomaterials and nanocomposites by controlled/living radical polymerization. Prog. Polym. Sci. 62, 1–21. https://doi.org/10.1016/j.progpolymsci.2016.04.001 (2016).
Huang, D. L. et al. Application of molecularly imprinted polymers in wastewater treatment: A review. Environ. Sci. Pollut. Res. 22(2), 963–977. https://doi.org/10.1007/s11356-014-3599-8 (2015).
Dong, C. et al. Molecularly imprinted polymers by the surface imprinting technique. Eur. Polym. J. 145, 110231. https://doi.org/10.1016/j.eurpolymj.2020.110231 (2021).
Wang, M., Cetó, X. & Del Valle, M. A sensor array based on molecularly imprinted polymers and machine learning for the analysis of fluoroquinolone antibiotics. ACS Sens. 7(11), 3318–3325. https://doi.org/10.1021/acssensors.2c01260 (2022).
Zhou, Z. H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5(1), 44–53. https://doi.org/10.1093/nsr/nwx106 (2018).
Alexopoulos, K., Nikolakis, N. & Chryssolouris, G. Digital twin-driven supervised machine learning for the development of artificial intelligence applications in manufacturing. Int. J. Comput. Integ. Manuf. 33(5), 429–439. https://doi.org/10.1080/0951192X.2020.1747642 (2020).
Rong, S., & Bao-Wen, Z. The research of regression model in machine learning field. In MATEC Web of Conferences (Vol. 176, p. 01033). EDP Sciences (2018). https://doi.org/10.1051/matecconf/201817601033
Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., & Li, B. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP) (pp. 19–35). IEEE (2018).https://doi.org/10.1109/SP.2018.00057.
Jha, S. K. & Hayashi, K. A novel odor filtering and sensing system combined with regression analysis for chemical vapor quantification. Sens. Actuators, B Chem. 200, 269–287. https://doi.org/10.1016/j.snb.2014.04.022 (2014).
Wang, Z., Chen, W., Gu, S., Wang, J. & Wang, Y. Discrimination of wood borers infested Platycladus orientalis trunks using quartz crystal microbalance gas sensor array. Sens. Actuators B Chem. 309, 127767. https://doi.org/10.1016/j.snb.2020.127767 (2020).
Van de Reydt, E., Maron, N., Saunderson, J., Boley, M., & Junkers, T. Machine-learning based prediction of kinetic rate coefficients in radical polymerization. (2022). https://doi.org/10.26434/chemrxiv-2022-v5nz8.
Mostafa, A. E. et al. Computer-aided design of eco-friendly imprinted polymer decorated sensors augmented by self-validated ensemble modeling designs for the quantitation of drotaverine hydrochloride in dosage form and human plasma. J. AOAC Int. https://doi.org/10.1093/jaoacint/qsad049 (2023).
Krokidis, M. G. et al. A sensor-based perspective in early-stage parkinson’s disease: Current state and the need for machine learning processes. Sensors 22(2), 409. https://doi.org/10.3390/s22020409 (2022).
Ferreira, A. J. & Figueiredo, M. A. Boosting algorithms: A review of methods, theory, and applications. Ensemble Mach. Learn. https://doi.org/10.1007/978-1-4419-9326-7_2 (2012).
Liu, C. et al. Random forest algorithm-enhanced dual-emission molecularly imprinted fluorescence sensing method for rapid detection of pretilachlor in fish and water samples. J. Hazard. Mater. 439, 129591. https://doi.org/10.1016/j.jhazmat.2022.129591 (2022).
Ahmadpour, H. & Hosseini, S. M. M. A solid-phase luminescence sensor based on molecularly imprinted polymer-CdSeS/ZnS quantum dots for selective extraction and detection of sulfasalazine in biological samples. Talanta 194, 534–541. https://doi.org/10.1016/j.talanta.2018.10.053 (2019).
Panahi, Y., Motaharian, A., Hosseini, M. R. M. & Mehrpour, O. High sensitive and selective nano-molecularly imprinted polymer based electrochemical sensor for midazolam drug detection in pharmaceutical formulation and human urine samples. Sens. Actuators, B Chem. 273, 1579–1586. https://doi.org/10.1016/j.snb.2018.07.069 (2018).
Acknowledgements
The authors thank the real sample analysis laboratory for providing the conditions for the synthesis of MIPs and experimental results.
Author information
Authors and Affiliations
Contributions
B.Y.: Carried out the experiment, wrote the manuscript, developed the theory and performed the computations, and contributed to sample preparation. S.M.-R.M.H.: Contributed to the final version of the manuscript, designed and directed the project, and contributed to sample preparation. S.M.H.: Conceived and planned the experiments, designed and directed the project, and contributed to interpreting the results. The authors give their consent to the publication of identifiable details, which can include photographs, plots, and details within the text (Materials, synthesis method, and the result of using feature selection methods) to be published in the above Journal and Article. I confirm that I have seen and been allowed to read both the Material and the Article to be published.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yarahmadi, B., Hashemianzadeh, S.M. & Milani Hosseini, S.MR. Machine-learning-based predictions of imprinting quality using ensemble and non-linear regression algorithms. Sci Rep 13, 12111 (2023). https://doi.org/10.1038/s41598-023-39374-1