Abstract
Molecularly imprinted polymers (MIPs) are artificial polymers in which specific recognition sites for a definite target are created during synthesis. Owing to characteristics such as stability, ease of synthesis, reproducibility, reusability, high accuracy, and selectivity, these polymers have many applications. However, the variety of functional monomers, templates, and solvents, and synthesis conditions such as pH, temperature, stirring rate, and time, limit the selectivity of imprinting. Practical optimization of the synthesis conditions has many drawbacks, including chemical consumption, equipment requirements, and time costs. Using machine learning (ML) to predict the imprinting factor (IF), which indicates the quality of imprinting, is an attractive way to overcome these problems. ML offers many advantages, for example freedom from human error, high accuracy, high repeatability, and prediction over large amounts of data in minimal time. In this research, ML was used to predict the IF using non-linear regression algorithms, including classification and regression tree (CART), support vector regression (SVR), and k-nearest neighbors (KNN), and ensemble algorithms such as gradient boosting (GB), random forest (RF), and extra trees (ET). The data sets were obtained practically in the laboratory, and the inputs included pH, the type of template, the type of monomer, the solvent, the distribution coefficient of the MIP (KMIP), and the distribution coefficient of the non-imprinted polymer (KNIP). The mutual information feature selection method was used to select the important features affecting the IF. The results showed that the GB algorithm had the best performance in predicting the IF; with this algorithm, the maximum R2 value (R2 = 0.871) and the minimum mean absolute error (MAE = − 0.982) and mean squared error (MSE = − 2.303) were obtained.
Introduction
In the last decade, the desire to use ML has grown owing to the large volumes of data that are now accessible, together with cheap computational processing and fast data storage. It is therefore possible to quickly and naturally create models that provide researchers with much larger, more complex, and more detailed information1. Figure 1 shows the steps of modeling with ML: data gathering, pre-processing, applying algorithms, improving the model, and creating a final model. ML has received attention in various sciences, including chemistry2, chemical engineering3, physics4, geology5, pharmacy6, medicine7, and computer science8. It has also found applications in various fields, for example disease diagnosis such as cancer9, sensor design10, drug design11, drug delivery12, geography13, weather prediction14, polymers15, road control16, and traffic detection17.
Modeling steps using ML algorithms. These steps include collecting data, preprocessing the data, applying different algorithms, and improving models by cleaning data or using pipelines.
MIPs are synthetic receptors consisting of a polymeric network with selective nano-cavities formed according to the shape, size, and functional groups of the template molecule18. These polymers have many advantages, such as low cost, stability, easy preparation, and reproducibility19. However, the variety of functional monomers, templates, and solvents, and synthesis conditions such as pH, temperature, and the rate and time of stirring, limit the selectivity of imprinting20. To overcome these problems, ML can be used to predict the factors that represent imprinting quality. Using ML to predict the IF has many advantages: it is fast and inexpensive, uses no chemicals or equipment, and eliminates the need to control synthesis conditions such as pH, temperature, stirring rate, and time. Indeed, using ML removes the disadvantages of practical imprinting and increases the precision of imprinting without human error21.
Supervised learning is a popular and widely used ML approach. A supervised learning algorithm finds the relationship between training data and their corresponding outputs, then uses the learned relationship to predict outputs for new inputs22. Supervised learning has many advantages: it allows data outputs to be generated from previous experience, helps AI developers optimize performance metrics using expertise, and helps solve many different types of real-world computational problems23.
Regression is a supervised ML technique used to predict continuous output values from the inputs. There are three main types of regression algorithms: simple linear regression, multiple linear regression, and polynomial regression24. This method is mainly used to predict outputs and to find the relationships between features. Regression techniques differ in the number of independent features and in the relationship between the independent and dependent features25. Different regression algorithms have been used to model imprinted-polymer problems. For example, Sunil K. Jha et al. (2014) used the SVR algorithm to predict the response of an advanced MIP-based odor filtering and sensing system for the detection of volatile organic compounds (VOCs); according to their results, the accuracy of the model was excellent26. Zhenhe Wang and colleagues (2020) designed a quartz crystal microbalance gas sensor based on an imprinted polymer for the detection of volatile organic compounds; they used algorithms such as LDA, KNN, PNN, and SVR to predict the sensor response, and the SVR algorithm produced the model with the highest accuracy27. Emma Van de Reydt et al. (2022) successfully used a ridge regression algorithm to build a model predicting the diffusion rate coefficients of various monomers in radical polymerization; the variables studied included boiling point, molecular weight, and dipole moment. Their model covers monomers such as styrene and acrylonitrile, as well as acrylates and methacrylates in general, and also accurately predicts the Arrhenius activation parameters and absolute rate coefficients28. Recently, Ahmed Elsonbaty and colleagues successfully optimized the fabrication conditions of four sensors consisting of MIP-based PVC membranes using innovative self-validated ensemble modeling (SVEM).
These sensors were prepared using four different experimental designs, including SVEM-LASSO, central composite, SVEM-PFWD, and SVEM-FWD. Their proposed sensor had a suitable Nernstian response29.
Ensemble algorithms are among the most powerful learning algorithms for classification and regression problems. Their purpose is to combine multiple weak outputs into a final strong output30. Using these algorithms significantly reduces modeling errors and improves the accuracy of the model. Unfortunately, despite the improved accuracy, these algorithms are not widely used in polymer development because of their computational complexity31. Chenxi Liu et al. (2022) developed a highly sensitive fluorescence sensor based on molecularly imprinted dual-emitting polymers (dual-em-MIPs) for the detection of pretilachlor in fish and water samples. They used an RF algorithm to predict sensor responses and analyze fluorescence images of the samples; their model predicted the sensor response very accurately32.
In this paper, we developed a minimal-error model to predict the IF for various MIPs. Input data, including pH, the type of template, the type of functional monomer, the solvent, KMIP, and KNIP, were obtained in the laboratory, and IF values for 100 different MIPs were calculated. Given the type, distribution, and range of the data, linear algorithms showed a large prediction error and were therefore not investigated further. Non-linear regression algorithms (KNN, CART, and SVR) and ensemble algorithms (GB, RF, and ET) were used to create a comprehensive model.
Experimental and methods
The reagents and materials
All chemical compounds were analytical grade, and all solutions were prepared with double distilled water (DDW). Tetraethyl orthosilicate (TEOS, 98.00%) was purchased from Merck (Darmstadt, Germany). Acrylamide (AA, ≥ 99.00%), Ethanol, Ethylene glycol dimethacrylate (EGDMA, 99.00%), hexadecyltrimethylammonium bromide (CTAB, ≥ 99.00%), hydrochloric acid, and 2, 2′-Azobis(2-methyl propionitrile) (AIBN, 99.00%) were obtained from Sigma-Aldrich (www.sigmaaldrich.com, St. Louis, MO, USA). All templates of pure powder were purchased from RazakPharma Company (www.RazakPharma.com, Tehran, Iran).
The apparatus
A scanning electron microscope (Model: GeminiSEM 460) was used to observe the morphology of the surface of MIP before and after washing with a suitable solvent. The absorbance spectra were obtained using a UV-1600 spectrometer.
The synthesis of MIP
The polymers were synthesized in a typical process as follows. First, 1 mL of TEOS, 0.7 mL of DDW, 1 mL of ethanol, and 1 mL of hydrochloric acid (0.2 mol L−1) were mixed and stirred for 22 min. Then, 0.1 mg of AA (functional monomer), 2 mg of AIBN, and 100 mg of EGDMA were added, and the mixture was stirred for three hours at room temperature. Subsequently, the template molecule was dissolved in 25 mL of hydrochloric acid (2 mol L−1) in a beaker to reach pH = 6.2, and the solution was stirred for 15 min at 42 °C to prepare the preassembly solution. At the end of the procedure, 5 mL of ethanol and 0.32 g of CTAB were added, and the mixture was stirred for one hour at room temperature. For comparison, the non-imprinted polymer (NIP) was prepared using the same procedure but without the addition of a template molecule in the polymerization process.
MIPs were synthesized for one hundred templates, such as naproxen, nicotinamide, ibuprofen, cholesterol, bisphenol A, (S)-nilvadipine, d-chlorpheniramine, cinchonine, nicotine, salicylaldehyde, folic acid, etc. Different functional monomers, such as acrylic acid, acrylamide, methyl methacrylate, allylthiourea, 4-VBA, L-Val, 4-VP, 2-HEMA, APTES, and 2-MAOEP, were also used. During synthesis, the pH was optimized using buffers (H3PO4/NaH2PO4), and the optimal pH was considered an essential factor in the synthesis of the MIP33,34.
Dataset
The main components of MIPs are the functional monomer, template, cross-linker, and solvent. If the pH changes during synthesis, the structure of compounds bearing proton or hydroxyl groups changes, and this structural change affects the imprinting quality; pH is therefore an essential factor in the synthesis of MIPs. The solvent dissolves the other reagents, so the type and volume of the solvent are also significant factors. The distribution coefficients of the NIP and MIP are related to the imprinting quality: a high selectivity coefficient indicates strong imprinting, and a low selectivity coefficient indicates poor imprinting.
The distribution coefficient (K) is determined from the difference between the concentrations of the template molecules in solution before and after absorption by the MIP, as per Eq. (1):

K = ((Ci − Cf)/Cf) × (V/m) (1)

where m is the mass of the polymer, V is the volume of the solution, Ci is the initial concentration of the template molecules, and Cf is the equilibrium concentration of the template molecules. The monomer–template complex stability criterion (IF) was calculated using Eq. (2):

IF = KMIP/KNIP (2)
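The two quantities above can be sketched as small helpers, assuming the standard definitions K = ((Ci − Cf)/Cf)·(V/m) and IF = KMIP/KNIP; the function names are illustrative, not from the original work:

```python
def distribution_coefficient(c_i, c_f, volume, mass):
    """Eq. (1): K = ((Ci - Cf) / Cf) * (V / m).

    c_i, c_f: initial and equilibrium template concentrations;
    volume: solution volume; mass: polymer mass (consistent units).
    """
    return ((c_i - c_f) / c_f) * (volume / mass)


def imprinting_factor(k_mip, k_nip):
    """Eq. (2): IF = K_MIP / K_NIP."""
    return k_mip / k_nip
```

For example, a MIP that lowers the template concentration from 10 to 2 units while the NIP only lowers it to 5 (same 25 mL solution and 0.5 g polymer) gives an IF of 4, i.e. the imprinted cavities bind the template four times more strongly than the non-imprinted matrix.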
The dataset plays a vital role in modeling. In this study, the dataset was obtained experimentally in the laboratory. Different features, including the type of template, KMIP, KNIP, pH, the type of functional monomer, and the type and volume of solvent, were used as inputs, and the IF as output. The descriptors of the template, functional monomer, and solvent were, respectively, the topological polar surface area, the average dipole moment, and XLogP3, obtained from the PubChem site. The number of samples was one hundred, and to obtain a comprehensive model, we used different template molecules and various algorithms. Python 3.11.1 was used for modeling.
The modeling using non-linear regression, and ensemble algorithms
The performance of the model is influenced by three steps: feature selection, algorithm selection, and cross-validation. These steps are critical; if they are not performed carefully, the resulting model will be erroneous. The chosen features have a significant effect on the performance, accuracy, and efficiency of the model. Feature selection is perhaps the most crucial part of data mining and modeling, because unrelated or weakly related features reduce system performance, so implementing a feature selection method is an important first step in designing intelligent learning systems. When the dimension of the feature space is vast and we face the dimensionality problem, an appropriate feature selection method also reduces the computational cost of optimal system training. Feature selection reduces dimensionality by removing irrelevant and redundant features. Because we face a regression problem, the mutual information feature selection method was used. This method applies the information gain of the dataset (typically used in the construction of decision trees) to feature selection: the mutual information between two features measures the reduction in uncertainty in one feature given a known value of the other.
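Mutual-information feature scoring of the kind described above can be sketched with scikit-learn's `mutual_info_regression`; the synthetic data here stands in for the paper's inputs (the real dataset is not public):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
# Four synthetic stand-in features (e.g. K_MIP, K_NIP, pH, solvent volume).
X = rng.normal(size=(100, 4))
# Make the output depend strongly on column 0 only.
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# One non-negative score per feature; higher means more information about y.
scores = mutual_info_regression(X, y, random_state=1)
```

The informative feature receives a clearly higher score than the noise columns, mirroring how KMIP and the solvent volume stood out in Fig. 3.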
Regression is a predictive modeling technique that examines the relationship between dependent and independent variables. For accurate modeling, different algorithms (SVR, CART, KNN, GB, ET, and RF) were used. The SVR algorithm plots the raw data as points in an n-dimensional space, where n is the number of features; each feature is tied to a specific coordinate, making the data easy to separate. The KNN algorithm is a non-parametric statistical method used for classification and regression; in the regression case, the output is the average of the values of the K nearest neighbors. The CART algorithm builds a decision tree based on the Gini impurity index; conclusions are drawn by progressing from observations on the samples, represented by the branches, to the target value, represented by the leaves. The GB algorithm consists of three elements: a loss function to optimize, a weak learner for predictions, and an additive model that adds weak learners to minimize the loss function. The GB algorithm is greedy and can quickly overfit the training data set; therefore, regularization methods can penalize different parts of the boosting algorithm. The RF algorithm builds several decision trees on different subsets of the data set and averages them to improve prediction accuracy; instead of relying on a single decision tree, RF collects the prediction from each tree and takes the majority vote (or average) as the final output. The ET algorithm is an ensemble supervised method that, like RF, builds many decision trees, but the sampling for each tree is random and without replacement, which can make it faster. To obtain a model that predicts the IF value with minimum error, all of these algorithms were applied to the data.
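A comparison of the six algorithms of the kind reported in Table 2 can be sketched as follows; the synthetic regression data is a stand-in for the laboratory dataset, and default hyperparameters are assumed:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 100-sample, 6-feature laboratory dataset.
X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)

models = {
    "CART": DecisionTreeRegressor(random_state=1),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "GB": GradientBoostingRegressor(random_state=1),
    "RF": RandomForestRegressor(random_state=1),
    "ET": ExtraTreesRegressor(random_state=1),
}
# Mean cross-validated R^2 for each candidate algorithm.
results = {name: cross_val_score(m, X, y, cv=cv, scoring="r2").mean()
           for name, m in models.items()}
```

The winner on the real dataset need not match this toy run; the point is the uniform cross-validated comparison that precedes hyperparameter tuning.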
Another essential part of any ML modeling is model validation. Data separation divides a data set into two or three subsets, and evaluating the model provides essential information about its performance. Shuffle-split cross-validation was used to divide the dataset into test and training data (n_splits = 10, test_size = 0.3, random_state = 1). Also, to increase the accuracy of the model and prevent leakage of the test data into the training data, pipelines were used. After determining the performance of the algorithms, the hyperparameters of the best algorithm (the GB algorithm) were tuned.
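The shuffle-split validation with a leakage-safe pipeline can be sketched like this; the scaler inside the pipeline is an assumption about the preprocessing (the text mentions standardization later), and the data is again synthetic:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)

# Scaling happens inside the pipeline, so each CV fold fits the scaler on
# its own training split only -- statistics never leak from the test split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("gb", GradientBoostingRegressor(random_state=1)),
])
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
```

Scaling outside the pipeline, before splitting, would let test-set statistics influence training, which is exactly the leakage the pipeline prevents.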
Ethics approval and consent to participate
Authors consciously assure that for the manuscript, the following is fulfilled:
(1) This material is the authors' original work, which has not been previously published elsewhere.
(2) The paper is not currently being considered for publication elsewhere.
(3) The paper reflects the authors' research and analysis in a truthful and complete manner.
(4) All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.
I agree with the above statements and declare that this submission follows the guidelines for authors and the Ethical Statement.
The results and discussions
The surface evaluation
Figure 2 shows SEM images of the surface of the MIP before (a) and after (b) washing with a suitable solvent. It is clear that after washing the MIP, the template molecules leave the polymeric network, and cavities remain based on the functional groups, shape, and size of the template molecule. Optimal synthesis conditions give the MIP excellent stability, and it can be reused for the extraction of template molecules.
SEM images of the surface of the MIP before (a) and after (b) washing with a suitable solvent. After the MIP was washed with a solvent capable of removing the template molecules from the network, cavities the size of the template molecule remained.
The data recognition
Data recognition is the first and most important step in ML. Descriptive statistics give great insight into each feature and allow its summaries to be reviewed in minimal time. Using descriptive statistics, eight characteristics (count, mean, maximum, minimum, standard deviation, and the 25%, 50%, and 75% percentiles) were obtained for each feature. The results of the statistical analysis of the data are shown in Table 1.
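A descriptive summary like Table 1 can be produced in one line with pandas; the columns and value ranges below are illustrative stand-ins for the real features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Illustrative stand-in columns for three of the paper's features.
df = pd.DataFrame({
    "pH": rng.uniform(2.0, 9.0, 100),
    "K_MIP": rng.uniform(10.0, 500.0, 100),
    "IF": rng.uniform(1.0, 20.0, 100),
})

# count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
summary = df.describe()
```

The `describe` output covers exactly the eight characteristics listed above, with the median appearing as the 50% percentile.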
The feature selection
The mutual information feature selection method was used to determine the important features affecting the IF. This method assigns a score to each feature; a higher score shows that the feature has more effect on the output. The results of using the mutual information feature selection method are shown in Fig. 3. KMIP, the volume of the solvent, the type of functional monomer, and the type of template had a significant impact on the IF, whereas features such as KNIP, pH, and the type of solvent had less effect.
The result of using the mutual information feature selection method to determine the important features affecting the IF. KMIP, the volume of the solvent, the type of functional monomer, and the type of template had a significant impact on the IF; features such as KNIP, pH, and the type of solvent had less effect.
The non-linear regression, and ensemble algorithms
After standardizing the data and applying feature selection and cross-validation, the KNN, SVR, CART, GB, ET, and RF algorithms were used. A pipeline was used to prevent leakage of training data into the test data, and vice versa. The results of using the different algorithms are shown in Table 2. Using the ensemble algorithms improved the performance of the model. Among the regression algorithms, the CART algorithm had the highest R2 value (R2 = 0.645) and the lowest errors (MAE = − 1.966, MSE = − 13.811); the minimum accuracy among the regression algorithms was obtained with SVR (R2 = 0.045, MSE = − 38.735, MAE = − 3.183). The GB algorithm performed better than the other algorithms in predicting the IF, with the highest R2 (R2 = 0.859893) and the lowest errors (MAE = − 0.980446, MSE = − 7.133007). Therefore, the GB algorithm was selected as the best algorithm for predicting the IF and was tuned in the next step.
The gradient boosting algorithm tuning
The n_estimators parameter is a suitable candidate for tuning to increase the performance of the GB algorithm. The default number of boosting steps is 100; a larger number of boosting steps can improve the performance of the algorithm, although it increases the training time. To tune n_estimators, values from 50 to 500 in steps of 50 were examined. The results are shown in Table 3. With n_estimators set to 250, the best performance of the model was obtained: the maximum R2 value (R2 = 0.870021) and the minimum errors (MAE = − 0.982562, MSE = − 2.303335).
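The sweep over n_estimators described above can be sketched with a grid search over the same 50-to-500 range; the scoring choice and synthetic data are assumptions (the paper reports negative MSE values, consistent with scikit-learn's `neg_mean_squared_error` convention):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = make_regression(n_samples=100, n_features=6, n_informative=6,
                       noise=5.0, random_state=1)
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=1)

# Try n_estimators = 50, 100, ..., 500 and keep the best by CV error.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid={"n_estimators": list(range(50, 501, 50))},
    scoring="neg_mean_squared_error",
    cv=cv,
)
grid.fit(X, y)
best_n = grid.best_params_["n_estimators"]
```

On the paper's dataset this procedure selected 250 estimators; on other data the optimum will differ.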
The residual and prediction error plots
After setting the n_estimators value, residual and prediction error plots were used to assess the error of the model. A residual is the measurable error between an observed value and the estimate obtained from the model. A residual plot shows the relationship between a particular independent feature and its output, given the presence of the other independent features in the model; this plot is shown in Fig. 4. A regression prediction error plot is a qualitative measure of how well a model predicts a response; in this plot, the x_train variable acts as the independent feature and the y_train variable as the dependent feature. This plot is shown in Fig. 5. After tuning the GB algorithm and the n_estimators value, the maximum R2 value of the model was obtained (R2 = 0.861), and the model has a remarkable ability to predict the IF. Both Figs. 4 and 5 were drawn with the matplotlib library of Python.
The residual plot of the model under optimal conditions, using the GB algorithm after tuning n_estimators (n_estimators = 250). The closer the training and test data, the more accurately the output is predicted.
The prediction error plot of the model using the GB algorithm after tuning n_estimators (n_estimators = 250). The closer the best-fit line and the identity line, the more accurate the prediction; in the absence of errors, the two lines almost coincide.
Application of the model
The final criterion for evaluating the performance of the model is its applicability in predicting outputs whose real values are known. Therefore, the model was used to predict the IF for thirty different samples, and the results are shown in Table 4. In Table 4, the actual and predicted IF values are close, with only slight differences, showing the excellent performance of the model in predicting the IF with minimum error.
Conclusion
Using ML to predict the IF and determine optimal synthesis conditions reduces costs, improves accuracy, increases reproducibility, decreases human error, increases speed, saves time, eliminates the consumption of chemical reagents, and removes the need for devices and equipment. To create an accurate model, different algorithms (CART, KNN, SVR, RF, ET, and GB) together with pipelines were used. The ensemble algorithms predicted the IF more accurately than the non-linear regression algorithms, because they combine several weak outputs to achieve an accurate output. The minimum accuracy among the non-linear regression algorithms belonged to the SVR algorithm (R2 = 0.045, MAE = − 3.183, MSE = − 38.735), and the CART algorithm had the maximum prediction accuracy among them (R2 = 0.645, MAE = − 1.966, MSE = − 13.811). The results showed that the GB algorithm, after tuning the n_estimators hyperparameter, gave the maximum accuracy (R2 = 0.871, MAE = − 0.982, MSE = − 2.303). The change in model accuracy across algorithms depends on the algorithm type and the data distribution; other important factors are the correlation between the features and the output and the type of feature selection method used. The most important challenge researchers face in modeling is the preparation of literature or experimental data sets, so the only limitation of this work was the preparation of the experimental data set. It is worth noting that once these data have been produced in the laboratory and modeling has been performed carefully, there is no further need for practical work or the associated expenditure of time and money, and the resulting models can be used for different samples and different syntheses.
In fact, using this model, the IF of different template molecules can be predicted with high accuracy and minimum error. Although the use of GB algorithms in chemistry prediction problems is still new, the results presented in this study are very encouraging.
Data availability
The data and materials that support the findings of this study are available from the corresponding author, upon reasonable request.
References
Cerezo, M., Verdon, G., Huang, H. Y., Cincio, L. & Coles, P. J. Challenges and opportunities in quantum machine learning. Nat. Comput. Sci. 2(9), 567–576. https://doi.org/10.1038/s43588-022-00311-3 (2022).
Panteleev, J., Gao, H. & Jia, L. Recent applications of machine learning in medicinal chemistry. Bioorganic Med. Chem. Lett. 28(17), 2807–2815. https://doi.org/10.1016/j.bmcl.2018.06.046 (2018).
Paruzzo, F. M. et al. Chemical shifts in molecular solids by machine learning. Nature Commun. 9(1), 1–10. https://doi.org/10.1038/s41467-018-06972-x (2018).
Willard, J., Jia, X., Xu, S., Steinbach, M., & Kumar, V. Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919, 1(1), 1–34 (2020). https://doi.org/10.1145/1122445.1122456.
Merembayev, T., Yunussov, R., & Yedilkhan, A. Machine learning algorithms for classification geology data from well logging. In 2018 14th International Conference on Electronics Computer and Computation (ICECCO) (pp. 206–212). IEEE (2018). https://doi.org/10.1109/ICECCO.2018.8634775.
Kolluri, S., Lin, J., Liu, R., Zhang, Y. & Zhang, W. Machine learning and artificial intelligence in pharmaceutical research and development: A review. AAPS J. 24(1), 1–10. https://doi.org/10.1208/s12248-021-00644-3 (2020).
Sidey-Gibbons, J. A. & Sidey-Gibbons, C. J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 19(1), 1–18. https://doi.org/10.1186/s12874-019-0681-4 (2019).
Khan, A. I. & Al-Habsi, S. Machine learning in computer vision. Procedia Comput. Sci. 167, 1444–1451. https://doi.org/10.1016/j.procs.2020.03.355 (2020).
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17. https://doi.org/10.1016/j.csbj.2014.11.005 (2015).
Ballard, Z., Brown, C., Madni, A. M. & Ozcan, A. Machine learning and computation-enabled intelligent sensor design. Nat. Mach. Intell. 3(7), 556–565. https://doi.org/10.1038/s42256-021-00360-9 (2021).
Ferraro, M. et al. Multi-target dopamine D3 receptor modulators: Actionable knowledge for drug design from molecular dynamics and machine learning. Eur. J. Med. Chem. 188, 111975–112016. https://doi.org/10.1016/j.ejmech.2019.111975 (2020).
Gonzalez-Cava, J. M., Arnay, R., Méndez Pérez, J. A., León, A., Martín, M., Jove-Perez, E. & Cos Juez, F. J. D., A machine learning based system for analgesic drug delivery. In International Joint Conference SOCO’17-CISIS’17-ICEUTE’17 León, Spain, September 6–8, 2017, Proceeding (pp. 461–470). Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67180-2_45.
Lavallin, A. & Downs, J. A. Machine learning in geography–past, present, and future. Geogr. Compass 15(5), e12563. https://doi.org/10.1111/gec3.12563 (2021).
Haupt, S. E., Cowie, J., Linden, S., McCandless, T., Kosovic, B. & Alessandrini, S. Machine learning for applied weather prediction. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 276–277). IEEE (2018). https://doi.org/10.1109/eScience.2018.00047.
Gormley, A. J. & Webb, M. A. Machine learning in combinatorial polymer chemistry. Nat. Rev. Mater. 6(8), 642–644. https://doi.org/10.1038/s41578-021-00282-3 (2021).
Ata, A., Khan, M. A., Abbas, S., Ahmad, G. & Fatima, A. Modelling smart road traffic congestion control system using machine learning techniques. Neural Netw. World 29(2), 99. https://doi.org/10.1431/NNW.2019.29.008 (2019).
Pacheco, F., Exposito, E., Gineste, M., Baudoin, C. & Aguilar, J. Towards the deployment of machine learning solutions in network traffic classification: A systematic survey. IEEE Commun. Surv. Tutor. 21(2), 1988–2014. https://doi.org/10.1109/COMST.2018.2883147 (2018).
Beyazit, S., Bui, B. T. S., Haupt, K. & Gonzato, C. Molecularly imprinted polymer nanomaterials and nanocomposites by controlled/living radical polymerization. Prog. Polym. Sci. 62, 1–21. https://doi.org/10.1016/j.progpolymsci.2016.04.001 (2016).
Huang, D. L. et al. Application of molecularly imprinted polymers in wastewater treatment: A review. Environ. Sci. Pollut. Res. 22(2), 963–977. https://doi.org/10.1007/s11356-014-3599-8 (2015).
Dong, C. et al. Molecularly imprinted polymers by the surface imprinting technique. Eur. Polym. J. 145, 110231. https://doi.org/10.1016/j.eurpolymj.2020.110231 (2021).
Wang, M., Cetó, X. & Del Valle, M. A sensor array based on molecularly imprinted polymers and machine learning for the analysis of fluoroquinolone antibiotics. ACS Sens. 7(11), 3318–3325. https://doi.org/10.1021/acssensors.2c01260 (2022).
Zhou, Z. H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5(1), 44–53. https://doi.org/10.1093/nsr/nwx106 (2018).
Alexopoulos, K., Nikolakis, N. & Chryssolouris, G. Digital twin-driven supervised machine learning for the development of artificial intelligence applications in manufacturing. Int. J. Comput. Integ. Manuf. 33(5), 429–439. https://doi.org/10.1080/0951192X.2020.1747642 (2020).
Rong, S., & Bao-Wen, Z. The research of regression model in machine learning field. In MATEC Web of Conferences (Vol. 176, p. 01033). EDP Sciences (2018). https://doi.org/10.1051/matecconf/201817601033
Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., & Li, B. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP) (pp. 19–35). IEEE (2018).https://doi.org/10.1109/SP.2018.00057.
Jha, S. K. & Hayashi, K. A novel odor filtering and sensing system combined with regression analysis for chemical vapor quantification. Sens. Actuators, B Chem. 200, 269–287. https://doi.org/10.1016/j.snb.2014.04.022 (2014).
Wang, Z., Chen, W., Gu, S., Wang, J. & Wang, Y. Discrimination of wood borers infested Platycladus orientalis trunks using quartz crystal microbalance gas sensor array. Sens. Actuators B Chem. 309, 127767. https://doi.org/10.1016/j.snb.2020.127767 (2020).
Van de Reydt, E., Maron, N., Saunderson, J., Boley, M., & Junkers, T. Machine-learning based prediction of kinetic rate coefficients in radical polymerization. (2022). https://doi.org/10.26434/chemrxiv-2022-v5nz8.
Mostafa, A. E. et al. Computer-aided design of eco-friendly imprinted polymer decorated sensors augmented by self-validated ensemble modeling designs for the quantitation of drotaverine hydrochloride in dosage form and human plasma. J. AOAC Int. https://doi.org/10.1093/jaoacint/qsad049 (2023).
Krokidis, M. G. et al. A sensor-based perspective in early-stage parkinson’s disease: Current state and the need for machine learning processes. Sensors 22(2), 409. https://doi.org/10.3390/s22020409 (2022).
Ferreira, A. J. & Figueiredo, M. A. Boosting algorithms: A review of methods, theory, and applications. Ensemble Mach. Learn. https://doi.org/10.1007/978-1-4419-9326-7_2 (2012).
Liu, C. et al. Random forest algorithm-enhanced dual-emission molecularly imprinted fluorescence sensing method for rapid detection of pretilachlor in fish and water samples. J. Hazard. Mater. 439, 129591. https://doi.org/10.1016/j.jhazmat.2022.129591 (2022).
Ahmadpour, H. & Hosseini, S. M. M. A solid-phase luminescence sensor based on molecularly imprinted polymer-CdSeS/ZnS quantum dots for selective extraction and detection of sulfasalazine in biological samples. Talanta 194, 534–541. https://doi.org/10.1016/j.talanta.2018.10.053 (2019).
Panahi, Y., Motaharian, A., Hosseini, M. R. M. & Mehrpour, O. High sensitive and selective nano-molecularly imprinted polymer based electrochemical sensor for midazolam drug detection in pharmaceutical formulation and human urine samples. Sens. Actuators, B Chem. 273, 1579–1586. https://doi.org/10.1016/j.snb.2018.07.069 (2018).
Acknowledgements
The authors thank the real sample analysis laboratory for providing the conditions for the synthesis of MIPs and experimental results.
Author information
Authors and Affiliations
Contributions
B.Y.: Carried out the experiment, wrote the manuscript, developed the theory and performed the computations, and contributed to sample preparation. S.M.-R.M.H.: Contributed to the final version of the manuscript, designed and directed the project, and contributed to sample preparation. S.M.H.: Conceived and planned the experiments, designed and directed the project, and contributed to interpreting the results. The authors give their consent to the publication of identifiable details, which can include photographs, plots, and details within the text (Materials, synthesis method, and the result of using feature selection methods) to be published in the above Journal and Article. I confirm that I have seen and been allowed to read both the Material and the Article to be published.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yarahmadi, B., Hashemianzadeh, S.M. & Milani Hosseini, S.MR. Machine-learning-based predictions of imprinting quality using ensemble and non-linear regression algorithms. Sci Rep 13, 12111 (2023). https://doi.org/10.1038/s41598-023-39374-1