Machine learning models for accurately predicting properties of CsPbCl3 Perovskite quantum dots

Çadırcı, Mehmet Sıddık; Çadırcı, Musa

doi:10.1038/s41598-025-08110-2


Article
Open access
Published: 22 August 2025


Scientific Reports volume 15, Article number: 30924 (2025)


Abstract

Perovskite Quantum Dots (PQDs) have a promising future for several applications due to their unique properties. This study investigates the effectiveness of Machine Learning (ML) in predicting the size, absorbance (1S abs) and photoluminescence (PL) properties of CsPbCl₃ PQDs using synthesis features as the input dataset. The study employed the ML models Support Vector Regression (SVR), Nearest Neighbour Distance (NND), Random Forest (RF), Gradient Boosting Machine (GBM), Decision Tree (DT) and Deep Learning (DL). Although all models produced highly accurate results, SVR and NND gave the most accurate property predictions, achieving excellent performance on the test and training datasets, with high R², low Root Mean Squared Error (RMSE) and low Mean Absolute Error (MAE) values. As ML continues to mature, its ability to capture relationships in the QD field could prove invaluable in shaping the future of nanomaterial design.


Introduction

Within computer science, Artificial Intelligence (AI) refers to the study of designing intelligent machines that can sense their environment and take appropriate actions accordingly¹. Machine Learning (ML) is a branch of AI that employs algorithms to build mathematical models directly from data, solving specific problems without applying the physical principles that generated the data. This approach is especially valuable when the connection between the input variables and the outcomes of a study is not well understood. Widespread access to computational power and the increasing amount of experimental data have led to the emergence of advanced ML models and their application in different areas of science and industry²,³. Because it can evaluate massive amounts of data, ML is well placed to predict the properties of quantum dots (QDs) with high accuracy. Since QD properties depend strongly on size and composition⁴, ML algorithms are appropriate tools for handling the data and capturing the interactions between the input variables and the resultant properties. Moreover, ML can improve the synthesis procedure of QDs to yield desired characteristics without running expensive tests and complex, time-consuming simulations⁵. ML can also unearth hidden patterns in data that help scientists understand new mechanisms and links in QDs⁶,⁷. Predicting the properties of QDs under diverse conditions using ML is important in materials design for specific applications⁵. Although material properties can be predicted using traditional computational approaches (TCAs) such as density functional theory (DFT), these are expensive, time-consuming and less accurate for complex data systems. Newly developed ML models are considered an alternative to TCAs that overcomes these difficulties. For example, it has been reported that ML models can achieve highly accurate property predictions that are unachievable with DFT⁸.

All-inorganic metal halide Perovskite Quantum Dots (PQDs) have shown great promise due to their unique optical and electronic properties. They are cubic, range in size from approximately 3 nm to 15 nm, and exhibit size- and composition-dependent properties. PQDs therefore offer wide-ranging band-gap tunability and property-tailoring capabilities. Compared to their counterparts, PQDs have high photoluminescence quantum yield, narrow emission linewidth, higher stability, higher charge mobility, and longer diffusion length⁹,¹⁰,¹¹,¹²,¹³. These properties make them valuable in a wide range of applications, including solar cells¹⁴, lasers¹³, LEDs¹⁵, and medical imaging¹⁶. PQDs are produced by colloidal synthesis at high temperatures, where the temperature, the reaction time, and the amounts of the substances used strongly affect the resulting properties. Therefore, to obtain PQDs with the anticipated optical and electronic properties, the experimental conditions must be designed carefully, with full knowledge of every parameter’s effect. This process is usually carried out by trial and error, which is time-consuming, costly and labour-intensive. In this context, ML is a powerful tool for predicting the precise experimental conditions for synthesising PQDs with the desired properties. It has been shown that ML enables researchers to extract valuable insights from large datasets, such as forecasting various chemical and physical features of materials and finding intricate mathematical relationships within empirical data¹⁷. Recent developments in ML have considerably improved the capability to predict the properties of QDs. For example, several ML algorithms have been used to predict the properties of CdSe, CdS, PbS, PbSe, and ZnSe QDs¹⁸. 
In a different study, researchers applied predictive ML models to predict the absorption, emission, and diameter of InP QDs with minimal input data, enabling quicker material discovery in nanotechnology¹⁹. Additionally, the band gap of ZnO QDs was estimated using supervised ML models²⁰. Furthermore, ML helped researchers discover a new shape of CdSe/CdS core/shell QDs, a vivid example of how ML paves the way to realising novel properties²¹. Reports on ML applied to QDs make up only 0.1 % of all ML applications in materials science²². Despite the major progress in recent years and several landmark studies applying ML to PQDs (e.g.,²³,²⁴,²⁵), comprehensive ML-centred research specifically targeting the optical properties of $\mathrm {CsPbCl_3}$ PQDs has remained limited. Our work aims to help fill this gap through a systematic comparison of multiple ML approaches on this material. We chose $\mathrm {CsPbCl_3}$ PQDs over $\mathrm {CsPbBr_3}$ and $\mathrm {CsPbI_3}$ PQDs because these nanostructures emit in the blue to violet wavelength range, which is crucial for applications requiring short-wavelength light sources. In addition, unlike $\mathrm {CsPbBr_3}$²⁶ and $\mathrm {CsPbI_3}$ PQDs²⁷, there is no ML research on $\mathrm {CsPbCl_3}$ PQDs. Finally, it should be noted that $\mathrm {CsPbCl_3}$ PQDs have superior environmental stability compared to $\mathrm {CsPbBr_3}$ and $\mathrm {CsPbI_3}$ PQDs²⁸.

In this work, we applied several ML algorithms to predict the output properties of $\textrm{CsPbCl}_3$ PQDs synthesized via the hot-injection method. Based on the predicted photoluminescence, absorption and size properties of $\textrm{CsPbCl}_3$ PQDs, we compared the performance of the SVR, NND, DL, DT, RF, and GBM models. The paper is organised as follows. The ’Methodology’ section briefly explains how the data were acquired and pre-processed before being used by the ML algorithms to predict the properties of $\textrm{CsPbCl}_3$ PQDs. It also covers the literature and theoretical background of the six ML methods, along with the evaluation metrics, namely $\textrm{R}^2$, RMSE, and MAE, used to determine how well the models perform. The ’Results and discussion’ section presents our findings, including the performance details of each model and insights derived from the predictive analyses. The ’Conclusion’ section summarises the study’s conclusions.

Methodology

Data description

We initiated our research by thoroughly analysing existing literature to compile a comprehensive database of hot-injection synthesis parameters for $\textrm{CsPbCl}_3$ PQDs. The data were collected from 59 peer-reviewed articles, listed and cited in Table S2 (Supporting Information). Once the papers had been selected, the relevant synthesis parameters and the corresponding output properties were extracted manually. The following parameters are used as the independent input variables for training the algorithms: the injection temperature, the chloride (Cl) source, the amount of Cl in millimoles (mmol), the lead (Pb) source, the amount of Pb in mmol, the cesium (Cs) source, the amount of Cs in mmol, the Cs-to-Pb molar ratio, and the Cl-to-Pb molar ratio. In addition, the volumes of octadecene (ODE), oleic acid (OA), and oleylamine (OLA) in millilitres (ml), along with the total ligand volume (OA+OLA) in ml, are included as input parameters. The ratios of Cl amount to total ligand volume and of Pb amount to total ligand volume are also input features. The output target parameters are the PQD size in nanometres (nm), the 1S abs peak in nm and the PL in nm. 1S abs refers to the first excitonic absorption peak, corresponding to the lowest-energy optical transition in PQDs, whereas PL represents the radiative emission that occurs between the lowest conduction and highest valence energy bands in PQDs. We organised the collected data in tabular form, with each variable parameter in its own column and every outcome in its own row. As provided in the Supporting Information, 708 data points (531 input, 177 output) were used for the predictions. This amount of data is sufficient to accurately predict the properties of nanocrystals using ML¹⁹,²⁴. This well-organised collection of records made model training and data management more efficient. 
We assume that the input features are independent throughout the modelling process and that the chosen ML models sufficiently capture the fundamental associations between input and target variables. Table S1 in the Supporting Information lists the preparation stages used to enhance the quality and applicability of the data for the ML models. To ensure the dataset’s reliability, missing values were filled by median imputation and outliers were removed by residual analysis. We employed basic regression models to estimate residuals and applied a z-score threshold: data points whose residuals lay more than 3 standard deviations from the mean (z-score > 3 or < −3) were classified as outliers and removed from the training data. This avoided skewed learning due to extreme or inconsistent synthesis results. Additionally, we employed principal component analysis (PCA) to speed up computation while retaining roughly $95\%$ of the variance. Polynomial and logarithmic transformations were used in feature engineering to address skewness and preserve relationships within the dataset.
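The imputation and residual-based outlier steps described above can be illustrated with a minimal sketch. It is not the authors' code: the single-feature least-squares fit stands in for the unspecified "basic regression models", and the function names `impute_median`, `fit_line` and `residual_outlier_mask` are hypothetical.

```python
import statistics


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept (assumed baseline model)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_outlier_mask(x, y, threshold=3.0):
    """Return True for points whose residual z-score lies within +/- threshold."""
    slope, intercept = fit_line(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    if sigma == 0:
        # All residuals identical: nothing can be flagged as an outlier.
        return [True] * len(residuals)
    return [abs((r - mu) / sigma) <= threshold for r in residuals]


# Usage: keep only the rows that pass the z-score filter.
temperature = [float(t) for t in range(20)]   # illustrative feature values
size = [2.0 * t + 1.0 for t in temperature]   # illustrative target values
size[10] += 50.0                              # one inconsistent record
keep = residual_outlier_mask(temperature, size)
clean = [(t, s) for t, s, k in zip(temperature, size, keep) if k]
```

With a population standard deviation, a single point in a sample of n values can reach a z-score of at most the square root of n − 1, so a threshold of 3 can only flag points when more than ten records are available.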

Metrics and machine learning methods

The dataset was separated into training and testing sets according to a hierarchical clustering framework, rather than reusing the same samples repeatedly, to avoid memorisation or overfitting that would hinder generalisation to new data. We evaluated six regression methods suitable for small datasets: SVR, NND, DL, DT, RF and GBM. All of these algorithms were built using the scikit-learn library. We used both random and stratified sampling to ensure representative samples for testing and training. The data were partitioned into a training set containing 80% of the examples and a testing set containing the remaining 20%. Hyperparameters were tuned by grid search. We evaluated model performance by computing the $\textrm{R}^2$, MAE and RMSE metrics. MAE is the average magnitude of the prediction errors, expressed in the same units as the target; lower values correspond to higher predictive accuracy. RMSE is the square root of the mean squared difference between the predicted and observed values, and it penalises large errors more heavily than MAE. An RMSE of zero means that the model reproduces the observed values exactly. The coefficient of determination, denoted $\textrm{R}^2$, quantifies the degree to which the model accurately represents the data, with values closer to 1 indicating higher accuracy. These metrics are defined mathematically as follows:

$$\begin{aligned} & \text {RMSE} = \sqrt{\frac{1}{N} \sum _{i=1}^{N} (w_i - {\hat{w}}_i)^2} \end{aligned}$$

(1)

$$\begin{aligned} & \text {MAE} = \frac{1}{N} \sum _{i=1}^{N} |w_i - {\hat{w}}_i| \end{aligned}$$

(2)

$$\begin{aligned} & R^2 = 1 - \frac{\sum _{i=1}^{N} (w_i - {\hat{w}}_i)^2}{\sum _{i=1}^{N} (w_i - {\bar{w}})^2} \end{aligned}$$

(3)

where $w_i$ represents the observed values, ${\hat{w}}_i$ denotes the predicted values, $N$ is the total number of predictions, and ${\bar{w}}$ is the mean of the observed values.
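Equations (1) to (3) translate directly into a few lines of code. The sketch below is a plain-Python illustration of the definitions, not the implementation used in the study; the function names are chosen here for readability.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, Eq. (1)."""
    n = len(observed)
    return math.sqrt(sum((w - w_hat) ** 2 for w, w_hat in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, Eq. (2)."""
    n = len(observed)
    return sum(abs(w - w_hat) for w, w_hat in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination, Eq. (3)."""
    mean_w = sum(observed) / len(observed)
    ss_res = sum((w - w_hat) ** 2 for w, w_hat in zip(observed, predicted))
    ss_tot = sum((w - mean_w) ** 2 for w in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

For these illustrative numbers the squared errors sum to 0.5 and the total sum of squares is 5, so RMSE is about 0.354, MAE is 0.25 and $R^2$ is 0.9.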

In data science, SVR is a type of regression model that captures complex relationships within a dataset by mapping the input data into a higher-dimensional space. SVR is particularly useful for high-dimensional datasets and non-linear relationships, although it can become computationally intensive for large datasets. The SVR model was created using the radial basis function (RBF) kernel with the scikit-learn Python module, and the hyperparameters were optimised using a grid search. SVR is well suited to forecasting the properties of QDs since it can describe non-linear dependencies between factors such as QD size, composition, and the resulting attributes. Because these factors are closely tied to QD properties, ML algorithms like SVR can capture these complicated correlations and reveal interactions between the properties of QDs and the input variables.

The SVR model for forecasting the characteristics of nanomaterials is written as:

$$\begin{aligned} f(w) = \sum _{i=1}^N (a_i - a_i^*) K(w_i, w) + b \end{aligned}$$

(4)

where $w$ stands for the input features, $w_i$ for the support vectors, $a_i, a_i^*$ for the Lagrange multipliers, $K(w_i, w)$ for the kernel function, and $b$ for the bias term. The model forecasts characteristics such as band gaps and surface areas by employing kernels such as RBF or polynomial to capture non-linear correlations between features. This method works very well with small datasets in nanomaterials research²⁹.
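To make Eq. (4) concrete, the following sketch evaluates the SVR decision function for an RBF kernel, given support vectors and multipliers that are assumed to be already known. It covers the prediction step only, not the training procedure; the `gamma` value and all numbers are illustrative assumptions.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(w, support_vectors, dual_coefs, bias, gamma=0.5):
    """Evaluate Eq. (4): f(w) = sum_i (a_i - a_i^*) K(w_i, w) + b.

    dual_coefs[i] holds the difference (a_i - a_i^*) for support vector w_i.
    """
    return sum(
        coef * rbf_kernel(sv, w, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Illustrative support vectors and coefficients (not fitted to real data).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [1.0, -0.5]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias=2.0)
```

Because the kernel decays with distance, each support vector influences only the predictions for nearby inputs, which is how the model represents non-linear trends.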

NND is an important concept in spatial analysis and machine learning. More precisely, it plays a significant role in pattern recognition and classification algorithms, including the k-Nearest Neighbour (k-NN) method. NND is defined as the shortest distance separating a point from any other point in the dataset. According to Ref.³⁰, NND has been applied in foundational concepts of computational geometry, the theoretical analysis of particle systems, and the convergence analysis of statistical estimators. The NND model was also implemented with the Python scikit-learn library. The NND algorithm provides a robust basis for understanding the fundamental mechanisms of nanomaterials³¹. Therefore, this algorithm can precisely predict properties such as the PL, 1S abs, and size of QDs.

For a collection of nanoparticles $N = \{w_1, w_2, \ldots , w_n\}$ in ${\mathbb {R}}^m$, the NND is defined as :

$$\begin{aligned} \rho _k(w_i, N) = \min _{j \ne i} ||w_i - w_j|| \end{aligned}$$

(5)

where the Euclidean distance between $w_i$ and $w_j$ is represented by $||w_i - w_j||$. The expression for the mean $k$-th NND is:

$$\begin{aligned} H_{N,k} = \frac{1}{N} \sum _{w_i \in N} \log \frac{\rho _k(w_i, N) V_m e^{\psi (k)}}{f(w_i)} \end{aligned}$$

(6)

where $f(w_i)$ is the local density, $\psi (k)$ is a scaling function, and $V_m$ is the volume of the $m$-dimensional ball. This measure offers information about the spatial distribution and configuration of nanoparticles, which is essential for comprehending their chemical and physical characteristics³².
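As a small illustration of Eq. (5), the sketch below computes the nearest-neighbour distance of every point in a set by brute force. It covers only the distance definition, not the estimator in Eq. (6) or the regression model used in the study, and the function name is hypothetical.

```python
import math


def nearest_neighbour_distances(points):
    """For each point w_i, return min over j != i of ||w_i - w_j|| (Eq. 5)."""
    distances = []
    for i, p in enumerate(points):
        distances.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return distances


# Illustrative 2-D points (not data from the study).
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
print(nearest_neighbour_distances(points))  # [5.0, 1.0, 1.0]
```

The brute-force loop costs on the order of n² distance evaluations, which is acceptable for a dataset of a few hundred records; tree-based neighbour searches are the usual choice for larger sets.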

DTs are straightforward but robust models that are simple to understand and visualise. They can handle both numerical and categorical data, which makes them flexible for diverse datasets. Nevertheless, DTs are susceptible to overfitting, especially when the tree becomes too deep. The DT model was developed using Python’s scikit-learn module, and its parameters, including the maximum depth, were tuned by cross-validation. Integrating DT algorithms into QD design is valuable for predicting QD properties because they can handle complex datasets and identify an optimal combination of material properties³³.

RF is a machine learning algorithm that has become popular in recent years³⁴. It is widely regarded as one of the best machine learning algorithms because it can handle a thousand variables without compromising accuracy, is fast, is simple to implement, and has high prediction accuracy³⁵. It delivers high-level prediction performance with little tuning and is therefore regarded as the most appropriate out-of-the-box classification and regression algorithm³⁶. The RF model was implemented with Python’s scikit-learn module. Five hundred trees were used to train the model, and cross-validation was used to optimise max_features, the number of features considered at each split. For regression, RF trains each tree independently on a different bootstrap sample drawn with replacement from the training data. Once trained, the RF prediction for an unknown sample $w$ is computed as the average of the predictions from the individual trees, as given by the following equation:

$$\begin{aligned} {\hat{f}}(w) = \frac{1}{N} \sum _{n=1}^{N} f_n(w) \end{aligned}$$

(7)

where: $N$ is the number of decision trees in the random forest, and $f_n(w)$ is the prediction from the $n$-th tree for input $w$. This ensemble approach reduces the variance of individual decision trees, making the random forest a more robust model for regression.
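The bootstrap-and-average scheme behind Eq. (7) can be sketched as follows. To keep the example short, each "tree" is a deliberately trivial stand-in that predicts the mean target of its bootstrap sample (a tree with a single leaf); real forests grow full trees with feature subsampling. All function names here are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) records with replacement from the training rows."""
    return [rng.choice(rows) for _ in rows]


def fit_mean_leaf(sample):
    """Stand-in for a tree: a single leaf predicting the sample's mean target."""
    mean_target = sum(target for _, target in sample) / len(sample)
    return lambda w: mean_target


def fit_forest(rows, n_trees=500, seed=0):
    """Fit each learner independently on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_mean_leaf(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def forest_predict(trees, w):
    """Eq. (7): average the predictions f_n(w) over all N trees."""
    return sum(tree(w) for tree in trees) / len(trees)


# Illustrative (features, target) records, not data from the study.
rows = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
forest = fit_forest(rows, n_trees=500, seed=0)
estimate = forest_predict(forest, (2.0,))
```

Averaging many learners fitted on different resamples is what reduces the variance relative to a single tree; the individual learners can be swapped for any regressor exposing the same call signature.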

GBM is another powerful machine learning technique that combines many weak learners into a strong one. This technique is efficient in many classification tasks³⁷. It is also known for its high predictive accuracy and its effectiveness in handling complicated interactions in the data. However, one drawback of GBM is that it tends to overfit when not well tuned. The GBM model was trained in Python with the scikit-learn library. Critical hyperparameters such as the learning rate, the number of boosting rounds, and max_depth were optimised by cross-validation.

DL models, particularly neural networks, have the potential to learn from examples, much as humans do. These networks do not require task-specific algorithms and can approximate any non-linear transformation; hence, they can be used to map inputs to outputs for intricate systems³⁸. Nonetheless, older model architectures suffer from several problems, including imbalance within the dataset, which leads to memorisation rather than generalisation, redundancy in feature extraction, and neglect of cross-layer feature interactions³⁹. We used the scikit-learn library in Python to train our DL model.

Results and discussion

The numerical simulations performed in this study have given a more detailed account of how well machine learning techniques could forecast the size, 1S abs and PL for $\textrm{CsPbCl}_3$ PQDs. The employed models were SVR, NND, DT, RF, GBM and DL. Standard measures like RMSE, MAE and $\textrm{R}^2$ were used to assess the models’ performance levels for training and testing data sets.

Initially, to identify the distribution, outliers and medians of the data collected from the papers, we generated boxplots for the three output properties, as shown in Fig. 1. The median size of the PQDs is around 9.5 nm, whereas the 1S abs and PL values range between 395 and 402 nm and between 405 and 412 nm, respectively. Next, to compare the actual values with those predicted by the ML algorithms, we produced scatter plots for all models and the three output properties of $\textrm{CsPbCl}_3$ PQDs. Figure 2 compares the DT model predictions against the actual values of the size, 1S abs and PL outputs and shows how accurately the model performs: the predicted and observed values for the three output properties almost overlap. Test-data RMSE values as low as 0.23, 0.19 and 0.16 are obtained for size, 1S abs and PL, respectively, with this algorithm. By comparison, the test-data MAE values reported for the previously studied InP QDs¹⁹ were 0.33 (size), 20.29 (1S abs), and 11.46 (PL). This demonstrates significant improvements with our current $\textrm{CsPbCl}_3$ PQD results. Our ML approach achieves robust predictive performance for the investigated material system, as demonstrated by this direct comparison. Similar scatter plots for the other ML algorithms are given in Figs. S1, S2, S3, S4, and S5 in the Supporting Information. These models yielded similar prediction performance for the properties of $\textrm{CsPbCl}_3$ PQDs. All these findings suggest that ML algorithms are powerful means of accurately estimating PQD properties.

Feature importance charts are effective tools for understanding ML models and their estimates, as they improve interpretability, rank input features, and provide feature-engineering insights. The importance of the input features for predicting 1S abs using the SVR algorithm is shown in Fig. 3. The amounts of Cs and OA are the most significant features. In contrast, the amounts of Cl and ODE are the least important input features for accurately estimating this property. For the size and PL outputs, the amounts of Pb and Cs are the most significant input parameters, respectively (see Figs. S8 and S9). The ML models highlight the importance of the Cs ratio for CsPbCl₃ PQDs, and the results agree with experimental data reported for different types of perovskite materials⁴⁰,⁴¹,⁴². The Cs ratio is a critical parameter that significantly changes the optical and electronic properties of perovskite materials by altering their morphology and crystal structure, and hence their stability.

Table 1 Comparison of performance metrics values for size, 1S abs, and PL using all ML methods.


To assess how closely training and test performance agree, we compared the $\textrm{R}^2$, RMSE, and MAE metrics obtained from the test and training data for the 1S abs target output for all employed algorithms, as shown in Fig. 4. The training and testing data appear in different parts of the bar plots. Overall, SVR and NND perform better than the other algorithms on the training data, whereas the SVR and DT models outperform their rivals on the test data. The metric comparisons for the size and PL outputs of the $\textrm{CsPbCl}_3$ PQDs can be seen in Figs. S6 and S7 in the Supporting Information.

Table 1 compiles the metric ($\textrm{R}^2$, RMSE, and MAE) values for the training and test datasets for the three target parameters of $\textrm{CsPbCl}_3$ PQDs using all ML models. All the ML algorithms used in this study provide high accuracy for predicting the target characteristics. Thanks to the stratified sampling used in the comparisons in Table 1, the proportions of the values of each variable (size, 1S abs, and PL) are maintained across the training and testing datasets. This strategy was selected to preserve balanced representation and enhance the reliability of the performance indicators for every ML technique. The SVR model yields the highest $\textrm{R}^2$ values for the three target features on the test data compared with the other models, indicating that it is the most accurate prediction method. For this model, the RMSE and MAE values on the training data are as low as 0.009, among the best of all prediction models. These observations agree with the results in Fig. S5 for the three outputs, where the predicted and observed values are well correlated. The NND model is found to be as accurate as the SVR model, with nearly identical metric values. The comparable performance of the NND and SVR models can be attributed to their respective capacities to represent the underlying relationships in the dataset. When the data structure is well suited to nearest-neighbour approximations, the NND model’s reliance on local spatial relationships may replicate SVR’s non-linear regression capabilities. The close agreement of the data in Fig. S1 for the size, 1S abs, and PL properties of the PQDs also confirms the accuracy of the NND model.

Conversely, the DL and RF models appear to be the least accurate methods for predicting the properties of $\textrm{CsPbCl}_3$ PQDs. For example, the RMSE and MAE values on the test data for the size feature are 0.74 and 0.56, respectively, roughly 2 and 3 times higher than those of the SVR model. Although the prediction performance of RF and DL for the size feature is inferior to that of the other models in this study, it is marginally better than that reported for different QDs in the literature¹⁸. The metric values for all target parameters obtained from the GBM and DT models indicate a moderate performance, between the SVR-NND and DL-RF pairs. This intermediate performance is consistent with their ability to model non-linear relationships while lacking the complexity of DL and RF or the fine-tuned regularisation of SVR. These two models showed better prediction performance when used for PL output estimation.

Figure 5 shows the Pearson correlation heatmap of the correlations between the input and output parameters in the $\textrm{CsPbCl}_3$ PQD dataset. A Pearson correlation heatmap is a graphical representation of the Pearson correlation coefficients (ranging between −1 and +1) among the variables in a dataset. The heatmap displays a colour-coded matrix, with warmer colours indicating stronger positive correlations, cooler colours representing stronger negative correlations, and neutral colours signifying no link between variables⁴³. 1S abs and PL have a positive correlation of 0.66, whereas the correlations between size and PL and between size and 1S abs are considerably weaker. Although increasing the size of the dataset facilitates the statistical correlation of observations, this may not be sufficient by itself. The Abs-PL relationship in $\mathrm {CsPbCl_3}$ PQDs is also determined by other factors such as exciton recombination mechanisms, trap-state dynamics or ligand-related effects⁴⁴,⁴⁵. In general, a strong correlation between size and PL in QDs is expected due to quantum confinement. However, PL in PQDs is also determined by the mechanisms described above, which might explain the weak (R $\approx$ 0.081) Pearson correlation between size and PL observed in Fig. 5.
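For reference, each cell of such a heatmap is a Pearson coefficient, which can be computed as in the sketch below. This is a plain-Python illustration with made-up numbers, not the code used to produce Fig. 5.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: a perfectly linear pair and a weaker one.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # linear relation, coefficient of 1
print(pearson_r([1, 2, 3], [1, 3, 2]))        # partial agreement, coefficient of 0.5
```

The coefficient measures linear association only, so a value near zero, such as the one reported for size and PL, does not rule out a non-linear dependence between the two quantities.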

Conclusion

This study aimed to predict the size, 1S abs and PL properties of $\textrm{CsPbCl}_3$ PQDs by comparing the performance of the SVR, NND, GBM, RF, DT and DL algorithms. Nearly all models achieved promising outcomes in predicting the outputs of PQDs. Among them, the SVR and NND models showed the best performance, as they make accurate predictions and give insights into the factors that affect QD properties. The SVR and NND models achieve RMSE values of 0.009 and 0.012 on the training data, and 0.34 and 0.47 on the test data, respectively. The predictions are close to the actual data, which indicates that the employed ML models can predict the properties of $\textrm{CsPbCl}_3$ PQDs with high accuracy. Looking ahead, the results suggest that advances in ML can contribute significantly to QD design, resulting in QDs tailored to specific properties.

This study has certain limitations that should be noted, even though it shows how ML models might be used to forecast the characteristics of $\textrm{CsPbCl}_3$ QDs. First, even though the dataset used in this work is extensive, it might not reflect the full variability of experimental conditions, particularly rare or edge cases. This constraint may limit the models’ applicability to different synthesis conditions or material systems. Second, the method is based on a supervised learning framework, which presumes that high-quality, labelled data are available. In real-world situations, obtaining such data can be costly and time-consuming, which could restrict the use of these techniques in settings with limited resources. To get around this restriction, future research could investigate unsupervised or semi-supervised learning strategies. Third, the Pearson correlation heatmap reveals only linear relationships and so may not adequately capture the complex, non-linear connections between synthesis factors and material properties. Although ML models can identify such complex patterns, additional theoretical or experimental validation is required to confirm the physical relevance and reliability of the findings. Lastly, because the study is limited to $\textrm{CsPbCl}_3$ QDs, the models’ performance may differ when applied to other nanomaterials with different physical or chemical characteristics. The results would be strengthened by expanding the methodology to a wider variety of materials and evaluating the models’ robustness in various situations²³,²⁴,²⁵. The ML models used in this study have great potential for guiding the synthesis of CsPbCl₃ PQDs. However, the current models still face several restrictions in accurately predicting PQD properties. These restrictions are the complexity of the synthesis conditions (dependence on temperature, chemical inputs, time, etc.)⁵, the need for large, high-quality datasets, and the lack of ML algorithm interpretability⁴⁶. Given recent theoretical studies exploring generalisable ML frameworks across perovskite quantum dot (PQD) compositions, future work could benefit from integrating these established universal modelling strategies with expanded experimental datasets to enhance predictive performance and generalisability²⁵.

Data availability

The research data and codes supporting the findings of this study are publicly available at GitHub: https://github.com/mehmetsiddik/Machine-Learning-Models-CsPbCI3_QDs.git. This repository also contains supporting results of the manuscript. For any further assistance regarding the data, please contact the corresponding author.

References

Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach (Pearson, 2016).
Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26(6), 900–908 (2020).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706–710 (2020).
Gündoğdu, Y., Kılıç, H. Ş & Çadırcı, M. Third order nonlinear optical properties of CdTe/CdSe quasi type-II colloidal quantum dots. Opt. Mater. 114, 110956 (2021).
Tao, H. et al. Nanoparticle synthesis assisted by machine learning. Nat. Rev. Mater. 6(8), 701–716 (2021).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 361(6400), 360–365 (2018).
Yao, Z. et al. Inverse design of nanoporous crystalline reticular materials with deep generative models. Nat. Mach. Intell. 3(1), 76–86 (2021).
Wang, C., Aykol, M. & Mueller, T. Nature of the amorphous-amorphous interfaces in solid-state batteries revealed using machine-learned interatomic potentials. Chem. Mater. 35(16), 6346–6356 (2023).
Huang, H., Bodnarchuk, M. I., Kershaw, S. V., Kovalenko, M. V. & Rogach, A. L. Lead halide perovskite nanocrystals in the research spotlight: Stability and defect tolerance. ACS Energy Lett. 2(9), 2071–2083 (2017).
Yettapu, G. R. et al. Terahertz conductivity within colloidal CsPbBr3 perovskite nanocrystals: Remarkably high carrier mobilities and large diffusion lengths. Nano Lett. 16(8), 4838–4848 (2016).
Wu, K. et al. Ultrafast interfacial electron and hole transfer from CsPbBr3 perovskite quantum dots. J. Am. Chem. Soc. 137(40), 12792–12795 (2015).
De Roo, J. et al. Highly dynamic ligand binding and light absorption coefficient of cesium lead bromide perovskite nanocrystals. ACS Nano 10(2), 2071–2081 (2016).
Wang, Y. et al. All-inorganic colloidal perovskite quantum dots: A new class of lasing materials with favorable characteristics. Adv. Mater. 27(44), 7101–7108 (2015).
Zhao, Q. et al. High efficiency perovskite quantum dot solar cells with charge separating heterostructure. Nat. Commun. 10(1), 2842 (2019).
Li, Y.-F., Feng, J. & Sun, H.-B. Perovskite quantum dots for light-emitting devices. Nanoscale 11(41), 19119–19139 (2019).
Ryu, I. et al. In vivo plain x-ray imaging of cancer using perovskite quantum dot scintillators. Adv. Funct. Mater. 31(34), 2102334 (2021).
Lo, Y.-C., Rensi, S. E., Torng, W. & Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug Discov. Today 23(8), 1538–1546 (2018).
Baum, F., Pretto, T., Köche, A. & Santos, M. J. L. Machine learning tools to predict hot injection syntheses outcomes for II–VI and IV–VI quantum dots. J. Phys. Chem. C 124(44), 24298–24305 (2020).
Nguyen, H. A. et al. Predicting indium phosphide quantum dot properties from synthetic procedures using machine learning. Chem. Mater. 34(14), 6296–6311 (2022).
Regonia, P. R. et al. Predicting the band gap of ZnO quantum dots via supervised machine learning models. Optik 207, 164469 (2020).
Liu, R. et al. Causal inference machine learning leads original experimental discovery in CdSe/CdS core/shell nanoparticles. J. Phys. Chem. Lett. 11(17), 7232–7238 (2020).
Gulevich, D., Nabiev, I. & Samokhvalov, P. Machine learning-assisted colloidal synthesis: A review. Mater. Today Chem. 35, 101837 (2024).
Chen, G. et al. Machine learning-assisted microfluidic synthesis of perovskite quantum dots. Adv. Photon. Res. 4(1), 2200230 (2023).
Zhang, S. et al. Machine learning-driven fluorescent sensor array using aqueous CsPbBr3 perovskite quantum dots for rapid detection and sterilization of foodborne pathogens. J. Hazard. Mater. 483, 136655 (2025).
Chen, M. et al. Application of machine learning in perovskite materials and devices: A review. J. Energy Chem. 94, 254–272 (2024).
Xuan, W. et al. Machine learning-assisted sensor based on CsPbBr3@ZnO nanocrystals for identifying methanol in mixed environments. ACS Sens. 8(3), 1252–1260 (2023).
Liu, Y., Fang, W.-H. & Long, R. Significant impact of defect fluctuation on charge dynamics in CsPbI3: A study combining machine learning with quantum dynamics. J. Phys. Chem. Lett. 15(14), 3764–3771 (2024).
Song, S. et al. 4-chlorrate assisted in situ passivation of CsPbCl3 perovskite quantum dots with high water stability for light-emitting diode. ACS Appl. Nano Mater. 7(15), 17561–17568 (2024).
Okoye, P. C. et al. Modeling energy gap of doped tin (II) sulfide metal semiconductor nanocatalyst using genetic algorithm-based support vector regression. J. Nanomater. 2022(1), 8211023 (2022).
Liitiäinen, E., Lendasse, A. & Corona, F. Bounds on the mean power-weighted nearest neighbour distance. Proc. R. Soc. A Math. Phys. Eng. Sci. 464(2097), 2293–2301 (2008).
Miyagawa, M. Nearest neighbor distance in three-dimensional space. Forma 33(1), 7–11 (2018).
Cadirci, M. S., Evans, D., Leonenko, N. & Makogin, V. Entropy-based test for generalised Gaussian distributions. Comput. Stat. Data Anal. 173, 107502 (2022).
Gajewicz, A. et al. Decision tree models to classify nanomaterials according to the df4nanogrouping scheme. Nanotoxicology 12(1), 1–17 (2018).
Zhou, Y. & Qiu, G. Random forest for label ranking. Expert Syst. Appl. 112, 99–109 (2018).
Sari, N.N., Zain, I., Fithriasari, K., & Muhaimin, A. Br+ for addressing imbalanced multilabel data classification combined with resampling technique, in: AECon 2020: Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, Purwokerto, Indonesia, European Alliance for Innovation, 2021, p. 421.
Rhodes, J. S., Cutler, A. & Moon, K. R. Geometry-and accuracy-preserving random forest proximities. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2023.3263774 (2023).
Dong, M., Yao, L., Wang, X., Benatallah, B., & Zhang, S. Grcan: Gradient boost convolutional autoencoder with neural decision forest, arXiv preprint arXiv:1806.08079.
Zhao, N. & Lu, J. Review of neural network algorithm and its application in temperature control of distillation tower. J. Eng. Res. Rep. 20(4), 50–61 (2021).
Ma, X., Shan, J., Ning, F., Li, W. & Li, H. EFFNet: A skin cancer classification model based on feature fusion and random forests. PLoS One 18(10), e0293266 (2023).
Saliba, M. et al. Cesium-containing triple cation perovskite solar cells: Improved stability, reproducibility and high efficiency. Energy Environ. Sci. 9(6), 1989–1997 (2016).
Ašmontas, S. et al. Impact of cesium concentration on optoelectronic properties of metal halide perovskites. Materials 15(5), 1936 (2022).
Geng, Y., Yang, B., Cao, P., Shi, M. & Zou, J. Effects of CS+ concentration on the optical properties and stability of perovskite quantum dots. ECS J. Solid State Sci. Technol. 9(3), 036005 (2020).
Rainey, C. et al. An experimental machine learning study investigating the decision-making process of students and qualified radiographers when interpreting radiographic images. PLoS Digit. Health 2(10), e0000229 (2023).
Zhou, J. et al. The luminescence mechanism of ligand-induced interface states in silicon quantum dots. Nanoscale Adv. 5(15), 3896–3904 (2023).
Giansante, C. & Infante, I. Surface traps in colloidal quantum dots: A combined experimental and theoretical perspective. J. Phys. Chem. Lett. 8(20), 5209–5215 (2017).
Jin, K. et al. An explainable machine-learning approach for revealing the complex synthesis path-property relationships of nanomaterials. Nanoscale 15(37), 15358–15367 (2023).


Author information

Authors and Affiliations

Faculty of Science, Department of Statistics, Cumhuriyet University, Sivas, Turkey
Mehmet Sıddık Çadırcı
Department of Electrical and Electronics Engineering, Duzce University, Düzce, Turkey
Musa Çadırcı

Authors

Mehmet Sıddık Çadırcı
Musa Çadırcı

Contributions

M.S.C.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - Original Draft, Data Curation, Visualization, Project administration, Supervision. M.C.: Methodology, Validation, Formal analysis, Resources, Visualization, Writing - Review & Editing.

Corresponding author

Correspondence to Mehmet Sıddık Çadırcı.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Çadırcı, M.S., Çadırcı, M. Machine learning models for accurately predicting properties of CsPbCl₃ Perovskite quantum dots. Sci Rep 15, 30924 (2025). https://doi.org/10.1038/s41598-025-08110-2


Received: 14 December 2024
Accepted: 19 June 2025
Published: 22 August 2025
DOI: https://doi.org/10.1038/s41598-025-08110-2