Fig. 2: Results of the band gap (Eg) model.

a Plot of Eg versus Sn-Pb ratio with marginal histograms. The black dots are collected reported data points, and the gray line is the result of traditional fitting. The low fitting index (0.705) and the widely scattered shaded region indicate that the relationship between Sn-Pb ratio and Eg is difficult to capture by traditional fitting. The inset shows the band gap distribution of the 43 collected perovskites. b Feature importance ranking produced by the GBR model and the SHAP library for the 14 input features, listing the elemental properties in descending order of importance (rank). The x-axis shows the SHAP value, which represents the impact of each feature on the predicted Eg. Red and blue indicate high and low values of a given feature, respectively. The five features with the greatest influence on Eg are the weighted first ionization energy (Eip), Mulliken's electronegativity of the B site (Een), the LUMO, the tolerance factor (Tf) and the unit cell lattice edge \(\alpha_o^3\). c Comparison of Eg values predicted with the Sn-Pb ratio as input against experimentally measured values. The light gray dots are collected reported data points, shown for comparison. The inset plots actual versus GBR-predicted values for the test set (red dots) and for our experimental samples (blue dots). The black line represents ideal prediction, where predicted and actual values are equal; the closer a data point lies to this line, the more accurate and reliable the prediction. The sub-inset shows the convergence of model accuracy.
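Panel b ranks input features by their SHAP values. To make that quantity concrete, the sketch below computes exact Shapley values for a small model by enumerating every coalition of features, replacing features outside a coalition with a single baseline point, and then ranks features by mean absolute Shapley value, which gives the same order as the default sort of a SHAP summary plot. This is an illustrative assumption, not the authors' pipeline: the SHAP library's `TreeExplainer` computes these values efficiently for tree ensembles such as GBR and averages over background data, whereas this brute-force version scales as 2^n and is only practical for a handful of features. The model, feature names and sample values are hypothetical placeholders, not data from this work.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from statistics import mean
from typing import Callable, Sequence

# A "model" here is any function mapping a feature vector to a predicted value
# (e.g. a band gap). A trained GBR model's predict function would play this role.
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition take their baseline value (a simplifying
    assumption; SHAP averages over a background dataset instead).
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Build the input where only coalition members keep their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def global_importance(model: Model, samples: Sequence[Sequence[float]],
                      baseline: Sequence[float]) -> list[float]:
    """Mean |Shapley value| per feature across samples (global importance)."""
    per_sample = [shapley_values(model, x, baseline) for x in samples]
    return [mean(abs(row[k]) for row in per_sample) for k in range(len(baseline))]


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model; coefficients are invented.
    def toy_model(z: Sequence[float]) -> float:
        return 1.6 - 0.4 * z[0] + 0.1 * z[1] * z[2]

    names = ["feature_A", "feature_B", "feature_C"]  # hypothetical labels
    samples = [[0.0, 1.0, 0.5], [0.5, 0.2, 1.0], [1.0, 0.8, 0.1]]
    baseline = [mean(col) for col in zip(*samples)]  # per-feature mean as baseline

    ranking = sorted(zip(names, global_importance(toy_model, samples, baseline)),
                     key=lambda pair: pair[1], reverse=True)
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

Because `feature_A` enters the toy model only through the additive term `-0.4 * z[0]`, its Shapley value is exactly -0.4 times its deviation from the baseline, which is a convenient sanity check. The values for each sample also sum to `model(x) - model(baseline)` (the efficiency property), which is what lets a SHAP plot read as a decomposition of each individual prediction.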