Fig. 4: Attribution analysis of the Bayesian optimization model.

a Descriptor ranking by SHAP (SHapley Additive exPlanations) values based on the Gaussian process regression (GPR) model for optimizing Pm values (Pm, probability of meso linkages) for the entire dataset of Al complexes. The top-ranked descriptor has the most significant effect across all predictions. Positive and negative SHAP values indicate positive and negative correlations, respectively, between the measured Pm value and the corresponding feature. The larger the absolute SHAP value, the stronger the correlation. The color coding indicates normalized feature values from high (red) to low (blue). b Correlation between the %Vbur of the Al complex and the Pm value. %Vbur, percent buried volume. c–e Multivariate regression models highlighting the important descriptors impacting c the Pm values for all Al complexes, d the Pr values (Pr, probability of racemic dyad formation) in Al complexes with AmC1B2 and AmC1B3 ligands, and e the Pm values in Al complexes with A3CpBn ligands. EHOMO(A), HOMO energy of the Am fragment (HOMO, highest occupied molecular orbital); EN(A), electronegativity of the Am fragment; Vbur, %Vbur of the whole ligand; Vbur-max(A), maximum %Vbur of the Am fragment; Freqmin(A), minimum frequency of the Am fragment; EHOMO(BC), HOMO energy of the BnCp fragment; EN(BC), electronegativity of the BnCp fragment; Vbur-min(BC), minimum %Vbur of the BnCp fragment; Freqmin(BC), minimum frequency of the BnCp fragment; MAE, mean absolute error. In the equations in c–e, the descriptors of the Am fragment are highlighted in purple, those of the BnCp fragment in green, and residue numbers in pink.
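
As a rough illustration of the kind of model summarized in panels c–e, the sketch below fits a multivariate linear regression by ordinary least squares and reports its MAE. It is not the authors' workflow: the descriptor matrix, the coefficients, and the noise level are synthetic placeholders, and the helper names `fit_linear_model` and `mean_absolute_error` are hypothetical.

```python
import numpy as np


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, intercept)."""
    # Append a column of ones so the last fitted parameter is the intercept.
    A = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic data only (assumption): 40 hypothetical complexes described by
# 3 standardized descriptors, standing in for quantities such as EHOMO(A),
# EN(A), or Vbur. None of these numbers come from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
true_coef = np.array([0.10, -0.05, 0.02])
y = 0.6 + X @ true_coef + rng.normal(scale=0.01, size=40)

coef, intercept = fit_linear_model(X, y)
mae = mean_absolute_error(y, X @ coef + intercept)
print("coefficients:", coef)
print("intercept:", intercept)
print("MAE:", mae)
```

With standardized descriptors, the sign and magnitude of each fitted coefficient can be read in the same spirit as the equations in c–e: a larger absolute coefficient means a stronger contribution of that descriptor to the predicted value.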