Fig. 1: Investigation of ensemble effect by machine learning.

a Schematic of the three feature-engineering approaches. (Left) Filter module: features are evaluated and ranked by correlation coefficients before model training begins. (Center) Wrapper module: features are selected with a base algorithm and further optimized through iterative model training. (Right) Embedded module: the final features are selected according to their individual contributions to the model, concurrently with model training. M1, central metal site; M, synergistic metal site; N, coordinating nitrogen site. b The original dataset, shown as normalized data. Only the component features and the site & structure features are shown; the original dataset also contains another 131 reference features, detailed in Supplementary Table 3, which expand the feature space and serve as the database for pre-training. The size and color of each square encode the feature value; the size is mapped to the absolute difference between the feature value and 0.5 so that values close to 0 remain visible. Thus, the larger a red square is, the closer its value is to 0; the smaller a blue square is, the closer its value is to 1. Feature definitions are listed in Supplementary Tables 1–3. c, d Feature heat maps (including structure and component features) of a subset of the DASC (main metal = Fe). The horizontal axis gives the short name of each DASC, the vertical axis gives the feature number, and the color shade indicates the relative magnitude of the values. e Feature importance (%) of the final XGBR model, based on the embedded module results.
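
As an illustration of the filter module in panel a, the following minimal Python sketch ranks features by the absolute value of their Pearson correlation with a target before any model is trained. It is only one plausible reading of "evaluated and prioritized using correlation coefficients": the function names (`pearson`, `filter_rank`), the feature names, and the toy numbers are hypothetical and are not taken from the dataset or the code behind this figure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A constant column carries no linear information; score it as 0.
        return 0.0
    return cov / (norm_x * norm_y)


def filter_rank(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only (hypothetical features and target).
target = [1, 2, 3, 4, 5]
features = {
    "f_constant": [7, 7, 7, 7, 7],
    "f_noisy": [1, 3, 2, 5, 4],
    "f_linear": [2, 4, 6, 8, 10],
}

ranking = filter_rank(features, target)
for name, score in ranking:
    print(f"{name}: |r| = {score:.2f}")
```

Ranking by |r| keeps strongly anti-correlated features alongside strongly correlated ones; whether the figure's filter module uses Pearson coefficients, and what cutoff it applies, is not stated in the caption.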