Fig. 2: Schematic illustrations of the RFR and GAT models and their prediction performance on the Cads Database.

a Schematic illustrations of the site representation-based random forest regression (RFR) models. Parity plots of the density functional theory (DFT)-calculated versus machine learning (ML)-predicted formation energies of metal‒carbon (M‒C) bonds from the combined validation set in 5-fold cross-validation (CV), using RFR models with site representations built from (b) sites, (c) sites and site neighbors, and (d) sites, site neighbors and coordination numbers (CN). e Schematic illustrations of the connectivity-based graph attention network (GAT) models without embedded CN (GAT-w/oCN) or with embedded CN (GAT-wCN), both using GAT convolution (GATConv). Parity plots of the DFT-calculated versus ML-predicted formation energies of M‒C bonds with 5-fold CV using the (f) GAT-w/oCN and (g) GAT-wCN models. The Cads Database, with 5096 entries, is available in the GitHub repository listed under Data Availability. Each parity plot reports the mean absolute error (MAE) and R² value of the corresponding RFR or GAT model; the inset violin plot shows the absolute error distribution, with the inner dashed line marking the median (unit: eV).
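As a minimal sketch of how the metrics quoted in the parity plots could be obtained from a combined validation set, the snippet below pools the held-out predictions from all folds before scoring. It assumes R² is the usual coefficient of determination; the function name `pooled_metrics`, the fold layout and the toy energies are hypothetical and are not taken from the repository.

```python
from statistics import mean, median


def pooled_metrics(folds):
    """Score predictions pooled across CV folds.

    folds: list of (y_true, y_pred) pairs, one per fold (assumed layout).
    """
    # Concatenate the held-out targets and predictions from every fold.
    y_true = [y for truths, _ in folds for y in truths]
    y_pred = [y for _, preds in folds for y in preds]

    # Absolute errors feed both the MAE and the violin-plot median.
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]

    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)

    return {
        "mae": mean(abs_err),
        "median_ae": median(abs_err),
        "r2": 1 - ss_res / ss_tot,
    }


# Toy values in eV, made up purely for illustration (two folds shown).
toy_folds = [([0.0, 1.0], [0.1, 0.9]), ([2.0, 3.0], [2.2, 3.0])]
metrics = pooled_metrics(toy_folds)
```

Scoring the pooled predictions yields a single MAE and R² per model. Averaging per-fold scores instead can give slightly different values, particularly for R².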