Introduction

The active sites of catalysts, a concept first introduced by H. S. Taylor in 1925, are among the most important yet elusive concepts in heterogeneous catalysis1,2,3. For supported nanocatalysts, an active site is a specific ensemble of atoms that accommodates not only reactant adsorption but also elementary reaction steps and product desorption, thus enabling their application in various chemical reactions4,5,6. Despite extensive investigations, pinpointing the exact location of these active sites at the atomic level and their specific contribution to the observed overall activity remains challenging7. This is due to the intrinsic heterogeneity of the nanocatalysts involved, a complex function of the particle size and morphology, which limits the corresponding activity and selectivity8,9. Atomically dispersed catalysts, by contrast, offer complete atom utilization and high selectivity owing to their improved site homogeneity; however, their limited number of available sites, lower mass-specific activity, and limited long-term stability restrict their broad application10,11. Supported subnanometer cluster catalysts (SCCs), which are composed of a small group of active metal atoms with better stability, occupy the middle ground: they possess multiple sites that allow complex chemical reactions but remain sufficiently small for high atom utilization and selectivity12,13,14,15,16,17. These unique properties have made SCCs promising alternatives, as evidenced by the increase in their applications in various catalytic reactions, such as hydrogenation processes18,19,20,21, oxidation reactions22,23, and many other chemical transformations24,25,26.

To take advantage of SCCs, full exploration of the configuration space is essential but challenging27, since intricate permutations and combinations give rise to thousands of distinct isomers, each with its own set of sites. Moreover, SCCs frequently undergo significant dynamic transitions owing to the close energetics of various isomers and the potential agglomerations/decompositions of clusters of different sizes28,29,30. Finally, many undercoordinated atoms tend to interact strongly with reactants and/or intermediates, which significantly changes their chemical states and morphologies31. These small cluster catalysts are subject to dramatic structural, compositional, and size variations under operational conditions, which profoundly influence their catalytic activity, selectivity, and stability32,33,34,35,36. Although there is increasing consensus on the importance of metastable sites with low concentrations for overall reaction rates37,38,39,40, identifying the nature of SCCs with full resolution of cluster sizes, isomers, compositions, sites, and reaction pathways, as required for the rational design of atom-efficient catalysts, remains a challenge.

Herein, we leverage a machine learning-based multiscale framework to reveal the unique catalysis of SCCs. Through the statistical investigation of isomers and the use of data-driven machine learning techniques, we demonstrate a collective effect across different cluster sizes, compositions, isomers, and exposed sites and their impact on reaction pathways and overall activity. We investigate CO oxidation catalyzed by CeO2(111)-supported Cu41. CO oxidation is selected for its simplicity and for the availability of extensive experimental data, which make it well suited to developing foundational theories for SCCs. Cu is considered a viable alternative to the noble metals Pt, Pd, and Rh in three-way catalysts, and compared with other CeO2 surfaces, CeO2(111) has the highest activity for CO oxidation42.

We discovered that Cu clusters are more active than single-atom and extended-surface catalysts. A variety of active sites, which are composed of different ensembles across various cluster sizes, compositions, isomers, and sites composing both Mars–van Krevelen (MvK) and Langmuir–Hinshelwood (LH) mechanisms, collectively contribute to activity. Using data-driven machine learning techniques, we identify key activity descriptors and uncover the principles governing collective catalytic behavior. Despite distinct reaction pathways and energy landscapes, the collectivity effect emerges from the synergy of high intrinsic activity, dictated by the local environment, and moderate reactant adsorption, coupled with a substantial population of active sites. By incorporating the statistical distribution of cluster isomers and active sites, we determined the key structural parameters, reaction rates, and activation barriers for the Cu/CeO2(111) catalysts, which strongly agreed with the experimental data. This analysis offers crucial insights for catalyst design by optimizing the distribution and population of collective active sites within specific clusters, thereby significantly enhancing the overall catalytic performance.

Results

A machine learning-based framework for heterogeneous cluster catalysis

We leveraged multiscale modeling and a four-step strategy to pinpoint the nature of the active sites and corresponding reaction pathways for SCCs under operational conditions (Fig. 1). To determine the structure and composition of supported clusters under reaction conditions, we first conducted genetic algorithm (GA)-driven modified grand canonical Monte Carlo (M-GCMC) simulations, as detailed in the Supplementary Information and Supplementary Figs. 1–2. Artificial neural network potentials (ANNPs) were employed to accelerate the M-GCMC simulations, in which more than 100,000 cluster structures were sampled. In this way, all possible structures of clusters of different sizes and isomers, with the appropriate adsorption and coverage of reactants and intermediates, are identified. From their free energies of formation, the corresponding distributions and/or concentrations of all the clusters identified, including the metastable ones under thermodynamic equilibrium, are determined according to the Boltzmann distribution law. For practical synthesis, where thermodynamic equilibrium might not necessarily be reached, a cluster distribution from experiments, if available, could be applied instead.

Fig. 1: Workflow for studying cluster catalysis under reaction conditions.

There are four parts: identifying the cluster catalyst composition and structure under operational reaction conditions via machine learning-accelerated grand canonical Monte Carlo (GCMC) simulations; assessing the statistical distributions of sites; computing the reaction rates for each site within the cluster via microkinetic modeling; and gaining a deeper understanding of the sites and comparisons between simulations and experiments. The red, yellow, green, and gray spheres are the O, Ce, Cu, and C atoms, respectively. This notation is used throughout the paper.

For each cluster isomer identified, we considered all exposed sites and optimized the corresponding reaction pathways, from which the isomer- and site-resolved intrinsic reaction rates were calculated via first-principles microkinetics. This process is repeated for all clusters in terms of different isomers and sizes in the presence of adsorbates. By integrating the intrinsic activity of all available sites, weighted by the appropriate distribution function, the overall catalytic activity can be quantified. Here, as a simplification, the sites with the same local coordination number and coordinated reactants/intermediates across all clusters are classified as one type of site. The resulting average reaction rate per site (Ro) of a cluster catalyst accounting for different cluster sizes, isomers, and sites is therefore determined by:

$$R^{o}=\sum_{n} p_{n} R_{n}=\sum_{n} p_{n}\left(\sum_{\mathrm{site}} p_{n,\mathrm{site}} \times r_{n,\mathrm{site}}\right) \tag{1}$$

where pn represents the population of clusters with n metal atoms and Rn denotes the average reaction rate per site for that cluster. pn,site and rn,site refer to the population and reaction rate of a specific site within an n-atom cluster. When isomer distributions are incorporated, Rn for a specific cluster size n is expressed as:

$$R_{n}=\sum_{\mathrm{iso}}\left(P_{n,\mathrm{iso}} \times R_{n,\mathrm{iso}}\right)=\sum_{\mathrm{iso}}\left[P_{n,\mathrm{iso}} \times\left(\sum_{\mathrm{site}} p_{n,\mathrm{iso},\mathrm{site}} \times r_{n,\mathrm{iso},\mathrm{site}}\right)\right] \tag{2}$$

where Rn,iso denotes the average reaction rate per site for a specific isomer of an n-atom cluster. The contribution of each isomer is determined by its population Pn,iso and its site-specific contributions, which depend on the population of the site pn,iso,site and its corresponding reaction rate rn,iso,site. The populations pn, Pn,iso, pn,site and pn,iso,site are derived from Boltzmann distributions on the basis of Gibbs formation energy calculations, with the detailed methodology provided in Supplementary Note 1. This framework enables a comprehensive calculation of the reaction rate by summing contributions from all cluster sizes, compositions, isomers, and exposed sites, capturing the collective effects in cluster catalysis. It also resolves the contributions from specific sites and/or isomers and their collective effects, which are difficult to extract from experiments. Finally, we used an interpretable machine learning algorithm (SISSO)43 to build a physically meaningful descriptor of activity in terms of the geometric and energy features, providing valuable guidance for the rational design of more efficient SCCs.

Structures and components of Cu8/CeO2 under operational conditions

We first studied CO oxidation on the “magic-numbered” Cu8 cluster on CeO2(111). Cu8 is selected because it closes the first icosahedral shell (maximal intrinsic stability), and ~1 nm Cu8 ensembles are routinely observed on CeO2 under CO/O2 feeds, making it the standard experimental benchmark for Cu/CeO2 catalysis42. Note that, under the CO oxidation conditions considered (T = 400 K and PCO = PO2 = 0.02 bar), we found that oxygen vacancies can be rapidly replenished either by O2 dissociation or by oxygen from the supported CuOx clusters (Supplementary Figs. 3–4). Consequently, oxygen vacancies are excluded from the following M-GCMC simulations. The optimized clusters, denoted as Cu8(CO)xOy, are shown schematically in Supplementary Figs. 5–8. As plotted in Fig. 2a, when ΔμCO < −1.23 eV, no CO is adsorbed, and only dissociated oxygen atoms, numbering from four to eight, adsorb on Cu. When ΔμCO > −1.23 eV, CO starts to adsorb, with two to four CO molecules per cluster. For CO and O2 at partial pressures of 0.02 bar at 400 K (ΔμCO = −0.86 eV and ΔμO2 = −0.90 eV, respectively)44, the most stable structure identified is Cu8(CO)2O8.

Fig. 2: Identification of the structures and compositions of the Cu8/CeO2 cluster catalysts under operational conditions.

a Phase diagram of Cu8 cluster catalysts as a function of the chemical potentials of gaseous CO and O2. b Isomers of the Cu8(CO)xOy clusters within a 0.50 eV energy window at 400 K, considering CO and O2 partial pressures of 0.02 bar. c Probability distribution of sites with probabilities exceeding 1% in the Cu8(CO)xOy metastable ensemble structure pool on the basis of the proximity of Cu atoms within a cutoff radius of 1.15 times the standard bond length. d Five different sites of the most stable Cu8(CO)2O8 structure under typical CO oxidation conditions (T = 400 K and PCO = PO2 = 0.02 bar).

To quantify the contribution of metastable structures to the overall activity, we conservatively considered all metastable structures within a formation free energy window of 0.50 eV (Fig. 2b and Supplementary Fig. 9). In accordance with the Boltzmann statistical distribution law, this means that isomers with populations as low as ~10−7 at 400 K were included. The most stable and populated cluster identified was Cu8(CO)2O8, with the highest population P8,c,iso of 47%. Cu8(CO)3O8, with one more CO molecule, was also identified as a prominent cluster. It has two possible isomers, α and β, with corresponding populations of 17% and 4.2%, respectively (21.2% in total; details are provided in Supplementary Fig. 9 and Supplementary Table 1). The third cluster identified is Cu8(CO)4O8, with one further CO molecule; it has three possible isomers, α, β and γ, with populations of 16%, 13%, and 2.1% (31.2% in total), respectively (Fig. 2b).
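The link between the 0.50 eV energy window and the quoted ~10−7 population floor can be checked with a short Boltzmann calculation. The sketch below is only an illustration under stated assumptions: the function name `boltzmann_populations` and the two-isomer pool are hypothetical, and the actual populations in this work come from the Gibbs formation energies described in Supplementary Note 1.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_populations(rel_free_energies_ev, temperature_k):
    """Return normalized Boltzmann populations for relative free energies in eV."""
    beta = 1.0 / (K_B_EV_PER_K * temperature_k)
    e_min = min(rel_free_energies_ev)
    # Shift by the minimum so the largest weight is exactly 1 (numerically stable).
    weights = [math.exp(-(e - e_min) * beta) for e in rel_free_energies_ev]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical pool: the ground state and one isomer at the edge of the 0.50 eV window.
populations = boltzmann_populations([0.0, 0.50], temperature_k=400.0)
edge_population = populations[1]
print(f"population at the window edge: {edge_population:.1e}")
```

At 400 K the edge population falls in the 10−7 decade, which is consistent with the threshold stated above.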

The isomers identified above exhibit a characteristic bilayer structure. Three high-lying Cu atoms bind preferentially to CO molecules and dissociated oxygen atoms, whereas five low-lying interfacial Cu atoms bind to the CeO2 support in a pentagonal configuration. Across all the isomers identified, the exposed sites can be classified into four distinct interfacial sites (labeled IA to ID) and three distinct on-cluster sites (labeled OA to OC) (Fig. 2c, Supplementary Fig. 10 and Table 1). Specifically, the most prevalent isomer, Cu8(CO)2O8, contains five interfacial sites (four IAs and one IB) and three on-cluster sites (one each of OA, OB, and OC) (Fig. 2d). For Cu8(CO)3O8, however, the OA site is no longer available in the more populated isomer α, and one IA site is replaced by one IC site in the less populated isomer β. For Cu8(CO)4O8 with three isomers, the IA sites remain the most populated, and the OA site is available only for isomer β. In terms of the site distribution pn,site across the Cu8 clusters considered, the interfacial sites are the most abundant, with an overall population of 64%; the corresponding populations for the IA, IB, IC and ID sites are 40%, 14%, 9%, and 1%, respectively. The on-cluster sites OA, OB and OC have populations of 14%, 13%, and 9% (36% in total), respectively.

CO oxidation on the Cu8/CeO2 cluster catalysts

The optimized reaction pathway and potential energy surface (PES) for CO oxidation were identified for all classified sites. Here, both the Mars–van Krevelen (MvK) and Langmuir–Hinshelwood (LH) pathways were considered (Fig. 3a). Note that we also considered the Eley–Rideal (ER) pathway between the gas CO and dissociative oxygen atoms on the supported clusters. For the typical sites tested, the corresponding barriers were comparable to those of the LH pathways (Supplementary Table 2). Considering its much smaller prefactor, its contribution to the overall reactivity was not considered in the following study. Moreover, the participation of lattice oxygen via spillover to the supported Cu cluster for CO oxidation was also excluded because of its dramatically high diffusion barrier of 2.44 eV (Supplementary Fig. 11).

Fig. 3: Study of the CO oxidation mechanism over the Cu8/CeO2 cluster catalyst.

a Pathways illustrating the interface reaction mechanism and cluster surface reaction mechanism in cluster catalysts during CO oxidation. The solid and dotted lines represent the dissociation and nondissociation pathways, respectively. The dark blue point, red point, and gray spheres indicate the Cu atom, O atom, and C atom, respectively. The white spheres are oxygen vacancies. We used the oxygen atoms in the CuOx(CO)y/CeO2 clusters to initiate CO oxidation. b The most favorable pathway among the interface reactions. The purple spheres are O atoms involved in CO oxidation. c The most favorable pathway among cluster surface reactions. d CO2 formation rates (rsite) at different temperatures for each site in the Cu8(CO)xOy/CeO2 cluster catalyst, as determined via microkinetic modeling. The apparent activation barriers are indicated in eV. e Distribution of the probability of sites and corresponding CO2 formation rates.

For the interface sites, the MvK pathway is preferred. The most populated IA sites display the highest elementary barrier of 1.36 eV for CO reacting with lattice oxygen (Supplementary Figs. 12–13). The least populated ID sites also have a considerable barrier of 0.91 eV (Supplementary Figs. 14–17 and Supplementary Tables 3–4). The moderately populated IB and IC sites present only modest barriers of 0.58 and 0.72 eV, respectively (Fig. 3b, Supplementary Fig. 18 and Supplementary Table 3); therefore, these two sites would contribute more to the overall activity. Note that CO oxidation via the nondissociative O2 pathway is also preferable because of its rather small barrier of ~0.04 eV. Compared with the MvK pathway, benchmark calculations revealed that the LH reaction pathway has a much higher elementary barrier of 0.18 eV at the interface sites (Supplementary Table 2).

For the on-cluster sites, however, the LH pathway is favorable. The calculated PESs indicate that the reaction prefers to occur at the OA site, where the highest elementary barrier is only 0.74 eV (Fig. 3c and Supplementary Fig. 19). Here, the optimized reaction pathway reveals that CO reacts with an O atom from the dissociative adsorption of O2, whereas direct CO oxidation via molecular O2, known as the bifunctional effect, encounters a barrier that is 0.68 eV higher. The less populated OB and OC sites have barriers higher than 1.23 eV (Supplementary Figs. 20–23 and Supplementary Tables 3–4). This makes the OA sites the most likely contributors to the overall activity among the on-cluster sites. The intrinsic reaction rates of the seven classified sites, r8,site, were calculated via microkinetic modeling on the basis of the optimized PESs (Fig. 3d and Supplementary Table 5). The IC and OA sites had the highest r8,site values of 9.6 × 10−2 and 6.0 × 10−2 s−1, respectively, because they have the lowest apparent barriers Eapp of 0.22 eV and 0.12 eV, respectively. The ID sites exhibit a lower r8,site of 6.6 × 10−3 s−1 because of their slightly higher Eapp (0.32 eV). Notably, although the IB sites had a similar Eapp of 0.37 eV, the calculated r8,site was 3.1 × 10−5 s−1, two orders of magnitude lower, because their oxygen coverage was three orders of magnitude lower than that of the IC and OA sites (Supplementary Fig. 24). The IA, OB, and OC sites had the lowest r8,site of 8.2 × 10−6 s−1 because of their considerable Eapp of at least 0.77 eV.

The IC and OA sites had the highest r8,site and considerable population psite values (8.6% and 14%), and the corresponding effective rates R8 were the highest at 8.2 × 10−3 and 8.3 × 10−3 s−1, respectively (Fig. 3e). For the IB sites, although their p8,site is considerable, their three orders of magnitude smaller r8,site results in a much smaller R8 (4.3 × 10−5 s−1). For the ID sites, since both p8,site and r8,site are small, the corresponding R8 values are smaller as well, which are actually very close to those of the IB sites. For the IA, OB, and OC sites, their r8,site values are further decreased by at least 5 orders of magnitude. As a result, both the interfacial IC sites via the MvK pathway and the on-surface OA sites via the LH pathway determine and contribute roughly equally to the overall reaction rate. This reveals a collective effect on how the individual sites contribute distinctly to the overall reactivity.

Although the site-resolved activity is insightful from an atomic point of view, the isomer-resolved activity R8,iso provides another constructive view for rational design. In this context, the site-resolved activity derived was used as input for the specific isomer considered (Supplementary Table 6). Cu8(CO)2O8 has the highest population p8,iso of 47% and contains one active OA site, and the corresponding reaction rate R8,iso is considerable at 3.5 × 10−3 s−1. For Cu8(CO)3O8, the β isomer contains one active IC site, but its p8,iso (4.2%) is rather low, resulting in a relatively small R8,iso of 8.4 × 10−4 s−1. For Cu8(CO)4O8, the α and β isomers have high p8,iso values of 17% and 13%, respectively, and both contain one active OA site; the resulting R8,iso values are considerable at 2.1 × 10−3 s−1 and 2.6 × 10−3 s−1, respectively. For the remaining isomers of Cu8(CO)3O8-α and Cu8(CO)4O8-γ, the corresponding reaction rates can be neglected because of the absence of active sites and/or the lower p8,iso values, respectively. Therefore, the isomer-resolved reaction rates R8,iso for Cu8(CO)4O8, Cu8(CO)2O8, and Cu8(CO)3O8 are 4.7×10−3, 3.5×10−3 and 8.4 × 10−4 s−1, respectively. In addition to configurational isomers of a given stoichiometry, clusters of different stoichiometries are also populated, and the combined ensemble of these species collectively shapes the observed reactivity.
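The isomer-resolved weighting of Eq. 2 can be illustrated with the values quoted in the text for Cu8(CO)2O8. The following sketch is hedged: the helper name `isomer_contribution` is hypothetical, the within-isomer site populations are assumed to be uniform over the eight sites (four IA, one IB, one OA, one OB, one OC), and the rate of 8.2 × 10−6 s−1 is used for IA, OB and OC as quoted for the least active sites. The actual inputs are those of Supplementary Table 6.

```python
def isomer_contribution(isomer_population, site_populations, site_rates):
    """P_iso * sum_site(p_site * r_site): one term of the outer sum in Eq. 2."""
    inner = sum(site_populations[s] * site_rates[s] for s in site_populations)
    return isomer_population * inner


# Assumed uniform weighting over the 8 sites of Cu8(CO)2O8 (illustrative only).
site_populations = {"IA": 4 / 8, "IB": 1 / 8, "OA": 1 / 8, "OB": 1 / 8, "OC": 1 / 8}
# Intrinsic site rates in s^-1 at 400 K, as quoted in the text.
site_rates = {"IA": 8.2e-6, "IB": 3.1e-5, "OA": 6.0e-2, "OB": 8.2e-6, "OC": 8.2e-6}

r_iso = isomer_contribution(0.47, site_populations, site_rates)
print(f"Cu8(CO)2O8 contribution: {r_iso:.2e} s^-1")
```

Under these assumptions the single OA site dominates the sum, and the result is close to the 3.5 × 10−3 s−1 quoted for this isomer.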

Size effect of the Cu/CeO2 cluster catalysts

We further explored the impact of cluster size by considering Cum clusters (m = 1–4) to account for dynamic transformations and size heterogeneity in the as-prepared catalysts. The most stable structure of the Cum clusters, along with the favorable adsorbate coverage, was identified via M-GCMC simulations at 400 K under 0.02 bar CO and 0.02 bar O2 (Supplementary Figs. 25–38). To assess their relative stability, a CuO nanoparticle (NP) with a curvature radius of 1.5 nm was used as a zero-energy reference, as Cu tends to oxidize under oxygen-rich conditions. The chemical potential of the CuO NPs was calculated via the Gibbs‒Thomson equation, Δμ = 2Ωγ/R, where γ was approximated by the effective surface energy from the Wulff construction of bulk CuO45,46 under the same conditions (Supplementary Figs. 31, 39–45). ANNP calculations revealed that Cu3O4 and Cu8(CO)2O8 are approximately 0.30 eV per Cu atom less stable than bulk CuO/Cu2O but 0.32 eV more stable than metallic Cu at 400 K and PCO = PO2 = 0.02 bar (Supplementary Table 7). However, detaching a single Cu atom requires >2 eV (Supplementary Fig. 46), so oxidic subnanometer ensembles remain thermodynamically preferred and kinetically trapped, matching operando XAFS/XPS observations of persistent Cu oxide clusters on CeO242.
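The Gibbs‒Thomson shift Δμ = 2Ωγ/R is straightforward to evaluate once Ω and γ are known. The sketch below shows only the arithmetic: Ω is taken here to be the volume per formula unit, and the numerical values of Ω and γ are placeholders chosen for illustration, not the values used in this work (those are derived from the Wulff construction in the Supplementary Information).

```python
EV_PER_JOULE = 1.0 / 1.602176634e-19  # exact conversion from the SI definition of the eV


def gibbs_thomson_shift_ev(volume_m3, surface_energy_j_per_m2, radius_m):
    """Chemical potential shift 2*Omega*gamma/R of a curved particle, in eV."""
    return 2.0 * volume_m3 * surface_energy_j_per_m2 / radius_m * EV_PER_JOULE


# Placeholder inputs (assumptions): Omega = 20 cubic angstroms, gamma = 1.0 J/m^2.
omega = 2.0e-29   # m^3 per formula unit, hypothetical
gamma = 1.0       # J/m^2, hypothetical
radius = 1.5e-9   # m, the curvature radius quoted in the text

shift = gibbs_thomson_shift_ev(omega, gamma, radius)
print(f"Gibbs-Thomson shift: {shift:.3f} eV")
```

Because the shift scales as 1/R, halving the particle radius doubles the destabilization relative to the bulk, which is why the choice of reference radius matters for the reported formation energies.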

The most stable Cu3 and Cu8 clusters identified had exothermic formation free energies of −0.17 and −0.18 eV/Cu atom, respectively. Specifically, the Cu3 cluster binds to the CeO2 support in a favorable triangular configuration, whereas Cu4 exhibits a quadrilateral configuration with a thermally neutral formation free energy. In contrast, Cu1 and Cu2 have endothermic formation energies of 0.96 eV/Cu atom and 0.14 eV/Cu atom, respectively, making them unfavorable and prone to agglomeration (Fig. 4a and Supplementary Fig. 45). MD simulations (1 ns, 400 K) with ANNPs show that Cu8(CO)2O8/CeO2(111) remains intact with no Cu detachment (Supplementary Fig. 47). The most accessible breakup, Cu8 → Cu7 + Cu1, is uphill by ΔG = 1.33 eV and traverses a 2.06 eV barrier (Supplementary Fig. 46); all other channels (Cu6 + Cu2, Cu5 + Cu3, and Cu4 + Cu4) are >1.98 eV endergonic (Supplementary Fig. 47). These energetics render such events negligible under operational conditions. Therefore, Cu3, Cu4 and Cu8 clusters can coexist under operational conditions, although Cu4 has a lower population. The equilibrium concentrations of these clusters were determined on the basis of their energetics, with Cu3, Cu4 and Cu8 as the predominant species, corresponding to monolayer coverages of 0.078, 4.8 × 10−3 and 0.175 ML, respectively (Supplementary Formula S31 and Supplementary Table 8). The coverage of other species was negligible. On the basis of these coverages, the total Cu loading on CeO2(111) was calculated to be 6.9%, which agrees well with the experimental data42,44,47,48 (Supplementary Table 8). Although direct experimental visualization of smaller cluster species remains challenging, our approach elucidates the catalytic contributions of diverse cluster sizes, compositions, isomers, sites and mechanisms (Fig. 4b and Supplementary Figs. 48–70), complementing experimental findings.

We observed a decrease in Rn with decreasing cluster size. For small clusters (those with fewer than 3 atoms), the isomer and site types are unique, with Rn determined solely by rn,site. For Cu4 and CuO(111), there are 3 and 2 different types of sites, respectively (Supplementary Tables 9–12). The interfacial IA sites on Cu4 and CuO have the highest rn,site values of 7.9 × 10−3 and 7.3 × 10−2 s−1, accounting for 48.6% and 50% of all available sites, respectively (Supplementary Table 9). In Cu3, only one OA site has an rn,site value of 5.4 × 10−3 s−1, whereas the other sites have rn,site values of less than 10−5 s−1, rendering their contribution to the overall conversion rates negligible. As a result, the Cu8 cluster and CuO(111) surface exhibit the highest Rn values, achieving rates of approximately 10−2 s−1 and the lowest Eapp values of 0.19 eV and 0.11 eV, respectively. Smaller clusters such as Cu3 and Cu4 have slightly lower Rn values (~10−3 s−1) and higher Eapp values (0.96–1.22 eV). In contrast, the smallest units, Cu1 single atoms and Cu2 dimers, display the lowest activity, with rates of 6.0 × 10−6 s−1 and 5.5 × 10−8 s−1, respectively, and Eapp values exceeding 1.41 eV.

Fig. 4: Effect of the Cu cluster size on CO oxidation.

a The calculated formation energies of the Cum(CO)xOy clusters on the CeO2(111) surface with respect to the 3 nm CuO nanoparticles at 400 K with CO and O2 partial pressures of 0.02 bar. b CO2 formation rates of the Cu clusters (rn) predicted via microkinetic modeling. The apparent activation barriers are indicated in eV. The reaction rates at 400 K are given for different Cu clusters. c The distribution probabilities (blue plot), CO oxidation rates (rn) (green plot), and total CO conversion rates of Cu clusters of different sizes (Rn) (orange column). d The correlation between the transition state energies for O2 dissociation and CO oxidation and the adsorption energy of O2 (depicted in blue) or CO (depicted in red) across various Cu/CeO2 sites on the basis of interpretable machine learning. e Activity map illustrating the relationship between the reaction rate of sites (rsite) and the adsorption energies of CO and O2 at different sites on the Cu/CeO2 cluster catalyst, which vary in size. The red and gray points represent the sites on the cluster surface or at the interface, respectively. f Deviations between the simulations and experiments for 7 observations, namely, CN(Cu-Cu), CN(Cu-O), d(Cu-Cu), d(Cu-O), Ro, Eapp and Q.

The high Rn values of the Cu3 and Cu8 clusters arise from their large populations (41.8% and 56.0%) and the presence of key active sites: the on-surface OA sites on Cu3, and the IC and on-surface OA sites on Cu8 (Fig. 4c). These sites collectively enhance CO oxidation activity. In contrast, less stable structures, such as the Cu1 single atom, Cu2 dimer, Cu4 cluster, and CuO(111) surfaces, make minimal contributions because of their lower intrinsic activity and/or fewer exposed sites. For example, Cu4 and CuO have low populations (1.9% and 0.28%, respectively) (Supplementary Table 8), leading to Rn values of 4.0 × 10−5 s−1 and 3.5 × 10−5 s−1, respectively. Cu1 and Cu2, with even lower populations (1.3 × 10−15 and 4.8 × 10−5), display negligible conversion rates of 7.9 × 10−21 and 2.6 × 10−12 s−1, respectively.
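The population-weighted contributions quoted for the smallest species follow directly from the outer sum in Eq. 1, that is, the product of a species' population and its intrinsic rate. The sketch below reproduces that arithmetic for Cu1 and Cu2 only, using the populations and rates stated in the text; the dictionary layout is an illustrative choice, and small differences from the quoted values reflect rounding of the inputs.

```python
# (population p_n, intrinsic rate r_n in s^-1), both as quoted in the text.
size_data = {
    "Cu1": (1.3e-15, 6.0e-6),
    "Cu2": (4.8e-5, 5.5e-8),
}

# Each term p_n * r_n is the species' contribution to the overall rate (Eq. 1).
contributions = {name: p * r for name, (p, r) in size_data.items()}

for name, value in contributions.items():
    print(f"{name}: {value:.1e} s^-1")
```

Both products land within a few percent of the 7.9 × 10−21 and 2.6 × 10−12 s−1 given above, confirming that these species are suppressed mainly by their vanishing populations rather than by their intrinsic rates alone.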

Our calculation results suggest that Cu3 or Cu8 clusters are optimal for enhancing the catalytic efficiency and stability. The precise synthesis of these clusters, followed by deposition on CeO2 supports, is critical for maximizing CO oxidation activity, even when the cluster exhibits significant dynamic variation among metastable isomers. Furthermore, our analysis revealed that 68.2% of the activity arises from LH mechanisms, whereas 31.8% is driven by MvK mechanisms, accounting for the population of both interface and surface sites. Despite the different pathways, both mechanisms exhibit similar activation barriers (~0.70 eV) for the rate-determining step of CO reacting with atomic O. These findings highlight another mechanistic collectivity of Cu/CeO2 systems.

To uncover the origin of collectivity in cluster catalysis, we identified key descriptors linking transition state energies for CO oxidation across clusters of varying sizes and the CuO(111) surface via data-driven compressed sensing analysis (Supplementary Note 3). Our analysis revealed that the transition state energies for CO oxidation and/or O2 dissociation are significantly influenced by factors such as ECO, EO2, and Cu–O coordination number (CNCu-O) within both the cluster and support (Fig. 4d). By applying the derived equation for transition state energy estimation in microkinetic simulations, we constructed a two-dimensional activity map that illustrates how varying CO and O2 adsorption energies affect rsite (Fig. 4e).

The collectivity effect arises from the local environment of the Cu sites, captured by the Cu–O binding energy coordination number (BECNCu-O) descriptor and the adsorption energies of CO and O2 (Ei). BECNCu-O is a key factor linking geometric and mechanistic aspects to catalytic performance (Supplementary Note 3). It serves as a crucial descriptor for determining the activation barriers and reaction rates in CO oxidation (Supplementary Fig. 71). The Cu–O coordination number (CNCu-O), CuO formation energy, and CeO2(111) vacancy formation energy (EM-O) collectively influence the transition state energy, directly impacting the catalytic activity.

As the cluster size increases, the reaction mechanism shifts from being primarily cluster-centered to surface and interface-driven pathways. Specifically, as the cluster size increases from Cu1 to Cu4, the bond strength between Cu and CO weakens (from −1.87 eV to −0.45 eV, Supplementary Fig. 72), which lowers the CO oxidation barrier and increases the CO oxidation rate. At larger sizes, such as Cu8, the interface sites present lower BECNCu-O values due to stronger EM-O in CeO2 than in CuO, decreasing the activation barriers and increasing the reaction rates. These sites balance CO/O2 adsorption and Cu–O bond strength, harmonizing the energy landscape and enhancing the mechanistic collectivity effect.

We found that collective active sites on stable Cu3 and Cu8 clusters, despite differing chemical environments and configurations, share similar intrinsic activities, contributing synergistically to the overall catalytic performance. In contrast, smaller species such as Cu1 and Cu2 have lower reactivities due to diminished surface coverage or higher energy barriers. However, Cu1 and Cu2 can migrate into CeO2 lattice vacancies during catalyst preparation49,50, forming bulk-like structures and potentially altering catalytic behavior (Supplementary Figs. 73–76). Our study shows that Cu1 and Cu2 doped into CeO2(111) exhibit activity rates of 4.4 × 10−16 and 6.5 × 10−3 s−1, respectively. Because the activity and activation energy (Eapp of 1.19 eV) of Cu2Ce1-xO2(111) are comparable to those of Cu3, this structure also contributes collectively to the CO oxidation activity, as Cu3 and Cu8 do. The synthesis of such solid solution structures further enhances the CO oxidation performance, illustrating an additional collective effect in Cu/CeO2 catalysis.

Discussion

We calculated key macroscopic observables for CO oxidation over Cu/CeO2 catalysts for direct comparison with experimental data, with a focus on the CeO2(111) surface. Future work will extend these studies to explore the catalytic properties of the (110) and (100) facets to understand facet-specific effects. The key parameters, including the Cu–O and Cu–Cu bond lengths (d(Cu‒O) and d(Cu‒Cu)), average coordination number (CN), Cu charge state (Q), overall reaction rate (Ro), and Eapp, were calculated (Supplementary Figs. 77–80 and Supplementary Table 13). For the Cu/CeO2 system, the average d(Cu‒O) was 1.92 Å, with a CN of 3.55 and a Q of approximately +0.98. The calculated reaction rate for CO oxidation was 0.011 s−1, corresponding to 1.72 × 10−6 molCO gcat−1 s−1 for catalysts with 1% Cu loading and an Eapp of 0.63 eV.
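The conversion from the per-site rate to the mass-specific rate can be reproduced with a one-line calculation. The sketch below assumes that the 1% loading is a weight fraction, that every Cu atom counts as one site, and that the molar mass of Cu is 63.546 g/mol; the function name `mass_specific_rate` is hypothetical.

```python
CU_MOLAR_MASS_G_PER_MOL = 63.546  # standard atomic weight of Cu


def mass_specific_rate(rate_per_site_s, cu_weight_fraction):
    """Convert a per-Cu rate (s^-1) into mol CO per gram of catalyst per second."""
    mol_cu_per_g_cat = cu_weight_fraction / CU_MOLAR_MASS_G_PER_MOL
    return rate_per_site_s * mol_cu_per_g_cat


rate = mass_specific_rate(rate_per_site_s=0.011, cu_weight_fraction=0.01)
print(f"{rate:.2e} mol CO per g catalyst per s")
```

Under these assumptions the result agrees with the quoted 1.72 × 10−6 molCO gcat−1 s−1 to within the rounding of the per-site rate.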

Although we focused on equilibrium distributions, our study also considered deviations to better capture catalytic activity across varying cluster sizes and reaction environments. Cu loading and cluster size significantly influenced Eapp, which decreased from ~1.22 eV (dominated by Cu3) to ~0.18 eV (dominated by Cu8). This correlates with an increase in the reaction rate from 10−4 s−1 to 10−2 s−1 as the Cu3 and Cu8 concentrations increase, whereas their depletion drastically reduces the activity (Supplementary Fig. 81). The experimentally measured Eapp values (0.58–0.77 eV) align well with our simulations for systems containing both Cu3 and Cu8 clusters. In contrast, theoretical models based on a single most stable structure, such as Cu8(CO)2O8, underestimate Eapp for CO oxidation, whereas Cu3O4 overestimates it. Both structures lack the MvK reaction mechanism, which highlights the importance of collective structures across different cluster sizes. This insight advances our understanding of the mechanistic collectivity in CO oxidation, where both cluster and interface reactions are important. These findings help explain discrepancies in experiments that suggest either a cluster surface42,44 or an interface reaction mechanism48,51,52,53, depending on the distribution of Cu species and the reaction conditions. The results of the reaction mechanism calculations revealed two concurrent redox cycles, Cu+ ↔ Cu2+ triggered by surface CO/O2 activation and Ce4+ ↔ Ce3+ driven by lattice‒oxygen exchange at the interface (Supplementary Figs. 79–80 and Supplementary Table 14), in quantitative agreement with the experimental results41,42,44.
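To illustrate why an ensemble of sites yields an apparent activation energy that differs from that of any single structure, the following minimal sketch sums Arrhenius rates over two site types and extracts Eapp from the slope of ln r versus 1/T. It is not the microkinetic model used in this work: the populations, prefactors, and barriers are hypothetical values chosen only so that a rare low-barrier site and an abundant high-barrier site contribute comparably near 400 K.

```python
import numpy as np

KB_EV = 8.617333e-5  # Boltzmann constant in eV/K


def ensemble_rate(temperature, populations, prefactors, barriers_ev):
    """Population-weighted sum of Arrhenius site rates at one temperature (K)."""
    p = np.asarray(populations, dtype=float)
    a = np.asarray(prefactors, dtype=float)
    ea = np.asarray(barriers_ev, dtype=float)
    return float(np.sum(p * a * np.exp(-ea / (KB_EV * temperature))))


def apparent_activation_energy(temperatures, rates):
    """Eapp = -kB * d ln(r) / d(1/T), from a linear fit of ln r against 1/T."""
    inv_t = 1.0 / np.asarray(temperatures, dtype=float)
    slope, _ = np.polyfit(inv_t, np.log(rates), 1)
    return -KB_EV * slope


# Hypothetical two-site ensemble (illustrative numbers, not taken from the paper).
temps = np.linspace(380.0, 420.0, 9)
populations = [1e-5, 1.0]   # rare metastable site, abundant stable site
prefactors = [1e13, 1e13]   # s^-1
barriers = [0.5, 0.9]       # eV

rates = [ensemble_rate(t, populations, prefactors, barriers) for t in temps]
e_app = apparent_activation_energy(temps, rates)
print(f"Eapp of the ensemble: {e_app:.2f} eV")
```

In this toy case the fitted Eapp falls between the two individual barriers, which mirrors the qualitative point made above: a model built on one structure alone would over- or underestimate the measured value.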

Direct experimental identification of the collectivity effect and site distribution under operating conditions remains a significant challenge owing to the transient nature of metastable isomers and the dynamic evolution of clusters in heterogeneous catalysis. In contrast, the introduced first-principles microkinetics-based statistical strategy provides a robust approach to address these challenges, offering a systematic and predictive understanding of cluster catalysis beyond the limitations of direct experimental observation. By considering the statistical distributions of isomers and active sites, our models align better with the experimental data: the measured values of CN(Cu‒Cu), CN(Cu‒O), and d(Cu‒O) match the theoretical results with deviations of less than 10%, and d(Cu‒Cu) deviates by less than 20% (Fig. 4f and Supplementary Table 15)42,44,48,51,52,53,54,55,56,57. The agreement between the theoretical and experimental results emphasizes the robustness of the collective active sites in Cu/CeO2 systems, providing a solid foundation for optimizing catalysts. We propose that synthesizing Cu3 and Cu8 clusters on CeO2(111) through mass-selected cluster deposition or atomic layer deposition enhances CO oxidation activity, with further improvements possible by incorporating Ce vacancy sites with Cu2 clusters. Our work challenges the traditional focus on single, well-defined cluster sizes, instead promoting catalyst designs that leverage the collective effects of multiple cluster sizes, isomers, and sites for more robust catalytic performance under realistic conditions.

In summary, by developing a machine learning-enhanced statistical and multiscale simulation framework, we revealed a collective effect of active sites across the surface and interface ensembles over various cluster sizes, compositions, and isomers on the overall activity of supported subnanometer cluster catalysts. This framework encompasses structure distribution under operational reaction conditions, statistical site analysis, pathway calculations, microkinetic modeling, and the development of descriptive models, providing an in-depth scenario with full details that moves beyond traditional approaches. Taking the Cu/CeO2 system as an example, our study demonstrates that the collective behavior of numerous sites across various sizes, compositions and isomers drives CO oxidation, offering a comprehensive and holistic understanding of the catalytic process. The collectivity of the sites stems from both the substantial intrinsic activity and the population of each individual site. Theoretical predictions from our framework have been validated against experimental data from mechanistic, structural and kinetic points of view, reinforcing the robustness of our approach. This work not only deepens our understanding of the structural dynamics of supported cluster catalysts but also redefines the concept of active sites in heterogeneous catalysis, paving the way for the design of more efficient catalytic systems.

Methods

DFT calculations

All spin-polarized DFT calculations were performed with the Vienna Ab Initio Simulation Package (VASP 5.4.1) code58. The projector augmented wave (PAW)59 pseudopotentials and the Perdew–Burke–Ernzerhof (PBE) exchange-correlation functional60 were adopted. The Brillouin zone was sampled either at the Γ point (1 × 1 × 1 mesh) or with a Monkhorst‒Pack61 2 × 2 × 1 mesh. An energy cutoff of 400 eV was used in the structure optimization. The convergence threshold for the energy was set to 10−4 eV, and the optimization was considered converged when the forces on each atom were less than 0.05 eV/Å. To account for the electron localization in the Ce 4f orbitals, the DFT + U method with Ueff = 5 eV62 was employed.

Two O‒Ce‒O layers were employed for determining the structures and potential energy surfaces, with relaxation applied to the topmost O‒Ce‒O layer. A p(4 × 4) CeO2(111) surface cell was employed. The Cu2O and CuO surface models comprised three O‒Cu‒O layers, with the topmost layer relaxed, in a p(2 × 2) supercell. A 15 Å vacuum spacing between adjacent slabs was used to avoid self-interaction. In this study, the strong correlation between Cu and O in the Cu/CeO2 catalyst was disregarded, given that the Ueff value of 6.52 eV63 used for the representative Cu3O4/CeO2 and CuO(111) surface calculations does not influence our conclusions regarding the diagram and activity trend (Supplementary Figs. 82–86). Benchmark calculations indicate that varying the U value for Cu does not influence the trends in stability for the most stable or metastable structures of the representative Cu3Ox/CeO2 system (Supplementary Table 16), confirming the robustness of our stability assessments. Consequently, the U value for Cu was omitted in all the Cu cluster calculations. ANNPs have been rigorously tested and validated for Cu/CeO2 systems across a range of cluster sizes, resulting in errors of less than 4 meV/atom, ensuring the reliability and consistency of the predicted energies, structural parameters, and magnetic states (Supplementary Table 17). All the structures obtained through the ANNP search process were subsequently verified via DFT calculations for accuracy.

Initial guesses for the transition states were generated with ANNPs using the automated nudged elastic band method64 and the dynamic nudged elastic band method (dy-neb)65, starting from paths built via the image-dependent pair potential (IDPP) surface method66. The transition states were then located with a force tolerance of 0.05 eV/Å via multiple methods, including the climbing image nudged elastic band (CI-NEB) method67, the improved dimer method68,69, and the force reversal method70. Vibrational mode analysis was also conducted to validate the identified transition states.

Ab initio molecular dynamics (AIMD) simulations were performed to calculate the radial distribution function (RDF) of different sizes of Cu clusters supported on CeO2. The Nosé‒Hoover thermostat71 was employed for NVT ensemble sampling at 400 K, and each AIMD simulation covered a physical duration of 10 ps.
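For readers who wish to reproduce this type of analysis, a minimal sketch of an RDF calculation between two species (for example Cu and O) is given below. It is not the script used in this work. It assumes an orthorhombic periodic cell with the minimum-image convention and a bulk-like 3D normalization, whereas the slab models used here have a non-orthogonal surface cell and a vacuum layer, so only the peak positions carry over directly. The function and argument names are hypothetical.

```python
import numpy as np


def radial_distribution(frames_a, frames_b, box, r_max, n_bins=100):
    """g(r) between species A and B, averaged over trajectory frames.

    frames_a, frames_b: Cartesian positions in Å, shape (n_frames, n_atoms, 3).
    box: orthorhombic cell lengths in Å (assumption); r_max should not exceed
    half of the shortest cell length. A and B are assumed to be different species.
    """
    frames_a = np.asarray(frames_a, dtype=float)
    frames_b = np.asarray(frames_b, dtype=float)
    box = np.asarray(box, dtype=float)
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)

    for pos_a, pos_b in zip(frames_a, frames_b):
        delta = pos_a[:, None, :] - pos_b[None, :, :]
        delta -= box * np.round(delta / box)  # minimum-image convention
        dist = np.linalg.norm(delta, axis=-1).ravel()
        counts += np.histogram(dist, bins=edges)[0]

    n_frames, n_a = frames_a.shape[:2]
    n_b = frames_b.shape[1]
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density_b = n_b / np.prod(box)
    # Normalize by the pair count expected for an ideal gas of the same density.
    g = counts / (n_frames * n_a * density_b * shell_volumes)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g
```

Called with the Cu and O positions of each AIMD frame, the first peak of the returned g(r) gives an estimate of the average Cu‒O bond length.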

Bader charge analysis72, as implemented by the Henkelman group, was performed. The electronic analysis of the crystal orbital Hamilton populations (COHPs)73,74 was conducted with LOBSTER. Furthermore, the relationship between the transition state energies and the molecule adsorption energies was obtained via the Sure Independence Screening and Sparsifying Operator (SISSO) method43.

The framework

The framework employs a multiscale simulation approach built on embedded atom neural network (EANN) potentials75, which were trained via active learning on a dataset generated from grand canonical Monte Carlo (GCMC) and ab initio molecular dynamics (AIMD) simulations to accurately and efficiently model the complex potential energy surfaces of Cu/CeO2 catalysts. A genetic algorithm (GA) was used for global structure optimization, whereas modified GCMC (M-GCMC) simulations were applied to identify metastable structures under reaction conditions. Finally, microkinetic modeling was used to simulate the CO oxidation reaction kinetics and determine the apparent activation energies and dominant reaction mechanisms.
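One ingredient of any such statistical treatment is the conversion of the relative free energies of the sampled structures into equilibrium populations. The sketch below assumes a simple Boltzmann weighting at a fixed temperature; the actual scheme used in the framework is described in the Supplementary Information and may differ. The free-energy values are hypothetical.

```python
import numpy as np

KB_EV = 8.617333e-5  # Boltzmann constant in eV/K


def boltzmann_populations(free_energies_ev, temperature):
    """Normalized Boltzmann populations from free energies (eV) at T (K)."""
    g = np.asarray(free_energies_ev, dtype=float)
    # Shift by the minimum for numerical stability; the populations are unchanged.
    weights = np.exp(-(g - g.min()) / (KB_EV * temperature))
    return weights / weights.sum()


# Hypothetical relative free energies of three isomers (illustrative only).
pops = boltzmann_populations([0.00, 0.05, 0.20], 400.0)
print(pops)
```

Even a small free-energy difference leaves metastable isomers with a non-negligible population at 400 K, which is why their sites have to be included when the overall rate is assembled.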

More details of the framework methods, such as ANNPs, the GA, GCMC simulations, and microkinetic modeling, can be found in the Supplementary Information.