Introduction

Carbon-based nanomaterials, including fullerenes, nanotubes, and graphene, are widely utilized in various fields for their unique properties that arise from their aromatic carbon frameworks1,2,3,4,5. Among them, fullerenes are polyhedral cage molecules with the general molecular formula C20+2n(n ≥ 0, n ≠ 1) composed of sp2 hybridized carbons arranged in pentagon and hexagon rings. The variation in the arrangement of these rings across the fullerene cages leads to numerous isomeric structures for each fullerene size. For example, C60 fullerene adopts 1812 non-isomorphic structures with different distributions of the 12 pentagon and 20 hexagon rings on a spherical surface. The number of possible isomers for fullerene cages with N = 20 + 2n carbon atoms increases at a rate of O(N9)6, creating an extensive family of fullerenes with a vast array of isomers. The unique zero-dimensional spherical cage structures of fullerenes contribute to their remarkable physical and chemical characteristics, such as high thermal and electrical conductivity, extraordinary tensile strength, and efficient electron donation and acceptance abilities7. Owing to their structural diversity that offers tunable electronic properties and reactivities, fullerenes have been extensively investigated as components in optoelectronics8,9,10,11, solar cells12,13,14, gas storage and separation15,16, biology and medicine17,18,19. Although fullerenes have significant untapped potential in various applications, fully harnessing their capabilities—especially in customizing electronic properties and functionalization for practical uses—requires a thorough understanding of the structure-property relationships across the entire spectrum of fullerene varieties.

The investigation of fundamental properties across different fullerene isomers has been the focus of numerous earlier studies20,21,22,23,24,25,26,27,28. For example, Sure et al. investigated the relative energy distribution among the 1812 isomers of C60 fullerene with the DFT method and correlated the isomeric stability with a variety of topological indices, electronic, and geometric properties. Aside from the common isolated pentagon rule (IPR), they proposed that a small pentagon signature P1, a large volume, and a more spherical cage can lead to a relatively stable isomer23. Zhao et al. explored the origin and characteristics of isomeric stability in four fullerene systems, C44, C48, C52, and C60, using DFT simulations. Through energy decomposition analysis, they found that electrostatic potential is the primary determinant of isomeric stability, surpassing other factors such as steric and quantum effects. They further highlighted the importance of spatial delocalization of the electron density on the stabilization of fullerene isomers29. Chan and Karton studied the size dependence of a variety of electronic properties in small fullerenes from C20 to C50 with 812 isomeric structures, including the energy difference between the lowest-energy singlet and triplet states (i.e., S0 and T1, or T0 and S1), and the related quantities of ionization potential (IP) and electron affinity (EA). They found a linear correlation between triplet-singlet energy difference and IP-EA difference and further surveyed larger fullerenes in search of candidates with superior charge-transfer properties30. Based on our literature review, previous research into the structure-property relationships of fullerenes has been grounded in selected datasets that represent an incomplete array of fullerene isomers, with the bulk of these studies focusing primarily on stability analysis. To the best of our knowledge, there remains a gap in the literature for a comprehensive and accurate examination of the fundamental properties of all fullerene isomers from C20 to C60.

In this study, we have compiled the most extensive computational dataset of C20–C60 fullerenes to date, incorporating a total of 5770 structures. Following our benchmark study, we calculated 12 fundamental properties with DFT-level accuracy, e.g., fullerene binding energy (Eb), HOMO-LUMO gap (Eg), and partition coefficient of solvation free energies in 1,2-dichlorobenzene (ODCB) and water phases as listed in Table 1, and evaluated their Pearson correlation coefficients. The weak correlations between Eg and both Eb and logP suggest that favorable electronic properties can be independently adjusted for particular applications without compromising stability and solubility. To delve deeper into the structural features influencing these properties, we introduced various topological indices and geometric measures, going beyond the IPR. We reveal that compared to pentagon-related features, atom, bond, and hexagon features demonstrate significantly superior effectiveness in capturing the intricate local structural environments of the carbon atoms on the spherical cage. The linear models fitted by those features enable the predictions of the fullerene stability across various sizes and show robust transferability to larger-size fullerenes beyond C60. We further offer a practical guide for researchers interested in screening fullerenes for optimal electron acceptor candidates in organic solar cells. Given the prevalence of data-driven models developed in recent decades, we emphasize that our comprehensive datasets are crucial for model training and new materials discovery in the fullerene field. Our study provides a deep understanding of the structure-property relationships across fullerene isomers, setting a foundation for future advancements in fullerene functionalization and applications in energy conversion and nanosciences.

Table 1 The summary of 12 fundamental properties for the 5770 C20–C60 database calculated at B3LYP level

Results and discussion

Stability analysis

We calculated the binding energy per carbon atom using the atomic carbon in the triplet state as the reference31, which serves as one indicator of the thermodynamic stability of 5770 fullerene structures. However, other parameters to assess thermodynamic stability, such as addition energy32 and activation barrier33, which require more computationally intensive calculations, were not included in this study.

$${E}_{b}({C}_{n})=\frac{1}{n}({E}_{{C}_{n}}-n{E}_{C})$$
(1)

\({E}_{{C}_{n}}\) is the total energy of fullerene Cn with n carbon atoms, whereas EC is the energy of an isolated carbon atom at triplet state. The binding energy distribution indicates a trend of decreasing binding energies as the fullerene cage size increases, as depicted in Fig. 1a. Within the dataset, the renowned C60–Ih buckminsterfullerene (C60–#1), which possesses icosahedral symmetry and is the sole isomer conforming to the IPR, exhibits the lowest Eb of −6.96 eV/atom, comparable to the experimental value of −7.04 eV/atom34. Conversely, the C20–#1 fullerene, comprised solely of 12 pentagons devoid of hexagons, exhibits the highest Eb of −6.12 eV/atom. The statistical analysis of the Eb values is detailed in Table S1, together with the molecular geometries of fullerene isomers shown in Fig. S1.
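As a minimal sketch of Eq. (1), the following Python function evaluates the binding energy per atom from a cage total energy and the energy of an isolated triplet carbon atom. The function name and the numbers in the usage lines are hypothetical placeholders chosen for illustration, not energies from this work; both inputs are assumed to be in eV.

```python
def binding_energy_per_atom(e_cage, e_atom, n_atoms):
    """Eq. (1): E_b = (E_Cn - n * E_C) / n, all energies in eV."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return (e_cage - n_atoms * e_atom) / n_atoms


# Hypothetical inputs (placeholders, not computed DFT energies).
e_atom_example = -1000.0          # isolated carbon atom, eV
e_cage_example = -60417.6         # 60-atom cage, eV
eb_example = binding_energy_per_atom(e_cage_example, e_atom_example, 60)
print(f"E_b = {eb_example:.2f} eV/atom")
```

A more negative value indicates a more strongly bound cage, which is the sign convention used throughout this section.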

Fig. 1: Size-dependence analysis on 6 fundamental properties.

Violin plots for the distributions of a binding energy Eb (eV/atom), b HOMO-LUMO gap Eg (eV), c dipole moment μ (D), d, e solvation free energy in water ΔGsol(water) and 1,2-dichlorobenzene (ODCB) ΔGsol(ODCB) (kJ/mol), f ODCB–water partition coefficient logP from fullerene C20 to C60.

To assess the accuracy of the DFT-calculated energies, we benchmarked the energies of ten C60 isomers relative to C60–#1 against the DLPNO-CCSD(T)/CBS* reference results adopted from the work of Sure et al.23. The geometries of these ten isomers are shown in Fig. S2. The DLPNO-CCSD(T)/CBS* relative energies are expected to have an error of ±0.5 kcal/mol, which makes them suitable as a benchmark35,36. The comparison reveals that the B3LYP-D3/6-311G* results reproduce the ranking of relative energies for the ten fullerene isomers, with small deviations from the DLPNO-CCSD(T)/CBS* results (Table S2). To ensure statistical significance, we further selected one isomer from each fullerene size from C20 to C58 for Eb calculations at the B3LYP-D3/6-311G* level to benchmark against DLPNO-CCSD(T)/CBS*. As shown in Table S3, the relative binding energies (ΔEb) with respect to C60–#1 obtained using B3LYP-D3/6-311G* are consistent with the DLPNO-CCSD(T)/CBS* results across a diverse set of fullerenes.

The histogram illustrating the binding energy distribution among the 5770 fullerene structures reveals that 80% are within the energy range of −6.85 to −6.72 eV/atom, with an average binding energy of −6.79 ± 0.06 eV/atom (Fig. 2a). This indicates that most of the fullerene structures are much more stable than C20. Prior research has shown that C20 can be synthesized in the gas phase37, indicating the possibility of synthesizing other more stable fullerene structures. In addition, other smaller fullerenes like C36 have been synthesized experimentally using the arc-discharge method38. Functionalization, both endohedral and exohedral, has been shown to enhance the stability of non-IPR fullerene isomers, exemplified by compounds such as C50Cl1039, C60Cl12, and C60Cl840. We anticipate that our comprehensive computed binding energy will provide a reliable benchmark for future theoretical and experimental research into fullerene synthesis and functionalization.

Fig. 2: Histogram distributions of 6 fundamental properties.

Histogram plots for the distributions of a binding energy Eb (eV/atom), b HOMO-LUMO gap Eg (eV), c dipole moment μ (D), d, e solvation free energy in water ΔGsol(water) and 1,2-dichlorobenzene (ODCB) ΔGsol(ODCB) (kJ/mol), f ODCB–water partition coefficient logP from fullerene C20 to C60. The star marks highlight the values of C60–#1.

Electronic property analysis

Frontier molecular orbitals, i.e., HOMO and LUMO levels, along with HOMO-LUMO gap (Eg), are critical properties in evaluating the electronic and optical properties, electric conductivity, charge transport, and chemical reactivity of materials. Specifically, understanding the distribution of these fundamental properties across various sizes of fullerenes is essential for designing and optimizing fullerenes and their derivatives with specific requirements, such as improved conductivity, enhanced light absorption, or more favorable reactivity. In contrast to the distribution of Eb values, the distribution of Eg exhibits a lower dependence on the size of the fullerene cage, as shown in Fig. 1b. The histogram plot in Fig. 2b reveals that approximately 80% of Eg values are within the range of 0.97–1.54 eV, with the maximum gap being 2.72 eV (C60–#1) and minimum at 0.41 eV (C60–#1748), calculated at the B3LYP level (also see Table S1 and Fig. S1). The distributions of HOMO and LUMO also exhibit weak dependence on the fullerene size, and the details can be found in Fig. S3.

Currently, fullerene C60, C70, C84 and their derivatives have been widely explored as the acceptor to pair with conjugated polymer donors in organic solar cells (OSCs)12,41,42,43. To offer a practical guide for researchers in the field of OSCs, we applied Scharber’s model44,45,46 to assess the photovoltaic performance of the fullerenes. The power conversion efficiency (PCE) of an OSC device under sunlight irradiation is determined by open-circuit voltage (VOC), short-circuit current density (JSC), fill factor (FF), and the power of incident light Pin with:

$$PCE=\frac{{P}_{out}}{{P}_{in}}=\frac{{V}_{OC}{J}_{SC}FF}{{P}_{in}}\times 100 \%$$
(2)

Pin is calculated by integrating the AM1.5 spectra47 across the wavelength, resulting in an approximate value of 1000 W/m2. VOC can be determined by the HOMO level of polymer donor (\({E}_{HOMO}^{D}\)) and the LUMO level of fullerene acceptor (\({E}_{LUMO}^{A}\)) as follows:

$${V}_{OC}=\frac{1}{e}| {E}_{HOMO}^{D}-{E}_{LUMO}^{A}| -0.3$$
(3)

JSC is assumed to be the current from absorbing all the incident photons above the Eg of the conjugated polymer donor as follows:

$${J}_{SC}=EQE\mathop{\int}\nolimits_{\!\!{E}_{g}}^{\infty }e{\Phi }_{ph}(E)dE$$
(4)

in which the external quantum efficiency EQE is set to 0.65, and Φph represents the incident solar photon flux density as a function of energy E. In addition, FF in Eq. (2) is set to 0.65. Furthermore, in fullerene-based OSCs, energy offsets exceeding 0.3 eV between the HOMO levels of the polymer donor and fullerene (ΔEHOMO), as well as between the LUMO levels of the polymer donor and fullerene (ΔELUMO), are essential to provide the necessary driving force for efficient exciton dissociation42. Scharber’s model has been extensively applied in high-throughput screening and the design of OSC devices48, for example in the Harvard Clean Energy Project database introduced by Hachmann et al.46.
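Equations (2)–(4) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions rather than the code used in this work: the function names are hypothetical, the photon flux is assumed to be supplied as a table on an ascending energy grid in photons/(m2 s eV), the integral is truncated at the last tabulated energy, and the flat spectrum and the acceptor LUMO in the usage lines are placeholders rather than AM1.5 or dataset values.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def open_circuit_voltage(e_homo_donor, e_lumo_acceptor, loss=0.3):
    """Eq. (3): V_OC in volts from orbital energies given in eV."""
    return abs(e_homo_donor - e_lumo_acceptor) - loss


def short_circuit_current(energies, photon_flux, gap, eqe=0.65):
    """Eq. (4): J_SC in A/m^2 by trapezoidal integration above the gap.

    energies    -- ascending photon energies in eV
    photon_flux -- flux density at those energies, photons/(m^2 s eV)
    gap         -- donor gap in eV (lower integration limit)
    """
    total = 0.0
    points = list(zip(energies, photon_flux))
    for (e0, f0), (e1, f1) in zip(points, points[1:]):
        if e1 <= gap:
            continue  # segment lies entirely below the gap
        if e0 < gap:
            # clip the segment at the gap using linear interpolation
            f0 = f0 + (f1 - f0) * (gap - e0) / (e1 - e0)
            e0 = gap
        total += 0.5 * (f0 + f1) * (e1 - e0)
    return eqe * E_CHARGE * total


def pce_percent(v_oc, j_sc, fill_factor=0.65, p_in=1000.0):
    """Eq. (2): PCE in percent, with P_in in W/m^2."""
    return v_oc * j_sc * fill_factor / p_in * 100.0


# Placeholder inputs: a flat toy spectrum and a hypothetical acceptor LUMO.
grid = [1.0, 2.0, 3.0, 4.0]
flux = [1.0e21] * 4
v_oc = open_circuit_voltage(-4.85, -3.50)
j_sc = short_circuit_current(grid, flux, gap=2.5)
print(pce_percent(v_oc, j_sc))
```

In practice the tabulated AM1.5 spectrum would replace the toy grid; the defaults for EQE, FF, and Pin follow the values quoted in the text.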

Here we employ the widely used polymer P3HT as an example donor material. We represented the polymer by an oligomer comprising 8 thiophene rings, with all alkyl side chains replaced by hydrogen atoms, as shown in Fig. 3b. At the B3LYP level, this oligomer exhibits a HOMO level of −4.85 eV and a LUMO level of −2.48 eV from our calculations. Accordingly, given the 0.3 eV offsets, the ideal fullerene acceptor should possess a HOMO level of approximately −5.15 eV and a LUMO level of −2.78 eV (inset of Fig. 3a). The estimated PCE of the C60–Ih (C60–#1) based OSC is about 8.1%, which is consistent with the experimental value of ~5%49,50. By applying the aforementioned requirements as screening conditions to the fullerene database, 2771 out of 5770 fullerene isomers could be potential candidates to pair with P3HT in OSC devices (see Fig. 3a). The highest PCE value, 9.6%, is achieved by C20–#1, which is shown in Fig. 3b together with the next three top-performing fullerene structures.
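The offset-based screening condition can be expressed as a simple filter. The sketch below assumes that the acceptor levels must lie at least 0.3 eV below the corresponding donor levels; the function name and the three candidate entries are hypothetical and do not correspond to isomers in the database.

```python
def passes_offset_screen(acc_homo, acc_lumo,
                         donor_homo=-4.85, donor_lumo=-2.48,
                         min_offset=0.3):
    """Return True if both donor-acceptor level offsets reach min_offset (eV)."""
    delta_homo = donor_homo - acc_homo   # positive when acceptor HOMO is deeper
    delta_lumo = donor_lumo - acc_lumo   # positive when acceptor LUMO is deeper
    return delta_homo >= min_offset and delta_lumo >= min_offset


# Hypothetical (HOMO, LUMO) pairs in eV, for illustration only.
candidates = {
    "hypothetical-A": (-6.00, -3.50),
    "hypothetical-B": (-5.00, -3.50),   # HOMO offset too small
    "hypothetical-C": (-6.00, -2.60),   # LUMO offset too small
}
selected = [name for name, (homo, lumo) in candidates.items()
            if passes_offset_screen(homo, lumo)]
print(selected)
```

Levels that sit exactly on the 0.3 eV boundary are sensitive to floating-point rounding, so a small tolerance may be advisable when filtering computed orbital energies.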

Fig. 3: Estimated efficiency of fullerene-based organic solar cells.

a The power conversion efficiency (PCE) of fullerene-based OSCs with P3HT as donor estimated by Scharber’s model (inset: Energy level diagram for ideal fullerene acceptors aligned with P3HT polymer donor in OSC devices. The HOMO level offsets (ΔEHOMO) and LUMO level offsets (ΔELUMO) between P3HT and fullerenes are ≥0.3 eV). b Chemical structures of P3HT oligomer and four fullerenes with the highest PCE values predicted by Scharber’s model (plotted with Jmol).

Dipole moment stands as another critical property for OSC materials, which affects intermolecular interactions and the morphology of the organic material film. Previous studies have highlighted the effectiveness of strategically selecting and designing materials with appropriate dipole moments to enhance charge separation, promote efficient exciton dissociation, optimize energy level alignment at interfaces, and consequently improve the overall efficiency and stability of OSCs51,52,53,54. The dipole moment distribution of fullerenes from C20 to C60 also shows weak dependence on the size of the fullerene cage (Fig. 1c). The majority of fullerenes display a permanent dipole moment that is non-negligible. For example, among 5770 fullerene structures, only around 200 exhibit a zero dipole moment, as shown in Fig. 2c. Notably, fullerene C60–#1646 exhibits the largest dipole moment of 3.98 D among all the fullerenes (see Table S1).

In addition to Eg, other fundamental quantities related to electronic properties have also been computed, such as the fundamental gap Efund between IP and EA, and the energy difference between triplet and singlet states ΔETS, which are also crucial for the stability and catalytic activity of materials. As shown in Fig. S4, Eg values show strong correlations with both Efund and ΔETS across the C20–C60 database, with R2 values exceeding 0.9, whereas the linear correlation between the fundamental gap and triplet-singlet energy difference exhibits cage size dependence with a lower R2 of 0.774. Indeed, Table S4 reveals that, for fullerene cages of the same size, the R2 values among Eg, Efund, and ΔETS all exceed 0.94, with the exception of C36, which is consistent with previous studies30.

Besides its significance in photovoltaic applications, Eg is also connected to other essential properties in electrochemistry. For instance, Eg is linked to the redox potential, a key factor in processes such as cyclic voltammetry (CV), energy storage in batteries and fuel cells, and electrocatalysis. Previous studies have demonstrated that, across a series of five fullerenes, the experimentally determined redox potential strongly correlates with the calculated Eg55. We are confident that our extensive dataset of fullerenes, with computed HOMO, LUMO, and Eg, can assist in the selection of potential fullerene candidates for solar cells and electrochemical applications.

Solubility and ODCB–water partition coefficient analysis

Solubility is an essential factor for the practical utilization of fullerenes. It is widely recognized that fullerenes are insoluble in water but exhibit high solubility in organic solvents56,57. For example, the solubility of C60 is 24 g/L in 1,2-dichlorobenzene (ODCB) solution, compared to 1.3 × 10−11 g/L in water57. In this work, we calculated the Gibbs free energy of solvation (ΔGsol) in ODCB and water for our fullerene dataset. The ΔGsol is computed as the electronic energy difference between the fullerene molecule in solvent phase E(solvent) and gas phase E(gas) as given in the equation with other terms in free energies ignored:58

$$\Delta {G}_{sol}=E(solvent)-E(gas)$$
(5)

We observed that both ΔGsol(ODCB) and ΔGsol(water) correlate linearly with the size of the fullerene cage; larger cages correspond to lower solvation free energies (Fig. 1d, e). Previous studies have also found that the solvation free energies of carbon allotropes, including C60 and carbon nanotubes, exhibit a negative linear correlation with their surface area59. The correlation between solvation free energy and geometric measures will be addressed in a later section.

The histogram depicting the distribution of ΔGsol(water) in Fig. 2d indicates that around 80% of ΔGsol(water) values are within the range of −48.81 to −42.10 kJ/mol, with an average value of −45.74 ± 2.88 kJ/mol (also see Table S1). The maximum ΔGsol(water), observed for C20–#1, is −21.32 kJ/mol, while C56–#3 exhibits the minimum at −67.74 kJ/mol. In particular, the ΔGsol(water) value for the C60–#1 molecule is −43.85 kJ/mol. Previous studies using various computational methodologies reported values of −55.27 kJ/mol by Varanasi et al.60, −54.1 kJ/mol by Garde et al.61, −50.9 kJ/mol by Kevin et al.59, −36.10 kJ/mol by Muthukrishnan et al.62, −18.4 kJ/mol by Graziano63, and −2.9 kJ/mol by Evgeny et al.64. It is important to note that these values exhibit significant variance, with most diverging considerably from the experimental value of −17.4 kJ/mol64, underscoring the challenges in accurately predicting the ΔGsol(water) values for fullerenes. In contrast, the histogram illustrating the distribution of ΔGsol(ODCB) in Fig. 2e shows a skewed pattern, suggesting a pronounced preference of fullerene molecules for organic solvents. Approximately 80% of the values fall within the range of −162.08 to −139.90 kJ/mol (Fig. 2e). Similar to ΔGsol(water), C20–#1 and C56–#3 exhibit the highest and lowest ΔGsol(ODCB) values of −76.21 and −168.41 kJ/mol, respectively (see Table S1). Meanwhile, C60–#1 shows a relatively low ΔGsol(ODCB) value of −158.58 kJ/mol.

To indicate the partitioning of fullerenes between ODCB and water phases, we calculated the ODCB–water partition coefficient logP as given in equation:65,66

$$\log P\,=\,-\frac{{\Delta G}_{sol}(ODCB)-{\Delta G}_{sol}(water)}{2.303RT}$$
(6)

in which R is the gas constant (8.31 J/(mol K)) and T is the room temperature (298.15 K). The partition coefficient serves as an indicator for predicting the lipophilicity of molecules. A higher logP value suggests a stronger propensity for dissolution in non-polar organic solvents as opposed to the polar aqueous phase. For fullerenes, logP demonstrates a significant correlation with cage size, as illustrated in Fig. 1f. Fullerene molecules with favorable solubility in ODCB typically exhibit logP values significantly greater than 0. Among the 5770 fullerene isomers, 80% of logP values fall in the range of 17.1–19.99 with an average value of 18.8 ± 1.32 (Fig. 2f). The maximum and minimum logP values are 20.22 for C60–#1790 and 9.61 for C20–#1, respectively (see Table S1). The logP value for C60–#1 stands at 20.09, with ΔGsol(water) and ΔGsol(ODCB) calculated as −43.85 and −158.58 kJ/mol, respectively, signifying a pronounced preference of C60–#1 molecule for the ODCB solvent over water. This finding is consistent with the previous studies. For example, Evgeny et al. reported the ΔGsol(water) and ΔGsol(ODCB) as −2.9 and −123.9 kJ/mol, respectively, calculated by effective Hamiltonian methods, which gives the logP value of 21.19. They also obtained the experimental solvation free energies of −17.4 and −127.6 kJ/mol for C60–#1 in water and ODCB, which results in a logP value of 19.2964. Our findings suggest that while the absolute ΔGsol values may vary from experimental data, the logP value–derived from the relative difference between ΔGsol(water) and ΔGsol(ODCB)–consistently reflects the lipophilicity of molecules.
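As a worked illustration of Eq. (6), the short Python sketch below converts a pair of solvation free energies into logP. The function name is hypothetical, the factor 2.303 is taken from Eq. (6), and the value of R carries more digits than the rounded 8.31 J/(mol K) quoted above, so the last decimal of the result may differ slightly from the tabulated values.

```python
GAS_CONSTANT = 8.314462618  # J/(mol K)


def log_p(dg_odcb_kj, dg_water_kj, temperature=298.15):
    """Eq. (6): ODCB-water partition coefficient from solvation free energies.

    Both free energies are in kJ/mol; temperature is in kelvin.
    """
    delta_j = (dg_odcb_kj - dg_water_kj) * 1000.0  # kJ/mol -> J/mol
    return -delta_j / (2.303 * GAS_CONSTANT * temperature)


# Solvation free energies of C60-#1 quoted in the text (kJ/mol).
print(round(log_p(-158.58, -43.85), 2))
# Experimental solvation free energies quoted in the text (kJ/mol).
print(round(log_p(-127.6, -17.4), 2))
```

Because only the difference between the two solvation free energies enters, a systematic shift common to both solvents cancels, which is why logP can agree with experiment even when the individual ΔGsol values do not.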

For the practical application of fullerenes in energy conversion, storage, electrochemistry, and nanoelectronics, it is essential to concurrently evaluate multiple key properties. Here we examine the Pearson correlation coefficients (r) among 12 fundamental properties for C20–C60 fullerenes (see Fig. S5). Particularly, the correlations among stability, electronic properties, and solubility were assessed by calculating the r values among Eb, Eg, and logP. As shown in Fig. 4a, Eb is inversely related to logP with r = −0.79. This suggests that the greater stability, indicated by a lower binding energy, is associated with enhanced solubility in the ODCB phase. In comparison, the Eg value exhibits a weak correlation with both binding energy and logP, with r = 0.13 and −0.24, respectively. These findings imply that it is possible to attain high stability, desirable solubility, and an optimal Eg concurrently for targeted applications. The density plot in Fig. 4b encapsulates the correlation between these properties and facilitates the identification of potential candidates for customizing properties to suit specific applications. For instance, when considering four fullerenes with the highest PCE values predicted by Scharber’s model (see Fig. 3b), C52–#333 and C60–#60 emerge as more suitable fullerene acceptors compared to C20–#1 and C32–#5 owing to their superior stability and solubility.
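For reference, the Pearson correlation coefficient used above can be computed as in the following minimal Python sketch. The function name is an assumption made for this example, and the short input lists are toy values rather than entries from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy data: a perfectly anti-correlated pair of properties.
toy_eb = [-6.2, -6.5, -6.8]
toy_logp = [12.0, 15.0, 18.0]
print(pearson_r(toy_eb, toy_logp))
```

Values of r close to ±1 indicate a nearly linear relationship, whereas values near 0, as found between Eg and the other two properties, indicate that the quantities can vary largely independently.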

Fig. 4: Correlation analysis among stability, electronic property, and solubility of fullerenes.

a Pearson correlation coefficients and b contour plot of binding energy Eb (eV/atom), HOMO-LUMO gap Eg (eV), and logP. The yellow dots highlight the values of four fullerenes with the highest power conversion efficiency (PCE) values predicted by Scharber’s model. (The chemical structures are depicted in Fig. 3b).

Correlations of topological features and indices with fundamental properties

So far, we have constructed a comprehensive dataset of C20–C60 fullerenes with 5770 structures and provided 12 DFT-level fundamental properties, including Eb, Eg, and logP. Previous studies have shown that the IPR can be employed to identify a set of stable isomers among all possible isomers for a given fullerene size67. Additionally, topological indices such as the first-moment pentagon signature P1 and the second-moment hexagon signature H2 have also been used to pinpoint the most stable isomers from a set of IPR-compliant isomers for a given fullerene size23,68. However, to the best of our knowledge, a generalized rule for estimating the stability, electronic properties, and solubility across the entire spectrum of potential fullerene isomers of varying sizes has not yet been established. Leveraging the comprehensive dataset computed in our study, we will examine the Eb, Eg, and logP values with various topological features, indices, and geometric measures.

To characterize the chemical environment of carbon atoms, C–C bonds, pentagons, and hexagons distributed across the fullerene cage, we introduced 4 types of topological features, as described in the “Methods” section. Linear regression has been used to fit Eb, Eg, and logP using these topological features (Fig. 5 and Table 2). We observed that the atom, bond, and hexagon features exhibit substantial linear correlations with Eb, resulting in R2 values exceeding 0.96 (Fig. 5a, b, d). For example, the linear fitting between Eb and atom features, i.e., n0, n1, n2, and n3, can be expressed as follows:

$${E}_{b}\ (eV/atom)={c}_{0}{n}_{0}+{c}_{1}{n}_{1}+{c}_{2}{n}_{2}+{c}_{3}{n}_{3}+b$$
(7)

We found that c0 equals −4.9 meV/atom, indicating that the incorporation of an additional carbon atom fused by three hexagons (thereby increasing n0 by 1) into the fullerene cage will slightly decrease Eb by 4.9 meV/atom. This result aligns with the fact that the growth of sp2 carbons in a honeycomb lattice is thermodynamically favorable69,70,71. We have also noted that the coefficients c1, c2, and c3 are positive and proportional to each other, following the relation \({c}_{1}=\frac{1}{2}{c}_{2}=\frac{1}{3}{c}_{3}\). Additionally, c1 is approximately 13 times larger than the absolute value of c0. Consequently, the linear fitting formula can be rewritten as follows:

$${E}_{b}\ (eV/atom)={c}_{1}\left(\frac{{c}_{0}}{{c}_{1}}{n}_{0}+{n}_{1}+2{n}_{2}+3{n}_{3}\right)+b$$
(8)

Given that n1, n2, and n3 denote the count of carbon atoms fused to 1, 2, and 3 adjacent pentagons, respectively, Eb will increase proportionally with the number of carbon atoms fused in pentagons, suggesting that the substitution of a hexagon with a pentagon will significantly diminish the thermodynamic stability of fullerene structures. Furthermore, the relationship of \({c}_{1}=\frac{1}{2}{c}_{2}=\frac{1}{3}{c}_{3}\) implies that as multiple pentagons fuse together (denoted by n2 and n3), Eb will increase dramatically, with the slope doubling or tripling. These findings are consistent with the IPR concept that fullerene isomers with isolated pentagons tend to be more stable. Our interpretations of the values of c0, c1, c2, and c3 contribute to a deeper comprehension of the relationship between structure and stability in fullerene molecules ranging from C20 to C60. In contrast, the pentagon features display a lower R2 of 0.85 (Fig. 5c). The slightly inferior fit from pentagon features stems from their dependence on cage size. This is evidenced by the results obtained from analyzing only 1812 C60 isomers, where the R2 value improved to 0.92 with pentagon features (Fig. S6 and Table S2).

Fig. 5: Linear fitting between topological features and fundamental properties of fullerenes.

Parity plots of the DFT-calculated binding energy (Eb (eV/atom)) (left panel), HOMO-LUMO gap (Eg (eV)) (middle panel), and ODCB–water partition coefficient (logP) (right panel) of all 5770 C20–C60 fullerenes versus the values predicted by linear regression based on a–c atom features, d–f bond features, g–i pentagon features, and j–l hexagon features. The color bar indicates varying fullerene sizes.

Table 2 Coefficient of determination (R2) obtained from linear fitting of topological features and geometric measures with fundamental properties, i.e., binding energy (Eb (eV/atom)), HOMO-LUMO gap (Eg (eV)), and ODCB–water partition coefficient (logP) for the 5770 C20–C60 database

To evaluate the transferability of the linear fitting model–defined as its capacity to learn and apply the correlations between topological features and binding energy in the dataset of C20–C60 fullerenes to larger ones–we utilized the linear fitting equations derived from the C20–C60 dataset to predict the Eb values of 15 larger IPR-conforming fullerenes (C70–C100, excluding C88) sourced from Yoshida’s Fullerene Library (see Fig. S7)72. The predictions obtained from atom, bond, and hexagon features demonstrate a consistent trend of decreasing Eb values with the increase in fullerene cage size, with Eb values closely matching the results from B3LYP-D3/6-311G* calculations (Table 3). This consistency underscores the robust transferability of the linear model fitted by the C20–C60 dataset. When applying the linear model based on pentagon features, we observed that it assigned identical Eb values to these 15 IPR-conforming fullerenes. In essence, pentagon features fail to distinguish the stability of IPR-conforming fullerene molecules, as they all share the same p0 value of 12 with other pentagon-related features (p1 to p7) set to 0 (Table S6). Such findings highlight the limitations of relying solely on pentagon features, as well as the IPR, for capturing the nuanced chemical environments of fullerenes across a range of sizes and topologies.

Table 3 The binding energies (Eb) of 15 large-size fullerenes (Cn) calculated with B3LYP-D3/6-311G* compared to the values predicted by linear fitting models with atom, bond, pentagon, and hexagon features respectively across C20–C60 fullerenes, together with various statistical measures, i.e., the mean deviation (MD), the overall mean absolute deviation (MAD), the standard deviation (SD), and the maximal absolute error (MAX)
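The statistical measures named in the caption of Table 3 can be evaluated as in the following Python sketch. It assumes that deviations are defined as predicted minus reference values and that SD denotes the sample standard deviation of the signed deviations; neither convention is specified above. The function name and the numbers in the usage lines are placeholders, not entries from Table 3.

```python
import statistics


def error_statistics(predicted, reference):
    """Return MD, MAD, SD, and MAX for predicted vs. reference values."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    errors = [p - r for p, r in zip(predicted, reference)]
    abs_errors = [abs(e) for e in errors]
    return {
        "MD": statistics.fmean(errors),        # mean (signed) deviation
        "MAD": statistics.fmean(abs_errors),   # mean absolute deviation
        "SD": statistics.stdev(errors),        # sample standard deviation
        "MAX": max(abs_errors),                # maximal absolute error
    }


# Placeholder values in eV/atom, for illustration only.
stats = error_statistics([-7.00, -7.02, -7.05], [-7.01, -7.02, -7.03])
print(stats)
```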

According to the previous study, the first-moment pentagon signature P1 and the first- and second-moment hexagon signatures H1 and H2 are highly correlated with the stability of C60 fullerenes23. We further examined these three signatures for the extended dataset from C20 to C60 and broadened the properties from binding energy to include Eg and logP (Table S7 and Fig. S8). Notably, within a group of fullerenes of identical size, a robust linear correlation emerges between the topological signatures and the binding energies (Fig. S9), aligning with findings from previous reports23. For example, the linear relationship between Eb and P1 for the 1812 C60 isomers is described as:

$${E}_{b}\ (eV/atom)=-6.9583+0.0167\,{P}_{1}$$
(9)

The Eb value of C60–#1 predicted by linear fitting is −6.9583 eV/atom as P1 equals 0, closely matching the DFT-calculated value of −6.96 eV/atom. Introducing each pentagon-pentagon fusion (thereby incrementing P1 by 1) results in an energy penalty of 0.0167 eV/atom (0.385 kcal/(mol atom)) to Eb, equivalent to approximately 23.1 kcal/mol for C60 fullerenes. Our results are in good agreement with the value of 20–25 kcal/mol reported by Sure et al.23. However, it is essential to note that this linear correlation does not extend consistently across fullerenes with differing sizes (Fig. S8a–c).
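The arithmetic behind these numbers can be reproduced with a few lines of Python. The sketch below uses the fitted intercept and slope of Eq. (9) and the standard conversion 1 eV ≈ 23.0605 kcal/mol; the function names are hypothetical.

```python
EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV per particle expressed in kcal/mol


def eb_from_p1(p1, intercept=-6.9583, slope=0.0167):
    """Eq. (9): binding energy (eV/atom) of a C60 isomer from its P1 value."""
    return intercept + slope * p1


def fusion_penalty_kcal_per_mol(n_atoms=60, slope=0.0167):
    """Total energy penalty per unit increase in P1 for an n-atom cage."""
    return slope * EV_TO_KCAL_PER_MOL * n_atoms


print(eb_from_p1(0))                             # C60-#1, for which P1 = 0
print(round(fusion_penalty_kcal_per_mol(), 1))   # penalty per fusion, kcal/mol
```

Multiplying the per-atom slope by the 60 atoms of the cage is what converts the seemingly small 0.0167 eV/atom into a penalty of roughly 23 kcal/mol per pentagon-pentagon fusion.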

Unlike Eb, Eg shows no linear relationship with either the topological features (Fig. 5e–h and Table 2) or the topological signatures (Fig. S8d–f and Table S7). The R2 values for the correlation between Eg and the topological features fall below 0.2. This weak correlation persists when only C60 isomers are considered (Table S2 and Fig. S6e–h), suggesting that linear models based on topological indices cannot effectively predict the electronic properties of fullerenes.

As implied by the correlation between Eb and logP, the solubility of fullerenes also exhibits a strong linear relationship with the atom, bond, and hexagon features, which share an identical R2 value of 0.996 (Table 2 and Fig. 5i, j, l). However, when only C60 isomers are considered, the respective R2 values drop to 0.297, 0.353, and 0.448 (Table S2 and Fig. S6i, j, l). These findings underscore the strong relationship between logP values and fullerene cage size, which is well captured by topological features. Yet, within fullerenes of the same size, logP values do not show a linear correlation with these topological features. Regarding pentagon features, the low R2 value of 0.324 for logP in the C20–C60 dataset (Fig. 5k) indicates a limited ability of pentagon features to correlate with the fullerene cage size. This is consistent with the fact that the number of pentagons in a fullerene molecule remains fixed at 12, regardless of cage size, while the counts of carbon atoms, C–C bonds, and hexagons increase with cage size. This weak correlation with logP persists when focusing exclusively on C60 (Fig. S6k).

We further evaluated the relationships between logP and topological signatures. As illustrated in Fig. S8g–i, the hexagon-derived signatures, H1 and H2, display a stronger correlation with logP compared to the pentagon-based signature P1 across the C20–C60 dataset, demonstrating that hexagon features are more adept at capturing the properties of fullerenes across different sizes than pentagon features. Nevertheless, among fullerenes of the same size, logP values do not exhibit a linear correlation with any of the topological signatures (see Fig. S9g–i).

It is noted that the sum of pentagon features equals 12 for any given fullerene structure (\(\mathop{\sum }\nolimits_{i = 0}^{7}{p}_{i}=12\)), while atom, bond, and hexagon features are associated with the carbon count. For example, the sum of atom features equals N for a fullerene structure with N carbon atoms (\(\mathop{\sum }\nolimits_{i = 0}^{3}{n}_{i}=N\)). When fitting fullerenes of varying sizes (carbon count N ranging from 20 to 60), {n0, n1, n2, n3} are linearly independent. However, for a given fullerene size, such as the C60 dataset with 1812 structures, the atom features {n0, n1, n2, n3} are linearly dependent due to the sum \(\mathop{\sum }\nolimits_{i = 0}^{3}{n}_{i}=60\) being a constant.
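The dependence argument can be illustrated numerically by comparing the column rank of a small feature matrix at fixed size with that of a mixed-size one. The sketch below is illustrative only: apart from C60-Ih and C20, the rows are hypothetical vectors chosen to satisfy the two counting constraints, namely that the features sum to N and, assuming n_i counts atoms shared by exactly i pentagons, that n1 + 2 n2 + 3 n3 = 60 (12 pentagons with 5 vertices each). They are not taken from the dataset.

```python
import numpy as np

# Atom-feature vectors (n0, n1, n2, n3). Each row sums to 60 and satisfies
# n1 + 2*n2 + 3*n3 = 60; only the first row is a known structure.
same_size = np.array([
    [0, 60, 0, 0],    # C60-Ih: every atom belongs to exactly one pentagon
    [2, 56, 2, 0],    # hypothetical C60 isomer
    [4, 54, 0, 2],    # hypothetical C60 isomer
    [5, 51, 3, 1],    # hypothetical C60 isomer
    [10, 40, 10, 0],  # hypothetical C60 isomer
])

# Add C20, in which every atom is shared by three pentagons (row sum = 20).
mixed_size = np.vstack([same_size, [0, 0, 0, 20]])

rank_same = np.linalg.matrix_rank(same_size)
rank_mixed = np.linalg.matrix_rank(mixed_size)

# Fewer independent columns than features means the features are linearly dependent.
print("fixed size C60: rank", rank_same, "of", same_size.shape[1], "columns")
print("mixed sizes:    rank", rank_mixed, "of", mixed_size.shape[1], "columns")
```

At fixed size the four columns span only three dimensions, so a linear fit cannot assign unique coefficients to all four atom features. Adding a cage of a different size removes that degeneracy.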

Correlations of geometric measures with fundamental properties

We also investigated a series of geometric measures, as described in “Methods,” and their correlations with fundamental properties. The coefficient of determination R2 from linear fitting between geometric measures and fundamental properties can be found in Table 2. Across the entire fullerene dataset, \(\bar{d}\), \(\bar{\kappa }\), and A/V ratio exhibit strong linear correlations with fullerene binding energy Eb, with R2 values of 0.946, 0.839, and 0.878, respectively (Fig. S10a, c, f). Other geometric measures, including Fasym, V, A, and DIPQ, which assess the sphericity of the cage, exhibit a dependence on the fullerene size, as evident from the clustering of Eb values across various fullerene sizes (Fig. S10b, d, e, g). For example, when considering only the 1812 C60 isomers, the R2 values of Eb fitted with Fasym, V, A, and DIPQ are 0.714, 0.85, 0.627, and 0.875, respectively (Table S2 and Fig. S11b, d, e, g), consistent with previous findings23. These findings align with the observation that fullerene cages with larger volumes and more spherical shapes undergo reduced strain, leading to enhanced stability. However, it is crucial to note that this linear relationship does not extend uniformly across groups of fullerenes with differing sizes. When considering the entire fullerene dataset, these R2 values decrease to 0.092, 0.721, 0.595, and 0.707 (Table 2 and Fig. S10b, d, e, g).

In contrast to Eb, Eg displays no correlation with any of the geometric measures collected in this study, yielding R2 values below 0.1 (Table 2 and Fig. S12). Importantly, this low correlation is independent of the cage size of fullerenes (see Table S2 and Fig. S13).

Additionally, we calculated the R2 values for logP. As shown in Table 2 and Fig. S14, across the entire C20–C60 fullerene database, both V and A exhibit strong correlations with the logP values of fullerenes, yielding R2 values of 0.977 and 0.995, respectively. Given that the V and A of a fullerene cage depend on its size, these findings align with our earlier discussion that logP strongly correlates with fullerene cage size. Interestingly, the R2 values of both V versus logP and A versus logP for the 1812 C60 isomers decrease significantly to 0.32 and 0.064, as presented in Table S2 and Fig. S15, suggesting that V and A are inadequate to differentiate the solubility of fullerenes of identical size. \(\bar{\kappa }\) and the A/V ratio also exhibit high correlations with logP values, with R2 values of 0.919 and 0.85 for C20–C60, respectively (see Fig. S14c, f). The observed similarity in the distributions of logP when plotted against \(\bar{\kappa }\) and the A/V ratio suggests a strong correlation between A/V and \(\bar{\kappa }\). In fact, \(A/V=3\bar{\kappa }\) for a perfect sphere.

In summary, the stability of fullerenes with different numbers of carbon atoms can be inferred from topological attributes such as atom, bond, and hexagon features, alongside geometric measures like \(\bar{d}\), \(\bar{\kappa }\), and A/V ratio. Furthermore, there is a notable correlation between the logP values across ODCB and water phases and the cage sizes of fullerenes. While these topological attributes and geometric measures (i.e., V and A) effectively illustrate trends from C20 to C60 fullerenes, they fall short in precisely predicting logP values for fullerenes of identical size. By contrast, accurately determining Eg from these topological features and geometric measures remains a challenge. Future work might benefit from employing more complex models beyond linear regression, or from incorporating additional properties, to enhance the prediction of Eg values in fullerenes. This falls beyond the scope of our current paper.

To conclude, in this study, we built a comprehensive computational dataset for C20–C60 fullerenes with 5770 structures and computed 12 fundamental properties at the DFT level. We conducted statistical analysis on stability, electronic properties, and solubility, and assessed the Pearson correlation coefficients among them. The observed weak correlations between Eg and both Eb (r = 0.13) and logP (r = −0.24) suggest that favorable electronic properties can be independently adjusted for particular applications without compromising stability and solubility. To gain a deeper understanding of the structure-property relationships in fullerenes, we introduced various multi-dimensional topological features, as well as geometric measures, to further explore the nature and origin of these properties beyond the commonly used IPR. Among them, atom, bond, and hexagon features prove effective in capturing the intricate local structural environments of the carbon atoms on the spherical cage. This capability facilitates precise predictions of fullerene stability across various sizes from C20 to C60 and ensures robust transferability to larger fullerene sizes beyond C60. In contrast, pentagon features, along with the IPR, demonstrate inferior capacity due to the fixed number of 12 pentagons in a fullerene molecule, regardless of the cage size. The solubility of fullerenes, represented by logP values, correlates strongly with the fullerene cage size, which is well captured by atom, bond, and hexagon features. However, within fullerenes of the same size, logP values do not show a linear correlation with the topological features. Unlike Eb and logP, Eg exhibits no linear relationship with either topological features or geometric measures, leaving its correlation with fullerene structures as an open question.
Our study lays a fundamental basis for future advancements in the functionalization and practical applications of fullerenes in the fields of energy conversion and nanomaterials sciences.

Methods

Dataset construction

We constructed the fullerene dataset from C20 to C60 with a total of 5770 structures. Table S8 lists the number of possible isomers, hexagon rings, C–C bonds, and IPR isomers for each fullerene cage size. The initial structures of fullerenes from C20 to C52 with 1249 structures were obtained from the online fullerene database73, and 1812 isomer structures of C60 were adopted from the published paper23. For the remaining 2709 fullerene structures from C54 to C58, we utilized the program FULLERENE (version 4.5) to generate the initial xyz coordinates and applied a force field specifically designed for fullerenes to optimize them74. The construction of the C54–C58 dataset facilitates understanding the dependence of 12 fundamental properties on fullerene cage sizes ranging from C20 to C60. All the optimized structures from our study can be found in the Supporting Information.

DFT calculations

All the DFT calculations were performed with the Gaussian 16 package75. The B3LYP hybrid functional76,77,78, together with the D3 dispersion correction79 and the 6-31G* basis set, was employed for geometry optimization of all 5770 fullerene isomers. The maximum force tolerance was set to 0.02 eV/Å. Confirming that the optimized structures are equilibrium geometries would require the calculation of harmonic frequencies. However, the complexity of these calculations80,81 makes them impractical for the high-throughput screening conducted here. Nevertheless, because the initial structures are derived from the inherent symmetry of the fullerene cage, the optimized structures are unlikely to resemble transition-state geometries. Subsequently, single-point calculations in the singlet state were carried out with the B3LYP functional and the 6-311G* basis set to determine the energies and electronic properties. The continuum solvation model based on density (SMD) was used to obtain the solvation energy82. The dielectric constants of the water and ODCB solvents were set to 78 and 10, respectively. In addition to the B3LYP functional, we evaluated six other exchange-correlation (xc) functionals, CAM-B3LYP83, LC-ωPBE84, M0685, M06-2X85, PBE1PBE86, and ωB97XD87, for calculating Eb, Eg, and logP. As shown in Fig. S16, B3LYP correlates highly with all tested functionals for Eb and logP, with R2 values close to 1, whereas for Eg, B3LYP correlates less well with the range-separated xc functionals (CAM-B3LYP, LC-ωPBE, and ωB97XD) than with the other hybrid functionals (M06, M06-2X, and PBE1PBE). A previous study showed that, when benchmarked against the CCSD(T) method, the B3LYP functional is more accurate for fullerene isomerization energies than xc functionals with a higher percentage of exact Hartree-Fock exchange88. Thus, we applied the B3LYP functional for all the DFT calculations in this study.

Topological features and indices

We introduced 4 types of topological features, namely, atom, bond, pentagon, and hexagon features, to characterize the structure of fullerenes. Each carbon atom in the fullerene cage forms three C–C bonds, and each bond is surrounded by four polygonal rings, each of which is either a pentagon or a hexagon. The atom features categorize carbon atoms into 4 types, denoted as \(\{{n}_{i}| i=0,1,2,3\}\), representing the count of carbon atoms fused to i adjacent pentagons (Fig. S17). Similarly, the bond features define 9 types of C–C bonds \(\{{e}_{i}| i=0,\ldots ,8\}\) according to the number and arrangements of the 4 surrounding polygonal rings (Fig. S18). The pentagon features define 8 types of pentagons \(\{{p}_{i}| i=0,\ldots ,7\}\) depending on the number and arrangements of the 5 abutting polygons (Fig. S19). The hexagon features define 13 types of hexagons \(\{{h}_{i}| i=0,\ldots ,12\}\), which depend on the number and arrangements of the 6 neighboring polygons (Fig. S20). Table S9 lists the values of these 4 topological features for C20–#1, C60–#1, and C60–#1812. It is important to note that Fowler and Manolopoulos also designed pentagon \(\{p{{\prime} }_{i}| i=0,\ldots ,5\}\) and hexagon \(\{h{{\prime} }_{i}| i=0,\ldots ,6\}\) indices; however, they only considered the number of pentagons (hexagons) attached to i other pentagons (hexagons) and ignored the different arrangements of the pentagons (hexagons)68,89. For instance, \(p{{\prime} }_{2}\) equals the sum of p2 and p3, whereas \(p{{\prime} }_{3}\) equals the sum of p4 and p5 (see Fig. S19). They further consolidated the pentagon and hexagon indices into a more practical set of topological indices, namely, the pentagon signature P1 and the n-th moment hexagon signature Hn, using the formulas below:

$${P}_{1}=\frac{1}{2}\mathop{\sum }\limits_{i=0}^{5}ip{{\prime} }_{i}$$
(10)
$${H}_{n}=\mathop{\sum }\limits_{i=0}^{6}{i}^{n}h{{\prime} }_{i}$$
(11)

For example, the IPR-conforming C60-Ih buckminsterfullerene (C60-#1), where all the pentagons are isolated by the hexagons, yields \(p{{\prime} }_{0}\)= 12 and \(h{{\prime} }_{3}\)= 20 with all other terms equating to zero, resulting in P1 = 0, H1 = 60, and H2 = 180.
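Eqs. (10) and (11) translate directly into code. The following minimal sketch uses hypothetical function names (`pentagon_signature`, `hexagon_signature`) and takes the indices \(p{{\prime} }_{i}\) and \(h{{\prime} }_{i}\) as plain lists. It reproduces the C60-Ih values quoted above and adds the dodecahedral C20 cage, in which every pentagon is fused to five others.

```python
def pentagon_signature(p_prime):
    """P1 = (1/2) * sum_i i * p'_i, Eq. (10).

    p_prime[i] is the number of pentagons adjacent to i other pentagons.
    """
    return sum(i * count for i, count in enumerate(p_prime)) / 2


def hexagon_signature(h_prime, n):
    """H_n = sum_i i**n * h'_i, Eq. (11).

    h_prime[i] is the number of hexagons adjacent to i other hexagons.
    """
    return sum(i**n * count for i, count in enumerate(h_prime))


# C60-Ih (C60-#1): p'_0 = 12 and h'_3 = 20, all other indices zero.
c60_p = [12, 0, 0, 0, 0, 0]
c60_h = [0, 0, 0, 20, 0, 0, 0]

# Dodecahedral C20: twelve pentagons, each fused to five pentagons, no hexagons.
c20_p = [0, 0, 0, 0, 0, 12]
c20_h = [0, 0, 0, 0, 0, 0, 0]

print(pentagon_signature(c60_p), hexagon_signature(c60_h, 1), hexagon_signature(c60_h, 2))
print(pentagon_signature(c20_p), hexagon_signature(c20_h, 1))
```

The factor of 1/2 in P1 corrects for each pentagon-pentagon fusion being counted once from each of the two pentagons it joins, so P1 equals the number of fused pentagon pairs.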

Geometric measures

We computed 7 geometric measures based on the optimized fullerene structures: average bond length (\(\bar{d}\)), volume (V), surface area (A), ratio of surface area and volume (A/V), Fowler asymmetry parameter (Fasym)90, deviation from isoperimetric quotient (DIPQ)91, and the average curvature (\(\bar{\kappa }\)) of the fullerene cage.

\(\bar{d}\) is the mean of all C–C bond lengths within the structure. V of a fullerene is determined by treating the cage as a polyhedron, composed entirely of pentagons and hexagons as its facets. Analogous to a polygon being segmented into triangles, the polyhedron fullerene cage can similarly be deconstructed into a collection of tetrahedra, with its volume being the sum of these tetrahedra. Similarly, A of a fullerene cage is the sum of the areas of the pentagons and hexagons constituting the fullerene cage, each of which can be further divided into triangles.
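The decomposition into triangles and tetrahedra can be sketched as follows. This is one possible implementation, not necessarily the one used for the reported values. It assumes that each ring is split by a fan triangulation from its first vertex, that the tetrahedra share the cage centroid as their apex, and that the cage is star-shaped about that centroid (which holds for convex cages). The function name and the unit-cube test case are illustrative.

```python
import numpy as np


def polyhedron_volume_area(vertices, faces):
    """Volume and surface area of a closed cage.

    vertices: (N, 3) array of coordinates.
    faces: list of index lists, each giving one ring's vertices in cyclic order.
    """
    vertices = np.asarray(vertices, dtype=float)
    center = vertices.mean(axis=0)
    volume = 0.0
    area = 0.0
    for face in faces:
        a = vertices[face[0]]
        # Fan triangulation: triangles (a, b, c) sharing the first ring vertex.
        for j, k in zip(face[1:-1], face[2:]):
            b, c = vertices[j], vertices[k]
            area += 0.5 * np.linalg.norm(np.cross(b - a, c - a))
            # Tetrahedron spanned by this triangle and the cage centroid.
            triple = np.dot(a - center, np.cross(b - center, c - center))
            volume += abs(triple) / 6.0
    return volume, area


# Sanity check on a unit cube (square faces instead of pentagons and hexagons).
cube_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                 (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
cube_faces = [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 5, 4],
              [3, 2, 6, 7], [0, 3, 7, 4], [1, 2, 6, 5]]
cube_volume, cube_area = polyhedron_volume_area(cube_vertices, cube_faces)
print(cube_volume, cube_area)
```

For a real fullerene the rings are not exactly planar, so the result depends slightly on how each ring is triangulated.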

Fasym is calculated as follows:

$${F}_{asym}=\mathop{\sum }\limits_{i}^{N}\frac{{({R}_{i}-{R}_{av})}^{2}}{{R}_{av}^{2}}$$
(12)

in which Ri is the radial distance of each carbon atom i from the center of mass of the fullerene, Rav is the average radial distance, and N represents the number of carbon atoms in the fullerene. Fasym equals 0 when all atoms lie on an ideal spherical fullerene cage, such as C60–Ih (C60–#1).
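A minimal sketch of Eq. (12) is given below. It assumes all atoms have equal mass, so that the center of mass coincides with the centroid of the coordinates; the function name `fowler_asymmetry` and the two octahedral test geometries are illustrative.

```python
import numpy as np


def fowler_asymmetry(coords):
    """F_asym = sum_i (R_i - R_av)^2 / R_av^2, Eq. (12)."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)  # center of mass for equal-mass (all-carbon) cages
    radii = np.linalg.norm(coords - center, axis=1)
    r_av = radii.mean()
    return float(np.sum((radii - r_av) ** 2) / r_av**2)


# All points on one sphere: no asymmetry.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Same shape stretched along x: radii are 2, 2, 1, 1, 1, 1.
stretched = [(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

f_sphere = fowler_asymmetry(octahedron)
f_stretched = fowler_asymmetry(stretched)
print(f_sphere, f_stretched)
```

Because the sum is not divided by N, cages with more atoms can accumulate larger Fasym values for a comparable distortion.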

DIPQ is calculated from the volume and surface area as follows:23

$${D}_{IPQ}=1-\frac{36\pi {V}^{2}}{{A}^{3}}$$
(13)

DIPQ is dimensionless and equals 0 for a perfect sphere.
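The limiting behavior of the deviation from the isoperimetric quotient can be checked numerically. The sketch below uses a hypothetical helper `d_ipq` that evaluates Eq. (13) for a sphere of arbitrary radius and, for contrast, for a unit cube.

```python
import math


def d_ipq(volume, area):
    """Deviation from the isoperimetric quotient, Eq. (13)."""
    return 1.0 - 36.0 * math.pi * volume**2 / area**3


# Sphere of radius r: V = 4/3 * pi * r^3 and A = 4 * pi * r^2.
r = 3.5
sphere_value = d_ipq(4.0 / 3.0 * math.pi * r**3, 4.0 * math.pi * r**2)

# Unit cube: V = 1 and A = 6, giving 1 - pi/6.
cube_value = d_ipq(1.0, 6.0)

print(sphere_value, cube_value)
```

The radius cancels because V² and A³ both scale with the sixth power of length, so the measure depends on shape only and not on cage size.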

The curvature at each carbon atom is defined as follows: in a given fullerene molecule, a carbon atom, designated as the central carbon, forms a tetrahedron with its three neighboring atoms. This tetrahedron uniquely dictates a circumsphere where all four carbon atoms reside. The radius of the circumsphere represents the curvature radius (R) at the central carbon atom’s position. Consequently, the curvature (κ) is the reciprocal of R:

$$\kappa =\frac{1}{R}$$
(14)

Thus, \(\bar{\kappa }\) of the fullerene cage is obtained by computing the curvature κ = 1/R at each carbon atom of the cage and averaging over all atoms.
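The circumsphere construction can be sketched as follows. The circumcenter is found by solving the linear system obtained from equating the squared distances to the four atoms. The function names and the `neighbors` mapping (atom index to its three bonded atoms) are assumptions made for illustration, and the regular tetrahedron serves only as a small test case in which every vertex has exactly three neighbors.

```python
import numpy as np


def circumsphere_radius(p0, p1, p2, p3):
    """Radius of the sphere through four non-coplanar points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    # |c - p0|^2 = |c - pi|^2 gives 2 (pi - p0) . c = |pi|^2 - |p0|^2 for i = 1..3.
    lhs = 2.0 * np.array([p1 - p0, p2 - p0, p3 - p0])
    rhs = np.array([p1 @ p1 - p0 @ p0, p2 @ p2 - p0 @ p0, p3 @ p3 - p0 @ p0])
    center = np.linalg.solve(lhs, rhs)  # raises LinAlgError if the points are coplanar
    return float(np.linalg.norm(center - p0))


def average_curvature(coords, neighbors):
    """Mean of kappa = 1/R over all atoms; neighbors[i] lists the three atoms bonded to i."""
    kappas = []
    for i, bonded in neighbors.items():
        radius = circumsphere_radius(coords[i], *(coords[j] for j in bonded))
        kappas.append(1.0 / radius)
    return sum(kappas) / len(kappas)


# Regular tetrahedron: all four vertices lie on a sphere of radius sqrt(3).
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
tetra_neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
kappa_bar = average_curvature(tetra, tetra_neighbors)
print(kappa_bar)
```

An atom that is coplanar with its three neighbors has no finite circumsphere; such a locally flat site would need separate handling (for example, assigning κ = 0), which this sketch does not attempt.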