Introduction

As an efficient, low-carbon energy source, nuclear energy plays a significant role in the global energy landscape, particularly in the context of global climate change, where it provides important support for achieving carbon neutrality goals1,2. However, the rapid expansion of the nuclear energy industry is accompanied by potential environmental and safety risks3, especially in the handling of nuclear waste and during nuclear accidents, where the leakage of radioactive substances poses a severe threat to both the environment and human health. Among the radioactive isotopes involved in spent nuclear fuel reprocessing or nuclear accidents, iodine isotopes, particularly ¹³¹I and ¹²⁹I, are of particular concern due to their volatility and strong bioaccumulation, which result in significant long-term impacts on the environment and human health4. The half-life of ¹³¹I is only 8 days; although its radiation is intense, the associated risks are generally short-term: it enters the human body primarily via inhalation or the food chain and can lead to acute health issues such as thyroid cancer. In contrast, ¹²⁹I has an exceptionally long half-life (~1.57 × 10⁷ years), enabling it to persist in the biosphere and pose a sustained threat to ecosystems. Consequently, the efficient removal of radioactive iodine isotopes has become an urgent requirement for ensuring nuclear safety and reducing environmental contamination5,6,7,8.

Metal-organic frameworks (MOFs), porous materials formed by metal clusters coordinated with organic ligands, have gained considerable attention as potential iodine adsorbents owing to their highly tunable structures, large surface areas, and excellent porosity9,10. However, high-humidity air is prevalent in the nuclear industry and in spent nuclear fuel reprocessing, which demands more robust iodine adsorption properties from MOF materials6,11,12. In recent years, many researchers have investigated the iodine adsorption behavior of various MOFs in humid environments. Nenoff and co-workers13 explored competitive I2 sorption by Cu-BTC from humid gas streams (about 3.5% relative humidity) at 75 °C and ambient pressure, revealing a remarkable iodine capacity of ~175 wt% with a derived I2/H2O adsorption selectivity of 1.5. Thallapally’s group14 reported I2 adsorption capacities and mechanisms in two microporous MOFs at 33% and 43% relative humidity (RH), in which SBMOF-1 and SBMOF-2 exhibited uptakes of 15 wt% and 35 wt%, respectively. Zhang et al.15 systematically studied the influence of H2O molecules on the iodine adsorption properties of different zeolitic imidazolate frameworks (ZIFs) using grand canonical Monte Carlo (GCMC) simulations, highlighting the competitive adsorption between H2O and I2, particularly for hydrophilic materials. Other MOF materials, including MIL-101-Cr-TED, MIL-101-Cr-HMTA and ECUT-300, were also used to explore iodine capture performance in water-containing systems16,17,18,19. In our previous work20, GCMC and density functional theory methods were employed to investigate the iodine adsorption performance of 21 chemically stable MOF materials in high-humidity environments, and the influence of different structural factors was revealed. Despite these advances, research on the iodine adsorption behavior of MOFs under humid conditions remains limited, and comprehensive insight into the key factors influencing iodine adsorption, based on a larger number of MOF materials, is still needed.

Nowadays, high-throughput computational screening based on molecular simulations offers a rapid approach to evaluating the iodine adsorption performance of MOFs under humid conditions21,22,23. Furthermore, the rapid development of artificial intelligence has ushered in a new research paradigm that combines data science with chemistry24,25. Machine learning has proven to be an efficient tool for analyzing computational data on the gas adsorption behavior of MOFs, revealing structure-property relationships, identifying promising MOF adsorbents and even guiding the design and modification of MOF structures26,27,28,29. In this work, we first selected 1816 I2-accessible MOF materials (with pore limiting diameter > 3.34 Å, the kinetic diameter of I2) from the well-established CoRE MOF 2014 database compiled by Chung et al.30, and employed GCMC simulations in the RASPA software to study their I2 adsorption performance under humid air conditions31. Subsequently, three types of descriptors (structural, molecular, and chemical features) were explored, and machine learning algorithms were used to predict iodine adsorption performance and to reveal the relationships between the descriptors and that performance. Finally, molecular fingerprints were employed to identify comprehensively how structural features influence iodine adsorption, providing valuable insights and guidelines for the future targeted design of MOF materials.

Results

Structure-performance relationships

To identify the optimal structural features, the relationships between the structural characteristics of the 1816 MOFs and their iodine adsorption performance were investigated (Fig. 1 and Fig. S1). The structural characteristics included the pore limiting diameter (PLD), largest cavity diameter (LCD), void fraction (φ), pore volume, surface area and density. When LCD was less than 4 Å (Fig. 1a), steric hindrance between the I2 molecules and the pore walls resulted in negligible iodine adsorption. When 4 Å < LCD < 5.5 Å, increasing LCD reduced the steric hindrance, so that the adsorption interaction between the framework and the iodine molecules became the dominant factor, which increased both iodine adsorption capacity and selectivity. However, when LCD exceeded 5.5 Å, further enlargement of the channel size weakened the interaction between the MOF and the iodine molecules, which promoted desorption of I2 from the pores and led to a continuous decline in both adsorption capacity and selectivity. Overall, the MOF materials with the best iodine adsorption properties had LCD values between 4 and 7.8 Å. For the void fraction (Fig. 1b; optimal range for iodine adsorption of 0 to 0.17), iodine adsorption capacity and selectivity initially increased (φ < 0.09) and then decreased (0.09 < φ < 0.6). The relationship between density and iodine adsorption performance followed a similar trend (Fig. 1c): at low densities (below 0.9 g/cm3), increasing density promoted iodine adsorption because of the greater number of available adsorption sites; when the density exceeded 0.9 g/cm3, steric hindrance gradually increased and excessively compact pore structures limited iodine adsorption; and once the density surpassed 2.2 g/cm3, the iodine uptake fell below 100 cm3/g. Furthermore, the optimal ranges of pore volume, PLD and surface area for iodine capture were identified as 0 ~ 0.18 cm3/g, 3.34 ~ 7 Å and 0 ~ 540 m2/g, respectively (Fig. 1d and Fig. S1). To facilitate comparison of the adsorption behavior of different molecules, the structure-performance relationships of MOF materials for H₂O adsorption were also delineated (Fig. S2); here the optimal structural parameters spanned a broader range (for instance, the optimal LCD could reach 11 Å and the optimal φ could reach 0.48), which is likely attributed to the larger kinetic diameter of H₂O molecules compared to I₂. These results indicated that relatively small pore sizes confer an advantage in competitive iodine adsorption under humid conditions.
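The optimal windows quoted above can be read as a simple pre-screening filter. The following is a minimal sketch of that idea, not the authors' workflow: the dictionary keys are hypothetical column names, and the two example structures are invented purely to exercise the function.

```python
# Optimal windows for I2 capture quoted in the text (Fig. 1 and Fig. S1).
# Key names are hypothetical column labels chosen for this sketch.
OPTIMAL_WINDOWS = {
    "lcd_angstrom": (4.0, 7.8),
    "pld_angstrom": (3.34, 7.0),
    "void_fraction": (0.0, 0.17),
    "pore_volume_cm3_g": (0.0, 0.18),
    "surface_area_m2_g": (0.0, 540.0),
}


def in_optimal_window(mof):
    """Return True if every listed descriptor lies inside its window."""
    return all(low <= mof[name] <= high
               for name, (low, high) in OPTIMAL_WINDOWS.items())


# Two made-up structures, used only to exercise the filter.
narrow_pore = {"lcd_angstrom": 5.2, "pld_angstrom": 4.1, "void_fraction": 0.08,
               "pore_volume_cm3_g": 0.10, "surface_area_m2_g": 210.0}
wide_pore = {"lcd_angstrom": 11.0, "pld_angstrom": 9.3, "void_fraction": 0.48,
             "pore_volume_cm3_g": 0.75, "surface_area_m2_g": 2400.0}

print(in_optimal_window(narrow_pore))  # True
print(in_optimal_window(wide_pore))    # False
```

Such a filter only narrows the candidate pool; it does not replace the GCMC uptake and selectivity calculations.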

Fig. 1: Structure-performance relationship.

I2 capture performance (uptake amount and selectivity) as a function of (a) largest cavity diameter (Å), (b) void fraction, (c) density (g/cm3) and (d) pore volume (cm3/g).

Machine learning

Following the above analysis, we first used six structural descriptors (PLD, LCD, φ, pore volume, surface area and density) to train machine learning algorithms for predicting iodine adsorption in MOF materials under humid conditions (Fig. 2a, d). Two machine learning algorithms, random forest and CatBoost, were trained and compared32,33. After training the models with only the structural parameters as the simplest feature set, we gradually incorporated more comprehensive feature sets: “structural + molecular descriptors” (Fig. 2b, e) and “structural + molecular + chemical descriptors” (Fig. 2c, f). Each molecular descriptor corresponded to a specific element, hybridization and bonding type. For carbon (C) and nitrogen (N), the atomic types were C_1, C_2, C_3, and C_R (or N_1, N_2, N_3, and N_R), corresponding to linear (sp, triple-bonded), trigonal (sp2, double-bonded), tetrahedral (sp3, single-bonded) and resonant (ring) atoms, respectively. Oxygen (O) atoms could form double bonds and ring bonds, defined as O_2 and O_R, respectively, and could also occur as central tetrahedral oxygen (denoted as O_3_f) or central trigonal oxygen (denoted as O_2_z)34. Hydrogen (H), fluorine (F), chlorine (Cl), and bromine (Br) were designated as H_, F_, Cl, and Br. Additionally, tetrahedral four-coordinate phosphorus for organometallic coordination was defined as P_3+q, and sulfur atoms connected via ring bonds were denoted as S_R35. For metal atoms, only the predominant metal species within each MOF was considered; its descriptors included the metal ratio (the molar ratio relative to all atoms), atomic number, atomic weight, atomic radius, polarizability, electron affinity, and Mulliken electronegativity. For the chemical descriptors, the Henry’s coefficient and heat of adsorption of I2, H2O, N2 and O2 in the MOF materials were considered and defined as I2_Henry (I2_heat), H2O_Henry (H2O_heat), N2_Henry (N2_heat) and O2_Henry (O2_heat), respectively.
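As a bookkeeping aid, the three nested feature sets can be assembled from the descriptor names listed above. This is a sketch under assumptions: the atom-type labels and the I2_Henry-style names come from the text, whereas the structural and metal column names (for example `metal_ratio`) are hypothetical labels introduced here. The resulting counts (6 structural, 18 + 7 = 25 molecular, 8 chemical) agree with the totals given in the Discussion.

```python
# Structural descriptors (column names are hypothetical labels).
STRUCTURAL = ["PLD", "LCD", "void_fraction", "pore_volume",
              "surface_area", "density"]

# Ligand atom types exactly as named in the text.
LIGAND_TYPES = ["C_1", "C_2", "C_3", "C_R", "N_1", "N_2", "N_3", "N_R",
                "O_2", "O_R", "O_3_f", "O_2_z", "H_", "F_", "Cl", "Br",
                "P_3+q", "S_R"]

# Properties of the predominant metal (hypothetical column names).
METAL = ["metal_ratio", "metal_atomic_number", "metal_atomic_weight",
         "metal_atomic_radius", "metal_polarizability",
         "metal_electron_affinity", "metal_mulliken_electronegativity"]

MOLECULAR = LIGAND_TYPES + METAL

# Henry's coefficient and heat of adsorption for each gas component.
CHEMICAL = [f"{gas}_{kind}"
            for gas in ("I2", "H2O", "N2", "O2")
            for kind in ("Henry", "heat")]

FEATURE_SETS = {
    "structural": STRUCTURAL,
    "structural + molecular": STRUCTURAL + MOLECULAR,
    "structural + molecular + chemical": STRUCTURAL + MOLECULAR + CHEMICAL,
}

for name, columns in FEATURE_SETS.items():
    print(f"{name}: {len(columns)} features")
```

Keeping the sets nested in this way makes it straightforward to retrain the same model on each set and attribute any change in accuracy to the newly added descriptor family.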

Fig. 2: Comparisons of prediction performance.

Prediction accuracy of I2 capture using random forest and CatBoost algorithms based on (a), (d) “structural descriptors”, (b), (e) “structural + molecular descriptors” and (c), (f) “structural + molecular + chemical descriptors” feature sets.

Throughout the above process, the prediction performance of both the random forest and CatBoost algorithms progressively improved. The accuracy of the machine learning algorithms was evaluated using R2, mean absolute error (MAE), and mean squared error (MSE). When only structural descriptors were used, the random forest algorithm exhibited a relatively low prediction accuracy (R2 = 0.438). After molecular descriptors were added to the feature set, R2 increased to 0.592, and further incorporation of chemical descriptors led to the highest prediction accuracy (R2 = 0.900). Simultaneously, the MAE and MSE values of the random forest model decreased steadily: MAE fell from 75.588 to 61.673 and finally to 23.378, while MSE fell from 14293.744 to 10387.059 and ultimately to 2547.433. The CatBoost algorithm followed similar trends but exhibited better overall prediction performance. The hyperparameter tuning space and optimal hyperparameters are listed in Table S1. With the “structural descriptors + molecular descriptors + chemical descriptors” feature set, the CatBoost algorithm achieved the highest R2 of 0.941, with MAE and MSE dropping to 18.276 and 1512.681, respectively. Following the same process, the accuracy of the CatBoost algorithm in predicting the H₂O adsorption performance of MOF materials also gradually improved (Fig. S3), with R2, MAE and MSE reaching 0.911, 5.672 and 166.554, respectively, for the “structural descriptors + molecular descriptors + chemical descriptors” feature set. These results further confirmed that molecular and chemical descriptors significantly complemented the structural descriptors and played a crucial role in accurately predicting the capture of gas molecules in MOF materials.

To investigate the contribution of different features to the prediction of iodine adsorption performance, the SHAP (SHapley Additive exPlanations) method was used to rank and explain the significance of the features in the CatBoost model (Fig. 3). Among the structural descriptors (Fig. 3a), LCD was identified as the most important, followed by void fraction and surface area (both of which exhibited a certain negative correlation with I2 adsorption performance). PLD and density ranked next in importance, while pore volume was the least significant of the six structural descriptors. In the “structural descriptors + molecular descriptors” set (considering only the top 20 descriptors in terms of importance), all structural descriptors except pore volume ranked within the top 10, with LCD remaining the most important. Among the molecular descriptors, the most significant was C_R (positively correlated with adsorption capacity), followed by the proportion of metal atoms (negatively correlated with adsorption capacity), and then H_R, N_R, and O_2 (all positively correlated with adsorption capacity). We speculated that iodine adsorption in MOF materials relied primarily on the organic ligands, with carbon rings, nitrogen, and oxygen atoms serving as key adsorption sites. In the “structural descriptors + molecular descriptors + chemical descriptors” set (considering only the top 20 descriptors), the most influential feature was I2_Henry, followed by I2_Heat; both were positively correlated with the amount of iodine captured. Notably, in contrast to N₂ and O₂, the adsorption heat of H₂O (H2O_heat) was strongly negatively correlated with iodine adsorption capacity (with the damping coefficient positively correlated), likely because H₂O molecules are the main species competing with iodine for adsorption under humid conditions. For the adsorption of H2O molecules (Fig. S4), H2O_heat had the highest relative importance, and the metal atoms were also relatively significant, with polarizability, metal ratio, Mulliken electronegativity and atomic radius ranking within the top six. In contrast to the adsorption of I2 molecules, the atomic radius of the metal atoms was positively correlated with the adsorption of H2O molecules (Fig. S4b); furthermore, O_R replaced N_R as the most favorable ligand structure owing to strong hydrogen bonding interactions (Fig. S4c).
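The quantity that SHAP ranks is the Shapley value of each feature. For readers unfamiliar with it, the following minimal sketch computes exact Shapley values for a single sample by brute-force enumeration of feature subsets. It is an illustration of the underlying definition only: it does not use the `shap` package or the tree-specific algorithm applied to CatBoost models, and the linear toy model, its coefficients and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one sample by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical linear surrogate with three made-up features."""
    return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]


sample = [1.0, 2.0, 4.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
print([round(p, 6) for p in phi])

# Efficiency property: contributions sum to f(sample) - f(reference).
print(round(sum(phi), 6), toy_model(sample) - toy_model(reference))
```

For a linear model each Shapley value reduces to the coefficient times the feature's deviation from the baseline, which gives a convenient sanity check before applying the method to a nonlinear model.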

Fig. 3: Feature importance for I2 capture.

SHAP value distribution of (a) “structural descriptors”, (b) “structural + molecular descriptors” and (c) “structural + molecular + chemical descriptors”.

The correlation coefficients between different features were calculated and visualized as heatmaps. Among the pure structural descriptors (Fig. S5), a strong positive correlation (correlation coefficient > 0.6) was evident among PLD, LCD, void fraction, surface area, and pore volume, while density showed a strong negative correlation (correlation coefficient < -0.4) with the other five structural parameters. In the “structural descriptors + molecular descriptors” set (Fig. S6), metal ratio, metal atomic number, metal atomic weight, and metal atomic radius were positively correlated with the density of the MOF material (with correlation coefficients of 0.32, 0.38, 0.39, and 0.28, respectively), because metal atoms are heavier than ligand atoms. Additionally, the metal ratio was positively correlated with void fraction (correlation coefficient of 0.23), whereas the correlations between void fraction and metal atomic number, metal atomic weight, and metal atomic radius were relatively weak. This was because metal clusters formed by a greater number of metal atoms as connecting nodes would increase the porosity, irrespective of the type of metal atom. The metal ratio was negatively correlated with the numbers of most organic ligand atoms (H_R, C_R, N_R, O_R, C_3, and N_3, with correlation coefficients of -0.43, -0.42, -0.11, -0.10, -0.15, and -0.13, respectively), but was positively correlated with O_2 (correlation coefficient of 0.31), likely because a certain amount of O is needed to form the metal clusters in MOF materials by bonding with metal atoms. In the “structural + molecular + chemical descriptors” set (Fig. 4), I2_Henry showed a low correlation with both structural and molecular descriptors. N2_Henry, O2_Henry, and H2O_Henry exhibited significant positive correlations with most structural descriptors (including PLD, LCD, void fraction, surface area, and pore volume), while N2_Heat, O2_Heat, and H2O_Heat were negatively correlated with these descriptors. This was attributed to the fact that larger cavities or pore channels facilitated the diffusion of gas molecules within the MOFs, whereas the reduced density of adsorption sites weakened the adsorption strength and lowered the adsorption heats. The very strong correlations between N2_Henry and O2_Henry (correlation coefficient = 0.98) and between N2_Heat and O2_Heat (correlation coefficient = 0.89) arose from the similar molecular structures of N₂ and O₂. I2_Heat was negatively correlated with the proportion of metal atoms (correlation coefficient of -0.27) and positively correlated with C_R (correlation coefficient of 0.21), further indicating that metal sites were not effective adsorption sites for I₂ molecules, which tended to be adsorbed near organic ligands (such as benzene rings).
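Each heatmap entry is a pairwise Pearson correlation coefficient (as stated in the caption of Fig. 4). A minimal pure-Python sketch of that calculation is given below; the two short data columns are invented toy values, chosen only to show a perfectly positive and a perfectly negative relationship.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy columns (made-up numbers, not data from the study).
lcd = [4.0, 6.0, 8.0, 10.0]
pore_volume = [0.1, 0.2, 0.3, 0.4]     # rises with LCD
heat = [40.0, 30.0, 20.0, 10.0]        # falls with LCD

print(round(pearson(lcd, pore_volume), 6))
print(round(pearson(lcd, heat), 6))
```

Applying `pearson` to every pair of feature columns yields the symmetric matrix that is drawn as a heatmap.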

Fig. 4: Heatmap of correlation coefficients.

Pearson correlation coefficient matrix for the 20 most significant features for I2 capture in the “structural + molecular + chemical descriptors” feature set.

Molecular fingerprint

To reveal comprehensively the factors that positively or negatively influenced iodine adsorption performance, and thereby pave the way for future molecular design, four types of molecular fingerprint (Molecular ACCess Systems (MACCS), PubChem, AtomPairs2D and Estate) were employed in place of the previously used molecular descriptors. MACCS and PubChem are two of the most widely used fingerprints derived from substructure keys: the MACCS fingerprint consists of a set of 166 structural keys constructed using SMARTS patterns, while the PubChem fingerprint originates from the PubChem database and encompasses 881 structural keys represented as binary substructure encodings. The AtomPairs2D fingerprint (APFP) encodes a total of 780 atom pairs based on their topological distances. The Estate fingerprint summarizes the microscopic structure of a material in a 79-bit representation. By recording the presence or absence of each bit, a molecular fingerprint digitizes the molecular features of a MOF material, thereby providing microscopic insight into the structure of exceptional materials. These molecular fingerprints, together with the previous structural and chemical descriptors, were used to train machine learning models for predicting iodine adsorption performance under humid conditions. The prediction performance based on the “structure + molecular fingerprint + chemical descriptors” set was slightly lower than that of the prior “structure + molecular + chemical descriptors” set, because the fingerprint encoding indicated only the presence or absence of specific features and omitted other information, such as their quantity or proportion in a single MOF unit cell, which limited the ability to predict uptake amounts. However, the comprehensive coverage of structural categories made the fingerprints excellent interpretative tools.

A comparison of the prediction accuracies of the four molecular fingerprints (Fig. 5a, Fig. S7 and Table S2) showed that the MACCS fingerprint performed best in machine learning (R2 = 0.927, MAE = 20.057, and MSE = 1651.391). The 20 most significant MACCS bits were ranked and accompanied by detailed interpretations (Fig. 5b and Table S3). Additionally, the pairwise correlation coefficients of the fingerprint bits were presented as a heatmap to reveal the interrelationships among them (Fig. S8). The top two bits, Bit_158 and Bit_75, demonstrated the significant positive impact of nitrogen (N) atoms in MOF materials on iodine adsorption performance. The strong correlations between Bit_158 and Bit_156 (correlation coefficient of 0.9) and between Bit_158 and Bit_45 (correlation coefficient of 0.73) further highlighted that the presence of N atoms, whether in rings or directly bonded to carbon (C) atoms, promoted iodine adsorption under humid air conditions. Bit_163 and Bit_162, which had a correlation coefficient of 0.6, indicated that the six-membered aromatic ring was another important structural feature. Six-membered aromatic rings likely provided electron-rich adsorption sites owing to their delocalized π-bonds, thereby enhancing iodine adsorption. Bit_6 and Bit_12, which were associated with the type of metal, represented lanthanide metals and group IB (or IIB) metals, respectively. The negative correlation (-0.38) between these two bits arose from the relatively homogeneous nature of the metal clusters in these materials; however, the former had a negative effect on I2 adsorption, while the latter had a positive effect (consistent with the earlier finding of a negative correlation between metal atomic radius and iodine adsorption performance). The relationships between the MACCS bits and the other structural (or chemical) features were also illustrated (Fig. S8). Although Bit_6 (lanthanide metals) and Bit_12 (group IB/IIB metals) showed little correlation with I2_Henry and I2_Heat, Bit_6 enhanced the adsorption of H2O in MOF materials (correlation coefficient with H2O_Heat of 0.65), while Bit_12 weakened it (correlation coefficient with H2O_Heat of -0.41). Thus, it could be concluded that group IB/IIB metals, compared with lanthanide metals, played a significant role in promoting the competitive adsorption of I2 under humid conditions.

Fig. 5: Prediction performance and feature importance using CatBoost algorithm.

(a) Prediction accuracy of I2 capture and (b) SHAP value distribution based on “structural + MACCS + chemical descriptors” set.

Furthermore, the MACCS bits Bit_138, Bit_69, Bit_100 and Bit_131, representing different forms of hydrogen (H) (whether connected to C or to non-C elements), and Bit_143 and Bit_139, representing different forms of oxygen (O) (either as part of a ring structure or as hydroxyl groups), all demonstrated the positive role of H and O atoms in enhancing iodine adsorption in MOF structures. Nevertheless, modifications involving N atoms appeared to offer greater advantages than those involving O atoms (overall, Bit_158 and Bit_75 ranked higher in importance than Bit_143 and Bit_139). This was because N atoms, compared with O atoms, exerted an evident suppressive effect on both pore size and the adsorption of H2O molecules, and both effects favored iodine adsorption in humid environments. Specifically, the correlation coefficients between the N-related bits (Bit_158, Bit_75, Bit_156, and Bit_45) and pore volume were -0.27, -0.28, -0.26, and -0.26, respectively, and their correlation coefficients with H2O_Heat were -0.12, -0.18, -0.13, and -0.17. In contrast, the O-related bits (Bit_143 and Bit_139) had positive correlation coefficients with pore volume of 0.2 and 0.046, respectively, and correlation coefficients with H2O_Heat of 0.24 and 0.029. The role of H atoms lay between those of N and O atoms: the correlation coefficients between the H-related bits (Bit_138, Bit_69, Bit_100 and Bit_131) and pore volume were in the range of -0.1 ~ -0.2, and their correlations with H2O_Heat were also weak (negative, with magnitudes below 0.1).

Finally, the six MOF materials with the best iodine adsorption performance were selected, and microscopic insights into the fingerprints shared by most of them were provided (Fig. 6 and Table S4). These MOF structures shared a common set of MACCS bits: Bit_45, Bit_75, Bit_156, Bit_158, and Bit_163, which together indicated the presence of six-membered rings and N atoms, with the N atoms either forming part of the six-membered rings or coordinating directly to metal atoms. Furthermore, the metal atoms in these structures were all fourth-period transition metals (Zn, Co, Ni, and Mn), which have smaller atomic radii and atomic numbers than the lanthanide elements. Four of these MOF materials also had Bit_97, representing the presence of O atoms. To derive more generalizable conclusions, all MOF materials were categorized into groups based on their adsorption performance, and the occurrence frequencies of the aforementioned MACCS bits were analyzed (Table S5). The proportions of Bit_45, Bit_75, Bit_156, Bit_158, Bit_163 and Bit_97 in the entire MOF database were 35.8%, 37.2%, 51.7%, 50.9%, 72.8%, and 30.7%, respectively. Among the top 30 high-performance MOF materials, their frequencies increased to 76.7%, 70.0%, 73.3%, 76.7%, 86.7% and 56.7%, respectively, whereas in the 300 lowest-performing MOFs they dropped to 22.0%, 27.7%, 47.3%, 47.7%, 45.3% and 26.3%, respectively. Additionally, Bit_6 (representing lanthanide elements) was more prevalent in MOFs with poorer adsorption performance, appearing in 23.0% of the bottom 300 MOFs but entirely absent from the top 30 MOFs. These findings suggested that six-membered rings, nitrogen atoms, oxygen atoms, small metal atomic radii, and low metal atomic numbers favored iodine adsorption in MOFs. This further validated the previous analysis and provided valuable insights for the future design of high-performance MOF materials.
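The occurrence frequencies discussed above are simple proportions within a performance group. The sketch below shows the arithmetic, assuming each fingerprint is stored as a set of its switched-on bit numbers (a representation chosen here for illustration). The toy group is constructed so that 23 of 30 structures carry Bit_158, which reproduces the 76.7% figure; it is not the actual data of Table S5.

```python
def bit_frequency(fingerprints, bit):
    """Percentage of structures in a group whose fingerprint has `bit` set."""
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    hits = sum(1 for on_bits in fingerprints if bit in on_bits)
    return 100.0 * hits / len(fingerprints)


# Toy "top 30" group: 23 structures with bits 158 and 163, 7 with only 163.
top_group = [{158, 163}] * 23 + [{163}] * 7

print(round(bit_frequency(top_group, 158), 1))  # 76.7
print(round(bit_frequency(top_group, 163), 1))  # 100.0
print(round(bit_frequency(top_group, 6), 1))    # 0.0
```

Comparing the same bit across the full database, the top group and the bottom group then indicates whether a substructure is enriched among high performers.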

Fig. 6: Decomposition diagrams of molecular fingerprint of the top six MOF materials with the best iodine adsorption performance.

Molecular fingerprint of MOF materials of (a) BARZUR, (b) CUVGOQ, (c) ZEXKUK, (d) QUDJOP, (e) UFATEA01 and (f) CEYPUT.

Discussion

In summary, large-scale GCMC simulations were employed to investigate the iodine adsorption performance (adsorption capacity and selectivity) of 1816 MOF materials from the CoRE MOF 2014 database under humid conditions. Two machine learning algorithms, Random Forest and CatBoost, were used to predict this performance, and three types of descriptors were incorporated step by step to enhance the prediction accuracy of the models: 6 structural features (including pore characteristics, density, and surface area), 25 molecular features (including the types of metal and ligand atoms as well as their bonding modes), and 8 chemical features (adsorption heat and Henry’s coefficient). The SHAP method was used to rank the importance of these descriptors, and correlation coefficients were employed to reveal the relationships among features. Four types of molecular fingerprint were also generated in place of the molecular features and combined with the CatBoost algorithm to predict iodine adsorption performance. The top 20 MACCS bits were extracted, demonstrating that six-membered rings and N atoms played the most significant role in MOF materials, followed by O atoms. Among the metal sites, the lighter transition metals were found to be more favorable for iodine adsorption than the lanthanide elements. This comprehensive and systematic study sheds light on the iodine adsorption performance of MOF materials under humid conditions, providing valuable insights for the future screening and design of high-performance MOF materials.

Methods

Simulation method

GCMC simulations were performed using the RASPA software to investigate the adsorption behavior of I₂ within the MOFs at 423 K and 1 bar. The temperature of 423 K is relevant to operational conditions in the nuclear industry6,36,37. To replicate the high-humidity environment encountered during the post-treatment phase of spent nuclear fuel, the mixed gas system was composed of 300 ppm I₂, 68.5% N₂, 18.4% O₂, and 12.2% H₂O, corresponding to a relative humidity of 100%6. Throughout the simulations, the MOFs were treated as rigid structures with periodic boundary conditions. Supercells were used where necessary to ensure that the system dimensions exceeded twice the cutoff distance (12 Å). In addition, the selectivity for I2 during adsorption was calculated using the following equation8,37:

$${{selectivity}}_{{I}_{2}}=\frac{{X}_{{I}_{2}}/{Y}_{{I}_{2}}}{{X}_{{others}}/{Y}_{{others}}}$$

where \({X}_{{I}_{2}}\) and \({Y}_{{I}_{2}}\) denoted the uptake amounts and gas phase concentration of I2; \({X}_{{others}}\) and \({Y}_{{others}}\) were the uptake amounts and gas phase concentration of other gas components (N2, O2 and H2O).
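As a worked illustration of the selectivity definition, the sketch below evaluates it for the gas composition stated above. Two assumptions are made here and are not confirmed by the text: the "others" terms are taken as sums over N2, O2 and H2O, and the uptake numbers are invented, in arbitrary but consistent units.

```python
# Gas-phase mole fractions stated in the text (300 ppm = 300e-6).
GAS_PHASE = {"I2": 300e-6, "N2": 0.685, "O2": 0.184, "H2O": 0.122}


def i2_selectivity(uptake, gas_phase=GAS_PHASE):
    """Selectivity of I2 over all other components.

    Assumption of this sketch: X_others and Y_others are the sums of the
    uptakes and gas-phase fractions of N2, O2 and H2O.
    """
    x_i2 = uptake["I2"]
    y_i2 = gas_phase["I2"]
    x_others = sum(v for k, v in uptake.items() if k != "I2")
    y_others = sum(v for k, v in gas_phase.items() if k != "I2")
    if x_others == 0:
        return float("inf")  # nothing else adsorbed
    return (x_i2 / y_i2) / (x_others / y_others)


# Hypothetical uptakes (same unit for every component).
example_uptake = {"I2": 1.0, "N2": 0.2, "O2": 0.3, "H2O": 0.5}
print(round(i2_selectivity(example_uptake), 2))
```

Because I2 is present at only 300 ppm, even equal adsorbed amounts of I2 and of all other species combined translate into a selectivity of several thousand.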

All GCMC simulations comprised an equilibration phase of 50,000 cycles followed by a production phase of 50,000 cycles. Each cycle involved moves of all adsorbed molecules, encompassing insertion, deletion, translation, rotation, reinsertion, identity change, and swap moves. Iodine molecules were modeled as spherical entities, with van der Waals parameters derived from the viscosity of pure iodine38. Water molecules were represented using the three-site transferable intermolecular potential (TIP3P) model (rOH = 0.9572 Å and θHOH = 104.52°), which has been empirically validated to describe hydrogen bonding interactions accurately39,40. N₂ and O₂ molecules were modeled with three-site representations41,42. The relevant molecular model parameters were taken from our previously published work20. The Universal Force Field (UFF) was employed to provide the Lennard-Jones (LJ) parameters for the MOF structures43. Interatomic interactions were described using LJ and electrostatic potential energy functions as below:

$$U\left({r}_{{ij}}\right)=\sum 4{\varepsilon }_{{ij}}\left[{\left(\frac{{\sigma }_{{ij}}}{{r}_{{ij}}}\right)}^{12}-{\left(\frac{{\sigma }_{{ij}}}{{r}_{{ij}}}\right)}^{6}\right]+\sum \frac{{q}_{i}{q}_{j}}{4\pi {\varepsilon }_{0}{r}_{{ij}}}$$

where \(U({r}_{{ij}})\) denoted the non-bonded interaction energy between atoms \(i\) and \(j\); the first term represented the van der Waals non-bonded potential energy, and the second term accounted for the Coulombic electrostatic interaction energy. \({r}_{{ij}}\) signified the interatomic distance, \({\varepsilon }_{{ij}}\) was the depth of the Lennard-Jones (LJ) potential well, \({\sigma }_{{ij}}\) was the distance at which the LJ potential is zero, \({q}_{i}\) and \({q}_{j}\) represented the partial charges of atoms \(i\) and \(j\), respectively, and \({\varepsilon }_{0}\) was the vacuum permittivity.
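The pair potential above can be evaluated directly for a single atom pair. The following is a minimal sketch, not a reproduction of the RASPA implementation: energies are assumed to be in kJ/mol, distances in Å and charges in units of the elementary charge; the LJ term is simply truncated at the 12 Å cutoff, and the Coulomb term is evaluated as a bare pair interaction (production codes typically use Ewald summation for periodic systems). The parameter values in the example are hypothetical.

```python
# e^2 / (4 * pi * eps0) expressed in kJ mol^-1 Angstrom.
COULOMB_CONSTANT = 1389.35458


def lj_energy(r, epsilon, sigma, cutoff=12.0):
    """Lennard-Jones 12-6 energy, truncated (not shifted) at `cutoff`."""
    if r <= 0:
        raise ValueError("distance must be positive")
    if r > cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb_energy(r, q_i, q_j):
    """Bare Coulomb energy between two point charges."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q_i * q_j / r


def pair_energy(r, epsilon, sigma, q_i, q_j, cutoff=12.0):
    """Sum of the LJ and Coulomb terms for one atom pair."""
    return lj_energy(r, epsilon, sigma, cutoff) + coulomb_energy(r, q_i, q_j)


# Hypothetical parameters for illustration only.
eps, sig = 0.5, 3.4
r_min = 2.0 ** (1.0 / 6.0) * sig      # location of the LJ minimum
print(lj_energy(sig, eps, sig))        # zero crossing at r = sigma
print(round(lj_energy(r_min, eps, sig), 6))  # well depth, -epsilon
print(round(pair_energy(4.0, eps, sig, 0.4, -0.4), 3))
```

The two printed LJ values illustrate the roles of the parameters: the energy vanishes at r = σ and reaches −ε at the minimum.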

Material descriptors

For the structural descriptors, PLD and LCD were computed using the Zeo++ software package, based on Voronoi tessellation44. The void fraction was calculated using the RASPA software package with helium atoms (kinetic diameter = 2.58 Å) as probes, while the surface area, pore volume, and density were determined by RASPA using nitrogen molecules (kinetic diameter = 3.64 Å) as probes. To identify the molecular descriptors within each MOF structure, the Python program lammps_interface was used with the UFF4MOF force field34,45,46. For the chemical descriptors, the adsorption heat and Henry’s coefficient were calculated under infinite dilution conditions using RASPA in an NVT Monte Carlo (NVT-MC) simulation, run for 10,000 cycles.

The OpenBabel and PaDEL-Descriptor software packages were used to compute four types of molecular fingerprint47,48: MACCS, PubChem, AtomPairs2D and Estate49,50,51,52. OpenBabel, an open-source chemical toolbox, was used to convert CIF files into SDF files compatible with PaDEL-Descriptor. PaDEL-Descriptor then processed the SDF structural files to yield the four required types of molecular fingerprint.

Machine learning

Two machine learning regression algorithms, Random Forest and CatBoost, were implemented in Python 3.9 using the scikit-learn package32,33,53. Both algorithms offered low computational cost and good interpretability compared with other machine learning models such as neural networks54. For each algorithm, the dataset was randomly divided into a training set (80%) and a test set (20%). To optimize and validate the models, the related hyperparameters were tuned using cross-validation. The accuracy of the models was evaluated using R2, MAE, and MSE, defined as follows:

$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({Y}_{i}-\widehat{{Y}_{i}})}^{2}}{\mathop{\sum }\limits_{i=1}^{n}{({Y}_{i}-{\bar{Y}})}^{2}}$$
$${MAE}=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}|{Y}_{i}-\widehat{{Y}_{i}}|$$
$${MSE}=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{|{Y}_{i}-\widehat{{Y}_{i}}|}^{2}$$

where \(n\) represented the number of instances in the training or testing set, \({Y}_{i}\) denoted the value computed for the \(i\)-th MOF material, \(\widehat{{Y}_{i}}\) denoted the corresponding value predicted by the machine learning algorithm, and \(\bar{Y}\) signified the mean of the computed values.
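The three metrics can be checked by hand on a small example. The sketch below implements the equations above in plain Python, taking `y_true` as the computed (simulation) values and `y_pred` as the model predictions; the three-point data set is invented so that the results can be verified mentally.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MAE and MSE for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "MSE": ss_res / n,
    }


# Toy values: one prediction is off by 1, the others are exact.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print({name: round(value, 4) for name, value in metrics.items()})
```

Here the residual sum of squares is 1 and the total sum of squares is 2, so R2 = 0.5, while MAE and MSE both equal 1/3.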