Abstract
Machine learning (ML) models for predicting gas permeability through polymers have traditionally relied on experimental data. While these models exhibit robustness within familiar chemical domains, their reliability wanes when applied to unfamiliar chemical spaces. To address this challenge, we present a multi-tiered multi-task learning framework empowered with advanced machine-crafted polymer fingerprinting algorithms and data fusion techniques. This framework combines scarce “high-fidelity” experimental data with abundant, diverse “low-fidelity” simulation or synthetic data, resulting in predictive models that display a high level of generalizability across novel chemical spaces. Additionally, this multi-task scheme capitalizes on known physics and interrelated properties, such as gas diffusivity and solubility, both of which are closely tied to permeability. By amalgamating high-throughput simulation data with available experimental data for gas permeability, diffusivity, and solubility for various gases, we construct multi-task deep learning models. These models can simultaneously predict all three properties for all gases under consideration, with markedly enhanced predictive accuracy, particularly compared to traditional models reliant solely on experimental data for a singular property. This strategy underscores the potential of coupling high-throughput classical simulations with data fusion methodologies to yield state-of-the-art property predictors, especially when experimental data for targeted properties is scarce.
Introduction
Polymer-based gas and solvent separation membrane technologies have significantly impacted a diverse range of applications, including carbon capture, water purification, drug delivery, and food packaging1,2. Crucial to propelling widespread adoption and advancement of this technology is the identification and design of polymer materials endowed with a desired set of properties and performance attributes. A key figure of merit in gas separations is gas permeability, which describes the movement of gas molecules into and through a polymer material. Based on the solution-diffusion model3, gas permeability (P) through a membrane is defined as the product of gas diffusivity (D) and gas solubility (S):

\(P = D \times S\) (1)
Capabilities that can accurately and rapidly predict gas permeability across a diverse range of gases and polymer chemistries can be transformational and facilitate the discovery and development of new sustainable high-performance polymer membranes4,5.
Traditionally, the measurement of gas permeability relies on the constant volume permeation technique6, which, though serving as the primary benchmark, is both time and resource intensive. In search of alternative approaches, classical molecular dynamics (MD) simulations have emerged as a complementary pathway to estimate gas permeability7. However, the fidelity of these simulations is constrained by the intrinsic limitations of the classical force fields employed and timescales that are computationally accessible. As a result, they can only achieve, at best, semi-quantitative agreement with experimental measurements, despite correctly capturing general trends.
In recent times, data-driven machine learning (ML) methods have achieved remarkable strides, fundamentally reshaping the landscape of materials property predictions and the tailored design of materials with specific target characteristics4,5,8,9,10,11,12,13. ML methods have found extensive applications in the polymer gas transport domain, encompassing a diverse array of studies varying in the number of polymers investigated and the types of features used to train models. An early example of this is the work by Wessling et al. in 1994, where a neural network was used to correlate the infrared spectra of 33 polymers with their carbon dioxide permeability14. In a more recent study, Yuan et al. utilized Multivariate Imputation by Chained Equations (MICE) to predict missing gas permeability values in a dataset spanning hundreds of polymers across six gases15. These examples only scratch the surface. In a comprehensive perspective paper, Ricci et al. delve deeper into the evolution of ML in modeling gas separation with polymer membranes, highlighting strategies, challenges, and future directions16.
These informatics approaches require a critical initial step: defining the feature space in which the models are trained by mapping features to the properties being learned. Early ML studies employed simple feature sets; for instance, in a 2006 study by Wang et al., six features related to the experimental setup, such as temperature, feed gas flux, and permeate-side pressure, were used17. These approaches have transitioned to incorporate more descriptive and comprehensive features, capturing atomistic to morphological structural details18,19. In this paradigm, a polymer’s chemical structure is converted into a machine-readable numerical representation, commonly known as a fingerprint or feature vector. This fingerprint allows an ML algorithm in the second step (during the training phase) to discern intricate chemistry-morphology-property relationships and subsequently generate predictive models for the properties. While traditional hand-crafted fingerprints20,21 have conventionally represented polymer structures in machine learning models, recent endeavors have expanded the horizons of this methodology and have led to learned fingerprinting techniques, which we adopt in this study. These techniques involve machine learning key features directly from polymer repeat units, offering faster feature extraction with comparable accuracy22,23. Despite these advancements, a common challenge these methods encounter is extrapolating outside of the known polymer-property space, i.e., outside of the training data space24. Each avenue for exploring new chemical spaces, whether experiments, simulations, or machine learning models, comes with its own limitations, and these necessitate innovative solutions.
In the present contribution, we demonstrate the power of multi-task (MT) learning, harnessing both experimental and computational data to address and bridge the shortcomings outlined above, to build a best-in-class gas transport property predictor. MT learning is a type of transfer learning in which a model is trained on more than one task, learning multiple properties and/or data sources simultaneously25. In contrast, single-task (ST) learning involves the consideration of a singular property and data source. The MT architecture, which integrates various data sources and exploits underlying correlations and calibrations, has shown improved predictive performance and enhanced transferability, compared to ST methods26,27. In the polymer gas transport ML space, MT learning has been commonly implemented by incorporating permeability data for various gases and utilizing datasets that encompass a broad spectrum of properties, including mechanical, thermal, and thermodynamic5,28,29. We expand on these previous works by utilizing MT learning in two novel ways. The first aspect leverages data fidelity by fusing “high-fidelity” experimental data with “low-fidelity” simulation data. While experimentally measured data serves as the ground truth, it often grapples with constraints stemming from labor-intensive protocols and associated expenses. Conversely, simulation-generated data can be produced on a grander scale, but it may exhibit diminished accuracy due to necessary approximations made in the theory to make the simulations practical. MT algorithms learn to calibrate the low-fidelity (simulation) data against the high-fidelity (measured) data across the whole space of the data, thus leading to a high level of generalizability27,30. Typically, gas simulations have been used to validate ML predictions. Here, we integrate the simulation data into the model itself.
The second innovative aspect of the MT learning approach extends the general gas permeability ML model to include directly correlated gas transport data: diffusivity and solubility. Gas transport experiments reported in the literature do not always include all three properties, and they tend to focus on testing specific gases of interest. Consequently, some property values may not be available for certain cases. MT learning offers a solution to this challenge by drawing on available properties to learn correlations between them and make effective generalizations28. Incorporated within these two outlined MT aspects is the integration of gas transport data spanning a variety of gases. Our MT learning strategy leverages potential correlations between the transport characteristics of multiple similar (or dissimilar) gases15. A unified model that harnesses data from (1) diverse sources (i.e., measured and simulated), (2) spanning multiple correlated properties (i.e., P, D, & S), and (3) for various gases, can lead to enhanced predictive performance and generalizability, as will be demonstrated here.
A key ingredient of our MT learning approach involves simulation data that could complement measured data for gas transport, as illustrated in Fig. 1a. To achieve this, we have designed a high-throughput molecular dynamics (MD) and Monte Carlo (MC) simulation pipeline, depicted in Fig. 1b. This pipeline generates data for gas diffusivity (Dsim) and solubility (Ssim); the subscripts explicitly indicate the source of the data. Simulated gas permeability (Psim) is then derived from the product of Dsim and Ssim, as prescribed by Eq. 1. Experimental data are labeled as Pexpt, Dexpt, and Sexpt. Data for six different gases (CO2, CH4, O2, N2, H2, and He) span this study. An overview of the dataset is presented in Fig. 1c. With this fused dataset, ML models for gas transport properties are created using our newly developed graph neural network method, polyGNN22, thus completing the MT learning pipeline as visualized in Fig. 1a. The input for polyGNN consists of polymer “Simplified Molecular-Input Line-Entry System” (SMILES)31 strings. These SMILES strings are translated into graph representations and fingerprints, an essential ingredient for the property prediction model trained on the integrated dataset. The architecture of polyGNN, exhibited in Fig. 1d, illustrates this process.
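To make the fused data representation concrete, the sketch below shows one way a training record and its selector vector could be encoded; the task enumeration, dictionary layout, and example values are illustrative assumptions rather than the exact polyGNN input format.

```python
# Illustrative encoding of a fused training record: a polymer SMILES string plus a
# one-hot selector identifying the property (P, D, S), gas, and data fidelity.
from itertools import product

PROPERTIES = ["P", "D", "S"]
GASES = ["CO2", "CH4", "O2", "N2", "H2", "He"]
FIDELITIES = ["expt", "sim"]

# Every (property, gas, fidelity) combination defines one task: 3 * 6 * 2 = 36 selector slots.
TASKS = list(product(PROPERTIES, GASES, FIDELITIES))

def selector_vector(prop, gas, fidelity):
    """One-hot selector telling the model which task a record belongs to."""
    idx = TASKS.index((prop, gas, fidelity))
    return [1 if i == idx else 0 for i in range(len(TASKS))]

# Example record: experimental CO2 permeability of polyethylene (values are illustrative).
record = {
    "smiles": "[*]CC[*]",                            # polymer repeat unit as a SMILES string
    "selector": selector_vector("P", "CO2", "expt"),
    "value": 1.0,                                    # property value (e.g., log10 of Barrer)
}
```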
a MT learning pipeline. Our innovative multi-task learning approach employs the fusion of experimental and simulation data, harnessed through the power of polyGNN, a graph neural network architecture, to construct a state-of-the-art predictor for gas transport properties. b Simulation protocol. The process begins with a polymer SMILES string31, from which the Polymer Structure Predictor (PSP) package36 constructs a simulation box. This box undergoes a 21-step equilibration procedure37. Subsequently, the equilibrated structures serve as the starting point for gas diffusivity and solubility calculations, accomplished through molecular dynamics and Monte Carlo simulations, respectively. Gas permeability is determined by the product of the simulated gas diffusivity and solubility. c Dataset overview. Curated experimental and simulation data used for training the multi-task ML models. d polyGNN22 architecture. A method based on graph neural networks is initiated with a polymer SMILES string. The encoder converts the repeat unit SMILES string into a periodic graph along with fingerprints, followed by the computation of initial atomic and bond fingerprint vectors. Subsequently, the message passing unit generates the learned polymer fingerprint. Introducing a selector vector to convey data fidelity (experimental or simulation) and specific properties (permeability, diffusivity, solubility) for six gases, the approach then combines this fingerprint and selector vector before passing it to the estimator, resulting in the prediction of the desired property.
To test our MT learning approach, we constructed four distinct models to examine and benchmark the impact of incorporating multiple data streams. These models were designed to emulate real-world usage scenarios for the prediction model’s application and to assess the improvements in prediction capabilities. To evaluate the efficacy of the MT learning, a comparison with ST learning is employed. Through these case studies, we demonstrate that MT learning surpasses conventional learning models by integrating diverse data sources and extracting meaningful correlations, particularly in data-scarce scenarios. Furthermore, the inclusion of diverse property data in this approach substantially broadens the coverage of the chemical space and effectively addresses the ML extrapolation problem. This is an ongoing process though, one that can lead to continuous improvement as more data becomes available. We then performed a head-to-head comparison of our new MT model against our previous, then state-of-the-art gas permeability predictor, deployed at Polymer Genome (https://www.polymergenome.org)11, making predictions across 13 polymer classes and demonstrating the superiority of the present model.
Finally, we highlight the power of the present development in the realm of materials discovery. Robeson-type trade-off plots are created for gas permeability, diffusivity, and solubility (by pairing each with selectivity), for over 13,000 known (i.e., previously synthesized) polymers. These trade-off plots reveal interesting candidates, as well as the true property limits across the known polymer chemical space. Most importantly, the limitations of the present model (in terms of recognizing chemical spaces where the model is uncertain) are also revealed.
By integrating high-throughput simulation data with available measured data and employing data fusion techniques, one can progressively enhance the accuracy and generalizability of predictions. This philosophy and strategy hold the potential to advance polymer discovery not only for membrane technology but also for other applications.
Results
Experimental data acquisition
Measured gas transport properties (permeability, diffusivity, and solubility) for six different gases (CO2, CH4, O2, N2, H2, and He) were obtained from 84 publications listed in the Polymer Handbook32. The experimental testing temperatures ranged from 25 °C to 35 °C, and testing pressures varied between 1 and 30 atm. The dataset comprised a total of 820 polymers and included 3748, 709, and 550 Pexpt, Dexpt, and Sexpt values, respectively, amounting to a total of 5007 data points. Factors such as polymer process history and testing method were not directly included as parameters. Instead, the measured Pexpt, Dexpt, and Sexpt values are treated as samples from the distribution of possible values for a given polymer. As such, it is important to consider the uncertainty in the predictions, and not just the mean value of the prediction.
Molecular Dynamics and Monte Carlo simulations
Gas diffusivity and solubility data were generated using classical molecular dynamics (MD) and Monte Carlo (MC) simulations, respectively. These simulations were conducted using the open-source large atomic molecular massively parallel simulator (LAMMPS) package33. The atomic potential parameters for polymers were adopted from the general AMBER force field 2 (GAFF2)34. In the simulations, the gas molecules (i.e., CO2, CH4, O2, and N2) were treated as rigid molecules, and thus were modeled with non-bonded potentials described by the TraPPE (transferable potentials for phase equilibria) models35. To perform the simulations, 27 polymer chains were inserted into the simulation box, with each chain comprising approximately 150 atoms, and their ends were capped with a methyl group. The initial polymer configurations were generated using the Polymer Structure Predictor (PSP) package36, and a representative snapshot is shown in Fig. 1b. To achieve equilibrated structures, all systems underwent a 21-step relaxation procedure as recommended by Abbott et al.37. The mean-squared displacement (MSD) of the polymer chains was then computed to verify that, on average, the chains had moved a distance of a few times their radius of gyration. This step ensured that the polymers explored various conformations and reached an equilibrium conformational state and density. Once the equilibrated structures were obtained, Dsim and Ssim were calculated. The simulation protocol is outlined in Fig. 1b.
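The equilibration criterion described above can be checked with a short analysis script; the sketch below is a minimal numpy version, assuming unwrapped coordinates of shape (n_frames, n_chains, n_atoms_per_chain, 3) and equal atomic masses.

```python
import numpy as np

def mean_radius_of_gyration(frame):
    """Average radius of gyration over chains for one frame of shape (n_chains, n_atoms, 3)."""
    com = frame.mean(axis=1, keepdims=True)
    rg2 = ((frame - com) ** 2).sum(axis=2).mean(axis=1)   # per-chain Rg^2
    return float(np.sqrt(rg2).mean())

def mean_chain_displacement(coords):
    """Average center-of-mass displacement of the chains between the first and last frame."""
    com = coords.mean(axis=2)                             # (n_frames, n_chains, 3)
    return float(np.linalg.norm(com[-1] - com[0], axis=1).mean())

def is_equilibrated(coords, rg_multiple=3.0):
    """Heuristic: chains should have moved a few times their radius of gyration on average."""
    return mean_chain_displacement(coords) > rg_multiple * mean_radius_of_gyration(coords[0])
```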
For the Dsim calculations, a total of 27 gas molecules were randomly added to the simulation box. This specific number of molecules was chosen to be small enough to maintain the system in the dilute Fickian regime such that the gas molecules do not significantly influence each other, and yet large enough to obtain meaningful statistics. Subsequently, all systems underwent an additional equilibration of 10 ns in the NPT ensemble, followed by a 100–200 ns production run in the NVT ensemble. The choice of a 100–200 ns production run duration was made to ensure the convergence of gas diffusivity and gas MSD slope across a broad spectrum of polymer types. While shorter time frames are adequate for certain instances, there are cases where the extended range of 100–200 ns is necessary to achieve the desired level of convergence. To illustrate this behavior, we present an analysis of simulation time versus methane diffusivity for polyethylene, polyimide, polystyrene, and polymethyl methacrylate, with the results detailed in Supplementary Figure S1. The box size in the NVT run was fixed using the average spacing and density obtained from the last 1 ns of the NPT run. Nosé-Hoover thermostat and barostat were employed with a damping parameter of 100 time steps for each, and a time step of 1 fs was used in all MD simulations. The barostat coupled the three dimensions of the box to maintain a cubic box for all systems. Simulation outputs were saved every 1000 fs and block averaging from one polymer configuration was used to calculate an average Dsim and standard deviation from the gas MSD. Block averaging allows for the reduction of random noise and more reliable statistical measures38.
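A minimal sketch of the block-averaging analysis (not the production script) is shown below; it assumes unwrapped gas coordinates of shape (n_frames, n_gas, 3) in Å on a uniform time grid and fits the Einstein relation MSD = 6Dt within each block.

```python
import numpy as np

def msd(pos):
    """Mean-squared displacement relative to the first frame, averaged over gas molecules."""
    disp = pos - pos[0]
    return (disp ** 2).sum(axis=2).mean(axis=1)

def block_diffusivity(pos, dt_fs, n_blocks=5):
    """Split the trajectory into non-overlapping blocks, estimate D per block from
    MSD = 6 D t (fit through the origin), and return the mean and standard deviation."""
    d_vals = []
    for block in np.array_split(pos, n_blocks):
        m = msd(block)
        t = np.arange(len(m)) * dt_fs                # time in fs
        slope = np.dot(t, m) / np.dot(t, t)          # Å^2 / fs
        d_vals.append(slope / 6.0 * 0.1)             # Å^2/fs -> cm^2/s
    d_vals = np.array(d_vals)
    return d_vals.mean(), d_vals.std(ddof=1)
```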
For the Ssim calculations, a 5 ns production run was performed on equilibrated structures in an NVT ensemble. During this 5 ns run, a snapshot of the structure was captured every 100 ps, resulting in a total of 50 snapshots. Employing an ensemble of snapshots allows for improved sampling and an estimate of the standard error, which is crucial for accurate estimation of Ssim39. Using a built-in LAMMPS function (https://docs.lammps.org/fix_widom.html), 25,000 gas particles were inserted per snapshot, at random positions, following the Widom insertion method40. This method involves determining the excess chemical potential resulting from the insertion of gas molecules into the polymer, which allows for the estimation of Henry’s constant. Henry’s constant indicates how easily a particular gas dissolves in the polymer. Henry’s Law is then used to obtain gas solubility from Henry’s constant, with an assumption of a partial pressure equal to 1 atm, which is the IUPAC standard testing condition41. This derivation is detailed in the Methods section. No relaxation was performed to adjust the positions of the polymer atoms or the gas particles during the insertion process. A Langevin thermostat was used with a time step of 1 fs for all MC simulations. 25 polymer configurations were used to calculate the Ssim, standard deviation, and the standard error from the excess chemical potential.
Figure 1b provides an overview of the simulation protocol used, and details of Dsim estimation from gas MSD and Ssim from the excess chemical potential are described in the Methods section.
Validation of MD and MC simulations
As an essential step of this investigation, we aimed to validate and calibrate the accuracy of the MD and MC predictions and assess the extent to which the simulations capture trends in gas transport properties. Performing classical simulations with a specific force field for polymer-gas systems across extensive chemical spaces to estimate gas diffusivity and solubility is a relatively rare endeavor. While generic force fields like GAFF2 are designed for a wide variety of materials, they often require fine-tuning of potential parameters for each unique material to attain better accuracy.
A total of 584 polymer-gas systems were simulated, out of which 342 systems had corresponding experimental measurements. The additional simulated systems were intended to expand the chemical coverage of the model. A comparison of Psim, Dsim, and Ssim against their respective experimental values, Pexpt, Dexpt, and Sexpt, is illustrated in Fig. 2. Overall, the simulations tend to overestimate the measured values, but they effectively capture the general trends across the polymer-gas chemical space considered. More specifically, the simulated polymer systems often exhibit lower densities compared to experimental systems, as modeled systems are approximations of the real polymeric materials and may include lower molecular weights and limited equilibration times. In our methodology, we employ a 21-step polymer equilibration relaxation procedure, which results in consistent density trends compared to experimental systems. However, a slight underestimation of density remains, as also observed by Abbott et al. in their study employing the same procedure37. This increased free volume allows gas molecules to move more easily and quickly through the polymer system, resulting in higher diffusivity.
a Gas permeability parity plot, b Gas diffusivity parity plot, and c Gas solubility parity plot. Parity plots comparing the results from simulations against experiment data. Simulated gas permeability was derived using Eq. 1, using simulated gas diffusivity and solubility as inputs. The red lines represent trends in predicted data, while the black lines depict the parity lines of optimal fit. The error bars for all plots are represented in standard deviations. Error propagation techniques were employed to calculate the error bars for gas permeability. While some overestimation is expected across all cases, a qualitative correlation is demonstrated.
Similarly, the discrepancies of Ssim relative to Sexpt may be due to the approximations inherent to the Widom insertion approach and the quality of the classical force fields across chemical spaces. Nonetheless, the favorable trends that the force fields can capture provide optimism for the usage of such simulation-derived datasets, albeit with lower fidelity, in multi-task learning frameworks. Another essential aspect of the validation is the derivation of Psim, from the product of Dsim and Ssim using Eq. 1. While non-equilibrium MD can be used to simulate Psim, it requires a more complex setup and can be computationally intensive.
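Because Psim is the product of Dsim and Ssim (Eq. 1), its error bars follow from standard propagation of uncertainty; a minimal sketch, assuming the diffusivity and solubility errors are independent:

```python
import numpy as np

def permeability_with_error(d, d_err, s, s_err):
    """P = D * S; for independent errors, relative uncertainties add in quadrature:
    sigma_P / P = sqrt((sigma_D / D)**2 + (sigma_S / S)**2)."""
    p = d * s
    p_err = p * np.sqrt((d_err / d) ** 2 + (s_err / s) ** 2)
    return p, p_err
```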
Multi-task model benchmark
To elucidate the effect of data fusion, we train and compare both ST and MT polyGNN models, using a subset of the experimental data collected and simulation data generated. These models were evaluated based on the predictive accuracy of Pexpt, using various holdout train and test splits of 293 systems (comprised of 80 unique polymers with varying available gas data). For instance, in a 20/80 split, 20% of the Pexpt data is set aside as testing data, while 80% is used to train the model. To ensure representative data sampling, stratified sampling based on polymer SMILES was used when splitting the data into train and test sets. In this type of sampling, when a polymer is selected for the test set, all gas data for Pexpt associated with that polymer are withheld from the training set. This also provides insight into how well the model extrapolates to new unknown polymers. The polyGNN model training parameters used are detailed in Supplementary Table S1.
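A minimal sketch of this polymer-wise holdout split, expressed here with scikit-learn's GroupShuffleSplit as an illustrative stand-in for the actual splitting code:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def polymer_split(df, test_size=0.2, seed=0):
    """Split so that all records sharing a SMILES string land entirely in train or test.
    df is assumed to have one row per (polymer, gas) record with a 'smiles' column."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df["smiles"]))
    return df.iloc[train_idx], df.iloc[test_idx]
```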
In Fig. 3, we illustrate the two model types, ST and MT, along with the details of the train and test splits. The performance of the models was evaluated using two key metrics: the coefficient of determination (R2) and the order of magnitude error (OME), with units in Barrer. R2 assesses how well a model predicts an outcome, while OME quantifies the prediction error in terms of orders of magnitude (taken as the logarithm of the mean absolute error). We conducted four random seed selections of the training and test sets to compute the statistics of the model performance.
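For reference, a minimal implementation of the two metrics is given below; the OME here follows the description above literally, as the base-10 logarithm of the mean absolute error, which is one reading of that definition.

```python
import numpy as np
from sklearn.metrics import r2_score

def evaluate(y_true, y_pred):
    """Return R^2 and the order-of-magnitude error (OME), taken here as
    log10 of the mean absolute error, per the description above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    r2 = r2_score(y_true, y_pred)
    ome = np.log10(np.mean(np.abs(y_true - y_pred)))
    return r2, ome
```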
a Model inputs. This schema illustrates the train and test splits for two model variants: Regular Single Task (ST) and data-fused Multi-Task (MT) models. In the ST models, solely experimental gas permeability data is incorporated. Conversely, the MT model encompasses a possible amalgamation of experimental gas diffusivity and solubility, along with simulated gas permeability, diffusivity, and solubility data. b Benchmark models. Four distinct models were developed to assess the impact of MT learning. The first model (ST) exclusively incorporated experimental gas permeability data. In contrast, the subsequent MT models progressively integrated additional data. The presence of each data type in the model is indicated by a black checkmark. Here, P, D, and S represent gas permeability, diffusivity, and solubility, respectively. The abbreviation expt corresponds to experimental data, while sim signifies simulation data.
Our MT learning methodology comprises two primary components: the integration of simulation data and the inclusion of correlated experimental data. To establish a baseline for comparison, we employ a ST model. Shown in Fig. 3a and represented by the “ST” row in Fig. 3b, the ST model is exclusively trained using Pexpt data. Due to its reliance on limited data and the absence of diverse property inputs, the ST model’s coverage of the chemical space is inherently constrained. As the test set percentage increases, this model is trained on progressively reduced amounts of data, leading to an anticipated decrease in predictive performance. This trend is evident in Fig. 4 where the R2 decreases and the OME increases as the ST model is trained on diminishing data portions. In the most challenging scenario (80% test set size), the R2 dropped to less than 0.50 and the OME increased to ≈0.44 Barrer.
a Coefficient of determination (R2). b Order of magnitude error (OME). R2 evaluates the predictive performance of a model, whereas OME measures the prediction error by considering orders of magnitude, represented as the logarithm of the mean absolute error. The ST and MT models are compared based on varying percentages of the unseen test set. The different test set sizes illustrate the impact of reducing training data. At 80%, the model is trained on only 20% of the dataset and tested on the remaining 80%, reflecting a data-scarce region with limited chemical coverage. Comparatively, the MT models show significant improvement over the ST model, particularly at higher percentages of the unseen test set.
Now let’s consider the first element of our MT learning approach, specifically the augmentation of Pexpt training data with Psim, represented by the “MT-1” row in Fig. 3. The MT-1 model is enriched with simulation data spanning the test set space. Its primary purpose is to exploit the correlations between measured and simulated data learned from the training set. This scenario mirrors situations where experimental data is unavailable, and simulation data is introduced to guide the model’s predictions. Upon examining the MT-1 model, its performance noticeably surpasses that of the baseline ST model. The MT-1 model achieves an average R2 and OME of ≈0.77 and ≈0.30, respectively, as shown in Fig. 4 (MT-1). This improvement is particularly pronounced when the test set size reaches 80%, where the coverage of experimental data within the chemical space is most limited. This accentuates the ability of data fusion models, reinforced with simulation data, to effectively mitigate the challenges of extrapolation that conventional models (trained solely on a single experimental property) would inevitably confront. Furthermore, as another demonstration, this analysis was extended to experimental and simulation data for gas diffusivity, resulting in a similar strengthening in performance, as illustrated in Supplementary Figure S2. This observation underlines the value of bolstering experimental data with simulation data, indicating its potential extension to other properties of interest as well.
Moving on to the second component of our MT learning methodology, we focus on augmenting the Pexpt training data with Dexpt and Sexpt, represented by the “MT-2” row in Fig. 3b. The inclusion of this supplementary data serves the purpose of empowering the model to leverage knowledge from other available pertinent properties and established physics and make predictions for the Pexpt values. In this scenario, a remarkable enhancement in predictive performance is observed. Specifically, the average R2 and OME are ≈0.93 and ≈0.12, respectively, as displayed in Fig. 4 (MT-2). Comparing the MT learning component in the previous passage with this second component reveals a notable difference in performance. While both approaches expand the coverage of the chemical space, MT-2 stands out due to the incorporation of high-fidelity experimental data. Unlike MT-1, where all augmented data comes from simulation, the new information in MT-2 originates from additional experimental sources, contributing to superior predictive capabilities. The MT-2 model can be likened to an ideal scenario where complementary or correlated high-fidelity data is readily available. In scenarios where such ideal conditions are not met, the MT-1 approach excels by effectively integrating simulation data to achieve a respectable level of prediction accuracy.
In our final model, we combine the strategies embedded in both the MT-1 and MT-2 models, creating a unified model represented by row “MT-3” in Fig. 3b. This comprehensive model encompasses all available experimental and simulation data points. The MT-3 model slightly outperforms the MT-2 model, exhibiting an elevated average R2 of ≈0.96 and a comparable average OME of ≈0.10, as depicted in Fig. 4 (MT-3). Overall, this model achieves superior performance compared to the base ST model, which had an average R2 and OME of ≈0.57 and ≈0.38, respectively. These results establish the efficacy of integrating simulation and correlated experimental data in successfully addressing the challenges posed by ML extrapolation.
Production model benchmark
In the first iteration of our gas permeability prediction work, deployed at Polymer Genome (https://www.polymergenome.org), a Gaussian process regression algorithm was employed alongside a hierarchical polymer fingerprinting scheme to train a ST model11. In the present work, a transition is made to polyGNN (a recently published Graph Neural Network model that automatically generates fingerprints from SMILES strings), data augmentation, and invariant transformations to train a MT model. The models presented in the preceding section were trained using a subset of our dataset, a deliberate choice made to clearly illustrate the impact of incorporating diverse data types on prediction capabilities in a multi-task setting. Our final production model adopts the MT-3 model scheme and now incorporates all the available experimental and simulation data for gas permeability, diffusivity, and solubility. With this latest model iteration, our objective is to achieve substantial improvements over the previous version and to push the boundaries of transport predictions through polymers. The principal component analysis (PCA) plot in Fig. 5a, created using Polymer Genome fingerprints, displays the chemical space of the present study against 13,000 known polymers in our database. This plot visually demonstrates our production model’s expansion to include additional chemical compositions, increasing the range of polymers for which the model can make accurate predictions. We also present a PCA plot in Supplementary Figure S3, illustrating the chemical coverage of our simulation data in comparison to experimental gas transport data and 13,000 known polymers in our database. Figure 5b highlights the data fusion aspect of the model, showcasing the contrast between the datasets employed in the original and current models. Particularly noteworthy is the considerable enlargement of our dataset, expanding from 315 to 1050 polymers, accompanied by a significant increase in the total number of data points. With this amplified dataset, our model gains the capability to not only predict gas permeability but also include gas diffusivity and solubility. This broader scope of predictions reflects the power of our MT learning approach and its ability to leverage diverse data sources for a more comprehensive understanding of gas transport properties through polymers.
a Principal Component Analysis (PCA) plot. The PCA plot demonstrates an expanded coverage of chemical space by both the original and production models. The orange and blue dots correspond to the coverage of the original and production model, respectively, while the grey dots represent the 13,000 known polymers in our database. b Dataset comparison. A comparison between the original and production models reveals an incorporation of diverse data types. The production model integrates experimental and simulation data for permeability, diffusivity, and solubility properties.
To highlight the superior performance of the present model, Pexpt predictions were made on a holdout test set of 153 systems, consisting of 31 polymers across 13 polymer classes, following a similar approach as the original model11. The summarized outcomes are presented in Table 1, and the specific polymers selected for this assessment are listed in Supplementary Table S2. The overall R2 has increased from 0.93 to 0.95 in the updated model compared to the original. Upon a more detailed examination of individual polymer classes, it becomes evident that the R2 metric exceeds 0.90 for all classes for the production model, with a particularly significant enhancement observed for polyphosphazenes, where the R2 value has risen from 0.49 to 0.90. Additionally, substantial advancements have been achieved in prediction accuracy for polymers such as polynorbornenes, polypropynes, substituted polyacetylenes, and polypentynes. The diminished performance of the original model in these cases could be attributed to either limited data availability for certain polymer classes or inherent uncertainties within the experimental data. Importantly, it should be noted that the number of test data points for these specific polymer classes varies widely, ranging from 4 to 53. This variability in data availability across diverse classes could potentially contribute to lower individual R2 values for specific classes while concurrently contributing to a higher overall model R2. Nonetheless, the updated model effectively overcomes these performance variations, highlighting its robustness and versatility. Further insights into the model’s performance are depicted in parity plots showcasing train and test set predictions for the 31 evaluated polymers, shown in Supplementary Figures S4 and S5.
Forward-looking design
The ideal performance of gas separation membranes is related to two intrinsic material properties: the gas permeability and the permselectivity between specific target gas pairs. Ideally, a membrane would provide high permeability and permselectivity to maximize throughput and minimize costs. In 1991, Robeson42 documented a trade-off relationship between these two characteristics for polymers, often referred to as “the upper bound”. This principle asserts that polymers with high permeability typically exhibit diminished selectivity, and vice versa. These upper bounds illustrate the trade-off relationship for pairs of common gases (CO2, CH4, O2, N2, H2, and He), highlighting the best possible combination of permeability and permselectivity. This upper bound establishes a comparative benchmark for evaluating the performance metrics when designing novel membranes. As such, data driven methods that establish a relationship between polymer structure and polymer membrane performance hold immense potential in accelerating the design of tailor-made polymers for specific separation tasks.
To this end, we demonstrate our model’s capability to make these assessments. We constructed permeability trade-off plots for ≈13,000 known polymers (i.e., previously synthesized) for the gas pairs CO2/CH4, CO2/N2, H2/CH4, H2/CO2, O2/N2, and N2/CH4. Figure 6a shows the permeability trade-off plot for CO2/CH4, while the other gas pairs are shown in Supplementary Figure S6. The ML-predicted permeability and selectivity for each gas pair closely align with the available experimental data and the bounds, while the simulation data over-predicts, as expected. Both experimental and simulation data are also shown in Fig. 6a. By predicting property values for the ≈13,000 known polymers, we can gain a clearer understanding of the overall trade-off behaviors. Robeson’s upper bound, initially established in 1991, is presented alongside updated bounds introduced in 2008 and 201943,44. PIM-DM-BTrip, a polymer with superior performance, is highlighted as a part of the set of polymers that helped define the 2019 bound.
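Constructing the coordinates of such a trade-off plot from predicted permeabilities is straightforward; a minimal sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd

def tradeoff_coordinates(df, gas_a="CO2", gas_b="CH4"):
    """df is assumed to hold one row per polymer with predicted permeabilities (in Barrer)
    in columns named 'P_CO2', 'P_CH4', etc. Returns log10(P_A) and log10(P_A / P_B),
    the axes of a Robeson-style plot."""
    p_a = df[f"P_{gas_a}"].to_numpy(dtype=float)
    p_b = df[f"P_{gas_b}"].to_numpy(dtype=float)
    return np.log10(p_a), np.log10(p_a / p_b)
```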
a Gas permeability, b Gas diffusivity, and c Gas solubility. Using our model to predict ≈13,000 known polymers, we compare the results to experimental data. The orange and blue dots correspond to the experimental data and machine learning predictions, respectively, while the grey dots represent simulated values. The original Robeson upper bound (1991) is depicted as the dashed black line, with reevaluated bounds from 2008 and 2019 shown as red and yellow dashed lines, respectively, for gas permeability. There are no established bounds for gas diffusivity or solubility, but the model predictions closely align with the experimental data values. In the case of CO2/CH4 diffusivity selectivity, the low diffusivity regime has high prediction uncertainty and should be taken with caution.
Research endeavors commonly focus on permeability trade-off plots; however, since permeability can be broken down into diffusivity and solubility components, we also created CO2/CH4 trade-off plots for these properties, as shown in Fig. 6b, c. Diffusivity and solubility trade-off plots for CO2/N2, O2/N2, and N2/CH4 are illustrated in Supplementary Fig. S7 and S8. When using these models, the sensibility of predictions can be evaluated by observing common trends for the properties. For example, gas diffusivity tends to follow the relationship of DO2 > DCO2 > DN2 > DCH4, a pattern primarily driven by molecular diameter effects. However, Fig. 6b illustrates instances where the CO2/CH4 diffusivity selectivity falls below 1 (i.e., below 0 in the log scale). This contradicts the intuition that CO2 diffusivity should almost always be greater than that of CH4. Although there are cases where DCO2/DCH4 < 1, it is a rare occurrence. A closer examination of these suspicious predictions reveals that most of them fall in the lower diffusivity regime. In this regime, prediction uncertainty, calculated using Monte Carlo dropout, tends to be inflated, revealing lower confidence for predictions. This heightened uncertainty can be directly attributed to the scarcity of data in this specific range, a challenge that is particularly pronounced in both simulations and experimental measurements. Indeed, as can be seen from Fig. 6b, there are no measured data and only sparse simulation data points in this property range; hence, the ML predictions must be viewed with extreme caution and suspicion. This underscores the importance of recognizing that these models are valuable tools, but they must be used in conjunction with chemical intuition and an understanding of prediction uncertainties, especially for predictions in regions far away from the chemical space of the training set. This becomes especially critical when assessing areas with limited data or when venturing into new domains. These considerations thus mandate either experiments or simulations in such unexplored chemical spaces to better inform the ML models.
Trade-off plots are typically employed in designing amorphous polymers for gas separation, but when considering other applications such as packaging, the degree of crystallinity of the polymer must be considered. Gas transport behavior in semi-crystalline polymers varies due to the crystalline regions acting as impermeable barriers against gas penetration45. Michaels et al. originally described this behavior using a two-phase model, which comprises a crystalline phase and an amorphous phase, where impedance is directly proportional to crystallinity46. Weinkauf et al. extended this model into a three-phase model that incorporates the ratio between the rigid amorphous phase fraction and mobile amorphous phase fraction (RAF/MAF)45. These models provide critical insights into the behavior of semi-crystalline polymers and offer guidelines for tailoring their gas transport properties to specific applications. The present work may be extended to address such practical situations by planning simulations of gas diffusivity and solubility through amorphous, crystalline, and amorphous-crystalline interfaces.
Discussion
In this study, we introduce a novel multi-task (MT) learning approach that leverages a combination of measured and simulation data, along with correlated properties, to create a state-of-the-art predictor for gas transport properties. To thoroughly evaluate the effectiveness of this approach, we performed a benchmark study in which we compared the individual impacts of each of these tasks and their collective effect when considered together. The addition of interrelated measured data provided the larger benefit in enhancing the predictive capabilities of the ML model, indicating that multiple correlated ground-truth (i.e., measured) property data are most desirable for generating accurate property forecasts. However, in instances where rich measured data is unavailable, easily producible simulation data, when combined with measured data, demonstrates its potential by offering informed predictions. In any case, both scenarios of MT learning were able to learn underlying physical correlations and are superior to single-task (ST) models that have a less robust basis for predictions.
These ideas have been unified to create a model representing a major advancement in predicting gas transport properties through polymers. In a comparative analysis with the prior work, this model displayed concrete improvements across 13 different polymer classes. Using our new ML model, we also generate selectivity trade-off plots for gas permeability, diffusivity, and solubility for ≈13,000 known polymers (i.e., previously synthesized). These plots provide insights into the strengths and limitations of the models, but more importantly, they highlight the need for data across diverse chemical spaces, e.g., via simulations if measured data proves to be laborious to generate. The prospect of continual expansion of the accessible polymer universe will push the frontiers of what is achievable in terms of properties and performance.
Methods
Gas diffusivity calculation in MD simulations
From our MD simulations, the diffusivity (Dsim) of gas molecules was obtained by:

\(D_{\mathrm{sim}}=\lim_{t\to\infty}\frac{1}{6N_{\mathrm{gas}}t}\sum_{i=1}^{N_{\mathrm{gas}}}\langle\Delta r_i(t)^2\rangle\) (2)

where Ngas is the number of gas molecules in the simulation cell, t is the simulation time, ri(t) is the position of the gas molecule i at time t, ∆ri(t) = ri(t)−ri(0) is the displacement of gas i between time 0 and time t, and 〈∆ri(t)2〉 is the mean square displacement (MSD) of gas molecule i at time t. The gas MSD was block averaged over 2-5 non-overlapping trajectories, depending on whether breaking the trajectory into shorter blocks still allows its MSD curve to reach the Fickian regime. This block averaging was also used to calculate the standard deviation. The diffusivity was obtained using the slope from a least-squares linear fit of the final decade of the MSD data. The log-log slopes of all the systems’ MSD curves are in the range of 0.95–1.05, which is a common range for classical MD simulation diffusivity studies47,48,49.
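A sketch of this final-decade fit and the accompanying Fickian-regime check, assuming the MSD is supplied in Å² on a time grid in fs:

```python
import numpy as np

def fit_final_decade(t_fs, msd_A2):
    """Fit the last decade of the MSD curve (t_max/10 to t_max) and return the
    diffusivity in cm^2/s along with the log-log slope, which should be close to 1
    in the Fickian regime."""
    mask = t_fs >= t_fs[-1] / 10.0
    slope, _ = np.polyfit(t_fs[mask], msd_A2[mask], 1)                           # Å^2 / fs
    loglog_slope, _ = np.polyfit(np.log10(t_fs[mask]), np.log10(msd_A2[mask]), 1)
    return slope / 6.0 * 0.1, loglog_slope                                       # Å^2/fs -> cm^2/s
```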
Gas solubility calculation in MC simulations
The Widom insertion method was used to calculate the Henry’s constant (k) of gas molecules within a Monte Carlo simulation. In this method, we insert N gas molecules into the simulation box (one at a time, at various random locations), and the excess chemical potential (µex) of the gas in the membrane is obtained. The estimation uses an ensemble average over the N separate, random insertions, the ith of which changes the internal energy of the system by ∆Ei:

\(\mu_{\mathrm{ex}}=-k_{\mathrm{B}}T\,\ln\left[\frac{1}{N}\sum_{i=1}^{N}\exp\left(-\frac{\Delta E_i}{k_{\mathrm{B}}T}\right)\right]\) (3)

where kB is the Boltzmann constant, T is the temperature, and N = 25,000 insertions. This estimation is defined for the dilute limit, where there are no gas-gas interactions. When an insertion overlaps with the polymer in the system, its contribution to µex (and hence to the gas solubility) is minimal; thus, insertions with energies greater than 5kBT are discarded. With µex, we can obtain Henry’s constant, k, of gas molecules by:
With k, we then use Henry’s law which states that the solubility (S) of a gas is directly proportional to the partial pressure of the gas (Pgas), which takes the form:
We assume a standard testing condition of partial pressure equal to 1 atm, and thus:
In this work, for each polymer configuration, we take m = 50 dynamic snapshots from a 5 ns production run on the equilibrated structure in an NVT ensemble. An average solubility, Si, of configuration i is obtained by:

\(S_i=\frac{1}{m}\sum_{j=1}^{m}S_{i,j}\) (7)

where Si,j is the solubility obtained from snapshot j of configuration i.
For proper convergence and as a measure of standard error, we use n = 25 polymer configurations and Ssim is obtained by:

\(S_{\mathrm{sim}}=\frac{1}{n}\sum_{i=1}^{n}S_i\) (8)
The standard deviation, σ, and standard error, SE, are then calculated as:

\(\sigma=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(S_i-S_{\mathrm{sim}}\right)^2}\) (9)

\(\mathrm{SE}=\frac{\sigma}{\sqrt{n}}\) (10)
Final gas solubility values were screened for a SE of less than 5%.
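A minimal numpy sketch of this workflow; the energy units, temperature, and the normalization convention for discarded insertions are assumptions for illustration, not the production analysis.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def excess_chemical_potential(delta_e_kcal, T=300.0):
    """Widom estimate mu_ex = -kB T ln < exp(-dE / kB T) > for one snapshot.
    Insertions with dE > 5 kB T are dropped; here their Boltzmann factors are simply
    treated as zero (the normalization stays over all attempted insertions)."""
    kbt = KB_KCAL * T
    delta_e_kcal = np.asarray(delta_e_kcal)
    boltz = np.exp(-delta_e_kcal[delta_e_kcal <= 5.0 * kbt] / kbt)
    return -kbt * np.log(boltz.sum() / len(delta_e_kcal))

def solubility_statistics(s_per_config):
    """Mean, standard deviation, and standard error over the n polymer configurations;
    final solubility values with a relative SE above 5% are screened out, as described above."""
    s = np.asarray(s_per_config)
    n = len(s)
    mean, std = s.mean(), s.std(ddof=1)
    return mean, std, std / np.sqrt(n)
```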
polyGNN
The predictive model we used was polyGNN, a multitask graph neural network method that has shown promising results when dealing with large-scale multi-property datasets22. Briefly, polyGNN contains three modules: the Encoder, Message Passing Block, and the Estimator. The inputs to polyGNN are a polymer repeat unit and a property of interest (or, equivalently, the property’s associated selector vector). The two outputs of a polyGNN model are the repeat unit’s fingerprint and the value of the property of interest. In the Encoder, the repeat unit is first converted to a periodic graph, with each atom as a node and each bond as an edge. Then, each node and edge in the graph are given an initial fingerprint. After the graph elements have been assigned their initial features, the graph is passed to the Message Passing Block. Messages between neighboring atoms are iteratively passed along chemical bonds. After each iteration, every node fingerprint is updated using the messages, while each bond fingerprint remains the same. The message passed from atom j to atom i at time step k is calculated according to Eq. 11.
where each ϕ(k) is a parameterized function, \({x}_{i}^{\left(k\right)}\) and\(\,{x}_{j}^{\left(k\right)}\) are the encodings of neighboring ij atoms after time step k, and ei,j is the fingerprint of the bond that joins atoms i and j. \({m}_{i,j}^{\left(k\right)}=0\) if i, j do not share a chemical bond. After initialization, each node receives messages from all of its neighbors. These messages are aggregated by some permutation-invariant function f (e.g., sum, mean, max). We use the sum in this work. The aggregated message, along with the current node encoding, is used to update the node encoding. The node update process is defined in Eq. 12.
where each χ(k) is a parameterized function, p is a polymer, [1,Np] is the set of integers between 1 and Np, Np is the number of atoms in the repeat unit of p, and x(k) = 0,∀k < 0. Messages are passed for τ time steps, where τ is also the capacity in this work. The fingerprint of the entire polymer, xp, is calculated by the graph aggregation function Ag, as shown in Eq. 13.
Finally, xp and the selector s can be passed to the Estimator. Here, these inputs are mapped to a polymer property prediction, yp, via a parameterized function ψ, implemented as a multilayer perceptron (MLP), as shown in Eq. 14.
ψ specifies the number of hidden layers between the input and output layers, with this depth parameterized to range from 2 to 14 layers. During training, the parameters of all ϕ(k), χ(k), ψ are learned simultaneously. As shown in Eq. 12, our update step leverages skip connections, which have been shown to improve the optimization of shallow layers in deep neural networks.
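Since the displayed Eqs. 11-14 are not reproduced here, the simplified PyTorch sketch below illustrates one message-passing step with sum aggregation and a skip connection, followed by graph aggregation and the selector-conditioned estimator; layer sizes and functional forms are illustrative assumptions, not the exact polyGNN implementation.

```python
import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """One iteration: messages along bonds (phi), sum aggregation, node update (chi) + skip."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, node_dim), nn.LeakyReLU(0.01))
        self.chi = nn.Sequential(nn.Linear(2 * node_dim, node_dim), nn.LeakyReLU(0.01))

    def forward(self, x, edge_index, edge_attr):
        # x: (n_nodes, node_dim); edge_index: (2, n_edges), each bond listed in both directions.
        src, dst = edge_index
        msg = self.phi(torch.cat([x[dst], x[src], edge_attr], dim=1))  # message from src to dst
        agg = torch.zeros_like(x).index_add_(0, dst, msg)              # permutation-invariant sum
        return self.chi(torch.cat([x, agg], dim=1)) + x                # update with skip connection

class Estimator(nn.Module):
    """Maps the aggregated polymer fingerprint plus selector vector to a property value."""
    def __init__(self, fp_dim, selector_dim, hidden=128, dropout=0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(fp_dim + selector_dim, hidden), nn.LeakyReLU(0.01), nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, node_states, selector):
        x_p = node_states.sum(dim=0)                   # graph aggregation into a polymer fingerprint
        return self.mlp(torch.cat([x_p, selector]))    # selector conveys property, gas, and fidelity
```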
All neural network architectures used dropout layers, fully connected layers, and Leaky ReLU activations (with a negative slope equal to 0.01). MC dropout was implemented by performing 10 forward passes through the network, each time applying dropout to different subsets of nodes. All architectures were created using PyTorch and PyTorch Geometric. The weights of all models were optimized using the Adam optimizer and the mean squared error loss function.
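A short sketch of the MC-dropout uncertainty estimate, assuming a generic PyTorch model whose only stochastic layers are dropout:

```python
import torch

def mc_dropout_predict(model, *inputs, n_passes=10):
    """Average several stochastic forward passes with dropout left active; the spread
    across passes serves as an uncertainty estimate."""
    model.train()  # keeps dropout layers stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([model(*inputs) for _ in range(n_passes)])
    return preds.mean(dim=0), preds.std(dim=0)
```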
Training procedure
The training procedure used is similar to that in the polyGNN work, where the models are ensemble models, composed of several submodels22. The output of the ensemble is computed by the average of each submodel’s output. The data used for training was grouped based on gas transport type (P, D, & S), gas type, and data source (experiment or simulation). Once grouped, each data subset was then min-max scaled between 0 and 1. The polyGNN model training parameters used are detailed below and also compiled in Supplementary Table S1.
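A minimal pandas sketch of the per-group min-max scaling, with hypothetical column names for the property, gas, and data source labels:

```python
import pandas as pd

def scale_by_group(df):
    """Scale the 'value' column to [0, 1] independently within each
    (property, gas, source) subset, as described above."""
    def _minmax(group):
        lo, hi = group["value"].min(), group["value"].max()
        group = group.copy()
        group["value_scaled"] = (group["value"] - lo) / (hi - lo)  # assumes hi > lo
        return group
    return df.groupby(["property", "gas", "source"], group_keys=False).apply(_minmax)
```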
Next, the entire data set was stratified and split into training and test sets (percentages of test sets were 20%, 40%, 60%, or 80%) based on polymer SMILES strings three times. Using the NNDebugger package50, the optimal capacity was found by attempting to overfit (R2 > 0.97) the entire training data set. If the data was not overfit, then the capacity corresponding to the highest R2 value was used. The capacity range considered was between two and fourteen. The training data set was then divided into an 80% hyperparameter (HP) training set and a 20% HP validation set. The remaining HPs (batch size, learning rate, dropout percentage) were optimized using the package scikit-optimize51. The set of HPs corresponding to the lowest RMSE on the HP validation set was considered optimal.
Finally, the training data set was split into five folds using cross-validation (CV), producing one CV train data set and one CV validation data set per fold. For each fold, the model’s HPs were fixed as the optimal HPs and the model’s learnable parameters were fit to the CV train data set for 1000 epochs. At the end of 1000 epochs, the model parameters corresponding to the epoch with the lowest RMSE in the CV validation data set were chosen. After all five models were trained on their respective CV splits, the models were placed in an ensemble. The ensemble was used to make predictions on the test set, which was completely unseen by the ensemble during HP optimization or model training with CV.
Data availability
The experimental sources of data used are reported in the paper. All data, experimental and simulation, are available free of charge at https://github.com/Ramprasad-Group/polyVERSE/tree/main/Other/Gas_permeability_solubility_diffusivity.
Code availability
The Polymer Structure Predictor (PSP) package to create simulation polymer structures is available free of charge at https://github.com/Ramprasad-Group/PSP. The code used to perform molecular dynamics (MD) and Monte Carlo (MC) simulations is available free of charge at https://github.com/Ramprasad-Group/polyVERSE/tree/main/Other/Gas_permeability_solubility_diffusivity. The code used to train our polyGNN models is available at https://github.com/Ramprasad-Group/polygnn for academic use.
Change history
23 January 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41524-024-01494-1
References
Ferreira, A., Alves, V. & Coelhoso, I. Polysaccharide-based membranes in food packaging applications. Membranes 6, 22 (2016).
Baker, R. W. Membrane technology. Encyclopedia of Polymer Science and Technology (2001).
Wijmans, J. & Baker, R. The solution-diffusion model: a review. J. Membr. Sci. 107, 1–21 (1995).
Tran, H., Shen, K.-H., Shukla, S., Kwon, H.-K. & Ramprasad, R. Informatics-driven selection of polymers for fuel-cell applications. J. Phys. Chem. C. 127, 977–986 (2023).
Barnett, J. W. et al. Designing exceptional gas-separation polymer membranes using machine learning. Sci. Adv. 6, eaaz4301 (2020).
Moore, T. T., Damle, S., Williams, P. J. & Koros, W. J. Characterization of low permeability gas separation membranes and barrier materials; design and operation considerations. J. Membr. Sci. 245, 227–231 (2004).
Müller-Plathe, F. Permeation of polymers—a computational approach. Acta Polymerica 45, 259–293 (1994).
Audus, D. J. & de Pablo, J. J. Polymer informatics: Opportunities and challenges. ACS Macro Lett. 6, 1078–1082 (2017).
Batra, R., Song, L. & Ramprasad, R. Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater. 6, 655–678 (2021).
Chen, L. et al. Polymer informatics: Current status and critical next steps. Mater. Sci. Eng.: R: Rep. 144, 100595 (2021).
Zhu, G. et al. Polymer genome–based prediction of gas permeabilities in polymers. J. Polym. Eng. 40, 451–457 (2020).
Wu, C. et al. Rational design of all-organic flexible high-temperature polymer dielectrics. Matter 5, 2615–2623 (2022).
Chen, L. et al. Frequency-dependent dielectric constant prediction of polymers using machine learning. npj Computational Mater. 6, 61 (2020).
Wessling, M. et al. Modelling the permeability of polymers: a neural network approach. J. Membr. Sci. 86, 193–198 (1994).
Yuan, Q. et al. Imputation of missing gas permeability data for polymer membranes using machine learning. J. Membr. Sci. 627, 119207 (2021).
Ricci, E. & De Angelis, M. G. A perspective on data-driven screening and discovery of polymer membranes for gas separation, from the molecular structure to the industrial performance. Reviews in Chemical Engineering (2023).
Wang, L., Shao, C., Wang, H. & Wu, H. Radial basis function neural networks-based modeling of the membrane separation process: hydrogen recovery from refinery gases. J. Nat. Gas. Chem. 15, 230–234 (2006).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. modeling 50, 742–754 (2010).
Landrum, G. Rdkit documentation. Release 1, 4 (2013).
Huan, T. D., Mannodi-Kanakkithodi, A. & Ramprasad, R. Accelerated materials property predictions and design using motif-based fingerprints. Phys. Rev. B 92, 014106 (2015).
Le, T., Epa, V. C., Burden, F. R. & Winkler, D. A. Quantitative structure–property relationship modeling of diverse materials properties. Chem. Rev. 112, 2889–2919 (2012).
Gurnani, R., Kuenneth, C., Toland, A. & Ramprasad, R. Polymer informatics at scale with multitask graph neural networks. Chem. Mater. 35, 1560–1567 (2023).
Kuenneth, C. & Ramprasad, R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat. Commun. 14, 4099 (2023).
Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: recent applications and prospects. npj Computational Mater. 3, 54 (2017).
Hutchinson, M. L. et al. Overcoming data scarcity with transfer learning. arXiv e-prints. arXiv–1711 (2017).
Caruana, R. Learning to Learn (Springer, 1998).
Patra, A. et al. A multi-fidelity information-fusion approach to machine learn and predict polymer bandgap. Computational Mater. Sci. 172, 109286 (2020).
Kuenneth, C. et al. Polymer informatics with multi-task learning. Patterns 2, 100238 (2021).
Yang, J., Tao, L., He, J., McCutcheon, J. R. & Li, Y. Machine learning enables interpretable discovery of innovative polymers for gas separation membranes. Sci. Adv. 8, eabn9545 (2022).
Venkatram, S. et al. Predicting crystallization tendency of polymers using multifidelity information fusion and machine learning. J. Phys. Chem. B 124, 6046–6054 (2020).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Brandrup, J., Immergut, E. H., Grulke, E. A., Abe, A. & Bloch, D. R. Polymer handbook, Vol. 89 (Wiley New York, 1999).
Thompson, A. P. et al. LAMMPS-a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comput. Phys. Commun. 271, 108171 (2022).
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
Potoff, J. J. & Siepmann, J. I. Vapor–liquid equilibria of mixtures containing alkanes, carbon dioxide, and nitrogen. AIChE J. 47, 1676–1682 (2001).
Sahu, H., Shen, K.-H., Montoya, J. H., Tran, H. & Ramprasad, R. Polymer structure predictor (psp): a python toolkit for predicting atomic-level structural models for a range of polymer geometries. J. Chem. Theory Comput. 18, 2737–2748 (2022).
Abbott, L. J., Hart, K. E. & Colina, C. M. Polymatic: a generalized simulated polymerization algorithm for amorphous polymers. Theor. Chem. Acc. 132, 1–19 (2013).
Frenkel, D. & Smit, B. Understanding Molecular Simulation: From Algorithms to Applications (Elsevier, 2023).
Khawaja, M., Sutton, A. & Mostofi, A. Molecular simulation of gas solubility in nitrile butadiene rubber. J. Phys. Chem. B 121, 287–297 (2017).
Longuet-Higgins, H. & Widom, B. A rigid sphere model for the melting of argon. Mol. Phys. 8, 549–556 (1964).
Mocak, J., Bond, A. M., Mitchell, S. & Scollary, G. A statistical overview of standard (IUPAC and ACS) and new procedures for determining the limits of detection and quantification: application to voltammetric and stripping techniques (technical report). Pure Appl. Chem. 69, 297–328 (1997).
Robeson, L. M. Correlation of separation factor versus permeability for polymeric membranes. J. Membr. Sci. 62, 165–185 (1991).
Robeson, L. M. The upper bound revisited. J. Membr. Sci. 320, 390–400 (2008).
Comesaña-Gándara, B. et al. Redefining the Robeson upper bounds for CO2/CH4 and CO2/N2 separations using a series of ultrapermeable benzotriptycene-based polymers of intrinsic microporosity. Energy Environ. Sci. 12, 2733–2740 (2019).
Weinkauf, D. & Paul, D. Effects of Structural Order on Barrier Properties (ACS Publications, 1990).
Michaels, A. S. & Bixler, H. J. Solubility of gases in polyethylene. J. Polym. Sci. 50, 393–412 (1961).
Shen, K.-H., Brown, J. R. & Hall, L. M. Diffusion in lamellae, cylinders, and double gyroid block copolymer nanostructures. ACS Macro Lett. 7, 1092–1098 (2018).
Shen, K.-H. & Hall, L. M. Effects of ion size and dielectric constant on ion transport and transference number in polymer electrolytes. Macromolecules 53, 10086–10096 (2020).
Shen, K.-H. & Hall, L. M. Ion conductivity and correlations in model salt-doped polymers: Effects of interaction strength and concentration. Macromolecules 53, 3655–3668 (2020).
Gurnani, R. P. Debugging Neural Networks. https://nanohub.org/resources/netdebugger (2021).
Head, T. et al. scikit-optimize/scikit-optimize: v0.5.2 (2018).
PACE Partnership for an Advanced Computing Environment (PACE). (2017).
Acknowledgements
This work is financially supported by Toyota Research Institute through the Accelerated Materials Design and Discovery program and the Office of Naval Research through a multidisciplinary university research initiative (MURI) grant N00014-20-1-2586. This research is supported in part through research cyber-infrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology, Atlanta, Georgia, USA52. The authors thank XSEDE/ACCESS for computational support through Grant No. TG-DMR080058N.
Author information
Authors and Affiliations
Contributions
The work was conceived and guided by R.R. B.K.P. designed, trained, and evaluated the machine learning models. B.K.P. and KH.S. developed the simulation pipelines. B.K.P., KH.S., R.G., H.T., R.L., and R.R. discussed the results and commented on the manuscript.
Corresponding author
Ethics declarations
Competing interests
R.R. is a founder of Matmerize, Inc., a company specializing in materials informatics software and services. The other authors have no conflicts of interest to declare.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information: Multi-fidelity machine learning predictors for gas permeability through polymers: an example of experimental and simulation data fusion
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Phan, B.K., Shen, KH., Gurnani, R. et al. Gas permeability, diffusivity, and solubility in polymers: Simulation-experiment data fusion and multi-task machine learning. npj Comput Mater 10, 186 (2024). https://doi.org/10.1038/s41524-024-01373-9