Lipid Nanoparticle Database towards structure-function modeling and data-driven design for nucleic acid delivery

Collins, Evan; Ji, Jungyong; Kim, Sung-Gwang; Witten, Jacob; Kim, Seonghoon; Zhu, Richard; Park, Peter; Jung, Minjun; Park, Aron; Manan, Rajith S.; Rudra, Arnab; Keum, Gyochang; Bang, Eun-Kyoung; Jin, Jun-O; Jeang, William J.; Langer, Robert; Anderson, Daniel G.; Im, Wonpil

doi:10.1038/s41467-026-68818-1

Article
Open access
Published: 28 January 2026

Nature Communications volume 17, Article number: 2464 (2026)

Abstract

Lipid nanoparticles (LNPs) are the leading nonviral nucleic acid delivery technology, but LNP structure-function data remain fragmented and nonstandardized. Unlike protein engineering, which is anchored by the centralized Protein Data Bank, the LNP field lacks a unified repository for systematic analysis. To address this, we develop Lipid Nanoparticle Database (LNPDB) (https://lnpdb.molcube.com), an integrated database and web tool that consolidates structural and functional data for 19,528 LNPs. LNPDB standardizes LNP featurization by encoding lipid composition, experimental methods, and functional results, and generates CHARMM force field files for constituent lipids to enable molecular dynamics simulations. LNPDB also supports future data contributions for continued growth. We examine the utility of LNPDB through two applications: advancing our deep learning model for predicting LNP delivery performance, and simulating bilayer dynamics to identify structural features – bilayer stability and critical packing parameter – that correlate with LNP delivery performance. Altogether, LNPDB provides the digital framework for LNP modeling and data-driven rational design.

Introduction

Lipid nanoparticles (LNPs) have emerged as the leading nonviral nucleic acid delivery technology across a variety of applications, including genome editing and protein replacement therapies for genetic diseases, and vaccines for infectious diseases and cancer¹. In recent years, mRNA delivered via LNPs has been essential in combating serious infection and the spread of COVID-19². While LNP delivery systems have demonstrated therapeutic efficacy, the way in which LNP structural composition affects functional delivery of nucleic acids is incompletely understood. Greater understanding of the structure-function relationship of LNPs has the potential to facilitate the development of the next-generation of rationally-designed nanomedicines³.

LNPs for nucleic acid delivery commonly consist of four lipid components⁴. The primary component is the ionizable cationic lipid, which complexes with the negatively charged nucleic acid and facilitates endosomal escape⁵. The other components include the helper lipid, cholesterol, and polyethylene glycol (PEG) lipid⁴. Extensive in vitro and in vivo screening over decades has revealed that varying both the type and ratios of these four lipid components significantly affects LNP delivery performance^{5,6,7,8,9,10,11}. Yet, the resulting data from these screens have remained dispersed across studies without standardized formatting, limiting systematic analysis.

This challenge of fragmented data in the LNP field differs from the data infrastructure in protein engineering, where the recent success of deep learning models like AlphaFold^{12,13} was made possible by the Protein Data Bank (PDB), a centralized repository that compiles over 200,000 protein structures derived from decades of structural biology experiments. The PDB-to-AlphaFold paradigm underscores the foundational role of large, high-quality datasets in enabling deep learning breakthroughs in the biosciences¹⁴. However, in contrast to protein engineering, the lipid-based nanomedicine field lacks a unified repository for LNP structure-function data, presenting a barrier to machine learning and predictive modeling.

In recent years, there have been efforts to incorporate machine learning into the screening of mRNA LNPs. One prior study used classifier models trained on 584 LNPs with different ionizable lipids to predict delivery efficacy¹⁵. Another study developed a graph neural network model, AGILE, trained on 1200 LNPs with different ionizable lipids to predict efficacy¹⁶. Most recently, a group introduced a message-passing neural network architecture, LiON, trained on 8727 LNPs to engineer new best-in-class ionizable lipids¹⁷. All of these methods, including a recent effort to synchronize data across studies¹⁸, reflect important first steps in bringing machine learning to lipid nanomedicine; however, there are areas for improvement. First, the datasets used to train these models are limited in size and scope, and offer no way to incorporate future LNP data, restricting their long-term utility. Second, these approaches have focused primarily on ionizable lipid design, overlooking the established contributions of helper lipid^{9,11}, cholesterol¹⁰, and PEG lipid^{19,20} compositions and ratios to LNP performance. Third, these studies have relied on representing ionizable lipids as two-dimensional static graphs as input features for model learning, neglecting potentially important three-dimensional conformational and dynamic features. Current experimental techniques like small-angle X-ray scattering (SAXS)^{21,22} and cryogenic electron microscopy (cryo-EM)^{22,23} are low-throughput, low-resolution, and cost-prohibitive, which makes it challenging to obtain the three-dimensional structural data on lipids needed for modeling.

Towards addressing these limitations to advance data-driven rational design for nucleic acid delivery, here we develop Lipid Nanoparticle Database (LNPDB) (https://lnpdb.molcube.com). LNPDB is an integrated database and web tool that compiles structure-function data for 19,528 LNP formulations, representing 12,845 unique ionizable lipids across 42 publications (as of August 2025). LNPDB standardizes the featurization of LNPs by encoding their lipid composition, experimental methods, and functional results. LNPDB allows users to systematically search and filter the database. Future user contributions are also supported, enabling LNPDB to expand over time as new data are deposited. Additionally, LNPDB provides CHARMM²⁴ force field topology and parameter files for all constituent lipids, allowing all-atom molecular dynamics (MD) simulations to generate three-dimensional, time-resolved lipid data that can enhance predictive modeling. For rational LNP design, MD simulations offer a new modality to generate dynamic structural data for lipids not readily accessible with current experimental methods.

In this paper, we introduce the curated dataset currently available in LNPDB and outline the functionality of the accompanying web tool. We next examine two applications of LNPDB towards learning structure-function relationships of LNPs. First, we improve our deep learning model LiON for predicting LNP delivery performance. Second, we simulate bilayer dynamics for select LNP formulations and find that two structural features, bilayer stability and critical packing parameter (CPP) of the ionizable lipid, are associated with LNP delivery performance. Recent studies^{25,26,27} in LNP design have used MD to study mRNA LNP behavior, specifically to investigate pH-sensitive structural transitions; however, our study is the first to leverage features extracted from MD simulations to predict LNP performance, providing a physics-based, data-efficient alternative to traditional deep learning models that rely on two-dimensional static chemical structures. Altogether, this work develops LNPDB as a tool to advance LNP modeling and data-driven rational design for nucleic acid delivery.

Results

LNPDB is an interactive LNP structure-function data repository

The basis of LNPDB is structure-function data for 19,528 LNP formulations for nucleic acid delivery, representing 12,845 unique ionizable lipids across 42 publications^{6,7,11,15,16,17,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63}. Additionally, 269 commercially available ionizable lipids are provided (Fig. 1a, Supplementary Fig. 1, and Supplementary Table 1). LNPDB standardizes an encoding strategy for LNPs based on three general classes of features: composition, experimental, and simulation (Fig. 1b). Composition features include lipid types and ratios, along with the ionizable lipid-to-nucleic acid ratio. Lipid type is represented by name and SMILES (Simplified Molecular Input Line Entry System⁶⁴) strings, including parsed head-linker-tail substructures for ionizable lipids, as well as separate representations for their +1e protonated states. Experimental features include methods (e.g., mixing, delivery target, route of administration, batching, cargo type, readout technique) and functional results. Simulation features include CHARMM²⁴ force field topology and parameter files for lipids comprising each LNP formulation, supporting all-atom MD simulations.
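To make the three feature classes concrete, the following is a minimal sketch of how one LNP entry could be held in memory. The class name, every field name, and the example values are hypothetical and chosen for illustration; they are not the actual LNPDB schema or a real database entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LNPRecord:
    """Illustrative record; field names are hypothetical, not LNPDB's schema."""

    # Composition features: lipid identities, ratios, lipid-to-nucleic acid ratio
    ionizable_lipid_name: str
    ionizable_lipid_smiles: str
    helper_lipid: Optional[str]
    cholesterol: Optional[str]
    peg_lipid: Optional[str]
    molar_percent: Dict[str, float]
    il_to_na_mass_ratio: float
    # Experimental features: methods and functional result
    cargo: str
    delivery_target: str
    mixing: str
    readout: str
    result: float
    # Simulation features: paths to force field files (assumed to be file paths)
    force_field_files: List[str] = field(default_factory=list)

    def molar_percent_total(self) -> float:
        """Sum of component molar percentages; should be close to 100."""
        return sum(self.molar_percent.values())


example = LNPRecord(
    ionizable_lipid_name="example-lipid",
    ionizable_lipid_smiles="CCN(CC)CC",  # placeholder amine, not a real entry
    helper_lipid="DSPC",
    cholesterol="cholesterol",
    peg_lipid="DMG-PEG2000",
    molar_percent={"ionizable": 50.0, "helper": 10.0,
                   "cholesterol": 38.5, "peg": 1.5},
    il_to_na_mass_ratio=10.0,
    cargo="mRNA",
    delivery_target="in vitro",
    mixing="handmixed",
    readout="luminescence",
    result=0.0,
)
```

Grouping the fields by class keeps composition, experimental context, and simulation inputs separable, so a model can consume any subset of them.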

**Fig. 1: LNPDB is a data repository and web tool for compiling and uploading LNP structure-function data.**

We have developed a web tool for LNPDB available at https://lnpdb.molcube.com. The interactive website allows users to view and search the database (Fig. 1c). Users can search for specific LNPs by properties such as library source, atomic characteristics of ionizable lipids, types of helper lipid, cholesterol, and PEG lipid, molar ratio ranges, and experimental properties. Alternatively, users can search for LNPs by ionizable lipid structure or sub-structure, either by selecting from a list of head, linker, and tail groups, or by using a chemical structure sketch tool. The search functionality built into LNPDB allows researchers to systematically analyze the current LNP landscape and identify underexplored regions of chemical space for potential lipid innovation. Additionally, researchers can deposit their own LNP structure-function data using the standardized template provided on the website. This helps ensure that LNPDB grows over time as the LNP field evolves.
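The property-based search described above can be mimicked offline on an exported table. The sketch below is only an illustration of that kind of filtering; the function name, the dictionary keys, and the three toy records are assumptions and do not reflect the web tool's actual interface or data.

```python
def filter_lnps(records, helper_lipid=None, cargo=None, ionizable_mol_range=None):
    """Return the records that satisfy every criterion that is not None."""
    matches = []
    for rec in records:
        if helper_lipid is not None and rec["helper_lipid"] != helper_lipid:
            continue
        if cargo is not None and rec["cargo"] != cargo:
            continue
        if ionizable_mol_range is not None:
            low, high = ionizable_mol_range
            if not (low <= rec["ionizable_mol_percent"] <= high):
                continue
        matches.append(rec)
    return matches


# Toy records with hypothetical keys and values.
records = [
    {"id": "A", "helper_lipid": "DOPE", "cargo": "mRNA", "ionizable_mol_percent": 35.0},
    {"id": "B", "helper_lipid": "DSPC", "cargo": "mRNA", "ionizable_mol_percent": 50.0},
    {"id": "C", "helper_lipid": "DSPC", "cargo": "siRNA", "ionizable_mol_percent": 50.0},
]

hits = filter_lnps(records, helper_lipid="DSPC", cargo="mRNA",
                   ionizable_mol_range=(40.0, 60.0))
```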

To visualize the diversity of LNPs and ionizable lipids present in LNPDB, representative embeddings were created using UMAP (see “Methods”). The resulting LNP and ionizable lipid landscapes demonstrate clustering patterns that largely correspond to library (i.e., publication) source, suggesting that the 42 individual studies included in LNPDB tend to explore distinct, non-overlapping regions of lipid design space (Fig. 2a). This is reinforced by our finding that within-library LNP pairs exhibit significantly higher similarity than across-library pairs (Supplementary Fig. 2a), and a UMAP of LNP fingerprints from our deep learning model LiON (as discussed below) similarly yields library-specific clusters, albeit with less pronounced separation (Supplementary Fig. 2b).

**Fig. 2: LNPDB includes diverse LNP data from 19,528 formulations across 42 studies.**

Beyond global diversity patterns, LNPDB features diverse compositional and experimental features. As shown in Fig. 2b, the molecular weight of the 12,845 unique ionizable lipids ranges from 201.31 to 3984.45, with a mean of 864.24 ± 393.12. The number of nitrogens present in each ionizable lipid ranges from 1 to 17, with a mean of 2.54 ± 1.41. There are 5424 ionizable lipids with only tertiary nitrogen(s); 6437 with both tertiary and secondary nitrogens; 299 with tertiary, secondary, and primary nitrogens; 677 with only secondary nitrogen(s); and 8 with both secondary and primary nitrogens.

As for the other components besides the ionizable lipid, among the 19,528 LNP formulations, the ionizable lipid-to-nucleic acid mass ratio is most often set to 10, with a range from 0.86 to 44.58 and mean of 10.78 ± 4.45 (Fig. 2c). The distribution of helper lipid type consists of DOPE (47.7%), DSPC (28.3%), DOTAP (12.6%), none (2.9%), MDOA (2.9%), DDAB (1.8%), 14:0 PA (1.8%), and 18:0 PG (1.8%). The distribution of PEG lipid type consists of DMG-PEG2000 (57.1%), DMPE-PEG2000 (32.4%), unreported (9.2%), none (0.9%), DSG-PEG2000 (0.1%), ALC-0159 (0.1%), DMG-C-PEG2000 (0.1%), C8-Ceramide-PEG2000 (0.1%), C16-Ceramide-PEG2000 (0.1%), DSPE-PEG2000 (0.1%), DMG-PEG5000 (0.01%), DPG-PEG2000 (0.01%), DPG-PEG5000 (0.01%), DSG-PEG5000 (0.01%), DOG-PEG2000 (0.01%), and DOG-PEG5000 (0.01%).

For experimental features, the distribution of nucleic acid cargo consists of mRNA (74.7%), siRNA (19.2%), and pDNA (6.1%) (Fig. 2d and Supplementary Fig. 1b). With respect to the type of cargo encoded by the nucleic acid, the distribution consists of firefly luciferase (90.9%), DNA barcode (3.4%), peptide barcode (2.1%), human erythropoietin (1.4%), Factor VII (0.6%), green fluorescent protein (0.6%), and renilla luciferase (0.3%). The primary delivery target involves in vitro (78.4%), lung epithelium (9.6%), liver (4.6%), muscle (2.5%), spleen (1.3%), multiorgan (1.0%), heart (0.5%), lung (0.5%), and kidney (0.5%). For the preparation method, 93.8% of LNPs were handmixed; the remaining 6.2% were prepared via microfluidics. The readout methods report luminescence (75.2%), discretized luminescence (15.7%), protein abundance (3.9%), cellular uptake (2.5%), editing efficiency (0.7%), LRP6 knockdown (0.6%), diameter (0.5%), zeta potential (0.5%), and percent hemolysis (0.5%). Luminescence measurements quantify LNP delivery performance by reporting the level of nucleic acid transfection in target in vitro or in vivo systems. Additional summary statistics are shown in Supplementary Fig. 1.

LNPDB facilitates an improved deep learning model for predicting LNP delivery performance

In a prior study¹⁷, we introduced lipid optimization using neural networks (LiON), a deep learning model for learning ionizable lipid design towards predicting LNP delivery efficacy. LiON uses deep message-passing neural networks (D-MPNNs)⁶⁵ to learn representations of ionizable lipid structure, while additional formulation details such as component ratios and experimental context are appended as auxiliary features to guide prediction.

The dataset used to train this original version of LiON involved 8727 LNP formulations¹⁷. Building on this foundation, LNPDB expands the dataset more than twofold by incorporating an additional 10,801 LNP formulations, bringing the total to 19,528. Beyond scale, LNPDB captures a more diverse and descriptive set of features for each formulation. The newly added formulations broaden the diversity of ionizable lipids and also place greater emphasis on varying the types and ratios of the other three LNP components. Moreover, as detailed in the next subsection, unlike the original dataset for LiON, LNPDB includes MD–ready CHARMM force field files for all constituent lipids, introducing a new, physics-based modeling modality altogether for assessing LNP structure-function relationships.

Given that a more robust and diverse dataset can enhance model generalization, we first revisited our deep learning framework LiON to evaluate how training on LNPDB impacts predictive performance compared to the original dataset. To compare model performance, we trained LiON on both the original dataset and LNPDB using a 70–15–15% train-validation-test split, partitioned with respect to amine identity (see “Methods”). Similar to a prior study¹⁷, we evaluated model performance as measured by the correlation between predicted and experimental delivery values. Test datasets shared between the original data and LNPDB were evaluated. The results demonstrate that LiON achieves modestly improved predictive performance for 5 out of the 7 test datasets when trained on the larger LNPDB dataset (Fig. 3a and Supplementary Fig. 3a). Overlap between LiON-learned embeddings for the original and LNPDB-added data indicates shared structure-function patterns and densely covered feature space (Supplementary Fig. 4). Moreover, despite limitations of integrating data from multiple studies as discussed in Methods, LiON models trained across datasets achieved higher predictive performance than those trained on single datasets (Supplementary Fig. 5), suggesting that training across multiple studies in LNPDB enabled LiON to learn more generalizable structure-function relationships.
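A split "partitioned with respect to amine identity" can be read as a grouped split in which all formulations sharing an amine land in the same partition, so the test set contains only unseen amines. The sketch below illustrates that idea under this reading; the function, the greedy fill strategy, and the toy records are assumptions and not the splitting code used for LiON.

```python
import random
from collections import defaultdict


def grouped_split(records, key, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split records into train/validation/test so no group spans two splits."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    targets = [f * len(records) for f in fractions]
    splits = ([], [], [])
    idx = 0
    for gid in group_ids:
        # Move on once the current split has reached its target size;
        # the last split receives whatever remains.
        while idx < 2 and len(splits[idx]) >= targets[idx]:
            idx += 1
        splits[idx].extend(groups[gid])
    return splits


# Toy library: 20 hypothetical amines, each combined with 5 tails.
records = [{"amine": a, "tail": t} for a in range(20) for t in range(5)]
train, val, test = grouped_split(records, key=lambda r: r["amine"])
```

Because whole groups are assigned at once, the realized fractions only approximate 70–15–15 when group sizes are uneven.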

**Fig. 3: LNPDB facilitates an improved deep learning model for predicting LNP delivery performance.**

With our improved LiON model trained on LNPDB, we next sought to compare the predictive performance of our model with another published LNP deep learning model, AGILE¹⁶. We find that LiON trained on LNPDB achieves significantly better predictive performance compared to AGILE for the 4 held-out test sets evaluated across different delivery targets (Fig. 3b and Supplementary Fig. 3b). Model performance was assessed using five train/validation splits with a fixed held-out test set, with mean correlation coefficients and standard deviations reported across folds. LiON trained on LNPDB has 16-fold more training data compared to AGILE, potentially enabling it to learn a broader range of structure-function relationships and generalize better to unseen data. Altogether, LNPDB supports an improved deep learning model for predicting LNP delivery performance. Importantly, LNPDB establishes a framework of training data for the continued development of next-generation deep learning models for LNP design. Moreover, given that our results demonstrate significant variation in model accuracy across libraries, future research can leverage the training data of LNPDB to design alternative models that may be better suited for the specific LNP design strategy (e.g., helper lipid optimization) or delivery target (e.g., in vivo muscle) under investigation.

LNPDB facilitates MD simulations to uncover LNP structure-function relationships

By providing CHARMM force field topology and parameter files for each lipid in 19,528 LNP formulations, LNPDB supports all-atom MD simulations of the full dataset, as well as any new formulations constructed from its constituent lipids. LNPDB represents a substantial advancement over existing CHARMM-GUI resources, which were limited to ionizable lipids comprising only 6 different head group types and 5 different tail group types⁶⁶. Moreover, new lipids uploaded to LNPDB will be automatically parametrized for MD simulation as well.

MD offers a complementary alternative to machine learning models for understanding the structure-function relationships of LNPs, differing not only in the type of data produced but also in the way the data is generated. MD simulations yield three-dimensional, temporal structural information for all constituent lipids (not just the ionizable lipid), capturing both type and ratio. For small organic molecules such as those comprising LNPs, the conformational dynamics and interactions with neighboring molecules may play an outsized role in function, making MD especially informative. The importance of MD is amplified by the limited accessibility of three-dimensional structural data for lipids, as the experimental techniques SAXS^{21,22} and cryo-EM^{22,23} are low-throughput, low-resolution, and cost-prohibitive. Furthermore, unlike machine learning approaches, MD does not require training data, which is particularly valuable given that, as shown in Fig. 2a, most new ionizable lipids lie outside the distribution of previously characterized structures, potentially limiting machine learning generalization.

To demonstrate how LNPDB can be used to facilitate MD simulations, we simulated the bilayer equilibration process for a subset of LNP formulations and extracted structural features to assess correlation with experimental transfection (Fig. 4a). This use case represents just one of many potential simulation strategies enabled by the database (see “Discussion”). We used the CHARMM force field files of lipids in LNPDB to model representative bilayers for select LNP formulations (Supplementary Table 2). Each leaflet contained approximately 100 lipids. PEG lipids were excluded from our analyses, as they are typically shed prior to endosomal escape^{19,67,68}, the key bottleneck for effective delivery⁶⁹, and the physiological context that we aim to model here.

**Fig. 4: LNPDB facilitates MD simulations of LNP membrane dynamics, uncovering new structure-function relationships towards predicting delivery performance.**

For a given LNP formulation, two bilayer conditions were simulated: fully-neutral ionizable lipids and half-neutral, half-protonated ionizable lipids. These conditions represent, respectively, the neutral pH prior to cellular uptake and the early endosome environment (pH ~ 6.5) where roughly 50% of ionizable lipids would be protonated, assuming a pKa of 6.5⁷⁰. All-atom simulations (N = 134; 77 fully-neutral, 57 half-protonated) were run using OpenMM⁷¹ for 1.5 µs to allow for bilayer equilibration, which we observe occurring before 1 µs (Supplementary Fig. 6; see “Methods”). Any bilayer that did not remain intact—namely, through ionizable lipids escaping from the membrane—was terminated early. Additional details on simulation conditions are provided in Supplementary Table 2.
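The 50% figure for the half-protonated condition follows from the Henderson–Hasselbalch relation for a titratable amine, where the protonated fraction is 1 / (1 + 10^(pH − pKa)). The short sketch below evaluates that relation; the function names are illustrative, and treating every ionizable lipid as a single independent site with one pKa is a simplifying assumption.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a basic site that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def protonation_counts(n_lipids: int, ph: float, pka: float):
    """Split n_lipids into (protonated, neutral) counts for building a bilayer."""
    n_protonated = round(n_lipids * protonated_fraction(ph, pka))
    return n_protonated, n_lipids - n_protonated


# With pH equal to pKa, half of the ionizable lipids are protonated.
half = protonated_fraction(6.5, 6.5)
counts = protonation_counts(40, 6.5, 6.5)
```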

Snapshots of the final frame and density profiles of select LNP bilayer systems are shown in Fig. 4b. In all systems, protonated ionizable lipids are generally oriented with their head groups exposed at the membrane–water interface, whereas neutral ionizable lipids exhibited more variable behaviors—some remained at the surface (Fig. 4b center), while others were buried within the hydrophobic core (Fig. 4b left). We observed that certain simulated bilayers from the LM_2019 LNP library⁷ were unstable, with ionizable lipids dissociating from the membrane over the course of the simulation (Fig. 4b, right). Notably, we find that simulated bilayer stability is positively associated with experimental transfection, indicating in silico membrane behavior could be a useful screening criterion for the delivery potential of candidate LNPs (Fig. 4c).

Next, we aimed to analyze the simulated bilayers for additional structural features that may correlate with experimental delivery efficacy. One structural feature that we sought to quantify was CPP, which has been used to relate lipid shape to phase behavior and is hypothesized to influence endosomal escape efficiency^{22,25,26,69}. We computed CPP values averaged across ionizable lipids for each LNP bilayer according to two different approaches: one based on volume (CPP_V) and the other based on radii of gyration (CPP_Rg) (Fig. 4d; see “Methods”). CPP values < 1 indicate a cone shape (i.e., narrower at the hydrophobic tail region buried in the membrane than at the hydrophilic head group exposed to the aqueous interface). CPP values close to 1 indicate a cylindrical shape. CPP values > 1 indicate an inverted cone shape, which has been implicated in promoting an inverse hexagonal (H_II) phase that favors membrane fusion and endosomal escape^{22,69}. Our MD-derived CPP values show relative differences consistent with experimental measurements⁷² for two ionizable lipids (Supplementary Fig. 7), with neutral forms having higher CPP values than their protonated counterparts (Supplementary Figs. 7 and 8), a finding which aligns with prior SAXS experiments²⁶.
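For orientation, the classical packing parameter is CPP = v / (a₀ · l), with tail volume v, optimal head group area a₀, and tail length l. The sketch below uses that textbook form together with the shape convention stated above; it is not the CPP_V or CPP_Rg estimator from the Methods, and the tolerance around 1 and the input values are arbitrary assumptions.

```python
def packing_parameter(tail_volume: float, head_area: float, tail_length: float) -> float:
    """Classical CPP = v / (a0 * l); use consistent units (e.g. nm^3, nm^2, nm)."""
    if min(tail_volume, head_area, tail_length) <= 0:
        raise ValueError("all geometric inputs must be positive")
    return tail_volume / (head_area * tail_length)


def shape_class(cpp: float, tol: float = 0.05) -> str:
    """Map a CPP value to the shape convention used in the text.

    `tol` defines what counts as "close to 1" and is an illustrative choice.
    """
    if cpp < 1.0 - tol:
        return "cone"
    if cpp > 1.0 + tol:
        return "inverted cone"
    return "cylinder"


# Hypothetical geometries, not measured lipid values.
wide_tail = shape_class(packing_parameter(1.2, 0.5, 2.0))
wide_head = shape_class(packing_parameter(0.6, 0.6, 2.0))
```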

We next analyzed the subset of N = 34 different LNP formulations from the LM_2019 library⁷ that formed stable bilayers during the simulations to assess if their ionizable lipid CPP values correlate with experimental delivery performance. We find that ionizable lipid CPP significantly predicts LNP performance. For protonated ionizable lipids, the CPP_V approach based on volume yields a Pearson correlation of 0.530 with delivery performance. This association strengthened when stratified by amine group: amine 12 (r = 0.761), amine 2 (r = 0.798), amine 3 (r = 0.501) (Fig. 4e top). The alternative CPP_Rg approach based on radii of gyration of the protonated ionizable lipids also shows a robust (r = 0.546) correlation with performance, with similarly improved associations when analyzed by amine group: amine 12 (r = 0.745), amine 2 (r = 0.760), amine 3 (r = 0.420) (Fig. 4f top). Compared to protonated ionizable lipids, neutral ionizable lipids in fully-neutral systems demonstrate comparably strong correlations between CPP and delivery performance (Fig. 4e–f bottom). For both CPP approaches, we also find that the significant correlation with performance holds when analyzing neutral ionizable lipids in the half-protonated systems (Supplementary Fig. 9). Moreover, when we focused our analyses on the subset of LNPs with mean CPP values greater than 1 (corresponding theoretically to a transition to negative curvature), the correlative performance improved for both the CPP_V method (protonated: overall r = 0.723; neutral: overall r = 0.680) and CPP_Rg method (protonated: overall r = 0.621; neutral: overall r = 0.646) (Supplementary Fig. 10). Some correlations for amine 3 did not reach statistical significance, likely due to its limited representation of LNPs with CPP > 1. Importantly, overall, these MD-derived correlations with experimental delivery are greater than those of the LiON deep learning model for the LM_2019 fully held-out test set (r = 0.104) (Fig. 3b), underscoring the potential of MD as an alternative data-efficient modality for uncovering structure-function relationships. Moreover, we assessed whether the inclusion of PEG lipid or reducing the temperature to 298 K in simulations affected CPP and found no significant effect (Supplementary Fig. 11).
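The overall and per-amine correlations reported above are Pearson coefficients computed on the pooled data and within each amine group. The sketch below shows that calculation on made-up numbers; the helper names and the toy data are assumptions, and the toy values are chosen only to show how a baseline offset between groups can lower the pooled coefficient.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def stratified_pearson(rows):
    """rows: iterable of (group, cpp, delivery). Returns overall and per-group r."""
    by_group = defaultdict(lambda: ([], []))
    all_x, all_y = [], []
    for group, cpp, delivery in rows:
        by_group[group][0].append(cpp)
        by_group[group][1].append(delivery)
        all_x.append(cpp)
        all_y.append(delivery)
    result = {"overall": pearson_r(all_x, all_y)}
    for group, (xs, ys) in by_group.items():
        result[group] = pearson_r(xs, ys)
    return result


# Toy data: a perfect trend within each group, offset between groups.
rows = [("amine A", 1.0, 1.0), ("amine A", 2.0, 2.0), ("amine A", 3.0, 3.0),
        ("amine B", 1.0, 5.0), ("amine B", 2.0, 6.0), ("amine B", 3.0, 7.0)]
correlations = stratified_pearson(rows)
```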

Next, we measured additional structural features of the simulated bilayers: membrane thickness, torque density, and compressibility (see “Methods”). Torque density values of the fully-neutral systems are positively associated with delivery performance (r = 0.461); however, high variance in this metric limits the strength of this conclusion. The remaining features for the simulated LNP formulations did not have any significant correlations with performance (Supplementary Fig. 12). Interestingly, CPP_V variance also significantly predicts delivery performance, suggesting that greater ionizable lipid polymorphism could allow for more effective delivery, potentially due to increased capacity to accommodate more inverse-conical lipid geometries (Supplementary Fig. 13).

Discussion

Despite decades of research and widespread use of LNPs for nucleic acid delivery, no centralized repository exists for compiling LNP structure-function data. Here, we introduce LNPDB, the first large-scale, integrated dataset and web tool for storing, analyzing, and uploading LNP structure-function data.

As of August 2025, LNPDB contains a diverse collection of 19,528 LNP formulations spanning 42 studies and one commercial source, with features capturing lipid chemistry, formulation parameters, experimental conditions, and functional readouts. The web interface allows users to search and filter LNPs by key properties. The database also provides CHARMM force field topology and parameter files for all constituent lipids, facilitating MD simulations on any LNP formulation or custom lipid combination.

We demonstrate that LNPDB enhances predictive modeling through two distinct yet complementary approaches: machine learning and MD. First, when used to retrain our deep learning model LiON¹⁷, LNPDB more than doubles the training set size and improves predictive performance across test datasets compared to the original LiON model and the AGILE model¹⁶. The robust and growing foundation of training data provided by LNPDB can support the future development of even more predictive and generalizable deep learning models for LNP design. Second, we leverage the CHARMM force field files provided in LNPDB to perform MD simulations, uncovering two structural features (bilayer stability and CPP) that correlate with LNP delivery performance for the selected dataset. Notably, CPP values derived from MD show stronger associations with performance than the LiON model’s predictions on the same held-out dataset, suggesting that MD provides an orthogonal, data-efficient modality for structure-function discovery.

MD offers unique advantages over deep learning for assessing LNP structure-function relationships. Unlike current machine learning approaches that focus primarily on two-dimensional ionizable lipid structure, MD can inherently account for all four LNP components, as well as their molar ratios, capturing the full multicomponent nature of the system. MD simulations generate three-dimensional, time-resolved structural data, providing insights that are inaccessible through static graph-based models and difficult to obtain experimentally via SAXS and cryo-EM. This capability is especially valuable given that the lack of structural definition of nanomedicine remains a major barrier to both therapeutic efficacy and regulatory approval³. Furthermore, MD is data-efficient: it does not require large training datasets, making it beneficial for evaluating novel chemistries, as well as underrepresented formulation spaces that preexisting datasets do not effectively capture. This is especially relevant for LNPs, where the large combinatorial space of lipid types and ratios results in sparse data that can pose a major challenge for machine learning model generalization. For example, when screening novel or underrepresented helper lipids, ML models trained on LNPDB may struggle to generalize because the dataset is disproportionately comprised of DOPE, DSPC, and DOTAP (Fig. 2c), reflecting the field’s longstanding reliance on these lipids. In such cases, MD can provide complementary value by directly modeling the physical interactions of these underrepresented helper lipids, offering mechanistic insights that are not dependent on prior training data. Moreover, MD can be well-suited for small molecules like lipids, where dynamic shape, orientation, and local interactions can have outsized effects on function. In future applications, MD and machine learning may complement one another, with MD simulations contributing dynamic structural data as input features for deep learning models.

A key limitation of current LNP data—including those compiled in LNPDB—is the difficulty of comparing across studies due to variability in experimental methods (e.g., dose, cell type, animal model, nucleic acid purity, imaging equipment, injection technique, etc.). By establishing LNPDB as a centralized repository, we aim to encourage researchers to incorporate standardized LNP controls in future in vitro and in vivo screens to enable more effective cross-study comparisons. These standardized LNP controls could be Spikevax² (50% SM-102, 10% DSPC, 38.5% cholesterol, 1.5% DMG-PEG2000) or Onpattro⁷³ (50% DLin-MC3-DMA, 10% DSPC, 38.5% cholesterol, 1.5% DMG-PEG2000), FDA-approved LNPs for COVID-19 and transthyretin-induced amyloidosis, respectively.

Future research should explore experimental validation of MD-derived features such as CPP. The MD bilayer models presented in this work provide a simplified yet informative framework that yields significant correlations with delivery performance for the evaluated dataset. However, future research should leverage the topology and parameter files provided in LNPDB to expand simulation efforts to include additional delivery-relevant phenomena, such as membrane fusion dynamics, interactions with nucleic acids, and dynamic pH sensitivity during endosomal escape. To support MD simulations with larger system sizes, longer time scales, and the inclusion of nucleic acids, we plan to incorporate Martini 3 coarse-grained lipid and nucleic acid parameters⁷⁴ in future versions of LNPDB. This will facilitate efficient simulations for many more LNPs to further explore structure-function relationships. Moreover, although this version of LNPDB includes some LNPs with five lipid components, future versions will further incorporate LNPs with more than four components (e.g., additional lipids⁸ or lipids conjugated to targeting ligand⁷⁵).

Large, multi-modal datasets will be essential for advancing computational screening approaches in biomolecular design. In protein science, the PDB has provided the foundation for machine learning advances, including the development of AlphaFold¹²,¹³. In a similar way, LNPDB aims to standardize and centralize structure-function data for LNPs, enabling both machine learning and MD modeling and simulation. Altogether, LNPDB is a tool to advance LNP modeling and data-driven design towards more effective nonviral nucleic acid delivery vehicles.

Methods

Data collection

To collect the data for the 19,528 LNPs featured in this initial version of LNPDB, we followed the same method as introduced in our prior study¹⁷. In summary, publications were selected from the literature based on the presence of large screening datasets, primarily focused on ionizable lipids, to allow for meaningful within-dataset comparisons. Additional publications were selected to broaden the representation of helper lipids, cholesterols, and PEG lipids. SMILES were created for each lipid for each publication. Functional data (most commonly delivery performance) were extracted from published heatmaps and bar plots by digitizing the figures and interpolating values based on either the color scale (heatmaps) or bar height (bar plots) as defined in the accompanying legends. Because delivery values are often reported on different scales across studies and modalities, for each publication and for each delivery context (e.g., in vivo or in vitro within the same publication), functional delivery data were standardized to have a mean of 0 and a standard deviation of 1. When raw luminescence values spanned several orders of magnitude, they were log-transformed prior to standardization. Predictive performance results were separated by publication, as datapoints were treated as directly comparable within individual screens, but not necessarily across different screens or assay modalities. Moreover, standardization was applied to prevent overemphasis of any single dataset. The deep learning models used in this study do train across datasets to maximize the diversity of trainable data, though we recognize that the ability of data from one study to inform structure-function relationships in another is limited by inherent differences in experimental protocols, measurement modalities, and assay sensitivities across studies that may introduce systematic biases. LNPDB introduces experimental condition variables (e.g., solvents, dose) towards bridging studies, but these additions can only partially mitigate the systematic differences across laboratories and experimental setups (see “Discussion”).
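The per-publication, per-context standardization described above can be illustrated with a minimal sketch. The record keys (`publication`, `context`, `value`), the function name, and the `log_ratio_cutoff` used to decide when values span "several orders of magnitude" are assumptions made for illustration only; the source does not state a numerical cutoff, and this is not the code used to build LNPDB.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_group(records, log_ratio_cutoff=1000.0):
    """Z-score readouts within each (publication, context) group.

    log_ratio_cutoff is a hypothetical max/min ratio above which strictly
    positive values are log10-transformed before standardization.
    """
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[(rec["publication"], rec["context"])].append(i)

    z_scores = [0.0] * len(records)
    for indices in groups.values():
        values = [records[i]["value"] for i in indices]
        # Log-transform only positive data spanning a wide dynamic range.
        if min(values) > 0 and max(values) / min(values) >= log_ratio_cutoff:
            values = [math.log10(v) for v in values]
        mu = mean(values)
        sigma = pstdev(values)  # population SD, so each group has SD exactly 1
        for i, v in zip(indices, values):
            z_scores[i] = (v - mu) / sigma if sigma > 0 else 0.0
    return z_scores


records = [
    {"publication": "pub_A", "context": "in_vitro", "value": 1e2},
    {"publication": "pub_A", "context": "in_vitro", "value": 1e4},
    {"publication": "pub_A", "context": "in_vitro", "value": 1e6},
    {"publication": "pub_A", "context": "in_vivo", "value": 1.0},
    {"publication": "pub_A", "context": "in_vivo", "value": 2.0},
    {"publication": "pub_A", "context": "in_vivo", "value": 3.0},
]
z = standardize_by_group(records)
```

In this toy input the two delivery contexts of the same publication are standardized separately, so a luminescence screen and a low-dynamic-range readout end up on the same unitless scale.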

A total of 269 ionizable lipids from the commercial supplier BroadPharm were also included in LNPDB, representing the full set available on the vendor’s website as of June 1, 2024.

Featurization

LNPDB standardizes an encoding strategy for LNPs based on three general classes of features: composition, experimental (including performance readouts), and simulation. Additional organizational descriptors include LNP ID, experiment ID, formulation ID, publication link, and publication PubMed ID.

Composition features include ionizable lipid name, ionizable lipid SMILES, ionizable lipid amine name, ionizable lipid amine SMILES, ionizable lipid linker name, ionizable lipid linker SMILES, ionizable lipid tail 1 name, ionizable lipid tail 1 SMILES, ionizable lipid tail 2 name, ionizable lipid tail 2 SMILES, ionizable lipid tail 3 name, ionizable lipid tail 3 SMILES, ionizable lipid tail 4 name, ionizable lipid tail 4 SMILES, ionizable lipid molar ratio, ionizable lipid-to-nucleic acid mass ratio, helper lipid name, helper lipid SMILES, helper lipid molar ratio, cholesterol name, cholesterol SMILES, cholesterol molar ratio, PEG lipid name, PEG lipid SMILES, PEG lipid molar ratio, fifth component lipid name, fifth component lipid SMILES, fifth component molar ratio, aqueous buffer, and dialysis buffer.

Experimental features include mixing preparation method, model (i.e., in vitro or in vivo), model system (e.g., HeLa), model target (e.g., lung), route of administration, cargo, cargo type (i.e., encoded protein), nucleic acid dose, readout method, batching approach, and readout value (i.e., most commonly a measure of delivery performance).

For simulation features, CHARMM topology and parameter files for all lipid components were generated (see “All-atom MD simulations”). Ionizable lipids were modeled in both neutral and +1e protonated states. A majority of LNPs (13,097) in LNPDB include ionizable lipids that have more than one nitrogen, often with several plausible protonation sites. For simplicity, LNPDB assigns one representative +1e protonated state per ionizable lipid. To select the nitrogen for protonation for each ionizable lipid, the following rule-based decision tree was applied. If the lipid contained only a single nitrogen, that nitrogen was protonated. If multiple nitrogens were present, the nitrogen with the highest priority was protonated based on the following hierarchy: tertiary amine, secondary amine, primary amine, imidazole, pyridine, tertiary aromatic amine, secondary aromatic amine, and primary aromatic amine. Groups comprising amide or sulfonamide structures and quaternary nitrogens were excluded. If multiple candidates of the same class were found, the nitrogen closest to the molecular periphery of the ionizable lipid head, defined as having the greatest graph eccentricity (i.e., the longest existing graph distance to a terminal atom), was selected. For specific cases within the KZ_2016 dataset involving ionizable lipids with tail amines, the most centrally located candidate (i.e., the lowest average squared distance to all other atoms) was chosen. Once selected, the nitrogen was protonated by assigning a +1 formal charge and adjusting the SMILES accordingly. Future research is warranted to explore more accurate, dynamic protonation conditions⁷⁶.
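The decision tree above can be sketched in a few lines. This sketch assumes that each nitrogen has already been classified and annotated with its graph eccentricity (for example, by a cheminformatics toolkit); the dictionary keys `idx`, `cls`, and `eccentricity`, and the function name, are hypothetical. The KZ_2016 special case is deliberately omitted.

```python
# Priority order taken from the text; earlier entries are preferred.
PRIORITY = [
    "tertiary amine",
    "secondary amine",
    "primary amine",
    "imidazole",
    "pyridine",
    "tertiary aromatic amine",
    "secondary aromatic amine",
    "primary aromatic amine",
]


def select_protonation_site(nitrogens):
    """Return the atom index of the nitrogen to protonate, or None.

    nitrogens: list of dicts with hypothetical keys
      'idx' (atom index), 'cls' (nitrogen class), 'eccentricity' (int).
    """
    if len(nitrogens) == 1:
        return nitrogens[0]["idx"]  # a single nitrogen is always chosen
    # Amide, sulfonamide, and quaternary nitrogens are not in PRIORITY,
    # so this filter excludes them.
    candidates = [n for n in nitrogens if n["cls"] in PRIORITY]
    if not candidates:
        return None
    best_rank = min(PRIORITY.index(n["cls"]) for n in candidates)
    top = [n for n in candidates if PRIORITY.index(n["cls"]) == best_rank]
    # Tie-break within a class: greatest graph eccentricity.
    return max(top, key=lambda n: n["eccentricity"])["idx"]


example = [
    {"idx": 3, "cls": "amide", "eccentricity": 20},
    {"idx": 7, "cls": "tertiary amine", "eccentricity": 12},
    {"idx": 11, "cls": "tertiary amine", "eccentricity": 15},
    {"idx": 2, "cls": "primary amine", "eccentricity": 30},
]
site = select_protonation_site(example)
```

Here the amide is excluded, the two tertiary amines outrank the primary amine, and the tertiary amine with the larger eccentricity is selected.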

All-atom MD simulations

The CHARMM topology and parameter files for all lipid components were generated as follows: ionizable lipids were modeled in both neutral and +1e protonated states, with parameters assigned manually via CHARMM force field analogy mapping and supplemented by the CGenFF workflow⁷⁷. We used the standard CHARMM force field definitions⁷⁸ for helper lipids, cholesterol, and PEG lipids.

Similar to the CHARMM-GUI Membrane Builder that supports diverse lipid types⁶⁶,⁷⁹, LNPDB uses the latest version of the CHARMM C36 additive force field parameters. The CHARMM force field⁷⁷ is designed to allow a modular, building-block approach to create force fields for molecules composed of components (blocks) similar to the ones already parametrized. Many topologies and parameters of lipids and carbohydrates in the latest version of the CHARMM C36 force field were generated using this building-block approach, and the generated force fields were further validated by comparing simulations with experimental data. Accordingly, we used a similar building-block approach for the ionizable lipids in LNPDB. We have not yet seen abnormal all-atom simulation behavior (e.g., lipid flip-flop); however, further force field optimization is recommended for specific ionizable lipids if abnormal behavior is observed.

All bilayer systems were assembled using the standalone MolCube Membrane Builder—a commercial software application analogous to the CHARMM-GUI Membrane Builder tool⁸⁰—in membrane-only mode using approximately 100 lipids per leaflet and solvated with TIP3P water⁸¹ and 0.15 M NaCl.

Following the six-step equilibration procedure outlined in the CHARMM-GUI Membrane Builder protocol⁸⁰,⁸², NVT (constant particle number, volume, and temperature) simulations were conducted at 310 K (i.e., temperature of cells treated with LNPs) with strong harmonic positional restraints on lipid heavy atoms and dihedral restraints on ionizable head groups. The restraint force constants were reduced stepwise to zero over the six equilibration steps to allow the membrane to relax. Subsequently, unrestrained NPT (constant particle number, pressure, and temperature) production runs were conducted at 310 K and 1 bar for 1.5 µs using OpenMM with a 4 fs time-step enabled by hydrogen-mass repartitioning (HMR)⁸³,⁸⁴. Temperature was maintained via a Langevin thermostat (collision frequency 1 ps⁻¹) and pressure via a semi-isotropic Monte Carlo barostat (coupling interval 0.4 ps)⁸⁵,⁸⁶. Bonds involving hydrogen were constrained with SHAKE⁸⁷; van der Waals interactions were force-switched off between 10 and 12 Å⁸⁸, and long-range electrostatics were treated by the Particle-Mesh Ewald method with a 12 Å real-space cutoff⁸⁹. For ten bilayer systems, we additionally assessed whether the inclusion of PEG lipid or setting the temperature to 298 K (i.e., temperature of LNP synthesis) in simulations affected CPP and found no significant effect (Supplementary Fig. 11). Information for each simulation system is summarized in Supplementary Table 2. Note that the protonated systems have larger system sizes along the z-axis than their neutral counterparts because the protonated systems require more bulk region to fully solvate the system.

We ran all MD simulations on 48 NVIDIA RTX A5000 GPUs in parallel. Across the neutral systems, the average throughput was 363.86 ± 14.93 ns/day; across the protonated systems, it was 319.11 ± 11.65 ns/day. For the LM_2019 bilayers with a run duration of 1.5 μs, this corresponds to 4.1 days per neutral system and 4.7 days per protonated system. The full LM_2019 simulation batch, run on 48 GPUs (one per simulation), completed in about 10 days. Subsequent CPP calculations for all systems, executed on 384 CPU cores in parallel, finished in 5 h, with comparable compute times for CPP_V and CPP_Rg.
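As a quick check of the wall-clock figures above, the per-system run time follows directly from the trajectory length divided by the mean throughput. The function name below is illustrative.

```python
def days_per_run(run_length_ns, throughput_ns_per_day):
    """Wall-clock days needed for one trajectory at a given throughput."""
    return run_length_ns / throughput_ns_per_day


RUN_LENGTH_NS = 1500.0  # 1.5 microseconds per production run

neutral_days = days_per_run(RUN_LENGTH_NS, 363.86)
protonated_days = days_per_run(RUN_LENGTH_NS, 319.11)

print(f"neutral: {neutral_days:.1f} days, protonated: {protonated_days:.1f} days")
```

Using the mean throughputs reported above, this reproduces the stated values of about 4.1 and 4.7 days per neutral and protonated system, respectively.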

Web tool

LNPDB has been developed as a RESTful application built on Django REST Framework and React, with a PostgreSQL backend, together with RDKit for native chemical structure storage and search. Lipid molecules are represented as molecular graphs and indexed via RDKit fingerprints, enabling fast SQL-level structure and substructure queries. User contributions are handled through a CSV upload portal requiring citation metadata. Each submission triggers curator review; data are validated, standardized, and then ingested into PostgreSQL. Approved entries become searchable and visible in all table views and interactive plots via the same API, ensuring seamless integration of new LNP formulations.

UMAP visualization

UMAP visualizations were created of the high-dimensional embedding landscapes of LNPs and ionizable lipids (Fig. 2a). The embedding landscape for ionizable lipids is represented by the top ten principal components (PCs) of Morgan fingerprints⁹⁰ (1024 bits, radius of 3) and the top ten PCs of Mordred descriptors⁹¹. The embedding landscape for LNPs is represented by the same axes as those for ionizable lipids, plus additional dimensions for molar ratios and the top five Morgan fingerprint PCs and top five Mordred descriptor PCs for helper lipids.

UMAP visualizations were also created of the 300-dimensional embedding landscape (i.e., fingerprints) from the LiON model of LNP formulations (Supplementary Figs. 2b and 4a). Fingerprints were extracted from the penultimate linear layer of the LiON model’s feedforward neural network trained on LNPDB.

Deep learning models

As shown in Fig. 3a, we evaluated the predictive performance of LiON (lipid optimization using neural networks)¹⁷, which is based on the message-passing neural network architecture of chemprop⁶⁵. To compare models trained on LNPDB versus the original dataset from our prior study, we computed Spearman correlation values between predicted and experimental delivery outcomes. This evaluation was performed on held-out test sets using a 70–15–15% train-validation-test split by amine, consistent with the approach used in our prior study¹⁷. Datasets shared between LNPDB and the original dataset are compared.
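A split "by amine" means that all LNPs sharing an ionizable lipid amine are assigned to the same partition, so that test-set amines are never seen during training. The sketch below illustrates one simple way to do this; the function name, the seed, and the choice to apportion by number of amine groups rather than number of LNPs are assumptions, not details confirmed by the source.

```python
import random


def split_by_group(groups, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole groups (e.g., amines) to train/val/test partitions.

    groups: one group label per sample. Returns one partition label per
    sample. Fractions apply to the number of unique groups (an assumption).
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))

    assignment = {}
    for k, g in enumerate(unique):
        if k < n_train:
            assignment[g] = "train"
        elif k < n_train + n_val:
            assignment[g] = "val"
        else:
            assignment[g] = "test"
    return [assignment[g] for g in groups]


# Toy data: 200 LNPs built from 20 hypothetical amines, 10 LNPs each.
amines = [f"A{i % 20}" for i in range(200)]
labels = split_by_group(amines)
```

Because partitions are assigned per amine, the realized sample-level proportions match 70–15–15 only when amines contribute similar numbers of LNPs, as in this toy example.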

As shown in Fig. 3b, we evaluated the predictive performance of LiON trained on LNPDB compared to another deep learning model, AGILE¹⁶. To perform this analysis, each of the four datasets was fully held out as an external test set. For each held-out set, models were trained using five cross-validation folds with 80%-20% train-test splits on the remaining data. Spearman correlation values between predicted and experimental delivery performance for each fully held-out dataset were computed. For LiON trained on LNPDB, holding out an entire dataset reduced the training sample size. In contrast, AGILE maintained its original training size of 1200 LNPs, as the held-out datasets did not overlap with its training data. To run the AGILE model, the GitHub repository provided in the study was referenced, and the HeLa transfection data was used for training¹⁶.
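For reference, the Spearman correlation used as the evaluation metric is the Pearson correlation of the ranks of the predicted and experimental values. The following standard-library sketch, with average ranks for ties, is illustrative; an established statistics package would normally be used in practice.

```python
from math import sqrt
from statistics import mean


def rankdata(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(predicted, experimental):
    return pearson(rankdata(predicted), rankdata(experimental))


rho = spearman([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
```

Because only ranks enter the calculation, the metric is insensitive to the per-dataset scaling of delivery readouts, which suits comparisons on standardized data.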

LNP formulations selected for MD simulations

To assess whether MD simulations could provide meaningful correlations with experimental delivery performance (Fig. 4b–e), we selected N = 54 LNP formulations from a prior study (LM_2019) in LNPDB, which introduced an isocyanide-mediated three-component reaction approach for ionizable lipids⁷. For modeling, we randomly selected the subset of LM_2019 LNPs that contain ionizable lipids with amines A12, A2, or A3; isocyanides Iso5 or Iso9; and any alkyl ketone⁷. PEG lipids were excluded from simulations, as they are typically shed prior to endosomal escape⁶⁸, the key bottleneck for effective delivery⁶⁹, and the physiological context that we aimed to model. This subset, drawn from a single combinatorial ionizable lipid library, was chosen as a representative example of systematic lipid library design commonly employed in the field, while keeping the scope feasible within computational limits.

Additional simulations were conducted for illustrative purposes (Figs. 1b and 4a, b) that contain PEG lipid or the common control ionizable lipids of DLin-MC3-DMA, SM-102, and ALC-0315. Details of all bilayer simulations analyzed in this study are provided in Supplementary Table 2.

Density profiles

To quantify the spatial distribution of lipid components along the membrane normal (z-axis) as shown in Fig. 4b, density profiles were computed from the final 500 ns of each 1.5 μs MD trajectory. At each frame, atomic coordinates were re-centered to have the membrane center of mass be at z = 0. For each lipid molecule, a single representative atom was used to track z-position over time: the hydroxyl oxygen for cholesterol, the phosphorus atom for helper lipids, and the nitrogen atom on the ionizable head group for ionizable lipids. These atom positions were binned along the z-axis to generate one-dimensional density histograms for each lipid type. The profiles were averaged across frames and normalized by bin width to obtain continuous density distributions, reflecting the vertical organization of each component within the bilayer.
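The recentering, binning, and normalization steps can be sketched as follows. The function names, the bin range, and the input layout (one list of representative-atom z-coordinates per frame, in Å) are assumptions for illustration; trajectory reading is omitted.

```python
import math


def recenter(z_coords, masses):
    """Shift z-coordinates so the mass-weighted center is at z = 0."""
    com = sum(m * z for m, z in zip(masses, z_coords)) / sum(masses)
    return [z - com for z in z_coords]


def density_profile(frames_z, z_min, z_max, n_bins):
    """Frame-averaged number density (atoms per Å) along z.

    frames_z: list of frames; each frame is a list of recentered z-positions
    of the representative atoms of one lipid type.
    """
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z_positions in frames_z:
        for z in z_positions:
            b = math.floor((z - z_min) / width)
            if 0 <= b < n_bins:
                counts[b] += 1
    centers = [z_min + (b + 0.5) * width for b in range(n_bins)]
    # Average over frames and normalize by bin width.
    density = [c / (len(frames_z) * width) for c in counts]
    return centers, density


# Toy example: two frames, one representative atom per leaflet.
frames = [[-20.0, 20.0], [-19.0, 21.0]]
centers, density = density_profile(frames, z_min=-30.0, z_max=30.0, n_bins=6)
```

With this normalization, the density integrates over z to the number of tracked atoms per frame (two in the toy example).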

Computing CPP using volume (CPP_V)

To quantify ionizable lipid shape, we computed CPP values. For a given stable LNP bilayer simulation, we analyzed each timestep among the final 500 ns of the 1.5 μs trajectory. For each ionizable lipid, atom subsets corresponding to the head group and tail were defined. For the LM_2019⁷ LNP bilayers, atoms in the amine and isocyanide groups (both found to be generally positioned at the membrane–water interface) were assigned to the head group; all other atoms (i.e., alkyl ketones) were assigned to the tail group. In line with its conventional formula²⁵,⁹², we calculated CPP based on volume as follows:

$${{\mathrm{CPP}}}_{V}=\frac{V}{{a}_{0}{l}_{c}}$$

(1)

where $V$ is the volume formed by the ionizable lipid, ${a}_{0}$ is the surface area of the head group at the water–membrane interface, and ${l}_{c}$ is the average distance between the head group and tail ends (Fig. 4d). The head group area ${a}_{0}$ was computed as the cross-sectional area of the circle formed in the membrane plane.

$${a}_{0}=\pi {r}_{{\mathrm{head}}}^{2}$$

(2)

where ${r}_{{\mathrm{head}}}$ is the head group radius, calculated as half the maximum pairwise distance between head atoms in the membrane plane. The tail radius ${r}_{{\mathrm{tail}}}$ was computed similarly using terminal tail atoms. The lipid volume $V$ was estimated by modeling the molecule as a truncated cone.

$$V=\frac{1}{3}\pi {l}_{c}\left({r}_{{\mathrm{head}}}^{2}+{r}_{{\mathrm{head}}}{r}_{{\mathrm{tail}}}+{r}_{{\mathrm{tail}}}^{2}\right)$$

(3)

These geometrical parameters were computed frame-by-frame for each ionizable lipid molecule across the trajectory to calculate mean CPP_V values. Standard error of the mean (SEM) was also calculated. We observe that CPP_V values exhibit greater variability across lipids and time steps compared to CPP_Rg. To focus on uncertainty in the central tendency of CPP_V across lipid molecules, SEM was used in place of standard deviation for CPP_V plots.

Lipids with CPP_V > 1 exhibit an inverse cone shape, favoring negative curvature, whereas those with CPP_V < 1 exhibit a cone shape, favoring positive curvature.
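Equations (1)–(3) can be evaluated for a single lipid in a single frame as in the sketch below. The function names and the input layout (in-plane coordinates of head atoms and terminal tail atoms, plus the head-to-tail length) are assumptions for illustration. Substituting Eqs. (2) and (3) into Eq. (1) shows that $l_c$ cancels, so that CPP_V depends only on the ratio $\rho = r_{\mathrm{tail}}/r_{\mathrm{head}}$, namely $(1+\rho+\rho^{2})/3$.

```python
import math
from itertools import combinations


def max_pairwise_radius(xy_points):
    """Half the maximum pairwise distance between in-plane points."""
    if len(xy_points) < 2:
        return 0.0
    return 0.5 * max(math.dist(p, q) for p, q in combinations(xy_points, 2))


def cpp_v(head_xy, tail_xy, l_c):
    """CPP_V = V / (a0 * l_c) with a truncated-cone volume, Eqs. (1)-(3)."""
    r_head = max_pairwise_radius(head_xy)
    r_tail = max_pairwise_radius(tail_xy)
    a0 = math.pi * r_head ** 2
    volume = math.pi * l_c * (r_head ** 2 + r_head * r_tail + r_tail ** 2) / 3.0
    return volume / (a0 * l_c)


# Cylinder-like lipid: equal head and tail radii.
cylinder = cpp_v([(0.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)], l_c=15.0)
# Inverse-cone lipid: tail radius twice the head radius.
inverse_cone = cpp_v([(0.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)], l_c=15.0)
```

In the full analysis these per-frame, per-lipid values would be averaged over the trajectory window to give the mean CPP_V and its SEM.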

Computing CPP using radii of gyration (CPP_Rg)

We also quantified CPP for ionizable lipids using an approach based on radii of gyration, CPP_Rg. We similarly analyzed the final 500 ns of each 1.5 μs trajectory of stable bilayers. For each ionizable lipid, atom subsets corresponding to the head group and tail were defined, and their centers of mass, ${R}_{{\mathrm{COM}}}^{{\mathrm{head}}}$ and ${R}_{{\mathrm{COM}}}^{{\mathrm{tail}}}$, were computed at every frame. We then translated each lipid to the origin by subtracting the midpoint

$${R}_{{\mathrm{COM}}}^{{\mathrm{mid}}}=\frac{1}{2}\left({R}_{{\mathrm{COM}}}^{{\mathrm{head}}}\,+\,{R}_{{\mathrm{COM}}}^{{\mathrm{tail}}}\right)$$

(4)

from all atomic coordinates. The orientation vector was computed as

$$v=\,{R}_{{\mathrm{COM}}}^{{\mathrm{head}}}-\,{R}_{{\mathrm{COM}}}^{{\mathrm{tail}}}$$

(5)

and aligned to the membrane normal $\hat{z}=[0,\,0,\,1]$ by rotating the coordinate set through the angle between $v$ and $\hat{z}$. With all lipids consistently oriented, the radius of gyration in the ${xy}$ plane was computed as

$${R}_{g}\,=\,\sqrt{\frac{1}{M}{\sum }_{i}\,{m}_{i}\left({x}_{i}^{2}+{y}_{i}^{2}\right)}$$

(6)

where ${m}_{i}$ and $({x}_{i},{y}_{i})$ are the mass and in-plane coordinates of atom $i$, and $M=\,{\sum }_{i}{m}_{i}$ is the total mass. We recorded the average ${R}_{g}$ values for tail and head atoms across all frames, denoted ${R}_{g}^{{\mathrm{tail}}}$ and ${R}_{g}^{{\mathrm{head}}}$, and computed CPP_Rg as

$${{\mathrm{CPP}}}_{{Rg}}=\frac{{R}_{g}^{{\mathrm{tail}}}}{{R}_{g}^{{\mathrm{head}}}}$$

(7)

Lipids with CPP_Rg > 1 exhibit an inverse cone shape favoring negative curvature, whereas those with CPP_Rg < 1 exhibit a cone shape favoring positive curvature. Standard deviation was directly computed from the distribution of CPP_Rg values over the analysis window (single-block averaging).
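A single-frame version of Eqs. (4)–(7) is sketched below. Instead of explicitly rotating the lipid onto $\hat{z}$, the sketch removes the component of each atom's displacement along the unit orientation vector, which yields the same in-plane distance $x_i^2 + y_i^2$ as the rotated coordinates; this shortcut, the function names, and the single-frame ratio are simplifications for illustration (the procedure above averages ${R}_{g}$ over frames before taking the ratio).

```python
import math


def center_of_mass(coords, masses):
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )


def in_plane_rg(coords, masses, origin, axis):
    """Mass-weighted radius of gyration perpendicular to a unit axis."""
    acc = 0.0
    for m, c in zip(masses, coords):
        d = [c[k] - origin[k] for k in range(3)]
        along = sum(d[k] * axis[k] for k in range(3))
        perp_sq = sum(x * x for x in d) - along * along
        acc += m * max(perp_sq, 0.0)  # guard against round-off below zero
    return math.sqrt(acc / sum(masses))


def cpp_rg(head, head_masses, tail, tail_masses):
    r_head = center_of_mass(head, head_masses)
    r_tail = center_of_mass(tail, tail_masses)
    midpoint = tuple(0.5 * (a + b) for a, b in zip(r_head, r_tail))  # Eq. (4)
    v = [a - b for a, b in zip(r_head, r_tail)]                      # Eq. (5)
    norm = math.sqrt(sum(x * x for x in v))
    axis = [x / norm for x in v]
    rg_tail = in_plane_rg(tail, tail_masses, midpoint, axis)         # Eq. (6)
    rg_head = in_plane_rg(head, head_masses, midpoint, axis)
    return rg_tail / rg_head                                         # Eq. (7)


# Toy lipid aligned with z: head atoms 1 Å off-axis, tail atoms 2 Å off-axis.
head = [(1.0, 0.0, 10.0), (-1.0, 0.0, 10.0)]
tail = [(2.0, 0.0, -10.0), (-2.0, 0.0, -10.0)]
value = cpp_rg(head, [1.0, 1.0], tail, [1.0, 1.0])
```

Because the perpendicular distance is measured relative to the lipid's own head-to-tail axis, the result does not depend on how the lipid is oriented in the simulation box.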

Membrane thickness, torque density, compressibility

The bilayer thickness $({d}_{B})$ is defined as the instantaneous difference between the average $z$-coordinates of phosphorus atoms in the upper and lower leaflets:

$${d}_{B}\left(t\right)=\, \langle {z}_{P,{{\mathrm{upper}}}}\left(t\right)\rangle \,-\, \langle {z}_{P,{{\mathrm{lower}}}}\left(t\right)\rangle$$

(8)

We averaged ${d}_{B}(t)$ over the final 500 ns of each 1.5 µs trajectory to yield a single representative ${d}_{B}$ per system.
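Equation (8) and the subsequent time average amount to the following short calculation. The input layout (per-frame lists of phosphorus z-coordinates already assigned to the upper and lower leaflets) and the function name are assumptions for illustration.

```python
from statistics import mean


def bilayer_thickness(frames):
    """Time-averaged d_B from per-frame phosphorus z-coordinates (Å).

    frames: list of (upper_leaflet_z, lower_leaflet_z) pairs of lists.
    """
    per_frame = [mean(upper) - mean(lower) for upper, lower in frames]
    return mean(per_frame)


frames = [
    ([20.0, 22.0], [-20.0, -18.0]),  # d_B = 21 - (-19) = 40 Å
    ([21.0, 21.0], [-19.0, -21.0]),  # d_B = 21 - (-20) = 41 Å
]
d_b = bilayer_thickness(frames)
```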

Monolayer torque density (${\tau }_{{\mathrm{mean}}}$) was calculated from the first moment of the lateral pressure profile $p\left(z\right)={p}_{T}\left(z\right)-{p}_{N}(z)$, where

$${p}_{T}\left(z\right)=\,\frac{{p}_{{xx}}\left(z\right)+{p}_{{yy}}\left(z\right)}{2},\,{p}_{N}\left(z\right)=\,{p}_{{zz}}\left(z\right)$$

(9)

Pressure profiles (0.2 Å bins) were integrated to give leaflet torques:

$${\tau }_{{\mathrm{upper}}}={\int }_{0}^{\frac{{L}_{z}}{2}}z\,p\left(z\right)\,dz,\,{\tau }_{{\mathrm{lower}}}={\int }_{-\frac{{L}_{z}}{2}}^{0}z\,p\left(z\right)\,dz$$

(10)

and averaged as

$${\tau }_{{\mathrm{mean}}}\,=\frac{{\tau }_{{\mathrm{upper}}}+{\,\tau }_{{\mathrm{lower}}}\,}{2}$$

(11)

Pressure and torque calculations were performed on velocity- and position-recoupled trajectories over the final 500 ns to ensure full equilibration. As context, for a stress-free symmetric bilayer, the monolayer torque density ($\tau$) is related to the bending modulus (${k}_{c}$) and spontaneous curvature (${c}_{0}$) as $\tau=\,{k}_{c}{c}_{0}$. A monolayer with positive curvature is convex to the head group side, while negative curvature is concave⁶⁶.

The area compressibility modulus ${K}_{A}$ was determined from fluctuations in the instantaneous projected bilayer area $A(t)$:

$${K}_{A}=\,\frac{{k}_{B}T\,\langle A\rangle }{\langle {\left(A\left(t\right)-\,\langle A\rangle \right)}^{2}\rangle }$$

(12)

where ${k}_{B}$ is the Boltzmann constant and $T$ is the simulation temperature (310 K). Standard deviation was directly calculated from the distribution of $A(t)$ values over the same 500 ns analysis block.
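Equation (12) can be evaluated from an area time series as sketched below. The input units (nm²), the conversion to N/m, and the function name are assumptions made for illustration; the variance is taken as the population variance of $A(t)$ over the analysis window.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K
NM2_TO_M2 = 1e-18   # 1 nm^2 expressed in m^2


def area_compressibility(areas_nm2, temperature_k=310.0):
    """K_A in N/m from projected bilayer areas given in nm^2, Eq. (12)."""
    n = len(areas_nm2)
    mean_area = sum(areas_nm2) / n
    variance = sum((a - mean_area) ** 2 for a in areas_nm2) / n
    # <A> / var(A) has units of 1/nm^2; convert to 1/m^2 to obtain J/m^2 = N/m.
    return K_B * temperature_k * mean_area / variance / NM2_TO_M2


# Toy series: mean area 64 nm^2 with a variance of 1 nm^4.
k_a = area_compressibility([63.0, 65.0])
```

Larger area fluctuations give a smaller $K_A$, that is, a softer and more easily stretched bilayer.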

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

LNPDB is publicly accessible and can be interactively viewed and downloaded at https://lnpdb.molcube.com/. Source data for Figs. 2–4 and Supplementary Figs. 1–13 are provided with this paper.

Code availability

Code used to analyze deep learning models and MD trajectories is available on our GitHub repository at https://github.com/evancollins1/LNPDB.

References

Hou, X., Zaks, T., Langer, R. & Dong, Y. Lipid nanoparticles for mRNA delivery. Nat. Rev. Mater. 6, 1078–1094 (2021).
Baden, L. R. et al. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. N. Engl. J. Med. 384, 403–416 (2021).
Mirkin, C. A., Mrksich, M. & Artzi, N. The emerging era of structural nanomedicine. Nat. Rev. Bioeng. 3, 526–528 (2025).
Albertsen, C. H. et al. The role of lipid components in lipid nanoparticles for vaccines and gene therapy. Adv. Drug Deliv. Rev. 188, 114416 (2022).
Han, X. et al. An ionizable lipid toolbox for RNA delivery. Nat. Commun. 12, 7233 (2021).
Akinc, A. et al. A combinatorial library of lipid-like materials for delivery of RNAi therapeutics. Nat. Biotechnol. 26, 561–569 (2008).
Miao, L. et al. Delivery of mRNA vaccines with heterocyclic lipids increases anti-tumor efficacy by STING-mediated immune cell activation. Nat. Biotechnol. 37, 1174–1185 (2019).
Cheng, Q. et al. Selective organ targeting (SORT) nanoparticles for tissue-specific mRNA delivery and CRISPR–Cas gene editing. Nat. Nanotechnol. 15, 313–320 (2020).
Chander, N., Basha, G., Yan Cheng, M. H., Witzigmann, D. & Cullis, P. R. Lipid nanoparticle mRNA systems containing high levels of sphingomyelin engender higher protein expression in hepatic and extra-hepatic tissues. Mol. Ther. Methods Clin. Dev. 30, 235–245 (2023).
Radmand, A. et al. Cationic cholesterol-dependent LNP delivery to lung stem cells, the liver, and heart. Proc. Natl. Acad. Sci. USA 121, e2307801120 (2024).
Zhu, Y. et al. Multi-step screening of DNA/lipid nanoparticles and co-delivery with siRNA to enhance and prolong gene expression. Nat. Commun. 13, 4282 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Li, B. et al. Accelerating ionizable lipid discovery for mRNA delivery using machine learning and combinatorial chemistry. Nat. Mater. 23, 1002–1008 (2024).
Xu, Y. et al. AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery. Nat. Commun. 15, 6305 (2024).
Witten, J. et al. Artificial intelligence-guided design of lipid nanoparticles for pulmonary gene therapy. Nat. Biotechnol. 43, 1790–1799 (2025).
Kumar, G. & Ardekani, A. M. Machine-learning framework to predict the performance of lipid nanoparticles for nucleic acid delivery. ACS Appl. Bio Mater. 8, 3717–3727 (2025).
Suzuki, T. et al. PEG shedding-rate-dependent blood clearance of PEGylated lipid nanoparticles in mice: faster PEG shedding attenuates anti-PEG IgM production. Int. J. Pharm. 588, 119792 (2020).
Jiang, A. Y. et al. Combinatorial development of nebulized mRNA delivery formulations for the lungs. Nat. Nanotechnol. 19, 364–375 (2024).
Da Vela, S. & Svergun, D. I. Methods, development and applications of small-angle X-ray scattering to characterize biological macromolecules in solution. Curr. Res. Struct. Biol. 2, 164–170 (2020).
Zheng, L., Bandara, S. R., Tan, Z. & Leal, C. Lipid nanoparticle topology regulates endosomal escape and delivery of RNA to the cytoplasm. Proc. Natl. Acad. Sci. USA 120, e2301067120 (2023).
Wang, M. M. et al. Elucidation of lipid nanoparticle surface structure in mRNA vaccines. Sci. Rep. 13, 16744 (2023).
Brooks, B. R. et al. CHARMM: the biomolecular simulation program. J. Comput. Chem. 30, 1545–1614 (2009).
Tesei, G. et al. Lipid shape and packing are key for optimal design of pH-sensitive mRNA lipid nanoparticles. Proc. Natl. Acad. Sci. USA 121, e2311700120 (2024).
Philipp, J. et al. pH-dependent structural transitions in cationic ionizable lipid mesophases are critical for lipid nanoparticle function. Proc. Natl. Acad. Sci. USA 120, e2310491120 (2023).
Garaizar, A. et al. Toward understanding lipid reorganization in RNA lipid nanoparticles in acidic environments. Proc. Natl. Acad. Sci. USA 121, e2404555121 (2024).
Love, K. T. et al. Lipid-like materials for low-dose, in vivo gene silencing. Proc. Natl. Acad. Sci. USA 107, 1864–1869 (2010).
Li, L. et al. A biomimetic lipid library for gene delivery through thiol-yne click chemistry. Biomaterials 33, 8160–8166 (2012).
Whitehead, K. A. et al. Degradable lipid nanoparticles with predictable in vivo siRNA delivery activity. Nat. Commun. 5, 4277 (2014).
Miller, J. B. et al. Non-viral CRISPR/Cas gene editing in vitro and in vivo enabled by synthetic nanoparticle co-delivery of Cas9 mRNA and sgRNA. Angew. Chem. Int. Ed. 56, 1059–1063 (2017).
Zhou, K. et al. Modular degradable dendrimers enable small RNAs to extend survival in an aggressive liver cancer model. Proc. Natl. Acad. Sci. USA 113, 520–525 (2016).
Lee, S. M. et al. A systematic study of unsaturation in lipid nanoparticles leads to improved mRNA transfection in vivo. Angew. Chem. Int. Ed. 60, 5848–5853 (2021).
Liu, S. et al. Membrane-destabilizing ionizable phospholipids for organ-selective mRNA delivery and CRISPR–Cas gene editing. Nat. Mater. 20, 701–710 (2021).
Li, Z. et al. Enzyme-catalyzed one-step synthesis of ionizable cationic lipids for lipid nanoparticle-based mRNA COVID-19 vaccines. ACS Nano 16, 18936–18950 (2022).
Li, B. et al. Combinatorial design of nanoparticles for pulmonary mRNA delivery and genome editing. Nat. Biotechnol. 41, 1410–1415 (2023).
Chen, J. et al. Combinatorial design of ionizable lipid nanoparticles for muscle-selective mRNA delivery with minimized off-target effects. Proc. Natl. Acad. Sci. USA 120, e2309472120 (2023).
Rhym, L. H., Manan, R. S., Koller, A., Stephanie, G. & Anderson, D. G. Peptide-encoding mRNA barcodes for the high-throughput in vivo screening of libraries of lipid nanoparticles for mRNA delivery. Nat. Biomed. Eng. 7, 901–910 (2023).
Goldman, R. L. et al. Understanding structure activity relationships of Good HEPES lipids for lipid nanoparticle mRNA vaccine applications. Biomaterials 301, 122243 (2023).
Xu, Y. et al. Delivery of mRNA vaccine with 1,2-diesters-derived lipids elicits fast liver clearance for safe and effective cancer immunotherapy. Adv. Healthc. Mater. 13, 2302691 (2024).
Yan, Y. et al. Branched hydrophobic tails in lipid nanoparticles enhance mRNA delivery for cancer immunotherapy. Biomaterials 301, 122279 (2023).
Chen, Z. et al. Modular design of biodegradable ionizable lipids for improved mRNA delivery and precise cancer metastasis delineation in vivo. J. Am. Chem. Soc. 145, 24302–24314 (2023).
He, Z. et al. A multidimensional approach to modulating ionizable lipids for high-performing and organ-selective mRNA delivery. Angew. Chem. Int. Ed. 62, e202310401 (2023).
Su, K. et al. Reformulating lipid nanoparticles for organ-targeted mRNA accumulation and translation. Nat. Commun. 15, 5659 (2024).
Xue, L. et al. High-throughput barcoding of nanoparticles identifies cationic, degradable lipid-like materials for mRNA delivery to the lungs in female preclinical models. Nat. Commun. 15, 1884 (2024).
Chaudhary, N. et al. Lipid nanoparticle structure and delivery route during pregnancy dictate mRNA potency, immunogenicity, and maternal and fetal outcomes. Proc. Natl. Acad. Sci. USA 121, e2307810121 (2024).
Han, X. et al. In situ combinatorial synthesis of degradable branched lipidoids for systemic delivery of mRNA therapeutics and gene editors. Nat. Commun. 15, 1762 (2024).
Ren, Y. et al. Enhancing spleen-targeted mRNA delivery with branched biodegradable tails in lipid nanoparticles. J. Mater. Chem. B 12, 8062–8066 (2024).
Zhu, Y. et al. Screening for lipid nanoparticles that modulate the immune activity of helper T cells towards enhanced antitumour activity. Nat. Biomed. Eng. 8, 544–560 (2024).
Sabnis, S. et al. A novel amino lipid series for mRNA delivery: improved endosomal escape and sustained pharmacology and safety in non-human primates. Mol. Ther. 26, 1509–1519 (2018).
Patel, S. et al. Naturally-occurring cholesterol analogues in lipid nanoparticles induce polymorphic shape and enhance intracellular delivery of mRNA. Nat. Commun. 11, 983 (2020).
Bae, S.-H. et al. A lipid nanoparticle platform incorporating trehalose glycolipid for exceptional mRNA vaccine safety. Bioact. Mater. 38, 486–498 (2024).
Li, J. et al. High-throughput synthesis and optimization of ionizable lipids through A3 coupling for efficient mRNA delivery. J. Nanobiotechnol. 22, 672 (2024).
Xue, L. et al. Multiarm-assisted design of dendron-like degradable ionizable lipids facilitates systemic mRNA delivery to the spleen. J. Am. Chem. Soc. 147, 1542–1552 (2025).
Xue, L. et al. Combinatorial design of siloxane-incorporated lipid nanoparticles augments intracellular processing for tissue-specific mRNA therapeutic delivery. Nat. Nanotechnol. 20, 132–143 (2025).
Peña, Á. et al. Multicomponent thiolactone-based ionizable lipid screening platform for efficient and tunable mRNA delivery to the lungs. Commun. Chem. 8, 1–12 (2025).
Yoo, S. et al. Novel less toxic, lymphoid tissue-targeted lipid nanoparticles containing a vitamin B5-derived ionizable lipid for mRNA vaccine delivery. Adv. Healthc. Mater. 14, 2403366 (2025).
Liu, L. et al. PEGylated lipid screening, composition optimization, and structure–activity relationship determination for lipid nanoparticle-mediated mRNA delivery. Nanoscale 17, 11329–11344 (2025).
Zhang, L. et al. Role of PEGylated lipid in lipid nanoparticle formulation for in vitro and in vivo delivery of mRNA vaccines. J. Control. Release 380, 108–124 (2025).
Xu, S. et al. In vivo genome editing of human haematopoietic stem cells for treatment of blood disorders using mRNA delivery. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01480-y (2025).
Han, X. et al. Plug-and-play assembly of biodegradable ionizable lipids for potent mRNA delivery and gene editing in vivo. Preprint at https://doi.org/10.1101/2025.02.25.640222 (2025).
Wu, S. et al. Paracyclophane-based ionizable lipids for efficient mRNA delivery in vivo. J. Control. Release 376, 395–401 (2024).
Han, X. et al. Optimization of the activity and biodegradability of ionizable lipids for mRNA delivery via directed chemical evolution. Nat. Biomed. Eng. 8, 1412–1424 (2024).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Park, S., Choi, Y. K., Kim, S., Lee, J. & Im, W. CHARMM-GUI membrane builder for lipid nanoparticles with ionizable cationic lipids and PEGylated lipids. J. Chem. Inf. Model. 61, 5192–5202 (2021).
Mui, B. L. et al. Influence of polyethylene glycol lipid desorption rates on pharmacokinetics and pharmacodynamics of siRNA lipid nanoparticles. Mol. Ther. Nucleic Acids 2, e139 (2013).
Yanez Arteta, M. et al. Successful reprogramming of cellular protein production through mRNA delivered by functionalized lipid nanoparticles. Proc. Natl. Acad. Sci. USA 115, E3351–E3360 (2018).
Chatterjee, S., Kon, E., Sharma, P. & Peer, D. Endosomal escape: a bottleneck for LNP-mediated therapeutics. Proc. Natl. Acad. Sci. USA 121, e2307800120 (2024).
Jayaraman, M. et al. Maximizing the potency of siRNA lipid nanoparticles for hepatic gene silencing in vivo. Angew. Chem. Int. Ed. Engl. 51, 8529–8533 (2012).
Eastman, P. et al. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLOS Comput. Biol. 13, e1005659 (2017).
Yu, H. et al. Real-time pH-dependent self-assembly of ionisable lipids from COVID-19 Vaccines and in situ nucleic acid complexation. Angew. Chem. 135, e202304977 (2023).
Adams, D. et al. Patisiran, an RNAi therapeutic, for hereditary transthyretin amyloidosis. N. Engl. J. Med. 379, 11–21 (2018).
Kjølbye, L. R. et al. Martini 3 building blocks for lipid nanoparticle design. J. Chem. Theory Comput. https://doi.org/10.1021/acs.jctc.5c01207 (2025).
Shi, D., Toyonaga, S. & Anderson, D. G. In vivo RNA delivery to hematopoietic stem and progenitor cells via targeted lipid nanoparticles. Nano Lett. 23, 2938–2944 (2023).
Jansen, A., Aho, N., Groenhof, G., Buslaev, P. & Hess, B. phbuilder: a tool for efficiently setting up constant pH molecular dynamics simulations in GROMACS. J. Chem. Inf. Model. 64, 567–574 (2024).
Vanommeslaeghe, K. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 31, 671–690 (2010).
Klauda, J. B. et al. Update of the CHARMM all-atom additive force field for lipids: validation on six lipid types. J. Phys. Chem. B 114, 7830–7843 (2010).
Lee, J. et al. CHARMM-GUI Membrane Builder for complex biological membrane simulations with glycolipids and lipoglycans. J. Chem. Theory Comput. 15, 775–786 (2019).
Jo, S., Kim, T., Iyer, V. G. & Im, W. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem. 29, 1859–1865 (2008).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Lee, J. et al. CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput. 12, 405–413 (2016).
Hopkins, C. W., Le Grand, S., Walker, R. C. & Roitberg, A. E. Long-time-step molecular dynamics through hydrogen mass repartitioning. J. Chem. Theory Comput. 11, 1864–1874 (2015).
Gao, Y. et al. CHARMM-GUI supports hydrogen mass repartitioning and different protonation states of phosphates in lipopolysaccharides. J. Chem. Inf. Model. 61, 831–839 (2021).
Chow, K.-H. & Ferguson, D. M. Isothermal-isobaric molecular dynamics simulations with Monte Carlo volume sampling. Comput. Phys. Commun. 91, 283–289 (1995).
Åqvist, J., Wennerström, P., Nervall, M., Bjelic, S. & Brandsdal, B. O. Molecular dynamics simulations of water and biomolecules with a Monte Carlo constant pressure algorithm. Chem. Phys. Lett. 384, 288–294 (2004).
Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).
Dion, M., Rydberg, H., Schröder, E., Langreth, D. C. & Lundqvist, B. I. Van der Waals density functional for general geometries. Phys. Rev. Lett. 92, 246401 (2004).
Essmann, U. et al. A smooth particle mesh Ewald method. J. Chem. Phys. 103, 8577–8593 (1995).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Moriwaki, H., Tian, Y.-S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 4 (2018).
Kobierski, J., Wnętrzak, A., Chachaj-Brekiesz, A. & Dynarowicz-Latka, P. Predicting the packing parameter for lipids in monolayers with the use of molecular dynamics. Colloids Surf. B Biointerfaces 211, 112298 (2022).

Acknowledgements

This work is supported by Sanofi (D.G.A.), MIT Jameel Clinic (E.C.), NIH grant R33AI161805-05 (D.G.A.), São Paulo Research Foundation Process Number #2024/14345-4 (P.P.), the Nano-Material Technology Development Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT RS-2023-00281553 (J.-O.J., W.I.), and the Scale-up TIPS Program RS-2023-00321786 and the TIPA Global R&D Project RS-2025-25458614 funded by the Ministry of SMEs and Startups of Korea (W.I.). Figures created in part using images from BioRender.com.

Author information

These authors contributed equally: Evan Collins, Jungyong Ji, Sung-Gwang Kim.
These authors jointly supervised this work: Daniel G. Anderson, Wonpil Im.

Authors and Affiliations

Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Evan Collins, Jacob Witten & Robert Langer
David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Evan Collins, Jacob Witten, Rajith S. Manan, Arnab Rudra, William J. Jeang, Robert Langer & Daniel G. Anderson
Jameel Clinic, Massachusetts Institute of Technology, Cambridge, MA, USA
Evan Collins
MolCube Inc., Seoul, Republic of Korea
Jungyong Ji, Sung-Gwang Kim, Seonghoon Kim, Minjun Jung, Aron Park & Wonpil Im
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Jacob Witten, Rajith S. Manan, Arnab Rudra, Robert Langer & Daniel G. Anderson
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Richard Zhu
Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
Peter Park & Wonpil Im
Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children’s Hospital, Boston, MA, USA
Arnab Rudra, William J. Jeang & Daniel G. Anderson
Medicinal Materials Research Center, Biomedical Research Division, Korea Institute of Science and Technology, Seoul, Republic of Korea
Gyochang Keum & Eun-Kyoung Bang
KHU-KIST Department of Converging Science and Technology, Graduate School, Kyung Hee University, Seoul, Republic of Korea
Eun-Kyoung Bang
Department of Microbiology, Brain Korea 21 Project, University of Ulsan College of Medicine, ASAN Medical Center, Seoul, Republic of Korea
Jun-O Jin
Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
William J. Jeang
Harvard and MIT Division of Health Science and Technology, Massachusetts Institute of Technology, Cambridge, MA, USA
Robert Langer & Daniel G. Anderson
Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Robert Langer & Daniel G. Anderson

Contributions

E.C., J.J., S.-G.K., J.W., S.K., R.Z., P.P., M.J., A.P., R.S.M., A.R., G.K., E.-K.B., J.-O.J., and W.J.J. created, refined, and analyzed the dataset. E.C., J.J., S.-G.K., D.G.A., and W.I. discussed the results and wrote the paper with input from all authors. D.G.A. and W.I. acquired funding. R.L., D.G.A., and W.I. supervised the project.

Corresponding authors

Correspondence to Daniel G. Anderson or Wonpil Im.

Ethics declarations

Competing interests

D.G.A. receives research funding from Sanofi and is a founder of Orna Therapeutics, Soufflé Therapeutics, and Combined Therapeutics. R.L. is a co-founder and former member of the board of directors of Moderna. R.L. also serves on the board and has equity in Particles for Humanity. For a full list of entities with which R.L. is involved, compensated or uncompensated, see https://www.dropbox.com/scl/fi/ty2b7x8vyebid8ybcbeox/Rev-Langer-COI.pdf?rlkey=lko2srm1qjknm53ck9yns1dfj&e=1&dl=0. W.I. is a co-founder and CEO of MolCube Inc. The remaining authors have no competing interests to declare.

Peer review

Peer review information

Nature Communications thanks anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Collins, E., Ji, J., Kim, SG. et al. Lipid Nanoparticle Database towards structure-function modeling and data-driven design for nucleic acid delivery. Nat Commun 17, 2464 (2026). https://doi.org/10.1038/s41467-026-68818-1

Received: 07 June 2025
Accepted: 16 January 2026
Published: 28 January 2026
Version of record: 16 March 2026
DOI: https://doi.org/10.1038/s41467-026-68818-1