Abstract
Lipid-protein interactions are crucial for virtually all biological processes in living cells. However, existing structural databases focusing on these interactions are limited to integral membrane proteins. A systematic understanding of diverse lipid-protein interactions also encompassing lipid-anchored, peripheral membrane and soluble lipid binding proteins remains to be elucidated. To address this gap and facilitate the research of universal lipid-protein assemblies, we developed BioDolphin - a curated database with over 127,000 lipid-protein interactions. BioDolphin provides comprehensive annotations, including protein functions, protein families, lipid classifications, lipid-protein binding affinities, membrane association type, and atomic structures. Accessible via a publicly available web server (www.biodolphin.chemistry.gatech.edu), users can efficiently search for lipid-protein interactions using a wide range of options and download datasets of interest. Additionally, BioDolphin features interactive 3D visualization of each lipid-protein complex, facilitating the exploration of structure-function relationships. BioDolphin also includes detailed information on atomic-level intermolecular interactions between lipids and proteins that enable large scale analysis of both paired complexes and larger assemblies. As an open-source resource, BioDolphin enables global analysis of lipid-protein interactions and supports data-driven approaches for developing predictive machine learning algorithms for lipid-protein binding affinity and structures.

Introduction
Lipids are a heterogeneous group of organic biomolecules, usually amphipathic or hydrophobic, that are generally poorly soluble in aqueous solutions. Lipids can be classified into eight different categories according to LIPID MAPS1,2: 1. Fatty Acyls, 2. Glycerolipids, 3. Glycerophospholipids, 4. Sphingolipids, 5. Sterol Lipids, 6. Prenol Lipids, 7. Saccharolipids, and 8. Polyketides (Fig. 1). Due to their complexity, many lipids fall into multiple classes. Lipids are known for their well-characterized roles in energy storage and cell membrane structure3,4. However, recent studies have revealed the structural and functional diversity of lipids, including crucial roles in immune signaling, signal transduction, epigenetic regulation, and post-translational modification of biomolecules, which calls for new studies on key lipid-protein systems with relevance to biology and human health5,6,7. For example, genetic mutations of proteins involved in lipid metabolism and homeostasis result in lipid storage disorders and autoimmunity8,9. Due to their limited aqueous solubility, many lipids rarely exist as free species in cells and instead predominantly carry out their function in membranes, vesicles, lipid droplets, or through interactions with proteins10,11. Given the vast diversity of lipid structures, physicochemical properties, and biological functions, the potential for interactions between lipids and proteins is extensive, offering a rich landscape for exploring and understanding binding modes and specificity12,13.
Example lipids are shown for each class. Fatty acyl Palmitic acid (16:0), PDB Chemical ID PLM, PubChem CID 135369651; Sphingolipid Sulfatide (24:1), PDB Chemical ID CIS, PubChem CID 5459361; Sterol Cholesterol, PDB Chemical ID CLR, PubChem CID 5997; Polyketide Cannabidiol, PDB Chemical ID P0T, PubChem CID 644019; Saccharolipid Lipid X, PDB Chemical ID LP5, PubChem CID 123907; Prenol Siphonaxanthin, PDB Chemical ID 0IE, PubChem CID 11204185; Glycerophospholipid Phosphatidylethanolamine (15:0/20:0), PDB Chemical ID PTY, PubChem CID 446872; Glycerolipid Tripalmitin, PDB Chemical ID 4RF, PubChem CID 11147.
Lipid-protein interactions can be sorted into four main categories based on the environment where the proteins are found: i) integral membrane proteins, ii) peripheral membrane proteins, iii) soluble proteins, and iv) lipid-anchored proteins10. Integral membrane proteins are permanently embedded in the plasma membrane and surrounded by a large number of lipids, typically glycerophospholipids, sphingolipids, and sterols3,14. The interactions between integral membrane proteins and lipid membranes have been the focus of most studies13,15,16. The lipid composition of membranes can regulate the function of integral membrane proteins, such as G protein-coupled receptors, ATP-binding cassette transporters, and ion channels17,18. These interactions are often non-specific and are related to the hydrophobic properties of the membrane as a lipid environment. Integral membrane proteins can, however, also interact with lipids with specificity, which has been shown to play an important role in various biological activities17,18. For instance, the human β2-Adrenergic Receptor contains a specific cholesterol binding site that plays a role in stabilizing the protein19. Integral membrane proteins also include classes of proteins with two or more domains: transmembrane domains that associate with membrane lipids and soluble luminal domains that associate with lipids outside of the membrane. For example, the CD1 immunoreceptor family of proteins, involved in immune signaling to T cells, contains transmembrane domains for cell surface presentation, but their extracellular soluble domains have been shown to sample repertoires of more than 2000 unique lipid types, including phospholipids, sphingolipids, glycolipids, waxes, oils, and lipopeptides, highlighting their impressive ability to sample the massive diversity of the lipidome20,21,22.
Another class of lipid binding proteins is peripheral membrane proteins, which may be permanently associated with one of the two membrane leaflets, or may be transiently recruited to the periphery of the membrane23,24. Lipid interactions with peripheral membrane proteins often involve cell signaling25. One well-studied example is phospholipase C delta (PLC-δ), which selectively binds with high affinity to phosphatidylinositol 4,5-bisphosphate, enabling PLC-δ to target the phospholipid substrate in the membrane26,27. A further class of membrane-associated proteins is lipid-anchored proteins, which are covalently linked to cell membranes via lipids. Common mechanisms for anchoring proteins into membranes are myristoylation, palmitoylation, prenylation, and glycosylphosphatidylinositol anchoring28. RAS GTPase proteins, which regulate cell survival and proliferation in many different tumor types, are lipid-anchored into plasma membranes via a combination of prenylation and palmitoylation29. Finally, lipids can also interact with soluble proteins within different cellular compartments, such as the cytoplasm, endoplasmic reticulum, Golgi, lysosome, and endosome30,31. These soluble proteins can serve as lipid transporters, lipid chaperones, membrane reorganizers, or signaling factors32,33,34,35. For example, oxysterol-binding homology proteins bind to phosphatidylserine and act as lipid transfer proteins for transport to the plasma membrane36.
To facilitate systematic analysis of lipid-protein binding interactions, a collection of available experimental data on all kinds of lipid-protein complexes is needed. However, to our knowledge, structural databases that offer a complete repository of all types of lipid-protein interactions are largely lacking. Although the Protein Data Bank (PDB) includes universal collections of lipid-protein complex structures37, it does not provide the functionality to easily search for complexes with specific classes of lipids, and details on paired protein/lipid associations are lacking. Beyond structural information, quantitative binding data, such as binding affinities, are also crucial for understanding lipid-protein interactions. The BioLiP2 database contains biologically relevant ligand-protein interactions with binding affinities integrated, but it lacks filtering options to retrieve different types of lipid-protein interactions38,39. Similarly, DBLiPro provides metabolic pathway associations between lipids and proteins but lacks details on lipid-protein structural interactions and affinities40. The MemProtMD database, despite containing lipid-protein interaction data, focuses exclusively on the interactions of integral membrane proteins with membrane lipids, integrated with molecular dynamics simulations41,42. Consequently, lipid-protein interactions of other kinds, despite their biological significance, remain inaccessible in current databases. Thus, a comprehensive and inclusive database of lipid-protein interactions is essential to advance our understanding of their mechanisms and patterns.
To address this gap, we have developed BioDolphin (Biological Database of Lipid-Protein Highly Inclusive Interactions), a curated database containing richly annotated information for a comprehensive set of lipid-protein binding interactions. BioDolphin provides data on paired lipid-protein annotation, experimental binding affinity, intermolecular interactions, and atomic structures across a wide range of lipid-protein interactions. Integrated into a user-friendly webserver (www.biodolphin.chemistry.gatech.edu), BioDolphin is freely accessible to the public. BioDolphin provides lipid-protein interaction entries that are searchable through a wide range of terms, including but not limited to cellular function, lipid and protein identity, and classification. The development of BioDolphin enables the research community to easily view and download interaction data and analyze lipid-protein interactions systematically.
Methods
Database construction
Overview of data curation
The procedure for the curation and annotation of lipid-protein interactions involved both manual and automated data collection from a wide range of resources (Fig. 2). Each entry of BioDolphin summarizes an interaction of a single lipid molecule and a single protein monomer (we define the pair as a unique lipid-protein “complex”). To obtain a comprehensive list of lipid-protein interactions, we curated pair complex data from two databases: BioLiP238,39 and PDB37. We first extracted all available lipid-protein interactions from BioLiP2. Although BioLiP2 contains detailed binding information of ligand-protein interactions, a non-trivial number of lipid-protein interactions that we deemed to be important were missing in BioLiP2. As such, after extracting data from BioLiP2, we further curated lipid-protein interactions that were missing in BioLiP2 using a combination of the PDB and LIPID MAPS and incorporated those entries. Then, all proteins and lipids involved in these interactions were annotated at the protein and lipid level. Finally, we obtained the intermolecular interaction information by running PLIP (Protein-Ligand Interaction Profiler)43,44.
Top Left: BioLiP2 is used to retrieve lipid-protein interaction entries and their binding affinities. Top Right: The Protein Data Bank (PDB) is used to retrieve additional lipid-protein interaction entries missing from BioLiP2 together with the atomic coordinates of all lipid-protein complexes and assemblies. Data from each pathway is integrated together to form the core of BioDolphin. In each case, lipids are classified with ClassyFire and LIPID MAPS. Bottom Left: Protein annotations are obtained from databases such as Gene Ontology, UniProt, InterPro, DeepLoc, and others. Bottom Right: Lipid annotations are obtained from resources such as LIPID MAPS, ClassyFire, and PubChem. Each pathway involves both automated and manual data collection and curation. Protein-Ligand Interaction Profiler (PLIP) provides atomistic details for lipid-protein interactions.
Extracting lipid-protein interactions from BioLiP2
A dataset of ligand-protein interactions was downloaded from BioLiP2 (https://zhanggroup.org/BioLiP)38,39. All ligands in BioLiP2 were mapped from their PDB chemical component IDs to InChIKeys using the data provided by the PDB (http://ligand-expo.rcsb.org/ld-download.html). Ligands were classified into chemical classes by ClassyFire45, a cheminformatics tool for classifying chemicals. To enable batch classification through ClassyFire’s API, we utilized the Python package pybatchclassyfire (https://gitlab.unige.ch/Pierre-Marie.Allard/pybatchclassyfire#running_instructions). Based on the classification results, ligands that either have “lipid and lipid-like molecule” as their superclass or have “phenylpropanoids and polyketides” as their superclass with LIPID MAPS terms were identified as lipids. All ligand-protein interactions from the BioLiP2 dataset with ligands identified as lipids were extracted.
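The lipid identification rule described above can be sketched as a small predicate. This is a minimal illustration rather than the curation code itself: the function name `is_lipid` and the boolean `has_lipid_maps_term` are assumptions introduced here, and the superclass strings follow ClassyFire's plural naming.

```python
# Superclass names as reported by ClassyFire, compared case-insensitively.
LIPID_SUPERCLASS = "lipids and lipid-like molecules"
POLYKETIDE_SUPERCLASS = "phenylpropanoids and polyketides"


def is_lipid(superclass: str, has_lipid_maps_term: bool) -> bool:
    """Return True if a ligand counts as a lipid under the two-branch rule.

    `is_lipid` and `has_lipid_maps_term` are hypothetical names used only
    for this sketch; they are not part of ClassyFire or pybatchclassyfire.
    """
    name = superclass.strip().lower()
    # Branch 1: the ligand belongs to the lipid superclass outright.
    if name == LIPID_SUPERCLASS:
        return True
    # Branch 2: polyketide-type ligands count only with a LIPID MAPS term.
    return name == POLYKETIDE_SUPERCLASS and has_lipid_maps_term
```

Under this rule, a polyketide without any LIPID MAPS term is not collected, whereas any ligand in the lipid superclass is collected regardless of LIPID MAPS coverage.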
Curating lipid-protein interactions from PDB
Lipid-protein interactions missing from BioLiP2 were collected from the PDB. The procedure was carried out in two steps: first, collecting a comprehensive list of lipids and second, retrieving all lipid-protein interactions from available assembly data in the PDB. The list of lipids was collected by concatenating the set of lipids defined in LIPID MAPS1,2 and the set of lipids classified by ClassyFire in the previous step. A total of 2,619 lipids were collected. The list was used to search for PDB IDs that contain lipids with the RCSB PDB search API46,47. Lipid-protein interactions within these PDB assemblies were then obtained using the RCSB PDB graphQL-based Data API46,47. For all ligand-protein pairs in a PDB assembly structure, the criteria to collect them as a lipid-protein interaction are as follows: First, the ligand needs to be a lipid molecule defined in the list of lipids described above (defined by LIPID MAPS and ClassyFire). Second, the lipid and protein are either in close contact or have intermolecular interactions according to PLIP. For the former, a lipid-protein pair is determined to be in close contact if the lipid and protein have the same chain identifier (auth_asym_id) in the PDB. For cross-chain lipid-protein pairs, we included them as an interaction if there are one or more intermolecular interactions in the lipid-protein complex determined by PLIP. The details of calculating intermolecular interactions with PLIP are described in the section below.
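The two inclusion criteria can be summarized in a short decision function. The sketch below assumes the PLIP interaction count for a pair has already been computed; the function and parameter names are hypothetical and chosen only for illustration.

```python
from typing import Set


def is_lipid_protein_interaction(
    ligand_id: str,
    lipid_ids: Set[str],
    ligand_chain: str,
    protein_chain: str,
    n_plip_interactions: int,
) -> bool:
    """Decide whether a ligand-protein pair is kept as an interaction.

    All names here are illustrative assumptions; `n_plip_interactions`
    stands for the number of interactions PLIP reports for the pair.
    """
    # Criterion 1: the ligand must be in the curated lipid list.
    if ligand_id not in lipid_ids:
        return False
    # Criterion 2a: same chain identifier (auth_asym_id) means close contact.
    if ligand_chain == protein_chain:
        return True
    # Criterion 2b: cross-chain pairs need at least one PLIP interaction.
    return n_plip_interactions >= 1
```

A same-chain pair is therefore accepted without consulting PLIP, while a cross-chain pair depends entirely on the PLIP result.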
Annotation of proteins
The RCSB PDB graphQL-based Data API was used to obtain protein annotations including its name, synonyms, sequence, enzyme commission number, source organism, and gene ontology (biological process, cellular component, and molecular function). UniProt REST API48,49 was used to fetch protein annotations including InterPro50, Pfam51, and Reactome52. Membrane association classifications of proteins are obtained based on the “subcellular location” information fetched from UniProt REST API. The subcellular location IDs provided by UniProt were mapped into membrane association classification using the criteria provided by DeepLoc 2.153. The membrane association classifications include soluble, transmembrane, peripheral, and lipid-anchor. For the proteins lacking subcellular locations in UniProt, we ran DeepLoc 2.1 to predict their membrane associations. BioDolphin entry pages explicitly note whether membrane association classification is obtained from DeepLoc 2.1’s in silico prediction.
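The fallback from curated UniProt annotation to in silico prediction can be sketched as follows. This is an illustration under stated assumptions: `predict` is a hypothetical callable standing in for a DeepLoc 2.1 run, and the returned flag mirrors the note shown on entry pages when a classification is predicted.

```python
from typing import Callable, List, Tuple

# The four membrane association classes named in the text.
MEMBRANE_CLASSES = ("soluble", "transmembrane", "peripheral", "lipid-anchor")


def membrane_association(
    uniprot_classes: List[str],
    predict: Callable[[], List[str]],
) -> Tuple[List[str], bool]:
    """Return (classes, is_predicted) for one protein.

    `uniprot_classes` holds classes already mapped from UniProt subcellular
    locations; `predict` is a hypothetical stand-in for running DeepLoc 2.1.
    """
    if uniprot_classes:
        # Curated annotation exists, so no prediction is needed.
        return uniprot_classes, False
    # No subcellular location in UniProt: fall back to prediction.
    return predict(), True
```

Passing the predictor as a callable means it is only invoked for proteins that actually lack a UniProt subcellular location.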
Annotation of lipids
The RCSB PDB graphQL-based Data API30 was used to obtain lipid annotations including PDB chemical component identifier, name, synonym, InChI, InChIKey, SMILES, IUPAC name, chemical database identifiers (PubChem, DrugBank, ChEBI, and ChEMBL), molecular formula, and molecular weight. PubChem’s PUG View REST-style web service54 was used to fetch lipid annotations including pharmacological class, adverse effect, and related disease. Additionally, lipids are assigned with categories as defined by LIPID MAPS. For lipids that are unavailable in the LIPID MAPS database, their lipid categories were assigned using ClassyFire.
Atom-level intermolecular interactions
BioDolphin defines lipid-protein “complexes” as unique lipid-protein interactions within paired PDB chains. In contrast, lipid-protein “assemblies” are defined as entries that contain multiple lipids and/or proteins within the same PDB ID. To obtain lipid-protein intermolecular interaction details of both lipid-protein complexes and lipid-protein assemblies, we processed all source PDB assemblies of our lipid-protein complexes using BioPython to extract the lipid and protein components in each assembly. Next, we ran PLIP on those processed PDB assemblies to obtain atom-level interaction details and 3D visualizations of the interactions. The PLIP software was installed as a Singularity image from https://github.com/pharmai/plip and run with the command line tool. Within the assembly, PLIP provides detailed information on lipid-protein intermolecular interaction types (hydrophobic interaction, hydrogen bond, water bridge, π-stacking, π-cation interaction, and salt bridges), atomic distances, and chains.
Website application
Web server implementation
The BioDolphin website was developed as a pure front-end project using Vue (version 3.2.47), a progressive JavaScript framework for building user interfaces, along with Vue Router (version 4.1.6) for navigation and Vuetify (version 3.4.4) for pre-built Vue components that adhere to the Material Design specification. The website is hosted on an Ubuntu Linux Server with nginx as the web server. ECharts (version 5.4.2) is used to create charts and graphs for data summaries. For molecular visualization, PDBe Molstar (version 3.3.0)55 (https://github.com/molstar/pdbe-molstar) renders detailed lipid-protein complex structures, while JSmol (version 16.3.1) is used to display molecular interactions. The BioDolphin dataset is file-based: it is stored in JavaScript Object Notation (JSON) format and loaded asynchronously. There is no SQL or NoSQL database schema; all data is managed directly through JSON files. As a pure front-end project, this approach simplifies maintenance and upgrades, reduces server dependencies, and minimizes exposure to database-related security risks, such as SQL injection attacks.
API development
We developed a RESTful API service to enable programmatic access to the BioDolphin database. The API implements a GET method that accepts a BioDolphin ID as a query parameter. Upon receiving a request, the service retrieves the full lipid-protein interaction information from the metadata. The system validates the requested ID and returns the corresponding data entry in JSON format. If the requested ID is not found or if other errors occur, the service will return error messages with the corresponding HTTP status codes.
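A client for this kind of GET-based service could look like the sketch below. The endpoint URL and the query parameter name `id` are placeholders assumed for illustration, since the text does not specify them; the actual values should be taken from the BioDolphin website.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder endpoint, not the real BioDolphin API address.
API_URL = "https://example.org/biodolphin/api"


def build_request_url(biodolphin_id: str, base_url: str = API_URL,
                      param: str = "id") -> str:
    """Build the GET URL carrying the BioDolphin ID as a query parameter.

    The parameter name `id` is a hypothetical default.
    """
    query = urllib.parse.urlencode({param: biodolphin_id})
    return f"{base_url}?{query}"


def fetch_entry(biodolphin_id: str, base_url: str = API_URL) -> dict:
    """Fetch one entry as parsed JSON, surfacing HTTP errors clearly."""
    url = build_request_url(biodolphin_id, base_url)
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # The service is described as returning HTTP status codes on failure.
        raise LookupError(
            f"request for {biodolphin_id!r} failed with HTTP {err.code}"
        ) from err
```

Separating URL construction from the network call keeps the request format easy to inspect and test without contacting the server.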
Results
Overview of BioDolphin
BioDolphin is a comprehensive database of lipid-protein binding interactions (Fig. 3). A binding interaction is defined as a one-to-one binding relationship between a single protein monomer and a single lipid molecule. We define a lipid-protein pair with a binding interaction as a complex; PDB entries with multiple lipid and protein chains are considered assemblies. BioDolphin contains 127,359 entries of complexes, curated from 14,891 unique PDBs. These entries encompass 6464 unique proteins and 2619 unique lipid molecules. For each entry, BioDolphin provides detailed information on the lipid-protein complex, including binding affinity (when available), atomic structures, annotations of the lipids and proteins involved in the interaction, and detailed intermolecular interactions within the source PDB assembly. Lipids are classified into one of the eight lipid categories defined by LIPID MAPS1,2 (Fig. 1).
Lipid-protein interactions can be searched via complex and/or assembly, protein, or lipid information. Results of the queried interactions can be viewed on the web server, and users can easily access the annotation of lipids/proteins and information on the complex. The full data matrix with interactions of interest can be downloaded as a text file and a csv file. Data can also be obtained through a GET based API.
The BioDolphin webserver allows users to search and filter for lipid-protein interactions by specific lipids or proteins of interest. Users can visualize the 3D binding structures interactively and access information about the lipid and protein involved. The lipid-protein intermolecular interactions in each PDB assembly can also be viewed in 3D interactively with detailed atom-level binding information. Each entry is assigned a unique BioDolphin ID that allows users to cite and share any entry page. The BioDolphin IDs can be found on the Browse page and within the “View Entry Details” page. As an example, the unique BioDolphin ID “BD1a05-A-A-IPM1” can be accessed and shared at https://biodolphin.chemistry.gatech.edu/detail/BD1a05-A-A-IPM1. There are also unique, sharable pages for the molecular interactions defining entire lipid-protein assemblies (i.e., for all lipid-protein chain pairs in PDBs with multiple lipid-protein interactions); for example: https://biodolphin.chemistry.gatech.edu/interaction/1a05. Additionally, the full metadata is available for download as a tab-delimited text file and as a comma-separated file (.csv format).
BioDolphin summary statistics
Figure 4 shows a set of summary statistics describing the contents of BioDolphin. The statistical distributions of the feature annotations for lipids and proteins were constructed based on: i) proteins/lipids from all entries in the database, with each unique protein/lipid providing a contribution for each occurrence in the database, and ii) the set of unique proteins/lipids present in the database. Proteins were classified as soluble, transmembrane, peripheral, or lipid-anchor based on their membrane association. Among the set of 6464 unique proteins in BioDolphin, 3125 proteins are soluble proteins, 2593 proteins are transmembrane proteins, 303 proteins are peripheral proteins, 266 proteins are both peripheral and soluble proteins, and 61 proteins are lipid-anchor proteins (Fig. 4a). The proteins in BioDolphin come from a wide range of species, totaling 1409 organisms. Homo sapiens, Escherichia coli, Mus musculus, and Saccharomyces cerevisiae are the top source organisms both for the set of unique proteins and for all proteins in the database (Fig. 4b and Supplementary Fig. 1).
a Pie chart of the distribution of the protein membrane association classifications of the set of unique proteins. b Pie chart of the distribution of the source organisms of the set of unique proteins. c The molecular weight (Da) distribution of lipids as a fraction of the total for the set of all lipids (blue) and the set of unique lipids (yellow). d Left: The lipid class distribution of lipids as a fraction of the total for the set of unique lipids (yellow). Right: The lipid class distribution of lipids as a fraction of the total for the set of all lipids (blue).
The molecular weights of lipids in BioDolphin range from 58.07 Da to 2036.77 Da (Fig. 4c). The average molecular weight of the set of unique lipids is 455.69 Da. The molecular weights of lipids from all interaction entries are skewed toward larger values, with an average of 582.16 Da. This suggests that larger lipid molecules are involved in more lipid-protein interactions in those atomic structures. Among the set of unique lipids, fatty acyls comprise the largest group, followed by polyketides, prenol lipids, and sterol lipids (Fig. 4d). Among the set of lipids in all entries, glycerophospholipids are the largest group, followed by fatty acyls and prenol lipids.
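The gap between the two averages follows from how each is weighted: the unique-lipid mean counts every lipid once, whereas the all-entries mean counts each lipid once per interaction entry. The sketch below illustrates this with made-up toy values (two hypothetical lipids, not BioDolphin data).

```python
from typing import Dict


def unique_mean(weights: Dict[str, float]) -> float:
    """Mean molecular weight with each unique lipid counted once."""
    return sum(weights.values()) / len(weights)


def entry_mean(weights: Dict[str, float], counts: Dict[str, int]) -> float:
    """Mean molecular weight with each lipid counted once per entry."""
    total_entries = sum(counts.values())
    weighted_sum = sum(weights[lipid] * n for lipid, n in counts.items())
    return weighted_sum / total_entries


# Toy, hypothetical values chosen only to show the effect.
toy_weights = {"small_lipid": 200.0, "large_lipid": 800.0}
toy_counts = {"small_lipid": 1, "large_lipid": 3}

print(unique_mean(toy_weights))             # 500.0
print(entry_mean(toy_weights, toy_counts))  # 650.0
```

Because the heavier toy lipid appears in three entries and the lighter one in a single entry, the entry-weighted mean (650.0) exceeds the unique mean (500.0), which is the same direction of shift as reported for the database.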
BioDolphin web interface
Search functionality
BioDolphin provides multiple methods to query relevant lipid-protein interactions. Users can either browse all entries in the database or query specific entries by filtering (Fig. 5). Lipid-protein complexes can be filtered by their source assembly PDB ID. Lipid-protein interactions without binding affinity data can be filtered out by selecting the “Only With Affinity Measurement” option. Proteins of interest can be searched for via source organisms, biological processes, molecular functions, cellular components, protein families, pathways, UniProt IDs, names, or amino acid sequences. The protein sequence search identifies proteins with similar sequences based on the MMseqs256,57 software from the RCSB PDB search API46,47, with an E-value cutoff of 0.1 and an identity cutoff of 0.9, which allows for searching of sequences with mutations. Finally, lipids of interest can be filtered by their LIPID MAPS class, common names, IUPAC name, SMILES, InChIKey, InChI, PubChem Compound Identification (CID)58, molecular formula, PDB Chemical ID (CCD ID), pharmacological class, adverse effect, or related disease. Users can specify multiple filters at a time, and different filters are combined with the AND operation. If multiple options in the same filter are specified, the OR operation is performed for those options.
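The combination logic (AND across different filters, OR among options within one filter) can be sketched as below. The field names and entries are hypothetical, and this is not the website's implementation, only an illustration of the described semantics.

```python
from typing import Dict, List, Set

Entry = Dict[str, str]


def matches(entry: Entry, filters: Dict[str, Set[str]]) -> bool:
    """True if the entry satisfies every filter (AND across filters).

    Within one filter, matching any selected option is enough (OR).
    """
    return all(
        entry.get(field) in options for field, options in filters.items()
    )


def search(entries: List[Entry], filters: Dict[str, Set[str]]) -> List[Entry]:
    """Return the entries that pass all filters; no filters returns all."""
    return [entry for entry in entries if matches(entry, filters)]
```

For instance, selecting two organisms and one lipid category returns only the entries whose organism is either of the two and whose lipid category is the selected one.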
The image shows a screenshot taken from the BioDolphin Search page highlighting the different options for searching and/or filtering. Top: Users can filter by PDB ID and/or affinity. Middle: Users can filter by protein annotation. Bottom: Users can filter by lipid annotation. Combinations of filtering by PDB ID, affinity, protein annotation, and lipid annotation are possible.
Browsing interface
Lipid-protein interaction entries in BioDolphin can be viewed in the results table (Fig. 6a). The table can be retrieved by either selecting “browse all” to view all the data entries in BioDolphin or by submitting queries described above to view specific interactions. The results table gives a brief overview of each lipid-protein interaction, which contains: 1. specifications of the PDB structure the interaction was sourced from (PDB ID and PDB chain IDs of the lipid and protein), 2. names of the lipid and protein involved in the interaction, 3. LIPID MAPS categories of the lipid, and 4. the affinity values of the lipid-protein complex.
a Browsing interface as a result table. Users can click on the view button under the column “View Entry Details” (circled in red) to view the details of a specific entry (shown in b). Users can also click on the view button under the column “View Interactions in PDBs” (circled in yellow) to view the intermolecular interaction details (shown in c). b Interface of the entry details including information on the complex and annotations of the lipid and protein (Full annotations are on the website). c Interface of the intermolecular interaction details of the PDB assembly the entry complex is sourced from.
By clicking “view” under the “View Entry Details” column, users can view the interactive 3D structure of the complex, the binding affinity between the lipid and the protein (Fig. 6b), and detailed annotations on both the protein and lipid. The protein annotations displayed include protein names, protein sequence, the natural source organism of the protein, gene ontology (biological process, cellular component, and molecular functions), protein family, biological pathway, protein membrane association, UniProt ID, and Enzyme Commission number (EC number) for enzymes. The lipid annotations displayed include lipid names, PDB chemical IDs, InChI, InChIKey, LIPID MAPS categories, LIPID MAPS terms, IUPAC name, PubChem ID, SMILES, molecular formula, pharmacological class, adverse effect, and related disease. The entry detail pages can be shared via the URL: https://biodolphin.chemistry.gatech.edu/detail/<BioDolphin_ID>. Additionally, by clicking “view” under the “View Interactions in PDBs” column, users can access the lipid-protein intermolecular interaction details of the whole source PDB assembly (Fig. 6c). The interactions are grouped into sections based on the lipids found in the PDB assembly. Each section shows the interactive 3D visualization of a specific lipid (in yellow), the protein residues it interacts with (in blue), and the type of interactions between the lipid and the protein residues. Furthermore, atomic-level interaction details formatted in a table are also provided for each type of interaction. Each row of the table shows information on the participating protein residue, lipid, and the interaction geometry. A detailed explanation of the table columns can be found at https://plip-tool.biotec.tu-dresden.de/plip-web/plip/index. The BioDolphin interaction pages can be shared via https://biodolphin.chemistry.gatech.edu/interaction/<PDB_ID>.
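The two sharable URL patterns can be generated programmatically. The helper names below are hypothetical; the URL templates themselves are the ones quoted in the text, and the PDB ID is passed through unchanged (the example in this paper uses a lowercase ID).

```python
BASE_URL = "https://biodolphin.chemistry.gatech.edu"


def detail_url(biodolphin_id: str) -> str:
    """URL of the entry detail page for one lipid-protein complex."""
    return f"{BASE_URL}/detail/{biodolphin_id}"


def interaction_url(pdb_id: str) -> str:
    """URL of the interaction page for a whole source PDB assembly."""
    return f"{BASE_URL}/interaction/{pdb_id}"


print(detail_url("BD1a05-A-A-IPM1"))
print(interaction_url("1a05"))
```

These helpers make it straightforward to attach stable links to each row when sharing a list of entries of interest.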
Downloading the dataset
Users can download the full metadata from the website in the “download” tab of the navigation bar. Alternatively, they can download the full data of specific interaction entries by selecting the check boxes to the left of those entries in the results table and then clicking “Download Selected” at the bottom left of the table. The downloaded data is provided as a csv file or a tab-delimited text file. Data can also be obtained through a GET-based API. The full list of metadata columns and a more detailed explanation of the information contained in each can be found in the supplementary information.
Discussion
In summary, we developed BioDolphin as a curated, inclusive database of lipid-protein interactions that provides the scientific community with a tool to research how proteins and lipids interact. The database contains more than 127,000 lipid-protein interactions drawn from a wide range of sources. Entries in our database contain binding affinities and atomic structures of the lipids and proteins, and they are annotated with detailed molecular information.
To our knowledge, BioDolphin is the first lipid-protein interaction database that contains universal lipid-protein interactions beyond membrane lipid-protein interactions. The development of BioDolphin is expected to widely impact future computational and experimental studies on lipid-protein interactions. With a user-friendly web interface, experimentalists will be able to search for and view lipid-protein interactions of interest with one click. Computational researchers can also utilize the dataset for large-scale analysis of lipid-protein interaction properties. In addition, it is currently unknown whether the structure library for general modes of lipid-protein binding is complete enough to solve, in principle, the structure prediction problem for lipid-protein complexes at low-to-moderate resolutions; such completeness has been demonstrated for proteins and protein-protein complexes59,60,61. A curated summary of lipid-protein interactions for different classes should begin to address this question.
Data availability
The complete BioDolphin database can be accessed from the web portal: www.biodolphin.chemistry.gatech.edu. The metadata is freely available for download on the website. BioDolphin v1.1 contains data in the PDB as of Sep 6, 2024. The authors are currently working on implementing automatic updates of BioDolphin as new entries become available in the PDB. Until then, the database will be updated through in-house scripts. BioDolphin data can also be accessed via the API described in the manuscript.
References
Sud, M. et al. LMSD: LIPID MAPS structure database. Nucleic Acids Res 35, D527–D532 (2007).
Conroy, M. J. et al. LIPID MAPS: update to databases and tools for the lipidomics community. Nucleic Acids Res 52, D1677–D1682 (2024).
Harayama, T. & Riezman, H. Understanding the diversity of membrane lipid composition. Nat. Rev. Mol. Cell Biol. 19, 281–296 (2018).
Welte, M. A. & Gould, A. P. Lipid droplet functions beyond energy storage. Biochim. Biophys. Acta Mol. Cell Biol. Lipids 1862, 1260–1272 (2017).
Yoon, H., Shaw, J. L., Haigis, M. C. & Greka, A. Lipid metabolism in sickness and in health: Emerging regulators of lipotoxicity. Mol. Cell 81, 3708–3730 (2021).
González-Becerra, K. et al. Fatty acids, epigenetic mechanisms and chronic diseases: a systematic review. Lipids Health Dis. 18, 178 (2019).
Moody, D. B. & Suliman, S. CD1: From Molecules to Diseases. F1000Research 6, 1909 (2017).
Schulze, H. & Sandhoff, K. Lysosomal lipid storage diseases. Cold Spring Harb. Perspect. Biol. 3, a004804 (2011).
Bagchi, S., Genardi, S. & Wang, C.-R. Linking CD1-Restricted T Cells With Autoimmunity and Dyslipidemia: Lipid Levels Matter. Front. Immunol. 9, 1616 (2018).
Saliba, A.-E., Vonkova, I. & Gavin, A.-C. The systematic analysis of protein-lipid interactions comes of age. Nat. Rev. Mol. Cell Biol. 16, 753–761 (2015).
Thiam, A. R., Farese, R. V. & Walther, T. C. The Biophysics and Cell Biology of Lipid Droplets. Nat. Rev. Mol. Cell Biol. 14, 775–786 (2013).
Dowhan, W., Mileykovskaya, E. & Bogdanov, M. Diversity and versatility of lipid-protein interactions revealed by molecular genetic approaches. Biochim. Biophys. Acta 1666, 19–39 (2004).
Corradi, V. et al. Emerging Diversity in Lipid-Protein Interactions. Chem. Rev. 119, 5775–5848 (2019).
Coskun, U. & Simons, K. Cell membranes: the lipid perspective. Structure 19, 1543–1548 (2011).
Muller, M. P. et al. Characterization of Lipid-Protein Interactions and Lipid-mediated Modulation of Membrane Protein Function Through Molecular Simulation. Chem. Rev. 119, 6086–6161 (2019).
Liang, B. & Tamm, L. K. NMR as a tool to investigate the structure, dynamics and function of membrane proteins. Nat. Struct. Mol. Biol. 23, 468–474 (2016).
Pabst, G. & Keller, S. Exploring membrane asymmetry and its effects on membrane proteins. Trends Biochem. Sci. 49, 333–345 (2024).
Lee, A. G. How lipids affect the activities of integral membrane proteins. Biochim. Biophys. Acta 1666, 62–87 (2004).
Hanson, M. A. et al. A specific cholesterol binding site is established by the 2.8 Å structure of the human beta2-adrenergic receptor. Structure 16, 897–905 (2008).
Cheng, T.-Y. et al. Lipidomic scanning of self-lipids identifies headless antigens for natural killer T cells. Proc. Natl Acad. Sci. USA 121, e2321686121 (2024).
Huang, S. et al. CD1 lipidomes reveal lipid-binding motifs and size-based antigen-display mechanisms. Cell 186, 4583–4596.e13 (2023).
Szoke-Kovacs, R., Khakoo, S., Gogolak, P. & Salio, M. Insights into the CD1 lipidome. Front. Immunol. 15, 1462209 (2024).
Larsen, A. H., John, L. H., Sansom, M. S. P. & Corey, R. A. Specific interactions of peripheral membrane proteins with lipids: what can molecular simulations show us? Biosci. Rep. 42, BSR20211406 (2022).
Tubiana, T., Sillitoe, I., Orengo, C. & Reuter, N. Dissecting peripheral protein-membrane interfaces. PLoS Comput. Biol. 18, e1010346 (2022).
Scott, J. D. & Pawson, T. Cell signaling in space and time: where proteins come together and when they’re apart. Science 326, 1220–1224 (2009).
Rebecchi, M. J. & Pentyala, S. N. Structure, function, and control of phosphoinositide-specific phospholipase C. Physiol. Rev. 80, 1291–1335 (2000).
Essen, L. O., Perisic, O., Cheung, R., Katan, M. & Williams, R. L. Crystal structure of a mammalian phosphoinositide-specific phospholipase C delta. Nature 380, 595–602 (1996).
Resh, M. D. Covalent Lipid Modifications of Proteins. Curr. Biol. 23, R431–R435 (2013).
Zhou, Y. et al. Lipid-Sorting Specificity Encoded in K-Ras Membrane Anchor Regulates Signal Output. Cell 168, 239–251.e16 (2017).
Sarmento, M. J. et al. The expanding organelle lipidomes: current knowledge and challenges. Cell. Mol. Life Sci. 80, 237 (2023).
Teyton, L. Role of lipid transfer proteins in loading CD1 antigen-presenting molecules. J. Lipid Res. 59, 1367–1373 (2018).
Wong, L. H., Gatta, A. T. & Levine, T. P. Lipid transfer proteins: the lipid commute via shuttles, bridges and tubes. Nat. Rev. Mol. Cell Biol. 20, 85–101 (2019).
D’Anneo, A. et al. Lipid chaperones and associated diseases: a group of chaperonopathies defining a new nosological entity with implications for medical research and practice. Cell Stress Chaperones 25, 805–820 (2020).
Glatz, J. F. C. Lipids and lipid binding proteins: a perfect match. Prostaglandins Leukot. Essent. Fat. Acids 93, 45–49 (2015).
Chiapparino, A., Maeda, K., Turei, D., Saez-Rodriguez, J. & Gavin, A.-C. The orchestra of lipid-transfer proteins at the crossroads between metabolism and signaling. Prog. Lipid Res. 61, 30–39 (2016).
Maeda, K. et al. Interactome map uncovers phosphatidylserine transport by oxysterol-binding proteins. Nature 501, 257–261 (2013).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Zhang, C., Zhang, X., Freddolino, P. L. & Zhang, Y. BioLiP2: an updated structure database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 52, D404–D412 (2024).
Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2013).
Wu, Q. et al. DBLiPro: A Database for Lipids and Proteins in Human Lipid Metabolism. Phenomics 3, 350–359 (2023).
Newport, T. D., Sansom, M. S. P. & Stansfeld, P. J. The MemProtMD database: a resource for membrane-embedded protein structures and their lipid interactions. Nucleic Acids Res. 47, D390–D397 (2019).
Stansfeld, P. J. et al. MemProtMD: Automated Insertion of Membrane Protein Structures into Explicit Lipid Membranes. Structure 23, 1350–1361 (2015).
Adasme, M. F. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 49, W530–W534 (2021).
Salentin, S., Schreiber, S., Haupt, V. J., Adasme, M. F. & Schroeder, M. PLIP: fully automated protein-ligand interaction profiler. Nucleic Acids Res. 43, W443–W447 (2015).
Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminformatics 8, 61 (2016).
Bittrich, S. et al. RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances. J. Mol. Biol. 435, 167994 (2023).
Rose, Y. et al. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive. J. Mol. Biol. 433, 166704 (2021).
UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 36, D190–D195 (2008).
UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Res. 51, D418–D427 (2023).
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Milacic, M. et al. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res. 52, D672–D678 (2024).
Ødum, M. T. et al. DeepLoc 2.1: multi-label membrane protein type prediction using protein language models. Nucleic Acids Res. 52, W215–W220 (2024).
Kim, S. et al. PUG-View: programmatic access to chemical annotations integrated in PubChem. J. Cheminformatics 11, 56 (2019).
Sehnal, D. et al. Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Res. 49, W431–W437 (2021).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Mirdita, M., Steinegger, M. & Söding, J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics 35, 2856–2858 (2019).
Kim, S. et al. PubChem Substance and Compound databases. Nucleic Acids Res. 44, D1202–D1213 (2016).
Zhang, Y. & Skolnick, J. The protein structure prediction problem could be solved using the current PDB library. Proc. Natl Acad. Sci. USA 102, 1029–1034 (2005).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Acknowledgements
The authors acknowledge the computational resources from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by the National Science Foundation. This research was supported in part through research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology, Atlanta, Georgia, USA. A.C.M. acknowledges funding from the Shurl and Kay Curci Foundation under award GR00027161. Y.L. acknowledges funding from the National Institute of General Medical Sciences of the National Institutes of Health under award R35GM150890. L.-Y.Y. acknowledges funds from the Taiwan Ministry of Education Government Scholarship to Study Abroad program.
Author information
Contributions
L.-Y.Y. and A.C.M. conceived, designed, and implemented BioDolphin. L.-Y.Y. and A.C.M. curated and annotated data. L.-Y.Y. and K.P. built the web implementation of BioDolphin. L.-Y.Y., K.P., and A.C.M. wrote the manuscript with feedback from Y.L. Y.L. and A.C.M. supervised the research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Chemistry thanks Joan Segura, Wentao Dai, and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, LY., Ping, K., Luo, Y. et al. BioDolphin as a comprehensive database of lipid–protein binding interactions. Commun Chem 7, 288 (2024). https://doi.org/10.1038/s42004-024-01384-z
DOI: https://doi.org/10.1038/s42004-024-01384-z