Abstract
Vitellogenin (Vg) is the main yolk precursor lipoprotein in almost all egg-laying animals. In addition, along its evolutionary history, Vg has developed a range of new functions in different taxa. In the honey bee, Vg has functions related to immunity, antioxidant protection, social behavior and longevity. However, the molecular mechanisms underlying Vg functionalities are still poorly understood. Here, we report the cryo-EM structure of full-length honey bee Vg, one-step purified directly from hemolymph. The structure provides insights into the overall domain architecture, including the lipid binding cavity and the previously uncharacterized von Willebrand factor type D domain. A domain of unknown function has been identified as a C-terminal cystine knot domain based on structural homology. Information about post-translational modifications, cleavage products, and metal and lipid binding allows an improved understanding of the mechanisms underlying the range of Vg functionalities. The findings have numerous implications for the structure-function relationship of vitellogenins of other species as well as members of the same protein superfamily, which share the same structural elements.
Introduction
Pleiotropy – or the ability of a single gene to affect multiple phenotypic traits – is a common phenomenon but is generally poorly understood at the molecular level. Interesting examples of pleiotropy are found in the large lipid transfer protein (LLTP) superfamily. LLTPs are primarily responsible for the circulatory transport of lipids in animals, with their emergence linked to the increased need for lipid transport associated with multicellularity1. Members of the superfamily include the mammalian apolipoprotein B (apoB), the microsomal triglyceride transfer protein (MTP), vitellogenin (Vg) and insect apolipophorins II/I (apoLp-II/I). Interestingly, some LLTPs have acquired new functionalities along their evolutionary history. Given their circulation in body fluids in relatively high concentrations, these new functions are often linked to immunity. These include involvement in antigen presentation2, blood clotting (reviewed in chapter 2 of Hoeger and Harris3) and a plethora of other functions for apoLp-II/I (reviewed in chapters 4 and 5 of Hoeger and Harris3). In particular, pleiotropy appears to be most developed for the egg yolk precursor lipo-glyco-metallo-phosphoprotein Vg.
Vg is present in almost all egg-laying animals and has been traditionally studied as a female-specific protein in the context of vitellogenesis4. During vitellogenesis, Vg synthesis by somatic cell lineages is boosted and the protein is released into circulation, from where it is internalized by the oocytes as the main precursor of yolk proteins. Thus, Vg provides the developing egg with amino acids, ions, lipids and fat-soluble vitamins and hormones. In the last 20 years, data about the immune functions of Vg have emerged in taxa as different as corals5, mollusks6,7, arthropods8,9,10 and fishes11,12,13,14,15,16,17. Vg has been found to have antibacterial5,6,7,8,9,11,12,13,14,15,17,18 and antiviral16 activities. Vg achieves these by recognizing a range of pathogen-associated molecular patterns (PAMPs)6,18, directly causing the death of the pathogen6,12 or opsonizing for phagocytosis by immune cells5,11,17,18,19. Interestingly, it has been found that trans-generational immune priming can occur through Vg10, suggesting a link between Vg immune and reproductive functions. Vg has also been shown to recognize membranes20,21 and protect from oxidative stress through different mechanisms21,22,23. Additionally, in insects, Vg has been found to regulate and functionally interact with different hormones24,25. For social insects such as bees and ants, Vg governs social roles for sterile workers as a function of nutritional and metabolic status, while controlling the highly variable lifespan of different castes25,26,27,28,29, most likely through its ability to protect from oxidative damage. More recently, it has been suggested that a cleavage product of Vg can translocate to the nucleus and thus regulate gene expression30, hinting at a possible mechanism by which Vg can exert some of its various functions. However, there is limited understanding of the molecular basis for the functional data regarding Vg pleiotropy.
Structurally, Vg is characterized by a lipid binding module common for the LLTP superfamily1. At the turn of the millennium, a crystal structure of lipovitellin (Lv), the proteolytically processed Vg product obtained from silver lamprey eggs (Ichthyomyzon unicuspis), provided a glimpse of the LLTP lipid binding module, with particular insight on the Vg lipid binding cavity31,32 and its reproductive role as a nutrient source for the developing embryo. However, the structure covered only about 75% of the Vg sequence, with entire domains missing and several flexible stretches. In addition to lipovitellin from silver lamprey (IuLv), the LLTP lipid binding module was experimentally observed within the crystal structure of human MTP, showing a much smaller lipid binding cavity that adapts to the role of the protein, which is not a transporter but a lipid-loading protein for other LLTPs33. Recently, cryo-EM also allowed the reconstruction of human MTP from native raw liver lysate34. Among vitellogenins, a wide range of structural variation is produced by taxa-specific loops and domain additions35. Additionally, different proteolytic cleavage events generate separate chains35,36 that may or may not keep working together as a single unit, as observed for IuLv31. In silver lamprey lipovitellin, some domains are missing from the lipid binding module, which is formed by separate protein chains. This molecular complexity is further increased by the presence of splicing variants9, multiple non-identical gene copies in a single species and the existence of Vg-like proteins in some species. Pleiotropy through neo- and sub-functionalization in different taxa seems to be an intrinsic characteristic of Vg evolutionary history37.
Much of the research around Vg pleiotropy has been carried out in the honey bee (Apis mellifera and Apis cerana). These widely distributed species highlight the importance of Vg for animal health, with extensive ecologic and economic implications. Here, we report the 3.2 Å resolution cryogenic electron microscopy (cryo-EM) structure of honey bee Vg (AmVg) purified from its hemolymph, the first structure from a non-vertebrate species with nearly full-length coverage. This includes the von Willebrand factor type D (vWD) domain, which is also present in some other LLTPs1, has no known function and whose structure has not been reported before for any protein of the LLTP superfamily. We also identified a putative dimerization site in the C-terminal domain, which we classify as a C-terminal cystine knot (CTCK) domain. In addition, the structures reported herein provide insight into vitellogenin post-translational modifications and binding to metals and lipids. The structure of a previously uncharacterized cleavage product with unclear biological significance was also solved at 3.0 Å. All these findings represent a leap forward in our understanding of the multiple molecular mechanisms that underlie Vg pleiotropy in general and for the honey bee in particular. Many of the structural elements and domains of vitellogenin described here are shared by other members of the LLTP superfamily.
Results
The cryo-EM structures of vitellogenin from a native source
In order to better understand the molecular mechanisms that allow AmVg to exert its many functions, we determined the cryo-EM structure of AmVg purified from hemolymph (Supplementary Fig. S1) of the honey bee to 3.2 Å (Fig. 1, Supplementary Table S1). The sample was heterogeneous and contained the full-length protein along with a roughly 150 kDa AmVg cleavage product at an abundance similar to that of the full-length protein. Particles of the cleavage product yielded maps of a resolution of 3.0 Å. For both particle classes, AmVg was observed as a monomer, and there was no evidence for dimerization. A general sequence and structure comparison of AmVg with other LLTP structures is shown in Fig. 2.
a Domain architecture of full-length Vg from Apis mellifera (AmVg). Different domains and subdomains are coloured differently, while flexible regions not observed in the experimental structure are shaded in grey. Disulfide bonds are also shown (a detailed discussion on disulfide bond modelling can be found in the Supplementary Fig. S3). b Cryo-EM density map of native AmVg at 3.2 Å resolution in different orientations, coloured and labelled by domain and subdomain. c Ribbon model of the AmVg structure coloured by domain and subdomain as in panels (b) and (a).
a Domain architecture of full-length Vg from the honey bee (AmVg), with a second representation showing flexible regions not observed in the cryo-EM structure shaded in light grey. The domain architecture of the cleavage product of AmVg we observed in our sample is also shown. Below, the domain architecture of Vg from silver lamprey (IuVg) and its cleavage product Lv (IuLv) are shown. Regions not observed experimentally due to flexibility are shaded light grey for the crystal structure of IuLv32. Finally, the domain architecture of human MTP (HsMTP) for which the crystal structure has been reported is also shown33. b Atomic models of all experimentally solved LLTPs, coloured by domain and subdomain. The backbone root mean square deviation (RMSD) values of each structure compared to the full-length AmVg are shown.
Multiple sequence alignments
The structural data were complemented with multiple sequence alignment (MSA) of homologous sequences, providing information on the conservation of residue positions. This gave insight into the conservation of different structural elements as well as confidence in their presence when the cryo-EM density was not conclusive. Two different alignments were performed, one including homologues only from arthropods and a broader one including vertebrate vitellogenins (Supplementary Fig. S2).
The lipid binding module
As expected, AmVg contains an LLTP lipid binding module, which is characterized by several subdomains: the N-sheet, responsible for receptor binding38,39; the lipid binding cavity itself, formed by the A- and C-sheets; and the α-helical subdomain that wraps around the A- and C-sheets. The N-sheet is found at the N-terminus and formed by an antiparallel β-sheet wrapped around a central α-helix. The sheet is one strand short of forming a barrel and has strands of very different lengths, allowing for an overlap between the N-sheet and the A-sheet from the lipid binding cavity to form a β-sandwich, as already observed for IuLv31. The N-sheet contains loops of varying length, some of which contain short helices. In one of the loops, a disulfide bridge (C178–C222) conserved in IuLv and human MTP (HsMTP) seems to stabilize a short β-strand that integrates with the A-sheet. Interestingly, after this short β-strand, density is not observed for the rest of the loop (residues 232–245), most likely due to flexibility. The equivalent loop is well resolved for IuLv, containing a second disulfide not conserved for honey bee Vg. This loop might be stabilized upon zinc binding in AmVg, as discussed later. The long and solvent-exposed section at residues 147–160 is only well-resolved for the AmVg cleavage product, where it forms a short helix.
The region comprising residues 340–384 between the N-sheet and the α-helical domain corresponds to a polyserine region (polyS) that is characteristic of insect vitellogenins35. This region was studied before using nuclear magnetic resonance (NMR) spectroscopy and shown to be highly disordered, with protease binding sites and multiple phosphorylated serine residues preventing its cleavage40. Not surprisingly, cryo-EM density for this region is not observed for AmVg. However, at high contour levels, poorly defined densities can be observed next to the protein at the interface where the loop is found. From the ordered regions of the protein, basic side chains point towards the poorly defined cryo-EM densities (H20, K264, H265, K112, H113, K601 and H602), suggesting potential ionic interactions with the phosphorylated residues in the flexible polyserine region.
The α-helical subdomain is formed by 17 long α-helices arranged in two layers forming a super-helical right-handed coil. The helices are parallel within each layer and antiparallel between layers. The interface between layers is hydrophobic while the solvent-exposed surface is highly positively charged containing up to 34 positively charged amino acids (Fig. 3e). The highly charged surface of the subdomain has been associated with membrane binding21. Some of the positively charged amino acids contribute to stabilizing the subdomain by means of salt bridges (E435–K450, R498–E526, K514–E551, R547–E609, E617–R629, R664–E695 and R700–D727). Here, some loops between helices are significantly longer than for IuLv. These include the stretch before the domain starts that contains two short helices (residues 387–421), the loop between long helices 3 and 4 (residues 473–488) and the insect-specific loop21 containing two short helices likely involved in zinc binding found between helices 9 and 10 (residues 581–604).
a, b Unmodelled partial lipid densities in the interior and at the edge of the lipid binding cavity contoured at 7σ. c Phospholipid head group identified at the edge of the lipid binding cavity contoured at 7σ. d Glycosylation at N296. The quality of the density allows the identification of the N-acetyl groups of the first two N-acetylglucosamine units and the identification of up to three more mannose units at lower contour levels. Density shown for the higher-resolution cleavage product of AmVg contoured at 1σ. e Solvent-facing view of the α-helical domain with the side chains of basic amino acids coloured in blue, which have been associated with the ability of Vg to bind membranes21.
The structure of the C-sheet is similar to that of the IuLv C-sheet. The helix-containing loop corresponding to residues 842–873 is involved in inter-domain stabilization by wrapping around the vWD domain. This loop is flexible in the absence of the vWD domain, as is the case for IuLv and the AmVg cleavage product we observed. The A-sheet is concave-shaped and significantly larger than the C-sheet. The A-sheet defines most of the lipid binding cavity. Interestingly, the region of the A-sheet closest to the front (as defined in Fig. 1) of the lipid binding cavity has a different configuration in AmVg compared with IuLv. In IuLv, a semi-disordered and highly polar glutamine-rich mini domain of unknown function31 is present in the region, creating significantly different openings to the lipid binding cavity. At the top of the lipid binding cavity, vertebrate vitellogenins have a loop corresponding to the phosvitin chain. This serine-rich region, similar but not functionally equivalent to the polyserine region in insect vitellogenins, is released by proteolytic cleavage and not observed associated with IuLv31. In the loop where phosvitin is observed for the IuLv structure, AmVg contains a long helix with non-structured stretches at both ends that sit on top of the A-sheet (residues 1144–1189). Compared with IuLv, the loops extending towards the vWD domain are longer. The loop comprising residues 1291–1336 even contains three α-helices with their positions stabilized by a disulfide bond (C1310–C1324). These loops, together with slightly longer β-strands, effectively decrease the exposure of the lipid cavity to the solvent and allow an increased interaction surface between the vWD domain and the A-sheet. The A-sheet is capped off by a long α-helix that establishes the edge of the lipid binding cavity and is conserved for lamprey Lv.
The A- and C-sheet are connected by an α-helix running perpendicular to the sheets that establishes the back of the lipid binding cavity (darker orange in Fig. 1a).
The vWD and CTCK domains
At the C-terminal end of the lipid binding module, we observe the vWD domain (Fig. 4d). It is connected to the lipid binding module through a flexible region that shows no density in our maps (residues 1411–1432). The vWD domain sits at the top-back of the lipid cavity opening between the A-sheet and the C-sheet. It is characterized by a β-sandwich, with a long loop (residues 1573–1492) involved in interaction with the C-sheet and a long, mostly unstructured region at the C-terminal end including 3 short helices. The vWD domain is stabilized by a range of disulfide bonds that are mostly conserved in its functionally unrelated homologues studied in humans, including mucins and the von Willebrand factor (vWF) (C1444–C1598, C1466–C1634, C1615–C1650)41,42 (Supplementary Fig. S3). In mucins and vWF, the vWD domain is found with the accompanying modules C8-3 and trypsin inhibitor-like (TIL), responsible for homodimerization stabilized through intermolecular disulfide bonds (Fig. 4a). Importantly, in all vitellogenins, the C8-3 and TIL3 modules that accompany the vWD domain in mucins and vWF are absent (Fig. 4d). Furthermore, all cysteines in the AmVg vWD domain are involved in intramolecular disulfide bonds. Therefore, no cysteines are available to form additional intermolecular disulfide bonds.
a Structures of vWD (including the accompanying modules C8-3, TIL3 and E3) and CTCK domains of the human von Willebrand factor (HsvWF)42. Intramolecular disulfide bridges are shown as yellow sticks and cysteines involved in intermolecular disulfide bonds are shown as dark spheres. b Schematic representation of how such domains allow the formation of long concatemers through intermolecular disulfide bonds is also shown. In the vWF such concatemers achieve specific mechanical properties relevant for biological function. Equivalent interactions have been observed for mucins59,82. c CTCK domain of the human hormone chorionic gonadotropin (HsCG)60. This CTCK domain can form stable dimers without stabilizing intermolecular disulfide bonds. Intramolecular disulfide bonds are shown as yellow sticks and regions involved in dimerization shown as yellow stretches. d Experimental structure of the vWD domain and AlphaFold 2 prediction of the CTCK domain from AmVg. No cysteines are available for the formation of intermolecular disulfide bridges as in the case of vWF or mucins. Vg could achieve dimerization through non-covalent interactions between the CTCK domains, as in the case of chorionic gonadotropin (HsCG) CTCK domain (panel c). The PDB IDs of the relevant structures presented are shown in a light purple box.
No density is observed for AmVg after the last disulfide bond (C1615–C1650), where the protein is predicted to have a small C-terminal domain of unknown function linked to the vWD domain through a long (35 residue) flexible linker (Supplementary Fig. S4). The C-terminal domain has been suggested to participate in gating the lipid binding cavity43. Structural alignment based on the AlphaFold 2 prediction of AmVg for the C-terminal domain using Foldseek44 shows it is a C-terminal cystine knot domain (CTCK) (Fig. 4d). CTCK domains are often found in vWD domain-containing proteins such as mucins and the vWF (Fig. 4a). In these proteins, CTCK domains stabilize homodimerization through intermolecular disulfide bonds. The combination of homodimerization through vWD and CTCK domains leads to the formation of long concatemers for mucins and the vWF (Fig. 4b)41,45.
Glycosylation
Conservation of glycosylation among different vitellogenins is generally low. Of the three N‑glycosylations predicted for honey bee Vg from sequence analysis46, only one glycosylation has been identified: a covalently linked carbohydrate at N296 in the N-sheet (Fig. 3d). Vg N-linked glycans contain the fundamental unit of high-mannose oligosaccharides from both vertebrates and invertebrates, with varying amounts of glucose residues at the ends47,48,49. The exact identity of the glycan tree is unknown, but the identity of the residues modelled at the base of the tree is clear based on the cryo-EM density, matching the NMR studies by Osir et al.48. Our structures include the first two N-acetyl glucosamines (GlcNAc) as well as three mannose residues. The rest of the residues are flexible without cryo-EM map density. The base of the glycosylation tree is stabilized by long structured loops of the N-sheet. The loops are not present in the vertebrate IuLv structure, where no equivalent asparagine residue that could be subject to N-glycosylation can be found at the surface of the N-sheet. Indeed, N296 is only partially conserved in the strict MSA, showing this specific glycosylation is present only in some arthropods.
Lipid binding
Vitellogenins contain around 16% of their mass in lipids32,49. These are mostly, but not only, phospholipids. The binding is generally non-specific, with the lipid binding residues showing very low conservation (Supplementary Fig. S2)32,33. The headgroup of only one phospholipid can be modelled in AmVg, while small density blobs likely belonging to part of lipid chains are observed in different parts of the lipid binding cavity (Fig. 3a–c). The unmodelled density blobs (Fig. 3a, b) are found in small, non-polar and non-conserved sub-cavities; thus, it is unlikely they belong to unidentified post-translational modifications or specific ligand binding sites. The general lack of lipid cryo-EM density in AmVg reflects the generally disordered and dynamic nature of lipid interactions under native conditions. The lipid binding cavities of AmVg (33,506 Å³) and IuLv (33,179 Å³) have similar volumes, although the typical funnel shape of the IuLv cavity is not conserved for AmVg, with the narrow part of the funnel being completely cut off from solvent access. The shape of the cavity for the cleavage product of AmVg is similar to that of the uncleaved protein, while the volume (16,168 Å³) is significantly smaller as a result of a shorter A-sheet that is wrapped around itself. The polarity of the lipid binding cavity is also similar between AmVg and IuLv. Positively charged residues at the base of the cavity are thought to interact with phospholipid headgroups while the less polar sides and top of the cavity are able to accommodate non-polar lipids, as was originally described for IuLv32. The most obvious difference between the lipid binding cavities of AmVg, its cleavage product and IuLv is their accessibility and solvent exposure, as described later in the section on the cleavage product of AmVg.
Our analysis of the dynamics of AmVg using cryo-EM data processing methods (described in the methods section) did not yield an indication of large opening or closing motions for the lipid binding cavity.
In IuLv, lipids not only bind in the main lipid cavity but also in a small cavity in the N-sheet subdomain. In the case of AmVg, no such cavity is observed as the equivalent region contains extra protein density. Such density corresponds to the longer linker between the polyserine region and the α-helical domain.
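To put the reported cavity volumes in proportion, the short sketch below expresses each one as a fraction of the full-length AmVg cavity. It uses only the three volumes quoted in this section; the dictionary and function names are illustrative and are not part of the authors' analysis.

```python
# Lipid binding cavity volumes in cubic angstroms, as quoted in the text.
VOLUMES_A3 = {
    "AmVg full-length": 33_506,
    "IuLv": 33_179,
    "AmVg cleavage product": 16_168,
}


def relative_volume(name, reference="AmVg full-length"):
    """Return the cavity volume of `name` as a fraction of the reference cavity."""
    return VOLUMES_A3[name] / VOLUMES_A3[reference]


if __name__ == "__main__":
    for name in VOLUMES_A3:
        print(f"{name}: {relative_volume(name):.2f} x full-length AmVg")
```

On these numbers the IuLv cavity is within about 1% of the AmVg cavity, while the cavity of the cleavage product is roughly half the size.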
Putative metal ion binding sites
Vg is known to provide zinc ions for developing oocytes and the metal ions have been suggested to play a role in its antioxidant activities21. The number of zinc ions carried by Vg varies in different species50,51,52, with histidine residues known to play a central role in zinc binding53. For the honey bee, it was recently determined that an average of 3.5 zinc ions bind to each Vg molecule54. Importantly, no metal chelators were used during the purification of AmVg from honey bee hemolymph and no divalent cations were included in the buffer during purification. We identified three putative binding sites in our structures (Fig. 5); however, there is no clear density for metal ions at most of these positions. Zinc has been modelled in all three sites to analyse binding distances and geometry (Supplementary Fig. S5), although conclusions are difficult to draw given the limited resolution of the reconstructions and the incomplete coordination spheres, which likely point to low occupancy. Further information on the binding of metals to AmVg can also be obtained from the latest release of AlphaFold (ref. 55) (Supplementary Fig. S6) and conservation data, including the occurrence of histidine pairs in the sequences analysed in the MSA (Supplementary Fig. S2).
The sites are all found at the interface between the N- and A-sheets and the α-helical domain. Arrows point to the ends of the loop not observed between residues 232–245 that might be stabilized upon zinc binding. Panels a–c show the three putative binding sites in detail and the evidence supporting them as actual zinc binding sites. Densities are shown at a contouring level of 7σ. Spherical density can be observed between histidine side chains for sites 2 and 3, while density for a third ligand (likely water) can be observed in site 2. The densities shown here represent the higher resolution map obtained for the AmVg cleavage product. *Percentage of sequences in the strict MSA where the histidine pair is conserved versus all the sequences where at least one histidine in the pair is conserved (Supplementary Fig. S2).
Site 1 is in the loop of the N-sheet that forms a short β-strand and interacts with the A-sheet, involving residues H229 and H926, where the two histidine side chains are in close proximity (Fig. 5a). Although there is no clear density for a metal ion, the flexible loop from residues 232–245 may be stabilized by the presence of a zinc ion, where E239 would contribute to the coordination of the cation. Site 2 is very close to site 1 and formed by H587 and H593, in an insect-specific loop21 of the α-helical domain that contains two short helices (Fig. 5c). The helices are oriented such that the histidine side chains are positioned to coordinate a metal cation. The cryo-EM density of the cleavage product of AmVg seems to indicate the presence of a coordinating water next to the poor density for the putative zinc ion. No other residues in the proximity seem to be available to complete the coordination sphere of a zinc cation, although AlphaFold 3 predicts residue S376 to be part of the coordination sphere (Supplementary Fig. S6). S376 is not observed experimentally and assumed to be flexible. Site 2 is the only site predicted in the AlphaFold 3 model as a zinc-binding site, with pLDDT score for the ion of 78.24. Site 3 is formed by H990 and H1045 and again seems to involve only two histidine side chains, although there is density between the side chains likely indicating the presence of a cation (Fig. 5d). All putative sites described are solvent exposed, either at the protein surface or in an aqueous cavity easily accessible through loop motion (Site 3). Given the incomplete coordination spheres and the quality of the cryo-EM densities we have not included any metal cations in our models. None of the putative sites herein described are conserved for lamprey Lv, while the residues involved in the different sites show only moderate or low conservation scores in the strict MSA with arthropod sequences. 
Data regarding the number of sequences where both histidine residues are conserved as a pair show the highest confidence for site 2.
An additional and highly conserved Ca2+ binding site is believed to be part of the vWD domain based on sequence alignments with known structures of functionally unrelated homologues56, conservation data among Vgs (Supplementary Fig. S2) and the AlphaFold 3 prediction for AmVg (pLDDT score for the ion of 65.15). The quality of the cryo-EM density in the area, however, is not sufficient to draw any conclusions regarding metal binding, although the position of the protein backbone allows ion coordination as predicted (Supplementary Fig. S6).
The AlphaFold predictions for AmVg
Since its development, AlphaFold 2 (ref. 57) has revolutionized the field of structural biology. For honey bee Vg, the AlphaFold 2 structure prediction is surprisingly accurate (Supplementary Fig. S4). AlphaFold 2 was able to correctly predict the overall domain organization of the protein, including the previously unknown position of the vWD domain with respect to the lipid binding module. In addition, it predicted the fold of individual domains and long structured loops with astonishing accuracy. The root mean square deviation (RMSD) value of the main chain comparing our cryo-EM AmVg structure and the AlphaFold 2 model is 2.21 Å. The AI-generated model also provides information about the C-terminal domain (CTCK), which is not observed in our experimental structure. However, a lipoprotein rich in post-translational modifications such as Vg also highlights the current limitations of AlphaFold 2. No information about cleavage products, metal binding sites (including side chain conformations and densities for the metals) or glycosylations can be obtained from the AlphaFold 2 model. The recent development of AlphaFold 3 (ref. 55) addresses some of these limitations and provides extra insight into the binding of metals by AmVg. The AlphaFold 3 model for AmVg was generated using the AlphaFold Server (alphafoldserver.com). When the desired metal ions are manually added as input for the prediction, the model predicts a calcium ion in the vWD domain that is not observed experimentally and provides extra support for zinc binding in one of the binding sites, as discussed in the previous section. Compared with the cryo-EM structure, the AlphaFold 3 model shows a main-chain RMSD of 1.74 Å, an improvement over AlphaFold 2.
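For readers unfamiliar with the metric, the main-chain RMSD values quoted above are the root of the mean squared distance between equivalent atoms in two structures. The following is a minimal sketch of that calculation. It assumes the two coordinate sets are already superposed and matched atom by atom; the function name and toy coordinates are hypothetical, and the superposition and atom selection used by the authors are not described in this section.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two matched, pre-superposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy example: two atoms, each displaced by 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, reference))
```

In practice the superposition step (for example a least-squares fit of the backbone atoms) is carried out first by a structural biology package, and the RMSD is then evaluated on the fitted coordinates.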
The presence of a 150-kDa cleavage product of AmVg
During cryo-EM data processing, we also determined the structure of a cleavage product of AmVg that was as abundant as the full-length protein in the grids. A 150-kDa cleavage product has been described previously in samples obtained from honey bee hemolymph46,49. However, most of the cleavage of AmVg occurred after purification, with the presence of the cleavage product increasing over time as observed with denaturing polyacrylamide gel electrophoresis (Supplementary Fig. S1). For regions that are common to the cleavage product and the full-length protein, the fold is essentially identical, with a main-chain RMSD of 0.65 Å. The biggest deviations are close to the cleavage site, where the A-sheet of the cleavage product is substantially more wrapped around itself, effectively decreasing the volume of the lipid binding cavity. Although smaller, the lipid binding cavity is almost intact, with density up to residue 1276 in the A-sheet and some arguable evidence at low contouring levels for some residues of the next β-strand being present (residues 1285–1292) (Fig. 6b). Therefore, a small part of the A-sheet and the whole vWD domain are missing in the cleavage product of AmVg. This effectively increases the accessibility of the lipid cavity when compared with full-length AmVg, both at the front and at the top-back (Fig. 2b). Interestingly, through loss of the vWD domain the cleavage product displays a domain composition more similar to that of lipovitellin.
a Location of possible cleavage sites of AmVg in its structure and sequence. Possible motifs for cleavage are coloured grey, with basic residues shown in ball-and-stick representation and disulfide bonds coloured yellow. Numbers have been given to the cleavage sites and letters to the β-strands in the cleavage area. The secondary structure of the segments is indicated under the sequence. b Density around the cleavage motifs for the cleavage product at two contour levels representing different confidence levels. Strands a and b are clearly visible at normal contour levels (7σ). What could be residual density for strand c (shown as a dashed orange line) is visible only at low contour levels (1σ). The positions of the different possible cleavage motifs are also shown. Map quality around the cleavage area is quite poor and some regions are modelled as alanine sidechains except for glycine and proline residues. More details regarding the areas modelled as poly-alanine can be found in the Supplementary Information.
Insect vitellogenins are known to be processed post-translationally by subtilisin-like proteases recognizing the motif R/KXXR/K35. However, no evidence for cleavage at such motifs has been described previously for the suborder Apocrita, to which bees and ants belong. Honey bee Vg contains several such motifs, any of which could constitute the cleavage site. One of these motifs (Motif 1) is found where the density for the cleavage product ends near residue 1276 (Motif 1: RYGK, residues 1274–1277). A second motif can be found immediately after (Motif 2: KGER, residues 1281–1284), although no density is observed for the residues between the two motifs. The residues corresponding to these two motifs are at the beginning and end of a short loop in the A-sheet (Fig. 6a). Interestingly, the only disulfide bond in the β-strands of the A-sheet of the full-length AmVg is found stabilizing this short loop (C1242–C1279). The disulfide bond is conserved in IuLv but is not observed at all in the cleavage product of AmVg. However, weak density for the β-strand immediately adjacent could be evidence that a partly flexible stretch of amino acids is still present. Two overlapping protease-recognition motifs in a loop shortly after are therefore also cleavage site candidates (Motif 3: RGNK, residues 1316–1319 and Motif 4: KILR, residues 1319–1322). Unfortunately, because Vg cleavage patterns are specific to each species, conservation data are not useful for gaining further insight into putative cleavage motifs.
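As an illustration of how candidate sites of this kind can be listed, the following minimal Python sketch scans a sequence for the R/KXXR/K pattern and reports overlapping occurrences as well. The function name `find_dibasic_motifs`, the `offset` parameter and the toy fragment are assumptions made for illustration only; this is not the procedure used in this work, and the toy fragment is not the AmVg sequence.

```python
import re

# R/KXXR/K: a basic residue (R or K), any two residues, then another basic residue.
# The lookahead reports overlapping motifs (for example RGNK followed by KILR,
# which share one lysine) as two separate hits.
MOTIF = re.compile(r"(?=([RK]..[RK]))")


def find_dibasic_motifs(sequence, offset=1):
    """Return (start_position, motif) pairs, with positions numbered from `offset`."""
    return [(m.start() + offset, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy fragment containing two overlapping motifs.
toy_fragment = "AARGNKILRAA"
for position, motif in find_dibasic_motifs(toy_fragment):
    print(position, motif)
```

Setting `offset` to the residue number of the first residue in a fragment would report hits in full-length numbering.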
Discussion
The cryo-EM structures presented herein represent a non-vertebrate vitellogenin and a full-length LLTP containing a vWD domain. Furthermore, due to the native source of the protein, we have been able to identify post-translational modifications, metal binding sites and a cleavage product of AmVg. However, much remains unknown about the molecular details of AmVg.
The function of the vWD domain and the C-terminal domains of vitellogenins and other LLTPs remains elusive. Recently, copper binding was reported for the vWD domain D1 of human mucin 258. However, the residues involved in copper coordination are not conserved in AmVg. As presented in the Results section, proteins containing vWD and CTCK domains are able to form extensive concatemers through dimerization and subsequent linkage by disulfides both close to the N-terminus (vWD domain and accompanying modules) and the C-terminus (CTCK domains) (Fig. 4b). Such covalent concatemers confer mechanical stress resistance to secreted mucins and vWF in blood, which is required for their functions41,59. However, AmVg lacks the cysteine residues that are involved in intermolecular disulfide formation, both within the vWD domain56 and in the CTCK domain identified in this work (Fig. 4d). Thus, AmVg contains the two domains required for formation of concatemers but lacks the main structural features needed for their stabilization. It cannot be ruled out that there are inaccuracies in the AlphaFold 2 prediction of the CTCK domain and that some cysteines predicted to be involved in intramolecular interactions might in fact be available for intermolecular interactions. However, some cystine knot containing cytokines such as chorionic gonadotropin (CG) form dimers without any intermolecular disulfide bond stabilization (Fig. 4c)60. Vg is known to be able to form dimers56, although their biological significance is unknown. The CTCK domain is therefore a strong candidate for a dimerization domain for vitellogenin. Dimerization through the CTCK domain would also explain why no evidence of dimerization has been observed during cryo-EM data processing. In such a hypothetical dimer (Fig. 4d), the long flexible linker between the two dimerized CTCK domains and the rest of Vg would never generate a single conformation that can be averaged in 2D or 3D classification. The two Vg molecules would then be resolved only separately and without density corresponding to the CTCK domain, as is the case.
In our experimental structures, none of the putative Zn2+ binding sites shows a complete coordination sphere. We only observe clear cryo-EM density for the pairs of histidine side chains involved in metal binding. There is, however, density that seems to correspond to a metal cation in two of the sites and density for a third coordinating group, likely water, in one of the sites. Therefore, AmVg likely binds Zn2+ with low affinity and some of the bound metals were stripped off during purification. Loose binding of Zn2+ might be desirable for the specific needs of Vg as a transporter, where the zinc ions need to be able to dissociate easily when required in the oocyte. Low affinities for metal ions might also be statistically compensated for by the presence of several binding sites. The presence of multiple binding sites might also allow the evolutionary plasticity required for the protein to retain its Zn2+ binding functions in different species without specific binding sites being highly conserved. In addition, keeping a low number of coordinating residues might allow the binding not only of Zn2+ but also of other metal cations.
The biological relevance of the cleavage product of honey bee Vg remains to be established. A 150-kDa cleavage product has been described previously for samples obtained from honey bee hemolymph46,49. A 150-kDa AmVg fragment has also been identified as more abundant in samples purified from the fat body of the insect46, compared with those from hemolymph. The fragment was named fat-body vitellogenin (fbVg) and was described as lacking the N-sheet subdomain based on a lower concentration of matching hits in the N-sheet from liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS)46. However, in the same study fbVg was shown to contain N-linked glycosylations46, which our cryo-EM study only observed in the N-sheet subdomain, which supports the view that fbVg contains the N-sheet. The cleavage fragment of AmVg observed in this study could correspond to fbVg, which could thus be the 150-kDa fragment that co-purifies at low abundance together with full-length AmVg obtained from honey bee hemolymph. The relative abundance of the cleavage product could increase over time in storage through the action of small amounts of proteases remaining after purification. The exact cleavage site remains unknown, with all the motifs presented in the Results section as possible candidates. The AmVg cleavage product described here has its lipid binding cavity almost intact and more accessible than that of full-length AmVg. If a biological significance is demonstrated, it is tempting to assume that the fragment is mostly involved in lipid-binding-related functions such as storage of lipids in the fat body. Since the cleavage fragment is also present in the hemolymph46,49, it could also be involved in vitellogenesis. The cleavage product contains all the structural elements related to vitellogenesis functionalities. These include receptor binding for internalization in the oocyte, lipid and full zinc binding capabilities for use as a nutrient source and PAMP recognition for trans-generational immune priming. Remarkably, the cleavage product seems to be structurally analogous to the type-C vitellogenins from fish, which lack the vWD and the CTCK domains (named β-component in vertebrate vitellogenins) as well as the phosvitin chain, which is not present in invertebrate Vgs36. The function of fish type-C vitellogenins and their role in vitellogenesis, if any, remain unknown.
An alternative interpretation is that the 150 kDa AmVg fragment from the hemolymph and fbVg are different, though of similar size. Indeed, protein product formation can proceed differently in intracellular and extracellular compartments that have different biochemical environments61,62. By such mechanisms, AmVg could be processed differently in hemolymph and fat body cells. For instance, different cleavage products have been identified in the fat body and eggs of a cockroach35. A sequence analysis of insect vitellogenins identified cleavage motifs (R/KXXR/K) at both the N- and C-termini, often in the same molecule35,63. Glycosylation is a modification that affects the availability of protease cleavage sites63, and more than one potential glycosylation site has been identified for AmVg46. Glycosylation, moreover, is highly variable and non-permanent64, changing with respect to time, tissue, subcellular location65 and organismal health status64,66, as well as with protein subtypes67. Thus, it cannot be ruled out that AmVg has more than one cleavage product and glycosylation variant. The 150 kDa fragment observed in hemolymph could therefore be an alternative cleavage product, distinct from fbVg46.
The relevance of Vg as a target for AI-based technologies to predict protein structures is highlighted by an article where it was used to exemplify the upcoming revolution in structural biology68. Although some important features of such a complex protein are missing in the AlphaFold predictions, the quality of the AlphaFold 2 model compared with our experimental structures shows high accuracy for the structure prediction of species-specific structural features of Vg. This suggests that AlphaFold is useful for structural phylogenetic studies of Vg and other Vg-like proteins. The presence of certain domains and subdomains, together with their architecture and functional data can easily provide important information on neo-functionalization. AI-based structure prediction for vitellogenins will become even more relevant after the advent of AlphaFold 355, given its reported ability to predict protein structures with non-protein ligands such as metal ions, lipids and covalent glycosylation. Vitellogenin from the honey bee is a target in the ongoing CASP 16 competition (target T1210).
Overall, the structures described herein significantly advance our understanding of vitellogenins and their molecular complexity, both for egg-laying animals in general and for invertebrates and the honey bee in particular. Insight has been obtained into AmVg domain arrangement for the previously uncharacterized vWD domain as well as for several structural features not observed in the vertebrate IuLv structure. The molecular complexity of vitellogenins is the basis of their pleiotropy, with taxa-specific variations underlying the different neo- and sub-functionalizations the protein has developed over time in widely different animal groups. Furthermore, valuable information about the honey bee specific glycosylation profile, metal binding abilities and cleavage products has been obtained. The results presented here provide the molecular basis for further in-depth investigation of the particular details of Vg functionalities and binding abilities, including comparative studies among multiple taxa and for other related LLTPs.
Methods
Ethical statement
No specialized ethics approval is needed for the collection of hemolymph. Laws and regulations for research animals in Norway (Forskrift om bruk av dyr i forsøk) and the animal welfare act (Dyrevelferdsloven) were followed.
Protein extraction and purification
To obtain purified AmVg, 1–10 µL of hemolymph were collected from naturally wintering honey bees (Apis mellifera carnica) using BD needles (30 G), as described earlier69, and diluted 1/10 in 0.5 M Tris/HCl pH 7.6. The diluted hemolymph was filtered using a 0.2 µm syringe filter. Vg was then subjected to ion-exchange chromatography using a HiTrap Q FF 1 mL column equilibrated in 0.5 M Tris/HCl pH 7.6 and eluted with a NaCl gradient in 0.5 M Tris/HCl pH 7.6 up to 0.45 M NaCl (Supplementary Fig. S1). For each run, 400–450 µL of diluted hemolymph were manually injected, and Vg eluted at a conductivity of 15–22 mS·cm−1. Fractions were collected, pooled and concentrated using an Amicon Ultracel 100 kDa membrane centrifuge filter (Merck KGaA, Darmstadt, Germany). Fraction purity was verified by SDS-PAGE (Supplementary Fig. S1). Protein concentration was measured with Qubit.
Cryo-EM grid preparation and data collection
Cryo-EM grid preparation and data collection were carried out at the Cryo-EM Swedish National Facility at SciLifeLab in Stockholm according to standard procedures. Grids were prepared and plunge-frozen in liquid ethane using an FEI Vitrobot Mark IV. AmVg at a concentration of 1.2 mg/mL was applied to a set of different grids with different treatments in order to overcome the preferential orientation problem we had observed previously for the sample. The grids included Quantifoil R2/1, R3.5/1 and R3/5 holey carbon on 300-mesh copper grids and R1.2/1.3 holey carbon on 300-mesh gold grids. Grids were either left untreated or glow-discharged in the presence or absence of pentylamine. Two datasets that led to final reconstructions were collected: one from a Quantifoil R2/1 holey carbon 300-mesh copper grid that was glow-discharged in the absence of pentylamine, and one from a Quantifoil R1.2/1.3 holey carbon 300-mesh gold grid that was not glow-discharged. Using the EPU control software, 28,406 movies were recorded on an FEI Titan Krios operating at 300 kV and equipped with a K3 BioQuantum detector operated in counting and zero-loss mode with an energy filter slit width of 20 eV. The nominal magnification was 105,000×, corresponding to a physical pixel size of 0.8464 Å. The dose rate was set to 15.8 e−/px/s, and the total exposure time was 2.83 s, resulting in a total dose of 62.4 e−/Å2. Each movie was split into 40 frames of 0.07075 s. The nominal defocus range was −0.8 μm to −2.8 μm in 0.2 μm steps.
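The relationship between the reported dose rate, exposure time, pixel size and total dose can be checked with a short calculation. The Python sketch below is only an arithmetic illustration using the values stated above; the function name `total_dose_per_A2` is an assumption and does not correspond to any acquisition software setting.

```python
def total_dose_per_A2(dose_rate_e_px_s, exposure_s, pixel_size_A):
    """Total dose in e-/A^2 from a detector dose rate given in e-/px/s.

    One pixel covers pixel_size_A**2 square angstroms on the specimen,
    so the dose per pixel is divided by that area.
    """
    return dose_rate_e_px_s * exposure_s / pixel_size_A ** 2


n_frames = 40
exposure_s = 2.83

dose = total_dose_per_A2(15.8, exposure_s, 0.8464)
frame_time = exposure_s / n_frames   # exposure time per frame, in seconds
dose_per_frame = dose / n_frames     # dose per frame, in e-/A^2

print(f"total dose: {dose:.1f} e-/A^2")
print(f"frame time: {frame_time:.5f} s")
print(f"dose per frame: {dose_per_frame:.2f} e-/A^2")
```

With these inputs the calculation reproduces the total dose of 62.4 e−/Å2 and the frame time of 0.07075 s given in the text.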
Image processing and model building
The two datasets were merged and processed in CryoSPARC 4.170 as detailed in Supplementary Fig. S7. The preprocessing of the movies involved patch-based motion correction and patch-based contrast transfer function (CTF) estimation. Micrographs with a CTF fit resolution worse than 3.5 Å were discarded, leaving a final dataset of 23,127 images. Based on knowledge from previous rounds of processing, initial 2D classes were generated and representative classes were selected for template-based particle picking. The selected classes included different views of the two particles found in the dataset. Subsequent template-based particle picking generated a stack of 23.3 million particles. These were extracted with a box size of 320 pixels and then Fourier-cropped to 80 pixels. To further refine the stack, two additional rounds of 2D classification were performed prior to re-extraction of the particles at a box size of 320 pixels and cropping to 160 pixels. The remaining 10.9 million particles underwent two additional rounds of 2D classification, yielding a residual stack of 9.4 million particles.
The final extraction was carried out with a box size of 320 pixels without additional cropping. These particles were then utilized for ab-initio 3D reconstruction, with the most promising classes chosen for further heterogeneous refinement. This step yielded two distinct classes: one for the smaller cleavage product and one for the full-length protein. These stacks were then further refined through two cycles of heterogeneous refinement, followed by non-uniform refinement to evaluate potential improvements. A final local refinement was performed to enhance the quality of the consensus maps further, resulting in final maps with resolutions of 3.2 Å for the full protein and 3.0 Å for the smaller cleavage product. The estimated local resolution for the two reconstructed maps is shown in Supplementary Fig. S8.
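The effect of the successive extraction settings on sampling can be made explicit with a small calculation. The sketch below assumes only the physical pixel size of 0.8464 Å and the box sizes stated above; the function names are illustrative and are not CryoSPARC parameters. Fourier cropping a 320-pixel box to a smaller box keeps the same physical field of view but increases the effective pixel size, which sets the Nyquist limit (twice the pixel size) for that processing stage.

```python
PIXEL_SIZE_A = 0.8464   # physical pixel size reported for the data collection
BOX_PX = 320            # extraction box size in pixels


def cropped_pixel_size(pixel_size_A, box_px, cropped_box_px):
    """Effective pixel size after Fourier cropping a box to cropped_box_px."""
    return pixel_size_A * box_px / cropped_box_px


def nyquist_limit_A(pixel_size_A):
    """Finest resolvable spacing (Nyquist limit) for a given pixel size."""
    return 2.0 * pixel_size_A


# Physical box width is unchanged by Fourier cropping.
box_width_A = BOX_PX * PIXEL_SIZE_A

# Map each cropped box size to (effective pixel size, Nyquist limit).
stages = {}
for cropped_box in (80, 160, 320):
    px = cropped_pixel_size(PIXEL_SIZE_A, BOX_PX, cropped_box)
    stages[cropped_box] = (px, nyquist_limit_A(px))
    print(f"box {cropped_box:3d} px: pixel {px:.4f} A, Nyquist {stages[cropped_box][1]:.2f} A")
```

Under these assumptions, only the uncropped 320-pixel extraction has a Nyquist limit finer than the final map resolutions of 3.0 to 3.2 Å, which is consistent with the final extraction being done without cropping.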
The flexibility and dynamics of AmVg, for both the full-length protein and the cleavage product, were also analysed in CryoSPARC 4.170. 3D classification and 3D variability jobs did not provide clear evidence of alternative conformations, while flexible refinement did not produce final densities of better quality than local and non-uniform refinement.
The AlphaFold 2 structure prediction for AmVg (UniProt ID: Q868N5) was docked in the cryo-EM map in UCSF Chimera (v.1.16)71. From there, the model was manually fitted and rebuilt in Coot (v.0.0.8.93)72 (see Supplementary Information for details on map quality, model building and regions not included in the final models). The smaller cleavage product was modelled manually from the full-length AmVg model. Iterative cycles of real-space refinement were done in Phenix (v.1.19.2)73 with further model rebuilding in Coot. Final geometries were assessed with MolProbity74. Glycan geometries were assessed with Privateer75. Structural analysis, comparison and figures were done in PyMOL (v.3.0.0) (Schrödinger, LLC). The volumes of the lipid binding cavities were calculated with CASTp (v.3.0)76 using a probe of 2.5 Å radius optimized to exclude areas outside of the lipid binding cavities.
The cryo-EM maps were deposited in the Electron Microscopy Data Bank with accession codes EMD-19842 and EMD-19843 for full-length AmVg and the 150 kDa cleavage product of AmVg, respectively. The atomic coordinates of the models were deposited in the Protein Data Bank with accession numbers 9ENR for full-length AmVg and 9ENS for the cleavage product of AmVg.
AlphaFold 3 model generation
The AlphaFold 3 model was generated using release 23/6/2024 of the AlphaFold server (alphafoldserver.com). The sequence for AmVg (UniProt ID: Q868N5) was used as input, together with three Zn2+ ions and one Ca2+ ion. Additionally, the glycosylation observed in our cryo-EM density was specified at residue N298.
Multiple sequence alignment
In order to identify relevant sequences from the UniProtKB (v.2021_04) database, the full-length sequence of AmVg (UniProt ID Q868N5) was queried in Jackhmmer (v. 2.41.2)77. Two different alignments were performed to study Vg conservation. In the first, sequences of distant homologues including vertebrate vitellogenins such as IuVg were selected (threshold at an e-value < e−4; 3427 sequences). For the second alignment, only sequences from closely related homologues from arthropods were selected (threshold at an e-value < e−30; 1536 sequences). A length filter was applied to both alignments to ensure that the selected sequences differed in length by 20% or less, which reduced the sequence lists to 2018 and 863 sequences, respectively. Duplicated sequences (identical accession number) were removed to reduce redundancy, bringing the number of sequences down to 798 and 358. For the broader alignment including vertebrate vitellogenins, sequences with an identity ≥70% (measured with MMseqs278) were clustered, resulting in 313 sequence clusters. One representative from each cluster in the broad selection (honey bee Vg was selected as the representative sequence from its cluster) and all the sequences in the strict selection were aligned using MAFFT (v.7)79. Conservation scores for each residue were calculated using the Valdar algorithm80 in Jalview (v.9.0.5)81. Residues are scored from 0 (not conserved) to 1 (highly conserved) within a single alignment.
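The length filter and duplicate removal described above can be sketched as follows. This is a hedged illustration rather than the script used for this study: the function name `filter_hits`, the choice of the query length as the reference for the 20% tolerance, and the toy records are all assumptions, since the text does not specify how the length difference was measured.

```python
def filter_hits(hits, query_length, max_rel_diff=0.20):
    """Keep hits within a length tolerance and drop repeated accessions.

    hits: iterable of (accession, sequence) pairs.
    query_length: length of the query sequence, used here as the reference
        for the length comparison (an assumption of this sketch).
    max_rel_diff: maximum allowed relative length difference (0.20 = 20%).
    """
    seen = set()
    kept = []
    for accession, sequence in hits:
        rel_diff = abs(len(sequence) - query_length) / query_length
        if rel_diff > max_rel_diff:
            continue  # too long or too short relative to the query
        if accession in seen:
            continue  # identical accession number already kept
        seen.add(accession)
        kept.append((accession, sequence))
    return kept


# Hypothetical toy records with a query of length 10.
toy_hits = [
    ("A1", "M" * 10),
    ("A1", "M" * 10),  # duplicate accession, removed
    ("B2", "M" * 12),  # 20% longer, kept
    ("C3", "M" * 13),  # 30% longer, removed
    ("D4", "M" * 8),   # 20% shorter, kept
    ("E5", "M" * 7),   # 30% shorter, removed
]
kept = filter_hits(toy_hits, query_length=10)
print([accession for accession, _ in kept])
```

The two filters are applied in a single pass here; applying the length filter first and the duplicate removal second, as described in the text, gives the same set of retained records.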
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data that support this study are available from the corresponding authors upon request. The cryo-EM maps are deposited in the Electron Microscopy Data Bank with accession codes EMD-19842 and EMD-19843 for full-length AmVg and the 150 kDa cleavage product of AmVg, respectively. The atomic coordinates of the models are deposited in the Protein Data Bank (PDB) with accession numbers 9ENR for full-length AmVg and 9ENS for the cleavage product of AmVg. Other structures used in this study can be accessed in the PDB: IuLv as 1LSH, HsMTP as 6I7S, HvWF vWD domain as 6N29, CTCK as 4NT5 and HsCG as 1HRP. The MSA data underlying Supplementary Fig. S2 and the raw images for the SDS-PAGE gels in Supplementary Fig. S1 are provided as a source data file with this manuscript. Source data are provided with this paper.
References
Smolenaars, M. M. W., Madsen, O., Rodenburg, K. W. & Van Der Horst, D. J. Molecular diversity and evolution of the large lipid transfer protein superfamily. J. Lipid Res. 48, 489–502 (2007).
Rakhshandehroo, M. et al. CD1d-mediated presentation of endogenous lipid antigens by adipocytes requires microsomal triglyceride transfer protein. J. Biol. Chem. 289, 22128–22139 (2014).
Hoeger, U. & Harris, J. R. Vertebrate and Invertebrate Respiratory Proteins, Lipoproteins and other Body Fluid Proteins. In Subcellular Biochemistry, vol. 94 (Springer Cham, 2020).
Byrne, B. M., Gruber, M. & Ab, G. The evolution of egg yolk proteins. Prog. Biophys. Mol. Biol. 53, 33–69 (1989).
Du, X. et al. Functional characterization of Vitellogenin_N domain, domain of unknown function 1943, and von Willebrand factor type D domain in vitellogenin of the non-bilaterian coral Euphyllia ancora. Dev. Comp. Immunol. 67, 485–494 (2017).
Wu, B., Liu, Z., Zhou, L., Ji, G. & Yang, A. Molecular cloning, expression, purification and characterization of vitellogenin in scallop Patinopecten yessoensis with special emphasis on its antibacterial activity. Dev. Comp. Immunol. 49, 249–258 (2015).
Zhang, Q., Lu, Y., Zheng, H., Liu, H. & Li, S. Differential immune response of vitellogenin gene to Vibrio anguillarum in noble scallop Chlamys nobilis and its correlation with total carotenoid content. Fish. Shellfish Immunol. 50, 11–15 (2016).
Raikhel, A. S. et al. Molecular biology of mosquito vitellogenesis: From basic studies to genetic engineering of antipathogen immunity. Insect. Biochem. Mol. Biol. 32, 1275–1286 (2002).
Yang, Y., Zheng, B., Bao, C., Huang, H. & Ye, H. Vitellogenin2: Spermatozoon specificity and immunoprotection in mud crabs. Reproduction 152, 235–243 (2016).
Salmela, H., Amdam, G. V. & Freitak, D. Transfer of immunity from mother to offspring is mediated via egg-yolk protein vitellogenin. PLoS Pathog. 11, 1–12 (2015).
Liu, Q. H., Zhang, S. C., Li, Z. J. & Gao, C. R. Characterization of a pattern recognition molecule vitellogenin from carp (Cyprinus carpio). Immunobiology 214, 257–267 (2009).
Li, Z., Zhang, S., Zhang, J., Liu, M. & Liu, Z. Vitellogenin is a cidal factor capable of killing bacteria via interaction with lipopolysaccharide and lipoteichoic acid. Mol. Immunol. 46, 3232–3239 (2009).
Zhang, S., Sun, Y., Pang, Q. & Shi, X. Hemagglutinating and antibacterial activities of vitellogenin. Fish. Shellfish Immunol. 19, 93–95 (2005).
Tong, Z., Li, L., Pawar, R. & Zhang, S. Vitellogenin is an acute phase protein with bacterial-binding and inhibiting activities. Immunobiology 215, 898–902 (2010).
Shi, X., Zhang, S. & Pang, Q. Vitellogenin is a novel player in defense reactions. Fish. Shellfish Immunol. 20, 769–772 (2006).
Garcia, J. et al. Atlantic salmon (Salmo salar L.) serum vitellogenin neutralises infectivity of infectious pancreatic necrosis virus (IPNV). Fish. Shellfish Immunol. 29, 293–297 (2010).
Sun, C., Hu, L., Liu, S., Gao, Z. & Zhang, S. Functional analysis of domain of unknown function (DUF) 1943, DUF1944 and von Willebrand factor type D domain (VWD) in vitellogenin2 in zebrafish. Dev. Comp. Immunol. 41, 469–476 (2013).
Li, Z., Zhang, S. & Liu, Q. Vitellogenin functions as a multivalent pattern recognition receptor with an opsonic activity. PLoS One 3, 4–10 (2008).
Liu, M., Pan, J., Ji, H., Zhao, B. & Zhang, S. Vitellogenin mediates phagocytosis through interaction with FcγR. Mol. Immunol. 49, 211–218 (2011).
Campanella, C. et al. Lipovitellin constitutes the protein backbone of glycoproteins involved in sperm-egg interaction in the amphibian Discoglossus pictus. Mol. Reprod. Dev. 78, 161–171 (2011).
Havukainen, H. et al. Vitellogenin recognizes cell damage through membrane binding and shields living cells from reactive oxygen species. J. Biol. Chem. 288, 28369–28381 (2013).
Ando, S. & Yanagida, K. Susceptibility to oxidation of copper-induced plasma lipoproteins from japanese eel: Protective effect of vitellogenin on the oxidation of very low density lipoprotein. Comp. Biochem. Physiol. C. Pharmacol. Toxicol. Endocrinol. 123, 1–7 (1999).
Nakamura, A. et al. Vitellogenin-6 is a major carbonylated protein in aged nematode, Caenorhabditis elegans. Biochem. Biophys. Res. Commun. 264, 580–583 (1999).
Amdam, G. V., Norberg, K., Hagen, A. & Omholt, S. W. Social exploitation of vitellogenin. Proc. Natl Acad. Sci. USA 100, 1799–1802 (2003).
Marco Antonio, D. S., Guidugli-Lazzarini, K. R., Do Nascimento, A. M., Simões, Z. L. P. & Hartfelder, K. RNAi-mediated silencing of vitellogenin gene function turns honeybee (Apis mellifera) workers into extremely precocious foragers. Naturwissenschaften 95, 953–961 (2008).
Amdam, G. V. & Omholt, S. W. The hive bee to forager transition in honeybee colonies: The double repressor hypothesis. J. Theor. Biol. 223, 451–464 (2003).
Guidugli, K. R. et al. Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett. 579, 4961–4965 (2005).
Nelson, C. M., Ihle, K. E., Fondrk, M. K., Page, R. E. & Amdam, G. V. The gene vitellogenin has multiple coordinating effects on social organization. PLoS Biol. 5, 0673–0677 (2007).
Kohlmeier, P., Feldmeyer, B. & Foitzik, S. Vitellogenin-like A-dependent shifts in social cue responsiveness regulate behavioral task specialization in an ant. PLoS Biol. 4, 4–7 (2017).
Salmela, H. et al. Nuclear translocation of vitellogenin in the honey bee (Apis mellifera). Apidologie 53, 1–17 (2022).
Anderson, T. A., Levitt, D. G. & Banaszak, L. J. The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Structure 6, 895–909 (1998).
Thompson, J. R. & Banaszak, L. J. Lipid-protein interactions in lipovitellin. Biochemistry 41, 9398–9409 (2002).
Biterova, E. I. et al. The crystal structure of human microsomal triglyceride transfer protein. Proc. Natl Acad. Sci. USA 116, 17251–17260 (2019).
Su, C. C. et al. High-resolution structural-omics of human liver enzymes. Cell Rep. 42, 112609 (2023).
Tufail, M. & Takeda, M. Molecular characteristics of insect vitellogenins. J. Insect Physiol. 54, 1447–1458 (2008).
Sullivan, C. V. & Yilmaz, O. Vitellogenesis and yolk proteins, fish. Encyclopedia of Reproduction vol. 6 (Elsevier, 2018).
Carducci, F., Biscotti, M. A. & Canapa, A. Vitellogenin gene family in vertebrates: evolution and functions. Eur. Zool. J. 86, 233–240 (2019).
Li, A., Sadasivam, M. & Ding, J. L. Receptor-ligand interaction between vitellogenin receptor (VtgR) and vitellogenin (Vtg), implications on low density lipoprotein receptor and apolipoprotein B/E. The first three ligand-binding repeats of VtgR interact with the amino-terminal region of Vtg. J. Biol. Chem. 278, 2799–2806 (2003).
Roth, Z. et al. Identification of receptor-interacting regions of vitellogenin within evolutionarily conserved β-sheet structures by using a peptide array. ChemBioChem 14, 1116–1122 (2013).
Havukainen, H., Underhaug, J., Wolschin, F., Amdam, G. & Halskau, Ø. A vitellogenin polyserine cleavage site: Highly disordered conformation protected from proteolysis by phosphorylation. J. Exp. Biol. 215, 1837–1846 (2012).
Javitt, G. et al. Assembly mechanism of mucin and von Willebrand factor polymers. Cell 183, 717–729.e16 (2020).
Dong, X. et al. The von Willebrand factor D9D3 assembly and structural principles for factor VIII binding and concatemer biogenesis. Blood 133, 1523–1533 (2019).
Leipart, V., Halskau, Ø. & Amdam, G. V. How Honey Bee Vitellogenin Holds Lipid Cargo: A Role for the C-Terminal. Front. Mol. Biosci. 9, 1–8 (2022).
Van Kempen, M. et al. Foldseek: fast and accurate protein structure search. Nat. Biotechnol. 42, 243–246 (2023).
Springer, T. A. Von Willebrand factor, Jedi knight of the bloodstream. Blood 124, 1412–1425 (2014).
Havukainen, H., Halskau, Ø., Skjaerven, L., Smedal, B. & Amdam, G. V. Deconstructing honeybee vitellogenin: Novel 40 kDa fragment assigned to its N terminus. J. Exp. Biol. 214, 582–592 (2011).
Kornfeld, R. & Kornfeld, S. Assembly of asparagine-linked oligosaccharides. Annu. Rev. Biochem. 54, 631–664 (1985).
Osir, E. O., Wells, M. A. & Law, J. H. Studies on vitellogenin from the tobacco hornworm, Manduca sexta. Arch. Insect Biochem. Physiol. 3, 217–233 (1986).
Wheeler, D. E. & Kawooya, J. K. Purification and characterization of honey bee vitellogenin. Arch. Insect Biochem. Physiol. 14, 253–267 (1990).
Montorzi, M., Falchuk, K. H. & Vallee, B. L. Vitellogenin and lipovitellin: zinc proteins of Xenopus laevis oocytes. Biochemistry 34, 10851–10858 (1995).
Mitchell, M. A. & Carlisle, A. J. Plasma zinc as an index of vitellogenin production and reproductive status in the domestic fowl. Comp. Biochem. Physiol. 100, 719–724 (1991).
Martin, D. J. & Rainbow, P. S. The kinetics of zinc and cadmium in the haemolymph of the shore crab Carcinus maenas (L.). Aquat. Toxicol. 40, 203–231 (1998).
Auld, D. S., Falchuk, K. H., Zhang, K., Montorzi, M. & Vallee, B. L. X-ray absorption fine structure as a monitor of zinc coordination sites during oogenesis of Xenopus laevis. Proc. Natl Acad. Sci. USA 93, 3227–3231 (1996).
Leipart, V. et al. Resolving the zinc binding capacity of honey bee vitellogenin and locating its putative binding sites. Insect Mol. Biol. 31, 810–820 (2022).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Leipart, V. et al. Structure prediction of honey bee vitellogenin: a multi-domain protein important for insect immunity. FEBS Open Bio 12, 51–70 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Reznik, N. et al. Intestinal mucin is a chaperone of multivalent copper. Cell 185, 4206–4215.e11 (2022).
Zhou, Y. F. & Springer, T. A. Highly reinforced structure of a C-terminal dimerization domain in von Willebrand factor. Blood 123, 1785–1793 (2014).
Tegoni, M., Spinelli, S., Verhoeyen, M., Davis, P. & Cambillau, C. Crystal structure of a ternary complex between human chorionic gonadotropin (hCG) and two FV fragments specific for the α and β-subunits. J. Mol. Biol. 289, 1375–1385 (1999).
Schlüter, H., Apweiler, R., Holzhütter, H. G. & Jungblut, P. R. Finding one’s way in proteomics: A protein species nomenclature. Chem. Cent. J. 3, 1–10 (2009).
Tufail, M. & Takeda, M. Vitellogenin of the cockroach, Leucophaea maderae: Nucleotide sequence, structure and analysis of processing in the fat body and oocytes. Insect Biochem. Mol. Biol. 32, 1469–1476 (2002).
Sappington, T. W. & Raikhel, A. S. Molecular characteristics of insect vitellogenins and vitellogenin receptors. Insect Biochem. Mol. Biol. 28, 277–300 (1998).
Daenzer, J. M. I. & Fridovich-Keil, J. L. Drosophila melanogaster models of galactosemia. In: Current Topics in Developmental Biology vol. 121 (Elsevier Inc., 2017).
Furukawa, K. & Kobata, A. Protein glycosylation. Curr. Opin. Biotechnol. 3, 554–559 (1992).
Butler, W. & Huang, J. Glycosylation changes in prostate cancer progression. Front. Oncol. 11, 1–13 (2021).
Aslebagh, R. et al. Identification of posttranslational modifications (PTMs) of proteins by mass spectrometry. Adv. Exp. Med. Biol. 1140, 199–224 (2019).
Callaway, E. The entire protein universe: AI predicts shape of nearly every protein. Nature 608, 15–16 (2022).
Aase, A. L. T. O., Amdam, G. V., Hagen, A. & Omholt, S. W. A new method for rearing genetically manipulated honey bee workers. Apidologie 36, 293–299 (2005).
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. CryoSPARC: Algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
Pettersen, E. F. et al. UCSF Chimera - A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. Sect. D. Biol. Crystallogr. 66, 486–501 (2010).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: Recent developments in Phenix. Acta Crystallogr. Sect. D. Struct. Biol. 75, 861–877 (2019).
Chen, V. B. et al. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. Sect. D. Biol. Crystallogr. 66, 12–21 (2010).
Agirre, J. et al. Privateer: software for the conformational validation of carbohydrate structures. Nat. Struct. Mol. Biol. 22, 833–834 (2015).
Tian, W., Chen, C., Lei, X., Zhao, J. & Liang, J. CASTp 3.0: Computed atlas of surface topography of proteins. Nucleic Acids Res. 46, W363–W367 (2018).
Potter, S. C. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204 (2018).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Katoh, K., Rozewicki, J. & Yamada, K. D. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinforma. 20, 1160–1166 (2019).
Valdar, W. S. J. Scoring residue conservation. Proteins 48, 227–241 (2002).
Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
Javitt, G. et al. Intestinal Gel-Forming Mucins Polymerize by Disulfide-Mediated Dimerization of D3 Domains. J. Mol. Biol. 431, 3740–3752 (2019).
Acknowledgements
Cryo-EM data were collected at the Cryo-EM Swedish National Facility funded by the Knut and Alice Wallenberg, Family Erling Persson and Kempe Foundations, SciLifeLab, Stockholm University and Umeå University. We thank Marta Carroni and Dustin Morado for facilitating grid freezing, remote data collection and initial assistance with data processing. We thank Gabriele Cordara for his help during model building and Marta Sanz-Gaitero for her work during the initial negative-staining EM studies of AmVg. We thank the CASP team for their comments on model building based on participants' predictions. We acknowledge RCN grant numbers 262137, 335244 and 350231 for funding toward running costs and positions. This project received funding from the European Union’s Horizon Europe Research and Innovation Programme under grant agreement number 101087571 (H.L.). A.M. was supported by SFB 944 and SFB 1557 and the DFG INST190/196-1 FUGG.
Author information
Contributions
M.M.-C. performed cryo-EM data processing, built the structural models and performed structural analysis under the supervision of H.L. and E.C. E.C. also performed initial cryo-EM data processing. K.S. processed the cryo-EM data, obtained the final reconstructions and contributed to manuscript writing under the supervision of A.M. V.L. purified AmVg under the supervision of Ø.H. and G.A. V.L. also performed the MSA. The manuscript draft was written by M.M.-C. with contributions from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Montserrat-Canals, M., Schnelle, K., Leipart, V. et al. Cryo-EM structure of native honey bee vitellogenin. Nat Commun 16, 5736 (2025). https://doi.org/10.1038/s41467-025-58575-y
DOI: https://doi.org/10.1038/s41467-025-58575-y