Abstract
All extant jawless vertebrates (lampreys and hagfishes) possess a unique adaptive immune system characterized by highly variable lymphocyte receptors (VLRs) that are assembled in developing lymphocytes using leucine-rich-repeat donor cassettes. Five VLR types have been identified in lampreys: VLRA, VLRB, VLRC, VLRD, and VLRE. VLRB-expressing lymphocytes are functional analogs of B cells, whereas VLRA-, VLRC-, VLRD-, and VLRE-expressing lymphocytes are more akin to T cells of jawed vertebrates. Here we define an additional VLR, designated VLRF. VLRF is phylogenetically closest to VLRA, with which it likely shared a common ancestral gene at least 250 million years ago. VLR assembly analyses show that VLRA, VLRC, VLRD, VLRE, and VLRF share donor cassettes through long-range intra- and inter-chromosomal interactions, whereas VLRB utilizes a distinct, dedicated cassette set. The pattern of gene expression, donor cassette usage, and distinctive amino acid composition in the C-terminal stalk suggest that VLRF⁺ lymphocytes may represent an additional T-like sub-lineage, adding further complexity to the VLR-based adaptive immune system.
Introduction
The extant species of jawless vertebrates (agnathans), including the lampreys and hagfishes, represent an ancient lineage that diverged from the lineage leading to the jawed vertebrates (gnathostomes) early in vertebrate evolution1,2,3,4,5,6. Two alternative types of combinatorial adaptive immune systems (AIS) arose in vertebrates ~500 million years ago7,8,9,10,11. All classes of jawed vertebrates have a system of immunoglobulins (Igs) and T cell receptors (TCRs) based on immunoglobulin domain structures7,8,9,10,11. Jawless vertebrates, in contrast, have developed an AIS based on somatically diversifying variable lymphocyte receptors (VLRs) that are composed of leucine-rich repeats (LRR)12,13,14,15,16,17,18,19,20. Crystallographic analysis of VLR monomers reveals a horseshoe-shaped structure with antigen binding via β-strands of the hypervariable concave surface21,22,23,24,25.
While immunoglobulins and T cell receptors in jawed vertebrates generate diversity through recombination of variable, diversifying and joining (V, D, J) gene segments, VLRs achieve comparable diversity by assembling complete receptor sequences from gene fragments encoding LRR segments7,19,20,26,27. The incomplete germline VLR genes have intervening non-coding regions between N-terminal and C-terminal coding sequences and thus cannot encode functional proteins. During the development of VLR-bearing lymphocyte-like cells, numerous flanking LRR-encoding donor cassettes are used as templates for the serial piece-wise and step-wise replacement of the intervening sequences in the assembly of mature VLR genes17,19,20,28,29. This process likely occurs through a gene conversion-like mechanism mediated by AID/APOBEC-family cytidine deaminases (CDAs): CDA1 in T-like cells and CDA2 in B-like cells20,30,31.
Previously, five VLR genes (VLRA, VLRB, VLRC, VLRD and VLRE) have been identified in lampreys, while three VLR genes (VLRA, VLRB, and VLRC) have been reported in hagfish12,14,16,18,19,20. In sea lampreys, VLRB is expressed by B cell-like lymphocytes that can respond to antigen stimulation by proliferation and differentiation into plasma cells that secrete multivalent VLRB antibodies, whereas VLRA, VLRC, VLRD and VLRE are expressed by T-like lymphocytes32,33,34,35,36.
Here, we identify and characterize a previously undescribed VLR gene (VLRF) in lampreys. VLRF is phylogenetically related to VLRA and is located on the same chromosome. Surveys of available lamprey genomes indicate that VLRA and VLRF diverged from each other at least 250 million years ago. Mature VLRF is predominantly expressed in the triple-negative (VLRA−/VLRB−/VLRC−) population of lymphocytes. Single-cell sequence analysis and the trends in genomic donor cassette usage and sharing between VLRA, VLRC, VLRD, VLRE and VLRF suggest that the VLRF gene is expressed in distinct T-cell-like lymphocytes in lampreys. These findings provide valuable insights into the evolution of anticipatory receptors on T-like lymphocytes in jawless vertebrates.
Results
Identification and characterization of VLRF gene in sea lamprey
We conducted HMMER searches against lamprey transcriptomes available in the SIMRBASE and NCBI databases to identify transcript sequences encoding an LRRNT domain, an LRRCT domain, or both. Among the filtered sequences, we identified a transcript in the Pacific lamprey (Entosphenus tridentatus; ID: ETRf_mk00031278-P) that encodes a complete LRRNT domain, a partial LRRCT domain, and a potential C-terminal stalk region containing a transmembrane (TM) domain. Subsequent similarity searches against the sea lamprey reference genome sequence (kPetMar1.pri) predicted a distinct incomplete germline VLR-like gene, which we designated VLRF, with an N-terminal coding region followed by a non-coding intervening sequence of 184 bp and a C-terminal coding region. The N-terminal coding region of the germline VLRF gene encodes the signal peptide (SP) and an entire LRRNT module, whereas the C-terminal coding region encodes the 3′ portion of the LRRCT module and the stalk region with a TM domain (Fig. 1a and Supplementary Fig. 1a). This germline configuration is similar to that of the VLRA gene in lampreys (Supplementary Fig. 1b).
a Germline VLRF gene in the sea lamprey. b Isolation of mature (M) and germline (GL) VLRF sequences from electrophoretic bands (n = 3). c Schematic of the mature VLRF sequence. d Alignment of representative mature VLRA, VLRB, VLRC, VLRD, VLRE, and VLRF sequences in the sea lamprey. Conserved cysteine residues in LRRNT and LRRCT are highlighted in yellow. e Number of LRRV modules in VLRA, VLRB, VLRC, and VLRF repertoires in the sea lamprey. The ending-LRRVe module is included in the number of LRRVs of a mature VLR sequence. 100 mature sequences of each VLR isotype are analyzed. f Predicted 3D structure of VLRF, highlighting the protruding loop in the LRRCT region of this assembly (GenBank accession no. PQ159758). The LRRNT and LRR1 regions are shown in blue, LRRVs are shown in green, while the CP and LRRCT regions are represented in red. g Analysis of the length of the HVR in the LRRCT domain. The trends in the HVR length distribution of VLRB are distinct from those of VLRA, VLRC and VLRF. h A phylogenetic tree constructed using five representative mature sequences for each VLR isotype in the sea lamprey. Bootstrap support values are provided for interior branches. The scale bar represents sequence substitution per site. i Prediction of the transmembrane domain (TM) in VLRF. j Histograms showing anti-HA staining on transfected cells detected by flow cytometry. HEK 293 cells were transfected with an Empty vector, HA epitope-tagged VLRF clone 1 or VLRF clone 2. k Whole cell lysates from transfected HEK 293 cells were analyzed by immunoblotting for the presence of HA epitope (VLRF) and beta actin loading control. Brackets indicate the expected bands for VLRFs with three (clone 1) and four (clone 2) LRRV domains. Each blot is a representative of two independent experiments.
Accordingly, a set of primers was designed to amplify mature VLRF sequences using leukocyte cDNA from sea lamprey ammocoete larvae as template (Supplementary Table 1). These primers amplified both germline and mature sequences. The mature VLRF transcripts were larger than those for the germline genes (Fig. 1b), and this size difference enabled the discrimination and purification of assembled sequences. We obtained 100 unique assembled VLRF transcript sequences (Supplementary Fig. 2) by isolating the fragments for potential mature sequences after gel electrophoresis. The predicted VLRF protein of the sea lamprey possesses characteristics similar to those of known VLRs (Fig. 1c, d). However, differences in the SP and C-terminal stalk region identify it as a distinct member of the lamprey VLR family. The mature VLRF sequences have a signal peptide of 23 residues, an LRRNT domain of 51 residues, an LRR1 domain of 18 residues, followed by a variable number (2–6) of distinct LRRV domains (each 24 residues in length) including LRRVe (the end or last LRRV domain), a 12-residue-long CP, an LRRCT domain of varying length (57–62 amino acids), a 76-residue C-terminal stalk region with a TM domain, and a short cytoplasmic tail. Notably, although the germline VLRF encodes a complete LRRNT module, this sequence can be replaced by sequences from divergent LRRNT-coding genomic donor cassettes. The four cysteine residues that are conserved in each of the LRRNT and LRRCT domains of all known lamprey VLRs, and that are known to be involved in the formation of two disulfide bridges12,16,37, are also present in VLRF (Fig. 1d). In lampreys, the LRRCT domain of VLRB and VLRC contains the CXC motif (X stands for any amino acid other than cysteine), whereas VLRF possesses the CXXC motif in the LRRCT domain, similar to that of VLRA, VLRD, and VLRE.
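The segment lengths listed above can be summed to give a rough expected size for a mature VLRF protein as a function of its LRRV count. The following is a minimal sketch, assuming that the reported segments are contiguous and non-overlapping; the function name `vlrf_length_range` is illustrative and not part of the study, and the short cytoplasmic tail is left out because its length is not stated.

```python
# Segment lengths (amino acid residues) as reported in the text for sea lamprey VLRF.
SIGNAL_PEPTIDE = 23
LRRNT = 51
LRR1 = 18
LRRV = 24                # each LRRV module, including the terminal LRRVe
CP = 12
LRRCT_RANGE = (57, 62)   # the LRRCT domain varies in length
STALK = 76               # C-terminal stalk region containing the TM domain


def vlrf_length_range(n_lrrv: int) -> tuple[int, int]:
    """Return (min, max) residue counts for a mature VLRF with n_lrrv LRRV modules.

    Assumption: segments are contiguous and non-overlapping. The short
    cytoplasmic tail is excluded because its length is not given.
    """
    if not 2 <= n_lrrv <= 6:
        raise ValueError("the text reports 2-6 LRRV modules per mature VLRF")
    fixed = SIGNAL_PEPTIDE + LRRNT + LRR1 + n_lrrv * LRRV + CP + STALK
    return fixed + LRRCT_RANGE[0], fixed + LRRCT_RANGE[1]


for n in range(2, 7):
    low, high = vlrf_length_range(n)
    print(f"{n} LRRV modules: {low}-{high} residues (excluding cytoplasmic tail)")
```

Under these assumptions, an assembly with four LRRV modules would span roughly 333–338 residues before the cytoplasmic tail.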
More than 70% of functionally assembled VLRC in lampreys exhibit a length preference for four LRRV modules, and a similar trend is observed for lamprey mature VLRA24,28,29. To determine the number of LRRV modules in functionally assembled VLRF sequences in comparison with other VLRs, we analyzed 100 sequences each of mature VLRF, VLRA, VLRB, and VLRC transcripts from sea lamprey (VLRD and VLRE were excluded from this analysis due to a paucity of mature sequences12). We found that VLRAs, VLRCs, and VLRFs have a higher average number of LRRV modules than does VLRB (Fig. 1e). VLRB assemblies in lampreys predominantly contain two LRRV modules, with fewer assemblies possessing one or three LRRV modules. In contrast, VLRF, VLRA, and VLRC assemblies predominantly possess four LRRV modules, typically ranging from three to five modules (Fig. 1e), indicating a clear distinction in the selection of the number of LRRVs in mature VLRB compared to that in mature VLRA, VLRC, and VLRF assemblies. To compare the distribution of LRRV modules in lamprey VLRs with those in hagfish, we analyzed previously identified mature VLRA, VLRB, and VLRC sequences in the Pacific hagfish (Eptatretus stoutii) (Supplementary Fig. 3). Our analysis revealed a pattern consistent with that observed in lampreys, where mature VLRA and VLRC predominantly contain four LRRV modules, whereas mature VLRB preferentially harbors two LRRV modules (Supplementary Fig. 3a).
Our structural modeling38,39 of VLRF predicts that it adopts a VLR-like solenoid structure and has the potential to form a protruding loop at its hypervariable region (HVR) in the LRRCT module (Fig. 1f). Previous studies have shown that VLRA and VLRB use amino acid variations on their concave surfaces, along with those in the HVR loop, to bind antigens13,22,40,41,42. However, structural variations exist among different lamprey VLRs, particularly concerning the HVR of the LRRCT module (Fig. 1g and Supplementary Fig. 4). The number of amino acid residues in the HVR of VLRC (which has only three LRRCT options: one encoded by the germline VLRC gene and two by donor LRRCT cassettes) and of VLRD (only encoded by the germline VLRD gene) is too small to form a protruding loop12,14. In contrast, there is a protruding loop in the single germline-encoded VLRE LRRCT module12. The germline configurations of VLRA12, VLRB43, and VLRF genes (Supplementary Fig. 1b) suggest that their LRRCT module HVR is encoded by genomic donor cassettes, as the germline genes only encode the 3’ portion of LRRCT. However, there is a clear discrepancy between the length distribution of the HVR in mature VLRB and those of VLRA and VLRF (Fig. 1g). The distribution of HVR lengths in lamprey VLRBs (sample size: 100) exhibits a bimodal pattern, with the majority encoding an HVR of 14–18 amino acid residues and the others having a very short HVR (9–11 residues) that potentially cannot form a protruding loop (Fig. 1g and Supplementary Fig. 4). A similar trend is observed in hagfish, where most VLRB sequences harbor long HVRs ranging from 16 to 20 residues, while a subset contains shorter HVRs (6–10 residues) (Supplementary Fig. 3b). In contrast, analysis of 100 VLRA and 100 VLRF sequences in lampreys revealed that the HVRs in all VLRA (18–21 residues) and VLRF (16–21 residues) sequences consistently encode similarly large protruding LRRCT loops (Fig. 1g and Supplementary Fig. 4). Likewise, in hagfish, the HVRs of VLRA exhibit a comparable length range (17–21 residues), mirroring the pattern observed in lampreys (Supplementary Fig. 3b).
To examine the evolutionary relationship of this recently identified VLRF with known VLRs, we performed phylogenetic analysis using representative mature sequences from sea lampreys. This analysis was based on reliably aligned amino acid sequences corresponding to the SP, LRRNT, LRR1, LRRVe, LRRCT, and stalk regions. The resulting phylogeny revealed that VLRF is strongly clustered with VLRA with 98% bootstrap support, indicating that VLRF is phylogenetically closest to VLRA among lamprey VLRs (Fig. 1h). Similarly, the phylogenetic proximity of VLRD and VLRE in lampreys is also supported by a high bootstrap value (100%).
To investigate whether VLRF can be expressed on the plasma membrane, given its predicted transmembrane domain in the C-terminal stalk region (Fig. 1i), we conducted a series of experiments using a mammalian expression system (Fig. 1j). Two unique VLRF assemblies containing three (clone 1) and four (clone 2) LRRV domains were cloned into a mammalian expression vector, with the influenza hemagglutinin (HA) epitope tag incorporated at the N-terminus of VLRF to facilitate detection. We transfected the heterologous HEK 293 cell line with either an empty vector or one of the two HA-tagged VLRF clones. Transfected cells expressing the VLRF clones exhibited positive staining, indicating the presence of VLRF at the cell surface. Moreover, the transfected VLRF proteins could be detected in whole cell lysates (Fig. 1k). Together, these results suggest that VLRF can be expressed on the surface of transfected cells.
Genomic organization of the VLRF locus and donor cassette usage
We mapped the germline VLRF gene along with genomic donor cassettes used in mature VLRF assemblies to the sea lamprey reference genome sequence5,44 (kPetMar1.pri, reference Annotation Release 100), using an iterative similarity search strategy. The germline VLRF is located on chromosome 58, which also harbors the germline VLRA in the opposite transcriptional orientation (Fig. 2a, b and Supplementary Table 2). Like the sea lamprey VLRA gene, the majority of VLRF-encoding cassettes are distributed in two clusters on chromosome 58 (Ch.58-cI and Ch.58-cII). There are 129 genomic donor cassettes in cluster I, nearly all of which (127 cassettes) are flanked on either side by the germline VLRA and VLRF genes. We identified 144 donor cassettes in cluster II. The germline VLRA and VLRF genes, along with the 129 cassettes within cluster I (Ch.58-cI), are distributed across an 883 kb region, while the 144 donor cassettes in cluster II (Ch.58-cII) are confined to a 263 kb region at a higher genomic density. Chromosome 7 contains the germline VLRC gene (Fig. 2c, d and Supplementary Table 2) flanked by 18 donor cassettes.
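The difference in cassette density between the two chromosome 58 clusters follows directly from the counts and spans given above. The short calculation below is only a worked illustration of that arithmetic; the per-100-kb unit and the helper name `cassettes_per_100kb` are choices made here, and the 883 kb span of cluster I also includes the germline VLRA and VLRF genes.

```python
# Counts and spans as stated in the text (sea lamprey chromosome 58).
clusters = {
    # The cluster I span also covers the germline VLRA and VLRF genes.
    "Ch.58-cI": {"cassettes": 129, "span_kb": 883},
    "Ch.58-cII": {"cassettes": 144, "span_kb": 263},
}


def cassettes_per_100kb(cassettes: int, span_kb: float) -> float:
    """Average number of donor cassettes per 100 kb of genomic sequence."""
    return 100.0 * cassettes / span_kb


density = {name: cassettes_per_100kb(**info) for name, info in clusters.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} cassettes per 100 kb")

# Fold difference in density between cluster II and cluster I.
fold = density["Ch.58-cII"] / density["Ch.58-cI"]
print(f"Cluster II is about {fold:.1f}-fold denser than cluster I")
```

By this estimate, cluster II carries on the order of 55 cassettes per 100 kb compared with about 15 per 100 kb in cluster I.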
a Germline VLRA and VLRF genes are located on chromosome 58, with donor cassettes distributed in two genomic clusters (Cluster I and Cluster II) on chromosome 58. b Genomic organization of the sea lamprey VLRA and VLRF loci on chromosome 58. Cluster I and Cluster II on chromosome 58 contain 129 and 144 donor cassettes, respectively. c Germline VLRC and adjacent donor cassettes are located on chromosome 7. Chromosome 7 is much larger than chromosome 58, as indicated by the gap. d Germline VLRC and the donor cassette cluster on chromosome 7. This cluster contains 18 donor cassettes primarily used for VLRC assembly. For figures (b and d), the cassettes are presented according to their genomic spacing, although the icons are not to scale. Filled triangles above each donor cassette and the germline VLR gene indicate the transcriptional orientation.
Chromosome 75 harbors the germline VLRD and VLRE genes, associated with 39 donor cassettes12. Notably, the 39 donor cassettes on chromosome 75 are almost exclusively used by mature VLRD and VLRE (Supplementary Table 2). The germline VLRB gene is situated on chromosome 21, associated with VLRB-specific genomic donor cassettes, some of which are also found on chromosome 54 and scaffold 6143.
Intra- and interchromosomal cassette sharing
Next, we analyzed the chromosomal origins of donor cassettes that are incorporated into mature VLRA, VLRC, VLRD, VLRE, and VLRF assemblies (Fig. 3 and Supplementary Table 2). This analysis reveals that VLRF and VLRA primarily use donor cassettes from Ch.58-cI, followed by cluster Ch.58-cII (Figs. 2b, 3a, and Supplementary Table 2). In contrast, mature VLRC assemblies predominantly incorporate cassettes from Ch.58-cII, followed by those from chromosome 7 and Ch.58-cI, respectively. Donor cassettes from chromosome 75 and Ch.58-cII, along with a limited number from Ch.58-cI, are integrated into mature VLRD/VLRE sequences. These findings suggest that despite the shared usage of donor cassettes among mature VLRA, VLRC, VLRD, VLRE, and VLRF sequences (Fig. 3b, c), there are clear preferences in cassette usage (Fig. 3d). Notably, as observed with other VLRs12,28,29,43, different portions of the same donor cassettes can be incorporated into mature VLRFs (Supplementary Fig. 5), indicating a potential for enormous sequence diversity.
a Donor cassette usage from different genomic locations. Donor cassettes from cluster I (Ch.58-cI) and cluster II (Ch.58-cII) of chromosome 58 are mainly used in VLRA and VLRF assemblies. In contrast, VLRC assemblies predominantly use cassettes from Ch.58-cII, followed by chromosome 7 and Ch.58-cI, respectively. For VLRD/VLRE assemblies, donor cassettes from chromosome 75 and Ch.58-cII, along with a small number of cassettes from Ch.58-cI, are used. VLRD and VLRE are considered together due to the limited number of assembled sequences available and their high similarity. b Patterns of donor cassette sharing among VLRs analyzed in relation to the total cassette usage in a dataset comprising 100 VLRA, 100 VLRC, 60 VLRD/VLRE, and 100 VLRF mature sequences. Notably, the highest percentage of cassette sharing is observed between VLRF and VLRA. c Examples of donor cassette sharing with representative sequences. Cassettes incorporated into mature VLR sequences are depicted in red, with the accession numbers of the mature sequences provided in parentheses. Only regions with 100% sequence identity between the donor cassettes and mature sequences are shown, and the corresponding encoded LRR modules are indicated. Alternative codons are highlighted in yellow and orange. d Schematic representation of donor cassette usage among mature VLRA, VLRC, VLRD, VLRE, and VLRF sequences. Genomic donor cassette clusters on chromosomes 58, 7, and 75 are highlighted in color-coded rectangles. Solid lines represent major contributions, while dotted lines denote minor contributions of donor cassettes from each genomic cluster to the mature sequences. Line colors correspond to their respective donor clusters. The graphic was created using BioRender.
It is noteworthy that the highest frequency of donor cassette sharing is observed between VLRF and VLRA (Fig. 3b and Supplementary Table 2). This shared use is most prominent for cassettes encoding LRRV modules (Supplementary Table 2), though cassettes encoding the LRRNT, LRR1, CP, and LRRCT regions are also shared (Fig. 3c and Supplementary Table 2). Of particular interest, the 5’LRRCT-LRRCTm cassettes, which encode the 5’ and middle portions of the LRRCT module, are uniquely shared by mature VLRA and VLRF; these cassettes are not shared among other VLRs in lampreys. This specificity likely reflects similar trends in variation within the HVR (a component of the LRRCT encoded by 5’LRRCT-LRRCTm cassettes) in lamprey VLRF and VLRA. In contrast, no evidence of genomic donor cassette sharing was observed between mature VLRB and other VLRs, including VLRF (Fig. 3d and Supplementary Fig. 6a, b). However, nucleotide sequence similarities were identified between cassettes encoding VLRB and those encoding other VLRs (Supplementary Fig. 6c–f). In sum, our data clearly demonstrate the presence of inter-chromosomal cassette sharing, a phenomenon that is exceedingly rare in antigen receptor assemblies of jawed vertebrates and thus represents a unique feature of VLR gene assembly. The absence of sharing with VLRB despite these sequence similarities suggests the presence of chromosomal mechanisms beyond mere sequence homology that influence the differential incorporation of specific donor cassettes into the VLRB assembly process and those of other VLRs (VLRA, VLRC, VLRD, VLRE, and VLRF).
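Pairwise cassette sharing of this kind can be tabulated with simple set operations. The sketch below is illustrative only: the cassette identifiers are hypothetical placeholders rather than data from this study, and the sharing metric (shared cassettes as a percentage of all cassettes used by either receptor) is an assumption that may differ from the exact definition used for Fig. 3b.

```python
from itertools import combinations

# Hypothetical cassette identifiers, for illustration only (not data from the study).
usage = {
    "VLRA": {"c1", "c2", "c3", "c4", "c5"},
    "VLRF": {"c2", "c3", "c4", "c6"},
    "VLRC": {"c4", "c7", "c8"},
    "VLRB": {"b1", "b2"},
}


def percent_shared(a: set[str], b: set[str]) -> float:
    """Shared cassettes as a percentage of all cassettes used by either receptor.

    This definition is an assumption made for the sketch.
    """
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Compute the metric for every unordered pair of receptor types.
sharing = {
    (x, y): percent_shared(usage[x], usage[y])
    for x, y in combinations(sorted(usage), 2)
}

# Report pairs from most to least sharing.
for (x, y), pct in sorted(sharing.items(), key=lambda item: -item[1]):
    print(f"{x}-{y}: {pct:.1f}% shared")
```

With these placeholder sets, the VLRA-VLRF pair ranks highest and every pair involving VLRB scores zero, which mirrors the qualitative pattern described above without reproducing the reported values.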
Given that mature VLRA and VLRF share the highest percentage of donor cassettes encoding LRRNT, LRR1, LRRV, CP, and LRRCT modules, we sought to determine whether these receptors share the N-terminal and C-terminal coding regions of their respective germline-incomplete genes, where one might observe chimeric sequences with an N-terminal VLRA and C-terminal VLRF, or vice versa. To test this hypothesis, we employed PCR primers designed to amplify any potential chimeric mature products (Supplementary Fig. 7). However, gel electrophoresis results reveal the presence of only mature VLRA and mature VLRF products, with no evidence of chimeric sequences. These findings suggest that the N-terminal and C-terminal coding regions of the germline VLRA and VLRF remain distinct and are not exchanged during the assembly process.
VLRF orthologs in different lamprey species
We conducted a comprehensive survey of five additional lamprey genomes and three hagfish genomes to identify orthologs of the VLRF gene. Orthologs of the sea lamprey (Petromyzon marinus) germline VLRF gene are found in our search of genomic data for five additional lamprey species covering different taxonomic positions (Supplementary Table 3). However, no orthologous sequences were detected in the currently available genome assemblies of the inshore hagfish (Eptatretus burgeri), brown hagfish (Eptatretus atami), and Atlantic hagfish (Myxine glutinosa). Among these five lampreys, Pacific lamprey (Entosphenus tridentatus), European brook lamprey (Lampetra planeri), and Far Eastern brook lamprey (Lethenteron reissneri) belong to the Petromyzoninae subfamily of which sea lamprey (Petromyzon marinus) is also a member. In contrast, the short-headed lamprey (Mordacia mordax) and pouched lamprey (Geotria australis) belong to the distantly related Mordaciinae and Geotrinae subfamilies, respectively45. The lampreys of the southern hemisphere, Mordaciinae and Geotrinae subfamilies, diverged from a common ancestor with the northern hemisphere Petromyzoninae approximately 250 million years ago46. Despite their different taxonomic positions, VLRF sequences are fairly conserved in these six lamprey species (Fig. 4 and Supplementary Fig. 8). Like sea lamprey germline VLRF, the homologous germline VLRF gene in all five additional species contains two coding segments separated by a short intervening sequence. The N-terminal coding segment of germline VLRF encodes the signal peptide and entire LRRNT domain, whereas the C-terminal segment encodes the 3′LRRCT and the C-terminal stalk region with a transmembrane domain (Supplementary Fig. 8). The identification of VLRF in the most distant lampreys (Geotria australis and Mordacia mordax) relative to sea lamprey indicates that the gene diverged from a common ancestor with VLRA at least ~ 250 Mya.
a VLRF and VLRA are phylogenetically close in all lamprey species. The phylogenetic tree is constructed by using the protein-coding segments of germline VLR genes in six different lamprey species. Bootstrap values are provided for all branches. Mm, Mordacia mordax; Ga, Geotria australis; Pm, Petromyzon marinus; Et, Entosphenus tridentatus; Lp, Lampetra planeri; Lr, Lethenteron reissneri. Note that the current version of the Mordacia mordax genome lacks detectable orthologs for VLRD. b Genomic map of germline VLRA and VLRF in short-headed lamprey (Mordacia mordax), Far Eastern brook lamprey (Lethenteron reissneri) and Pacific lamprey (Entosphenus tridentatus). The distance between germline VLRA and VLRF is much larger in L. reissneri and E. tridentatus than in M. mordax, as indicated by the gap. The map is not drawn to scale. Filled triangles above the germline genes indicate transcription orientation.
Using the coding regions of germline VLRA, VLRB, VLRC, VLRD, VLRE and VLRF genes in six lamprey species, we conducted a phylogenetic analysis. Similar to the phylogenetic tree made using portions of the assembled VLR sequences in sea lampreys (see Fig. 1), the phylogenetic proximity of VLRF and VLRA is confirmed in all six lamprey species examined (Fig. 4a). Nevertheless, the amino acid composition of the C-terminal stalk regions (excluding the transmembrane domain) shows pronounced divergence between VLRA and VLRF (Supplementary Fig. 8). Among these additional five species, we also found that germline VLRA and VLRF are located on the same chromosome in three lamprey species where the relative genomic positions can be determined because of the availability of more completely assembled genome sequences (Fig. 4b). Like the sea lamprey, germline VLRA and VLRF are located in opposite transcriptional orientations in the Far Eastern brook lamprey (Lethenteron reissneri) and Pacific lamprey (Entosphenus tridentatus), whereas they are located in the same transcriptional orientation in the distantly related short-headed lamprey (Mordacia mordax). To investigate whether donor cassette sharing between VLRF and VLRA, as observed in the sea lamprey, also occurs in other lamprey species, we cloned mature VLRF sequences from the European brook lamprey. Our findings provide evidence that, similar to the sea lamprey (Fig. 3b, c), mature VLRA and VLRF in the European brook lamprey share donor cassettes (Supplementary Fig. 9).
Cellular and tissue expression patterns of VLRF
To examine the cellular expression patterns of VLRF, we isolated sea lamprey lymphocyte populations from white blood cells (WBCs) using monoclonal antibodies (mAbs) specific for VLRA, VLRB and VLRC. Previously, the specificities of these mAbs were confirmed by testing against a panel of VLRA-, VLRB- and VLRC-expressing transfectants33,35. The qPCR analysis of cDNA from these cells, using primers targeting the gene’s constant region to detect both germline and mature assembly transcripts (because the diversity region in mature sequences is formed from variable donor cassettes), showed that the highest expression level of VLRF is found in the triple-negative (VLRA−/VLRB−/VLRC−) population of lymphocytes. A lower amount of VLRF expression is also detected in VLRA+ and VLRC+ cells, whereas it is undetectable in VLRB+ lymphocytes (Fig. 5a). When examining the assembly status of VLRF in cDNA from the VLRA+, VLRB+, VLRC+ and triple-negative lymphocyte populations, we found that both germline and assembled VLRF genes are transcribed in triple-negative cells as well as in VLRA+ and VLRC+ cells, with the highest expression of mature VLRF in the triple-negative population (Fig. 5b).
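Expression values normalized to a reference gene such as beta-actin are commonly derived from threshold cycle (Ct) values. The sketch below uses the widely applied 2^-ΔCt calculation as an assumption, since the quantification formula is not specified in this section; it also assumes roughly 100% amplification efficiency, and all Ct values shown are hypothetical.

```python
from typing import Optional


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to a reference gene, as 2^-(Ct_target - Ct_reference).

    Assumption: both amplicons amplify with about 100% efficiency.
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values (VLRF, beta-actin) for each sorted population;
# None marks a reaction with no detectable amplification.
ct_values: dict[str, tuple[Optional[float], float]] = {
    "VLRA+": (27.0, 18.0),
    "VLRB+": (None, 18.0),
    "VLRC+": (28.0, 18.0),
    "TN": (24.0, 18.0),
}

# Undetected targets are reported as zero relative expression.
expression = {
    population: 0.0 if target is None else relative_expression(target, reference)
    for population, (target, reference) in ct_values.items()
}

for population, value in expression.items():
    print(f"{population}: {value:.5f} relative to beta-actin")
```

With these placeholder values, the triple-negative (TN) population shows the highest normalized signal and the VLRB+ population shows none, which matches the direction of the reported result but not its actual magnitudes.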
a Expression of the VLRF gene in different lymphocyte populations in lamprey larvae. TN represents the triple-negative (VLRA−/VLRB−/VLRC−) lymphocyte population. Bars indicate the standard error of the mean in each experiment (n = 6). b Assembly status of VLRF in total RNA extracted from purified lymphocyte populations. bp, base pairs; GL, germline transcripts; M, transcripts of assembled VLRF gene. A+, VLRA+; B+, VLRB+; C+, VLRC+. Each experiment was repeated three times independently with similar results. c, d Tissue expression profiles for VLRF in larvae (c) and adult (d) lampreys. Transcripts are analyzed by real-time RT-PCR with beta-actin as a normalization control for both cellular and tissue distribution analyses. Bars indicate the standard error of the mean for three lamprey larvae (c) and three adult lampreys (d) in each experiment. Bars indicate the standard error of the mean for four lamprey larvae (c) and four adult lampreys (d) in each experiment. The corresponding data points were shown as dot plots.
In lamprey ammocoete larvae, T-cell maturation occurs in the thymus equivalent tissues, collectively termed the thymoid, located at the tips of the gills30,47,48. VLRB+ cell maturation is thought to take place in the kidney and typhlosole47,48. After metamorphosis to the adult form, the hematopoietic activity of the kidney decreases markedly, the typhlosole disappears, and the supraneural body becomes the main hematopoietic organ47,49,50. To gain insight into the tissue expression of the VLRF gene, we examined its basal expression in various immune-related organs of larvae and adults of sea lampreys (Fig. 5c, d). In larvae, VLRF expression is high in the blood, kidney, and gills. This pattern is similar in adults, with high expression in the blood and gills, but the highest levels are found in the supraneural body. In contrast, the adult kidney shows low VLRF expression.
To elucidate the distribution and role of VLRF⁺ cells within the immune tissues of the sea lamprey, we designed specific in situ hybridization probes for VLRF, VLRA, and CDA1, using VLRA as a positive control due to its well-characterized presence in immune tissues33, and CDA1 as a marker of assembly location30. Accounting for the sequence variation within the diversity region of mature sequences produced by donor cassettes, the VLR probes were designed to target the gene’s constant regions, thereby allowing detection of both germline transcripts and mature assemblies. In light of the extensive lymphopoietic changes occurring throughout the lamprey life cycle, we investigated immune-related tissues in both larval and parasitic adult stages (Fig. 6). To validate the VLRF probe, we co-stained sections of ammocoetes using probes for both VLRF and VLRA. When focusing on the thymoid and the epipharyngeal ridge, a mucosal surface dorsal to the gills, we found that cells often co-express germline and/or mature VLRF and VLRA transcripts and are present in both mucosal tissues (Fig. 6f and Supplementary Fig. 10).
a H&E-stained cross-section of the anterior region of lamprey larvae, including the epipharyngeal ridge (Ep), thymoids (T), and gills (Gi). b, c Higher magnification of H&E-stained gill filaments and thymoids (b) and the epipharyngeal ridge (c). d–f In situ hybridization for VLRF (red) in larval immune tissues. VLRF+ cells are dispersed throughout the gill filaments and thymoids (d). In the epipharyngeal ridge, VLRF expression is observed in cells lining the epithelial layer (e). Co-localization of VLRF (red) and VLRA (green) in the epipharyngeal ridge (f). g H&E-stained cross-section of the posterior body, depicting the kidney (K) and typhlosole (Ty) in larvae. h, i Higher magnification views of the H&E-stained kidney (h) and typhlosole regions (i) showing lymphoid cells located adjacent to the renal tubules (Rt) in the kidney and surrounding the main blood vessel (Bv) in the typhlosole. j–l In situ hybridization for VLRF (red) in posterior immune tissues. VLRF+ cells localize around blood vessels within the typhlosole (j) and appear scattered in the kidney (k). Panel (l) shows VLRF+ cells beneath the intestine (In). m, n H&E-stained sections of the gills (m) and the supraneural body (Sb) (n) of adults. o–q In situ hybridization for VLRF (red) in adult immune tissues. VLRF+ cells line the gill filament epithelium (o). Supraneural body contains numerous VLRF+ cells in the lymphoid area (p). VLRF+ cells are observed beneath the intestinal epithelium in adults (q). r–u In situ hybridization showing expression of VLRF (red) and CDA1 (green) in the thymoid. A VLRF⁺CDA1⁺ cell is indicated by an arrow, and an arrowhead marks a VLRF⁺CDA1⁻ cell. Nuclei are counterstained with DAPI. Abbreviations: Sc, spinal cord; N, notochord; Ep, epipharyngeal ridge; T, thymoids; Gi, gills; K, kidney; In, intestine; Ty, typhlosole; Bv, blood vessel; Rt, renal tubules; Sb, supraneural body.
Scale bars: (a, g, m) 400 µm; (b, c, d, h, i, j, o, p) 20 µm; (e) 25 µm; (f) 5 µm; (n) 100 µm; (k, l, q–u) 10 µm. Representative images shown; histological patterns were consistent across all specimens analyzed (n = 4).
Our RNA in situ hybridization studies revealed a prominent presence of VLRF-expressing cells across multiple immune-related tissues in both naïve ammocoetes and adult parasitic lampreys. Notably, these cells are abundant in the gills, typhlosole, supraneural body, intestine, and kidney (Fig. 6). In ammocoetes, CDA1 transcripts are localized to the thymoid region, with predominant expression at the tips of the gill filaments; a subset of CDA1-expressing cells was found to co-express VLRF (Fig. 6r–u and Supplementary Fig. 10). VLRF-expressing cells are also scattered throughout the gill filaments and the epipharyngeal ridge, where they are particularly numerous (Fig. 6d–f). In the typhlosole, VLRF-expressing cells are abundant around the blood vessels, occasionally forming aggregates (Fig. 6j). In the kidney, these cells are dispersed in the ventral region (Fig. 6k), particularly within the renal tubules, where lymphocytes are typically abundant (Fig. 6h). In addition, VLRF-expressing cells are detected beneath the intestinal epithelial cells (Fig. 6l).
In adult parasitic lampreys, a similar distribution pattern of VLRF-expressing cells is observed (Fig. 6o–q). VLRF-expressing cells line the gill epithelium (Fig. 6o) and are located beneath the intestinal epithelium (Fig. 6q), consistent with continued exposure to environmental antigens during the parasitic phase. The supraneural body exhibits extensive lymphoid tissue (Fig. 6n), densely populated with VLRF-expressing cells showing varying levels of mRNA expression (Fig. 6p). Collectively, our histological studies demonstrate the widespread distribution of VLRF-expressing cells across multiple immune-related tissues in the sea lamprey.
VLR-expressing cell populations at single cell resolution
We performed single-cell RNA sequencing (scRNA-seq) on WBCs isolated from a sea lamprey ammocoete larva. Figure 7a illustrates the leukocyte cell-type clusters identified in an analysis of 10,612 WBCs. The expression profiles of VLR genes, along with B-cell and T-cell-specific markers, enabled the identification of B-like and T-like cell clusters (Fig. 7 and Supplementary Figs. 11 and 12). Monocyte, granulocyte, and red blood cell (RBC) clusters were identified using cell-type-specific marker genes (Supplementary Fig. 13). Eighteen clusters are delineated in this analysis, showing strong separation between T-like cells, B-like cells, monocytes, granulocytes and RBCs. Given the unique assembly mechanism of VLR genes, we initially focused on the expression of invariant VLR segments, which are present in both germline and assembled VLR transcripts. Invariant sequences from VLRA, VLRC, and VLRF were expressed in a broad and contiguous set of clusters (Fig. 7b, d, e). Similarly, VLRD and VLRE showed comparable expression patterns, although in very few cells (Fig. 7f). In contrast, invariant VLRB sequences were expressed in three distinct clusters corresponding to B-like cells (Fig. 7c).
a Cluster designations were assigned to known cell types based on VLR expression (B- and T-like cells) and marker gene expression. b–f Expression of VLR invariant sequence segments of (b) VLRA, (c) VLRB, (d) VLRC, (e) VLRF, and (f) VLRD (red dots) and VLRE (green dots). Note that in panels (b–f), detection of the VLR invariant regions cannot distinguish between germline and assembled VLR transcription. For (f), positive cells are designated without reference to expression level due to very low expression of VLRD and VLRE.
To visualize the assembled VLR sequences within the UMAP plot, we identified cell barcodes associated with read pairs where one read aligned to the invariant sequence and the paired read mapped to the assembled VLR sequence. These reads are excluded in the standard mapping protocol but are associated with cell barcodes corresponding to cells in the UMAP plot. Because the distribution of 10x 5′ reads tends to be close to the transcriptional start and the read pairs are separated by only a short interval or are sometimes overlapping, these pairs are relatively rare for VLRA and VLRF, although they are somewhat more abundant for VLRB and VLRC. Nonetheless, the expression patterns of cells containing the read pairs (Fig. 8a–c) are consistent with the corresponding invariant sequences (Fig. 7b, d, e). For comparison, assembled VLRB reads were more abundant and predominantly localized within the three B-like cell clusters (Fig. 8d).
a–d Cells expressing assembled VLR genes in an analysis of all white blood cells: VLRA (a), VLRC (b), VLRF (c), and VLRB (d). e–h Overlap among cells that are positive for assembled VLRA, VLRC, and VLRF sequences in a reanalysis of only cells from T-like cell clusters (see Fig. 7), showing cells positive for at least one assembled sequence for VLRA (e), VLRC (f), and VLRF (g). h A Venn diagram showing the numbers of overlapping cells expressing at least one sequence from an assembled T-like cell VLR gene (bold numbers) or VLR invariant regions (numbers in parentheses). Note that because the distribution of 10x sequence reads along the VLR transcripts in this analysis is highly biased toward the 5′ end of most transcripts, assembled sequences are significantly underrepresented, but are nonetheless localized to T-like (VLRA, VLRC, VLRF) and B-like (VLRB) cell clusters. VLRB and VLRC assemblies are more prevalent than VLRA and VLRF assemblies, so a higher cut-off of ≥ 2 UMI counts was set for assembled VLRC (b) and VLRB (d) to reduce background from ambient (non-cell-associated) RNA. For the much rarer VLRA and VLRF assemblies, all detectable cells with assembled sequences are highlighted.
Despite the relatively low sequencing depth inherent to the sparse 10x scRNA-seq data, we investigated whether T-like cells expressing assembled VLRA, VLRC, or VLRF transcripts exhibited any overlap. Figure 8e–g highlights the distribution of cells containing assembled VLR sequences within a focused reanalysis of the T-like cell clusters. Additionally, the Venn diagram in Fig. 8h demonstrates minimal overlap among these populations, with a small subset of cells co-expressing VLRC and VLRA, consistent with previous findings24,35. Notably, some overlap was also detected between VLRF- and VLRC-expressing cells, whereas no overlap was observed between VLRA and VLRF-expressing populations.
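The overlap counts summarized in a three-set Venn diagram can be derived from the sets of cell barcodes positive for each assembled VLR. The following is a minimal sketch of that bookkeeping only; the function name and the barcodes are hypothetical placeholders, not data from this study.

```python
def venn3_counts(vlra, vlrc, vlrf):
    """Count cells in each of the seven regions of a three-set Venn diagram.

    Each argument is an iterable of cell barcodes positive for one assembled VLR.
    """
    a, c, f = set(vlra), set(vlrc), set(vlrf)
    return {
        "VLRA_only": len(a - c - f),
        "VLRC_only": len(c - a - f),
        "VLRF_only": len(f - a - c),
        "VLRA_and_VLRC_only": len((a & c) - f),
        "VLRA_and_VLRF_only": len((a & f) - c),
        "VLRC_and_VLRF_only": len((c & f) - a),
        "all_three": len(a & c & f),
    }


# Hypothetical barcodes, chosen only to illustrate the calculation.
vlra_cells = {"AAAC", "AAAG", "AACT"}
vlrc_cells = {"AACT", "CCGT", "GGTA", "TTAC"}
vlrf_cells = {"GGTA", "TGCA"}

counts = venn3_counts(vlra_cells, vlrc_cells, vlrf_cells)
for region, n in counts.items():
    print(region, n)
```

Because every barcode falls into exactly one region, the seven counts sum to the number of distinct positive cells, which is a convenient sanity check.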
Discussion
Agnathan VLR germline genes are difficult to identify de novo as they usually encompass poorly conserved N-terminal and C-terminal coding regions separated by non-conserved, non-coding intervening sequences. Furthermore, mature VLRs are not captured by assembly algorithms in typical transcriptome datasets generated from short-read sequencing strategies. The cloning and comparative analysis of mature VLRF sequences with their germline counterparts has identified VLRF as a distinct member of the VLR gene family in lampreys. The VLRF gene generates a highly diverse repertoire through the incorporation of LRR-encoding donor cassette sequences across the LRRNT to LRRCT germline intervening region. Previous studies have demonstrated that the orthologue of the gene encoding forkhead box N1 (FOXN1), a key marker of the thymopoietic microenvironment, as well as VLRA and CDA1, are expressed in the thymoid region, where VLRA⁺ T-like cells are thought to develop30. Our in situ hybridization analyses further confirmed the co-expression of VLRF and CDA1 in the thymoid (Fig. 6 and Supplementary Fig. 10), supporting the notion that this region serves as a developmental niche for both VLRA⁺ and VLRF⁺ T-like lymphocytes. During the ammocoete larval stage, VLRF gene expression is predominantly localized to white blood cells and tissues implicated in hematopoiesis, such as the kidney and typhlosole, as well as in the gill, which is an active immune barrier tissue and contains the lymphopoietic thymoid region. This tissue-specific expression pattern is similarly observed in adult lampreys, where the supraneural body, having developed as a major hematopoietic organ during metamorphosis47,49,50, exhibits pronounced VLRF expression after typhlosole regression and gut remodeling.
The expression profile of the VLRF gene (Figs. 5–8), the presence of VLRF on the cell surface (Fig. 1), the selective usage of LRRV modules (Fig. 1), and the trends in genomic cassette usage and sharing (Fig. 3) provide evidence for the role of VLRF as a previously uncharacterized anticipatory receptor expressed in T-like lymphocytes in lampreys. Among the VLRs expressed in T-like lymphocyte populations, VLRF is phylogenetically most closely related to VLRA (Fig. 1h and Supplementary Fig. 8). However, the extracellular and the intracellular portions of the C-terminal stalk region differ substantially between VLRF and VLRA. Notably, differences in amino acid composition within regions flanking the transmembrane domain have been shown to influence receptor responsiveness to distinct signaling molecules, even among homologous proteins51,52,53. It is therefore plausible that the divergent sequences adjacent to the transmembrane domains of VLRF and VLRA underlie differences in ligand recognition or downstream signaling. Previous studies have shown that VLRB-expressing cells display characteristics akin to B-cells, while VLRA+ and VLRC+ cells resemble the αβ- and γδ-T cell lineages observed in jawed vertebrates, respectively32,33,35. Moreover, the expression of VLRD and VLRE is primarily localized to the triple-negative (VLRA−, VLRC−, VLRB−) lymphocyte population12. The predominant expression of VLRF within this triple-negative lymphocyte subset (Fig. 5a, b), combined with the largely non-overlapping expression of mature VLRF in the T-cell clusters (Fig. 8e–h) and the pronounced divergence in both the extracellular and intracellular portions of the C-terminal stalk region between VLRF and VLRA (Supplementary Fig. 8), suggests that VLRF-expressing cells may constitute a distinct T-like sublineage, separate from the VLRA- and VLRC-expressing T-like lymphocytes in lampreys. 
A recent analysis of lamprey lymphocytes has revealed that T-like cells in lampreys comprise multiple functionally specialized subtypes36. This expanded diversity of T-like lineages in jawless vertebrates potentially mirrors the diversification of T-cell lineages seen in jawed vertebrates, highlighting an evolutionary parallel in adaptive immune strategies.
In our examination of VLRF gene assembly across different lymphocyte populations, we observed transcription of both germline and assembled VLRF genes in triple-negative, VLRA+, and VLRC+ cells, with the highest levels of mature VLRF expression detected in the triple-negative population. The absence of mature VLRF transcripts in VLRB+ cells further suggests a clear functional demarcation between B-cell-mediated and T-cell-mediated immune mechanisms in lampreys. This observation is reinforced by our single-cell RNA sequencing and UMAP clustering analyses, which highlight the unique gene expression profiles of VLRB-expressing lymphocytes relative to VLRF-, VLRA- and VLRC-expressing cells in sea lamprey (see Fig. 7). The tissue distribution analysis of VLRF expression in sea lampreys reveals a fairly conserved pattern across developmental stages relative to VLRA. In larvae, VLRF expression is notably high in the blood, kidney, and gills, which are key sites of immune and lymphopoietic activity. In adults, VLRF expression persists in the blood, gills, and supraneural body (see Figs. 5 and 6), the latter of which is a principal hematopoietic organ in adult lampreys. In situ hybridization studies using VLRF probes further confirmed the widespread presence of VLRF-expressing cells in both systemic and mucosal immune tissues of unstimulated ammocoetes, including the gills, typhlosole, intestine, and kidney, indicating a broad role in immune surveillance across multiple tissue types. Particularly in the gills, a primary site of respiration and pathogen entry, significant populations of VLRF-expressing cells were scattered throughout the filaments, epipharyngeal ridge, and hypopharyngeal fold, as well as surrounding the lamprey thymoids, the thymus-equivalent structures located at the gill tips.
The availability of the updated sea lamprey genome sequence (kPetMar1) facilitated our analysis of the genomic organization and repertoire development of VLRF, which we compared with other VLRs expressed by T-like lymphocytes in lampreys. Notably, both germline VLRF and VLRA are located on chromosome 58, flanking cluster I (Ch.58-cI) at opposite ends of the cluster. Both VLRF and VLRA predominantly use donor cassettes from Ch.58-cI, as well as from cluster II (Ch.58-cII). In contrast, despite the location of germline VLRC, VLRD and VLRE on other chromosomes (Chr. 7 and Chr. 75), they use donor cassettes from the Ch.58-cI and Ch.58-cII in addition to the donor cassettes located in the corresponding chromosome harboring the germline genes (see Fig. 3 and Supplementary Table 2). The absence of donor cassette incorporation from Ch.58-cI, Ch.58-cII, chromosome 7, and chromosome 75 in VLRB assemblies indicates that B-like and T-like lymphocytes in lampreys utilize distinct donor cassette pools for their anticipatory receptors. Previously, we reported that certain donor cassettes are shared between mature VLRA, VLRC, VLRD and VLRE12,29. In this study, we observed a higher degree of corresponding cassette sharing between mature VLRF and VLRA compared to that between VLRF, VLRC, VLRD, and VLRE expressed in T-like lymphocytes. Notably, this cassette sharing is predominantly confined to cassettes located on chromosome 58. These findings imply the potential influence of differential selective pressures on cassette usage patterns among VLRs expressed in B-like and T-like lymphocytes, as well as among the distinct VLR genes expressed within T-like cell populations in lampreys. This could be driven both by sequence similarity and structural regulation at the level of chromosomal interactions. 
The observed differences in donor cassette utilization and the extent of cassette sharing among the anticipatory receptors in T-like lymphocytes may indicate their potential diversified roles in lamprey immunity. Our data also corroborate previous observations12 that genomic donor cassette usage in assembled VLRA, VLRC, VLRD, VLRE, and VLRF is not restricted to cis-configured cassettes (i.e., cassettes on the same chromosome), but often involves trans-configured cassettes (i.e., cassettes on different chromosomes). Our data provide compelling evidence for the occurrence of long-range intrachromosomal and interchromosomal cassette sharing, a phenomenon rarely observed in the antigen receptor assemblies of jawed vertebrates. This discovery underscores a unique feature of VLR gene assembly, indicating the presence of chromosomal mechanisms that extend beyond simple sequence homology. These mechanisms appear to facilitate the selective incorporation of donor cassettes into the VLRB assembly process within B-like lymphocytes, as well as into the assembly processes of VLRA, VLRC, VLRD, VLRE, and VLRF in T-like lymphocyte lineages. Although the underlying molecular mechanisms are not yet understood, accumulating evidence suggests that functional interactions occur between genomic regions situated either distantly on the same chromosome or across different chromosomes, especially for vertebrate microchromosomes as are present in the lamprey54,55,56,57,58.
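The cis/trans distinction used above reduces to comparing the chromosome that carries a germline VLR gene with the chromosome that carries each donor cassette it incorporates. A minimal sketch of that classification follows; the function names and the cassette list are hypothetical and serve only to illustrate the definition.

```python
def classify_cassette(germline_chrom, cassette_chrom):
    """Return 'cis' if the cassette lies on the germline gene's chromosome, else 'trans'."""
    return "cis" if germline_chrom == cassette_chrom else "trans"


def summarize_usage(germline_chrom, cassette_chroms):
    """Tally cis and trans donor cassettes for one assembled VLR sequence."""
    tally = {"cis": 0, "trans": 0}
    for chrom in cassette_chroms:
        tally[classify_cassette(germline_chrom, chrom)] += 1
    return tally


# Hypothetical example: a germline gene on chromosome 58 whose assembly
# drew two cassettes from chromosome 58 and one from chromosome 7.
usage = summarize_usage("chr58", ["chr58", "chr58", "chr7"])
print(usage)
```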
Sequence comparisons of VLRs across different lamprey species suggest that VLRF and VLRA genes share a common evolutionary origin (Fig. 4). This relationship is also supported by their similar structural features, including the cysteine configuration, the presence of an extended loop comprising the hypervariable region in the LRRCT domain and the existence of a transmembrane domain following the C-terminal stalk region. Their phylogenetic distribution indicates that the divergence from a shared ancestor is nonetheless ancient (~ 250 mya). Stable retention throughout the diversification of extant lamprey lineages indicates that VLRF plays an essential role in the VLR-based adaptive immune system. It is notable that among the donor cassettes encoding the LRRNT to LRRCT regions, the cassettes that encode hypervariable loops (see Fig. 3 and Supplementary Table 2), are frequently shared between VLRF and VLRA, but are not shared by any other VLRs. This unique pattern suggests that these genes, following a duplication event, have evolved under similar immunological pressures. However, the non-shared N-terminal and C-terminal coding regions of germline VLRA and VLRF are distinct, with the C-terminal regions, encoding the 3’ LRRCT and stalk, likely conferring specialized roles unique to VLRA or VLRF in lamprey immunity.
In conclusion, the identification of VLRF as an additional member of the VLR family highlights the complexity and diversification of T-like lymphocyte receptors in jawless vertebrates. The findings demonstrate that VLRF, along with other VLRs expressed in T-like cells, enhances the diversity of the lamprey immune repertoire through sophisticated genetic rearrangements involving LRR-encoding donor cassettes from both cis and trans configurations. These insights lay the groundwork for future studies aimed at elucidating the specialized functions of VLRF-expressing cells and their broader significance in the evolutionary emergence of T cell–mediated immunity in vertebrates.
Methods
Animals
Sea lamprey and European brook lamprey larvae (8–15 cm in length) were purchased from local suppliers and maintained in sand-lined aquariums at 18 °C. Parasitic adult lampreys (46–53 cm in length) were obtained from the US Geological Survey Hammond Bay Biological Station and housed in a 400-liter tank at 16 °C. Prior to all procedures, the animals were thoroughly anesthetized and sacrificed with buffered MS222 (Syndel) at a concentration of 1 g/L. All experiments were conducted in compliance with the relevant guidelines and regulations under approval from the Institutional Animal Care and Use Committee at Emory University, as well as the Review Committee of the Max Planck Institute.
Genome and transcriptome analyses
Similarity searches were conducted using BLASTN59 and TBLASTN60 against lamprey and hagfish genome and transcriptome datasets available through NCBI and SIMRbase (https://simrbase.stowers.org/). The analysis encompassed genomic sequences from six lamprey species (Petromyzon marinus, Lampetra planeri, Lethenteron japonicum, Entosphenus tridentatus, Geotria australis, Mordacia mordax) and three hagfish species (Eptatretus burgeri, Eptatretus atami, Myxine glutinosa), as detailed in Supplementary Table 3. To identify potential genomic donor cassettes, BLASTN searches were performed as described previously28,29 using the sea lamprey genome assembly (kPetMar1). To examine patterns of donor cassette usage and enable comparative analyses among lamprey VLRs, mature sequences were analyzed, comprising 100 VLRA (accession numbers KJ716237-KJ716286, EF094770-EF094819), 100 VLRB (EF094560-EF094628, EF464170-EF464202), 100 VLRC (KC244050-KC244109, KJ649525-KJ649564), 60 VLRD/VLRE (OQ595111-OQ595170), and 100 VLRF (PQ159753-PQ159852) sequences from the sea lamprey. For the analysis of hagfish VLRs, we used a dataset comprising mature sequences of 64 VLRA (KF314046-KF314109), 67 VLRB (AY965535-AY965571, AY965579-AY965612), and 100 VLRC (AY964809-AY964931) sequences from Eptatretus stoutii, all retrieved from the NCBI database.
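Accession ranges such as those listed above can be expanded into individual accession numbers before retrieval. The sketch below is one way to do this and assumes each range is contiguous and shares a single letter prefix; the helper name is hypothetical, and the code does not query NCBI.

```python
import re


def expand_accession_range(spec):
    """Expand a range such as 'PQ159753-PQ159852' into a list of accessions.

    Assumes a contiguous numeric range with an identical letter prefix;
    accepts a hyphen or an en dash, with optional surrounding spaces.
    """
    parts = re.split(r"\s*[-\u2013]\s*", spec.strip())
    if len(parts) != 2:
        raise ValueError(f"expected 'START-END', got {spec!r}")
    first = re.fullmatch(r"([A-Z]+)(\d+)", parts[0])
    last = re.fullmatch(r"([A-Z]+)(\d+)", parts[1])
    if not first or not last or first.group(1) != last.group(1):
        raise ValueError(f"unsupported accession range: {spec!r}")
    prefix, width = first.group(1), len(first.group(2))
    start, end = int(first.group(2)), int(last.group(2))
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    # Zero-pad to the original digit width so accessions keep their format.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


vlrf_accessions = expand_accession_range("PQ159753-PQ159852")
print(len(vlrf_accessions), vlrf_accessions[0], vlrf_accessions[-1])
```

The resulting list can be passed to any batch retrieval tool; ranges that are not fully contiguous would need to be curated by hand.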
Sequence alignment and phylogenetic analysis
Sequence alignments were performed using the CLUSTALW algorithm61, with additional manual inspection to ensure accuracy. Phylogenetic trees were generated using the neighbor-joining method implemented in MEGA software (version 11), employing the pairwise deletion option for gap treatment62,63. The robustness of the phylogenetic tree was evaluated through bootstrap analysis with 1000 replications, providing statistical support for branching patterns.
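Two ideas in this procedure can be illustrated compactly: pairwise deletion, in which gapped sites are ignored only for the pair of sequences being compared, and bootstrap resampling of alignment columns. The sketch below is not the MEGA implementation. It assumes a simple p-distance (the substitution model is not specified above) and uses a toy alignment with hypothetical sequence names.

```python
import random

GAP_CHARS = {"-", "?"}


def p_distance_pairwise_deletion(seq_a, seq_b):
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(seq_a, seq_b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment for illustration only.
toy = {"seq1": "ACDE-GHK", "seq2": "ACDEFGHR", "seq3": "AC-EFGQR"}
d12 = p_distance_pairwise_deletion(toy["seq1"], toy["seq2"])  # 1 difference / 7 sites
d13 = p_distance_pairwise_deletion(toy["seq1"], toy["seq3"])  # 2 differences / 6 sites
replicate = bootstrap_alignment(toy, random.Random(0))
print(d12, d13, replicate)
```

In a full analysis, a distance matrix would be computed for each of the 1000 bootstrap replicates and a neighbor-joining tree built from each; that step is omitted here.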
Transmembrane domain and 3D structure prediction
Transmembrane domains were predicted using the TMHMM64, HMMTOP65, and Phobius66 software, while leucine-rich repeat (LRR) domains were identified via the SMART sequence analysis tool with the addition of PFAM models and manual inspection. Three-dimensional structural predictions were performed using AlphaFold238,39. The resulting models were visualized using PyMOL software (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC), allowing for detailed structural analysis.
Quantitative real-time PCR
Tissues from adult and larval lamprey were harvested, and total RNA was extracted using a Qiagen RNeasy Mini Kit (Cat. No. 74104) with on-column DNA digestion. First-strand cDNA synthesis was performed using random hexamer primers with the Superscript IV Reverse Transcriptase kit (Invitrogen). RNA was subsequently digested with E. coli RNase H. Quantitative real-time PCR (qPCR) was conducted using SYBR Green on a 7900HT ABI Prism thermocycler (Applied Biosystems). Data were analyzed using one-way repeated measures ANOVA via GraphPad Prism software, with expression levels normalized to β-actin. All primer sequences used for this analysis are provided in Supplementary Table 1.
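Normalization to a reference gene is commonly expressed through the ΔCt transformation. The sketch below assumes that approach and an amplification efficiency of 100% (a doubling per cycle); the exact formula is not stated above, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene (e.g. beta-actin).

    Uses 2 ** -(Ct_target - Ct_reference), which assumes perfect doubling per cycle.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for illustration only.
ct_vlrf = 24.0
ct_actin = 18.0
rel = relative_expression(ct_vlrf, ct_actin)
print(rel)  # 2 ** -6
```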
Flow cytometry
White blood cells were isolated from sea lamprey blood for analysis by immunofluorescence flow cytometry following the protocol described previously33,35. Briefly, Percoll-separated leukocytes from lamprey blood were stained with specific antibodies, including mouse anti-VLRB monoclonal antibody (4C4), mouse anti-VLRC monoclonal antibody (3A5), and mouse anti-VLRA monoclonal antibody (9A3). Cells were gated based on forward scatter-A (FSC-A) vs. side scatter-A (SSC-A) for lymphocyte populations, FSC-A vs. FSC-H for singlets, and exclusion of dead cells using negative LIVE/DEAD Aqua (Invitrogen) staining. Flow cytometric analyses were performed using a MACSQuant Analyzer (Miltenyi Biotec), and VLRA+, VLRB+, VLRC+, VLR triple-negative (TN) cells were sorted using a BD FACS Aria II (BD Biosciences) for subsequent PCR analysis. The purity of sorted populations exceeded 90%.
Single cell suspensions of transfected HEK 293 cells were stained with anti-HA antibodies conjugated to Alexa488 (clone 16B12) (Biolegend) and LIVE/DEAD Aqua (Invitrogen) viability dye. Flow cytometric analyses were performed using a MACSQuant Analyzer (Miltenyi Biotec) and FlowJo analysis software (BD).
Immunoblotting
Two VLRF cDNA constructs, excluding the native lamprey signal peptide, were cloned into a mammalian expression vector encoding a human signal peptide and an N-terminal hemagglutinin (HA) tag. One construct includes three LRRV domains, while the other has four. Transfected cells were selected using G-418 antibiotic. Whole cell lysates from transfected HEK 293 cells were prepared by adding 4x Laemmli sample buffer, followed by ultracentrifugation at 100,000 rpm to pellet DNA. The lysates were reduced with dithiothreitol (DTT) and subsequently heat-denatured. Protein samples were separated via SDS-PAGE using 4–16% gradient Bis-Tris gels (Invitrogen) and transferred onto polyvinylidene fluoride (PVDF) membranes (GE Healthcare). The membranes were blocked with 3% bovine serum albumin in Tris-buffered saline containing 0.1% Tween-20 (TBS-T), and incubated overnight at 4 °C with anti-HA antibodies (Clone 16B12). After washing, membranes were treated with goat anti-mouse IgG horseradish peroxidase (HRP)-conjugated secondary antibodies (Southern Biotech), diluted 1:5000, for detection. Unless otherwise specified, all antibodies were diluted 1:1000 in blocking buffer. Protein bands were visualized using a chemiluminescent substrate (Amersham) and imaged with a ChemiDoc system (Bio-Rad).
Fluorescence in situ hybridization (FISH)
FISH was conducted using the RNAscope Multiplex Fluorescent V2 Assay kit (Advanced Cell Diagnostics, ACD) according to the manufacturer’s protocol. Custom-designed probes specific for VLRA and VLRF were obtained from ACD. Larval and adult lamprey tissues were embedded in optimal cutting temperature (OCT) compound, cryosectioned, and fixed in 4% paraformaldehyde (PFA) in PBS at 4 °C for 1 h. Sections were rinsed twice with PBS and progressively dehydrated using increasing concentrations of ethanol. Pre-treatment involved incubation with hydrogen peroxide and Protease Plus (ACD) for 10 min each at room temperature. Hybridization with the VLRA, VLRF and CDA1 probes was performed at 40 °C for 2 h. Signal amplification was achieved through sequential incubations with amplification reagents. Signals were detected using TSA Vivid Fluorophore 520 (CDA1; 1:6000), TSA Vivid Fluorophore 570 (VLRF; 1:3000) and TSA Vivid Fluorophore 650 (VLRA; 1:3000) (ACD). Sections were counterstained with DAPI and mounted using ProLong Gold Antifade Reagent (ThermoFisher). Imaging was performed using a Leica SP8 MP upright confocal multiphoton microscope or a BioTek Lionheart FX widefield microscope.
Single-cell RNA sequencing
Single-cell RNA sequencing (scRNA-seq) was performed on white blood cells from a single sea lamprey ammocoete larva. Blood was collected in 0.67x PBS, following previously described protocols33,35, and WBCs were isolated using a 55% Percoll cushion. Gene expression libraries were prepared using the 10x Genomics Chromium Controller, targeting 10,000 cells, and sequenced to an average depth of 50,000 paired reads per cell on a NovaSeq6000 platform. Gene expression matrices were generated using the Cell Ranger v8.0.0 pipeline, with reads aligned to the sea lamprey reference genome (kPetMar1.pri), annotation release 100 (GCF_010993605.1), with the addition of custom-edited VLR germline annotations described below. A total of 10,612 cells were identified in this analysis. Data analysis and visualization were conducted in R version 4.4.1, utilizing the Seurat package (version 5.1.0)67,68. A Seurat object was generated from the filtered_gene_bc_matrices, and counts were normalized with SCTransform69. Dimensions 1–30 (dims = 1:30) were used for FindNeighbors and RunUMAP.
VLRB and VLRC genes were mapped using NCBI annotations for the untranslatable germline gene transcripts (LOC116944516 and LOC116939838, respectively). VLRA, VLRD, VLRE, and VLRF were mapped using custom-annotated models: VLRA (NC_046126.1, 816799-817022, forward), VLRD (NC_046143.1, 1839633-1839755, forward), VLRE (NC_046143.1, 1876494-1876725, forward), and VLRF (NC_046126.1, 1613417-1614240 + 1620121-1620262, reverse complement).
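A model written as joined intervals on the reverse strand, as for VLRF above, can be turned into a transcript-sense sequence by concatenating the intervals in ascending order and reverse-complementing the result. The sketch below assumes 1-based inclusive coordinates and uses a toy sequence in place of the chromosome; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_model(chrom_seq, intervals, strand):
    """Join 1-based inclusive intervals; reverse-complement if strand is '-'."""
    joined = "".join(chrom_seq[start - 1:end] for start, end in sorted(intervals))
    if strand == "-":
        return reverse_complement(joined)
    if strand == "+":
        return joined
    raise ValueError("strand must be '+' or '-'")


# Intervals of the custom VLRF model given in the text (NC_046126.1).
VLRF_INTERVALS = [(1613417, 1614240), (1620121, 1620262)]
vlrf_model_length = sum(end - start + 1 for start, end in VLRF_INTERVALS)

# Toy sequence standing in for a chromosome, for illustration only.
toy_chrom = "AAACCCGGGTTT"
toy_model = extract_model(toy_chrom, [(1, 3), (7, 9)], "-")
print(vlrf_model_length, toy_model)
```

Under the stated coordinate convention, the two VLRF intervals span 824 and 142 nucleotides.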
To identify cells associated with assembled VLR sequences, we searched the raw 389,328,350 scRNA-seq read pairs with BLASTN using as queries portions of VLRA, VLRB, VLRC and VLRF sequences that are invariant between germline and assembled VLRs. Sequence matches were filtered to include those in the expected orientations (forward for R1 and reverse complement for R2), with matches of ≥ 30 nucleotides at ≥ 90% nucleotide identity. For R1 reads, retained matches were restricted to nucleotides 40–101 to exclude overlaps with the cell barcode, UMI, and template switch oligo (TSO) sequences, or artifactual R1 reads missing these regions. Matches were further restricted to the invariant coding regions 5′ of the intervening sequence for R1 reads and 3′ of the intervening region for R2 reads. Paired R2 reads corresponding to R1 matches and paired R1 reads corresponding to R2 matches were then retrieved. Sequence pairs were further filtered to isolate sequences with R1 cell barcodes matching those used in the Seurat analysis on which the data was overlaid. For R1 BLASTN matches, we identified R2 sequences that did not match or only partially matched the germline VLR gene. Similarly, for R2 matches, we identified paired R1 sequences (nucleotides 40–101) that did not match or only partially matched the germline VLR gene. These sequences were found to correspond to assembled central regions of the VLR coding sequence. Cell barcodes corresponding to these assembled sequence pairs were then mapped to the UMAP plot using DimPlot and cell highlighting.
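The first filtering steps described above (orientation, match length, identity, and the R1 coordinate window) can be sketched as a filter over BLASTN tabular hits. The sketch assumes the standard 12-column tabular output with reads as subjects; the query names, read identifiers, and helper functions are hypothetical. It does not model the restriction to regions 5′ or 3′ of the intervening sequence, nor the retrieval of mates.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    read_id: str
    pident: float
    length: int
    sstart: int
    send: int


def parse_tabular_line(line):
    """Parse one line of 12-column BLAST tabular output (subject = read)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[8]), int(f[9]))


def keep_hit(hit, mate, min_length=30, min_identity=90.0, r1_window=(40, 101)):
    """Apply the length, identity, orientation, and R1-window criteria."""
    if hit.length < min_length or hit.pident < min_identity:
        return False
    forward = hit.sstart <= hit.send  # subject coordinates ascend on plus-strand hits
    if mate == "R1":
        # Forward orientation, fully inside the window that excludes
        # the cell barcode, UMI, and template switch oligo.
        return forward and r1_window[0] <= hit.sstart and hit.send <= r1_window[1]
    if mate == "R2":
        return not forward  # reverse-complement orientation expected for R2
    raise ValueError("mate must be 'R1' or 'R2'")


# Hypothetical hits for illustration only.
r1_ok = parse_tabular_line("VLRF_5p\tread1\t98.0\t45\t0\t0\t1\t45\t50\t94\t1e-15\t80.0")
r1_early = parse_tabular_line("VLRF_5p\tread2\t98.0\t45\t0\t0\t1\t45\t10\t54\t1e-15\t80.0")
r2_ok = parse_tabular_line("VLRF_3p\tread3\t95.0\t41\t2\t0\t1\t41\t80\t40\t1e-12\t70.0")
r2_weak = parse_tabular_line("VLRF_3p\tread4\t85.0\t41\t6\t0\t1\t41\t80\t40\t1e-05\t40.0")

kept = [keep_hit(r1_ok, "R1"), keep_hit(r1_early, "R1"),
        keep_hit(r2_ok, "R2"), keep_hit(r2_weak, "R2")]
print(kept)
```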
Statistics and reproducibility
Statistical analyses were performed using GraphPad Prism (v10.2.2) and R (v4.4.1). The number of independent biological replicates and technical details are provided in the corresponding figure legends. Experiments were repeated at least twice to ensure reproducibility.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data supporting the findings of this study are provided within the main text and Supplementary Information. The VLRF sequences generated in this study from sea lampreys (GenBank accession numbers PQ159753–PQ159852) and European brook lampreys (GenBank accession numbers PQ352468–PQ352477) are deposited in the NCBI GenBank database. The scRNA-seq data are deposited in the NCBI Sequence Read Archive (SRA) database (accession number PRJNA1188601). Source data are provided with this paper.
References
Holland, N. D. & Chen, J. Origin and early evolution of the vertebrates: new insights from advances in molecular biology, anatomy, and palaeontology. Bioessays 23, 142–151 (2001).
Janvier, P. Early jawless vertebrates and cyclostome origins. Zool. Sci. 25, 1045–1056 (2008).
Kuratani, S. Evo-devo studies of cyclostomes and the origin and evolution of jawed vertebrates. Curr. Top. Dev. Biol. 141, 207–239 (2021).
Marletaz, F. et al. The hagfish genome and the evolution of vertebrates. Nature 627, 811–820 (2024).
Timoshevskaya, N. et al. An improved germline genome assembly for the sea lamprey Petromyzon marinus illuminates the evolution of germline-specific chromosomes. Cell Rep. 42, 112263 (2023).
Yu, D. et al. Hagfish genome elucidates vertebrate whole-genome duplication events and their evolutionary consequences. Nat. Ecol. Evol. 8, 519–535 (2024).
Boehm, T. Understanding vertebrate immunity through comparative immunology. Nat. Rev. Immunol. 25, 141–152 (2024).
Cooper, M. D. & Alder, M. N. The evolution of adaptive immune systems. Cell 124, 815–822 (2006).
Flajnik, M. F. & Kasahara, M. Origin and evolution of the adaptive immune system: genetic events and selective pressures. Nat. Rev. Genet. 11, 47–59 (2010).
Litman, G. W., Cannon, J. P. & Dishaw, L. J. Reconstructing immune phylogeny: new perspectives. Nat. Rev. Immunol. 5, 866–879 (2005).
Litman, G. W., Cannon, J. P. & Rast, J. P. New insights into alternative mechanisms of immune receptor diversification. Adv. Immunol. 87, 209–236 (2005).
Das, S. et al. Evolution of two distinct variable lymphocyte receptors in lampreys: VLRD and VLRE. Cell Rep. 42, 112933 (2023).
Han, B. W., Herrin, B. R., Cooper, M. D. & Wilson, I. A. Antigen recognition by variable lymphocyte receptors. Science 321, 1834–1837 (2008).
Kasamatsu, J. et al. Identification of a third variable lymphocyte receptor in the lamprey. Proc. Natl. Acad. Sci. USA 107, 14304–14308 (2010).
Kishishita, N. et al. Regulation of antigen-receptor gene assembly in hagfish. EMBO Rep. 11, 126–132 (2010).
Li, J., Das, S., Herrin, B. R., Hirano, M. & Cooper, M. D. Definition of a third VLR gene in hagfish. Proc. Natl. Acad. Sci. USA 110, 15013–15018 (2013).
Nagawa, F. et al. Antigen-receptor genes of the agnathan lamprey are assembled by a process involving copy choice. Nat. Immunol. 8, 206–213 (2007).
Pancer, Z. et al. Somatic diversification of variable lymphocyte receptors in the agnathan sea lamprey. Nature 430, 174–180 (2004).
Pancer, Z. et al. Variable lymphocyte receptors in hagfish. Proc. Natl. Acad. Sci. USA 102, 9224–9229 (2005).
Rogozin, I. B. et al. Evolution and diversification of lamprey antigen receptors: evidence for involvement of an AID-APOBEC family cytosine deaminase. Nat. Immunol. 8, 647–656 (2007).
Collins, B. C. et al. Crystal structure of an anti-idiotype variable lymphocyte receptor. Acta Crystallogr. F Struct. Biol. Commun. 73, 682–687 (2017).
Deng, L. et al. A structural basis for antigen recognition by the T cell-like lymphocytes of sea lamprey. Proc. Natl. Acad. Sci. USA 107, 13408–13413 (2010).
Herrin, B. R. et al. Structure and specificity of lamprey monoclonal antibodies. Proc. Natl. Acad. Sci. USA 105, 2040–2045 (2008).
Holland, S. J. et al. Selection of the lamprey VLRC antigen receptor repertoire. Proc. Natl. Acad. Sci. USA 111, 14834–14839 (2014).
Kanda, R. et al. Crystal structure of the lamprey variable lymphocyte receptor C reveals an unusual feature in its N-terminal capping module. PLoS ONE 9, e85875 (2014).
Alder, M. N. et al. Antibody responses of variable lymphocyte receptors in the lamprey. Nat. Immunol. 9, 319–327 (2008).
Alder, M. N. et al. Diversity and function of adaptive immune receptors in a jawless vertebrate. Science 310, 1970–1973 (2005).
Das, S. et al. Organization of lamprey variable lymphocyte receptor C locus and repertoire development. Proc. Natl. Acad. Sci. USA 110, 6043–6048 (2013).
Das, S. et al. Genomic donor cassette sharing during VLRA and VLRC assembly in jawless vertebrates. Proc. Natl. Acad. Sci. USA 111, 14828–14833 (2014).
Bajoghli, B. et al. A thymus candidate in lampreys. Nature 470, 90–94 (2011).
Morimoto, R. et al. Cytidine deaminase 2 is required for VLRB antibody gene assembly in lampreys. Sci. Immunol. 5, https://doi.org/10.1126/sciimmunol.aba0925 (2020).
Das, S. et al. Evolution of two prototypic T cell lineages. Cell Immunol. 296, 87–94 (2015).
Guo, P. et al. Dual nature of the adaptive immune system in lampreys. Nature 459, 796–801 (2009).
Hirano, M., Das, S., Guo, P. & Cooper, M. D. The evolution of adaptive immunity in vertebrates. Adv. Immunol. 109, 125–157 (2011).
Hirano, M. et al. Evolutionary implications of a third lymphocyte lineage in lampreys. Nature 501, 435–438 (2013).
Huang, Y. et al. Discovery of an unconventional lamprey lymphocyte lineage highlights divergent features in vertebrate adaptive immune system evolution. Nat. Commun. 15, 7626 (2024).
Kajava, A. V. Structural diversity of leucine-rich repeat proteins. J. Mol. Biol. 277, 519–527 (1998).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Kim, H. M. et al. Structural diversity of the hagfish variable lymphocyte receptors. J. Biol. Chem. 282, 6726–6732 (2007).
Kirchdoerfer, R. N. et al. Variable lymphocyte receptor recognition of the immunodominant glycoprotein of Bacillus anthracis spores. Structure 20, 479–486 (2012).
Velikovsky, C. A. et al. Structure of a lamprey variable lymphocyte receptor in complex with a protein antigen. Nat. Struct. Mol. Biol. 16, 725–730 (2009).
Das, S. et al. Evolution of variable lymphocyte receptor B antibody loci in jawless vertebrates. Proc. Natl. Acad. Sci. USA 118, https://doi.org/10.1073/pnas.2116522118 (2021).
Smith, J. J. et al. The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution. Nat. Genet. 50, 270–277 (2018).
Kuraku, S. & Kuratani, S. Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences. Zool. Sci. 23, 1053–1064 (2006).
Brownstein, C. D. & Near, T. J. Phylogenetics and the Cenozoic radiation of lampreys. Curr. Biol. 33, 397–404.e3 (2023).
Amemiya, C. T., Saha, N. R. & Zapata, A. Evolution and development of immunological structures in the lamprey. Curr. Opin. Immunol. 19, 535–541 (2007).
Boehm, T. & Bleul, C. C. The evolutionary history of lymphoid organs. Nat. Immunol. 8, 131–135 (2007).
Ardavin, C. F. & Zapata, A. Ultrastructure and changes during metamorphosis of the lympho-hemopoietic tissue of the larval anadromous sea lamprey Petromyzon marinus. Dev. Comp. Immunol. 11, 79–93 (1987).
Zapata, A. G. et al. The relevance of cell microenvironments for the appearance of lympho-haemopoietic tissues in primitive vertebrates. Histol. Histopathol. 10, 761–778 (1995).
Davis, T. L. et al. Autoregulation by the juxtamembrane region of the human ephrin receptor tyrosine kinase A3 (EphA3). Structure 16, 873–884 (2008).
Kornilov, F. D. et al. The architecture of transmembrane and cytoplasmic juxtamembrane regions of Toll-like receptors. Nat. Commun. 14, 1503 (2023).
Lemmon, M. A. & Schlessinger, J. Cell signaling by receptor tyrosine kinases. Cell 141, 1117–1134 (2010).
Baer, R., Chen, K. C., Smith, S. D. & Rabbitts, T. H. Fusion of an immunoglobulin variable gene and a T cell receptor constant gene in the chromosome 14 inversion associated with T cell tumors. Cell 43, 705–713 (1985).
Lipkowitz, S., Stern, M. H. & Kirsch, I. R. Hybrid T cell receptor genes formed by interlocus recombination in normal and ataxia-telangiectasia lymphocytes. J. Exp. Med. 172, 409–418 (1990).
Lomvardas, S. et al. Interchromosomal interactions and olfactory receptor choice. Cell 126, 403–413 (2006).
Mielczarek, O. et al. Intra- and interchromosomal contact mapping reveals the Igh locus has extensive conformational heterogeneity and interacts with B-lineage genes. Cell Rep. 42, 113074 (2023).
Waters, P. D. et al. Microchromosomes are building blocks of bird, reptile, and mammal chromosomes. Proc. Natl. Acad. Sci. USA 118, https://doi.org/10.1073/pnas.2112494118 (2021).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Gertz, E. M., Yu, Y. K., Agarwala, R., Schaffer, A. A. & Altschul, S. F. Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol. 4, 41 (2006).
Thompson, J. D., Gibson, T. J. & Higgins, D. G. Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinform. https://doi.org/10.1002/0471250953.bi0203s00 (2002).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
Tusnady, G. E. & Simon, I. The HMMTOP transmembrane topology prediction server. Bioinformatics 17, 849–850 (2001).
Kall, L., Krogh, A. & Sonnhammer, E. L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036 (2004).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Choudhary, S. & Satija, R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biol. 23, 27 (2022).
Acknowledgements
This study was supported by National Institutes of Health Grants R01AI072435, R35GM122591, and GM108838, the Georgia Research Alliance, National Science Foundation Grant IOS-1755418, Emory University Research Committee Grant 00137492, the Max Planck Society, and the European Research Council (ERC) under the European Union’s Seventh Framework Program (FP7/2007-2013), ERC grant agreement 323126. We thank Mr. Evan Dessasau and Ms. Julia Rose from the ENPRC Histology Core, Dr. Deepa Kodandera from the Molecular Pathology Core, and the Emory University Integrated Cellular Imaging Core for their assistance with confocal imaging, as well as R.E. Karaffa II and K. Fife from the Emory University School of Medicine Flow Cytometry Core for help with cell sorting. We are also grateful to the Stowers Institute for Medical Research for providing access to the genomic and transcriptomic data of multiple lamprey species from SIMRbase. Single-cell sequencing services were provided by the Emory NPRC Genomics Core, which is supported in part by NIH P51 OD011132.
Author information
Contributions
S.D., F.F-I., J.P.R., M.D.C., and T.B. contributed to the experimental design. S.D., F.F-I., J.P.R., M.H., R.M., B.B.A., Y.W., W.L., T.B., and M.D.C. performed the experiments, supervised the study, and/or analyzed the data. S.D., J.P.R., F.F-I., T.B., and M.D.C. wrote the manuscript. All authors reviewed the manuscript and provided feedback.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Guobing Chen and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Das, S., Fontenla-Iglesias, F., Hirano, M. et al. Variable lymphocyte receptor F is generated via somatic diversification and expressed by lamprey T-like cells. Nat Commun 16, 6503 (2025). https://doi.org/10.1038/s41467-025-61187-1