Fig. 1: Overview of COVID-19 data compendium stored in VDJdb.
From: VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2

a, General pipeline used to acquire and store COVID-19 TCR specificity data. SARS-CoV-2 epitopes of interest are selected and used to construct MHC multimers, which are in turn used to enrich and select T cells specific to a given epitope; those T cells are then subjected to a conventional TCR repertoire sequencing procedure (part 1). The data on TCR sequences and their cognate epitopes is acquired independently by proficient laboratories around the globe; pie chart sizes reflect the number of TCR specificity records, with chart colors representing distinct epitopes (part 2). Data is processed, curated and stored in VDJdb, which provides means to browse the COVID-19 compendium and annotate novel TCR sequences of unknown specificity (part 3). Maps are adapted (see https://github.com/antigenomics/vdjdb-db/blob/master/summary/vdjdb_summary.Rmd for code) from the open-source R package “maps”, released under the GPL-2 license (https://CRAN.R-project.org/package=maps), copyright 2015–2022 VDJdb Developers and reproduced with permission of VDJdb Developers.

b, Numbers of TCR specificity records for SARS-CoV-2 epitopes presented by various HLAs. Correspondence is shown using an alluvial plot with bands colored by epitope. Epitopes are coded by their first three letters; only epitopes with ≥10 records are shown; band widths represent the log-scaled number of records.

c, Comparison of TCR repertoires specific for the HLA-A*02-restricted YLQ epitope from SARS-CoV-2 obtained by different laboratories, shown as a sequence similarity map in which each dot represents a unique CDR3 sequence (top). Dot locations are based on CDR3 sequence similarity graphs generated using the TCRNET algorithm (see Supplementary Methods). Each dot is colored according to the parental dataset (key). Large red dots represent CDR3 sequences that were identified in multiple datasets. Left, TCR α chains; right, TCR β chains. Labels highlight TCRs that were successfully used to refold TCR–peptide–MHC complexes (ref. 6). Sequence motif logos for clusters from the similarity map are shown below. Two recurring motifs for each chain (CVVNXXDKIIF and CVVNXXDDMRF for TCRα; CAS-NTGELFF and CASSXDIEAFF for TCRβ) were shared among datasets (“Multi-lab” means shared across all laboratories).
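
The record summary behind panel b can be illustrated with a minimal Python sketch. It is not the authors' code (the linked vdjdb_summary.Rmd is written in R). The input records, the `summarize` helper, the `min_records` parameter and the choice of base-10 logarithm are all assumptions made for illustration; the caption states only that band widths are log-scaled.

```python
import math
from collections import Counter

# Hypothetical (epitope, HLA) records for illustration; not real VDJdb data.
records = (
    [("YLQPRTFLL", "HLA-A*02")] * 120
    + [("LTDEMIAQY", "HLA-A*01")] * 15
    + [("KCYGVSPTK", "HLA-A*03")] * 4
)

def summarize(records, min_records=10):
    """Count records per (epitope, HLA) pair and derive plot attributes.

    Epitopes with fewer than `min_records` records in total are dropped,
    each epitope is coded by its first three letters, and the band width
    is the log10 of the record count (the base is an assumption).
    """
    pair_counts = Counter(records)

    # Total number of records per epitope, summed over all HLAs.
    epitope_totals = Counter()
    for (epitope, _hla), n in pair_counts.items():
        epitope_totals[epitope] += n

    rows = []
    for (epitope, hla), n in sorted(pair_counts.items()):
        if epitope_totals[epitope] < min_records:
            continue  # below the display threshold
        rows.append({
            "code": epitope[:3],
            "hla": hla,
            "records": n,
            "band_width": math.log10(n),
        })
    return rows

for row in summarize(records):
    print(row)
```

With a log scale, an epitope with 120 records gets a band roughly twice as wide as one with 15 records rather than eight times as wide, which keeps rare epitopes visible next to dominant ones.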