Fig. 2: pmIG reference database introduces erroneous mutations. | Genes & Immunity

From: Commentary on Population matched (pm) germline allelic variants of immunoglobulin (IG) loci: relevance in infectious diseases and vaccination studies in human populations

Repertoire analysis was performed on a naïve B-cell cohort (not expected to carry somatic mutations) of 98 individuals reported by Gidoni et al. (SRA: PRJEB26509) [6]. The repertoires were sequenced using the 5’RACE protocol, and pre-processing was done as described in Gidoni et al. [6]. For the downstream analysis, repertoires were first aligned with the IMGT reference (March 29, 2021) using IgBLAST version 1.16.0. Non-functional sequences were removed, as were functional sequences that were not full-length, could not be assigned unambiguously to a single V-gene, or were assigned to a V-gene not present in the pmIG database. The remaining 70% of functional sequences were then aligned with the pmIG reference database (downloaded from https://pmtrig.lumc.nl/ on July 15th, 2021), and the repertoires were compared.

A For each repertoire, the mean mutation count was calculated using each reference database. Each dot represents the mean mutation count of one repertoire, and each boxplot represents the variation within the cohort for each of the reference databases.

B.1 Each dot is the median of the mean individual mutation frequency per gene. The X axis is based on the IMGT reference and the Y axis on the pmIG reference. Red labels mark genes with a duplicated copy in the chromosome (e.g., IGHV1-69/IGHV1-69D).

B.2 Each dot represents an individual with IGHV4-4*02 in their IMGT data. These calls were matched via sequence IDs to calls in the corresponding pmIG datasets, and different gene annotations are shown in different colors. Where multiple allele calls were made, the calls are separated in the legend by vertical bars. The X axis shows the annotations in the two datasets. The Y axis shows the mean mutation counts for the sequences assigned to the IGHV4-4*02 allele and for their matching calls in the pmIG dataset.

C The count of alleles represented in the cohort for each of the reference databases.
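
The filtering and summary steps described in this legend can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`subject`, `v_call`, `functional`, `full_length`, and the per-reference mutation columns) are hypothetical names, and treating "unambiguous" as "all comma-separated calls resolve to one gene" is an assumption.

```python
from collections import defaultdict
from statistics import mean, median


def gene_of(allele_call):
    """Strip the allele suffix, e.g. 'IGHV4-4*02' -> 'IGHV4-4'."""
    return allele_call.split("*")[0]


def keep_sequence(rec, pmig_genes):
    """Keep functional, full-length sequences whose V call resolves to a
    single gene that is also present in the pmIG gene set.
    Assumption: ambiguous calls are comma-separated in rec['v_call']."""
    genes = {gene_of(call) for call in rec["v_call"].split(",")}
    return (
        rec["functional"]
        and rec["full_length"]
        and len(genes) == 1
        and genes <= pmig_genes
    )


def mean_per_repertoire(records, field):
    """Panel A style summary: one mean value per individual."""
    per_subject = defaultdict(list)
    for rec in records:
        per_subject[rec["subject"]].append(rec[field])
    return {subject: mean(values) for subject, values in per_subject.items()}


def median_of_means_per_gene(records, field):
    """Panel B.1 style summary: per gene, the median across individuals
    of each individual's mean value."""
    per_gene = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # Assumes records were already filtered, so v_call names one gene.
        gene = gene_of(rec["v_call"].split(",")[0])
        per_gene[gene][rec["subject"]].append(rec[field])
    return {
        gene: median(mean(values) for values in subjects.values())
        for gene, subjects in per_gene.items()
    }
```

Running the two summary functions once with the IMGT-based column and once with the pmIG-based column gives the paired values that panels A and B.1 compare.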
