Fig. 5: Integration site gene hotspots across tissues.
From: Early pandemic HIV-1 integration site preferences differ across anatomical sites

a UpSet plot displaying the intersection of gene sets hosting HIV-1 integration sites located within 100 bp of each other (“gene hotspots”) across different tissues. The horizontal bars represent the total number of gene hotspots identified in each tissue, while the vertical bars indicate the number of gene hotspots in each intersection of tissue-specific sets. Black dots and connecting lines denote specific intersections, with single dots representing hotspots unique to one tissue and multiple connected dots indicating hotspots shared across tissues. The dataset was filtered to retain only genes with integration sites meeting the 100 bp proximity threshold, and intersection sizes reflect the degree of conservation or tissue specificity of integration patterns. b Tissue-specific gene hotspots hosting five or more integration sites per gene are listed. These genes were not targeted in any other tissue. c All hotspot genes hosting ten or more integration sites (“gene super-hotspots”) were filtered for each tissue and compared with each other to identify genes highly targeted in four or more tissues. The ribbons emerging from each tissue in the Circos plot connect to the genes shared by the tissues. d–h Box-and-whisker plots display the distribution of gene expression levels (transcripts per million, TPM) for two groups of genes: those hosting six or more HIV-1 integration sites (high-frequency group) and those hosting 1–5 integration sites (low-frequency group). Gene expression data were obtained from RNA-seq datasets available in the Human Protein Atlas for the brain, colon, duodenum, esophagus, and stomach. The central line within each box represents the median expression level, the box spans the interquartile range (IQR), and whiskers extend to the minimum and maximum.
Statistical significance was assessed using a Mann–Whitney U test (Wilcoxon rank-sum test) to determine whether gene expression levels differ significantly between the two integration frequency groups in each tissue.
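
The comparison in panels d–h can be illustrated with a minimal sketch, not the authors' pipeline. It assumes per-gene integration site counts and TPM values are already available as dictionaries. The function names, the two-sided alternative, and the normal approximation with tie correction (no continuity correction) are all assumptions made for illustration, and the gene names and values in the usage lines are hypothetical.

```python
from math import erfc, sqrt


def split_by_site_count(site_counts, tpm, threshold=6):
    """Return (high, low) TPM lists: genes with >= threshold sites vs 1..threshold-1 sites."""
    high, low = [], []
    for gene, n_sites in site_counts.items():
        if n_sites < 1 or gene not in tpm:
            continue  # skip genes without integration sites or without expression data
        (high if n_sites >= threshold else low).append(tpm[gene])
    return high, low


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U counts the (x, y) pairs with x > y, ties counting 0.5.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is one run of tied values
        t = j - i
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t**3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the standard normal
    return u, p


# Hypothetical example data for one tissue (not taken from the study).
site_counts = {"GENE_A": 8, "GENE_B": 6, "GENE_C": 7, "GENE_D": 2, "GENE_E": 1, "GENE_F": 3}
tpm = {"GENE_A": 40.0, "GENE_B": 55.5, "GENE_C": 32.1, "GENE_D": 4.2, "GENE_E": 9.8, "GENE_F": 1.5}
high, low = split_by_site_count(site_counts, tpm)
u_stat, p_value = mann_whitney_u(high, low)
```

For small groups an exact test is generally preferable to the normal approximation; statistical packages typically offer both.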