Early pandemic HIV-1 integration site preferences differ across anatomical sites

Kohio, Hinissan P.; Ajoge, Hannah O.; Barua, Emile A.; Vajaria, Neel R.; Wu, Isaac K. F.; Coleman, Macon D.; Tom, Sean K.; van der Meer, Frank; Gill, John; Church, Deirdre; Beck, Paul; Power, Christopher; van Marle, Guido; Barr, Stephen D.

doi:10.1038/s43856-025-01146-5

Article
Open access
Published: 06 October 2025

Hinissan P. Kohio¹^na1,
Hannah O. Ajoge¹^na1,
Emile A. Barua¹,
Neel R. Vajaria¹,
Isaac K. F. Wu ORCID: orcid.org/0009-0009-8312-3121¹,
Macon D. Coleman¹,
Sean K. Tom¹,
Frank van der Meer ORCID: orcid.org/0000-0003-3785-9639^2,3,
John Gill^3,4,5,
Deirdre Church^3,4,5,
Paul Beck^3,5,
Christopher Power⁶,
Guido van Marle ORCID: orcid.org/0000-0002-5148-6229^2,3,4 &
…
Stephen D. Barr ORCID: orcid.org/0000-0002-0028-7430¹

Communications Medicine volume 5, Article number: 405 (2025)

Abstract

Background

HIV-1 persists in the body even when treatment suppresses viral replication. This persistence is due in part to the virus integrating into the DNA of infected cells. While it is known that HIV-1 can integrate into many different tissues, it remains unclear whether integration patterns differ across anatomical sites. This study investigated how the location and characteristics of HIV-1 integration sites vary across distinct tissues in people living with HIV-1 subtype B during the early years of the pandemic, before modern treatment was widely available.

Methods

Integration site data were obtained from matched samples from the esophagus, blood, stomach, duodenum, and colon, and from unmatched brain tissue. We evaluated how frequently the virus integrated near different genomic features, including gene regions, repetitive elements, and predicted DNA structures, and compared integration patterns across tissues and individuals.

Results

We show that integration site patterns differ by tissue. In brain tissue, HIV integrates less frequently into genes and more frequently into specific repetitive elements and accessible regions of DNA. We also find that integration near unusual DNA shapes varies by tissue, and that certain integration hotspots are shared while others are unique. Genes involved in HIV-1-related diseases are frequently targeted across tissues.

Conclusions

This study reveals that HIV-1 integration patterns are shaped by the tissue environment. These findings suggest that the long-term persistence of HIV-1 depends in part on tissue-specific integration site features, with potential implications for disease risk and treatment strategies.

Plain Language Summary

HIV can stay in the body for life by hiding inside the DNA of infected cells. This makes it hard to completely remove the virus, even with strong treatment. In this study, researchers explored whether HIV hides in different ways depending on where in the body the infected cells are found. They examined tissues from several parts of the body from people living with HIV before modern treatments were available. The study found that HIV inserts its genetic material in different spots depending on the tissue. For example, in the brain, the virus avoids genes and hides in less active parts of the DNA. These findings may help scientists understand why HIV acts differently in different areas of the body, which could improve future treatments.

Introduction

The widespread use of antiretroviral therapy (ART) has improved the quality of life for individuals living with HIV-1. While ART effectively suppresses plasma viremia to undetectable levels, it is not curative due to the persistence of HIV-1 reservoirs—cells infected with replication-competent HIV-1 that persist despite long-term therapy and can cause viral rebound if treatment is interrupted^1,2,3. A major component of this reservoir is the HIV-1 latent reservoir, defined as quiescent cells carrying an integrated copy of the HIV-1 provirus that does not express viral transcripts or proteins. These latent reservoirs are found in various cell types and anatomical sites where replication-competent virus can persist long after active viral replication has been controlled^4,5,6. During early infection, HIV-1 preferentially integrates into transcriptionally active regions of the genome to maximize proviral expression and viral dissemination. However, this strategy comes with risks: the production of viral proteins can trigger immune-mediated cell clearance or virus-induced cytopathic effects. To evade immune detection and promote long-term persistence, HIV-1 often integrates into transcriptionally inactive regions of the host genome, limiting viral gene expression and facilitating immune evasion. These transcriptionally repressive genomic regions are characterized by distinct features. For instance, lamina-associated domains (LADs) create a chromatin environment that is tightly linked to the nuclear periphery, suppressing transcription^7,8. Short interspersed nuclear elements (SINEs) and other transposable elements contribute to gene silencing through repressive chromatin marks^9,10. Additionally, non-B DNA structures—such as G-quadruplex (G4) DNA, cruciform DNA, triplex DNA, and Z-DNA—can modulate transcription by interfering with gene regulatory mechanisms^{11,12,13,14,15,16,17,18,19,20,21}. 
Recent studies have shown that in both elite controllers and individuals on long-term ART, most integrated HIV-1 proviruses are located in these transcriptionally repressive genomic features, reinforcing the role of integration site selection in viral persistence^{8,22,23,24,25,26,27,28,29,30}.

Despite extensive research on blood-based HIV-1 reservoirs, integration site profiling across diverse anatomical tissues remains limited. HIV-1-infected cells persist in a wide range of tissues, including the brain, lungs, kidneys, liver, adipose tissue, lymphoid organs, gastrointestinal tract, male and female genitourinary systems, and bone marrow³¹. While CD4⁺ T cells serve as the primary latent reservoir^{1,32,33,34,35}, macrophages and organ-specific cells, such as epithelial cells, microglia, astrocytes, and podocytes, also harbor integrated virus^36,37,38. Myeloid cells, in particular, have been increasingly recognized as important reservoirs³⁹. A recent study assessed HIV-1 integration sites in sorted memory, tissue-resident, and follicular helper CD4⁺ T cell subsets from paired peripheral blood mononuclear cells (PBMCs) and lymphoid tissues, including inguinal, cervical, and tonsil samples⁴⁰. The findings revealed that HIV-1 reservoirs exhibited tissue compartmentalization, while integration site patterns remained largely conserved across anatomical sites, cell subsets, and stages of infection, with minimal overlap between distinct tissues.

In this study, we characterize the genomic landscape of HIV-1 integration across multiple anatomical sites using a unique tissue biobank from people living with HIV-1 (PLWH) infected with subtype B during the early years of the HIV/AIDS pandemic. Our analysis includes matched samples from the esophagus, blood, stomach, duodenum, and colon, as well as unmatched brain tissue. We find that integration site patterns vary across tissues, with brain tissue exhibiting reduced integration within genes and increased enrichment in repetitive and accessible genomic regions. We observe that integration hotspots are both shared and tissue-specific, and that HIV-1 frequently targets genes associated with HIV-related disease. While the replication competence of the proviruses cannot be determined without full-length sequencing, our findings highlight how tissue-specific integration site selection may contribute to the long-term persistence of HIV-1 and inform strategies aimed at targeting viral reservoirs.

Methods

Sex as a biological variable

Participant samples were obtained from the cryobank without consideration of sex as a biological variable. All samples were derived from males.

Clinical samples

Esophagus, PBL/PBMC, stomach, duodenum and colon tissues for this study were collected from five PLWH^41,42,43. PLWH were enrolled from a cohort of HIV-1 seropositive men who have sex with men (MSM) followed at the Southern Alberta Clinic (SAC), Calgary, Alberta (Canada) between the years 1993 and 2010. Participants were followed prospectively, and plasma viral load and CD4⁺ T cell counts were assessed for each individual at each visit. Upper and lower gastrointestinal endoscopies were performed during the study period to collect biopsies at the level of the esophagus, stomach, duodenum, and colorectum. The biopsies were cryopreserved for shipment and stored at −70 °C within 1 hour of collection⁴¹. Peripheral blood leukocytes (PBLs)/PBMCs were isolated from blood and stored in liquid nitrogen⁴³. The cohort from this study was recruited prior to the introduction of highly active antiretroviral therapy (HAART)/ART at the SAC. Participants received non-suppressive monotherapy or dual therapy with nucleoside reverse transcriptase inhibitors (NRTIs) such as azidothymidine (AZT/zidovudine) and dideoxyinosine (ddI) prior to and during the study⁴³. Four participants also received HAART/ART at a later time point in treatment. Frozen brain samples originated from a separate, unmatched cohort of HIV-1 seropositive patients who were not on ART and who succumbed to AIDS-defining illnesses (Supplementary Data 1).

Preparation of HIV-1 integration site library

Total genomic DNA was extracted from tissue samples using the DNeasy Blood & Tissue Kit (Qiagen). The purified genomic DNAs were processed for integration site analysis in a DNA clean room using different pipets, as previously described in detail^28,29,30. Briefly, the DNA was subjected to MseI/SacI digestion and linker ligation. Following purification, the DNA was subjected to two rounds of nested PCR using 3’ LTR- and linker-specific primers. The barcoded samples were sequenced through Illumina MiSeq using 2 × 150 bp chemistry at the London Regional Genomics Centre/Robarts Research Institute from Western University (Canada). Integration sites were determined from the sequence junction between the HIV-1 3’ LTR and human genome sequences.

Computational analysis

Each paired fastq sequencing read was quality trimmed and excluded from further analysis if the LTR-genome junction sequence did not match between the two paired reads. The HIV-1 LTR-containing fastq sequences were filtered by allowing up to a maximum of five mismatches with the reference LTR sequence. LTR sequences matching any region of the human genome (GRCh37/hg19) were discarded. Flanking human genomic sequences more than 20 nucleotides in length were used to identify integration sites using our in-house bioinformatics pipeline (Barr Lab Integration Site Identification Pipeline (BLISIP version 2.9))^28,29,30. BLISIP version 2.9 includes the following updates: bedtools (version 2.25.0), bioawk (awk version 20110810), bowtie2 (version 2.3.4.1), and restrSiteUtils (version 1.2.9). All genomic sites within each dataset that hosted two or more identical sites were collapsed into one unique site for the analysis. Sites that could not be unambiguously mapped to a single region in the genome were excluded from analysis. All non-B DNA motifs were defined according to previously established criteria⁴⁴. LADs were retrieved from https://doi.org/10.1038/nature06947⁴⁵. To account for restriction site bias in the cloning procedure during library construction, restriction enzyme site-matched random controls were independently generated for each dataset by matching each experimentally determined site with 50 random sites in silico that were constructed to be the same number of bases from the restriction site as was the experimental site⁴⁶. The integration site heatmaps were generated using our in-house Python program BLISIP Heatmap (BLISIPHA version 1.0), which calculates the fold enrichment of sites in each distance bin for each feature compared to that of the matched random control dataset.
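The fold-enrichment calculation described above can be illustrated with a minimal Python sketch. It is not the BLISIP or BLISIPHA code: the function names, the single-chromosome coordinates, and the three distance bins are assumptions made for illustration, and the feature intervals and sites are toy values.

```python
# Illustrative distance bins (bp): inside the feature, 1-499 bp, 500-4999 bp.
BINS = [("within", 0, 0), ("1-499", 1, 499), ("500-4999", 500, 4999)]


def distance_to_nearest(pos, features):
    """Distance in bp from pos to the nearest (start, end) interval; 0 if inside."""
    best = None
    for start, end in features:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        if best is None or d < best:
            best = d
    return best


def bin_fractions(sites, features):
    """Fraction of sites that fall in each distance bin."""
    counts = {name: 0 for name, _, _ in BINS}
    for pos in sites:
        d = distance_to_nearest(pos, features)
        for name, lo, hi in BINS:
            if d is not None and lo <= d <= hi:
                counts[name] += 1
                break
    return {name: counts[name] / len(sites) for name in counts}


def fold_enrichment(observed_sites, control_sites, features):
    """Observed fraction divided by matched-control fraction, per bin.

    Returns None for a bin in which the control fraction is zero.
    """
    obs = bin_fractions(observed_sites, features)
    ctl = bin_fractions(control_sites, features)
    return {b: (obs[b] / ctl[b] if ctl[b] > 0 else None) for b in obs}


# Toy data on one chromosome (hypothetical values, not from the study).
toy_features = [(1000, 2000)]
toy_observed = [1500, 1600, 2100, 9000]
toy_control = [1500, 2300, 2400, 3000, 50000, 60000, 70000, 80000]
toy_fold = fold_enrichment(toy_observed, toy_control, toy_features)
```

A real analysis would work per chromosome and use an interval index rather than the linear scan shown here.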

LTR sequence heterogeneity

Raw HIV-1 LTR sequences were obtained from sequencing libraries prepared from multiple tissues and individuals. Quality control was performed to remove sequences with ambiguous bases (Ns) or deviations from the expected 29 bp length. PCR duplicates were filtered using USEARCH version 11⁴⁷. Sequences were aligned using MAFFT version 7.490 in auto mode with default gap penalties to preserve biologically relevant sequence variations⁴⁸. Pairwise nucleotide differences were calculated using a computational framework for sequence mutation analysis and visualized using NetworkX to generate a network-based representation of sequence relationships⁴⁹. Nodes represented individual LTR sequences, and edges indicated sequence differences, with edge weight and color intensity reflecting the degree of divergence. Only sequences with ≤5 nucleotide differences were connected to prevent network saturation. Mutation frequency analysis was conducted by calculating per-site mutation frequencies and comparing them to established HIV-1 reverse transcriptase (RT) mutation rates of 2 × 10⁻⁵ to 5 × 10⁻⁵ mutations per base per replication cycle^50,51, or the error rate of high-fidelity DNA polymerases of 1 × 10⁻⁶ to 1 × 10⁻⁷ errors per base per replication cycle^52,53. The expected mutation frequency per 29 bp LTR sequence was derived and compared to observed mutation rates using a chi-square goodness-of-fit test with a significance threshold of α = 0.05. The presence of APOBEC3-mediated hypermutations was evaluated using a computational framework for sequence mutation analysis. G to A transition frequencies and specific APOBEC3 target motif mutations (GG to AG and TC to TT transitions) were identified. The statistical enrichment of these motifs was assessed by comparing observed versus expected mutation frequencies, using background HIV-1 mutation rates from published studies⁵⁴.
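As a rough illustration of the pairwise-difference network described above, the sketch below counts mismatches between equal-length sequences and keeps only pairs that differ at five or fewer positions. It uses plain dictionaries instead of NetworkX, the function names are hypothetical, and the sequences are short toy strings rather than 29 bp LTR reads; duplicates are assumed to have been removed beforehand.

```python
from itertools import combinations

MAX_DIFF = 5  # connect only sequences with <= 5 nucleotide differences


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def build_edges(seqs, max_diff=MAX_DIFF):
    """Return {(i, j): n_differences} for sequence pairs within max_diff."""
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = hamming(a, b)
        if d <= max_diff:
            edges[(i, j)] = d  # edge weight = degree of divergence
    return edges


# Toy sequences (hypothetical): the first two differ at one position,
# the third is too divergent to be connected to either.
toy_seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
toy_edges = build_edges(toy_seqs)
```

The resulting edge dictionary could be passed to a graph library for drawing, with the stored difference count used as the edge weight.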

Study approval

Ethical approval for use of human esophagus, PBL/PBMC, stomach, duodenum and colon tissue was obtained from the Conjoint Health Research Ethics Board (CHREB, protocol approval #: REB15-1941) at the University of Calgary (Calgary, Alberta, Canada). Ethical approval for use of human brain tissue was obtained from the University of Alberta Human Ethics Committee, protocol approval #: Pro00002291. Participants provided signed informed consent upon enrollment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Brain tissue exhibits a distinct HIV-1 integration profile with reduced gene targeting and increased SINE enrichment

Most HIV-1 integration studies have focused on PBMCs or blood-derived CD4⁺ T cells, with limited characterization of tissue-specific integration patterns. Given the compartmentalization of HIV-1 in vivo and the potential influence of tissue residency on integration site selection, we sought to address this knowledge gap^40,55. We analyzed integration site profiles from patient-matched tissue samples, including esophagus (165 integration sites), PBL/PBMC (161 sites), stomach (202 sites), duodenum (137 sites), and colon (119 sites) from five individuals infected with HIV-1 subtype B between 1993 and 2010, all of whom were receiving ART (Supplementary Data 1). Additionally, we examined 145 integration sites from brain tissue samples obtained from a separate, unmatched cohort of eight HIV-1 subtype B-infected individuals who were not receiving ART and had succumbed to AIDS-related illnesses. Integration site mapping was performed using an established in-house pipeline^28,29,30.

Across all tissues, we identified 929 unique integration sites, excluding those resulting from clonal expansion or mapping to ambiguous genomic regions. To assess tissue-specific preferences, we compared the distribution of integration sites across various genomic features—including CpG islands, DNase I hypersensitivity sites (DHS), endogenous retroviruses (ERVs), heterochromatin, SINEs, LINEs, low-complexity regions (LCRs), oncogenes, genes, simple repeats, and transcription start sites (TSS)—against a matched random control (MRC). Figure 1a summarizes these integration patterns, highlighting largely consistent site preferences across tissues, with a few notable exceptions (Supplementary Data 2). For instance, the duodenum exhibited enrichment in CpG islands and LCRs, while both the duodenum and brain showed preferential integration within oncogenes. Notably, brain tissue displayed significant enrichment in SINEs and DHS.

**Fig. 1: Integration site preference for common genomic features differs in brain compared to other tissues.**

A key distinction emerged in gene targeting patterns. While integration within genes was consistently high across PBL/PBMC (82%), esophagus (79%), stomach (71%), duodenum (82%), and colon (85%), brain tissue exhibited significantly lower gene targeting, with only 57% of integration sites occurring within genes (Fig. 1b and Supplementary Data 2). For comparison, 46% of integration sites in the MRC dataset were within genes, so gene targeting in brain was much closer to random expectation than in the other tissues. To further dissect spatial relationships between integration sites and genomic features, we classified sites into three distance bins: within the feature, 1–499 bp away, and 500–4999 bp away (Fig. 1c and Supplementary Data 2). Compared to the MRC, integration preferences were largely conserved across most tissues, except for the brain. Brain-derived integration sites were uniquely enriched 500–4999 bp away from genes, whereas all other tissues showed enrichment within 1–499 bp of DHS and 500–4999 bp of TSS.

Together, these findings indicate that while most tissues share common integration site preferences, brain tissue exhibits a distinct profile characterized by reduced gene targeting and preferential integration into SINEs. This divergence suggests that brain-specific chromatin environments may play a role in shaping HIV-1 persistence within the central nervous system (CNS).

Distinct HIV-1 integration patterns near non-B DNA across anatomical tissues

To investigate tissue-specific preferences for HIV-1 integration near non-B DNA structures, we analyzed integration sites within 500 bp of various non-canonical DNA motifs, including A-phased DNA, cruciform structures, G4, inverted repeats, triplex DNA, Z-DNA, and others. This 500 bp window was selected based on its potential functional impact on transcriptional regulation²⁹. As shown in Fig. 2a and Supplementary Data 3, integration preferences varied by tissue. Brain-derived sites were less frequently associated with A-phased DNA (P < 0.01), cruciform structures, G4, inverted repeats (P < 0.001), triplex DNA, and Z-DNA (P < 0.01) but showed modest enrichment near direct repeats and slipped DNA. Esophageal sites followed a similar pattern, with reduced association near A-phased DNA (P < 0.01), cruciform structures, slipped DNA, triplex DNA, and Z-DNA, while showing modest enrichment near G4 and short tandem repeats (STRs). PBL/PBMC sites exhibited modest enrichment near G4, mirror repeats, and STRs. Stomach-derived sites favored mirror repeats and Z-DNA (P < 0.001), whereas duodenum sites were enriched near cruciform structures and mirror repeats but disfavored A-phased DNA, direct repeats, slipped DNA, triplex DNA, and Z-DNA. Colon sites showed enrichment near G4, mirror repeats, and STRs, while exhibiting lower association with cruciform structures, triplex DNA, and Z-DNA.

**Fig. 2: Integration site preference for non-B DNA features differs across tissues.**

Given our prior findings that HIV-1 integration preferences near non-B DNA vary with distance^28,29,30, we next examined integration patterns using 50 bp sliding windows (Fig. 2b and Supplementary Data 3). This analysis revealed distinct distance-dependent tissue-specific preferences:

Brain: HIV-1 integration was enriched at greater distances from several non-B DNA features, including G4 (100–149 bp, 4.4-fold, 3.4% of sites), direct repeats (350–399 bp, 4.6-fold, 10.3% of sites, P < 0.0001), STRs (350–399 bp, 3.0-fold, 13.8% of sites, P < 0.001), and slipped motifs (400–449 bp, 5.8-fold, 4.1% of sites, P < 0.05).

Esophagus: Integration sites exhibited a strong preference for mirror repeats (300–349 bp, 2.1-fold, 7.3% of sites, P < 0.05) and a tendency toward proximity to cruciforms, G4s, mirror repeats, triplex and Z-DNA.

PBL/PBMC: Integration was more frequent within or near most non-B DNA features, particularly Z-DNA (14.2-fold, 2.5% of sites, P < 0.05), mirror repeats (1–49 bp, 2.0-fold, 8.1% of sites), and cruciforms (9.5-fold, 1.2% of sites).

Stomach: Integration was highly enriched within Z-DNA (18-fold, 5.9% of sites, P < 0.0001) and STRs (3.7-fold, 5.4% of total sites, P < 0.01), with additional enrichment near mirror repeats (1–49 bp, 2.7-fold, 10.4% of total sites, P < 0.001).

Duodenum: Enrichment of integration sites was observed near G4s (100–149 bp, 6.0-fold, 4.4% of total sites, P < 0.05), inverted repeats (within, 1.9-fold, 10.2% of total sites) and mirror repeats (1–49 bp, 2.0-fold, 8.8% of total sites), while sites were positioned further from cruciform (200–249 bp, 8.0-fold, 2.9% of total sites, P < 0.01), STRs, slipped motifs, and Z-DNA.

Colon: Integration sites were highly enriched near G4s (100–149 bp, 8.8-fold, 5.0% of total sites, P < 0.01), direct repeats (2.7-fold, 6.7% of total sites) and STRs (1.9-fold, 12.6% of total sites). Additionally, integration sites also tended to be located further from G4s, direct repeats, mirror repeats, STRs, and slipped motifs (300–349 bp, 2.3–4.5-fold, 3.4–10.1% of sites).

These findings highlight the tissue-specific nature of HIV-1 integration site selection relative to non-B DNA structures, suggesting that DNA topology may influence viral persistence in different anatomical compartments.

Integration sites overlap across tissues and individuals

We examined the extent to which HIV-1 integration sites were shared among different tissues. Out of 929 unique integration sites, 159 (17.1%) were found in more than one tissue (Fig. 3a, Supplementary Data 4). Most of these overlaps involved two tissues (70 sites, 7.5%), followed by three tissues (53 sites, 5.7%), and four tissues (26 sites, 2.8%). A smaller number of sites overlapped across five (6 sites, 0.6%) and six (4 sites, 0.4%) tissues. Among the most frequent overlaps: 13 sites were shared between PBL/PBMC, duodenum, and stomach; 11 sites were shared in PBL/PBMC, duodenum, stomach, and colon; 11 sites were shared between the esophagus and stomach; and 11 sites were shared in the esophagus, stomach, and duodenum. This observed overlap contrasts sharply with the matched random dataset, where only one overlapping site (between PBL/PBMC and stomach) was detected out of 13,055 total sites (<0.01%). This reconfirms that HIV-1 integration is non-random and that certain genomic regions may be preferentially targeted across multiple tissues.
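One way to tabulate such overlaps is sketched below in Python. The function names and the toy input are assumptions for illustration and do not reproduce the study's pipeline; sites are represented as (chromosome, position) tuples. The last lines simply re-add the counts reported above as a consistency check.

```python
from collections import Counter


def tissues_per_site(sites_by_tissue):
    """Map each site to the set of tissues in which it was observed."""
    seen = {}
    for tissue, sites in sites_by_tissue.items():
        for site in set(sites):
            seen.setdefault(site, set()).add(tissue)
    return seen


def overlap_histogram(sites_by_tissue):
    """Count shared sites by the number of tissues sharing them (2 or more)."""
    seen = tissues_per_site(sites_by_tissue)
    return Counter(len(t) for t in seen.values() if len(t) >= 2)


# Toy input (hypothetical coordinates apart from the TBC1D5 position).
toy = {
    "blood": [("chr3", 17443680), ("chr1", 100)],
    "stomach": [("chr3", 17443680), ("chr1", 100), ("chr2", 5)],
    "colon": [("chr3", 17443680)],
}
toy_hist = overlap_histogram(toy)

# Consistency check on the counts reported in the text.
reported = {2: 70, 3: 53, 4: 26, 5: 6, 6: 4}
reported_total = sum(reported.values())
reported_percent = round(100 * reported_total / 929, 1)
```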

To determine whether integration site overlap extended across individuals, we analyzed shared sites among the five individuals with matching PBL/PBMC, esophagus, stomach, duodenum, and colon samples, as well as the eight individuals with brain tissue (13 individuals total). Of the 929 unique integration sites, 172 (18.5%) were found in more than one individual (Fig. 3b, Supplementary Data 4). Most overlapping sites were shared between two individuals (74 sites), with smaller numbers found across three (35 sites), four (26 sites), five (29 sites), six (5 sites), seven (1 site), and eight (1 site) individuals. Notably, one integration site (chr3:17443680, within the TBC1D5 gene) was present in all 13 individuals. Additional recurrent sites included FBXL17 (chr5:107484132), found in eight individuals, and SCN5A (chr3:38684700), found in seven individuals. These findings highlight HIV-1 integration site overlap, both across different tissues within individuals and across individuals, suggesting that certain genomic loci may serve as preferential sites for HIV-1 integration.

Integration site hotspots exist across tissues

To identify regions of frequent HIV-1 integration, we defined hotspots as genomic loci where multiple integration events occurred within 100 bp of each other. Brain tissue exhibited the highest proportion of integration hotspots, with 45% of total integration sites clustering within these regions—significantly more than all other tissues (P < 0.0001). Other tissues also displayed notable hotspot integration, including esophagus (21%), duodenum (15%), colon (10%), stomach (10%), and PBL/PBMC (8%) (Fig. 4a, Supplementary Data 4). Some hotspot integrations were observed at identical chromosomal positions across different tissues (Supplementary Fig. 1 and Supplementary Data 4). While the chromosomal distribution of integration sites was largely similar among tissues, a striking exception was observed in brain tissue: chromosome 9 accounted for 28.3% of all brain-derived integration sites, compared to only 5–10% in other tissues (Fig. 4b–d, Supplementary Data 4). This was driven by a dense cluster of 39 brain-derived integration sites within a 313 bp window on chromosome 9 (chr9:129,604,522–129,604,835) (Fig. 4c).
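The hotspot definition can be read in more than one way. The Python sketch below assumes a single-linkage interpretation, in which sorted sites on one chromosome are chained into a cluster as long as each consecutive pair lies within 100 bp; this is an assumption made here, not necessarily the rule used in the study, and the function name and positions are hypothetical.

```python
def find_hotspots(positions, max_gap=100, min_sites=2):
    """Chain sorted positions into clusters whose consecutive gaps are <= max_gap.

    Returns only clusters containing at least min_sites positions.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Toy positions on one chromosome (hypothetical values).
toy_positions = [100, 150, 260, 1000, 5000, 5050]
toy_clusters = find_hotspots(toy_positions)

# Fraction of sites that fall inside a hotspot cluster.
toy_fraction = sum(len(c) for c in toy_clusters) / len(toy_positions)
```

Under this chaining rule a cluster can span more than 100 bp in total, which is compatible with the 313 bp chromosome 9 window described above.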

**Fig. 4: Chromosomal hotspots for integration.**

We next examined which genes were most frequently targeted for integration across tissues. The total number of genes harboring integration sites varied by tissue: brain, 70 genes; PBL/PBMC, 129 genes; esophagus, 107 genes; stomach, 134 genes; duodenum, 99 genes; and colon, 97 genes (Fig. 5a and Supplementary Data 5)⁵⁶. Many integration sites were tissue-specific, occurring in genes uniquely targeted within a given tissue: brain, 42 genes; PBL/PBMC, 38 genes; esophagus, 35 genes; stomach, 54 genes; duodenum, 33 genes; and colon, 20 genes (Supplementary Data 5). Despite this tissue specificity, certain genes were recurrent integration hotspots, with five or more unique integration events: brain, ATP2C1, CHKA, ST3GAL3; PBL/PBMC, SYNRG; esophagus, MBOAT7; stomach, ANKFY1, CARD8; duodenum, DEPDC5 (Fig. 5b and Supplementary Data 5). Notably, some genes were integration hotspots across multiple tissues. Four tissues shared ASCC1, SSH2, ZCCHC11, ZCCHC7, CCNL2, MAP3K3, MFSD11, and PTPRD. Five tissues shared TBC1D5, EIF4G3, PPP6R2, and RBM6. The gene SCN5A was a recurrent integration hotspot in all six tissues (Fig. 5c and Supplementary Data 5).

**Fig. 5: Integration site gene hotspots across tissues.**

To assess whether HIV-1 preferentially integrates into highly expressed genes, we analyzed gene expression levels in genes hosting HIV-1 integration sites across five tissues: brain, colon, duodenum, esophagus, and stomach. Gene expression data were obtained from RNA-seq datasets available in the Human Protein Atlas. Genes were categorized into low-frequency integration targets (1–5 integration sites per gene) and high-frequency integration targets (≥6 integration sites per gene), and comparisons were made using a Mann–Whitney U Test (Wilcoxon Rank-Sum Test). Across tissues, the median gene expression levels showed variability between the low- and high-integration frequency groups, with some tissues exhibiting trends toward higher expression in genes with more integration sites (Fig. 5d–h and Supplementary Data 6). However, statistical significance varied by tissue. In the colon, a significant difference in expression was observed (U = 521, p = 0.046), suggesting a potential preference for integration into highly expressed genes. In contrast, no significant differences were found in the brain (U = 333, p = 0.26), duodenum (U = 627, p = 0.10), esophagus (U = 949, p = 0.35), or stomach (U = 1249, p = 0.28).
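The comparison described above can be sketched in Python as follows. This is a standard-library illustration, not the study's code: it computes the Mann–Whitney U statistic from midranks and a two-sided p-value from the tie-corrected normal approximation with continuity correction, which may differ from an exact test in small samples. The grouping helper, gene names, and expression values are hypothetical.

```python
import math


def split_by_frequency(sites_per_gene, expression, high_cutoff=6):
    """Split expression values into low (1 to cutoff-1 sites) and high (>= cutoff) groups."""
    low = [expression[g] for g, n in sites_per_gene.items()
           if 1 <= n < high_cutoff and g in expression]
    high = [expression[g] for g, n in sites_per_gene.items()
            if n >= high_cutoff and g in expression]
    return low, high


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using midranks and a normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u_x, 1.0
    z = max(abs(u_x - mean) - 0.5, 0.0) / math.sqrt(var)
    return u_x, math.erfc(z / math.sqrt(2))


# Toy example with made-up genes and expression values.
toy_sites = {"geneA": 1, "geneB": 3, "geneC": 5, "geneD": 6, "geneE": 9, "geneF": 7}
toy_expr = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0,
            "geneD": 4.0, "geneE": 5.0, "geneF": 6.0}
toy_low, toy_high = split_by_frequency(toy_sites, toy_expr)
toy_u, toy_p = mann_whitney_u(toy_low, toy_high)
```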

Together, these findings reveal both tissue-specific and widely shared integration preferences, highlighting regions of the genome that may be particularly favorable for HIV-1 persistence. The presence of common integration hotspots across multiple tissues suggests that certain genomic loci could play a role in HIV-1 latency and reservoir maintenance.

LTR sequence heterogeneity confirms biological origins of shared HIV integration sites

To assess whether PCR contamination contributed to the identification of shared HIV-1 integration sites across tissues and individuals, we analyzed the TBC1D5 hotspot, present in all tissues from all 13 PLWH. A total of 1289 HIV-1 3’ LTR sequences integrated precisely at chr3:17443680 within the TBC1D5 gene were examined using multiple sequence alignment, mutation frequency analysis, and statistical comparison to known HIV-1 mutation rates. Additionally, we assessed host-mediated hypermutation signatures and searched for the TBC1D5 hotspot in independent datasets from the Retrovirus Integration Database (v2.0)⁵⁷.

Pairwise sequence alignments quantified nucleotide differences within the last 29 bp of the HIV-1 LTR, and a network-based analysis revealed high sequence diversity, with pairwise differences ranging from 1 to 5 bp (Fig. 6a, b). While distinct sequence clusters were observed, no single dominant sequence indicative of widespread contamination was detected. Instead, several highly divergent sequences exhibited little to no connections to others, suggesting independent integration events rather than PCR amplification artifacts. If contamination had occurred, we would expect a single dominant sequence to appear across unrelated samples, along with reduced sequence heterogeneity. However, our results showed a naturally diverse sequence distribution, consistent with authentic HIV-1 integration rather than contamination. Furthermore, identical integration coordinates were associated with subtle but reproducible LTR sequence variations, a pattern expected from natural HIV-1 evolution rather than PCR artifacts. In addition, uninfected genomic DNA controls processed alongside the infected samples yielded zero integration sites, further ruling out contamination.

**Fig. 6: HIV-1 LTR heterogeneity at the *TBC1D5* integration site hotspot.**

To determine whether the observed mutation rate aligned with known biological mechanisms, we compared it to expected HIV-1 RT and PCR polymerase error rates. Based on established RT mutation rates (2 × 10⁻⁵ to 5 × 10⁻⁵ per nucleotide per cycle), the expected mutation frequency for the 29 bp LTR region ranged from 5.8 × 10⁻⁴ to 14.5 × 10⁻⁴ per bp, with an expected G-to-A transition frequency between 5.8 × 10⁻⁵ and 2.2 × 10⁻⁴ per bp^50,51. Similarly, the expected total PCR-induced mutation frequency was ~7.3 × 10⁻⁴ per bp (high-fidelity polymerase, 1 × 10⁻⁶ errors per nucleotide for 25 cycles), with an expected 7.3 × 10⁻⁵ per bp for G-to-A transitions (~10%)^52,53. A chi-square goodness-of-fit test revealed that the observed LTR mutation rate (0.034 per bp) was significantly higher than all expected background mutation rates (p < 0.0001), indicating that random RT and PCR errors alone cannot explain the observed sequence variability.
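The expected frequencies quoted above match the product of each per-base error rate and the 29 bp read length (and, for PCR, the 25 cycles). A short Python sketch of that arithmetic follows; the constant and function names are chosen here for illustration and are not taken from the study's code.

```python
LTR_LENGTH = 29                          # bp of LTR sequence analyzed
RT_RATE_LOW, RT_RATE_HIGH = 2e-5, 5e-5   # RT errors per base per replication cycle
PCR_RATE, PCR_CYCLES = 1e-6, 25          # high-fidelity polymerase, 25 cycles
PCR_G_TO_A_FRACTION = 0.10               # ~10% of PCR errors, as stated in the text
OBSERVED_RATE = 0.034                    # observed LTR mutation rate from the text


def expected_frequency(rate_per_base, length=LTR_LENGTH, cycles=1):
    """Expected mutation frequency: per-base rate x read length x cycles."""
    return rate_per_base * length * cycles


rt_low = expected_frequency(RT_RATE_LOW)
rt_high = expected_frequency(RT_RATE_HIGH)
pcr_total = expected_frequency(PCR_RATE, cycles=PCR_CYCLES)
pcr_g_to_a = PCR_G_TO_A_FRACTION * pcr_total

# How far the observed rate exceeds the largest expected background value.
fold_over_background = OBSERVED_RATE / max(rt_high, pcr_total)
```

Under these inputs the observed rate exceeds the highest expected background frequency by more than an order of magnitude, in line with the conclusion drawn in the text.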

To evaluate APOBEC3-mediated mutagenesis, we quantified G-to-A transitions, a hallmark of APOBEC3 activity. These mutations accounted for 47% of all detected mutations (0.0162 per bp), a frequency higher than expected RT (5.8 × 10⁻⁵ to 2.2 × 10⁻⁴ per bp) and PCR error rates (7.3 × 10⁻⁵ per bp). A position-specific analysis identified 16 GG to AG mutations and 38 TC to TT mutations across 1294 sequence variants. A chi-square test confirmed that these mutations occurred significantly more frequently than expected under a random mutation model (χ² = 8.96, df = 1, p = 0.00276). These findings provide evidence for APOBEC3-mediated G-to-A hypermutation, supporting the role of host antiviral defense mechanisms.
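For a chi-square statistic with one degree of freedom, the upper-tail p-value has a closed form in terms of the complementary error function. The standard-library Python sketch below applies it to the reported statistic. The function names are illustrative, and the observed and expected counts behind χ² = 8.96 are not given in the text, so only the statistic-to-p-value step is shown for the study's number.

```python
import math


def chi2_statistic(observed, expected):
    """Pearson chi-square statistic for paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_pvalue_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# p-value for the statistic reported in the text (chi-square = 8.96, df = 1).
p_reported = chi2_pvalue_df1(8.96)

# Toy goodness-of-fit statistic with made-up counts.
toy_chi2 = chi2_statistic([10, 20], [15, 15])
```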

Finally, we examined whether the TBC1D5 (chr3:17443680) hotspot was present in independent datasets from the Retrovirus Integration Database (v2.0). One PBMC dataset contained an integration site precisely at chr3:17443680, identical to our findings, and another integration site 5 bp away. Additionally, three other independent datasets involving HEK293T, PBMCs, and/or lymph node tissue showed integration sites 1–7 bp away from chr3:17443680, some of which were also shared between those independent datasets, reinforcing the biological relevance of this hotspot (Fig. 6c). Together, our analysis demonstrates that the observed sequence diversity, mutation rates, and APOBEC3 mutation signatures cannot be attributed to PCR contamination. Instead, the results strongly support host-mediated mutagenesis and natural HIV-1 sequence evolution at the TBC1D5 integration site hotspot.

HIV-1 integration targets genes implicated in diverse HIV-associated diseases

Proviral integration within or near genes can influence gene expression, potentially altering biological processes, molecular functions, and cellular pathways (reviewed in ref. ⁵⁸). To investigate the relevance of integration site selection, we analyzed HIV-1 integration within genes linked to HIV-1-associated diseases using the Kyoto Encyclopedia of Genes and Genomes Disease database (Table 1). This analysis revealed tissue-specific integration patterns. Several genes implicated in cancer were frequent HIV-1 integration targets, with distinct patterns across tissues: brain (PAN3), esophagus (CDK6, PRKG1), stomach (CBL, LDLR), and colon (BRIP1, KCNE2). The gene SCN5A, linked to cardiovascular diseases, was a common integration target across all tissues. Additional cardiovascular-related genes were targeted in the esophagus (PRKG1), stomach (LDLR, LMNA), and colon (KCNE2). Several genes associated with nervous system diseases were highly targeted (>5 unique integration sites per gene): brain (ATP6V0A1, ST3GAL3), esophagus (USP48), stomach (RIMS2), duodenum (DEPDC5), and colon (KCNT2, WNK1). Several congenital malformation-associated genes were recurrent HIV-1 integration sites, including highly targeted genes in the brain (ATP2C1), stomach (MACF1), and duodenum (MAPRE2, PPP2R5D). HIV-1 integration was also observed in genes linked to a wide range of other diseases, including digestive, endocrine, hematologic, immune, metabolic, mental, respiratory, skin, urinary, and musculoskeletal disorders. Together, these findings highlight the potential impact of integration site selection on disease pathogenesis, emphasizing the need for further investigation into how integration influences host gene function across tissues.

Table 1 List of genes hosting four or more HIV-1 integration sites that are linked with HIV-associated disease conditions


HIV-1 integration site hotspots across tissues are located in non-B DNA

Previous studies have shown that HIV-1 integration site hotspots are enriched in or adjacent to non-B DNA features, particularly slipped and G4 DNA motifs^28,29,30. To identify local genomic sequences that may serve as integration hotspots within tissues, we compared two pools of integration sites for each tissue: hotspot sites—integration sites occurring within hotspots (defined as two or more sites located <100 bp apart, regardless of genomic location); and non-hotspot sites—integration sites that did not cluster within hotspots. For each integration site, we extracted a sequence window 100 bp upstream and 100 bp downstream (200 bp total) and analyzed motif composition using DiffLogo, a tool that visualizes pairwise differences in DNA motifs⁵⁹. In DiffLogo visualizations, stack height represents the degree of sequence distribution dissimilarity (Jensen–Shannon divergence) and symbol height within each stack reflects the relative abundance of specific nucleotides at that position. Analysis of sequences surrounding integration sites revealed tissue-specific differences in motif composition (Fig. 7). PBL/PBMCs exhibited the most sequence divergence, followed by colon, stomach, and duodenum. The esophagus showed the least sequence divergence, suggesting greater conservation of integration site sequences in this tissue. Consensus motif analysis across tissues revealed a consistent enrichment of slipped DNA motifs within hotspot regions (Fig. 7). However, additional tissue-specific sequence preferences emerged. G4 DNA motifs were enriched at or near the integration site in the esophagus, PBL/PBMCs, stomach, and duodenum.
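The hotspot definition and the 200 bp sequence window described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: distances are assumed to be measured between unique sites on the same chromosome, the window is returned as a half-open interval, and all coordinates and chromosome names are hypothetical.

```python
from collections import defaultdict

HOTSPOT_MAX_GAP_BP = 100   # sites closer than this are treated as clustered
FLANK_BP = 100             # bases taken upstream and downstream of each site


def split_hotspot_sites(sites, max_gap=HOTSPOT_MAX_GAP_BP):
    """Split (chrom, pos) sites into hotspot and non-hotspot pools.

    A site is placed in the hotspot pool when at least one other unique site
    on the same chromosome lies less than max_gap bp away (assumption).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in set(sites):          # keep unique sites only
        by_chrom[chrom].append(pos)

    hotspot, non_hotspot = [], []
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, pos in enumerate(positions):
            near_prev = i > 0 and pos - positions[i - 1] < max_gap
            near_next = i + 1 < len(positions) and positions[i + 1] - pos < max_gap
            pool = hotspot if (near_prev or near_next) else non_hotspot
            pool.append((chrom, pos))
    return sorted(hotspot), sorted(non_hotspot)


def flank_window(pos, flank=FLANK_BP):
    """Half-open interval spanning flank bp on each side of the site."""
    return max(0, pos - flank), pos + flank


# Hypothetical coordinates for illustration only.
example_sites = [
    ("chrA", 1000), ("chrA", 1050), ("chrA", 1149),
    ("chrA", 5000), ("chrB", 1000),
]
hotspot_sites, non_hotspot_sites = split_hotspot_sites(example_sites)
window = flank_window(1000)
```

Here the three clustered `chrA` sites fall into the hotspot pool, while the isolated `chrA` and `chrB` sites form the non-hotspot pool; sequences from the two pools would then be compared position by position.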

**Fig. 7: Integration site hotspots across tissues are located in non-B DNA.**

Brain tissue displayed overlapping STR and triplex motifs, positioned further downstream (66–87 bp from the integration site).

These findings underscore the conserved role of slipped DNA motifs in HIV-1 integration hotspots, while highlighting tissue-specific preferences for G4 motifs (esophagus, PBL/PBMCs, stomach, duodenum) and STR/triplex motifs (brain). Such sequence biases may influence regional chromatin accessibility and contribute to the establishment of tissue-specific viral reservoirs.
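The Jensen–Shannon divergence that sets the stack height in the hotspot versus non-hotspot motif comparison can be computed per alignment position as in the sketch below. It is a generic implementation of the divergence in bits, not the DiffLogo source code, and the example columns are hypothetical.

```python
import math

BASES = "ACGT"


def base_frequencies(column: str) -> list:
    """Relative frequency of A, C, G, T among the bases observed at one position."""
    counts = {base: 0 for base in BASES}
    for base in column.upper():
        if base in counts:
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no A/C/G/T bases")
    return [counts[base] / total for base in BASES]


def kl_divergence(p: list, q: list) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: list, q: list) -> float:
    """Symmetric divergence between two distributions, bounded by 0 and 1 bit."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical bases seen at one position in the two site pools.
hotspot_column = "GGGGGGAT"
non_hotspot_column = "ACGTACGT"

divergence = jensen_shannon_divergence(
    base_frequencies(hotspot_column),
    base_frequencies(non_hotspot_column),
)
```

A value of 0 indicates identical base composition at that position in the two pools, and larger values indicate positions where hotspot and non-hotspot sequences differ most.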

Discussion

This study provides new insights into HIV-1 integration site preferences across anatomically distinct tissue sites known to harbor infected cells. While most tissues (PBL/PBMC, esophagus, stomach, duodenum, colon) exhibited largely similar integration patterns with respect to common genomic features, brain tissue displayed a unique profile, with reduced gene targeting and increased integration into SINEs and DHS. Additionally, each tissue exhibited distinct preferences for integration near non-B DNA structures, and we identified integration hotspots across all tissues—some tissue-specific and others shared. Importantly, we observed substantial integration site overlap between tissues and individuals, suggesting highly precise targeting at specific genomic locations.

Understanding HIV-1 integration in the brain remains challenging due to limited tissue availability and the low frequency of infected cells in this compartment. A recent study in cultured microglia found that HIV-1 preferentially integrates into transcriptionally active chromatin and genes⁶⁰. Our finding that brain-derived integration sites were enriched in DHS aligns with this, as DHS mark regions of accessible chromatin, often associated with active transcription. However, brain tissue exhibited significantly less integration into genes than all other tissues, which was unexpected given the correlation between DHS and active transcription. Although we did not differentiate between cell types in the brain, cell type-specific factors may influence integration preferences. One possible explanation is the role of Lens Epithelium-Derived Growth Factor (LEDGF/p75), a key host co-factor that directs HIV-1 integration toward active genes. LEDGF/p75 expression is regionally restricted and lower in the adult brain^61,62,63,64,65,66,67, potentially contributing to the reduced gene targeting observed. Conversely, APOBEC3G and APOBEC3F, which are known to shift HIV-1 integration away from genes and toward SINEs, are widely expressed in the CNS and can be upregulated by interferon^30,68,69. Together, decreased LEDGF/p75 and increased APOBEC3 expression may promote integration into more transcriptionally repressed regions in brain cells.

A particularly striking finding was the increased integration into SINEs in brain tissue, while all other tissues showed reduced SINE targeting. The enrichment of integration sites within SINE elements, particularly in brain tissue, may reflect broader principles of genomic “availability” for integration. SINEs are abundant, repetitive elements that often reside in DNA regions with dynamic structural potential, and while they are generally associated with repressive chromatin (e.g., H3K9me3), they can also co-localize with DHS in certain contexts⁷⁰. Their structural plasticity and abundance may render them accessible to the pre-integration complex, especially in environments with altered chromatin landscapes such as the inflamed or ART-naive CNS. Furthermore, genome architecture and positioning within the nucleus, such as proximity to nuclear pores or lamina, can influence integration targeting independently of gene expression⁷. Supporting this, our DiffLogo analysis revealed distinct sequence signatures surrounding integration hotspots, including motifs associated with non-B DNA structures such as slipped DNA and G4 DNA, which can modulate chromatin structure and accessibility^12,18,19,21. Additionally, brain-derived integration hotspots showed dense clustering in regions enriched for repetitive elements, which may reflect both structural availability and tolerance for proviral persistence. Together, these findings suggest that SINE-targeting may emerge from a convergence of chromatin context, nuclear topology, and selective forces acting post-integration.

A limitation of our study is that brain-derived integration sites were obtained from ART-naive individuals, whereas integration sites from other tissues came from ART-treated individuals. This raises the possibility that the observed brain-specific integration patterns, such as reduced gene targeting and increased SINE enrichment, may reflect ongoing viral replication in untreated individuals rather than inherent tissue-specific factors. ART reshapes integration site landscapes by selectively expanding clonally integrated proviruses, particularly in actively transcribed genes⁷¹. In untreated individuals, integration likely reflects initial infection dynamics, with broader distribution across genic and intergenic regions. This could explain the lower gene-targeting in brain tissue, as opposed to the gene-enriched profiles in ART-treated tissues, where clonal expansion dominates. Additionally, increased SINE integration in brain tissue may reflect a bias toward transcriptionally inactive regions, allowing proviruses to evade immune clearance. While our study does not directly compare pre- and post-ART integration landscapes, future research using matched pre-/post-ART samples or single-cell analyses of integration site clonality could clarify whether these differences are driven by intrinsic tissue properties or ART-related selection pressures. Recognizing these distinctions underscores the need for further investigation into how ART shapes reservoir persistence across tissues, particularly in the brain.

In addition to the absence of ART, the presence of ongoing immune activation and inflammation in the brain tissue of ART-naive individuals likely contributed to the unique integration patterns observed. While our study does not directly compare integration site profiles to regional gene expression in the brain, future analyses incorporating cell-type-specific or spatially resolved transcriptomic datasets may help clarify how integration site targeting relates to chromatin accessibility and transcriptional activity in the CNS⁷². Elevated levels of pro-inflammatory cytokines in the CNS during untreated HIV-1 infection can increase activation of microglia and astrocytes, leading to enhanced susceptibility to infection and altered chromatin accessibility^73,74. This inflammatory environment may shift integration site preferences toward accessible but transcriptionally silenced regions, such as SINEs, as a strategy to evade immune detection. Moreover, in the absence of ART, ongoing viral replication may promote a broader diversity of integration events, with less selection pressure for proviruses integrated into actively transcribed genes^8,22. We also cannot rule out the possibility that certain integration sites in the brain reflect clonal expansion of long-lived infected cells, although this would require additional data to confirm. Overall, the ART-naive status of brain tissue donors likely contributes to the observed integration profile and further highlights the importance of considering treatment status in future comparative tissue analyses.

The observation of tissue-specific integration patterns, despite the known migratory capacity of CD4+ T cells, suggests that additional mechanisms may underlie these genomic preferences. One possibility is that the sampled integration sites reflect long-lived tissue-resident memory T cells, which are known to persist in mucosal and barrier tissues, including the gastrointestinal tract and brain, and exhibit restricted trafficking^55,75. These cells may harbor proviruses integrated into chromatin regions shaped by their local microenvironment. Alternatively, clonal expansion of infected cells within specific tissues could enrich for integration sites in certain genomic loci due to selective pressures such as antigenic stimulation or local cytokine profiles⁷⁶. Furthermore, tissue-specific chromatin accessibility and the differential expression of host factors like LEDGF/p75 or APOBEC3 proteins can influence integration targeting within cells residing in distinct anatomical niches^8,30. Lastly, suboptimal ART drug penetration in sanctuary sites like the brain or gut may permit ongoing low-level replication, contributing to localized integration events⁷⁷. Together, these factors may shape the emergence and maintenance of tissue-specific HIV-1 integration site profiles, despite the systemic circulation of many CD4+ T cell subsets.

Our analysis also revealed substantial integration site overlap between different tissues within the same individuals. Given that all individuals in this study were MSM who likely acquired HIV-1 through direct gastrointestinal (GI) tract exposure, it is possible that infection was initially established in the lower bowel before disseminating to other GI sites and the bloodstream. This overlap could arise due to trafficking of infected cells containing identical proviruses, potentially due to clonal expansion, as observed in ART-treated individuals^40,78,79. Another possibility is migration of infected blood cells, leading to the seeding of identical proviruses across multiple tissues³¹. Although our study did not assess clonality or proviral sequence identity, these mechanisms could explain the shared integration sites observed across tissues within PLWH.

Beyond intra-individual overlap, we also identified integration site overlap between different individuals, suggesting a common mechanism of integration site targeting during the early HIV-1 pandemic. Several factors could contribute to this: (1) transmission of a clonal HIV-1 strain, as limited genetic diversity among circulating viruses early in the pandemic may have led to similar integration patterns, although a comprehensive sequencing analysis of the viral quasispecies from each patient would be needed to confirm this; (2) selective advantage, as certain genomic regions may promote viral persistence or immune evasion, making them preferred integration sites across individuals; (3) host genetics, as shared regions of DNA accessibility or chromatin structure may create similar integration site preferences; and (4) convergent viral evolution, as the selective pressure of early non-suppressive ART regimens may have driven the virus toward similar integration patterns over time. Further studies are needed to dissect how HIV-1 selects integration sites across different individuals and whether these patterns reflect evolutionary constraints, host factors, or a combination of both.

HIV-1 compartmentalization in CD4+ T cells across tissues suggests that integration site landscapes may differ between tissue-resident and circulating cells^31,40,55,80. Some CD4+ T cell subsets remain tissue-resident or have limited ability to migrate, which could contribute to distinct integration site hotspots in different tissues. If ongoing viral replication occurs within these tissue compartments during ART, integration sites may continue to be generated at specific chromosomal regions that are highly receptive to integration. However, if there is limited or no active viral replication during ART, as has been observed in lymphoid tissues⁷⁹, these integration events likely predate ART initiation and persist due to the long-lived nature of infected cells. Further investigation into proviral sequences, matching integration sites, and proviral RNA expression in tissues will help determine whether tissue-specific integration hotspots arise from ongoing replication or are remnants of past infection events.

The mechanisms that drive integration site hotspots, particularly those that are tissue-specific, remain poorly understood. Our previous work identified slipped DNA and G4 DNA motifs as common features of retroviral integration hotspots^28,29,30. In this study, we observed these same motifs within tissue-specific integration hotspots, suggesting that non-B DNA structures influence integration site targeting. Since slipped DNA and G4 DNA structures can be influenced by transcriptional activity, correlating host gene expression, proviral gene expression, and chromatin architecture with integration sites may provide key insights into why certain genomic regions are favored for integration in different tissues.

Retroviral integration in or near genes can disrupt gene function, leading to diverse biological consequences⁵⁸. While insertional mutagenesis has been implicated in the clonal expansion of infected cells, its role in HIV-1-associated diseases remains understudied. HIV-1-infected individuals face higher risks of AIDS-defining cancers (Kaposi sarcoma, aggressive B-cell non-Hodgkin lymphoma, cervical cancer) and non-AIDS-defining malignancies^81,82,83,84. While immune suppression and co-infections contribute to some of these cancers, the mechanisms underlying others remain unclear. Our discovery of HIV-1 integration within genes associated with various diseases, including cancer, cardiovascular disorders, and neurological conditions, raises an important question: Could HIV-1 insertional mutagenesis contribute to the wide range of clinical manifestations observed in PLWH? Determining whether HIV-1 integration into specific genetic loci drives disease progression will require further research, particularly longitudinal studies assessing gene expression changes in relation to integration events.

Despite providing valuable insights into HIV-1 integration site preferences across anatomically diverse tissues, this study has several limitations. First, the sample size is relatively small, particularly for brain tissue, which was obtained from an unmatched cohort. This limits the generalizability of our findings and may not fully capture the heterogeneity of HIV-1 integration patterns across different individuals and clinical backgrounds. Second, we acknowledge that the interpretation of integration site overlap versus tissue specificity is constrained by the use of single, small biopsies per tissue and the limited anatomical coverage from each organ. Given that tissues such as the colon and brain are large and heterogeneous, the integration patterns we observed may represent only a fraction of the true diversity. It is plausible that broader sampling, such as collecting multiple biopsies along different anatomical axes, would uncover additional unique and overlapping integration sites. Such additional diversity would reflect both localized chromatin environments and the stochastic nature of infection^31,40. For instance, localized inflammation, differential immune cell infiltration, or regional differences in chromatin accessibility could drive distinct integration landscapes even within the same tissue. Furthermore, the dynamic nature of the HIV-1 reservoir, which is shaped by temporal fluctuations in viral replication and immune pressure, suggests that longitudinal sampling would also reveal shifts in integration patterns over time. Our data likely capture a snapshot of tissue-specific integration bias, but future studies using spatial transcriptomics, multi-region sampling, and single-cell resolution could help disentangle local versus systemic drivers of HIV-1 integration site heterogeneity. Third, our analysis does not differentiate between specific cell types within each tissue, which is important given that HIV-1 exhibits cell type-specific integration preferences. For example, microglia and astrocytes in the brain may harbor HIV-1 differently than CD4+ T cells in the blood or gastrointestinal tract. Fourth, while we identified substantial integration site overlap across tissues and individuals, we did not assess whether these shared sites result from clonal expansion or other selective pressures. A deeper analysis of proviral sequence identity and transcriptional activity would help clarify the mechanisms underlying this overlap. Additionally, our study focuses on early-pandemic HIV-1 subtype B, and it remains unclear whether these tissue-specific integration site preferences extend to other HIV-1 subtypes or more current circulating HIV-1. Finally, while we identified associations between HIV-1 integration and specific genomic features, further mechanistic studies are needed to establish causal relationships between integration site selection, viral persistence, and disease outcomes. Future research incorporating single-cell sequencing, chromatin accessibility profiling, and host gene expression analyses will be critical for addressing these limitations and refining our understanding of HIV-1 reservoir dynamics across different tissues.

In conclusion, this study provides key insights into the complex relationship between HIV-1 integration preferences and tissue environments. We observed tissue-specific differences, with brain tissue exhibiting a unique integration profile. The identification of substantial integration site overlap and hotspots, including shared integration sites across individuals, suggests the presence of common targeting mechanisms. Further investigations across distinct cell types, including analysis of proviral sequences, transcriptional activity, and host epigenomic factors, will be essential to unravel the mechanisms driving tissue-specific HIV-1 integration and how these patterns might contribute to long-term persistence and disease processes.

Data availability

Source data are available in Supplementary Data 1–6. The RNA expression data analyzed in Fig. 5 (Supplementary Data 6) are based on The Human Protein Atlas version 24.0 and Ensembl version 109, available from proteinatlas.org⁸⁵. The in vivo integration site data analyzed in Fig. 6c were downloaded from the Retrovirus Integration Database (v2.0) and included the following datasets: rid0000001, rid12843741, rid15163705, rid23953889, rid24968937, rid25011556, rid26912621, rid30024859, rid30688658, rid30857886, rid31217357, rid33832973, rid33844021, rid34636876, rid34960744, rid35026153, rid35347274, and rid36044447⁵⁷. Additional data are available upon reasonable request.

References

Chun, T. W. et al. Presence of an inducible HIV-1 latent reservoir during highly active antiretroviral therapy. Proc. Natl. Acad. Sci. USA 94, 13193–13197 (1997).
Arts, E. J. & Hazuda, D. J. HIV-1 antiretroviral drug therapy. Cold Spring Harb. Perspect. Med. 2, a007161 (2012).
Chun, T. W., Engel, D., Mizell, S. B., Ehler, L. A. & Fauci, A. S. Induction of HIV-1 replication in latently infected CD4+ T cells using a combination of cytokines. J. Exp. Med. 188, 83–91 (1998).
Dufour, C., Gantner, P., Fromentin, R. & Chomont, N. The multifaceted nature of HIV latency. J. Clin. Invest. 130, 3381–3390 (2021).
Blankson, J. N., Persaud, D. & Siliciano, R. F. The challenge of viral reservoirs in HIV-1 infection. Annu. Rev. Med. 53, 557–593 (2002).
Eisele, E. & Siliciano, R. F. Redefining the viral reservoirs that prevent HIV-1 eradication. Immunity 37, 377–388 (2012).
Marini, B. et al. Nuclear architecture dictates HIV-1 integration site selection. Nature 521, 227–231 (2015).
Battivelli, E. et al. Distinct chromatin functional states correlate with HIV latency reactivation in infected primary CD4+ T cells. Elife 7, e34655 (2018).
Jiang, J.-C. & Upton, K. R. Human transposons are an abundant supply of transcription factor binding sites and promoter activities in breast cancer cell lines. Mob. DNA 10, 16 (2019).
Kondo, Y. & Issa, J.-P. J. Enrichment for histone H3 lysine 9 methylation at Alu repeats in human cells. J. Biol. Chem. 278, 27658–27662 (2003).
Brázda, V., Laister, R. C., Jagelská, E. B. & Arrowsmith, C. Cruciform structures are a common DNA feature important for regulating biological processes. BMC Mol. Biol. 12, 33 (2011).
Siddiqui-Jain, A., Grand, C. L., Bearss, D. J. & Hurley, L. H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA 99, 11593–11598 (2002).
Verma, A., Yadav, V. K., Basundra, R., Kumar, A. & Chowdhury, S. Evidence of genome-wide G4 DNA-mediated gene expression in human cancer cells. Nucleic Acids Res. 37, 4194–4204 (2009).
Waga, S., Mizuno, S. & Yoshida, M. Chromosomal protein HMG1 removes the transcriptional block caused by the cruciform in supercoiled DNA. J. Biol. Chem. 265, 19424–19428 (1990).
Waga, S., Mizuno, S. & Yoshida, M. Nonhistone protein HMG1 removes the transcriptional block caused by left-handed Z-form segment in a supercoiled DNA. Biochem. Biophys. Res. Commun. 153, 334–339 (1988).
Jain, A., Magistri, M., Napoli, S., Carbone, G. M. & Catapano, C. V. Mechanisms of triplex DNA-mediated inhibition of transcription initiation in cells. Biochimie 92, 317–320 (2010).
Maher, L. J., Dervan, P. B. & Wold, B. Analysis of promoter-specific repression by triple-helical DNA complexes in a eukaryotic cell-free transcription system. Biochemistry 31, 70–81 (1992).
Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
Delic, J., Onclercq, R. & Moisan-Coppey, M. Inhibition and enhancement of eukaryotic gene expression by potential non-B DNA sequences. Biochem. Biophys. Res. Commun. 180, 1273–1283 (1991).
Tornaletti, S., Park-Snyder, S. & Hanawalt, P. C. G4-forming sequences in the non-transcribed DNA strand pose blocks to T7 RNA polymerase and mammalian RNA polymerase II. J. Biol. Chem. 283, 12756–12762 (2008).
Belotserkovskii, B. P. et al. A triplex-forming sequence from the human c-MYC promoter interferes with DNA transcription. J. Biol. Chem. 282, 32433–32441 (2007).
Einkauf, K. B. et al. Parallel analysis of transcription, integration, and sequence of single HIV-1 proviruses. Cell 185, 266–282.e15 (2022).
Einkauf, K. B. et al. Intact HIV-1 proviruses accumulate at distinct chromosomal positions during prolonged antiretroviral therapy. J. Clin. Invest. 129, 988–998 (2019).
Lian, X. et al. Signatures of immune selection in intact and defective proviruses distinguish HIV-1 elite controllers. Sci. Transl. Med. 13, eabl4097 (2021).
Lian, X. et al. Progressive transformation of the HIV-1 reservoir cell profile over two decades of antiviral therapy. Cell Host Microbe 31, 83–96.e5 (2023).
Jiang, C. et al. Distinct viral reservoirs in individuals with spontaneous control of HIV-1. Nature 585, 261–267 (2020).
Cohn, L. B. et al. HIV-1 integration landscape during latent and active infection. Cell 160, 420–432 (2015).
Kohio, H. P. et al. Specialized DNA structures act as genomic beacons for integration by evolutionarily diverse retroviruses. Viruses 15, 465 (2023).
Ajoge, H. O. et al. G-quadruplex DNA and other non-canonical B-form DNA motifs influence productive and latent HIV-1 integration and reactivation potential. Viruses 14, 2494 (2022).
Ajoge, H. O. et al. Antiretroviral APOBEC3 cytidine deaminases alter HIV-1 provirus integration site profiles. Nat. Commun. 14, 16 (2023).
Chaillon, A. et al. HIV persists throughout deep tissues with repopulation from multiple anatomical sources. J. Clin. Invest. 130, 1699–1712 (2020).
Wong, J. K. et al. Recovery of replication-competent HIV despite prolonged suppression of plasma viremia. Science 278, 1291–1295 (1997).
Finzi, D. et al. Identification of a reservoir for HIV-1 in patients on highly active antiretroviral therapy. Science 278, 1295–1300 (1997).
Chomont, N. et al. HIV reservoir size and persistence are driven by T cell survival and homeostatic proliferation. Nat. Med. 15, 893–900 (2009).
Chun, T.-W. et al. Quantification of latent tissue reservoirs and total body viral load in HIV-1 infection. Nature 387, 183–188 (1997).
Wong, J. K. & Yukl, S. A. Tissue reservoirs of HIV. Curr. Opin. HIV AIDS 11, 362–370 (2016).
Barton, K., Winckelmann, A. & Palmer, S. HIV-1 reservoirs during suppressive therapy. Trends Microbiol 24, 345–355 (2016).
Mzingwane, M. L. & Tiemessen, C. T. Mechanisms of HIV persistence in HIV reservoirs. Rev. Med. Virol. 27, e1924 (2017).
Vanhamel, J., Bruggemans, A. & Debyser, Z. Establishment of latent HIV-1 reservoirs: what do we really know? J. Virus Erad. 5, 3–9 (2019).
Wu, V. H. et al. Assessment of HIV-1 integration in tissues and subsets across infection stages. JCI Insight 5, e139783 (2020).
van Marle, G. et al. Compartmentalization of the gut viral reservoir in HIV-1 infected patients. Retrovirology 4, 87 (2007).
van Marle, G., Sharkey, K. A., Gill, M. J. & Church, D. L. Gastrointestinal viral load and enteroendocrine cell number are associated with altered survival in HIV-1 infected individuals. PLoS One 8, e75967 (2013).
van Marle, G. et al. Higher levels of Zidovudine resistant HIV in the colon compared to blood and other gastrointestinal compartments in HIV infection. Retrovirology 7, 74 (2010).
Cer, R. Z. et al. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic Acids Res. 41, D94–D100 (2013).
Guelen, L. et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951 (2008).
Barr, S. D. et al. HIV integration site selection: targeting in macrophages and the effects of different routes of viral entry. Mol. Ther. 14, 218–225 (2006).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. Python in Science Conference (2008).
Mansky, L. M. & Temin, H. M. Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virol. 69, 5087–5094 (1995).
Cuevas, J. M., Geller, R., Garijo, R., López-Aldeguer, J. & Sanjuán, R. Extremely high mutation rate of HIV-1 in vivo. PLoS Biol. 13, e1002251 (2015).
McInerney, P., Adams, P. & Hadi, M. Z. Error rate comparison during polymerase chain reaction by DNA polymerase. Mol. Biol. Int. 2014, 1–8 (2014).
Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl. Acad. Sci. USA 108, 9530–9535 (2011).
Harris, R. S. et al. DNA deamination mediates innate immunity to retroviral infection. Cell 113, 803–809 (2003).
Cantero-Pérez, J. et al. Resident memory T cells are a cellular reservoir for HIV in the cervical mucosa. Nat. Commun. 10, 4739 (2019).
Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. UpSet: visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, 1983 (2014).
Shao, W. et al. Retrovirus Integration Database (RID): a public database for retroviral insertion sites into host genomes. Retrovirology 13, 1–9 (2016).
Bushman, F. D. Retroviral insertional mutagenesis in humans: evidence for four genetic mechanisms promoting expansion of cell clones. Mol. Ther. 28, 352 (2020).
Nettling, M. et al. DiffLogo: a comparative visualization of sequence motifs. BMC Bioinforma. 16, 387 (2015).
Rheinberger, M. et al. Genomic profiling of HIV-1 integration in microglia cells links viral integration to the topologically associated domains. Cell Rep. 42, 112110 (2023).
Shun, M.-C. et al. LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev. 21, 1767–1778 (2007).
Llano, M. et al. An essential role for LEDGF/p75 in HIV integration. Science 314, 461–464 (2006).
Marshall, H. M. et al. Role of PSIP1/LEDGF/p75 in lentiviral infectivity and integration targeting. PLoS One 2, e1340 (2007).
Ciuffi, A. et al. A role for LEDGF/p75 in targeting HIV DNA integration. Nat. Med. 11, 1287–1289 (2005).
Vandekerckhove, L. et al. Transient and stable knockdown of the integrase cofactor LEDGF/p75 reveals its role in the replication cycle of human immunodeficiency virus. J. Virol. 80, 1886–1896 (2006).
Chylack, L. T. et al. Lens epithelium-derived growth factor (LEDGF/p75) expression in fetal and adult human brain. Exp. Eye Res. 79, 941–948 (2004).
Yoder, K. E. Absence of LEDGF/p75 expression in astrocytes may affect HIV-1 integration efficiency. Mol. Gen. Microbiol. Virol. 34, 81–83 (2019).
Hill, M. S. et al. APOBEC3G expression is restricted to neurons in the brains of pigtailed macaques. AIDS Res. Hum. Retroviruses 22, 541–550 (2006).
Argyris, E. G. et al. The interferon-induced expression of APOBEC3G in human blood-brain barrier exerts a potent intrinsic immunity to block HIV-1 entry to central nervous system. Virology 367, 440–451 (2007).
Thompson, P. J., Macfarlan, T. S. & Lorincz, M. C. Long terminal repeats: from parasitic elements to building blocks of the transcriptional regulatory repertoire. Mol. Cell 62, 766–776 (2016).
Coffin, J. M. et al. Clones of infected cells arise early in HIV-infected individuals. JCI Insight 4, e128432 (2019).
Sjöstedt, E. et al. An atlas of the protein-coding genes in the human, pig, and mouse brain. Science 367, eaay5947 (2020).
Gannon, P., Khan, M. Z. & Kolson, D. L. Current understanding of HIV-associated neurocognitive disorders pathogenesis. Curr. Opin. Neurol. 24, 275 (2011).
Gelman, B. B. Neuropathology of HAND with suppressive antiretroviral therapy: encephalitis and neurodegeneration reconsidered. Curr. HIV/AIDS Rep. 12, 272–279 (2015).
Sathaliyawala, T. et al. Distribution and compartmentalization of human circulating and tissue-resident memory T cell subsets. Immunity 38, 187–197 (2013).
Simonetti, F. R. et al. Clonally expanded CD4+ T cells can produce infectious HIV-1 in vivo. Proc. Natl. Acad. Sci. USA 113, 1883–1888 (2016).
Letendre, S. et al. Validation of the CNS penetration-effectiveness rank for quantifying antiretroviral penetration into the central nervous system. Arch. Neurol. 65, 65–70 (2008).
Patro, S. C. et al. Combined HIV-1 sequence and integration site analysis informs viral dynamics and allows reconstruction of replicating viral ancestors. Proc. Natl. Acad. Sci. USA 116, 25891–25899 (2019).
McManus, W. R. et al. HIV-1 in lymph nodes is maintained by cellular proliferation during antiretroviral therapy. J. Clin. Invest. 129, 4629–4642 (2019).
Beura, L. K. et al. CD4+ resident memory T cells dominate immunosurveillance and orchestrate local recall responses. J. Exp. Med. 216, 1214–1229 (2019).
Grulich, A. E., van Leeuwen, M. T., Falster, M. O. & Vajdic, C. M. Incidence of cancers in people with HIV/AIDS compared with immunosuppressed transplant recipients: a meta-analysis. Lancet 370, 59–67 (2007).
Hernández-Ramírez, R. U., Shiels, M. S., Dubrow, R. & Engels, E. A. Cancer risk in HIV-infected people in the USA from 1996 to 2012: a population-based, registry-linkage study. Lancet HIV 4, e495–e504 (2017).
Wang, C. C. J., Silverberg, M. J. & Abrams, D. I. Non-AIDS-defining malignancies in the HIV-infected population. Curr. Infect. Dis. Rep. 16, 406 (2014).
Silverberg, M. J. et al. Cumulative incidence of cancer among persons with HIV in North America: a cohort study. Ann. Intern. Med. 163, 507–518 (2015).
Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

Acknowledgements

We thank the individuals who generously contributed tissue samples during the early years of the HIV/AIDS pandemic. Their bravery and foresight continue to advance our understanding of HIV-1, demonstrating the enduring value of historical specimens in uncovering new insights. Funding was obtained from: Canadian Institutes of Health Research (CIHR) (FRN-150406) to S.B.; National Health Research and Development Program (NHRDP) (Grant No. 6609-1782-AIDS) to D.C.; Canadian Institutes of Health Research (CIHR) (CIHR-III HIV/AIDS HOP-9018) to G.vM.

Author information

These authors contributed equally: Hinissan P. Kohio, Hannah O. Ajoge.

Authors and Affiliations

Western University, Department of Microbiology & Immunology, London, ON, Canada
Hinissan P. Kohio, Hannah O. Ajoge, Emile A. Barua, Neel R. Vajaria, Isaac K. F. Wu, Macon D. Coleman, Sean K. Tom & Stephen D. Barr
University of Calgary, Faculty of Veterinary Medicine, Calgary, AB, Canada
Frank van der Meer & Guido van Marle
Snyder Institute for Chronic Diseases, University of Calgary, Calgary, AB, Canada
Frank van der Meer, John Gill, Deirdre Church, Paul Beck & Guido van Marle
University of Calgary, Department of Microbiology, Immunology and Infectious Diseases, Calgary, AB, Canada
John Gill, Deirdre Church & Guido van Marle
University of Calgary, Departments of Pathology & Laboratory Medicine and Medicine, Calgary, AB, Canada
John Gill, Deirdre Church & Paul Beck
University of Alberta, Departments of Medicine (Neurology), Psychiatry, Medical Microbiology & Immunology, Edmonton, AB, Canada
Christopher Power

Contributions

Research design: H.K., H.A., S.B., G.vM. Conducting experiments: H.K., H.A., E.A.B., N.R.V., I.K.F.W., M.C., S.T., S.B. Acquiring data: H.K., H.A., E.A.B., N.R.V., I.K.F.W., M.C., S.T., S.B. Analyzing data: H.K., H.A., J.G., D.C., P.B., C.P., G.vM., S.B. Providing reagents: S.B., G.vM., F.vdM., J.G., D.C., P.B., C.P. Writing the manuscript: H.K., S.B. Editing the manuscript: S.B., G.vM., J.G., D.C., P.B., C.P.

Corresponding authors

Correspondence to Guido van Marle or Stephen D. Barr.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Medicine thanks Claire Deleage and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Kohio, H.P., Ajoge, H.O., Barua, E.A. et al. Early pandemic HIV-1 integration site preferences differ across anatomical sites. Commun Med 5, 405 (2025). https://doi.org/10.1038/s43856-025-01146-5

Received: 25 March 2025
Accepted: 12 September 2025
Published: 06 October 2025
DOI: https://doi.org/10.1038/s43856-025-01146-5
