Introduction

The HIV pandemic continues to pose a global health burden, despite considerable advances in treatment and prevention1,2. Although a large fraction of people with HIV are receiving life-saving antiretroviral therapy (ART), treatment does not eradicate the virus, and HIV persists in multiple tissue reservoirs, including the brain. Individuals with HIV may develop neurologic disorders that are predicated in part on immune dysfunction, and that are presumed to result from the action of HIV on nervous system cells2,3,4. An important comorbidity relevant to HIV neurologic disease is substance use disorder (SUD), and in particular injection drug use5, with its negative impact on health by a multitude of factors including risky behaviors, emergence of medical and psychiatric sequelae, and lack of compliance with therapy6,7,8,9,10,11,12,13,14,15. Notably, HIV and SUD show a striking neuroanatomical convergence on dopamine-rich structures in brain. In ventral midbrain16 this includes substantia nigra (SN)17,18, a region important for mediating habitual behaviors and salience of cues associated with drug use18, withdrawal-related anhedonia and dysphoria, and extra-pyramidal motor dysfunction19,20. There is also evidence, mostly from preclinical work, that drug-of-abuse-induced dynamic adaptations in addiction-relevant dopaminergic circuitry could worsen HIV-associated cognitive and behavioral deficits21, with multiple dopaminergic signaling pathways converging on susceptible myeloid cells to promote viral entry and replication22,23.

Despite its importance, very little is known about cellular and molecular alterations in the ventral midbrains of people with HIV with or without SUD comorbidity, with the majority of studies limited to quantification of neurotransmitter (e.g., dopamine) in bulk tissue or neuroimaging and analysis of cerebrospinal fluid (CSF) in the context of advanced disease (AIDS)24,25,26.

Herein, we profile the ventral midbrain transcriptomes from n = 90 total people with and without HIV (HIV−), at single (cell) nucleus resolution, and deliver insights into the cell-type-specific neurogenomic responses to chronic infection and SUD. We specifically examine responses to opioid and/or cocaine SUDs, given the high prevalence of these substances and their combination among individuals with HIV27,28,29. Our analyses, focused on SN dopaminergic (DA) and GABAergic (GABA) neurons and their surrounding microglia, strongly suggest that SUD for opioids and cocaine creates, even in donors with ART-mediated viral suppression, a molecular environment permissive for HIV viral replication and risk for cytotoxic damage to dopaminergic neurons. In addition, we report a stepwise progression of transcriptomic dysregulation for opioid/cocaine SUD and HIV comorbidity that culminates in widespread neuronal pathology and pronounced inflammatory signatures in microglia from individuals without viral suppression (viremia) in the context of SUD.

Results

SN cell type composition in the context of HIV and SUD

We dissected ventral midbrain for unilateral collection of the SN including pars compacta and pars reticulata from n = 90 donors collected by the Manhattan HIV Brain Bank (MHBB) (39 female, 51 male; mean+/− standard deviation age 54.0+/−11.4 years; 33% Hispanic, 38% non-Hispanic Black, 22% non-Hispanic White; Table 1), including 67 participants followed longitudinally in the MHBB with deep clinical phenotyping during years of study participation (median 6, maximum 21 years). We assigned each donor to one of three HIV statuses: negative (HIV−, n = 28 donors), HIV positive with undetectable final plasma viral load (VL) defined as <50 RNA copies per mL (HIV+u, n = 30 donors), or HIV positive with viremia, evidenced by a detectable final plasma VL (HIV+d, n = 32 donors [viral RNA copies per mL ranging 65 to >750,000, median 6721]). We further grouped donors by substance use into SUD− or SUD+ for any combination of opioid/cocaine SUD (n = 60 SUD+, including n = 42 with a history of misusing opioids, of which n = 28 had a history of misusing both opioids and cocaine). This grouping produced HIV/SUD-defined subgroups that were overall closely matched with regard to demographic factors: SUD−HIV− (n = 13); SUD+HIV− (n = 15); SUD−HIV+u (n = 10); SUD+HIV+u (n = 20); SUD−HIV+d (n = 7); and SUD+HIV+d (n = 25) (Fig. 1a, Supplementary Table 1, and Supplementary Data 1 and 2). Notably, the wider MHBB cohort from which these donors were selected had a high prevalence of polysubstance SUD, with approximately three quarters of SUD-positive individuals endorsing symptoms for two or more drug classes.
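The grouping rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and argument names are hypothetical, group labels use an ASCII hyphen in place of the minus sign, and the only values taken from the text are the 50 copies per mL threshold and the six subgroup sizes.

```python
from typing import Optional

# Threshold stated in the text: undetectable means <50 RNA copies per mL.
VL_DETECTION_LIMIT = 50

# Subgroup sizes as reported in the text (labels use ASCII "-" for the minus sign).
SUBGROUP_SIZES = {
    "SUD-HIV-": 13, "SUD+HIV-": 15,
    "SUD-HIV+u": 10, "SUD+HIV+u": 20,
    "SUD-HIV+d": 7, "SUD+HIV+d": 25,
}


def hiv_status(hiv_positive: bool, final_plasma_vl: Optional[float]) -> str:
    """Return 'HIV-', 'HIV+u', or 'HIV+d' for one donor (hypothetical helper)."""
    if not hiv_positive:
        return "HIV-"
    if final_plasma_vl is None:
        raise ValueError("HIV-positive donor needs a final plasma viral load")
    return "HIV+u" if final_plasma_vl < VL_DETECTION_LIMIT else "HIV+d"


def subgroup(hiv_positive: bool, final_plasma_vl: Optional[float],
             opioid_sud: bool, cocaine_sud: bool) -> str:
    """Combine SUD status (any opioid/cocaine SUD) with HIV status."""
    sud = "SUD+" if (opioid_sud or cocaine_sud) else "SUD-"
    return sud + hiv_status(hiv_positive, final_plasma_vl)


def total_donors(prefix: str = "", suffix: str = "") -> int:
    """Sum reported subgroup sizes whose label matches a prefix and suffix."""
    return sum(n for label, n in SUBGROUP_SIZES.items()
               if label.startswith(prefix) and label.endswith(suffix))
```

Summing the reported subgroup sizes in this way reproduces the totals given in the text (90 donors overall, 60 of them SUD+, and 28, 30, and 32 donors in the HIV−, HIV+u, and HIV+d groups, respectively).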

Table 1 Demographic characteristics of donor groups
Fig. 1: snRNA-seq data generation and cell typing results.

a Schematic of study design and snRNA-seq data acquisition procedure (showing at left the number N of donors per study group and at right, with the same color-coding, each group’s final number of quality-control-passing nuclei). V1-W and V1-A: width and area, respectively, of violet-channel fluorescence signals used to detect intact nuclei with fluorescence-activated nuclear sorting (FANS). DAPI: 4′,6-diamidino-2-phenylindole. b–e Cell type metrics. b 2D UMAP showing the cell types we identified for n = 200,732 nuclei total. c Cell type color legend for (b–e), also showing numbers and percentages of nuclei of each cell type. d UMAPs showing nuclei by cell type per donor group (colors as in (c)). e Cell type proportions for each donor group (see also Supplementary Data 4), with colors as in (c).

We processed nuclei in pools of up to 4 donors, with most pools containing mixed HIV/SUD donor phenotypes, using the 10× Chromium 3′ system followed by Illumina sequencing. We then matched each single, sequenced nucleus to a specific donor using single nucleotide polymorphism (SNP)-array genotyping and demuxlet30, and applied a series of quality control (QC) filters to each nucleus (Supplementary Figs. 1a–c), including an upper threshold on the proportion of mitochondrial reads (set at 0.0128; Supplementary Fig. 1a). We note that 100% of glial and 98% of neuronal nuclei that passed QC had a minimum of 200 genes expressed (Supplementary Figs. 1b, c), thereby exceeding standards from published SN reference sets31,32 (compare Supplementary Fig. 2 panels a and b). After preprocessing, the final dataset amounted to a total of 200,732 QC-passing nuclei, with 81 of 90 donors contributing >1000 nuclei (Supplementary Fig. 1d and Supplementary Data 3; “Methods”).
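As an illustration of the mitochondrial-read filter mentioned above, the following minimal sketch applies the stated upper threshold (0.0128) and then reports the fraction of retained nuclei with at least 200 genes detected. The `Nucleus` record, its field names, and the inclusive comparison at the threshold are assumptions made for this example; the full set of QC filters is described in “Methods” and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_MITO_FRACTION = 0.0128   # upper threshold stated in the text
MIN_GENES_REPORTED = 200     # gene count reported for QC-passing nuclei


@dataclass
class Nucleus:
    """Hypothetical per-nucleus summary used only for this sketch."""
    barcode: str
    total_reads: int
    mito_reads: int
    genes_detected: int


def passes_mito_filter(n: Nucleus) -> bool:
    """Keep a nucleus if its mitochondrial read fraction is at or below the threshold.

    Whether the boundary itself is included is an assumption.
    """
    if n.total_reads == 0:
        return False
    return n.mito_reads / n.total_reads <= MAX_MITO_FRACTION


def qc_summary(nuclei: List[Nucleus]) -> Tuple[List[Nucleus], float]:
    """Return retained nuclei and the fraction of them with >= 200 genes detected."""
    kept = [n for n in nuclei if passes_mito_filter(n)]
    if not kept:
        return kept, 0.0
    n_ok = sum(n.genes_detected >= MIN_GENES_REPORTED for n in kept)
    return kept, n_ok / len(kept)
```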

We first asked whether SN cell type composition is affected in the context of HIV and SUD. To maximize sensitivity for potential shifts in cellular subpopulations, we used as input 20,000 of 36,601 total gene transcripts (Supplementary Fig. 1e) as opposed to more typically suggested sizes of ~200033, and we performed a likelihood-based, agnostic approach for cell typing (“Methods”). Our approach produced 12 major cell types (Fig. 1b, c, Supplementary Fig. 3) that by numbers and proportions closely matched those reported for human SN single-cell-based transcriptomic reference sets34,35, with the largest share (65%) composed of oligodendrocytes (ODCs) and their precursors, and 14–15% made up of microglia and astrocytes. In contrast, SN neurons, endothelium, lymphocytes, and macrophages each contributed a much smaller proportion of nuclei. There were no significant differences for 176 out of 180 statistical comparisons of cell type proportions across HIV/SUD donor groups, indicating that the overall representation of major cell types in SN is preserved independent of HIV or SUD status (Fig. 1d, e and Supplementary Data 4).

Responses to SUD and HIV are each enhanced in dual diagnosis

To examine cell-level impacts on SN function due to SUD, HIV, or both, we performed differential expression analysis (DEA; “Methods”) by stratifying the entire cohort of n = 90 SN samples first by HIV groupings to interrogate effects of SUD, thereby comparing SUD+ vs. SUD− for HIV− donors, SUD+ vs. SUD− for HIV+u donors, and SUD+ vs. SUD− for HIV+d donors (Fig. 2). Likewise, to examine the impacts of HIV status, we stratified by SUD grouping, comparing HIV+u vs. HIV− for SUD+ donors, then for SUD− donors, and repeating similar analyses for HIV+d vs. HIV− and HIV+d vs. HIV+u, first for SUD+ and then for SUD− donors (Figs. 3–5).

Fig. 2: SUD-attributed differentially expressed genes (DEGs) by cell type.

a–c SUD DEGs (p < 0.05, two-tailed Wald test with Benjamini–Hochberg multiple comparison correction) in dopaminergic (DA) neurons (blue, following Fig. 1c) of a HIV− (dark purple), b HIV+u (light purple), and c HIV+d (pink) donors. Red and blue dots: significantly up and downregulated DEGs, respectively. Numbers of up- and down-DEGs are shown. Norm.: normalized. d–f SUD DEGs in GABAergic (GABA) neurons (rose as in Fig. 1c) for d HIV−, e HIV+u, and f HIV+d donors. In (e), the DEG RBFOX1 is shown in blue and circled. g–i SUD DEGs in microglia (MG; light green as in Fig. 1c) for g HIV−, h HIV+u, and i HIV+d donors. j–o Venn diagrams showing the extent of overlap among SUD DEGs across HIV levels in the same colors as (a–c), for j DA neuronal up- and k down-DEGs; l GABA neuronal up- and m down-DEGs; and n microglial up- and o down-DEGs. p, q Gene-by-gene comparisons of differential transcription across HIV levels for p DA neurons and q microglia. Columns in (p, q) contain all genes arising as SUD DEGs at any HIV level; rows correspond to HIV levels with colors as in (a–c). Genes per HIV level (row) are colored blue to red by log2 fold change if they are DEGs (see color bar) or gray if not. DA neuronal, GABA neuronal, and microglial nuclei are shown in the UMAP insets in (a, d, and g), respectively, with colors following Fig. 1c. Source data are provided as a Source Data file.

Fig. 3: HIV-attributed differentially expressed genes (DEGs) for dopaminergic (DA) neurons.

a–c HIV DEGs (p < 0.05, two-tailed Wald test with Benjamini–Hochberg multiple comparison correction; red, upregulated; blue, downregulated) occurring in DA neurons of SUD+ individuals (green outline), for DEAs of a SUD+HIV+u vs. SUD+HIV− (peach), b SUD+HIV+d vs. SUD+HIV− (red), and c SUD+HIV+d vs. SUD+HIV+u (green-gray). d–f HIV DEGs occurring in DA neurons of SUD− individuals (gray outline) for DEAs of d SUD−HIV+u vs. SUD−HIV− (light green), e SUD−HIV+d vs. SUD−HIV− (yellow), and f SUD−HIV+d vs. SUD−HIV+u (light gray). g Gene-by-gene comparisons of differential transcription across HIV comparisons for SUD+ (top 3 rows, outlined in green, labeled with colors following (a–c)) and SUD− (bottom 3 rows, outlined in black, label colors as in (d–f)), similar to Fig. 2p, q. In (a, b, d, and e), the monoamine transporter DEGs SLC6A3 and SLC18A2 are circled and labeled in blue. Source data are provided as a Source Data file.

Fig. 4: HIV-attributed differentially expressed genes (DEGs) for GABAergic (GABA) neurons.

a–c HIV DEGs (p < 0.05, two-tailed Wald test with Benjamini–Hochberg multiple comparison correction; red, upregulated; blue, downregulated) occurring in GABA neurons of SUD+ donors (green outline), for DEAs of a SUD+HIV+u vs. SUD+HIV− (peach), b SUD+HIV+d vs. SUD+HIV− (red), and c SUD+HIV+d vs. SUD+HIV+u (green-gray). d–f HIV DEGs in SUD− donors (gray outline) for d SUD−HIV+u vs. SUD−HIV− (light green), e SUD−HIV+d vs. SUD−HIV− (yellow), and f SUD−HIV+d vs. SUD−HIV+u (light gray). g Gene-by-gene comparisons of differential transcription across HIV comparisons for SUD+ (top 3 rows, outlined in green, label colors as in (a–c)) and SUD− donors (bottom 3 rows, outlined in black, label colors as in (d–f)). Source data are provided as a Source Data file.

Fig. 5: HIV-attributed differentially expressed genes (DEGs) for microglia.

a–c HIV DEGs (p < 0.05, two-tailed Wald test with Benjamini–Hochberg multiple comparison correction; red, upregulated; blue, downregulated) occurring in microglia of SUD+ donors (green outline) for DEAs of a SUD+HIV+u vs. SUD+HIV− (peach), b SUD+HIV+d vs. SUD+HIV− (red), and c SUD+HIV+d vs. SUD+HIV+u (green-gray). d–f HIV DEGs in SUD− donors (gray outline) for DEAs of d SUD−HIV+u vs. SUD−HIV− (light green), e SUD−HIV+d vs. SUD−HIV− (yellow), and f SUD−HIV+d vs. SUD−HIV+u (light gray). g Gene-by-gene comparisons of differential transcription across HIV comparisons for SUD+ (top 3 rows, outlined in green, label colors as in (a–c)) and SUD− donors (bottom 3 rows, outlined in black, label colors as in (d–f)). Source data are provided as a Source Data file.

We focused our DEAs on SN DA neurons and GABA neurons, given their key role in addiction circuitry and potential sensitivity to HIV infection25,36,37,38,39, and also on microglia, as mediators of neuroinflammation and host cells for HIV genomic integration39,40,41. We reduced the rate of false positive differentially expressed genes (DEGs) using multiple steps. First, we applied cross-fold validation (k = 3 folds), creating for each comparison and cell type three separate subsets of nuclei, performing DEA on each, and retaining only DEGs appearing in all 3 subsets. Second, we applied DESeq2 DEA model fitting, to exclude genes with low expression dispersion estimates and low counts42,43 (“Methods”).
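The first step above (retaining only DEGs that recur in all 3 subsets of nuclei) can be sketched as follows. This is a minimal illustration, not the pipeline itself: `call_degs` is a hypothetical placeholder for whatever DEA routine is run on each subset, and the seeded random partition is an assumption, since how nuclei were assigned to subsets is specified only in “Methods”.

```python
import random
from typing import Callable, Iterable, List, Sequence, Set


def split_into_folds(items: Sequence, k: int = 3, seed: int = 0) -> List[list]:
    """Shuffle items reproducibly and deal them into k disjoint subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def consensus_degs(nuclei: Sequence,
                   call_degs: Callable[[list], Iterable[str]],
                   k: int = 3,
                   seed: int = 0) -> Set[str]:
    """Run call_degs on each of k subsets and keep genes called in every subset.

    call_degs is a hypothetical stand-in for the per-subset DEA; it takes a
    list of nuclei and returns the gene names it calls as DEGs.
    """
    folds = split_into_folds(nuclei, k=k, seed=seed)
    per_fold = [set(call_degs(fold)) for fold in folds]
    return set.intersection(*per_fold)
```

Requiring a gene to be called in every subset trades sensitivity for specificity: a gene whose signal is carried by only a few nuclei is unlikely to be called in all three subsets.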

Using the above DEA pipeline on DA neurons, and stratifying for HIV status (to assess the impact of SUD), we called 51 genes downregulated in SUD+HIV− vs. SUD−HIV− donors (false discovery rate [FDR] p < 0.05; log2 fold change [l2fc] > variable values, dependent upon FDR, sample size, and dataset information content43; Supplementary Data 5). These DEGs enriched gene ontology (GO) terms44 linked to neuronal function, including axon myelination, presynaptic organization, cell adhesion, and Na+ transport (Fig. 2a and Supplementary Data 6). Notably, SUD+HIV+u donors, while exhibiting significant sharing of these downregulated DEGs with SUD+HIV− donors (by permutation test, p < 10⁻⁵), also showed 82 uniquely upregulated DEGs, which enriched most prominently GO terms for axon growth and presynaptic transmission (Fig. 2b, j, k and Supplementary Data 6). This observation suggests enhanced sensitivity to SUD in the context of HIV comorbidity, even in the face of viral suppression (for HIV+u donors). In contrast, SUD+HIV+d had 0 DEGs vs. SUD−HIV+d (Fig. 2c, p and Supplementary Data 5), suggesting that for these two viremic donor groups, transcriptomic alterations of SN DA neurons were primarily driven by factors related to active HIV replication (viremia).

Indeed, this is what we observed when the DEA of DA neurons was stratified for SUD status (to assess the impacts of HIV), with SUD+HIV+d donors having 40 up/38 downregulated DEGs vs. SUD+HIV− (Fig. 3b). Furthermore, while downregulated DEGs were highly shared between SUD+HIV+d and SUD+HIV+u donors (35 shared, p < 10⁻⁵; Fig. 3a, b), many viremia-specific (HIV+d) effects were identified (56 downregulated and 5 upregulated DEGs that enriched different GO terms; Fig. 3c and Supplementary Data 6). Importantly, the presence of SUD increased virus-associated transcriptomic dysregulation: DA neurons from SUD−HIV+d and SUD−HIV+u donors consistently showed a much smaller number of DEGs than their SUD+ counterparts (by approximately 1/3; Fig. 3d, e) and in the absence of SUD, HIV+u and HIV+d transcriptomes were indistinguishable upon direct comparison (0 DEGs, Fig. 3f).

Despite the above SUD-attributable differences, a striking commonality in all HIV+ groupings also emerged from our analyses: DA neurons from all HIV+ groups, regardless of SUD status or VL, showed significant downregulation in SLC6A3 dopamine and SLC18A2 monoamine transporter genes (Fig. 3a, b, d, e and Supplementary Data 5 and 6), together with a more variable dysregulation of genes related to neuronal immune responses, presynaptic vesicle release, synaptic signaling, and cell-cell adhesion. From all the above observations, we draw two conclusions. First, the neurogenomic response to HIV infection, both for donors with viral suppression and without, alters fundamental signaling mechanisms in SN DA neurons, including dopamine reuptake. Second, SUD comorbidity enhances abnormality in HIV+ donors, driving a stepwise progression in DA neuron dysregulation, in which abnormalities are more severe in SUD+ compared to SUD− donors and enhance the effects of systemic viral replication (Fig. 3c, f).

Next, we ran our DEA pipeline on SN GABA neurons, which produced few DEGs across all comparisons (there were median [interquartile range or IQR] 0 [0, 0.75] GABA DEGs for each of our 6 DEAs, as opposed to 24 [0, 40.75] DA DEGs; compare Fig. 2a–c to d–f and Figs. 3 to 4). Stratification by HIV status produced only one DEG attributable to SUD: downregulation of RBFOX1 GABAergic transcription factor in SUD+HIV+u vs. SUD−HIV+u donors (associated with decreased inhibition45,46; Fig. 2d–f, l, m). However, stratification by SUD revealed severe HIV-associated GABA neuron transcriptomic dysregulation in viremic (HIV+d) donors with SUD, pointing to reduced inhibitory neurotransmission (Fig. 4b, c and Supplementary Data 6). We counted 115 HIV-associated DEGs (96 down, 17 up) in SUD+HIV+d donors vs. SUD+HIV−, and 322 DEGs (266 down, 57 up) in SUD+HIV+d vs. SUD+HIV+u donors. There was significant overlap (p < 10⁻⁵) between these two comparisons, including downregulated DEGs involved in GABA production, glutamatergic G-protein-coupled receptor (GPCR) signaling, homophilic cell-cell adhesion, and action-potential-associated membrane depolarization (Supplementary Data 5 and 6). In addition, there was downregulation of several opioid-related transcripts, including of postsynaptic opioid-binding cell adhesion molecule OPCML, which is downregulated during chronic opioid agonist exposure47,48,49; the closely related neurotrimin (NTM); ephrin-B1 receptor (EPHB1), which colocalizes and interacts closely with ephrin-B2 receptor50,51 as a mediator of OPCML signaling49; and non-coding RNA AC073225.1, previously reported to show decreased expression in opioid use disorder (OUD)52. These alterations were specific to SUD+HIV+d, given that SUD−HIV+d vs. SUD−HIV−, SUD−HIV+u vs. SUD−HIV−, and SUD+HIV+u vs. SUD+HIV− comparisons completely lacked significant differences in gene expression (0 DEGs).

Finally, we applied our stratified DEA pipeline to SN microglia, resulting in hundreds of DEGs attributable to SUD (Fig. 2g–i, n, o, q). These transcriptional alterations were mostly enriched for various combinations of pro- and anti-inflammatory molecules (Supplementary Data 5 and 6), in broad agreement with recent studies that have reported upregulation of immune signaling genes and transcriptomic signatures indicative of microglial activation and altered glial motility in OUD, chronic opioid exposures, and overdose34,53,54,55. However, in our analysis, a substantial fraction of SUD-attributable microglial DEGs were unique to each HIV grouping (Fig. 2n, o), pointing to distinct microglial responses to SUD in the context of varying levels of HIV (HIV−, HIV+u, HIV+d56). For example, SUD-associated gene sets supporting reduced protein stabilization and interferon-1 production were seen only in HIV+d, and not in the other HIV groupings (Supplementary Data 6).

Stratifying by SUD groupings (Fig. 5), we also counted hundreds of microglial DEGs attributable to HIV, including 446 down and 89 upregulated DEGs for SUD− subjects with viral suppression (SUD−HIV+u) vs. HIV− controls (SUD−HIV−), >50% of which were replicated in viremic SUD−HIV+d donors (p < 10⁻⁵). These shared changes primarily involved gene sets for increased inflammatory signaling, decreased phagocytosis and iron ion responses, and increased stress response protein folding/refolding (Supplementary Data 6). For the SUD−HIV+d group, there were additional changes vs. SUD−HIV+u, including enhancements of immune signaling and telomere maintenance/remyelination processes, as well as antiviral-response-related reductions in glucose transport/GPCR signaling57,58. In the context of SUD (Fig. 5a–c), when comparing SUD+HIV+u and SUD+HIV+d donors to SUD+HIV− controls, microglial transcriptional alterations included multiple gene sets related to immune activation, many of which were shared; further, some of these shared effects were opposite to shared HIV effects in the absence of SUD. For example, in the absence of SUD, HIV resulted in downregulated iron ion responses; in the context of SUD, iron ion responses were upregulated (Supplementary Data 6). Additionally, in the context of SUD, microglia-mediated regulation of synaptic signaling was dysregulated in both HIV+u and HIV+d donors; this effect was absent in the SUD− comparisons.

Coordinated DEG sets link microglial and neuronal function

Next, we wanted to complement our pathway-by-cell type analyses with an alternative approach, by capturing DEGs with statistically linked, or “coordinated”, expression levels across the majority of donors within an SUD/HIV group. To that end, for each DEA we performed, we searched for groups of DEGs that had a high degree of inter-patient covariance (“Methods”). We selected similarity-based DEG subclusters with average pairwise similarity scores >0.75, our cutoff for identifying gene groups with the most ubiquitous coordinated transcriptional alteration (Fig. 6a), for functional annotation. In the following, we discuss these DEG subclusters, with a focus on microglia-neuron biological functions and interactions.
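To make the selection criterion concrete, the following minimal sketch scores one candidate DEG subcluster by its average pairwise similarity and compares it with the 0.75 cutoff. It assumes that similarity is the Pearson correlation of per-donor expression values between two DEGs; the actual similarity measure and clustering procedure are defined in “Methods”, and all names below are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence

SIMILARITY_CUTOFF = 0.75  # cutoff stated in the text


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_similarity(expr: Dict[str, Sequence[float]]) -> float:
    """Average similarity over all gene pairs; expr maps gene -> per-donor values."""
    pairs = list(combinations(expr, 2))
    if not pairs:
        raise ValueError("need at least two genes in a subcluster")
    return sum(pearson(expr[g], expr[h]) for g, h in pairs) / len(pairs)


def is_highly_coordinated(expr: Dict[str, Sequence[float]]) -> bool:
    """True if the subcluster's average pairwise similarity exceeds the cutoff."""
    return mean_pairwise_similarity(expr) > SIMILARITY_CUTOFF
```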

Fig. 6: Differentially expressed gene (DEG) subclustering approach, and dopaminergic (DA) neuronal-microglial (MG) DEG subclusters showing SUD impacts.

a Schematic of our approach for identifying DEG subclusters, shown for DA and MG DEGs from our differential expression analysis (DEA) of SUD+HIV− vs. SUD−HIV− donors (“Methods”). b Venn diagram showing color legend used to categorize subcluster DEGs and/or DEG functional groups, based on which DEAs they arise in, specifically for the 4 DEAs assessing impacts of either SUD (left; DEGs in Fig. 2a, g), HIV+u (right; Figs. 3d and 5d), or SUD+HIV+u (middle; Figs. 2b, h, 3a, and 5a). Labels correspond to interpretations for each effect type. c Dendrogram for DEGs from SUD+HIV− vs. SUD−HIV−, showing subclusters for SUD impacts; labels identify subclusters shown in (d–g). d, e DEG list and pairwise similarities for subcluster C22S1. d Subcluster C22S1 DEGs, organized by functional group; color at right serves as a group index used with (e); color at left indicates type of effect following (b); bold text denotes DEGs recurring as SUD impacts in SUD+HIV+u donors. e Heatmap showing subcluster C22S1 DEGs arranged by similarity, with element color corresponding to pairwise DEG correlations. Colors at left of DEG names correspond to functional group indices as in (d); DEG names with pink color at right recur in SUD+HIV+u following (b, d). f, g Similar to (d, e), but for subcluster C22S2.

Microglial HIV and DA neurotransmission in SUD+HIV+u

We first examined highly coordinated DEGs for DA neurons and microglia, using the 4 DEAs that showed changes for either SUD, suppressed HIV (HIV+u), or both (Fig. 6b). We identified 117 highly coordinated DEG subclusters (with median [IQR] 10 [6,21] DEGs per subcluster; Supplementary Table 2 and Supplementary Data 7 and 8). Interestingly, 17 of these subclusters contained a mix of DA and microglial DEGs, suggesting DA-microglial interactions. Two of the 17 were specific for the comparison of SUD+HIV− vs. SUD−HIV− (related to impacts of SUD in the absence of HIV; Fig. 6d–g). Both of these subclusters involved glycoproteins: one (C22S1) linked decreased N-glycan synthetic enzyme FUT8 fucosyltransferase59 in both microglia and DA neurons to altered DA neurotransmission (reduced membrane maintenance, reduced pre-/post-synaptic machinery, slowed presynaptic release, and increased excitability; Fig. 6d, e). Notably, FUT8 has previously been associated with increased basal microglial reactivity60 and impaired neuronal membrane maintenance61,62,63. The other subcluster (C22S2) linked decreased DA neuron expression of GPM6B glycoprotein to a microglial switch from oxidative phosphorylation to aerobic glycolysis64,65, as well as to deficits in the microglial antiviral response and in transcripts associated with increased microglial-neuroligin-1-mediated excitatory synapse formation (MDGA2 ↓ 66,67) and increased catecholaminergic receptor responses (SPATS2L ↓ 68,69), among others (Fig. 6f, g).

For dual-diagnosis (SUD+HIV+u) donors, there were 9 mixed DA-microglial subclusters. Three of these closely resembled the subclusters described above for SUD without HIV, specifically for DA DEGs (see SUD conserved impacts denoted by deep pink color in Fig. 6b, d, f and Supplementary Data 8). Additionally, and remarkably as these were virally suppressed donors, 5 out of 9 had prominent representation of DEGs linked to HIV infection and replication (Supplementary Data 8). For example (subcluster C16S2, Fig. 7c, d), DA neuronal upregulation of the Na+-Ca2+ exchanger SLC8A1 was linked to 31 microglial DEGs important for HIV transcription and replication (HIF1A ↑ 70,71, HIF1A-AS3 ↑ 72,73,74, HIVEP2 ↑ 75,76,77,78, SYTL3 ↑ 79,80, REL ↑ 81,82,83,84, ELL2 ↑ 85,86), HIV host entry and genome stabilization (TFRC ↑ 87, SEC14L1 ↑ 88, CD55 ↑ 89,90) or viral stabilization (SIPA1L1 ↑ or E6TP1 ↑ 91), and HIV-1 intracellular compartment release (ATP1B3 ↑ 92). We note that Na+-Ca2+-exchanger-mediated prolongation of neuronal excitation was previously linked to HIV infection in an in vitro model93. Thus, these subclusters indicated that in virally suppressed people with HIV, SUD may result in microglial changes permissive for HIV. Other DEG subclusters for SUD+HIV+u donors also linked microglial dysregulation to increased excitability in DA neurons (Fig. 7g, h and Supplementary Data 8).

Fig. 7: Dopaminergic (DA) neuronal-microglial (MG) differentially expressed gene (DEG) subclusters showing added functional impacts for SUD+HIV+u donors.

a Dendrogram relating DA and MG DEGs for the differential expression analysis (DEA) comparing SUD+HIV+u vs. SUD−HIV+u donors (Fig. 2b, h); the two subclusters shown in (c–f) are labeled. b Dendrogram relating DEGs for SUD+HIV+u vs. SUD+HIV− (Figs. 3a and 5a); the subcluster shown in (g, h) is labeled. c, d DEG list and pairwise similarities for subcluster C16S2. c Subcluster C16S2 DEGs organized by functional group; color at right serves as a group index used with (d); color at left indicates type of effect following Fig. 6b. M. sim: subcluster mean pairwise similarity. d Heatmap showing subcluster C16S2 DEGs arranged by similarity, with element color corresponding to pairwise DEG correlations. Colors below DEG names correspond to functional group indices as in (c). e, f Similar to (c, d) but for subcluster C17S0, and with functional group indices from (e) to the right of DEG names in (f). g, h Similar to (e, f) but for subcluster C6S0.

To validate these findings, we performed dual in situ hybridization (RNAscope) and immunofluorescence (FISH/IF) imaging of SN tyrosine hydroxylase (TH)-expressing DA neurons and IBA1-expressing microglia for a total of 3 SUD+HIV+u donors, targeting subcluster C16S2 upregulated transcripts SLC8A1 and REL (Fig. 7c, d; “Methods”). We confirmed that FISH/IF signal for both transcripts showed statistically significant positive covariation (p ranging from <3 × 10⁻⁵ to 0.016) across donors for their respective cell types (Supplementary Fig. 4).

Microglial phagocytosis and DA neurotoxicity in SUD+HIV+u

Another specific SUD+HIV+u subcluster (subcluster C17S0, Fig. 7e, f) linked microglial upregulation of receptors for neurodegeneration-associated inflammatory phagocytosis (OLR1 94,95, and ITGAX, which targets complement-coated particles96) to DA neuronal downregulation of neuronal maintenance genes (CLMN 97,98 and PEX5L 99). Taken together, this subcluster suggests that for individuals with undetectable HIV, SUD+/HIV+u synergy still may be associated with damage to vulnerable DA neuron populations.

Linked GABAergic impairment and inflammation in SUD+HIV+d

We also identified DEG subclusters for GABA neurons and microglial cells, to explore their relationships in SUD+HIV+d, because in our DEAs this subgroup specifically showed signs of disrupted GABA neuron function (Fig. 4b, c and Supplementary Data 8). Compared to SUD+HIV+u, SUD+HIV+d individuals had microglial upregulation of DEGs linked to pro-inflammatory signaling (EPSTI1 100, IFI44L 101,102) together with alterations in GABAergic regulators of aging (RANBP17 ↓ 103), cellular maintenance, GABAergic transmission (SYT9 ↓ 104; DDAH1 ↓ 105), and endoplasmic-reticulum-Ca2+-mediated apoptosis (DPP10 ↑, RYR3 ↑ 106,107). Therefore, the impairment of GABAergic regulation in SUD+HIV+d is linked to pro-inflammatory responses in microglia.

Discussion

We have explored cell-type-specific transcriptional changes in the SN of donors with HIV, opioid/cocaine SUD, or both. Our rationale for HIV groupings (HIV−, HIV+u, HIV+d) was based on an extant literature that has linked plasma HIV to brain microglial activation and other aspects of brain pathology108, and on the relevance of plasma HIV load as a standard for guiding patient care. Our rationale for using a polysubstance (opioid/cocaine) approach to SUD groupings was based on the pronounced polysubstance prevalence both in the MHBB cohort and overall in the United States109, the importance of these particular drug classes to the HIV epidemic in New York City27,28,29, and a call in existing literature for more research on polysubstance users, which reflects the lifetime exposures of the majority of individuals with chronic SUDs109,110,111.

Our findings are summarized in Fig. 8. Among the neuronal and microglial changes we have reported, one major component is altered expression of hundreds of pro- and anti-inflammatory transcripts in microglia attributable both to SUD and HIV, with the combined effects of SUD and HIV resulting in a more pronounced dysregulation of the microglial inflammatory transcriptome, particularly in those donors lacking viral suppression (HIV+d). These changes were significantly coupled with abnormalities in neuronal integrity, excitability, and neurotransmission. Perhaps most strikingly, we observed that dysregulated DA neuron transcription in the SN of virally suppressed, opioid/cocaine-SUD+HIV+u donors was linked to upregulated microglial transcription of genes permissive for HIV infection, replication, and reactivation, along with an altered microglial antiviral response and increased neurodegeneration-associated inflammatory phagocytosis (Supplementary Data 8). These findings, in a carefully phenotyped, prospectively characterized human population, are striking given the previously reported evidence of a selectively expanded brain viral reservoir in ART-treated, Simian Immunodeficiency Virus (SIV)-infected non-human primates with chronic morphine exposure112,113. In this SIV model, reservoir expansion occurred in conjunction with microglial transcriptional changes indicative of immunosuppression and degeneration, relative to SIV+ opioid-negative comparators112,113. With the techniques utilized in the present study, we are unable to comment on the brain viral reservoir in our donors, and we identified only 36 HIV transcripts in our entire collection of nuclei (“Methods”). This is not surprising, given that in a previous study with similar snRNA-seq methodology, HIV transcripts in a subset of microglial nuclei were limited to donor brains displaying the histopathology of active brain viral replication or HIV encephalitis (HIVE)114. 
None of the donors in our present study had histopathologic evidence of HIVE, assessed as part of the standard MHBB neuropathology protocol115,116. The lack of viral transcript detection constitutes a limitation of our present work. Furthermore, it contrasts with a recent study in which HIV RNA was detected in whole-cell, sorted brain microglia from 2 of 3 virally suppressed HIV+ donors117,118. The donors in this prior study were exclusively male and described as having been on ART “until at least 2 wk [sic] before death”117. The discrepancy between our study and this previous study warrants further investigation, and it could reflect differences in nuclear vs. cytoplasmic substrates for analysis; the time at which ART was discontinued in donors prior to death; or possibly either geographical variabilities in circulating and evolving HIV strains or differing genetic determinants of immune function and viral control in donor populations.

Fig. 8: Summary of functional impacts identified for SUD and HIV.

Schematics summarizing the biological impacts identified for a SUD only, b SUD with HIV+u, and c SUD with HIV+d, on dopaminergic neurons (labeled DA), GABAergic neurons (GB), and microglia (MG). Descriptions indicate biological impacts, with font color showing differential expression analysis (DEA)-based association with HIV and/or SUD (as defined in (a)).

Another important aspect of our work was the demonstration that SUD comorbidity in the context of HIV status exerted impacts on SN neuronal transcriptomes in a stepwise progression. In HIV− donors, we observed only downregulation of DA neuronal transcripts that potentially induce impaired function, senescence-like changes, and increased excitability (Fig. 6 and Supplementary Data 8). In virally suppressed individuals (HIV+u), we observed both a shared (with HIV−) downregulation of DA neuronal genes potentially increasing DA neuron excitability, as well as uniquely upregulated DEGs that suggested enhanced sensitivity to SUD (Fig. 2). Similarly, we observed evidence of enhanced sensitivity to HIV in SUD+HIV+u donors (Fig. 3). With viremia/absence of viral suppression (HIV+d), we further observed changes affecting hundreds of GABAergic transcripts and potentially compromising inhibitory neurotransmission. As microglial transcriptional dysregulation had unique components and also progressed with each HIV grouping, this raises the question of what drives progression, and whether it is unidirectional (driven by one cell type) or bidirectional (driven reciprocally). Recent work in mixed cell cultures (containing microglia with either DA or GABA neurons) has shown that healthy neurons may drive HIV into latency in microglia; and conversely, neuronal damage, including substance-related damage, fosters microglial HIV reactivation and viral expression, thereby further compromising neuronal health due to enhanced microglia-mediated neurotoxicity119. 
Our findings in postmortem human brain raise the possibility that similar bidirectional phenomena may be relevant in humans, but future studies will be needed to further examine the mechanisms driving neuronal and glial dysfunction, expanding on our observations of transcriptomic signatures of coordinated detrimental interactions between HIV and SUD (for a more complete listing of significant coordinated abnormalities, see Supplementary Data 7 and 8).

Finally, while we have focused on effects of HIV and SUD comorbidity in altering SN neuronal and microglial transcriptomes, it is noteworthy that in all our HIV+ donor groups, regardless of VL or SUD status, DA neurons showed a consistent deficit in the expression of the dopamine and monoamine reuptake transporters SLC6A3 (encoding the protein DAT) and SLC18A2, respectively, two key regulators of DA turnover at synapses. Combination cocaine-and-raclopride positron emission tomography studies of viremic people with HIV, both with and without SUD, have demonstrated similar abnormalities at the protein level in vivo, showing decreased DAT signal in the dopamine-terminal-rich basal ganglia120,121. The prevailing hypothesis for this phenomenon was that HIV viral proteins, including the transactivator of transcription (Tat) and others, directly bind to monoamine transporters, thereby inhibiting monoamine (including dopamine) reuptake and damaging DA neurons122,123. However, downregulation of DAT and monoamine reuptake transporter transcripts occurred in our study even in the virally suppressed HIV+SUD− donor group; furthermore, evidence of viral Tat transcripts in our donor brains was lacking. The DA-neuron-specific RNA results presented here point to a transcriptional mechanism resulting in decreased monoamine transporter expression that, while linked to some unified aspect of HIV infection, does not require evidence of ongoing viral replication or transcription.

In summary, the strengths of our study include: a large number of carefully phenotyped brain donors with large amounts of prospectively collected data on SUD and immunovirologic status; analytic pipelines that are tolerant to abnormality yet rigidly subjected to quality assurance/QC procedures that exceed other published norms; analytic groups parsed by clinically relevant factors in HIV and SUD; and in-depth annotation of DEG groups that show coordinated expression in most members of their associated donor group (Supplementary Data 7 and 8). Limitations include a lack of viral transcript detection and the need for additional mechanistic exploration of the multiple transcriptomic abnormalities described in these donor groupings. Future studies are indicated to further understand the significance of these genomic adaptations for future development of therapies for people with HIV and SUD.

Methods

Brain donor tissue and clinical data collection

Brain donors were autopsied and their donations curated by the MHBB between the years 1999 and 2021, using study protocols and consent documents approved by the Icahn School of Medicine at Mount Sinai (ISMMS) Institutional Review Board (IRB; IRB Approval Number STUDY-11-00388). Briefly, at the time of autopsy, the entire brain was removed, photographed, and bisected, and one hemisphere was placed in phosphate-buffered formalin for subsequent processing, gross neuropathologic review, sectioning, and histological analysis115. The other hemisphere was immediately sectioned, each section photographed, and then was snap-frozen between aluminum plates chilled to a temperature of −80° Celsius (C). Frozen brain sections were stored at −80 °C in a monitored facility. For this study, samples of ventral midbrain containing recognizable pigmentation indicative of SN were dissected, using a rotary saw on a cold table (chilled to −20 °C). As there was some rostro-caudal variation in the anatomical level at which each donor’s SN was dissected, we accounted for the relative location as rostral midbrain (SN at the transverse level of the red nucleus) or caudal midbrain (at the level of the cerebellar peduncular decussation) and included this as a covariate in our DEA (see below).

Prospectively collected clinical information was available for 67 donors (62 HIV+, 5 HIV−) as a consequence of their participation in the MHBB longitudinal, observational study. For 23 other HIV− donors, information was extracted by medical record review at the time of demise. The sex of brain donors was ascertained by self-report for members of the prospective MHBB cohort, or chart review. For all donors, sex was verified by genotyping as part of quality control measures in utilizing SNP arrays. Substance use characterization (DSM IV dependency diagnoses) for those followed in the MHBB study was obtained with either the Psychiatric Research Interview for Substance and Mental Disorders versions 1.9b and SL124 or the World Health Organization Composite International Diagnostic Interview version 2.1125, and drug utilization was monitored by urine toxicologic analysis for opiates, cocaine, and other psychoactive substances at each study visit, as described previously126. At the time of urine toxicology, the donor’s reported medications were examined to determine whether psychoactive substances were prescribed secondary to medical illness. For donors not enrolled in the prospective MHBB study, DSM IV diagnoses and problematic utilization were obtained from medical records, inclusive of urine toxicologic analysis when available. Immunovirologic data (CD4 T-cell counts and plasma HIV load) were similarly obtained either as a prospective research study laboratory or retrospectively extracted from medical records; all were performed in the CLIA-certified laboratories of the Mount Sinai Hospital. In this study, for HIV+ individuals, we used a VL of 50 copies per mL as the threshold for HIV detectability, to be consistent with the quantitative lower limit of the earliest assays utilized in the MHBB cohort.

Brain donors for this study did not receive compensation for the use of their tissues, as they were decedents.

snRNA-seq

We performed sample processing consisting of nuclei purification, RNA extraction, and subsequent generation of snRNA-seq libraries, following an established method34. Samples were homogenized using a douncer at least 20 times in 1 mL lysis buffer (0.32 M sucrose, 5 mM CaCl2, 3 mM Mg(Ace)2, 0.1 mM EDTA, 10 mM Tris pH8, 0.5 mM DTT, 0.1% Triton X-100) with 400 U RNase inhibitor (Takara Bio Recombinant RNase Inhibitor, Cat. 2313). Then, an additional 4 mL of lysis buffer was added and the sample solution was dounced an additional 20 times until homogenous. After douncing, each pool of 1–4 unique samples was transferred to an ultracentrifuge tube (Beckman Coulter Polypropylene Centrifuge Tubes ⅝ × 3 ¾ in., Ref. 361707) and underlaid with 9 mL of sucrose buffer (1.8 M sucrose, 3 mM Mg(Ace)2, 0.5 mM DTT, 10 mM Tris-HCl pH8), then ultracentrifuged at 24,000 rpm in a SureSpin 630 (17 mL) Rotor (106,803 × g) for 1 h at 4 °C. After centrifugation, the supernatant was removed and each pellet of nuclei was carefully resuspended in 1 mL of 1% BSA with 1000 U RRI added to it, transferred to a sterile tube, and 1 µL of the nucleophilic dye, DAPI (4′,6-Diamidino-2-Phenylindole, Dihydrochloride, Invitrogen Cat. D1306), was added. For FACS collection, sterile tubes were coated with 5% BSA. After residual BSA solution at the bottom of the tubes was removed, DAPI+ nuclei were sorted into the collection tubes using a BD FACSAria Cell Sorter, with approximately DAPI+ nuclei for each set of unique midbrain samples collected and processed using the 10× Chromium Next GEM Single Cell 3′ v3.1 (Dual Index) Protocol (CG000315 Rev A) according to the manufacturer’s instructions. The Agilent 2100 High Sensitivity DNA Bioanalyzer Kit was used as a quality control step at Step 2.4 and at the end of the library preparation, also as per 10× Genomics’ guidelines. To prepare samples for sequencing, sample concentration was determined using the KAPA Biosystems Library Quantification Kit (ROX Low qPCR Master Mix, Cat. KK4873). Libraries were sequenced by the New York Genome Center using the Illumina NovaSeq platform. Libraries consisted of paired-end reads with a read length of 100 bp. As described below, each library was separated by donor after sequencing and prior to analysis, using genotype-based demultiplexing, such that we performed our analyses on a per-donor basis.

We generated the first round of 33 snRNA-seq libraries by pooling tissue from 3 to 4 patients per library, then performed QC and generated additional snRNA-seq libraries for donors with <1000 QC-passing nuclei, in each pooling tissue from only 1 to 2 patients to increase the per-patient nuclear yield. We added libraries for low-yield donors until the tissue sample was exhausted or the total number of quality-passing nuclei exceeded 1000 for 90% of donors, resulting in a total of 51 libraries across the 90 study donors. We determined the threshold of 1000 QC-passing nuclei per donor using a power analysis (see below).

Imputed donor SNP arrays for demultiplexing

We used donor SNP arrays as a reference to demultiplex pooled snRNA-seq data, generating them from donor cerebellum tissue using the Illumina Infinium Global Screening Array platform (GSA-24v3.0). We prepared DNA for each donor at a final concentration of 20 ng per μL. Genotyping was performed at the Center for Applied Genomics Core at Children’s Hospital of Pennsylvania. We performed QC on SNP arrays using GenomeStudio v2.0127,128, retaining patient genotypes with call rate of >95% of SNPs, and SNPs with minor allele frequency (MAF) > 0.2, call rate >99%, and clustering (“GenTrain”) score >0.7. We performed a second round of SNP genotyping for three donors with SNP call rates <95%, resulting in two panels with 517,091 SNPs (614,679 variants total) and 416,166 SNPs (603,685 variants total) across chromosomal designations 1–22, X, Y, XY, PAR, and MT. To create a demultiplexing reference, we retained SNPs from chromosomes 1–22 for each panel, and we imputed larger arrays for these to improve demultiplexing accuracy30 using the Trans-Omics for Precision Medicine (TOPMed) Imputation Server129,130,131 (TOPMed r2 reference, EAGLE v2.4 phasing132,133, no r2 filter, data-TOPMed reference panel allele frequency comparisons, pre-run QC). After pre-run QC, 482,995 and 212,783 SNPs from the two panels were used for phasing and imputation; after imputation the panels had 27,749,087 and 11,426,093 SNPs. We combined these imputed genotypes for donors into a single reference and ran demuxlet on each group of sequenced libraries we produced as described above (4 total). For each group, we filtered the demultiplexing reference using bcftools134 to retain only biallelic SNPs that had a depth of coverage across donors >9, as well as estimated imputation accuracy r2 > 0.5 and MAF > 0.2. Our resulting reference panels included 568,817; 480,835; 676,322; and 291,170 SNPs.
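The reference-panel filter described above can be illustrated with a minimal Python sketch. This is not the bcftools command we ran; the record layout (`Snp`) and its field names are hypothetical, and only the thresholds (biallelic SNPs, depth >9, r2 > 0.5, MAF > 0.2) come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Snp:
    """Hypothetical per-variant record; field names are illustrative only."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]  # alternate alleles
    depth: int             # depth of coverage across donors
    r2: float              # estimated imputation accuracy
    maf: float             # minor allele frequency


def keep_for_demux(snp, min_depth=9, min_r2=0.5, min_maf=0.2):
    """Return True if the variant passes the demultiplexing-reference filter."""
    # Biallelic SNP: exactly one alternate allele, both alleles one base long.
    biallelic_snp = (
        len(snp.alts) == 1 and len(snp.ref) == 1 and len(snp.alts[0]) == 1
    )
    return (
        biallelic_snp
        and snp.depth > min_depth
        and snp.r2 > min_r2
        and snp.maf > min_maf
    )


panel = [
    Snp("1", 1000, "A", ("G",), depth=12, r2=0.9, maf=0.31),
    Snp("1", 2000, "A", ("G", "T"), depth=12, r2=0.9, maf=0.31),  # multiallelic
    Snp("2", 3000, "C", ("T",), depth=12, r2=0.4, maf=0.31),      # low r2
]
filtered = [s for s in panel if keep_for_demux(s)]
```

All comparisons are strict, matching the ">" thresholds stated above.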

Alignment and cleaning of snRNA-seq data

We pre-processed sequenced data using multiple steps, first performing alignment, demultiplexing, and expression noise/doublet removal on each pooled library separately, then performing heuristic-based filtering on the data from all libraries combined, to generate per-donor data cleaned of ambient RNA, doublets, and other low-quality information.

Read alignment

We first aligned reads per library to a composite genome using Cell Ranger version 6.1.2 (using the count function with option -include-introns)135,136. We built the composite genome reference using Cell Ranger’s mkref, to be a combination of the GRCh38 human genome, the mm10 mouse genome, and a split version of the HIV-1 reference genome (previously shown to reduce ambiguities in alignment of HIV-associated reads, increasing detection137). Notably, we only detected 36 HIV reads in our data, a limitation likely related to both the presence of highly mutated HIV dissimilar to the reference sequence and lower viral levels in our chronically infected, non-encephalitic HIV+ donors (“Discussion”).

Demultiplexing/multidonor doublet removal (demuxlet) + expression noise removal (CellBender)

Second, we performed two processes in parallel: demultiplexing and ambient RNA/noise removal, using as input for each the human-genome-aligned reads for each library. We demultiplexed data with demuxlet version 230 using popscle (version v.0.1b; https://github.com/statgen/popscle), with our donor SNP reference panel described above, doublet mixing fraction α ranging from 0 to 0.5, and 0.1 doublet prior, which produced for each nuclear barcode a label of “singlet”, “doublet”, or “ambiguous” and its maximum-likelihood donor ID(s). We confirmed that for each library, the donor IDs returned matched those expected.

We performed ambient RNA/noise removal using CellBender remove-background138,139 with the remove-background-v3-alpha workflow on the Terra.bio platform140,141. CellBender uses unsupervised learning to model and remove expression count noise, due to ambient RNA molecules and random barcode swapping. For most libraries we ran CellBender using false positive rate 0.01, learning rate 0.00005, posterior batch size 5, 150 epochs, and 2 retries. For a few libraries, we adjusted these: CellBender failed to converge during learning for library SNr11 with our original parameters, but achieved reasonable loss (indicative of successful learning) with learning rate 0.000025 over 200 epochs. Another library (SNr27) showed signs of overfitting that we corrected by using learning rate 0.00005 over 60 epochs. For each library, the output from CellBender was a cleaned expression matrix in anndata142 format (version 0.8.0).

We combined demuxlet and CellBender results by filtering each library’s CellBender-cleaned expression data to retain only demuxlet-assigned, nuclear singlet barcodes, then adding the corresponding donor IDs to the expression data as barcode annotations.
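As an illustration of this combination step, the sketch below keeps only singlet barcodes from a cleaned count matrix and carries their donor IDs along as annotations. It is a simplified stand-in that assumes a dense NumPy matrix and a plain dictionary of demultiplexing calls; the function and argument names are hypothetical and do not reflect the file formats written by demuxlet or CellBender.

```python
import numpy as np


def keep_singlets(counts, barcodes, demux_calls):
    """Filter a barcodes-by-genes matrix to demultiplexed singlets.

    counts      : (n_barcodes, n_genes) array of cleaned counts
    barcodes    : list of barcode strings, one per row of `counts`
    demux_calls : dict mapping barcode -> (label, donor_id), where label is
                  "singlet", "doublet", or "ambiguous" (assumed layout)
    """
    keep_rows, donors = [], []
    for row, barcode in enumerate(barcodes):
        # Barcodes without a demultiplexing call are treated as ambiguous.
        label, donor = demux_calls.get(barcode, ("ambiguous", None))
        if label == "singlet":
            keep_rows.append(row)
            donors.append(donor)
    kept_barcodes = [barcodes[r] for r in keep_rows]
    return counts[keep_rows], kept_barcodes, donors


counts = np.array([[1, 0], [2, 3], [0, 5]])
barcodes = ["AAAC", "AAAG", "AAAT"]
calls = {
    "AAAC": ("singlet", "donor1"),
    "AAAG": ("doublet", "donor1,donor2"),
    "AAAT": ("singlet", "donor2"),
}
singlet_counts, singlet_barcodes, singlet_donors = keep_singlets(
    counts, barcodes, calls
)
```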

Multi-cell-type doublet removal (Scrublet)

Third, we used Scrublet version 0.2.3143 to remove remaining doublets (barcodes containing reads from multiple cell types from one patient; we expected that demuxlet removed most multi-patient doublets), processing each patient’s nuclei within each library separately (134 patient groups across 51 libraries). Scrublet removed 4372 doublets in total, or a median (IQR) of 33 (8, 37) (1.45% [0.82%, 2.22%]) of doublet barcodes per group. After this step, we presumed that most barcodes corresponded to single nuclei and subsequently called them “nuclei”.

Heuristic filtering

Fourth, we combined expression data across all libraries and performed heuristic filtering to remove potentially low-quality or dying nuclei. We filtered based on fraction of mitochondrial genes per nucleus, number of genes per nucleus, and number of reads (denoted by unique molecular identifiers or UMIs) per nucleus. For each heuristic, we produced the distribution of values across all dataset nuclei, then identified the thresholds of specific distribution tails and filtered out nuclei with more extreme values. We excluded nuclei with the highest mitochondrial fractions from our dataset, since fractions >5% or 10% are typically thought to indicate potentially dying cells31. We removed nuclei in the upper tail of our dataset, amounting to 10% of our nuclei with mitochondrial fractions >1.28% (Supplementary Fig. 1a). This measure is notably lower than typical thresholds (5% to 10%), such that we may have conservatively over-removed nuclei in this step. We also removed nuclei with excessively high numbers of genes per nucleus (which may indicate e.g., doublets), excluding the upper tail or the ~5% of our nuclei with >4396 genes per nucleus (Supplementary Fig. 1b). Importantly, because previous studies have shown that human SN neurons and some immune cells exhibit lower gene counts in snRNA-seq datasets31,32, we did not filter out nuclei with fewer genes per nucleus at this step. Instead, we relied on cell-type-specific filtering performed during differential expression analysis to exclude low-count data (see below). Following a similar rationale, we excluded nuclei in the upper tail of our dataset in terms of number of UMIs per nucleus, amounting to ~5% of our nuclei with >17,153 UMIs (Supplementary Fig. 1c). Notably, we identified heuristic thresholds each time a new group of sequenced data was added to our dataset (during data generation; see above) and identified roughly the same thresholds as those above during each intermediate stage.
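The upper-tail filtering logic can be sketched as follows. One assumption should be noted: we identified tail thresholds from the observed distributions, whereas this sketch simply fixes the tail fractions (10% for mitochondrial fraction, 5% for genes and UMIs) as quantiles. Function names are hypothetical.

```python
import numpy as np


def upper_tail_keep(values, tail_fraction):
    """Keep entries at or below the (1 - tail_fraction) quantile."""
    threshold = np.quantile(values, 1.0 - tail_fraction)
    return values <= threshold, float(threshold)


def heuristic_filter(mito_frac, n_genes, n_umis):
    """Combine the three upper-tail filters; no lower-tail filter is applied."""
    keep_mito, t_mito = upper_tail_keep(mito_frac, 0.10)
    keep_genes, t_genes = upper_tail_keep(n_genes, 0.05)
    keep_umis, t_umis = upper_tail_keep(n_umis, 0.05)
    keep = keep_mito & keep_genes & keep_umis
    thresholds = {"mito_frac": t_mito, "n_genes": t_genes, "n_umis": t_umis}
    return keep, thresholds


# Toy data: only the mitochondrial fraction varies across 100 nuclei.
mito = np.arange(100) / 1000.0
genes = np.full(100, 2000)
umis = np.full(100, 5000)
keep, thresholds = heuristic_filter(mito, genes, umis)
```

Consistent with the rationale above, low gene or UMI counts are deliberately not filtered here.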

snRNA-seq dataset integration and clustering

To produce a finalized dataset for analysis, we performed three additional steps. First, we further reduced expression count noise using two forms of linear dimensionality reduction: highly variable gene (HVG) selection and principal component analysis (PCA) dimensionality reduction (Supplementary Fig. 1e, f). Second, we performed batch correction, to facilitate cross-library dataset integration (Supplementary Data 9). Third, we performed nucleus clustering to use as input for cell typing. Below, we explain several choices that we made in these steps to accommodate expected transcriptional abnormalities for donors. Many of these and other downstream analyses were performed using scanpy (versions 1.9.1 and 1.9.3)144 and other packages as noted (see Zenodo links in the Code Availability statement for full package lists and versions).

When selecting HVGs (scanpy.pp.highly_variable_genes with the seurat_v3 option, which ranks genes by cross-nucleus variance33), we found that the top 2000 HVGs excluded many marker genes we used for cell typing, and included genes, like inflammatory markers, that might relate more to cell state. To retain more cell type marker genes but still reduce noise, we ordered all 36,601 genes in our data by increasing HVG rank (decreasing expression variance) and inspected the simultaneous decrease in minimum expression variance and increase in number of cell type marker genes as we considered more HVGs (Supplementary Fig. 1e). We selected a number of HVGs to retain that was past the elbow of the curve for fraction of retained cell type marker genes, because beyond this point, including additional HVGs would introduce cell type marker genes at slower rates at the cost of adding progressively lower-variance data (a diminishing return). We kept 20,000 HVGs, which preserved 67% of our database’s cell type marker genes (Supplementary Data 10). Notably, working with this number of HVGs posed a substantial memory burden, since several scanpy functions by default convert expression counts to high-precision data types (64-bit). To alleviate this burden, we created modified versions of several scanpy functions for downstream analysis that maintained data at a user-specified precision; we then found the smallest data precision that would accommodate our expression data (32-bit) and used it as input to those functions.
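The trade-off inspected in this step, namely how many cell type marker genes are retained as more HVGs are kept, can be sketched with a short helper. The function name and toy gene symbols are hypothetical; the input is assumed to be a list of genes ordered by increasing HVG rank.

```python
from bisect import bisect_right


def marker_fraction_curve(ranked_genes, marker_genes, steps):
    """For each n in `steps`, return (n, fraction of marker genes in top n)."""
    markers = set(marker_genes)
    rank_of = {gene: rank for rank, gene in enumerate(ranked_genes, start=1)}
    # Sorted ranks of the marker genes that are present in the dataset.
    marker_ranks = sorted(rank_of[g] for g in markers if g in rank_of)
    curve = []
    for n in steps:
        n_retained = bisect_right(marker_ranks, n)
        curve.append((n, n_retained / len(markers)))
    return curve


ranked = ["g1", "g2", "g3", "g4", "g5"]
markers = ["g2", "g5", "gX"]  # gX is absent from the dataset
curve = marker_fraction_curve(ranked, markers, steps=[1, 2, 5])
```

Plotting such a curve against the number of HVGs is one way to locate the elbow beyond which further HVGs add marker genes at a diminishing rate.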

We performed PCA on HVG-selected expression data (scanpy.tl.pca) and retained the first 25 components (Supplementary Fig. 1f) for batch correction and also UMAP (Uniform Manifold Approximation and Projection)145,146 generation, clustering, and other downstream analysis. We preserved PCA-reduced, batch-corrected expression data for downstream analysis by projecting it back to the barcodes-by-genes basis from the truncated PCA basis (details below). To facilitate this last step, we stored the means and standard deviations used to scale data prior to PCA as well as PCA loadings using a data-type-modified version of scanpy.pp.scale, with zero centering and no expression count capping (to recover the original counts data in the limit of no PCA truncation or batch correction).

We ran batch correction using Harmony147 (snRNA-seq library ID batch variable, 20 maximum iterations); Harmony converged in 3 iterations. We confirmed that batch correction results were reasonable by comparing pre- vs. post-Harmonization 2D UMAP localizations for nuclei that were from the same patient but processed in different libraries (Supplementary Data 9).

Batch-corrected data in the PCA basis was used to make a UMAP-based neighborhood connectivity graph using scanpy.pp.neighbors (20 neighbors) for local manifold approximation; we then generated 2D UMAPs using scanpy.tl.umap and initial Leiden clusters for cell typing using scanpy.tl.leiden (resolution parameter 0.5).

We projected the batch-corrected, scaled expression data back to the gene basis from the PCA basis using the 25 stored PCA loadings, then made the resulting scaled expression data count-like, ensuring non-negative counts, then rescaled it using stored means and standard deviations (Supplementary Fig. 5). Batch-corrected distributions sometimes had non-negligibly negative values (indicative of barcodes with minimal, effectively zero, HVG counts relative to others). Most of these values were between −1 and 0 (Supplementary Fig. 5b), and any adjusted count values <−1 occupied <1% of any given HVG’s corrected distribution (Supplementary Fig. 5c). We thus set these negative values to zero; our approach for doing so was to zero the lowest 1% of each HVG’s count distribution. This approach had no effect on existing non-negative count values because a substantial percentage of each HVG’s distribution was zero counts. We then shifted each distribution so its minimum matched its uncorrected counterpart and would produce non-negative values upon rescaling. Finally, we rescaled the values, set any remaining negative values to zero, and converted to integer-type.
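A simplified NumPy sketch of the scale, truncate, and back-project sequence is given below. It is not our modified scanpy code: it omits batch correction, the zeroing of the lowest 1% of each HVG's distribution, and the minimum-matching shift, and simply clips negatives after rescaling. Function names are hypothetical. It does illustrate the property noted above, that without truncation the original counts are recovered.

```python
import numpy as np


def scale(X):
    """Zero-center and unit-scale each gene; return stored means and stds."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (X - mean) / std, mean, std


def truncated_pca(Z, n_comps):
    """PCA of zero-centered data via SVD; returns scores and loadings."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_comps]      # (n_comps, n_genes)
    scores = Z @ loadings.T      # (n_nuclei, n_comps)
    return scores, loadings


def back_project(scores, loadings, mean, std):
    """Return count-like data in the barcodes-by-genes basis."""
    Z_hat = scores @ loadings            # back to the scaled gene basis
    X_hat = Z_hat * std + mean           # undo scaling with stored values
    X_hat = np.clip(X_hat, 0.0, None)    # counts cannot be negative
    return np.rint(X_hat).astype(np.int64)


rng = np.random.default_rng(0)
X = rng.poisson(5, size=(6, 4))
Z, mean, std = scale(X)

scores_full, loadings_full = truncated_pca(Z, n_comps=4)   # no truncation
recovered = back_project(scores_full, loadings_full, mean, std)

scores_2, loadings_2 = truncated_pca(Z, n_comps=2)         # truncated basis
denoised = back_project(scores_2, loadings_2, mean, std)
```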

Likelihood-based cell typing

Because we observed indications of abnormal transcription, we wanted to avoid assumptions about cell types that were present in our data or the most appropriate marker genes to use for cell typing. We thus developed an approach for cell typing based on the principle employed in ScType, a computational cell typing tool148. ScType estimates cell types using a large, multi-tissue, multi-organism database of cell-type-specific positive and negative marker genes that are expressed in or suppressed in a cell type, respectively. For a group of unidentified nuclei or cells and specified tissue(s), ScType computes a “cell type score” for each nucleus and possible cell type, which is a weighted sum of the nucleus’s standardized expression levels across the possible cell type’s positive and negative marker genes. Weights each reflect a “marker gene specificity score”, reducing the impact of less-specific marker genes, and have negative sign for negative marker genes to penalize the score when these are expressed; ScType assigns the cell type with the maximum score to each nucleus.

Our approach, like ScType’s, compares gene expression levels against a cell type marker gene database, but in other respects it differs considerably. We perform cell typing on nuclear clusters rather than individual nuclei, with the reasoning that clusters, by definition, group nuclei with similar transcriptional profiles, and that considering them may highlight genes with characteristic expression levels that may be more helpful for cell typing. We identify such “distinguishing HVGs” for each cluster as the HVGs that are differentially overexpressed vs. all other clusters combined, using scanpy.tl.rank_genes_groups with k-fold, leave-one-out cross-fold validation (k = 10) on scaled (scanpy.pp.normalize_total), log1p-transformed (scanpy.pp.log1p) data (Wilcoxon-rank-sum comparison, Benjamini–Hochberg p-value correction). We considered genes that were overexpressed across all k = 10 calculations with l2fc > 1.0 and adjusted p < 0.01 to be distinguishing HVGs, and we used these (as opposed to all HVGs) for comparison of the nuclear cluster against our cell type marker gene database.
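The selection rule for distinguishing HVGs can be sketched as an intersection across cross-validation folds. The sketch assumes that the per-fold differential expression results have already been computed and are available as dictionaries of gene to (l2fc, adjusted p); this data layout and the function name are hypothetical.

```python
def distinguishing_hvgs(fold_results, min_l2fc=1.0, max_padj=0.01):
    """Return {gene: mean l2fc} for genes passing thresholds in every fold.

    fold_results : list with one dict per fold, gene -> (l2fc, adjusted_p)
    """
    passing = None
    for result in fold_results:
        hits = {
            gene
            for gene, (l2fc, padj) in result.items()
            if l2fc > min_l2fc and padj < max_padj
        }
        # A gene must pass in all folds, so intersect fold by fold.
        passing = hits if passing is None else passing & hits
    passing = passing or set()
    n_folds = len(fold_results)
    return {
        gene: sum(result[gene][0] for result in fold_results) / n_folds
        for gene in passing
    }


folds = [
    {"TH": (2.0, 0.001), "GAD1": (1.5, 0.001)},
    {"TH": (3.0, 0.0001), "GAD1": (0.5, 0.001)},  # GAD1 fails l2fc here
]
hvgs = distinguishing_hvgs(folds)
```

The gene symbols in the toy input are illustrative only.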

The cell type database we use in this study is a modified subset of the ScType database148 that contains marker genes for relevant human tissues (brain, immune tissue, and smooth muscle tissue, which has cell types resembling those in the blood-brain barrier). Some ScType cell types appeared in multiple tissues, so we removed any redundancies in these, either merging types with identical marker gene lists or assigning tissue-specific names to cell types with differing lists (e.g., renaming “Endothelial Cell” to “Brain Endothelial Cells”). We then added marker genes from recent postmortem human SN snRNA-seq studies34,149,150 to the database. We also retained cell types not expected for SN (e.g., glutamatergic neurons, fibroblasts, cancer cells) to detect potential sample issues and to better characterize unexpected phenotypes. Finally, although we pulled positive and negative marker genes from ScType and have included an option to use them in our code, we only used positive marker genes in this study, to avoid assuming that negative markers would be absent for cells in non-normative states. The final database that we used in our analysis here has 61 cell types, with median (IQR) 14 (6, 27) positive marker genes per cell type, and 515 positive marker genes total (Supplementary Data 10).

In our approach, we use likelihoods to assess whether a cluster is of a particular cell type. Explicitly, for a cluster n of nuclei, and cell type c, we define the likelihood LK as the joint probability that each of c’s markers, \({{{\rm{mg}}}}_{c}^{i}\) (where i is an identifying index running from 1 to the total number of marker genes M) is a true, defining marker gene for n (meaning that it is in the set of n’s distinguishing HVGs, which we denote as \({{{\rm{mg}}}}_{n}\)):

$${{{\rm{mg}}}}_{c}^{i}\in \left\{{{{\rm{mg}}}}_{n}^{j}\right\}$$
(1)

(where j is another identifying index). We assume that the probability of \({{{\rm{mg}}}}_{c}^{i}\) being a marker for n is a monotonically increasing function of its overexpression level in n, which we denote as \({o}_{c,i}^{n}\). By this, we mean that if \({{{\rm{mg}}}}_{c}^{i}\) was not overexpressed in n, its probability of being a marker for n is ~0, and if it is overexpressed, its probability of being a marker increases with increasing \({o}_{c,i}^{n}\), saturating at ~1 above some overexpression level:

$${{\rm{Prob}}}({{{\rm{mg}}}}_{c}^{i}\in \left\{{{{\rm{mg}}}}_{n}^{j}\right\}) \sim {{\rm{f}}}({o}_{c,i}^{n}),\ {{\rm{f}}}\; {{\rm{increasing}}}\; {{\rm{with}}} \ {o}_{c,i}^{n}$$
(2)

We make an approximation of linear dependence on \({o}_{c,i}^{n}\),

$${{\rm{Prob}}}({{{\rm{mg}}}}_{c}^{i}\in \left\{{{{\rm{mg}}}}_{n}^{j}\right\}) \sim {{\rm{f}}}({o}_{c,i}^{n})\, \sim \,{o}_{c,i}^{n}$$
(3)

and we make the simplifying assumption that the probability functions for \(\left\{{{{\rm{mg}}}}_{c}^{i}\right\}\) can reasonably be treated as independent. The likelihood then takes the form

$${{\rm{LK}}}\left(n,c\right)={\prod }_{i=1}^{M}{o}_{c,i}^{n}$$
(4)

In our calculations, we use the log-likelihood LLK:

$${{\rm{LLK}}}\left(n,c\right)={\sum }_{i=1}^{M}\log ({o}_{c,i}^{n})$$
(5)

If negative marker genes were to be used (they are not in this study), the log-likelihood becomes

$${{\rm{LLK}}}\left(n,c\right)={\sum }_{i=1}^{{M}_{{{\rm{pos}}}}}\log ({o}_{c,i}^{n})-{\sum }_{j=1}^{{M}_{{{\rm{neg}}}}}\log ({o}_{c,j}^{n})$$
(6)

We do not apply cell-type-specificity or other weighting to marker gene likelihood terms, to avoid inadvertently making biased assumptions. For example, by not applying weights, we avoid inappropriately upweighting marker genes that appear for fewer database cell types as a result of missing data rather than biological specificity. We also avoid discounting less-specific marker genes that may nevertheless be informative for cell typing.

We construct the log-likelihood in Eq. (5) by setting \(\log ({o}_{c,i}^{n})\), the log overexpression for marker gene i of cell type c, to either the mean (across k calculations) l2fc expression of that gene in cluster n if the gene was a distinguishing HVG (and thus significantly overexpressed), or 0 if not (overexpression level \({o}_{c,i}^{n}\) = 1). Using this measure, we observe a substantial drop-off in log-likelihood as the overlap between a cluster’s distinguishing HVGs and a cell type’s marker genes decreases (Supplementary Fig. 6), leaving only a few putative cell types. We used the highest-likelihood cell types and the contributing database marker genes for each to assign a cell type to each nuclear cluster, and we used this approach in multiple rounds to arrive at final cell types. First, we ran cell typing on our original Leiden clusters, then combined contiguous clusters with the same cell type, then performed subclustering on any of the resulting clusters that had visible substructure (8 total), again using a Leiden algorithm but with resolution 0.1 on each cluster in isolation. Subclustering had the effect of splitting nuclei groups with different cell types that were originally merged, including microglia and macrophages, and GABA and DA neurons. Finally, we performed cell typing again on these re-clustered nuclei, generating the cell types in Fig. 1. We found our cell types to be in agreement with previously published human SN snRNA-seq data34 (Supplementary Fig. 3).
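A minimal sketch of the log-likelihood construction in Eqs. (5) and (6) follows. It assumes that a cluster's distinguishing HVGs are available as a dictionary of gene to mean l2fc; the function names, toy marker database, and toy values are hypothetical and are not entries from our database. Because the terms are l2fc values, the log-likelihood here is in base 2.

```python
def log_likelihood(distinguishing, pos_markers, neg_markers=()):
    """LLK(n, c) for one cluster and one cell type.

    distinguishing : dict gene -> mean l2fc for the cluster's distinguishing
                     HVGs; genes absent from it contribute log(1) = 0
    pos_markers    : positive marker genes of the cell type (Eq. 5)
    neg_markers    : optional negative marker genes (Eq. 6); unused in
                     this study
    """
    positive = sum(distinguishing.get(gene, 0.0) for gene in pos_markers)
    negative = sum(distinguishing.get(gene, 0.0) for gene in neg_markers)
    return positive - negative


def rank_cell_types(distinguishing, marker_db):
    """Rank candidate cell types by decreasing log-likelihood."""
    scores = {
        cell_type: log_likelihood(distinguishing, markers)
        for cell_type, markers in marker_db.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


cluster_hvgs = {"TH": 2.5, "SLC6A3": 3.0, "P2RY12": 1.2}
toy_db = {
    "DA neuron": ["TH", "SLC6A3", "SLC18A2"],
    "Microglia": ["P2RY12", "CX3CR1"],
}
ranking = rank_cell_types(cluster_hvgs, toy_db)
```

No specificity weights are applied, so each overlapping marker gene contributes its log overexpression directly.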

DEA

We performed Wald-formulation DEAs on each cell type separately with k-fold cross-validation (k = 3 set by power analysis; below), using DESeq243 ported into Python with the package diffexpr (https://github.com/wckdouglas/diffexpr/tree/master). For SUD+ vs. SUD− comparisons, the generalized linear model (GLM) we used as DESeq2 input was

$${{\rm{gene}}}\; {{\rm{expression}}}\; {{\rm{level}}} \sim {{\rm{HIV}}}\; {{\rm{level}}},\,{{{\rm{SUD}}}}_{{{\rm{opc}}}}\; {{\rm{donor}}}\; {{\rm{group}}}+{{{\rm{SUD}}}}_{{{\rm{other}}}} \\ +{{\rm{age}}}+{{\rm{sex}}}+{{\rm{race}}}+{{\rm{ethnicity}}}+{{\rm{overdose}}}\; {{\rm{death}}}\; {{\rm{status}}} \\ +{{\rm{PMI}}} +{{\rm{tissue}}}\; {{\rm{location}}}+{{\rm{CD}}}4 \ {{\rm{nadir}}}$$
(7)

For HIV comparisons (HIV+u vs. HIV−; HIV+d vs. HIV−; HIV+d vs. HIV+u), our GLM was

$${{\rm{gene}}}\; {{\rm{expression}}}\; {{\rm{level}}} \sim \,({{\rm{HIV}}}\; {{\rm{level}}},\,{{{\rm{SUD}}}}_{{{\rm{opc}}}})\; {{\rm{donor}}}\; {{\rm{group}}} \\ +\,{{{\rm{SUD}}}}_{{{\rm{other}}}} +{{\rm{age}}}+{{\rm{sex}}}+{{\rm{race}}}+{{\rm{ethnicity}}} \\ +{{\rm{overdose}}}\; {{\rm{death}}}\; {{\rm{status}}} +{{\rm{PMI}}} +{{\rm{tissue}}}\; {{\rm{location}}}$$
(8)

Covariates are defined as follows:

HIV level: Categorical. Final plasma HIV load in copies RNA per mL. (Options are “HIV−”, “HIV+u” for <50 copies per mL, “HIV+d” for ≥50 copies per mL)

SUDopc: Categorical. Presence of opioid and/or cocaine SUD diagnosis. (“yes”, “no”)

SUDother: Categorical. Presence of SUD diagnosis for substance other than opioids/cocaine. (“yes”, “no”)

Age: Numeric. Donor age in years.

Sex: Categorical. Donor sex assigned at birth. (“female”, “male”)

Race: Categorical. Donor race described in clinical records. (“Black or African American”, “White”, “Asian”, “Black and White”, “Black and Indian”, “Unknown or Not Reported”)

Ethnicity: Categorical. Donor ethnicity described in clinical records. (“Hispanic or Latino”, “Not Hispanic or Latino”)

Overdose Death Status: Categorical. Presence of overdose as cause of death. (“yes”, “no”)

Postmortem Interval (PMI): Numeric. Donor postmortem interval in hours.

Tissue Location: Categorical. Approximate anatomical location of SN sample. (“coronal” for rostral midbrain, “midbrain” for caudal midbrain)

CD4 Nadir: Categorical. Lowest donor CD4 T cell count. (“<200 cells per µL”, “[200,500] cells per µL”, “>500 cells per µL”). HIV− donors are assigned “>500 cells per µL”.

Prior to running DESeq2, numerical covariates were standardized, and categorical covariates that covaried perfectly were identified, and all but one were removed, to avoid ambiguity in fitting. None of the DEAs in this study required us to drop covariates.
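The covariate preparation described above can be illustrated with a minimal sketch. It assumes that "standardized" means z-scoring (mean 0, unit standard deviation) and that two categorical covariates "covary perfectly" when their levels map one-to-one across donors; both readings, the function names, and the toy donor values are assumptions made for illustration and are not taken from the study code.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center a numeric covariate to mean 0 and scale to unit (population) SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("constant covariate cannot be standardized")
    return [(v - mu) / sd for v in values]

def perfectly_covarying(a, b):
    """True if two categorical covariates map one-to-one across donors."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))

# Hypothetical donor-level covariates, for illustration only.
pmi_hours = [12.0, 20.0, 8.0, 16.0]
overdose = ["yes", "no", "yes", "no"]
location = ["coronal", "midbrain", "coronal", "midbrain"]
sex = ["female", "male", "male", "female"]

pmi_z = standardize(pmi_hours)
redundant = perfectly_covarying(overdose, location)   # one-to-one in this toy data
independent = perfectly_covarying(overdose, sex)      # not one-to-one
```

In such a check, only one covariate of each redundant pair would be kept in the design, since the model cannot attribute an effect to one of them rather than the other.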

DESeq2 implements multiple controls to remove false positives43, and we used k-fold cross-validation as an additional layer of false positive control for each DEA. In k-fold cross-validation, random subsetting of data is expected to randomly alter low gene dispersions. Low dispersions can exist for genes with prominent within-donor correlations (e.g., if there are many similar nuclei per donor relative to the rest of the DEA data) and/or low counts, and they can inflate the apparent effect size of expression changes, leading to false positives. Because dispersions are altered at random across the k cross-validation computations, the degree to which such genes appear as false positives varies from subset to subset. We retained only DEGs that were detected in all k random subsets (below), which helps to remove these impacts. DESeq2 can also be run on single-nucleus data rather than pseudobulked data (for which nuclei are aggregated per cell type and donor); the single-nucleus approach can help to highlight DEGs that may be pertinent to subsets of a cell type if the nuclei exhibit biological variability42, and for this reason we ran DESeq2 on our single-nucleus data without pseudobulking. DESeq2 authors have made recommendations for run parameters that increase the number of true positive DEGs returned when running on single-nucleus data by handling zero inflation characteristic of single-cell approaches42; we confirmed that running with those recommendations recovers our DEGs and returns additional ones. Importantly, our procedure is more restrictive in terms of DEGs returned but still benefits from false positive controls and does not produce DEGs not seen when run with DESeq2-author-recommended parameters. One comparison is shown in Table 2 for DEA results using our parameters (Wald test, useT = False, default sizeFactors, and pseudocount addition) vs. zero-inflation recommendations (likelihood ratio test, useT = True, minmu = 1e-6, sizeFactors from scran::computeSumFactors).

Table 2. Comparisons of DESeq2 results using current study parameters vs. parameters recommended for increasing true positives through zero-inflation handling, for selected differential expression analyses (DEAs)

To perform DEA, we formed k data subsets from a cell type’s nuclei, then fit the DESeq2 model to each subset (using the GLM in Eq. (7) when assessing SUD impacts or Eq. (8) when assessing HIV impacts; py_DESeq2 followed by run_deseq). We then pulled the results for each donor group comparison of interest (e.g., SUD+HIV− vs. SUD−HIV− donor groups) using dds.deseq_result. With this approach, DEA model fitting for each of the k subsets was based on nuclei from ~90 donors across all donor groups, such that fits of gene dispersion benefitted from a large sample size and higher nuclear variability. We identified DEGs as those genes with adjusted p-values < 0.05 in all k calculations. For significant DEGs, we used the cross-fold means of l2fc and normalized counts in Figs. 2–5 and for gene set enrichment analysis (GSEA).
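The all-folds DEG criterion can be sketched as follows. The per-fold results are represented here as plain dictionaries with made-up gene names and values; this is a hypothetical stand-in for the DESeq2 result tables, not the diffexpr output format.

```python
from statistics import mean

def consensus_degs(fold_results, alpha=0.05):
    """Keep genes with adjusted p < alpha in every fold; average their l2fc.

    fold_results: list (one entry per fold) of dicts mapping
    gene -> (l2fc, adjusted p-value).
    """
    shared = set(fold_results[0])
    for res in fold_results[1:]:
        shared &= set(res)
    out = {}
    for gene in sorted(shared):
        if all(res[gene][1] < alpha for res in fold_results):
            out[gene] = mean(res[gene][0] for res in fold_results)
    return out

# Hypothetical per-fold results for k = 3 (values are made up).
folds = [
    {"GENE_A": (1.2, 0.001), "GENE_B": (0.8, 0.030), "GENE_C": (-1.0, 0.010)},
    {"GENE_A": (1.0, 0.004), "GENE_B": (0.7, 0.070), "GENE_C": (-1.2, 0.020)},
    {"GENE_A": (1.1, 0.002), "GENE_B": (0.9, 0.020), "GENE_C": (-1.1, 0.040)},
]
degs = consensus_degs(folds)
```

In this toy input, GENE_B is dropped because it misses the threshold in one fold, which is the behavior the intersection across folds is meant to enforce.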

We included an extra term in our SUD GLM (Eq. (7)) for CD4 T cell nadir, that was not in our HIV GLM (Eq. (8)). This is because, although we split HIV groups by the clinically relevant measure of final VL, we noted that other lifetime characteristics of HIV infection could impact transcription, most notably long-term damage incurred during periods of low T-cell counts. We thus added CD4 nadir to Eq. (7) to account for longer-term HIV impacts, and to aid detection of SUD impacts. We found that CD4 nadir tracked closely with final HIV VL for our HIV+ donors, accounting for a small degree of HIV-related variation. For HIV comparisons, we excluded CD4 nadir because it closely recapitulated HIV VL variation.

Gene set enrichment analysis (GSEA)

We performed GSEA with Enrichr44 using GSEApy, a Python wrapper for R-based GSEA tools151. For each DEA, we ran Enrichr separately for up- and downregulated DEGs (gseapy.enrichr), using our 20,000 HVGs as background and the human annotated gene set reference GO_Biological_Process_2023 (refs. 152,153). We saved all gene sets returned regardless of significance, but considered gene sets with adjusted p < 0.05 to be significantly enriched (Supplementary Data 6). We also tested the tool GSEA154,155 in GSEApy, and found that it identified similar but generally fewer significant gene sets for DEGs (Supplementary Data 11 and 12).

Identification of highly coordinated DEG subclusters

To find DEG subclusters for a particular DEA and set of cell types, we first pulled the normalized count data (scanpy.pp.normalize_total; as opposed to l2fc or any other DEA result) for all nuclei of each cell type of interest, across all of that DEA’s case group donors. We then computed, for each cell type, each DEG’s mean normalized expression count (across all of that cell type’s nuclei) per case donor. This step produced a matrix with one cell-type-specific DEG per row and one case donor per column, with each entry giving the mean normalized expression of that DEG in that donor. To distinguish DEGs that arose for multiple cell types, and to keep track of the direction of dysregulation, we named DEGs using the convention “cell-type_DEG-direction_gene-name”. For all DEGs across all cell types, we then computed cross-donor Pearson correlations in mean expression for DEG pairs using scipy.stats.pearsonr, using the built-in beta-assumption hypothesis test to determine significance (two-tailed alternative) with significance level 0.05. Across all our computations (Supplementary Table 3), a median (IQR) of 30% (28%, 34%) of DEG pairs had significant correlations.

We computed the similarity for each DEG pair to be the correlation magnitude if it was significant or 0 if not significant; for DEGs i, j

$${{{\rm{similarity}}}}_{i,j}={{{\rm{sim}}}}_{i,j}={{\rm{abs}}}({r}_{i,j}),\, \ {p}_{i,j} < 0.05; \ 0 \ {{\rm{otherwise}}}$$
(9)

To use hierarchical agglomerative clustering to group DEGs, we transformed similarities to distances using

$${{{\rm{dist}}}}_{i,j}=\,\frac{1}{(1\,+\,{{{\rm{sim}}}}_{i,j})}$$
(10)

adding 1 to the similarity to avoid division by zero. We used Eq. (10) instead of \({{{\rm{dist}}}}_{i,j}=1-{{{\rm{sim}}}}_{i,j}\) because its nonlinearity required DEG clusters to have more pronounced similarity to be detected: it inflates distances between higher-similarity DEG pairs and thus compresses the dynamic range of distances used to parse clusters (distances span [0.5, 1] rather than [0, 1]). We performed clustering using sklearn.cluster.AgglomerativeClustering (average linkage distance) with distance_threshold = 0 and n_clusters = None to compute the entire linkage tree and return algorithm-generated DEG clusters. The resulting DEG clusters often contained noticeable subclusters, with inter-subcluster distances substantially larger than intra-subcluster distances. We thus split them at points where the distance between adjacent DEGs was ≥3 standard deviations above the cluster’s mean adjacent DEG distance. To keep track of original cluster identity, we named DEG subclusters using the convention “C[cluster-index]S[subcluster-index]” (e.g., C17S2 for cluster 17, subcluster 2). We characterized subcluster coordination strength by computing the average similarity across all DEG pairs in a subcluster. Some subclusters had exceptionally low average similarities (~0.2 vs. ~0.8; Fig. 6a, right), indicating weaker overall relationships with other DEGs. To focus on annotating the most highly coordinated DEG groups, we removed these low-similarity subclusters by excluding anything in the lower tail of the distribution of average similarities across all DEG subclusters (<0.75; Fig. 6a, right; see also Supplementary Data 13).
We verified that DEGs in each subcluster showed coordinated mean expression across case groups by plotting mean expression per DEG, per donor. For SUD+ groups, we further verified that coordination did not depend on the type of SUD diagnosis (opioid and/or cocaine): we performed agglomerative clustering of donors by mean expression across subcluster DEGs, then used a Pearson chi-squared test to determine whether any outlier donors had SUD types statistically different from inliers. They did not (p-values 0.248–0.598; Supplementary Data 14).
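As a worked illustration of Eqs. (9) and (10) and of the 3-standard-deviation splitting rule, the sketch below takes precomputed correlation coefficients, p-values, and adjacent-DEG distances as inputs. The function names are hypothetical, and two choices are assumptions: "adjacent" is taken to mean neighboring DEGs in the cluster's ordering, and the mean and sample standard deviation are computed over all adjacent distances in the cluster, including the large gaps themselves.

```python
from statistics import mean, stdev

def similarity(r, p, alpha=0.05):
    """Eq. (9): |r| if the correlation is significant, else 0."""
    return abs(r) if p < alpha else 0.0

def distance(sim):
    """Eq. (10): maps sim in [0, 1] to dist in [0.5, 1]."""
    return 1.0 / (1.0 + sim)

def split_subclusters(degs, adjacent_dists, n_sd=3.0):
    """Split an ordered DEG cluster where an adjacent-DEG distance is
    >= n_sd standard deviations above the cluster's mean adjacent distance.

    adjacent_dists[i] is the distance between degs[i] and degs[i + 1].
    """
    if len(adjacent_dists) < 2 or stdev(adjacent_dists) == 0:
        return [list(degs)]                 # nothing to split on
    cutoff = mean(adjacent_dists) + n_sd * stdev(adjacent_dists)
    groups, current = [], [degs[0]]
    for deg, d in zip(degs[1:], adjacent_dists):
        if d >= cutoff:                     # large gap: close the current subcluster
            groups.append(current)
            current = []
        current.append(deg)
    groups.append(current)
    return groups

# Hypothetical ordered cluster of 22 DEGs with one large gap in the middle.
names = [f"DEG{i}" for i in range(22)]
gaps = [0.50, 0.51] * 5 + [0.95] + [0.50, 0.51] * 5
subclusters = split_subclusters(names, gaps)
```

A perfectly correlated pair (sim = 1) sits at distance 0.5 rather than 0, which is the compression of dynamic range noted above.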

DEG subcluster annotations

We produced a skeleton annotation.csv file for each DEG subcluster using GO_Biological_Process_2023 annotations and the acyclic graph of their relationships (go-basic.obo). For each DEG, we found all related GO terms and filtered them to include only the most subordinate annotation represented on each graph branch (most specific process). We split remaining terms into those explicitly mentioning positive regulation of a process, explicit negative regulation, explicit response, and others, as an initial basis for annotation (Supplementary Data 7). We used DEG naming convention “cell-type_DEG-direction_gene-name” to facilitate interpretation.

For focused DEG subcluster annotations, two experts (M.J., A.W.) independently searched different gene databases for each DEG (M.J., GeneCards; A.W., National Center for Biotechnology Information gene database) and pulled the functional summary. They then performed a literature search for the DEG using terms related to HIV, SUD, brain location (“substantia nigra”, “ventral midbrain”), and cell type (e.g., “gene x in neurons”, “gene x in dopaminergic neurons”, “gene x in microglia”). Relevant references were collected and used to annotate potentially relevant functions for each gene; these results were then used to generate a consensus across annotators that was used in each DEG subcluster (Supplementary Data 8).

Dual FISH/IF validation imaging

For validation imaging, we selected 3 case donors to examine the correlation between DA neuronal SLC8A1 and microglial REL transcripts in the context of SUD with controlled HIV (Fig. 7c, d; DEG subcluster C16S2). We sectioned formalin-fixed, paraffin-embedded (FFPE) blocks of ventral midbrain tissue containing SN at 5 μm thickness and mounted sections onto Superfrost Plus slides (Catalog [Cat.] # 22-037-246, Thermo Fisher Scientific, Waltham, MA). For each donor, we processed two consecutive sections separately for microglial and DA neuronal targets, using the RNAscope 2.5 HD-RED kit for FISH (Cat. # 322350, Advanced Cell Diagnostics, Newark, CA) and following the manufacturer’s protocol for ISH with subsequent immunohistochemistry (IHC) staining. We used RNAscope FISH probes for the targets REL (microglial slide, Cat. # 888141) and SLC8A1 (neuronal slide, Cat. # 829101), and probes for PPIB (Cat. # 313901) and dapB (Cat. # 310043) as positive and negative controls, respectively. For probe detection, we used Warp Red chromogen (Cat. # WR806, Biocare Medical, Pacheco, CA); upon application, we incubated slides for 10 min at room temperature prior to washing. For IHC-IF, we used primary antibodies to detect the microglial marker IBA1 (microglial slide; Goat Anti-Iba1, 1:100 dilution, Cat. # ab5076, Abcam, Waltham, MA) and the DA neuronal marker TH (neuronal slide; Rabbit Anti-TH, 1:100 dilution, Cat. # AB152, Millipore Sigma, Burlington, MA); we used secondary antibodies Donkey Anti-Goat Alexa 647 (1:300 dilution, Cat. # A32849, Invitrogen, Carlsbad, CA) and Donkey Anti-Rabbit Alexa 647 (1:300 dilution, Cat. # A31573, Invitrogen, Carlsbad, CA) for IBA1 and TH, respectively. We detected nuclei using Hoechst 33342 stain (Cat. # H3570, Invitrogen, Carlsbad, CA).

We acquired whole-slide fluorescence images of processed sections at 0.1625-µm-per-pixel resolution using a high-speed, high-resolution VS200FL Slide Scanner (Evident, Olympus Scientific Solutions Americas Corp., Waltham, MA), with wavelength filters specific to each fluorophore described above (Cy3 filter set for FISH probes [REL, SLC8A1]; Cy5 for IHC-IF [IBA1, TH]; DAPI filter set for the Hoechst 33342 nuclear stain). We imaged all slides at the same signal-maximizing exposure time per filter (100 ms for Cy3; 1000 ms for Cy5; 8 ms for DAPI; Supplementary Fig. 4a, b).

FISH/IF image analysis

We analyzed FISH/IF images using QuPath version 0.4.3 (ref. 156; https://QuPath.github.io/), first manually delineating each donor’s SN by identifying the band-like, TH-rich region on the TH-stained image. For neuronal (TH/SLC8A1) images, we used a high-pass intensity filter on the TH image channel to retain perikarya (and to remove lower-intensity, less-distinct dendritic or axonal processes), followed by a high-pass area filter (≥100 μm²) to confirm neuronal cell bodies as the target ROIs. We used QuPath pixel classifiers and Groovy scripts (Supplementary Software 1) to select and index the resulting ROIs and measure the total area of SLC8A1 within each. For microglial (IBA1/REL) images, we identified all IBA1-labeled processes within the SN as ROIs and measured the total area of REL overlapping each as the signal. Finally, to remove potential artifacts due to variability in detected ROI sizes, we normalized each ROI’s transcript signal by its TH- or IBA1-labeled area.

FISH/IF REL vs. SLC8A1 signal comparisons

We performed linear regression on REL vs. SLC8A1 signals across case donors, followed by a permutation test of significance on each resulting slope, to ascertain whether the two transcripts had a significant, positive covariation. After removing zeros from REL FISH/IF data, we performed regressions using metrics describing the lower bulk of donors’ REL distributions (10th, 15th, 20th, and 25th percentiles) against the means of SLC8A1 (Supplementary Fig. 4c–k).

In our permutation test for each slope, for 30,000 iterations, we simulated a situation of no cross-transcript relationship by shuffling REL signals across donors, then performed a regression on the resulting distributions. We computed a p-value for the observed slope to be the fraction of all randomized slopes with more extreme values and applied a significance threshold of 0.05.
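A minimal sketch of this permutation test is given below. It assumes an ordinary least-squares slope and counts shuffled slopes greater than or equal to the observed one (a one-sided reading of "more extreme", chosen because the question is about positive covariation). The function names and the eight-donor toy values are hypothetical and are not study data.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p(x, y, n_iter=30_000, seed=0):
    """Fraction of shuffled-y slopes at least as large as the observed slope."""
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)               # break any x-y relationship
        if slope(x, shuffled) >= observed:
            hits += 1
    return observed, hits / n_iter

# Hypothetical per-donor signals (x: mean SLC8A1; y: a REL percentile).
slc8a1 = [0.10, 0.15, 0.22, 0.30, 0.34, 0.41, 0.47, 0.55]
rel = [0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.11, 0.12]
obs_slope, p_value = permutation_p(slc8a1, rel, n_iter=2000)
```

Note that the smallest p-value such a test can resolve is limited both by the number of iterations and by the number of distinct permutations, which is small when only a few donors are available.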

snRNA-seq sample size selection

We used a power analysis simulation to estimate the number of nuclei per donor needed to power detection of cell-type-specific expression differences. We simulated a case where expression data was compared across two groups for one gene in one cell type, and there was a small but true difference in the two distributions. We defined each group’s underlying expression distribution for the gene to have a lognormal distribution, i.e.,

$${{\rm{Prob}}}(x{{\rm{counts}}})\, \sim {{\rm{Lognormal}}}(\mu,{\sigma }^{2})\,={\mathrm{exp}}({{\rm{N}}}(\mu,{\sigma }^{2}))$$
(11)

We let both distributions have the same variance of 1 and means differing by some l2fc value. The difference between means μ1 and μ2 was related to the l2fc by

$${{\rm{abs}}}\left({\mu }_{1}-{\mu }_{2}\right)={\log }_{{{\rm{e}}}}({2}^{{{\rm{l}}}2{{\rm{fc}}}})$$
(12)

For each l2fc we tested (0.5, 1.0, 1.5), we defined the two underlying distribution means using Eq. (12), then over 10,000 iterations drew a sample of size s (nuclei) from each distribution and performed a Wilcoxon rank sum test of distinguishability. We computed the power as the fraction of iterations in which the true difference was detected (p < 0.05). We repeated this simulation starting with s = 10 and incrementing s by 2 until we reached a power ≥80%. We achieved sufficient detection power with s = 158 for l2fc 0.5; s = 42 for l2fc 1.0; and s = 20 for l2fc 1.5. Based on this simplified situation, we set a threshold of ~100 nuclei per cell type, aiming for detection of expression differences at l2fc = 1.0. Prior to analysis, we estimated there would be ~10 expected cell types in SN (DA and GABA neurons, microglia, astrocytes, ODCs, OPCs, multiple blood-brain-barrier cells, multiple immune cells149,157) and thus aimed to sequence 1000 nuclei per donor.
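The simulation loop can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the study code: the rank-sum p-value uses a normal approximation without tie correction, the first group's log-mean is set to 0 (the rank test depends only on the difference in Eq. (12)), and the function names are hypothetical. Because of these choices, the sample sizes it returns need not match the reported values exactly.

```python
import math
import random

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    r1 = sum(rank for rank, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def power(l2fc, s, n_iter=10_000, alpha=0.05, seed=0):
    """Fraction of simulations in which the rank-sum test detects the shift."""
    rng = random.Random(seed)
    mu1, mu2 = 0.0, l2fc * math.log(2)      # Eq. (12): |mu1 - mu2| = ln(2**l2fc)
    hits = 0
    for _ in range(n_iter):
        a = [rng.lognormvariate(mu1, 1.0) for _ in range(s)]
        b = [rng.lognormvariate(mu2, 1.0) for _ in range(s)]
        hits += rank_sum_p(a, b) < alpha
    return hits / n_iter

def min_nuclei(l2fc, target=0.80, start=10, step=2, **kw):
    """Smallest s (from `start`, in steps of `step`) reaching the target power."""
    s = start
    while power(l2fc, s, **kw) < target:
        s += step
    return s

s_needed = min_nuclei(1.5, n_iter=300)      # fewer iterations than the text, for speed
```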

Distinguishing HVG overexpression threshold and k selection

We used a similar power analysis to estimate the l2fc at which distinguishing HVG detection was powered for every cluster of nuclei. We set a significance threshold of p = 0.01 to restrict detected distinguishing HVGs to a higher-confidence subset, since these could impact cell typing and thus all downstream analysis. We let distributions represent expression of a gene for one cluster vs. the rest, choosing normal distributions with equal variance and differing means to reflect log1p pre-processing (applied for scanpy.tl.rank_genes_groups). The means and l2fc were related by

$${\mu }_{1}=1,\,{\mu }_{2}={2}^{{{\rm{l}}}2{{\rm{fc}}}}$$
(13)

At l2fc = 1.0, every cluster exceeded the minimum sample size for 80% detection power. We determined the number of folds k for cross-validation by increasing k from 2 in increments of 1 until every cluster with one of its k folds removed had enough nuclei to also power detection. Our code also had an option for the resulting k to be increased by 1, to put each one-fold-removed group of nuclei further over the power detection threshold during cross-fold validation calculations. We enabled this option, finding k = 9 and using k = 10.
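The search for k can be illustrated with a short sketch. It assumes folds of roughly equal size, so that holding out one fold leaves at least n − ⌈n/k⌉ nuclei in a cluster of n; the function name, the `extra` option standing in for the "increase k by 1" setting, and the example cluster sizes and threshold are hypothetical.

```python
import math

def choose_k(cluster_sizes, min_nuclei, k_start=2, k_max=50, extra=0):
    """Smallest k such that every cluster keeps >= min_nuclei nuclei
    after one of its k (roughly equal) folds is held out."""
    for k in range(k_start, k_max + 1):
        # The largest fold holds ceil(n / k) nuclei, so at least n - ceil(n / k) remain.
        if all(n - math.ceil(n / k) >= min_nuclei for n in cluster_sizes):
            return k + extra
    raise ValueError("no k <= k_max satisfies the power requirement")

# Hypothetical cluster sizes and minimum powered sample size.
sizes = [450, 1200, 300]
k_found = choose_k(sizes, min_nuclei=260)
k_used = choose_k(sizes, min_nuclei=260, extra=1)
```

Larger k removes a smaller fraction of each cluster per fold, which is why the search moves upward from k = 2 until the smallest cluster clears the threshold.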

DEA overexpression threshold and k selection

We used the same power simulation to estimate the values of l2fc and k that would power DEAs for each cell type at p = 0.05, arriving at k = 3 and l2fc = 1.0.

Statistical comparisons of donor group cell type proportions

We tested whether cell type proportions were differently represented across donor groups by first using the binomial distribution to model cell type proportions for each donor group. For a donor with k nuclei of a specified cell type and n nuclei total, the maximum likelihood estimation for the probability of that cell type’s occurrence, \(\hat{p}\), is

$$\hat{p}=\frac{k}{n}$$
(14)

Within each donor group, we computed for each donor the \(\hat{p}\) for each cell type, producing a distribution of each cell type’s occurrence probability across group donors. For each cell type, we performed Kolmogorov–Smirnov tests to compare these distributions for each pair of donor groups, using Benjamini–Hochberg multiple comparison correction (12 comparisons, one per cell type, per donor group pair). We considered donor group pairs and cell types with adjusted p < 0.05 to have significant differences (Supplementary Data 4).
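The per-donor estimate in Eq. (14) and the Benjamini–Hochberg adjustment can be sketched as follows. The Kolmogorov–Smirnov tests themselves are not reimplemented here; the p-values passed to the adjustment are hypothetical placeholders, as are the function names.

```python
def cell_type_fraction(k, n):
    """Eq. (14): maximum-likelihood estimate of a cell type's occurrence probability."""
    return k / n

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: one donor's microglia fraction, and four raw test p-values.
p_hat = cell_type_fraction(120, 1000)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [a < 0.05 for a in adj]
```

In this toy input only the smallest raw p-value remains below 0.05 after adjustment, which illustrates how the correction tightens the threshold as the number of comparisons grows.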

Significance test for DEG overlaps

We performed random sampling tests to determine whether fractions of overlapping DEGs observed for related DEAs (Figs. 2–5) were significantly different from expected outcomes of random sampling. For each DEA pair, we sampled the observed numbers of DEGs per DEA from the 20,000-HVG background, and we recorded the fraction of overlapping DEGs between the two simulated populations. We repeated sampling for 100,000 iterations and computed the upper limit on the p-value to be the fraction of iterations with at least as many overlapping DEGs as observed. In all tests we found 0 such iterations, corresponding to \(p < 10^{-5}\).
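A minimal sketch of this random sampling test follows. It compares overlap counts rather than fractions (equivalent when the two DEG set sizes are fixed) and reports 1/n_iter as the p-value bound when no iteration reaches the observed overlap; that convention, the function name, and the example DEG counts are assumptions for illustration.

```python
import random

def overlap_test(n_deg_a, n_deg_b, observed_overlap,
                 background=20_000, n_iter=100_000, seed=0):
    """Count iterations whose random overlap is >= the observed overlap.

    Returns (hits, p_upper), where p_upper = max(hits, 1) / n_iter is an
    upper limit on the p-value given the finite number of iterations.
    """
    rng = random.Random(seed)
    genes = range(background)               # background genes indexed 0..background-1
    hits = 0
    for _ in range(n_iter):
        a = set(rng.sample(genes, n_deg_a))
        b = rng.sample(genes, n_deg_b)
        overlap = sum(g in a for g in b)
        hits += overlap >= observed_overlap
    return hits, max(hits, 1) / n_iter

# Hypothetical DEG counts and overlap; fewer iterations than the text, for speed.
hits, p_upper = overlap_test(200, 300, observed_overlap=50, n_iter=2000)
```

With 200 and 300 genes drawn from 20,000, the expected chance overlap is 200 × 300 / 20,000 = 3 genes, so an observed overlap of 50 lies far outside the simulated range.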

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.