CoTF-reg reveals cooperative transcription factors in oligodendrocyte gene regulation using single-cell multi-omics

Choi, Jerome J.; Svaren, John; Wang, Daifeng

doi:10.1038/s42003-025-07570-6

Article
Open access
Published: 05 February 2025

Communications Biology volume 8, Article number: 181 (2025)

Abstract

Oligodendrocytes are the myelinating cells within the central nervous system, but the mechanisms by which transcription factors (TFs) cooperate for gene regulation in oligodendrocytes remain unclear. We introduce coTF-reg, an analytical framework that integrates scRNA-seq and scATAC-seq data to identify cooperative TFs co-regulating the target gene (TG). First, we identify co-binding TF pairs in the same oligodendrocyte-specific regulatory regions. Next, we train a deep learning model to predict each TG expression using the co-binding TFs’ expressions. Shapley interaction scores reveal high interactions between co-binding TF pairs, such as SOX10-TCF12. Validation using oligodendrocyte eQTLs and their eGenes that are regulated by these cooperative TFs show potential regulatory roles for genetic variants. Experimental validation using ChIP-seq data confirms some cooperative TF pairs, such as SOX10-OLIG2. Prediction performance of our models is evaluated through holdout data and additional datasets, and an ablation study is also conducted. The results demonstrate stable and consistent performance.

Introduction

Oligodendrocytes play key roles in central nervous system (CNS) function, including myelination^1,2. Myelination is a complex neurodevelopmental process that begins during brain development in the third trimester of pregnancy and increases steadily during childhood, but it can also be dynamically regulated in the context of learning and diseases affecting the mature CNS^3,4. Oligodendrocyte dysfunction and myelin abnormalities have also been reported in CNS disorders^2,5,6. Multidirectional interactions between neuronal and glial cells are required for CNS function⁷, including interactions between oligodendrocytes and neurons through myelination⁸. Therefore, it is critical to better understand the functions and roles of oligodendrocytes and myelin.

Gene expression during oligodendrocyte development from oligodendrocyte progenitor cells (OPCs) is governed by complex gene regulatory mechanisms involving transcription factors (TFs)^3,4. TFs often work in a combinatorial fashion to regulate gene expression from regulatory elements^9,10. For example, some TFs such as SOX10 and OLIG2 cooperate during the induction of genes for differentiation and myelin formation^11,12,13,14. Enhancers can increase transcription levels from promoters and transcription start sites (TSS), and much of the regulatory code that drives cell type-specific gene expression resides in these distal regulatory elements. In particular, some active enhancers are associated with the gene expression that characterizes cell identity and functions¹⁵. Thus, it is important to identify active oligodendrocyte-specific enhancers as well as promoters and the co-binding TFs that are responsible for their activity.

Next-generation sequencing technologies, including single-cell RNA sequencing (scRNA-seq) and the single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq), have provided important insights into cell-type-specific gene regulation. Recent functional genomic resources such as PsychENCODE2¹⁶ and GTEx¹⁷, together with emerging tools for integrating multi-omics data, enable the construction of cell-type-level gene regulatory networks (GRNs) that link TFs, their binding sites (TFBSs), and regulatory elements to target genes (TGs). Those networks can reveal the cell-type-specific regulatory roles of TFs via regulatory elements. Moreover, additional bioinformatic tools such as SCENIC+¹⁸, Signac¹⁹, and scGRNom²⁰ predict cell-type-specific gene regulatory networks to explain potential TF-TG relationships. However, most of these studies and tools focus on relationships between individual TFs and TGs instead of TF-TF interactions and their effects on TG expression. Consequently, due to the lack of tools, the mechanistic roles of cooperative TFs in establishing cell type-specific gene regulation remain uncharacterized.

To tackle these challenges, we introduce coTF-reg, an analytical framework that integrates scRNA-seq and scATAC-seq data to identify cooperative TFs co-regulating the TG. coTF-reg identifies cooperative co-binding TFs along with active regulatory elements for gene regulation as hallmarks of active oligodendrocyte-specific regulatory elements. First, it identifies co-binding TF pairs in these regulatory regions. Second, a deep learning model is trained to predict TG expression based on the expression profiles of co-binding TFs. Third, Shapley interaction scores are computed to evaluate the interactions between TF pairs. Our findings reveal high interactions between co-binding TF pairs, such as SOX10-TCF12. Validation using oligodendrocyte eQTLs and their eGenes that are regulated by these cooperative TFs showed potential regulatory roles for genetic variants. Experimental validation using ChIP-seq data confirmed some cooperative TF pairs, such as SOX10-OLIG2 and SOX10-NKX2.2. Prediction performance of our models was evaluated through holdout data and additional datasets, and an ablation study was also conducted. The results demonstrated stable and consistent performance. Overall, our results create an analytic framework in which co-binding TF pairs cooperatively activate the TG expression through oligodendrocyte-specific regulatory elements.

Results

Deep learning and single-cell multi-omics for identifying cooperative transcription factors in oligodendrocytes

To predict cooperative TFs involved in oligodendrocyte gene regulation, we designed coTF-reg, which integrates scRNA-seq and scATAC-seq data to identify the cooperative TFs that co-regulate target gene (TG) expression in oligodendrocytes (Fig. 1, Methods and Materials). Briefly, we first used scATAC-seq data with peak-to-gene links²¹. Second, among the regulatory regions for various cell types, we focused on those specific to oligodendrocytes and identified transcription factor binding sites (TFBSs) and co-binding TF pairs through motif co-occurrence and co-enrichment analyses. Third, we trained deep neural networks (DNNs) on gene expression from scRNA-seq data²² to predict the expression levels of the TGs, and computed Shapley interaction (SI)^23,24 scores to measure the interaction effects of co-binding TF pairs on TG expression and thereby identify cooperative TF pairs. Fourth, we built a gene regulatory network based on SI scores for co-binding TF pairs. Lastly, to independently validate the cooperative TF pairs, we mapped oligodendrocyte eQTLs onto the regulatory regions where cooperative TF pairs exist, performed Liftover analysis and co-enrichment analysis using ChIP-seq data, and applied Boolean rules to characterize the cooperativity of regulatory factors. To evaluate the prediction performance of our models, we used other publicly available datasets and conducted an ablation study by generating random TF sets to predict TG expression.

Identification of the co-binding transcription factors in oligodendrocyte-specific regulatory regions

First, we identified a set of 787 oligodendrocyte-specific, differentially accessible regulatory regions by comparing oligodendrocyte scATAC-seq data to those of other brain cell types. In this set, we inferred TFBSs for 958 motifs using the JASPAR database. Second, we used co-occurrence analysis and co-enrichment analysis to screen the 458,403 possible TF pairs (see ‘Methods and Materials: Co-enrichment analysis’ for more details). After removing TF pairs from the same families and applying a false discovery rate (FDR) cutoff of < 0.1, we obtained 8101 co-binding TF pairs. In total, 206 TFs have co-binding TFs and are linked to 445 oligodendrocyte-specific TGs (Supplementary Data) in 643 regulatory regions (Fig. 2a). We annotated the regulatory regions to categorize them into promoters (32.5%) and enhancers (67.5%) (Fig. 2b).
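The number of candidate pairs follows directly from the size of the motif set. The short check below is a sketch with illustrative variable names (not part of the coTF-reg code); it reproduces the count of unordered motif pairs from the motif numbers given in Methods and shows what fraction of them survive as co-binding pairs.

```python
import math

# Motif counts as reported in Methods (Step 1b)
n_jaspar_motifs = 949   # JASPAR2022 position frequency matrices
n_custom_motifs = 9     # additional modified motifs
n_motifs = n_jaspar_motifs + n_custom_motifs

# Unordered pairs of distinct motifs: n * (n - 1) / 2
n_possible_pairs = math.comb(n_motifs, 2)

# Co-binding pairs retained after family filtering and FDR < 0.1
n_cobinding_pairs = 8101
fraction_cobinding = n_cobinding_pairs / n_possible_pairs

print(n_motifs, n_possible_pairs, f"{100 * fraction_cobinding:.2f}%")
```

Under these numbers, fewer than 2% of all possible pairs are retained as co-binding.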

Fig. 2: Distribution and correlation of the numbers of co-binding transcription factors, target genes, and peaks for individual transcription factors, peak annotation, and summary statistics for transcription factor-target gene links.

The density plots show the distributions of the number of co-binding TF pairs, the number of TGs, and the number of peaks for individual TFs that are co-bound to other TFs. Most of the TFs have 50 to 103 co-binding TFs (median = 78). The distribution of the number of TGs for TFs is right-skewed, and many TFs have 76 to 172 TGs linked. The distribution of the number of peaks for TFs is also right-skewed, and the most frequent interval is between 75 and 180 peaks (Fig. 2c). The distributions of the numbers of TGs and peaks for co-binding TF pairs are approximately normal. On average, co-binding TF pairs have 60 TGs linked and 59 peaks (Supplementary Fig. 1). Additionally, other density plots show the distributions of the number of TGs and the number of peaks for co-binding TF pairs, and bar plots display the numbers of co-binding TFs, TGs, and peaks for individual TFs by their family categories (Fig. 2d). Co-binding TFs have 4 to 115 TGs (median = 59) and 4 to 123 peaks (median = 56), and the most frequent motifs are associated with TF families with C2H2 zinc finger (ZF), bZip, and bHLH DNA-binding domains (Fig. 2e). C2H2 ZF proteins are a large family, and C2H2 ZF TFs (e.g., ZNF24²⁵ and KLF9/13²⁶) are known to play significant roles in the development and function of oligodendrocytes. These TFs can regulate the expression of genes essential for oligodendrocyte differentiation, survival, and myelination processes^25,27,28.

We computed the Pearson correlation coefficient (r) between the number of co-binding TFs and the number of TGs, and between the number of co-binding TFs and the number of peaks, for individual TFs (Fig. 2d). The number of co-binding TFs and the number of TGs for individual TFs are strongly positively correlated (r = 0.70). This suggests that TFs that are co-bound to other TFs tend to have more TGs linked to them. The number of co-binding TFs and the number of peaks for individual TFs are also strongly positively correlated (r = 0.67), indicating that co-binding TFs may exist in many different peaks.

In the following sections, we incorporate RNA-seq data to explore gene expression relationships between TFs and TGs, train deep learning models to predict TG expression using co-binding TFs, and compute TF interaction scores using the trained models.

Oligodendrocyte gene expression relationships between transcription factors and target genes

A single-cell study identified the unique gene expression profile of oligodendrocytes compared to other brain cell types²², as shown by the two-dimensional Uniform Manifold Approximation and Projection (UMAP) space after computing latent representations of the neighborhood graph (Fig. 3a). The UMAP embeddings reveal that oligodendrocytes exhibit a distinct expression profile compared to other cell types. This separation suggests that oligodendrocytes have unique transcriptional programs that differentiate them from neighboring cell types. The distinct clustering of oligodendrocytes in the UMAP space indicates specialized functional roles and may reflect their involvement in myelination and maintenance of neural integrity. To focus on oligodendrocyte-specific mechanisms of gene regulation, we conducted differential expression testing using 17,946 genes and 20,191 metacells and identified 4387 differentially expressed genes (DEGs) for oligodendrocytes. We found that 445 of the 507 TGs of oligodendrocyte-specific regulatory elements (88%) were DEGs for oligodendrocytes. Subsequently, we conducted enrichment analysis for these 445 TGs, revealing their involvement in crucial biological processes for oligodendrocytes, such as oligodendrocyte development, oligodendrocyte differentiation, and myelination (Fig. 3b).

We categorized TFs into oligodendrocyte-specific key TFs, oligodendrocyte-specific non-key TFs, and non-oligodendrocyte-specific TFs using oligodendrocyte expression level and the list of key TFs (see ‘Methods and Materials: Key TFs’ for more details). ‘Oligodendrocyte-specific key TFs’ are oligodendrocyte differentially expressed TFs and key TFs, ‘oligodendrocyte-specific non-key TFs’ are oligodendrocyte differentially expressed TFs but not key TFs, and ‘non-oligodendrocyte-specific TFs’ are neither oligodendrocyte differentially expressed TFs nor key TFs. The key oligodendrocyte TFs were defined based on mouse loss-of-function studies that have shown that specific TFs are critical for oligodendrocyte differentiation. The key TFs include SOX10²⁹, SOX2^30,31, SOX8³², MYRF³³, OLIG1³⁴, OLIG2³⁵, TCF7L2^36,37, ZNF24²⁵, NKX2.2³⁸, and NKX6.2³⁹.

Each of the 206 TFs that have co-binding TFs regulates a different set of TGs, and we computed correlations between the expression of the TFs and their TGs in the three categories. Pairwise two-sided t-tests show that correlations between TFs and TGs in oligodendrocyte key TF pairs and those in non-oligodendrocyte-specific TF pairs are significantly different (p < 0.001). They also indicate that correlations between TFs and TGs in oligodendrocyte-specific non-key TF pairs and those in non-oligodendrocyte-specific TF pairs are significantly different (p < 0.001) (Fig. 3c). The results of differential expression testing show that the six TFs in the two categories, oligodendrocyte key TF pairs and oligodendrocyte-specific non-key TF pairs, are all significant and up-regulated (Fig. S8).

We color-coded the UMAP embeddings based on the expression level of the TFs (Fig. 3a) and selected three TFs as examples for each category. Oligodendrocyte-specific key TFs such as SOX10, MYRF, and OLIG2 are specifically highly expressed in oligodendrocytes. Oligodendrocyte-enriched non-key TFs, including RBPJ, JUND, and KLF7, are expressed in multiple cell types but are more highly expressed in oligodendrocytes. Non-oligodendrocyte-specific TFs, such as RUNX1, HLF, and CREB1, are not specifically expressed in oligodendrocytes (Fig. 3d).

Deep learning and Shapley interaction scores to measure cooperativity of co-binding transcription factors

To understand the complex relationships between TFs for predicting TGs, we trained a deep learning model for each of the 445 TGs. Each model used the expression levels of the 206 TFs that have co-binding TFs to predict the expression level of one TG, with seven hidden layers in each DNN (Fig. 4a). Using each trained model and a hold-out test dataset, we computed SI scores for TF pairs in each DNN. We excluded co-binding TF-TG pairs that exhibited high variability in their SI scores (coefficient of variation > 0.5). We then determined the percentile SI score for all co-binding TF pairs. A two-sided t-test comparing the mean percentile SI scores of key co-binding TF pairs and non-key co-binding TF pairs revealed a significant difference between the two groups (p < 0.0001) (Fig. 4b).
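To make the idea of a pairwise interaction score concrete, the sketch below computes an exact Shapley interaction index for a toy three-input function by enumerating all feature subsets. It is an illustration only: the toy model, the zero baseline, and all function names are assumptions, and it is not the estimator used for the DNNs here, where exact enumeration over 206 inputs would be infeasible. Some formulations also split the interaction between the two features, which changes the score by a constant factor.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(value_fn, n_features, i, j):
    """Exact Shapley interaction index for features i and j.

    value_fn(S) returns the model output when only the features in the
    frozenset S are present and all others are held at a baseline.
    """
    others = [k for k in range(n_features) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = (factorial(size) * factorial(n_features - size - 2)
                  / factorial(n_features - 1))
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Discrete second difference: joint effect minus individual effects
            delta = (value_fn(s | {i, j}) - value_fn(s | {i})
                     - value_fn(s | {j}) + value_fn(s))
            total += weight * delta
    return total

# Hypothetical toy "model": TG = TF0 * TF1 + TF2 (TF0 and TF1 interact, TF2 is additive)
def toy_model(x):
    return x[0] * x[1] + x[2]

x = [1.0, 1.0, 1.0]          # sample to explain
baseline = [0.0, 0.0, 0.0]   # assumed reference values for "absent" features

def value_fn(present):
    masked = [x[k] if k in present else baseline[k] for k in range(len(x))]
    return toy_model(masked)

si_01 = shapley_interaction(value_fn, 3, 0, 1)  # interacting pair
si_02 = shapley_interaction(value_fn, 3, 0, 2)  # purely additive pair
print(si_01, si_02)
```

In this toy case the multiplicative pair receives a non-zero score while the additive pair scores zero, which is the property that lets an interaction score separate cooperative from merely co-occurring inputs.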

**Fig. 4: Cooperative transcription factor pairs by Shapley interaction scores.**

To highlight several important key co-binding TF pairs, we selected the top forty-eight interacting pairs for each key co-binding TF pair, such as SOX10, MYRF, OLIG1, OLIG2, NKX6.2, and TCF7L2, and generated a heatmap for their SI scores scaled from 0 to 1 (Fig. 4c). Similarly, we chose the top forty-eight interacting co-binding TF pairs for non-key TFs and created another heatmap for their SI scores scaled from 0 to 1 (Fig. 4d). The SI scores for key-TF co-binding pairs are higher than those for non-key co-binding TF pairs.

We also validated our model prediction performance for one TG, myelin basic protein (MBP), using additional data (Supplementary Fig. 2)⁴⁰. We regressed the scaled actual values on the scaled predicted values. For our primary dataset, we obtained an R-squared of 0.81 and an r of 0.90 (Supplementary Fig. 2a). Furthermore, when analyzing another dataset, we observed an R-squared of 0.69 and an r of 0.83, affirming the predictive capability of our model architecture (Supplementary Fig. 2b).
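For a regression with a single predictor, R-squared equals the square of the Pearson correlation, so the reported pairs are mutually consistent (0.90² = 0.81 and 0.83² ≈ 0.69). The sketch below demonstrates the identity on made-up numbers; the data and function names are illustrative and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_simple_regression(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Made-up scaled values, for illustration only
predicted = [0.1, 0.4, 0.5, 0.9, 1.3]
actual = [0.0, 0.5, 0.4, 1.0, 1.1]

r = pearson_r(predicted, actual)
r2 = r_squared_simple_regression(predicted, actual)
print(round(r, 3), round(r2, 3), round(r ** 2, 3))
```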

Oligodendrocyte gene regulatory network analysis for cooperative TF pairs and transcription factor hierarchy

For each of six key TFs (SOX10, MYRF, OLIG1, OLIG2, NKX6.2, and TCF7L2), we chose the co-binding TF pair with the highest interaction score. We built a gene regulatory network (GRN) for these cooperative TF pairs and the TGs that they co-regulate (Fig. 5a). We found that one TG, CALD1, is co-regulated by three key cooperative TF pairs (SOX10-TCF12, RORA-OLIG2, and FOXP1-NKX6.2), and another TG, PPP1R16B, is co-regulated by three key cooperative TF pairs (RORA-OLIG2, FOXP1-NKX6.2, and FOXP1-OLIG1). There are other TGs, such as AMOTL2, BOK, CALD1, FA2H, and CPM, in the GRN that are co-regulated by two pairs of cooperative TFs.

**Fig. 5: Gene regulatory network and transcription factor hierarchy.**

We computed in-degree and out-degree for eighteen TFs that can also be TGs at the same time, since TF feed-forward and feedback loops are common (Fig. 5b). Then, we conducted a TF hierarchy analysis and found eight top-level regulators (Fig. 5c), called ‘Master regulators’, including SOX10, SOX2, and SOX8, which are key TFs that are known to play critical roles in oligodendrocyte differentiation^30,31,32. The other five master regulators, MEIS1, MEIS2, RBPJ, JUND, and ZNF281, are categorized as oligodendrocyte-specific non-key TFs in Fig. 3c. All TFs that are middle-level regulators and bottom-level regulators, except for MYRF, are categorized as oligodendrocyte-specific non-key TFs. MYRF is one of the key TFs and is specifically activated in myelinating oligodendrocytes. PROX1 has been identified as being important for oligodendrocyte differentiation^41,42. Most of these eighteen TFs are expressed in both oligodendrocytes and OPCs (Supplementary Fig. 4). This provides evidence that oligodendrocyte differentiation is pre-set in OPCs⁴³.
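In-degree and out-degree are simple to compute from a list of directed TF-to-TF edges. The sketch below uses hypothetical placeholder nodes and edges, and its three-level split (no incoming edges, no outgoing edges, both) is a naive stand-in for illustration, not the hierarchy method cited in this study.

```python
from collections import Counter

# Hypothetical directed regulator -> target edges among TFs (placeholders)
edges = [
    ("TF_A", "TF_B"),
    ("TF_A", "TF_C"),
    ("TF_B", "TF_C"),
    ("TF_B", "TF_D"),
]

out_degree = Counter(src for src, _ in edges)   # number of TFs each TF regulates
in_degree = Counter(dst for _, dst in edges)    # number of TFs regulating each TF
nodes = sorted({node for edge in edges for node in edge})

# Naive level assignment (assumption, for illustration only)
top_level = [n for n in nodes if in_degree[n] == 0]
bottom_level = [n for n in nodes if out_degree[n] == 0]
middle_level = [n for n in nodes if in_degree[n] > 0 and out_degree[n] > 0]

for n in nodes:
    print(n, "in:", in_degree[n], "out:", out_degree[n])
```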

Independent validation for cooperative TFs

eQTL mapping

As an independent assessment of the regulatory regions, we mapped oligodendrocyte eQTLs⁴⁴ onto oligodendrocyte-specific regulatory regions to explain the causal relationships between the expression levels of the co-binding TF pairs we identified and their target genes (TGs). Using the chromosome and position of eQTL SNPs (eSNPs) from oligodendrocyte eQTLs, we integrated eSNPs with a total of 643 oligodendrocyte-specific regulatory regions (Fig. 2a). This integration facilitates the identification of potential regulatory connections between the eSNPs and the co-binding TFs in these regions, enhancing our understanding of how genetic variations influence the expression levels of the identified co-binding TF pairs and their corresponding TGs. Notably, it provides evidence of causation if the eQTL genes and TGs are identical where co-binding TF pairs occur, indicating that these co-binding TF pairs are co-regulating TG expression.

First, among 4.8 million oligodendrocyte eQTLs, we retained 2 million significant (FDR < 0.05) eQTLs. Second, we mapped these significant eSNPs onto oligodendrocyte-specific regulatory regions (Fig. 6a). In total, 383 eSNPs and 159 eGenes were mapped onto 188 regulatory regions. Among these, 373 eSNPs and 153 eGenes (and TGs) were found in 179 regulatory regions associated with key TF pairs. Enrichment analysis for TGs indicates their strong involvement in biological processes such as oligodendrocyte development, myelination, and oligodendrocyte differentiation (Fig. 6b).
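The two steps above (filter eQTLs by FDR, then overlap eSNP positions with regulatory regions) can be sketched as a position-in-interval lookup. The records, field order, and function names below are hypothetical, coordinates are treated as closed intervals, and regions are assumed not to overlap within a chromosome.

```python
from bisect import bisect_right
from collections import defaultdict

def build_index(regions):
    """Group (chrom, start, end, region_id) tuples by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, region_id in regions:
        by_chrom[chrom].append((start, end, region_id))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _, _ in intervals], intervals)
    return index

def map_snps(snps, index):
    """Return {snp_id: region_id} for SNPs that fall inside a region."""
    hits = {}
    for snp_id, chrom, pos, *_ in snps:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        k = bisect_right(starts, pos) - 1   # last region starting at or before pos
        if k >= 0:
            start, end, region_id = intervals[k]
            if start <= pos <= end:
                hits[snp_id] = region_id
    return hits

# Made-up eQTL records: (snp_id, chrom, pos, egene, fdr)
eqtls = [
    ("snp1", "chr1", 150, "GENE_A", 0.01),
    ("snp2", "chr1", 300, "GENE_A", 0.02),
    ("snp3", "chr2", 80, "GENE_B", 0.04),
    ("snp4", "chr2", 60, "GENE_B", 0.30),
]
# Made-up regulatory regions: (chrom, start, end, region_id)
regions = [("chr1", 100, 200, "peak1"), ("chr1", 500, 600, "peak2"), ("chr2", 50, 80, "peak3")]

significant = [e for e in eqtls if e[4] < 0.05]   # FDR cutoff from the text
hits = map_snps(significant, build_index(regions))
print(hits)
```

A further check that the eGene of each mapped eSNP matches the TG linked to the region would follow the same pattern, using a region-to-TG dictionary.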

**Fig. 6: Independent validation using eQTLs and ChIP-seq data.**

Validation of cooperative TF pairs

The model generated from human epigenome and expression data predicted a number of enriched TF pairs within oligodendrocyte-specific TF regulatory elements. To test whether the coordination occurs as predicted, we used rat oligodendrocyte ChIP-seq data that were available for selected transcription factors. One predicted pair was OLIG2/SOX10, which had previously been shown to be extensively colocalized in analyses of rat oligodendrocytes⁴⁵. To visualize the preferential binding of SOX10 on a global scale, a read density plot for SOX10 ChIP-seq reads¹¹ was generated centered on the previously defined OLIG2 peaks⁴⁵ in oligodendrocytes (Fig. 6c). In line with previous analysis, the average read density of SOX10 is highly enriched over OLIG2-bound sites. A newly found pair predicted by the model was NKX2.2 and SOX10. We generated a similar plot of SOX10 ChIP-seq reads over a defined set of NKX2.2 ChIP-seq peaks in oligodendrocytes⁴⁶ and found a similarly high enrichment of SOX10 binding on ~40% of NKX2.2 binding sites (Fig. 6d). An example of the colocalization is shown for MBP, a crucial TG in oligodendrocytes and a key component of the myelin sheath^47,48. Expression of MBP is essential for the differentiation and maturation of oligodendrocytes^49,50, and MBP maintains the structure and integrity of the myelin sheath⁵¹. As shown in Fig. 6e, there are at least 2 sites upstream of MBP where there is colocalization of SOX10 with NKX2.2 and OLIG2.

Boolean cooperativity of TF pairs

We applied a logic-circuit model to characterize the Boolean cooperativity of TFs using Loregic⁵². A total of 206 TFs that form 8101 co-binding TF pairs were used as input. Of the 8101 co-binding TF pairs, 6660 (82.2%) have consistent triplets (matching the same logic gate across all targets), demonstrating strong cooperation between the activities of the two TFs on the TGs. More than half of the TF1-TF2-TG triplets are categorized as “AND”, indicating a positive correlation between TG expression and the expression of both TF1 and TF2 (Fig. S6a). We also computed permutation scores to remove logic gates chosen at random. After this filtering, 6092 TF pairs still have consistent triplets, and 64% of triplets are categorized as “AND” (Fig. S6b).
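The notion of assigning a logic gate to a TF1-TF2-TG triplet can be illustrated with a brute-force match over the 16 possible two-input gates. This is a conceptual sketch only: it assumes expression has already been binarized, uses made-up vectors, and does not reproduce Loregic's scoring or its permutation test.

```python
from itertools import product

# Input combinations in the order used to encode each truth table
INPUTS = list(product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)

# Names for a few familiar gates, keyed by their outputs on INPUTS
GATE_NAMES = {
    (0, 0, 0, 1): "AND",
    (0, 1, 1, 1): "OR",
    (0, 1, 1, 0): "XOR",
    (1, 1, 1, 0): "NAND",
    (1, 0, 0, 0): "NOR",
}

def best_gate(tf1, tf2, tg):
    """Return (truth_table, agreement) for the gate matching the most samples."""
    best_table, best_agreement = None, -1.0
    for table in product([0, 1], repeat=4):
        gate = dict(zip(INPUTS, table))
        matches = sum(gate[(a, b)] == t for a, b, t in zip(tf1, tf2, tg))
        agreement = matches / len(tg)
        if agreement > best_agreement:
            best_table, best_agreement = table, agreement
    return best_table, best_agreement

# Made-up binarized expression across six samples
tf1 = [0, 0, 1, 1, 1, 0]
tf2 = [0, 1, 0, 1, 1, 1]
tg = [0, 0, 0, 1, 1, 0]   # on only when both TFs are on

table, agreement = best_gate(tf1, tf2, tg)
print(GATE_NAMES.get(table, "other"), agreement)
```

A TF pair would then count as consistent if the same gate is selected for every TG it is linked to.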

Independent validation for the prediction performance of the models

Model prediction validation and ablation study

Using Multi-omics scRNA-seq data²¹ from the same cells as the scATAC-seq data in the main analysis, we trained deep learning models and computed SI scores. Forty-eight SI scores for key-TF pairs in Fig. 4c were selected. The correlation between the SI scores computed from the main data and the Multi-omics data is shown in Fig. S5a. We also ran a two-sided t-test to compare the mean values for the percentile SI scores of key co-binding TF pairs and non-key co-binding TF pairs, as we did for the main data (Fig. 4b). There was a significant difference between the two groups (p < 0.0001) (Fig. S5b).

Model performance was evaluated using the holdout data. Additionally, we included three more publicly available scRNA-seq datasets: Multi-omics, ROSMAP⁴⁰, and Cross-disorder⁵³, and validated the prediction performance of our model for each TG. Here, TGs were predicted using the trained models and the entire datasets. The holdout data, Multi-omics, and ROSMAP show consistently low normalized root mean squared error (NRMSE), while more than 75% of predictions in Cross-disorder also have low NRMSE. NRMSE can be compared across genes (Fig. S7a).
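Because TGs differ in expression scale, a normalized error is what makes per-gene results comparable. The sketch below shows one common convention, RMSE divided by the range of the observed values; the normalization actually used in this study is not stated in this section, so the choice of range (rather than mean or standard deviation) is an assumption.

```python
import math

def nrmse(actual, predicted):
    """Root mean squared error divided by the range of the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range")
    return math.sqrt(mse) / value_range

# Made-up values, for illustration only
print(nrmse([0.0, 10.0], [1.0, 9.0]))
```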

We also conducted an ablation study to compare the prediction performance of our models. Another dataset of 206 random TFs that are neither co-binding nor cooperative was created, and its prediction performance was compared to that of the 206 co-binding TFs (Fig. S7b). The model prediction performance is much better overall when the 206 TFs used for predicting TGs are either co-binding, cooperative, or both.

Discussion

With resources provided by advances in single-cell sequencing, some studies^54,55,56,57 have elucidated the roles of several TFs, enabling the construction of cell type-specific gene regulatory networks to explain potential TF-TG relationships using bioinformatic tools. However, most of these studies and tools primarily focus on relationships between independent TFs and TGs.

This study introduces an analytical framework, coTF-reg, which identifies co-binding TFs and their TGs in oligodendrocyte-specific regulatory regions. Deep learning models predicted TG expression levels using the expression levels of co-binding TF pairs, and we computed TF SI scores to define highly interacting co-binding TF pairs as ‘cooperative’ TFs that co-regulate TG expression levels. We found that the key co-binding TF pairs tend to highly interact with each other compared to non-key co-binding TF pairs for predicting TG expression levels. Independent validation, such as mapping eQTLs onto the regulatory regions, provides evidence for causal relationships between co-binding TF pairs and TGs. Additionally, converting these regions to the rat genome assembly coordinates and measuring the density of ChIP-seq signals for key cooperative TFs show that many of these TF pairs are enriched in the regulatory regions, indicating their collaborative role in co-regulating TG expression levels. We defined specific key TFs and examined co-binding TF pairs containing them, along with their interactions in predicting TG expression levels. We then compared these results with those of non-key TF pairs. Overall, co-binding TF pairs with known regulators of oligodendrocyte development exhibit higher SI scores, suggesting that they not only regulate TG expression individually but also cooperatively. We identified several highly cooperative TF pairs, such as SOX10 and OLIG2^12,58, which are already known. Additionally, we discovered previously unreported cooperative pairs, such as SOX10 and NKX2.2.

Our study demonstrates several strengths. First, we concentrate on interactions between co-binding TF pairs and their impact on TG expression using deep learning approaches. Deep learning can elucidate complex TF relationships and their effects on TG expression levels. Second, the coTF-reg pipeline can be used by general users with any scATAC-seq and scRNA-seq data. The code for coTF-reg is openly available on GitHub, allowing users to input their scATAC-seq and scRNA-seq data for specific purposes. Third, we provide a comprehensive analytical framework that incorporates analyses utilizing co-binding by motif and expression levels. We define ‘cooperative’ TF pairs as TF pairs significantly co-enriched across regulatory regions, exhibiting high SI scores in terms of expression when predicting TG expression. The term cooperativity has often been applied to co-binding of TFs to nearby sites that facilitates stabilized binding due to protein-protein interactions, but in our model, we use TF pairs that can bind to sites in the same regulatory regions, since TFs can coordinately activate enhancers without direct interactions.

Nevertheless, there are some limitations to our study. To begin with, more than two TFs can co-regulate TG expression^59,60, but our current tool is limited to analyzing interactions between two co-binding TFs. In future research, developing or applying more sophisticated methods capable of handling clusters of TFs that co-regulate the same TG expression will be informative. Moreover, our method for identifying binding sites relies on the position frequency matrices in the motif database. While both SOX10 and MYRF are key TFs for oligodendrocytes, we encountered difficulty in obtaining sufficient binding sites for MYRF. Consequently, we had to supplement with a different motif for MYRF based on our prior knowledge. More generally, the definition of TF motifs relies on disparate methods, and limitations of motif generation and analysis have been noted previously. Nonetheless, our analysis provided TF-TF coordination that we could validate using data from previous studies. We predict that future analysis can be used to determine if the predicted TF pairing plays a role in oligodendrocyte differentiation, since reliance on single-factor studies is not able to recapitulate the important combinatorial functions of TFs in generating cell type-specific gene expression patterns. Lastly, there can be alternative methods for establishing cooperative relationships between TFs, such as Boolean rules^61,62,63,64,65. Logic-based models are also powerful tools for understanding the complex interactions among regulatory TFs in gene regulation. Developing new tools that incorporate Boolean rules and machine learning approaches will help us effectively infer more intricate TF relationships, paving the way for future research aimed at unraveling the complexities of gene regulation.

Methods

coTF-reg pipeline workflow

First, published scATAC-seq data with peak-to-gene links²¹ are input into the coTF-reg pipeline. Second, transcription factor binding sites (TFBSs) and co-binding TF pairs in the oligodendrocyte regulatory regions are identified through motif co-occurrence and co-enrichment analyses. Third, deep neural networks (DNNs) are trained to predict the expression levels of the TGs, and the interaction effects between co-binding TFs on the expression levels of TGs are measured by computing Shapley interaction (SI)^23,24 scores using gene expression from scRNA-seq data²². Fourth, a gene regulatory network is built based on SI scores for co-binding TF pairs. Fifth, a TF hierarchy analysis is used to define TFs as regulators in three categories. Lastly, the cooperative TF pairs are independently validated in three ways: (1) the oligodendrocyte eQTLs are mapped onto the regulatory regions where cooperative TF pairs exist, (2) Liftover analysis and co-enrichment analysis using ChIP-seq data are conducted, and (3) Boolean rules are applied to characterize the cooperativity of regulatory factors. The prediction performance of our models is evaluated in two ways: (1) other publicly available datasets are used as validation data to predict TG expression, and (2) an ablation study is implemented by generating random TF sets to predict TG expression.

Step 1: Infer transcription factor binding sites

We inferred transcription factor binding sites (TFBSs) in 787 scATAC-seq peak regions that have linkages with TGs.

a) The R package GenomicRanges was used to format the ATAC-seq peaks into genomic ranges.
b) Position frequency matrices (PFMs) for the 949 motifs in the JASPAR2022 database⁶⁶ were set in R, along with nine additional PFMs for the important modified motifs based on our prior knowledge.
c) TFBSs in the scATAC-seq peak regions were inferred using an R package, motifmatchr⁶⁷.
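The principle behind scanning a peak sequence with a PFM, as in step c), can be sketched as follows: counts are converted to log-odds scores against a background, and each window of the sequence is scored. This is a simplified forward-strand illustration with assumed pseudocount, background, and threshold values; it is not the motifmatchr implementation, and the toy matrix is made up.

```python
import math

BASES = "ACGT"

def pfm_to_log_odds(pfm, pseudocount=1.0, background=0.25):
    """Convert a PFM {base: [counts per position]} to per-position log2-odds scores."""
    width = len(pfm["A"])
    pwm = []
    for pos in range(width):
        total = sum(pfm[b][pos] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((pfm[b][pos] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for windows scoring at or above threshold.

    Assumes the sequence contains only A, C, G, and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy PFM with consensus ACGT (made-up counts)
toy_pfm = {
    "A": [10, 0, 0, 0],
    "C": [0, 10, 0, 0],
    "G": [0, 0, 10, 0],
    "T": [0, 0, 0, 10],
}
hits = scan("TTACGTTT", pfm_to_log_odds(toy_pfm), threshold=5.0)
print(hits)
```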

Step 2: Identify co-binding transcription factor pairs

We identified co-binding TF pairs using the inferred TFBSs in Step 1.

a)
All possible TF-TF pairs with binding sites in the scATAC-seq peak regions were considered.
b)
TF pairs from the same families were excluded.
c)
Co-enrichment analysis: Co-occurrence analysis was conducted to find TF pairs that have overlapping regions. We then conducted hypergeometric tests to find significantly enriched TF pairs in the same regions. We used multiple testing corrections via FDR and applied FDR < 0.1 cutoff. We define the TF pairs that are co-enriched (FDR < 0.1) as ‘co-binding’ TF pairs.
d)
Gene regulatory networks (GRNs) were constructed for TG-co-TF pair-peak links, and the TGs and co-TF pairs were matched to the scRNA-seq data.
e)
Lowly expressed TGs and TFs were removed from the GRNs by applying a cutoff of median expression level > 1, so that the retained genes are expressed in more than half of the cells.
f)
Differential expression testing was implemented using Seurat⁶⁸, and the oligodendrocyte-specific TGs in the GRNs were selected.
g)
Peaks were annotated as promoters or enhancers using annotatr⁶⁹.

Step 3: Measure cooperativity of co-binding transcription factors

Gene expression levels of the co-binding TF pairs from scRNA-seq data were incorporated into deep learning models to predict the expression levels of the TGs and measure interaction effects between co-binding TFs on the expression levels of TGs using Shapley interaction (SI) scores.

a)
Metacells for the cells in scRNA-seq data were projected using a Python package, metacells⁷⁰.
b)
Expression levels of TFs that have co-binding TFs and TGs were used to construct deep learning models for each TG using PyTorch⁷¹ in Python.
c)
SI scores for TF pairs were computed in each deep learning model.
d)
Interaction matrices for the SI scores were generated in deep learning models and the mean interaction scores for co-binding TF pairs were calculated.
e)
Coefficients of variation (CV)⁷² of the interaction scores for each co-binding TF pair were computed and the pairs with CV values higher than 0.5 were removed.

Step 4: Gene regulatory network

A gene regulatory network was built for six key cooperative TFs.

a)
One cooperative TF pair for each of the six key TFs was selected based on the top interaction scores.
b)
A gene regulatory network was built linking cooperative TF pairs to TGs.
c)
TGs co-regulated by cooperative TF pairs were selected.
d)
A network plot was generated using Cytoscape⁷³.

Step 5: TF hierarchy analysis

TFs that are also TGs were selected, and we implemented hierarchy analysis⁷⁴ for those TFs.

a)
In-degree (I) and out-degree (O) for the TFs were calculated.
b)
Hierarchy height metrics for the TFs were computed.
c)
TFs were classified as top-regulator, middle-regulator, or bottom-regulator.

Step 6: Independent validation

To validate the cooperative TF pairs, we implemented eQTL mapping, ChIP-seq enrichment analysis, and Boolean cooperativity analysis. To validate the prediction performance of the models, we performed model prediction validation and an ablation study.

Validation of cooperative TF pairs

eQTL mapping

We mapped the significant (FDR < 0.05) oligodendrocyte eQTLs onto the scATAC-seq peak regions.

a)
Publicly available oligodendrocyte eQTL data⁴⁴ were downloaded and the significant (FDR < 0.05) eQTLs were extracted.
b)
The significant eQTLs were mapped to the scATAC-seq peak regions in the GRNs.
c)
The results were verified by comparing the number of eQTLs mapped onto the peak regions for key-TF pairs and non-key-TF pairs.
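
The mapping step above can be sketched in Python as a simple interval lookup. This is a minimal illustration, not the authors' code: the SNP identifiers, coordinates, FDR values, the inclusive peak boundaries, and the helper name `map_eqtls_to_peaks` are assumptions made for the example.

```python
def map_eqtls_to_peaks(eqtls, peaks, fdr_cutoff=0.05):
    """Assign each significant eQTL to every peak that contains its position.

    eqtls: list of (snp_id, chrom, position, fdr)
    peaks: list of (chrom, start, end), treated here as inclusive intervals
    """
    mapped = {}
    for snp_id, chrom, pos, fdr in eqtls:
        if fdr >= fdr_cutoff:
            continue  # keep only significant eQTLs
        for peak_chrom, start, end in peaks:
            if chrom == peak_chrom and start <= pos <= end:
                mapped.setdefault((peak_chrom, start, end), []).append(snp_id)
    return mapped

# Made-up toy records.
toy_eqtls = [
    ("snp1", "chr1", 150, 0.01),   # significant, inside the chr1 peak
    ("snp2", "chr1", 500, 0.20),   # not significant
    ("snp3", "chr2", 75, 0.03),    # significant, inside the chr2 peak
    ("snp4", "chr1", 900, 0.001),  # significant, outside every peak
]
toy_peaks = [("chr1", 100, 200), ("chr2", 50, 100)]
mapped = map_eqtls_to_peaks(toy_eqtls, toy_peaks)
```

Counting the mapped eQTLs per peak then gives the numbers that can be compared between peaks bound by key-TF pairs and peaks bound by non-key-TF pairs.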

ChIP-seq enrichment analysis

We performed LiftOver analysis to convert genome coordinates between the human (hg38) and rat assemblies using the UCSC Genome Browser⁷⁵.

a)
Genome coordinates for human hg38 assembly were converted to the rn5 rat genome coordinates for human (hg19) assembly.
b)
Overlapping genome coordinates between conserved (from hg38 to rn5) assembly and the regulatory regions in the GRN were identified.
c)
Cooperative TF pairs in the overlapping regions were identified, along with the TGs they co-regulate.

Using the results from the LiftOver analysis, we searched for signals in co-enriched binding sites for cooperative key TF pairs in rat oligodendrocyte ChIP-seq data. Heatmaps were created with EAseq⁷⁶. ChIP-seq tracks were visualized using the UCSC Genome Browser. Previously published ChIP-seq datasets for SOX10, OLIG2, and NKX2.2 are available under GEO accession numbers GSE64703, GSE42447, and GSM1906296.

Boolean cooperativity of TF pairs

We characterized the Boolean cooperativity of TFs using Loregic⁵², a computational tool that integrates gene expression data and a regulatory network to characterize the cooperativity of regulatory factors. It uses the 16 possible two-input-one-output logic gates (e.g., AND) to describe triplets in which two factors regulate a common target. The GRN, including the co-binding TF-TG links, was used as input. We then binarized the gene expression levels to the Boolean values 1 and 0, representing high and low gene expression, respectively, using BoolNet⁷⁷. BoolNet assigns Boolean values to expression data on the basis of modular co-expression patterns by K-means clustering across the input samples and therefore accounts for differences in the dynamic ranges of expression among genes in the input data. The triplet gene expression data were extracted and matched to all possible logic gates. We selected consistent logic gates. We also ran 100 permutation tests to find significant logic gates.
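
The idea of matching a binarized triplet against the 16 two-input logic gates can be sketched in Python as follows. This is only an illustration of the concept and does not reproduce the Loregic scoring or its API: the integer encoding of the gates, the consistency score (fraction of samples matched), and the toy Boolean vectors are assumptions.

```python
# Each gate is encoded as a 4-bit truth table: bit (2*a + b) holds the
# output for inputs (a, b). For example, AND outputs 1 only for (1, 1),
# which is bit 3, so AND = 8.
GATE_NAMES = {8: "AND", 14: "OR", 6: "XOR", 1: "NOR", 7: "NAND"}

def gate_output(gate, a, b):
    """Output of the encoded gate for Boolean inputs a and b."""
    return (gate >> (2 * a + b)) & 1

def gate_consistency(tf1, tf2, tg):
    """Fraction of samples in which each of the 16 gates reproduces the TG state."""
    n = len(tg)
    scores = {}
    for gate in range(16):
        matches = sum(gate_output(gate, a, b) == y for a, b, y in zip(tf1, tf2, tg))
        scores[gate] = matches / n
    return scores

# Made-up binarized expression across eight samples; the TG is high only
# when both TFs are high.
tf1 = [0, 0, 1, 1, 1, 0, 1, 0]
tf2 = [0, 1, 0, 1, 1, 1, 1, 0]
tg = [0, 0, 0, 1, 1, 0, 1, 0]
scores = gate_consistency(tf1, tf2, tg)
best_gate = max(scores, key=scores.get)
```

A permutation test in this setting would shuffle the TG vector repeatedly and compare the best observed consistency with the shuffled distribution.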

Validation of the prediction performance

Model prediction validation

To verify the performance of the deep learning model architecture, we trained a deep learning model to predict one TG, MBP, using another dataset⁴⁰. The trained model was used to predict the expression level of MBP, and the results were compared with those of the MBP model trained on the main data.

Using the Multi-omics scRNA-seq data²¹, which come from the same cells as the scATAC-seq data in the main analysis, we trained deep learning models and computed SI scores, following the same procedure used in the coTF-reg pipeline for the main scRNA-seq data (‘Step 3: Measure cooperativity of co-binding transcription factors’).

Model performance was evaluated using the SEA-AD²² holdout data. We also included three more publicly available scRNA-seq datasets, Multi-omics²¹, ROSMAP⁴⁰, and Cross-disorder⁵³, and validated the prediction performance of our model for each TG. Here, TGs were predicted using the trained models and the entire datasets. Normalized root mean squared error (NRMSE) was used to compare the performance across different datasets.
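
As a small worked example of the metric, the Python sketch below computes an NRMSE. The text does not state which normalization was used, so dividing the RMSE by the range of the observed values is an assumption made here (normalizing by the mean or the standard deviation are common alternatives), and the numbers are made up.

```python
import math

def nrmse(y_true, y_pred):
    """Root mean squared error divided by the range of the observed values."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("NRMSE is undefined when all observed values are equal")
    return rmse / value_range

# Toy TG expression values and predictions that are each off by 0.5.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
score = nrmse(observed, predicted)
```

Because the error is divided by the scale of the observed data, the resulting values can be compared across datasets whose expression ranges differ.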

Ablation study

It is important to assess whether the 206 co-binding TFs effectively predict their TGs. Another dataset with 206 random TFs that are neither co-binding nor cooperative was generated to evaluate the prediction performance of our models. We used our trained models to predict holdout data for random TFs and compared their prediction performance to that of 206 co-binding TFs.
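
Drawing a random TF set for such an ablation could look like the Python sketch below. How the random TFs were actually sampled is not described in the text, so the helper name `sample_random_tfs`, the fixed seed, and the placeholder gene names are hypothetical.

```python
import random

def sample_random_tfs(all_tfs, excluded_tfs, n, seed=0):
    """Draw n TFs at random from those not in the excluded (co-binding) set."""
    candidates = sorted(set(all_tfs) - set(excluded_tfs))
    if len(candidates) < n:
        raise ValueError("not enough candidate TFs outside the excluded set")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, n)

# Placeholder names: 20 TFs in total, 5 of which are treated as co-binding.
all_tfs = [f"TF_{i}" for i in range(20)]
co_binding_tfs = all_tfs[:5]
random_tfs = sample_random_tfs(all_tfs, co_binding_tfs, n=5)
```

The random set has the same size as the co-binding set, so the trained models receive inputs of the same dimension in both conditions.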

Single-cell ATAC-seq data

Chromatin accessibility data²¹ was used for the main analyses. Brain samples were selected and eight thousand nuclei from each sample were subjected to the Chromium Next GEM Single-Cell Multiome ATAC-seq. We filtered oligodendrocyte-specific peak-gene links for our analyses. 930 peaks and 606 genes were initially chosen.

Single-cell RNA-seq data

SEA-AD (Main analysis)

The data for the whole taxonomy collected from dorsolateral prefrontal cortex (1,395,601 cells) were downloaded through the Open Data Registry on AWS as AnnData objects (h5ad format)²². Cells from disease cases were excluded and only the controls were retained. Then, we projected metacells for the whole taxonomy and found 2004 metacells and 17,946 genes for oligodendrocytes.

Multi-omics

The normalized and quality-controlled data were obtained from the CELLxGENE (RRID:SCR_021059) portal. Brain samples were selected and eight thousand nuclei from each sample were subjected to the Gene Expression protocol (10x Genomics). We retained 5459 oligodendrocyte cells.

ROSMAP

The processed count matrix for oligodendrocytes was downloaded from a supplementary website for ‘Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer’s disease pathology’⁴⁰. We projected metacells for the controls only and found 7072 metacells and 16,707 genes.

Cross-disorder

Post quality control filtered data was obtained from the CELLxGENE portal. We projected metacells for oligodendrocyte controls and found 1004 metacells and 21,248 genes for oligodendrocytes.

Uniform manifold approximation and projection for dimension reduction

We obtained scRNA-seq data for the whole taxonomy collected from dorsolateral prefrontal cortex through the Open Data Registry on AWS as AnnData objects (h5ad format)²². There were 1,395,601 cells across 18 sub-cell types. A total of 18,431 hg38 protein-coding genes, obtained via BioMart⁷⁸, were selected from 36,517 genes. We normalized the data to a depth of 10,000 and log1p-transformed it using Scanpy⁷⁹ in Python. Then, the highly variable genes (HVGs) were identified using a dispersion-based method⁸⁰, in which the normalized dispersion is obtained by scaling with the mean and standard deviation of the dispersions for genes falling into a given bin of mean expression. The cutoffs for the mean expression of genes were a minimum of 0.0125 and a maximum of 3, and the minimum normalized dispersion was 0.5. We identified 3032 HVGs, scaled each gene to unit variance, and clipped values exceeding a standard deviation of 10. To reduce the dimensionality of the data, we ran principal component analysis and used the top 30 PCs to compute the neighborhood graph of the cells. Finally, we embedded the neighborhood graph with 20 neighbors in two dimensions using Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP)⁸¹.
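
The first and last preprocessing steps described above (depth normalization to 10,000, a log(1 + x) transformation, and unit-variance scaling with clipping at 10) can be written out in plain Python to make the arithmetic explicit. This sketch does not call Scanpy and is not the authors' code: the toy count matrix, the zero-centering before scaling, the use of the sample standard deviation, and the helper names are assumptions.

```python
import math

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell (row) to the target depth, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        depth = sum(cell)
        factor = target_sum / depth if depth > 0 else 0.0
        normalized.append([math.log1p(c * factor) for c in cell])
    return normalized

def scale_and_clip(matrix, max_value=10.0):
    """Center each gene (column), scale it to unit variance, and clip at max_value."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [row[g] for row in matrix]
        mean = sum(column) / n_cells
        variance = sum((v - mean) ** 2 for v in column) / (n_cells - 1)
        sd = math.sqrt(variance) if variance > 0 else 1.0  # avoid dividing by zero
        for c in range(n_cells):
            scaled[c][g] = min((column[c] - mean) / sd, max_value)
    return scaled

# Made-up raw counts: 3 cells (rows) by 3 genes (columns).
toy_counts = [[10, 0, 90], [5, 5, 0], [0, 30, 70]]
toy_norm = normalize_log1p(toy_counts)
toy_scaled = scale_and_clip(toy_norm)
```

HVG selection, PCA, neighborhood-graph construction, and UMAP embedding are left out because they depend on library implementations rather than on simple arithmetic.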

Differential expression testing

We input metacells for all cell types to identify oligodendrocyte-specific genes using Seurat v4 in R. We used the Poisson likelihood ratio test in the FindMarkers function, assuming that gene expression follows the negative binomial distribution. Oligodendrocytes, oligodendrocyte precursor cells (OPCs), and astrocytes, which are major glial cell types in the CNS, were grouped and compared with the other fifteen cell types. We used an FDR cutoff (<0.05) to select differentially expressed genes in the oligodendrocyte group.

Position Frequency Matrices

Position frequency matrices (PFMs) for the 949 motifs in JASPAR2022 were used to infer TF binding sites. We added PFMs for MYRF, SP7, and OLIG2, which are among the key TFs, from another study⁸², the Mus musculus collection in JASPAR2022⁶⁶, and HOCOMOCO v12⁸³, respectively. We also included shorter motifs for other key TFs, such as SOX10, MYRF, ZNF24, NKX2.2, and SP7, considering their importance in oligodendrocytes (Supplementary Fig. 3).

Co-enrichment analysis

We used a hypergeometric test to assess whether the number of overlapping binding sites for two TFs is greater than expected by chance. Specifically, given that a random variable $X$ represents the possible outcomes of a hypergeometric process, the probability of getting $k$ or more overlapping binding sites between two TFs inside a particular chosen set is

$$\Pr \left(X\ge k \mid n;N;m\right)=\sum_{x=k}^{\min (n,m)}\frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$$

(1)

where ${N}$ is the total number of TF binding sites for all TFs, $m$ is the number of binding sites for TF1, $n$ is the number of binding sites for TF2, $k$ is the observed number of overlapping (co-occurrence) binding sites between TF1 and TF2, and $x$ indexes the possible overlap counts in the sum. We applied an FDR-adjusted p-value cutoff (<0.1) to all possible TF pairs to choose the co-binding TF pairs.
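
Equation (1) and the subsequent FDR adjustment can be sketched in Python with exact integer binomial coefficients. The text only says "FDR", so the Benjamini-Hochberg procedure used below is an assumption, as are the helper names and the toy numbers.

```python
from math import comb

def hypergeom_sf(k, N, m, n):
    """Pr(X >= k) for N total sites, m sites for TF1, and n sites for TF2 (Eq. 1)."""
    upper = min(n, m)
    numerator = sum(comb(m, x) * comb(N - m, n - x) for x in range(k, upper + 1))
    return numerator / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example: 10 sites in total, 3 for TF1, 2 for TF2, at least 1 overlap.
p_toy = hypergeom_sf(1, N=10, m=3, n=2)
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03])
co_binding_flags = [q < 0.1 for q in adjusted_toy]
```

For the toy case, the probability of at least one overlap is 1 − C(7,2)/C(10,2) = 24/45, which the function reproduces.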

Key transcription factors

We defined ten key TFs that are oligodendrocyte marker genes based on mouse loss-of-function studies showing that specific TFs are critical for oligodendrocyte differentiation: SOX10²⁹, SOX2^30,31, SOX8³², MYRF³³, OLIG1³⁴, OLIG2³⁵, TCF7L2^36,37, ZNF24²⁵, NKX2.2³⁸, and NKX6.2³⁹. The ten ‘oligodendrocyte-specific key TFs’ are oligodendrocyte differentially expressed TFs that are also key TFs, the eighty-three ‘oligodendrocyte-specific non-key TFs’ are oligodendrocyte differentially expressed TFs but not key TFs, and the one hundred thirteen ‘non-oligodendrocyte-specific TFs’ are neither oligodendrocyte differentially expressed TFs nor key TFs. In particular, the ten ‘oligodendrocyte-specific key TFs’ play crucial roles in the development and differentiation of oligodendrocytes: they regulate various stages of oligodendrocyte maturation and promote the expression of myelin genes, making them key players in myelination within the CNS.

Deep learning models

We input the expression levels of TFs that have co-binding pairs into deep neural network (DNN) models to predict TG expression levels. The DNN models used 2004 metacells (samples), 206 TFs (features), and one TG expression level (label). A separate DNN was built for each TG. The mean squared error (MSE) between predicted and actual TG expression was used as the loss function. We cross-validated the training dataset (80% of the input samples) with 5-fold cross-validation and validated the best trained model on the remaining 20% hold-out validation dataset, to make the best use of the data and to achieve reliable model performance. We used early stopping with a patience of 10 to determine the number of epochs and set the batch size to 32. Adam with a learning rate of 0.001 was used for training the models. The structure of our neural network model can be written as

$${Z}_{i}=f\left({W}_{i}\cdot X+{b}_{i}\right)$$

(2)

where $X$ denotes the input data and $f$ represents the activation function, specifically the LeakyReLU function. TF expression levels serve as the input data, while ${Z}_{i}$ represents the output of the $i$^th hidden layer. The final output of the model is the predicted TG expression level, and ${W}_{i}$ and ${b}_{i}$ are the weight matrix and bias vector for the $i$^th layer, respectively.

To evaluate the performance of our neural network model, we utilize the Mean Squared Error (MSE) loss function. The MSE quantifies the average squared difference between the predicted outputs of the model, Z and the true labels in our regression task. Mathematically, we can express the MSE as follows:

$${MSE}=\quad \frac{1}{N}{\sum }_{i=1}^{N}{({Z}_{i}-{Y}_{{{true}}_{i}})}^{2},$$

(3)

where ${Z}_{i}$ represents the predicted output for the i^th sample, and ${Y}_{{{true}}_{i}}$ denotes the true label corresponding to the i^th sample.
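
Equations (2) and (3) can be traced by hand with a very small network. The Python sketch below is a toy forward pass, not the trained PyTorch models: the layer sizes, the weights, the LeakyReLU slope of 0.01, and the linear output layer are assumptions chosen so that the numbers are easy to check.

```python
def leaky_relu(x, slope=0.01):
    """LeakyReLU activation: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else slope * x

def dense(inputs, weights, biases, activation=None):
    """One layer: activation(W . x + b), as in Eq. (2)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def predict(x, layers):
    """Forward pass with LeakyReLU hidden layers and a linear output unit."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, leaky_relu)
    weights, biases = layers[-1]
    return dense(hidden, weights, biases)[0]

def mse(predicted, true):
    """Mean squared error between predictions and labels, as in Eq. (3)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)

# Hand-picked toy network: 2 input TFs -> 2 hidden units -> 1 predicted TG value.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 2.0]], [0.1]),
]
toy_prediction = predict([1.0, 2.0], toy_layers)
toy_loss = mse([1.0, 2.0], [1.0, 4.0])
```

For the input (1, 2), the hidden pre-activations are −1 and 1.5, LeakyReLU maps them to −0.01 and 1.5, and the output is −0.01 + 3.0 + 0.1 = 3.09.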

Shapley interaction scores

We denote the set of all TFs by F, a feature i∈F, and a feature set S⊆F. We define the interaction effect between TF i and j, with feature set S, of a neural network f at a data point ${X}_{k}$ to be

$${\delta }_{ij}^{f}({X}_{k};S)=f({X}_{k};S\cup \{i,j\})-f({X}_{k};S\cup \{i\})-f({X}_{k};S\cup \{j\})+f({X}_{k};S),$$

(4)

where $f({X}_{k};S)$ is the prediction at ${X}_{k}$ when only TFs in $S$ are used, which often requires retraining the NN multiple times. A common approximation is to replace the absent features (i.e., $F\backslash S$) by the corresponding values in a baseline ${C}_{F\backslash S}$, such that

$$f\left({X}_{k};S\right) \approx f\left({X}_{k,S};{C}_{F\backslash S}\right)$$

(5)

The baseline is set as the empirical mean of each feature. The Shapley interaction score ${{SI}}_{{ij}}^{f}({X}_{k})$ is the expectation of ${\delta }_{{ij}}^{f}({X}_{k}{;S})$,

$${{SI}}_{{ij}}^{f}({X}_{k})={E}_{p\left(S\right)}\left[{\delta } \, _{{ij}}^{f}\left({X}_{k}{;S}\right)\right],$$

(6)

over a feature set $S$ chosen uniformly at random from $F$. We use a Monte Carlo procedure⁸⁴ to approximate ${{SI}}_{{ij}}^{f}({X}_{k})$ from a small number of samples of $S$. To aggregate the local interaction effects at different data points into a global interaction effect, we take the expectation of $\left|{{SI}}_{{ij}}^{f}({X}_{k})\,\right|$ with respect to the empirical data distribution $p(X)$, such that

$${{SI}}_{{ij}}^{f}={E}_{p\left(X\right)}\left[\left|{{SI}}_{{ij}}^{f}\left(X\right)\right|\right]$$

(7)

For our deep ensemble of deep learning models, we utilize a posterior distribution of functions $q( \, f)$ induced by the ensemble distribution of the weights $q(w)$, as outlined in Eq. (2). This ensemble approach involves training multiple instances of the model, each initialized with different random weights to promote diverse learning paths.

The weights $w$ are drawn from a Gaussian prior, reflecting our initial uncertainty about their values. After training, we apply Bayesian inference techniques to update our beliefs about these weights and compute the posterior distribution $q(w)$. This posterior captures the uncertainty in the model parameters, providing a more comprehensive understanding of the model’s behavior.

The function $q( \, f)$ represents the expected output of the model across this ensemble of weights. To compute the interaction score, we take the expectation of the interaction score ${{SI}}_{{ij}}$ with respect to $q(f)$. This is estimated by averaging ${N}_{f}$ samples drawn from the ensemble:

$${{SI}}_{{ij}}={E}_{q\left(f\right)}\left[{{SI}}_{{ij}}^{f}\right]\approx \frac{1}{{N}_{f}}{\sum }_{k=1}^{{N}_{f}}{{SI}}_{{ij}}^{{f}_{k}}.$$

(8)

We computed Shapley interaction scores^23,24 for the co-binding TF pairs (TF $i$ and TF $j$) using the trained DNN models and validation datasets. We calculated mean values for co-binding TF pairs using the interaction matrices, ranked them by percentile, and scaled them to between 0 and 1 for easier interpretation.
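
Equations (4) to (7) can be sketched in Python for a single model. This is a minimal illustration under stated assumptions rather than the authors' implementation: the toy function `toy_model` stands in for a trained DNN, the subsets $S$ are sampled from the features other than $i$ and $j$ by including each with probability 0.5, and the data and helper names are made up.

```python
import random

def masked_prediction(f, x, baseline, subset):
    """Evaluate f with features outside `subset` replaced by the baseline (Eq. 5)."""
    masked = [x[d] if d in subset else baseline[d] for d in range(len(x))]
    return f(masked)

def delta_ij(f, x, baseline, i, j, subset):
    """Interaction effect of features i and j given the feature set `subset` (Eq. 4)."""
    s = set(subset)
    return (masked_prediction(f, x, baseline, s | {i, j})
            - masked_prediction(f, x, baseline, s | {i})
            - masked_prediction(f, x, baseline, s | {j})
            + masked_prediction(f, x, baseline, s))

def shapley_interaction(f, x, baseline, i, j, n_samples, rng):
    """Monte Carlo estimate of the local interaction score at x (Eq. 6)."""
    others = [d for d in range(len(x)) if d not in (i, j)]
    total = 0.0
    for _ in range(n_samples):
        subset = [d for d in others if rng.random() < 0.5]  # uniform over subsets
        total += delta_ij(f, x, baseline, i, j, subset)
    return total / n_samples

def global_interaction(f, data, i, j, n_samples=50, seed=0):
    """Mean absolute local score over the data, with a feature-mean baseline (Eq. 7)."""
    n_features = len(data[0])
    baseline = [sum(row[d] for row in data) / len(data) for d in range(n_features)]
    rng = random.Random(seed)
    scores = [abs(shapley_interaction(f, x, baseline, i, j, n_samples, rng)) for x in data]
    return sum(scores) / len(scores)

def toy_model(x):
    """Stand-in for a trained network: features 0 and 1 interact, feature 2 is additive."""
    return x[0] * x[1] + x[2]

toy_data = [[0.0, 0.0, 1.0], [1.0, 3.0, 0.0], [5.0, 3.0, 2.0]]
si_01 = global_interaction(toy_model, toy_data, 0, 1)
si_02 = global_interaction(toy_model, toy_data, 0, 2)
```

For this toy function the interaction effect between features 0 and 1 equals $(x_0 - c_0)(x_1 - c_1)$ for any $S$, so the global score is (4 + 1 + 3)/3 = 8/3, while the purely additive feature 2 has no interaction with feature 0. Averaging such scores over an ensemble of models would correspond to Eq. (8).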

Coefficient of variation

The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data series around the mean. It is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another. The CV is defined as the ratio of the standard deviation to the mean:

$${CV}=\quad \frac{\sigma }{\mu }$$

(9)
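
A CV-based filter with the 0.5 threshold used in the pipeline could be sketched in Python as follows. The interaction scores are made up, and the use of the population standard deviation, the set of scores over which the CV is taken, and the helper names are assumptions.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (Eq. 9)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / mean

def filter_stable_pairs(scores_by_pair, max_cv=0.5):
    """Keep TF pairs whose interaction scores have a CV no higher than max_cv."""
    return {
        pair: scores
        for pair, scores in scores_by_pair.items()
        if coefficient_of_variation(scores) <= max_cv
    }

# Made-up interaction scores for two placeholder TF pairs.
toy_scores = {
    ("TF_A", "TF_B"): [2, 4, 4, 4, 5, 5, 7, 9],  # mean 5, sd 2, CV 0.4
    ("TF_C", "TF_D"): [1, 9],                    # mean 5, sd 4, CV 0.8
}
stable_pairs = filter_stable_pairs(toy_scores)
```

Only the first pair survives the filter, because its scores vary little relative to their mean.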

Hierarchy analysis

We computed connectivity statistics, out-degree (O) and in-degree (I), for individual TFs to get a ‘hierarchy height’ metric (h), a normalized value of the difference between O and I for each TF. The $h$ is calculated as

$$h=\quad \frac{O-I}{O+I}$$

(10)

We defined TFs as top-regulator (h > 0.33), middle-regulator (−0.33 < h < 0.33), or bottom-regulator (h < −0.33) by their h values.
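
Equation (10) and the three-way classification can be sketched in Python from a list of regulator-target edges. The edge list and TF names below are placeholders, and assigning values of exactly ±0.33 to the middle category is an assumption, since the text leaves those boundaries open.

```python
from collections import Counter

def hierarchy_heights(edges):
    """Compute h = (O - I) / (O + I) for every TF appearing in the edge list."""
    out_degree, in_degree = Counter(), Counter()
    for regulator, target in edges:
        out_degree[regulator] += 1
        in_degree[target] += 1
    heights = {}
    for tf in set(out_degree) | set(in_degree):
        o, i = out_degree[tf], in_degree[tf]
        heights[tf] = (o - i) / (o + i)  # o + i >= 1 for every TF in an edge
    return heights

def classify(h, cutoff=0.33):
    """Map a hierarchy height to a regulator category."""
    if h > cutoff:
        return "top-regulator"
    if h < -cutoff:
        return "bottom-regulator"
    return "middle-regulator"

# Placeholder TF-to-TF network: A regulates B and C, which both regulate D.
toy_edges = [("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_B", "TF_D"), ("TF_C", "TF_D")]
toy_heights = hierarchy_heights(toy_edges)
toy_levels = {tf: classify(h) for tf, h in toy_heights.items()}
```

Here TF_A only regulates (h = 1), TF_D is only regulated (h = −1), and TF_B and TF_C have balanced in- and out-degrees (h = 0).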

Statistics and reproducibility

Data manipulation and analyses were performed using Python 3.10.14 and R 4.3.1. All relevant information, including the sample sizes in the groups for statistical tests, is included in the figure legends. The plots in this study were generated with Scanpy⁷⁹ (v1.10.3) and seaborn (v0.13.2) in Python and ggplot2 (v3.5.1) in R.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data supporting the results are included in Supplementary Data 1–5 and are publicly available on GitHub (https://github.com/daifengwanglab/coTF-reg). For the main analyses, scATAC-seq data were obtained from the Supplementary Materials of the Multi-omics study²¹ and scRNA-seq data were sourced from SEA-AD: Seattle Alzheimer’s Disease Brain Cell Atlas (https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30). For the independent validation, scRNA-seq data were acquired from the following websites: (1) Multiome (https://cellxgene.cziscience.com/collections/ceb895f4-ff9f-403a-b7c3-187a9657ac2c); (2) ROSMAP (https://compbio.mit.edu/ad_aging_brain/#loading-the-raw-data); (3) Cross-disorder (https://cellxgene.cziscience.com/collections/c53573b2-eff4-4c5e-9ad0-b24d422dfd9b).

Code availability

The code for the analyses and figures is available at https://github.com/daifengwanglab/coTF-reg.

References

Bercury, K. K. & Macklin, W. B. Dynamics and mechanisms of CNS myelination. Dev. Cell 32, 447–458 (2015).
Nasrabady, S. E., Rizvi, B., Goldman, J. E. & Brickman, A. M. White matter changes in Alzheimer’s disease: a focus on myelin and oligodendrocytes. Acta Neuropathol. Commun. 6, 22 (2018).
Singh, D. K., Ling, E. & Kaur, C. Hypoxia and myelination deficits in the developing brain. Int. J. Dev. Neurosci. 70, 3–11 (2018).
Emery, B. & Lu, Q. R. Transcriptional and epigenetic regulation of oligodendrocyte development and myelination in the central nervous system. Cold Spring Harb. Perspect. Biol. 7, a020461 (2015).
Valdés-Tovar, M. et al. Insights into myelin dysfunction in schizophrenia and bipolar disorder. WJP 12, 264–285 (2022).
Maitre, M. et al. Myelin in Alzheimer’s disease: culprit or bystander? Acta Neuropathol. Commun. 11, 56 (2023).
Quan, L., Uyeda, A. & Muramatsu, R. Central nervous system regeneration: the roles of glial cells in the potential molecular mechanism underlying remyelination. Inflamm. Regen. 42, 7 (2022).
Simons, M. & Nave, K.-A. Oligodendrocytes: myelination and axonal support. Cold Spring Harb. Perspect. Biol. 8, a020479 (2016).
Elbaz, B. & Popko, B. Molecular control of oligodendrocyte development. Trends Neurosci. 42, 263–277 (2019).
Ibarra, I. L. et al. Mechanistic insights into transcription factor cooperativity and its impact on protein-phenotype interactions. Nat. Commun. 11, 124 (2020).
Lopez-Anido, C. et al. Differential Sox10 genomic occupancy in myelinating glia. Glia 63, 1897–1914 (2015).
Sock, E. & Wegner, M. Using the lineage determinants Olig2 and Sox10 to explore transcriptional regulation of oligodendrocyte development. Dev. Neurobiol. 81, 892–901 (2021).
Bujalka, H. et al. MYRF is a membrane-associated transcription factor that autoproteolytically cleaves to directly activate myelin genes. PLoS Biol. 11, e1001625 (2013).
Weider, M. et al. Nfat/calcineurin signaling promotes oligodendrocyte differentiation and myelination by transcription factor network tuning. Nat. Commun. 9, 899 (2018).
Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).
Emani, P. S. et al. Single-cell genomics and regulatory networks for 388 human brains. Science 384, eadi5199 (2024).
Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Jin, T. et al. scGRNom: a computational pipeline of integrative multi-omics analyses for predicting cell-type disease genes and regulatory networks. Genome Med. 13, 95 (2021).
Zhu, K. et al. Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Sci. Adv. 9, eadg3754 (2023).
Gabitto, M. I., Travaglini, K. J., Rachleff, V. M. et al. Integrated multimodal cell atlas of Alzheimer’s disease. Nat. Neurosci. 27, 2366–2383 (2024).
Cui, T. et al. Gene–gene interaction detection with deep learning. Commun. Biol. 5, 1238 (2022).
Dhamdhere, K., Agarwal, A. & Sundararajan, M. The Shapley Taylor Interaction Index. https://doi.org/10.48550/ARXIV.1902.05622. (2019).
Elbaz, B. et al. Phosphorylation state of ZFP24 controls oligodendrocyte differentiation. Cell Rep. 23, 2254–2263 (2018).
Bernhardt, C. et al. KLF9 and KLF13 transcription factors boost myelin gene expression in oligodendrocytes as partners of SOX10 and MYRF. Nucleic Acids Res. 50, 11509–11528 (2022).
Howng, S. Y. B. et al. ZFP191 is required by oligodendrocytes for CNS myelination. Genes Dev. 24, 301–311 (2010).
Al-Naama, N., Mackeh, R. & Kino, T. C2H2-Type Zinc finger proteins in brain development, neurodevelopmental, and other neuropsychiatric disorders: systematic literature-based analysis. Front. Neurol. 11, 32 (2020).
Aprato, J. et al. Myrf guides target gene selection of transcription factor Sox10 during oligodendroglial development. Nucleic Acids Res. 48, 1254–1270 (2020).
Zhao, C. et al. Sox2 sustains recruitment of oligodendrocyte progenitor cells following CNS demyelination and primes them for differentiation during remyelination. J. Neurosci. 35, 11482–11499 (2015).
Zhang, S. et al. Sox2 is essential for oligodendroglial proliferation and differentiation during postnatal brain myelination and CNS remyelination. J. Neurosci. 38, 1802–1820 (2018).
Turnescu, T. et al. Sox8 and Sox10 jointly maintain myelin gene expression in oligodendrocytes. Glia 66, 279–294 (2018).
Garnai, S. J. et al. Variants in myelin regulatory factor (MYRF) cause autosomal dominant and syndromic nanophthalmos in humans and retinal degeneration in mice. PLoS Genet. 15, e1008130 (2019).
Chen, Y. et al. The oligodendrocyte-specific G protein–coupled receptor GPR17 is a cell-intrinsic timer of myelination. Nat. Neurosci. 12, 1398–1406 (2009).
Wang, J. et al. Olig2 ablation in immature oligodendrocytes does not enhance CNS myelination and remyelination. J. Neurosci. 42, 8542–8555 (2022).
Zhao, C. et al. Dual regulatory switch through interactions of Tcf7l2/Tcf4 with stage-specific partners propels oligodendroglial maturation. Nat. Commun. 7, 10883 (2016).
Ye, F. et al. HDAC1 and HDAC2 regulate oligodendrocyte differentiation by disrupting the beta-catenin-TCF interaction. Nat. Neurosci. 12, 829–838 (2009).
Qi, Y. et al. Control of oligodendrocyte differentiation by the Nkx2.2 homeodomain transcription factor. Development 128, 2723–2733 (2001).
Southwood, C. et al. CNS myelin paranodes require Nkx6-2 homeoprotein transcriptional activity for normal structure. J. Neurosci. 24, 11215–11225 (2004).
Mathys, H. et al. Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer’s disease pathology. Cell 186, 4365–4385.e27 (2023).
Bunk, E. C. et al. Prox1 is required for oligodendrocyte cell identity in adult neural stem cells of the subventricular zone. Stem Cells 34, 2115–2129 (2016).
Kato, K. et al. Prox1 inhibits proliferation and is required for differentiation of the oligodendrocyte cell lineage in the mouse. PLoS ONE 10, e0145334 (2015).
Suzuki, N. et al. Differentiation of oligodendrocyte precursor cells from Sox10-venus mice to oligodendrocytes and astrocytes. Sci. Rep. 7, 14133 (2017).
Bryois, J. et al. Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders. Nat. Neurosci. 25, 1104–1112 (2022).
Yu, Y. et al. Olig2 targets chromatin remodelers to enhancers to initiate oligodendrocyte differentiation. Cell 152, 248–261 (2013).
Aguado, L. C. et al. microRNA function is limited to cytokine control in the acute response to virus infection. Cell Host Microbe 18, 714–722 (2015).
Galiano, M. R. et al. Myelin basic protein functions as a microtubule stabilizing protein in differentiated oligodendrocytes. J. Neurosci. Res. 84, 534–541 (2006).
Aber, E. R. et al. Oligodendroglial macroautophagy is essential for myelin sheath turnover to prevent neurodegeneration and death. Cell Rep. 41, 111480 (2022).
Ehrlich, M. et al. Rapid and efficient generation of oligodendrocytes from human induced pluripotent stem cells using transcription factors. Proc. Natl Acad. Sci. USA 114, E2243–E2252 (2017).
Smirnova, E. V. et al. Comprehensive Atlas of the myelin basic protein interaction landscape. Biomolecules 11, 1628 (2021).
Snaidero, N. et al. Antagonistic Functions of MBP and CNP establish cytosolic channels in CNS Myelin. Cell Rep. 18, 314–323 (2017).
Wang, D. et al. Loregic: a method to characterize the cooperative logic of regulatory factors. PLoS Comput. Biol. 11, e1004132 (2015).
Rexach, J. E. et al. Cross-disorder and disease-specific pathways in dementia revealed by single-cell genomics. Cell 187, 5753–5774.e28 (2024).
Yashar, W. M. et al. Predicting transcription factor activity using prior biological information. iScience 27, 109124 (2024).
Duren, Z. et al. Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data. Nat. Commun. 12, 4763 (2021).
Ferrari, C., Manosalva Pérez, N. & Vandepoele, K. MINI-EX: integrative inference of single-cell gene regulatory networks in plants. Mol. Plant 15, 1807–1824 (2022).
Duren, Z., Chen, X., Jiang, R., Wang, Y. & Wong, W. H. Modeling gene regulation from paired expression and chromatin accessibility data. Proc. Natl Acad. Sci. USA 114, E4914–E4923 (2017).
Liu, Z. et al. Induction of oligodendrocyte differentiation by Olig2 and Sox10: Evidence for reciprocal interactions and dosage-dependent mechanisms. Dev. Biol. 302, 683–693 (2007).
Kim, J. et al. The co-regulation mechanism of transcription factors in the human gene regulatory network. Nucleic Acids Res. 40, 8849–8861 (2012).
Fuda, N. J., Ardehali, M. B. & Lis, J. T. Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186–192 (2009).
Buchler, N. E., Gerland, U. & Hwa, T. On schemes of combinatorial transcription logic. Proc. Natl Acad. Sci. USA 100, 5136–5141 (2003).
Silva-Rocha, R. & De Lorenzo, V. Mining logic gates in prokaryotic transcriptional regulation networks. FEBS Lett. 582, 1237–1244 (2008).
Krumsiek, J., Marr, C., Schroeder, T. & Theis, F. J. Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network. PLoS ONE 6, e22649 (2011).
Malekpour, S. A., Shahdoust, M., Aghdam, R. & Sadeghi, M. wpLogicNet: logic gate and structure inference in gene regulatory networks. Bioinformatics 39, btad072 (2023).
Malekpour, S. A., Haghverdi, L. & Sadeghi, M. Single-cell multi-omics analysis identifies context-specific gene regulatory gates and mechanisms. Brief. Bioinform. 25, bbae180 (2024).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
Schep, A. motifmatchr: Fast Motif Matching in R. R package version 1.28.0. https://doi.org/10.18129/B9.bioc.motifmatchr (2024).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Cavalcante, R. G. & Sartor, M. A. annotatr: genomic regions in context. Bioinformatics 33, 2381–2383 (2017).
Ben-Kiki, O., Bercovich, A., Lifshitz, A. & Tanay, A. Metacell-2: a divide-and-conquer metacell algorithm for scalable scRNA-seq analysis. Genome Biol. 23, 100 (2022).
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. https://doi.org/10.48550/ARXIV.1912.01703 (2019).
Koopmans, L. H., Owen, D. B. & Rosenblatt, J. I. Confidence intervals for the coefficient of variation for the normal and log normal distributions. Biometrika 51, 25–32 (1964).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
Lerdrup, M., Johansen, J. V., Agrawal-Singh, S. & Hansen, K. An interactive environment for agile analysis and visualization of ChIP-sequencing data. Nat. Struct. Mol. Biol. 23, 349–357 (2016).
Müssel, C., Hopfensitz, M. & Kestler, H. A. BoolNet—an R package for generation, reconstruction and analysis of Boolean networks. Bioinformatics 26, 1378–1380 (2010).
Smedley, D. et al. BioMart – biological queries made easy. BMC Genom. 10, 22 (2009).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https://doi.org/10.48550/ARXIV.1802.03426 (2018).
Kim, D. et al. Homo-trimerization is essential for the transcription factor function of Myrf for oligodendrocyte differentiation. Nucleic Acids Res. 45, 5112–5125 (2017).
Vorontsov, I. E. et al. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res. 52, D154–D163 (2024).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665 (2014).

Acknowledgements

This work was supported by National Institutes of Health grants R21NS128761, RF1MH128695, and R01AG067025; National Science Foundation Career Award 2144475; and a core grant to the Waisman Center from NICHD (P50 HD105353).

Author information

These authors jointly supervised this work: John Svaren, Daifeng Wang.

Authors and Affiliations

Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
Jerome J. Choi, John Svaren & Daifeng Wang
Department of Population Health Sciences, University of Wisconsin-Madison, Madison, WI, USA
Jerome J. Choi
Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA
John Svaren
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
Daifeng Wang
Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
Daifeng Wang

Authors

Jerome J. Choi
John Svaren
Daifeng Wang

Contributions

Conceptualization, J.S. and D.W.; Methodology, J.C., J.S., and D.W.; Formal Analysis, J.C.; Investigation, J.C., J.S., and D.W.; Writing – Original Draft, J.C.; Writing – Review & Editing, J.C., J.S., and D.W.; Supervision, J.S. and D.W.; Funding Acquisition, J.S. and D.W.

Corresponding author

Correspondence to Daifeng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Seyed Malekpour and Ping-Han Hsieh for their contribution to the peer review of this work. Primary Handling Editors: Chien-Yu Chen and Mengtan Xing.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Choi, J.J., Svaren, J. & Wang, D. CoTF-reg reveals cooperative transcription factors in oligodendrocyte gene regulation using single-cell multi-omics. Commun Biol 8, 181 (2025). https://doi.org/10.1038/s42003-025-07570-6

Received: 19 June 2024
Accepted: 17 January 2025
Published: 05 February 2025
Version of record: 05 February 2025
DOI: https://doi.org/10.1038/s42003-025-07570-6

This article is cited by

Cooperative multi-view integration with a scalable and interpretable model explainer
- Jerome J. Choi
- Noah Cohen Kalafut
- Daifeng Wang
Nature Machine Intelligence (2025)
