Fig. 1: Method overview of DOLPHIN for exon-level single-cell RNA-seq data analysis. | Nature Communications

From: DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads

a Preprocessing of single-cell RNA-seq data, including quantification of exon-mapped reads and exon-exon junction reads.
b Construction of gene-specific exon graphs, where nodes represent exons and edges represent junctions, aggregated to form an exon graph for each cell.
c Learning cell embeddings from exon-level quantification and junction reads through a Variational Graph Autoencoder (VGAE). Each exon graph is converted into feature matrices (\(\mathbf{X}_i\)) and normalized adjacency matrices (\(\mathbf{A}_i^{N}\)), which are processed by a Graph Attention Network (GAT) layer to capture exon dependencies. The output (\(\mathbf{H}_i\)) from the GAT layer is then passed to a Variational Autoencoder (VAE) that projects graph representations into a latent space (Z), defined by mean (μ) and standard deviation (σ) parameters, with a KL divergence term weighted by a hyperparameter (β) to regularize the latent space. The decoders reconstruct both the feature matrix (\(\mathbf{X}_i'\)) and raw adjacency matrix (\(\mathbf{A}_i'\)), with losses weighted by a hyperparameter (λ) to minimize feature and adjacency reconstruction errors, thereby learning cell-specific embeddings.
d Construction of a K-nearest neighbor (KNN) graph in the latent space for refining and aggregating junction reads from neighboring cells based on consensus (majority voting), which enhances junction coverage for downstream splicing analysis.
e Calculation of percent-spliced-in (PSI) values from aggregated junction reads, enabling accurate alternative splicing inference at the single-cell level.
f High-resolution cell embeddings generated by DOLPHIN improve the characterization of cellular heterogeneity compared to conventional gene count-based methods.
g Detection of exon-specific markers and identification of biological pathways that are often missed in gene-level analyses. Exon-level biomarkers were identified through differential expression analysis using MAST.
h Extensive alternative splicing analysis enabled by DOLPHIN across diverse cellular populations. By default, PSI values and splicing modalities were quantified using Expedition. However, DOLPHIN can be adapted to work with other alternative splicing quantification tools.
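
The training objective described in panel c can be written schematically as follows. This is a sketch under stated assumptions, not the authors' exact loss: the caption says only that the KL term is weighted by β and that the feature and adjacency reconstruction losses are weighted by λ. The generic loss terms \(\mathcal{L}_{\mathrm{feat}}\) and \(\mathcal{L}_{\mathrm{adj}}\), the placement of λ on the adjacency term, and the standard-normal prior are illustrative choices.

```latex
% Per-cell objective (assumed form): reconstruction terms plus a beta-weighted KL term
\[
\mathcal{L}_i
  = \mathcal{L}_{\mathrm{feat}}\!\left(\mathbf{X}_i, \mathbf{X}_i'\right)
  + \lambda\, \mathcal{L}_{\mathrm{adj}}\!\left(\mathbf{A}_i, \mathbf{A}_i'\right)
  + \beta\, D_{\mathrm{KL}}\!\left(
      \mathcal{N}\!\left(\mu_i, \operatorname{diag}(\sigma_i^{2})\right)
      \,\middle\|\, \mathcal{N}(\mathbf{0}, \mathbf{I})
    \right)
\]
% Closed form of the KL term for a diagonal Gaussian against a standard normal prior
\[
D_{\mathrm{KL}}
  = -\tfrac{1}{2} \sum_{d} \left( 1 + \log \sigma_{i,d}^{2} - \mu_{i,d}^{2} - \sigma_{i,d}^{2} \right)
\]
```

Under this reading, a larger β pulls the latent space (Z) more strongly toward the prior, while λ sets the balance between reconstructing exon features and reconstructing junction structure.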
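
The neighbor-based junction aggregation in panel d and the PSI calculation in panel e can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration rather than DOLPHIN's or Expedition's implementation: the function names (`knn_indices`, `aggregate_junctions`, `psi_skipped_exon`) are hypothetical, distance is Euclidean, "majority voting" is taken to mean that a junction is detected (count > 0) in a strict majority of a cell's K neighbors, and PSI uses the generic skipped-exon definition with the two inclusion junctions averaged.

```python
import math


def knn_indices(Z, k):
    """For each cell, return the indices of its k nearest cells in latent space."""
    neighbours = []
    for i, zi in enumerate(Z):
        # Euclidean distance to every other cell (assumed metric).
        dists = sorted((math.dist(zi, zj), j) for j, zj in enumerate(Z) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def aggregate_junctions(J, Z, k):
    """Borrow junction reads from each cell's k nearest neighbours.

    Neighbour reads for a junction are added only when that junction is
    detected (count > 0) in a strict majority of the neighbours
    (assumed consensus rule).
    """
    aggregated = []
    for i, nbrs in enumerate(knn_indices(Z, k)):
        row = []
        for j, own in enumerate(J[i]):
            nbr_counts = [J[n][j] for n in nbrs]
            votes = sum(c > 0 for c in nbr_counts)
            borrowed = sum(nbr_counts) if votes > k / 2 else 0
            row.append(own + borrowed)
        aggregated.append(row)
    return aggregated


def psi_skipped_exon(inc_up, inc_down, exc):
    """Generic PSI for one skipped exon: inclusion / (inclusion + exclusion).

    The two inclusion junctions are averaged so they are comparable to the
    single exclusion junction. Returns None when no reads support the event.
    """
    inc = (inc_up + inc_down) / 2
    total = inc + exc
    return inc / total if total > 0 else None


# Toy data: six cells in a 2-D latent space forming two tight clusters,
# with read counts for three junctions per cell.
Z = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
J = [[0, 4, 0], [3, 2, 0], [2, 0, 1], [0, 0, 5], [0, 0, 0], [0, 1, 6]]

agg = aggregate_junctions(J, Z, k=2)
psi = psi_skipped_exon(inc_up=8, inc_down=12, exc=10)
```

In the toy data, cell 4 has no junction reads of its own but gains coverage for the third junction because both of its neighbors detect it, while junctions seen in only one neighbor are not borrowed. Stricter or looser consensus thresholds would trade coverage against the risk of importing reads from dissimilar cells.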
