Fig. 1: Overview of the scGALA Framework.

a Cell Alignment Backbone Module via Graph Link Prediction. The foundational alignment framework operates on each pair of datasets by constructing cell-cell graphs and learning alignments via masked link prediction. Intra-dataset graphs are built using K-nearest neighbors (KNN) based on molecular profiles, optionally incorporating spatial coordinates, while inter-dataset edges are initialized using mutual nearest neighbors (MNN). A Graph Attention Network (GAT) is trained on these graphs using a self-supervised strategy in which a subset of edges is randomly masked during training and the model is optimized to reconstruct them, encouraging the discovery of cross-dataset correspondences. The predicted links are iteratively refined via score-based optimization and merged with the MNN-based priors to yield a high-confidence alignment backbone that enables subsequent integration analyses. b All-in-One Data Integration Pipeline. Building upon the enhanced cell alignment backbone from panel (a), scGALA supports core tasks for single-cell data integration and harmonization, including: (1) batch correction, by removing technical variation guided by predicted alignments while preserving biological heterogeneity; (2) label transfer, by propagating annotations through alignment-based matching; (3) multi-omics integration, by aligning and constructing unified representations across modalities; and (4) spatial alignment, by incorporating spatial coordinates to enable spatial-aware alignment across tissue slices. c Advanced Multimodal Functionalities. 
Leveraging the improved cell alignment independently of the integration pipeline in b, scGALA enables advanced multi-omics functionalities that few existing methods support, including: (1) mosaic integration, where datasets with partially overlapping modalities (e.g., RNA+ATAC and RNA+protein) are jointly integrated to reconstruct unified tri-modal profiles; (2) cross-modality multi-omics imputation and generation, where unmeasured modalities (e.g., RNA for cells profiled only by ATAC) are inferred from alignment-derived mappings to support full-transcriptome analysis; and (3) spatial transcriptomics enhancement, where scRNA-seq data are aligned with spatial datasets to impute genes missing from targeted spatial panels (such as Xenium), increasing transcriptomic resolution and enabling downstream analyses such as spatial domain identification and spatial marker discovery.
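The graph construction described in panel a (KNN edges within each dataset, MNN edges between datasets) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not scGALA's implementation: Euclidean distance on raw feature vectors is assumed, and the helper names (`nearest`, `knn_edges`, `mnn_edges`, `add_spatial`) are hypothetical. The optional spatial term is modeled, as an assumption, by appending weighted coordinates to the molecular features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, pool, k, exclude=None):
    """Indices of the k cells in `pool` closest to `query` (optionally skipping one index)."""
    ranked = sorted(
        (euclidean(query, p), j) for j, p in enumerate(pool) if j != exclude
    )
    return [j for _, j in ranked[:k]]


def add_spatial(X, coords, weight=1.0):
    """Hypothetical helper: append weighted spatial coordinates to molecular features."""
    return [list(x) + [weight * c for c in xy] for x, xy in zip(X, coords)]


def knn_edges(X, k):
    """Directed intra-dataset KNN edges (i, j), excluding self-loops."""
    return {(i, j) for i, x in enumerate(X) for j in nearest(x, X, k, exclude=i)}


def mnn_edges(A, B, k):
    """Inter-dataset mutual nearest-neighbor pairs (i in A, j in B)."""
    a_to_b = {i: set(nearest(a, B, k)) for i, a in enumerate(A)}
    b_to_a = {j: set(nearest(b, A, k)) for j, b in enumerate(B)}
    # Keep a pair only if each cell is among the other's k nearest neighbors.
    return {(i, j) for i, js in a_to_b.items() for j in js if i in b_to_a[j]}


A = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]]
B = [[0.05, 0.0], [5.1, 5.0]]
print(sorted(knn_edges(A, 1)))     # [(0, 1), (1, 0), (2, 1)]
print(sorted(mnn_edges(A, B, 1)))  # [(0, 0), (2, 1)]
```

Cell 1 of A has cell 0 of B as its nearest neighbor, but not the reverse, so the mutuality requirement drops that pair; this is what makes MNN edges a conservative prior.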
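The self-supervised masking strategy in panel a (hide a subset of edges, then score how well the model reconstructs them) can be sketched as follows. This is a simplified sketch, not the authors' training code: the GAT encoder is replaced by a hypothetical `encode` function that simply averages each node with its visible neighbors (no attention), link scores are assumed to be sigmoid-transformed dot products, and only the loss is computed (no gradient-based optimization is shown). The names `mask_edges`, `sample_negatives`, `encode`, and `link_loss` are illustrative.

```python
import math
import random


def mask_edges(edges, mask_ratio, seed=0):
    """Randomly split edges into visible ones (used for message passing)
    and masked ones (held out as reconstruction targets)."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    n_mask = round(mask_ratio * len(shuffled))
    return shuffled[n_mask:], shuffled[:n_mask]


def sample_negatives(n_nodes, known_edges, n_samples, seed=0):
    """Sample ordered node pairs that are not known edges (in either direction)."""
    rng = random.Random(seed)
    known = set(known_edges) | {(j, i) for i, j in known_edges}
    n_free = n_nodes * (n_nodes - 1) - len(known)  # available non-edge pairs
    negatives = set()
    while len(negatives) < min(n_samples, n_free):
        i, j = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if i != j and (i, j) not in known:
            negatives.add((i, j))
    return sorted(negatives)


def encode(X, visible):
    """Placeholder for the GAT: each node averages its own features with those
    of its visible neighbors (uniform weights instead of attention)."""
    nbrs = {i: [i] for i in range(len(X))}
    for i, j in visible:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dim = len(X[0])
    return [[sum(X[n][d] for n in nbrs[i]) / len(nbrs[i]) for d in range(dim)]
            for i in range(len(X))]


def link_loss(Z, positives, negatives):
    """Mean binary cross-entropy of sigmoid(dot-product) link scores."""
    def prob(i, j):
        return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(Z[i], Z[j]))))
    eps = 1e-9
    terms = [-math.log(prob(i, j) + eps) for i, j in positives]
    terms += [-math.log(1.0 - prob(i, j) + eps) for i, j in negatives]
    return sum(terms) / len(terms)


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = {(0, 1), (2, 3), (0, 2), (1, 3)}
visible, masked = mask_edges(edges, mask_ratio=0.5, seed=1)
negatives = sample_negatives(len(X), edges, n_samples=len(masked), seed=1)
loss = link_loss(encode(X, visible), masked, negatives)
```

Because masked edges are excluded from message passing, the encoder cannot simply copy them; it must infer them from the remaining graph structure, which is what pushes a trained model toward discovering unseen (including cross-dataset) links.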
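The final step of panel a, merging predicted links with the MNN priors, can be illustrated with a simple filter-and-union rule. The caption does not specify the score-based optimization, so this sketch assumes a fixed score threshold plus a per-cell top-k cutoff; `merge_alignments`, `threshold`, and `top_k` are hypothetical names, and the actual scGALA refinement is iterative and likely more involved.

```python
def merge_alignments(pred_scores, mnn_pairs, threshold=0.5, top_k=1):
    """Keep each cell's top-k predicted cross-dataset links whose score passes
    `threshold`, then take the union with the MNN prior pairs."""
    candidates = {}
    for (i, j), score in pred_scores.items():
        if score >= threshold:
            candidates.setdefault(i, []).append((score, j))
    kept = set()
    for i, cands in candidates.items():
        # Highest-scoring partners first.
        for _, j in sorted(cands, reverse=True)[:top_k]:
            kept.add((i, j))
    return kept | set(mnn_pairs)


pred = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.7, (2, 1): 0.3}
mnn = {(2, 1)}
print(sorted(merge_alignments(pred, mnn)))  # [(0, 0), (1, 0), (2, 1)]
```

Here the low-scoring prediction (2, 1) would be discarded on its own, but it survives because the MNN prior already supports it; the union lets the predicted links extend coverage without losing the conservative anchors.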
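Label transfer in panel b, "propagating annotations through alignment-based matching", can be sketched as a majority vote over each query cell's aligned reference partners. The voting rule, the `transfer_labels` name, and the `"unknown"` fallback for unaligned cells are assumptions for illustration; scGALA may weight or filter votes differently.

```python
from collections import Counter


def transfer_labels(alignment, ref_labels, n_query, unassigned="unknown"):
    """Assign each query cell the most common label among its aligned reference cells."""
    votes = [Counter() for _ in range(n_query)]
    for q, r in alignment:
        votes[q][ref_labels[r]] += 1
    return [c.most_common(1)[0][0] if c else unassigned for c in votes]


ref_labels = ["T cell", "T cell", "B cell"]
alignment = {(0, 0), (0, 1), (0, 2), (1, 2)}
print(transfer_labels(alignment, ref_labels, n_query=3))
# ['T cell', 'B cell', 'unknown']
```

Leaving unaligned cells as `"unknown"` rather than forcing a label is one way to flag query populations that may be absent from the reference.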
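The cross-modality imputation in panel c (and, by the same logic, imputing genes missing from a targeted spatial panel) can be sketched as a score-weighted average of the measured profiles of aligned reference cells. This assumes alignments come with nonnegative confidence weights; `impute_modality` and the weighting scheme are hypothetical and not taken from scGALA.

```python
def impute_modality(weighted_alignment, ref_profiles, n_query):
    """Impute an unmeasured modality for each query cell as the weighted average
    of its aligned reference cells' profiles; unaligned cells get None."""
    dim = len(ref_profiles[0])
    sums = [[0.0] * dim for _ in range(n_query)]
    weights = [0.0] * n_query
    for (q, r), w in weighted_alignment.items():
        weights[q] += w
        for d in range(dim):
            sums[q][d] += w * ref_profiles[r][d]
    return [[s / weights[q] for s in sums[q]] if weights[q] > 0 else None
            for q in range(n_query)]


# Reference cells with measured RNA; query cells (e.g., ATAC-only) lack it.
ref_rna = [[1.0, 0.0], [3.0, 2.0]]
weighted = {(0, 0): 1.0, (0, 1): 1.0, (1, 1): 0.5}
print(impute_modality(weighted, ref_rna, n_query=3))
# [[2.0, 1.0], [3.0, 2.0], None]
```

For spatial enhancement, the query would be the spatial cells and `ref_rna` the scRNA-seq profiles restricted to the genes the spatial panel did not measure.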
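Batch correction "guided by predicted alignments" (panel b) can be illustrated with an MNN-style correction-vector approach, swapped in here as a stand-in: each aligned query cell is shifted by its mean offset to its reference partners, and unaligned cells receive the global mean offset. The `correct_batch` name and this rule are assumptions; practical methods typically smooth offsets across neighboring cells to better preserve biological heterogeneity.

```python
def correct_batch(query, ref, alignment):
    """Shift each query cell by the mean offset to its aligned reference partners;
    cells without partners receive the global mean offset."""
    dim = len(query[0])
    per_cell = {}
    for q, r in alignment:
        per_cell.setdefault(q, []).append([ref[r][d] - query[q][d] for d in range(dim)])

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    global_offset = mean([o for offs in per_cell.values() for o in offs])
    corrected = []
    for q, x in enumerate(query):
        shift = mean(per_cell[q]) if q in per_cell else global_offset
        corrected.append([xi + si for xi, si in zip(x, shift)])
    return corrected


query = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
ref = [[1.0, 0.0], [2.0, 1.0]]
print(correct_batch(query, ref, {(0, 0), (1, 1)}))
# [[1.0, 0.0], [2.0, 1.0], [11.0, 10.0]]
```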