Fig. 1: Overview of the CorALS framework.

a–c, CorALS leverages feature projections into specialized vector spaces (a) embedded into a flexible computational pipeline (b) for large-scale correlation analysis (c). In particular, CorALS exploits the direct connection between Euclidean distance and the correlation of individual features in correlation space (a, middle), as well as between Euclidean distance and the correlation differences of feature pairs across conditions in differential space (a, right), to derive efficient indexing structures (b, left). These indexes are used in a computational pipeline that splits correlation computations into batches, based on a specifically designed approximation scheme, for effective memory management and parallelization (b, middle). Batches are then joined in a memory-efficient manner to yield the final correlation results (b, right). This enables applications such as full correlation matrix computation and correlation-based feature embeddings (c, left), top correlation network approximations (c, middle) and differential correlation discovery (c, right) for large-scale, high-dimensional datasets. Points represent features (two specific features are denoted as \(x\) and \(y\)), and subindices and colors indicate two conditions (1, blue; 2, orange). \(x_1\) and \(x_2\) (and \(y_1\) and \(y_2\)) are the same feature, illustrated by a cross (square) marker, across the two conditions. In a, feature projections are denoted as
in the middle panel, and \(\delta\) and \(\kappa\) in the right panel. In b, individual features are represented as \(f_i\), and \(c_{i,j}\) is shorthand for the correlation between features \(f_i\) and \(f_j\).
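
The distance-correlation connection described for panel a can be checked numerically. The following is a minimal sketch, not the CorALS implementation: it assumes that each feature is projected by centering it and scaling it to unit length, in which case the squared Euclidean distance between two projected features equals \(2(1 - r)\), where \(r\) is their Pearson correlation. For the differential case it uses one construction consistent with the caption (an assumption, not necessarily the paper's exact definition of \(\delta\) and \(\kappa\)): stacking the projections from both conditions, with the sign of the second condition flipped on one side, gives a squared distance of \(4 - 2(r_1 - r_2)\). The names `project`, `left` and `right` are hypothetical.

```python
import numpy as np

def project(X):
    """Center each feature (column) and scale it to unit Euclidean length."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 6))  # condition 1: 50 samples, 6 features
X2 = rng.normal(size=(50, 6))  # condition 2: the same 6 features

P1, P2 = project(X1), project(X2)
r1 = np.corrcoef(X1, rowvar=False)  # feature-by-feature correlations, condition 1
r2 = np.corrcoef(X2, rowvar=False)  # feature-by-feature correlations, condition 2

# Correlation space: squared distance between projected features is 2 * (1 - r).
sq_dist = ((P1[:, :, None] - P1[:, None, :]) ** 2).sum(axis=0)
corr_identity_holds = np.allclose(sq_dist, 2 * (1 - r1))

# Differential space (assumed construction): stack both conditions and flip
# the sign of condition 2 on one side, so that the squared distance becomes
# 4 - 2 * (r1 - r2).
left = np.vstack([P1, P2])
right = np.vstack([P1, -P2])
sq_dist_diff = ((left[:, :, None] - right[:, None, :]) ** 2).sum(axis=0)
diff_identity_holds = np.allclose(sq_dist_diff, 4 - 2 * (r1 - r2))

print(corr_identity_holds, diff_identity_holds)
```

Because both quantities reduce to Euclidean distances, nearest-neighbor queries on the projected points return the most strongly correlated features (or the feature pairs with the largest correlation differences), which is what makes spatial indexing structures applicable.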