Fig. 1: BEANIE (group Biology EstimAtioN in sIngle cEll).

A Method Overview: Tumor cells from multiple patient samples are clustered to identify tumor states, and BEANIE then focuses on the states shared between the two patient groups. For each shared tumor state, user inputs are a gene-by-cell count matrix, sample and group IDs, and a list of gene signatures to test (test signatures, or t_signatures). Test signatures are first binned by size (i.e., the number of genes per signature). For each bin, a list of background signatures (b_signatures) of equivalent size is generated by random gene sampling; these provide the control distribution for the subsequent p value calculations. Every cell is scored for both the test and the background signatures, and differential expression analysis then identifies gene signatures that are statistically significant and robust.

B Differential Expression Workflow: Differential expression testing uses a Monte Carlo approximation of empirical p values through subsampling, combined with leave-one-out cross-validation in which individual patient samples are excluded. The data (count matrix) is first divided into folds, with each fold f_q excluding one sample from either comparison group. In the subsampling step (Monte Carlo simulation), an equal number of cells is sampled from each patient to ensure balanced representation. A Mann-Whitney U test is then performed on each subsample of each fold, for both the test gene signatures and the background (random) gene signatures. Each test gene signature is matched to the background distribution of its size bin, and an empirical p value is computed (reflecting the median percentile of the test distribution relative to the background). Additionally, a Fold Rejection Ratio (FRR) (see “Methods”) is calculated per test gene signature for each fold, providing a measure of how robust each gene signature is to the exclusion of a patient sample.
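
The workflow in panel B can be illustrated with a minimal Python sketch. This is not the BEANIE implementation: the function names (`leave_one_out_folds`, `subsample_cells`, `fold_pvalues`, `empirical_p`) are hypothetical, per-cell signature scores are assumed to be precomputed, and the empirical p value is assumed to rank the median Mann-Whitney p value of a test signature among the medians of its size-matched background signatures, with a +1 pseudocount. The FRR is left out because its definition is given in “Methods”.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def leave_one_out_folds(samples):
    """One fold per patient sample; each fold keeps every sample except one."""
    return [[s for s in samples if s != held_out] for held_out in samples]


def subsample_cells(sample_ids, kept_samples, n_cells, rng):
    """Draw the same number of cell indices from every retained sample."""
    sample_ids = np.asarray(sample_ids)
    picks = []
    for s in kept_samples:
        cells = np.flatnonzero(sample_ids == s)
        # Assumption: sample with replacement only if a sample has too few cells.
        picks.append(rng.choice(cells, size=n_cells, replace=cells.size < n_cells))
    return np.concatenate(picks)


def fold_pvalues(scores, sample_ids, sample_group, kept_samples,
                 n_cells, n_subsamples, rng):
    """Mann-Whitney U p value for each Monte Carlo subsample of one fold."""
    scores = np.asarray(scores)
    cell_group = np.array([sample_group[s] for s in sample_ids])
    group_a, group_b = sorted(set(sample_group.values()))  # exactly two groups
    pvals = np.empty(n_subsamples)
    for i in range(n_subsamples):
        idx = subsample_cells(sample_ids, kept_samples, n_cells, rng)
        a = scores[idx][cell_group[idx] == group_a]
        b = scores[idx][cell_group[idx] == group_b]
        pvals[i] = mannwhitneyu(a, b, alternative="two-sided").pvalue
    return pvals


def empirical_p(test_pvals, background_pvals):
    """Rank the test signature's median p value among background medians.

    background_pvals has shape (n_background_signatures, n_subsamples).
    Assumption: a smaller median is more extreme; +1 pseudocount applied.
    """
    test_median = np.median(test_pvals)
    background_medians = np.median(background_pvals, axis=1)
    more_extreme = np.sum(background_medians <= test_median)
    return (1 + more_extreme) / (1 + background_medians.size)
```

In this sketch, `fold_pvalues` would be called once per fold for each test signature and for each background signature in the same size bin; the resulting arrays are then passed to `empirical_p` to obtain one empirical p value per test signature and fold.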