Fig. 1: Overview of the OncoMark framework.

Single-cell transcriptomic data from multiple cancer types undergo quality control to remove low-quality cells. Each cell is then scored for hallmark gene-expression signatures and given a binary annotation (Yes/No) indicating the presence or absence of each hallmark. The annotated single cells are aggregated to create synthetic pseudo-bulk datasets, one per hallmark. A multi-task neural network (M-TNN) is trained on these synthetic data: it learns a feature representation shared across all hallmarks, and hallmark-specific output layers enable accurate prediction of the presence of each hallmark.
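
The workflow in this caption can be illustrated with a minimal sketch on toy data. Everything specific here is an assumption made for illustration and is not taken from OncoMark: the random count matrix, the three hypothetical signatures, the mean log-expression score, the median cut-off for the Yes/No label, sum-based pseudo-bulk aggregation, and the single shared hidden layer. The network is only initialised and run forward; training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy count matrix: 200 cells x 50 genes (hypothetical data, post-QC).
n_cells, n_genes = 200, 50
expr = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)

# Hypothetical hallmark signatures, each a set of gene indices.
signatures = {
    "hallmark_A": np.arange(0, 10),
    "hallmark_B": np.arange(10, 25),
    "hallmark_C": np.arange(25, 40),
}

def score_cells(expr, gene_idx):
    """Mean log-expression of the signature genes, one score per cell (assumed score)."""
    return np.log1p(expr[:, gene_idx]).mean(axis=1)

def annotate(scores):
    """Binary Yes/No label: True where the score exceeds the median (assumed rule)."""
    return scores > np.median(scores)

def pseudo_bulk(expr, mask, n_samples, cells_per_sample, rng):
    """Sum randomly drawn cells from one label group into pseudo-bulk profiles."""
    pool = np.flatnonzero(mask)
    draws = rng.choice(pool, size=(n_samples, cells_per_sample), replace=True)
    return expr[draws].sum(axis=1)  # shape: (n_samples, n_genes)

def init_mtnn(n_in, n_hidden, n_tasks, rng):
    """Parameters for one shared hidden layer plus one output unit per hallmark."""
    return {
        "W_shared": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(0.0, 0.1, size=(n_tasks, n_hidden)),
        "b_heads": np.zeros(n_tasks),
    }

def forward(params, x):
    """Shared ReLU representation followed by hallmark-specific sigmoid heads."""
    h = np.maximum(0.0, x @ params["W_shared"] + params["b_shared"])
    logits = h @ params["W_heads"].T + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))

# Build one synthetic pseudo-bulk dataset per hallmark (8 positive, 8 negative).
datasets = {}
for name, idx in signatures.items():
    labels = annotate(score_cells(expr, idx))
    pos = pseudo_bulk(expr, labels, 8, 20, rng)
    neg = pseudo_bulk(expr, ~labels, 8, 20, rng)
    X = np.log1p(np.vstack([pos, neg]))
    y = np.concatenate([np.ones(8), np.zeros(8)])
    datasets[name] = (X, y)

# Untrained forward pass: one probability per sample and hallmark.
params = init_mtnn(n_genes, 16, len(signatures), rng)
probs = forward(params, datasets["hallmark_A"][0])
```

Here `probs` has one row per pseudo-bulk sample and one column per hallmark. In a trained model, each column would be fitted against the labels of its own hallmark dataset while the shared layer receives gradients from all of them.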