Fig. 2: Model Evaluation. | Nature Communications

Fig. 2: Model Evaluation.

From: A Bayesian model for unsupervised detection of RNA splicing based subtypes in cancers


a Error in Ψ variance estimation under Gaussian (red), Beta (blue), and Beta-Binomial (green) models as a function of LSV coverage (x-axis). The absolute error in the Ψ variance estimates (y-axis) is measured against the true variance, assuming a Beta(10,90) distribution. Inset histograms show the empirical distributions of LSV coverage in the BeatAML and TARGET data.
b Error in \(\hat{\Psi}\) estimates under a naive and an empirical shrinkage model as a function of read coverage (x-axis; each point is based on 1000 samples from the same Beta distribution as in (a)). The naive approach uses only read ratios to estimate \(\hat{\Psi}\), while the shrinkage model uses the expectation over the posterior for the Beta. Error bars represent the 90% confidence interval for the error in Ψ.
c Correlation between Ψ and \(\hat{\Psi}\) estimates under a naive (left) and an empirical shrinkage model (right). Ψ was sampled as in (a), while the number of reads n, represented by the grey scale, was sampled randomly from [10,500].
d Information gained (Supplementary Note 2.2) from missing signals. A background matrix of 100 samples and 100 LSVs with a fixed missingness rate of 10% was used, into which a signal tile was implanted. The signal tile consisted of 50 samples and a varying number of LSVs (x-axis) with an elevated missingness rate of 60%. The observed values in both tile and background were drawn from the same distribution. Green represents the CHESSBOARD model (MNAR); red represents a missing completely at random (MCAR) version of CHESSBOARD. As a reference, we also plot (gray) the information gain from a similarly sized signal tile where the signal is based on a significantly different Ψ distribution, simulated with parameters estimated from real data (Supplementary Note 2.3). Under the MNAR model (green), information gain increases with the number of missing signals.
e Evaluation of CHESSBOARD’s performance (top right) on synthetic data sampled to mimic BeatAML, compared to hierarchical clustering (bottom left) and spectral co-clustering (bottom right). Ψ values are represented as a heatmap, sample groups as colored bars, and tiles as red rectangles. Note that tiles may appear permuted. Performance was evaluated using a modified, permutation-invariant version of the recovery relevance score (Supplementary Note 2.2).
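
The comparison in panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the shrinkage prior is exactly the Beta(10,90) used for simulation (an empirical prior would instead be estimated from data), it reports mean absolute error rather than the 90% interval, and the function names `naive_psi`, `shrinkage_psi`, and `mean_abs_errors` are hypothetical.

```python
import random

# Assumed prior: the Beta(10, 90) from the caption's simulation. An empirical
# prior would be estimated from data; fixing it here is a simplification.
PRIOR_A, PRIOR_B = 10.0, 90.0


def naive_psi(k, n):
    """Read-ratio estimate: inclusion reads k over total reads n."""
    return k / n


def shrinkage_psi(k, n, a=PRIOR_A, b=PRIOR_B):
    """Posterior mean of Beta(a + k, b + n - k) under a Beta(a, b) prior."""
    return (a + k) / (a + b + n)


def mean_abs_errors(n_reads, n_samples=1000, seed=0):
    """Mean absolute error of both estimators at a fixed read coverage."""
    rng = random.Random(seed)
    naive_err = shrunk_err = 0.0
    for _ in range(n_samples):
        psi = rng.betavariate(PRIOR_A, PRIOR_B)  # true Psi for this draw
        # Binomial read count simulated as a sum of Bernoulli trials.
        k = sum(rng.random() < psi for _ in range(n_reads))
        naive_err += abs(naive_psi(k, n_reads) - psi)
        shrunk_err += abs(shrinkage_psi(k, n_reads) - psi)
    return naive_err / n_samples, shrunk_err / n_samples


if __name__ == "__main__":
    for n in (10, 50, 200):
        naive, shrunk = mean_abs_errors(n)
        print(f"n={n:4d}  naive={naive:.4f}  shrinkage={shrunk:.4f}")
```

Under these assumptions, the shrinkage estimate should show the largest advantage at low coverage, where the prior pseudo-counts outweigh the observed reads, and the two estimators should converge as coverage grows.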
