Fig. 5: Pipeline components increase the difficulty of causal gene identification in simulated patients. | Nature Communications

From: Simulation of undiagnosed patients with novel genetic conditions

We run a gene prioritization algorithm on patients simulated by our pipeline while varying which subsets of pipeline components are included. We report the fraction of simulated patients for whom the causal gene was prioritized within the top k ranked genes for varying k (horizontal axis for all plots) under each configuration of simulation pipeline components (vertical axis for all plots). The average rank of the causal gene is listed in italics at the base of each bar. The color and width of each stacked bar section correspond to a causal gene rank grouping. We show gene prioritization performance on simulated patients produced when the following components are included in the simulation pipeline: (a) neither phenotype- nor gene-based components (i.e., candidate genes sampled randomly and phenotype terms unaltered from initialization), all standalone phenotype-altering components alone, all distractor gene modules alone, or all pipeline components together; (b) a “gene-only” version of the distractor gene modules and each possible combination of subsets of phenotype-altering components; (c) all three standalone phenotype-altering components and all but one distractor gene module at a time. Note that in (b), the horizontal purple lines in the vertical axis labels are for visual clarity only, whereas in (c), the horizontal black lines in the vertical axis signify set difference. Source data are provided as a Source Data file.
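
The quantities described in the caption (fraction of patients with the causal gene in the top k, the rank groupings that form the stacked bar sections, and the average causal gene rank) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the cutoffs in `ks`, and the example ranks are all hypothetical and are not taken from the figure or its source data.

```python
from statistics import mean

# Hypothetical cutoffs; the figure's actual values of k are not stated in the caption.
DEFAULT_KS = (1, 5, 10, 25)


def top_k_fractions(causal_ranks, ks=DEFAULT_KS):
    """Fraction of patients whose causal gene is ranked within the top k, for each k."""
    if not causal_ranks:
        raise ValueError("causal_ranks must not be empty")
    n = len(causal_ranks)
    return {k: sum(rank <= k for rank in causal_ranks) / n for k in ks}


def rank_group_fractions(causal_ranks, ks=DEFAULT_KS):
    """Disjoint rank groupings (one per stacked bar section), as fractions of patients."""
    if not causal_ranks:
        raise ValueError("causal_ranks must not be empty")
    n = len(causal_ranks)
    groups = {}
    lower = 0  # exclusive lower bound of the current grouping
    for k in sorted(ks):
        label = f"{lower + 1}-{k}" if k > lower + 1 else str(k)
        groups[label] = sum(lower < rank <= k for rank in causal_ranks) / n
        lower = k
    # Everything ranked beyond the largest cutoff.
    groups[f">{lower}"] = sum(rank > lower for rank in causal_ranks) / n
    return groups


# Made-up causal gene ranks (1 = best) for eight simulated patients
# under a single pipeline configuration.
example_ranks = [1, 3, 7, 12, 40, 2, 1, 60]

print(top_k_fractions(example_ranks))
print(rank_group_fractions(example_ranks))
print(mean(example_ranks))  # average causal gene rank, as listed at the base of each bar
```

Repeating this computation once per pipeline configuration would give one stacked bar per row of each panel; a higher average rank and smaller top-k fractions indicate that the configuration makes causal gene identification harder.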