Figure 1 | Scientific Reports

Figure 1

From: Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

Overview of the svMIL2 method and performance. (a) svMIL2 methodology. From disrupted TADs, pairs are identified between SVs and genes disrupted due to gained or lost regulatory elements. These SV-gene pairs are modeled as bags (keychain), in which the regulatory elements (eQTLs, enhancers or super enhancers) that the gene gained or lost due to the SV are instances (keys). Instances are described with features such as histone marks (see panel b). A similarity score is constructed between bags and instances by computing the absolute distance from the mean instance of each bag to all other instances. The resulting similarity matrix is used as input to a random forest model to classify bags. (b) All features used in the svMIL2 model to describe instances, grouped by feature category. (c) Performance, measured as AUC, of the svMIL2 model on 12 cancer types from the HMF dataset.
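
The bag-to-instance similarity step in panel (a) might be sketched as follows. This is a minimal illustration, not the svMIL2 implementation: the function names (`mean_instance`, `similarity_matrix`), the toy feature values, and the reading of "absolute distance" as a sum of absolute feature differences are all assumptions. The sketch also compares each bag's mean instance against every instance, including the bag's own, which the caption does not specify.

```python
from statistics import fmean


def mean_instance(bag):
    """Feature-wise mean of a bag's instances (each instance is a list of floats)."""
    return [fmean(column) for column in zip(*bag)]


def similarity_matrix(bags):
    """Build one row per bag and one column per instance pooled over all bags.

    Entry [i][j] is the summed absolute feature difference between the mean
    instance of bag i and instance j (an assumed reading of the caption).
    """
    all_instances = [instance for bag in bags for instance in bag]
    rows = []
    for bag in bags:
        centre = mean_instance(bag)
        rows.append(
            [sum(abs(c - x) for c, x in zip(centre, instance))
             for instance in all_instances]
        )
    return rows


# Hypothetical toy data: two SV-gene pairs (bags), two features per instance.
bags = [
    [[1.0, 0.0], [3.0, 2.0]],  # bag 0: two regulatory elements
    [[0.0, 0.0]],              # bag 1: one regulatory element
]
matrix = similarity_matrix(bags)
for row in matrix:
    print(row)
```

Each row of `matrix` is a fixed-length description of one bag, which is what allows a standard classifier such as a random forest to be trained on bags with differing numbers of instances.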
