Fig. 3: Predicting transgenic assay activity using an MPRA-based, coverage-corrected model.

a Examples of VISTA elements with complete (top) and zero (bottom) coverage of their sequence-conserved cores by MPRA tiles. Conservation is shown as the UCSC PhastCons track for 30 mammals (27 primates). The MPRA tile with the highest activity (used for modeling) has a thicker border. MPRA elements are colored by MPRA activity (see inset). b Visualization of input variables for the GLM. Top: transgenic assay (VISTA) elements are binarized according to activity in the chosen tissue (here: neural; jitter added for visualization). The blue line is the binomial GLM fit on this variable. Bottom: the relationship between fraction of conserved core covered and MPRA activity is modeled as a covariate. The blue line is the binomial GLM fit on this variable. c Results of the GLM predicting binary transgenic assay activity from MPRA activity and fraction of conserved core covered. Asterisks indicate p < 0.05 (likelihood ratio test, no multiple testing correction). Boxed percentages to the right are Nagelkerke R² values. Bars extend two standard errors of the mean in each direction. DRG = dorsal root ganglia. Face-mesen = facial mesenchyme. The cranial nerves category does not include the trigeminal nerve, as per the VISTA Browser.
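The quantities reported in panel c (a binomial GLM, a likelihood ratio test, and Nagelkerke R²) can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions that the caption does not confirm: a logit link, additive terms for MPRA activity and coverage fraction, and a likelihood ratio test that compares the full model against a coverage-only model. The data are synthetic, and all names (`mpra`, `coverage`, `fit_logistic`, and so on) are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logit-link binomial GLM by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                      # binomial variance weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    ll = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, ll

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2

def lrt_pvalue_df1(ll_reduced, ll_full):
    """Likelihood ratio test p-value for one extra parameter (chi-square, 1 df)."""
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) survival function

# Synthetic stand-in data (ASSUMPTION: not the VISTA/MPRA data).
rng = np.random.default_rng(0)
n = 300
mpra = rng.normal(size=n)                      # hypothetical MPRA activity of the top tile
coverage = rng.uniform(0.0, 1.0, size=n)       # hypothetical fraction of conserved core covered
true_logit = -1.0 + 1.0 * mpra + 1.0 * coverage
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ones = np.ones(n)
_, ll_null = fit_logistic(ones[:, None], y)                          # intercept only
_, ll_reduced = fit_logistic(np.column_stack([ones, coverage]), y)   # coverage only
beta_full, ll_full = fit_logistic(np.column_stack([ones, mpra, coverage]), y)

r2 = nagelkerke_r2(ll_null, ll_full, n)
p_value = lrt_pvalue_df1(ll_reduced, ll_full)  # tests the MPRA activity term
print(f"Nagelkerke R2 = {r2:.3f}, LRT p-value = {p_value:.3g}")
```

In this sketch the test asks whether MPRA activity improves the fit once coverage is accounted for; testing a different term, or adding an interaction, would change only which pair of nested models is compared.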