Extended Data Fig. 7: Overview of DeepLiver models and their predictive performance on previously tested enhancers.

a. Loss and accuracy curves for the DeepLiver models. The grey dashed lines indicate the epoch selected for each model. b. ROC (receiver operating characteristic) and PR (precision-recall) values on test data per topic for the DeepLiver accessibility model. The red line shows the values for a random classifier, the grey line the values on the training data and the black line the values on the validation data. c. ROC and PR values per topic for the DeepLiver zonation model. The red line shows the values for a random classifier, the grey line the values on the training data and the black line the values on the validation data. d. ROC and PR values per topic for the DeepLiver activity model. The red line shows the values for a random classifier, the grey line the values on the training data and the black line the values on the validation data. e. Correlation between enhancer activity measured by Smith et al.38 and DeepLiver activity predictions (n = 4,966), coloured by the number of motif instances in each sequence (red scale). f. Correlation between in silico and experimental saturation mutagenesis for sequences tested in vivo by Patwardhan et al.41 (AldoB, ECR11 and LTV1) and in HepG2 by Kircher et al.42 (F9, LDLR and SORT1). g. DeepExplainer and saturation mutagenesis plots for the accessibility, zonation and activity models on the LTV1 promoter (mm9: chr7:29161343-29161843), with motifs highlighted. Experimental saturation mutagenesis, shown below, was performed in this enhancer by Patwardhan et al.41. h. Correlation between DeepLiver in silico mutagenesis and experimental saturation mutagenesis in the LTV1 promoter. The blue line represents the fitted linear regression and the grey band represents the 95% confidence interval. Source numerical data are available in source data.
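
The comparison in panels f and h can be illustrated with a minimal sketch. It is not the DeepLiver implementation: `toy_score` is a hypothetical stand-in for a trained model, the example sequence is arbitrary, and the use of Pearson correlation is an assumption, since the legend does not state which coefficient was used. The sketch scores every single-nucleotide substitution relative to the reference sequence and correlates those predicted effects with measured effects keyed by the same (position, base) pairs.

```python
from math import sqrt

BASES = "ACGT"

def in_silico_saturation_mutagenesis(seq, score_fn):
    """Map (position, alt_base) to score(mutant) - score(reference)
    for every possible single-nucleotide substitution in seq."""
    ref_score = score_fn(seq)
    deltas = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            deltas[(pos, alt)] = score_fn(mutant) - ref_score
    return deltas

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlate_effects(predicted, measured):
    """Correlate predicted and measured effects over the substitutions
    present in both dictionaries."""
    shared = sorted(set(predicted) & set(measured))
    return pearson([predicted[k] for k in shared],
                   [measured[k] for k in shared])

def toy_score(seq):
    # Hypothetical scoring function (assumption): counts occurrences of an
    # arbitrary motif-like string instead of calling a trained network.
    return float(seq.count("GATA"))

# Arbitrary example sequence, not taken from the paper.
deltas = in_silico_saturation_mutagenesis("TTGATAACC", toy_score)
print(len(deltas))  # 27 substitutions (9 positions x 3 alternative bases)
```

In practice `score_fn` would wrap a model prediction for the class of interest, and `measured` would hold the experimental effect sizes for the same substitutions.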