Fig. 4: Results for the single group gene targeting rates analysis. | Nature Communications


From: Modeling integration site data for safety assessment with MELISSA


A Gene targeting score for the HSPC dataset. Each dot is a gene; genes are ordered by chromosomal location, and genes on even-numbered chromosomes are shown in a lighter color. The y axis shows the signed likelihood ratio test statistic (score) for the null hypothesis that the tested gene’s integration rate equals the genome-wide baseline. Genes marked with * are included in the high-risk gene list. The dashed red line marks the threshold for statistically significant enrichment (586 genes with FDR-adjusted P value < 0.05). 3 replicates, 8204 tested genomic intervals (each containing at least one integration), corresponding to 1959 genes. Source data are provided as a Source Data file.

B Overlap between the sets of significantly over-targeted genes (FDR-adjusted P value < 0.05). 4 replicates for each MSC type; 3019 and 6288 tested genomic intervals for BM MSC and Ad MSC, respectively. Source data are provided as a Source Data file.

C Most targeted genes in MSCs and HSPC. IS enrichment scores for the top 10 most frequently targeted genes in each dataset. Genes associated with documented clonal expansion in gene therapy clinical trials are also included in the heatmap and are marked by a red line under the gene name. Heatmap colors represent the IS enrichment, quantified as the log odds (regression coefficient estimate) associated with the gene effect. Source data are provided as a Source Data file.

D Detailed view of two over-targeted genes, showing cell type-specific and shared integration patterns: NPLOC4 is over-targeted in all cell types, whereas PLEC is over-targeted only in BM MSC and Ad MSC.

E Comparison between expression level (x axis, log10 scale) and the targeting rates (y axis) defined in Fig. 4A. The expression level is the average across 8 samples of raw counts, normalized for differences in sequencing depth across samples. Genes that are not expressed are assigned a default expression level of zero. The red line is the linear regression of gene targeting score against log10 gene expression. Source data are provided as a Source Data file.
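
As an illustration of how a signed likelihood ratio score and an FDR-adjusted P value of the kind described in panel A can be computed, the sketch below uses a simple binomial model. It is not the MELISSA implementation: the model, the function names (`signed_lrt`, `lrt_pvalue`, `benjamini_hochberg`), and the baseline probability `p0` are all assumptions made for this example, and the choice of Benjamini-Hochberg as the FDR procedure is likewise an assumption.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood of k successes in n trials (constant term dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def signed_lrt(k, n, p0):
    """Signed likelihood ratio statistic for H0: integration probability == p0.

    k  : integrations observed in the gene (hypothetical input)
    n  : total integrations in the dataset (hypothetical input)
    p0 : assumed genome-wide baseline probability for this gene, 0 < p0 < 1
    The sign is positive when the gene is targeted more often than baseline.
    """
    p_hat = k / n  # maximum likelihood estimate under the alternative
    stat = 2.0 * (binom_loglik(k, n, p_hat) - binom_loglik(k, n, p0))
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors
    sign = 1.0 if p_hat >= p0 else -1.0
    return sign * stat

def lrt_pvalue(score):
    """Two-sided P value from the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(abs(score) / 2.0))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data: (integrations in gene, total integrations, baseline probability).
genes = [(30, 1000, 0.01), (10, 1000, 0.01), (2, 1000, 0.01)]
scores = [signed_lrt(k, n, p0) for k, n, p0 in genes]
adj_p = benjamini_hochberg([lrt_pvalue(s) for s in scores])
```

In this toy setup a gene would be called significantly over-targeted when its score is positive and its adjusted P value is below 0.05, which mirrors the role of the dashed threshold line in panel A.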
