Supplementary Figure 2: Model performance assessment by simulation and real data.
From: High-resolution genetic mapping of putative causal interactions between regions of open chromatin

(a), Confusion matrix of mapped interactions for seven different simulated hypotheses. Rows correspond to the true hypotheses and columns to the hypotheses inferred by PHM. The first four hypotheses involve two causal variants for each peak, and the bottom three are hybrid hypotheses combining linkage, pleiotropy and causality. See section 4.14 in the Supplementary Note for details. (b), Scatterplot of the posterior probability of causality from peak k to peak j (PPCkj) against the strength of genetic association between the causal variant and peak j (BFj) under the simulated causal scenario j → k. The red line shows the rate at which the true causal direction (j → k) is misclassified as k → j. This rate is around 40% when the caQTL signal for peak j is very weak and falls below 1% for BFj > 100. (c), Results of correlation analysis across 53 cell types from the Roadmap Epigenomics Project. The top panel shows a heatmap of Spearman’s correlations among 32 chromatin accessibility peaks in the TTC34 gene body. The second panel shows normalized chromatin accessibility (see section 2.8 of the Supplementary Note) at 12 peaks across the 53 cell types; these 12 peaks were selected from the highly correlated block in the top panel. The third panel shows our ATAC-seq coverage stratified by genotype at the putative causal variant (rs4648682[G > A]) within the inferred master regulatory peak. PPCjk values from the master peak are indicated by arrows. (d), Inferred high-confidence (PPCjk = 0.99) causal interaction around the 3ʹ end of the RAP1GAP2 gene. ATAC-seq coverage is stratified by genotype at the putative causal SNP (rs6502671[C > T]) in the left peak. This example also shows a causal interaction spanning two adjacent TADs.
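Panel (a) tabulates, for each simulated true hypothesis, how often PHM infers each hypothesis. A minimal sketch of a row-normalized confusion matrix in that orientation (rows = true, columns = inferred) follows, assuming the simulation output is available as paired lists of true and inferred hypothesis labels. The function name and the toy labels are hypothetical and are not taken from the PHM software.

```python
from collections import Counter


def row_normalized_confusion(true_labels, inferred_labels, hypotheses):
    """Return a confusion matrix whose rows are true hypotheses and whose
    columns are inferred hypotheses, each row scaled to sum to 1.

    A hypothesis that was never simulated yields a row of zeros.
    """
    counts = Counter(zip(true_labels, inferred_labels))
    matrix = []
    for true_h in hypotheses:
        row = [counts[(true_h, inferred_h)] for inferred_h in hypotheses]
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy example with three hypothetical labels (not the seven used in the figure).
m = row_normalized_confusion(["L", "L", "P", "C"], ["L", "P", "P", "C"], ["L", "P", "C"])
print(m)  # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Row normalization makes each diagonal entry the fraction of simulations of that hypothesis that were recovered correctly, which is how a matrix like panel (a) is usually read.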
(e), Box plot of allelic imbalance for the three genotypes at rs6502671 (n = 27, 12 and 2 samples for CC, CT and TT, respectively), estimated from aggregated allele-specific counts at heterozygous SNPs in the downstream (right) peak in d. Allelic imbalance is observed only in individuals heterozygous at rs6502671 (light blue dots). rs6502671 is a strong caQTL for the downstream (right) peak, with allelic imbalance π = 0.383 (P = 1.9 × 10^-35) estimated by RASQUAL (Nat Genet. 48, 206–213, 2016). This suggests that accessibility linked to the T allele at rs6502671 is significantly lower than that linked to the C allele. In the box plots, the box represents the interquartile range (IQR), the black line is the median, and the whiskers extend 1.5 times the IQR below the first quartile and above the third quartile, with data points outside the whiskers shown as open circles. (f), Scatterplot of the ratio of putative TF binding affinities between the reference and alternative alleles at each lead SNP (predicted by a simple hierarchical model) against the ratio of ATAC-seq allele-specific (AS) chromatin accessibility counts (n = 26,213 SNPs). AS counts for each lead SNP were obtained by aggregating AS counts across individuals heterozygous at that SNP. The red line shows the fitted linear regression (β = 0.36, P = 2.0 × 10^-25).
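Panel (f) pools allele-specific counts over heterozygous carriers of each lead SNP and then fits a straight line relating two per-SNP ratios. The sketch below illustrates those two steps under stated assumptions: the genotype encoding, the natural-log scale, the pseudocount and all function names are hypothetical choices for illustration. The caption does not say how the authors transformed the ratios, and this sketch computes only the slope, not the P value.

```python
import math


def pool_heterozygous_counts(samples):
    """Sum reference and alternative allele-specific (AS) read counts over
    individuals heterozygous at one lead SNP.

    `samples` is a list of (genotype, ref_count, alt_count) tuples, where
    genotype is the number of alternative alleles (0, 1 or 2); this encoding
    is hypothetical. Only heterozygotes (genotype == 1) contribute, because
    homozygotes carry only one allele at this SNP.
    """
    ref = sum(r for g, r, _ in samples if g == 1)
    alt = sum(a for g, _, a in samples if g == 1)
    return ref, alt


def log_ratio(ref, alt, pseudocount=0.5):
    """Natural-log ratio of reference to alternative counts.
    The pseudocount (an assumption) avoids log(0) when one allele has no reads."""
    return math.log((ref + pseudocount) / (alt + pseudocount))


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (model with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Toy example with made-up counts for one lead SNP.
ref, alt = pool_heterozygous_counts([(1, 10, 5), (0, 20, 0), (1, 6, 3)])
print(ref, alt)  # 16 8
```

In a full analysis, `log_ratio` would be applied to every lead SNP and `ols_slope` fitted with the log affinity ratios as x; a library routine such as `scipy.stats.linregress` would also supply the P value.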