Supplementary Figure 6: Models of the joint genotype-expression distribution with varying numbers of parameters for a positively correlated eQTL.
From: Quantification of private information leakage from phenotype-genotype data: linking attacks

(a) The true genotype-expression distribution. Grey boxes represent the expression distributions given each genotype. The red line indicates the gradient of the correlation between genotype and expression. (b) First simplification of the joint distribution. The expression distribution given each genotype is modeled as a Gaussian with its own mean and variance, for a total of 6 parameters (three means and three variances). (c) Simplification of the joint distribution with equal variances. The variances are assumed to be equal across genotypes, resulting in a 4-parameter model (three means and one shared variance). (d) A representation with uniform expression distributions given genotype, which requires 4 parameters (e1, e2, e3, and e4). The conditional distribution of expression is uniform (blue rectangles) over the ranges (e1, e2), (e2, e3), and (e3, e4) given genotypes 0, 1, and 2, respectively. The transparent grey rectangles show the original distributions. (e) A simplification of (d) in which the conditional probability of expression given genotype 1 is zero. In this model, only one parameter (emid) is necessary. The conditional distributions of expression given genotypes 0 and 2 are uniform over expression levels below emid and above emid, respectively (shown with blue rectangles). The original distribution is shown with grey rectangles for comparison. Extremity-based prediction uses an instantiation of the model in (e).
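
The one-parameter model in (e) can be read as a simple decision rule, sketched below in Python for illustration only. The function names `predict_genotype` and `predict_all` are hypothetical, as is the default choice of the sample median for emid and the handling of values exactly equal to emid; the caption does not specify either, and this is not presented as the authors' implementation.

```python
from statistics import median


def predict_genotype(expression, e_mid):
    """Predict a genotype under model (e) for a positively correlated eQTL.

    Genotype 1 has zero conditional probability in this model, so only
    genotypes 0 and 2 are ever returned.
    """
    # Expression below e_mid maps to genotype 0, otherwise to genotype 2.
    # Assigning a value exactly equal to e_mid to genotype 2 is an
    # arbitrary choice made for this sketch.
    return 0 if expression < e_mid else 2


def predict_all(expressions, e_mid=None):
    """Apply the rule to a list of expression levels for one eQTL gene."""
    if e_mid is None:
        # Assumption for illustration: use the sample median as e_mid.
        e_mid = median(expressions)
    return [predict_genotype(e, e_mid) for e in expressions]


if __name__ == "__main__":
    levels = [1.0, 2.0, 3.0, 4.0]
    print(predict_all(levels))
```

For a negatively correlated eQTL the two outcomes would presumably be swapped, but that case is outside what this figure describes.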