Figure 3
From: Across-cohort QC analyses of GWAS summary statistics from complex traits

Pseudo profile score regression (PPSR) for pinpointing overlapping samples/relatives. (a) Each cluster represents a pair of cohorts, as denoted on the x axis. Within each cluster, from left to right: the overlapping controls detected using λmeta, based on either effect size estimates or minor allele frequency (MAF), and using PPSR with 100, 200, and 500 markers. WTCCC cohort codes: BD for bipolar disorder, CAD for coronary artery disease, CD for Crohn’s disease, HT for hypertension, RA for rheumatoid arthritis, T1D for type 1 diabetes, and T2D for type 2 diabetes. (b) Illustration of the regression coefficients between WTCCC BD and CAD from 57 pseudo profile scores (PPS) generated from 500 markers. The x axis is the PPSR regression coefficient and the y axis is the real genetic relatedness (as calculated from individual-level genotype data). The red points are the controls shared between the two cohorts, and the blue points are first-degree relatives. (c) The PPSR regression coefficients for detecting overlapping first-degree relatives using 286 PPS generated from 500 markers. (d) Decoding genotypes from the PPS. Given the set of profile scores, one may run a GWAS-like analysis to infer the genotypes. The ratio between the number of markers (M) and the number of pseudo profile scores (K) determines how much individual-level information can potentially be recovered. The higher the ratio, and the higher the allele frequency, the less information can be recovered. The y axis is an R² metric representing the agreement between the inferred genotypes and the real genotypes. From left to right, the panels show 100, 200, 500, and 1000 SNPs used to generate 10, 20, 50, and 1000 profile scores. In each cluster, the three bars are the inference accuracy for alleles from different parts of the MAF spectrum, shown with the SE of the mean.
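The idea behind panels (b) and (c) can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the function names, the standard-normal marker weights, the standardization by known allele frequencies, and the simulated parent-offspring pair are all assumptions made for illustration. Two cohorts score their individuals with the same K random weight vectors over M markers, and the regression of one individual's K scores on another's then approximates their genetic relatedness (about 1 for a shared sample, about 0.5 for a first-degree relative, about 0 for unrelated pairs).

```python
import numpy as np


def standardize(genotypes, freqs):
    """Scale allele counts (0/1/2) to mean 0, variance 1 per marker (assumed known freqs)."""
    return (genotypes - 2.0 * freqs) / np.sqrt(2.0 * freqs * (1.0 - freqs))


def pseudo_profile_scores(genotypes, freqs, weights):
    """Return an (individuals x K) matrix of scores from an (M x K) weight matrix."""
    return standardize(genotypes, freqs) @ weights


def ppsr_coefficients(scores_a, scores_b):
    """coef[i, j] = slope of regressing individual j (cohort B) on individual i (cohort A)
    across the K pseudo profile scores."""
    a = scores_a - scores_a.mean(axis=1, keepdims=True)
    b = scores_b - scores_b.mean(axis=1, keepdims=True)
    cov = a @ b.T / a.shape[1]
    var_a = (a ** 2).mean(axis=1, keepdims=True)
    return cov / var_a


def simulate_demo(n=30, n_markers=500, n_scores=286, seed=0):
    """Hypothetical demo: B[0] is the same person as A[0]; B[1] is a child of A[1]."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=n_markers)

    # Cohort A: keep both haplotypes so that a child can be simulated from A[1].
    hap1 = rng.binomial(1, freqs, size=(n, n_markers))
    hap2 = rng.binomial(1, freqs, size=(n, n_markers))
    geno_a = hap1 + hap2

    # Cohort B: unrelated individuals, then plant one overlap and one relative.
    geno_b = rng.binomial(2, freqs, size=(n, n_markers))
    geno_b[0] = geno_a[0]
    transmitted = np.where(rng.random(n_markers) < 0.5, hap1[1], hap2[1])
    geno_b[1] = transmitted + rng.binomial(1, freqs)

    # Both cohorts must use the same weights for the scores to be comparable.
    weights = rng.standard_normal((n_markers, n_scores))
    scores_a = pseudo_profile_scores(geno_a, freqs, weights)
    scores_b = pseudo_profile_scores(geno_b, freqs, weights)
    return ppsr_coefficients(scores_a, scores_b)


if __name__ == "__main__":
    coef = simulate_demo()
    print("shared sample:", round(float(coef[0, 0]), 3))
    print("parent-offspring:", round(float(coef[1, 1]), 3))
    print("unrelated example:", round(float(coef[2, 3]), 3))
```

In this sketch the sampling noise of each coefficient shrinks as K grows, which is why a larger number of scores helps when separating first-degree relatives from unrelated pairs, while a larger K relative to M exposes more individual-level information, as panel (d) describes.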