Fig. 1 | Nature Communications

Fig. 1

From: Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes

Overview of the CTPR distributed-memory MPI algorithm for biobank-based GWAS data. Large-scale biobank GWAS data are first divided into q non-overlapping subgroups (GWAS data 1,…,q) containing SNPs in low LD, and each MPI core is assigned to one of the q subgroups. Each subgroup runs on its own MPI core of the computing nodes with its own memory, so each core requires only 1/q of the total memory. The subgroups are then organized into s core-groups (s ≤ q), each of which contains several subgroups. At each estimation step (1,…,t), all MPI cores in one core-group execute simultaneously while the cores in the other core-groups wait until it finishes. In this way, coefficients within a core-group are updated concurrently, and the core-groups are processed consecutively until all coefficients have been updated, which improves computational efficiency and avoids convergence problems. The algorithm thus allows multiple subgroups of SNP coefficients to be updated simultaneously or sequentially at each estimation step, providing a computationally more efficient, or an exact, coordinate descent optimization for polygenic risk prediction.
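The update schedule described in the caption can be illustrated with a small serial sketch. This is not the CTPR implementation: it simulates the MPI cores in a single Python process, uses a plain lasso penalty in place of the cross-trait penalty, and all function names (`partition_snps`, `build_core_groups`, `estimation_step`) and the contiguous partitioning are assumptions made for illustration only.

```python
def partition_snps(n_items, q):
    """Split indices 0..n_items-1 into q non-overlapping contiguous subgroups.
    (Contiguous splitting is an assumption; the caption only says 'non-overlapping'.)"""
    base, extra = divmod(n_items, q)
    groups, start = [], 0
    for k in range(q):
        size = base + (1 if k < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def build_core_groups(q, s):
    """Arrange subgroup ids 0..q-1 into s core-groups (s <= q)."""
    if not 1 <= s <= q:
        raise ValueError("need 1 <= s <= q")
    return partition_snps(q, s)


def soft_threshold(z, lam):
    """Lasso soft-thresholding operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def estimation_step(cols, y, beta, subgroups, core_groups, lam=0.0):
    """One estimation step for 0.5*||y - X beta||^2 + lam*||beta||_1.
    cols[j] is the genotype column of SNP j. Core-groups are processed one
    after another; subgroups inside a core-group all start from the same
    residual snapshot, mimicking cores that run simultaneously."""
    n = len(y)
    for group in core_groups:
        # Residual shared by every core in this core-group.
        resid = [y[i] - sum(cols[j][i] * beta[j] for j in range(len(beta)))
                 for i in range(n)]
        updated = {}
        for k in group:              # conceptually concurrent MPI cores
            local = resid[:]         # each core keeps a private residual copy
            for j in subgroups[k]:   # coordinate descent within the subgroup
                xj = cols[j]
                norm = sum(v * v for v in xj)
                if norm == 0.0:
                    continue
                rho = sum(xj[i] * local[i] for i in range(n)) + norm * beta[j]
                b = soft_threshold(rho, lam) / norm
                delta = b - beta[j]
                for i in range(n):
                    local[i] -= xj[i] * delta
                updated[j] = b
        # Synchronisation point: the other core-groups wait for these values.
        for j, b in updated.items():
            beta[j] = b
    return beta


# Toy data: 4 orthogonal SNP columns, q = 4 subgroups, s = 2 core-groups.
cols = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
y = [1.0, 2.0, 3.0, 4.0]
subgroups = partition_snps(4, 4)
core_groups = build_core_groups(4, 2)
beta = estimation_step(cols, y, [0.0] * 4, subgroups, core_groups)
print(beta)  # [1.0, 2.0, 3.0, 4.0]
```

In this sketch, setting s = q puts one subgroup in each core-group, so every update sees the latest coefficients (sequential, exact coordinate descent), while s = 1 updates all subgroups from the same residual at once. That corresponds to the trade-off the caption describes between the more efficient and the exact variants.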
