Extended Data Fig. 1: UK Biobank sample selection.
From: Genetic analysis of dietary intake identifies new loci and functional links with metabolic traits

A total of 502,536 participants were available in UK Biobank at the beginning of this study. Thirty participants withdrew consent during the implementation of the analysis plan. We excluded 892 participants with invalid diet data based on previously defined quality control filtering criteria for diet data. A total of 16,034 participants did not pass the UK Biobank quality control criteria (high heterozygosity and high missing rate, sex aneuploidy, or submitted sex different from inferred sex). We excluded 27,965 participants with self-reported non-European ancestry and 5,955 European ancestry outliers lying beyond ±6 SD from the mean in the subset of 192,025 participants on the first 4 PCs. Among the 451,660 participants remaining for the discovery of genetic variants for dietary intake, 260,503 had missing diet data (n = 258,393) or invalid values (n = 2,110). Because the distribution of macronutrient intake was skewed, we winsorized each phenotype at the mean ± 5 SD.
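
The exclusion counts and the winsorization step can be checked with a short Python sketch. This is an illustration only, not the authors' pipeline: the dictionary labels, the `winsorize` helper, and the use of the sample standard deviation are assumptions made here. The counts themselves are taken from the caption.

```python
from statistics import mean, stdev

# Counts taken from the caption; the labels are shorthand chosen for this sketch.
available = 502_536
exclusions = {
    "withdrew consent": 30,
    "invalid diet data (diet QC filter)": 892,
    "failed UK Biobank QC": 16_034,
    "self-reported non-European ancestry": 27_965,
    "European ancestry PC outliers": 5_955,
}

# Subtracting every exclusion should reproduce the caption's remaining count.
remaining = available - sum(exclusions.values())
assert remaining == 451_660

# The two diet-data subcounts should add up to the reported total.
missing_diet, invalid_diet = 258_393, 2_110
assert missing_diet + invalid_diet == 260_503


def winsorize(values, n_sd=5.0):
    """Clamp values to [mean - n_sd * SD, mean + n_sd * SD].

    Hypothetical helper: uses the sample SD (n - 1 denominator),
    which the caption does not specify.
    """
    m = mean(values)
    sd = stdev(values)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [min(max(v, lower), upper) for v in values]


# Toy data: 99 zeros and one extreme value (mean 10, sample SD 100).
toy = [0.0] * 99 + [1000.0]
clamped = winsorize(toy)
print(remaining, clamped[-1])
```

In the toy data the extreme value is pulled down to the upper bound (mean + 5 SD), while values already inside the bounds are left unchanged.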