Fig. 2: Cohort characteristics.
From: The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies

a, Sex-specific age distribution. b, Top 20 most prevalent ICD-10 codes: E78 (disorders of lipoprotein metabolism and other lipidaemias), I10 (essential (primary) hypertension; EHT), E11 (type 2 diabetes mellitus), K21 (gastro-oesophageal reflux disease), J30 (vasomotor and allergic rhinitis), G47 (sleep disorders), K05 (gingivitis and periodontal diseases), N39 (other urinary disorders), M47 (spondylosis), K59 (other functional intestinal disorders), M79 (other and unspecified soft tissue disorders), R10 (abdominal and pelvic pain), H10 (conjunctivitis), I25 (chronic ischaemic heart disease), N40 (enlarged prostate), L30 (other and unspecified dermatitis), I11 (hypertensive heart disease), H04 (lacrimal system disorders), N18 (chronic kidney disease) and R07 (pain in throat or chest). c, Age of onset for the top 20 diseases. Onset ages in male individuals (blue) and female individuals (pink) are presented as box plots, ordered by median. Box plots represent minima, first quartile, median, third quartile and maxima. Values and sample sizes are in the Source Data. d, Top 20 most prevalent laboratory tests: creatinine_B (blood creatinine), WBC (white blood cell count), SGPT (serum glutamic pyruvic transaminase or alanine aminotransferase; S-GPT/ALT), HB (haemoglobin), platelet (platelet count), HCT (haematocrit), RBC (red blood cell count), EGFR (estimated glomerular filtration rate), SGOT (serum glutamic–oxaloacetic transaminase or aspartate aminotransferase; S-GOT/AST), TG (triglyceride), cholesterol_T (total cholesterol), BUN (blood urea nitrogen), glucose_AC (fasting glucose), LDL_C (low-density lipoprotein cholesterol), HDL_C (high-density lipoprotein cholesterol), uric acid_B (blood uric acid), HbA1c (haemoglobin A1c), bilirubin_T (bilirubin, total value), albumin and TSH (thyroid-stimulating hormone, measured by enzyme immunoassay or luminescence immunoassay). Left, sex-specific distribution of record counts per individual (winsorized at the 95th percentile); middle, proportion of individuals with test data; right, distribution of average follow-up years. Box plots represent minima, first quartile, median, third quartile and maxima. Values and sample sizes are in the Source Data. e, The top pie chart shows the proportions of related and unrelated samples. The bottom pie chart shows relationship categories: duplicate (DUP) or monozygotic twin (MZ), parent–offspring (PO), full sibling (FS), second degree (2nd) and third degree (3rd).
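
For readers reproducing panel d from the Source Data, the following is a minimal sketch of winsorizing record counts at the 95th percentile and then computing the five values the box plots represent. It is not the authors' code. It assumes that only the upper tail is capped and that percentiles use NumPy's default linear interpolation; the function names and the example counts are hypothetical.

```python
import numpy as np


def winsorize_upper(counts, pct=95.0):
    """Cap values above the pct-th percentile at that percentile.

    Assumption: only the upper tail is capped; the caption does not say
    whether the lower tail was treated as well.
    """
    values = np.asarray(counts, dtype=float)
    cap = np.percentile(values, pct)  # default linear interpolation
    return np.minimum(values, cap)


def five_number_summary(values):
    """Return minimum, first quartile, median, third quartile and maximum."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {"min": v.min(), "q1": q1, "median": median, "q3": q3, "max": v.max()}


# Hypothetical record counts per individual for one laboratory test.
record_counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 250]

capped = winsorize_upper(record_counts)
summary = five_number_summary(capped)
```

Winsorizing replaces extreme counts with the cap instead of dropping them, so the number of individuals is unchanged and only the plotted maximum shrinks.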