Fig. 3: Validation of the host filtration pipeline on simulated data.

Using the 10 simulated datasets of 1 million reads each described in Fig. 2b, we calculated (a) the number of human reads remaining and (b) the number of microbial reads remaining after host filtration with Methods 1–3 (HPRC host filtration was performed excluding the 10 genomes used for data simulation). HG38: GRCh38.p14; T2T: T2T-CHM13v2.0; HPRC: Human Pangenome Reference Consortium 2024 release. Box plots show the median (center line) and the interquartile range (IQR; Q1–Q3; box). Whiskers extend to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, with minimum and maximum values at the whisker ends. Points represent individual observations both within and beyond the whisker range.
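The whisker rule in the box-plot description can be made concrete with a short sketch. It assumes the common Tukey convention (whiskers end at the most extreme observations lying within 1.5 × IQR of the box, as in the matplotlib and ggplot2 defaults) and linear-interpolated quartiles. The function name `box_plot_stats` and the input values are hypothetical and are not the data or plotting code behind this figure.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Box-plot summary for one group of observations.

    Assumption: Tukey convention, i.e. whiskers end at the most extreme
    observations within 1.5 x IQR of the box; anything beyond the fences
    is reported as an outlier point.
    """
    data = sorted(float(v) for v in values)
    # method="inclusive" gives linear-interpolated quartiles (the same
    # definition as NumPy's default); plotting tools may differ slightly.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whisker ends: the min and max of the observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical remaining-read counts for 10 datasets (illustrative only).
stats = box_plot_stats([10, 12, 12, 13, 14, 15, 16, 18, 20, 60])
print(stats["median"], stats["whisker_low"], stats["whisker_high"], stats["outliers"])
# prints: 14.5 10.0 20.0 [60.0]
```

Here the box spans 12.25 to 17.5 (IQR 5.25), so the upper fence is 25.375 and the value 60 is drawn as a separate point beyond the whisker.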