Fig. 3: Host filtration pipeline simulated data validation. | Nature Communications

From: Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data

Using the 10 simulated datasets of 1 million reads each, as described in Fig. 2b, we calculated (a) the number of human reads remaining and (b) the number of microbial reads remaining after host filtration with Methods 1–3 (HPRC host filtration was performed excluding the 10 genomes used for data simulation). HG38: GRCh38.p14, T2T: T2T-CHM13v2.0, HPRC: Human Pangenome Reference Consortium 2024 release. Box plots show the median (center line), interquartile range (IQR; Q1–Q3; box), whiskers extending to Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, minimum and maximum values at whisker ends, and points representing individual observations both within and beyond the whisker range.
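
The box plot elements named in the legend can be computed as in the minimal sketch below. It is an illustration only, not the authors' plotting code: the function name `box_plot_stats`, the choice of the "inclusive" quantile method, and the example counts are all assumptions, and the counts are hypothetical rather than values from the figure. The sketch follows the common Tukey convention in which whiskers end at the most extreme observations that still lie within Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.

```python
from statistics import median, quantiles


def box_plot_stats(values):
    """Return the summary statistics drawn by a Tukey-style box plot."""
    data = sorted(values)
    # Quartiles; the "inclusive" method is an assumption, other
    # quantile definitions give slightly different Q1 and Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "beyond_whiskers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical remaining-read counts for 10 simulated datasets
# (illustrative numbers only, NOT taken from the paper).
example_counts = [12, 15, 14, 13, 40, 16, 15, 14, 13, 15]
stats = box_plot_stats(example_counts)
print(stats)
```

With these hypothetical counts, the single value of 40 lies above the upper fence and would be drawn as a point beyond the whisker, while the remaining nine observations fall within the whisker range.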
