Fig. 1: Schematic illustration of the benchmarking workflow in HoC samples.
From: High-resolution microbiome analysis of host-rich samples using 2bRAD-M without host depletion

a HoC mock communities were created by mixing the standard MSA 1002 (synthetic community) with 90% and 99% human DNA. Sequencing was performed using 2bRAD-M, 16S, and WMS methods. The resulting profiles were evaluated for microbial identification using AUPR and for abundance estimation using L2 similarity. AUPR: area under the precision-recall curve, a metric that summarizes a classification model's precision and recall across all possible thresholds and is particularly suitable for imbalanced datasets; L2 similarity: a metric that measures how close two data points are in space by calculating the reciprocal of the Euclidean distance between them. b, c Diurnal saliva and oral cancer samples were profiled with 2bRAD-M, 16S, and WMS, with WMS results treated as the gold standard. The resulting feature tables underwent L2 similarity analysis for abundance estimation, followed by diversity analysis. Rarefaction analysis was applied to WMS and 2bRAD-M profiles. In addition, the temporal dynamics of oscillating species were examined. d Early childhood caries (ECC) saliva samples were analyzed using 2bRAD-M and short- and long-read 16S sequencing. Diagnostic models of ECC were built by applying Random Forest algorithms to each sequencing dataset, and their performance was evaluated with ROC curves. Indicative species for the 2bRAD-M-derived diagnostic model were visualized on a scatter plot showing their importance scores and corresponding AUC values. This figure was created using BioRender.com.
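
The two evaluation metrics defined in the caption can be illustrated with a minimal sketch. It is not the authors' code. The function names `l2_similarity` and `average_precision` are hypothetical. Returning infinity for identical profiles is an assumption, since the reciprocal of a zero distance is undefined. AUPR is approximated here by step-wise average precision, with no special handling of tied scores.

```python
import math


def l2_similarity(p, q):
    """Reciprocal of the Euclidean distance between two abundance vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same taxa in the same order")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Assumption: identical profiles (distance 0) are reported as infinity.
    return math.inf if dist == 0 else 1.0 / dist


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 if the taxon is truly in the mock community, else 0.
    scores: detection scores, e.g. estimated relative abundances.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one true positive label is required")
    # Rank taxa from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Illustrative values only, not data from the study.
expected = [0.25, 0.25, 0.25, 0.25]
observed = [0.30, 0.20, 0.25, 0.25]
print(l2_similarity(expected, observed))
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because the reciprocal grows without bound as two profiles converge, L2 similarity values computed this way are best compared between methods on the same sample rather than read as absolute scores.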