Table 1. Area under the curve (AUC) for predicting cancer versus control status with different machine learning models using ONT-V1V9 and Emu’s Default database, depending on whether feature selection is automatic (Boruta, with prevalence thresholds of 10% and 30%) or manual.
Selection | Features | Feature count | AUC |
---|---|---|---|
Boruta top 10 (30%) | P. micra, A. butyriciproducens, A. cellulosilytica, O. timonensis, A. bacterium, R. timonensis, Streptococcus sp. A12, B. luti, Clostridium sp. BNL1100, S. variabile | 10 | 0.92 |
Boruta top 10 (10%) | P. micra, A. cellulosilytica, A. rhamnosivorans, P. stomatis, A. butyriciproducens, P. anaerobius, P. stercorea, Candidatus Saccharibacteria bacterium oral taxon 957, O. timonensis, R. timonensis | 10 | 0.91 |
Manual top 4 | F. nucleatum, P. micra, B. fragilis, A. butyriciproducens | 4 | 0.82 |
Manual top 14 | F. nucleatum, P. micra, B. fragilis, A. butyriciproducens, P. stomatis, P. anaerobius, G. morbillorum, D. pneumosintes, S. wadsworthensis, C. perfringens, R. ilealis, P. clara, Longibaculum sp. KGMB06250, R. massiliensis | 14 | 0.87 |
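
The two quantities that structure the table, a prevalence threshold and the AUC, can be illustrated with a minimal sketch. It assumes that "prevalence" means the fraction of samples in which a taxon is detected (abundance above zero), which the table itself does not spell out. The function names, taxon names, and all numbers below are hypothetical toy values, not study data, and Boruta itself is not implemented here.

```python
def prevalence_filter(abundance, threshold):
    """Keep taxa detected (abundance > 0) in at least `threshold` of samples.

    Assumption: this is one common reading of a prevalence threshold;
    the study's exact definition may differ.
    """
    n_samples = len(next(iter(abundance.values())))
    return [
        taxon
        for taxon, values in abundance.items()
        if sum(v > 0 for v in values) / n_samples >= threshold
    ]


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up relative abundances for three hypothetical taxa across 10 samples.
toy_abundance = {
    "taxon_a": [0.2, 0.1, 0.0, 0.3, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],  # detected in 4/10
    "taxon_b": [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0],  # detected in 2/10
    "taxon_c": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # never detected
}
kept_10 = prevalence_filter(toy_abundance, 0.10)
kept_30 = prevalence_filter(toy_abundance, 0.30)

# Made-up labels (1 = cancer, 0 = control) and classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)
```

In this toy case the 10% threshold keeps two taxa and the 30% threshold keeps one, so a stricter threshold hands the feature selector a smaller candidate pool. The AUC is computed from pairwise case/control comparisons, which is equivalent to the area under the ROC curve.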