Table 2 Neural network accuracies and F1 scores on test sets for the different noise levels.

From: Creating interpretable deep learning models to identify species using environmental DNA sequences

Accuracy (%) on different noise levels

| Test noise level | Training noise level | Flück et al. | Best baseline | Our updated CNN | Our transformer | Our ProtoPNet latent = 0.7 | Our ProtoPNet latent = 1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 84.38 ± 1.19 | 96.57 ± 0.00 | 89.33 ± 0.71 | 83.77 ± 1.38 | 87.54 ± 0.76 | 89.94 ± 1.18 |
| 0 | 1 | 90.10 ± 0.87 | 96.57 ± 0.00 | 94.48 ± 0.97 | 87.77 ± 0.77 | 95.31 ± 0.23 | 95.66 ± 0.46 |
| 0 | 2 | 89.33 ± 1.32 | 93.14 ± 0.00 | 94.10 ± 2.69 | 90.97 ± 0.48 | 94.63 ± 0.46 | 95.66 ± 0.28 |
| 1 | 0 | 26.10 ± 2.31 | 96.00 ± 0.00 | 50.86 ± 5.38 | 20.11 ± 6.12 | 41.37 ± 2.57 | 61.71 ± 0.96 |
| 1 | 1 | 80.76 ± 1.19 | 93.14 ± 0.00 | 92.95 ± 0.27 | 79.31 ± 3.46 | 89.94 ± 1.00 | 91.66 ± 0.78 |
| 1 | 2 | 80.76 ± 0.87 | 92.00 ± 0.00 | 94.29 ± 1.23 | 79.77 ± 4.07 | 88.57 ± 1.77 | 90.74 ± 0.56 |
| 2 | 0 | 12.00 ± 0.57 | 92.57 ± 0.00 | 31.43 ± 3.36 | 6.97 ± 1.30 | 40.91 ± 2.93 | 47.31 ± 4.68 |
| 2 | 1 | 60.57 ± 4.98 | 88.57 ± 0.00 | 87.43 ± 0.00 | 61.94 ± 2.70 | 86.74 ± 1.32 | 90.51 ± 1.12 |
| 2 | 2 | 64.38 ± 2.16 | 81.14 ± 0.00 | 90.86 ± 0.47 | 67.77 ± 1.32 | 85.37 ± 0.78 | 88.00 ± 1.20 |

F1 on different noise levels

| Test noise level | Training noise level | Flück et al. | Best baseline | Our updated CNN | Our transformer | Our ProtoPNet latent = 0.7 | Our ProtoPNet latent = 1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.800 ± 0.01 | 0.950 ± 0.00 | 0.868 ± 0.02 | 0.796 ± 0.02 | 0.833 ± 0.01 | 0.895 ± 0.01 |
| 0 | 1 | 0.877 ± 0.01 | 0.953 ± 0.00 | 0.930 ± 0.01 | 0.843 ± 0.01 | 0.904 ± 0.00 | 0.915 ± 0.01 |
| 0 | 2 | 0.871 ± 0.01 | 0.915 ± 0.00 | 0.929 ± 0.00 | 0.882 ± 0.01 | 0.908 ± 0.00 | 0.910 ± 0.00 |
| 1 | 0 | 0.223 ± 0.03 | 0.942 ± 0.00 | 0.404 ± 0.03 | 0.163 ± 0.05 | 0.388 ± 0.01 | 0.653 ± 0.04 |
| 1 | 1 | 0.771 ± 0.02 | 0.910 ± 0.00 | 0.900 ± 0.01 | 0.754 ± 0.04 | 0.839 ± 0.00 | 0.860 ± 0.01 |
| 1 | 2 | 0.775 ± 0.01 | 0.896 ± 0.00 | 0.923 ± 0.02 | 0.754 ± 0.05 | 0.829 ± 0.00 | 0.839 ± 0.01 |
| 2 | 0 | 0.087 ± 0.01 | 0.904 ± 0.00 | 0.247 ± 0.03 | 0.046 ± 0.01 | 0.428 ± 0.03 | 0.566 ± 0.07 |
| 2 | 1 | 0.532 ± 0.05 | 0.838 ± 0.00 | 0.860 ± 0.02 | 0.563 ± 0.03 | 0.795 ± 0.02 | 0.764 ± 0.02 |
| 2 | 2 | 0.576 ± 0.02 | 0.790 ± 0.00 | 0.893 ± 0.02 | 0.619 ± 0.02 | 0.777 ± 0.02 | 0.767 ± 0.03 |

  1. The baseline models performed worse when trained on high noise than on no noise, whereas the CNNs, transformers, and ProtoPNets showed the opposite trend. This indicates that additional noise helps mitigate overfitting for CNNs, transformers, and ProtoPNets. The majority of the best baseline models were logistic regression models. Baseline models were trained on k-mer sizes of 3, 5, and 8. Offline augmentation was used for the baselines and Flück et al.4, while online augmentation was used for our CNN and ProtoPNet. Accuracies for Flück et al. differ from those reported in their paper because no binarization threshold was applied here. ± indicates the standard deviation over n = 3 runs.
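To make the augmentation and featurization in the note above concrete, the sketch below pairs a toy noise model with k-mer counting. It assumes noise is modeled as independent per-base substitutions; the level-to-rate mapping (`NOISE_RATE`) and the helper names `add_substitution_noise` and `kmer_counts` are illustrative assumptions, not taken from the paper. Applying the noise once to produce a fixed training set corresponds to offline augmentation (baselines, Flück et al.), while re-sampling it every epoch or minibatch corresponds to online augmentation (our CNN and ProtoPNet).

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"

def add_substitution_noise(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate` (toy noise model)."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )

def kmer_counts(seq: str, k: int) -> list:
    """Feature vector of counts for all 4**k possible k-mers, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(BASES, repeat=k)]

# Assumed mapping from the paper's discrete noise levels (0, 1, 2) to
# per-base substitution rates; the paper's actual noise model may differ.
NOISE_RATE = {0: 0.00, 1: 0.05, 2: 0.10}

rng = random.Random(42)
read = "ACGGTACCGTTAGCAT"  # toy stand-in for an eDNA read
for level, rate in NOISE_RATE.items():
    noisy = add_substitution_noise(read, rate, rng)      # online: re-sample each epoch
    features = kmer_counts(noisy, k=3)                   # baselines used k = 3, 5, 8
    print(f"level {level}: {noisy} -> {sum(f > 0 for f in features)} distinct 3-mers")
```

Under this framing, the neural models see freshly corrupted copies of each sequence throughout training, which is consistent with their accuracy improving as the training noise level rises in the table above.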