Table 2 Neural network accuracies and F1 scores on test sets for the different noise levels.

From: Creating interpretable deep learning models to identify species using environmental DNA sequences

Accuracy (%) on different noise levels

| Test noise level | Training noise level | Flück et al. | Best baseline | Our updated CNN | Our transformer | Our ProtoPNet latent = 0.7 | Our ProtoPNet latent = 1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 84.38 ± 1.19 | 96.57 ± 0.00 | 89.33 ± 0.71 | 83.77 ± 1.38 | 87.54 ± 0.76 | 89.94 ± 1.18 |
| 0 | 1 | 90.10 ± 0.87 | 96.57 ± 0.00 | 94.48 ± 0.97 | 87.77 ± 0.77 | 95.31 ± 0.23 | 95.66 ± 0.46 |
| 0 | 2 | 89.33 ± 1.32 | 93.14 ± 0.00 | 94.10 ± 2.69 | 90.97 ± 0.48 | 94.63 ± 0.46 | 95.66 ± 0.28 |
| 1 | 0 | 26.10 ± 2.31 | 96.00 ± 0.00 | 50.86 ± 5.38 | 20.11 ± 6.12 | 41.37 ± 2.57 | 61.71 ± 0.96 |
| 1 | 1 | 80.76 ± 1.19 | 93.14 ± 0.00 | 92.95 ± 0.27 | 79.31 ± 3.46 | 89.94 ± 1.00 | 91.66 ± 0.78 |
| 1 | 2 | 80.76 ± 0.87 | 92.00 ± 0.00 | 94.29 ± 1.23 | 79.77 ± 4.07 | 88.57 ± 1.77 | 90.74 ± 0.56 |
| 2 | 0 | 12.00 ± 0.57 | 92.57 ± 0.00 | 31.43 ± 3.36 | 6.97 ± 1.30 | 40.91 ± 2.93 | 47.31 ± 4.68 |
| 2 | 1 | 60.57 ± 4.98 | 88.57 ± 0.00 | 87.43 ± 0.00 | 61.94 ± 2.70 | 86.74 ± 1.32 | 90.51 ± 1.12 |
| 2 | 2 | 64.38 ± 2.16 | 81.14 ± 0.00 | 90.86 ± 0.47 | 67.77 ± 1.32 | 85.37 ± 0.78 | 88.00 ± 1.20 |

F1 on different noise levels

| Test noise level | Training noise level | Flück et al. | Best baseline | Our updated CNN | Our transformer | Our ProtoPNet latent = 0.7 | Our ProtoPNet latent = 1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.800 ± 0.01 | 0.950 ± 0.00 | 0.868 ± 0.02 | 0.796 ± 0.02 | 0.833 ± 0.01 | 0.895 ± 0.01 |
| 0 | 1 | 0.877 ± 0.01 | 0.953 ± 0.00 | 0.930 ± 0.01 | 0.843 ± 0.01 | 0.904 ± 0.00 | 0.915 ± 0.01 |
| 0 | 2 | 0.871 ± 0.01 | 0.915 ± 0.00 | 0.929 ± 0.00 | 0.882 ± 0.01 | 0.908 ± 0.00 | 0.910 ± 0.00 |
| 1 | 0 | 0.223 ± 0.03 | 0.942 ± 0.00 | 0.404 ± 0.03 | 0.163 ± 0.05 | 0.388 ± 0.01 | 0.653 ± 0.04 |
| 1 | 1 | 0.771 ± 0.02 | 0.910 ± 0.00 | 0.900 ± 0.01 | 0.754 ± 0.04 | 0.839 ± 0.00 | 0.860 ± 0.01 |
| 1 | 2 | 0.775 ± 0.01 | 0.896 ± 0.00 | 0.923 ± 0.02 | 0.754 ± 0.05 | 0.829 ± 0.00 | 0.839 ± 0.01 |
| 2 | 0 | 0.087 ± 0.01 | 0.904 ± 0.00 | 0.247 ± 0.03 | 0.046 ± 0.01 | 0.428 ± 0.03 | 0.566 ± 0.07 |
| 2 | 1 | 0.532 ± 0.05 | 0.838 ± 0.00 | 0.860 ± 0.02 | 0.563 ± 0.03 | 0.795 ± 0.02 | 0.764 ± 0.02 |
| 2 | 2 | 0.576 ± 0.02 | 0.790 ± 0.00 | 0.893 ± 0.02 | 0.619 ± 0.02 | 0.777 ± 0.02 | 0.767 ± 0.03 |

  1. The baseline models performed worse when trained on high noise than on no noise, whereas the CNNs, transformers, and ProtoPNets showed the opposite trend. This indicates that additional noise helps mitigate overfitting for CNNs, transformers, and ProtoPNets. The majority of the best baseline models were logistic regression models. Baseline models were trained on k-mer sizes of 3, 5, and 8. Offline augmentation was used for the baselines and Flück et al.4, while online augmentation was used for our CNN and ProtoPNet. Accuracies for Flück et al. differ from those reported in their paper because no binarization threshold was applied here. ± indicates the standard deviation over n = 3 runs.
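To make the augmentation and featurization in the note above concrete, the sketch below pairs a toy noise model with k-mer counting. It assumes noise is modeled as independent per-base substitutions; the level-to-rate mapping (`NOISE_RATE`) and the helper names `add_substitution_noise` and `kmer_counts` are illustrative assumptions, not taken from the paper. Applying the noise once to produce a fixed training set corresponds to offline augmentation (baselines, Flück et al.), while re-sampling it every epoch or minibatch corresponds to online augmentation (our CNN and ProtoPNet).

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"

def add_substitution_noise(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate` (toy noise model)."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )

def kmer_counts(seq: str, k: int) -> list:
    """Feature vector of counts for all 4**k possible k-mers, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(BASES, repeat=k)]

# Assumed mapping from the paper's discrete noise levels (0, 1, 2) to
# per-base substitution rates; the paper's actual noise model may differ.
NOISE_RATE = {0: 0.00, 1: 0.05, 2: 0.10}

rng = random.Random(42)
read = "ACGGTACCGTTAGCAT"  # toy stand-in for an eDNA read
for level, rate in NOISE_RATE.items():
    noisy = add_substitution_noise(read, rate, rng)      # online: re-sample each epoch
    features = kmer_counts(noisy, k=3)                   # baselines used k = 3, 5, 8
    print(f"level {level}: {noisy} -> {sum(f > 0 for f in features)} distinct 3-mers")
```

Under this framing, the neural models see freshly corrupted copies of each sequence throughout training, which is consistent with their accuracy improving as the training noise level rises in the table above.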