Table 1 Performance of PhyloTune in identifying the smallest taxonomic unit on the Plant dataset

From: PhyloTune: An efficient method to accelerate phylogenetic updates using a pretrained DNA language model

(a) Performance on novelty detection.

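| Method | AUROC (↑): Class | AUROC (↑): Order | AUROC (↑): Family | AUROC (↑): Genus | AUPR (↑): Class | AUPR (↑): Order | AUPR (↑): Family | AUPR (↑): Genus |
|---|---|---|---|---|---|---|---|---|
| baseline | 85.37 | 64.52 | 73.67 | 82.96 | 99.58 | 97.80 | 95.61 | 91.13 |
| PhyloTune | 98.27 | 89.41 | 89.06 | 90.17 | 99.96 | 99.52 | 98.39 | 93.85 |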
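Panel (a) scores novelty detection at each rank with AUROC and AUPR. The sketch below shows how these two metrics are commonly computed from per-sequence novelty scores. It rests on three assumptions not stated in the table: label 1 (positive) marks an out-of-distribution (OOD) sequence, a higher score means "more novel", and AUPR is estimated as average precision (the paper's exact estimator may differ). The function names `auroc` and `aupr` are hypothetical. In an evaluation like Table 1 it would be called once per rank (Class, Order, Family, Genus).

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; label 1 = positive (assumed OOD)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied scores and give each member their average (1-based) rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative examples")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def aupr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR as average precision: mean precision at the rank of each positive."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("AUPR needs at least one positive example")
    # Highest score first; in this simple version tied scores keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Toy example: label 1 marks a sequence whose taxon at this rank is novel (OOD).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
print(aupr(labels, scores))   # ≈ 0.833
```

The rank-based AUROC avoids comparing every positive with every negative, which matters for a test set of 15,000 sequences.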

(b) Performance on taxonomic classification.

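| Method | Macro Precision (↑): Class | Macro Precision (↑): Order | Macro Precision (↑): Family | Macro Precision (↑): Genus | Macro Recall (↑): Class | Macro Recall (↑): Order | Macro Recall (↑): Family | Macro Recall (↑): Genus |
|---|---|---|---|---|---|---|---|---|
| baseline | 81.31 | 62.94 | 68.89 | 86.75 | 80.67 | 71.81 | 77.74 | 86.46 |
| PhyloTune | 91.18 | 89.09 | 89.56 | 98.20 | 85.67 | 88.72 | 93.25 | 98.18 |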

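| Method | Macro F1 (↑): Class | Macro F1 (↑): Order | Macro F1 (↑): Family | Macro F1 (↑): Genus | Matthews correlation coefficient (↑): Class | Matthews correlation coefficient (↑): Order | Matthews correlation coefficient (↑): Family | Matthews correlation coefficient (↑): Genus |
|---|---|---|---|---|---|---|---|---|
| baseline | 79.48 | 65.25 | 71.44 | 86.49 | 83.18 | 77.86 | 81.29 | 94.69 |
| PhyloTune | 87.16 | 88.07 | 90.40 | 98.18 | 87.46 | 86.75 | 89.91 | 98.16 |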
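Panel (b) reports macro-averaged precision, recall and F1 together with the Matthews correlation coefficient (MCC) at each rank. A detail worth noting: macro F1 is the unweighted mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall. The reported numbers are consistent with this (for the baseline at Class, the harmonic mean of 81.31 and 80.67 is about 80.99, while the table reports 79.48). The sketch below is a minimal illustration, assuming one predicted label per sequence per rank and treating undefined 0/0 ratios as 0; `macro_prf` and `multiclass_mcc` are hypothetical names. The MCC uses Gorodkin's multiclass generalization, which is what scikit-learn's `matthews_corrcoef` computes for multiclass input. Whether OOD sequences are included in the classification evaluation is not stated in this table.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence, Tuple


def macro_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall and F1 at one taxonomic rank."""
    classes = set(y_true) | set(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined ratios (0/0) are treated as 0 in this sketch.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass MCC (Gorodkin's R_K) computed from per-class marginal counts."""
    s = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    cov_tp = correct * s - sum(t_counts[c] * p_counts[c] for c in classes)
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    denom = sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Toy labels at a single rank; the evaluation would repeat this for every rank.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
p, r, f1 = macro_prf(y_true, y_pred)
print(f1)                              # macro F1 ≈ 0.656
print(2 * p * r / (p + r))             # ≈ 0.693, differs from macro F1
print(multiclass_mcc(y_true, y_pred))  # ≈ 0.522
```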

1. Performance is evaluated on a test set of 15,000 sequences from the Plant dataset. The baseline uses a fine-tuning strategy in which the backbone is frozen and only hierarchical linear probes (HLPs) are trained. PhyloTune outperforms the baseline on every metric at every taxonomic rank. The distribution of in-distribution (ID) and out-of-distribution (OOD) sequences at each rank is given in Table 2. Source data are provided as a Source Data file.
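The claim that PhyloTune beats the baseline at every rank can be checked directly against the reported values. The short script below copies the numbers from Table 1 and computes the difference PhyloTune minus baseline for each metric and rank, in the table's own units. The names `RANKS`, `TABLE1` and `gains` are illustrative only and do not come from the paper or its code.

```python
# Values copied from Table 1 as (baseline, PhyloTune), each ordered Class, Order, Family, Genus.
RANKS = ("Class", "Order", "Family", "Genus")
TABLE1 = {
    "AUROC": ((85.37, 64.52, 73.67, 82.96), (98.27, 89.41, 89.06, 90.17)),
    "AUPR": ((99.58, 97.80, 95.61, 91.13), (99.96, 99.52, 98.39, 93.85)),
    "Macro precision": ((81.31, 62.94, 68.89, 86.75), (91.18, 89.09, 89.56, 98.20)),
    "Macro recall": ((80.67, 71.81, 77.74, 86.46), (85.67, 88.72, 93.25, 98.18)),
    "Macro F1": ((79.48, 65.25, 71.44, 86.49), (87.16, 88.07, 90.40, 98.18)),
    "MCC": ((83.18, 77.86, 81.29, 94.69), (87.46, 86.75, 89.91, 98.16)),
}


def gains(table):
    """PhyloTune minus baseline for every metric and rank, rounded to 2 decimals."""
    return {
        metric: dict(zip(RANKS, (round(p - b, 2) for b, p in zip(base, phylo))))
        for metric, (base, phylo) in table.items()
    }


g = gains(TABLE1)
# Every cell should show a positive gain if PhyloTune wins everywhere.
assert all(d > 0 for per_rank in g.values() for d in per_rank.values())
for metric, per_rank in g.items():
    best = max(per_rank, key=per_rank.get)
    print(f"{metric}: largest gain at {best} (+{per_rank[best]:.2f})")
```

Run on the reported values, the largest gain falls at the Order rank for every metric except AUPR, where it falls at Family (+2.78); the single largest gain is +26.15 in macro precision at Order.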