Table 1. Reduction of sequence redundancy in the different datasets using the CD-HIT-EST software.

From: csDMA: an improved bioinformatics tool for identifying DNA 6 mA modifications via Chou’s 5-step rule

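| Species | Dataset | Threshold 0.95 | Threshold 0.90 | Threshold 0.85 | Threshold 0.80 |
|---|---|---|---|---|---|
| Mouse | Positive | 1,931 | 1,924 | 1,914 | 1,892 |
| Mouse | Negative | 1,885 | 1,866 | 1,844 | 1,836 |
| Rice | Positive | 880 | 879 | 878 | 876 |
| Rice | Negative | 880 | 880 | 880 | 880 |
| Cross-species | Positive | 2,811 | 2,803 | 2,792 | 2,768 |
| Cross-species | Negative | 2,767 | 2,746 | 2,724 | 2,716 |

"Threshold" denotes the sequence identity threshold.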
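
As a rough illustration of how a threshold sweep like the one in Table 1 could be set up, the sketch below builds one CD-HIT-EST command line per sequence identity threshold. It only assembles and prints the commands; it does not run the tool. The file names and the helper functions `word_size` and `cd_hit_est_command` are hypothetical, and the word sizes (`-n`) are an assumption taken from the ranges suggested in the CD-HIT user guide, since the source does not state which word sizes were used.

```python
import shlex

# Word sizes (-n) for cd-hit-est, keyed by the lower bound of each identity
# range, following the ranges suggested in the CD-HIT user guide.
# Assumption: the source table does not report the word sizes actually used.
WORD_SIZE_BY_MIN_IDENTITY = [
    (0.95, 10),
    (0.90, 8),
    (0.88, 7),
    (0.85, 6),
    (0.80, 5),
    (0.75, 4),
]


def word_size(identity):
    """Pick a word size for the given sequence identity threshold."""
    for lower_bound, n in WORD_SIZE_BY_MIN_IDENTITY:
        if identity >= lower_bound:
            return n
    raise ValueError("identity threshold below 0.75 is not covered here")


def cd_hit_est_command(fasta_in, fasta_out, identity):
    """Return the argument list for one cd-hit-est run (hypothetical helper)."""
    return [
        "cd-hit-est",
        "-i", fasta_in,            # input FASTA file
        "-o", fasta_out,           # output FASTA file of representative sequences
        "-c", f"{identity:.2f}",   # sequence identity threshold
        "-n", str(word_size(identity)),
    ]


if __name__ == "__main__":
    # Hypothetical file names; the thresholds are the four listed in Table 1.
    for c in (0.95, 0.90, 0.85, 0.80):
        out_name = f"mouse_positive_c{int(round(c * 100))}.fasta"
        print(shlex.join(cd_hit_est_command("mouse_positive.fasta", out_name, c)))
```

Each printed command could then be run in a shell where CD-HIT is installed; counting the sequences in each output file would give one row of a table like the one above.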