Table 1 Datasets from previous research are used to evaluate the effectiveness of different encoding methods and their sensitivity to parameters such as the number of sequences and the maximum sequence length. To avoid potential biases from the datasets used in previous studies, a new dataset, DataSet0, is also included in this analysis. DataSet0 has not previously been used with any encoding technique, which allows a fair comparison between the encoding methods being tested.
From: Comparative study of encoded and alignment-based methods for virus taxonomy classification
Name | Description | Total sequences | Min. length (nt) | Max. length (nt) |
---|---|---|---|---|
DataSet0 | Viruses in the genera AlphaCoV and BetaCoV of coronaviruses, along with the subgenera in BetaCoV | 59 | 27165 | 31526 |
DataSet1 | Viruses from the family Coronaviridae to classify SARS-CoV-2 | 56 | 25425 | 31686 |
DataSet2 | Viruses in the genus BetaCoV to classify SARS-CoV-2 at the genus level | 50 | 29037 | 31491 |
DataSet3 | Closely related coronaviruses from the seafood market | 69 | 27213 | 30311 |
DataSet4 | Transmission modes of human coronaviruses originating from animals | 106 | 26883 | 31473 |
DataSet5 | Virus genomes obtained from human SARS-CoV-2 viruses | 141 | 29674 | 29882 |
DataSet6 | Genus within the Coronaviridae family, known to induce a range of severe diseases in the respiratory and gastrointestinal systems | 34 | 9646 | 31357 |
DataSet7 | Influenza A viruses, which are single-stranded, segmented RNA viruses categorized according to their hemagglutinin and neuraminidase viral surface proteins | 38 | 1350 | 1467 |
DataSet8 | Human rhinoviruses, which are the most common cause of upper respiratory tract infections | 116 | 6944 | 7458 |
DataSet9 | HPV (Human Papillomavirus) is a common sexually transmitted DNA virus responsible for cervical cancer and genital warts | 400 | 7814 | 10424 |
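
The three numeric columns above (sequence count, minimum length, maximum length) can be recomputed for any dataset that is stored as a FASTA file. The following is a minimal sketch, not the authors' pipeline: the helper names `read_fasta` and `summarize` and the toy records are hypothetical, and it assumes plain FASTA input with unique record identifiers.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record id: sequence}.

    Hypothetical helper: assumes record ids are unique; a repeated id
    would overwrite the earlier record.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token of the header.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def summarize(records: Dict[str, str]) -> Dict[str, int]:
    """Return the sequence count and the min/max sequence length."""
    lengths = [len(seq) for seq in records.values()]
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "total_seqs": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


# Toy input standing in for a real dataset file (illustrative only).
example = """>seqA
ACGTACGT
ACG
>seqB
ACGT
>seqC
ACGTACGTACGTAC
""".splitlines()

stats = summarize(read_fasta(example))
print(stats)  # {'total_seqs': 3, 'min_length': 4, 'max_length': 14}
```

For a real dataset, the same functions could be applied to an open file handle, for example `summarize(read_fasta(open(path)))`, where `path` points to the dataset's FASTA file.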