Table 1 Available pre-miRNA detection tools

From: On the performance of pre-microRNA detection algorithms

| Study | ML algorithm | Feature number | Positive data | Negative data | Sampling | Implementation | Number of citations (Google Scholar) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Xue46 | SVM | 32 | MiRBase 5.0 | CODING dataset (Pseudo) | Random selection (approx. 1:1 positive negative ratio) | * | 412 (34) |
| Jiang47 | RF, SVM | 34 | MiRBase 8.2 | pseudo | Random sampling (approx. 1:1 positive negative and 1:1.5 training testing ratio) | * | 376 (48) |
| Ng37 | SVM | 29 | MiRBase 8.2 | pseudo | Random selection without replacement (1:2 positive negative ratio) | * | 203 (19) |
| Batuwita48 | SVM | 21 | MiRBase 12 | pseudo & Human other ncRNAs | Outer-5-fold-cv | + | 172 (16) |
| Xu49 | A novel ranking algorithm based on random walks & SVM | 35 | MiRBase (September 1, 2007) | Random, non-overlapping 90nt fragments from the human genome | Random selection (1:2 positive to negative ratio) | * | 80 (4) |
| Ding50 | SVM | 32 | Known miRNAs | UTRdb & ncRNA from Rfam 9.1 | Outer 3-fold cross-validation | | 61 (11) |
| Chen41 | LibSVM | 99 | miRBase (2013) | pseudo & Zou | Leave-one-out | + | 31 (24) |
| Burgt51 | L score classifier | 18 | non-plant miRNA hairpin sequences (miRBase version 9.0) | | 10-fold cross-validation | * | 31 (4) |
| Gudys40 | NB, MLP, SVM, RF, APLSC | 28 | MiRBase 17 | From genomes and mRNAs of ten animal and seven plant species as well as 29 viruses | Stratified 10-fold CV | + | 27 (5) |
| Ritchie52 | SVM | 36 | Murine miRBase v17 | Transcripts without evidence of processing by Dicer | | | 20 (5) |
| Bentwich53 | | 26 | Hairpins from Human Genome | 10000 hairpins found in non-coding regions | | | 20 (2) |
| Lopes54 | SVM, RF, G2DE | 13 | MiRBase 19 | pseudo | Non-standard training and testing scheme. | * | 16 (6) |
| Gao55 | SVM | 57 | MiRBase v20 | Exonic regions of some available genomes and ncRNAs from Rfam | 1:1 positive to negative ratio | * | 11 (1) |

  1. SVM: support vector machine; NB: Naïve Bayes; MLP: Multi-Layered Perceptron; RF: Random Forest; APLSC: Asymmetric Partial Least Squares Classification; G2DE: Generalized Gaussian Density Estimator. Implementation column: + implementation exists; blank: no implementation; *: experienced problems with the implementation.
  2. Previously published studies performing ab initio pre-miRNA detection using machine learning (ML). Listed are the number of features that were effectively used, the training data that were employed, and whether an implementation is available.
  3. The negative dataset “pseudo” (see Online Methods) was generated by Xue16 but downloaded from Ng17. The table is sorted by the number of citations in Google Scholar. Because the number of citations depends on the year of publication, the number of citations in 2016 is also given in parentheses.
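
Two of the sampling schemes named in the table, random selection without replacement at a fixed positive-to-negative ratio and stratified k-fold cross-validation, can be illustrated with a minimal Python sketch. This is not the code used by any of the listed studies; the function names, the seed handling, and the toy data are assumptions made purely for illustration.

```python
import random


def sample_negatives(positives, negatives, neg_per_pos=2, seed=0):
    """Draw negatives without replacement at a fixed positive:negative ratio.

    Hypothetical helper: neg_per_pos=2 corresponds to a 1:2 ratio.
    """
    k = len(positives) * neg_per_pos
    if k > len(negatives):
        raise ValueError("not enough negative examples for the requested ratio")
    rng = random.Random(seed)
    return rng.sample(negatives, k)  # random.sample never repeats an element


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign example indices to folds so each fold keeps similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Toy data (assumed): 50 positive and 200 candidate negative examples.
positives = [f"pos{i}" for i in range(50)]
negatives = [f"neg{i}" for i in range(200)]

chosen = sample_negatives(positives, negatives, neg_per_pos=2)
labels = [1] * len(positives) + [0] * len(chosen)
folds = stratified_folds(labels, n_folds=10)
```

With these toy sizes, each of the ten folds receives 5 positive and 10 negative examples, so the 1:2 class ratio is preserved in every fold.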