Table 2 Validation performance of utilized machine learning models.

From: Semi-supervised learning framework for oil and gas pipeline failure detection

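| Model* | Class | Accuracy (%) | False positive (%) | False negative (%) | Overall accuracy (%) |
|---|---|---|---|---|---|
| KNN | 1 | 56.4 | 41.3 | 43.6 | 59.0 |
| | 2 | 53.8 | 45.5 | 46.2 | |
| | 3 | 53.8 | 36.6 | 33.3 | |
| Naïve Bayes | 1 | 74.3 | 25.7 | 33.3 | 72.2 |
| | 2 | 58.1 | 41.9 | 30.8 | |
| | 3 | 88.7 | 11.3 | 19.2 | |
| SVM | 1 | 78.2 | 21.8 | 21.8 | 76.1 |
| | 2 | 73.1 | 36.0 | 26.9 | |
| | 3 | 76.9 | 10.4 | 23.1 | |
| CART | 1 | 96.2 | 3.8 | 2.6 | 88.9 |
| | 2 | 86.8 | 13.2 | 15.4 | |
| | 3 | 83.5 | 16.5 | 15.4 | |
| ANN | 1 | 88.0 | 12.0 | 12.0 | 88.6 |
| | 2 | 82.5 | 17.5 | 19.5 | |
| | 3 | 94.0 | 6.0 | 4.1 | |
| Boosted trees | 1 | 97.4 | 3.8 | 2.6 | 91.0 |
| | 2 | 91.0 | 12.3 | 9.0 | |
| | 3 | 84.6 | 10.8 | 15.4 | |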
 
  1. All models' performance metrics are evaluated based on a five-fold cross validation study.
  2. *The exact computational time depends on the configuration of the parallel computing environment. In this case, local parallelization is used, with 4 cores sharing 24 GB. The average computational cost to train the complete CIC framework and assess a failure event from a given incident report is ~8 min, based on the optimal CIC configuration and the computational settings above.
  3. Best performance values are in [bold].
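
The per-class columns and the five-fold split mentioned in the notes can be illustrated with a minimal Python sketch. The metric definitions here are assumptions, not taken from the source: per-class accuracy is treated as the share of a class's true samples that are predicted correctly, false negative as the share that are missed, and false positive as the share of predictions for a class that are wrong. The function names `k_fold_indices`, `per_class_rates`, and `overall_accuracy` are hypothetical, and the labels in the usage lines are toy data, not the paper's.

```python
from collections import Counter


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def per_class_rates(y_true, y_pred, classes):
    """Per-class rates in percent, under the assumed definitions above."""
    pairs = Counter(zip(y_true, y_pred))
    rates = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        actual, predicted = tp + fn, tp + fp
        rates[c] = {
            # share of true class-c samples predicted as c (assumed definition)
            "accuracy": 100.0 * tp / actual if actual else 0.0,
            # share of true class-c samples predicted as another class
            "false_negative": 100.0 * fn / actual if actual else 0.0,
            # share of class-c predictions whose true class differs
            "false_positive": 100.0 * fp / predicted if predicted else 0.0,
        }
    return rates


def overall_accuracy(y_true, y_pred):
    """Percentage of all samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1]
rates = per_class_rates(y_true, y_pred, classes=[1, 2, 3])
overall = overall_accuracy(y_true, y_pred)
folds = k_fold_indices(len(y_true), k=5)
```

Under these assumed definitions, accuracy and false negative for a class always sum to 100, while false positive is computed against a different denominator and need not. How the source aggregates results across the five folds is not stated, so the sketch leaves that step out.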