Table 4 Species identification success rates for different combinations of k-mer and g-spaced feature sets, where 4 and 5 sequences per species were used to train the prediction model. Although the success rates of the two feature types are on par, the largest k-mer combination uses 340 features, whereas the g-spaced set uses only 96.

From: funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model

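| Feature type | Feature combination | #Features | #Sequences/Species = 5 | #Sequences/Species = 6 |
|---|---|---|---|---|
| k-mer | 1+2 | 20 | 76.37 ± 4.91 | 79.61 ± 3.33 |
| k-mer | 1+2+3 | 84 | 79.21 ± 4.71 | 82.72 ± 2.81 |
| k-mer | 1+2+3+4 | 340 | 80.61 ± 4.03 | 83.68 ± 2.85 |
| g-spaced | g=1+2+3+4+5 | 96 | 81.74 ± 2.72 | 83.49 ± 2.36 |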
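
The k-mer feature counts in the table follow from the size of the DNA alphabet: there are 4^k possible k-mers, so combining k = 1 and 2 gives 4 + 16 = 20 features, adding k = 3 gives 84, and adding k = 4 gives 340. The sketch below reproduces that arithmetic and shows one plausible way to turn a sequence into a k-mer frequency vector. The function names and the normalization by the number of overlapping windows are illustrative assumptions, not the funbarRF implementation, and the g-spaced count is not reproduced because its definition is not given in this excerpt.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are used as features


def kmer_feature_count(ks):
    """Number of features when every possible k-mer is used for each k in ks."""
    return sum(len(ALPHABET) ** k for k in ks)


def kmer_frequencies(seq, k):
    """Relative frequency of each possible k-mer over overlapping windows.

    Windows containing other symbols (e.g. N) count toward the total
    but do not map to any feature. This normalization is hypothetical.
    """
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    freqs = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        freqs[kmer] = counts[kmer] / total if total else 0.0
    return freqs


def kmer_feature_vector(seq, ks):
    """Concatenate the k-mer frequency features for each k in ks."""
    vector = []
    for k in ks:
        vector.extend(kmer_frequencies(seq, k).values())
    return vector


if __name__ == "__main__":
    for ks in ([1, 2], [1, 2, 3], [1, 2, 3, 4]):
        print(ks, kmer_feature_count(ks))
```

A vector built with `kmer_feature_vector(seq, [1, 2, 3, 4])` has 340 entries, matching the last k-mer row of the table.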