funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model

BMC Genomic Data

Table 4 Species identification success rates for different combinations of k-mer and g-spaced feature sets, where 4 and 5 sequences per species were used to train the prediction model. Although the success rates of the two feature types are comparable, the largest k-mer combination (1+2+3+4) uses 340 features, whereas the g-spaced set uses only 96.

Feature-type	Feature combination	#Features	#Sequences/Species = 5	#Sequences/Species = 6
k-mer	1+2	20	76.37±4.91	79.61±3.33
	1+2+3	84	79.21±4.71	82.72±2.81
	1+2+3+4	340	80.61±4.03	83.68±2.85
g-spaced	g=1+2+3+4+5	96	81.74±2.72	83.49±2.36
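
The k-mer feature counts in the table (20, 84, 340) match the number of distinct k-mers over a four-letter nucleotide alphabet, 4^k, summed over the combined values of k. The sketch below illustrates that arithmetic and builds a raw-count feature vector. It is a minimal illustration and not the authors' implementation: the function names, the use of raw counts rather than normalized frequencies, and the skipping of windows with ambiguous bases are all assumptions made here. The g-spaced features are not covered, because their exact definition is not given in this table.

```python
from itertools import product

# Assumption: features are defined over the four standard nucleotides.
ALPHABET = "ACGT"


def kmer_vocabulary(k):
    """Return all possible k-mers over ALPHABET in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def combined_feature_count(ks):
    """Number of features when k-mers of every size in ks are concatenated."""
    return sum(len(kmer_vocabulary(k)) for k in ks)


def kmer_counts(seq, ks):
    """Hypothetical helper: raw k-mer counts of seq for each k in ks, concatenated."""
    features = []
    for k in ks:
        counts = {kmer: 0 for kmer in kmer_vocabulary(k)}
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:  # skip windows containing ambiguous bases such as N
                counts[window] += 1
        features.extend(counts.values())
    return features


if __name__ == "__main__":
    for ks in ([1, 2], [1, 2, 3], [1, 2, 3, 4]):
        print("+".join(map(str, ks)), combined_feature_count(ks))
    # prints:
    # 1+2 20
    # 1+2+3 84
    # 1+2+3+4 340
```

For example, kmer_counts("ACGTAC", [1, 2]) returns a vector of length 20: four single-nucleotide counts followed by sixteen dinucleotide counts.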

ISSN: 2730-6844