Abstract:
|
In this paper, we investigate four alignment-free methods in combination with different classification approaches for comparative genomic analysis. We consider the classical nearest-neighbor classifier, logistic regression, support vector machines, diagonal linear discriminant analysis, classification trees, neural networks, and the permanental classification model. Each classification approach is applied to viral genome sequences vectorized by one of four alignment-free methods: k-mer, natural vector, composition vector, and Q-vector. Because the resulting data are high-dimensional, we use a feature selection technique based on the variance ratio to facilitate the comparison of the classification methods. A comprehensive comparison is made based on the Baltimore class labels of the viruses, and recommendations are given for comparative genomic analysis using alignment-free methods.
|
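The pipeline summarized in the abstract (vectorize sequences, select features by variance ratio, classify) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the variance ratio is taken to be the between-class sum of squares divided by the within-class sum of squares for each feature, the nearest-neighbor classifier uses squared Euclidean distance, and the k-mer vector is a normalized frequency vector. The function names, the toy sequences, and the labels `toy_a` and `toy_b` are hypothetical and are not real viral genomes or Baltimore classes.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vector(seq, k=2):
    """Normalized frequencies of all 4**k possible k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def variance_ratio(column, labels):
    """Between-class over within-class sum of squares (assumed definition)."""
    grand_mean = sum(column) / len(column)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    between = 0.0
    within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        between += len(values) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in values)
    if within > 0:
        return between / within
    # No within-class spread: perfectly separating, or constant everywhere.
    return float("inf") if between > 0 else 0.0


def select_features(X, labels, n_keep):
    """Indices of the n_keep features with the largest variance ratio."""
    scores = [variance_ratio([row[j] for row in X], labels)
              for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def nearest_neighbor(train_X, train_y, x):
    """Label of the training vector closest to x in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


# Toy data (invented for illustration only).
seqs = ["ATATATATAA", "AATTATATTA", "GCGCGGCCGC", "CCGGCGCGGC"]
labels = ["toy_a", "toy_a", "toy_b", "toy_b"]

X = [kmer_vector(s, k=2) for s in seqs]
keep = select_features(X, labels, n_keep=4)
X_reduced = [[row[j] for j in keep] for row in X]

query = kmer_vector("ATATTATAAT", k=2)
prediction = nearest_neighbor(X_reduced, labels, [query[j] for j in keep])
print(prediction)
```

Ranking features before classification keeps the comparison across classifiers manageable when the number of k-mers (4**k) grows quickly with k; any of the other classifiers named in the abstract could replace `nearest_neighbor` on the reduced vectors.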