Abstract:
Proteins perform many functions within the cells of living organisms, and these functions are closely related to their subcellular location, that is, where in the cell they reside. Because whole-genome sequencing has been carried out in many organisms, protein sequences are generated and deposited into databases faster than their subcellular locations can be determined experimentally, so there is a need for computational methods that predict protein subcellular locations accurately. Many algorithms have been developed for this purpose, but only a small fraction achieve a high level of prediction accuracy. This research uses Chou's pseudo amino acid composition as a numerical representation of proteins, turning the long string of amino acids that makes up a protein into a vector. Using this representation, the predictive performance of Random Forests, AdaBoost, and SAMME in predicting fungal protein subcellular locations is examined and compared with results from support vector machines and the covariant discriminant algorithm. We find the latter two methods to be particularly effective and worth further use and extension.
Copyright © American Statistical Association.