Abstract:
|
It is known that a protein's biological function is related to its physical structure. Many researchers have studied this relationship both for the entire backbone structures of proteins and for their binding sites, the regions where binding activity occurs. Despite this research, predicting a protein's function from its structure remains an open challenge. The main purpose of this research is to gain a better understanding of how structure relates to binding activity and to classify proteins according to function using structural information. First, we classified the binding sites in the dataset of Ellingson and Zhang (2012) using logistic regression. We then approached the problem using the dataset compiled by Kahraman et al. (2007). We calculated the covariance matrices of the binding sites' coordinates, which use the distance of each atom to the center of mass, and calculated the distance from each atom to the 1st, 2nd, and 3rd principal axes. We then obtained covariance matrices of these distances to serve as our data objects. Finally, we performed classification on these matrices using a variety of techniques, including nearest neighbor.
|
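The descriptor pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, every atom is given equal mass, the four distances per atom are combined into a single 4x4 covariance matrix, and the Frobenius norm is used to compare matrices, since the abstract does not specify which distance between covariance matrices was used.

```python
import numpy as np


def distance_features(coords):
    """Per-atom distances to the center of mass and to the three principal axes.

    coords: (n_atoms, 3) array of binding-site coordinates.
    Returns an (n_atoms, 4) array. Equal atomic masses are an assumption.
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)              # unweighted centroid (assumed center of mass)
    centered = coords - center
    cov = np.cov(centered, rowvar=False)      # 3x3 covariance of the coordinates
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    axes = eigvecs[:, ::-1].T                 # rows: 1st, 2nd, 3rd principal axes
    d_center = np.linalg.norm(centered, axis=1)
    proj = centered @ axes.T                  # signed projection onto each axis
    # Distance from a point x to a line through the center with unit direction a:
    # sqrt(|x|^2 - (x . a)^2); clip at zero to absorb rounding error.
    d_axes = np.sqrt(np.maximum(d_center[:, None] ** 2 - proj ** 2, 0.0))
    return np.column_stack([d_center, d_axes])


def site_descriptor(coords):
    """4x4 covariance matrix of the distance features (assumed form of the data object)."""
    return np.cov(distance_features(coords), rowvar=False)


def nearest_neighbor_label(query, train_descriptors, train_labels):
    """1-nearest-neighbor label under the Frobenius distance (an assumed metric)."""
    dists = [np.linalg.norm(query - d, ord="fro") for d in train_descriptors]
    return train_labels[int(np.argmin(dists))]
```

Given one coordinate array per binding site, `site_descriptor` would be applied to each site, and `nearest_neighbor_label` would assign a query site the label of its closest training descriptor. A metric that respects the geometry of covariance matrices could be substituted for the Frobenius norm without changing the rest of the sketch.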