Abstract:
|
Kernels are essential elements in the construction of learning systems and have received considerable attention in machine learning. In statistics, kernels are used as tools for achieving specific data-analytic goals, such as density estimation. The literature includes methods for constructing multivariate kernels for interval-scale data. We discuss the construction and properties of a special class of kernels, the diffusion kernels. We first offer a statistical definition of this class and present an important subclass, the canonical diffusion kernels. Using these kernels, we present an algorithm for constructing kernels for categorical data, whether nominal or ordinal. We then extend this construction to obtain kernels appropriate for mixed-scale data, that is, data with both categorical and interval-scale components. Our algorithm uses ideas from the theory of continuous-time Markov processes and the theory of Toeplitz matrices. We illustrate the construction of these kernels in the context of high-dimensional density estimation. Time permitting, we will indicate the construction of test statistics akin to chi-squared tests for independence.
|