Abstract:
|
This course provides a practical introduction to statistical learning methods for unsupervised problems. We will discuss three classes of methods: cluster analysis, dimension reduction, and graphical modeling. Specifically, we will first cover hierarchical and K-means clustering. We will then turn to principal component analysis and multidimensional scaling as tools for reducing the ambient dimension of the data. Finally, we will discuss sparse graphical models for the analysis of high-dimensional data, including data from Gaussian and non-Gaussian distributions. Throughout, we will emphasize the practical application of these methods and their limitations in high-dimensional settings, and we will cover validation of the results of unsupervised learning methods and tools for reproducible research. Case studies from finance and biology will be used to illustrate the methods. The course draws on The Elements of Statistical Learning by Hastie et al., An Introduction to Statistical Learning by James et al., and the instructor's notes from two courses taught at the Summer Institute in Statistical Genetics.
|