Abstract:
|
This course surveys fundamental concepts in modern data science. It emphasizes nonparametric regression and classification, with sparsity, regularization, and the Curse of Dimensionality as recurring themes. Specific topics include: (1) nonparametric regression, including the backfitting algorithm, with the bootstrap and cross-validation as associated tools; (2) the Lasso, elastic net, and LARS, with the Hoff algorithm for solutions when the penalty function is L_q for 0 < q < 1; (3) the p >> n problem, with a survey of key results from Donoho and Tanner, Candes and Tao, and Wainwright; (4) the median probability model of Barbieri and Berger; (5) comparison of geometric, algorithmic, and probabilistic classification methodology, including nearest-neighbor, support vector machine, and random forest techniques; (6) improvement of classification techniques through ensembles and forward stagewise learning, such as bagging, stacking, and boosting; (7) topic modeling using Latent Dirichlet Allocation. Most ideas will be illustrated through an application to a data set.
|