Abstract:
|
We will present a number of supervised learning methods that can be applied to Biomedical Big Data. In particular, we will cover penalized approaches to regression and classification, as well as support vector machines and tree-based methods. We will consider the analysis of high-dimensional "omics" data sets, which are typically characterized by a very large number of molecular measurements (such as genes) and a relatively small number of samples (such as patients). In addition, we will discuss the use of these tools in the development of prognostic and predictive biomarkers. Each topic will be illustrated with examples of both well-done and poorly done analyses. The example analyses will be conducted using state-of-the-art packages in R (including "e1071", "rpart", "gbm" and "glmnet"). Throughout the course, we will focus on common pitfalls in the supervised analysis of Biomedical Big Data and how to avoid them. The course will include interactive discussions and "challenge questions" to help participants actively engage with applying these tools in biomedical scenarios. This course assumes some previous exposure to linear regression, statistical hypothesis testing, and R.
|