Human learners appear to have a talent for transferring knowledge gained from one task to another similar but distinct task. In statistical learning, however, most procedures are designed to solve a single task, or to learn a single distribution. In this talk, we consider transfer learning based on observations from different distributions under the posterior drift model, a general framework that covers many applications.
We first establish the minimax rate of convergence and construct a rate-optimal two-sample weighted K-NN classifier. The results precisely characterize the contribution of the observations from the source distribution to the classification task under the target distribution. A data-driven adaptive classifier is then introduced and shown to simultaneously attain, within a logarithmic factor, the optimal rate over a wide collection of parameter spaces. Extensions to multiple source distributions will also be discussed.
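To make the idea concrete, here is a minimal sketch of a two-sample weighted K-NN classifier for binary labels in {0, 1}: the query point's nearest neighbors are drawn separately from the source and target samples, and their votes are combined with different weights. The neighbor counts (`k_p`, `k_q`) and vote weights (`w_p`, `w_q`) are hypothetical placeholders, not the rate-optimal choices established in the talk.

```python
# Sketch of a two-sample weighted K-NN classifier under posterior drift.
# k_p, k_q, w_p, w_q are illustrative defaults, not the optimal choices.
import numpy as np

def two_sample_knn(x, Xp, yp, Xq, yq, k_p=5, k_q=5, w_p=0.5, w_q=1.0):
    """Classify query x by a weighted vote of k_p source and k_q target neighbors."""
    # Nearest neighbors within the source sample (distribution P).
    dp = np.linalg.norm(Xp - x, axis=1)
    vote_p = yp[np.argsort(dp)[:k_p]].mean()
    # Nearest neighbors within the target sample (distribution Q).
    dq = np.linalg.norm(Xq - x, axis=1)
    vote_q = yq[np.argsort(dq)[:k_q]].mean()
    # Weighted combination of the two votes, thresholded at 1/2.
    score = (w_p * vote_p + w_q * vote_q) / (w_p + w_q)
    return int(score >= 0.5)

# Tiny synthetic example: both samples label the positive half-line as 1.
rng = np.random.default_rng(0)
Xp = rng.uniform(-1, 1, size=(200, 1))   # source sample
yp = (Xp[:, 0] > 0).astype(int)
Xq = rng.uniform(-1, 1, size=(50, 1))    # smaller target sample
yq = (Xq[:, 0] > 0).astype(int)
pred = two_sample_knn(np.array([0.5]), Xp, yp, Xq, yq)
```

In the posterior drift setting, the weight given to source neighbors would reflect how closely the source regression function tracks the target one; here both weights are fixed by hand purely for illustration.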