Abstract Details

Activity Number: 96 - New Statistical Methods with Distributed and Parallel Algorithms
Type: Invited
Date/Time: Monday, July 31, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322164
Title: Embracing Blessing of Massive Scale in Big Data
Author(s): Guang Cheng* and Stanislav Volgushev and Shih-Kang Chao
Companies: Purdue and Univ of Toronto and Purdue
Keywords: B-spline estimation; conditional distribution function; distributed computing; divide-and-conquer; quantile regression process

The increased availability of massive datasets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different quantile levels in a parallel computing environment; (ii) construct the quantile process through projection based on these estimated quantile curves. This approach attempts to exploit most aspects of the conditional distribution of the response, and does not assume homoskedastic errors or sub-Gaussian tails. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy, provided that the number of distributed computing units and the number of quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective.
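
The two-step procedure can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the sub-sample estimates are combined by simple averaging, the projection across quantile levels uses a low-degree polynomial basis in place of B-splines, the "computing units" are simulated by a sequential loop, and the quantile regression on each block is solved by brute force (for one covariate plus an intercept, an optimal fit passes through two data points). All function names and the simulated data are hypothetical.

```python
import random

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_linear_quantile(xs, ys, tau):
    """Brute-force quantile regression of y on (1, x).

    Tries every line through two data points and keeps the one with
    the smallest total check loss. Cubic cost: toy-sized blocks only.
    """
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue
            slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
            intercept = ys[i] - slope * xs[i]
            loss = sum(check_loss(y - intercept - slope * x, tau)
                       for x, y in zip(xs, ys))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]

def divide_and_conquer(xs, ys, taus, n_units):
    """Step (i): fit every quantile level on every block, then average
    the block estimates (averaging is an assumption of this sketch)."""
    blocks = [(xs[s::n_units], ys[s::n_units]) for s in range(n_units)]
    averaged = []
    for tau in taus:
        fits = [fit_linear_quantile(bx, by, tau) for bx, by in blocks]
        averaged.append(tuple(sum(f[k] for f in fits) / n_units
                              for k in range(2)))
    return averaged

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def project(taus, values, degree=2):
    """Step (ii): least-squares projection of the estimates at the grid
    of quantile levels onto polynomials in tau (stand-in for B-splines)."""
    basis = [[t ** d for d in range(degree + 1)] for t in taus]
    gram = [[sum(row[i] * row[j] for row in basis)
             for j in range(degree + 1)] for i in range(degree + 1)]
    rhs = [sum(row[i] * v for row, v in zip(basis, values))
           for i in range(degree + 1)]
    coef = solve(gram, rhs)
    return lambda t: sum(c * t ** d for d, c in enumerate(coef))

# Simulated heteroskedastic data (hypothetical): noise scale grows with x.
random.seed(0)
xs = [random.random() for _ in range(120)]
ys = [1.0 + 2.0 * x + (0.5 + x) * random.gauss(0, 1) for x in xs]

taus = [0.1, 0.3, 0.5, 0.7, 0.9]
coefs = divide_and_conquer(xs, ys, taus, n_units=3)
intercept_at = project(taus, [c[0] for c in coefs])
slope_at = project(taus, [c[1] for c in coefs])

# Coefficients at a quantile level that was not on the original grid.
print(intercept_at(0.6), slope_at(0.6))
```

In this sketch the number of blocks (`n_units`) and the number of grid points in `taus` play the roles of the two tuning quantities for which the abstract reports sharp bounds; the values used here are arbitrary and are not derived from those bounds.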

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association