
Abstract Details

Activity Number: 519 - Sparse Statistical Learning
Type: Contributed
Date/Time: Wednesday, August 2, 2017 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323558
Title: Variable Selection for Massive Data Using the Divide And
Author(s): Lei Yang* and Yixin Fang and Junhui Wang and Yongzhao Shao
Companies: New York University and New Jersey Institute of Technology and CityU and New York University-School of Medicine
Keywords: Divide-and-conquer; Lasso; SCAD; Massive data; Variable selection
Abstract:

Variable selection is an important problem in data analysis, especially in massive data analysis. Many variable selection methods, such as Lasso and SCAD, have been developed for high-dimensional data. However, their high computational cost makes these methods impractical for variable selection in massive data analysis. This manuscript introduces a framework that makes these methods practical for massive data via the divide-and-conquer approach. The approach randomly divides the massive data into subgroups, to each of which an existing variable selection method can be applied efficiently, and then finds the set of informative variables that is selected most frequently across the subgroups. The proposed variable selection method significantly reduces the computational cost when applied to massive data. Some asymptotic properties, such as selection consistency and the convergence rate, are derived. The performance of the proposed method is evaluated via simulation studies and real-data applications.
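The divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the aggregation rule in detail, so the code assumes a simple vote in which each random subgroup reports its selected variable set and the most frequent set wins. The function names (`lasso_cd`, `dc_select`), the fixed penalty `lam`, the plain coordinate-descent Lasso solver, and the simulated data are all assumptions made for illustration only.

```python
import random
from collections import Counter


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb; equals y while b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def dc_select(X, y, n_blocks, lam, seed=0):
    """Split rows at random into n_blocks subgroups, run the Lasso on each,
    and return the most frequently selected variable set (assumed rule)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    blocks = [idx[k::n_blocks] for k in range(n_blocks)]
    votes = Counter()
    for block in blocks:
        beta = lasso_cd([X[i] for i in block], [y[i] for i in block], lam)
        support = frozenset(j for j, b in enumerate(beta) if b != 0.0)
        votes[support] += 1
    return votes.most_common(1)[0][0], votes


# Simulated example (hypothetical data): only variables 0 and 3 are informative.
rng = random.Random(1)
n, p = 2000, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 1.0) for row in X]

selected, votes = dc_select(X, y, n_blocks=10, lam=0.3)
print(sorted(selected), dict(votes))
```

Each subgroup fit touches only a fraction of the rows, so the fits could in principle run in parallel or on separate machines. In practice the penalty would be tuned per subgroup rather than fixed, and SCAD or another selector could replace the Lasso step.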


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association