Online Program

All Times ET

Program is Subject to Change

Thursday, June 17
Thu, Jun 17, 1:30 PM - 3:30 PM
Modernization Efforts in Establishment Statistics 2

Score-Matching Representative Approach for Big Data Analysis and Its Extension (308043)

*Keren Li, Northwestern University 
Jie Yang, University of Illinois at Chicago 

Keywords: Big data regression, Divide and conquer, Distributed database, Mean representative approach, Variable selection, Model selection

We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. Given a partition of a big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. In terms of efficiency, its accuracy in estimating parameters is better than that of the divide-and-conquer method. Based on comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models, or generalized linear models with a flat inverse link function and moderate coefficients of continuous variables, we recommend mean representatives (MR). For other cases, we recommend score-matching representatives (SMR). The proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers. We also discuss how to perform model selection and variable selection within the SMR framework.
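As a rough illustration of the mean representative idea for a linear model, the sketch below collapses each data block to the mean of its covariates and response, then fits the model on those block means. It is not the authors' implementation: the function names `mean_representatives` and `fit_weighted_ols` are hypothetical, and weighting each representative by its block size is an assumption made here for the example. The score-matching variant is not covered.

```python
import numpy as np


def mean_representatives(X, y, block_ids):
    """Collapse each block to one point: the mean of its rows.

    Returns the block-mean covariates, block-mean responses, and block sizes.
    """
    block_ids = np.asarray(block_ids).ravel()
    _, inverse, counts = np.unique(
        block_ids, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    n_blocks = counts.shape[0]

    # Sum rows within each block, then divide by the block size.
    X_sum = np.zeros((n_blocks, X.shape[1]))
    np.add.at(X_sum, inverse, X)
    y_sum = np.bincount(inverse, weights=y, minlength=n_blocks)
    return X_sum / counts[:, None], y_sum / counts, counts


def fit_weighted_ols(X_rep, y_rep, weights):
    """Weighted least squares with an intercept on the representatives.

    Assumption for this sketch: each representative is weighted by its
    block size.
    """
    design = np.column_stack([np.ones(len(y_rep)), X_rep])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(
        design * sqrt_w[:, None], y_rep * sqrt_w, rcond=None
    )
    return coef  # [intercept, slope_1, ..., slope_p]
```

In a distributed setting, each machine would only need to send its block mean and block size to a coordinator, which then calls `fit_weighted_ols` on a dataset with one row per block rather than one row per observation.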