Abstract Details

Activity Number: 588 - Statistical Learning: Clustering
Type: Contributed
Date/Time: Wednesday, August 2, 2017 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322678
Title: Two-Layer Heterogeneity Model for Massive Data
Author(s): Ching-Wei Cheng* and Guang Cheng
Companies: Purdue University and Purdue University
Keywords: Confidence distribution; Fusion penalty; Linear mixed-effects model; Massive data
Abstract:

Massive data generally consist of numerous heterogeneous datasets, some of which may be similar enough to be regarded as drawn from the same sub-population. In this paper, we characterize this subtle data structure by proposing a Two-layer HEterogeneity Model (THEM) framework that accounts for heterogeneity both among sub-populations and within each sub-population. Under this framework, we propose a confidence distribution fusion approach that discovers the underlying sub-population structure and further achieves the highest statistical inferential accuracy, as if the true sub-population structure were known. This statistical analysis tool can be implemented efficiently in parallel through the alternating direction method of multipliers (ADMM). Finally, the proposed methodology is applied to a large climate dataset, where it reveals a possible association with the El Niño-Southern Oscillation.
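
As a rough illustration of how a fusion penalty combined with the alternating direction method of multipliers can recover sub-population structure, the sketch below fuses scalar per-dataset estimates. It is not the authors' confidence distribution fusion procedure: it assumes a plain squared-error loss and an L1 penalty on all pairwise differences, and the function name fuse_means_admm and its parameters lam, rho, and n_iter are hypothetical choices made for this example.

```python
import numpy as np
from itertools import combinations


def fuse_means_admm(y, lam, rho=1.0, n_iter=2000):
    """Minimize 0.5 * sum_i (y_i - mu_i)^2 + lam * sum_{i<j} |mu_i - mu_j|.

    Simplified stand-in for a fusion-penalized estimator (assumption:
    scalar estimates, L1 pairwise penalty, scaled-form ADMM).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    pairs = list(combinations(range(n), 2))

    # D maps mu to the vector of all pairwise differences mu_i - mu_j.
    D = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        D[row, i], D[row, j] = 1.0, -1.0

    A = np.eye(n) + rho * D.T @ D   # fixed matrix of the mu-update system
    mu = y.copy()
    d = D @ mu                      # auxiliary copy of the differences
    u = np.zeros(len(pairs))        # scaled dual variable

    for _ in range(n_iter):
        # mu-update: a ridge-like linear solve.
        mu = np.linalg.solve(A, y + rho * D.T @ (d - u))
        # d-update: elementwise soft-thresholding, independent per pair.
        v = D @ mu + u
        d = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual update on the constraint D mu = d.
        u = u + D @ mu - d
    return mu


# Six per-dataset estimates forming two loose groups.
estimates = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
fused = fuse_means_admm(estimates, lam=0.2)
print(np.round(fused, 3))
```

Entries of the result that coincide up to numerical tolerance can be read as belonging to the same estimated sub-population. The soft-thresholding step acts on each pair separately, which is the part of such a scheme that lends itself to parallel execution.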


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association