Abstract Details

Activity Number: 183 - SPEED: Bayesian Methods Student Awards
Type: Contributed
Date/Time: Monday, July 31, 2017 : 10:30 AM to 11:15 AM
Sponsor: Section on Bayesian Statistical Science
Abstract #325145
Title: A Bayesian Mixture Model for Clustering and Selection of Feature Occurrence Rates Under Mean Constraints
Author(s): Qiwei Li* and Michele Guindani and Brian Reich and Howard Bondell and Marina Vannucci
Companies: Rice University and University of California, Irvine and NCSU and NC State University and Rice University
Keywords: count data ; Bayesian nonparametrics ; Poisson mixture ; feature selection ; text analysis
Abstract:

In this paper, we consider the problem of modeling a matrix of count data, where multiple features are observed as counts over a number of samples. Owing to the data-generating mechanism, such data often exhibit a large proportion of zeros and overdispersion. To account for the skewness and heterogeneity of the data, some form of normalization and regularization is necessary for inference on the occurrences of features across samples. We propose a zero-inflated Poisson mixture modeling framework that incorporates model-based normalization through prior distributions with mean constraints, together with a feature selection mechanism that identifies a parsimonious set of discriminatory features while simultaneously clustering the samples into homogeneous groups. Using simulated data, we show that our approach improves clustering accuracy relative to more standard approaches for the analysis of count data. We also present an application to a bag-of-words benchmark data set, where the features are the frequencies of occurrence of each word.
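As a rough illustration of why zero inflation produces both excess zeros and overdispersion, the following sketch evaluates a generic zero-inflated Poisson distribution. It is not the authors' model: the function names `zip_pmf` and `zip_summary`, the parameter values, and the truncation point are all assumptions chosen for illustration, and the normalization, clustering, and feature selection components described in the abstract are not included.

```python
import math


def zip_pmf(y, rate, pi_zero):
    """P(Y = y) for a zero-inflated Poisson (illustrative helper, not from the paper).

    With probability pi_zero the count is a structural zero;
    otherwise it is drawn from Poisson(rate).
    """
    if y < 0:
        return 0.0
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))
    return pi_zero * (y == 0) + (1.0 - pi_zero) * poisson


def zip_summary(rate, pi_zero, upper=200):
    """Return (P(Y = 0), mean, variance), summing the pmf over 0..upper-1.

    The truncation at `upper` is an assumption that holds when rate is small
    relative to upper, so the neglected tail mass is negligible.
    """
    probs = [zip_pmf(y, rate, pi_zero) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return probs[0], mean, var


# Hypothetical parameter values chosen only for illustration.
p_zero, mean, var = zip_summary(rate=3.0, pi_zero=0.4)

# A plain Poisson with the same mean would put exp(-mean) mass at zero
# and would have variance equal to its mean.
poisson_p_zero = math.exp(-mean)

print(f"ZIP:     P(0) = {p_zero:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
print(f"Poisson: P(0) = {poisson_p_zero:.4f}, mean = variance = {mean:.4f}")
```

Under these assumed values the zero-inflated distribution places more mass at zero than a Poisson with the same mean, and its variance exceeds its mean, which are the two features of count data the abstract highlights.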


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 