Abstract Details

Activity Number: 88
Type: Invited
Date/Time: Sunday, July 31, 2016, 6:00 PM to 8:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319281
Title: Cascaded High-Dimensional Histograms: A Generative Approach to Density Estimation
Author(s): Siong Thye Goh* and Cynthia Rudin
Companies: MIT and Duke University
Keywords: Cascaded Histograms; Nonparametric Density Estimation; Interpretable Models; Density List; Generative Model

We present density estimation methods for high-dimensional binary and categorical data. Our density estimation models are tree- or list-structured and can be visualized, giving an interpretable representation of the data. The methods are high-dimensional analogues of variable-bin-width histograms: within each leaf of the tree (or list) the density is constant, just as the density is flat within each bin of a histogram. Histograms, however, are difficult to visualize in higher dimensions, whereas our models are not. Moreover, the accuracy of histograms degrades as the dimension increases, whereas our models use priors that aid generalization. We present three generative models. The first lets the user specify the desired number of leaves in the tree through a Bayesian prior. The second lets the user specify the desired number of branches through the prior. The third, which returns a list, lets the user specify the desired number of rules and the desired rule length through the prior. Our results indicate that the new approach achieves a better balance between sparsity and accuracy of the density estimates than other methods for this task.
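To make the histogram analogy concrete, here is a minimal Python sketch of a density list over binary data. It assumes the list is already given as an ordered set of conjunctive rules followed by a default "else" leaf. Each point is routed to the first rule it satisfies, and the density in a leaf is the fraction of training points in that leaf divided by the number of cells of {0,1}^d the leaf covers, just as a histogram divides a bin's relative count by its width. The names `leaf_index`, `fit_density_list`, and `density`, as well as the rule encoding, are hypothetical. The sketch uses plain maximum-likelihood leaf values and omits both the Bayesian priors and the search over trees or lists, so it is not the authors' method.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical rule encoding: a conjunction of "feature i takes value v" conditions.
Rule = Dict[int, int]


def leaf_index(x: Sequence[int], rules: List[Rule]) -> int:
    """Index of the first rule x satisfies; len(rules) is the default 'else' leaf."""
    for j, rule in enumerate(rules):
        if all(x[i] == v for i, v in rule.items()):
            return j
    return len(rules)


def fit_density_list(data: List[Sequence[int]], rules: List[Rule], d: int) -> List[float]:
    """Maximum-likelihood constant density per leaf for binary vectors of length d."""
    n_leaves = len(rules) + 1
    counts = [0] * n_leaves
    for x in data:
        counts[leaf_index(x, rules)] += 1
    # Leaf "volume" = number of cells of {0,1}^d routed to it (brute force, small d only).
    volumes = [0] * n_leaves
    for cell in product((0, 1), repeat=d):
        volumes[leaf_index(cell, rules)] += 1
    n = len(data)
    # Histogram-style estimate: (fraction of data in leaf) / (leaf volume).
    return [c / n / v if v > 0 else 0.0 for c, v in zip(counts, volumes)]


def density(x: Sequence[int], rules: List[Rule], probs: List[float]) -> float:
    """Probability mass per cell at x: the constant value of x's leaf."""
    return probs[leaf_index(x, rules)]


if __name__ == "__main__":
    rules = [{0: 1}, {1: 1}]  # "if x0 = 1", "else if x1 = 1", "else"
    data = [(1, 0, 0), (1, 1, 1), (1, 0, 1), (0, 1, 0)]
    print(fit_density_list(data, rules, d=3))  # [0.1875, 0.125, 0.0]
```

Enumerating all 2^d cells keeps the example short but is feasible only for small d. In a tree whose leaves are disjoint conjunctions, a leaf that fixes k of the d binary features covers exactly 2^(d-k) cells, so its volume can be computed directly. Note also that a leaf with no training points receives density zero here, which illustrates the kind of overfitting that priors or smoothing can help counter.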

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association