Online Program

Saturday, October 21
Sat, Oct 21, 7:30 AM - 8:30 AM
Aventine Ballroom G
Continental Breakfast and Speed Poster 4 sponsored by Bank of America

Coherent Set Mining in Binary Data (304092)

*Kelly Nicole Bodwin, Cal Poly San Luis Obispo 

A common problem in statistical data mining is to identify groups of variables, often from a much larger pool, that are strongly associated. We introduce Coherent Set Mining (CSM), a new method of association mining in high-dimensional binary data. CSM uses an iterative, testing-based procedure to extract statistically significant sets of associated variables. Our approach relies on a new measure of association, coherence, which captures latent relationships between variables when the data consist of thresholded sample observations. We propose an estimator of coherence based on a null model and corresponding consistent parameter estimators, and we derive significance tests for coherence from asymptotic results. We demonstrate the effectiveness of CSM through applications to market basket data, text mining, and music recommendation.