Online Program

Friday, October 20
Fri, Oct 20, 2:30 PM - 3:45 PM
Aventine Ballroom E
Speed Session 4

Coherent Set Mining in Binary Data (303898)

*Kelly Nicole Bodwin, Cal Poly San Luis Obispo 
Suman Chakraborty, UNC Chapel Hill 
Andrew B. Nobel, UNC Chapel Hill 
Kai Zhang, UNC Chapel Hill 

Keywords: data mining, association mining, latent correlation, binary data

A common problem in statistical data mining is to identify groups of variables, often from a much larger pool, that are strongly associated. We introduce Coherent Set Mining (CSM), a new method of association mining in high-dimensional binary data. CSM uses an iterative, testing-based procedure to extract significantly associated variable sets. Our approach relies on a new measure of association, coherence, which captures latent relationships between variables when the data consist of thresholded sample observations. We propose an estimator of coherence based on a null model and corresponding consistent parameter estimators, and we derive significance tests for coherence from asymptotic results. We demonstrate the effectiveness of CSM through applications to market basket data, text mining, and music recommendation.