JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 176
Type: Contributed
Date/Time: Monday, July 30, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #305535
Title: Detecting Novel Bivariate Associations in Large Data Sets
Author(s): Yakir Reshef*+ and David Reshef and Hilary Kiyo Finucane and Sharon Grossman and Gilean McVean and Peter Turnbaugh and Eric Lander and Michael Mitzenmacher and Pardis Sabeti
Companies: and Harvard/MIT Division of Health Science and Technology and Weizmann Institute of Science and Harvard/MIT Division of Health Science and Technology and Oxford University and Harvard University and Broad Institute of MIT and Harvard and Harvard University and Harvard University
Address: 92 Springbrook Rd., Livingston, NJ, 07039, United States
Keywords: data exploration ; dependence ; association ; data mining ; exploratory data analysis

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. One way of doing so is to search such data sets for pairs of variables that are closely associated. This can be done by calculating some measure of dependence for each pair, ranking the pairs by their scores, and examining the top-scoring pairs. We outline two heuristic properties--generality and equitability--that the statistic we use to measure dependence should have in order for such a strategy to be effective.
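The score-rank-examine strategy described above can be sketched as follows. This is a minimal illustration only: the function names, the toy data, and the use of absolute Pearson correlation as the dependence score are all assumptions made for the example. Pearson correlation is a stand-in, not the statistic proposed in this abstract, and it detects only linear trends.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Stand-in dependence score: |Pearson r|. Not MIC; sensitive to linear trends only."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable gives no association signal
    return abs(cov / sqrt(var_x * var_y))


def rank_pairs(data, score=abs_pearson, top_k=None):
    """Score every unordered pair of variables and return (score, a, b) best-first."""
    scored = [
        (score(data[a], data[b]), a, b)
        for a, b in combinations(sorted(data), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]  # top_k=None keeps the full ranking


# Hypothetical toy data set: variable name -> list of observations.
data = {
    "u": [1, 2, 3, 4, 5, 6],
    "v": [2, 4, 6, 8, 10, 12],  # exact linear function of u
    "w": [3, 1, 4, 1, 5, 9],    # no simple relation to u or v
}

ranking = rank_pairs(data)
for s, a, b in ranking:
    print(f"{a}-{b}: {s:.3f}")
```

Any other dependence measure can be passed through the `score` argument without changing the ranking logic. How useful the ranking is depends entirely on that choice, which is why the properties required of the statistic matter.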

We present a measure of dependence for two-variable relationships, the maximal information coefficient (MIC), that appears to have these properties. MIC captures a wide range of associations both functional and not (generality), and assigns similar scores to relationships with similar noise levels, regardless of relationship type (equitability). Finally, we show that MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships.

The address information is for the authors who have a + after their name.
Authors who are presenting talks have a * after their name.
