
All Times EDT

Abstract Details

Activity Number: 167 - Data Mining and Econometrics
Type: Contributed
Date/Time: Tuesday, August 10, 2021, 10:00 AM to 11:50 AM
Sponsor: Business and Economic Statistics Section
Abstract #318385
Title: Ranking Interestingness Scores for Overdispersed and Heteroskedastic Data at Scale
Author(s): Serge Sverdlov*
Companies: Microsoft Corporation
Keywords: interestingness score; residual ranking; data mining; overdispersion; heteroskedasticity; subgroup analysis
Abstract:

We develop statistically interpretable counterparts to interestingness measures from the data mining literature, on both Bayesian and frequentist foundations. We also develop scalable estimation procedures for the corresponding statistics, focusing on stable distributions, overdispersion parameters for count data, and heteroskedasticity parameters for continuous data. We illustrate a method, most interesting subgroup prediction, that connects interestingness measures to decision tree and random forest methods, especially for risk prediction.
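
As a rough illustration of the kind of statistic involved, and not the author's procedure, the sketch below ranks subgroups of count data by a residual that is standardized with an estimated overdispersion factor. It assumes a pooled-rate null model and a quasi-Poisson dispersion estimate (Pearson chi-square divided by degrees of freedom). The function name `rank_by_adjusted_residual`, the input layout, and the example data are hypothetical.

```python
import math


def rank_by_adjusted_residual(groups):
    """Rank subgroups by an overdispersion-adjusted Pearson residual.

    groups: list of (name, observed_count, exposure) tuples (hypothetical layout).
    Returns (phi, ranked), where phi is the estimated dispersion factor and
    ranked is a list of (name, score) sorted by decreasing |score|.
    """
    if len(groups) < 2:
        raise ValueError("need at least two subgroups to estimate dispersion")

    total_obs = sum(obs for _, obs, _ in groups)
    total_exposure = sum(exposure for _, _, exposure in groups)
    if total_obs <= 0 or any(exposure <= 0 for _, _, exposure in groups):
        raise ValueError("counts and exposures must be positive")

    # Null model (assumption): every subgroup shares one pooled rate.
    pooled_rate = total_obs / total_exposure
    expected = [pooled_rate * exposure for _, _, exposure in groups]

    # Quasi-Poisson dispersion estimate: Pearson chi-square / degrees of freedom.
    pearson = sum((obs - mu) ** 2 / mu
                  for (_, obs, _), mu in zip(groups, expected))
    phi = max(1.0, pearson / (len(groups) - 1))  # never deflate below Poisson

    # Residuals scaled by the inflated variance phi * mu.
    scored = [(name, (obs - mu) / math.sqrt(phi * mu))
              for (name, obs, _), mu in zip(groups, expected)]
    ranked = sorted(scored, key=lambda item: abs(item[1]), reverse=True)
    return phi, ranked


# Made-up example data: (subgroup, event count, exposure).
example = [("A", 10, 100), ("B", 30, 100), ("C", 20, 100)]
phi, ranked = rank_by_adjusted_residual(example)
for name, score in ranked:
    print(name, round(score, 3))
```

Dividing by the dispersion factor keeps noisy, overdispersed data from producing inflated scores; a heteroskedastic continuous analogue would replace `phi * mu` with a per-group variance estimate.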

We present applications to traditional market basket analysis, genomic risk prediction, clinical trial subgroup analysis, and geographic variability in COVID-19 incidence.


Authors who are presenting talks have a * after their name.
