JSM 2015 Preliminary Program

Abstract Details

Activity Number: 312
Type: Contributed
Date/Time: Tuesday, August 11, 2015 : 8:30 AM to 10:20 AM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #316415
Title: Examining Model Fit for Logistic Regression on Large Data Sets
Author(s): Todd Connelly*
Companies:
Keywords: Big Data; Goodness of Fit; Logistic Regression
Abstract:

The Hosmer-Lemeshow test (HLT) is commonly used as a goodness-of-fit test for logistic regression. However, it is overpowered in medium (100,000 to 500,000 observations) and large (1 million or more observations) datasets: with that many observations, it rejects even trivial departures from fit. Recent research [Paul, Pennell, Lemeshow 2012] proposes to address this by increasing the number of groups used in the HLT, which disperses the power. This extends the HLT to datasets of up to 25,000 observations. Yet in today's world of big data, we need to be able to assess the fit of logistic regression models on much larger datasets. We propose a bootstrapping approach to obtain a modified HLT (mHLT) statistic. Several point estimates are considered as candidates for the mHLT, including the median, a trimmed mean, and the 5th and 95th percentiles.
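
The abstract does not spell out the resampling scheme, so the following is only a minimal sketch of one way a bootstrapped HLT summary could be computed. The function names (hosmer_lemeshow_stat, bootstrap_hl), the resample size of 5,000, the 200 replicates, the 10% trim, and the simulated data are all assumptions made for illustration, not the author's procedure.

```python
import numpy as np


def hosmer_lemeshow_stat(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-size risk groups."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)  # sort observations by predicted probability
    stat = 0.0
    for idx in np.array_split(order, groups):
        n = idx.size
        observed = y[idx].sum()   # observed events in the group
        expected = p[idx].sum()   # expected events in the group
        denom = expected * (1.0 - expected / n)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def bootstrap_hl(y, p, n_boot=200, sample_size=5000, groups=10, trim=0.1, seed=0):
    """Summarise the HL statistic over bootstrap resamples.

    Assumption: each resample draws `sample_size` rows with replacement,
    which keeps every replicate at a size where the HLT is not overpowered.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    p = np.asarray(p)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, y.size, size=sample_size)
        stats[b] = hosmer_lemeshow_stat(y[idx], p[idx], groups)
    stats.sort()
    k = int(trim * n_boot)  # replicates trimmed from each tail
    return {
        "median": float(np.median(stats)),
        "trimmed_mean": float(stats[k:n_boot - k].mean()),
        "p05": float(np.percentile(stats, 5)),
        "p95": float(np.percentile(stats, 95)),
    }


# Simulated, well-calibrated data (hypothetical; not from the abstract).
demo_rng = np.random.default_rng(1)
x = demo_rng.normal(size=20000)
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x)))
y_sim = demo_rng.random(20000) < p_true
summary = bootstrap_hl(y_sim, p_true)
print(summary)
```

Each summary value could then be compared with a chi-square reference distribution on groups - 2 degrees of freedom. Which of the four summaries works best as the mHLT is the question the abstract raises, and this sketch does not answer it.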


Authors who are presenting talks have a * after their name.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.