312 – Advanced Topics in Statistical Programming
Examining Model Fit for Logistic Regression on Large Data Sets
Todd Connelly
Trent L. Lalonde, PhD
University of Northern Colorado
The Hosmer-Lemeshow test (HLT) is commonly used as a goodness-of-fit test for logistic regression. However, it is overpowered in medium (100,000 to 500,000 observations) and large (1 million or more observations) datasets, where it tends to reject models for even trivial departures from fit. Recent research [Paul, Pennell, Lemeshow 2012] proposes addressing this by increasing the number of groups used in the HLT to disperse the power, which extends the test to datasets of up to 25,000 observations. Yet in today's world of big data, we need to assess the fit of logistic regression models on far larger datasets. We propose a bootstrapping approach to obtain a modified HLT (mHLT) statistic. Several point estimates are considered as candidates for the mHLT, including the median, a trimmed mean, and the 5th and 95th percentiles.
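To make the idea concrete, the following is a minimal Python sketch of one way a bootstrapped HLT summary could be computed. It is not the authors' implementation: the abstract does not specify the resampling scheme, so the function names (`hosmer_lemeshow`, `bootstrap_hl`), the bootstrap sample size of 1,000, the 200 replicates, the 10 groups, the 10% trimming fraction, and the choice to reuse the fitted probabilities rather than refit the model on each resample are all assumptions made for illustration.

```python
import random
import statistics


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    y: observed 0/1 outcomes; p: fitted probabilities from the model.
    Observations are sorted by fitted probability and split into
    roughly equal-sized groups.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p_i for p_i, _ in chunk)   # expected events in group
        observed = sum(y_i for _, y_i in chunk)   # observed events in group
        # n_g * pbar * (1 - pbar), written in terms of the expected count
        denom = expected * (1.0 - expected / n_g)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def bootstrap_hl(y, p, n_boot=200, sample_size=1000, groups=10,
                 trim=0.1, seed=0):
    """Summarize the HL statistic over bootstrap samples.

    Assumption: each replicate draws `sample_size` observations with
    replacement and reuses the original fitted probabilities.
    """
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(sample_size)]
        stats.append(hosmer_lemeshow([y[i] for i in idx],
                                     [p[i] for i in idx], groups))
    stats.sort()
    k = int(trim * n_boot)  # replicates trimmed from each tail
    return {
        "median": statistics.median(stats),
        "trimmed_mean": statistics.mean(stats[k:n_boot - k]),
        # simple order-statistic approximations to the percentiles
        "p05": stats[int(0.05 * (n_boot - 1))],
        "p95": stats[int(0.95 * (n_boot - 1))],
    }


# Simulated, well-calibrated data: outcomes are drawn from the same
# probabilities that stand in for the model's fitted values.
rng = random.Random(42)
p_hat = [rng.random() for _ in range(20000)]
y_obs = [1 if rng.random() < p_i else 0 for p_i in p_hat]
print(bootstrap_hl(y_obs, p_hat))
```

Each candidate point estimate could then be compared with the conventional HLT reference distribution, a chi-square with (groups - 2) degrees of freedom. Because every replicate uses a fixed, moderate sample size, the power of each individual test no longer grows with the size of the full dataset.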