Saturday, February 25
CS24 It's a Package Deal Sat, Feb 25, 11:00 AM - 12:30 PM
City Terrace 9

Logistic Regression Cross-Package Comparison (303375)

*Lillian Ma, Capital One Bank 

Keywords: Logistic Regression, R, Python, H2O, Spark MLlib

In this paper, we compare the logistic regression implementations available in several widely used packages: SAS PROC LOGISTIC, R glm, R glmnet, H2O GLM, Python sklearn LogisticRegression and SGDClassifier, Python statsmodels discrete_model Logit, and Spark MLlib Logistic Regression.

Using datasets simulated from a preset logistic relationship, we test how well each package recovers the true relationship between the binary (0/1) target variable and the predictor features, and how the tuning methods and solver choices each package offers affect computation time and model outcomes. This paper also addresses the differences in data preparation and algorithm implementation among these packages.
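The simulation idea can be illustrated with a minimal sketch. This is not the authors' code: the coefficient values, sample size, and the function names `simulate` and `fit_newton` are hypothetical, and a plain Newton-Raphson maximum likelihood fit in pure Python stands in for the packages named above so that the example runs without any of them installed.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def simulate(n, beta0, beta1, seed=0):
    """Draw x ~ N(0, 1) and y ~ Bernoulli(sigmoid(beta0 + beta1 * x))."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(beta0 + beta1 * x) else 0 for x in xs]
    return xs, ys

def fit_newton(xs, ys, max_iter=50, tol=1e-10):
    """Unpenalized maximum likelihood fit (intercept, slope) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Hessian of the negative log-likelihood (2x2, symmetric).
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical "true" coefficients chosen only for illustration.
TRUE_BETA = (-0.5, 1.5)
xs, ys = simulate(20000, *TRUE_BETA, seed=42)
est = fit_newton(xs, ys)
print("true:", TRUE_BETA, "estimated:", est)
```

Because the generating coefficients are known, the gap between `est` and `TRUE_BETA` gives a direct check on any fitting routine. To compare packages in the same way, one would swap `fit_newton` for each package's fit call.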

Our main contributions are:
• Addressing the differences in data preparation and model fit among popular logistic regression computational packages.
• Listing and comparing the different solvers and tuning methods provided by each tool and demonstrating their impact on model performance.
• Using a simulated dataset to test the functionality of the logistic regression packages provided by different tools.
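To make the solver and tuning comparison concrete, the following sketch fits the same simulated data with two solvers (Newton-Raphson and fixed-step gradient descent) and once more with an L2 penalty on the slope. It is an illustration under stated assumptions, not the paper's benchmark: the data settings, the `fit` function, and its `solver`, `l2`, and `lr` parameters are all hypothetical and are not the API of any package named above.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_data(n=2000, beta0=-0.5, beta1=1.5, seed=1):
    """Simulate x ~ N(0, 1), y ~ Bernoulli(sigmoid(beta0 + beta1 * x))."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(beta0 + beta1 * x) else 0 for x in xs]
    return xs, ys

def grad_and_hess(b0, b1, xs, ys, l2):
    """Gradient and Hessian of mean negative log-likelihood + (l2 / 2) * b1**2."""
    n = len(xs)
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += p - y
        g1 += (p - y) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0 / n, g1 / n + l2 * b1), (h00 / n, h01 / n, h11 / n + l2)

def fit(xs, ys, solver="newton", l2=0.0, lr=1.0, tol=1e-6, max_iter=5000):
    """Return (intercept, slope, number of updates) for the chosen solver."""
    b0 = b1 = 0.0
    for it in range(1, max_iter + 1):
        (g0, g1), (h00, h01, h11) = grad_and_hess(b0, b1, xs, ys, l2)
        if math.hypot(g0, g1) < tol:
            return b0, b1, it - 1
        if solver == "newton":
            # Second-order step: solve the 2x2 system H * step = g.
            det = h00 * h11 - h01 * h01
            b0 -= (h11 * g0 - h01 * g1) / det
            b1 -= (h00 * g1 - h01 * g0) / det
        elif solver == "gd":
            # First-order step with a fixed learning rate.
            b0 -= lr * g0
            b1 -= lr * g1
        else:
            raise ValueError("unknown solver: " + solver)
    return b0, b1, max_iter

xs, ys = make_data()
newton = fit(xs, ys, solver="newton")
gd = fit(xs, ys, solver="gd")
ridge = fit(xs, ys, solver="newton", l2=0.1)
for name, (b0, b1, iters) in [("newton", newton), ("gd", gd), ("newton + L2", ridge)]:
    print(f"{name:12s} intercept={b0:.3f} slope={b1:.3f} iterations={iters}")
```

Both solvers minimize the same objective, so their unpenalized estimates should agree closely while their iteration counts differ. The penalized fit shrinks the slope toward zero. This is one reason a package that applies regularization by default can report different coefficients from one that does not.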