Abstract:
|
Electronic, longitudinal health databases provide a wealth of real-world clinical data that enable very large observational studies. Logistic regression is a staple model for predicting binary outcomes in a variety of clinical study designs. However, existing computational tools are inadequate in many situations involving large models. For example, propensity score estimation with logistic regression using thousands of covariates often involves statistical regularization, which requires computationally expensive cross-validation. Likewise, predicting outcomes in stratified data using conditional logistic regression, which avoids estimating nuisance parameters, becomes computationally infeasible for even moderately large strata. We present efficient, GPU-parallelized implementations of conditional and unconditional logistic regression that allow for extensively cross-validated models with many thousands of predictors. We compare our methods to existing software packages and propose extensions to other commonly used generalized linear models. We aim to remove computational burden as a barrier to using desired methods for massive problems in observational health research.
|