Abstract:
|
Logistic regression is arguably the most widely used and studied non-linear model in the statistical literature. Classical maximum likelihood theory provides asymptotic distributions for the maximum likelihood estimate (MLE) and the likelihood ratio test (LRT), which are universally used for inference. Our findings reveal, however, that when the number of features p and the sample size n both diverge, with the ratio p/n converging to a positive constant, classical results are far from accurate. For a certain class of logistic models, we observe that (1) the MLE is biased, (2) the variability of the MLE is much higher than the classical prediction based on the inverse Fisher information, and (3) the LRT is not distributed as a chi-squared. We develop a new theory that quantifies the asymptotic bias and variance of the MLE and characterizes the asymptotic distribution of the LRT. Empirical results demonstrate that our predictions are extremely accurate in finite samples. These novel predictions depend on the underlying regression coefficients only through a single scalar, the overall signal strength, which can be estimated efficiently. This is based on joint work with Emmanuel Candes and Yuxin Chen.
|