Abstract:

We propose a general framework for statistical inference in high-dimensional binary regression problems where the response is observed with noise. The noise distribution can be class-conditional, so that responses are flipped with different probabilities depending on their original classes. The problem can be viewed as a latent variable problem in which the original responses are hidden. A likelihood-based approach leads to optimal inference in the classical n >> p regime, but the associated optimization problem is non-convex. A generalized method of moments approach yields a convex formulation, but the resulting estimates are usually not optimal. We demonstrate the computational and theoretical advantages of each estimator and argue that we can take the best of both worlds. We show that our proposal leads to a computationally efficient estimation procedure that is asymptotically optimal in the classical regime and also achieves the optimal mean squared error rate in the high-dimensional regime. We also propose a method for hypothesis testing based on the constructed estimator. We empirically demonstrate that our method performs well in real data applications.
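The noise model described above can be illustrated with a small simulation. The sketch below is not the paper's method; it only generates data from the assumed observation process: latent binary responses from a logistic model, then class-conditional flips with hypothetical probabilities rho0 and rho1 (all parameter values here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 1000, 5          # classical regime: n >> p (illustrative sizes)
rho0, rho1 = 0.2, 0.1   # class-conditional flip probabilities (assumed values)

X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

# Latent (hidden) binary responses from a logistic regression model
prob = 1.0 / (1.0 + np.exp(-X @ beta))
y_true = rng.binomial(1, prob)

# Observed responses: flip with probability rho0 if y_true = 0, rho1 if y_true = 1
flip_prob = np.where(y_true == 1, rho1, rho0)
flips = rng.binomial(1, flip_prob)
y_obs = np.where(flips == 1, 1 - y_true, y_true)

# Empirical flip rates in each class should be close to (rho0, rho1)
rate0 = (y_obs[y_true == 0] == 1).mean()
rate1 = (y_obs[y_true == 1] == 0).mean()
print(rate0, rate1)
```

Only y_obs (and X) would be available to the analyst; y_true is latent, which is what makes the direct likelihood non-convex and motivates the moment-based convex relaxation discussed in the abstract.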
