Abstract:
|
This paper develops a statistical framework for predicting sub-national monthly anti-government violence in Mexico. Owing to the large number of observations and potential social predictors available for this highly imbalanced classification task, fitting a reasonable statistical or machine learning model is time- and memory-intensive, and several desirable models are not computationally feasible. To address these issues, we propose a bag of little over-sampled bootstraps (BLOB) approach that integrates Bayesian survey sampling and penalized high-dimensional logistic regression within a two-layered resampling scheme. This approach is computationally scalable and parallelizable, comes with a mathematical guarantee of asymptotic consistency for statistical inference, and provides scientifically meaningful measures of uncertainty for several quantities of interest. After outlining our proposed framework, we provide theorems and proofs for each of these claims, compare our framework's classification accuracy to that of several more established supervised classifiers, and substantively assess the key drivers of anti-government violence that our analysis identifies.
|