Abstract:
|
The National Incident-Based Reporting System (NIBRS) is used by law enforcement agencies to collect and report a variety of data on each crime incident known to the police. With over 8,000 agencies reporting millions of crimes each year, it is a large administrative data set comprising more than 40 tables. These tables can be joined to produce granular estimates of victim, offender, arrestee, and incident characteristics. However, some data items are missing or reported as “unknown.” In this presentation, we assume the data are missing at random and treat missing data through editing and imputation procedures, including a hot deck imputation approach. In particular, we use Multivariate Imputation by Chained Equations (MICE), as implemented in the R “mice” package, to impute missing demographic variables such as age, gender, and race. We present the computational process and the modeling challenges that arise when working with a large data set with a multilevel structure. The proposed methodology can be tailored to other administrative data or to survey data collected from establishments.
|
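As a rough illustration of the hot deck idea mentioned above, the following Python sketch fills a missing demographic value by copying it from a randomly chosen donor record in the same imputation class. It is not the presenters' procedure and it is not MICE. The function name `hot_deck_impute`, the field names, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, class_vars, seed=0):
    """Return copies of `records` with missing `target` values (None)
    filled from a random donor in the same imputation class."""
    rng = random.Random(seed)

    # Donor pools: observed target values grouped by imputation class.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors[key].append(rec[target])
    all_donors = [value for pool in donors.values() for value in pool]

    completed = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are not modified
        was_missing = new_rec[target] is None
        if was_missing:
            key = tuple(rec[v] for v in class_vars)
            # Assumption: fall back to all donors when the class has none.
            pool = donors.get(key) or all_donors
            if pool:
                new_rec[target] = rng.choice(pool)
        # Flag values that were actually filled in.
        new_rec[target + "_imputed"] = was_missing and new_rec[target] is not None
        completed.append(new_rec)
    return completed


# Hypothetical victim records; None marks a missing or "unknown" value.
victims = [
    {"offense": "assault", "age_group": "18-24", "sex": "M"},
    {"offense": "assault", "age_group": "18-24", "sex": None},
    {"offense": "burglary", "age_group": "25-34", "sex": "F"},
    {"offense": "burglary", "age_group": "25-34", "sex": None},
    {"offense": "fraud", "age_group": "65+", "sex": None},
]

completed = hot_deck_impute(victims, target="sex", class_vars=["offense", "age_group"])
```

In this toy example the second and fourth records each have exactly one donor in their class, so they receive that donor's value, while the last record has no donor in its class and draws from the pooled donors instead. A chained-equations approach such as MICE differs in that it models each incomplete variable conditionally on the others and iterates over the variables.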