Abstract:
|
In the big data era, Bayesian methods receive less attention because they are known to be computationally intensive for both parameter estimation and model selection, and the existing literature focuses mostly on approaches to speed up Markov chain Monte Carlo (MCMC). Deviance-based criteria such as the deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are well-known Bayesian tools for model selection. In this article, we propose subsampled versions of the DIC and of the information criterion ICAT of Ando and Tsay (2010) for the big data context. Under reasonable regularity conditions, we show that the proposed subsampled criteria closely approximate their full-data counterparts. Extensive simulation studies are conducted to evaluate the empirical performance of the proposed criteria. Their use is further illustrated with the analysis of two large datasets: the Public Use Microdata Sample (PUMS) data and the cover type data.
|