Abstract:
|
We introduce the normal-inverse-gamma (NIG) summation operator, which combines Bayesian regression results from different data sources and leads to a simple split-and-merge algorithm for big data regressions. The NIG summation operator is commutative and associative and has an identity element. Regression data can be processed in an embarrassingly parallel fashion, and online updating for flow data is justified. The summation operator is also useful for computing the marginal likelihood and facilitates Bayesian model selection (BMS) methods such as Bayesian LASSO, stochastic search variable selection, and Markov chain Monte Carlo model composition. Observations are scanned in one pass, after which the sampler iteratively combines NIG distributions without reloading the data. Computational complexity analysis shows that, when the sample size is large, NIG summations let BMS run almost as fast as OLS regression. Simulation studies demonstrate that our algorithms efficiently handle highly correlated big data. A real-world data set on employment and wages is also analyzed.
|
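The split-and-merge idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's definition of the NIG summation operator. It assumes the standard conjugate model, with coefficients distributed N(m0, sigma^2 * inv(L0)) given sigma^2 and sigma^2 distributed inverse-gamma(a0, b0), and it summarizes each data chunk by the sufficient statistics (X'X, X'y, y'y, n). The names `chunk_summary`, `merge`, `identity`, and `nig_posterior` are hypothetical helpers introduced here for illustration. Under these assumptions, merging is componentwise addition, which is commutative and associative and has an all-zero identity element.

```python
import numpy as np

def chunk_summary(X, y):
    """Summarize one data chunk by its sufficient statistics (X'X, X'y, y'y, n)."""
    return X.T @ X, X.T @ y, float(y @ y), X.shape[0]

def merge(s1, s2):
    """Combine two chunk summaries by componentwise addition (assumed form)."""
    return tuple(a + b for a, b in zip(s1, s2))

def identity(k):
    """Summary of an empty chunk; merging with it changes nothing."""
    return np.zeros((k, k)), np.zeros(k), 0.0, 0

def nig_posterior(summary, m0, L0, a0, b0):
    """Standard conjugate NIG posterior (mean, precision, shape, scale)."""
    XtX, Xty, yty, n = summary
    Ln = L0 + XtX
    mn = np.linalg.solve(Ln, L0 @ m0 + Xty)
    an = a0 + n / 2.0
    bn = b0 + 0.5 * (yty + m0 @ L0 @ m0 - mn @ Ln @ mn)
    return mn, Ln, an, bn

# Simulated data; the coefficients and prior values are arbitrary illustrations.
rng = np.random.default_rng(0)
n, k = 1000, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
m0, L0, a0, b0 = np.zeros(k), np.eye(k), 1.0, 1.0

# Scan the data once, chunk by chunk, then merge the summaries.
merged = identity(k)
for X_part, y_part in zip(np.array_split(X, 4), np.array_split(y, 4)):
    merged = merge(merged, chunk_summary(X_part, y_part))

split_post = nig_posterior(merged, m0, L0, a0, b0)
full_post = nig_posterior(chunk_summary(X, y), m0, L0, a0, b0)
```

In this sketch `split_post` and `full_post` agree up to floating-point error. The chunks could equally be summarized on separate workers or arrive one at a time, since the order of merging does not matter.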