Abstract:
|
The era of Big Data includes many applications of mixed effects modeling, such as large genomics studies (Hoffman et al., Bioinf. 2014), recommender systems (Gao and Owen, arXiv 2016), and salary prediction by micro-region and occupation (Kenthapadi et al., arXiv 2017). However, the time complexity of estimation in such models can grow as fast as n^1.5, where n is the number of observations. Even worse, the computation may not fit into available memory, rendering direct, single-stage estimation impossible. Parallel computation on separate machines, using a model-specific algorithm, may be a remedy. Here we present a model-independent approach to the problem using a technique we call Software Alchemy (Matloff, JSS 2016), and show computational speedup on various real datasets. Because of the algorithmic time complexity, it is actually possible in some cases to achieve superlinear performance, i.e. a speedup factor greater than the number of computational processes.
|
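The superlinear claim can be illustrated with a back-of-the-envelope calculation. This is only a sketch under assumptions not stated in the abstract: estimation on n observations costs exactly c n^1.5 for some constant c, the data are split into r equal chunks that are processed simultaneously, and the cost of distributing the data and combining the chunk results is negligible. The symbols c, r, T_1 and T_r are introduced here purely for illustration.

```latex
% Assumed cost model: T(n) = c n^{1.5} (illustrative, not taken from the abstract)
\begin{align*}
T_1 &= c\,n^{1.5}
  && \text{(single-stage estimation on all $n$ observations)} \\
T_r &= c\left(\frac{n}{r}\right)^{1.5}
  && \text{(each of $r$ processes handles $n/r$ observations in parallel)} \\
\frac{T_1}{T_r} &= r^{1.5} > r
  && \text{for } r > 1 .
\end{align*}
```

Under these idealized assumptions, r = 4 processes would give a speedup factor of 4^1.5 = 8, twice the number of processes. In practice, overhead and the combining step would reduce this figure.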