Abstract:
|
This paper presents an incremental Newton-Raphson learning algorithm to analyze streaming datasets using generalized linear models. The proposed method is formulated within a new framework of renewable estimation and incremental inference, in which the estimates are renewed with current data and summary statistics of historical data, but with no use of any historic subject-level data. In the implementation, we design a new paradigm of data storage and data processing, named as the Rho architecture, as part of expansion to the currently popular Apache Spark Lambda architecture, to accommodate the inference-related statistics in the inference layer, as well as to communicate with the speed layer to facilitate sequential updating of estimation and inference. Both estimation consistency and asymptotic normality of the renewable estimator are established, through which the Wald test is utilized for incremental inference. Our methods are examined and illustrated by various numerical examples from both simulation experiments and a real-world data analysis.
|