Abstract:
|
A common challenge in cybersecurity is handling high-volume streaming data. In this setting, analysts are typically restricted to techniques whose model-fitting and prediction algorithms are computationally cheap. In many situations, however, more sophisticated techniques would be beneficial. This talk proposes a general framework that adapts a broad family of statistical and machine learning techniques to the streaming setting. The techniques of interest are those that generate computationally cheap predictions but require iterative model-fitting procedures. This family includes various clustering, classification, regression, and dimension reduction algorithms. We discuss applied and theoretical issues that arise when using these techniques on streaming data whose distribution evolves over time.
|