Abstract:
|
Intrusion detection and response algorithms are key technologies for improving network resilience against cyber-attacks. A major challenge in developing these technologies is the scarcity of robust and realistic data sets that describe attack features. Data generation techniques have previously been developed to augment data sets when representative data are lacking, but algorithms designed for common applications (e.g., image and text) are frequently not applicable to the analysis of cyber systems. Cyber analyses often leverage timeseries data from a variety of system sensors, which, when viewed together, form multivariate timeseries. The range of techniques for generating multivariate timeseries data is limited. This paper introduces GAMVT, a new method for generating multivariate timeseries data. GAMVT needs relatively few training samples compared to machine learning-based methods. GAMVT statistically characterizes each class of samples and generates new samples that are both distinct from and reasonably similar to the training samples. This paper demonstrates GAMVT for a space-cyber use case and compares it to Generative Adversarial Networks and Variational Autoencoders.
|