Abstract:

Data arising in cybersecurity applications often have a network, or `graph-like', structure, and accurate statistical modelling of connectivity behaviour has important implications, for instance, for network intrusion detection. We present a linear algebraic approach to network modelling which is both massively scalable and very general. In this approach, nodes are embedded in a finite-dimensional latent space, where common statistical, signal-processing and machine-learning methodologies are then available. A central limit theorem provides asymptotic guarantees on the statistical accuracy of the embedding. We explore an intriguing connection between `disassortativity', whereby nodes that are similar are relatively unlikely to connect, and spacetime, as defined in special relativity. Mass testing for anomalous edges, correlations, and changepoints is then discussed. Results are illustrated on network flow data collected at Los Alamos National Laboratory. This is joint work with Nick Heard (Imperial College London).
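The abstract does not name a specific algorithm, but a common linear algebraic route to such latent-space node embeddings is adjacency spectral embedding: take a truncated singular value decomposition of the adjacency matrix and scale the leading singular vectors. A minimal sketch, assuming this standard construction (the toy graph and function name below are illustrative only):

```python
# Illustrative sketch of adjacency spectral embedding (an assumption; the
# abstract only says nodes are "embedded in a finite-dimensional latent space").
import numpy as np

def spectral_embedding(A, d):
    """Embed each node of a graph into R^d using the top-d singular
    subspace of its (symmetric) adjacency matrix A."""
    U, s, _ = np.linalg.svd(A)
    # Each row of the result is one node's d-dimensional latent position:
    # the leading d left singular vectors, scaled by sqrt of singular values.
    return U[:, :d] * np.sqrt(s[:d])

# Toy example: two 3-node cliques (an assortative structure) joined by a
# single bridging edge between nodes 2 and 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

X = spectral_embedding(A, d=2)
print(X.shape)  # one 2-dimensional latent position per node
```

With d=2 the embedding separates the two cliques: nodes in the same clique land close together in the latent space, so standard statistical or machine-learning tools (clustering, hypothesis tests) can then be applied to the rows of X.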
