Abstract:
It has been reported in the literature that Internet traffic is statistically self-similar. In this work, we revisit the notion of self-similarity for data collected through NetFlow from the Imperial College London network. NetFlow is one of the data sources used in cyber-security to monitor a network: NetFlow records describe connection events between devices of the studied network and devices both within and outside it. A number of statistical approaches have been proposed in the literature for estimating the Hurst parameter, which quantifies self-similarity. These include heuristic approaches, namely the rescaled-range (R/S) statistic and variance-time plots, and frequency-domain estimators, namely the periodogram and Whittle estimators. Using these approaches, we observe that several characteristics of NetFlow connections exhibit long-range dependence, for example the numbers of packets and bytes exchanged, the total duration, and the total number of connections. These findings are valuable for modeling the correlations observed in NetFlow data and for the further development of systems that detect abnormal deviations from the normal activity of a network.
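The variance-time method rests on a simple scaling relation. For a self-similar process with Hurst parameter H, the variance of the series averaged over non-overlapping blocks of size m decays as m^(2H-2). The slope β of log-variance against log m therefore gives H = 1 + β/2. The sketch below illustrates this in plain Python. It is not the implementation used in this work, and the function names, block sizes and synthetic test series are illustrative assumptions. On i.i.d. noise, which has no long-range dependence, the estimate should land near H = 0.5. Values between 0.5 and 1 would indicate long-range dependence.

```python
import math
import random
from statistics import fmean, pvariance


def aggregate(series, m):
    """Non-overlapping block means of size m (any trailing remainder is dropped)."""
    n_blocks = len(series) // m
    return [fmean(series[i * m:(i + 1) * m]) for i in range(n_blocks)]


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def hurst_variance_time(series, block_sizes):
    """Variance-time estimate: Var(X^(m)) ~ m^(2H - 2), hence H = 1 + slope / 2."""
    log_m, log_var = [], []
    for m in block_sizes:
        agg = aggregate(series, m)
        if len(agg) < 2:  # need at least two blocks to compute a variance
            continue
        log_m.append(math.log10(m))
        log_var.append(math.log10(pvariance(agg)))
    return 1.0 + ls_slope(log_m, log_var) / 2.0


# Illustrative check on synthetic i.i.d. Gaussian noise (a stand-in, not NetFlow data).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
sizes = [2 ** k for k in range(1, 9)]  # hypothetical aggregation levels m = 2..256
print(round(hurst_variance_time(noise, sizes), 2))
```

In practice the input would be a per-interval NetFlow count series (for example bytes or connections per time bin). The range of block sizes is a tuning choice: small m is dominated by short-range effects and large m leaves few blocks, so estimates are usually read from the middle of the log-log plot.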