Abstract:
|
Recent advancements in software infrastructure have caused enormous complexities for the maintenance of stable software environments and anomalies have become almost inevitable. Moreover, it is important to react to anomalies as quickly as possible, if detected, by taking possible corrective actions. But, it is not quite trivial to detect the root cause of a problem because typically the metric under consideration has multiple internal and external dependencies. It involves significant manual engineering and requires strong domain expertise to isolate the correct reason for the problem. We propose a method for automated isolation of the problem to a much smaller scale by narrowing down the affected feature space under consideration to a much smaller subset. The main idea is to consider the time series to be a series of observations of an underlying process passing through some discretized states. We also consider that the propagation of the effect of a given problem causes unaligned but homogenous shifts of the underlying states. Finally, we perform causal alignment between different time series to isolate a small set of time series resulting from (or a set of) an affected feature.
|