Paul Boniol, Themis Palpanas, Mohammed Meftah, Emmanuel Remy


Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose an unsupervised method suitable for domain agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques), nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.

P. Boniol and T. Palpanas, Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series, PVLDB (2020)

P. Boniol and T. Palpanas and M. Meftah and E. Remy, GraphAn: Graph-based Subsequence Anomaly Detection, demo PVLDB (2020)

Real Datasets

The real datasets we use in our study are the following:

Synthetic Datasets

We use several synthetic datasets that contain sinusoid patterns at fixed frequency following a random walk trend (Figure 7). We then inject different numbers of anomalies, in the form of sinusoid waveforms with different phases and higher than normal frequencies, and add various levels of Gaussian noise on top. We refer to those datasets using the label SRW-[# of anomalies]-[% of noise]-[length of anomaly], and use them in order to test the performance of the algorithms under different, controlled conditions.

Source Code

Algorithms protected by patent, and provided as is. You may use this code for research purposes only, provided that you properly acknowledge the authors using the reference to the paper (email the authors for the password of the ZIP file).