Abstract

Data series classification is an important and challenging problem in data science. Explaining the classification decisions by finding the discriminant parts of the input that led the algorithm to some decision is a real need in many applications. Convolutional neural networks perform well for the data series classification task; however, the explanations provided by this type of algorithm are poor for the specific case of multivariate data series. Addressing this important limitation is a significant challenge. In this paper, we propose a novel method that solves this problem by highlighting both the temporal and dimensional discriminant information. Our contribution is two-fold: we first describe a convolutional architecture that enables the comparison of dimensions; then, we propose a method that returns dCAM, a Dimension-wise Class Activation Map specifically designed for multivariate time series (and CNN-based models). Experiments with several synthetic and real datasets demonstrate that dCAM is not only more accurate than previous approaches, but also the only viable solution for discriminant feature discovery and classification explanation in multivariate time series.

Paul Boniol, Mohammed Meftah, Emmanuel Remy, and Themis Palpanas. 2022. dCAM: Dimension-wise Class Activation Map for Explaining Multivariate Data Series Classification. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22), June 12–17, 2022.

Synthetic Datasets

To evaluate discriminant feature identification, we use real datasets injected with known discriminant patterns, as well as a real use case from the medical domain. As seeds, we use the StarLightCurves (classes 2 and 3 only), ShapesAll (classes 1 and 2 only), and Fish (classes 1 and 2 only) datasets from the UCR archive, into which we inject subsequences that generate discriminant features. We build two types of datasets to study the ability of the algorithms to identify the discriminant patterns guiding the classification decision: (1) when these patterns occur in a subset of the dimensions at different timestamps, and (2) when these patterns occur in a subset of the dimensions at the same timestamp.

Type 1 datasets: we build each dimension of Class 1 by concatenating random instances from one class of one of our UCR seed datasets. We build Class 2 by injecting, into the series of the other class of that UCR seed dataset, a pattern in 2 random dimensions at a random position in the series.
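The Type 1 construction can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' generator: the function names (`make_background`, `make_type1_class2`), the parameters `n_dims` and `n_concat`, and the choice to use a whole seed instance as the injected pattern are all hypothetical. The text does not pin down which class supplies the background and which supplies the pattern, so both are left as arguments, and the injection position is drawn independently for each of the 2 dimensions.

```python
import numpy as np

def make_background(seed, n_dims, n_concat, rng):
    """Build one multivariate series: each dimension concatenates
    n_concat random univariate instances drawn from `seed` (shape: n, length)."""
    idx = rng.integers(0, len(seed), size=(n_dims, n_concat))
    return seed[idx].reshape(n_dims, -1)

def make_type1_class2(seed_bg, seed_pattern, n_dims, n_concat, rng, n_inject=2):
    """Build a background series, then overwrite a window in n_inject random
    dimensions with a pattern (assumption: one whole instance of seed_pattern).
    Positions are drawn independently per dimension."""
    series = make_background(seed_bg, n_dims, n_concat, rng).copy()
    pattern = seed_pattern[rng.integers(len(seed_pattern))]
    max_start = series.shape[1] - len(pattern)
    dims = rng.choice(n_dims, size=n_inject, replace=False)
    positions = rng.integers(0, max_start + 1, size=n_inject)
    for d, p in zip(dims, positions):
        series[d, p:p + len(pattern)] = pattern
    # dims and positions serve as ground truth for evaluating explanations
    return series, dims, positions
```

Returning `dims` and `positions` alongside the series keeps the ground truth of where the discriminant pattern lies, which is what an explanation method would be scored against.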

Type 2 datasets: we build each dimension of Class 1 by concatenating random instances from one of the classes of our UCR seed datasets and injecting patterns from the other class in x random dimensions, at different positions. We build Class 2 by injecting patterns at the same position in 2 random dimensions.
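A minimal sketch of the Type 2 construction follows, again as an illustration rather than the authors' code. The function `make_type2_instance` and its parameters are hypothetical. It assumes that both classes share the same background construction, that each injected pattern is a whole instance of the other class, and that "different positions" means pairwise distinct start offsets; the only difference between the classes is then whether the injected dimensions share one position.

```python
import numpy as np

def make_type2_instance(seed_bg, seed_pattern, n_dims, n_concat,
                        n_inject, same_position, rng):
    """Return (series, dims, positions) for one Type 2 instance.
    same_position=False: n_inject dimensions, distinct positions (Class 1).
    same_position=True:  n_inject dimensions, one shared position (Class 2)."""
    # Background: each dimension concatenates n_concat random seed instances.
    idx = rng.integers(0, len(seed_bg), size=(n_dims, n_concat))
    series = seed_bg[idx].reshape(n_dims, -1).copy()
    pat_len = seed_pattern.shape[1]
    max_start = series.shape[1] - pat_len
    dims = rng.choice(n_dims, size=n_inject, replace=False)
    if same_position:
        positions = np.full(n_inject, rng.integers(0, max_start + 1))
    else:
        # Sampling without replacement guarantees distinct start offsets.
        positions = rng.choice(max_start + 1, size=n_inject, replace=False)
    for d, p in zip(dims, positions):
        # Assumption: the pattern is a full instance of the other class.
        series[d, p:p + pat_len] = seed_pattern[rng.integers(len(seed_pattern))]
    return series, dims, positions
```

Under these assumptions, a Class 1 instance would be generated with `same_position=False` and `n_inject=x`, and a Class 2 instance with `same_position=True` and `n_inject=2`. Because the injected patterns themselves appear in both classes, only their alignment across dimensions is discriminant.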
