dCAM: Explaining multivariate data series classification

Paul Boniol, Mohammed Meftah, Emmanuel Remy, Themis Palpanas

Abstract

Data series classification is an important and challenging problem in data science. Explaining classification decisions, by finding the discriminant parts of the input that led the algorithm to a decision, is a real need in many applications. Convolutional neural networks perform well on the data series classification task; however, the explanations they provide are poor for the specific case of multivariate data series. Addressing this limitation is a significant challenge. In this paper, we propose a novel method that solves this problem by highlighting both the temporal and the dimensional discriminant information. Our contribution is two-fold: we first describe a convolutional architecture that enables the comparison of dimensions; then, we propose a method that returns dCAM, a Dimension-wise Class Activation Map specifically designed for multivariate time series (and CNN-based models). Experiments with several synthetic and real datasets demonstrate that dCAM is not only more accurate than previous approaches, but also the only viable solution for discriminant feature discovery and classification explanation in multivariate time series.
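dCAM builds on the Class Activation Map (CAM) used to explain CNN classifiers. As background, the sketch below computes the standard CAM for a 1D CNN whose last convolutional layer is followed by global average pooling (GAP) and a dense layer: the map for class c at timestamp t is the dense-layer weights for c applied to the feature maps at t. This is the generic CAM technique, not dCAM itself; the function names and toy values are hypothetical, and plain Python lists stand in for a framework's tensors.

```python
from typing import List

Matrix = List[List[float]]

def class_activation_map(feature_maps: Matrix, class_weights: Matrix, c: int) -> List[float]:
    """Standard CAM for a 1D CNN ending in GAP + dense layer (generic technique, not dCAM).

    feature_maps:  K x n activations of the last conv layer (K filters, n timestamps).
    class_weights: C x K weights of the dense layer that follows GAP.
    Returns the length-n map CAM_c(t) = sum_k w[c][k] * A[k][t].
    """
    n = len(feature_maps[0])
    return [
        sum(w_k * a_k[t] for w_k, a_k in zip(class_weights[c], feature_maps))
        for t in range(n)
    ]

def gap_logit(feature_maps: Matrix, class_weights: Matrix, c: int) -> float:
    """Class score after GAP: sum_k w[c][k] * mean_t A[k][t] (bias omitted)."""
    return sum(w_k * sum(a_k) / len(a_k) for w_k, a_k in zip(class_weights[c], feature_maps))

# Toy example: 2 filters, 4 timestamps, 2 classes.
A = [[0.0, 1.0, 3.0, 0.0],
     [1.0, 0.0, 0.0, 1.0]]
W = [[1.0, -1.0],
     [0.5, 2.0]]
cam = class_activation_map(A, W, c=0)
print(cam)                                      # [-1.0, 1.0, 3.0, -1.0]
print(gap_logit(A, W, 0), sum(cam) / len(cam))  # 0.5 0.5
```

Because the class score is a weighted sum of time-averaged feature maps, it equals the time average of the CAM (up to the bias), which is why each CAM value can be read as a timestamp's contribution to the decision. When a multivariate series is fed with its dimensions as input channels, as in one common setup, the map is still a single length-n vector, so it cannot indicate which dimension was discriminant; this is the limitation dCAM is designed to address.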

Paul Boniol, Mohammed Meftah, Emmanuel Remy, and Themis Palpanas. 2022. dCAM: Dimension-wise Class Activation Map for Explaining Multivariate Data Series Classification. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22), June 12–17, 2022.

Real Datasets

We evaluate the classification performance of the competing methods on real datasets from the UCR/UEA archive.

Synthetic Datasets

To evaluate how well the methods identify discriminant features, we use real datasets injected with known discriminant patterns, as well as a real use case from the medical domain. We take the StarLightCurves (classes 2 and 3 only), ShapesAll (classes 1 and 2 only), and Fish (classes 1 and 2 only) datasets from the UCR archive and inject into them subsequences that act as discriminant features. We build two types of datasets to study the ability of the algorithms to identify the discriminant patterns that guide the classification decision: (1) the patterns occur in a subset of the dimensions at different timestamps, and (2) the patterns occur in a subset of the dimensions at the same timestamp.
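The two dataset types can be illustrated with a small generator. The sketch below writes a pattern into a chosen subset of dimensions, either at an independently drawn start per dimension (type 1) or at one shared start (type 2), and returns a ground-truth mask of the injected points. As an assumed way to use that mask, `precision_at_k` checks how many of the k highest-scoring points of an explanation map fall inside the injected region. The function names, the choice to overwrite (rather than add to) the original values, and the scoring function are illustrative assumptions, not the authors' exact protocol.

```python
import random
from typing import List, Sequence, Tuple

Series = List[List[float]]  # D dimensions x n timestamps

def inject_pattern(series: Series, pattern: Sequence[float], dims: Sequence[int],
                   same_timestamp: bool, rng: random.Random) -> Tuple[Series, List[List[int]]]:
    """Hypothetical helper: overwrite `pattern` into each dimension listed in `dims`.

    same_timestamp=False -> type (1): each dimension gets its own random start.
    same_timestamp=True  -> type (2): all chosen dimensions share one random start.
    Returns the modified copy and a D x n mask (1 = injected, i.e. discriminant).
    """
    d, n, m = len(series), len(series[0]), len(pattern)
    out = [row[:] for row in series]          # copy so the input is left untouched
    mask = [[0] * n for _ in range(d)]
    shared_start = rng.randrange(n - m + 1)   # used only when same_timestamp is True
    for dim in dims:
        start = shared_start if same_timestamp else rng.randrange(n - m + 1)
        for i, value in enumerate(pattern):
            out[dim][start + i] = value
            mask[dim][start + i] = 1
    return out, mask

def precision_at_k(explanation: Series, mask: List[List[int]]) -> float:
    """Assumed metric: share of the k top-scoring points that were injected, k = #injected points."""
    scored = [(explanation[r][t], mask[r][t])
              for r in range(len(mask)) for t in range(len(mask[0]))]
    k = sum(label for _, label in scored)     # assumes at least one injected point
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    return sum(label for _, label in top) / k

# Usage: 4 flat dimensions of length 20, pattern injected into dimensions 0 and 2.
rng = random.Random(0)
base = [[0.0] * 20 for _ in range(4)]
x, mask = inject_pattern(base, [1.0, 2.0, 1.0], dims=[0, 2], same_timestamp=True, rng=rng)
print(precision_at_k(mask, mask))  # a perfect explanation scores 1.0
```

Keeping the mask alongside the series is what makes these datasets useful for explanation: the discriminant points are known exactly, so any explanation map of the same D x n shape can be scored against them, and type (2) additionally tests whether a method can separate dimensions that are discriminant at the same moment.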

Source Code

You may freely use this code for research purposes, provided that you properly acknowledge the authors by citing the paper referenced above.