Romain Ilbert, Themis Palpanas, Hassine Moungla, Thai V Hoang, Zonghua Zhang
Deep learning models usually require abundant data, which is often very limited in practice. Data augmentation is believed to be one of the most effective countermeasures, especially for imbalanced data with a small number of positive samples. Nevertheless, most existing work devotes more effort to feature extraction than to data manipulation. In this paper, we carefully study the characteristics of mainstream data augmentation techniques and evaluate their performance and efficiency for time series classification. First, we develop a taxonomy by examining a set of data augmentation techniques. Second, we carefully select three representative families for a comprehensive experimental evaluation and comparative study: noise-, interpolation-, and generative-based techniques, applied to 144 datasets from the UCR/UEA archive with two downstream models, ROCKET and InceptionTime, both known to offer a good compromise between speed and accuracy. In particular, noise injection is known to be a reliable and fast augmentation technique, as is SMOTE. TimeGAN is, so far, the only generative model that takes the temporality of time series into account. We then draw recommendations and conclusions by linking data properties, augmentation techniques, and downstream models. Thanks to these techniques, we improve the accuracy of the models on 71.60% of the improvable balanced datasets for ROCKET and 72.62% for InceptionTime. Moreover, using only 20% randomly selected real data and a process that generates the remaining missing data, we achieve 86.1% of the initial performance obtained with 100% real data. Finally, thanks to well-defined properties, we can predict with 77.04% accuracy whether a dataset would benefit from augmentation, even before applying it.
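To make the first two augmentation families concrete, below is a minimal, hypothetical sketch (not the code released with the paper) of noise-based augmentation (jittering) and SMOTE-style interpolation between two same-class series. The function names, the noise level sigma, and the toy data are illustrative assumptions.

```python
# Minimal sketch of two augmentation families discussed in the paper:
# (1) noise injection (jittering) and (2) SMOTE-style interpolation.
# This is an illustrative example, not the authors' implementation.
import numpy as np


def jitter(series: np.ndarray, sigma: float = 0.03, rng=None) -> np.ndarray:
    """Noise-based augmentation: add Gaussian noise to each time step."""
    rng = np.random.default_rng() if rng is None else rng
    return series + rng.normal(loc=0.0, scale=sigma, size=series.shape)


def interpolate_pair(a: np.ndarray, b: np.ndarray, rng=None) -> np.ndarray:
    """SMOTE-style augmentation: a random convex combination of two
    same-class series of equal length."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.uniform(0.0, 1.0)
    return a + lam * (b - a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy series assumed to belong to the same class.
    x1 = np.sin(np.linspace(0, 2 * np.pi, 128))
    x2 = np.sin(np.linspace(0, 2 * np.pi, 128) + 0.1)
    aug_noise = jitter(x1, sigma=0.05, rng=rng)      # noise-based sample
    aug_interp = interpolate_pair(x1, x2, rng=rng)   # interpolation-based sample
    print(aug_noise.shape, aug_interp.shape)
```

Synthetic samples produced this way would simply be appended to the training set before fitting a downstream classifier such as ROCKET or InceptionTime.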
Romain Ilbert, Themis Palpanas, Hassine Moungla, Thai V Hoang, Zonghua Zhang. Data Augmentation for Time Series: An Experimental Study. Under review, (2023)
The datasets we used in our paper are listed below:
Here are the source code we used in the experimental evaluation of our paper and some additional files.