ULISSE: ULtra compact Index for Variable-Length Similarity SEarch in Data Series

Michele Linardi* Themis Palpanas*

*(Lipade, University Paris Descartes)

In a nutshell...

ULISSE is the first data series (a.k.a. time series) index structure designed for answering similarity search queries of variable length. The main building block of ULISSE is a newly proposed data series summarization, which succinctly represents neighboring sequences, both non-Z-normalized and Z-normalized. The experimental evaluation on synthetic and real datasets shows that ULISSE is several times (and up to orders of magnitude) more efficient, in both space and time cost, than competing approaches.

This is the support page of ULISSE. It complements the experiments we conducted by providing the related materials. You may find the ULISSE publications here:

Empirical evaluation of our approach

In the following, we provide the source code of ULISSE (written in C). The datasets used to evaluate our solution are quite large (1GB-100GB) and we do not have enough space on the web server to host them. If you are interested in obtaining these data series collections, please send me an e-mail (michele[dot]linardi[at]orange[dot]fr) and I will provide you with a temporary download link. Nevertheless, the ULISSE bundle includes a synthetic data generator (random walk); please check the README file, which you can find in the source code folder. On this page you can also find some information about the ASTRO and SEISMIC datasets that we used in our evaluation. Note that the source code file is password-protected; please contact me at the same address.
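For readers unfamiliar with random-walk data, the idea can be sketched in a few lines: each point is the previous point plus a random step. The sketch below is only an illustration and is not the generator shipped in the ULISSE bundle (see its README for that). The function name `random_walk`, its parameters, and the choice of uniform steps in [-1, 1] are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill series[0..n-1] with a random walk starting from 0: each value is the
 * previous value plus a step drawn uniformly from [-1, 1].
 * The step distribution is an assumption made for this sketch. */
void random_walk(double *series, size_t n, unsigned int seed)
{
    srand(seed); /* fixed seed, so a run can be repeated */
    double value = 0.0;
    for (size_t i = 0; i < n; i++) {
        double step = 2.0 * ((double)rand() / (double)RAND_MAX) - 1.0;
        value += step;
        series[i] = value;
    }
}
```

Calling `random_walk(buffer, n, seed)` with the same seed on the same platform reproduces the same series, which is convenient when repeating an experiment.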

Please note that we compared ULISSE with other approaches suitable for variable-length similarity search in data series:
MASS (Mueen's Algorithm for Similarity Search)
Since these methods were proposed by other authors, we invite interested users to ask those authors for the source code.