CORAD: Correlation-Aware Compression of Massive Time Series using Sparse Dictionary Coding

Big Data 2019 · Abdelouahab Khelifati, Mourad Khayati, Philippe Cudré-Mauroux

Time series streams are ubiquitous in many application domains, e.g., transportation, network monitoring, autonomous vehicles, or the Internet of Things (IoT). However, transmitting and storing large amounts of such fine-grained data is expensive, which makes compression schemes necessary in practice. Time series streams that are transmitted together often share properties or evolve together, making them significantly correlated. Despite the rich literature on compression methods, state-of-the-art approaches typically do not exploit correlation information when compressing time series. In this work, we demonstrate how one can leverage the correlation across several related time series streams both to drastically improve compression efficiency and to reduce accuracy loss. We present a novel compression algorithm for time series streams called CORAD (CORelation-Aware compression of time series streams based on sparse Dictionary coding). Based on sparse dictionary learning, CORAD has the unique ability to exploit the correlation across multiple related time series to eliminate redundancy and perform a more efficient compression. To ensure the accuracy of the compressed time series, we further introduce a method to threshold the information loss incurred by compression. Extensive validation on real-world datasets shows that CORAD drastically outperforms state-of-the-art approaches, achieving compression ratios of up to 40:1 while minimizing information loss.
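To make the idea of error-bounded sparse dictionary coding across correlated series concrete, here is a minimal sketch. It is not CORAD's algorithm: instead of learning a dictionary, it assumes a hypothetical fixed dictionary built from a few normalized segments of one reference series plus identity (spike) atoms, codes each segment with standard Orthogonal Matching Pursuit (OMP), and treats the information-loss threshold `eps` as a per-segment L2 bound (also an assumption). The helper names `omp`, `build_dictionary`, `compress`, and `decompress` are illustrative only.

```python
import numpy as np


def omp(D, x, eps, max_atoms):
    """Error-bounded Orthogonal Matching Pursuit.

    Greedily selects columns (atoms) of D, refitting all coefficients by least
    squares after each pick, until ||x - D[:, support] @ coef||_2 <= eps or
    max_atoms atoms are in use. Columns of D are assumed to have unit norm.
    """
    support, coef = [], np.zeros(0)
    residual = x.copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # numerically stalled; stop rather than loop forever
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef


def build_dictionary(reference, w, n_ref):
    """Hypothetical dictionary: the first n_ref length-w segments of a reference
    series (normalized), followed by the w x w identity as a fallback so that
    any segment can be coded to within eps."""
    atoms = [reference[i * w:(i + 1) * w] for i in range(n_ref)]
    atoms = [a / np.linalg.norm(a) for a in atoms]
    return np.column_stack(atoms + [np.eye(w)])


def compress(series_list, D, w, eps):
    """Split every series into non-overlapping length-w segments and sparse-code
    each one against the shared dictionary D."""
    return [[omp(D, s[i:i + w], eps, max_atoms=w)
             for i in range(0, len(s) - w + 1, w)]
            for s in series_list]


def decompress(codes, D):
    """Rebuild each series by summing the selected atoms, segment by segment."""
    return [np.concatenate([D[:, sup] @ c for sup, c in series_codes])
            for series_codes in codes]


# Toy data (assumed): s2 is strongly correlated with s1.
rng = np.random.default_rng(0)
w, n = 32, 512
t = np.arange(n)
base = np.sin(2 * np.pi * t / 64)
s1 = base + 0.01 * rng.standard_normal(n)
s2 = 0.5 * base + 0.01 * rng.standard_normal(n)

n_ref, eps = 2, 0.1
D = build_dictionary(s1, w, n_ref)
codes = compress([s1, s2], D, w, eps)
recon = decompress(codes, D)

# Stored numbers: the reference atoms plus one (index, coefficient) pair per
# used atom. Identity atoms are assumed known to the decoder and not stored.
stored = n_ref * w + sum(2 * len(sup) for sc in codes for sup, _ in sc)
ratio = 2 * n / stored
print(f"compression ratio ~ {ratio:.1f}:1")
```

Because `s2` follows the same pattern as `s1`, most of its segments are captured by a single reference atom and one coefficient, which is the kind of cross-series redundancy a correlation-aware scheme can remove. The identity atoms guarantee that OMP can always meet the `eps` bound, at the cost of more stored pairs for segments the dictionary explains poorly.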
