tsrobprep - an R package for robust preprocessing of time series data

26 Apr 2021  ·  Michał Narajewski, Jens Kley-Holsteg, Florian Ziel ·

Data cleaning is a crucial part of every data analysis exercise. Yet, the currently available R packages do not provide fast and robust methods for cleaning and preparation of time series data. The open source package tsrobprep introduces efficient methods for handling missing values and outliers using model based approaches. For data imputation a probabilistic replacement model is proposed, which may consist of autoregressive components and external inputs. For outlier detection a clustering algorithm based on finite mixture modelling is introduced, which considers time series properties in terms of the gradient and the underlying seasonality as features. The procedure allows to return a probability for each observation being outlying data as well as a specific cause for an outlier assignment in terms of the provided feature space. The methods work robust and are fully tunable. Moreover, by providing the auto_data_cleaning function the data preprocessing can be carried out in one cast, without comprehensive tuning and providing suitable results. The primary motivation of the package is the preprocessing of energy system data. We present application for electricity load, wind and solar power data.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here