Multivariate Time Series Imputation with Generative Adversarial Networks

Multivariate time series often contain many missing values, which hinders the application of advanced analysis methods to such data. Conventional approaches to handling missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, cannot model the temporal dependencies or the complex distributions found in multivariate time series. In this paper, we treat missing value imputation as a data generation problem. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with a GAN, which is then used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete because of the way they are recorded. A modified Gated Recurrent Unit (GRU) is employed in the GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperforms the baselines in terms of imputation accuracy. The experimental results also show that a simple model trained on the imputed data can achieve state-of-the-art results on the prediction tasks, demonstrating the benefits of our model in downstream applications.
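For orientation, the mean-imputation baseline named in the abstract, together with a masked mean squared error of the kind typically used to score imputation, might be sketched as follows. This is a minimal illustration and not the paper's GAN model; the function names `mean_impute` and `masked_mse`, the 0/1 mask convention, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def mean_impute(x, mask):
    """Fill missing entries with the per-feature mean of the observed entries.

    x:    array of shape (time, features); values at missing positions are ignored
    mask: same shape, 1.0 where observed and 0.0 where missing (assumed convention)
    """
    observed_sum = (x * mask).sum(axis=0)
    observed_cnt = np.maximum(mask.sum(axis=0), 1)  # avoid division by zero
    col_mean = observed_sum / observed_cnt
    return np.where(mask == 1, x, col_mean)

def masked_mse(truth, imputed, eval_mask):
    """Mean squared error computed only over the entries selected by eval_mask."""
    sq_err = (truth - imputed) ** 2
    return float((sq_err * eval_mask).sum() / np.maximum(eval_mask.sum(), 1))

# Synthetic example: hide roughly 10% of the entries, impute, then score.
rng = np.random.default_rng(0)
truth = rng.normal(size=(48, 3))
observed = (rng.random(truth.shape) > 0.1).astype(float)
x = np.where(observed == 1, truth, 0.0)   # hidden entries zeroed out
filled = mean_impute(x, observed)
score = masked_mse(truth, filled, 1 - observed)
```

The per-feature mean ignores the ordering of time steps entirely, which is the limitation the abstract attributes to the conventional approaches.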

Datasets


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multivariate Time Series Imputation | Basketball Players Movement | GRUI | Path Length | 1.141 | # 5 |
| | | | OOB Rate (10^-3) | 4.703 | # 5 |
| | | | Step Change (10^-3) | 14.95 | # 5 |
| | | | Path Difference | 0.690 | # 4 |
| | | | Player Distance | 0.398 | # 1 |
| Multivariate Time Series Imputation | KDD CUP Challenge 2018 | GAN-2-stage | MSE (10% missing) | 0.355 | # 2 |
| Multivariate Time Series Imputation | PEMS-SF | GRUI | L2 Loss (10^-4) | 15.24 | # 5 |

Methods