Ghost Imputation: Accurately Reconstructing Missing Data of the Off Period

Noise and missing data are intrinsic characteristics of real-world data, leading to uncertainty that negatively affects the quality of knowledge extracted from the data. The burden imposed by missing data is often severe in sensors that collect data from the physical world, where large gaps of missing data may occur when the system is temporarily off or disconnected. How can we reconstruct missing data for these periods? We introduce an accurate and efficient algorithm for missing data reconstruction (imputation) that is specifically designed to recover off-period segments of missing data. This algorithm, Ghost, searches the sequential dataset for data segments whose prior and posterior segments match those of the missing data. If a similar segment also satisfies any additional constraints, such as location or time of day, it is substituted for the missing data. A baseline approach has quadratic computational complexity; we therefore introduce a caching approach that reduces the search space and improves the computational complexity to linear in the common case. Experimental evaluations on five real-world datasets show that our algorithm significantly outperforms four state-of-the-art algorithms, with an average of 18% higher F-score.
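The search-and-substitute idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a categorical sequence in which missing values are marked with `None`, exact matching of fixed-size prior and posterior windows, and a candidate segment of the same length as the gap. The function names, the window size `w`, and the `constraint` callback are hypothetical and introduced only for illustration. The dictionary index stands in for the caching step: it maps each prior window to the positions that follow it, so the search avoids rescanning the whole sequence for every gap.

```python
from collections import defaultdict


def build_prior_index(seq, w):
    """Map each fully observed length-w window to the positions right after it."""
    index = defaultdict(list)
    for i in range(w, len(seq) + 1):
        window = tuple(seq[i - w:i])
        if None not in window:
            index[window].append(i)
    return index


def ghost_impute(seq, w=2, constraint=None):
    """Fill runs of None with a segment whose neighbours match the gap's neighbours.

    Illustrative assumptions: exact window matching, candidate length equal to
    the gap length, and an optional constraint(gap_start, candidate_start)
    callback standing in for checks such as location or time of day.
    """
    seq = list(seq)
    out = list(seq)
    index = build_prior_index(seq, w)
    n = len(seq)
    i = 0
    while i < n:
        if seq[i] is not None:
            i += 1
            continue
        start = i
        while i < n and seq[i] is None:
            i += 1
        end = i  # the gap is seq[start:end]
        gap = end - start
        if start < w or end + w > n:
            continue  # not enough context on one side of the gap
        prior = tuple(seq[start - w:start])
        posterior = tuple(seq[end:end + w])
        if None in prior or None in posterior:
            continue  # context itself is incomplete
        # Only positions that follow an identical prior window are examined.
        for c in index.get(prior, []):
            if c == start:
                continue  # this is the gap itself
            candidate = seq[c:c + gap]
            if len(candidate) < gap or None in candidate:
                continue
            if tuple(seq[c + gap:c + gap + w]) != posterior:
                continue
            if constraint is not None and not constraint(start, c):
                continue
            out[start:end] = candidate
            break
    return out


observed = ["a", "b", "c", "d", "e", "f", "a", "b", None, None, "e", "f"]
filled = ghost_impute(observed, w=2)
```

In this toy sequence the gap is preceded by `a, b` and followed by `e, f`, and the same context surrounds `c, d` earlier in the sequence, so that segment is substituted. When no segment with matching context exists, or the constraint rejects every candidate, the gap is left unfilled.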
