Learning Representations for Incomplete Time Series Clustering
Time-series clustering is an essential unsupervised technique for data analysis, applied in many real-world fields such as medical analysis and DNA microarray analysis. Existing clustering methods usually assume that the data is complete. However, time series in real-world applications often contain missing values. The traditional strategy (imputing first and then clustering) does not optimize the imputation and clustering processes as a whole, which not only makes performance dependent on the particular combination of imputation and clustering methods but also fails to achieve satisfactory results. How best to improve clustering performance on incomplete time series remains a challenge. This paper proposes a novel unsupervised temporal representation learning model, named Clustering Representation Learning on Incomplete time-series data (CRLI). CRLI jointly optimizes the imputation and clustering processes, so that the imputed values are more discriminative for clustering and the learned representations possess good clustering properties. In addition, to reduce error propagation from imputation to clustering, we introduce a discriminator that pushes the distribution of imputed values close to the true one, and we train CRLI in an alternating manner. An experiment conducted on eight real-world incomplete time-series datasets shows that CRLI outperforms existing methods. We demonstrate the effectiveness of the learned representations and the convergence of the model through visualization analysis. Moreover, we reveal that the joint training strategy imputes values close to the true ones in important sub-sequences while imputing more discriminative values in less important sub-sequences, making the imputed sequence cluster-friendly.
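For contrast, the traditional two-stage strategy criticized above (impute first, then cluster) can be sketched in a few lines. This is only an illustration of that baseline, not of CRLI: the choice of per-time-step mean imputation and plain k-means, the use of `None` to mark missing values, and the function names `mean_impute` and `kmeans` are all assumptions made for this sketch.

```python
import random


def mean_impute(series):
    """Replace None entries with the mean of the observed values at the same time step."""
    n_steps = len(series[0])
    means = []
    for t in range(n_steps):
        observed = [row[t] for row in series if row[t] is not None]
        # Fall back to 0.0 if a time step has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[t] if v is None else v for t, v in enumerate(row)]
            for row in series]


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on fixed-length sequences; returns one cluster label per sequence."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Stage 1 (imputation) knows nothing about stage 2 (clustering).
incomplete = [
    [0.1, None, 0.2, 0.0],
    [0.0, 0.1, None, 0.1],
    [5.0, 5.1, 4.9, None],
    [None, 5.0, 5.2, 5.1],
]
completed = mean_impute(incomplete)
labels = kmeans(completed, k=2)
```

Because the imputation step never sees the clustering objective, each missing entry is filled with a value that lies between the groups, which is the kind of decoupling the joint optimization described above is meant to avoid.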