HOPE: Hierarchical Object Prototype Encoding for Efficient Object Instance Search in Videos

CVPR 2017 · Tan Yu, Yuwei Wu, Junsong Yuan

This paper tackles the problem of efficient and effective object instance search in videos. To capture the relevance between a query and video frames and to precisely localize the object of interest, we leverage object proposals to improve the quality of object instance search in videos. However, the hundreds of object proposals obtained from each frame would incur prohibitive memory and computational cost. To this end, we present a simple yet effective hierarchical object prototype encoding (HOPE) model that accelerates object instance search without sacrificing accuracy by exploiting both the spatial and temporal self-similarity of the object proposals generated from video frames. We design two types of sphere k-means, i.e., spatially-constrained sphere k-means and temporally-constrained sphere k-means, to learn frame-level and dataset-level object prototypes, respectively. In this way, the object instance search problem is cast as a sparse matrix-vector multiplication problem. Thanks to the sparsity of the codes, both the memory and computational cost are significantly reduced. Experimental results on two video datasets demonstrate that our approach significantly outperforms other state-of-the-art fast search schemes for video object instance search.
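No code is released for this paper; the sketch below is only a rough illustration of the two core ideas in the abstract, namely clustering unit-normalized proposal features by cosine similarity (sphere k-means) and turning query-time scoring into a sparse matrix-vector product. The function names (`sphere_kmeans`, `encode_sparse`), the choice of parameters, and the plain (unconstrained) clustering are all assumptions for illustration; the actual HOPE model additionally imposes the spatial and temporal constraints described above when learning frame-level and dataset-level prototypes.

```python
# Illustrative sketch (not the authors' code): plain sphere k-means plus a
# sparse-code search step; HOPE's spatial/temporal constraints are omitted.
import numpy as np
from scipy import sparse


def sphere_kmeans(X, k, n_iter=20, seed=0):
    """Cluster the L2-normalized rows of X into k unit-norm prototypes
    by maximizing cosine similarity (spherical k-means)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = X[rng.choice(len(X), size=k, replace=False)].copy()  # init with data points
    for _ in range(n_iter):
        assign = (X @ D.T).argmax(axis=1)          # nearest prototype by cosine
        for j in range(k):
            members = X[assign == j]
            if len(members):
                c = members.sum(axis=0)
                D[j] = c / (np.linalg.norm(c) + 1e-12)  # renormalize the centroid
    return D


def encode_sparse(X, D, n_nonzero=5):
    """Approximate each feature by its n_nonzero most similar prototypes,
    giving a sparse code matrix C such that X is roughly C @ D."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ D.T
    rows, cols, vals = [], [], []
    for i, s in enumerate(sims):
        top = np.argsort(s)[-n_nonzero:]
        w = np.clip(s[top], 0, None)
        w = w / (w.sum() + 1e-12)
        rows.extend([i] * len(top)); cols.extend(top); vals.extend(w)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(len(X), D.shape[0]))


# Toy usage: proposal features from many frames, one query feature.
proposals = np.random.randn(2000, 128)             # stand-in for CNN proposal features
D = sphere_kmeans(proposals, k=64)                  # object prototypes (learned offline)
C = encode_sparse(proposals, D)                     # sparse codes, cached offline
query = np.random.randn(128)
query /= np.linalg.norm(query)
# At query time, the similarity of the query to every proposal reduces to a
# sparse matrix-vector product: scores = C (D q).
scores = C @ (D @ query)
print(scores.shape, scores.max())
```

Because `C` is sparse and only `D @ query` touches the dense prototypes, the per-proposal cost at query time depends on the number of non-zero codes rather than the full feature dimension, which is the memory/computation saving the abstract refers to.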
