A Binary Variational Autoencoder for Hashing
Searching a large dataset to find elements similar to a sample object is a fundamental problem in computer science. Hashing algorithms address this problem by representing data with similarity-preserving binary codes that can be used as indices into a hash table. Recently, it has been shown that variational autoencoders (VAEs) can be successfully trained to learn such codes in unsupervised and semi-supervised scenarios. In this paper, we show that a variational autoencoder with binary latent variables leads to a more natural and effective hashing algorithm than its continuous counterpart. The model reduces the quantization error introduced by continuous formulations but is still trainable with standard back-propagation. Experiments on text retrieval tasks illustrate the advantages of our model with respect to prior work.
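The retrieval pipeline the abstract describes can be sketched with a toy example: an encoder maps each document to per-bit Bernoulli probabilities, those probabilities are binarized into a hash code, and similar items are retrieved by Hamming distance. The random-projection encoder below is a hypothetical stand-in for the trained B-VAE encoder, used only to illustrate the code-based retrieval step; during training the model would sample the bits and back-propagate through the probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Hypothetical stand-in encoder: logistic activations give
    # per-bit Bernoulli probabilities (a trained B-VAE encoder
    # would play this role).
    p = 1.0 / (1.0 + np.exp(-x @ W))
    # Deterministic binarization at query time (threshold at 0.5).
    # During training, the paper's model instead samples binary
    # codes and trains with standard back-propagation.
    return (p > 0.5).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two binary codes.
    return int(np.count_nonzero(a != b))

# Toy corpus: 4 documents in an 8-dim feature space, 16-bit codes.
W = rng.standard_normal((8, 16))
docs = rng.standard_normal((4, 8))
codes = np.array([encode(d, W) for d in docs])

# A near-duplicate of document 2 should map to a nearby code.
query = docs[2] + 0.01 * rng.standard_normal(8)
qcode = encode(query, W)

# Rank the corpus by Hamming distance to the query code;
# document 2 should come first.
ranking = sorted(range(4), key=lambda i: hamming(codes[i], qcode))
```

Because the codes are binary, the ranking step reduces to cheap bitwise comparisons, which is the practical advantage of hashing over nearest-neighbor search in a continuous embedding space.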
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Text Retrieval | 20 Newsgroups | B-VAE | Precision@100 | 0.441 | #1 |
| Text Retrieval | 20 Newsgroups | VDSH | Precision@100 | 0.319 | #3 |
| Text Retrieval | Reuters-21578 | VDSH | Precision@100 | 0.556 | #3 |
| Text Retrieval | Reuters-21578 | B-VAE | Precision@100 | 0.698 | #2 |