Semantic Hashing with Locality Sensitive Embeddings

1 Jan 2021 · Levi Boyles, Aniket Anand Deshmukh, Urun Dogan, Rajesh Koduru, Charles Denis, Eren Manavoglu ·

Semantic hashing methods have been explored for learning transformations into binary vector spaces. These learned binary representations may then be used in hashing based retrieval methods, typically by retrieving all neighboring elements in the Hamming ball with radius 1 or 2. Prior studies focus on tasks with a few dozen to a few hundred semantic categories at most, and it is not currently well known how these methods scale to domains with richer semantic structure. In this study, we focus on learning embeddings for the use in exact hashing retrieval, where Approximate Nearest Neighbor search comprises of a simple table lookup. We propose similarity learning methods in which the optimized similarity is the angular similarity (the probability of collision under SimHash.) We demonstrate the benefits of these embeddings on a variety of domains, including a coocurrence modelling task on a large scale text corpus; a rich structure of which cannot be handled by a few hundred semantic groups.

PDF Abstract