Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing.
Vector space embedding models like word2vec, GloVe, fastText, and ELMo are extremely popular representations in natural language processing (NLP) applications.
Paragraph Vectors was recently proposed as an unsupervised method for learning distributed representations of pieces of text.
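As one concrete way to see what this means in practice, here is a minimal sketch using gensim's Doc2Vec implementation of paragraph vectors; the toy corpus and all hyperparameter values are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: learning paragraph vectors with gensim's Doc2Vec.
# The toy corpus and hyperparameters below are illustrative assumptions.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices fell sharply today",
]
# Each piece of text gets a unique tag; the model learns one vector per tag.
docs = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(corpus)]

model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, epochs=40)

# Infer a distributed representation for an unseen piece of text.
vec = model.infer_vector("a cat chased a dog".split())
print(vec.shape)  # (50,)
```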
We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW).
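A minimal sketch of HNSW-based approximate k-NN search, using the hnswlib library that implements this method; the dimensionality and the parameters (M, ef_construction, ef) are illustrative choices, not values from the paper.

```python
# Minimal sketch: approximate k-NN with an HNSW index via hnswlib.
# Dimension and parameters (M, ef_construction, ef) are illustrative.
import numpy as np
import hnswlib

dim, n = 64, 10_000
data = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)        # squared-L2 distance
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, np.arange(n))

index.set_ef(50)                                  # query-time accuracy/speed trade-off
labels, distances = index.knn_query(data[:5], k=10)
print(labels.shape)  # (5, 10)
```

Raising ef (or ef_construction) trades query speed for recall, which is the controllable hierarchy's main practical knob.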
This paper considers the problem of approximate nearest neighbor search in the compressed domain.
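The excerpt does not specify the compression scheme; as one common instance of searching in the compressed domain, here is a sketch of product quantization with asymmetric distance computation (ADC), where the query stays uncompressed and distances to coded database points are read from lookup tables. All sizes (4 subspaces, 256 centroids) are illustrative assumptions.

```python
# Sketch: product quantization (PQ) with asymmetric distance computation,
# one common way to search in the compressed domain. All sizes are
# illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

d, m, k, n = 32, 4, 256, 5_000            # dim, subspaces, centroids, points
sub = d // m
data = np.random.rand(n, d).astype(np.float32)

# Train one codebook per subspace; encode the database as m uint8 codes.
codebooks, codes = [], np.empty((n, m), dtype=np.uint8)
for j in range(m):
    km = KMeans(n_clusters=k, n_init=4).fit(data[:, j*sub:(j+1)*sub])
    codebooks.append(km.cluster_centers_)
    codes[:, j] = km.labels_

def adc_search(query, topk=10):
    # Per-subspace lookup tables of squared distances to all centroids.
    tables = np.stack([((codebooks[j] - query[j*sub:(j+1)*sub])**2).sum(axis=1)
                       for j in range(m)])           # (m, k)
    dists = tables[np.arange(m), codes].sum(axis=1)  # table lookups per code
    return np.argsort(dists)[:topk]

print(adc_search(data[0]))
```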
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures.
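As a usage sketch, the Faiss library pairs such indexing structures with compressed codes like the PQ scheme above; the parameter values here (nlist, code size) are illustrative, not recommendations.

```python
# Sketch: indexing high-dimensional image/video features with Faiss.
# IndexIVFPQ combines a coarse inverted index with PQ compression; the
# parameter values (nlist=1024, 16 x 8-bit codes) are illustrative.
import numpy as np
import faiss

d, n = 128, 100_000
features = np.random.rand(n, d).astype(np.float32)

quantizer = faiss.IndexFlatL2(d)                     # coarse quantizer
index = faiss.IndexIVFPQ(quantizer, d, 1024, 16, 8)  # 1024 lists, 16x8-bit codes
index.train(features)
index.add(features)

index.nprobe = 16                                    # inverted lists visited per query
distances, ids = index.search(features[:5], 10)
print(ids.shape)  # (5, 10)
```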
The use of phonological features (PFs) potentially allows language-specific phones to remain linked during training, which is highly desirable for sharing information across languages in multilingual and crosslingual speech recognition for low-resource languages.
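To illustrate the linking idea, here is a toy sketch of phones represented as binary PF vectors; the feature inventory and phone set are illustrative assumptions, not the paper's inventory.

```python
# Sketch: representing phones as binary phonological-feature (PF) vectors.
# The feature inventory and phone set are toy assumptions; the point is
# that distinct language-specific phones share PF dimensions, so training
# signal for one phone can inform the other.
import numpy as np

FEATURES = ["voiced", "bilabial", "plosive", "nasal", "aspirated"]

PHONE_TO_PF = {
    "p":  [0, 1, 1, 0, 0],   # e.g., English /p/
    "pʰ": [0, 1, 1, 0, 1],   # e.g., Hindi aspirated /pʰ/
    "b":  [1, 1, 1, 0, 0],
    "m":  [1, 1, 0, 1, 0],
}

def shared_features(a, b):
    va, vb = np.array(PHONE_TO_PF[a]), np.array(PHONE_TO_PF[b])
    return [f for f, x, y in zip(FEATURES, va, vb) if x == y == 1]

# /p/ and /pʰ/ differ only in aspiration, so most PF targets are shared.
print(shared_features("p", "pʰ"))  # ['bilabial', 'plosive']
```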
Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query.
Ranked #4 on Complex Query Answering on FB15k-237.
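A minimal sketch of the box intuition: a query box is parameterized by a center and per-dimension offsets, and entity embeddings that fall inside the box count as answers. The names, dimensions, and scoring rule below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the box intuition: a query is a box (center, offset) in
# embedding space, and candidate entities are points; points inside the
# box are treated as answers. All values here are illustrative.
import numpy as np

dim = 8
center = np.zeros(dim)            # query box center
offset = np.full(dim, 0.5)        # per-dimension half-widths (box size)

entities = np.random.uniform(-1.0, 1.0, size=(1000, dim))

# An entity answers the query if it lies inside the hyper-rectangle.
inside = np.all(np.abs(entities - center) <= offset, axis=1)
answers = np.where(inside)[0]

# A soft score: distance from each point to the box surface (0 if inside).
dist_outside = np.linalg.norm(
    np.maximum(np.abs(entities - center) - offset, 0.0), axis=1)
print(len(answers), dist_outside[answers].max())  # answers score 0.0
```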
Based on the observation that for a given query, the database points that have the largest inner products are more relevant, we develop a family of anisotropic quantization loss functions.
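One way to make the anisotropy concrete: decompose each point's quantization residual into components parallel and orthogonal to the point, and penalize the parallel component more heavily, since it directly perturbs the inner products of the queries most aligned with that point. The weight values h_par and h_orth below are illustrative assumptions.

```python
# Sketch of an anisotropic quantization loss: the residual component
# parallel to the datapoint is weighted more than the orthogonal one,
# because it most affects large inner products. Weights are illustrative.
import numpy as np

def anisotropic_loss(x, x_quantized, h_par=4.0, h_orth=1.0):
    r = x - x_quantized                   # quantization residual
    r_par = (np.dot(r, x) / np.dot(x, x)) * x  # component along the datapoint
    r_orth = r - r_par                    # component orthogonal to it
    return h_par * np.dot(r_par, r_par) + h_orth * np.dot(r_orth, r_orth)

x = np.array([1.0, 0.0])
print(anisotropic_loss(x, np.array([0.5, 0.0])))  # parallel error -> 1.0
print(anisotropic_loss(x, np.array([1.0, 0.5])))  # orthogonal error -> 0.25
```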
Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems.