A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
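A minimal sketch of the idea above, assuming a PyTorch setting and a simple uniform quantizer (names and bit-width are illustrative, not the paper's exact implementation): weights are quantized in the forward pass while the straight-through estimator lets gradients flow as if the quantizer were the identity.

```python
import torch

def quantize_ste(w, num_bits=8):
    """Uniformly quantize w in the forward pass; pass gradients straight through."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    # Straight-through estimator: forward uses w_q, backward sees the identity.
    return w + (w_q - w).detach()

# Usage inside a layer's forward pass (illustrative):
# y = torch.nn.functional.linear(x, quantize_ste(self.weight), self.bias)
```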
More precisely, we augment the self-attention layers with persistent memory vectors that play a role similar to that of the feed-forward layer.
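A hedged sketch of this construction, under assumed shapes and a single attention head (the class name, slot count, and initialization are illustrative): learnable, input-independent key/value vectors are concatenated with the context keys and values so that attention can read from them like a small persistent memory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentMemoryAttention(nn.Module):
    def __init__(self, dim, n_persistent=16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Persistent vectors: learnable, input-independent key/value slots.
        self.pk = nn.Parameter(torch.randn(n_persistent, dim) * 0.02)
        self.pv = nn.Parameter(torch.randn(n_persistent, dim) * 0.02)
        self.scale = dim ** -0.5

    def forward(self, x):                      # x: (batch, seq, dim)
        b = x.size(0)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Concatenate persistent slots with the context keys/values.
        k = torch.cat([k, self.pk.expand(b, -1, -1)], dim=1)
        v = torch.cat([v, self.pv.expand(b, -1, -1)], dim=1)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v
```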
Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
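One standard way to perform such an alignment, shown here as an illustrative sketch rather than the paper's exact procedure, is to solve an orthogonal Procrustes problem on a seed dictionary of paired word vectors and apply the resulting mapping to all source-language vectors (variable names are assumptions):

```python
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src @ W - Y_tgt||_F, rows being paired words."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Usage: map every source-language vector into the target space, then retrieve
# translations by nearest-neighbour search in that common space.
# W = procrustes_align(X_seed_src, X_seed_tgt)
# X_src_aligned = X_src_all @ W
```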
Then, we show the benefit of using this strategy asymmetrically: only the database features are aggregated, while the query features are left intact.
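The following is a hedged sketch of that asymmetric setting, with sum-aggregation into fixed-size groups and dot-product scoring chosen purely for illustration (group size, scoring, and function names are assumptions):

```python
import numpy as np

def aggregate_database(db, group_size=8):
    """Sum-aggregate consecutive L2-normalized database descriptors into group vectors."""
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    groups = [db[i:i + group_size].sum(axis=0) for i in range(0, len(db), group_size)]
    return np.stack(groups)

def score_groups(query, group_vectors):
    """Asymmetric scoring: the raw (non-aggregated) query is matched against groups."""
    query = query / np.linalg.norm(query)
    return group_vectors @ query   # a high score suggests the group contains a match

# Highly-scored groups can then be re-ranked by checking their individual members.
```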
This paper addresses the construction of a short-vector (128D) image representation for large-scale image and particular object retrieval.
Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search for actions to a fraction of the possible bounding box sequences.
We consider the design of a single vector representation for an image that embeds and aggregates a set of local patch descriptors such as SIFT.
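A VLAD-style aggregation is one common instance of this idea; the sketch below assumes a pre-trained k-means codebook of visual words and is meant as an illustration, not necessarily the exact embedding used in the paper. Each local descriptor is assigned to its nearest visual word, residuals are accumulated per word, and the result is flattened and L2-normalized into a single image vector.

```python
import numpy as np

def vlad(descriptors, codebook):
    """descriptors: (n, d) local features (e.g. SIFT); codebook: (k, d) visual words."""
    k, d = codebook.shape
    # Assign each descriptor to its nearest centroid.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        if (assign == i).any():
            v[i] = (descriptors[assign == i] - codebook[i]).sum(axis=0)
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)     # L2-normalize the final image vector
```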
Furthermore, we extend product quantization to complex vectors in order to compress our descriptors, and to compare them in the compressed domain.
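As a hedged illustration of product quantization applied to complex descriptors (not necessarily the paper's exact construction), one simple option is to view each complex subvector as a real vector by stacking its real and imaginary parts, then learn one k-means codebook per subvector; at search time, each subvector is replaced by its centroid index and distances are approximated in the compressed domain via per-subvector lookup tables.

```python
import numpy as np

def train_complex_pq(X, n_sub=4, n_centroids=256, n_iter=20, seed=0):
    """X: (n, d) complex descriptors. Returns one codebook per subvector."""
    rng = np.random.default_rng(seed)
    d_sub = X.shape[1] // n_sub
    codebooks = []
    for s in range(n_sub):
        sub = X[:, s * d_sub:(s + 1) * d_sub]
        sub = np.hstack([sub.real, sub.imag])          # complex -> real view
        centroids = sub[rng.choice(len(sub), n_centroids, replace=False)]
        for _ in range(n_iter):                        # plain k-means refinement
            assign = ((sub[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                if (assign == c).any():
                    centroids[c] = sub[assign == c].mean(0)
        codebooks.append(centroids)
    return codebooks
```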
Several recent works on action recognition have attested to the importance of explicitly integrating motion characteristics into the video description.