Search Results for author: Matthijs Douze

Found 53 papers, 34 papers with code

Machine learning and high dimensional vector search

no code implementations 24 Feb 2025 Matthijs Douze

Machine learning and vector search are two research topics that developed in parallel in nearby communities.

Inference-time sparse attention with asymmetric indexing

no code implementations 12 Feb 2025 Pierre-Emmanuel Mazaré, Gergely Szilvasy, Maria Lomeli, Francisco Massa, Naila Murray, Hervé Jégou, Matthijs Douze

Self-attention in transformer models is an incremental associative memory that maps key vectors to value vectors.

Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search

1 code implementation 16 Jan 2025 Daniel Severo, Giuseppe Ottaviano, Matthew Muckley, Karen Ullrich, Matthijs Douze

Approximate nearest neighbor search for vectors relies on indexes that are most often accessed from RAM.

Quantization

Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks

1 code implementation 6 Jan 2025 Théophane Vallaeys, Matthew Muckley, Jakob Verbeek, Matthijs Douze

QINCo recently addressed this inefficiency by using a neural network to determine the quantization codebook in RQ based on the vector reconstruction from previous steps.

Decoder Quantization

Watermark Anything with Localized Messages

1 code implementation 11 Nov 2024 Tom Sander, Pierre Fernandez, Alain Durmus, Teddy Furon, Matthijs Douze

Image watermarking methods are not tailored to handle small watermarked areas.

Results of the Big ANN: NeurIPS'23 competition

1 code implementation 25 Sep 2024 Harsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, Frank Liu, Ben Landrum, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng, Zihao Wan, Jie Yin, Ben Huang

The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads.

Diversity

Vector search with small radiuses

no code implementations 16 Mar 2024 Gergely Szilvasy, Pierre-Emmanuel Mazaré, Matthijs Douze

Although convenient to compute, this metric is distantly related to the end-to-end accuracy of a full system that integrates vector search.

Image Retrieval Retrieval

Watermarking Makes Language Models Radioactive

1 code implementation 22 Feb 2024 Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

We discover that, on the contrary, it is possible to reliably determine if a language model was trained on synthetic data if that data is output by a watermarked LLM.

Language Modeling Language Modelling

Residual Quantization with Implicit Neural Codebooks

1 code implementation 26 Jan 2024 Iris A. M. Huijben, Matthijs Douze, Matthew Muckley, Ruud J. G. van Sloun, Jakob Verbeek

For example, QINCo achieves better nearest-neighbor search accuracy using 12-byte codes than the state-of-the-art UNQ using 16 bytes on the BigANN1M and Deep1M datasets.

Data Compression Quantization

Functional Invariants to Watermark Large Transformers

no code implementations 17 Oct 2023 Pierre Fernandez, Guillaume Couairon, Teddy Furon, Matthijs Douze

The rapid growth of transformer-based models increases the concerns about their integrity and ownership insurance.

Quantization

DeDrift: Robust Similarity Search under Content Drift

no code implementations ICCV 2023 Dmitry Baranchuk, Matthijs Douze, Yash Upadhyay, I. Zeki Yalniz

We investigate the impact of this "content drift" for large-scale similarity search tools, based on nearest neighbor search in embedding space.

The 2023 Video Similarity Dataset and Challenge

1 code implementation 15 Jun 2023 Ed Pizzi, Giorgos Kordopatis-Zilos, Hiral Patel, Gheorghe Postelnicu, Sugosh Nagavara Ravindra, Akshay Gupta, Symeon Papadopoulos, Giorgos Tolias, Matthijs Douze

The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization").

Copy Detection Video Similarity

The Stable Signature: Rooting Watermarks in Latent Diffusion Models

3 code implementations ICCV 2023 Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, Teddy Furon

For instance, it detects the origin of an image generated from a text prompt, then cropped to keep $10\%$ of the content, with $90\%$+ accuracy at a false positive rate below $10^{-6}$.

Decoder

Active Image Indexing

1 code implementation 5 Oct 2022 Pierre Fernandez, Matthijs Douze, Hervé Jégou, Teddy Furon

First, a neural network maps an image to a vector representation that is relatively robust to various transformations of the image.

Copy Detection Quantization +1

A Self-Supervised Descriptor for Image Copy Detection

2 code implementations CVPR 2022 Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, Matthijs Douze

We adapt this method to the copy detection task by changing the architecture and training objective, including a pooling operator from the instance matching literature, and adapting contrastive learning to augmentations that combine images.

Contrastive Learning Copy Detection +1

Watermarking Images in Self-Supervised Latent Spaces

1 code implementation 17 Dec 2021 Pierre Fernandez, Alexandre Sablayrolles, Teddy Furon, Hervé Jégou, Matthijs Douze

We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches.

Data Augmentation Decoder

Nearest neighbor search with compact codes: A decoder perspective

no code implementations 17 Dec 2021 Kenza Amara, Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

Modern approaches for fast retrieval of similar vectors on billion-scale datasets rely on compressed-domain approaches such as binary sketches or product quantization.

Decoder Quantization +1

XCiT: Cross-Covariance Image Transformers

12 code implementations NeurIPS 2021 Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Instance Segmentation object-detection +3

Powers of layers for image-to-image translation

no code implementations 13 Aug 2020 Hugo Touvron, Matthijs Douze, Matthieu Cord, Hervé Jégou

We propose a simple architecture to address unpaired image-to-image translation tasks: style or class transfer, denoising, deblurring, deblocking, etc.

Ranked #1 on Image-to-Image Translation on vangogh2photo (Frechet Inception Distance metric)

Deblurring Denoising +2

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

1 code implementation 2 Jul 2020 Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux

Contrastive Predictive Coding (CPC), based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals.

Contrastive Learning Data Augmentation +1

Fixing the train-test resolution discrepancy: FixEfficientNet

1 code implementation 18 Mar 2020 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop.

Ranked #9 on Image Classification on ImageNet ReaL (using extra training data)

Data Augmentation Image Classification

White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

no code implementations 29 Aug 2019 Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set.

Fixing the train-test resolution discrepancy

3 code implementations NeurIPS 2019 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop).

Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Data Augmentation Fine-Grained Image Classification +1

Déjà Vu: an empirical evaluation of the memorization properties of ConvNets

no code implementations ICLR 2019 Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting.

Data Augmentation Memorization

Deep Clustering for Unsupervised Learning of Visual Features

9 code implementations ECCV 2018 Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze

In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.

Clustering Deep Clustering +3

Spreading vectors for similarity search

2 code implementations ICLR 2019 Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods.

Quantization Triplet

Link and code: Fast indexing with graphs and compact regression codes

8 code implementations CVPR 2018 Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, setting aside the memory requirements.

Image Similarity Search Quantization +1

An evaluation of large-scale methods for image instance and class discovery

no code implementations 9 Aug 2017 Matthijs Douze, Hervé Jégou, Jeff Johnson

While k-means is usually considered as the gold standard for this task, we evaluate and show the interest of diffusion methods that have been neglected by the state of the art, such as the Markov Clustering algorithm.

Clustering Instance Search

Learning Joint Multilingual Sentence Representations with Neural Machine Translation

1 code implementation WS 2017 Holger Schwenk, Matthijs Douze

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages.

Joint Multilingual Sentence Representations Machine Translation +2

Billion-scale similarity search with GPUs

14 code implementations 28 Feb 2017 Jeff Johnson, Matthijs Douze, Hervé Jégou

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures.

Image Similarity Search Quantization

FastText.zip: Compressing text classification models

44 code implementations 12 Dec 2016 Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

General Classification Quantization +2

How should we evaluate supervised hashing?

1 code implementation 21 Sep 2016 Alexandre Sablayrolles, Matthijs Douze, Hervé Jégou, Nicolas Usunier

Hashing produces compact representations for documents, to perform tasks like classification or retrieval based on these short codes.

General Classification Retrieval +1

Polysemous codes

11 code implementations 7 Sep 2016 Matthijs Douze, Hervé Jégou, Florent Perronnin

This paper considers the problem of approximate nearest neighbor search in the compressed domain.

Quantization

Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

no code implementations 1 Mar 2016 Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid

Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion, when learning all model parameters with supervision.

Image Classification Image Retrieval +1

Beat-Event Detection in Action Movie Franchises

no code implementations 15 Aug 2015 Danila Potapov, Matthijs Douze, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid

While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging. We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie franchises.

Classification Event Detection +1

Event Retrieval in Large Video Collections with Circulant Temporal Encoding

no code implementations CVPR 2013 Jerome Revaud, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Furthermore, we extend product quantization to complex vectors in order to compress our descriptors, and to compare them in the compressed domain.

Copy Detection Quantization +1
