We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context.
no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh
Training text-to-image models with web-scale image-text pairs enables the generation of a wide range of visual concepts from text.
Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems.
We demonstrate through human subject evaluations that SPAMs are more interpretable in practice and hence an effortless replacement for DNNs for creating interpretable and high-performance systems suitable for large-scale machine learning.
However, these models are typically black-box deep neural networks, explained post-hoc via methods with known faithfulness limitations.
A visual counterfactual explanation replaces image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class.
1 code implementation • 17 Jun 2021 • Matthijs Douze, Giorgos Tolias, Ed Pizzi, Zoë Papakipos, Lowik Chanussot, Filip Radenovic, Tomas Jenicek, Maxim Maximov, Laura Leal-Taixé, Ismail Elezi, Ondřej Chum, Cristian Canton Ferrer
This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021).
Ranked #1 on Image Similarity Detection on DISC21 dev
We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
Query expansion is a technique widely used in image search: highly ranked images from an original query are combined into an expanded query that is then reissued, generally leading to increased recall and precision.
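As an illustration of the idea, here is a minimal sketch of one common variant, average query expansion, assuming images are already represented as L2-normalized global descriptors compared by dot product. The function names, the toy 2-D descriptors, and the `top_k` parameter are hypothetical and are not taken from the paper, which may combine the top-ranked images differently.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def rank(query, database):
    """Return database indices sorted by descending dot-product similarity."""
    scores = [sum(q * d for q, d in zip(query, vec)) for vec in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])

def average_query_expansion(query, database, top_k=2):
    """Average the query with its top_k results, then reissue the new query."""
    initial = rank(query, database)
    top = [database[i] for i in initial[:top_k]]
    dim = len(query)
    # Mean of the original query and the top-ranked descriptors.
    mean = [(query[j] + sum(v[j] for v in top)) / (len(top) + 1) for j in range(dim)]
    expanded = l2_normalize(mean)
    return expanded, rank(expanded, database)

# Toy data: four 2-D descriptors standing in for real image descriptors.
database = [l2_normalize(v) for v in [[1.0, 0.0], [0.9, 0.3], [0.6, 0.8], [0.0, 1.0]]]
query = l2_normalize([1.0, 0.1])
expanded, reranked = average_query_expansion(query, database, top_k=2)
```

The expanded query moves toward the descriptors of the top-ranked images, which is how relevant images that were far from the original query can rise in the second ranking.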
We show successful attacks on partially unknown systems, by designing various loss functions for the adversarial image construction.
A method for learning local affine-covariant regions is presented.
Ranked #4 on Image Matching on IMC PhotoTourism (using extra training data)
We introduce a novel loss for learning local feature descriptors, inspired by Lowe's matching criterion for SIFT.
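For context, Lowe's matching criterion accepts a match only when the nearest descriptor is clearly closer than the second nearest. The sketch below shows that ratio test only, not the loss proposed in the paper; the function names and toy 2-D descriptors are hypothetical, and `max_ratio=0.8` is an adjustable parameter set to the threshold suggested in the SIFT paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Match each descriptor in desc_a to desc_b using the ratio test.

    A match (i, j) is kept only if the distance to the nearest neighbour j
    is less than max_ratio times the distance to the second nearest.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio is undefined with fewer than two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((i, j1))
    return matches

# Toy descriptors: the third query sits between two candidates and is rejected.
desc_a = [[0.0, 0.0], [5.0, 5.0], [4.0, 4.0]]
desc_b = [[0.1, 0.0], [3.0, 3.0], [5.0, 5.1], [10.0, 10.0]]
matches = ratio_test_matches(desc_a, desc_b)
```

Here the first two descriptors have one unambiguous nearest neighbour each, while the third is nearly equidistant from two candidates, so the test discards it.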
This work addresses the problem of camera elevation estimation from a single photograph in an outdoor environment.
We present an algorithm that leverages the appearance variety to obtain more complete and accurate scene geometry along with consistent multi-illumination appearance information.
Structure-from-Motion for unordered image collections has significantly advanced in scale over the last decade.
This paper addresses the construction of a short-vector (128D) image representation for large-scale image and particular object retrieval.