no code implementations • 31 Oct 2024 • Xueyang Yu, Xinlei Chen, Yossi Gandelsman
A VideoMAE model pre-trained on our synthetic videos closes 97.2% of the performance gap on UCF101 action classification between training from scratch and self-supervised pre-training from natural videos, and outperforms the pre-trained model on HMDB51.
1 code implementation • 3 Oct 2024 • Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman
We investigate the internal representations of vision-language models (VLMs) to address hallucinations, a persistent challenge despite advances in model size and training.
no code implementations • 10 Sep 2024 • Avinash Madasu, Yossi Gandelsman, Vasudev Lal, Phillip Howard
However, little is known about the inner workings of CLIP.
1 code implementation • 13 Jun 2024 • Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman
First, sampling a set of weights from this space results in a new model encoding a novel identity.
no code implementations • 6 Jun 2024 • Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
We interpret the function of individual neurons in CLIP by automatically describing them using text.
1 code implementation • 4 Apr 2024 • Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman
Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
no code implementations • 19 Jan 2024 • Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik
This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in terms of 3D pose, and to the input image in terms of visual similarity.
no code implementations • 4 Dec 2023 • Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang
Given a textual description of a visual task (e.g., "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input.
2 code implementations • 2 Nov 2023 • Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros
We define the target manifold as the set of all instances that $f$ maps to themselves.
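To make the fixed-point idea concrete, here is a minimal sketch, not the authors' training recipe: the mapping and data below are toy placeholders, and the rest of the training objective is omitted. It only illustrates losses that push real samples to be fixed points of $f$ and push outputs of $f$ onto that manifold.

```python
import torch
import torch.nn as nn

# Toy stand-in for the mapping f; in the paper f is a full generative network.
f = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

def idempotence_losses(x_real, z):
    # Real samples should be fixed points of f: f(x) = x.
    recon = (f(x_real) - x_real).pow(2).mean()
    # Outputs of f should already lie on the target manifold: f(f(z)) = f(z).
    fz = f(z)
    idem = (f(fz) - fz.detach()).pow(2).mean()
    return recon, idem

x_real = torch.randn(8, 64)  # placeholder batch of "real" data
z = torch.randn(8, 64)       # arbitrary inputs (e.g., noise)
recon, idem = idempotence_losses(x_real, z)
(recon + idem).backward()
```

Detaching `f(z)` in the second term makes the gradient flow only through the outer application, one simple way to express the idempotence constraint.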
1 code implementation • 9 Oct 2023 • Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands.
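As a rough illustration of how such a decomposition can be read through text, the sketch below uses random placeholder tensors in place of real per-(layer, head) contributions (extracting those requires hooking into CLIP's ViT); each summand is interpreted by ranking a pool of candidate descriptions by alignment with it.

```python
import torch
import torch.nn.functional as F

L, H, D = 12, 12, 512                        # layers, heads, joint embedding dim (ViT-B-like)
head_contrib = torch.randn(L, H, D)          # hypothetical per-(layer, head) contributions
image_embed = head_contrib.sum(dim=(0, 1))   # their sum plays the role of the image embedding

descriptions = [f"description_{i}" for i in range(1000)]               # placeholder text pool
text_embeds = F.normalize(torch.randn(len(descriptions), D), dim=-1)   # their text embeddings

# Interpret one summand by the descriptions its direction aligns with.
layer, head = 10, 3
c = F.normalize(head_contrib[layer, head], dim=0)
top = (text_embeds @ c).topk(5).indices
print([descriptions[i] for i in top.tolist()])
```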
1 code implementation • 29 Sep 2023 • Tianyu Han, Laura Žigutytė, Luisa Huck, Marc Huppertz, Robert Siepmann, Yossi Gandelsman, Christian Blüthgen, Firas Khader, Christiane Kuhl, Sven Nebelung, Jakob Kather, Daniel Truhn
Current techniques for evaluating deep learning models cannot visualize confounding factors at a diagnostic level.
no code implementations • 11 Jul 2023 • Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.
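A minimal sketch of this per-instance adaptation loop, with toy stand-in modules rather than the authors' models (the paper uses a ViT encoder with masked-autoencoder reconstruction):

```python
import copy
import torch
import torch.nn as nn

# Toy stand-ins for the encoder, main-task head, and reconstruction head.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
task_head = nn.Linear(256, 10)               # main-task head, e.g. a classifier
decoder = nn.Linear(256, 3 * 32 * 32)        # reconstruction head for the self-supervised task

def predict_with_ttt(x, steps=5, mask_ratio=0.75):
    # Adapt fresh copies on this single test instance, then predict with the main head.
    enc, dec = copy.deepcopy(encoder), copy.deepcopy(decoder)
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        keep = (torch.rand_like(x) > mask_ratio).float()      # randomly mask out pixels
        recon = dec(enc(x * keep)).view_as(x)
        loss = ((recon - x) ** 2 * (1 - keep)).mean()         # reconstruct the masked part
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return task_head(enc(x))

logits = predict_with_ttt(torch.randn(1, 3, 32, 32))
```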
no code implementations • ICCV 2023 • Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher
In this paper, we demonstrate the existence of common features we call "Rosetta Neurons" across a range of models with different architectures, different tasks (generative and discriminative), and different types of supervision (class-supervised, text-supervised, self-supervised).
1 code implementation • 15 Sep 2022 • Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.
1 code implementation • 1 Sep 2022 • Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?
Ranked #5 on Personalized Segmentation on PerSeg
no code implementations • 31 Mar 2022 • Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or
Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space.
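A hypothetical sketch of that tuning step, with a stand-in generator and a plain L2 loss (perceptual objectives are typically used in practice): given the reference images and their inverted latents, the generator weights are optimized so those latents reconstruct the person.

```python
import torch
import torch.nn as nn

# Stand-in generator; in the paper this is a pretrained StyleGAN face generator.
generator = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 3 * 64 * 64))
refs = torch.randn(100, 3 * 64 * 64)     # ~100 reference portraits, flattened (placeholder)
anchors = torch.randn(100, 512)          # latents obtained by inverting the references

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
for step in range(200):
    idx = torch.randint(0, len(refs), (8,))
    loss = (generator(anchors[idx]) - refs[idx]).pow(2).mean()   # L2 for brevity
    opt.zero_grad(); loss.backward(); opt.step()
```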
1 code implementation • 10 Dec 2021 • Shir Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel
To distill the power of ViT features from convoluted design choices, we restrict ourselves to lightweight zero-shot methodologies (e.g., binning and clustering) applied directly to the features, as sketched below.
Ranked #6 on Feature Upsampling on ImageNet
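A small sketch of the clustering route, with random placeholder descriptors standing in for the dense per-patch features of a self-supervised ViT:

```python
import torch

H, W, D = 28, 28, 384              # patch grid and descriptor dimension (ViT-S-like)
feats = torch.randn(H * W, D)      # placeholder dense per-patch descriptors

def kmeans(x, k=5, iters=20):
    centers = x[torch.randperm(len(x))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=1)     # nearest-center assignment
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(dim=0)
    return assign

segmentation = kmeans(feats).view(H, W)   # coarse segment map over the patch grid
```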
no code implementations • CVPR 2022 • Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein
Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images.
2 code implementations • ICCV 2021 • Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri
A natural source for such attributes is the StyleSpace of StyleGAN, which is known to generate semantically meaningful dimensions in the image.
2 code implementations • CVPR 2020 • Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel
We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks.
1 code implementation • Computer Vision Foundation 2018 • Yossi Gandelsman, Assaf Shocher, Michal Irani
It was shown [Ulyanov et al.] that the structure of a single DIP generator network is sufficient to capture the low-level statistics of a single image.