Search Results for author: Yossi Gandelsman

Found 22 papers, 13 papers with code

Learning Video Representations without Natural Videos

no code implementations 31 Oct 2024 Xueyang Yu, Xinlei Chen, Yossi Gandelsman

A VideoMAE model pre-trained on our synthetic videos closes 97.2% of the performance gap on UCF101 action classification between training from scratch and self-supervised pre-training from natural videos, and outperforms the pre-trained model on HMDB51.

Action Classification, Diversity
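The "closes 97.2% of the performance gap" claim is a ratio of accuracy differences. The following sketch shows the arithmetic; the accuracy numbers are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical illustration of the "gap closed" metric. The accuracy
# values in the example call are made up for demonstration only.
def gap_closed(acc_scratch, acc_synthetic, acc_natural):
    """Fraction of the scratch-to-natural-pretraining gap recovered
    by pre-training on synthetic videos instead."""
    return (acc_synthetic - acc_scratch) / (acc_natural - acc_scratch)

# e.g. scratch 51.4%, synthetic pre-training 79.1%, natural pre-training 79.9%
frac = gap_closed(51.4, 79.1, 79.9)
print(f"{100 * frac:.1f}% of the gap closed")
```

With these illustrative numbers the synthetic pre-training recovers nearly all of the gap while never seeing a natural video.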

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

1 code implementation 3 Oct 2024 Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman

We investigate the internal representations of vision-language models (VLMs) to address hallucinations, a persistent challenge despite advances in model size and training.

Zero Shot Segmentation

Interpreting the Weight Space of Customized Diffusion Models

1 code implementation 13 Jun 2024 Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman

First, sampling a set of weights from this space results in a new model encoding a novel identity.

Interpreting the Second-Order Effects of Neurons in CLIP

no code implementations 6 Jun 2024 Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

We interpret the function of individual neurons in CLIP by automatically describing them using text.

Attribute, Zero Shot Segmentation

The More You See in 2D, the More You Perceive in 3D

1 code implementation 4 Apr 2024 Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman

Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.

3D Reconstruction, Image to 3D, +1

Synthesizing Moving People with 3D Control

no code implementations 19 Jan 2024 Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros, Jitendra Malik

This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in 3D pose, and to the input image in terms of visual similarity.

The More You See in 2D, the More You Perceive in 3D

1 code implementation CVPR 2024 Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman

Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.

3D Reconstruction, Image to 3D, +1

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

no code implementations 4 Dec 2023 Jiarui Xu, Yossi Gandelsman, Amir Bar, Jianwei Yang, Jianfeng Gao, Trevor Darrell, Xiaolong Wang

Given a textual description of a visual task (e.g., "Left: input image, Right: foreground segmentation"), a few input-output visual examples, or both, the model in-context learns to solve it for a new test input.

Colorization, Foreground Segmentation, +3

Idempotent Generative Network

2 code implementations 2 Nov 2023 Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros

We define the target manifold as the set of all instances that $f$ maps to themselves.
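"Instances that $f$ maps to themselves" are fixed points, and an idempotent map satisfies $f(f(x)) = f(x)$ everywhere. A minimal sketch, with a toy one-dimensional projection standing in for the trained generator network (the real model is a deep network trained with additional loss terms this omits):

```python
# Toy idempotent map: projection onto the non-negative reals.
# This is a stand-in illustration, not the paper's generator.
def f(x):
    return max(x, 0.0)

# Points on the "target manifold" are exactly the fixed points of f:
# applying f a second time changes nothing.
for z in [-2.0, 0.0, 3.5]:
    once, twice = f(z), f(f(z))
    assert twice == once
```

Any projection has this property, which is why idempotence is a natural training objective for mapping arbitrary inputs onto a target set.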

Interpreting CLIP's Image Representation via Text-Based Decomposition

1 code implementation 9 Oct 2023 Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands.
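Because the representation is a sum of contributions, any linear probe of it (such as similarity to a text embedding) also decomposes additively over the summands. A hedged sketch of that bookkeeping, with random stand-in tensors (the shapes and names are assumptions, not the paper's code):

```python
import numpy as np

# Stand-in per-(layer, head, patch) contributions to a CLIP-style
# image embedding; real values would come from the model's attention heads.
rng = np.random.default_rng(0)
L, H, P, D = 2, 4, 16, 8                  # layers, heads, patches, embed dim
contrib = rng.normal(size=(L, H, P, D))

image_rep = contrib.sum(axis=(0, 1, 2))   # full representation = sum of summands

text_rep = rng.normal(size=D)
text_rep /= np.linalg.norm(text_rep)      # unit-norm text direction

# Score each attention head's summed contribution along the text direction.
head_scores = contrib.sum(axis=2) @ text_rep        # shape (L, H)

# Linearity: head scores add up to the full image-text score.
assert np.isclose(head_scores.sum(), image_rep @ text_rep)
```

The final assertion is the whole point: interpreting a summand against text never changes the total, so the decomposition is exact.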

Test-Time Training on Video Streams

no code implementations 11 Jul 2023 Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang

Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.

Image Reconstruction, Panoptic Segmentation

Rosetta Neurons: Mining the Common Units in a Model Zoo

no code implementations ICCV 2023 Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher

In this paper, we demonstrate the existence of common features we call "Rosetta Neurons" across a range of models with different architectures, different tasks (generative and discriminative), and different types of supervision (class-supervised, text-supervised, self-supervised).

Test-Time Training with Masked Autoencoders

1 code implementation 15 Sep 2022 Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.

Visual Prompting via Image Inpainting

1 code implementation 1 Sep 2022 Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?

Colorization, Edge Detection, +6

MyStyle: A Personalized Generative Prior

no code implementations 31 Mar 2022 Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or

Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space.

Image Enhancement, Super-Resolution

Deep ViT Features as Dense Visual Descriptors

1 code implementation 10 Dec 2021 Shir Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel

To distill the power of ViT features from convoluted design choices, we restrict ourselves to lightweight zero-shot methodologies (e.g., binning and clustering) applied directly to the features.

Feature Upsampling, Semantic Correspondence
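The "lightweight zero-shot" recipe amounts to running an off-the-shelf clustering method directly on frozen per-patch features. A minimal sketch with a small hand-rolled k-means; the feature array is random stand-in data, not real ViT output, and the shapes (14x14 patches, 384-dim features) are assumptions:

```python
import numpy as np

def kmeans(feats, k=2, iters=10, seed=0):
    """Plain k-means over rows of feats; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        labels = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned features.
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

# Stand-in for frozen ViT patch features of one image (14x14 patch grid).
patch_feats = np.random.default_rng(1).normal(size=(196, 384))
segmentation = kmeans(patch_feats).reshape(14, 14)   # coarse part/segment map
```

No training is involved: the features are frozen and the clustering is the entire "method", which is what makes the comparison of feature extractors fair.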

Deep Saliency Prior for Reducing Visual Distraction

no code implementations CVPR 2022 Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein

Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images.

Semantic Pyramid for Image Generation

2 code implementations CVPR 2020 Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks.

General Classification, Image Generation, +2

"Double-DIP": Unsupervised Image Decomposition via Coupled Deep-Image-Priors

1 code implementation Computer Vision Foundation 2018 Yossi Gandelsman, Assaf Shocher, Michal Irani

It was shown [Ulyanov et al.] that the structure of a single DIP generator network is sufficient to capture the low-level statistics of a single image.

Image Dehazing, Image Segmentation, +3
