no code implementations • 13 Feb 2025 • Noam Issachar, Mohammad Salama, Raanan Fattal, Sagie Benaim
Given an input condition, such as a text prompt, we first map it to a point lying in data space, representing an "average" data point with the minimal average distance to all data points of the same conditional mode (e.g., class).
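As a rough illustration of such an "average" anchor point: under squared Euclidean distance, the point minimizing the average distance to all samples of a class is simply the class mean. This is only a hedged toy sketch of that geometric fact, not the paper's pipeline; the embeddings and class below are made up.

```python
def class_centroid(samples):
    """Coordinate-wise mean of a list of equal-length vectors: the minimizer
    of the average squared Euclidean distance to the samples."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[d] for s in samples) / n for d in range(dim)]

def mean_sq_dist(point, samples):
    """Average squared Euclidean distance from `point` to `samples`."""
    return sum(
        sum((p - s) ** 2 for p, s in zip(point, sample))
        for sample in samples
    ) / len(samples)

# Hypothetical 2-D embeddings of one conditional mode (e.g., one class).
mode_embeddings = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
anchor = class_centroid(mode_embeddings)  # [2.0, 3.0]
```

The anchor attains a lower mean squared distance to the mode's samples than any other candidate point, which is the sense of "average" used above.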
no code implementations • 6 Jan 2025 • Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak
To address these limitations, we propose a two-stage compositional framework that decomposes I2V generation into: (i) An explicit intermediate representation generation stage, followed by (ii) A video generation stage that is conditioned on this representation.
1 code implementation • 13 Oct 2024 • Ran Galun, Sagie Benaim
Text-to-image diffusion models have demonstrated an impressive ability to produce high-quality outputs.
1 code implementation • 19 Jun 2024 • Guy Yariv, Idan Schwartz, Yossi Adi, Sagie Benaim
To facilitate multimodal grounded language modeling, we employ a late-fusion layer that combines the projected visual features with the output of a pre-trained LLM conditioned on text only.
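A minimal sketch of the late-fusion idea, assuming (hypothetically) that fusion amounts to projecting the visual features to the LLM's hidden size and adding them to the text-only output; the paper's exact combination, dimensions, and parameterization may differ.

```python
def linear(x, w, b):
    """y = W x + b for a nested-list weight matrix (toy stand-in for a
    learned projection layer)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def late_fusion(visual_feat, llm_hidden, w, b):
    """Project visual features to the LLM hidden size, then add them to the
    output of the text-only LLM (one simple late-fusion choice)."""
    projected = linear(visual_feat, w, b)
    return [h + p for h, p in zip(llm_hidden, projected)]

# Hypothetical sizes: 3-D visual feature, 2-D LLM hidden state.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
fused = late_fusion([1.0, 2.0, 3.0], [0.5, 0.5], w, b)  # [1.5, 2.5]
```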
1 code implementation • 6 Jun 2024 • Sebastian Loeschcke, Dan Wang, Christian Leth-Espensen, Serge Belongie, Michael J. Kastoryano, Sagie Benaim
This has prevented practitioners from deploying the full potential of tensor networks for visual data.
no code implementations • 29 May 2024 • Isaac Labe, Noam Issachar, Itai Lang, Sagie Benaim
Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics.
1 code implementation • 28 Sep 2023 • Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi
The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.
1 code implementation • ICCV 2023 • Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim
This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, and (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.
1 code implementation • 9 Feb 2023 • Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie
We use this framework to design Fourier PNFs, which match state-of-the-art performance in signal representation tasks that use neural fields.
no code implementations • 17 Nov 2022 • Peter Ebert Christensen, Vésteinn Snæbjarnarson, Andrea Dittadi, Serge Belongie, Sagie Benaim
We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers.
no code implementations • 18 Jul 2022 • Lior Ben-Moshe, Sagie Benaim, Lior Wolf
We then use a separate set of side images to model the structure of generated images using an autoregressive model trained on the learned patch embeddings of training images.
no code implementations • 24 Jun 2022 • Sebastian Loeschcke, Serge Belongie, Sagie Benaim
The first target text prompt describes the global semantics and the second target text prompt describes the local semantics.
no code implementations • 6 Jun 2022 • Sagie Benaim, Frederik Warburg, Peter Ebert Christensen, Serge Belongie
To this end, we propose a volumetric framework for (i) disentangling, or separating, the volumetric representation of a given foreground object from the background, and (ii) semantically manipulating the foreground object, as well as the background.
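Once foreground and background are disentangled, they can be recombined; a hedged per-pixel sketch of that recomposition is alpha compositing (a full volumetric renderer would instead integrate densities along each camera ray, and the scalar alpha here is a made-up stand-in for the foreground's accumulated opacity).

```python
def composite(fg_rgb, fg_alpha, bg_rgb):
    """Alpha-composite a disentangled foreground colour over a background
    colour for a single pixel; fg_alpha is the foreground opacity in [0, 1]."""
    return [fg_alpha * f + (1.0 - fg_alpha) * b
            for f, b in zip(fg_rgb, bg_rgb)]

# Half-opaque red foreground over a blue background.
pixel = composite([1.0, 0.0, 0.0], 0.5, [0.0, 0.0, 1.0])  # [0.5, 0.0, 0.5]
```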
no code implementations • 5 May 2022 • Yaron Gurovich, Sagie Benaim, Lior Wolf
This problem is tackled through the lens of disentangled and locally fair representations.
1 code implementation • 9 Dec 2021 • Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf
The separation of the attention layer into local and global counterparts keeps the computational cost low in the number of patches, while still supporting data-dependent localization already at the first layer, in contrast to the static positioning in other visual transformers.
1 code implementation • CVPR 2022 • Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, Rana Hanocka
In order to modify style, we obtain a similarity score between a text prompt (describing style) and a stylized mesh by harnessing the representational power of CLIP.
Ranked #1 on Neural Stylization on Meshes
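The similarity score mentioned above can be sketched as the cosine similarity between two CLIP embeddings; this is a hedged toy version with made-up 2-D vectors standing in for real CLIP text and rendered-image embeddings.

```python
import math

def cosine_similarity(a, b):
    """CLIP-style similarity: cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

text_emb = [0.6, 0.8]    # hypothetical CLIP embedding of a style prompt
render_emb = [0.6, 0.8]  # hypothetical CLIP embedding of the stylized render
score = cosine_similarity(text_emb, render_emb)  # 1.0: same direction
```

Maximizing this score with respect to the mesh's style parameters is what drives the stylization toward the prompt.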
1 code implementation • 24 Oct 2021 • Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf
We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target.
no code implementations • 29 Sep 2021 • Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf
Due to the expensive quadratic cost of the attention mechanism, either a large patch size is used, resulting in coarse-grained global interactions, or alternatively, attention is applied only on a local region of the image at the expense of long-range interactions.
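The quadratic trade-off described here is easy to quantify: the number of pairwise attention interactions grows as the square of the patch count, so larger patches are far cheaper but coarser. A small back-of-the-envelope sketch (the image and patch sizes are illustrative):

```python
def attention_cost(image_size, patch_size):
    """Pairwise attention interactions for a square image split into square
    patches: quadratic in the number of patches."""
    n_patches = (image_size // patch_size) ** 2
    return n_patches ** 2

# Larger patches -> far fewer interactions, but coarser-grained attention.
coarse = attention_cost(224, 32)  # 49 patches  -> 2,401 interactions
fine = attention_cost(224, 8)     # 784 patches -> 614,656 interactions
```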
1 code implementation • 17 Jun 2021 • Ron Mokady, Rotem Tzaban, Sagie Benaim, Amit H. Bermano, Daniel Cohen-Or
To alleviate this problem, we introduce JOKR - a JOint Keypoint Representation that captures the motion common to both the source and target videos, without requiring any object prior or data collection.
no code implementations • 30 May 2021 • Noam Gat, Sagie Benaim, Lior Wolf
We consider the task of upscaling a low resolution thumbnail image of a person, to a higher resolution image, which preserves the person's identity and other attributes.
2 code implementations • ICCV 2021 • Shelly Sheynin, Sagie Benaim, Lior Wolf
We demonstrate the superiority of our method in both the one-shot and few-shot settings, on the Paris, CIFAR10, MNIST, and FashionMNIST datasets, as well as in the setting of defect detection on MVTec.
1 code implementation • CVPR 2021 • Oren Nuriel, Sagie Benaim, Lior Wolf
In the setting of robustness, our method improves on both ImageNet-C and CIFAR-100-C for multiple architectures.
3 code implementations • NeurIPS 2020 • Shir Gur, Sagie Benaim, Lior Wolf
We consider the task of generating diverse and novel videos from a single video sample.
1 code implementation • ICLR 2020 • Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano
We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other.
no code implementations • 26 Apr 2020 • Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
1 code implementation • CVPR 2020 • Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel
We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval.
1 code implementation • 5 Apr 2020 • Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
no code implementations • ICLR 2019 • Lior Wolf, Sagie Benaim, Tomer Galanti
Two functions are learned: (i) a set indicator c, which is a binary classifier, and (ii) a comparator function h that, given two nearby samples, predicts which sample has the higher value of the unknown function v. Loss terms are used to ensure that every training sample x is a local maximum of v, according to h, and satisfies c(x)=1.
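A hedged sketch of what such loss terms could look like, assuming (hypothetically) that h(a, b) outputs the probability that v(a) > v(b) and that c(x) outputs a membership score in [0, 1]; the paper's actual formulation may differ.

```python
def local_max_loss(h, x, neighbors):
    """Penalty incurred when the comparator h fails to rank the training
    sample x above its nearby perturbed neighbors."""
    return sum(max(0.0, 0.5 - h(x, n)) for n in neighbors) / len(neighbors)

def indicator_loss(c, x):
    """Penalty incurred when the set indicator fails to accept x (c(x)=1)."""
    return max(0.0, 1.0 - c(x))

# Toy comparators: one that ranks x above its neighbors, one that never does.
good_h = lambda a, b: 0.9
bad_h = lambda a, b: 0.0
```

With `good_h` the local-maximum penalty vanishes; with `bad_h` every neighbor contributes, which is the signal that drives training.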
1 code implementation • ICLR 2019 • Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf
Thus, in the above example, we can create, for every person without glasses, a version with the glasses observed in any face image.
1 code implementation • ICCV 2019 • Sagie Benaim, Michael Khaitov, Tomer Galanti, Lior Wolf
We present a method for recovering the shared content between two visual domains as well as the content that is unique to each domain.
1 code implementation • 15 Jun 2019 • Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano
We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other.
1 code implementation • 14 Dec 2018 • Michael Michelashvili, Sagie Benaim, Lior Wolf
We study the problem of semi-supervised singing voice separation, in which the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music.
no code implementations • 23 Jul 2018 • Tomer Galanti, Sagie Benaim, Lior Wolf
The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications.
2 code implementations • NeurIPS 2018 • Sagie Benaim, Lior Wolf
Given a single image x from domain A and a set of images from domain B, our task is to generate the analogous of x in B.
1 code implementation • ECCV 2018 • Sagie Benaim, Tomer Galanti, Lior Wolf
While in supervised learning, the validation error is an unbiased estimator of the generalization (test) error and complexity-based generalization bounds are abundant, no such bounds exist for learning a mapping in an unsupervised way.
no code implementations • ICLR 2018 • Tomer Galanti, Lior Wolf, Sagie Benaim
We discuss the feasibility of the following learning problem: given unmatched samples from two domains and nothing else, learn a mapping between the two, which preserves semantics.
1 code implementation • NeurIPS 2017 • Sagie Benaim, Lior Wolf
In this work, we present a method of learning $G_{AB}$ without learning $G_{BA}$.
Ranked #8 on Facial Expression Translation on CelebA
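One way the paper constrains a one-sided mapping is by preserving pairwise distances between samples across the mapping. A hedged toy sketch of such a distance-preservation loss, using unnormalized L1 distances on plain vectors (the paper operates on images and normalizes distances per domain):

```python
def pairwise_dist(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_preservation_loss(G, batch):
    """One-sided constraint: pairwise distances among samples of domain A
    should match the distances among their images under G."""
    total, count = 0.0, 0
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            d_src = pairwise_dist(batch[i], batch[j])
            d_tgt = pairwise_dist(G(batch[i]), G(batch[j]))
            total += abs(d_src - d_tgt)
            count += 1
    return total / count

batch = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
identity_loss = distance_preservation_loss(lambda x: x, batch)  # 0.0
```

Any distance-preserving map (e.g., a translation) also incurs zero loss, which is why this term acts as a regularizer alongside an adversarial objective rather than on its own.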