4 code implementations • CVPR 2023 • Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
Our Null-text Inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompts, demonstrating high-fidelity editing of real images.
Ranked #6 on Text-based Image Editing on PIE-Bench
4 code implementations • 1 Nov 2022 • David Nukrai, Ron Mokady, Amir Globerson
We consider the task of image captioning using only the CLIP model and additional text data at training time, with no additional captioned images.
Ranked #1 on Image Captioning on MSCOCO
7 code implementations • 2 Aug 2022 • Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
Editing is challenging for these generative models: an innate property of an editing technique is to preserve most of the original image, yet in text-based models even a small modification of the text prompt often leads to a completely different outcome.
Ranked #17 on Text-based Image Editing on PIE-Bench
no code implementations • 28 Feb 2022 • Amit H. Bermano, Rinon Gal, Yuval Alaluf, Ron Mokady, Yotam Nitzan, Omer Tov, Or Patashnik, Daniel Cohen-Or
Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and an ability to support a large array of downstream tasks.
2 code implementations • 24 Feb 2022 • Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri
To meet these challenges, we propose a StyleGAN-based self-distillation approach, which consists of two main components: (i) generative self-filtering of the dataset to eliminate outlier images and produce an adequate training set, and (ii) perceptual clustering of the generated images to detect the inherent data modalities, which are then employed to improve StyleGAN's "truncation trick" in the image synthesis process.
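The cluster-aware variant of the truncation trick can be illustrated with a toy sketch (the function name, shapes, and data are illustrative assumptions, not the paper's code): instead of interpolating every latent toward a single global average, each latent is pulled toward the center of its own perceptual cluster, so minority modes are not dragged toward the global mean.

```python
import numpy as np

def cluster_truncation(w, centers, psi=0.7):
    """Truncation toward the nearest cluster center instead of a single
    global mean: w' = c + psi * (w - c), where c is the center closest
    to each latent w. Illustrative sketch only."""
    w = np.asarray(w, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Assign each latent to its nearest center (Euclidean distance).
    dists = np.linalg.norm(w[:, None, :] - centers[None, :, :], axis=-1)
    nearest = centers[np.argmin(dists, axis=1)]
    return nearest + psi * (w - nearest)

# Two well-separated modes of latent codes (toy data).
rng = np.random.default_rng(0)
w = np.vstack([rng.normal(0.0, 1.0, (5, 4)),
               rng.normal(10.0, 1.0, (5, 4))])
centers = np.array([[0.0] * 4, [10.0] * 4])
w_trunc = cluster_truncation(w, centers, psi=0.5)
# Each code moves halfway toward its own cluster center, so codes from
# the second mode stay near that mode rather than the global average.
```

With `psi=0.5`, every latent's distance to its assigned center is halved, trading diversity for fidelity per mode rather than globally.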
1 code implementation • 20 Jan 2022 • Rotem Tzaban, Ron Mokady, Rinon Gal, Amit H. Bermano, Daniel Cohen-Or
The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.
1 code implementation • CVPR 2022 • Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, Amit H. Bermano
In this work, we introduce this approach into the realm of encoder-based inversion.
4 code implementations • 18 Nov 2021 • Ron Mokady, Amir Hertz, Amit H. Bermano
Image captioning is a fundamental task in vision-language understanding, where the model predicts an informative textual caption for a given input image.
Ranked #1 on Image Captioning on Conceptual Captions
1 code implementation • 17 Jun 2021 • Ron Mokady, Rotem Tzaban, Sagie Benaim, Amit H. Bermano, Daniel Cohen-Or
To alleviate this problem, we introduce JOKR - a JOint Keypoint Representation that captures the motion common to both the source and target videos, without requiring any object prior or data collection.
3 code implementations • 10 Jun 2021 • Daniel Roich, Ron Mokady, Amit H. Bermano, Daniel Cohen-Or
The key idea is pivotal tuning - a brief training process that preserves the editing quality of an in-domain latent region, while changing its portrayed identity and appearance.
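The two-stage idea can be sketched with a toy linear "generator" (the linear model, variable names, and step sizes are illustrative assumptions, not the paper's implementation): first optimize a latent pivot to reconstruct the target with the generator frozen, then briefly fine-tune the generator's weights around that fixed pivot to close the remaining gap.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))      # toy frozen "generator": G(w) = A @ w
target = rng.normal(size=8)      # stand-in for the real image to invert

def loss(gen, w):
    return float(np.sum((gen @ w - target) ** 2))

# Stage 1: inversion - optimize the latent pivot w, generator frozen.
w = np.zeros(3)
for _ in range(500):
    w -= 0.01 * (2.0 * A.T @ (A @ w - target))
loss_after_inversion = loss(A, w)

# Stage 2: pivotal tuning - freeze the pivot, briefly fine-tune the
# generator weights so G(w) reproduces the target around the pivot.
A_tuned = A.copy()
step = 0.1 / np.sum(w ** 2)      # step size scaled to the pivot norm
for _ in range(500):
    A_tuned -= step * (2.0 * np.outer(A_tuned @ w - target, w))
loss_after_tuning = loss(A_tuned, w)
```

The inversion stage alone leaves a residual error (the target generally lies outside the frozen generator's reach), while the brief tuning stage drives the reconstruction error near zero without moving the pivot, mirroring the fidelity gain the paper describes.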
1 code implementation • ICLR 2020 • Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano
We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other.
1 code implementation • 5 Apr 2020 • Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.