no code implementations • 16 Jul 2024 • Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes
Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, include non-linear patterns: the next video may need to be visually consistent not with the immediately preceding video but with earlier ones.
no code implementations • 5 Jun 2024 • Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover
Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts, synthesize realistic motions and render complex objects.
no code implementations • 16 May 2024 • João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes
In addition, to maintain the visual coherence of the image sequence, we introduce a copy mechanism that initialises the reverse diffusion process with a latent vector taken from a previously generated image at the relevant earlier step.
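A minimal sketch of such a copy mechanism, using the open-source diffusers img2img pipeline as a generic stand-in; the model name and `strength` value are illustrative assumptions, not the paper's configuration:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate_next_step(prompt: str, source_image: Image.Image) -> Image.Image:
    """Generate the image for the next instruction step.

    `source_image` is the previously generated image judged most relevant
    (not necessarily the immediately preceding one). The pipeline encodes it
    to a latent and re-noises it partway, so the reverse diffusion starts
    from that latent rather than pure noise, preserving layout and
    appearance across steps.
    """
    return pipe(prompt, image=source_image, strength=0.6).images[0]
```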
1 code implementation • 7 May 2024 • Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang
For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., 'a red panda climbing a tree') and the second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively.
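A toy sketch of this kind of time-aligned conditioning using plain PyTorch cross-attention; the shapes, the split point, and the attention module itself are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

d = 256                                 # feature dimension (assumed)
attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

frames = torch.randn(1, 16, d)          # 16 frames of visual features
text_scene1 = torch.randn(1, 12, d)     # tokens of 'a red panda climbing a tree'
text_scene2 = torch.randn(1, 12, d)     # tokens of 'the red panda sleeps ...'

# Earlier frames attend to the first caption, later frames to the second.
early, late = frames[:, :8], frames[:, 8:]
early_out, _ = attn(early, text_scene1, text_scene1)
late_out, _ = attn(late, text_scene2, text_scene2)
conditioned = torch.cat([early_out, late_out], dim=1)
```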
no code implementations • 24 May 2023 • Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes
In this paper, we propose to strengthen the consistency property of T2I methods in the presence of natural, complex language, which often pushes beyond the limits of T2I methods by including non-visual information and textual elements that require knowledge for accurate generation.
1 code implementation • NeurIPS 2023 • Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor
Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks.
Ranked #11 on Visual Reasoning on Winoground
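For context, a simple off-the-shelf baseline for scoring image-text alignment uses CLIP embedding similarity; this is a generic illustration, not the evaluation method the paper proposes:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(image: Image.Image, text: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```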
1 code implementation • 12 Sep 2022 • Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut
In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts.
no code implementations • 31 Mar 2022 • Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or
Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space.
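A schematic sketch of such a tuning loop; the `ToyGenerator`, the dummy data, and the plain MSE loss are placeholders for the pretrained StyleGAN, the real reference set, and the paper's actual inversion procedure and losses:

```python
import torch
import torch.nn.functional as F

class ToyGenerator(torch.nn.Module):      # placeholder for a pretrained StyleGAN
    def __init__(self, w_dim=512):
        super().__init__()
        self.net = torch.nn.Linear(w_dim, 3 * 64 * 64)
    def forward(self, w):
        return self.net(w).view(-1, 3, 64, 64)

G = ToyGenerator()
refs = torch.randn(100, 3, 64, 64)        # ~100 reference portraits (dummy data)
anchors = torch.randn(100, 512)           # latents from inverting each reference

opt = torch.optim.Adam(G.parameters(), lr=1e-4)
for step in range(1000):
    idx = torch.randint(0, 100, (8,))
    recon = G(anchors[idx])
    loss = F.mse_loss(recon, refs[idx])   # the real method adds a perceptual term
    opt.zero_grad()
    loss.backward()
    opt.step()

# After tuning, the region spanned by `anchors` forms a personalized manifold:
# sampling within it yields novel, identity-preserving images of the person.
```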
2 code implementations • 24 Feb 2022 • Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri
To meet these challenges, we propose a StyleGAN-based self-distillation approach with two main components: (i) a generative self-filtering of the dataset that eliminates outlier images in order to produce an adequate training set, and (ii) perceptual clustering of the generated images to detect the inherent data modalities, which are then used to improve StyleGAN's "truncation trick" during image synthesis.
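The two components might look roughly like this on synthetic features; the outlier threshold, cluster count, and truncation rule below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

feats = np.random.randn(5000, 512)            # stand-in perceptual embeddings

# (i) Self-filtering: drop images whose embedding is far from the global mean,
# treating them as outliers to be excluded from the training set.
dists = np.linalg.norm(feats - feats.mean(0), axis=1)
keep = feats[dists < np.quantile(dists, 0.9)]

# (ii) Perceptual clustering: find the data's modes; each centroid can then
# anchor a multi-modal version of the truncation trick, pulling latents toward
# the nearest mode instead of a single global mean.
km = KMeans(n_clusters=10, n_init=10).fit(keep)
centroids = km.cluster_centers_

def truncate(w, psi=0.7):
    """Multi-modal truncation: interpolate toward the nearest cluster centroid."""
    c = centroids[np.argmin(np.linalg.norm(centroids - w, axis=1))]
    return c + psi * (w - c)
```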
2 code implementations • ICCV 2021 • Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri
A natural source for such attributes is the StyleSpace of StyleGAN, which is known to contain semantically meaningful dimensions of the generated image.
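One way to probe such dimensions is to perturb a single style coordinate and measure the change in a downstream classifier's output; the generator and classifier below are toy stand-ins, not the paper's models:

```python
import torch

G = torch.nn.Linear(512, 3 * 32 * 32)                     # placeholder generator
clf = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))

s = torch.randn(1, 512)                                   # a style vector
with torch.no_grad():
    base = torch.sigmoid(clf(G(s).view(1, 3, 32, 32)))
    effects = []
    for i in range(512):                                  # score each coordinate
        s_pert = s.clone()
        s_pert[0, i] += 3.0                               # perturb one channel
        p = torch.sigmoid(clf(G(s_pert).view(1, 3, 32, 32)))
        effects.append((p - base).abs().item())
top5 = sorted(range(512), key=lambda i: -effects[i])[:5]  # most attribute-like dims
```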
2 code implementations • CVPR 2020 • Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel
We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks.
no code implementations • 14 Dec 2016 • Michal Yarom, Michal Irani
However, to find similar actions across videos, we consider only a small subset of the descriptors: the statistically significant ones.
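A rough sketch of one plausible reading of this criterion, scoring descriptors by rarity against an empirical background set; the density proxy and threshold are assumptions, not the paper's definition:

```python
import numpy as np

rng = np.random.default_rng(0)
background = rng.standard_normal((10000, 64))   # descriptors from generic videos
video_desc = rng.standard_normal((500, 64))     # descriptors from the query video

def rarity(d, bg, k=20):
    """Mean distance to the k nearest background descriptors (higher = rarer)."""
    dists = np.linalg.norm(bg - d, axis=1)
    return np.sort(dists)[:k].mean()

# Keep only descriptors that are rare under the background distribution,
# i.e. the statistically significant ones.
scores = np.array([rarity(d, background) for d in video_desc])
significant = video_desc[scores > np.quantile(scores, 0.95)]
```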