Search Results for author: Michal Yarom

Found 12 papers, 6 papers with code

Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis

no code implementations · 16 Jul 2024 · Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, exhibit non-linear patterns: the next video may need to be visually consistent not with the immediately preceding video but with earlier ones.

Denoising
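
The non-linear dependency described above amounts to choosing which earlier scene the next one should be conditioned on. Below is a minimal sketch of that selection step; the `embed` function is a toy bag-of-words stand-in, not the paper's contrastive model:

```python
# Toy sketch, not the authors' code: pick which earlier scene the next scene
# should stay visually consistent with, instead of always using the previous one.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in encoder: hash words into a bag-of-words vector.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def pick_reference_scene(prev_prompts: list[str], next_prompt: str) -> int:
    """Return the index of the earlier scene most similar to the next one."""
    query = embed(next_prompt)
    scores = [float(embed(p) @ query) for p in prev_prompts]
    return int(np.argmax(scores))

steps = ["crack two eggs into a bowl", "whisk the eggs", "dice an onion",
         "pour the whisked eggs into the pan"]
ref = pick_reference_scene(steps[:3], steps[3])
print(f"condition scene 4 on scene {ref + 1}: {steps[ref]!r}")
```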

VideoPhy: Evaluating Physical Commonsense for Video Generation

no code implementations · 5 Jun 2024 · Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover

Recent advances in pretraining on internet-scale video data have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts, synthesize realistic motions, and render complex objects.

Video Generation

Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks

no code implementations · 16 May 2024 · João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

In addition, to maintain the visual coherence of the image sequence, we introduce a copy mechanism that initialises the reverse diffusion process with a latent vector taken from an image generated at a relevant earlier step.
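
A minimal sketch of the copy-mechanism idea, under the assumption that it resembles SDEdit-style initialisation: the next image's reverse diffusion starts from a partially noised copy of an earlier image's latent rather than from pure noise. The schedule and `denoise_step` below are toy stand-ins, not the paper's model:

```python
# Toy sketch under an SDEdit-like assumption, not the paper's implementation:
# start the next image's reverse diffusion from a noised copy of a previous
# image's latent so the two stay visually coherent.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # DDPM-style schedule

def denoise_step(x_t: np.ndarray, t: int) -> np.ndarray:
    return x_t * 0.999  # stand-in for one learned reverse-diffusion step

def init_from_previous(prev_latent: np.ndarray, t_start: int) -> np.ndarray:
    """'Copy' initialisation: noise the previous latent up to step t_start."""
    noise = rng.standard_normal(prev_latent.shape)
    return (np.sqrt(alphas_bar[t_start]) * prev_latent
            + np.sqrt(1.0 - alphas_bar[t_start]) * noise)

prev_latent = rng.standard_normal((4, 32, 32))    # latent of the previous image
x = init_from_previous(prev_latent, t_start=300)  # start mid-way, not from noise
for t in range(300, -1, -1):
    x = denoise_step(x, t)
corr = np.corrcoef(x.ravel(), prev_latent.ravel())[0, 1]
print(f"correlation with the previous latent: {corr:.2f}")
```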

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

1 code implementation · 7 May 2024 · Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., 'a red panda climbing a tree') and second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively.

Text-to-Video Generation · Video Generation
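
A minimal sketch of time-aligned conditioning as described above, with assumed interfaces (random tensors stand in for the video features and the text encoder output; this is not the released TALC code):

```python
# Toy sketch with assumed interfaces, not the released TALC code: each temporal
# chunk of the video is conditioned on the embedding of its own scene caption
# rather than on a single pooled caption.
import torch

torch.manual_seed(0)
num_frames, feat_dim = 16, 8
frames = torch.randn(num_frames, feat_dim)        # per-frame visual features
captions = ["a red panda climbing a tree",
            "the red panda sleeps on the top of the tree"]
text_embs = torch.randn(len(captions), feat_dim)  # stand-in text encoder output

def time_aligned_condition(frames: torch.Tensor, text_embs: torch.Tensor) -> torch.Tensor:
    """Split the frames evenly across scenes and add each scene's text embedding."""
    chunks = torch.chunk(frames, chunks=len(text_embs), dim=0)
    return torch.cat([chunk + text_embs[i] for i, chunk in enumerate(chunks)], dim=0)

out = time_aligned_condition(frames, text_embs)
print(out.shape)  # torch.Size([16, 8]): frames 0-7 follow caption 1, frames 8-15 caption 2
```

Conditioning per temporal chunk is what lets each scene follow its own caption instead of a single pooled description.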

Transferring Visual Attributes from Natural Language to Verified Image Generation

no code implementations · 24 May 2023 · Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes

In this paper, we propose to strengthen the consistency of T2I methods in the presence of complex natural language, which often exceeds the limits of T2I methods by including non-visual information and textual elements that require knowledge for accurate generation.

Text-to-Image Generation · Visual Question Answering (VQA)
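
The VQA tag on this entry suggests a generate-then-verify loop; the sketch below is an assumption about that shape, not the authors' pipeline, with `generate_image` and `answer_vqa` as hypothetical stand-ins for a text-to-image model and a VQA model:

```python
# Hypothetical generate-then-verify loop inferred from the entry's VQA tag, not
# the authors' pipeline; `generate_image` and `answer_vqa` are toy stand-ins.
import itertools

_colors = itertools.cycle(["blue", "red"])  # stand-in generator alternates outputs

def generate_image(prompt: str) -> dict:
    # Stand-in generator that sometimes drops the requested attribute.
    return {"color": next(_colors), "object": "car"}

def answer_vqa(image: dict, question: str) -> str:
    return image["color"] if "color" in question else image["object"]

def verified_generate(prompt: str, checks: dict[str, str], max_tries: int = 5) -> dict:
    """Regenerate until every (question -> expected answer) check passes."""
    for attempt in range(1, max_tries + 1):
        image = generate_image(prompt)
        if all(answer_vqa(image, q) == want for q, want in checks.items()):
            print(f"verified on attempt {attempt}")
            return image
    raise RuntimeError("no image passed verification")

verified_generate("a red car", {"what color is the car?": "red"})
```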

What You See is What You Read? Improving Text-Image Alignment Evaluation

1 code implementation · NeurIPS 2023 · Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks.

Image to text · Question Answering +6
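
One way to picture the evaluation setup described above: an alignment scorer maps (text, image) pairs to scalars and is judged by how well the scores separate aligned from misaligned pairs. The sketch below uses a synthetic stand-in scorer and a simple ROC-AUC computation; it is illustrative only, not the paper's VQA-based or end-to-end models:

```python
# Illustrative evaluation setup only, not the paper's models: score (text,
# image) pairs, then measure how well scores separate aligned from misaligned.
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([1] * 50 + [0] * 50)  # 1 = aligned pair, 0 = misaligned pair
scores = np.where(labels == 1,
                  rng.normal(1.0, 0.5, size=100),  # aligned pairs score higher
                  rng.normal(0.0, 0.5, size=100))  # ... in this synthetic stand-in

def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """Probability that a random aligned pair outscores a random misaligned one."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return float((pos[:, None] > neg[None, :]).mean())

print(f"AUC of the stand-in scorer: {roc_auc(labels, scores):.2f}")
```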

MaXM: Towards Multilingual Visual Question Answering

1 code implementation · 12 Sep 2022 · Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut

In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts.

Question Answering · Translation +1

MyStyle: A Personalized Generative Prior

no code implementations · 31 Mar 2022 · Yotam Nitzan, Kfir Aberman, Qiurui He, Orly Liba, Michal Yarom, Yossi Gandelsman, Inbar Mosseri, Yael Pritch, Daniel Cohen-Or

Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space.

Image Enhancement · Super-Resolution
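
A minimal sketch of the tuning loop described above, with a toy linear generator standing in for the pretrained StyleGAN so the loop runs end to end; the anchor latents and reference images are random placeholders:

```python
# Toy sketch, not the MyStyle code: fine-tune a "pretrained" generator so that
# per-photo anchor latents reconstruct the reference set.
import torch

torch.manual_seed(0)
latent_dim, img_dim, n_refs = 16, 64, 100

generator = torch.nn.Linear(latent_dim, img_dim)  # stand-in for the face generator
anchors = torch.randn(n_refs, latent_dim)         # inverted latents, one per photo
ref_images = torch.randn(n_refs, img_dim)         # the ~100 portrait images

opt = torch.optim.Adam(generator.parameters(), lr=1e-2)
for step in range(200):
    loss = torch.nn.functional.mse_loss(generator(anchors), ref_images)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```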

Self-Distilled StyleGAN: Towards Generation from Internet Photos

2 code implementations · 24 Feb 2022 · Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri

To meet these challenges, we propose a StyleGAN-based self-distillation approach with two main components: (i) generative self-filtering of the dataset to eliminate outlier images and produce an adequate training set, and (ii) perceptual clustering of the generated images to detect the inherent data modalities, which are then used to improve StyleGAN's "truncation trick" during image synthesis.

Image Generation
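
A sketch of the cluster-based truncation idea, assuming the improved "truncation trick" pulls each latent toward its nearest perceptual cluster center rather than a single global mean; the clustering and latents here are toy stand-ins, not the released code:

```python
# Sketch under assumed details, not the released code: truncate each latent
# toward its nearest cluster center instead of one global mean, so distinct
# modes (e.g. poses) survive the quality/diversity trade-off.
import numpy as np

rng = np.random.default_rng(0)
latents = rng.standard_normal((500, 8))  # stand-in W-space latents

def kmeans(x: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([x[labels == j].mean(0) if (labels == j).any() else centers[j]
                            for j in range(k)])
    return centers

centers = kmeans(latents, k=5)

def cluster_truncate(w: np.ndarray, centers: np.ndarray, psi: float = 0.7) -> np.ndarray:
    """Pull w toward its nearest cluster center by factor psi."""
    nearest = centers[np.argmin(((centers - w) ** 2).sum(-1))]
    return nearest + psi * (w - nearest)

print(cluster_truncate(rng.standard_normal(8), centers))
```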

Semantic Pyramid for Image Generation

2 code implementations · CVPR 2020 · Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks.

General Classification · Image Generation +2

Temporal-Needle: A view and appearance invariant video descriptor

no code implementations · 14 Dec 2016 · Michal Yarom, Michal Irani

However, to find similar actions across videos, we consider only a small subset of the descriptors: the statistically significant ones.

Action Detection · Clustering +1
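
A sketch of descriptor filtering under one plausible reading of "statistically significant" (descriptors that deviate strongly from the average descriptor, and hence carry more matching evidence); the paper's exact criterion may differ:

```python
# Sketch of one plausible filtering criterion, not the paper's exact test: keep
# descriptors that deviate strongly from the average descriptor, since common
# patterns carry little evidence when matching actions across videos.
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((1000, 32))  # stand-in space-time descriptors

dists = np.linalg.norm(descriptors - descriptors.mean(axis=0), axis=1)
z = (dists - dists.mean()) / dists.std()
significant = descriptors[z > 2.0]             # keep only the unusual ones
print(f"kept {len(significant)} of {len(descriptors)} descriptors")
```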
