no code implementations • 4 Oct 2024 • Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2).
no code implementations • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis.
Ranked #5 on Text-to-Video Generation on UCF-101
no code implementations • CVPR 2024 • Danah Yatim, Rafail Fridman, Omer Bar-Tal, Yoni Kasten, Tali Dekel
This loss guides the generation process to preserve the overall motion of the input video while complying with the target object in terms of shape and fine-grained motion traits.
no code implementations • 20 Nov 2023 • Narek Tumanyan, Omer Bar-Tal, Shir Amir, Shai Bagon, Tali Dekel
Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image.
1 code implementation • 19 Jul 2023 • Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel
In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing.
3 code implementations • 16 Feb 2023 • Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel
In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning.
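At its core, MultiDiffusion fuses several denoising processes (e.g., one per overlapping image crop) into a single generation by solving a least-squares objective whose closed-form solution is simply averaging the per-crop predictions at every pixel. The following is a minimal 1-D toy sketch of that fusion step, not the authors' implementation; the window size, stride, and `denoise_window` callable are illustrative placeholders.

```python
import numpy as np

def fuse_window_updates(x, denoise_window, window=4, stride=2):
    """One MultiDiffusion-style fusion step (toy, 1-D).

    Each overlapping window of `x` is denoised independently, and every
    position takes the average of all window predictions covering it --
    the closed-form minimizer of the least-squares fusion objective.
    """
    acc = np.zeros_like(x, dtype=float)   # sum of window predictions
    cnt = np.zeros_like(x, dtype=float)   # how many windows cover each position
    for start in range(0, len(x) - window + 1, stride):
        seg = x[start:start + window]
        acc[start:start + window] += denoise_window(seg)
        cnt[start:start + window] += 1.0
    return acc / cnt                      # per-position average
```

In the real method, `denoise_window` would be a single step of a pre-trained text-to-image diffusion model applied to each crop, which is why no further training or finetuning is needed.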
1 code implementation • 5 Apr 2022 • Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, Tali Dekel
Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., an object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner.
1 code implementation • CVPR 2022 • Narek Tumanyan, Omer Bar-Tal, Shai Bagon, Tali Dekel
Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image.
no code implementations • 29 Sep 2021 • Alycia Lee, Anthony L Pineci, Uriah Israel, Omer Bar-Tal, Leeat Keren, David A. Van Valen, Anima Anandkumar, Yisong Yue, Anqi Liu
For each layer, we also achieve higher accuracy than competing methods when the overall coverage is kept fixed.