1 code implementation • 12 Apr 2024 • Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani
Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure.
no code implementations • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis.
Ranked #6 on Text-to-Video Generation on UCF-101
no code implementations • 6 Dec 2023 • Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation.
1 code implementation • 2 Nov 2023 • Assaf Shocher, Amil Dravid, Yossi Gandelsman, Inbar Mosseri, Michael Rubinstein, Alexei A. Efros
We define the target manifold as the set of all instances that $f$ maps to themselves.
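The fixed-point idea behind this definition can be sketched in a few lines. The map `f` below is a toy stand-in (projection onto the unit circle), not the paper's trained network; it only illustrates what "instances that $f$ maps to themselves" means.

```python
import numpy as np

# Toy stand-in for the learned map f: projection onto the unit circle.
# Under this f, the "target manifold" -- the set of instances that f
# maps to themselves -- is exactly the unit circle.
def f(x):
    return x / np.linalg.norm(x)

x = np.array([3.0, 4.0])   # an arbitrary off-manifold input
y = f(x)                   # y now lies on the manifold ...
z = f(y)                   # ... so applying f again leaves it unchanged
assert np.allclose(y, z)   # idempotence: f(f(x)) == f(x)
```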
no code implementations • 28 Sep 2023 • Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein
Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene.
no code implementations • 7 Aug 2023 • Brandon Y. Feng, Hadi AlZayer, Michael Rubinstein, William T. Freeman, Jia-Bin Huang
Motion magnification helps us visualize subtle, imperceptible motion.
2 code implementations • 13 Jul 2023 • Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject detail while also preserving the model's crucial knowledge of diverse styles and semantic modifications.
no code implementations • 8 Jun 2023 • Manel Baradad, Yuanzhen Li, Forrester Cole, Michael Rubinstein, Antonio Torralba, William T. Freeman, Varun Jampani
To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks.
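The composite-then-infer step reads as a simple pipeline: alpha-blend the segmented object into the background image, then hand the composite to any monocular depth estimator. The sketch below uses illustrative names only (`composite`, `estimate_depth` is a placeholder, not a real API), under the assumption of RGBA object crops on an RGB background.

```python
import numpy as np

def composite(background, obj_rgba, top, left):
    """Alpha-blend an RGBA object crop onto an RGB background image."""
    out = background.copy()
    h, w = obj_rgba.shape[:2]
    alpha = obj_rgba[..., 3:4] / 255.0
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = (
        alpha * obj_rgba[..., :3] + (1 - alpha) * region
    ).astype(out.dtype)
    return out

def estimate_depth(image):
    """Placeholder for an off-the-shelf monocular depth network."""
    return np.zeros(image.shape[:2], dtype=np.float32)

bg = np.full((64, 64, 3), 128, dtype=np.uint8)   # stand-in for the learned background
obj = np.zeros((16, 16, 4), dtype=np.uint8)
obj[..., :3], obj[..., 3] = 255, 255             # fully opaque white square
depth = estimate_depth(composite(bg, obj, 24, 24))
```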
3 code implementations • 1 Jun 2023 • Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts.
no code implementations • ICCV 2023 • Berthy T. Feng, Jamie Smith, Michael Rubinstein, Huiwen Chang, Katherine L. Bouman, William T. Freeman
In this work, we empirically validate the theoretically-proven probability function of a score-based diffusion model.
no code implementations • ICCV 2023 • Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, Yuanzhen Li, Varun Jampani
We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject.
4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)
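The efficiency argument (discrete tokens, parallel decoding) can be illustrated with a toy version of MaskGIT-style iterative unmasking, the general idea Muse builds on; this is not Muse's actual code, and the random probabilities stand in for model predictions. All positions are predicted at once each step and only the most confident are committed, so N tokens need a handful of steps rather than N autoregressive ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_decode(num_tokens=16, vocab=8, steps=4):
    """Toy parallel decoding: unmask the most confident tokens each step."""
    tokens = np.full(num_tokens, -1)                  # -1 means [MASK]
    for step in range(steps):
        # Stand-in for the model's per-position distributions over the vocab.
        probs = rng.dirichlet(np.ones(vocab), size=num_tokens)
        masked = np.flatnonzero(tokens == -1)
        conf = probs[masked].max(axis=1)
        # Commit a fraction of the still-masked positions, most confident first.
        n_keep = int(np.ceil(len(masked) / (steps - step)))
        keep = masked[np.argsort(-conf)][:n_keep]
        tokens[keep] = probs[keep].argmax(axis=1)     # decided in parallel
    return tokens

out = parallel_decode()
assert (out >= 0).all()   # everything unmasked after `steps` iterations
```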
no code implementations • ICCV 2023 • Brandon Y. Feng, Hadi AlZayer, Michael Rubinstein, William T. Freeman, Jia-Bin Huang
Motion magnification helps us visualize subtle, imperceptible motion.
1 code implementation • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.
1 code implementation • CVPR 2023 • Itai Lang, Dror Aiger, Forrester Cole, Shai Avidan, Michael Rubinstein
Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations.
10 code implementations • CVPR 2023 • Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes.
no code implementations • 14 Aug 2022 • Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid
In this work, we focus on summarizing instructional videos, an under-explored area of video summarization.
no code implementations • 7 Jul 2022 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
In this work, we propose a practical problem setting to estimate 3D pose and shape of animals given only a few (10-30) in-the-wild images of a particular animal species (say, horse).
no code implementations • 21 Mar 2022 • Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman
Our newly trained RAFT achieves an Fl-all score of 4.31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing.
no code implementations • CVPR 2022 • Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein
Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images.
no code implementations • CVPR 2021 • Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T. Freeman, Michael Rubinstein
We show results on real-world videos containing interactions between different types of subjects (cars, animals, people) and complex effects, ranging from semi-transparent elements such as smoke and reflections, to fully opaque effects such as objects attached to the subject.
1 code implementation • 16 Sep 2020 • Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein
We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur.
1 code implementation • CVPR 2020 • Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel
We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval.
3 code implementations • CVPR 2019 • Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik
How much can we infer about a person's looks from the way they speak?
5 code implementations • 10 Apr 2018 • Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein
Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video.
no code implementations • CVPR 2017 • Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman
Since such an attack relies on the consistency of watermarks across an image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure.
no code implementations • IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. 4 2017 • Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Oral Buyukozturk, Fredo Durand, William T. Freeman
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering.
no code implementations • CVPR 2015 • Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Fredo Durand, William T. Freeman
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering.
no code implementations • CVPR 2015 • Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman
We propose a novel method for template matching in unconstrained environments.
no code implementations • CVPR 2013 • Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical of datasets collected from Internet search.