no code implementations • 3 Dec 2024 • Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun
Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions.
no code implementations • 15 Oct 2024 • Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, YiChang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun
To address these issues, we introduce a patch-based cascaded pixel diffusion model for high resolution frame interpolation, HIFI, that excels in these scenarios while achieving competitive performance on standard benchmarks.
Ranked #1 on
Video Frame Interpolation
on X4K1000FPS
no code implementations • 15 Oct 2024 • Xirui Li, Charles Herrmann, Kelvin C. K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang
Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation.
no code implementations • 4 Oct 2024 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, Ming-Hsuan Yang
Estimating geometry from dynamic scenes, where objects move and deform over time, remains a core challenge in computer vision.
no code implementations • 13 Jun 2024 • Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu
Existing scene generation approaches fall short of speed as they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene geometry representations.
no code implementations • 4 Apr 2024 • Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih
Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control.
2 code implementations • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis.
Ranked #5 on
Text-to-Video Generation
on UCF-101
no code implementations • 2 Jan 2024 • Xiaotong Wu, Wei-Sheng Lai, YiChang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang
DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types.
no code implementations • 1 Jan 2024 • Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler
We present a lightweight network that infers grouping and boundaries, including curves, corners and junctions.
no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets.
Ranked #22 on
Monocular Depth Estimation
on NYU-Depth V2
(using extra training data)
no code implementations • CVPR 2024 • Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation.
no code implementations • 29 Nov 2023 • Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.
1 code implementation • CVPR 2024 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.
Ranked #1 on
Semantic correspondence
on SPair-71k
1 code implementation • CVPR 2024 • Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu
Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.
no code implementations • 10 Jul 2023 • Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins
We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).
no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
Text-to-image diffusion models have made significant advances in generating and editing high-quality images.
Ranked #1 on
Dense Pixel Correspondence Estimation
on TSS
Dense Pixel Correspondence Estimation
Representation Learning
+2
no code implementations • ICCV 2023 • Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun
Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.
no code implementations • CVPR 2023 • Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun
Recovering lighting in a scene from a single image is a fundamental problem in computer vision.
no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun
Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.
no code implementations • 21 Mar 2022 • Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman
Our newly trained RAFT achieves an Fl-all score of 4. 31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing.
1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti, Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.
1 code implementation • CVPR 2022 • Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun
In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.
Ranked #9 on
Domain Generalization
on ImageNet-C
(using extra training data)
no code implementations • ICCV 2021 • Michelle Shu, Richard Strong Bowen, Charles Herrmann, Gengmo Qi, Michele Santacatterina, Ramin Zabih
Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds.
no code implementations • CVPR 2021 • Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih
Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.
1 code implementation • CVPR 2021 • Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.
no code implementations • ECCV 2018 • Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih
Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.
no code implementations • ECCV 2018 • Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Ramin Zabih
Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams.
no code implementations • CVPR 2020 • Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih
Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance.
1 code implementation • ECCV 2020 • Charles Herrmann, Richard Strong Bowen, Ramin Zabih
Important applications such as mobile computing require reducing the computational costs of neural network inference.
no code implementations • ICCV 2017 • Chen Wang, Charles Herrmann, Ramin Zabih
While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem.