no code implementations • 3 Dec 2024 • Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun
Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts, which struggle to capture the nuances of dynamic actions and temporal compositions.
no code implementations • 15 Oct 2024 • Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, YiChang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun
However, contrary to prior work on cascaded diffusion models, which perform diffusion at increasingly large resolutions, we use a single model that always performs diffusion at the same resolution and upsamples by processing patches of the input and the prior solution.
Ranked #1 on Video Frame Interpolation on X4K1000FPS
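The patch-wise upsampling idea above (one fixed-resolution model, applied per patch, instead of a cascade of ever-larger models) can be sketched as follows. This is an illustrative sketch only: `denoise_patch` is a hypothetical stand-in for the diffusion model, and the patch size, names, and stitching are not from the paper.

```python
import numpy as np

PATCH = 8  # fixed resolution the model always operates at (hypothetical)

def denoise_patch(noisy_patch, prior_patch):
    """Stand-in for the fixed-resolution diffusion model; a simple
    blend here, just so the sketch runs end to end."""
    return 0.5 * noisy_patch + 0.5 * prior_patch

def upsample_by_patches(prior, scale=2):
    """Upsample `prior` by running the same fixed-size model on each
    patch of the target-resolution image, conditioned on the prior."""
    # Naive nearest-neighbor blow-up of the prior to the target size.
    up = np.kron(prior, np.ones((scale, scale)))
    out = np.empty_like(up)
    rng = np.random.default_rng(0)
    h, w = up.shape
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            noisy = rng.normal(size=(PATCH, PATCH))
            out[y:y + PATCH, x:x + PATCH] = denoise_patch(
                noisy, up[y:y + PATCH, x:x + PATCH])
    return out
```

The point of the design is that compute per patch stays constant regardless of the target resolution; only the number of patches grows.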
no code implementations • 4 Oct 2024 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, Ming-Hsuan Yang
Estimating geometry from dynamic scenes, where objects move and deform over time, remains a core challenge in computer vision.
1 code implementation • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis.
Ranked #5 on Text-to-Video Generation on UCF-101
no code implementations • 1 Jan 2024 • Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler
We present a lightweight network that infers grouping and boundaries, including curves, corners and junctions.
no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet
In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field of view (FOV) to handle scale ambiguity, and synthetic FOV augmentation during training to generalize beyond the limited camera intrinsics of the training datasets.
Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
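A log-scale depth parameterization of the kind mentioned above can be sketched as mapping metric depth to [0, 1] in log space, so that sub-meter indoor depths and tens-of-meter outdoor depths receive comparable resolution. The depth range below is a hypothetical choice for illustration, not the paper's.

```python
import math

# Hypothetical joint indoor/outdoor metric depth range, in meters.
D_MIN, D_MAX = 0.1, 100.0
_LOG_SPAN = math.log(D_MAX) - math.log(D_MIN)

def encode_depth(d):
    """Map metric depth d (meters) to [0, 1] on a log scale."""
    return (math.log(d) - math.log(D_MIN)) / _LOG_SPAN

def decode_depth(t):
    """Invert encode_depth: map a normalized value back to meters."""
    return math.exp(t * _LOG_SPAN + math.log(D_MIN))
```

In log space, a fixed step in the encoded value corresponds to a fixed *relative* change in depth, which is what makes a single parameterization workable across both scene types.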
no code implementations • CVPR 2024 • Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation.
1 code implementation • CVPR 2024 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.
Ranked #1 on Semantic Correspondence on SPair-71k
no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
Text-to-image diffusion models have made significant advances in generating and editing high-quality images.
Ranked #1 on Dense Pixel Correspondence Estimation on TSS
no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun
Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.
no code implementations • 3 May 2022 • Bayram Bayramli, Junhwa Hur, Hongtao Lu
Self-supervised methods can learn scene flow estimation from unlabeled data, yet their accuracy still lags behind that of (semi-)supervised methods.
1 code implementation • CVPR 2021 • Junhwa Hur, Stefan Roth
Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup.
1 code implementation • CVPR 2020 • Junhwa Hur, Stefan Roth
Our model achieves state-of-the-art accuracy among unsupervised/self-supervised learning approaches to monocular scene flow, and yields competitive results for the optical flow and monocular depth estimation sub-tasks.
no code implementations • 6 Apr 2020 • Junhwa Hur, Stefan Roth
Akin to many subareas of computer vision, the recent advances in deep learning have also significantly influenced the literature on optical flow.
2 code implementations • CVPR 2019 • Junhwa Hur, Stefan Roth
While this leads to more accurate results, the downside is an increased number of parameters.
Ranked #10 on Optical Flow Estimation on KITTI 2012
2 code implementations • 21 Nov 2017 • Simon Meister, Junhwa Hur, Stefan Roth
By optionally fine-tuning on the KITTI training data, our method achieves competitive optical flow accuracy on the KITTI 2012 and 2015 benchmarks, which additionally enables generic pre-training of supervised networks for datasets with limited amounts of ground truth.
no code implementations • ICCV 2017 • Junhwa Hur, Stefan Roth
The key feature of our model is to fully exploit the symmetry properties that characterize optical flow and occlusions in the two consecutive images.
no code implementations • 26 Jul 2016 • Junhwa Hur, Stefan Roth
The importance and demands of visual scene understanding have been steadily increasing along with the active development of autonomous systems.
no code implementations • CVPR 2015 • Junhwa Hur, Hwasup Lim, Changsoo Park, Sang Chul Ahn
We present a Generalized Deformable Spatial Pyramid (GDSP) matching algorithm for calculating the dense correspondence between a pair of images with large appearance variations.