Search Results for author: Charles Herrmann

Found 31 papers, 8 papers with code

Motion Prompting: Controlling Video Generation with Motion Trajectories

no code implementations 3 Dec 2024 Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions.

Video Generation

High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion

no code implementations 15 Oct 2024 Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, YiChang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun

To address these issues, we introduce a patch-based cascaded pixel diffusion model for high resolution frame interpolation, HIFI, that excels in these scenarios while achieving competitive performance on standard benchmarks.

8k Video Frame Interpolation

A Simple Approach to Unifying Diffusion-based Conditional Generation

no code implementations 15 Oct 2024 Xirui Li, Charles Herrmann, Kelvin C. K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang

Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation.

Image Generation

WonderWorld: Interactive 3D Scene Generation from a Single Image

no code implementations 13 Jun 2024 Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu

Existing scene generation approaches fall short of speed as they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene geometry representations.

Depth Estimation Navigate +1

DreamWalk: Style Space Exploration using Diffusion Guidance

no code implementations 4 Apr 2024 Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control.

Prompt Engineering

Boundary Attention: Learning curves, corners, junctions and grouping

no code implementations 1 Jan 2024 Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

We present a lightweight network that infers grouping and boundaries, including curves, corners and junctions.

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

no code implementations 20 Dec 2023 Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets.

Ranked #22 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Denoising Monocular Depth Estimation

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations 29 Nov 2023 Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering Text-to-Image Generation +1

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

1 code implementation CVPR 2024 Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Animal Pose Estimation Semantic correspondence

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

1 code implementation CVPR 2024 Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Diversity NeRF +1

Substance or Style: What Does Your Image Embedding Know?

no code implementations 10 Jul 2023 Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins

We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).

Style Transfer

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

no code implementations ICCV 2023 Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.

Decoder NeRF +1

Self-supervised AutoFlow

no code implementations CVPR 2023 Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.

Optical Flow Estimation

Disentangling Architecture and Training for Optical Flow

no code implementations 21 Mar 2022 Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman

Our newly trained RAFT achieves an Fl-all score of 4.31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing.

Optical Flow Estimation

Pyramid Adversarial Training Improves ViT Performance

1 code implementation CVPR 2022 Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.

Ranked #9 on Domain Generalization on ImageNet-C (using extra training data)

Adversarial Attack Data Augmentation +2

OCONet: Image Extrapolation by Object Completion

no code implementations CVPR 2021 Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih

Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.

Decoder Object

AutoFlow: Learning a Better Training Set for Optical Flow

1 code implementation CVPR 2021 Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.

Optical Flow Estimation

Robust image stitching with multiple registrations

no code implementations ECCV 2018 Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih

Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.

Image Stitching

Object-centered image stitching

no code implementations ECCV 2018 Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Ramin Zabih

Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams.

Image Stitching Object +2

Learning to Autofocus

no code implementations CVPR 2020 Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance.

Depth Estimation

Channel selection using Gumbel Softmax

1 code implementation ECCV 2020 Charles Herrmann, Richard Strong Bowen, Ramin Zabih

Important applications such as mobile computing require reducing the computational costs of neural network inference.

channel selection Classification +1

A discriminative view of MRF pre-processing algorithms

no code implementations ICCV 2017 Chen Wang, Charles Herrmann, Ramin Zabih

While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem.
