Search Results for author: Charles Herrmann

Found 26 papers, 6 papers with code

DreamWalk: Style Space Exploration using Diffusion Guidance

no code implementations • 4 Apr 2024 • Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control.

Prompt Engineering

Lumiere: A Space-Time Diffusion Model for Video Generation

no code implementations • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis.

Ranked #6 on Text-to-Video Generation on UCF-101

Super-Resolution, Text-to-Video Generation, +3

Efficient Hybrid Zoom using Camera Fusion on Mobile Phones

no code implementations • 2 Jan 2024 • Xiaotong Wu, Wei-Sheng Lai, YiChang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang

DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types.

Super-Resolution

Boundary Attention: Learning to Localize Boundaries under High Noise

no code implementations • 1 Jan 2024 • Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention.

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

no code implementations • 20 Dec 2023 • Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets.

Denoising, Monocular Depth Estimation

WonderJourney: Going from Anywhere to Everywhere

no code implementations • 6 Dec 2023 • Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

We introduce WonderJourney, a modularized framework for perpetual 3D scene generation.

Point Cloud Generation, Scene Generation

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations • 29 Nov 2023 • Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering, Text-to-Image Generation, +1

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

1 code implementation • 28 Nov 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Ranked #1 on Semantic correspondence on PF-PASCAL

Animal Pose Estimation, Semantic correspondence

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

no code implementations • 27 Oct 2023 • Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Novel View Synthesis

Substance or Style: What Does Your Image Embedding Know?

no code implementations • 10 Jul 2023 • Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins

We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).

Style Transfer

The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

no code implementations • NeurIPS 2023 • Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet

Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.

Denoising, Image Generation, +2

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

Text-to-image diffusion models have made significant advances in generating and editing high-quality images.

Ranked #3 on Semantic correspondence on SPair-71k

Representation Learning, Semantic correspondence, +1

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

no code implementations • ICCV 2023 • Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.

Position

Accidental Light Probes

no code implementations • CVPR 2023 • Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun

Recovering lighting in a scene from a single image is a fundamental problem in computer vision.

Lighting Estimation

Self-supervised AutoFlow

no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.

Optical Flow Estimation

Disentangling Architecture and Training for Optical Flow

no code implementations • 21 Mar 2022 • Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman

Our newly trained RAFT achieves an Fl-all score of 4.31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing.

Optical Flow Estimation

Kubric: A scalable dataset generator

1 code implementation • CVPR 2022 • Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi

Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.

Fairness, Optical Flow Estimation

Pyramid Adversarial Training Improves ViT Performance

1 code implementation • CVPR 2022 • Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.

Ranked #9 on Domain Generalization on ImageNet-C (using extra training data)

Adversarial Attack, Data Augmentation, +2

Deep survival analysis with longitudinal X-rays for COVID-19

no code implementations • ICCV 2021 • Michelle Shu, Richard Strong Bowen, Charles Herrmann, Gengmo Qi, Michele Santacatterina, Ramin Zabih

Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds.

Survival Analysis

OCONet: Image Extrapolation by Object Completion

no code implementations • CVPR 2021 • Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih

Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.

Object

AutoFlow: Learning a Better Training Set for Optical Flow

1 code implementation • CVPR 2021 • Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.

Optical Flow Estimation

Robust image stitching with multiple registrations

no code implementations • ECCV 2018 • Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih

Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.

Image Stitching

Object-centered image stitching

no code implementations • ECCV 2018 • Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Ramin Zabih

Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams.

Image Stitching, Object, +2

Learning to Autofocus

no code implementations • CVPR 2020 • Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance.

Depth Estimation

Channel selection using Gumbel Softmax

1 code implementation • ECCV 2020 • Charles Herrmann, Richard Strong Bowen, Ramin Zabih

Important applications such as mobile computing require reducing the computational costs of neural network inference.

Classification, General Classification

A discriminative view of MRF pre-processing algorithms

no code implementations • ICCV 2017 • Chen Wang, Charles Herrmann, Ramin Zabih

While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem.
