Search Results for author: Charles Herrmann

Found 26 papers, 6 papers with code

DreamWalk: Style Space Exploration using Diffusion Guidance

no code implementations 4 Apr 2024 Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control.

Prompt Engineering

Boundary Attention: Learning to Localize Boundaries under High Noise

no code implementations 1 Jan 2024 Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention.

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

no code implementations 20 Dec 2023 Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets.

Denoising Monocular Depth Estimation

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations 29 Nov 2023 Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering Text-to-Image Generation +1

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

1 code implementation 28 Nov 2023 Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Animal Pose Estimation Semantic correspondence

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

no code implementations 27 Oct 2023 Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views.

Novel View Synthesis

Substance or Style: What Does Your Image Embedding Know?

no code implementations 10 Jul 2023 Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins

We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).

Style Transfer

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

no code implementations ICCV 2023 Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars.

Position

Self-supervised AutoFlow

no code implementations CVPR 2023 Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.

Optical Flow Estimation

Disentangling Architecture and Training for Optical Flow

no code implementations21 Mar 2022 Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David Fleet, William T. Freeman

Our newly trained RAFT achieves an Fl-all score of 4.31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing.

Optical Flow Estimation

Pyramid Adversarial Training Improves ViT Performance

1 code implementation CVPR 2022 Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.

Ranked #9 on Domain Generalization on ImageNet-C (using extra training data)

Adversarial Attack Data Augmentation +2

OCONet: Image Extrapolation by Object Completion

no code implementations CVPR 2021 Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih

Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.

Object

AutoFlow: Learning a Better Training Set for Optical Flow

1 code implementation CVPR 2021 Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.

Optical Flow Estimation

Robust image stitching with multiple registrations

no code implementations ECCV 2018 Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih

Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.

Image Stitching

Object-centered image stitching

no code implementations ECCV 2018 Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Ramin Zabih

Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams.

Image Stitching Object +2

Learning to Autofocus

no code implementations CVPR 2020 Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance.

Depth Estimation

Channel selection using Gumbel Softmax

1 code implementation ECCV 2020 Charles Herrmann, Richard Strong Bowen, Ramin Zabih

Important applications such as mobile computing require reducing the computational costs of neural network inference.

Classification General Classification

A discriminative view of MRF pre-processing algorithms

no code implementations ICCV 2017 Chen Wang, Charles Herrmann, Ramin Zabih

While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem.
