no code implementations • ECCV 2020 • Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann
We present a method for projecting an input image into the space of a class-conditional generative neural network.
2 code implementations • 3 Feb 2025 • Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin
Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt.
no code implementations • 30 Dec 2024 • Netanel Y. Tamir, Shir Amir, Ranel Itzhaky, Noam Atia, Shobhita Sundaram, Stephanie Fu, Ron Sokolovsky, Phillip Isola, Tali Dekel, Richard Zhang, Miriam Farber
With rapid advancements in virtual reality (VR) headsets, effectively measuring stereoscopic quality of experience (SQoE) has become essential for delivering immersive and comfortable 3D experiences.
no code implementations • 10 Dec 2024 • Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies.
no code implementations • 23 Sep 2024 • Alireza Ganjdanesh, Yan Kang, Yuchen Liu, Richard Zhang, Zhe Lin, Heng Huang
Finally, with a selected configuration, we fine-tune our pruned experts to obtain our mixture of efficient experts.
no code implementations • 14 Aug 2024 • Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman
We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models.
1 code implementation • 13 Jun 2024 • Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang
The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image.
no code implementations • CVPR 2024 • Yinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi
We propose to learn the distribution of continuous images by training diffusion models on image neural fields, which can be rendered at any resolution, and show its advantages over fixed-resolution models.
1 code implementation • 23 May 2024 • Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman
Recent approaches have shown promise in distilling diffusion models into efficient one-step generators.
no code implementations • CVPR 2024 • Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz
We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models.
no code implementations • 9 May 2024 • Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park
We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality.
no code implementations • 24 Apr 2024 • Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park
In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model.
no code implementations • 18 Apr 2024 • Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi
We demonstrate that our approach is competitive with state-of-the-art inpainting methods in terms of quality and fidelity while providing a 10x speedup for typical user interactions, where the editing mask represents 10% of the image.
no code implementations • 18 Apr 2024 • Nupur Kumari, Grace Su, Richard Zhang, Taesung Park, Eli Shechtman, Jun-Yan Zhu
In this work, we introduce a new task -- enabling explicit control of the object viewpoint in the customization of text-to-image diffusion models.
no code implementations • 18 Apr 2024 • Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu
We introduce VideoGigaGAN, a new generative VSR model that can produce videos with high-frequency details and temporal consistency.
Ranked #16 on
Video Super-Resolution
on Vid4 - 4x upscaling
(PSNR metric)
no code implementations • 9 Jan 2024 • Xiaojuan Wang, Taesung Park, Yang Zhou, Eli Shechtman, Richard Zhang
We leverage the appearance of the subject from the other source frames in the video, fusing it with a mid-level representation driven by DensePose keypoints and face landmarks.
no code implementations • 7 Dec 2023 • Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell
To avoid overfitting to the new custom motion, we introduce an approach for regularization over videos.
2 code implementations • CVPR 2024 • Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park
We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality.
no code implementations • 23 Oct 2023 • David C. Epstein, Ishan Jain, Oliver Wang, Richard Zhang
With advancements in AI-generated images coming on a continuous basis, it is increasingly difficult to distinguish traditionally-sourced images (e.g., photos, artwork) from AI-generated ones.
1 code implementation • NeurIPS 2023 • Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, Phillip Isola
Furthermore, our metric outperforms both prior learned metrics and recent large vision models on these tasks.
2 code implementations • ICCV 2023 • Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang
The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is a difficult yet important one.
2 code implementations • ICCV 2023 • Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, Jun-Yan Zhu
To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept.
1 code implementation • CVPR 2023 • Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models.
Ranked #16 on
Text-to-Image Generation
on MS COCO
2 code implementations • 6 Feb 2023 • Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu
However, it is still challenging to directly apply these models for editing real images for two reasons.
Ranked #16 on
Text-based Image Editing
on PIE-Bench
2 code implementations • CVPR 2023 • Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman
First, we note the generator contains a meaningful, pretrained latent space.
2 code implementations • CVPR 2023 • Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
Can we teach a model to quickly acquire a new concept, given a few examples?
no code implementations • 24 Aug 2022 • Yuchen Liu, Zhixin Shu, Yijun Li, Zhe Lin, Richard Zhang, S. Y. Kung
While concatenating GAN inversion and a 3D-aware, noise-to-image GAN is a straightforward solution, it is inefficient and may lead to a noticeable drop in editing quality.
1 code implementation • CVPR 2022 • Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh
We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2.
1 code implementation • 24 May 2022 • Difan Liu, Sandesh Shetty, Tobias Hinz, Matthew Fisher, Richard Zhang, Taesung Park, Evangelos Kalogerakis
We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map.
no code implementations • 5 May 2022 • Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros
Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network.
1 code implementation • 14 Apr 2022 • Lucy Chai, Michael Gharbi, Eli Shechtman, Phillip Isola, Richard Zhang
To take advantage of varied-size data, we introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
1 code implementation • CVPR 2022 • Nupur Kumari, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
Can the collective "knowledge" from a large bank of pretrained vision models be leveraged to improve GAN training?
Ranked #1 on
Image Generation
on AFHQ Cat
1 code implementation • CVPR 2022 • William Peebles, Jun-Yan Zhu, Richard Zhang, Antonio Torralba, Alexei A. Efros, Eli Shechtman
We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.
no code implementations • NeurIPS 2021 • Jialun Zhang, Salar Fattahi, Richard Zhang
This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$.
1 code implementation • 12 Nov 2021 • Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang
Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result.
1 code implementation • ICCV 2021 • Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, Bryan Russell
In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category.
Ranked #1 on
Novel View Synthesis
on PhotoShape
1 code implementation • CVPR 2021 • Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhang
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
3 code implementations • CVPR 2022 • Gaurav Parmar, Richard Zhang, Jun-Yan Zhu
Furthermore, we show that if compression is used on real training images, FID can actually improve if the generated images are also subsequently compressed.
3 code implementations • CVPR 2021 • Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang
Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting.
Ranked #3 on
10-shot image generation
on Babies
1 code implementation • 18 Mar 2021 • Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola
We show empirically that our claim holds true on finite width linear and non-linear models on practical learning paradigms and show that on natural data, these are often the solutions that generalize well.
1 code implementation • CVPR 2021 • Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing.
Ranked #1 on
Image Generation
on FFHQ
1 code implementation • 9 Feb 2021 • Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein
The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception.
1 code implementation • CVPR 2021 • Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli
We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation.
no code implementations • NeurIPS 2020 • Yijun Li, Richard Zhang, Jingwan Lu, Eli Shechtman
Few-shot image generation seeks to generate more data of a given domain, with only a few available training examples.
Ranked #4 on
10-shot image generation
on Babies
no code implementations • NeurIPS 2020 • Jialun Zhang, Richard Zhang
Optimizing the threshold over regions of the landscape, we see that, for initial points not too close to the ground truth, a linear improvement in the quality of the initial guess amounts to a constant factor improvement in the sample complexity.
10 code implementations • 30 Jul 2020 • Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset.
4 code implementations • NeurIPS 2020 • Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang
Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging.
2 code implementations • 4 May 2020 • Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann
We present a method for projecting an input image into the space of a class-conditional generative neural network.
1 code implementation • 29 Apr 2020 • Noa Fish, Richard Zhang, Lilach Perry, Daniel Cohen-Or, Eli Shechtman, Connelly Barnes
In image morphing, a sequence of plausible frames is synthesized and composited together to form a smooth transformation between given instances.
1 code implementation • 13 Jan 2020 • Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore
Assessment of many audio processing tasks relies on subjective evaluation which is time-consuming and expensive.
5 code implementations • CVPR 2020 • Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, Alexei A. Efros
In this work, we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used.
1 code implementation • ICCV 2019 • Arnab Ghosh, Richard Zhang, Puneet K. Dokania, Oliver Wang, Alexei A. Efros, Philip H. S. Torr, Eli Shechtman
We propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects.
2 code implementations • ICCV 2019 • Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros
Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop.
7 code implementations • 25 Apr 2019 • Richard Zhang
The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling.
Ranked #26 on
Domain Generalization
on VizWiz-Classification
1 code implementation • CVPR 2020 • Dmitriy Smirnov, Matthew Fisher, Vladimir G. Kim, Richard Zhang, Justin Solomon
Many tasks in graphics and vision demand machinery for converting shapes into consistent representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage.
4 code implementations • ICLR 2019 • Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, Sergey Levine
However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction.
Ranked #1 on
Video Prediction
on KTH
(Cond metric)
24 code implementations • CVPR 2018 • Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang
We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics.
Ranked #19 on
Video Quality Assessment
on MSU FR VQA Database
no code implementations • ICLR 2018 • Alex X. Lee, Frederik Ebert, Richard Zhang, Chelsea Finn, Pieter Abbeel, Sergey Levine
In this paper, we study the problem of multi-step video prediction, where the goal is to predict a sequence of future frames conditioned on a short context.
7 code implementations • NeurIPS 2017 • Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman
Our proposed method encourages bijective consistency between the latent encoding and output modes.
3 code implementations • 8 May 2017 • Richard Zhang, Jun-Yan Zhu, Phillip Isola, Xinyang Geng, Angela S. Lin, Tianhe Yu, Alexei A. Efros
The system directly maps a grayscale image, along with sparse, local user "hints" to an output colorization with a Convolutional Neural Network (CNN).
2 code implementations • CVPR 2017 • Richard Zhang, Phillip Isola, Alexei A. Efros
We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning.
Ranked #139 on
Self-Supervised Image Classification
on ImageNet
39 code implementations • 28 Mar 2016 • Richard Zhang, Phillip Isola, Alexei A. Efros
We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result.
Ranked #141 on
Self-Supervised Image Classification
on ImageNet