Search Results for author: Yujun Shen

Found 106 papers, 49 papers with code

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

no code implementations 10 Apr 2025 Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng

This paper presents a generalizable RGB-based approach for object pose estimation, specifically designed to address challenges in sparse-view settings.

Object Pose Estimation

AvatarArtist: Open-Domain 4D Avatarization

no code implementations 25 Mar 2025 Hongyu Liu, Xuan Wang, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen

This work focuses on open-domain 4D avatarization, with the purpose of creating a 4D avatar from a portrait image in an arbitrary style.

FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

no code implementations 17 Feb 2025 Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein

We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images (i.e., as few as 2-8 inputs), which is a challenging yet practical setting in real-world applications.

3D geometry Camera Pose Estimation +2

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

no code implementations 21 Jan 2025 Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao

In spite of the recent progress, image diffusion models still produce artifacts.

MangaNinja: Line Art Colorization with Precise Reference Following

no code implementations 14 Jan 2025 Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo

Derived from diffusion models, MangaNinja specializes in the task of reference-guided line art colorization.

Line Art Colorization

Edicho: Consistent Image Editing in the Wild

1 code implementation 30 Dec 2024 Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen

Though a verified need, consistent editing across in-the-wild images remains a technical challenge, owing to various unmanageable factors such as object poses, lighting conditions, and photography environments.

Denoising

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

no code implementations 30 Dec 2024 Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, Yiyi Liao

In this work, we introduce Prometheus, a 3D-aware latent diffusion model for text-to-3D generation at both object and scene levels in seconds.

3D Generation Scene Generation +2

DepthLab: From Partial to Complete

no code implementations 24 Dec 2024 Zhiheng Liu, Ka Leong Cheng, Qiuyu Wang, Shuzhe Wang, Hao Ouyang, Bin Tan, Kai Zhu, Yujun Shen, Qifeng Chen, Ping Luo

Missing values remain a common challenge for depth data across its wide range of applications, stemming from various causes like incomplete data acquisition and perspective alteration.

Depth Completion Missing Values +2

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

2 code implementations 19 Dec 2024 Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang

The intuitive nature of drag-based interaction has led to its growing adoption for controlling object trajectories in image-to-video synthesis.

Object

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

1 code implementation 19 Dec 2024 Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng, Hujun Bao, Xiaowei Zhou

Reconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis.

Novel View Synthesis

AniDoc: Animation Creation Made Easier

no code implementations 18 Dec 2024 Yihao Meng, Hao Ouyang, Hanlin Wang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Zhiheng Liu, Yujun Shen, Huamin Qu

The production of 2D animation follows an industry-standard workflow, encompassing four essential stages: character design, keyframe animation, in-betweening, and coloring.

Line Art Colorization

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

1 code implementation 11 Dec 2024 Fan Lu, Wei Wu, Kecheng Zheng, Shuailei Ma, Biao Gong, Jiawei Liu, Wei Zhai, Yang Cao, Yujun Shen, Zheng-Jun Zha

Generating detailed captions comprehending text-rich visual content in images has received growing attention for Large Vision-Language Models (LVLMs).

Attribute Benchmarking +2

Learning Visual Generative Priors without Text

no code implementations 10 Dec 2024 Shuailei Ma, Kecheng Zheng, Ying Wei, Wei Wu, Fan Lu, Yifei Zhang, Chen-Wei Xie, Biao Gong, Jiapeng Zhu, Yujun Shen

Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive.

Image to 3D Philosophy

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes

no code implementations 4 Dec 2024 Bin Tan, Rui Yu, Yujun Shen, Nan Xue

We believe our accurate and ultrafast planar surface reconstruction method will find future application in structured data curation for surface reconstruction.

3D Plane Detection Surface Reconstruction

Framer: Interactive Frame Interpolation

no code implementations 24 Oct 2024 Wen Wang, Qiuyu Wang, Kecheng Zheng, Hao Ouyang, Zhekai Chen, Biao Gong, Hao Chen, Yujun Shen, Chunhua Shen

We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity.

Image Morphing Video Generation

Rectified Diffusion Guidance for Conditional Generation

no code implementations 24 Oct 2024 Mengfei Xia, Nan Xue, Yujun Shen, Ran Yi, Tieliang Gong, Yong-Jin Liu

Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling.

Denoising
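The CFG rule summarized in the abstract above (conditional and unconditional predictions combined with two coefficients summing to one) can be sketched as follows. This is the standard formulation that the paper sets out to rectify, not the paper's corrected guidance, and the toy arrays are purely illustrative:

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-Free Guidance: mix the conditional and unconditional
    noise/score predictions with coefficients (1 + w) and -w, which sum
    to one. Larger w strengthens the conditioning signal."""
    return (1.0 + w) * eps_cond - w * eps_uncond

# With w = 0 the rule reduces to the plain conditional prediction.
eps_c = np.array([0.5, -0.2])
eps_u = np.array([0.1, 0.3])
guided = cfg_combine(eps_c, eps_u, w=2.0)
```

Note that the coefficients (1 + w) and -w always sum to one regardless of w, which is exactly the constraint the abstract highlights.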

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

no code implementations 7 Oct 2024 Wei Wu, Kecheng Zheng, Shuailei Ma, Fan Lu, Yuxin Guo, Yifei Zhang, Wei Chen, Qingpei Guo, Yujun Shen, Zheng-Jun Zha

Then, after incorporating corner tokens to aggregate diverse textual information, we manage to help the model catch up to its original level of short text understanding yet greatly enhance its capability of long text understanding.

Image Classification Image Retrieval

Zero-shot Image Editing with Reference Imitation

1 code implementation 11 Jun 2024 Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

Image editing is a practical yet challenging task given the diverse demands from users, and one of the hardest parts is to precisely describe what the edited image should look like.

Semantic correspondence

GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis

no code implementations 30 May 2024 Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, Yinda Zhang, Hujun Bao, Zhaopeng Cui

In this paper, we introduce GaussianPrediction, a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis in dynamic environments.

Decision Making Novel View Synthesis +1

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

no code implementations 26 Apr 2024 Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs.

Learning 3D-Aware GANs from Unposed Images with Template Feature Field

no code implementations 8 Apr 2024 Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

Collecting accurate camera poses of training images has been shown to well serve the learning of 3D-aware generative adversarial networks (GANs) yet can be quite expensive in practice.

Pose Estimation

DreamLIP: Language-Image Pre-training with Long Captions

1 code implementation 25 Mar 2024 Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner.

Contrastive Learning Image-text Retrieval +5
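The sub-caption sampling step described in the DreamLIP abstract can be sketched as below. The sentence-level splitting rule and the helper name are assumptions for illustration, and the grouping loss that matches each sub-caption embedding to local image patches is omitted:

```python
import random

def sample_subcaptions(long_caption, num_positives=3, rng=None):
    """Split a long caption into sentences and draw several sub-captions,
    each of which forms a positive pair with the same image."""
    rng = rng or random.Random(0)
    sentences = [s.strip() for s in long_caption.split(".") if s.strip()]
    k = min(num_positives, len(sentences))
    return rng.sample(sentences, k)

caption = "A red car is parked. The sky is blue. A tree stands nearby. It is noon."
positives = sample_subcaptions(caption, num_positives=3)
```

Each sampled sub-caption would then be paired with the same image, turning one long caption into multiple positive pairs for contrastive training.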

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

1 code implementation 25 Mar 2024 Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo

This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.

Face Swapping Instruction Following +1

Contextual AD Narration with Interleaved Multimodal Sequence

1 code implementation 19 Mar 2024 Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

The Audio Description (AD) task aims to generate descriptions of visual elements for visually impaired individuals to help them access long-form video content, like movies.

Real-time 3D-aware Portrait Editing from a Single Image

1 code implementation 21 Feb 2024 Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

Second, thanks to the powerful priors, our module could focus on the learning of editing-related variations, such that it manages to handle various types of editing simultaneously in the training phase and further supports fast adaptation to user-specified customized types of editing during inference (e.g., with ~5 min of fine-tuning per style).

Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner

no code implementations CVPR 2024 Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Deli Zhao, Ran Yi, Wenping Wang, Yong-Jin Liu

A diffusion model, which is formulated to produce an image through thousands of denoising steps, usually suffers from slow inference speed.

Denoising

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation CVPR 2024 Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Text-to-Image Generation Text-to-Video Generation +2

SAM-guided Graph Cut for 3D Instance Segmentation

no code implementations 13 Dec 2023 Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou

Experimental results on the ScanNet, ScanNet++ and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and can generalize across different types of scenes.

3D Instance Segmentation Graph Neural Network +2

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

no code implementations 12 Dec 2023 Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality.

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

no code implementations 12 Dec 2023 Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.

Learning Naturally Aggregated Appearance for Efficient 3D Editing

1 code implementation 11 Dec 2023 Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to their implicitness.

Novel View Synthesis

LivePhoto: Real Image Animation with Text-guided Motion Control

no code implementations 5 Dec 2023 Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao

In particular, considering the facts that (1) text can only describe motions roughly (e.g., regardless of the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.

Image Animation Text-to-Video Generation +1

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

no code implementations CVPR 2024 Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang

Generating large-scale 3D scenes cannot simply reuse existing 3D object synthesis techniques, since 3D scenes usually hold complex spatial configurations and consist of a number of objects at varying scales.

Scene Generation

SMaRt: Improving GANs with Score Matching Regularity

no code implementations 30 Nov 2023 Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-Jin Liu

In this work, we revisit the mathematical foundations of GANs and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets of the generated data manifold, with positive Lebesgue measure, lying outside the real data manifold.


Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

1 code implementation CVPR 2024 Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute Denoising +1

4K4D: Real-Time 4D View Synthesis at 4K Resolution

no code implementations CVPR 2024 Zhen Xu, Sida Peng, Haotong Lin, Guangzhao He, Jiaming Sun, Yujun Shen, Hujun Bao, Xiaowei Zhou

Experiments show that our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30x faster than previous methods and achieves the state-of-the-art rendering quality.

4k

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

no code implementations 14 Oct 2023 Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-Jin Liu

By viewing the generation of diffusion models as a discretized integrating process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval.

Denoising

In-Domain GAN Inversion for Faithful Reconstruction and Editability

no code implementations 25 Sep 2023 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.

Image Generation Image Reconstruction

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

1 code implementation 7 Sep 2023 Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen

Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis.

Image Generation Mixture-of-Experts +2

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

1 code implementation CVPR 2024 Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field.

Image-to-Image Translation Keypoint Detection +1

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models

no code implementations ICCV 2023 Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen

To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen.
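The masking idea described above can be sketched minimally as follows; the importance scores and the top-k selection rule are assumptions for illustration, and the paper's subsequent step of optimizing the masks on downstream data with the parameters frozen is not shown:

```python
import numpy as np

def apply_parameter_mask(params, importance, keep_ratio=0.5):
    """Attach a binary mask to a parameter vector: keep parameters whose
    importance score falls in the top `keep_ratio` fraction, zero the rest."""
    k = max(1, int(len(params) * keep_ratio))
    threshold = np.sort(importance)[::-1][k - 1]  # k-th largest score
    mask = (importance >= threshold).astype(params.dtype)
    return params * mask, mask

params = np.array([1.0, 2.0, 3.0, 4.0])
importance = np.array([0.1, 0.4, 0.2, 0.3])
masked, mask = apply_parameter_mask(params, importance)
```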

AnyDoor: Zero-shot Object-level Image Customization

2 code implementations CVPR 2024 Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Object Virtual Try-on

NEAT: Distilling 3D Wireframes from Neural Attraction Fields

1 code implementation CVPR 2024 Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu, Yujun Shen

Instead of leveraging matching-based solutions from 2D wireframes (or line segments) for 3D wireframe reconstruction as done in prior arts, we present NEAT, a rendering-distilling formulation using neural fields to represent 3D line segments with 2D observations, and bipartite matching for perceiving and distilling of a sparse set of 3D global junctions.

3D Wireframe Reconstruction Novel View Synthesis

Lipschitz Singularities in Diffusion Models

no code implementations 20 Jun 2023 Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models.

Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

1 code implementation 4 Jun 2023 Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang

A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability.

Semantic Segmentation

Balancing Logit Variation for Long-tailed Semantic Segmentation

1 code implementation CVPR 2023 Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.

Semantic Segmentation

Cones 2: Customizable Image Synthesis with Multiple Subjects

1 code implementation 30 May 2023 Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Synthesizing images with user-specified subjects has received growing attention due to its practical applications.

Image Generation

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

1 code implementation 23 May 2023 Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features.

Domain Generalization Semantic Segmentation

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

1 code implementation CVPR 2023 Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution.

Code Generation Denoising +4
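The forward diffusion process mentioned above (gradually adding noise to data points) can be sketched in closed form for a DDPM-style schedule; the linear beta schedule below is an assumption for illustration, and the learned reverse denoising process is not shown:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
x0 = np.ones((4, 4))
xt = forward_diffuse(x0, t=500, betas=betas, rng=rng)
```

As a sanity check, an all-zero beta schedule adds no noise and returns the clean data unchanged.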

ViM: Vision Middleware for Unified Downstream Transferring

no code implementations ICCV 2023 Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone.

Composer: Creative and Controllable Image Synthesis with Composable Conditions

6 code implementations 20 Feb 2023 Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.

Image Colorization Image-to-Image Translation +3

Spatial Steerability of GANs via Self-Supervision from Discriminator

no code implementations 20 Jan 2023 Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou

In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations.

Image Generation Inductive Bias +1

Learning 3D-aware Image Synthesis with Unknown Pose Distribution

no code implementations CVPR 2023 Zifan Shi, Yujun Shen, Yinghao Xu, Sida Peng, Yiyi Liao, Sheng Guo, Qifeng Chen, Dit-yan Yeung

Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.

3D-Aware Image Synthesis

GH-Feat: Learning Versatile Generative Hierarchical Features from GANs

no code implementations 12 Jan 2023 Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou

In this work, we show that such generative features learned from image synthesis exhibit great potential in solving a wide range of computer vision tasks, including both generative ones and, more importantly, discriminative ones.

Face Verification Image Harmonization +3

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

no code implementations ICCV 2023 Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen

This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.

Image Generation

Towards Smooth Video Composition

1 code implementation 14 Dec 2022 Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Video generation requires synthesizing consistent and persistent frames with dynamic content over time.

Image Generation single-image-generation +2

GLeaD: Improving GANs with A Generator-Leading Task

1 code implementation CVPR 2023 Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen

A generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D acts as the rule maker and hence tends to dominate the competition.

domain classification Generative Adversarial Network +1
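The two-player formulation above can be made concrete with the standard non-saturating GAN losses; this is a textbook sketch of the game, not GLeaD's generator-leading objective:

```python
import numpy as np

def gan_losses(d_real_logits, d_fake_logits):
    """D is trained to score real images high and generated ones low;
    G is trained to make D score its samples as real.
    Inputs are discriminator logits; losses use sigmoid cross-entropy."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    d_loss = -np.mean(np.log(sigmoid(d_real_logits))
                      + np.log(1.0 - sigmoid(d_fake_logits)))
    g_loss = -np.mean(np.log(sigmoid(d_fake_logits)))
    return d_loss, g_loss

# A confident discriminator (real scored high, fake scored low) has
# near-zero loss, while the generator's loss is then large.
d_loss, g_loss = gan_losses(np.array([10.0]), np.array([-10.0]))
```

In this asymmetry, D judging while G is judged, one can see why D tends to dominate, which is the imbalance GLeaD addresses by also assigning G a leading role.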

Dimensionality-Varying Diffusion Process

no code implementations CVPR 2023 Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension.

Image Generation

Neural Dependencies Emerging from Learning Massive Categories

no code implementations CVPR 2023 Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha

Through investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, a condition easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates with only a few others.

Image Classification

Deep Generative Models on 3D Representations: A Survey

1 code implementation 27 Oct 2022 Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, Yujun Shen

In this survey, we thoroughly review the ongoing developments of 3D generative models, including methods that employ 2D and 3D supervision.

3D-Aware Image Synthesis 3D Shape Generation +1

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

no code implementations 30 Sep 2022 Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-yan Yeung

We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough.

3D-Aware Image Synthesis domain classification +3

Improving GANs with A Dynamic Discriminator

no code implementations 20 Sep 2022 Ceyuan Yang, Yujun Shen, Yinghao Xu, Deli Zhao, Bo Dai, Bolei Zhou

Two capacity adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator.

3D-Aware Image Synthesis Data Augmentation
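The two capacity-adjusting schemes above can be sketched as a toy width schedule; the linear ramps and the 0.5 scaling bounds are hypothetical choices for illustration, not the paper's exact schedule:

```python
def discriminator_width(step, total_steps, base_width, data_sufficient):
    """Scheme (i): with sufficient data, grow the discriminator's layer
    width over training. Scheme (ii): with limited data, shrink it to
    curb over-fitting."""
    frac = step / total_steps
    scale = 0.5 + 0.5 * frac if data_sufficient else 1.0 - 0.5 * frac
    return max(1, int(base_width * scale))
```

For example, with a base width of 64, the width ramps from 32 up to 64 under scheme (i) and from 64 down to 32 under scheme (ii).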

A Unified Model for Multi-class Anomaly Detection

1 code implementation 8 Jun 2022 Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, Xinyi Le

For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%).

Anomaly Localization model +2

Interpreting Class Conditional GANs with Channel Awareness

no code implementations 21 Mar 2022 Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen

To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.

High-fidelity GAN Inversion with Padding Space

1 code implementation 21 Mar 2022 Qingyan Bai, Yinghao Xu, Jiapeng Zhu, Weihao Xia, Yujiu Yang, Yujun Shen

In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information.

Generative Adversarial Network Image Manipulation +1

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

1 code implementation CVPR 2022 Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.

Semi-Supervised Semantic Segmentation

Region-Based Semantic Factorization in GANs

1 code implementation 19 Feb 2022 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen

Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes.

3D-Aware Indoor Scene Synthesis with Depth Priors

no code implementations 17 Feb 2022 Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-yan Yeung, Qifeng Chen

In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition.

3D-Aware Image Synthesis 3D geometry +2

3D-aware Image Synthesis via Learning Structural and Textural Representations

1 code implementation CVPR 2022 Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou

The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.

3D-Aware Image Synthesis Generative Adversarial Network +1

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

no code implementations CVPR 2022 Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin

Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.

Action Recognition

Improving GAN Equilibrium by Raising Spatial Awareness

1 code implementation CVPR 2022 Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou

We further propose to align the spatial awareness of G with the attention map induced from D. In this way, we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to the equilibrium, leading to better synthesis performance.

Attribute Inductive Bias

One-Shot Generative Domain Adaptation

no code implementations ICCV 2023 Ceyuan Yang, Yujun Shen, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou

We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characters from the reference.

Attribute Diversity +2

CompConv: A Compact Convolution Module for Efficient Feature Learning

no code implementations 19 Jun 2021 Chen Zhang, Yinghao Xu, Yujun Shen

Convolutional Neural Networks (CNNs) have achieved remarkable success in various computer vision tasks but incur tremendous computational cost.

Glancing at the Patch: Anomaly Localization With Global and Local Feature Comparison

no code implementations CVPR 2021 Shenzhi Wang, Liwei Wu, Lei Cui, Yujun Shen

More concretely, we employ a Local-Net and a Global-Net to extract features from any individual patch and its surroundings, respectively.

Anomaly Detection Anomaly Localization

Data-Efficient Instance Generation from Instance Discrimination

1 code implementation NeurIPS 2021 Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Meanwhile, the learned instance discrimination capability from the discriminator is in turn exploited to encourage the generator for diverse generation.

Ranked #11 on Image Generation on FFHQ 256 x 256 (FD metric)

2k Data Augmentation +1

Low-Rank Subspaces in GANs

1 code implementation NeurIPS 2021 Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, Zheng-Jun Zha, Jingren Zhou, Qifeng Chen

Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces.

Attribute Generative Adversarial Network
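The low-rank factorization step described above can be sketched with an SVD; the toy rank-1 Jacobian below is an assumption standing in for the true latent-to-region Jacobian of a GAN generator:

```python
import numpy as np

def steerable_directions(jacobian, rank=1):
    """Factorize the Jacobian relating latent codes to an image region
    and keep the leading right-singular vectors as steerable latent
    directions spanning a low-rank subspace."""
    _, _, vt = np.linalg.svd(jacobian, full_matrices=False)
    return vt[:rank]

# Toy rank-1 Jacobian: the recovered latent direction aligns (up to
# sign) with the unit vector [0.6, 0.8] used to construct it.
J = np.outer(np.array([1.0, 2.0, 3.0]), np.array([0.6, 0.8]))
dirs = steerable_directions(J, rank=1)
```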

Unsupervised Image Transformation Learning via Generative Adversarial Networks

no code implementations 13 Mar 2021 Kaiwen Zha, Yujun Shen, Bolei Zhou

In this work, we study the image transformation problem, which targets learning the underlying transformations (e.g., the transition of seasons) from a collection of unlabeled images.

Image Generation

Improving the Fairness of Deep Generative Models without Retraining

1 code implementation 9 Dec 2020 Shuhan Tan, Yujun Shen, Bolei Zhou

Generative Adversarial Networks (GANs) advance face synthesis through learning the underlying distribution of observed data.

Attribute Diversity +4

Generative Hierarchical Features from Synthesizing Images

1 code implementation CVPR 2021 Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou

Generative Adversarial Networks (GANs) have recently advanced image synthesis by learning the underlying distribution of the observed data.

Face Verification Image Classification +2

Closed-Form Factorization of Latent Semantics in GANs

11 code implementations CVPR 2021 Yujun Shen, Bolei Zhou

A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.

Attribute Form +1

InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs

2 code implementations18 May 2020 Yujun Shen, Ceyuan Yang, Xiaoou Tang, Bolei Zhou

In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space.

Attribute Face Generation

In-Domain GAN Inversion for Real Image Editing

2 code implementations ECCV 2020 Jiapeng Zhu, Yujun Shen, Deli Zhao, Bolei Zhou

A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.

Image Reconstruction
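The inversion practice described above — mapping a real image back to a latent code — is typically done by optimizing the code against a reconstruction loss. Below is a minimal sketch of that loop with a toy linear generator and an analytic gradient; a real GAN inversion would use autodiff and usually extra regularizers (e.g., the in-domain constraint this paper proposes).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear generator standing in for a trained GAN generator G.
W = rng.standard_normal((32, 8))
def G(z):
    return W @ z

# Target image produced by a known code, so a perfect inversion exists.
z_true = rng.standard_normal(8)
x = G(z_true)

# Invert by gradient descent on the reconstruction loss ||G(z) - x||^2.
z = np.zeros(8)
lr = 0.005
for _ in range(5000):
    grad = 2.0 * W.T @ (G(z) - x)     # analytic gradient of the loss
    z -= lr * grad

recon_err = np.linalg.norm(G(z) - x)
```

For the linear toy the loss is convex, so the optimized code reconstructs the target essentially exactly; with a real generator the loss is non-convex and initialization/regularization matter.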

Residual Knowledge Distillation

no code implementations21 Feb 2020 Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Knowledge distillation (KD) is one of the most effective approaches to model compression.

Knowledge Distillation Model Compression

Image Processing Using Multi-Code GAN Prior

1 code implementation CVPR 2020 Jinjin Gu, Yujun Shen, Bolei Zhou

Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors.

Blind Face Restoration Colorization +6
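The over-parameterization mentioned above comes from using multiple latent codes at once and blending their intermediate generator features. The sketch below illustrates that composition step under the assumption of a channel-wise softmax blend; the real method learns the codes and importance weights jointly by optimization.

```python
import numpy as np

rng = np.random.default_rng(3)

N, C = 4, 16                          # number of latent codes, feature channels

# Intermediate generator features for each of the N latent codes,
# standing in for the features at some split layer of the generator.
feats = rng.standard_normal((N, C))

# Per-code channel importance weights; softmax over the codes so each
# channel is a convex combination of the N candidate features.
logits = rng.standard_normal((N, C))
weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Composed feature fed to the rest of the generator: a channel-wise
# weighted blend of the N intermediate features.
composed = (weights * feats).sum(axis=0)
```

Because the blend is convex per channel, the composed feature always stays within the range spanned by the N candidates, while offering many more degrees of freedom than a single latent code.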

Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

2 code implementations21 Nov 2019 Ceyuan Yang, Yujun Shen, Bolei Zhou

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there is still limited understanding of what generative models learn inside their deep generative representations and how photo-realistic images can be composed from the layer-wise stochasticity introduced in recent GANs.

Image Generation

Semantic Hierarchy Emerges in the Deep Generative Representations for Scene Synthesis

no code implementations25 Sep 2019 Ceyuan Yang, Yujun Shen, Bolei Zhou

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there is still limited understanding of what networks learn inside the deep generative representations and how photo-realistic images can be composed from random noise.

Image Generation

Interpreting the Latent Space of GANs for Semantic Face Editing

4 code implementations CVPR 2020 Yujun Shen, Jinjin Gu, Xiaoou Tang, Bolei Zhou

In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs.

Attribute Disentanglement +2
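The latent-space interpretation above boils down to two operations that are easy to sketch: moving a code along the unit normal of a learned semantic hyperplane, and projecting one semantic direction onto the complement of another so that editing one attribute leaves a second unchanged (the paper's conditional manipulation). The normals below are random stand-ins; in the real framework they come from linear classifiers fit in latent space.

```python
import numpy as np

def edit_latent(z, normal, alpha):
    """InterFaceGAN-style edit: move a latent code along the unit normal of
    a semantic hyperplane (e.g., a smile boundary) by step alpha."""
    n = normal / np.linalg.norm(normal)
    return z + alpha * n

def condition_direction(primary, condition):
    """Project the primary semantic direction onto the orthogonal complement
    of a second one, so editing the first attribute does not move the code
    across the second attribute's boundary."""
    c = condition / np.linalg.norm(condition)
    return primary - (primary @ c) * c

rng = np.random.default_rng(1)
n_smile = rng.standard_normal(512)    # stand-in for a learned "smile" normal
n_age = rng.standard_normal(512)      # stand-in for a learned "age" normal

d = condition_direction(n_smile, n_age)
# The conditioned direction no longer leaks into the age attribute.
leakage = abs(d @ (n_age / np.linalg.norm(n_age)))
```

After projection, the edit direction is orthogonal to the conditioning boundary's normal, so first-order movement along it keeps the second attribute fixed.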

An Embarrassingly Simple Approach for Knowledge Distillation

1 code implementation5 Dec 2018 Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.

Face Recognition Knowledge Distillation +3
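The teacher-to-student knowledge transfer described above is classically implemented as a KL divergence between temperature-softened output distributions (Hinton-style distillation, which this line of work builds on). The sketch below shows that loss; it is a generic illustration, not this paper's specific formulation.

```python
import numpy as np

def softmax(x, T=1.0):
    e = np.exp(x / T - (x / T).max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classic distillation loss: KL divergence between the teacher's and
    student's temperature-softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

t = np.array([4.0, 1.0, 0.5])
perfect = kd_loss(t, t)                       # matching logits -> zero loss
off = kd_loss(np.array([0.5, 1.0, 4.0]), t)   # disagreeing logits -> positive
```

In training, this term is combined with the ordinary cross-entropy on ground-truth labels, weighted by a hyperparameter.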

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

no code implementations4 Dec 2018 Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang

In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.

Diversity Face Generation +1

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations CVPR 2018 Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation
