Search Results for author: Qifeng Chen

Found 139 papers, 86 papers with code

Automatic Controllable Colorization via Imagination

no code implementations 8 Apr 2024 Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples.

Colorization Image Generation

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

no code implementations 5 Apr 2024 Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei

Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects.

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

no code implementations 19 Mar 2024 Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation.

Object

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

no code implementations 27 Feb 2024 Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.

Audio Generation Denoising

Real-time 3D-aware Portrait Editing from a Single Image

no code implementations 21 Feb 2024 Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner.

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation 16 Feb 2024 Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations 16 Feb 2024 Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks.

ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration

no code implementations 13 Jan 2024 Yuen-Fui Lau, Tianjia Zhang, Zhefan Rao, Qifeng Chen

The latent code extracted from the degraded input image often contains corrupted features, making it difficult to align the semantic information from the input with the high-quality textures from the reference.

Blind Face Restoration Quantization

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

no code implementations 18 Dec 2023 Bingyuan Wang, Hengyu Meng, Zeyu Cai, Lanjiong Li, Yue Ma, Qifeng Chen, Zeyu Wang

Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative.

Denoising Image Generation +1

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

no code implementations 18 Dec 2023 Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement.

Deblurring Denoising +2

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

no code implementations 12 Dec 2023 Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.

Learning Naturally Aggregated Appearance for Efficient 3D Editing

1 code implementation 11 Dec 2023 Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.

Novel View Synthesis

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation 5 Dec 2023 Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

Despite its simplicity, our method is the first to demonstrate video property editing with a pre-trained text-to-image model.

Video Editing Video Generation

LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models

no code implementations 2 Dec 2023 Qiang Wen, Yazhou Xing, Zhefan Rao, Qifeng Chen

Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules to inject the RAW information into the diffusion denoising process by modulating the intermediate features of the UNet.

Denoising Image Generation +1
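
The entry above mentions lightweight taming modules that inject RAW information by modulating intermediate UNet features. The snippet below is a minimal sketch of that general idea using FiLM-style scale-and-shift modulation; the module design and names are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TamingModule(nn.Module):
    """Predict a scale/shift pair from RAW features and modulate an intermediate
    UNet feature map (FiLM-style). Illustrative sketch only; the actual LDM-ISP
    modules may differ."""
    def __init__(self, raw_channels: int, feat_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(raw_channels, feat_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, 2 * feat_channels, 3, padding=1),
        )

    def forward(self, unet_feat: torch.Tensor, raw: torch.Tensor) -> torch.Tensor:
        # Resize the RAW guidance to the spatial size of the UNet feature map.
        raw = F.interpolate(raw, size=unet_feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        scale, shift = self.encoder(raw).chunk(2, dim=1)
        # Modulate the frozen features; only the taming module is trained.
        return unet_feat * (1 + scale) + shift
```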

Gaussian Shell Maps for Efficient 3D Human Generation

no code implementations 29 Nov 2023 Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-yan Yeung, Gordon Wetzstein

Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features.

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations 28 Nov 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have proven to be powerful generative models in recent years, yet generating visual text with them remains a challenge.

Language Modelling Large Language Model +1

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

no code implementations 14 Nov 2023 Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

Unlike existing end-to-end autonomous driving frameworks, PPAD models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Prediction and Planning processes at every timestep, instead of a single sequential process of prediction followed by planning.

Autonomous Driving Motion Planning +1

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations 30 Oct 2023 Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Text-to-Video Generation Video Generation

ControlLLM: Augment Language Models with Tools by Searching on Graphs

1 code implementation 26 Oct 2023 Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks.

Scheduling

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation 11 Oct 2023 Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

In-Domain GAN Inversion for Faithful Reconstruction and Editability

no code implementations 25 Sep 2023 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.

Image Generation Image Reconstruction

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

no code implementations 5 Sep 2023 Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations.

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

no code implementations 29 Aug 2023 Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio

We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging.

Hallucination

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

1 code implementation 15 Aug 2023 Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96.github.io/CoDeF/.

Image-to-Image Translation Keypoint Detection +1
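
The abstract spells out the lifting recipe: run an image algorithm once on the canonical image, then propagate the result to every frame with the temporal deformation field. Below is a minimal sketch of that propagation step, under the simplifying assumption that the deformation field has already been converted into per-frame sampling grids (the paper instead uses implicit fields and a learned rendering pipeline).

```python
import torch
import torch.nn.functional as F

def propagate_to_frames(edited_canonical: torch.Tensor,
                        sampling_grids: torch.Tensor) -> torch.Tensor:
    """Warp an edited canonical image to every frame of a video.

    edited_canonical: (1, C, H, W) result of an image algorithm applied once
                      to the canonical image.
    sampling_grids:   (T, H, W, 2) per-frame grids in [-1, 1] telling each
                      output pixel where to sample in the canonical image
                      (assumed precomputed from the deformation field).
    Returns: (T, C, H, W) processed frames.
    """
    T = sampling_grids.shape[0]
    canonical = edited_canonical.expand(T, -1, -1, -1)
    # grid_sample pulls canonical content into each frame's coordinates.
    return F.grid_sample(canonical, sampling_grids, mode="bilinear",
                         padding_mode="border", align_corners=True)
```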

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

1 code implementation 9 Jul 2023 Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen

In this way, RGB images are not required during inference anymore since the 2D knowledge branch provides 2D information according to the 3D LIDAR input.

Autonomous Vehicles Knowledge Distillation +2

SAD: Segment Any RGBD

1 code implementation 23 May 2023 Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen

The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.

Open Vocabulary Semantic Segmentation Panoptic Segmentation +1

TextDiffuser: Diffusion Models as Text Painters

no code implementations NeurIPS 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Optical Character Recognition (OCR)

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation CVPR 2023 Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

Rotating without Seeing: Towards In-hand Dexterity through Touch

no code implementations 20 Mar 2023 Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, Xiaolong Wang

Relying on touch-only sensing, we can directly deploy the policy in a real robot hand and rotate novel objects that are not seen during training.

Object

Blind Video Deflickering by Neural Filtering with a Flawed Atlas

1 code implementation CVPR 2023 Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen

Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.

Video Generation Video Temporal Consistency

Human MotionFormer: Transferring Human Motions with Vision Transformers

1 code implementation 22 Feb 2023 Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen

In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.

Motion Synthesis

Video Waterdrop Removal via Spatio-Temporal Fusion in Driving Scenes

1 code implementation 12 Feb 2023 Qiang Wen, Yue Wu, Qifeng Chen

The waterdrops on windshields during driving can cause severe visual obstructions, which may lead to car accidents.

Autonomous Driving

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition

1 code implementation 8 Feb 2023 Jun Cen, Di Luan, Shiwei Zhang, Yixuan Pei, Yingya Zhang, Deli Zhao, Shaojie Shen, Qifeng Chen

Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications.

Open Set Learning

Learning 3D-aware Image Synthesis with Unknown Pose Distribution

no code implementations CVPR 2023 Zifan Shi, Yujun Shen, Yinghao Xu, Sida Peng, Yiyi Liao, Sheng Guo, Qifeng Chen, Dit-yan Yeung

Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.

3D-Aware Image Synthesis

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

no code implementations ICCV 2023 Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen

This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.

Image Generation

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

1 code implementation ICCV 2023 Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu

Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.

Data Augmentation Quantization +2

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

no code implementations CVPR 2023 Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields.

Computational Efficiency

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

1 code implementation CVPR 2023 Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen

We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image.

Attribute Generative Adversarial Network +2

Latent Video Diffusion Models for High-Fidelity Long Video Generation

1 code implementation 23 Nov 2022 Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Diffusion models have shown remarkable results recently but require significant computational resources.

Denoising Image Generation +3

Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

1 code implementation CVPR 2023 Hongyu Liu, Yibing Song, Qifeng Chen

In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.

Contrastive Learning

Robust Federated Learning against both Data Heterogeneity and Poisoning Attack via Aggregation Optimization

no code implementations 10 Nov 2022 Yueqi Xie, Weizhong Zhang, Renjie Pi, Fangzhao Wu, Qifeng Chen, Xing Xie, Sunghun Kim

Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples).

Federated Learning
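
The abstract notes that the server only optimizes one tunable parameter per participating client on a small proxy set. The sketch below illustrates that general scheme — learning softmax-normalized aggregation weights by minimizing the aggregated model's loss on proxy data — and is an assumption-laden simplification (PyTorch 2.x, float-only state dicts, one reused proxy batch) rather than the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def optimize_aggregation_weights(client_states, global_model, proxy_loader,
                                 steps: int = 50, lr: float = 0.05):
    """Learn one aggregation weight per client on a small server-side proxy set.

    client_states: list of state_dicts uploaded by clients this round
                   (assumed to contain only float tensors).
    global_model:  model used to evaluate the aggregated parameters.
    proxy_loader:  DataLoader over ~100 labeled proxy samples on the server.
    """
    logits = torch.zeros(len(client_states), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    keys = list(client_states[0].keys())

    for _ in range(steps):
        w = torch.softmax(logits, dim=0)  # weights sum to one
        # Weighted average of client parameters, differentiable w.r.t. w.
        agg = {k: sum(w[i] * client_states[i][k] for i in range(len(w)))
               for k in keys}
        x, y = next(iter(proxy_loader))   # for brevity, reuse the first proxy batch
        out = torch.func.functional_call(global_model, agg, (x,))
        loss = F.cross_entropy(out, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.softmax(logits.detach(), dim=0)
```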

Robust Reflection Removal with Flash-only Cues in the Wild

1 code implementation 5 Nov 2022 Chenyang Lei, Xudong Jiang, Qifeng Chen

We propose a simple yet effective reflection-free cue for robust reflection removal from a pair of flash and ambient (no-flash) images.

Reflection Removal

Planning for Sample Efficient Imitation Learning

1 code implementation 18 Oct 2022 Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao

Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.

Imitation Learning

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

1 code implementation 14 Oct 2022 Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang

Based on the visual latent space of StyleGAN[21] and text embedding space of CLIP[34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.

Attribute Image Manipulation

AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars

1 code implementation 12 Oct 2022 Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong

To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN.

Disentanglement Face Model +1

Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer

1 code implementation 3 Oct 2022 Junming Chen, Meirui Jiang, Qi Dou, Qifeng Chen

Our style representation is exceptionally lightweight and can hardly be used to reconstruct the original dataset.

Domain Generalization Federated Learning +1

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

no code implementations 30 Sep 2022 Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-yan Yeung

We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough.

3D-Aware Image Synthesis domain classification +2

A Portable Multiscopic Camera for Novel View and Time Synthesis in Dynamic Scenes

no code implementations 30 Aug 2022 Tianjia Zhang, Yuen-Fui Lau, Qifeng Chen

We present a portable multiscopic camera system with a dedicated model for novel view and time synthesis in dynamic scenes.

Optimizing Image Compression via Joint Learning with Denoising

1 code implementation 22 Jul 2022 Ka Leong Cheng, Yueqi Xie, Qifeng Chen

The key is to transform the original noisy images to noise-free bits by eliminating the undesired noise during compression, where the bits are later decompressed as clean images.

Denoising Image Compression

Real-time Streaming Video Denoising with Bidirectional Buffers

1 code implementation 14 Jul 2022 Chenyang Qi, Junming Chen, Xin Yang, Qifeng Chen

Recent multi-output inference works propagate the bidirectional temporal feature with a parallel or recurrent framework, which either suffers from performance drops on the temporal edges of clips or cannot achieve online inference.

Denoising Video Denoising

Optimizing Video Prediction via Video Frame Interpolation

1 code implementation CVPR 2022 Yue Wu, Qiang Wen, Qifeng Chen

Extensive experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets show that our video prediction results are robust in general scenarios, and our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.

Open-Ended Question Answering Video Frame Interpolation +1

Point Cloud Compression with Sibling Context and Surface Priors

1 code implementation 2 May 2022 Zhili Chen, Zian Qian, Sukai Wang, Qifeng Chen

We present a novel octree-based multi-level framework for large-scale point cloud compression, which can organize sparse and unstructured point clouds in a memory-efficient way.

Real-Time Neural Character Rendering with Pose-Guided Multiplane Images

1 code implementation 25 Apr 2022 Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen

We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.

Image-to-Image Translation Neural Rendering +1

Bootstrap Motion Forecasting With Self-Consistent Constraints

no code implementations ICCV 2023 Maosheng Ye, Jiamiao Xu, Xunnong Xu, Tengfei Wang, Tongyi Cao, Qifeng Chen

Also, to model the multi-modality in motion forecasting, we design a novel self-ensembling scheme to obtain accurate teacher targets to enforce the self-constraints with multi-modality supervision.

Motion Forecasting

FS6D: Few-Shot 6D Pose Estimation of Novel Objects

1 code implementation CVPR 2022 Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen

6D object pose estimation networks are limited in their capability to scale to large numbers of object instances due to the closed-set assumption and their reliance on high-fidelity object CAD models.

6D Pose Estimation 6D Pose Estimation using RGB +1

Interpreting Class Conditional GANs with Channel Awareness

no code implementations 21 Mar 2022 Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen

To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.

Towards Self-Supervised Category-Level Object Pose and Size Estimation

no code implementations 6 Mar 2022 Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun

Instead, we propose a label-free method that learns to enforce the geometric consistency between category template mesh and observed object point cloud in a self-supervised manner.

Region-Based Semantic Factorization in GANs

1 code implementation 19 Feb 2022 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen

Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes.

3D-Aware Indoor Scene Synthesis with Depth Priors

no code implementations 17 Feb 2022 Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-yan Yeung, Qifeng Chen

In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition.

3D-Aware Image Synthesis Indoor Scene Synthesis

Deep Video Prior for Video Consistency and Propagation

1 code implementation 27 Jan 2022 Chenyang Lei, Yazhou Xing, Hao Ouyang, Qifeng Chen

A progressive propagation strategy with pseudo labels is also proposed to enhance DVP's performance on video propagation.

Optical Flow Estimation Semantic Segmentation +2

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

1 code implementation LREC 2022 Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Shape from Polarization for Complex Scenes in the Wild

1 code implementation CVPR 2022 Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen

We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image.

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

2 code implementations LREC 2022 Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.

DRINet++: Efficient Voxel-as-point Point Cloud Segmentation

no code implementations 16 Nov 2021 Maosheng Ye, Rui Wan, Shuangjie Xu, Tongyi Cao, Qifeng Chen

The Sparse Feature Encoder extracts the local context information for each point, and the Sparse Geometry Feature Enhancement enhances the geometric properties of a sparse point cloud via multi-scale sparse projection and attentive multi-scale fusion.

Point Cloud Segmentation Segmentation +1

Physics Assisted Deep Learning for Indoor Imaging using Phaseless Wi-Fi Measurements

no code implementations 4 Nov 2021 Samruddhi Deshmukh, Amartansh Dubey, Dingfei Ma, Qifeng Chen, Ross Murch

Thus, our proposed method is the first inverse scattering-based deep learning framework which can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.

High-Fidelity GAN Inversion for Image Attribute Editing

1 code implementation CVPR 2022 Tengfei Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen

With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.

Attribute Generative Adversarial Network +2

IICNet: A Generic Framework for Reversible Image Conversion

1 code implementation ICCV 2021 Ka Leong Cheng, Yueqi Xie, Qifeng Chen

Reversible image conversion (RIC) aims to build a reversible transformation between specific visual content (e.g., short videos) and an embedding image, where the original content can be restored from the embedding when necessary.

Dual-Camera Super-Resolution with Aligned Attention Modules

2 code implementations ICCV 2021 Tengfei Wang, Jiaxin Xie, Wenxiu Sun, Qiong Yan, Qifeng Chen

We present a novel approach to reference-based super-resolution (RefSR) with the focus on dual-camera super-resolution (DCSR), which utilizes reference images for high-quality and high-fidelity results.

Domain Adaptation Reference-based Super-Resolution

Embedding Novel Views in a Single JPEG Image

1 code implementation ICCV 2021 Yue Wu, Guotao Meng, Qifeng Chen

We propose a novel approach for embedding novel views in a single JPEG image while preserving the perceptual fidelity of the modified JPEG image and the restored novel views.

Novel View Synthesis

Towards Photorealistic Colorization by Imagination

no code implementations 20 Aug 2021 Chenyang Lei, Yue Wu, Qifeng Chen

We present a novel approach to automatic image colorization by imitating the imagination process of human experts.

Colorization Image Colorization +1

Joint Depth and Normal Estimation from Real-world Time-of-flight Raw Data

no code implementations 8 Aug 2021 Rongrong Gao, Na Fan, Changlin Li, Wentao Liu, Qifeng Chen

We present a novel approach to joint depth and normal estimation for time-of-flight (ToF) sensors.

Enhanced Invertible Encoding for Learned Image Compression

1 code implementation 8 Aug 2021 Yueqi Xie, Ka Leong Cheng, Qifeng Chen

Although deep learning based image compression methods have achieved promising progress these days, the performance of these methods still cannot match the latest compression standard Versatile Video Coding (VVC).

Image Compression

A Categorized Reflection Removal Dataset with Diverse Real-world Scenes

no code implementations 7 Aug 2021 Chenyang Lei, Xuhua Huang, Chenyang Qi, Yankun Zhao, Wenxiu Sun, Qiong Yan, Qifeng Chen

Due to the lack of a large-scale reflection removal dataset with diverse real-world scenes, many existing reflection removal methods are trained on synthetic data plus a small amount of real-world data, which makes it difficult to evaluate the strengths or weaknesses of different reflection removal methods thoroughly.

Reflection Removal

Stereo Waterdrop Removal with Row-wise Dilated Attention

1 code implementation 7 Aug 2021 Zifan Shi, Na Fan, Dit-yan Yeung, Qifeng Chen

Thus, we propose a learning-based model for waterdrop removal with stereo images.

Autonomous Driving

Unsupervised Portrait Shadow Removal via Generative Priors

1 code implementation 7 Aug 2021 Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen

Qualitative and quantitative experiments on a real-world portrait shadow dataset demonstrate that our approach achieves comparable performance with supervised shadow removal methods.

Shadow Removal Unsupervised Semantic Segmentation

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion

no code implementations 5 Aug 2021 Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen

We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation.

Depth Estimation Stereo Matching

Internal Video Inpainting by Implicit Long-range Propagation

1 code implementation ICCV 2021 Hao Ouyang, Tengfei Wang, Qifeng Chen

We propose a novel framework for video inpainting by adopting an internal learning strategy.

4k Object +2

Video Super-Resolution with Long-Term Self-Exemplars

no code implementations 24 Jun 2021 Guotao Meng, Yue Wu, Sijin Li, Qifeng Chen

Existing video super-resolution methods often utilize a few neighboring frames to generate a higher-resolution image for each frame.

Video Super-Resolution

SinIR: Efficient General Image Manipulation with Single Image Reconstruction

1 code implementation 14 Jun 2021 Jihyeong Yoo, Qifeng Chen

We train our model on a single image with cascaded multi-scale learning, where each network at each scale is responsible for image reconstruction.

Denoising Image Manipulation +3

Low-Rank Subspaces in GANs

1 code implementation NeurIPS 2021 Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, ZhengJun Zha, Jingren Zhou, Qifeng Chen

Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces.

Attribute Generative Adversarial Network
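
The abstract already gives the recipe: relate a pixel region to the latent code through the Jacobian, then apply a low-rank factorization to obtain steerable subspaces. A minimal sketch of that computation, assuming a differentiable generator that maps a latent vector to an image (the SVD here stands in for the paper's low-rank factorization):

```python
import torch
from torch.autograd.functional import jacobian

def region_directions(generator, z: torch.Tensor, region_mask: torch.Tensor,
                      rank: int = 5) -> torch.Tensor:
    """Find latent directions that mainly affect a chosen image region.

    generator:   callable mapping a latent (1, d) to an image (1, C, H, W).
    z:           latent code at which to linearize, shape (1, d).
    region_mask: boolean mask (C, H, W) selecting the region of interest.
    Returns:     (rank, d) candidate steerable directions.
    """
    def region_pixels(latent):
        img = generator(latent)        # (1, C, H, W)
        return img[0][region_mask]     # flattened pixels of the region

    J = jacobian(region_pixels, z)     # (num_pixels, 1, d)
    J = J.reshape(J.shape[0], -1)      # (num_pixels, d)
    # Top right singular vectors span a low-rank subspace whose perturbations
    # change the selected region most; the remaining ones barely touch it.
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:rank]                   # move z along these rows to edit the region
```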

Image Inpainting with External-internal Learning and Monochromic Bottleneck

1 code implementation CVPR 2021 Tengfei Wang, Hao Ouyang, Qifeng Chen

Although recent inpainting approaches have demonstrated significant improvements with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions.

Image Inpainting

Neural Camera Simulators

1 code implementation CVPR 2021 Hao Ouyang, Zifan Shi, Chenyang Lei, Ka Lung Law, Qifeng Chen

To facilitate the learning of a simulator model, we collect a dataset of 10,000 raw images of 450 scenes with different exposure settings.

Data Augmentation

Invertible Image Signal Processing

1 code implementation CVPR 2021 Yazhou Xing, Zian Qian, Qifeng Chen

Unprocessed RAW data is a highly valuable image format for image editing and computer vision.

Robust Reflection Removal with Reflection-free Flash-only Cues

1 code implementation CVPR 2021 Chenyang Lei, Qifeng Chen

The flash-only image is equivalent to an image taken in a dark environment with only a flash on.

Reflection Removal SSIM

TPCN: Temporal Point Cloud Networks for Motion Forecasting

no code implementations CVPR 2021 Maosheng Ye, Tongyi Cao, Qifeng Chen

We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction.

Motion Forecasting Trajectory Prediction

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

3 code implementations CVPR 2021 Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, Jian Sun

Moreover, at the output representation stage, we design a simple but effective 3D keypoint selection algorithm considering the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation.

6D Pose Estimation Representation Learning

Robust Federated Learning with Attack-Adaptive Aggregation

1 code implementation 10 Feb 2021 Ching Pui Wan, Qifeng Chen

To the best of our knowledge, our aggregation strategy is the first one that can be adapted to defend against various attacks in a data-driven fashion.

Federated Learning Model Poisoning

Video Deblurring by Fitting to Test Data

1 code implementation 9 Dec 2020 Xuanchi Ren, Zian Qian, Qifeng Chen

Our key observation is that some frames in a video with motion blur are much sharper than others, and thus we can transfer the texture information in those sharp frames to blurry frames.

Autonomous Vehicles Deblurring
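
The key observation quoted above is that some frames in a blurry video are much sharper than others. A simple way to pick such frames is to score sharpness with the variance of the Laplacian; this is a common heuristic used here purely for illustration, not necessarily the paper's selection criterion.

```python
import cv2
import numpy as np

def pick_sharp_frames(frames, top_k: int = 5):
    """Rank video frames by sharpness and return the indices of the sharpest ones.

    frames: list of HxWx3 uint8 BGR frames (e.g., read with cv2.VideoCapture).
    """
    scores = []
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        # Variance of the Laplacian: high-frequency content as a proxy for sharpness.
        scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())
    order = np.argsort(scores)[::-1]
    return order[:top_k].tolist()
```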

Evaluating adversarial robustness in simulated cerebellum

no code implementations 5 Dec 2020 Liu Yuezhang, Bo Li, Qifeng Chen

It is well known that artificial neural networks are vulnerable to adversarial examples, and great efforts have been made to improve their robustness.

Adversarial Robustness

Blind Video Temporal Consistency via Deep Video Prior

2 code implementations NeurIPS 2020 Chenyang Lei, Yazhou Xing, Qifeng Chen

Extensive quantitative and perceptual experiments show that our approach obtains superior performance than state-of-the-art methods on blind video temporal consistency.

Colorization Image Dehazing +4
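
Deep Video Prior boils down to a per-video training recipe: fit a small image-to-image network, on the test video alone, to map each original frame to its (flickering) per-frame processed result, and rely on the network's inductive bias for temporal consistency. The sketch below shows that fitting loop under the assumption that `net` is any convolutional image-to-image model; it is an illustration of the idea, not the authors' exact training schedule.

```python
import torch
import torch.nn.functional as F

def fit_deep_video_prior(net, originals, processed, iters: int = 2000, lr: float = 1e-4):
    """Per-video fitting. originals/processed are (T, C, H, W) tensors holding the
    input frames and the temporally inconsistent per-frame results of some image
    algorithm. After fitting, net(originals) gives a more consistent version."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    T = originals.shape[0]
    for _ in range(iters):
        i = torch.randint(0, T, (1,)).item()   # sample one frame per step
        pred = net(originals[i:i + 1])
        loss = F.l1_loss(pred, processed[i:i + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return torch.cat([net(originals[t:t + 1]) for t in range(T)], dim=0)
```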

Self-supervised Object Tracking with Cycle-consistent Siamese Networks

1 code implementation 3 Aug 2020 Weihao Yuan, Michael Yu Wang, Qifeng Chen

Self-supervised learning for visual object tracking possesses valuable advantages compared to supervised learning, such as the non-necessity of laborious human annotations and online training.

Object Region Proposal +5

Fully Convolutional Networks for Continuous Sign Language Recognition

no code implementations ECCV 2020 Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai

Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences.

Sentence Sign Language Recognition

PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer

1 code implementation ECCV 2020 Duo Li, Anbang Yao, Qifeng Chen

Despite their strong modeling capacities, Convolutional Neural Networks (CNNs) are often scale-sensitive.

Representation Learning
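
PSConv squeezes a feature pyramid into one layer by mixing several dilation rates among the filters of a single convolution. The sketch below approximates this by splitting the output channels across parallel dilated branches and concatenating them; the paper's exact channel-wise and group-wise arrangement differs, so treat this as an illustrative stand-in.

```python
import torch
import torch.nn as nn

class PolyScaleConv(nn.Module):
    """Approximate poly-scale convolution: output channels are split across
    branches that share the kernel size but use different dilation rates."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert out_ch % len(dilations) == 0
        split = out_ch // len(dilations)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, split, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different receptive field; concatenating them packs
        # a multi-scale response into a single layer's output.
        return torch.cat([b(x) for b in self.branches], dim=1)
```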

Learning to Learn Parameterized Classification Networks for Scalable Input Images

1 code implementation ECCV 2020 Duo Li, Anbang Yao, Qifeng Chen

To achieve efficient and flexible image classification at runtime, we employ meta learners to generate convolutional weights of main networks for various input scales and maintain privatized Batch Normalization layers per scale.

Classification General Classification +2

Deep Reinforced Attention Learning for Quality-Aware Visual Recognition

no code implementations ECCV 2020 Duo Li, Qifeng Chen

In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks and disclose the effectiveness of attention modules more straightforwardly to fully exploit their potential.

Depth Sensing Beyond LiDAR Range

no code implementations CVPR 2020 Kai Zhang, Jiaxin Xie, Noah Snavely, Qifeng Chen

Depth sensing is a critical component of autonomous driving technologies, but today's LiDAR- or stereo camera-based solutions have limited range.

Autonomous Driving

Future Video Synthesis with Object Motion Prediction

1 code implementation CVPR 2020 Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen

We present an approach to predict future video frames given a sequence of continuous video frames in the past.

Ranked #2 on Video Prediction on Cityscapes (using extra training data)

motion prediction Object +1

PiP: Planning-informed Trajectory Prediction for Autonomous Driving

1 code implementation ECCV 2020 Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, Qifeng Chen

Moreover, our approach enables a novel pipeline which couples the prediction and planning, by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.

Autonomous Driving Future prediction +1

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

1 code implementation CVPR 2020 Duo Li, Qifeng Chen

While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks with a significant margin, the traditional way of appending supervision only over the final classifier and progressively propagating gradient flow upstream remains the training mainstay.

Active Perception with A Monocular Camera for Multiscopic Vision

1 code implementation 22 Jan 2020 Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen

We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation for robotic applications.

Depth Estimation Stereo Matching

Video Depth Estimation by Fusing Flow-to-Depth Proposals

1 code implementation 30 Dec 2019 Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen

Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.

Depth Estimation Optical Flow Estimation

Attack-Resistant Federated Learning with Residual-based Reweighting

2 code implementations 24 Dec 2019 Shuhao Fu, Chulin Xie, Bo Li, Qifeng Chen

Federated learning has a variety of applications in multiple domains by utilizing private training data stored on different devices.

Federated Learning regression

Music-oriented Dance Video Synthesis with Pose Perceptual Loss

1 code implementation 13 Dec 2019 Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen

We present a learning-based approach with pose perceptual loss for automatic music video generation.

Video Generation

Fully Automatic Video Colorization with Self-Regularization and Diversity

4 code implementations CVPR 2019 Chenyang Lei, Qifeng Chen

We present a fully automatic approach to video colorization with self-regularization and diversity.

Colorization

Zoom To Learn, Learn To Zoom

1 code implementation 13 May 2019 Xuaner Cecilia Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun

We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom.

Super-Resolution

Speech Denoising with Deep Feature Losses

5 code implementations 27 Jun 2018 Francois G. Germain, Qifeng Chen, Vladlen Koltun

We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.

Audio Tagging Denoising +1
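
The deep feature loss referenced above compares denoised and clean waveforms through the activations of a separately trained audio network rather than sample-by-sample. A minimal sketch, assuming `feature_net` returns a list of intermediate activations (the paper's loss network is a specific audio-classification CNN, which is not provided here):

```python
import torch
import torch.nn.functional as F

def deep_feature_loss(feature_net, denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """L1 distance between intermediate activations of a frozen audio network.

    feature_net: callable mapping a waveform batch (B, 1, N) to a list of
                 feature tensors from several layers (assumed interface).
    """
    with torch.no_grad():
        target_feats = feature_net(clean)
    pred_feats = feature_net(denoised)
    loss = torch.zeros((), device=denoised.device)
    for p, t in zip(pred_feats, target_feats):
        # Normalizing by each layer's scale keeps deep layers from dominating.
        loss = loss + F.l1_loss(p, t) / (t.abs().mean() + 1e-8)
    return loss
```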

Single Image Reflection Separation with Perceptual Losses

3 code implementations CVPR 2018 Xuaner Zhang, Ren Ng, Qifeng Chen

Our loss function includes two perceptual losses: a feature loss from a visual perception network, and an adversarial loss that encodes characteristics of images in the transmission layers.

Image Enhancement Reflection Removal +1

Fast Image Processing with Fully-Convolutional Networks

2 code implementations ICCV 2017 Qifeng Chen, Jia Xu, Vladlen Koltun

Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action.

Style Transfer
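
The approach described above reduces to standard supervised regression: generate (input, operator(input)) pairs with the slow reference operator, then train a fully-convolutional network to reproduce it. The sketch below shows that training loop with the paper's dilated context aggregation network replaced by an arbitrary image-to-image model for brevity; the exact loss and schedule may differ from the paper's.

```python
import torch
import torch.nn.functional as F

def train_operator_approximator(net, reference_operator, image_loader,
                                epochs: int = 10, lr: float = 1e-4):
    """Fit a fully-convolutional net to mimic a (slow) image operator.

    reference_operator: function producing target images from inputs.
    image_loader:       yields batches of input images (B, C, H, W) in [0, 1].
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for x in image_loader:
            with torch.no_grad():
                y = reference_operator(x)  # ground truth from the original operator
            # Pixel-wise regression toward the operator's output.
            loss = F.mse_loss(net(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```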

Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids

no code implementations CVPR 2016 Qifeng Chen, Vladlen Koltun

The approach optimizes a classical optical flow objective over the full space of mappings between discrete grids.

Optical Flow Estimation

1-HKUST: Object Detection in ILSVRC 2014

no code implementations 22 Sep 2014 Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang

We participated in the object detection track of ILSVRC 2014 and placed fourth among the 38 teams.

Object object-detection +3

Fast MRF Optimization with Application to Depth Reconstruction

no code implementations CVPR 2014 Qifeng Chen, Vladlen Koltun

We describe a simple and fast algorithm for optimizing Markov random fields over images.
