Search Results for author: Qifeng Chen

Found 139 papers, 86 papers with code

Automatic Controllable Colorization via Imagination

no code implementations 8 Apr 2024 Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples.

Colorization Image Generation

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

no code implementations 5 Apr 2024 Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei

Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects.

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

no code implementations 19 Mar 2024 Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation.

Object

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

no code implementations 27 Feb 2024 Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.

Audio Generation Denoising

Real-time 3D-aware Portrait Editing from a Single Image

no code implementations 21 Feb 2024 Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner.

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation 16 Feb 2024 Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations 16 Feb 2024 Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks.

ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration

no code implementations 13 Jan 2024 Yuen-Fui Lau, Tianjia Zhang, Zhefan Rao, Qifeng Chen

The latent code extracted from the degraded input image often contains corrupted features, making it difficult to align the semantic information from the input with the high-quality textures from the reference.

Blind Face Restoration Quantization

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

no code implementations 18 Dec 2023 Bingyuan Wang, Hengyu Meng, Zeyu Cai, Lanjiong Li, Yue Ma, Qifeng Chen, Zeyu Wang

Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative.

Denoising Image Generation +1

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

no code implementations 18 Dec 2023 Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement.

Deblurring Denoising +2

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

no code implementations 12 Dec 2023 Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.

Learning Naturally Aggregated Appearance for Efficient 3D Editing

1 code implementation 11 Dec 2023 Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.

Novel View Synthesis

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation 5 Dec 2023 Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

Despite its simplicity, our method is the first to demonstrate video property editing with a pre-trained text-to-image model.

Video Editing Video Generation

LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models

no code implementations 2 Dec 2023 Qiang Wen, Yazhou Xing, Zhefan Rao, Qifeng Chen

Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules to inject the RAW information into the diffusion denoising process by modulating the intermediate features of the UNet.

Denoising Image Generation +1
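
The entry above mentions lightweight taming modules that inject RAW information by modulating intermediate UNet features. The snippet below is a minimal sketch of that general idea using FiLM-style scale-and-shift modulation; the module design and names are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TamingModule(nn.Module):
    """Predict a scale/shift pair from RAW features and modulate an intermediate
    UNet feature map (FiLM-style). Illustrative sketch only; the actual LDM-ISP
    modules may differ."""
    def __init__(self, raw_channels: int, feat_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(raw_channels, feat_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, 2 * feat_channels, 3, padding=1),
        )

    def forward(self, unet_feat: torch.Tensor, raw: torch.Tensor) -> torch.Tensor:
        # Resize the RAW guidance to the spatial size of the UNet feature map.
        raw = F.interpolate(raw, size=unet_feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        scale, shift = self.encoder(raw).chunk(2, dim=1)
        # Modulate the frozen features; only the taming module is trained.
        return unet_feat * (1 + scale) + shift
```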

Gaussian Shell Maps for Efficient 3D Human Generation

no code implementations 29 Nov 2023 Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-yan Yeung, Gordon Wetzstein

Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features.

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations 28 Nov 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have proven to be powerful generative models in recent years, yet generating visual text with them remains a challenge.

Language Modelling Large Language Model +1

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

no code implementations 14 Nov 2023 Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

Unlike existing end-to-end autonomous driving frameworks, PPAD models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Prediction and Planning processes at every timestep, instead of a single sequential process of prediction followed by planning.

Autonomous Driving Motion Planning +1

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations 30 Oct 2023 Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Text-to-Video Generation Video Generation

ControlLLM: Augment Language Models with Tools by Searching on Graphs

1 code implementation 26 Oct 2023 Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks.

Scheduling

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation 11 Oct 2023 Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

In-Domain GAN Inversion for Faithful Reconstruction and Editability

no code implementations 25 Sep 2023 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.

Image Generation Image Reconstruction

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

no code implementations 5 Sep 2023 Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations.

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

no code implementations 29 Aug 2023 Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio

We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging.

Hallucination

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

1 code implementation 15 Aug 2023 Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96.github.io/CoDeF/.

Image-to-Image Translation Keypoint Detection +1
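
The abstract spells out the lifting recipe: run an image algorithm once on the canonical image, then propagate the result to every frame with the temporal deformation field. Below is a minimal sketch of that propagation step, under the simplifying assumption that the deformation field has already been converted into per-frame sampling grids (the paper instead uses implicit fields and a learned rendering pipeline).

```python
import torch
import torch.nn.functional as F

def propagate_to_frames(edited_canonical: torch.Tensor,
                        sampling_grids: torch.Tensor) -> torch.Tensor:
    """Warp an edited canonical image to every frame of a video.

    edited_canonical: (1, C, H, W) result of an image algorithm applied once
                      to the canonical image.
    sampling_grids:   (T, H, W, 2) per-frame grids in [-1, 1] telling each
                      output pixel where to sample in the canonical image
                      (assumed precomputed from the deformation field).
    Returns: (T, C, H, W) processed frames.
    """
    T = sampling_grids.shape[0]
    canonical = edited_canonical.expand(T, -1, -1, -1)
    # grid_sample pulls canonical content into each frame's coordinates.
    return F.grid_sample(canonical, sampling_grids, mode="bilinear",
                         padding_mode="border", align_corners=True)
```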

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

1 code implementation 9 Jul 2023 Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen

In this way, RGB images are not required during inference anymore since the 2D knowledge branch provides 2D information according to the 3D LIDAR input.

Autonomous Vehicles Knowledge Distillation +2

SAD: Segment Any RGBD

1 code implementation 23 May 2023 Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen

The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.

Open Vocabulary Semantic Segmentation Panoptic Segmentation +1

TextDiffuser: Diffusion Models as Text Painters

no code implementations NeurIPS 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Optical Character Recognition (OCR)

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation CVPR 2023 Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

Rotating without Seeing: Towards In-hand Dexterity through Touch

no code implementations 20 Mar 2023 Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, Xiaolong Wang

Relying on touch-only sensing, we can directly deploy the policy in a real robot hand and rotate novel objects that are not seen during training.

Object

Blind Video Deflickering by Neural Filtering with a Flawed Atlas

1 code implementation CVPR 2023 Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen

Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.

Video Generation Video Temporal Consistency

Human MotionFormer: Transferring Human Motions with Vision Transformers

1 code implementation 22 Feb 2023 Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen

In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.

Motion Synthesis

Video Waterdrop Removal via Spatio-Temporal Fusion in Driving Scenes

1 code implementation 12 Feb 2023 Qiang Wen, Yue Wu, Qifeng Chen

The waterdrops on windshields during driving can cause severe visual obstructions, which may lead to car accidents.

Autonomous Driving

The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition

1 code implementation 8 Feb 2023 Jun Cen, Di Luan, Shiwei Zhang, Yixuan Pei, Yingya Zhang, Deli Zhao, Shaojie Shen, Qifeng Chen

Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications.

Open Set Learning

Learning 3D-aware Image Synthesis with Unknown Pose Distribution

no code implementations CVPR 2023 Zifan Shi, Yujun Shen, Yinghao Xu, Sida Peng, Yiyi Liao, Sheng Guo, Qifeng Chen, Dit-yan Yeung

Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.

3D-Aware Image Synthesis

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

no code implementations ICCV 2023 Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen

This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.

Image Generation

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

1 code implementation ICCV 2023 Huimin Wu, Chenyang Lei, Xiao Sun, Peng-Shuai Wang, Qifeng Chen, Kwang-Ting Cheng, Stephen Lin, Zhirong Wu

Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.

Data Augmentation Quantization +2

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

no code implementations CVPR 2023 Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields.

Computational Efficiency

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

1 code implementation CVPR 2023 Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen

We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image.

Attribute Generative Adversarial Network +2

Latent Video Diffusion Models for High-Fidelity Long Video Generation

1 code implementation 23 Nov 2022 Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Diffusion models have shown remarkable results recently but require significant computational resources.

Denoising Image Generation +3

Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

1 code implementation CVPR 2023 Hongyu Liu, Yibing Song, Qifeng Chen

In this work, we propose to first obtain the precise latent code in foundation latent space $\mathcal{W}$.

Contrastive Learning

Robust Federated Learning against both Data Heterogeneity and Poisoning Attack via Aggregation Optimization

no code implementations 10 Nov 2022 Yueqi Xie, Weizhong Zhang, Renjie Pi, Fangzhao Wu, Qifeng Chen, Xing Xie, Sunghun Kim

Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples).

Federated Learning
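
The abstract notes that the server only optimizes one tunable parameter per participating client on a small proxy set. The sketch below illustrates that general scheme — learning softmax-normalized aggregation weights by minimizing the aggregated model's loss on proxy data — and is an assumption-laden simplification (PyTorch 2.x, float-only state dicts, one reused proxy batch) rather than the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def optimize_aggregation_weights(client_states, global_model, proxy_loader,
                                 steps: int = 50, lr: float = 0.05):
    """Learn one aggregation weight per client on a small server-side proxy set.

    client_states: list of state_dicts uploaded by clients this round
                   (assumed to contain only float tensors).
    global_model:  model used to evaluate the aggregated parameters.
    proxy_loader:  DataLoader over ~100 labeled proxy samples on the server.
    """
    logits = torch.zeros(len(client_states), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    keys = list(client_states[0].keys())

    for _ in range(steps):
        w = torch.softmax(logits, dim=0)  # weights sum to one
        # Weighted average of client parameters, differentiable w.r.t. w.
        agg = {k: sum(w[i] * client_states[i][k] for i in range(len(w)))
               for k in keys}
        x, y = next(iter(proxy_loader))   # for brevity, reuse the first proxy batch
        out = torch.func.functional_call(global_model, agg, (x,))
        loss = F.cross_entropy(out, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.softmax(logits.detach(), dim=0)
```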

Robust Reflection Removal with Flash-only Cues in the Wild

1 code implementation 5 Nov 2022 Chenyang Lei, Xudong Jiang, Qifeng Chen

We propose a simple yet effective reflection-free cue for robust reflection removal from a pair of flash and ambient (no-flash) images.

Reflection Removal

Planning for Sample Efficient Imitation Learning

1 code implementation 18 Oct 2022 Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao

Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.

Imitation Learning

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

1 code implementation 14 Oct 2022 Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang

Based on the visual latent space of StyleGAN[21] and text embedding space of CLIP[34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.

Attribute Image Manipulation

AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars

1 code implementation 12 Oct 2022 Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong

To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN.

Disentanglement Face Model +1

Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer

1 code implementation 3 Oct 2022 Junming Chen, Meirui Jiang, Qi Dou, Qifeng Chen

Our style representation is exceptionally lightweight and can hardly be used to reconstruct the original dataset.

Domain Generalization Federated Learning +1

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

no code implementations 30 Sep 2022 Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-yan Yeung

We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough.

3D-Aware Image Synthesis domain classification +2

A Portable Multiscopic Camera for Novel View and Time Synthesis in Dynamic Scenes

no code implementations 30 Aug 2022 Tianjia Zhang, Yuen-Fui Lau, Qifeng Chen

We present a portable multiscopic camera system with a dedicated model for novel view and time synthesis in dynamic scenes.

Optimizing Image Compression via Joint Learning with Denoising

1 code implementation 22 Jul 2022 Ka Leong Cheng, Yueqi Xie, Qifeng Chen

The key is to transform the original noisy images to noise-free bits by eliminating the undesired noise during compression, where the bits are later decompressed as clean images.

Denoising Image Compression

Real-time Streaming Video Denoising with Bidirectional Buffers

1 code implementation 14 Jul 2022 Chenyang Qi, Junming Chen, Xin Yang, Qifeng Chen

Recent multi-output inference works propagate the bidirectional temporal feature with a parallel or recurrent framework, which either suffers from performance drops on the temporal edges of clips or cannot achieve online inference.

Denoising Video Denoising

Optimizing Video Prediction via Video Frame Interpolation

1 code implementation CVPR 2022 Yue Wu, Qiang Wen, Qifeng Chen

Extensive experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets show that our video prediction results are robust in general scenarios, and our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.

Open-Ended Question Answering Video Frame Interpolation +1

Point Cloud Compression with Sibling Context and Surface Priors

1 code implementation 2 May 2022 Zhili Chen, Zian Qian, Sukai Wang, Qifeng Chen

We present a novel octree-based multi-level framework for large-scale point cloud compression, which can organize sparse and unstructured point clouds in a memory-efficient way.

Real-Time Neural Character Rendering with Pose-Guided Multiplane Images

1 code implementation 25 Apr 2022 Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen

We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.

Image-to-Image Translation Neural Rendering +1

Bootstrap Motion Forecasting With Self-Consistent Constraints

no code implementations ICCV 2023 Maosheng Ye, Jiamiao Xu, Xunnong Xu, Tengfei Wang, Tongyi Cao, Qifeng Chen

Also, to model the multi-modality in motion forecasting, we design a novel self-ensembling scheme to obtain accurate teacher targets to enforce the self-constraints with multi-modality supervision.

Motion Forecasting

FS6D: Few-Shot 6D Pose Estimation of Novel Objects

1 code implementation CVPR 2022 Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen

6D object pose estimation networks are limited in their capability to scale to large numbers of object instances due to the closed-set assumption and their reliance on high-fidelity object CAD models.

6D Pose Estimation 6D Pose Estimation using RGB +1

Interpreting Class Conditional GANs with Channel Awareness

no code implementations 21 Mar 2022 Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen

To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.

Towards Self-Supervised Category-Level Object Pose and Size Estimation

no code implementations 6 Mar 2022 Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun

Instead, we propose a label-free method that learns to enforce the geometric consistency between category template mesh and observed object point cloud in a self-supervised manner.

Region-Based Semantic Factorization in GANs

1 code implementation 19 Feb 2022 Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen

Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes.

3D-Aware Indoor Scene Synthesis with Depth Priors

no code implementations 17 Feb 2022 Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-yan Yeung, Qifeng Chen

In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition.

3D-Aware Image Synthesis Indoor Scene Synthesis

Deep Video Prior for Video Consistency and Propagation

1 code implementation 27 Jan 2022 Chenyang Lei, Yazhou Xing, Hao Ouyang, Qifeng Chen

A progressive propagation strategy with pseudo labels is also proposed to enhance DVP's performance on video propagation.

Optical Flow Estimation Semantic Segmentation +2

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

1 code implementation LREC 2022 Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Shape from Polarization for Complex Scenes in the Wild

1 code implementation CVPR 2022 Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen

We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image.

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

2 code implementations LREC 2022 Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.

DRINet++: Efficient Voxel-as-point Point Cloud Segmentation

no code implementations 16 Nov 2021 Maosheng Ye, Rui Wan, Shuangjie Xu, Tongyi Cao, Qifeng Chen

The Sparse Feature Encoder extracts the local context information for each point, and the Sparse Geometry Feature Enhancement enhances the geometric properties of a sparse point cloud via multi-scale sparse projection and attentive multi-scale fusion.

Point Cloud Segmentation Segmentation +1

Physics Assisted Deep Learning for Indoor Imaging using Phaseless Wi-Fi Measurements

no code implementations 4 Nov 2021 Samruddhi Deshmukh, Amartansh Dubey, Dingfei Ma, Qifeng Chen, Ross Murch

Thus, our proposed method is the first inverse scattering-based deep learning framework which can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.

High-Fidelity GAN Inversion for Image Attribute Editing

1 code implementation CVPR 2022 Tengfei Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen

With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.

Attribute Generative Adversarial Network +2

IICNet: A Generic Framework for Reversible Image Conversion

1 code implementation ICCV 2021 Ka Leong Cheng, Yueqi Xie, Qifeng Chen

Reversible image conversion (RIC) aims to build a reversible transformation between specific visual content (e.g., short videos) and an embedding image, where the original content can be restored from the embedding when necessary.

Dual-Camera Super-Resolution with Aligned Attention Modules

2 code implementations ICCV 2021 Tengfei Wang, Jiaxin Xie, Wenxiu Sun, Qiong Yan, Qifeng Chen

We present a novel approach to reference-based super-resolution (RefSR) with the focus on dual-camera super-resolution (DCSR), which utilizes reference images for high-quality and high-fidelity results.

Domain Adaptation Reference-based Super-Resolution

Embedding Novel Views in a Single JPEG Image

1 code implementation ICCV 2021 Yue Wu, Guotao Meng, Qifeng Chen

We propose a novel approach for embedding novel views in a single JPEG image while preserving the perceptual fidelity of the modified JPEG image and the restored novel views.

Novel View Synthesis

Towards Photorealistic Colorization by Imagination

no code implementations 20 Aug 2021 Chenyang Lei, Yue Wu, Qifeng Chen

We present a novel approach to automatic image colorization by imitating the imagination process of human experts.

Colorization Image Colorization +1

Joint Depth and Normal Estimation from Real-world Time-of-flight Raw Data

no code implementations 8 Aug 2021 Rongrong Gao, Na Fan, Changlin Li, Wentao Liu, Qifeng Chen

We present a novel approach to joint depth and normal estimation for time-of-flight (ToF) sensors.

Enhanced Invertible Encoding for Learned Image Compression

1 code implementation 8 Aug 2021 Yueqi Xie, Ka Leong Cheng, Qifeng Chen

Although deep learning based image compression methods have achieved promising progress these days, the performance of these methods still cannot match the latest compression standard Versatile Video Coding (VVC).

Image Compression

A Categorized Reflection Removal Dataset with Diverse Real-world Scenes

no code implementations 7 Aug 2021 Chenyang Lei, Xuhua Huang, Chenyang Qi, Yankun Zhao, Wenxiu Sun, Qiong Yan, Qifeng Chen

Due to the lack of a large-scale reflection removal dataset with diverse real-world scenes, many existing reflection removal methods are trained on synthetic data plus a small amount of real-world data, which makes it difficult to evaluate the strengths or weaknesses of different reflection removal methods thoroughly.

Reflection Removal

Stereo Waterdrop Removal with Row-wise Dilated Attention

1 code implementation 7 Aug 2021 Zifan Shi, Na Fan, Dit-yan Yeung, Qifeng Chen

Thus, we propose a learning-based model for waterdrop removal with stereo images.

Autonomous Driving

Unsupervised Portrait Shadow Removal via Generative Priors

1 code implementation 7 Aug 2021 Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen

Qualitative and quantitative experiments on a real-world portrait shadow dataset demonstrate that our approach achieves comparable performance with supervised shadow removal methods.

Shadow Removal Unsupervised Semantic Segmentation

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion

no code implementations 5 Aug 2021 Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen

We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation.

Depth Estimation Stereo Matching

Internal Video Inpainting by Implicit Long-range Propagation

1 code implementation ICCV 2021 Hao Ouyang, Tengfei Wang, Qifeng Chen

We propose a novel framework for video inpainting by adopting an internal learning strategy.

4k Object +2

Video Super-Resolution with Long-Term Self-Exemplars

no code implementations 24 Jun 2021 Guotao Meng, Yue Wu, Sijin Li, Qifeng Chen

Existing video super-resolution methods often utilize a few neighboring frames to generate a higher-resolution image for each frame.

Video Super-Resolution

SinIR: Efficient General Image Manipulation with Single Image Reconstruction

1 code implementation 14 Jun 2021 Jihyeong Yoo, Qifeng Chen

We train our model on a single image with cascaded multi-scale learning, where each network at each scale is responsible for image reconstruction.

Denoising Image Manipulation +3

Low-Rank Subspaces in GANs

1 code implementation NeurIPS 2021 Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, ZhengJun Zha, Jingren Zhou, Qifeng Chen

Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces.

Attribute Generative Adversarial Network
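
The abstract already gives the recipe: relate a pixel region to the latent code through the Jacobian, then apply a low-rank factorization to obtain steerable subspaces. A minimal sketch of that computation, assuming a differentiable generator that maps a latent vector to an image (the SVD here stands in for the paper's low-rank factorization):

```python
import torch
from torch.autograd.functional import jacobian

def region_directions(generator, z: torch.Tensor, region_mask: torch.Tensor,
                      rank: int = 5) -> torch.Tensor:
    """Find latent directions that mainly affect a chosen image region.

    generator:   callable mapping a latent (1, d) to an image (1, C, H, W).
    z:           latent code at which to linearize, shape (1, d).
    region_mask: boolean mask (C, H, W) selecting the region of interest.
    Returns:     (rank, d) candidate steerable directions.
    """
    def region_pixels(latent):
        img = generator(latent)        # (1, C, H, W)
        return img[0][region_mask]     # flattened pixels of the region

    J = jacobian(region_pixels, z)     # (num_pixels, 1, d)
    J = J.reshape(J.shape[0], -1)      # (num_pixels, d)
    # Top right singular vectors span a low-rank subspace whose perturbations
    # change the selected region most; the remaining ones barely touch it.
    _, _, Vh = torch.linalg.svd(J, full_matrices=False)
    return Vh[:rank]                   # move z along these rows to edit the region
```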

Image Inpainting with External-internal Learning and Monochromic Bottleneck

1 code implementation CVPR 2021 Tengfei Wang, Hao Ouyang, Qifeng Chen

Although recent inpainting approaches have demonstrated significant improvements with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions.

Image Inpainting

Neural Camera Simulators

1 code implementation CVPR 2021 Hao Ouyang, Zifan Shi, Chenyang Lei, Ka Lung Law, Qifeng Chen

To facilitate the learning of a simulator model, we collect a dataset of 10,000 raw images of 450 scenes with different exposure settings.

Data Augmentation

Invertible Image Signal Processing

1 code implementation CVPR 2021 Yazhou Xing, Zian Qian, Qifeng Chen

Unprocessed RAW data is a highly valuable image format for image editing and computer vision.

Robust Reflection Removal with Reflection-free Flash-only Cues

1 code implementation CVPR 2021 Chenyang Lei, Qifeng Chen

The flash-only image is equivalent to an image taken in a dark environment with only a flash on.

Reflection Removal SSIM

TPCN: Temporal Point Cloud Networks for Motion Forecasting

no code implementations CVPR 2021 Maosheng Ye, Tongyi Cao, Qifeng Chen

We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction.

Motion Forecasting Trajectory Prediction

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

3 code implementations CVPR 2021 Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, Jian Sun

Moreover, at the output representation stage, we design a simple but effective 3D keypoint selection algorithm considering the texture and geometry information of objects, which simplifies keypoint localization for precise pose estimation.

6D Pose Estimation Representation Learning

Robust Federated Learning with Attack-Adaptive Aggregation

1 code implementation 10 Feb 2021 Ching Pui Wan, Qifeng Chen

To the best of our knowledge, our aggregation strategy is the first one that can be adapted to defend against various attacks in a data-driven fashion.

Federated Learning Model Poisoning

Video Deblurring by Fitting to Test Data

1 code implementation 9 Dec 2020 Xuanchi Ren, Zian Qian, Qifeng Chen

Our key observation is that some frames in a video with motion blur are much sharper than others, and thus we can transfer the texture information in those sharp frames to blurry frames.

Autonomous Vehicles Deblurring
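
The key observation quoted above is that some frames in a blurry video are much sharper than others. A simple way to pick such frames is to score sharpness with the variance of the Laplacian; this is a common heuristic used here purely for illustration, not necessarily the paper's selection criterion.

```python
import cv2
import numpy as np

def pick_sharp_frames(frames, top_k: int = 5):
    """Rank video frames by sharpness and return the indices of the sharpest ones.

    frames: list of HxWx3 uint8 BGR frames (e.g., read with cv2.VideoCapture).
    """
    scores = []
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        # Variance of the Laplacian: high-frequency content as a proxy for sharpness.
        scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())
    order = np.argsort(scores)[::-1]
    return order[:top_k].tolist()
```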

Evaluating adversarial robustness in simulated cerebellum

no code implementations 5 Dec 2020 Liu Yuezhang, Bo Li, Qifeng Chen

It is well known that artificial neural networks are vulnerable to adversarial examples, and great efforts have been made to improve their robustness.

Adversarial Robustness

Blind Video Temporal Consistency via Deep Video Prior

2 code implementations NeurIPS 2020 Chenyang Lei, Yazhou Xing, Qifeng Chen

Extensive quantitative and perceptual experiments show that our approach obtains superior performance than state-of-the-art methods on blind video temporal consistency.

Colorization Image Dehazing +4
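
Deep Video Prior boils down to a per-video training recipe: fit a small image-to-image network, on the test video alone, to map each original frame to its (flickering) per-frame processed result, and rely on the network's inductive bias for temporal consistency. The sketch below shows that fitting loop under the assumption that `net` is any convolutional image-to-image model; it is an illustration of the idea, not the authors' exact training schedule.

```python
import torch
import torch.nn.functional as F

def fit_deep_video_prior(net, originals, processed, iters: int = 2000, lr: float = 1e-4):
    """Per-video fitting. originals/processed are (T, C, H, W) tensors holding the
    input frames and the temporally inconsistent per-frame results of some image
    algorithm. After fitting, net(originals) gives a more consistent version."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    T = originals.shape[0]
    for _ in range(iters):
        i = torch.randint(0, T, (1,)).item()   # sample one frame per step
        pred = net(originals[i:i + 1])
        loss = F.l1_loss(pred, processed[i:i + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return torch.cat([net(originals[t:t + 1]) for t in range(T)], dim=0)
```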

Self-supervised Object Tracking with Cycle-consistent Siamese Networks

1 code implementation 3 Aug 2020 Weihao Yuan, Michael Yu Wang, Qifeng Chen

Self-supervised learning for visual object tracking possesses valuable advantages compared to supervised learning, such as the non-necessity of laborious human annotations and online training.

Object Region Proposal +5

Fully Convolutional Networks for Continuous Sign Language Recognition

no code implementations ECCV 2020 Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai

Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences.

Sentence Sign Language Recognition

PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer

1 code implementation ECCV 2020 Duo Li, Anbang Yao, Qifeng Chen

Despite their strong modeling capacities, Convolutional Neural Networks (CNNs) are often scale-sensitive.

Representation Learning
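
PSConv squeezes a feature pyramid into one layer by mixing several dilation rates among the filters of a single convolution. The sketch below approximates this by splitting the output channels across parallel dilated branches and concatenating them; the paper's exact channel-wise and group-wise arrangement differs, so treat this as an illustrative stand-in.

```python
import torch
import torch.nn as nn

class PolyScaleConv(nn.Module):
    """Approximate poly-scale convolution: output channels are split across
    branches that share the kernel size but use different dilation rates."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert out_ch % len(dilations) == 0
        split = out_ch // len(dilations)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, split, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different receptive field; concatenating them packs
        # a multi-scale response into a single layer's output.
        return torch.cat([b(x) for b in self.branches], dim=1)
```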

Learning to Learn Parameterized Classification Networks for Scalable Input Images

1 code implementation ECCV 2020 Duo Li, Anbang Yao, Qifeng Chen

To achieve efficient and flexible image classification at runtime, we employ meta learners to generate convolutional weights of main networks for various input scales and maintain privatized Batch Normalization layers per scale.

Classification General Classification +2

Deep Reinforced Attention Learning for Quality-Aware Visual Recognition

no code implementations ECCV 2020 Duo Li, Qifeng Chen

In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks and disclose the effectiveness of attention modules more straightforwardly to fully exploit their potential.

Depth Sensing Beyond LiDAR Range

no code implementations CVPR 2020 Kai Zhang, Jiaxin Xie, Noah Snavely, Qifeng Chen

Depth sensing is a critical component of autonomous driving technologies, but today's LiDAR- or stereo camera-based solutions have limited range.

Autonomous Driving

Future Video Synthesis with Object Motion Prediction

1 code implementation CVPR 2020 Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen

We present an approach to predict future video frames given a sequence of continuous video frames in the past.

Ranked #2 on Video Prediction on Cityscapes (using extra training data)

motion prediction Object +1

PiP: Planning-informed Trajectory Prediction for Autonomous Driving

1 code implementation ECCV 2020 Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, Qifeng Chen

Moreover, our approach enables a novel pipeline which couples the prediction and planning, by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.

Autonomous Driving Future prediction +1

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

1 code implementation CVPR 2020 Duo Li, Qifeng Chen

While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks with a significant margin, the traditional way of appending supervision only over the final classifier and progressively propagating gradient flow upstream remains the training mainstay.

Active Perception with A Monocular Camera for Multiscopic Vision

1 code implementation 22 Jan 2020 Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen

We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation for robotic applications.

Depth Estimation Stereo Matching

Video Depth Estimation by Fusing Flow-to-Depth Proposals

1 code implementation 30 Dec 2019 Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen

Our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module.

Depth Estimation Optical Flow Estimation

Attack-Resistant Federated Learning with Residual-based Reweighting

2 code implementations 24 Dec 2019 Shuhao Fu, Chulin Xie, Bo Li, Qifeng Chen

Federated learning has a variety of applications in multiple domains by utilizing private training data stored on different devices.

Federated Learning regression

Music-oriented Dance Video Synthesis with Pose Perceptual Loss

1 code implementation 13 Dec 2019 Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen

We present a learning-based approach with pose perceptual loss for automatic music video generation.

Video Generation

Fully Automatic Video Colorization with Self-Regularization and Diversity

4 code implementations CVPR 2019 Chenyang Lei, Qifeng Chen

We present a fully automatic approach to video colorization with self-regularization and diversity.

Colorization

Zoom To Learn, Learn To Zoom

1 code implementation 13 May 2019 Xuaner Cecilia Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun

We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom.

Super-Resolution

Speech Denoising with Deep Feature Losses

5 code implementations 27 Jun 2018 Francois G. Germain, Qifeng Chen, Vladlen Koltun

We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.

Audio Tagging Denoising +1
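
The deep feature loss referenced above compares denoised and clean waveforms through the activations of a separately trained audio network rather than sample-by-sample. A minimal sketch, assuming `feature_net` returns a list of intermediate activations (the paper's loss network is a specific audio-classification CNN, which is not provided here):

```python
import torch
import torch.nn.functional as F

def deep_feature_loss(feature_net, denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """L1 distance between intermediate activations of a frozen audio network.

    feature_net: callable mapping a waveform batch (B, 1, N) to a list of
                 feature tensors from several layers (assumed interface).
    """
    with torch.no_grad():
        target_feats = feature_net(clean)
    pred_feats = feature_net(denoised)
    loss = torch.zeros((), device=denoised.device)
    for p, t in zip(pred_feats, target_feats):
        # Normalizing by each layer's scale keeps deep layers from dominating.
        loss = loss + F.l1_loss(p, t) / (t.abs().mean() + 1e-8)
    return loss
```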

Single Image Reflection Separation with Perceptual Losses

3 code implementations CVPR 2018 Xuaner Zhang, Ren Ng, Qifeng Chen

Our loss function includes two perceptual losses: a feature loss from a visual perception network, and an adversarial loss that encodes characteristics of images in the transmission layers.

Image Enhancement Reflection Removal +1

Fast Image Processing with Fully-Convolutional Networks

2 code implementations ICCV 2017 Qifeng Chen, Jia Xu, Vladlen Koltun

Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action.

Style Transfer
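
The approach described above reduces to standard supervised regression: generate (input, operator(input)) pairs with the slow reference operator, then train a fully-convolutional network to reproduce it. The sketch below shows that training loop with the paper's dilated context aggregation network replaced by an arbitrary image-to-image model for brevity; the exact loss and schedule may differ from the paper's.

```python
import torch
import torch.nn.functional as F

def train_operator_approximator(net, reference_operator, image_loader,
                                epochs: int = 10, lr: float = 1e-4):
    """Fit a fully-convolutional net to mimic a (slow) image operator.

    reference_operator: function producing target images from inputs.
    image_loader:       yields batches of input images (B, C, H, W) in [0, 1].
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for x in image_loader:
            with torch.no_grad():
                y = reference_operator(x)  # ground truth from the original operator
            # Pixel-wise regression toward the operator's output.
            loss = F.mse_loss(net(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```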

Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids

no code implementations CVPR 2016 Qifeng Chen, Vladlen Koltun

The approach optimizes a classical optical flow objective over the full space of mappings between discrete grids.

Optical Flow Estimation

1-HKUST: Object Detection in ILSVRC 2014

no code implementations 22 Sep 2014 Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang

We participated in the object detection track of ILSVRC 2014 and placed fourth among the 38 teams.

Object object-detection +3

Fast MRF Optimization with Application to Depth Reconstruction

no code implementations CVPR 2014 Qifeng Chen, Vladlen Koltun

We describe a simple and fast algorithm for optimizing Markov random fields over images.
