Search Results for author: Lu Yuan

Found 127 papers, 78 papers with code

Efficient Modulation for Vision Networks

1 code implementation • 29 Mar 2024 • Xu Ma, Xiyang Dai, Jianwei Yang, Bin Xiao, Yinpeng Chen, Yun Fu, Lu Yuan

We demonstrate that the modulation mechanism is particularly well suited for efficient networks and further tailor the modulation design by proposing the efficient modulation (EfficientMod) block, which is considered the essential building block for our networks.

Paper
Code

OmniVid: A Generative Framework for Universal Video Understanding

1 code implementation • 26 Mar 2024 • Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Action Recognition Dense Video Captioning +4

Paper
Code

Generative Enhancement for 3D Medical Images

1 code implementation • 19 Mar 2024 • Lingting Zhu, Noel Codella, Dongdong Chen, Zhenchao Jin, Lu Yuan, Lequan Yu

Our method begins with a 2D slice, noted as the informed slice to serve the patient prior, and propagates the generation process using a 3D segmentation mask.

counterfactual Image Generation

Paper
Code

Block and Detail: Scaffolding Sketch-to-Image Generation

no code implementations • 28 Feb 2024 • Vishnu Sarukkai, Lu Yuan, Mia Tang, Maneesh Agrawala, Kayvon Fatahalian

Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes.

Blocking Image Generation

Paper
Add Code

Knowledge Graph Driven UAV Cognitive Semantic Communication Systems for Efficient Object Detection

no code implementations • 25 Jan 2024 • Xi Song, Lu Yuan, Zhibo Qu, Fuhui Zhou, Qihui Wu, Tony Q. S. Quek, Rose Qingyang Hu

Unmanned aerial vehicles (UAVs) are widely used for object detection.

Object object-detection +1

Paper
Add Code

iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

1 code implementation • 28 Dec 2023 • Chin-Hsuan Wu, Yen-Chun Chen, Bolivar Solarte, Lu Yuan, Min Sun

Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views.

3D Object Reconstruction Novel View Synthesis +2

Paper
Code

Learning Subject-Aware Cropping by Outpainting Professional Photos

no code implementations • 19 Dec 2023 • James Hong, Lu Yuan, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian

How to frame (or crop) a photo often depends on the image subject and its context; e. g., a human portrait.

Image Cropping

Paper
Add Code

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Paper
Code

Fully Authentic Visual Question Answering Dataset from Online Communities

no code implementations • 27 Nov 2023 • Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari

Visual Question Answering (VQA) entails answering questions about images.

Question Answering Visual Question Answering

Paper
Add Code

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

no code implementations • 10 Nov 2023 • Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Multi-Task Learning object-detection +1

Paper
Add Code

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

no code implementations • 8 Nov 2023 • Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).

Person Re-Identification

Paper
Add Code

On the Hidden Waves of Image

no code implementations • 19 Oct 2023 • Yinpeng Chen, Dongdong Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

We term this phenomenon hidden waves, as it reveals that, although the speeds of the set of wave equations and autoregressive coefficient matrices are latent, they are both learnable and shared across images.

Paper
Add Code

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

1 code implementation • 18 Oct 2023 • Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang

Additional analysis shows that the contrastive objective and meta-actions are complementary in achieving the best results, and the resulting agent better aligns its states with corresponding instructions, making it more suitable for real-world embodied agents.

Contrastive Learning Instruction Following

Paper
Code

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi, Chen, Xinggang Wang, Hongyang Chao, Han Hu

In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.

1,557

Paper
Code

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

1 code implementation • ICCV 2023 • Qidong Huang, Xiaoyi Dong, Dongdong Chen, Yinpeng Chen, Lu Yuan, Gang Hua, Weiming Zhang, Nenghai Yu

Based on our analysis, we provide a simple yet effective way to boost the adversarial robustness of MAE.

Adversarial Robustness

Paper
Code

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50, 000 high-quality images with rich texture details and semantic diversity.

Denoising Image Restoration +2

Paper
Code

Designing a Better Asymmetric VQGAN for StableDiffusion

2 code implementations • 7 Jun 2023 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.

Image Inpainting

956

Paper
Code

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation • NeurIPS 2023 • Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.

499

Paper
Code

Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical Invariance

no code implementations • 25 May 2023 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

This paper introduces a novel mathematical property applicable to diverse images, referred to as FINOLA (First-Order Norm+Linear Autoregressive).

Image Classification Image Reconstruction +3

Paper
Add Code

i-Code Studio: A Configurable and Composable Framework for Integrative AI

no code implementations • 23 May 2023 • Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, ZiYi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities.

Question Answering Retrieval +4

Paper
Add Code

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations • 22 May 2023 • Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Paper
Add Code

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.

Paper
Add Code

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

no code implementations • 27 Apr 2023 • Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios.

Video Understanding

Paper
Add Code

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

no code implementations • 21 Mar 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang

Object tracking (OT) aims to estimate the positions of target objects in a video sequence.

Object Object Tracking

Paper
Add Code

Cognitive Semantic Communication Systems Driven by Knowledge Graph: Principle, Implementation, and Performance Evaluation

no code implementations • 15 Mar 2023 • Fuhui Zhou, Yihao Li, Ming Xu, Lu Yuan, Qihui Wu, Rose Qingyang Hu, Naofal Al-Dhahir

Extensive simulation results conducted on a public dataset demonstrate that our proposed single-user and multi-user cognitive semantic communication systems are superior to benchmark communication systems in terms of the data compression rate and communication reliability.

Data Compression

Paper
Add Code

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

1 code implementation • 27 Feb 2023 • Ziyu Jiang, Yinpeng Chen, Mengchen Liu, Dongdong Chen, Xiyang Dai, Lu Yuan, Zicheng Liu, Zhangyang Wang

This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer.

Contrastive Learning Few-Shot Learning

Paper
Code

LC-NeRF: Local Controllable Face Generation in Neural Randiance Field

no code implementations • 19 Feb 2023 • Wenyang Zhou, Lu Yuan, ShuYu Chen, Lin Gao, Shimin Hu

Since changes to the latent code affect global generation results, these methods do not allow for fine-grained control of local facial regions.

Face Generation

Paper
Add Code

Generalized Decoding for Pixel, Image, and Language

1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.

Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)

Image Segmentation Panoptic Segmentation +3

1,243

Paper
Code

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)

Instance Segmentation Segmentation +3

Paper
Add Code

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

1 code implementation • 12 Dec 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory.

185

Paper
Code

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

4 code implementations • CVPR 2023 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Ranked #1 on Self-Supervised Action Recognition on HMDB51

Action Classification Representation Learning +1

Paper
Code

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

1 code implementation • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.

Ranked #7 on Instance Segmentation on LVIS v1.0 val

Data Augmentation Instance Segmentation +5

Paper
Code

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

1 code implementation • CVPR 2023 • Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu, Jing Liao

Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective.

Data Augmentation Retrieval

Paper
Code

Self-Supervised Learning based on Heat Equation

no code implementations • 23 Nov 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

When transferring to object detection with frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7. 9 and 4. 5 AP respectively.

Image Classification object-detection +2

Paper
Add Code

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation • 22 Nov 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.

Ranked #1 on Image Generation on Places50

Denoising Image Generation +1

271

Paper
Code

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

Paper
Add Code

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

Diffusion models (DMs) have shown great potential for high-quality image synthesis.

Conditional Image Generation Denoising +1

111

Paper
Code

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

no code implementations • CVPR 2023 • Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Second, masked self-distillation is also consistent with vision-language contrastive from the perspective of training objective as both utilize the visual encoder for feature aligning, and thus is able to learn local semantics getting indirect supervision from the language.

Representation Learning

Paper
Add Code

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

no code implementations • 25 Aug 2022 • Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang

To avoid significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e. g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism.

Video Recognition

Paper
Add Code

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

1 code implementation • 26 Jul 2022 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Paper
Code

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

It achieves a top-1 accuracy of 84. 8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4. 2 times fewer parameters.

Ranked #133 on Image Classification on ImageNet

Image Classification Knowledge Distillation

29,624

Paper
Code

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

1 code implementation • 14 Jul 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance.

Ranked #19 on Self-Supervised Image Classification on ImageNet (finetuned)

Object Detection Self-Supervised Image Classification +1

Paper
Code

Should All Proposals be Treated Equally in Object Detection?

1 code implementation • 7 Jul 2022 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos

We formulate this as a learning problem where the goal is to assign operators to proposals, in the detection head, so that the total computational cost is constrained and the precision is maximized.

Object Object Detection

Paper
Code

Semantic Image Synthesis via Diffusion Models

3 code implementations • 30 Jun 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Denoising Image Generation

191

Paper
Code

GLIPv2: Unifying Localization and Vision-Language Understanding

1 code implementation • 12 Jun 2022 • Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e. g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e. g., VQA, image captioning).

Ranked #1 on Phrase Grounding on Flickr30k Entities Test (using extra training data)

Contrastive Learning Image Captioning +7

1,936

Paper
Code

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on UODB benchmark with wide variety of datasets.

Object object-detection +1

Paper
Add Code

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

no code implementations • 3 Jun 2022 • Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e. g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.

Image Paragraph Captioning Language Modelling +1

Paper
Add Code

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

1 code implementation • 2 Jun 2022 • Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding window manner for retrieving knowledge, and the important relationship within/among object regions is neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent.

Ranked #11 on Visual Question Answering (VQA) on OK-VQA

Question Answering Retrieval +1

Paper
Code

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

1 code implementation • CVPR 2022 • Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of transformer.

Ranked #6 on Seeing Beyond the Visible on KITTI360-EX

Image Inpainting Quantization +1

147

Paper
Code

i-Code: An Integrative and Composable Multimodal Learning Framework

no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.

Contrastive Learning Video Understanding

Paper
Add Code

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

no code implementations • 22 Apr 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.

Ranked #4 on Visual Question Answering (VQA) on VCR (Q-A) test

Question Answering Visual Commonsense Reasoning +2

Paper
Add Code

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

367

Paper
Code

Residual Mixture of Experts

no code implementations • 20 Apr 2022 • Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan

In this paper, we propose Residual Mixture of Experts (RMoE), an efficient training pipeline for MoE vision transformers on downstream tasks, such as segmentation and detection.

object-detection Object Detection

Paper
Add Code

MiniViT: Compressing Vision Transformers with Weight Multiplexing

2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.

Ranked #209 on Image Classification on ImageNet (using extra training data)

Image Classification

1,555

Paper
Code

DaViT: Dual Attention Vision Transformers

3 code implementations • 7 Apr 2022 • Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Ranked #1 on Instance Segmentation on Object Detection on COCO minival

Computational Efficiency Image Classification +4

29,624

Paper
Code

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation • CVPR 2022 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Particularly, it attains gains up to 9. 2% and 14. 5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

367

Paper
Code

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since theses ID labels automatically derived from tracklets inevitably contain noises, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Ranked #7 on Person Re-Identification on CUHK03

Contrastive Learning Multi-Object Tracking +3

214

Paper
Code

Focal Modulation Networks

6 code implementations • 22 Mar 2022 • Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao

For semantic segmentation with UPerNet, FocalNet base at single-scale outperforms Swin by 2. 4, and beats Swin at multi-scale (50. 5 v. s.

Ranked #8 on Object Detection on COCO minival (using extra training data)

Image Classification Object Detection +2

12,012

Paper
Code

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

no code implementations • 15 Jan 2022 • Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51. 9%) and domain-shifted (up to 71. 3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-art performance on VCR compared to other single models that are pretrained with image-text data only.

Question Answering Visual Commonsense Reasoning +2

Paper
Add Code

Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation

no code implementations • 4 Jan 2022 • Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu

In addition, such practice of re-identification still can not track those highly occluded objects when they are missed by the detector.

Ranked #7 on Multi-Object Tracking on MOT16 (using extra training data)

Multi-Object Tracking Object +2

Paper
Add Code

RegionCLIP: Region-based Language-Image Pretraining

1 code implementation • CVPR 2022 • Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Ranked #11 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Image Classification Object +3

643

Paper
Code

HairCLIP: Design Your Hair by Text and Reference Image

1 code implementation • CVPR 2022 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu

Hair editing is an interesting and challenging problem in computer vision and graphics.

Attribute

485

Paper
Code

Grounded Language-Image Pre-training

2 code implementations • CVPR 2022 • Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Ranked #1 on 2D Object Detection on RF100

Described Object Detection Few-Shot Object Detection +1

1,936

Paper
Code

General Facial Representation Learning in a Visual-Linguistic Manner

2 code implementations • CVPR 2022 • Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.

Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)

Face Alignment Face Parsing +1

329

Paper
Code

BEVT: BERT Pretraining of Video Transformers

1 code implementation • CVPR 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are often times computationally-intensive if trained from scratch; 2) discriminative clues, i. e., spatial and temporal information, needed to make correct predictions vary among different videos due to large intra-class and inter-class variations.

Ranked #8 on Action Recognition on Diving-48

Action Recognition Representation Learning

153

Paper
Code

Focal Attention for Long-Range Interactions in Vision Transformers

1 code implementation • NeurIPS 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification object-detection +2

542

Paper
Code

Vector Quantized Diffusion Model for Text-to-Image Synthesis

2 code implementations • CVPR 2022 • Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.

Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Denoising Text-to-Image Generation

832

Paper
Code

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

1 code implementation • 24 Nov 2021 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

This paper explores a better prediction target for BERT pre-training of vision transformers.

Ranked #4 on Self-Supervised Image Classification on ImageNet (finetuned)

object-detection Object Detection +2

153

Paper
Code

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

367

Paper
Code

An Empirical Study of Training End-to-End Vision-and-Language Transformers

2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.

Ranked #20 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Cross-Modal Retrieval Visual Question Answering (VQA) +1

349

Paper
Code

Unsupervised Finetuning

no code implementations • 18 Oct 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly for unsupervised learning, leading to the damage of the pretrained representation and poor representation in the target domain.

Paper
Add Code

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations • 29 Sep 2021 • Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Paper
Add Code

Automatic Modulation Classification Using Involution Enabled Residual Networks

no code implementations • 23 Aug 2021 • Hao Zhang, Lu Yuan, Guangyu Wu, Fuhui Zhou, Qihui Wu

Automatic modulation classification (AMC) is of crucial importance for realizing wireless intelligence communications.

Classification

Paper
Add Code

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e. g. 5M FLOPs on ImageNet classification).

328

Paper
Code

Mobile-Former: Bridging MobileNet and Transformer

4 code implementations • CVPR 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu

This structure leverages the advantages of MobileNet at local processing and transformer at global interaction.

object-detection Object Detection

1,183

Paper
Code

Improve Unsupervised Pretraining for Few-label Transfer

no code implementations • ICCV 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

Unsupervised pretraining has achieved great success and many recent works have shown unsupervised pretraining can achieve comparable or even slightly better transfer performance than supervised pretraining on downstream target datasets.

Clustering Contrastive Learning

Paper
Add Code

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

6 code implementations • CVPR 2022 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo

By further pretraining on the larger dataset ImageNet-21K, we achieve 87. 5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55. 7 mIoU.

Ranked #25 on Semantic Segmentation on ADE20K val

Image Classification Semantic Segmentation

5,245

Paper
Code

Focal Self-attention for Local-Global Interactions in Vision Transformers

3 code implementations • 1 Jul 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.

Ranked #17 on Instance Segmentation on COCO test-dev

Image Classification Instance Segmentation +3

1,183

Paper
Code

Efficient Self-supervised Vision Transformers for Representation Learning

1 code implementation • ICLR 2022 • Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning.

Ranked #16 on Self-Supervised Image Classification on ImageNet

Representation Learning Self-Supervised Image Classification

405

Paper
Code

Dynamic Head: Unifying Object Detection Heads with Attentions

3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang

In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.

Ranked #2 on Object Detection on COCO 2017 val (AP75 metric)

Object object-detection +1

27,678

Paper
Code

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

1 code implementation • NeurIPS 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0. 28% top-1 accuracy, and meanwhile enjoys 49. 32% FLOPs and 4. 40% running time savings.

Ranked #20 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

Paper
Code

E2Style: Improve the Efficiency and Effectiveness of StyleGAN Inversion

2 code implementations • 15 Apr 2021 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Lu Yuan, Gang Hua, Nenghai Yu

This paper studies the problem of StyleGAN inversion, which plays an essential role in enabling the pretrained StyleGAN to be used for real image editing tasks.

Face Parsing

142

Paper
Code

Lite-HRNet: A Lightweight High-Resolution Network

15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.

Ranked #33 on Pose Estimation on COCO test-dev

Pose Estimation Real-Time Semantic Segmentation +1

12,012

Paper
Code

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations • ICCV 2021 • Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture Multi-Scale Vision Longformer, which significantly enhances the ViT of \cite{dosovitskiy2020image} for encoding high-resolution images using two techniques.

Ranked #45 on Instance Segmentation on COCO minival

Image Classification Instance Segmentation +2

405

Paper
Code

CvT: Introducing Convolutions to Vision Transformers

14 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #3 on Image Classification on Flowers-102 (using extra training data)

Image Classification

124,353

Paper
Code

Dynamic Transfer for Multi-Source Domain Adaptation

1 code implementation • CVPR 2021 • Yunsheng Li, Lu Yuan, Yinpeng Chen, Pei Wang, Nuno Vasconcelos

However, such a static model is difficult to handle conflicts across multiple domains, and suffers from a performance degradation in both source domains and target domain.

Domain Adaptation

Paper
Code

Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction

129

Paper
Code

Stronger NAS with Weaker Predictors

1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search

Paper
Code

Dynamic DETR: End-to-End Object Detection With Dynamic Attention

no code implementations • ICCV 2021 • Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang

To mitigate the second limitation of learning difficulty, we introduce a dynamic decoder by replacing the cross-attention module with a ROI-based dynamic attention in the Transformer decoder.

object-detection Object Detection

Paper
Add Code

Weak NAS Predictor Is All You Need

no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

Neural Architecture Search

Paper
Add Code

Are Fewer Labels Possible for Few-shot Learning?

no code implementations • 10 Dec 2020 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Nenghai Yu

We conduct experiments on 10 different few-shot target datasets, and our average few-shot performance outperforms both vanilla inductive unsupervised transfer and supervised transfer by a large margin.

Clustering Few-Shot Learning

Paper
Add Code

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

1 code implementation • 8 Dec 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Spatially-adaptive normalization (SPADE) is remarkably successful recently in conditional semantic image synthesis \cite{park2019semantic}, which modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.

Image Generation

Paper
Code

Unsupervised Pre-training for Person Re-identification

1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

214

Paper
Code

MicroNet: Towards Image Recognition with Extremely Low FLOPs

no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e. g. 6 MFLOPs on ImageNet classification).

Paper
Add Code

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

289

Paper
Code

GreedyFool: Distortion-Aware Sparse Adversarial Attack

1 code implementation • NeurIPS 2020 • Xiaoyi Dong, Dongdong Chen, Jianmin Bao, Chuan Qin, Lu Yuan, Weiming Zhang, Nenghai Yu, Dong Chen

Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels.

Adversarial Attack

Paper
Code

LSM: Learning Subspace Minimization for Low-level Vision

no code implementations • CVPR 2020 • Chengzhou Tang, Lu Yuan, Ping Tan

We study the energy minimization problem in low-level vision tasks from a novel perspective.

Image Segmentation Optical Flow Estimation +5

Paper
Add Code

Cross-domain Correspondence Learning for Exemplar-based Image Translation

3 code implementations • CVPR 2020 • Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen

The output has the style (e. g., color, texture) in consistency with the semantically corresponding objects in the exemplar.

Ranked #1 on Image-to-Image Translation on ADE20K-Outdoor Labels-to-Photos (FID metric)

Image-to-Image Translation Translation

386

Paper
Code

Rethinking Spatially-Adaptive Normalization

no code implementations • 6 Apr 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu

Despite its impressive performance, a more thorough understanding of the true advantages inside the box is still highly demanded, to help reduce the significant computation and parameter overheads introduced by these new structures.

Image Generation

Paper
Add Code

Density-Aware Graph for Deep Semi-Supervised Visual Recognition

no code implementations • CVPR 2020 • Suichan Li, Bin Liu, Dong-Dong Chen, Qi Chu, Lu Yuan, Nenghai Yu

Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way.

Pseudo Label

Paper
Add Code

DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search

no code implementations • ECCV 2020 • Xiyang Dai, Dong-Dong Chen, Mengchen Liu, Yinpeng Chen, Lu Yuan

One common way is searching on a smaller proxy dataset (e. g., CIFAR-10) and then transferring to the target task (e. g., ImageNet).

Neural Architecture Search

Paper
Add Code

Dynamic ReLU

2 code implementations • ECCV 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Rectified linear units (ReLU) are commonly used in deep neural networks.

204

Paper
Code

Dynamic Convolution: Attention over Convolution Kernels

5 code implementations • CVPR 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.

Ranked #905 on Image Classification on ImageNet

Image Classification Keypoint Detection

10,776

Paper
Code

A General Decoupled Learning Framework for Parameterized Image Operators

no code implementations • 11 Jul 2019 • Qingnan Fan, Dong-Dong Chen, Lu Yuan, Gang Hua, Nenghai Yu, Baoquan Chen

To overcome this limitation, we propose a new decoupled learning algorithm to learn from the operator parameters to dynamically adjust the weights of a deep network for image operators, denoted as the base network.

Paper
Add Code

Deep Exemplar-based Video Colorization

1 code implementation • CVPR 2019 • Bo Zhang, Mingming He, Jing Liao, Pedro V. Sander, Lu Yuan, Amine Bermak, Dong Chen

This paper presents the first end-to-end network for exemplar-based video colorization.

Colorization Semantic correspondence

330

Paper
Code

Face Parsing with RoI Tanh-Warping

2 code implementations • CVPR 2019 • Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan

It uses hierarchical local based method for inner facial components and global methods for outer facial components.

Face Parsing

373

Paper
Code

Mask-Guided Portrait Editing with Conditional GANs

no code implementations • CVPR 2019 • Shuyang Gu, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen, Lu Yuan

Portrait editing is a popular subject in photo manipulation.

Data Augmentation Face Generation +3

Paper
Add Code

Bidirectional Learning for Domain Adaptation of Semantic Segmentation

3 code implementations • CVPR 2019 • Yunsheng Li, Lu Yuan, Nuno Vasconcelos

In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation.

Ranked #7 on Semantic Segmentation on DADA-seg

Domain Adaptation Image Segmentation +5

222

Paper
Code

Rethinking Classification and Localization for Object Detection

2 code implementations • CVPR 2020 • Yue Wu, Yinpeng Chen, Lu Yuan, Zicheng Liu, Lijuan Wang, Hongzhi Li, Yun Fu

Two head structures (i. e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks.

Classification General Classification +3

27,678

Paper
Code

Gated Context Aggregation Network for Image Dehazing and Deraining

1 code implementation • 21 Nov 2018 • Dongdong Chen, Mingming He, Qingnan Fan, Jing Liao, Liheng Zhang, Dongdong Hou, Lu Yuan, Gang Hua

Image dehazing aims to recover the uncorrupted content from a hazy image.

Ranked #1 on Rain Removal on DID-MDN

Image Dehazing Rain Removal

219

Paper
Code

CariGANs: Unpaired Photo-to-Caricature Translation

no code implementations • 1 Nov 2018 • Kaidi Cao, Jing Liao, Lu Yuan

Facial caricature is an art form of drawing faces in an exaggerated way to convey humor or sarcasm.

Caricature Generative Adversarial Network +2

Paper
Add Code

Reinforced Pipeline Optimization: Behaving Optimally with Non-Differentiabilities

no code implementations • 27 Sep 2018 • Aijun Bai, Dongdong Chen, Gang Hua, Lu Yuan

Many machine learning systems are implemented as pipelines.

object-detection Object Detection

Paper
Add Code

Decouple Learning for Parameterized Image Operators

1 code implementation • ECCV 2018 • Qingnan Fan, Dong-Dong Chen, Lu Yuan, Gang Hua, Nenghai Yu, Baoquan Chen

Many different deep networks have been used to approximate, accelerate or improve traditional image operators, such as image smoothing, super-resolution and denoising.

Denoising image smoothing +1

Paper
Code

Deep Exemplar-based Colorization

1 code implementation • 17 Jul 2018 • Mingming He, Dong-Dong Chen, Jing Liao, Pedro V. Sander, Lu Yuan

More importantly, as opposed to other learning-based colorization methods, our network allows the user to achieve customizable results by simply feeding different references.

Colorization Image Retrieval +1

494

Paper
Code

Arbitrary Style Transfer with Deep Feature Reshuffle

1 code implementation • CVPR 2018 • Shuyang Gu, Congliang Chen, Jing Liao, Lu Yuan

We theoretically prove that our new style loss based on reshuffle connects both global and local style losses respectively used by most parametric and non-parametric neural style transfer methods.

Style Transfer

Paper
Code

Towards High Performance Video Object Detection for Mobiles

3 code implementations • 16 Apr 2018 • Xizhou Zhu, Jifeng Dai, Xingchi Zhu, Yichen Wei, Lu Yuan

In this paper, we present a light weight network architecture for video object detection on mobiles.

Object object-detection +2

Paper
Code

Stereoscopic Neural Style Transfer

no code implementations • CVPR 2018 • Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR.

Style Transfer

Paper
Add Code

Towards High Performance Video Object Detection

no code implementations • CVPR 2018 • Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei

There has been significant progresses for image object detection in recent years.

Object object-detection +2

Paper
Add Code

Progressive Color Transfer with Dense Semantic Correspondences

3 code implementations • 2 Oct 2017 • Mingming He, Jing Liao, Dong-Dong Chen, Lu Yuan, Pedro V. Sander

The proposed method can be successfully extended from one-to-one to one-to-many color transfer.

Paper
Code

Visual Attribute Transfer through Deep Image Analogy

5 code implementations • 2 May 2017 • Jing Liao, Yuan YAO, Lu Yuan, Gang Hua, Sing Bing Kang

We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure.

Attribute

1,367

Paper
Code

Flow-Guided Feature Aggregation for Video Object Detection

2 code implementations • ICCV 2017 • Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei

The accuracy of detection suffers from degenerated object appearances in videos, e. g., motion blur, video defocus, rare poses, etc.

Ranked #25 on Video Object Detection on ImageNet VID

Object object-detection +2

3,365

Paper
Code

Coherent Online Video Style Transfer

no code implementations • ICCV 2017 • Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua

Training a feed-forward network for fast neural style transfer of images is proven to be successful.

Image Stylization Video Style Transfer

Paper
Add Code

StyleBank: An Explicit Representation for Neural Image Style Transfer

1 code implementation • CVPR 2017 • Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

It also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed.

Incremental Learning Style Transfer

Paper
Code

Deep Feature Flow for Video Recognition

3 code implementations • CVPR 2017 • Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei

Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.

Ranked #9 on Video Semantic Segmentation on Cityscapes val

Video Recognition Video Semantic Segmentation

3,365

Paper
Code

Image Deblurring Using Smartphone Inertial Sensors

no code implementations • CVPR 2016 • Zhe Hu, Lu Yuan, Stephen Lin, Ming-Hsuan Yang

Removing image blur caused by camera shake is an ill-posed problem, as both the latent image and the point spread function (PSF) are unknown.

Deblurring Image Deblurring

Paper
Add Code

Dual-Feature Warping-Based Motion Model Estimation

no code implementations • ICCV 2015 • Shiwei Li, Lu Yuan, Jian Sun, Long Quan

Line segment is a prominent feature in artificial environments and it can supply sufficient geometrical and structural information of scenes, which not only helps guild to a correct warp in low-texture condition, but also prevents the undesired distortion induced by warping.

Image Stitching Video Stabilization

Paper
Add Code

SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization

no code implementations • CVPR 2014 • Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun

We propose a novel motion model, SteadyFlow, to represent the motion between neighboring video frames for stabilization.

Video Stabilization

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.