Search Results for author: Dongdong Chen

Found 114 papers, 69 papers with code

Dynamic Head: Unifying Object Detection Heads with Attentions

3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang

In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.

Ranked #2 on Object Detection on COCO 2017 val (AP75 metric)

Object object-detection +1

27,693

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

6 code implementations • CVPR 2022 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo

By further pretraining on the larger dataset ImageNet-21K, we achieve 87.5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55.7 mIoU.

Ranked #25 on Semantic Segmentation on ADE20K val

Image Classification Semantic Segmentation

5,246

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

2 code implementations • CVPR 2022 • Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Furthermore, we propose an inverse optimization method that accurately projects an input image to the latent codes for manipulation to enable editing on real images.

Novel View Synthesis

1,995

Mobile-Former: Bridging MobileNet and Transformer

4 code implementations • CVPR 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu

This structure leverages the advantages of MobileNet at local processing and transformer at global interaction.

Object Detection

1,183

Designing a Better Asymmetric VQGAN for StableDiffusion

2 code implementations • 7 Jun 2023 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.

Image Inpainting

956

Vector Quantized Diffusion Model for Text-to-Image Synthesis

2 code implementations • CVPR 2022 • Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.

Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Denoising Text-to-Image Generation

832

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation • NeurIPS 2023 • Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.

500

HairCLIP: Design Your Hair by Text and Reference Image

1 code implementation • CVPR 2022 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu

Hair editing is an interesting and challenging problem in computer vision and graphics.

Attribute

486

Bringing Old Films Back to Life

1 code implementation • CVPR 2022 • Ziyu Wan, Bo Zhang, Dongdong Chen, Jing Liao

We present a learning-based framework, recurrent transformer network (RTN), to restore heavily degraded old films.

Ranked #6 on Analog Video Restoration on TAPE

Analog Video Restoration

486

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

367

General Facial Representation Learning in a Visual-Linguistic Manner

2 code implementations • CVPR 2022 • Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.

Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)

Face Alignment Face Parsing +1

329

MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).

328

Equivariant Multi-Modality Image Fusion

2 code implementations • 19 May 2023 • Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc van Gool

These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior.

Self-Supervised Learning

311

High-Fidelity Pluralistic Image Completion with Transformers

4 code implementations • ICCV 2021 • Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao

Image completion has made tremendous progress with convolutional neural networks (CNNs), because of their powerful texture modeling capacity.

Ranked #6 on Image Inpainting on CelebA-HQ

Image Inpainting Vocal Bursts Intensity Prediction

296

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation • 30 Oct 2020 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

289

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation • 22 Nov 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, leveraging denoising diffusion models to capture the internal distribution of patches from a single natural image.

Ranked #1 on Image Generation on Places50

Denoising Image Generation +1

271

Multi-attentional Deepfake Detection

1 code implementation • CVPR 2021 • Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, Nenghai Yu

Most of them model deepfake detection as a vanilla binary classification problem, i.e., first use a backbone network to extract a global feature and then feed it into a binary classifier (real/fake).

Binary Classification Data Augmentation +2

226

Gated Context Aggregation Network for Image Dehazing and Deraining

1 code implementation • 21 Nov 2018 • Dongdong Chen, Mingming He, Qingnan Fan, Jing Liao, Liheng Zhang, Dongdong Hou, Lu Yuan, Gang Hua

Image dehazing aims to recover the uncorrupted content from a hazy image.

Ranked #1 on Rain Removal on DID-MDN

Image Dehazing Rain Removal

219

Unsupervised Pre-training for Person Re-identification

1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large-scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt at performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

214

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since these ID labels automatically derived from tracklets inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Ranked #7 on Person Re-Identification on CUHK03

Contrastive Learning Multi-Object Tracking +3

214

Semantic Image Synthesis via Diffusion Models

3 code implementations • 30 Jun 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Denoising Image Generation

191

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

1 code implementation • 12 Dec 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory.

186

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

1 code implementation • ICCV 2023 • Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations for creating a 3D human avatar with a specific identity and artistic style that can be easily animated.

175

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

1 code implementation • 24 Nov 2021 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

This paper explores a better prediction target for BERT pre-training of vision transformers.

Ranked #4 on Self-Supervised Image Classification on ImageNet (finetuned)

Object Detection +2

153

BEVT: BERT Pretraining of Video Transformers

1 code implementation • CVPR 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are oftentimes computationally intensive if trained from scratch; 2) discriminative clues, i.e., spatial and temporal information, needed to make correct predictions vary among different videos due to large intra-class and inter-class variations.

Ranked #8 on Action Recognition on Diving-48

Action Recognition Representation Learning

153

NeRF-Art: Text-Driven Neural Radiance Fields Stylization

1 code implementation • 15 Dec 2022 • Can Wang, Ruixiang Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images.

Contrastive Learning Novel View Synthesis

150

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

1 code implementation • CVPR 2022 • Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.

Ranked #6 on Seeing Beyond the Visible on KITTI360-EX

Image Inpainting Quantization +1

147

Transformer based Pluralistic Image Completion with Reduced Information Loss

1 code implementation • 31 Mar 2024 • Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.

Image Inpainting Quantization

147

E2Style: Improve the Efficiency and Effectiveness of StyleGAN Inversion

2 code implementations • 15 Apr 2021 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Lu Yuan, Gang Hua, Nenghai Yu

This paper studies the problem of StyleGAN inversion, which plays an essential role in enabling the pretrained StyleGAN to be used for real image editing tasks.

Face Parsing

142

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

1 code implementation • ICCV 2023 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images.

Attribute

130

Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction

129

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

1 code implementation • 29 Aug 2022 • Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

Diffusion models (DMs) have shown great potential for high-quality image synthesis.

Conditional Image Generation Denoising +1

111

Cross-Domain and Disentangled Face Manipulation with 3D Guidance

1 code implementation • 22 Apr 2021 • Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

Face image manipulation via three-dimensional guidance has been widely applied in various interactive scenarios due to its semantically-meaningful understanding and user-friendly controllability.

Attribute Domain Adaptation +1

102

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

1 code implementation • 14 Jul 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance.

Ranked #19 on Self-Supervised Image Classification on ImageNet (finetuned)

Object Detection Self-Supervised Image Classification +1

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Protecting Celebrities from DeepFake with Identity Consistency Transformer

1 code implementation • CVPR 2022 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detecting a suspect face by finding identity inconsistency in inner and outer face regions.

Face Swapping

StyleBank: An Explicit Representation for Neural Image Style Transfer

1 code implementation • CVPR 2017 • Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

It also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed.

Incremental Learning Style Transfer

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

4 code implementations • CVPR 2023 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Ranked #1 on Self-Supervised Action Recognition on HMDB51

Action Classification Representation Learning +1

Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements

1 code implementation • CVPR 2022 • Dongdong Chen, Julián Tachella, Mike E. Davies

Deep networks provide state-of-the-art performance in multiple imaging inverse problems ranging from medical imaging to computational photography.

Self-Supervised Learning

Streaming Video Model

1 code implementation • CVPR 2023 • Yucheng Zhao, Chong Luo, Chuanxin Tang, Dongdong Chen, Noel Codella, Zheng-Jun Zha

We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.

Action Recognition Multiple Object Tracking +1

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.

Denoising Image Restoration +2

Diversity-Aware Meta Visual Prompting

1 code implementation • CVPR 2023 • Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with a frozen backbone.

Visual Prompting

Deep Model Intellectual Property Protection via Deep Watermarking

1 code implementation • 8 Mar 2021 • Jie Zhang, Dongdong Chen, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

By jointly training the target model and watermark embedding, the extra barrier can even be absorbed into the target model.

Learning with Noisy Labels for Robust Point Cloud Segmentation

1 code implementation • ICCV 2021 • Shuquan Ye, Dongdong Chen, Songfang Han, Jing Liao

Point cloud segmentation is a fundamental task in 3D.

Learning with noisy labels Point Cloud Segmentation

Robust Point Cloud Segmentation with Noisy Annotations

1 code implementation • 6 Dec 2022 • Shuquan Ye, Dongdong Chen, Songfang Han, Jing Liao

To handle boundary-level label noise, we also propose a variant "PNAL-boundary" with a progressive boundary label cleaning strategy.

Point Cloud Segmentation

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

1 code implementation • 8 Dec 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis (Park et al., 2019); it modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.

Image Generation

Equivariant Imaging: Learning Beyond the Range Space

1 code implementation • ICCV 2021 • Dongdong Chen, Julián Tachella, Mike E. Davies

In various imaging problems, we only have access to compressed measurements of the underlying signals, hindering most learning-based strategies which usually require pairs of signals and associated measurements for training.

Image Inpainting

Sensing Theorems for Unsupervised Learning in Linear Inverse Problems

1 code implementation • 23 Mar 2022 • Julián Tachella, Dongdong Chen, Mike Davies

In this paper, we present necessary and sufficient sensing conditions for learning the signal model from measurement data alone which only depend on the dimension of the model and the number of operators or properties of the group action that the model is invariant to.

Dictionary Learning Matrix Completion

Diverse Semantic Image Synthesis via Probability Distribution Modeling

1 code implementation • CVPR 2021 • Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu

In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.

Ranked #1 on Image-to-Image Translation on ADE20K Labels-to-Photos (LPIPS metric)

Image-to-Image Translation

Shape-invariant 3D Adversarial Point Clouds

1 code implementation • CVPR 2022 • Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Nenghai Yu

In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations.

GreedyFool: Distortion-Aware Sparse Adversarial Attack

1 code implementation • NeurIPS 2020 • Xiaoyi Dong, Dongdong Chen, Jianmin Bao, Chuan Qin, Lu Yuan, Weiming Zhang, Nenghai Yu, Dong Chen

Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels.

Adversarial Attack

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

1 code implementation • 2 Jun 2022 • Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding window manner for retrieving knowledge, and the important relationship within/among object regions is neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent.

Ranked #11 on Visual Question Answering (VQA) on OK-VQA

Question Answering Retrieval +1

Generative Enhancement for 3D Medical Images

1 code implementation • 19 Mar 2024 • Lingting Zhu, Noel Codella, Dongdong Chen, Zhenchao Jin, Lu Yuan, Lequan Yu

Our method begins with a 2D slice, noted as the informed slice to serve the patient prior, and propagates the generation process using a 3D segmentation mask.

counterfactual Image Generation

Should All Proposals be Treated Equally in Object Detection?

1 code implementation • 7 Jul 2022 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos

We formulate this as a learning problem where the goal is to assign operators to proposals, in the detection head, so that the total computational cost is constrained and the precision is maximized.

Object Object Detection

Meta-PU: An Arbitrary-Scale Upsampling Network for Point Cloud

1 code implementation • 8 Feb 2021 • Shuquan Ye, Dongdong Chen, Songfang Han, Ziyu Wan, Jing Liao

Thus, Meta-PU even outperforms the existing methods trained for a specific scale factor only.

Graphics

Stronger NAS with Weaker Predictors

1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

1 code implementation • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.

Ranked #7 on Instance Segmentation on LVIS v1.0 val

Data Augmentation Instance Segmentation +5

Deep Decomposition Learning for Inverse Imaging Problems

1 code implementation • ECCV 2020 • Dongdong Chen, Mike E. Davies

Deep learning is emerging as a new paradigm for solving inverse imaging problems.

Compressive Sensing Image Super-Resolution

MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation

1 code implementation • 18 Jan 2023 • Munan Ning, Donghuan Lu, Yujia Xie, Dongdong Chen, Dong Wei, Yefeng Zheng, Yonghong Tian, Shuicheng Yan, Li Yuan

Unsupervised domain adaptation has been widely adopted in tasks with scarce annotated data.

Domain Adaptation Semantic Segmentation +1

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

1 code implementation • CVPR 2023 • Shuquan Ye, Yujia Xie, Dongdong Chen, Yichong Xu, Lu Yuan, Chenguang Zhu, Jing Liao

Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective.

Data Augmentation Retrieval

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

1 code implementation • 27 Feb 2023 • Ziyu Jiang, Yinpeng Chen, Mengchen Liu, Dongdong Chen, Xiyang Dai, Lu Yuan, Zicheng Liu, Zhangyang Wang

This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer.

Contrastive Learning Few-Shot Learning

Passport-aware Normalization for Deep Model Protection

1 code implementation • NeurIPS 2020 • Jie Zhang, Dongdong Chen, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

Only when the model IP is suspected to be stolen by someone, the private passport-aware branch is added back for ownership verification.

Model Compression

Compressive MR Fingerprinting reconstruction with Neural Proximal Gradient iterations

1 code implementation • 27 Jun 2020 • Dongdong Chen, Mike E. Davies, Mohammad Golbabaee

Consistency of the predictions with respect to the physical forward model is pivotal for reliably solving inverse problems.

De-aliasing Magnetic Resonance Fingerprinting

OmniVid: A Generative Framework for Universal Video Understanding

1 code implementation • 26 Mar 2024 • Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Action Recognition Dense Video Captioning +4

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

1 code implementation • ICCV 2023 • Qidong Huang, Xiaoyi Dong, Dongdong Chen, Yinpeng Chen, Lu Yuan, Gang Hua, Weiming Zhang, Nenghai Yu

Based on our analysis, we provide a simple yet effective way to boost the adversarial robustness of MAE.

Adversarial Robustness

Poison Ink: Robust and Invisible Backdoor Attack

1 code implementation • 5 Aug 2021 • Jie Zhang, Dongdong Chen, Qidong Huang, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

As the image structure can keep its semantic meaning during the data transformation, such a trigger pattern is inherently robust to data transformations.

Backdoor Attack Data Poisoning

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

1 code implementation • 18 Mar 2024 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua

We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding.

Referring Video Object Segmentation Semantic Segmentation +2

Unsupervised Learning From Incomplete Measurements for Inverse Problems

1 code implementation • 28 Jan 2022 • Julián Tachella, Dongdong Chen, Mike Davies

In many real-world inverse problems, only incomplete measurement data are available for training which can pose a problem for learning a reconstruction function.

Image Inpainting

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

1 code implementation • 15 Mar 2024 • Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly overhead in memory to maintain high resolution.

Autonomous Driving Image Segmentation +2

Stereoscopic Neural Style Transfer

no code implementations • CVPR 2018 • Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR.

Style Transfer

Coherent Online Video Style Transfer

no code implementations • ICCV 2017 • Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua

Training a feed-forward network for fast neural style transfer of images is proven to be successful.

Image Stylization Video Style Transfer

Learning Discriminative Representation with Signed Laplacian Restricted Boltzmann Machine

no code implementations • 28 Aug 2018 • Dongdong Chen, Jiancheng Lv, Mike E. Davies

We investigate the potential of a restricted Boltzmann Machine (RBM) for discriminative representation learning.

Representation Learning

Rethinking Spatially-Adaptive Normalization

no code implementations • 6 Apr 2020 • Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Nenghai Yu

Despite its impressive performance, a more thorough understanding of the true advantages inside the box is still highly demanded, to help reduce the significant computation and parameter overheads introduced by these new structures.

Image Generation

Weak NAS Predictor Is All You Need

no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

Neural Architecture Search

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud-based Deep Networks

no code implementations • 1 Nov 2020 • Hang Zhou, Dongdong Chen, Jing Liao, Weiming Zhang, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Gang Hua, Nenghai Yu

To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack.

MicroNet: Towards Image Recognition with Extremely Low FLOPs

no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g., 6 MFLOPs on ImageNet classification).

Identity-Driven DeepFake Detection

no code implementations • 7 Dec 2020 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Our approach takes as input the suspect image/video as well as the target identity information (a reference image or video).

DeepFake Detection Face Swapping

Are Fewer Labels Possible for Few-shot Learning?

no code implementations • 10 Dec 2020 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Nenghai Yu

We conduct experiments on 10 different few-shot target datasets, and our average few-shot performance outperforms both vanilla inductive unsupervised transfer and supervised transfer by a large margin.

Clustering Few-Shot Learning

Improved Image Matting via Real-time User Clicks and Uncertainty Estimation

no code implementations • CVPR 2021 • Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Hanqing Zhao, Weiming Zhang, Nenghai Yu

Image matting is a fundamental and challenging problem in computer vision and graphics.

Image Matting

COIN: Contrastive Identifier Network for Breast Mass Diagnosis in Mammography

no code implementations • 29 Dec 2020 • Heyi Li, Dongdong Chen, William H. Nailon, Mike E. Davies, David Laurenson

Computer-aided breast cancer diagnosis in mammography is a challenging problem, stemming from mammographical data scarcity and data entanglement.

Contrastive Learning

Improve Unsupervised Pretraining for Few-label Transfer

no code implementations • ICCV 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

Unsupervised pretraining has achieved great success and many recent works have shown unsupervised pretraining can achieve comparable or even slightly better transfer performance than supervised pretraining on downstream target datasets.

Clustering Contrastive Learning

Exploring Structure Consistency for Deep Model Watermarking

no code implementations • 5 Aug 2021 • Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Zehua Ma, Weiming Zhang, Gang Hua, Nenghai Yu

However, little attention has been devoted to the protection of DNNs in image processing tasks.

Data Augmentation

Unsupervised Finetuning

no code implementations • 18 Oct 2021 • Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu

This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly for unsupervised learning, leading to the damage of the pretrained representation and poor representation in the target domain.

Reinforced Pipeline Optimization: Behaving Optimally with Non-Differentiabilities

no code implementations • 27 Sep 2018 • Aijun Bai, Dongdong Chen, Gang Hua, Lu Yuan

Many machine learning systems are implemented as pipelines.

object-detection Object Detection

3D Question Answering

no code implementations • 15 Dec 2021 • Shuquan Ye, Dongdong Chen, Songfang Han, Jing Liao

To this end, we propose a novel transformer-based 3DQA framework "3DQA-TR", which consists of two encoders for exploiting the appearance and geometry information, respectively.

Question Answering Visual Question Answering

Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation

no code implementations • 4 Jan 2022 • Qiankun Liu, Dongdong Chen, Qi Chu, Lu Yuan, Bin Liu, Lei Zhang, Nenghai Yu

In addition, such practice of re-identification still cannot track those highly occluded objects when they are missed by the detector.

Ranked #7 on Multi-Object Tracking on MOT16 (using extra training data)

Multi-Object Tracking Object +2

Deep Unrolling for Magnetic Resonance Fingerprinting

no code implementations • 23 Jan 2022 • Dongdong Chen, Mike E. Davies, Mohammad Golbabaee

Magnetic Resonance Fingerprinting (MRF) has emerged as a promising quantitative MR imaging approach.

Magnetic Resonance Fingerprinting Rolling Shutter Correction

Self-supervised Transformer for Deepfake Detection

no code implementations • 2 Mar 2022 • Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Weiming Zhang, Nenghai Yu

After pre-training with our method, the model is then partially fine-tuned for the deepfake detection task.

Contrastive Learning DeepFake Detection +3

Residual Mixture of Experts

no code implementations • 20 Apr 2022 • Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan

In this paper, we propose Residual Mixture of Experts (RMoE), an efficient training pipeline for MoE vision transformers on downstream tasks, such as segmentation and detection.

object-detection Object Detection

i-Code: An Integrative and Composable Multimodal Learning Framework

no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.

Contrastive Learning Video Understanding

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on the UODB benchmark with a wide variety of datasets.

Object object-detection +1

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

no code implementations • CVPR 2023 • Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Second, masked self-distillation is also consistent with vision-language contrastive learning from the perspective of the training objective, as both use the visual encoder for feature alignment; it can thus learn local semantics with indirect supervision from language.

Representation Learning

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

no code implementations • 25 Aug 2022 • Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang

To avoid significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e.g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism.

Video Recognition

Imaging with Equivariant Deep Learning

no code implementations • 5 Sep 2022 • Dongdong Chen, Mike Davies, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb, Ferdia Sherry, Julián Tachella

From early image processing to modern computational imaging, successful models and algorithms have relied on a fundamental property of natural signals: symmetry.

Image Classification Self-Supervised Learning

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

no code implementations • 16 Sep 2022 • Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Kui Zhang, Gang Hua, Nenghai Yu

Notwithstanding the prominent performance achieved in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations.

Self-Supervised Learning based on Heat Equation

no code implementations • 23 Nov 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

When transferring to object detection with frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively.

Image Classification object-detection +2

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)

Instance Segmentation Segmentation +3

PA-GM: Position-Aware Learning of Embedding Networks for Deep Graph Matching

no code implementations • 5 Jan 2023 • Dongdong Chen, Yuxing Dai, Lichi Zhang, Zhihong Zhang

In this paper, we introduce a novel end-to-end neural network that can map the linear assignment problem into a high-dimensional space augmented with node-level relative position information, which is crucial for improving the method's performance for similar content matching.

Combinatorial Optimization Graph Matching +1

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

no code implementations • 21 Mar 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang

Object tracking (OT) aims to estimate the positions of target objects in a video sequence.

Object Object Tracking

Domain Generalization for Mammographic Image Analysis with Contrastive Learning

no code implementations • 20 Apr 2023 • Zheren Li, Zhiming Cui, Lichi Zhang, Sheng Wang, Chenjin Lei, Xi Ouyang, Dongdong Chen, Xiangyu Zhao, Yajia Gu, Zaiyi Liu, Chunling Liu, Dinggang Shen, Jie-Zhi Cheng

The training of an efficacious deep learning model requires large data with diverse styles and qualities.

breast density classification Contrastive Learning +3

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

no code implementations • 27 Apr 2023 • Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios.

Video Understanding

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations • 22 May 2023 • Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical Invariance

no code implementations • 25 May 2023 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

This paper introduces a novel mathematical property applicable to diverse images, referred to as FINOLA (First-Order Norm+Linear Autoregressive).

Image Classification Image Reconstruction +3

On the Hidden Waves of Image

no code implementations • 19 Oct 2023 • Yinpeng Chen, Dongdong Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

We term this phenomenon hidden waves, as it reveals that, although the speeds of the set of wave equations and autoregressive coefficient matrices are latent, they are both learnable and shared across images.

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

no code implementations • 8 Nov 2023 • Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).

Person Re-Identification

Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images

no code implementations • 14 Nov 2023 • Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang

Cross-modality synthesis (CMS), super-resolution (SR), and their combination (CMSR) have been extensively studied for magnetic resonance imaging (MRI).

Attribute Image Generation +2

Traffic Video Object Detection using Motion Prior

no code implementations • 16 Nov 2023 • Lihao Liu, Yanqi Cheng, Dongdong Chen, Jing He, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

In this work, we propose two innovative methods to exploit the motion prior and boost the performance of both fully-supervised and semi-supervised traffic video object detection.

Object object-detection +1

TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios

no code implementations • 30 Nov 2023 • Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Multi-object tracking in traffic videos is a crucial research area, offering immense potential for enhancing traffic monitoring accuracy and promoting road safety measures through the utilisation of advanced machine learning algorithms.

Multi-Object Tracking Object

Mesh-Guided Neural Implicit Field Editing

no code implementations • 4 Dec 2023 • Can Wang, Mingming He, Menglei Chai, Dongdong Chen, Jing Liao

We first introduce a differentiable method using marching tetrahedra for polygonal mesh extraction from the neural implicit field and then design a differentiable color extractor to assign colors obtained from the volume renderings to this extracted mesh.

Towards More Unified In-context Visual Understanding

no code implementations • 5 Dec 2023 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.

Image Captioning In-Context Learning +1

Image Fusion via Vision-Language Model

no code implementations • 3 Feb 2024 • Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool

Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information in different source images to guide image fusion.

Language Modelling

Diffusion Posterior Proximal Sampling for Image Restoration

no code implementations • 25 Feb 2024 • Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Ji-Zhe Zhou, Hu Chen, Jiancheng Lv

Specifically, we opt for a sample consistent with the measurement identity at each generative step, exploiting the sampling selection as an avenue for output stability and enhancement.

Denoising Image Restoration
