Search Results for author: DaCheng Tao

Found 719 papers, 326 papers with code

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

8,722

Paper
Code

Domain Generalization via Conditional Invariant Representation

1 code implementation • 23 Jul 2018 • Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, DaCheng Tao

With the conditional invariant representation, the invariance of the joint distribution $\mathbb{P}(h(X), Y)$ can be guaranteed if the class prior $\mathbb{P}(Y)$ does not change across training and test domains.

Domain Generalization

1,332

Paper
Code

ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

5 code implementations • 26 Apr 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose.

Ranked #1 on Pose Estimation on COCO test-dev

2D Human Pose Estimation Keypoint Detection

1,167

Paper
Code

ViTPose++: Vision Transformer for Generic Body Pose Estimation

1 code implementation • 7 Dec 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose.

Ranked #1 on Animal Pose Estimation on AP-10K (using extra training data)

2D Human Pose Estimation Animal Pose Estimation +1

1,167

Paper
Code

A Comprehensive Survey of Dataset Distillation

1 code implementation • 13 Jan 2023 • Shiye Lei, DaCheng Tao

Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community.

Meta-Learning

1,158

Paper
Code

Positive-Unlabeled Compression on the Cloud

2 code implementations • NeurIPS 2019 • Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, DaCheng Tao, Chang Xu

In practice, only a small portion of the original training set is required as positive examples and more useful training examples can be obtained from the massive unlabeled data on the cloud through a PU classifier with an attention based multi-scale feature extractor.

Knowledge Distillation

1,111

Paper
Code

Bridging Composite and Real: Towards End-to-end Deep Image Matting

1 code implementation • 30 Oct 2020 • Jizhizi Li, Jing Zhang, Stephen J. Maybank, DaCheng Tao

Furthermore, we provide a benchmark containing 2,000 high-resolution real-world animal images and 10,000 portrait images along with their manually labeled alpha mattes to serve as a test bed for evaluating matting models' generalization ability on real-world images.

Ranked #2 on Image Matting on AM-2K

Image Matting Semantic Segmentation

901

Paper
Code

GMFlow: Learning Optical Flow via Global Matching

4 code implementations • CVPR 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated by the pipeline of cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus struggles to address the long-standing challenge of large displacements.

Ranked #8 on Optical Flow Estimation on Spring

Optical Flow Estimation regression

889

Paper
Code

Unifying Flow, Stereo and Depth Estimation

1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Ranked #1 on Optical Flow Estimation on Sintel-clean

Optical Flow Estimation Stereo Depth Estimation +1

889

Paper
Code

VanillaNet: the Power of Minimalism in Deep Learning

4 code implementations • NeurIPS 2023 • Hanting Chen, Yunhe Wang, Jianyuan Guo, DaCheng Tao

In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design.

Philosophy

793

Paper
Code

Towards Open Vocabulary Learning: A Survey

1 code implementation • 28 Jun 2023 • Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, DaCheng Tao

To our knowledge, this is the first comprehensive literature review of open vocabulary learning.

Open Set Learning Out-of-Distribution Detection +3

640

Paper
Code

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

2 code implementations • NeurIPS 2023 • Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.

Instance Segmentation Object +4

631

Paper
Code

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

1 code implementation • 29 Nov 2023 • Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao

Given a generated image that fails due to malformed hands, we utilize ControlNet modules to re-inject such correct hand information.

631

Paper
Code

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

4 code implementations • 21 Feb 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism.

Ranked #2 on Image Classification on ImageNet ReaL

Image Classification Inductive Bias

514

Paper
Code

Deep Ordinal Regression Network for Monocular Depth Estimation

5 code implementations • CVPR 2018 • Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, DaCheng Tao

These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions.

Ranked #13 on Depth Estimation on NYU-Depth V2

Monocular Depth Estimation regression

463

Paper
Code

Deep Modular Co-Attention Networks for Visual Question Answering

7 code implementations • CVPR 2019 • Zhou Yu, Jun Yu, Yuhao Cui, DaCheng Tao, Qi Tian

In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth.

Ranked #5 on Question Answering on SQA3D

Question Answering Visual Question Answering

432

Paper
Code

An Empirical Study of Remote Sensing Pretraining

2 code implementations • 6 Apr 2022 • Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao

To this end, we train different networks from scratch with the help of the largest RS scene recognition dataset to date, MillionAID, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.

Ranked #1 on Aerial Scene Classification on UCM (80% as trainset)

Aerial Scene Classification Building change detection for remote sensing images +5

413

Paper
Code

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

2 code implementations • 8 Aug 2022 • Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, DaCheng Tao, Liangpei Zhang

Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability.

Ranked #1 on Aerial Scene Classification on AID (50% as trainset)

Aerial Scene Classification Few-Shot Learning +2

413

Paper
Code

Deep Learning for Camera Calibration and Beyond: A Survey

1 code implementation • 19 Mar 2023 • Kang Liao, Lang Nie, Shujuan Huang, Chunyu Lin, Jing Zhang, Yao Zhao, Moncef Gabbouj, DaCheng Tao

In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, by analyzing their strengths and limitations.

Camera Calibration

395

Paper
Code

Deep Automatic Natural Image Matting

1 code implementation • 15 Jul 2021 • Jizhizi Li, Jing Zhang, DaCheng Tao

To address the problem, a novel end-to-end matting network is proposed, which can predict a generalized trimap for any image of the above types as a unified semantic representation.

Ranked #2 on Image Matting on AIM-500

Image Matting

371

Paper
Code

SFNet: Faster, Accurate, and Domain Agnostic Semantic Segmentation via Semantic Flow

1 code implementation • 10 Jul 2022 • Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, DaCheng Tao

In this paper, we focus on exploring effective methods for faster, accurate, and domain agnostic semantic segmentation.

Real-Time Semantic Segmentation

350

Paper
Code

Recurrent Feature Reasoning for Image Inpainting

1 code implementation • CVPR 2020 • Jingyuan Li, Ning Wang, Lefei Zhang, Bo Du, DaCheng Tao

To capture information from distant places in the feature map for RFR, we further develop KCA and incorporate it in RFR.

Image Inpainting SSIM

346

Paper
Code

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

6 code implementations • CVPR 2021 • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei Zhang, Chao Xu, Chunjing Xu, DaCheng Tao, Chang Xu

To achieve an extremely fast NAS while preserving the high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search.

Neural Architecture Search

334

Paper
Code

SCOP: Scientific Control for Reliable Neural Network Pruning

4 code implementations • NeurIPS 2020 • Yehui Tang, Yunhe Wang, Yixing Xu, DaCheng Tao, Chunjing Xu, Chao Xu, Chang Xu

To increase the reliability of the results, we prefer to have a more rigorous research design by including a scientific control group as an essential part to minimize the effect of all factors except the association between the filter and expected network output.

Network Pruning

334

Paper
Code

Manifold Regularized Dynamic Network Pruning

7 code implementations • CVPR 2021 • Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, DaCheng Tao, Chang Xu

Then, the manifold relationship between instances and the pruned sub-networks will be aligned in the training procedure.

Network Pruning

334

Paper
Code

Privacy-Preserving Portrait Matting

1 code implementation • 29 Apr 2021 • Jizhizi Li, Sihan Ma, Jing Zhang, DaCheng Tao

We systematically evaluate both trimap-free and trimap-based matting methods on P3M-10k and find that existing matting methods show different generalization capabilities when following the Privacy-Preserving Training (PPT) setting, i.e., training on face-blurred images and testing on arbitrary images.

Ranked #3 on Image Matting on P3M-10k

Image Matting Privacy Preserving

276

Paper
Code

Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction

1 code implementation • 4 Dec 2021 • Yikai Wang, Fuchun Sun, Wenbing Huang, Fengxiang He, DaCheng Tao

For the application of dense image prediction, the validity of CEN is tested by four different scenarios: multimodal fusion, cycle multimodal fusion, multitask learning, and multimodal multitask learning.

Ranked #7 on Semantic Segmentation on LLRGBD-synthetic

Semantic Segmentation

276

Paper
Code

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

1 code implementation • 15 Jul 2022 • Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, DaCheng Tao

In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3.

Ranked #7 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU ped - 224x480 - Vis filter. - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +1

267

Paper
Code

Learning to Generalize Provably in Learning to Optimize

1 code implementation • 22 Feb 2023 • Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, DaCheng Tao, Yingbin Liang, Zhangyang Wang

While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper.

249

Paper
Code

Perceptual Adversarial Networks for Image-to-Image Transformation

2 code implementations • 28 Jun 2017 • Chaoyue Wang, Chang Xu, Chaohui Wang, DaCheng Tao

The proposed PAN consists of two feed-forward convolutional neural networks (CNNs), the image transformation network T and the discriminative network D. Through combining the generative adversarial loss and the proposed perceptual adversarial loss, these two networks can be trained alternately to solve image-to-image transformation tasks.

Image Inpainting

248

Paper
Code

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

2 code implementations • NeurIPS 2021 • Yufei Xu, Qiming Zhang, Jing Zhang, DaCheng Tao

Nevertheless, vision transformers treat an image as a 1D sequence of visual tokens, lacking an intrinsic inductive bias (IB) in modeling local visual structures and dealing with scale variance.

Ranked #2 on Video Object Segmentation on DAVIS 2017

Image Classification Inductive Bias +2

240

Paper
Code

DehazeNet: An End-to-End System for Single Image Haze Removal

3 code implementations • 28 Jan 2016 • Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, DaCheng Tao

The key to achieving haze removal is to estimate a medium transmission map for an input hazy image.

Ranked #7 on Image Dehazing on RS-Haze

Image Dehazing Single Image Haze Removal

238

Paper
Code

BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning

1 code implementation • CVPR 2022 • Zhi Hou, Baosheng Yu, DaCheng Tao

We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning.

Ranked #17 on Long-tail Learning on iNaturalist 2018

Compositional Zero-Shot Learning Contrastive Learning +4

236

Paper
Code

BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning

1 code implementation • 4 Apr 2022 • Zhi Hou, Baosheng Yu, Chaoyue Wang, Yibing Zhan, DaCheng Tao

Specifically, when applying the proposed module, it employs a two-stream pipeline during training, i.e., either with or without a BatchFormerV2 module, where the batchformer stream can be removed for testing.

Image Classification object-detection +3

236

Paper
Code

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

1 code implementation • 29 Feb 2024 • Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, DaCheng Tao, Tat-Jen Cham

Consequently, we introduce Trajectory Consistency Distillation (TCD), which encompasses trajectory consistency function and strategic stochastic sampling.

Image Generation

234

Paper
Code

Generating Holistic 3D Human Motion from Speech

2 code implementations • CVPR 2023 • Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, DaCheng Tao, Michael J. Black

This work addresses the problem of generating 3D holistic body motions from human speech.

Ranked #2 on Gesture Generation on BEAT2

3D Face Animation Gesture Generation

232

Paper
Code

A Survey on Knowledge Distillation of Large Language Models

1 code implementation • 20 Feb 2024 • Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, DaCheng Tao, Tianyi Zhou

In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral.

Data Augmentation Knowledge Distillation +1

231

Paper
Code

DUT: Learning Video Stabilization by Simply Watching Unstable Videos

2 code implementations • 30 Nov 2020 • Yufei Xu, Jing Zhang, Stephen J. Maybank, DaCheng Tao

In this paper, we attempt to tackle the video stabilization problem in a deep unsupervised learning manner, which borrows the divide-and-conquer idea from traditional stabilizers while leveraging the representation power of DNNs to handle the challenges in real-world scenarios.

Homography Estimation Video Stabilization

227

Paper
Code

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

3 code implementations • 10 Jul 2022 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, DaCheng Tao

However, these methods built upon detection transformer framework might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the reading order of humans, which impedes the detection robustness from our observation.

Ranked #3 on Scene Text Detection on SCUT-CTW1500

Inductive Bias Scene Text Detection +1

225

Paper
Code

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

2 code implementations • CVPR 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.

Ranked #1 on Text Spotting on Total-Text (using extra training data)

Scene Text Detection Text Detection +2

225

Paper
Code

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

2 code implementations • 31 May 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao

In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.

Ranked #1 on Text Spotting on Inverse-Text

Scene Text Detection Text Detection +1

225

Paper
Code

Referring Image Matting

1 code implementation • CVPR 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting.

Ranked #1 on Referring Image Matting (RefMatte-RW100) on RefMatte

Domain Generalization Image Matting +5

198

Paper
Code

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

3 code implementations • 13 Jan 2022 • Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors.

Ranked #4 on Video Object Detection on ImageNet VID (using extra training data)

Object object-detection +2

196

Paper
Code

Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT

1 code implementation • 19 Feb 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, DaCheng Tao

Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.

Question Answering Sentiment Analysis

192

Paper
Code

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

1 code implementation • 11 Jul 2022 • Sen Zhang, Jing Zhang, DaCheng Tao

Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.

Monocular Depth Estimation Motion Estimation +1

191

Paper
Code

Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering

2 code implementations • 10 Aug 2017 • Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, DaCheng Tao

For fine-grained image and question representations, a 'co-attention' mechanism is developed by using a deep neural network architecture to jointly learn the attentions for both the image and the question, which can allow us to reduce the irrelevant features effectively and obtain more discriminative features for image and question representations.

Question Answering Visual Question Answering +1

183

Paper
Code

Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering

6 code implementations • ICCV 2017 • Zhou Yu, Jun Yu, Jianping Fan, DaCheng Tao

For multi-modal feature fusion, here we develop a Multi-modal Factorized Bilinear (MFB) pooling approach to efficiently and effectively combine multi-modal features, which results in superior performance for VQA compared with other bilinear pooling approaches.

Question Answering Visual Question Answering

183

Paper
Code

Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction

3 code implementations • 9 Feb 2015 • Yong Luo, DaCheng Tao, Yonggang Wen, Kotagiri Ramamohanarao, Chao Xu

As a consequence, the high order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained.

Dimensionality Reduction MULTI-VIEW LEARNING

177

Paper
Code

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

1 code implementation • CVPR 2018 • Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, DaCheng Tao

In addition, we harness a self-supervised semantic network to discover high-level semantic information in the form of multi-label annotations.

Cross-Modal Retrieval Retrieval

161

Paper
Code

Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields

1 code implementation • ECCV 2018 • Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, DaCheng Tao, Mingli Song

In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control.

Style Transfer

158

Paper
Code

Progressive LiDAR Adaptation for Road Detection

1 code implementation • 2 Apr 2019 • Zhe Chen, Jing Zhang, DaCheng Tao

To this end, LiDAR sensor data can be incorporated to improve visual image-based road detection, because LiDAR data is less susceptible to visual noise.

155

Paper
Code

Deep Image Matting: A Comprehensive Survey

1 code implementation • 10 Apr 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao

Image matting refers to extracting precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing.

Image Matting Referring Image Matting

153

Paper
Code

TriDet: Temporal Action Detection with Relative Boundary Modeling

1 code implementation • CVPR 2023 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, DaCheng Tao

In this paper, we present a one-stage framework TriDet for temporal action detection.

Ranked #2 on Temporal Action Localization on EPIC-KITCHENS-100

Action Detection Temporal Action Localization

148

Paper
Code

Temporal Action Localization with Enhanced Instant Discriminability

3 code implementations • 11 Sep 2023 • Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, DaCheng Tao

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.

Ranked #1 on Temporal Action Localization on MultiTHUMOS

Action Detection Temporal Action Localization

148

Paper
Code

VSA: Learning Varied-Size Window Attention in Vision Transformers

2 code implementations • 18 Apr 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao

Attention within windows has been widely explored in vision transformers to balance the performance, computation complexity, and memory footprint.

Instance Segmentation Object Detection +1

147

Paper
Code

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

1 code implementation • 11 Sep 2021 • Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, DaCheng Tao

Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises).

Adversarial Robustness Benchmarking +2

143

Paper
Code

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

1 code implementation • NeurIPS 2019 • Qiming Zhang, Jing Zhang, Wei Liu, DaCheng Tao

Although there has been progress in matching the marginal distributions between two domains, the classifier favors the source domain features and makes incorrect predictions on the target domain due to category-agnostic feature alignment.

Ranked #24 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Semantic Segmentation Synthetic-to-Real Translation +1

138

Paper
Code

Contrastive Boundary Learning for Point Cloud Segmentation

1 code implementation • CVPR 2022 • Liyao Tang, Yibing Zhan, Zhe Chen, Baosheng Yu, DaCheng Tao

Point cloud segmentation is fundamental in understanding 3D environments.

Ranked #17 on Semantic Segmentation on S3DIS

Point Cloud Segmentation Segmentation +1

135

Paper
Code

Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

1 code implementation • CVPR 2019 • Shanshan Zhao, Huan Fu, Mingming Gong, DaCheng Tao

Supervised depth estimation has achieved high accuracy due to the advanced deep network architectures.

Ranked #68 on Monocular Depth Estimation on KITTI Eigen split

Depth Prediction Domain Adaptation +2

133

Paper
Code

One for All: Towards Training One Graph Model for All Classification Tasks

1 code implementation • 29 Sep 2023 • Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, DaCheng Tao, Yixin Chen, Muhan Zhang

For in-context learning on graphs, OFA introduces a novel graph prompting paradigm that appends prompting substructures to the input graph, which enables it to address varied tasks without fine-tuning.

Graph Classification Graph Learning +3

131

Paper
Code

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data

2 code implementations • 9 Dec 2020 • Shaoli Huang, Xinchao Wang, DaCheng Tao

As the main discriminative information of a fine-grained image usually resides in subtle regions, methods along this line are prone to heavy label noise in fine-grained recognition.

Ranked #31 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Semantic Composition +1

130

Paper
Code

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

4 code implementations • 28 Aug 2021 • Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, DaCheng Tao

The experimental results provide sound empirical evidence on the superiority of learning from diverse animal species in terms of both accuracy and generalization ability.

Animal Pose Estimation Domain Generalization +1

123

Paper
Code

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

4 code implementations • 12 Jun 2022 • Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, DaCheng Tao

Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.

Animal Pose Estimation Domain Generalization +1

123

Paper
Code

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

1 code implementation • 31 Jan 2024 • Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao

In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.

Ranked #1 on Hierarchical Text Segmentation on HierText

Hierarchical Text Segmentation Segmentation +1

120

Paper
Code

Vision Transformer with Quadrangle Attention

1 code implementation • 27 Mar 2023 • Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.

object-detection Object Detection +2

119

Paper
Code

Pruning Self-attentions into Convolutional Layers in Single Path

3 code implementations • 23 Nov 2021 • Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, DaCheng Tao, Bohan Zhuang

Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers.

Ranked #18 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs Inductive Bias +1

Paper
Code

Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities

1 code implementation • 15 Apr 2022 • Chuang Liu, Yibing Zhan, Jia Wu, Chang Li, Bo Du, Wenbin Hu, Tongliang Liu, DaCheng Tao

Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation.

Graph Classification Graph Generation

Paper
Code

Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs

2 code implementations • 4 Dec 2017 • Jun Yu, Xingxin Xu, Fei Gao, Shengjie Shi, Meng Wang, DaCheng Tao, Qingming Huang

Experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data.

Ranked #1 on Face Sketch Synthesis on CUFS (FID metric)

Face Sketch Synthesis Generative Adversarial Network

Paper
Code

TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition

1 code implementation • ICCV 2021 • Wenyuan Xue, Baosheng Yu, Wen Wang, DaCheng Tao, Qingyong Li

A table arranging data in rows and columns is a very effective data structure, which has been widely used in business and scientific research.

Cell Detection Graph Reconstruction +1

Paper
Code

Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

1 code implementation • 27 Jul 2021 • Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, DaCheng Tao

In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains.

Domain Adaptation Object +2

Paper
Code

A Comprehensive Survey and Taxonomy on Single Image Dehazing Based on Deep Learning

1 code implementation • 7 Jun 2021 • Jie Gui, Xiaofeng Cong, Yuan Cao, Wenqi Ren, Jun Zhang, Jing Zhang, Jiuxin Cao, DaCheng Tao

With the development of convolutional neural networks, hundreds of deep learning based dehazing methods have been proposed.

Image Dehazing Single Image Dehazing

Paper
Code

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

1 code implementation • 20 Mar 2024 • Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang

However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.

Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)

Aerial Scene Classification Building change detection for remote sensing images +12

Paper
Code

SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection

1 code implementation • 6 Jan 2022 • Chen Chen, Zhe Chen, Jing Zhang, DaCheng Tao

We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects.

3D Object Detection object-detection

Paper
Code

Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

1 code implementation • 19 Apr 2023 • Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao

This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.

Motion Compensation Simultaneous Localization and Mapping

Paper
Code

Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

1 code implementation • 10 Apr 2022 • Shilin Xu, Xiangtai Li, Jingbo Wang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

This work focuses on joint human fashion segmentation and attribute recognition.

Attribute Fashion Understanding +1

Paper
Code

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

1 code implementation • 12 Dec 2022 • Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, DaCheng Tao, Yu Qiao

Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and the completion of each generated character.

Font Generation

Paper
Code

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation • 26 Mar 2024 • Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.

Attribute Face Reenactment +2

Paper
Code

Toward Human-Like Evaluation for Natural Language Generation with Error Analysis

1 code implementation • 20 Dec 2022 • Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong, DaCheng Tao

To this end, we augment BARTScore by incorporating the human-like error analysis strategies, namely BARTScore++, where the final score consists of both the evaluations of major errors and minor errors.

Language Modelling Machine Translation +2

Paper
Code

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

1 code implementation • 24 Mar 2023 • Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao

To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thoughts (Wei et al., 2022) and Error Analysis (Lu et al., 2023).

Machine Translation Natural Language Understanding +3

Paper
Code

Self-Augmented Unpaired Image Dehazing via Density and Depth Decomposition

1 code implementation • CVPR 2022 • Yang Yang, Chaoyue Wang, Risheng Liu, Lin Zhang, Xiaojie Guo, DaCheng Tao

With estimated scene depth, our method is capable of re-rendering hazy images with different thicknesses which further benefits the training of the dehazing network.

Image Dehazing

Paper
Code

Heterogeneous Federated Learning: State-of-the-art and Research Challenges

2 code implementations • 20 Jul 2023 • Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, DaCheng Tao

Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential.

Federated Learning

Paper
Code

Visual Compositional Learning for Human-Object Interaction Detection

4 code implementations • ECCV 2020 • Zhi Hou, Xiaojiang Peng, Yu Qiao, DaCheng Tao

The integration of decomposition and composition enables VCL to share object and verb features among different HOI samples and images, and to generate new interaction samples and new types of HOI, and thus largely alleviates the long-tail distribution problem and benefits low-shot or zero-shot HOI detection.

Ranked #3 on Affordance Recognition on HICO-DET(Unknown Concepts)

Affordance Recognition Object

Paper
Code

Detecting Human-Object Interaction via Fabricated Compositional Learning

1 code implementation • CVPR 2021 • Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, DaCheng Tao

With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection.

Ranked #4 on Affordance Recognition on HICO-DET

Affordance Recognition Object +1

Paper
Code

Affordance Transfer Learning for Human-Object Interaction Detection

2 code implementations • CVPR 2021 • Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, DaCheng Tao

The proposed method can thus be used to 1) improve the performance of HOI detection, especially for the HOIs with unseen objects; and 2) infer the affordances of novel objects.

Ranked #2 on Affordance Recognition on HICO-DET(Unknown Concepts)

Affordance Detection Affordance Recognition +4

Paper
Code

Discovering Human-Object Interaction Concepts via Self-Compositional Learning

2 code implementations • 27 Mar 2022 • Zhi Hou, Baosheng Yu, DaCheng Tao

Therefore, the proposed method enables the learning on both known and unknown HOI concepts.

Ranked #1 on Human-Object Interaction Concept Discovery on HICO-DET

Affordance Recognition Human-Object Interaction Concept Discovery +1

Paper
Code

Towards Data-Efficient Detection Transformers

2 code implementations • 17 Mar 2022 • Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao

Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.

Paper
Code

Rethinking Portrait Matting with Privacy Preserving

1 code implementation • 31 Mar 2022 • Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, DaCheng Tao

P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain some useful findings about model generalization ability under the privacy-preserving training (PPT) setting.

Ranked #1 on Image Matting on P3M-10k

Domain Generalization Image Matting +1

Paper
Code

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

1 code implementation • 16 Jul 2022 • Haimei Zhao, Jing Zhang, Sen Zhang, DaCheng Tao

A naive way is to accomplish them independently in a sequential or parallel manner, but there are many drawbacks, i.e., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts.

Autonomous Driving Depth Estimation +3

Paper
Code

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning

1 code implementation • ICLR 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).

Ranked #3 on Few-Shot Class-Incremental Learning on CUB-200-2011

Few-Shot Class-Incremental Learning Incremental Learning

Paper
Code

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning

1 code implementation • 6 Feb 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

Few-Shot Class-Incremental Learning Incremental Learning

Paper
Code

Towards Making the Most of ChatGPT for Machine Translation

1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still need to be highlighted for the MT/NLP community.

In-Context Learning Machine Translation +2

Paper
Code

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

2 code implementations • 3 Aug 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem

Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting.

Few-Shot Class-Incremental Learning Incremental Learning

Paper
Code

Evolutionary Generative Adversarial Networks

3 code implementations • 1 Mar 2018 • Chaoyue Wang, Chang Xu, Xin Yao, DaCheng Tao

In this paper, we propose a novel GAN framework called evolutionary generative adversarial networks (E-GAN) for stable GAN training and improved generative performance.

Paper
Code

Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping

1 code implementation • CVPR 2019 • Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, DaCheng Tao

Unsupervised domain mapping aims to learn a function $G_{XY}$ that translates domain $X$ to domain $Y$ in the absence of paired examples.

Generative Adversarial Network

Paper
Code

BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering

1 code implementation • NAACL 2019 • Yu Cao, Meng Fang, DaCheng Tao

Graph convolutional networks are used to obtain a relation-aware representation of nodes for entity graphs built from documents with multi-level features.

Question Answering

Paper
Code

Progressive One-shot Human Parsing

1 code implementation • 22 Dec 2020 • Haoyu He, Jing Zhang, Bhavani Thuraisingham, DaCheng Tao

In this paper, we devise a novel Progressive One-shot Parsing network (POPNet) to address two critical challenges, i.e., testing bias and small sizes.

Human Parsing Metric Learning +1

Paper
Code

End-to-end One-shot Human Parsing

1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing Metric Learning +1

Paper
Code

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

1 code implementation • 3 Aug 2021 • Bo Du, Jian Ye, Jing Zhang, Juhua Liu, DaCheng Tao

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues, i.e., 1) fracture detections at the gaps in a text instance; and 2) inaccurate detections of arbitrary-shaped text instances with diverse background context.

Ranked #5 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection Text Detection

Paper
Code

Domain-Specific Risk Minimization for Out-of-Distribution Generalization

1 code implementation • 18 Aug 2022 • Yi-Fan Zhang, Jindong Wang, Jian Liang, Zhang Zhang, Baosheng Yu, Liang Wang, DaCheng Tao, Xing Xie

Our bound motivates two strategies to reduce the gap: the first one is ensembling multiple classifiers to enrich the hypothesis space, then we propose effective gap estimation methods for guiding the selection of a better hypothesis for the target.

Domain Generalization Out-of-Distribution Generalization

Paper
Code

Adaptive Context-Aware Multi-Modal Network for Depth Completion

1 code implementation • 25 Aug 2020 • Shanshan Zhao, Mingming Gong, Huan Fu, DaCheng Tao

Furthermore, considering the multi-modality of input data, we exploit the graph propagation on the two modalities respectively to extract multi-modal representations.

Depth Completion

Paper
Code

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

1 code implementation • 10 Oct 2022 • Guozheng Ma, Zhen Wang, Zhecheng Yuan, Xueqian Wang, Bo Yuan, DaCheng Tao

Visual reinforcement learning (RL), which makes decisions directly from high-dimensional visual inputs, has demonstrated significant potential in various domains.

Data Augmentation reinforcement-learning +1

Paper
Code

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

1 code implementation • 27 Nov 2019 • Haoyu He, Jing Zhang, Qiming Zhang, DaCheng Tao

In this paper, we propose a novel GRAph PYramid Mutual Learning (Grapy-ML) method to address the cross-dataset human parsing problem, where the annotations are at different granularities.

Human Parsing Semantic Segmentation

Paper
Code

DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

1 code implementation • 1 Jun 2022 • Rong Dai, Li Shen, Fengxiang He, Xinmei Tian, DaCheng Tao

In this work, we propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL, which employs personalized sparse masks to customize sparse local models on the edge.

Personalized Federated Learning

Paper
Code

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

1 code implementation • NeurIPS 2019 • Yali Du, Lei Han, Meng Fang, Ji Liu, Tianhong Dai, DaCheng Tao

A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward.

Multi-agent Reinforcement Learning reinforcement-learning +3

Paper
Code

GPS-Net: Graph Property Sensing Network for Scene Graph Generation

1 code implementation • CVPR 2020 • Xin Lin, Changxing Ding, Jinquan Zeng, DaCheng Tao

There are three key properties of scene graph that have been underexplored in recent works: namely, the edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships.

Ranked #5 on Scene Graph Generation on Visual Genome

Graph Generation Scene Graph Generation

Paper
Code

Part-dependent Label Noise: Towards Instance-dependent Label Noise

1 code implementation • NeurIPS 2020 • Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, DaCheng Tao, Masashi Sugiyama

Learning with instance-dependent label noise is challenging, because it is hard to model such real-world noise.

Paper
Code

Domain Generalization via Entropy Regularization

1 code implementation • NeurIPS 2020 • Shanshan Zhao, Mingming Gong, Tongliang Liu, Huan Fu, DaCheng Tao

To arrive at this, some methods introduce a domain discriminator through adversarial learning to match the feature distributions in multiple source domains.

Ranked #43 on Domain Generalization on PACS

Domain Generalization

Paper
Code

Online Multiple Object Tracking with Cross-Task Synergy

1 code implementation • CVPR 2021 • Song Guo, Jingya Wang, Xinchao Wang, DaCheng Tao

On the other hand, such reliable embeddings can boost identity-awareness through memory aggregation, hence strengthen attention modules and suppress drifts.

Multiple Object Tracking Object +1

Paper
Code

One-Shot Affordance Detection

2 code implementations • 28 Jun 2021 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.

4k Affordance Detection

Paper
Code

One-Shot Object Affordance Detection in the Wild

1 code implementation • 8 Aug 2021 • Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, DaCheng Tao

To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.

Action Recognition Affordance Detection +3

Paper
Code

Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

1 code implementation • 27 Jul 2021 • Zefeng Ding, Changxing Ding, Zhiyin Shao, DaCheng Tao

Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features.

Ranked #1 on Image Retrieval on ICFG-PEDES

Image Retrieval Person Re-Identification +1

Paper
Code

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao

The Depth-aware Video Panoptic Segmentation (DVPS) is a new challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.

Ranked #1 on Depth-aware Video Panoptic Segmentation on SemKITTI-DVPS

Depth-aware Video Panoptic Segmentation Depth Estimation +4

Paper
Code

Image Captions are Natural Prompts for Text-to-Image Models

1 code implementation • 17 Jul 2023 • Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, DaCheng Tao

With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become common practice in many learning tasks to train or fine-tune large models on synthetic data due to the data-scarcity and privacy leakage problems.

Image Captioning Image Generation

Paper
Code

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Ranked #21 on Semantic Segmentation on ADE20K

Semantic Segmentation

Paper
Code

Distilling Knowledge from Graph Convolutional Networks

1 code implementation • CVPR 2020 • Yiding Yang, Jiayan Qiu, Mingli Song, DaCheng Tao, Xinchao Wang

To enable the knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher.

Knowledge Distillation Transfer Learning

Paper
Code

SynFace: Face Recognition with Synthetic Data

1 code implementation • ICCV 2021 • Haibo Qiu, Baosheng Yu, Dihong Gong, Zhifeng Li, Wei Liu, DaCheng Tao

We then analyze the underlying causes behind the performance gap, e.g., the poor intra-class variations and the domain gap between synthetic and real face images.

Face Generation Face Recognition

Paper
Code

ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

1 code implementation • 6 Jun 2019 • Zhou Yu, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, DaCheng Tao

It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA).

Ranked #29 on Video Question Answering on ActivityNet-QA

Visual Question Answering (VQA) Zero-Shot Video Question Answer

Paper
Code

Region-wise Generative Adversarial Image Inpainting for Large Missing Areas

1 code implementation • 27 Sep 2019 • Yuqing Ma, Xianglong Liu, Shihao Bai, Lei Wang, Aishan Liu, DaCheng Tao, Edwin Hancock

To address these problems, we propose a generic inpainting framework capable of handling incomplete images on both continuous and discontinuous large missing areas, in an adversarial manner.

Image Inpainting

Paper
Code

Nighttime Dehazing with a Synthetic Benchmark

1 code implementation • 10 Aug 2020 • Jing Zhang, Yang Cao, Zheng-Jun Zha, DaCheng Tao

To address this issue, we propose a novel synthetic method called 3R to simulate nighttime hazy images from daytime clear images, which first reconstructs the scene geometry, then simulates the light rays and object reflectance, and finally renders the haze effects.

Paper
Code

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

1 code implementation • 10 Apr 2022 • Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To the best of our knowledge, we are the first to solve the PPS problem via a unified and end-to-end transformer model.

Ranked #2 on Part-aware Panoptic Segmentation on Pascal Panoptic Parts

Panoptic Segmentation Part-aware Panoptic Segmentation +1

Paper
Code

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Paper
Code

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority to ease storage burden and optimization difficulty.

Paper
Code

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao

Following this spirit, this paper explores plain ViT architecture for MUAD.

Unsupervised Anomaly Detection

Paper
Code

Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

1 code implementation • 16 Apr 2024 • Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, DaCheng Tao

Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods.

Anomaly Detection object-detection +2

Paper
Code

Variational Distillation for Multi-View Learning

3 code implementations • 20 Jun 2022 • Xudong Tian, Zhizhong Zhang, Cong Wang, Wensheng Zhang, Yanyun Qu, Lizhuang Ma, Zongze Wu, Yuan Xie, DaCheng Tao

Information Bottleneck (IB) based multi-view learning provides an information theoretic principle for seeking shared information contained in heterogeneous data descriptions.

MULTI-VIEW LEARNING Representation Learning

Paper
Code

Heatmap Regression via Randomized Rounding

2 code implementations • 1 Sep 2020 • Baosheng Yu, DaCheng Tao

Previous methods to overcome the sub-pixel localization problem usually rely on high-resolution heatmaps.

Face Alignment Pose Estimation +2

Paper
Code

Improved Techniques for Learning to Dehaze and Beyond: A Collective Study

1 code implementation • 30 Jun 2018 • Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, DaCheng Tao

Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images.

Image Dehazing Image Restoration +4

Paper
Code

Brain Age Estimation From MRI Using Cascade Networks with Ranking Loss

1 code implementation • 6 Jun 2021 • Jian Cheng, Ziyang Liu, Hao Guan, Zhenzhou Wu, Haogang Zhu, Jiyang Jiang, Wei Wen, DaCheng Tao, Tao Liu

In this paper, a novel 3D convolutional network, called two-stage-age-network (TSAN), is proposed to estimate brain age from T1-weighted MRI data.

Age Estimation

Paper
Code

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

1 code implementation • 13 Jan 2023 • Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, DaCheng Tao

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.

Self-Supervised Learning

Paper
Code

Learning with Biased Complementary Labels

1 code implementation • ECCV 2018 • Xiyu Yu, Tongliang Liu, Mingming Gong, DaCheng Tao

We therefore reason that the transition probabilities will be different.

Paper
Code

Where and What? Examining Interpretable Disentangled Representations

1 code implementation • CVPR 2021 • Xinqi Zhu, Chang Xu, DaCheng Tao

We thus impose a perturbation on a certain dimension of the latent code, and expect to identify the perturbation along this dimension from the generated images so that the encoding of simple variations can be enforced.

Disentanglement Model Selection +1

Paper
Code

WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking

1 code implementation • 19 Jan 2022 • Chunhui Zhang, Guanjie Huang, Li Liu, Shan Huang, Yinan Yang, Xiang Wan, Shiming Ge, DaCheng Tao

In this work, we propose WebUAV-3M, the largest public UAV tracking benchmark to date, to facilitate both the development and evaluation of deep UAV trackers.

Paper
Code

Unified Discrete Diffusion for Simultaneous Vision-Language Generation

1 code implementation • 27 Nov 2022 • Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, DaCheng Tao, Ponnuthurai N. Suganthan

The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling the multi-modality signals.

multimodal generation Text Generation +1

Paper
Code

SPAGAN: Shortest Path Graph Attention Network

1 code implementation • 10 Jan 2021 • Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, DaCheng Tao

SPAGAN therefore allows for a more informative and intact exploration of the graph structure and further a more effective aggregation of information from distant neighbors into the center node, as compared to node-based GCN methods.

Graph Attention

Paper
Code

Learn From Model Beyond Fine-Tuning: A Survey

1 code implementation • 12 Oct 2023 • Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, DaCheng Tao

LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks.

Meta-Learning Model Editing

Paper
Code

Semantic Structure-based Unsupervised Deep Hashing

1 code implementation • IJCAI 2018 • Erkun Yang, Cheng Deng, Tongliang Liu, Wei Liu, DaCheng Tao

Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency.

Deep Hashing Semantic Similarity +1

Paper
Code

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

1 code implementation • 8 Mar 2022 • Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, DaCheng Tao

In contrast to previous works, we focus on the reproducibility of the approaches and the examination of the elements that lead to improved performance by pretrained and nonpretrained models in retrieving images and text.

Information Retrieval Retrieval +1

Paper
Code

Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis

1 code implementation • 13 Jan 2022 • Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Hua Jin, DaCheng Tao

To this end, we propose a knowledge graph augmented network KGAN, which aims to effectively incorporate external knowledge with explicitly syntactic and contextual information.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Paper
Code

Robust Unlearnable Examples: Protecting Data Against Adversarial Learning

2 code implementations • 28 Mar 2022 • Shaopeng Fu, Fengxiang He, Yang Liu, Li Shen, DaCheng Tao

To address this concern, methods are proposed to make data unlearnable for deep learning models by adding a type of error-minimizing noise.

Paper
Code

Vega-MT: The JD Explore Academy Translation System for WMT22

1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao

As for model sizes, we scale the Transformer-Big up to an extremely large model that has nearly 4.7 billion parameters, to fully enhance the model capacity for our Vega-MT.

Ranked #1 on Machine Translation on WMT 2022 English-Russian

Data Augmentation Machine Translation +1

Paper
Code

DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification

2 code implementations • 19 Apr 2023 • Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao

Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.

Hyperspectral Image Classification Image Generation

Paper
Code

Recurrent Glimpse-based Decoder for Detection with Transformer

1 code implementation • CVPR 2022 • Zhe Chen, Jing Zhang, DaCheng Tao

Then, a glimpse-based decoder is introduced to provide refined detection results based on both the glimpse features and the attention modeling outputs of the previous stage.

Ranked #1 on Object Detection on MS COCO (GFlops metric)

Object Detection

Paper
Code

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

1 code implementation • 17 Mar 2022 • Yibo Yang, Shixiang Chen, Xiangtai Li, Liang Xie, Zhouchen Lin, DaCheng Tao

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class.

Ranked #26 on Long-tail Learning on CIFAR-10-LT (ρ=100)

Classification Image Classification +1

Paper
Code

End2End Occluded Face Recognition by Masking Corrupted Features

1 code implementation • 21 Aug 2021 • Haibo Qiu, Dihong Gong, Zhifeng Li, Wei Liu, DaCheng Tao

However, the state-of-the-art general face recognition models do not generalize well to occluded face images, which are exactly the common cases in real-world scenarios.

Face Recognition

Paper
Code

Learning Affordance Grounding from Exocentric Images

2 code implementations • CVPR 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.

Human-Object Interaction Detection Object +1

Paper
Code

Grounded Affordance from Exocentric View

2 code implementations • 28 Aug 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao

Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.

Human-Object Interaction Detection Object +1

Paper
Code

GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

1 code implementation • 6 Jul 2022 • Haibo Qiu, Baosheng Yu, DaCheng Tao

However, recent projection-based methods for point cloud semantic segmentation usually utilize a vanilla late fusion strategy for the predictions of different views, failing to explore the complementary information from a geometric perspective during the representation learning.

Ranked #1 on Robust 3D Semantic Segmentation on nuScenes-C

LIDAR Semantic Segmentation Representation Learning +2

Paper
Code

ReAct: Temporal Action Detection with Relational Queries

1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

Moreover, we propose two losses to facilitate and stabilize the training of action classification.

Ranked #15 on Temporal Action Localization on THUMOS’14

Action Classification Action Detection +4

Paper
Code

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

1 code implementation • 11 Oct 2022 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, DaCheng Tao

One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized change of training loss when adding a perturbation to the weight.

Paper
Code

Segmentation-Aware Image Denoising without Knowing True Segmentation

2 code implementations • 22 May 2019 • Sicheng Wang, Bihan Wen, Junru Wu, DaCheng Tao, Zhangyang Wang

Several recent works discussed application-driven image restoration neural networks, which are capable of not only removing noise in images but also preserving their semantic-aware details, making them suitable for various high-level computer vision tasks as the pre-processing step.

Image Denoising Image Restoration +2

Paper
Code

FAMED-Net: A Fast and Accurate Multi-scale End-to-end Dehazing Network

1 code implementation • 11 Jun 2019 • Jing Zhang, DaCheng Tao

Single image dehazing is a critical image pre-processing step for subsequent high-level computer vision tasks.

Computational Efficiency Image Dehazing +1

Paper
Code

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge

1 code implementation • NeurIPS 2019 • Tingting Qiao, Jing Zhang, Duanqing Xu, DaCheng Tao

Given a text description, we immediately imagine an overall visual impression using this prior and, based on this, we draw a picture by progressively adding more and more details.

Text-to-Image Generation

Paper
Code

Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting

1 code implementation • 12 Nov 2020 • Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, DaCheng Tao, Masashi Sugiyama

Thus it motivates us to design a similar mechanism named artificial neural variability (ANV), which helps artificial neural networks learn some advantages from "natural" neural networks.

Memorization

Paper
Code

An Underwater Image Enhancement Benchmark Dataset and Beyond

1 code implementation • 11 Jan 2019 • Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, DaCheng Tao

In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have the corresponding reference images.

Ranked #5 on Underwater Image Restoration on LSUI (using extra training data)

Image Enhancement Underwater Image Restoration

Paper
Code

RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?

2 code implementations • 24 Nov 2021 • Yufei Xu, Qiming Zhang, Jing Zhang, DaCheng Tao

In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL).

Contrastive Learning

Paper
Code

Video Frame Interpolation without Temporal Priors

1 code implementation • NeurIPS 2020 • Youjian Zhang, Chaoyue Wang, DaCheng Tao

However, in complicated real-world situations, the temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary from different camera sensors.

Optical Flow Estimation Video Frame Interpolation

Paper
Code

One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation

1 code implementation • 5 Jun 2019 • Chenhong Zhou, Changxing Ding, Xinchao Wang, Zhentai Lu, DaCheng Tao

The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation.

Ranked #1 on Brain Tumor Segmentation on BRATS-2015

Brain Tumor Segmentation Image Segmentation +2

Paper
Code

Exposure Trajectory Recovery from Motion Blur

1 code implementation • 6 Oct 2020 • Youjian Zhang, Chaoyue Wang, Stephen J. Maybank, DaCheng Tao

However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of dynamic motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) the motion estimation from a blurry image is highly ill-posed.

Deblurring Image Deblurring +1

Paper
Code

Topology-aware Generalization of Decentralized SGD

1 code implementation • 25 Jun 2022 • Tongtian Zhu, Fengxiang He, Lan Zhang, Zhengyang Niu, Mingli Song, DaCheng Tao

Our theory indicates that the generalizability of D-SGD is positively correlated with the spectral gap, and can explain why consensus control in initial training phase can ensure better generalization.

Paper
Code

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

1 code implementation • 29 Jun 2023 • Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao

Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.

Paper
Code

Pretrained Language Models for Dialogue Generation with Multiple Input Sources

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yu Cao, Wei Bi, Meng Fang, DaCheng Tao

In this work, we study dialogue models with multiple input sources adapted from the pretrained language model GPT2.

Dialogue Generation Language Modelling +1

Paper
Code

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm

1 code implementation • 19 Jun 2022 • Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, DaCheng Tao

Motivated by biological evolution, this paper explains the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derives that both have consistent mathematical formulation.

Image Classification

Paper
Code

Make Landscape Flatter in Differentially Private Federated Learning

1 code implementation • CVPR 2023 • Yifan Shi, Yingqi Liu, Kang Wei, Li Shen, Xueqian Wang, DaCheng Tao

Specifically, DP-FedSAM integrates Sharpness Aware Minimization (SAM) optimizer to generate local flatness models with better stability and weight perturbation robustness, which results in the small norm of local updates and robustness to DP noise, thereby improving the performance.

Federated Learning

Paper
Code

Towards the Flatter Landscape and Better Generalization in Federated Learning under Client-level Differential Privacy

1 code implementation • 1 May 2023 • Yifan Shi, Kang Wei, Li Shen, Yingqi Liu, Xueqian Wang, Bo Yuan, DaCheng Tao

To defend against inference attacks and mitigate sensitive information leakage in Federated Learning (FL), client-level Differentially Private FL (DPFL) is the de facto standard for privacy protection by clipping local updates and adding random noise.

Federated Learning

Paper
Code

Searching for Low-Bit Weights in Quantized Neural Networks

1 code implementation • NeurIPS 2020 • Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, DaCheng Tao, Chang Xu

Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.

Image Classification Quantization +1

Paper
Code

Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection

1 code implementation • CVPR 2021 • Xubin Zhong, Xian Qu, Changxing Ding, DaCheng Tao

In this paper, we propose a novel one-stage method, namely Glance and Gaze Network (GGNet), which adaptively models a set of action-aware points (ActPoints) via glance and gaze steps.

Ranked #17 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection

Paper
Code

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

1 code implementation • CVPR 2022 • Lin Zhang, Li Shen, Liang Ding, DaCheng Tao, Ling-Yu Duan

Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation.

Data-free Knowledge Distillation Federated Learning

Paper
Code

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

1 code implementation • CVPR 2023 • Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao

Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.

Animal Pose Estimation Contrastive Learning

Paper
Code

Upcycling Models under Domain and Category Shift

3 code implementations • CVPR 2023 • Sanqing Qu, Tianpei Zou, Florian Roehrbein, Cewu Lu, Guang Chen, DaCheng Tao, Changjun Jiang

We examine the superiority of our GLC on multiple benchmarks with different category shift scenarios, including partial-set, open-set, and open-partial-set DA.

Ranked #2 on Universal Domain Adaptation on VisDA2017

Clustering Source-Free Domain Adaptation +2

Paper
Code

GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning

2 code implementations • 21 Mar 2024 • Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, DaCheng Tao, Changjun Jiang

GLC++ enhances the novel category clustering accuracy of GLC by 4.3% in open-set scenarios on Office-Home.

Clustering Contrastive Learning +2

Paper
Code

MirrorGAN: Learning Text-to-image Generation by Redescription

2 code implementations • CVPR 2019 • Tingting Qiao, Jing Zhang, Duanqing Xu, DaCheng Tao

Generating an image from a given text description has two goals: visual realism and semantic consistency.

Ranked #8 on Text-to-Image Generation on CUB (Inception score metric)

Sentence Text-to-Image Generation

Paper
Code

Quantum circuit architecture search for variational quantum algorithms

1 code implementation • 20 Oct 2020 • Yuxuan Du, Tao Huang, Shan You, Min-Hsiu Hsieh, DaCheng Tao

Variational quantum algorithms (VQAs) are expected to be a path to quantum advantages on noisy intermediate-scale quantum devices.

Paper
Code

Comprehensive Graph-conditional Similarity Preserving Network for Unsupervised Cross-modal Hashing

1 code implementation • 25 Dec 2020 • Jun Yu, Hao Zhou, Yibing Zhan, DaCheng Tao

Essentially, DGCPN addresses the inaccurate similarity problem by exploring and exploiting the data's intrinsic relationships in a graph.

Quantization Retrieval

Paper
Code

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization

1 code implementation • ICCV 2021 • Yufei Xu, Jing Zhang, DaCheng Tao

However, since the view outside the boundary is not available during warping, the resulting holes around the boundary of the stabilized frame must be discarded (i.e., cropping) to maintain visual consistency, which leads to a tradeoff between stability and cropping ratio.

Video Stabilization

Paper
Code

BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation

1 code implementation • 6 Apr 2022 • Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao

Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.

Clustering Pseudo Label +1

Paper
Code

FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy

1 code implementation • 21 Feb 2023 • Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, DaCheng Tao

Federated learning is an emerging distributed machine learning framework which jointly trains a global model via a large number of local devices with data privacy protections.

Federated Learning

Paper
Code

Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape

1 code implementation • 19 May 2023 • Yan Sun, Li Shen, Shixiang Chen, Liang Ding, DaCheng Tao

In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection.

Federated Learning

Paper
Code

Towards High Performance Human Keypoint Detection

1 code implementation • 3 Feb 2020 • Jing Zhang, Zhe Chen, DaCheng Tao

Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance.

Ranked #5 on Pose Estimation on COCO test-dev

Human Detection Keypoint Detection +1

Paper
Code

Deep Multimodal Neural Architecture Search

1 code implementation • 25 Apr 2020 • Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, DaCheng Tao, Qi Tian

Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to different tasks.

Ranked #19 on Visual Question Answering (VQA) on VQA v2 test-std

Image-text matching Neural Architecture Search +4

Paper
Code

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer

2 code implementations • 4 Apr 2019 • Xinyuan Chen, Chang Xu, Xiaokang Yang, Li Song, DaCheng Tao

We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model.

Style Transfer

Paper
Code

Improving Video Instance Segmentation via Temporal Pyramid Routing

1 code implementation • 28 Jul 2021 • Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.

Instance Segmentation Panoptic Segmentation +2

Paper
Code

FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis

3 code implementations • CVPR 2022 • Yu Feng, Benteng Ma, Jing Zhang, Shanshan Zhao, Yong Xia, DaCheng Tao

However, designing a unified BA method that can be applied to various MIA systems is challenging due to the diversity of imaging modalities (e.g., X-Ray, CT, and MRI) and analysis tasks (e.g., classification, detection, and segmentation).

Artifact Detection Backdoor Attack +6

Paper
Code

CktGNN: Circuit Graph Neural Network for Electronic Design Automation

1 code implementation • 31 Aug 2023 • Zehao Dong, Weidong Cao, Muhan Zhang, DaCheng Tao, Yixin Chen, Xuan Zhang

The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications.

Bayesian Optimization Graph Learning

Paper
Code

Continual Learning on Graphs: Challenges, Solutions, and Opportunities

1 code implementation • 18 Feb 2024 • Xikun Zhang, Dongjin Song, DaCheng Tao

To bridge the gap, we provide a comprehensive review of existing continual graph learning (CGL) algorithms by elucidating the different task settings and categorizing the existing methods based on their characteristics.

Continual Learning Graph Learning

Paper
Code

Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding

1 code implementation • 9 May 2018 • Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, DaCheng Tao

Visual grounding aims to localize an object in an image referred to by a textual query phrase.

Ranked #9 on Phrase Grounding on Flickr30k Entities Test

Phrase Grounding Visual Grounding

Paper
Code

On Positive-Unlabeled Classification in GAN

1 code implementation • CVPR 2020 • Tianyu Guo, Chang Xu, Jiajun Huang, Yunhe Wang, Boxin Shi, Chao Xu, DaCheng Tao

In contrast, it is more reasonable to treat the generated data as unlabeled, which could be positive or negative according to their quality.

Classification General Classification

Paper
Code

Commutative Lie Group VAE for Disentanglement Learning

1 code implementation • 7 Jun 2021 • Xinqi Zhu, Chang Xu, DaCheng Tao

Instead, we propose to encode the data variations with groups, a structure that not only can equivariantly represent variations, but can also be adaptively optimized to preserve the properties of data variations.

Disentanglement

Paper
Code

DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation

1 code implementation • 20 Jul 2021 • Li Gao, Jing Zhang, Lefei Zhang, DaCheng Tao

In addition, feature-level alignment is carried out by aligning the feature maps of the source and target images from student network using a weighted maximum mean discrepancy loss.

Ranked #18 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes

Semantic Segmentation Synthetic-to-Real Translation +1

Paper
Code

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao

Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.

Computational Efficiency

Paper
Code

Centered Weight Normalization in Accelerating Training of Deep Neural Networks

1 code implementation • ICCV 2017 • Lei Huang, Xianglong Liu, Yang Liu, Bo Lang, DaCheng Tao

Training deep neural networks is difficult because of the pathological curvature problem.

Paper
Code

Quantum Geometric Machine Learning for Quantum Circuits and Control

1 code implementation • 19 Jun 2020 • Elija Perrier, Christopher Ferrie, DaCheng Tao

Our results demonstrate how geometric control techniques can be used to both (a) verify the extent to which geometrically synthesised quantum circuits lie along geodesic, and thus time-optimal, routes and (b) synthesise those circuits.

BIG-bench Machine Learning

Paper
Code

Symbiotic Adversarial Learning for Attribute-based Person Search

1 code implementation • ECCV 2020 • Yu-Tong Cao, Jingya Wang, DaCheng Tao

The current state-of-the-art methods either focus on learning better cross-modal embeddings by mining only seen data, or they explicitly use generative adversarial networks (GANs) to synthesize unseen features.

Attribute Person Search

Paper
Code

Auto Learning Attention

1 code implementation • NeurIPS 2020 • Benteng Ma, Jing Zhang, Yong Xia, DaCheng Tao

Attention modules have been demonstrated effective in strengthening the representation ability of a neural network via reweighting spatial or channel features or stacking both operations sequentially.

Image Classification Keypoint Detection +2

Paper
Code

A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis

1 code implementation • COLING 2022 • Bing Wang, Liang Ding, Qihuang Zhong, Ximing Li, DaCheng Tao

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task, which focuses on detecting the sentiment polarity towards the aspect in a sentence.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +4

Paper
Code

Benchmarking Single Image Dehazing and Beyond

1 code implementation • 12 Dec 2017 • Boyi Li, Wenqi Ren, Dengpan Fu, DaCheng Tao, Dan Feng, Wen-Jun Zeng, Zhangyang Wang

We present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE).

Benchmarking Image Dehazing +1

Paper
Code

Boundary-assisted Region Proposal Networks for Nucleus Segmentation

1 code implementation • 4 Jun 2020 • Shengcong Chen, Changxing Ding, DaCheng Tao

Accordingly, in this paper, we devise a Boundary-assisted Region Proposal Network (BRP-Net) that achieves robust instance-level nucleus segmentation.

Boundary Detection Instance Segmentation +3

Paper
Code

Polysemy Deciphering Network for Robust Human-Object Interaction Detection

2 code implementations • 7 Aug 2020 • Xubin Zhong, Changxing Ding, Xian Qu, DaCheng Tao

To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net) that decodes the visual polysemy of verbs for HOI detection in three distinct ways.

Ranked #19 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection Object +1

Paper
Code

Polysemy Deciphering Network for Human-Object Interaction Detection

1 code implementation • ECCV 2020 • Xubin Zhong, Changxing Ding, Xian Qu, DaCheng Tao

First, PD-Net augments human pose and spatial features for HOI detection using language priors, enabling the verb classifiers to receive language hints that reduce the intra-class variation of the same verb.

Human-Object Interaction Detection Object +1

Paper
Code

A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

1 code implementation • ACL 2022 • Yu Cao, Wei Bi, Meng Fang, Shuming Shi, DaCheng Tao

To alleviate the above data issues, we propose a data manipulation method, which is model-agnostic to be packed with any persona-based dialogue generation model to improve its performance.

Dialogue Generation

Paper
Code

Toward Real-world Single Image Deraining: A New Benchmark and Beyond

1 code implementation • 11 Jun 2022 • Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, DaCheng Tao

To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of $1,120$ high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.

Domain Generalization Image Restoration +2

Paper
Code
