ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

1 code implementation 11 Mar 2023 Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, YaoWei Wang, David A. Clifton

We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.

Image Captioning Machine Translation +5

Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias

no code implementations 1 Mar 2023 Shangxi Wu, Qiuyang He, Fangzhao Wu, Jitao Sang, YaoWei Wang, Changsheng Xu

In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training.

Backdoor Attack Knowledge Distillation

Unsupervised Domain Adaptation via Distilled Discriminative Clustering

1 code implementation 23 Feb 2023 Hui Tang, YaoWei Wang, Kui Jia

Differently, motivated by the fundamental assumption for domain adaptability, we re-cast the domain adaptation problem as discriminative clustering of target data, given strong privileged information provided by the closely related, labeled source data.

Unsupervised Domain Adaptation

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

1 code implementation 20 Feb 2023 Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao, Xiao-Yong Wei, YaoWei Wang, Yonghong Tian, Wen Gao

With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as BERT, ViT, and GPT.

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

1 code implementation 3 Feb 2023 Jiayu Jiao, Yu-Ming Tang, Kun-Yu Lin, Yipeng Gao, Jinhua Ma, YaoWei Wang, Wei-Shi Zheng

In this work, we explore effective Vision Transformers to pursue a preferable trade-off between the computational complexity and size of the attended receptive field.

Instance Segmentation Object Detection +2

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

1 code implementation 31 Dec 2022 Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yugang Jiang, YaoWei Wang, Changsheng Xu

Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.

Data Poisoning

Million-scale Object Detection with Large Vision Model

1 code implementation 19 Dec 2022 Feng Lin, Wenze Hu, YaoWei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang

Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system.

Object Detection

Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference

1 code implementation 29 Nov 2022 Yabin Wang, Zhiheng Ma, Zhiwu Huang, YaoWei Wang, Zhou Su, Xiaopeng Hong

To avoid obvious stage learning bottlenecks, we propose a brand-new stage-isolation based incremental learning framework, which leverages a series of stage-isolated classifiers to perform the learning task of each stage without the interference of others.

Continual Learning Incremental Learning

SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification

no code implementations 28 Nov 2022 Fang Peng, Xiaoshan Yang, Linhui Xiao, YaoWei Wang, Changsheng Xu

Although significant progress has been made in few-shot learning, most existing few-shot image classification methods require supervised pre-training on a large number of samples from base classes, which limits their generalization ability in real-world applications.

Few-Shot Image Classification Few-Shot Learning +2

Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

1 code implementation 20 Nov 2022 Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, YaoWei Wang, Yonghong Tian

In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.

Object Localization Object Tracking

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

1 code implementation 17 Nov 2022 Xiao Wang, Zongzhen Wu, Bo Jiang, Zhimin Bao, Lin Zhu, Guoqi Li, YaoWei Wang, Yonghong Tian

Mainstream human activity recognition (HAR) algorithms are developed for RGB cameras, which suffer from illumination changes, fast motion, privacy concerns, and high energy consumption.

Activity Prediction Human Activity Recognition +1

Spikformer: When Spiking Neural Network Meets Transformer

1 code implementation 29 Sep 2022 Zhaokun Zhou, Yuesheng Zhu, Chao He, YaoWei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

Spikformer (66.3M parameters), with a size comparable to SEW-ResNet-152 (60.2M, 69.26%), can achieve 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.

Image Classification

Learned Distributed Image Compression with Multi-Scale Patch Matching in Feature Domain

no code implementations 6 Sep 2022 Yujun Huang, Bin Chen, Shiyu Qin, Jiawei Li, YaoWei Wang, Tao Dai, Shu-Tao Xia

Specifically, MSFDPM consists of a side information feature extractor, a multi-scale feature domain patch matching module, and a multi-scale feature fusion network.

Image Compression Patch Matching

DAS: Densely-Anchored Sampling for Deep Metric Learning

1 code implementation 30 Jul 2022 Lizhao Liu, Shangxin Huang, Zhuangwei Zhuang, Ran Yang, Mingkui Tan, YaoWei Wang

To this end, we propose a Densely-Anchored Sampling (DAS) scheme that treats each embedding with a corresponding data point as an "anchor" and exploits the anchor's nearby embedding space to densely produce embeddings without data points.

Face Recognition Image Retrieval +2

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations 17 Jun 2022 Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.


Prompt-based Learning for Unpaired Image Captioning

no code implementations 26 May 2022 Peipei Zhu, Xiao Wang, Lin Zhu, Zhenglong Sun, Weishi Zheng, YaoWei Wang, Changwen Chen

Inspired by the success of Vision-Language Pre-Trained Models (VL-PTMs) in this research, we attempt to infer the cross-domain cue information about a given image from the large VL-PTMs for the UIC task.

Image Captioning Question Answering +2

Global-Supervised Contrastive Loss and View-Aware-Based Post-Processing for Vehicle Re-Identification

no code implementations 17 Apr 2022 Zhijun Hu, Yong Xu, Jie Wen, Xianjing Cheng, Zaijun Zhang, Lilei Sun, YaoWei Wang

The proposed VABPP method is the first use of a view-aware-based method as a post-processing method in the field of vehicle re-identification.

Vehicle Re-Identification

Fine-Grained Object Classification via Self-Supervised Pose Alignment

2 code implementations CVPR 2022 Xuhui Yang, YaoWei Wang, Ke Chen, Yong Xu, Yonghong Tian

Semantic patterns of fine-grained objects are determined by subtle appearance differences among local parts, which has inspired a number of part-based methods.

Classification Representation Learning

Boost Test-Time Performance with Closed-Loop Inference

no code implementations 21 Mar 2022 Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Guanghui Xu, Haokun Li, Peilin Zhao, Junzhou Huang, YaoWei Wang, Mingkui Tan

Motivated by this, we propose to predict those hard-classified test samples in a looped manner to boost the model performance.

Auxiliary Learning

Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

1 code implementation 16 Mar 2022 Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, YaoWei Wang, Wen Ji, Wenwu Zhu

For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, which improves time efficiency exponentially compared to iterative search methods.


Peng Cheng Object Detection Benchmark for Smart City

no code implementations 11 Mar 2022 YaoWei Wang, Zhouxin Yang, Rui Liu, Deng Li, Yuandu Lai, Leyuan Fang, Yahong Han

Considering the diversity and complexity of scenes in intelligent city governance, we build a large-scale object detection benchmark for the smart city.

Object Detection

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

no code implementations 7 Mar 2022 Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, YaoWei Wang, Changwen Chen

The image-level labels are utilized to train a weakly-supervised object recognition model to extract object information (e.g., instances) in an image, and the extracted instances are adopted to infer the relationships among different objects based on an enhanced graph neural network (GNN).

Image Captioning Object Recognition

Boosting Crowd Counting via Multifaceted Attention

1 code implementation CVPR 2022 Hui Lin, Zhiheng Ma, Rongrong Ji, YaoWei Wang, Xiaopeng Hong

Secondly, we design the Local Attention Regularization to supervise the training of LRA by minimizing the deviation among the attention for different feature locations.

Crowd Counting

Conceptor Learning for Class Activation Mapping

no code implementations 21 Jan 2022 Guangwu Qian, Zhen-Qun Yang, Xu-Lu Zhang, YaoWei Wang, Qing Li, Xiao-Yong Wei

Class Activation Mapping (CAM) has been widely adopted to generate saliency maps, which provide visual explanations for deep neural networks (DNNs).

Towards End-to-End Image Compression and Analysis with Transformers

1 code implementation 17 Dec 2021 Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, YaoWei Wang, Xiangyang Ji, Wen Gao

Meanwhile, we propose a feature aggregation module to fuse the compressed features with the selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction.

Classification Image Classification +3

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

no code implementations 16 Dec 2021 Rui Liu, Yahong Han, YaoWei Wang, Qi Tian

In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.

Object Detection

Learning to Share in Multi-Agent Reinforcement Learning

2 code implementations 16 Dec 2021 Yuxuan Yi, Ge Li, YaoWei Wang, Zongqing Lu

Inspired by the fact that sharing plays a key role in how humans learn to cooperate, we propose LToS, a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors so as to encourage agents to cooperate on the global objective through collectives.

Multi-agent Reinforcement Learning Reinforcement Learning +1

An Informative Tracking Benchmark

1 code implementation 13 Dec 2021 Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Visual Tracking

Optimized Separable Convolution: Yet Another Efficient Convolution Operator

no code implementations 29 Sep 2021 Tao Wei, Yonghong Tian, YaoWei Wang, Yun Liang, Chang Wen Chen

In this research, we propose a novel and principled operator called optimized separable convolution, which, by optimally designing the internal number of groups and kernel sizes for general separable convolutions, can achieve a complexity of $O(C^{\frac{3}{2}}K)$.

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations CVPR 2022 Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

2 code implementations 11 Aug 2021 Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, YaoWei Wang, Yonghong Tian, Feng Wu

Different from visible cameras which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous and sparse events with much lower latency.

Object Tracking

MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking

2 code implementations 22 Jul 2021 Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, YaoWei Wang, Yonghong Tian, Feng Wu

The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.

RGB-T Tracking

Direct Measure Matching for Crowd Counting

no code implementations 4 Jul 2021 Hui Lin, Xiaopeng Hong, Zhiheng Ma, Xing Wei, Yunfeng Qiu, YaoWei Wang, Yihong Gong

Second, we derive a semi-balanced form of Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching.

Crowd Counting

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations 21 Jun 2021 Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

Learning Scalable $\ell_\infty$-Constrained Near-Lossless Image Compression via Joint Lossy Image and Residual Compression

no code implementations CVPR 2021 Yuanchao Bai, Xianming Liu, WangMeng Zuo, YaoWei Wang, Xiangyang Ji

To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.

Image Compression

Tracking by Joint Local and Global Search: A Target-aware Attention based Approach

1 code implementation 9 Jun 2021 Xiao Wang, Jin Tang, Bin Luo, YaoWei Wang, Yonghong Tian, Feng Wu

In this paper, we propose a novel and general target-aware attention mechanism (termed TANet) and integrate it with a tracking-by-detection framework to conduct joint local and global search for robust tracking.

Object Tracking

Conformer: Local Features Coupling Global Representations for Visual Recognition

4 code implementations ICCV 2021 Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, YaoWei Wang, Jianbin Jiao, Qixiang Ye

Within Convolutional Neural Networks (CNNs), convolution operations are good at extracting local features but have difficulty capturing global representations.

Image Classification Instance Segmentation +4

Anomaly Detection with Prototype-Guided Discriminative Latent Embeddings

no code implementations 30 Apr 2021 Yuandu Lai, Yahong Han, YaoWei Wang

Recent efforts towards video anomaly detection (VAD) try to learn a deep autoencoder to describe normal event patterns with small reconstruction errors.

Anomaly Detection Optical Flow Estimation +1

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations 2 Apr 2021 Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, Ming Tang

In this paper, we introduce an alignment scheme into the Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at the patch level.

Human Parsing Image Classification +3

Learning Scalable $\ell_\infty$-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression

no code implementations 31 Mar 2021 Yuanchao Bai, Xianming Liu, WangMeng Zuo, YaoWei Wang, Xiangyang Ji

To achieve scalable compression with the error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks.

Image Compression

Dynamic Attention guided Multi-Trajectory Analysis for Single Object Tracking

1 code implementation 30 Mar 2021 Xiao Wang, Zhe Chen, Jin Tang, Bin Luo, YaoWei Wang, Yonghong Tian, Feng Wu

In this paper, we propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.

Object Tracking

Classification of Single-View Object Point Clouds

no code implementations 18 Dec 2020 Zelin Xu, Ke Chen, KangJun Liu, Changxing Ding, YaoWei Wang, Kui Jia

Experimental results on the existing ModelNet40 and ScanNet datasets, adapted to the single-view, partial setting, verify the necessity of object pose estimation and the superiority of our PAPNet over existing classifiers.

3D Object Classification 6D Pose Estimation using RGB +5

Modular Graph Attention Network for Complex Visual Relational Reasoning

no code implementations 22 Nov 2020 Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, YaoWei Wang, Qi Wu

Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.

Graph Attention Question Answering +5

