Search Results for author: Yongjian Wu

Found 42 papers, 24 papers with code

Learning Best Combination for Efficient N:M Sparsity

1 code implementation • 14 Jun 2022 • Yuxin Zhang, Mingbao Lin, Zhihang Lin, Yiting Luo, Ke Li, Fei Chao, Yongjian Wu, Rongrong Ji

In this paper, we show that the N:M learning can be naturally characterized as a combinatorial problem which searches for the best combination candidate within a finite collection.
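
A minimal sketch of the combinatorial view described above, assuming a plain magnitude score instead of the learned combination scores of this paper: every block of M consecutive weights keeps the N entries that form the best-scoring combination, which for magnitudes reduces to a per-block top-k. The helper name and shapes below are illustrative only.

    import torch

    def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
        # Group the weights into blocks of m and keep the n largest-magnitude
        # entries per block -- the magnitude-based special case of choosing the
        # best combination candidate within each finite block.
        blocks = weight.reshape(-1, m)
        idx = blocks.abs().topk(n, dim=1).indices
        mask = torch.zeros_like(blocks).scatter_(1, idx, 1.0)
        return (blocks * mask).reshape(weight.shape)

    w = torch.randn(8, 16)
    w_24 = nm_prune(w, n=2, m=4)  # each block of 4 weights retains exactly 2 nonzeros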

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study

1 code implementation • 17 Apr 2022 • Gen Luo, Yiyi Zhou, Jiamu Sun, Shubin Huang, Xiaoshuai Sun, Qixiang Ye, Yongjian Wu, Rongrong Ji

But the most encouraging finding is that, with much less training overhead and fewer parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e.g., UNITER and VILLA, portraying the special role of REC in existing V&L research.

Data Augmentation • Referring Expression • +1

Global2Local: A Joint-Hierarchical Attention for Video Captioning

no code implementations • 13 Mar 2022 • Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu

Recently, automatic video captioning has attracted increasing attention, where the core challenge lies in capturing the key semantic items, like objects and actions as well as their spatial-temporal correlations from the redundant frames and semantic content.

Video Captioning

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension

no code implementations • 12 Mar 2022 • Fuhai Chen, Xiaoshuai Sun, Xuri Ge, Jianzhuang Liu, Yongjian Wu, Feiyue Huang, Rongrong Ji

In particular, based on the visual and textual semantic features, RMSL conducts an adaptive learning cycle upon triplet ranking, where (1) the target-negative region-expression pairs with low within-group relevances are used preferentially in model training to distinguish the primary semantics of the target objects, and (2) an across-group relevance regularization is integrated into model training to balance the bias of group priority.

Referring Expression • Referring Expression Comprehension

Coarse-to-Fine Vision Transformer

1 code implementation • 8 Mar 2022 • Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Our proposed CF-ViT is motivated by two important observations in modern ViT models: (1) The coarse-grained patch splitting can locate informative regions of an input image.
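
A rough sketch of the coarse-to-fine patch splitting observation, assuming patch variance as a stand-in informativeness score; CF-ViT itself locates informative regions differently, so every name and size here is hypothetical.

    import torch

    def coarse_to_fine_patches(img: torch.Tensor, coarse: int = 32, fine: int = 16, keep: int = 8):
        # Split the image into coarse patches, score them, and re-split only
        # the most "informative" ones into finer patches.
        c = img.shape[0]
        patches = img.unfold(1, coarse, coarse).unfold(2, coarse, coarse)
        patches = patches.reshape(c, -1, coarse, coarse).permute(1, 0, 2, 3)
        scores = patches.float().var(dim=(1, 2, 3))        # placeholder informativeness score
        fine_patches = []
        for i in scores.topk(keep).indices:
            p = patches[i].unfold(1, fine, fine).unfold(2, fine, fine)
            fine_patches.append(p.reshape(c, -1, fine, fine).permute(1, 0, 2, 3))
        return patches, torch.cat(fine_patches, dim=0)

    coarse_p, fine_p = coarse_to_fine_patches(torch.randn(3, 224, 224))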

Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks

1 code implementation • 8 Mar 2022 • Yunshan Zhong, Mingbao Lin, Xunchao Li, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

However, these methods suffer from severe performance degradation when quantizing the SR models to ultra-low precision (e.g., 2-bit and 3-bit) with the low-cost layer-wise quantizer.

Quantization • Super-Resolution
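
For context, a minimal sketch of the low-cost layer-wise (per-tensor) quantizer that the sentence above refers to, at 2-bit symmetric precision; this is the baseline setting where the degradation appears, not the paper's DDTB scheme with dynamic trainable bounds. Names are illustrative.

    import torch

    def layerwise_uniform_quant(x: torch.Tensor, bits: int = 2) -> torch.Tensor:
        # One scale for the whole tensor; at 2 bits only four levels remain,
        # which is where the severe super-resolution degradation comes from.
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max() / qmax
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale

    x = torch.randn(1, 64, 32, 32)
    err = (x - layerwise_uniform_quant(x, bits=2)).abs().mean()  # 2-bit quantization error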

Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need

1 code implementation • 30 Jan 2022 • Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Zihan Xu, Fei Chao, Yunhan Shen, Ke Li, Yongjian Wu, Rongrong Ji

We prove that supermask training amounts to accumulating the weight gradients and can partly solve the independence paradox.

Towards Language-guided Visual Recognition via Dynamic Convolutions

no code implementations • 17 Oct 2021 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Xinghao Ding, Yongjian Wu, Feiyue Huang, Yue Gao, Rongrong Ji

Based on the LaConv module, we further build the first fully language-driven convolution network, termed LaConvNet, which can unify visual recognition and multi-modal reasoning in one forward structure.

Question Answering • Referring Expression • +3
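
A toy sketch of a language-driven dynamic convolution in the spirit of the LaConv module, assuming a simple linear kernel generator and per-sample kernels; the actual LaConv/LaConvNet design is more elaborate, so treat every name and shape below as hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LanguageConditionedConv(nn.Module):
        def __init__(self, channels: int, lang_dim: int, k: int = 3):
            super().__init__()
            self.channels, self.k = channels, k
            # A small generator turns the sentence embedding into conv kernels.
            self.kernel_gen = nn.Linear(lang_dim, channels * channels * k * k)

        def forward(self, feats: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
            out = []
            for f, l in zip(feats, lang):  # per-sample, language-generated kernels
                w = self.kernel_gen(l).view(self.channels, self.channels, self.k, self.k)
                out.append(F.conv2d(f.unsqueeze(0), w, padding=self.k // 2))
            return torch.cat(out, dim=0)

    layer = LanguageConditionedConv(channels=32, lang_dim=300)
    y = layer(torch.randn(2, 32, 16, 16), torch.randn(2, 300))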

OMPQ: Orthogonal Mixed Precision Quantization

1 code implementation • 16 Sep 2021 • Yuexiao Ma, Taisong Jin, Xiawu Zheng, Yan Wang, Huixia Li, Yongjian Wu, Yunsheng Wu, Guannan Jiang, Wei zhang, Rongrong Ji

Instead of solving the original integer programming problem, we propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer program yet easy to optimize with linear programming.

AutoML • Quantization

Fine-grained Data Distribution Alignment for Post-Training Quantization

1 code implementation • 9 Sep 2021 • Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Feiyue Huang, Rongrong Ji

To alleviate this limitation, in this paper, we leverage the synthetic data introduced by zero-shot quantization together with the calibration dataset, and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.

Quantization

Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification

no code implementations • CVPR 2021 • Qiong Wu, Pingyang Dai, Jie Chen, Chia-Wen Lin, Yongjian Wu, Feiyue Huang, Bineng Zhong, Rongrong Ji

In this paper, we propose a joint Modality and Pattern Alignment Network (MPANet) to discover cross-modality nuances in different patterns for visible-infrared person Re-ID, which introduces a modality alleviation module and a pattern alignment module to jointly extract discriminative features.

Person Re-Identification

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words

1 code implementation • CVPR 2021 • Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji

Then, we build a BERT-based language model to extract language context and propose an Adaptive-Attention (AA) module on top of a transformer decoder to adaptively measure the contribution of visual and language cues before making decisions for word prediction.

Image Captioning • Language Modelling • +2
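
A small sketch of the adaptive weighting idea, assuming a single sigmoid gate that mixes a visual context vector with a language context vector before word prediction; it is a stand-in for the Adaptive-Attention (AA) module, not its actual implementation.

    import torch
    import torch.nn as nn

    class AdaptiveGate(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.gate = nn.Linear(2 * dim, 1)

        def forward(self, visual_ctx: torch.Tensor, lang_ctx: torch.Tensor) -> torch.Tensor:
            # A scalar in [0, 1] decides how much the next word relies on
            # visual features versus the language context.
            beta = torch.sigmoid(self.gate(torch.cat([visual_ctx, lang_ctx], dim=-1)))
            return beta * visual_ctx + (1 - beta) * lang_ctx

    fused = AdaptiveGate(512)(torch.randn(4, 512), torch.randn(4, 512))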

HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping

1 code implementation • 18 Jun 2021 • YuHan Wang, Xu Chen, Junwei Zhu, Wenqing Chu, Ying Tai, Chengjie Wang, Jilin Li, Yongjian Wu, Feiyue Huang, Rongrong Ji

In this work, we propose a high fidelity face swapping method, called HifiFace, which can well preserve the face shape of the source face and generate photo-realistic results.

3D Face Reconstruction • Face Recognition • +1

Carrying out CNN Channel Pruning in a White Box

1 code implementation • 24 Apr 2021 • Yuxin Zhang, Mingbao Lin, Chia-Wen Lin, Jie Chen, Feiyue Huang, Yongjian Wu, Yonghong Tian, Rongrong Ji

Specifically, to model the contribution of each channel to differentiating categories, we develop a class-wise mask for each channel, implemented in a dynamic training manner w.r.t.

Image Classification

Lottery Jackpots Exist in Pre-trained Models

2 code implementations • 18 Apr 2021 • Yuxin Zhang, Mingbao Lin, Fei Chao, Yan Wang, Ke Li, Yunhang Shen, Yongjian Wu, Rongrong Ji

In this paper, we show that high-performing and sparse sub-networks without the involvement of weight tuning, termed "lottery jackpots", exist in pre-trained models with unexpanded width.

Network Pruning
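
A minimal sketch of applying a sparsity mask to frozen pre-trained weights, assuming a simple global-magnitude criterion; the paper instead searches for such masks ("lottery jackpots") rather than deriving them from magnitudes, so the helper below is purely illustrative.

    import torch
    import torch.nn as nn

    def magnitude_mask(layer: nn.Linear, sparsity: float = 0.9) -> torch.Tensor:
        # Keep the largest-magnitude weights of a frozen, pre-trained layer.
        w = layer.weight.detach()
        k = int(w.numel() * (1 - sparsity))
        threshold = w.abs().flatten().topk(k).values.min()
        return (w.abs() >= threshold).float()

    layer = nn.Linear(256, 128)
    mask = magnitude_mask(layer, sparsity=0.9)
    with torch.no_grad():
        layer.weight.mul_(mask)  # weights keep their pre-trained values, only masked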

Distilling a Powerful Student Model via Online Knowledge Distillation

1 code implementation • 26 Mar 2021 • Shaojie Li, Mingbao Lin, Yan Wang, Yongjian Wu, Yonghong Tian, Ling Shao, Rongrong Ji

Besides, a self-distillation module is adopted to convert the feature map of deeper layers into a shallower one.

Knowledge Distillation

Image-to-image Translation via Hierarchical Style Disentanglement

1 code implementation • CVPR 2021 • Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji

Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks.

Disentanglement • Multimodal Unsupervised Image-To-Image Translation • +1

Network Pruning using Adaptive Exemplar Filters

1 code implementation • 20 Jan 2021 • Mingbao Lin, Rongrong Ji, Shaojie Li, Yan Wang, Yongjian Wu, Feiyue Huang, Qixiang Ye

Inspired by the face recognition community, we use the message-passing algorithm Affinity Propagation on the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters.

Face Recognition • Network Pruning
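
A hedged sketch of the exemplar idea using scikit-learn's AffinityPropagation on flattened filter weights; the helper below is a stand-in for the paper's pipeline, not its actual code, and the layer shape is made up.

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    def exemplar_filters(conv_weight: np.ndarray) -> np.ndarray:
        # conv_weight: (out_channels, in_channels, k, k). Affinity Propagation
        # picks the number of exemplars by itself, so no pruning rate is set.
        flat = conv_weight.reshape(conv_weight.shape[0], -1)
        ap = AffinityPropagation(damping=0.9, random_state=0).fit(flat)
        return ap.cluster_centers_indices_  # indices of the filters to preserve

    weights = np.random.randn(64, 32, 3, 3)
    keep = exemplar_filters(weights)
    print(f"kept {len(keep)} of {weights.shape[0]} filters")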

Dual-Level Collaborative Transformer for Image Captioning

1 code implementation • 16 Jan 2021 • Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji

Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning.

Image Captioning • object-detection • +1

Aha! Adaptive History-Driven Attack for Decision-Based Black-Box Models

no code implementations • ICCV 2021 • Jie Li, Rongrong Ji, Peixian Chen, Baochang Zhang, Xiaopeng Hong, Ruixin Zhang, Shaoxin Li, Jilin Li, Feiyue Huang, Yongjian Wu

A common practice is to start from a large perturbation and then iteratively reduce it with a deterministic direction and a random one while keeping it adversarial.

Dimensionality Reduction
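
A bare-bones sketch of the common practice described above, assuming `model` is a placeholder black-box classifier that returns only a predicted label: start from a large adversarial example and repeatedly shrink the perturbation with a deterministic step toward the clean image plus a random step, accepting a candidate only if it stays adversarial. The paper's history-driven sampling is not shown.

    import numpy as np

    def shrink_adversarial(model, x_clean, x_adv, steps=200, eps=0.05, seed=0):
        rng = np.random.default_rng(seed)
        clean_label = model(x_clean)
        for _ in range(steps):
            toward_clean = 0.05 * (x_clean - x_adv)             # deterministic shrinking direction
            random_dir = eps * rng.normal(size=x_clean.shape)   # random exploration direction
            candidate = np.clip(x_adv + toward_clean + random_dir, 0.0, 1.0)
            if model(candidate) != clean_label:                 # keep it only if still adversarial
                x_adv = candidate
        return x_adv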

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

1 code implementation • 13 Dec 2020 • Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji

The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.

Image Captioning

UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection

1 code implementation • NeurIPS 2020 • Yunhang Shen, Rongrong Ji, Zhiwei Chen, Yongjian Wu, Feiyue Huang

In this paper, we propose a unified WSOD framework, termed UWSOD, to develop a high-capacity general detection model with only image-level labels, which is self-contained and does not require external modules or additional supervision.

object-detection • Object Proposal Generation • +1

Rotated Binary Neural Network

2 code implementations • NeurIPS 2020 • Mingbao Lin, Rongrong Ji, Zihan Xu, Baochang Zhang, Yan Wang, Yongjian Wu, Feiyue Huang, Chia-Wen Lin

In this paper, for the first time, we explore the influence of angular bias on the quantization error and then introduce a Rotated Binary Neural Network (RBNN), which considers the angle alignment between the full-precision weight vector and its binarized version.

Binarization • Quantization
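
A small sketch of the angular bias the sentence refers to: the angle between a full-precision weight vector and its scaled-sign binarization. The rotation RBNN learns in order to reduce this angle is omitted; the function below is only an illustration.

    import torch

    def binarize_with_angle(w: torch.Tensor):
        b = w.sign() * w.abs().mean()                 # scaled sign binarization
        cos = torch.dot(w, b) / (w.norm() * b.norm())
        angle = torch.acos(cos.clamp(-1.0, 1.0))      # angular bias in radians
        return b, angle

    w = torch.randn(1024)
    b, angle = binarize_with_angle(w)
    print(f"angular bias: {angle.item():.3f} rad")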

Channel Pruning via Automatic Structure Search

1 code implementation • 23 Jan 2020 • Mingbao Lin, Rongrong Ji, Yuxin Zhang, Baochang Zhang, Yongjian Wu, Yonghong Tian

In this paper, we propose a new channel pruning method based on the artificial bee colony algorithm (ABC), dubbed ABCPruner, which aims to efficiently find the optimal pruned structure, i.e., the channel number in each layer, rather than selecting "important" channels as previous works did.

Variational Structured Semantic Inference for Diverse Image Captioning

no code implementations • NeurIPS 2019 • Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang

To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.

Image Captioning

Semi-Supervised Adversarial Monocular Depth Estimation

no code implementations • 6 Aug 2019 • Rongrong Ji, Ke Li, Yan Wang, Xiaoshuai Sun, Feng Guo, Xiaowei Guo, Yongjian Wu, Feiyue Huang, Jiebo Luo

In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available.

Monocular Depth Estimation

Interpretable Neural Network Decoupling

no code implementations • ECCV 2020 • Yuchao Li, Rongrong Ji, Shaohui Lin, Baochang Zhang, Chenqian Yan, Yongjian Wu, Feiyue Huang, Ling Shao

More specifically, we introduce a novel architecture controlling module in each layer to encode the network architecture by a vector.

Dynamic Distribution Pruning for Efficient Network Architecture Search

1 code implementation • 28 May 2019 • Xiawu Zheng, Rongrong Ji, Lang Tang, Yan Wan, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao

The search space is dynamically pruned every few epochs to update this distribution, and the optimal neural architecture is obtained when only one structure remains.

Neural Architecture Search

Towards Optimal Discrete Online Hashing with Balanced Similarity

1 code implementation • 29 Jan 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Yongjian Wu, Yunsheng Wu

In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework.

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

1 code implementation • CVPR 2019 • Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji

The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance in a feature-agnostic manner to guide model compression.

Model Compression
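
A loose, feature-agnostic sketch that combines a kernel-sparsity term with an entropy-like diversity term per input channel, only to show the shape of such an indicator; the actual KSE formula uses a density-based kernel entropy and differs from this stand-in, and the weighting `alpha` is made up.

    import numpy as np

    def kse_like_indicator(conv_weight: np.ndarray, alpha: float = 0.5) -> np.ndarray:
        # conv_weight: (out_c, in_c, k, k); one score per input channel.
        out_c, in_c = conv_weight.shape[:2]
        scores = np.zeros(in_c)
        for c in range(in_c):
            kernels = conv_weight[:, c].reshape(out_c, -1)
            sparsity = np.abs(kernels).sum()                    # kernel sparsity term
            p = np.linalg.norm(kernels, axis=1)
            p = p / p.sum()
            entropy = -(p * np.log(p + 1e-12)).sum()            # diversity (entropy-like) term
            scores[c] = alpha * sparsity + (1 - alpha) * entropy
        return scores

    scores = kse_like_indicator(np.random.randn(64, 32, 3, 3))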

Temporal Action Detection by Joint Identification-Verification

no code implementations • 19 Oct 2018 • Wen Wang, Yongjian Wu, Haijun Liu, Shiguang Wang, Jian Cheng

Temporal action detection aims at not only recognizing action category but also detecting start time and end time for each action instance in an untrimmed video.

Action Detection

GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints

no code implementations • CVPR 2018 • Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su

In offline optimization, we adopt an end-to-end formulation, which jointly trains the visual tree parser, the structured relevance and diversity constraints, as well as the LSTM based captioning model.

Image Captioning

Cross-Modality Binary Code Learning via Fusion Similarity Hashing

no code implementations • CVPR 2017 • Hong Liu, Rongrong Ji, Yongjian Wu, Feiyue Huang, Baochang Zhang

In this paper, we propose a hashing scheme, termed Fusion Similarity Hashing (FSH), which explicitly embeds the graph-based fusion similarity across modalities into a common Hamming space.

Fast and Accurate Neural Word Segmentation for Chinese

1 code implementation • ACL 2017 • Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang

Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation.

Chinese Word Segmentation • Feature Engineering

Ordinal Constrained Binary Code Learning for Nearest Neighbor Search

no code implementations • 19 Nov 2016 • Hong Liu, Rongrong Ji, Yongjian Wu, Feiyue Huang

Given a large-scale training data set, it is very expensive to embed such ranking tuples in binary code learning.

Small Data Image Classification
