API-Net: Robust Generative Classifier via a Single Discriminator

1 code implementation ECCV 2020 Xinshuai Dong, Hong Liu, Rongrong Ji, Liujuan Cao, Qixiang Ye, Jianzhuang Liu, Qi Tian

On the contrary, a discriminative classifier only models the conditional distribution of labels given inputs, but benefits from effective optimization owing to its succinct structure.

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

no code implementations 19 May 2022 Xiaosong Zhang, Feng Liu, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

However, except for the backbone networks, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders the consistency between detectors and pre-trained models.

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study

1 code implementation 17 Apr 2022 Gen Luo, Yiyi Zhou, Jiamu Sun, Shubin Huang, Xiaoshuai Sun, Qixiang Ye, Yongjian Wu, Rongrong Ji

But the most encouraging finding is that with much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e.g., UNITER and VILLA, portraying the special role of REC in existing V&L research.

Object Localization under Single Coarse Point Supervision

1 code implementation 17 Mar 2022 Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li, Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han

In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points.

Global2Local: A Joint-Hierarchical Attention for Video Captioning

no code implementations 13 Mar 2022 Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu

Recently, automatic video captioning has attracted increasing attention, where the core challenge lies in capturing the key semantic items, like objects and actions as well as their spatial-temporal correlations from the redundant frames and semantic content.

P2P-Loc: Point to Point Tiny Person Localization

no code implementations 31 Dec 2021 Xuehui Yu, Di Wu, Qixiang Ye, Jianbin Jiao, Zhenjun Han

As a result, we propose a point self-refinement approach that iteratively updates point annotations in a self-paced way.

Exploring Complicated Search Spaces with Interleaving-Free Sampling

no code implementations 5 Dec 2021 Yunjie Tian, Lingxi Xie, Jiemin Fang, Jianbin Jiao, Qixiang Ye, Qi Tian

In this paper, we build the search algorithm upon a complicated search space with long-distance connections, and show that existing weight-sharing search algorithms mostly fail due to the existence of interleaved connections.

Feature-Gate Coupling for Dynamic Network Pruning

1 code implementation 29 Nov 2021 Mengnan Shi, Chang Liu, Qixiang Ye, Jianbin Jiao

Gating modules have been widely explored in dynamic network pruning to reduce the run-time computational cost of deep neural networks while preserving the representation of features.

Semantic-Aware Generation for Self-Supervised Visual Representation Learning

1 code implementation 25 Nov 2021 Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei Huang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features.

Long-tailed Distribution Adaptation

1 code implementation 6 Oct 2021 Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye

We propose to jointly optimize empirical risks of the unbalanced and balanced domains and approximate their domain divergence by intra-class and inter-class distances, with the aim to adapt models trained on the long-tailed distribution to general distributions in an interpretable way.

GraFormer: Graph Convolution Transformer for 3D Pose Estimation

2 code implementations 17 Sep 2021 Weixi Zhao, Yunjie Tian, Qixiang Ye, Jianbin Jiao, Weiqiang Wang

Exploiting relations among 2D joints plays a crucial role yet remains semi-developed in 2D-to-3D pose estimation.

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation 23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Rethinking Sampling Strategies for Unsupervised Person Re-identification

2 code implementations 7 Jul 2021 Xumeng Han, Xuehui Yu, Guorong Li, Jian Zhao, Gang Pan, Qixiang Ye, Jianbin Jiao, Zhenjun Han

Inspired by that, a simple yet effective approach is proposed, known as group sampling, which gathers groups of samples from the same class into a mini-batch.
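
The idea of gathering same-class samples into a mini-batch can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `group_sampling_order`, the `group_size` parameter, and the chunk-then-shuffle strategy are assumptions made for illustration, and `labels` stands in for whatever class labels or pseudo-labels are available.

```python
import random
from collections import defaultdict


def group_sampling_order(labels, group_size, seed=0):
    """Return sample indices ordered so that each consecutive group shares a label.

    Hypothetical helper: chunking each class into fixed-size groups and then
    shuffling the groups is an assumption, not the published algorithm.
    """
    rng = random.Random(seed)

    # Collect the indices of the samples that belong to each class.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    groups = []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Split each class into chunks of at most `group_size` samples.
        for start in range(0, len(indices), group_size):
            groups.append(indices[start:start + group_size])

    # Randomize the order of the groups, not the contents of each group.
    rng.shuffle(groups)
    return [idx for group in groups for idx in group]


labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
order = group_sampling_order(labels, group_size=2)
```

Feeding `order` to a data loader with a batch size that is a multiple of `group_size` would yield mini-batches made up of same-class groups.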

Cogradient Descent for Dependable Learning

no code implementations 20 Jun 2021 Runqi Wang, Baochang Zhang, Li'an Zhuo, Qixiang Ye, David Doermann

Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative.

Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation

1 code implementation CVPR 2021 Binghao Liu, Yao Ding, Jianbin Jiao, Xiangyang Ji, Qixiang Ye

Encouraging progress in few-shot semantic segmentation has been made by leveraging features learned upon base classes with sufficient training data to represent novel classes with few-shot examples.

Conformer: Local Features Coupling Global Representations for Visual Recognition

3 code implementations ICCV 2021 Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, YaoWei Wang, Jianbin Jiao, Qixiang Ye

Within a Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but have difficulty capturing global representations.

Multiple instance active learning for object detection

1 code implementation CVPR 2021 Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye

Despite the substantial progress of active learning for image recognition, there is still no instance-level active learning method designed specifically for object detection.

Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning

no code implementations 6 Apr 2021 Boyu Yang, Mingbao Lin, Binghao Liu, Mengying Fu, Chang Liu, Rongrong Ji, Qixiang Ye

By tentatively expanding network nodes, LEC-Net enlarges the representation capacity of features, alleviating feature drift of old network from the perspective of model regularization.

Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection

2 code implementations CVPR 2021 Bohao Li, Boyu Yang, Chang Liu, Feng Liu, Rongrong Ji, Qixiang Ye

Few-shot object detection has made substantial progress by representing novel class objects using the feature representation learned upon a set of base class objects.

Harmonic Feature Activation for Few-Shot Semantic Segmentation

1 code implementation IEEE Transactions on Image Processing 2021 Binghao Liu, Jianbin Jiao, Qixiang Ye

HFA is formulated as a bilinear model, which takes charge of the pixel-wise dense correlation (bilinear feature activation) between query and support images in a systematic way.

Network Pruning using Adaptive Exemplar Filters

1 code implementation 20 Jan 2021 Mingbao Lin, Rongrong Ji, Shaojie Li, Yan Wang, Yongjian Wu, Feiyue Huang, Qixiang Ye

Inspired by the face recognition community, we use a message passing algorithm Affinity Propagation on the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters.

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

no code implementations ICCV 2021 Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao

Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.

Towards Spatio-Temporal Video Scene Text Detection via Temporal Clustering

no code implementations 19 Nov 2020 Yuanqiang Cai, Chang Liu, Weiqiang Wang, Qixiang Ye

With only bounding-box annotations in the spatial domain, existing video scene text detection (VSTD) benchmarks lack temporal relation of text instances among video frames, which hinders the development of video text-related applications.

The 1st Tiny Object Detection Challenge: Methods and Results

1 code implementation 16 Sep 2020 Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi

The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection.

Component Divide-and-Conquer for Real-World Image Super-Resolution

1 code implementation ECCV 2020 Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, WangMeng Zuo, Liang Lin

An SR model learned with a conventional pixel-wise loss is usually dominated by flat regions and edges, and fails to infer realistic details of complex textures.

Discretization-Aware Architecture Search

1 code implementation 7 Jul 2020 Yunjie Tian, Chang Liu, Lingxi Xie, Jianbin Jiao, Qixiang Ye

The search cost of neural architecture search (NAS) has been largely reduced by weight-sharing methods.

Progressive Cluster Purification for Unsupervised Feature Learning

no code implementations 6 Jul 2020 Yifei Zhang, Chang Liu, Yu Zhou, Wei Wang, Weiping Wang, Qixiang Ye

In this work, we propose a novel clustering based method, which, by iteratively excluding class inconsistent samples during progressive cluster formation, alleviates the impact of noise samples in a simple-yet-effective manner.

Multiple Expert Brainstorming for Domain Adaptive Person Re-identification

2 code implementations ECCV 2020 Yunpeng Zhai, Qixiang Ye, Shijian Lu, Mengxi Jia, Rongrong Ji, Yonghong Tian

Often the best-performing deep neural models are ensembles of multiple base-level networks; nevertheless, ensemble learning with respect to domain adaptive person re-ID remains unexplored.

Domain Contrast for Domain Adaptive Object Detection

no code implementations 26 Jun 2020 Feng Liu, Xiaoxong Zhang, Fang Wan, Xiangyang Ji, Qixiang Ye

We present Domain Contrast (DC), a simple yet effective approach inspired by contrastive learning for training domain adaptive detectors.

iffDetector: Inference-aware Feature Filtering for Object Detection

1 code implementation 23 Jun 2020 Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann

In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.

Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

1 code implementation 20 Jun 2020 Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye

The generative perception model acts as a feature decoder to focus on comprehending high temporal resolution and short-term representation by introducing a motion-attention mechanism.

Cogradient Descent for Bilinear Optimization

no code implementations CVPR 2020 Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji

Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.

Rethinking Performance Estimation in Neural Architecture Search

1 code implementation CVPR 2020 Xiawu Zheng, Rongrong Ji, Qiang Wang, Qixiang Ye, Zhenguo Li, Yonghong Tian, Qi Tian

In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.

Architecture Disentanglement for Deep Neural Networks

1 code implementation ICCV 2021 Jie Hu, Liujuan Cao, Qixiang Ye, Tong Tong, Shengchuan Zhang, Ke Li, Feiyue Huang, Rongrong Ji, Ling Shao

Based on the experimental results, we present three new findings that provide fresh insights into the inner logic of DNNs.

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

no code implementations 19 Mar 2020 Zongxian Li, Qixiang Ye, Chong Zhang, Jingjing Liu, Shijian Lu, Yonghong Tian

In this work, we propose a Self-Guided Adaptation (SGA) model, aimed at aligning feature representation and transferring object detection models across domains while considering the instantaneous alignment difficulty.

Filter Sketch for Network Pruning

1 code implementation 23 Jan 2020 Mingbao Lin, Liujuan Cao, Shaojie Li, Qixiang Ye, Yonghong Tian, Jianzhuang Liu, Qi Tian, Rongrong Ji

Our approach, referred to as FilterSketch, encodes the second-order information of pre-trained weights, which enables the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure.

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

1 code implementation 2 Jan 2020 Dezhao Luo, Chang Liu, Yu Zhou, Dongbao Yang, Can Ma, Qixiang Ye, Weiping Wang

As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning.

Scale Match for Tiny Person Detection

1 code implementation 23 Dec 2019 Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han

In this paper, we introduce a new benchmark, referred to as TinyPerson, opening up a promising direction for tiny object detection in a long distance and with massive backgrounds.

Multiple Anchor Learning for Visual Object Detection

3 code implementations CVPR 2020 Wei Ke, Tianliang Zhang, Zeyi Huang, Qixiang Ye, Jianzhuang Liu, Dong Huang

In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector.

SPSTracker: Sub-Peak Suppression of Response Map for Robust Object Tracking

1 code implementation 2 Dec 2019 Qintao Hu, Lijun Zhou, Xiaoxiao Wang, Yao Mao, Jianlin Zhang, Qixiang Ye

Modern visual trackers usually construct online learning models under the assumption that the feature response has a Gaussian distribution with target-centered peak response.

FreeAnchor: Learning to Match Anchors for Visual Object Detection

3 code implementations NeurIPS 2019 Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye

In this study, we propose a learning-to-match approach to break IoU restriction, allowing objects to match anchors in a flexible manner.
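
For context, the IoU restriction mentioned above refers to the common hand-crafted practice of assigning anchors to an object by thresholding their intersection over union (IoU). The sketch below illustrates that conventional rule only, not the proposed learning-to-match approach; the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero so that disjoint boxes have an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_by_iou(anchors, gt_box, threshold=0.5):
    """Indices of anchors whose IoU with the ground-truth box meets the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return [i for i, anchor in enumerate(anchors) if iou(anchor, gt_box) >= threshold]


anchors = [(0, 0, 2, 2), (0, 0, 4, 4), (10, 10, 12, 12)]
matched = assign_by_iou(anchors, gt_box=(0, 0, 3, 3))
```

Under this fixed rule an anchor is either matched or not based on geometry alone, which is the rigidity that a learned matching scheme relaxes.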

Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

1 code implementation 29 Apr 2019 Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji

In particular, AGUIT offers a two-fold benefit: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data, and then reconstructing the unlabeled data by a cycle consistency operation.

C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection

1 code implementation CVPR 2019 Fang Wan, Chang Liu, Wei Ke, Xiangyang Ji, Jianbin Jiao, Qixiang Ye

Weakly supervised object detection (WSOD) is a challenging task when provided with image category supervision but required to simultaneously learn object locations and object detectors.

Towards Optimal Structured CNN Pruning via Generative Adversarial Learning

1 code implementation CVPR 2019 Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann

In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner.

Min-Entropy Latent Model for Weakly Supervised Object Detection

1 code implementation CVPR 2018 Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye

Weakly supervised object detection is a challenging task when provided with image category supervision but required to learn, at the same time, object locations and object detectors.

SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images

2 code implementations 2 Jan 2019 Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye

In particular, the advantage of CHR is more significant in the scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection.

Similarity-preserving Image-image Domain Adaptation for Person Re-identification

no code implementations 26 Nov 2018 Weijian Deng, Liang Zheng, Qixiang Ye, Yi Yang, Jianbin Jiao

It first preserves two types of unsupervised similarity, namely, self-similarity of an image before and after translation, and domain-dissimilarity of a translated source image and a target image.

Linear Span Network for Object Skeleton Detection

no code implementations ECCV 2018 Chang Liu, Wei Ke, Fei Qin, Qixiang Ye

Hinted by this, we formalize a Linear Span framework, and propose Linear Span Network (LSN) modified by Linear Span Units (LSUs), which minimize the reconstruction error of the convolutional network.

SRN: Side-output Residual Network for Object Reflection Symmetry Detection and Beyond

1 code implementation 17 Jul 2018 Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

The end-to-end deep learning approach, referred to as a side-output residual network (SRN), leverages the output residual units (RUs) to fit the errors between the object ground-truth symmetry and the side-outputs of multiple stages.

Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification

2 code implementations CVPR 2018 Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, Jianbin Jiao

To this end, we propose to preserve two types of unsupervised similarities, 1) self-similarity of an image before and after translation, and 2) domain-dissimilarity of a translated source image and a target image.

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild

1 code implementation CVPR 2017 Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

By stacking RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple scales to ease the problems of fitting complex outputs with limited layers, suppressing the complex backgrounds, and effectively matching object symmetry of different scales.

A Graphical Social Topology Model for Multi-Object Tracking

no code implementations 14 Feb 2017 Shan Gao, Xiaogang Chen, Qixiang Ye, Junliang Xing, Arjan Kuijper, Xiangyang Ji

Inspired by the social affinity property of moving objects, we propose a Graphical Social Topology (GST) model, which estimates the group dynamics by jointly modeling the group structure and the states of objects using a topological representation.

Oriented Response Networks

1 code implementation CVPR 2017 Yanzhao Zhou, Qixiang Ye, Qiang Qiu, Jianbin Jiao

DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks.

Self-learning Scene-specific Pedestrian Detectors using a Progressive Latent Model

no code implementations CVPR 2017 Qixiang Ye, Tianliang Zhang, Qiang Qiu, Baochang Zhang, Jie Chen, Guillermo Sapiro

In this paper, a self-learning approach is proposed towards solving the scene-specific pedestrian detection problem without any human annotation involved.

A scalable convolutional neural network for task-specified scenarios via knowledge distillation

no code implementations 19 Sep 2016 Mengnan Shi, Fei Qin, Qixiang Ye, Zhenjun Han, Jianbin Jiao

In this paper, we explore the redundancy in convolutional neural networks, which scales with the complexity of vision tasks.

