GhostNet: More Features from Cheap Operations

object-detection Object Detection In Aerial Images +2

1,851

Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss

2 code implementations • 28 Jan 2021 • Xue Yang, Junchi Yan, Qi Ming, Wentao Wang, Xiaopeng Zhang, Qi Tian

Boundary discontinuity and its inconsistency to the final detection metric have been the bottleneck for rotating detection regression loss design.
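The Gaussian Wasserstein distance named in the title can be sketched as follows. This is a minimal illustration, assuming the usual construction in which a rotated box (cx, cy, w, h, theta) is mapped to a 2-D Gaussian with mean (cx, cy) and covariance R diag(w^2/4, h^2/4) R^T. The function names are hypothetical, and the raw squared distance is shown rather than any normalized loss used for training.

```python
import math

def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to the mean and covariance of a 2-D Gaussian.

    Assumed construction: Sigma = R diag(w^2/4, h^2/4) R^T.
    Returns (mean, (sxx, sxy, syy)) with the three distinct covariance entries.
    """
    a, b = w * w / 4.0, h * h / 4.0
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), (sxx, sxy, syy)

def gwd_squared(box1, box2):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    m1, (a1, b1, d1) = box_to_gaussian(*box1)
    m2, (a2, b2, d2) = box_to_gaussian(*box2)
    center_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    tr1, tr2 = a1 + d1, a2 + d2
    det1, det2 = a1 * d1 - b1 * b1, a2 * d2 - b2 * b2
    tr12 = a1 * a2 + 2.0 * b1 * b2 + d1 * d2  # trace of Sigma1 @ Sigma2
    # For 2x2 SPD matrices: Tr(sqrt(M)) = sqrt(Tr(M) + 2 sqrt(det M)),
    # with M = Sigma1^(1/2) Sigma2 Sigma1^(1/2).
    cross = math.sqrt(max(tr12 + 2.0 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    shape_term = max(tr1 + tr2 - 2.0 * cross, 0.0)  # clamp rounding noise
    return center_term + shape_term

# The same physical box written two ways: (w, h, theta) and (h, w, theta + 90 degrees).
same_box = gwd_squared((0, 0, 4, 2, 0.0), (0, 0, 2, 4, math.pi / 2))
```

Both parameterizations map to the same Gaussian, so `same_box` is zero up to rounding. This is one way to see how a distribution-based distance sidesteps the boundary discontinuity that the abstract describes.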

Ranked #16 on Object Detection In Aerial Images on DOTA (using extra training data)

1,722

Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

2 code implementations • NeurIPS 2021 • Xue Yang, Xiaojiang Yang, Jirui Yang, Qi Ming, Wentao Wang, Qi Tian, Junchi Yan

Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection.

Ranked #14 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images +1

1,722

The KFIoU Loss for Rotated Object Detection

3 code implementations • 29 Jan 2022 • Xue Yang, Yue Zhou, Gefan Zhang, Jirui Yang, Wentao Wang, Junchi Yan, Xiaopeng Zhang, Qi Tian

This is in contrast to recent Gaussian-modeling-based rotation detectors, e.g., GWD loss and KLD loss, that involve a human-specified distribution distance metric, which requires additional hyperparameter tuning that varies across datasets and detectors.

Cardiac Segmentation Image Segmentation +1

1,722

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

1 code implementation • 12 Oct 2023 • Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

Representing and rendering dynamic scenes has been an important but challenging task.

1,652

Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

5 code implementations • 12 May 2021 • Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang

In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis.

Ranked #3 on Medical Image Segmentation on ACDC

1,500

Data-Free Learning of Student Networks

3 code implementations • ICCV 2019 • Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian

Learning portable neural networks is essential for computer vision so that pre-trained heavy deep models can be deployed on edge devices such as mobile phones and micro sensors.

Neural Network Compression

1,110

AdderNet: Do We Really Need Multiplications in Deep Learning?

7 code implementations • CVPR 2020 • Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu

The widely-used convolutions in deep neural networks are exactly cross-correlation to measure the similarity between input feature and convolution filters, which involves massive multiplications between float values.
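To make the contrast concrete, the sketch below compares a standard cross-correlation response with a multiplication-free one. It is a simplified 1-D, valid-mode illustration, assuming the negative L1 distance between filter and input window as the replacement similarity measure, which is how the AdderNet paper describes its approach; the function names are hypothetical.

```python
def cross_correlation_1d(x, w):
    """Standard 'convolution' response: one multiplication per filter tap."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def adder_response_1d(x, w):
    """Multiplication-free response: negative L1 distance to the filter.

    A larger (less negative) value means the window is closer to the filter.
    """
    k = len(w)
    return [-sum(abs(x[i + j] - w[j]) for j in range(k)) for i in range(len(x) - k + 1)]

signal = [1, 2, 3, 4]
kernel = [1, 0, -1]
corr = cross_correlation_1d(signal, kernel)   # uses multiplications
adder = adder_response_1d(signal, kernel)     # uses only subtraction and abs
```

Both functions score how well each window matches the filter; only the second avoids multiplications entirely.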

946

Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast

3 code implementations • 3 Nov 2022 • Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, Qi Tian

In this paper, we present Pangu-Weather, a deep learning based system for fast and accurate global weather forecast.

924

Segment Anything in 3D with Radiance Fields

1 code implementation • NeurIPS 2023 • Jiazhong Cen, Jiemin Fang, Zanwei Zhou, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

The Segment Anything Model (SAM) emerges as a powerful vision foundation model to generate high-quality 2D segmentation results.

Inverse Rendering Segmentation

786

ControlVideo: Training-free Controllable Text-to-Video Generation

1 code implementation • 22 May 2023 • Yabo Zhang, Yuxiang Wei, Dongsheng Jiang, Xiaopeng Zhang, WangMeng Zuo, Qi Tian

Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.

Image Generation Text-to-Video Generation +1

691

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

1 code implementation • 15 Feb 2024 • Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined.

Neural Rendering Object

611

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

1 code implementation • 19 Oct 2021 • Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses.

Ranked #1 on 3D-Aware Image Synthesis on FFHQ 256 x 256

3D-Aware Image Synthesis Transfer Learning

606

Attribute Mix: Semantic Data Augmentation for Fine Grained Recognition

1 code implementation • 6 Apr 2020 • Hao Li, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

In this paper, we propose Attribute Mix, a data augmentation strategy at attribute level to expand the fine-grained samples.

Ranked #22 on Fine-Grained Image Classification on CUB-200-2011

Attribute Data Augmentation +1

568

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

1 code implementation • 12 Oct 2023 • Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang

In recent times, the generation of 3D assets from text prompts has shown impressive results.

Text to 3D

520

Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)

29 code implementations • ECCV 2018 • Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang

RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency.

Ranked #3 on Person Re-Identification on UAV-Human

Person Re-Identification Person Retrieval +1

482

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

25 code implementations • CVPR 2018 • Longhui Wei, Shiliang Zhang, Wen Gao, Qi Tian

Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network.

Ranked #11 on Unsupervised Person Re-Identification on DukeMTMC-reID (Rank-10 metric)

Generative Adversarial Network Person Re-Identification +1

462

Deep Modular Co-Attention Networks for Visual Question Answering

7 code implementations • CVPR 2019 • Zhou Yu, Jun Yu, Yuhao Cui, DaCheng Tao, Qi Tian

In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth.

Ranked #5 on Question Answering on SQA3D

Question Answering Visual Question Answering

432

PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

8 code implementations • ICLR 2020 • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture.

Ranked #20 on Neural Architecture Search on CIFAR-10

429

Pixel Difference Networks for Efficient Edge Detection

2 code implementations • ICCV 2021 • Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen, Li Liu

A faster version of PiDiNet with less than 0.1M parameters can still achieve comparable performance among state of the arts with 200 FPS.

Ranked #2 on Edge Detection on BRIND

Edge Detection

411

Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation

4 code implementations • ICCV 2019 • Xin Chen, Lingxi Xie, Jun Wu, Qi Tian

Recently, differentiable search methods have made major progress in reducing the computational costs of neural architecture search.

360

Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild

4 code implementations • 23 Dec 2019 • Xin Chen, Lingxi Xie, Jun Wu, Qi Tian

With the rapid development of neural architecture search (NAS), researchers found powerful network architectures for a wide range of vision tasks.

Optical Character Recognition (OCR)

360

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

331

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius-norm and rank of the batch output matrix.
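The two quantities in that claim can be computed directly on a batch output matrix (rows are samples, columns are class probabilities). The sketch below is a plain-Python illustration with hypothetical function names; the rank is obtained by Gaussian elimination with an assumed tolerance, and the example batches are made up.

```python
import math

def frobenius_norm(matrix):
    """Square root of the sum of squared entries; larger for confident rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank

confident_diverse = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # sharp, covers all classes
uncertain = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]  # flat predictions
collapsed = [[1, 0, 0] for _ in range(3)]              # sharp, but one class only
```

Here `uncertain` has a lower Frobenius norm than the other two batches (low discriminability), while `collapsed` matches `confident_diverse` in norm but has rank 1 instead of 3 (low diversity).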

Domain Adaptation

320

Gradually Vanishing Bridge for Adversarial Domain Adaptation

2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Chi Su, Qingming Huang, Qi Tian

On the discriminator, GVB contributes to enhance the discriminating ability, and balance the adversarial training process.

Unsupervised Domain Adaptation

320

Fast Dynamic Radiance Fields with Time-Aware Neural Voxels

1 code implementation • 30 May 2022 • Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, Qi Tian

A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions.

308

Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition

1 code implementation • CVPR 2019 • Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yan-Feng Wang, Qi Tian

We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics.

Ranked #20 on Skeleton Based Action Recognition on Kinetics-Skeleton dataset

Action Recognition Pose Prediction +2

283

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation

1 code implementation • 13 Jul 2021 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

Due to the domain discrepancy in visual domain adaptation, the performance of source model degrades when bumping into the high data density near decision boundary in target domain.

Domain Adaptation

262

Co-Evolutionary Compression for Unpaired Image Translation

2 code implementations • ICCV 2019 • Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu

Generative adversarial networks (GANs) have been successfully used for considerable computer vision tasks, especially the image-to-image translation.

Image-to-Image Translation Translation

237

Cross-Scale Cost Aggregation for Stereo Matching

1 code implementation • CVPR 2014 • Kang Zhang, Yuqiang Fang, Dongbo Min, Lifeng Sun, Shiqiang Yang, Shuicheng Yan, Qi Tian

We firstly reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in the choices of similarity kernels.

Stereo Matching Stereo Matching Hand

208

Multinomial Distribution Learning for Effective Neural Architecture Search

1 code implementation • ICCV 2019 • Xiawu Zheng, Rongrong Ji, Lang Tang, Baochang Zhang, Jianzhuang Liu, Qi Tian

Therefore, NAS can be transformed to a multinomial distribution learning problem, i.e., the distribution is optimized to have a high expectation of the performance.
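A toy version of this formulation is sketched below: a multinomial distribution over candidate operations, its expected performance, and one update that shifts probability mass toward better candidates. The exponentiated reweighting step, the learning rate, and the accuracy numbers are all assumptions made for illustration; they are not the paper's update rule or results.

```python
import math

def expected_performance(probs, perf):
    """Expectation of performance under a multinomial over candidates."""
    return sum(p * a for p, a in zip(probs, perf))

def reweight(probs, perf, lr=1.0):
    """Illustrative update (an assumption, not the paper's rule).

    Multiplies each probability by exp(lr * performance) and renormalizes,
    which never decreases the expected performance.
    """
    weights = [p * math.exp(lr * a) for p, a in zip(probs, perf)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical validation accuracies for three candidate operations.
perf = [0.6, 0.7, 0.9]
probs = [1 / 3, 1 / 3, 1 / 3]
before = expected_performance(probs, perf)
probs = reweight(probs, perf, lr=5.0)
after = expected_performance(probs, perf)
```

After the update, `after` exceeds `before` and the third candidate holds the largest probability, which is the sense in which the distribution is "optimized to have a high expectation of the performance".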

Computational Efficiency Object +3

207

Corner Proposal Network for Anchor-free, Two-stage Object Detection

1 code implementation • ECCV 2020 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

On the MS-COCO dataset, CPN achieves an AP of 49.2%, which is competitive among state-of-the-art object detection methods.

Ranked #83 on Object Detection on COCO test-dev

193

CenterNet++ for Object Detection

2 code implementations • 18 Apr 2022 • Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet of keypoints (top-left and bottom-right corners and the center keypoint).

Ranked #35 on Object Detection on COCO test-dev

177

Rethinking Performance Estimation in Neural Architecture Search

1 code implementation • CVPR 2020 • Xiawu Zheng, Rongrong Ji, Qiang Wang, Qixiang Ye, Zhenguo Li, Yonghong Tian, Qi Tian

In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.

2D Human Pose Estimation Instance Segmentation +5

165

Adversarial Domain Adaptation with Domain Mixup

1 code implementation • 4 Dec 2019 • Minghao Xu, Jian Zhang, Bingbing Ni, Teng Li, Chengjie Wang, Qi Tian, Wenjun Zhang

In this paper, we present adversarial domain adaptation with domain mixup (DM-ADA), which guarantees domain-invariance in a more continuous latent space and guides the domain discriminator in judging samples' difference relative to source and target domains.
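The mixup idea can be sketched as a convex combination of a source sample and a target sample, with the mixing ratio doubling as a soft domain label for the discriminator. This is a minimal illustration on plain feature vectors; the function names, the Beta(alpha, alpha) sampling, and the label convention (1 for source, 0 for target) are assumptions, not details taken from the abstract.

```python
import random

def mix_with_ratio(source, target, lam):
    """Convex combination lam * source + (1 - lam) * target, element-wise."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]

def sample_domain_mixup(source, target, alpha=2.0, rng=random):
    """Draw a mixing ratio and return (mixed sample, soft domain label).

    The soft label equals the ratio itself: 1.0 means pure source and
    0.0 means pure target (an assumed convention).
    """
    lam = rng.betavariate(alpha, alpha)
    return mix_with_ratio(source, target, lam), lam

rng = random.Random(0)  # fixed seed so the example is repeatable
mixed, soft_label = sample_domain_mixup([1.0, 1.0], [0.0, 0.0], rng=rng)
```

Training the discriminator against `soft_label` instead of a hard 0/1 target is one way to let it judge how far a sample lies between the two domains.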

Domain Adaptation

159

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation • 11 Apr 2021 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

Ranked #45 on Object Detection on COCO test-dev

154

Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration

1 code implementation • CVPR 2023 • Yunjie Tian, Lingxi Xie, Jihao Qiu, Jianbin Jiao, YaoWei Wang, Qi Tian, Qixiang Ye

iTPN is born with two elaborated designs: 1) The first pre-trained feature pyramid upon vision transformer (ViT).

object-detection Object Detection +1

149

A Fourier-based Framework for Domain Generalization

1 code implementation • CVPR 2021 • Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data.

Data Augmentation Domain Generalization

145

Cross-domain Detection via Graph-induced Prototype Alignment

1 code implementation • CVPR 2020 • Minghao Xu, Hang Wang, Bingbing Ni, Qi Tian, Wenjun Zhang

To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework to seek for category-level domain alignment via elaborate prototype representations.

Domain Adaptation object-detection +1

137

Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction

1 code implementation • 17 Mar 2020 • Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yan-Feng Wang, Qi Tian

The core idea of DMGNN is to use a multiscale graph to comprehensively model the internal relations of a human body for motion feature learning.

3D Human Pose Estimation 3D Pose Estimation +2

135

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

2 code implementations • ICCV 2021 • Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye

TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization.

Object Weakly-Supervised Object Localization

131

Video Super-resolution with Temporal Group Attention

1 code implementation • CVPR 2020 • Takashi Isobe, Songjiang Li, Xu Jia, Shanxin Yuan, Gregory Slabaugh, Chunjing Xu, Ya-Li Li, Shengjin Wang, Qi Tian

Video super-resolution, which aims at producing a high-resolution video from its corresponding low-resolution version, has recently drawn increasing attention.

Ranked #11 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Video Super-Resolution

122

Label Decoupling Framework for Salient Object Detection

1 code implementation • CVPR 2020 • Jun Wei, Shuhui Wang, Zhe Wu, Chi Su, Qingming Huang, Qi Tian

Though remarkable progress has been achieved, we observe that the closer the pixel is to the edge, the more difficult it is to be predicted, because edge pixels have a very imbalanced distribution.

Ranked #1 on Saliency Detection on HKU-IS

Object object-detection +3

113

Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization

1 code implementation • ECCV 2020 • Zijie Zhuang, Longhui Wei, Lingxi Xie, Tianyu Zhang, Hengheng Zhang, Haozhe Wu, Haizhou Ai, Qi Tian

The fundamental difficulty in person re-identification (ReID) lies in learning the correspondence among individual cameras.

Ranked #16 on Unsupervised Domain Adaptation on Duke to Market

Direct Transfer Person Re-identification Domain Adaptive Person Re-Identification +2

104

CARS: Continuous Evolution for Efficient Neural Architecture Search

1 code implementation • CVPR 2020 • Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu

Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset with a few epochs.

Few-Shot Image Classification Few-Shot Learning

103

Rectifying the Shortcut Learning of Background for Few-Shot Learning

1 code implementation • NeurIPS 2021 • Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin Xu, Qi Tian

The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL).

Ranked #20 on Few-Shot Image Classification on Mini-Imagenet 5-way (5-shot)

101

Video Super-Resolution with Recurrent Structure-Detail Network

2 code implementations • ECCV 2020 • Takashi Isobe, Xu Jia, Shuhang Gu, Songjiang Li, Shengjin Wang, Qi Tian

Most video super-resolution methods super-resolve a single reference frame with the help of neighboring frames in a temporal sliding window.

Ranked #9 on Video Super-Resolution on Vid4 - 4x upscaling - BD degradation

Video Super-Resolution

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

1 code implementation • CVPR 2021 • Le Yang, Haojun Jiang, Ruojin Cai, Yulin Wang, Shiji Song, Gao Huang, Qi Tian

Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.

Computational Efficiency Image Classification +2

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

3 code implementations • CVPR 2022 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Transformers have offered a new methodology of designing neural networks for visual recognition.

Image Classification object-detection +1

Omni-GAN: On the Secrets of cGANs and Beyond

3 code implementations • ICCV 2021 • Peng Zhou, Lingxi Xie, Bingbing Ni, Cong Geng, Qi Tian

The conditional generative adversarial network (cGAN) is a powerful tool of generating high-quality images, but existing approaches mostly suffer unsatisfying performance or the risk of mode collapse.

Ranked #8 on Conditional Image Generation on ImageNet 128x128

Conditional Image Generation Generative Adversarial Network

UnrealPerson: An Adaptive Pipeline towards Costless Person Re-identification

1 code implementation • CVPR 2021 • Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, Qi Tian

The main difficulty of person re-identification (ReID) lies in collecting annotated data and transferring the model across different domains.

Domain Adaptation Image Generation +1

Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters

1 code implementation • 25 Oct 2019 • Kaifeng Bi, Changping Hu, Lingxi Xie, Xin Chen, Longhui Wei, Qi Tian

Our approach bridges the gap from two aspects, namely, amending the estimation on the architectural gradients, and unifying the hyper-parameter settings in the search and re-training stages.

Few-Shot Learning Semantic Parsing

GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation

1 code implementation • 24 May 2022 • Lunyiu Nie, Shulin Cao, Jiaxin Shi, Jiuding Sun, Qi Tian, Lei Hou, Juanzi Li, Jidong Zhai

Subject to the huge semantic gap between natural and formal languages, neural semantic parsing is typically bottlenecked by its complexity of dealing with both input semantics and output syntax.

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.

Contrastive Learning Scene Text Recognition

Masked Autoencoders are Robust Data Augmentors

1 code implementation • 10 Jun 2022 • Haohang Xu, Shuangrui Ding, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Specifically, MRA consistently enhances the performance on supervised, semi-supervised as well as few-shot classification.

Image Augmentation Image Classification +1

SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval

1 code implementation • 22 Apr 2023 • Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Yueyue Wu, Yiqun Liu, Chong Chen, Qi Tian

Moreover, in contrast to the general retrieval, the relevance in the legal domain is sensitive to key legal elements.

Language Modelling Retrieval

Filter Sketch for Network Pruning

1 code implementation • 23 Jan 2020 • Mingbao Lin, Liujuan Cao, Shaojie Li, Qixiang Ye, Yonghong Tian, Jianzhuang Liu, Qi Tian, Rongrong Ji

Our approach, referred to as FilterSketch, encodes the second-order information of pre-trained weights, which enables the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure.

Network Pruning

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation • 30 May 2022 • Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), albeit hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.
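The token-discarding step can be sketched as follows: sample a random subset of patch tokens to keep and pass only those to the encoder. This is a minimal illustration with a hypothetical function name; the 0.75 mask ratio and the 16-patch input are assumptions, and the encoder itself is not shown.

```python
import random

def keep_visible_tokens(tokens, mask_ratio, rng):
    """Drop masked tokens and return (visible tokens, their original indices).

    Only the visible tokens would be fed to the encoder, so its cost scales
    with (1 - mask_ratio) of the sequence length.
    """
    n_keep = int(round(len(tokens) * (1.0 - mask_ratio)))
    keep_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in keep_idx], keep_idx

rng = random.Random(0)  # fixed seed so the example is repeatable
patches = [f"patch_{i}" for i in range(16)]
visible, visible_idx = keep_visible_tokens(patches, mask_ratio=0.75, rng=rng)
```

Keeping `visible_idx` matters because a decoder needs the original positions to put reconstructed patches back in place. Dropping tokens this way produces an irregular set of positions, which suits a plain transformer more easily than a hierarchical one; that is the tension the abstract points to.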

Transfer Learning

SdAE: Self-distillated Masked Autoencoder

1 code implementation • 31 Jul 2022 • Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian

We also analyze how to build good views for the teacher branch to produce latent representation from the perspective of information bottleneck.

Descriptive Self-Supervised Learning

Bottom-Up Temporal Action Localization with Mutual Regularization

1 code implementation • ECCV 2020 • Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yan-Feng Wang, Qi Tian

To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase; and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases.

Temporal Action Localization

Multi-Cue Correlation Filters for Robust Visual Tracking

1 code implementation • CVPR 2018 • Ning Wang, Wengang Zhou, Qi Tian, Richang Hong, Meng Wang, Houqiang Li

By combining different types of features, our approach constructs multiple experts through Discriminative Correlation Filter (DCF) and each of them tracks the target independently.

Visual Tracking

Visual Recognition by Request

1 code implementation • CVPR 2023 • Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian

Humans have the ability of recognizing visual semantics in an unlimited granularity, but existing visual recognition algorithms cannot achieve this goal.

Instance Segmentation Semantic Segmentation

ChatterBox: Multi-round Multimodal Referring and Grounding

1 code implementation • 24 Jan 2024 • Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang, Yuan Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues.

Language Modelling Visual Grounding

Adaptive Graph Representation Learning for Video Person Re-identification

1 code implementation • 5 Sep 2019 • Yiming Wu, Omar El Farouk Bourahla, Xi Li, Fei Wu, Qi Tian, Xue Zhou

While correlations between parts are ignored in the previous methods, to leverage the relations of different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables the contextual interactions between relevant regional features.

Ranked #3 on Person Re-Identification on PRID2011

Graph Representation Learning Video-Based Person Re-Identification

Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark

2 code implementations • 31 May 2021 • Xiujun Shu, Xiao Wang, Xianghao Zang, Shiliang Zhang, Yuanqi Chen, Ge Li, Qi Tian

We also verified that models pre-trained on LaST can generalize well on existing datasets with short-term and cloth-changing scenarios.

object-detection Object Detection +1

Fine-Grained Semantically Aligned Vision-Language Pre-Training

1 code implementation • 4 Aug 2022 • Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang

Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.

NeuSample: Neural Sample Field for Efficient View Synthesis

1 code implementation • 30 Nov 2021 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Neural radiance fields (NeRF) have shown great potentials in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy.

Wavelet-Based Dual-Branch Network for Image Demoireing

1 code implementation • 14 Jul 2020 • Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, Qi Tian

When smartphone cameras are used to take photos of digital screens, usually moire patterns result, severely degrading photo quality.

Demoire Image Restoration +1

Federated Domain Generalization With Generalization Adjustment

1 code implementation • CVPR 2023 • Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, Yanfeng Wang

Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients possibly with domain shift.

Domain Generalization Fairness +1

Adversarial Training Towards Robust Multimedia Recommender System

1 code implementation • 19 Sep 2018 • Jinhui Tang, Xiaoyu Du, Xiangnan He, Fajie Yuan, Qi Tian, Tat-Seng Chua

To this end, we propose a novel solution named Adversarial Multimedia Recommendation (AMR), which can lead to a more robust multimedia recommender model by using adversarial learning.

Information Retrieval Multimedia

Bag of Instances Aggregation Boosts Self-supervised Distillation

1 code implementation • ICLR 2022 • Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

Here bag of instances indicates a set of similar samples constructed by the teacher and are grouped within a bag, and the goal of distillation is to aggregate compact representations over the student with respect to instances in a bag.

Contrastive Learning Self-Supervised Learning

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

1 code implementation • 16 Aug 2023 • Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yancheng Luo, Chong Chen, Fuli Feng, Qi Tian

As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in augmenting their effectiveness in providing recommendations.

Collaborative Filtering Recommendation Systems

Enhancing Person Re-identification in a Self-trained Subspace

1 code implementation • 20 Apr 2017 • Xun Yang, Meng Wang, Richang Hong, Qi Tian, Yong Rui

To address this problem, in this paper, we propose a self-trained subspace learning paradigm for person re-ID which effectively utilizes both labeled and unlabeled data to learn a discriminative subspace where person images across disjoint camera views can be easily matched.

Object Object Localization +2

Partial Class Activation Attention for Semantic Segmentation

1 code implementation • CVPR 2022 • Sun-Ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian

Current attention-based methods for semantic segmentation mainly model pixel relation through pairwise affinity and coarse segmentation.

Relation Segmentation +1

Boosting Segment Anything Model Towards Open-Vocabulary Learning

1 code implementation • 6 Dec 2023 • Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian

The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

1 code implementation • 22 Nov 2023 • Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

Attribute counterfactual +3

Deep Multimodal Neural Architecture Search

1 code implementation • 25 Apr 2020 • Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, DaCheng Tao, Qi Tian

Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to different tasks.

Ranked #19 on Visual Question Answering (VQA) on VQA v2 test-std

Image-text matching Neural Architecture Search +4

Semantic-Aware Generation for Self-Supervised Visual Representation Learning

1 code implementation • 25 Nov 2021 • Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei Huang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features.

Ranked #63 on Semantic Segmentation on Cityscapes test

Representation Learning Semantic Segmentation

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

1 code implementation • 31 Jul 2022 • Maosen Li, Siheng Chen, Zijing Zhang, Lingxi Xie, Qi Tian, Ya Zhang

To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into richer graph spectrum bands.

Human motion prediction motion prediction

Towards a Unified View on Visual Parameter-Efficient Transfer Learning

1 code implementation • 3 Oct 2022 • Bruce X. B. Yu, Jianlong Chang, Lingbo Liu, Qi Tian, Chang Wen Chen

Towards this goal, we propose a framework with a unified view of PETL called visual-PETL (V-PETL) to investigate the effects of different PETL techniques, data scales of downstream domains, positions of trainable parameters, and other aspects affecting the trade-off.

Action Recognition Image Classification +2

CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing

1 code implementation • ECCV 2020 • Xuanhong Chen, Bingbing Ni, Naiyuan Liu, Ziang Liu, Yiliu Jiang, Loc Truong, Qi Tian

In contrast to the great success of memory-consuming face editing methods at low resolution, manipulating high-resolution (HR) facial images, i.e., typically larger than 768$^2$ pixels, with very limited memory is still challenging.

Attribute Image Generation +2

Greedy Gradient Ensemble for Robust Visual Question Answering

1 code implementation • ICCV 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.

Ranked #2 on Visual Question Answering (VQA) on VQA-CP

Question Answering Visual Question Answering

Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers

1 code implementation • 27 Mar 2022 • Yunjie Tian, Lingxi Xie, Jiemin Fang, Mengnan Shi, Junran Peng, Xiaopeng Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

The past year has witnessed a rapid development of masked image modeling (MIM).

Hadamard Matrix Guided Online Hashing

1 code implementation • 11 May 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Shen Chen, Qi Tian

We then treat the learning of hash functions as a set of binary classification problems to fit the assigned target code.

Binary Classification

Single Camera Training for Person Re-identification

1 code implementation • 24 Sep 2019 • Tianyu Zhang, Lingxi Xie, Longhui Wei, Yongfei Zhang, Bo Li, Qi Tian

Differently, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera.

Metric Learning Person Re-Identification

TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

1 code implementation • 30 Mar 2020 • Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling

Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.

Image Segmentation Semantic Segmentation +2

GOLD-NAS: Gradual, One-Level, Differentiable

1 code implementation • 7 Jul 2020 • Kaifeng Bi, Lingxi Xie, Xin Chen, Longhui Wei, Qi Tian

There has been a large literature of neural architecture search, but most existing work made use of heuristic rules that largely constrained the search flexibility.

Image Classification Neural Architecture Search

Semantic-guided Pixel Sampling for Cloth-Changing Person Re-identification

1 code implementation • 24 Jul 2021 • Xiujun Shu, Ge Li, Xiao Wang, Weijian Ruan, Qi Tian

The key to this task is to exploit cloth-irrelevant cues.

Cloth-Changing Person Re-Identification

DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.

text-guided-image-editing

Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding

1 code implementation • 9 May 2018 • Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, DaCheng Tao

Visual grounding aims to localize an object in an image referred to by a textual query phrase.

Ranked #9 on Phrase Grounding on Flickr30k Entities Test

Phrase Grounding Visual Grounding

Towards Visual Feature Translation

1 code implementation • CVPR 2019 • Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian

In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems.

Translation

Iterative Reorganization with Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning

1 code implementation • CVPR 2019 • Chen Wei, Lingxi Xie, Xutong Ren, Yingda Xia, Chi Su, Jiaying Liu, Qi Tian, Alan L. Yuille

We consider spatial contexts, for which we solve so-called jigsaw puzzles, i.e., each image is cut into grids and then disordered, and the goal is to recover the correct configuration.

General Classification Image Classification +4

Towards 3D Molecule-Text Interpretation in Language Models

1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM.

Instruction Following Language Modelling +3

Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks

1 code implementation • CVPR 2022 • Wenwen Pan, Haonan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli Gao, Jun Yu, Fei Wu, Qi Tian

Audio-guided video semantic segmentation is a challenging problem in visual analysis and editing, which automatically separates foreground objects from background in a video sequence according to the referring audio expressions.

Denoising Segmentation +3

AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations

1 code implementation • ICCV 2019 • Guo-Jun Qi, Liheng Zhang, Chang Wen Chen, Qi Tian

This ensures the resultant TERs of individual images contain the intrinsic information about their visual structures that would equivary extricably under various transformations in a generalized nonlinear case.

Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs

1 code implementation • 3 Nov 2020 • Lin Liu, Shanxin Yuan, Jianzhuang Liu, Liping Bao, Gregory Slabaugh, Qi Tian

In this paper, we propose a self-adaptive learning method for demoireing a high-frequency image, with the help of an additional defocused moire-free blur image.

Demoire Test-time Adaptation

Information Competing Process for Learning Diversified Representations

1 code implementation • NeurIPS 2019 • Jie Hu, Rongrong Ji, Shengchuan Zhang, Xiaoshuai Sun, Qixiang Ye, Chia-Wen Lin, Qi Tian

Learning representations with diversified information remains an open problem.

General Classification Image Classification +2

Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

1 code implementation • CVPR 2020 • Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian

In recent years, cross-modal hashing (CMH) has attracted increasing attention, mainly because of its potential ability to map contents from different modalities, especially vision and language, into the same space, so that cross-modal data retrieval becomes efficient.

Knowledge Distillation Retrieval

Dual Distribution Alignment Network for Generalizable Person Re-Identification

1 code implementation • 27 Jul 2020 • Peixian Chen, Pingyang Dai, Jianzhuang Liu, Feng Zheng, Qi Tian, Rongrong Ji

Domain generalization (DG) serves as a promising solution to handle person Re-Identification (Re-ID), which trains the model using labels from the source domain alone, and then directly adopts the trained model to the target domain without model updating.

Domain Generalization Generalizable Person Re-identification

DATA: Domain-Aware and Task-Aware Self-supervised Learning

1 code implementation • CVPR 2022 • Qing Chang, Junran Peng, Lingxie Xie, Jiajun Sun, Haoran Yin, Qi Tian, Zhaoxiang Zhang

However, due to the high training costs and the lack of awareness of downstream usage, most self-supervised learning methods lack the capability to accommodate the diversity of downstream scenarios, as there are various data domains, different vision tasks and latency constraints on models.

Image Classification Model Selection +5

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

1 code implementation • ICCV 2023 • Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian

Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training.

Video Recognition

Self-Regulated Learning for Egocentric Video Activity Anticipation

1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian

Future activity anticipation is a challenging problem in egocentric vision.

Multi-Task Learning

DeeCap: Dynamic Early Exiting for Efficient Image Captioning

1 code implementation • CVPR 2022 • Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian

On one hand, the representation in shallow layers lacks high-level semantic and sufficient cross-modal fusion information for accurate prediction.

Image Captioning Imitation Learning

Projection & Probability-Driven Black-Box Attack

1 code implementation • CVPR 2020 • Jie Li, Rongrong Ji, Hong Liu, Jianzhuang Liu, Bineng Zhong, Cheng Deng, Qi Tian

For reducing the solution space, we first model the adversarial perturbation optimization problem as a process of recovering frequency-sparse perturbations with compressed sensing, under the setting that random noise in the low-frequency space is more likely to be adversarial.

Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio

1 code implementation • CVPR 2020 • Zhengsu Chen, Jianwei Niu, Lingxi Xie, Xuefeng Liu, Longhui Wei, Qi Tian

Automatically designing computationally efficient neural networks has received much attention in recent years.

Image Classification Network Pruning

DR2-Net: Deep Residual Reconstruction Network for Image Compressive Sensing

1 code implementation • 19 Feb 2017 • Hantao Yao, Feng Dai, Dongming Zhang, Yike Ma, Shiliang Zhang, Yongdong Zhang, Qi Tian

Accordingly, DR$^{2}$-Net consists of two components, i.e., a linear mapping network and a residual network.

Compressive Sensing Image Reconstruction

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

1 code implementation • CVPR 2023 • Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.

Language Modelling Zero-Shot Learning

Progressive Unsupervised Person Re-identification by Tracklet Association with Spatio-Temporal Regularization

1 code implementation • 25 Oct 2019 • Qiaokang Xie, Wengang Zhou, Guo-Jun Qi, Qi Tian, Houqiang Li

In our approach, we first collect tracklet data within each camera by automatic person detection and tracking.

Human Detection Representation Learning +1

Towards Compact CNNs via Collaborative Compression

1 code implementation • CVPR 2021 • Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji

Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression.

Neural Network Compression Tensor Decomposition

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

1 code implementation • 7 Dec 2019 • Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian

Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given a text description.

feature selection Referring Expression +1

Circumventing Outliers of AutoAugment with Knowledge Distillation

1 code implementation • ECCV 2020 • Longhui Wei, An Xiao, Lingxi Xie, Xin Chen, Xiaopeng Zhang, Qi Tian

AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks, yet it is sensitive to the operator space as well as hyper-parameters, and an improper setting may degenerate network optimization.

Ranked #185 on Image Classification on ImageNet

Data Augmentation General Classification +2

Searching towards Class-Aware Generators for Conditional Generative Adversarial Networks

1 code implementation • 25 Jun 2020 • Peng Zhou, Lingxi Xie, Xiaopeng Zhang, Bingbing Ni, Qi Tian

To learn the sampling policy, a Markov decision process is embedded into the search algorithm and a moving average is applied for better stability.

Image Generation

When Parameter-efficient Tuning Meets General-purpose Vision-language Models

1 code implementation • 16 Dec 2023 • Yihang Zhai, Haixin Wang, Jianlong Chang, Xinlong Yang, Jinan Sun, Shikun Zhang, Qi Tian

Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models and boosts growing research to integrate multimodal information for creative applications.

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

1 code implementation • 14 Aug 2019 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Multimodal learning aims to discover the relationship between multiple modalities.

Cross-Modal Retrieval Retrieval

General Greedy De-bias Learning

1 code implementation • 20 Dec 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios.

Image Classification Question Answering +1

Active Pointly-Supervised Instance Segmentation

1 code implementation • 23 Jul 2022 • Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu

In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.

Active Learning Instance Segmentation +2

AiluRus: A Scalable ViT Framework for Dense Prediction

1 code implementation • NeurIPS 2023 • Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian

Specifically, at the intermediate layer of the ViT, we utilize a spatial-aware density-based clustering algorithm to select representative tokens from the token sequence.

object-detection Object Detection +1

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

1 code implementation • 9 Jan 2024 • Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, Qi Tian

Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps).

Anomaly Detection

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation • 14 Dec 2022 • Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya Zhang, Qi Tian

However, in addition to previous explorations for improvement in federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models in more heterogeneous conditions.

Federated Learning

Learning Transferable Pedestrian Representation from Multimodal Information Supervision

1 code implementation • 12 Apr 2023 • Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves better performance on downstream reID tasks than pre-training on ImageNet.

Ranked #2 on Unsupervised Person Re-Identification on DukeMTMC-reID

Attribute Contrastive Learning +3

One-bit Supervision for Image Classification

1 code implementation • NeurIPS 2020 • Hengtong Hu, Lingxi Xie, Zewei Du, Richang Hong, Qi Tian

Instead of training a model upon the accurate label of each sample, our setting requires the model to query with a predicted label of each sample and learn from the answer whether the guess is correct.

Classification General Classification +1

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

1 code implementation • 28 Mar 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.

Data Augmentation Image Classification

Unsupervised Person Re-identification via Softened Similarity Learning

1 code implementation • CVPR 2020 • Yutian Lin, Lingxi Xie, Yu Wu, Chenggang Yan, Qi Tian

Person re-identification (re-ID) is an important topic in computer vision.

Clustering General Classification +2

Fast Non-Local Neural Networks with Spectral Residual Learning

1 code implementation • MM '19: Proceedings of the 27th ACM International Conference on Multimedia 2019 • Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian

We show its equivalence to conducting residual learning in some spectral domain and carefully re-formulate a variety of neural layers into their spectral forms, such as ReLU or convolutions.

Pose Estimation Video Classification

DisturbLabel: Regularizing CNN on the Loss Layer

2 code implementations • CVPR 2016 • Lingxi Xie, Jingdong Wang, Zhen Wei, Meng Wang, Qi Tian

For a long period of time, we have been combating over-fitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc.

Data Augmentation

Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition

1 code implementation • CVPR 2023 • Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Jin Li, Yuchen Liu, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian

To mitigate the computational and storage demands, recent research has explored Parameter-Efficient Fine-Tuning (PEFT), which focuses on tuning a minimal number of parameters for efficient adaptation.

Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions

1 code implementation • 23 Nov 2023 • Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou

During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem.

Retrieval

Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View

1 code implementation • 30 Oct 2020 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang

Concretely, we design a novel interpretation scheme whereby the loss of mis-predicted frequent and sparse answers of the same question type is distinctly exhibited during the late training phase.

Face Recognition Image Classification +2

Latency-Aware Differentiable Neural Architecture Search

1 code implementation • 17 Jan 2020 • Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong

However, these methods suffer from difficulty in optimizing the network, so that the searched network is often unfriendly to hardware.

Image Captioning Sentence

Semi-Autoregressive Image Captioning

1 code implementation • 11 Oct 2021 • Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in sentence generation, can achieve comparable performance to the autoregressive counterparts with a considerable acceleration.

See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images

2 code implementations • 14 Oct 2022 • Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian

The issue of image haze removal has attracted wide attention in recent years.

Generative Adversarial Network

Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

1 code implementation • 26 Dec 2022 • Deng Li, Aming Wu, Yahong Han, Qi Tian

Considering the complexity and variability of real scene tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios.

Knowledge Distillation

SIFT Meets CNN: A Decade Survey of Instance Retrieval

1 code implementation • 5 Aug 2016 • Liang Zheng, Yi Yang, Qi Tian

This survey presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT and CNN-based methods.

Content-Based Image Retrieval Retrieval

API-Net: Robust Generative Classifier via a Single Discriminator

1 code implementation • ECCV 2020 • Xinshuai Dong, Hong Liu, Rongrong Ji, Liujuan Cao, Qixiang Ye, Jianzhuang Liu, Qi Tian

On the contrary, a discriminative classifier only models the conditional distribution of labels given inputs, but benefits from effective optimization owing to its succinct structure.

Robust classification

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.

Attribute Referring Expression +2

Zigzag Learning for Weakly Supervised Object Detection

no code implementations • CVPR 2018 • Xiaopeng Zhang, Jiashi Feng, Hongkai Xiong, Qi Tian

Unlike them, we propose a zigzag learning strategy to simultaneously discover reliable object instances and prevent the model from overfitting initial seeds.

Ranked #16 on Weakly Supervised Object Detection on PASCAL VOC 2012 test

Attribute Multi-Task Learning

Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging

no code implementations • 12 Apr 2018 • Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian

Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.

Graph Learning TAG

A Novel Multi-Task Tensor Correlation Neural Network for Facial Attribute Prediction

no code implementations • 9 Apr 2018 • Mingxing Duan, Kenli Li, Qi Tian

In this paper, we propose a novel multi-attribute tensor correlation neural network (MTCN) for face attribute prediction.

The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking

no code implementations • ECCV 2018 • Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, Qi Tian

Selected from 10 hours of raw videos, about 80,000 representative frames are fully annotated with bounding boxes as well as up to 14 kinds of attributes (e.g., weather condition, flying altitude, camera view, vehicle category, and occlusion) for three fundamental computer vision tasks: object detection, single object tracking, and multiple object tracking.

Ranked #5 on Object Detection on UAVDT

Multiple Object Tracking Object +3

LVreID: Person Re-Identification with Long Sequence Videos

no code implementations • 20 Dec 2017 • Jianing Li, Shiliang Zhang, Jingdong Wang, Wen Gao, Qi Tian

This paper mainly establishes a large-scale Long sequence Video database for person re-IDentification (LVreID).

Data Augmentation Person Re-Identification

Pseudo-positive regularization for deep person re-identification

no code implementations • 17 Nov 2017 • Fuqing Zhu, Xiangwei Kong, Haiyan Fu, Qi Tian

A small proportion of these retrieved samples are randomly selected as the Pseudo Positive samples and added to the target training set for the supervised CNN training.

Deep Representation Learning with Part Loss for Person Re-Identification

no code implementations • 4 Jul 2017 • Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

The representation learning risk is evaluated by the proposed part loss, which automatically generates several parts for an image, and computes the person classification loss on each part separately.

Ranked #97 on Person Re-Identification on Market-1501

Classification General Classification +2

Learning to Learn Image Classifiers with Visual Analogy

no code implementations • CVPR 2019 • Linjun Zhou, Peng Cui, Shiqiang Yang, Wenwu Zhu, Qi Tian

We then propose an out-of-sample embedding method to learn the embedding of a new class represented by a few samples through its visual analogy with base classes and derive the classification parameters for the new class.

Classification General Classification +1

Effective Image Retrieval via Multilinear Multi-index Fusion

no code implementations • 27 Sep 2017 • Zhizhong Zhang, Yuan Xie, Wensheng Zhang, Qi Tian

In this paper, we propose a new multi-index fusion scheme for image retrieval.

Image Retrieval Retrieval

Pose-driven Deep Convolutional Model for Person Re-identification

no code implementations • ICCV 2017 • Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian

Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.

Ranked #105 on Person Re-Identification on Market-1501

Image Retrieval Quantization +1

E$^2$BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

no code implementations • 18 Sep 2017 • Xiaobin Liu, Shiliang Zhang, Tiejun Huang, Qi Tian

To conquer these issues, we propose an End-to-End BoWs (E$^2$BoWs) model based on Deep Convolutional Neural Network (DCNN).

GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval

no code implementations • 13 Sep 2017 • Longhui Wei, Shiliang Zhang, Hantao Yao, Wen Gao, Qi Tian

To solve these problems, this work proposes a Global-Local-Alignment Descriptor (GLAD) and an efficient indexing and retrieval framework.

Ranked #93 on Person Re-Identification on Market-1501

Person Re-Identification Representation Learning +1

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations • 1 May 2016 • Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

How to do multidimensional scaling on multiple input distance matrices is still unsolved, to the best of our knowledge.

One-Shot Fine-Grained Instance Retrieval

no code implementations • 4 Jul 2017 • Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

Aiming to conquer this issue, we propose a retrieval task named One-Shot Fine-Grained Instance Retrieval (OSFGIR).

Fine-Grained Visual Categorization Image Retrieval +1

Ensemble of Part Detectors for Simultaneous Classification and Localization

no code implementations • 29 May 2017 • Xiaopeng Zhang, Hongkai Xiong, Weiyao Lin, Qi Tian

Part-based representation has been proven to be effective for a variety of visual applications.

Classification Clustering +4

Part-based Deep Hashing for Large-scale Person Re-identification

no code implementations • 5 May 2017 • Fuqing Zhu, Xiangwei Kong, Liang Zheng, Haiyan Fu, Qi Tian

In the experiment, we show that the proposed Part-based Deep Hashing method yields very competitive re-id accuracy on the large-scale Market-1501 and Market-1501+500K datasets.

Deep Hashing Large-Scale Person Re-Identification

Person Re-identification in the Wild

no code implementations • CVPR 2017 • Liang Zheng, Hengheng Zhang, Shaoyan Sun, Manmohan Chandraker, Yi Yang, Qi Tian

Our baselines address three issues: the performance of various combinations of detectors and recognizers, mechanisms for pedestrian detection to help improve overall re-identification accuracy and assessing the effectiveness of different detectors for re-identification.

Benchmarking Pedestrian Detection +2

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations • CVPR 2017 • Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Ranked #100 on Person Re-Identification on Market-1501