Search Results for author: Qi Tian

Found 327 papers, 140 papers with code

Extract and Merge: Superpixel Segmentation with Regional Attributes

no code implementations • ECCV 2020 • Jianqiao An, Yucheng Shi, Yahong Han, Meijun Sun, Qi Tian

For a certain object in an image, the relationship between its central region and the peripheral region is not well utilized in existing superpixel segmentation methods.

Attribute Superpixels

Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery

no code implementations • ECCV 2020 • Shuo Wang, Jun Yue, Jianzhuang Liu, Qi Tian, Meng Wang

It is a challenging problem since (1) the identifying process is susceptible to over-fitting with limited samples of an object, and (2) the sample imbalance between a base (known knowledge) category and a novel category can easily bias the recognition results.

Few-Shot Learning

Wavelet-Based Dual-Branch Network for Image Demoiréing

no code implementations • ECCV 2020 • Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Aleš Leonardis, Wengang Zhou, Qi Tian

When smartphone cameras are used to take photos of digital screens, moiré patterns usually result, severely degrading photo quality.

Image Restoration Rain Removal

Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision

no code implementations • ECCV 2020 • Xinzhe Han, Shuhui Wang, Chi Su, Weigang Zhang, Qingming Huang, Qi Tian

In this paper, we rethink the implicit reasoning process in VQA, and propose a new formulation which maximizes the log-likelihood of the joint distribution for the observed question and predicted answer.

Question Answering Visual Question Answering +1

API-Net: Robust Generative Classifier via a Single Discriminator

1 code implementation • ECCV 2020 • Xinshuai Dong, Hong Liu, Rongrong Ji, Liujuan Cao, Qixiang Ye, Jianzhuang Liu, Qi Tian

On the contrary, a discriminative classifier only models the conditional distribution of labels given inputs, but benefits from effective optimization owing to its succinct structure.

Robust classification

FTL: A universal framework for training low-bit DNNs via Feature Transfer

no code implementations • ECCV 2020 • Kunyuan Du, Ya Zhang, Haibing Guan, Qi Tian, Shenggan Cheng, James Lin

Compared with low-bit models trained directly, the proposed framework brings 0.5% to 3.4% accuracy gains to three different quantization schemes.

Quantization Transfer Learning

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

no code implementations • 8 Apr 2024 • Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Xiaopeng Zhang, Yongdong Zhang, Qi Tian

(1) Mutually-Refined Proposal Extraction.

Image Segmentation Segmentation +3

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

1 code implementation • 28 Mar 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.

Data Augmentation Image Classification

BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models

no code implementations • 27 Mar 2024 • Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Zhijing Wu, Yiqun Liu, Chong Chen, Qi Tian

However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains, such as legal, medical, etc.

Bayesian Optimization

DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

no code implementations • 27 Mar 2024 • Haitao Li, Qingyao Ai, Xinyan Han, Jia Chen, Qian Dong, Yiqun Liu, Chong Chen, Qi Tian

Most of the existing works focus on improving the representation ability for the contextualized embedding of the [CLS] token and calculate relevance using textual semantic similarity.

Retrieval Semantic Similarity +2

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

1 code implementation • 15 Feb 2024 • Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined.

Neural Rendering Object

614

Towards 3D Molecule-Text Interpretation in Language Models

1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM.

Instruction Following Language Modelling +3

ChatterBox: Multi-round Multimodal Referring and Grounding

1 code implementation • 24 Jan 2024 • Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang, Yuan Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues.

Language Modelling Visual Grounding

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

no code implementations • 12 Jan 2024 • Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang

Vision-language foundation models, represented by Contrastive language-image pre-training (CLIP), have gained increasing attention for jointly understanding both vision and textual tasks.

Panoptic Segmentation Retrieval +1

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

no code implementations • 12 Jan 2024 • Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei

The text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images.

Image Generation Prompt Engineering

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

1 code implementation • 9 Jan 2024 • Hongcheng Guo, Jian Yang, Jiaheng Liu, Jiaqi Bai, Boyang Wang, Zhoujun Li, Tieqiao Zheng, Bo Zhang, Junran Peng, Qi Tian

Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps).

Anomaly Detection

Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models

no code implementations • 6 Jan 2024 • Xin He, Longhui Wei, Lingxi Xie, Qi Tian

Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a plethora of noteworthy contributions in recent months.

Instruction Following

DeLR: Active Learning for Detection with Decoupled Localization and Recognition Query

no code implementations • 28 Dec 2023 • Yuhang Zhang, Yuang Deng, Xiaopeng Zhang, Jie Li, Robert C. Qiu, Qi Tian

In DeLR, the query is based on region-level, and we only annotate the object region that is queried; 2) Instead of directly providing both localization and recognition annotations, we separately query the two components, and thus reduce the recognition budget with the pseudo class labels provided by the model.

Active Learning Object +2

Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems

no code implementations • 25 Dec 2023 • Tianhao Shi, Yang Zhang, Zhijian Xu, Chong Chen, Fuli Feng, Xiangnan He, Qi Tian

Rather than directly dismissing the role of incremental learning, we ascribe this lack of anticipated performance improvement to the mismatch between the LLM4Rec architecture and incremental learning: LLM4Rec employs a single adaptation module for learning recommendation, hampering its ability to simultaneously capture long-term and short-term user preferences in the incremental learning context.

Incremental Learning Language Modelling +2

When Parameter-efficient Tuning Meets General-purpose Vision-language Models

1 code implementation • 16 Dec 2023 • Yihang Zhai, Haixin Wang, Jianlong Chang, Xinlong Yang, Jinan Sun, Shikun Zhang, Qi Tian

Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models, and has spurred growing research on integrating multimodal information for creative applications.

Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

no code implementations • 7 Dec 2023 • Yabo Chen, Jiemin Fang, YuYang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

We propose a cascade generation framework constructed with two Zero-1-to-3 models, named Cascade-Zero123, to tackle this issue, which progressively extracts 3D information from the source image.

Transparent objects

Boosting Segment Anything Model Towards Open-Vocabulary Learning

1 code implementation • 6 Dec 2023 • Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian

The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting.

Object Object Localization +2

Segment Any 3D Gaussians

no code implementations • 1 Dec 2023 • Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

Interactive 3D segmentation in radiance fields is an appealing task because of its importance in 3D scene understanding and manipulation.

Interactive Segmentation Scene Understanding +1

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

no code implementations • 28 Nov 2023 • Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen

Parameter-efficient fine-tuning (PEFT) is an effective methodology to unleash the potential of large foundation models in novel scenarios with limited training data.

Image Classification Image Segmentation +2

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

no code implementations • 27 Nov 2023 • Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian

Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians.

3D scene Editing

One-bit Supervision for Image Classification: Problem, Solution, and Beyond

no code implementations • 26 Nov 2023 • Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

An intriguing property of the setting is that the burden of annotation is largely alleviated in comparison to offering the accurate label.

Active Learning Image Classification +2

Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions

1 code implementation • 23 Nov 2023 • Shulin Cao, Jiajie Zhang, Jiaxin Shi, Xin Lv, Zijun Yao, Qi Tian, Juanzi Li, Lei Hou

During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem.

Retrieval

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

1 code implementation • 22 Nov 2023 • Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

Attribute counterfactual +3

AiluRus: A Scalable ViT Framework for Dense Prediction

1 code implementation • NeurIPS 2023 • Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian

Specifically, at the intermediate layer of the ViT, we utilize a spatial-aware density-based clustering algorithm to select representative tokens from the token sequence.

Object Detection +1

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

1 code implementation • 12 Oct 2023 • Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang

In recent times, the generation of 3D assets from text prompts has shown impressive results.

Text to 3D

522

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

1 code implementation • 12 Oct 2023 • Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

Representing and rendering dynamic scenes has been an important but challenging task.

1,667

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

1 code implementation • 26 Sep 2023 • Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian

Recent years have witnessed a rapid development of large language models (LLMs).

Quantization

5,949

Computation-efficient Deep Learning for Computer Vision: A Survey

no code implementations • 27 Aug 2023 • Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang

Over the past decade, deep learning models have exhibited considerable advancements, reaching or even exceeding human-level performance in a range of visual perception tasks.

Autonomous Vehicles Edge-computing +1

A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems

1 code implementation • 16 Aug 2023 • Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yancheng Luo, Chong Chen, Fuli Feng, Qi Tian

As the focus on Large Language Models (LLMs) in the field of recommendation intensifies, the optimization of LLMs for recommendation purposes (referred to as LLM4Rec) assumes a crucial role in augmenting their effectiveness in providing recommendations.

Collaborative Filtering Recommendation Systems

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

1 code implementation • ICCV 2023 • Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian

Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training.

Video Recognition

SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation

no code implementations • 4 Aug 2023 • Shikun Sun, Longhui Wei, Junliang Xing, Jia Jia, Qi Tian

Recent score-based diffusion models (SBDMs) show promising results in unpaired image-to-image translation (I2I).

Image Denoising Image-to-Image Translation

Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion

no code implementations • 2 Aug 2023 • Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian

In this work, we propose a novel strategy named Degeneration-Tuning (DT) to shield contents of unwanted concepts from SD weights.

Human Motion Generation: A Survey

no code implementations • 20 Jul 2023 • Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang

In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field.

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

no code implementations • 28 Jun 2023 • Bowen Shi, Xiaopeng Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy, which utilizes both the supervised/CL teacher and the MIM teacher to jointly guide the student model.

Contrastive Learning Representation Learning

Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models

no code implementations • 14 Jun 2023 • Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Kaifeng Bi, Xiaotao Gu, Jianlong Chang, Qi Tian

In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system.

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations • 9 Jun 2023 • Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems

no code implementations • 8 Jun 2023 • Mingming Zhao, Lin Liu, Lifu Liu, Mengke Li, Qi Tian

To achieve joint channel estimation and feedback, this paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.

Denoising

Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering

no code implementations • 24 May 2023 • Jiajie Zhang, Shulin Cao, Tingjia Zhang, Xin Lv, Jiaxin Shi, Qi Tian, Juanzi Li, Lei Hou

To facilitate reasoning, we propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT).

Question Answering

ControlVideo: Training-free Controllable Text-to-Video Generation

1 code implementation • 22 May 2023 • Yabo Zhang, Yuxiang Wei, Dongsheng Jiang, Xiaopeng Zhang, WangMeng Zuo, Qi Tian

Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.

Image Generation Text-to-Video Generation +1

694

Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation

no code implementations • 18 May 2023 • Yuan Zhou, Xin Chen, Yanrong Guo, Shijie Hao, Richang Hong, Qi Tian

Incremental few-shot semantic segmentation (IFSS) aims to incrementally extend a semantic segmentation model to novel classes according to only a few pixel-level annotated data, while preserving its segmentation capability on previously learned base categories.

Few-Shot Semantic Segmentation Incremental Learning +3

Continual Vision-Language Representation Learning with Off-Diagonal Information

no code implementations • 11 May 2023 • Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi Tian

Moreover, we empirically and theoretically demonstrate how SD leads to a performance decline for CLIP on cross-modal retrieval tasks.

Continual Learning Contrastive Learning +4

Visual Tuning

no code implementations • 10 May 2023 • Bruce X. B. Yu, Jianlong Chang, Haixin Wang, Lingbo Liu, Shijie Wang, Zhiyu Wang, Junfan Lin, Lingxi Xie, Haojie Li, Zhouchen Lin, Qi Tian, Chang Wen Chen

With the surprising development of pre-trained visual foundation models, visual tuning jumped out of the standard modus operandi that fine-tunes the whole pre-trained model or just the fully connected layer.

Segment Anything in 3D with Radiance Fields

1 code implementation • NeurIPS 2023 • Jiazhong Cen, Jiemin Fang, Zanwei Zhou, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

The Segment Anything Model (SAM) emerges as a powerful vision foundation model to generate high-quality 2D segmentation results.

Inverse Rendering Segmentation

791

Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism

no code implementations • 22 Apr 2023 • Xin Chen, Hengheng Zhang, Xiaotao Gu, Kaifeng Bi, Lingxi Xie, Qi Tian

The Mixture of Experts (MoE) model becomes an important choice of large language models nowadays because of its scalability with sublinear computational complexity for training and inference.

SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval

1 code implementation • 22 Apr 2023 • Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Yueyue Wu, Yiqun Liu, Chong Chen, Qi Tian

Moreover, in contrast to the general retrieval, the relevance in the legal domain is sensitive to key legal elements.

Language Modelling Retrieval

Learning Transferable Pedestrian Representation from Multimodal Information Supervision

1 code implementation • 12 Apr 2023 • Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves better performance on downstream reID tasks than pre-training on ImageNet.

Ranked #2 on Unsupervised Person Re-Identification on DukeMTMC-reID

Attribute Contrastive Learning +3

PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift

no code implementations • 7 Apr 2023 • Gaojie Wu, Wei-Shi Zheng, Yutong Lu, Qi Tian

In this work, we propose a ladder self-attention block with multiple branches and a progressive shift mechanism to develop a light-weight transformer backbone that requires fewer computing resources (e.g., a relatively small number of parameters and FLOPs), termed Progressive Shift Ladder Transformer (PSLT).

Image Classification Person Re-Identification

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot and few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even categories not seen at training time.

Action Classification Temporal Action Localization

LION: Implicit Vision Prompt Tuning

no code implementations • 17 Mar 2023 • Haixin Wang, Jianlong Chang, Xiao Luo, Jinan Sun, Zhouchen Lin, Qi Tian

Despite recent competitive performance across a range of vision tasks, vision Transformers still have an issue of heavy computational costs.

Transfer Learning

Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation

no code implementations • ICCV 2023 • Xinyue Huo, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Currently, a popular UDA framework lies in self-training which endows the model with two-fold abilities: (i) learning reliable semantics from the labeled images in the source domain, and (ii) adapting to the target domain via generating pseudo labels on the unlabeled images.

Semantic Segmentation Unsupervised Domain Adaptation

USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised Semantic Segmentation

no code implementations • ICCV 2023 • Zelin Peng, Guanchun Wang, Lingxi Xie, Dongsheng Jiang, Wei Shen, Qi Tian

Seed area generation is usually the starting point of weakly supervised semantic segmentation (WSSS).

Multi-Label Classification Weakly supervised Semantic Segmentation +1

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations • ICCV 2023 • Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning the "soft prompts" to condition frozen pre-training models.

Domain Generalization Few-Shot Learning +1

M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios

no code implementations • 9 Mar 2023 • Ning Liao, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

In realistic open-set scenarios where labels of a part of testing data are totally unknown, when vision-language (VL) prompt learning methods encounter inputs related to unknown classes (i.e., not seen during training), they always predict them as one of the training classes.

Open Set Learning

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

no code implementations • 9 Mar 2023 • Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

To explore prompt learning on the generative pre-trained visual model, as well as keeping the task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform the downstream visual classification into the pre-trained masked visual token prediction.

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

no code implementations • 7 Mar 2023 • Jiacheng Li, Longhui Wei, Zongyuan Zhan, Xin He, Siliang Tang, Qi Tian, Yueting Zhuang

To better accelerate the generative transformers while keeping good generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model.

Text-to-Image Generation

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

no code implementations • 20 Feb 2023 • Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian

Temporal sentence grounding aims to detect the event timestamps described by the natural language query from given untrimmed videos.

Sentence Temporal Sentence Grounding

ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories

no code implementations • 5 Feb 2023 • Zijian Zhang, Zhou Zhao, Jun Yu, Qi Tian

In this paper, we propose a novel and flexible conditional diffusion model by introducing conditions into the forward process.

Denoising Image Generation

Federated Domain Generalization With Generalization Adjustment

1 code implementation • CVPR 2023 • Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, Yanfeng Wang

Federated Domain Generalization (FedDG) attempts to learn a global model in a privacy-preserving manner that generalizes well to new clients possibly with domain shift.

Domain Generalization Fairness +1

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator

no code implementations • CVPR 2023 • Shijie Wang, Jianlong Chang, Haojie Li, Zhihui Wang, Wanli Ouyang, Qi Tian

PLEor could leverage pre-trained CLIP model to infer the discrepancies encompassing both pre-defined and unknown subcategories, called category-specific discrepancies, and transfer them to the backbone network trained in the close-set scenarios.

Knowledge Distillation Retrieval +1

Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition

1 code implementation • CVPR 2023 • Yaoming Wang, Bowen Shi, Xiaopeng Zhang, Jin Li, Yuchen Liu, Wenrui Dai, Chenglin Li, Hongkai Xiong, Qi Tian

To mitigate the computational and storage demands, recent research has explored Parameter-Efficient Fine-Tuning (PEFT), which focuses on tuning a minimal number of parameters for efficient adaptation.

Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

1 code implementation • 26 Dec 2022 • Deng Li, Aming Wu, Yahong Han, Qi Tian

Considering the complexity and variability of real scene tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios.

Knowledge Distillation

Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization

no code implementations • CVPR 2023 • Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya Zhang, Jianlong Chang, Yanfeng Wang, Qi Tian

And as a result, the dual-branch complementarity is effectively fused to promote a strong alliance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation

1 code implementation • 14 Dec 2022 • Ziqing Fan, Yanfeng Wang, Jiangchao Yao, Lingjuan Lyu, Ya Zhang, Qi Tian

However, in addition to previous explorations for improvement in federated averaging, our analysis shows that another critical bottleneck is the poorer optima of client models in more heterogeneous conditions.

Federated Learning

Feature Calibration Network for Occluded Pedestrian Detection

no code implementations • 12 Dec 2022 • Tianliang Zhang, Qixiang Ye, Baochang Zhang, Jianzhuang Liu, Xiaopeng Zhang, Qi Tian

FC-Net is based on the observation that the visible parts of pedestrians are selective and decisive for detection, and is implemented as a self-paced feature learning framework with a self-activation (SA) module and a feature calibration (FC) module.

Pedestrian Detection

ConfounderGAN: Protecting Image Data Privacy with Causal Confounder

no code implementations • 4 Dec 2022 • Qi Tian, Kun Kuang, Kelu Jiang, Furui Liu, Zhihua Wang, Fei Wu

The success of deep learning is partly attributed to the availability of massive data downloaded freely from the Internet.

Generative Adversarial Network Image Classification

Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

no code implementations • 28 Nov 2022 • Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang

e.g., an agent is a random policy while other agents are medium policies.

Continuous Control Graph Attention +5

Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration

1 code implementation • CVPR 2023 • Yunjie Tian, Lingxi Xie, Jihao Qiu, Jianbin Jiao, YaoWei Wang, Qi Tian, Qixiang Ye

iTPN is born with two elaborated designs: 1) The first pre-trained feature pyramid upon vision transformer (ViT).

Object Detection +1

150

Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast

3 code implementations • 3 Nov 2022 • Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, Qi Tian

In this paper, we present Pangu-Weather, a deep learning based system for fast and accurate global weather forecast.

932

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

1 code implementation • CVPR 2023 • Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.

Language Modelling Zero-Shot Learning

End-to-End Context-Aided Unicity Matching for Person Re-identification

no code implementations • 20 Oct 2022 • Min Cao, Cong Ding, Chen Chen, Junchi Yan, Qi Tian

Based on a natural assumption that images belonging to the same person identity should not match with images belonging to multiple different person identities across views, called the unicity of person matching on the identity level, we propose an end-to-end person unicity matching architecture for learning and refining the person matching relations.

Graph Matching Person Re-Identification

Paper
Add Code

See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images

2 code implementations • 14 Oct 2022 • Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian

The issue of image haze removal has attracted wide attention in recent years.

Generative Adversarial Network

Paper
Code

Towards a Unified View on Visual Parameter-Efficient Transfer Learning

1 code implementation • 3 Oct 2022 • Bruce X. B. Yu, Jianlong Chang, Lingbo Liu, Qi Tian, Chang Wen Chen

Towards this goal, we propose a framework with a unified view of PETL called visual-PETL (V-PETL) to investigate the effects of different PETL techniques, data scales of downstream domains, positions of trainable parameters, and other aspects affecting the trade-off.

Action Recognition Image Classification +2

Paper
Code

Motion-inductive Self-supervised Object Discovery in Videos

no code implementations • 1 Oct 2022 • Shuangrui Ding, Weidi Xie, Yabo Chen, Rui Qian, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

In this paper, we consider the task of unsupervised object discovery in videos.

Ranked #3 on Unsupervised Object Segmentation on DAVIS 2016

Object Object Discovery +5

Paper
Add Code

Learnable Distribution Calibration for Few-Shot Class-Incremental Learning

no code implementations • 1 Oct 2022 • Binghao Liu, Boyu Yang, Lingxi Xie, Ren Wang, Qi Tian, Qixiang Ye

LDC is built upon a parameterized calibration unit (PCU), which initializes biased distributions for all classes based on classifier vectors (memory-free) and a single covariance matrix.

Few-Shot Class-Incremental Learning Few-Shot Learning +2

Paper
Add Code

Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations • 23 Aug 2022 • Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Paper
Add Code

Prompt-Matched Semantic Segmentation

no code implementations • 22 Aug 2022 • Lingbo Liu, Jianlong Chang, Bruce X. B. Yu, Liang Lin, Qi Tian, Chang-Wen Chen

Previous methods usually fine-tuned the entire networks for each specific dataset, which will be burdensome to store massive parameters of these networks.

Representation Learning Segmentation +2

Paper
Add Code

Fine-Grained Semantically Aligned Vision-Language Pre-Training

1 code implementation • 4 Aug 2022 • Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang

Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.

object-detection Object Detection +1

Paper
Code

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

Paper
Code

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

1 code implementation • 31 Jul 2022 • Maosen Li, Siheng Chen, Zijing Zhang, Lingxi Xie, Qi Tian, Ya Zhang

To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into richer graph spectrum bands.

Human motion prediction motion prediction

Paper
Code

SdAE: Self-distillated Masked Autoencoder

1 code implementation • 31 Jul 2022 • Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian

We also analyze how to build good views for the teacher branch to produce latent representation from the perspective of information bottleneck.

Descriptive Self-Supervised Learning

Paper
Code

Fine-grained Retrieval Prompt Tuning

no code implementations • 29 Jul 2022 • Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian

In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation.

Retrieval

Paper
Add Code

Visual Recognition by Request

1 code implementation • CVPR 2023 • Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian

Humans have the ability of recognizing visual semantics in an unlimited granularity, but existing visual recognition algorithms cannot achieve this goal.

Instance Segmentation Semantic Segmentation

Paper
Code

Pro-tuning: Unified Prompt Tuning for Vision Tasks

no code implementations • 28 Jul 2022 • Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan

To this end, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.

Adversarial Robustness Image Classification +4

Paper
Add Code

Active Pointly-Supervised Instance Segmentation

1 code implementation • 23 Jul 2022 • Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu

In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.

Active Learning Instance Segmentation +2

Paper
Code

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.

Attribute Referring Expression +2

Paper
Code

A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction

no code implementations • 4 Jul 2022 • Wei Shen, Zelin Peng, Xuehui Wang, Huayu Wang, Jiazhong Cen, Dongsheng Jiang, Lingxi Xie, Xiaokang Yang, Qi Tian

Next, we summarize the existing label-efficient image segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction -- the current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraint, cross-view consistency, and cross-image relation.

Image Segmentation Instance Segmentation +2

Paper
Add Code

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.

Contrastive Learning Scene Text Recognition

Paper
Code

Towards Generalizable Person Re-identification with a Bi-stream Generative Model

no code implementations • 19 Jun 2022 • Xin Xu, Wei Liu, Zheng Wang, Ruiming Hu, Qi Tian

Guided by original pedestrian images, one stream is employed to learn a camera-invariant global feature for the CC problem via filtering cross-camera interference factors.

Domain Generalization Generalizable Person Re-identification

Paper
Add Code

Masked Autoencoders are Robust Data Augmentors

1 code implementation • 10 Jun 2022 • Haohang Xu, Shuangrui Ding, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Specifically, MRA consistently enhances the performance on supervised, semi-supervised as well as few-shot classification.

Image Augmentation Image Classification +1

Paper
Code

DE-Net: Dynamic Text-guided Image Editing Adversarial Networks

1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.

text-guided-image-editing

Paper
Code

Fast Dynamic Radiance Fields with Time-Aware Neural Voxels

1 code implementation • 30 May 2022 • Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, Qi Tian

A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions.

310

Paper
Code

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation • 30 May 2022 • Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), albeit hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.

Transfer Learning

Paper
Code

HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval

no code implementations • 24 May 2022 • Feilong Chen, Xiuyi Chen, Jiaxin Shi, Duzhen Zhang, Jianlong Chang, Qi Tian

It also achieves about +4.9 AR on COCO and +3.8 AR on Flickr30K than LightingDot and achieves comparable performance with the state-of-the-art (SOTA) fusion-based model METER.

Cross-Modal Retrieval Retrieval +1

Paper
Add Code

GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation

1 code implementation • 24 May 2022 • Lunyiu Nie, Shulin Cao, Jiaxin Shi, Jiuding Sun, Qi Tian, Lei Hou, Juanzi Li, Jidong Zhai

Subject to the huge semantic gap between natural and formal languages, neural semantic parsing is typically bottlenecked by its complexity of dealing with both input semantics and output syntax.

Few-Shot Learning Semantic Parsing

Paper
Code

CenterNet++ for Object Detection

2 code implementations • 18 Apr 2022 • Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet keypoints (top-left and bottom-right corners and the center keypoint).

Ranked #35 on Object Detection on COCO test-dev

Object object-detection +1

177

Paper
Code

HyperDet3D: Learning a Scene-conditioned 3D Object Detector

no code implementations • CVPR 2022 • Yu Zheng, Yueqi Duan, Jiwen Lu, Jie Zhou, Qi Tian

A bathtub in a library, a sink in an office, a bed in a laundry room -- the counter-intuition suggests that scene provides important prior knowledge for 3D object detection, which instructs to eliminate the ambiguous detection of similar objects.

3D Object Detection Object +1

Paper
Add Code

Domain-Agnostic Prior for Transfer Semantic Segmentation

no code implementations • CVPR 2022 • Xinyue Huo, Lingxi Xie, Hengtong Hu, Wengang Zhou, Houqiang Li, Qi Tian

Unsupervised domain adaptation (UDA) is an important topic in the computer vision community.

Representation Learning Semantic Segmentation +1

Paper
Add Code

Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers

1 code implementation • 27 Mar 2022 • Yunjie Tian, Lingxi Xie, Jiemin Fang, Mengnan Shi, Junran Peng, Xiaopeng Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

The past year has witnessed a rapid development of masked image modeling (MIM).

Paper
Code

DATA: Domain-Aware and Task-Aware Self-supervised Learning

1 code implementation • CVPR 2022 • Qing Chang, Junran Peng, Lingxie Xie, Jiajun Sun, Haoran Yin, Qi Tian, Zhaoxiang Zhang

However, due to the high training costs and the unconsciousness of downstream usages, most self-supervised learning methods lack the capability to correspond to the diversities of downstream scenarios, as there are various data domains, different vision tasks and latency constraints on models.

Image Classification Model Selection +5

Paper
Code

TAPE: Task-Agnostic Prior Embedding for Image Restoration

no code implementations • 11 Mar 2022 • Lin Liu, Lingxi Xie, Xiaopeng Zhang, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Qi Tian

In this paper, we propose a novel approach that embeds a task-agnostic prior into a transformer.

Image Restoration

Paper
Add Code

Deep Class Incremental Learning from Decentralized Data

no code implementations • 11 Mar 2022 • Xiaohan Zhang, Songlin Dong, Jinjie Chen, Qi Tian, Yihong Gong, Xiaopeng Hong

In this paper, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories.

Class Incremental Learning Incremental Learning +1

Paper
Add Code

MVP: Multimodality-guided Visual Pre-training

no code implementations • 10 Mar 2022 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Recently, masked image modeling (MIM) has become a promising direction for visual pre-training.

Language Modelling

Paper
Add Code

The KFIoU Loss for Rotated Object Detection

3 code implementations • 29 Jan 2022 • Xue Yang, Yue Zhou, Gefan Zhang, Jirui Yang, Wentao Wang, Junchi Yan, Xiaopeng Zhang, Qi Tian

This is in contrast to recent Gaussian modeling based rotation detectors e.g. GWD loss and KLD loss that involve a human-specified distribution distance metric which require additional hyperparameter tuning that vary across datasets and detectors.

Object object-detection +1

1,724

Paper
Code

GhostNets on Heterogeneous Devices via Cheap Operations

8 code implementations • 10 Jan 2022 • Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian

The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks.

3,803

Paper
Code

Learning To Learn by Jointly Optimizing Neural Architecture and Weights

no code implementations • CVPR 2022 • Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Longhui Wei, Yueting Zhuang, Qi Tian

Existing NAS-based meta-learning methods apply a two-stage strategy, i.e., first searching architectures and then re-training meta-weights on the searched architecture.

Meta-Learning

Paper
Add Code

Contextual Similarity Distillation for Asymmetric Image Retrieval

no code implementations • CVPR 2022 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, Qi Tian

To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output feature compatible with that of large gallery model, which is crucial with asymmetric retrieval.

Image Retrieval Retrieval

Paper
Add Code

Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks

1 code implementation • CVPR 2022 • Wenwen Pan, Haonan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli Gao, Jun Yu, Fei Wu, Qi Tian

Audio-Guided video semantic segmentation is a challenging problem in visual analysis and editing, which automatically separates foreground objects from background in a video sequence according to the referring audio expressions.

Denoising Segmentation +3

Paper
Code

DeeCap: Dynamic Early Exiting for Efficient Image Captioning

1 code implementation • CVPR 2022 • Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian

On one hand, the representation in shallow layers lacks high-level semantic and sufficient cross-modal fusion information for accurate prediction.

Image Captioning Imitation Learning

Paper
Code

Partial Class Activation Attention for Semantic Segmentation

1 code implementation • CVPR 2022 • Sun-Ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian

Current attention-based methods for semantic segmentation mainly model pixel relation through pairwise affinity and coarse segmentation.

Relation Segmentation +1

Paper
Code

One-Bit Active Query With Contrastive Pairs

no code implementations • CVPR 2022 • Yuhang Zhang, Xiaopeng Zhang, Lingxi Xie, Jie Li, Robert C. Qiu, Hengtong Hu, Qi Tian

The Yes query is treated as positive pairs of the queried category for contrastive pulling, while the No query is treated as hard negative pairs for contrastive repelling.

Active Learning Contrastive Learning

Paper
Add Code

General Greedy De-bias Learning

1 code implementation • 20 Dec 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios.

Image Classification Question Answering +1

Paper
Code

CGIBNet: Bandwidth-constrained Communication with Graph Information Bottleneck in Multi-Agent Reinforcement Learning

no code implementations • 20 Dec 2021 • Qi Tian, Kun Kuang, Baoxiang Wang, Furui Liu, Fei Wu

The node information compression aims to address the problem of what to communicate via learning compact node representations.

Multi-agent Reinforcement Learning reinforcement-learning +3

Paper
Add Code

SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained Siamese Transformers

no code implementations • 17 Dec 2021 • Lin Liu, Shanxin Yuan, Jianzhuang Liu, Xin Guo, Youliang Yan, Qi Tian

For zero-shot image restoration, we design a novel model, termed SiamTrans, which is constructed by Siamese transformers, encoders, and decoders.

Denoising Image Restoration +1

Paper
Add Code

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

no code implementations • 16 Dec 2021 • Rui Liu, Yahong Han, YaoWei Wang, Qi Tian

In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.

Object object-detection +1

Paper
Add Code

Mining Minority-class Examples With Uncertainty Estimates

no code implementations • 15 Dec 2021 • Gursimran Singh, Lingyang Chu, Lanjun Wang, Jian Pei, Qi Tian, Yong Zhang

In the real world, the frequency of occurrence of objects is naturally skewed forming long-tail class distributions, which results in poor performance on the statistically rare classes.

Paper
Add Code

Exploring Complicated Search Spaces with Interleaving-Free Sampling

no code implementations • 5 Dec 2021 • Yunjie Tian, Lingxi Xie, Jiemin Fang, Jianbin Jiao, Qixiang Ye, Qi Tian

In this paper, we build the search algorithm upon a complicated search space with long-distance connections, and show that existing weight-sharing search algorithms mostly fail due to the existence of interleaved connections.

Neural Architecture Search

Paper
Add Code

NeuSample: Neural Sample Field for Efficient View Synthesis

1 code implementation • 30 Nov 2021 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Neural radiance fields (NeRF) have shown great potentials in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy.

Paper
Code

Semantic-Aware Generation for Self-Supervised Visual Representation Learning

1 code implementation • 25 Nov 2021 • Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei Huang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features.

Ranked #63 on Semantic Segmentation on Cityscapes test

Representation Learning Semantic Segmentation

Paper
Code

Consensus Synergizes with Memory: A Simple Approach for Anomaly Segmentation in Urban Scenes

no code implementations • 24 Nov 2021 • Jiazhong Cen, Zenkun Jiang, Lingxi Xie, Qi Tian, Xiaokang Yang, Wei Shen

Anomaly segmentation is a crucial task for safety-critical applications, such as autonomous driving in urban scenes, where the goal is to detect out-of-distribution (OOD) objects with categories which are unseen during training.

Ranked #10 on Anomaly Detection on Fishyscapes L&F

Anomaly Detection Autonomous Driving +1

Paper
Add Code

Self-Regulated Learning for Egocentric Video Activity Anticipation

1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian

Future activity anticipation is a challenging problem in egocentric vision.

Multi-Task Learning

Paper
Code

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

no code implementations • 19 Nov 2021 • Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian

Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.

Dense Video Captioning Sentence

Paper
Add Code

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

331

Paper
Code

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

1 code implementation • 19 Oct 2021 • Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses.

Ranked #1 on 3D-Aware Image Synthesis on FFHQ 256 x 256

3D-Aware Image Synthesis Transfer Learning

606

Paper
Code

Semi-Autoregressive Image Captioning

1 code implementation • 11 Oct 2021 • Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in a sentence generation, can achieve comparable performance to the autoregressive counterparts with a considerable acceleration.

Image Captioning Sentence

Paper
Code

Deep Encryption: Protecting Pre-Trained Neural Networks with Confusion Neurons

no code implementations • 29 Sep 2021 • Mengbiao Zhao, Shixiong Xu, Jianlong Chang, Lingxi Xie, Jie Chen, Qi Tian

Having consumed huge amounts of training data and computational resource, large-scale pre-trained models are often considered key assets of AI service providers.

Position

Paper
Add Code

Vibration-based Uncertainty Estimation for Learning from Limited Supervision

no code implementations • 29 Sep 2021 • Hengtong Hu, Lingxi Xie, Yinquan Wang, Richang Hong, Meng Wang, Qi Tian

We investigate the problem of estimating uncertainty for training data, so that deep neural networks can make use of the results for learning from limited supervision.

Active Learning

Paper
Add Code

Self-supervised Tumor Segmentation through Layer Decomposition

no code implementations • 7 Sep 2021 • Xiaoman Zhang, Weidi Xie, Chaoqin Huang, Yanfeng Wang, Ya Zhang, Xin Chen, Qi Tian

In this paper, we target self-supervised representation learning for zero-shot tumor segmentation.

Brain Tumor Segmentation Data Augmentation +5

Paper
Add Code

Differentiable Convolution Search for Point Cloud Processing

no code implementations • ICCV 2021 • Xing Nie, Yongcheng Liu, Shaohong Chen, Jianlong Chang, Chunlei Huo, Gaofeng Meng, Qi Tian, Weiming Hu, Chunhong Pan

It can work in a purely data-driven manner and thus is capable of auto-creating a group of suitable convolutions for geometric shape modeling.

Paper
Add Code

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

no code implementations • 25 Aug 2021 • Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales.

motion prediction

Paper
Add Code

Pixel Difference Networks for Efficient Edge Detection

2 code implementations • ICCV 2021 • Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen, Li Liu

A faster version of PiDiNet with less than 0.1M parameters can still achieve comparable performance among state of the arts with 200 FPS.

Ranked #2 on Edge Detection on BRIND

Edge Detection

413

Paper
Code

Greedy Gradient Ensemble for Robust Visual Question Answering

1 code implementation • ICCV 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.

Ranked #2 on Visual Question Answering (VQA) on VQA-CP

Question Answering Visual Question Answering

Paper
Code

Revisiting Catastrophic Forgetting in Class Incremental Learning

no code implementations • 26 Jul 2021 • Zixuan Ni, Haizhou Shi, Siliang Tang, Longhui Wei, Qi Tian, Yueting Zhuang

After investigating existing strategies, we observe that there is a lack of study on how to prevent the inter-phase confusion.

Class Incremental Learning Contrastive Learning +2

Paper
Add Code

Semantic-guided Pixel Sampling for Cloth-Changing Person Re-identification

1 code implementation • 24 Jul 2021 • Xiujun Shu, Ge Li, Xiao Wang, Weijian Ruan, Qi Tian

The key to this task is to exploit cloth-irrelevant cues.

Cloth-Changing Person Re-Identification

Paper
Code

Domain Adaptation without Model Transferring

no code implementations • 21 Jul 2021 • Kunhong Wu, Yucheng Shi, Yahong Han, Yunfeng Shao, Bingshuai Li, Qi Tian

Existing unsupervised domain adaptation (UDA) methods can achieve promising performance without transferring data from source domain to target domain.

Unsupervised Domain Adaptation

Paper
Add Code

Rectifying the Shortcut Learning of Background for Few-Shot Learning

1 code implementation • NeurIPS 2021 • Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin Xu, Qi Tian

The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL).

Ranked #20 on Few-Shot Image Classification on Mini-Imagenet 5-way (5-shot)

Few-Shot Image Classification Few-Shot Learning

101

Paper
Code

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation

1 code implementation • 13 Jul 2021 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

Due to the domain discrepancy in visual domain adaptation, the performance of source model degrades when bumping into the high data density near decision boundary in target domain.

Domain Adaptation

262

Paper
Code

Bag of Instances Aggregation Boosts Self-supervised Distillation

1 code implementation • ICLR 2022 • Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

Here bag of instances indicates a set of similar samples constructed by the teacher and are grouped within a bag, and the goal of distillation is to aggregate compact representations over the student with respect to instances in a bag.

Contrastive Learning Self-Supervised Learning

Paper
Code

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation

no code implementations • CVPR 2021 • Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability in extracting knowledge from unlabeled data to assist learning from labeled data.

Continual Learning Image Segmentation +3

Paper
Add Code

Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

no code implementations • 8 Jun 2021 • Bowen Shi, Xiaopeng Zhang, Haohang Xu, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets regardless of their taxonomy labels, and followed by fine-tuning the pretrained model over specific dataset as usual.

Semantic Segmentation

Paper
Add Code

Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

2 code implementations • NeurIPS 2021 • Xue Yang, Xiaojiang Yang, Jirui Yang, Qi Ming, Wentao Wang, Qi Tian, Junchi Yan

Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection.

Ranked #14 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images +1

1,724

Paper
Code

Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task

no code implementations • 1 Jun 2021 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

By simply pulling the different augmented views of each image together or other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models.

Self-Supervised Learning

Paper
Add Code

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

3 code implementations • CVPR 2022 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Transformers have offered a new methodology of designing neural networks for visual recognition.

Image Classification object-detection +1

Paper
Code

Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark

2 code implementations • 31 May 2021 • Xiujun Shu, Xiao Wang, Xianghao Zang, Shiliang Zhang, Yuanqi Chen, Ge Li, Qi Tian

We also verified that models pre-trained on LaST can generalize well on existing datasets with short-term and cloth-changing scenarios.

Person Re-Identification

Paper
Code

Analysis and Applications of Class-wise Robustness in Adversarial Training

no code implementations • 29 May 2021 • Qi Tian, Kun Kuang, Kelu Jiang, Fei Wu, Yisen Wang

Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.

Paper
Add Code

What Is Considered Complete for Visual Recognition?

no code implementations • 28 May 2021 • Lingxi Xie, Xiaopeng Zhang, Longhui Wei, Jianlong Chang, Qi Tian

This is an opinion paper.

Paper
Add Code

Towards Compact CNNs via Collaborative Compression

1 code implementation • CVPR 2021 • Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji

Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression.

Neural Network Compression Tensor Decomposition

Paper
Code

A Fourier-based Framework for Domain Generalization

1 code implementation • CVPR 2021 • Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian

Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data.

Data Augmentation Domain Generalization

145

Paper
Code

Semi-supervised Contrastive Learning with Similarity Co-calibration

no code implementations • 16 May 2021 • Yuhang Zhang, Xiaopeng Zhang, Robert C. Qiu, Jie Li, Haohang Xu, Qi Tian

Semi-supervised learning acts as an effective way to leverage massive unlabeled data.

Contrastive Learning Few-Shot Learning +1

Paper
Add Code

Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

5 code implementations • 12 May 2021 • Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang

In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis.

Ranked #3 on Medical Image Segmentation on ACDC

Cardiac Segmentation Image Segmentation +1

1,510

Paper
Code

Visformer: The Vision-friendly Transformer

5 code implementations • ICCV 2021 • Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian

The past year has witnessed the rapid development of applying the Transformer module to vision problems.

Ranked #510 on Image Classification on ImageNet

Image Classification

29,774

Paper
Code

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation • 11 Apr 2021 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

Ranked #45 on Object Detection on COCO test-dev

2D Human Pose Estimation Instance Segmentation +5

154

Paper
Code

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

1 code implementation • CVPR 2021 • Le Yang, Haojun Jiang, Ruojin Cai, Yulin Wang, Shiji Song, Gao Huang, Qi Tian

Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.

Computational Efficiency Image Classification +2

Paper
Code

Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

no code implementations • 6 Apr 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian

To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches, where the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler.

Ranked #7 on Weakly Supervised Action Localization on THUMOS14

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

Paper
Add Code

Spatiotemporal Transformer for Video-based Person Re-identification

no code implementations • 30 Mar 2021 • Tianyu Zhang, Longhui Wei, Lingxi Xie, Zijie Zhuang, Yongfei Zhang, Bo Li, Qi Tian

Recently, the Transformer module has been transplanted from natural language processing to computer vision.

Video-Based Person Re-Identification

Paper
Add Code

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

2 code implementations • ICCV 2021 • Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye

TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization.

Object Weakly-Supervised Object Localization

131

Paper
Code

Unsupervised Domain Adaptation for Image Classification via Structure-Conditioned Adversarial Learning

no code implementations • 4 Mar 2021 • Hui Wang, Jian Tian, Songyuan Li, Hanbin Zhao, Qi Tian, Fei Wu, Xi Li

Unsupervised domain adaptation (UDA) typically carries out knowledge transfer from a label-rich source domain to an unlabeled target domain by adversarial learning.

General Classification Image Classification +2

Paper
Add Code

Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss

2 code implementations • 28 Jan 2021 • Xue Yang, Junchi Yan, Qi Ming, Wentao Wang, Xiaopeng Zhang, Qi Tian

Boundary discontinuity and its inconsistency to the final detection metric have been the bottleneck for rotating detection regression loss design.

Ranked #16 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images +2

1,724

Paper
Code

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

Paper
Add Code

Shape Self-Correction for Unsupervised Point Cloud Understanding

no code implementations • ICCV 2021 • Ye Chen, Jinxian Liu, Bingbing Ni, Hang Wang, Jiancheng Yang, Ning Liu, Teng Li, Qi Tian

Then the destroyed shape and the normal shape are sent into a point cloud network to get representations, which are employed to segment points that belong to distorted parts and further reconstruct them to restore the shape to normal.

Self-Supervised Learning

Paper
Add Code

Intriguing class-wise properties of adversarial training

no code implementations • 1 Jan 2021 • Qi Tian, Kun Kuang, Fei Wu, Yisen Wang

Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.

Adversarial Robustness

Paper
Add Code

Foreground Activation Maps for Weakly Supervised Object Localization

no code implementations • ICCV 2021 • Meng Meng, Tianzhu Zhang, Qi Tian, Yongdong Zhang, Feng Wu

To the best of our knowledge, this is the first work that can achieve remarkable performance for both tasks by optimizing them jointly via FAM for WSOL.

Classification Object +1

Paper
Add Code

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Ranked #3 on Weakly Supervised Action Localization on BEOID

Weakly Supervised Action Localization

Paper
Add Code

ESAD: End-to-end Deep Semi-supervised Anomaly Detection

no code implementations • 9 Dec 2020 • Chaoqin Huang, Fei Ye, Peisen Zhao, Ya Zhang, Yan-Feng Wang, Qi Tian

This paper explores semi-supervised anomaly detection, a more practical setting for anomaly detection where a small additional set of labeled samples are provided.

Ranked #25 on Anomaly Detection on One-class CIFAR-10 (using extra training data)

Medical Diagnosis Semi-supervised Anomaly Detection +1

Paper
Add Code

UnrealPerson: An Adaptive Pipeline towards Costless Person Re-identification

1 code implementation • CVPR 2021 • Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, Qi Tian

The main difficulty of person re-identification (ReID) lies in collecting annotated data and transferring the model across different domains.

Domain Adaptation Image Generation +1

Paper
Code

Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning

no code implementations • 4 Dec 2020 • Haohang Xu, Xiaopeng Zhang, Hao Li, Lingxi Xie, Hongkai Xiong, Qi Tian

In this paper, we propose a hierarchical semantic alignment strategy that expands the views generated by a single image to Cross-samples and Multi-level representations, and models the invariance to semantically similar images in a hierarchical way.

Contrastive Learning Representation Learning +2

Paper
Add Code

Self-Adaptively Learning to Demoiré from Focused and Defocused Image Pairs

no code implementations • NeurIPS 2020 • Lin Liu, Shanxin Yuan, Jianzhuang Liu, Liping Bao, Gregory Slabaugh, Qi Tian

In this paper, we propose a self-adaptive learning method for demoiréing a high-frequency image, with the help of an additional defocused moiré-free blur image.

Test-time Adaptation

Paper
Add Code

Omni-GAN: On the Secrets of cGANs and Beyond

3 code implementations • ICCV 2021 • Peng Zhou, Lingxi Xie, Bingbing Ni, Cong Geng, Qi Tian

The conditional generative adversarial network (cGAN) is a powerful tool for generating high-quality images, but existing approaches mostly suffer from unsatisfying performance or the risk of mode collapse.

Ranked #8 on Conditional Image Generation on ImageNet 128x128

Conditional Image Generation Generative Adversarial Network

Paper
Code

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

no code implementations • 19 Nov 2020 • Xinyue Huo, Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Hao Li, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignored spatial information which is often crucial for visual representation.

Contrastive Learning Data Augmentation +1

Paper
Add Code

Privileged Knowledge Distillation for Online Action Detection

no code implementations • 18 Nov 2020 • Peisen Zhao, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Ranked #11 on Online Action Detection on TVSeries

Knowledge Distillation Online Action Detection

Paper
Add Code

Can Semantic Labels Assist Self-Supervised Visual Representation Learning?

no code implementations • 17 Nov 2020 • Longhui Wei, Lingxi Xie, Jianzhong He, Jianlong Chang, Xiaopeng Zhang, Wengang Zhou, Houqiang Li, Qi Tian

Recently, contrastive learning has largely advanced the progress of unsupervised visual representation learning.

Contrastive Learning Representation Learning +1

Paper
Add Code

Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs

1 code implementation • 3 Nov 2020 • Lin Liu, Shanxin Yuan, Jianzhuang Liu, Liping Bao, Gregory Slabaugh, Qi Tian

In this paper, we propose a self-adaptive learning method for demoireing a high-frequency image, with the help of an additional defocused moire-free blur image.

Demoire Test-time Adaptation

Paper
Code

CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing

1 code implementation • ECCV 2020 • Xuanhong Chen, Bingbing Ni, Naiyuan Liu, Ziang Liu, Yiliu Jiang, Loc Truong, Qi Tian

In contrast to the great success of memory-consuming face editing methods at low resolution, manipulating high-resolution (HR) facial images, i.e., typically larger than 768² pixels, with very limited memory is still challenging.

Attribute Image Generation +2

Paper
Code

Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View

1 code implementation • 30 Oct 2020 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang

Concretely, we design a novel interpretation scheme whereby the loss of mis-predicted frequent and sparse answers of the same question type is distinctly exhibited during the late training phase.

Face Recognition Image Classification +2

Paper
Code

One-bit Supervision for Image Classification

1 code implementation • NeurIPS 2020 • Hengtong Hu, Lingxi Xie, Zewei Du, Richang Hong, Qi Tian

Instead of training a model upon the accurate label of each sample, our setting requires the model to query with a predicted label of each sample and learn from the answer whether the guess is correct.

Classification General Classification +1

Paper
Code

Reinforced Axial Refinement Network for Monocular 3D Object Detection

no code implementations • ECCV 2020 • Lijie Liu, Chufan Wu, Jiwen Lu, Lingxi Xie, Jie Zhou, Qi Tian

Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.

Ranked #16 on Vehicle Pose Estimation on KITTI Cars Hard

Monocular 3D Object Detection Object +2

Paper
Add Code

Label Decoupling Framework for Salient Object Detection

1 code implementation • CVPR 2020 • Jun Wei, Shuhui Wang, Zhe Wu, Chi Su, Qingming Huang, Qi Tian

Though remarkable progress has been achieved, we observe that the closer a pixel is to the edge, the more difficult it is to predict, because edge pixels have a very imbalanced distribution.

Ranked #1 on Saliency Detection on HKU-IS

Object object-detection +3

113

Paper
Code

Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap

no code implementations • 4 Aug 2020 • Lingxi Xie, Xin Chen, Kaifeng Bi, Longhui Wei, Yuhui Xu, Zhengsu Chen, Lanfei Wang, An Xiao, Jianlong Chang, Xiaopeng Zhang, Qi Tian

Neural architecture search (NAS) has attracted increasing attention in both academia and industry.

Neural Architecture Search

Paper
Add Code

Video Super-Resolution with Recurrent Structure-Detail Network

2 code implementations • ECCV 2020 • Takashi Isobe, Xu Jia, Shuhang Gu, Songjiang Li, Shengjin Wang, Qi Tian

Most video super-resolution methods super-resolve a single reference frame with the help of neighboring frames in a temporal sliding window.

Ranked #9 on Video Super-Resolution on Vid4 - 4x upscaling - BD degradation

Video Super-Resolution

Paper
Code

Dual Distribution Alignment Network for Generalizable Person Re-Identification

1 code implementation • 27 Jul 2020 • Peixian Chen, Pingyang Dai, Jianzhuang Liu, Feng Zheng, Qi Tian, Rongrong Ji

Domain generalization (DG) serves as a promising solution to handle person Re-Identification (Re-ID), which trains the model using labels from the source domain alone, and then directly adopts the trained model to the target domain without model updating.

Domain Generalization Generalizable Person Re-identification

Paper
Code

Corner Proposal Network for Anchor-free, Two-stage Object Detection

1 code implementation • ECCV 2020 • Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

On the MS-COCO dataset, CPN achieves an AP of 49.2%, which is competitive among state-of-the-art object detection methods.

Ranked #83 on Object Detection on COCO test-dev

Computational Efficiency Object +3

193

Paper
Code

Learning Task-oriented Disentangled Representations for Unsupervised Domain Adaptation

no code implementations • 27 Jul 2020 • Pingyang Dai, Peixian Chen, Qiong Wu, Xiaopeng Hong, Qixiang Ye, Qi Tian, Rongrong Ji

This drawback limits the flexibility of UDA in complicated open-set tasks where no labels are shared between domains.

Retrieval Unsupervised Domain Adaptation

Paper
Add Code

Video Super-resolution with Temporal Group Attention

1 code implementation • CVPR 2020 • Takashi Isobe, Songjiang Li, Xu Jia, Shanxin Yuan, Gregory Slabaugh, Chunjing Xu, Ya-Li Li, Shengjin Wang, Qi Tian

Video super-resolution, which aims at producing a high-resolution video from its corresponding low-resolution version, has recently drawn increasing attention.

Ranked #11 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Video Super-Resolution

122

Paper
Code

Polar Relative Positional Encoding for Video-Language Segmentation

no code implementations • 20 Jul 2020 • Ke Ning, Lingxi Xie, Fei Wu, Qi Tian

In this paper, we propose a novel Polar Relative Positional Encoding (PRPE) mechanism that represents spatial relations in a "linguistic" way, i.e., in terms of direction and range.

Ranked #11 on Referring Expression Segmentation on J-HMDB

Referring Expression Segmentation Sentence

Paper
Add Code

Social Adaptive Module for Weakly-supervised Group Activity Recognition

no code implementations • ECCV 2020 • Rui Yan, Lingxi Xie, Jinhui Tang, Xiangbo Shu, Qi Tian

This paper presents a new task named weakly-supervised group activity recognition (GAR) which differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data.

Group Activity Recognition

Paper
Add Code

Wavelet-Based Dual-Branch Network for Image Demoireing

1 code implementation • 14 Jul 2020 • Lin Liu, Jianzhuang Liu, Shanxin Yuan, Gregory Slabaugh, Ales Leonardis, Wengang Zhou, Qi Tian

When smartphone cameras are used to take photos of digital screens, usually moire patterns result, severely degrading photo quality.

Demoire Image Restoration +1

Paper
Code

Universal-to-Specific Framework for Complex Action Recognition

no code implementations • 13 Jul 2020 • Peisen Zhao, Lingxi Xie, Ya Zhang, Qi Tian

The U2S framework is composed of three subnetworks: a universal network, a category-specific network, and a mask network.

Action Recognition Decision Making

Paper
Add Code

GOLD-NAS: Gradual, One-Level, Differentiable

1 code implementation • 7 Jul 2020 • Kaifeng Bi, Lingxi Xie, Xin Chen, Longhui Wei, Qi Tian

There is a large literature on neural architecture search, but most existing work made use of heuristic rules that largely constrained the search flexibility.

Image Classification Neural Architecture Search

Paper
Code

MgSvF: Multi-Grained Slow vs. Fast Framework for Few-Shot Class-Incremental Learning

no code implementations • 28 Jun 2020 • Hanbin Zhao, Yongjian Fu, Mintong Kang, Qi Tian, Fei Wu, Xi Li

As a challenging problem, few-shot class-incremental learning (FSCIL) continually learns a sequence of tasks, confronting the dilemma between slow forgetting of old knowledge and fast adaptation to new knowledge.

Few-Shot Class-Incremental Learning Incremental Learning

Paper
Add Code

Searching towards Class-Aware Generators for Conditional Generative Adversarial Networks

1 code implementation • 25 Jun 2020 • Peng Zhou, Lingxi Xie, Xiaopeng Zhang, Bingbing Ni, Qi Tian

To learn the sampling policy, a Markov decision process is embedded into the search algorithm and a moving average is applied for better stability.

Image Generation

Paper
Code

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation

no code implementations • 24 Jun 2020 • Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Qi Tian

This paper focuses on a popular pipeline known as self learning, and points out a weakness named lazy learning that refers to the difficulty for a model to learn from the pseudo labels generated by itself.

Autonomous Driving Image Segmentation +4

Paper
Add Code

Distilling Object Detectors with Task Adaptive Regularization

no code implementations • 23 Jun 2020 • Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.

Knowledge Distillation Object +1

Paper
Add Code

Cascaded Regression Tracking: Towards Online Hard Distractor Discrimination

no code implementations • 18 Jun 2020 • Ning Wang, Wengang Zhou, Qi Tian, Houqiang Li

In the second stage, a discrete-sampling-based ridge regression is designed to double-check the remaining ambiguous hard samples, which serves as an alternative to fully-connected layers and benefits from the closed-form solver for efficient learning.

regression Visual Tracking

Paper
Add Code

Rethinking Performance Estimation in Neural Architecture Search

1 code implementation • CVPR 2020 • Xiawu Zheng, Rongrong Ji, Qiang Wang, Qixiang Ye, Zhenguo Li, Yonghong Tian, Qi Tian

In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.

Neural Architecture Search

165

Paper
Code

A Semi-Supervised Assessor of Neural Architectures

no code implementations • CVPR 2020 • Yehui Tang, Yunhe Wang, Yixing Xu, Hanting Chen, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu

A graph convolutional neural network is introduced to predict the performance of architectures based on the learned representations and their relation modeled by the graph.

Neural Architecture Search

Paper
Add Code

Projection & Probability-Driven Black-Box Attack

1 code implementation • CVPR 2020 • Jie Li, Rongrong Ji, Hong Liu, Jianzhuang Liu, Bineng Zhong, Cheng Deng, Qi Tian

For reducing the solution space, we first model the adversarial perturbation optimization problem as a process of recovering frequency-sparse perturbations with compressed sensing, under the setting that random noise in the low-frequency space is more likely to be adversarial.

Paper
Code
