Search Results for author: Xiangtai Li

Found 69 papers, 62 papers with code

Point-In-Context: Understanding Point Cloud via In-Context Learning

1 code implementation • 18 Apr 2024 • Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

With the rise of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitask learning, notably in natural language processing and image processing.

In-Context Learning

VG4D: Vision-Language Model Goes 4D Video Recognition

1 code implementation • 17 Apr 2024 • Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

By transferring knowledge from the vision-language model (VLM) to the 4D encoder and combining the encoder with the VLM, VG4D achieves improved recognition performance.

Action Recognition Autonomous Driving

Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

1 code implementation • 16 Apr 2024 • Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, DaCheng Tao

Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods.

Anomaly Detection object-detection

DGMamba: Domain Generalization via Generalized State Space Model

1 code implementation • 11 Apr 2024 • Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

SPR encourages the model to focus on objects rather than context and consists of two designs: Prior-Free Scanning (PFS) and Domain Context Interchange (DCI).

Domain Generalization

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

no code implementations • 9 Apr 2024 • Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie

Recent advances in anomaly detection have demonstrated the efficacy of CNN- and transformer-based approaches.

Long-range modeling Unsupervised Anomaly Detection

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

3 code implementations • 29 Mar 2024 • Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li

Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.

Object Video Segmentation

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning

1 code implementation • 18 Mar 2024 • Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang

To tackle these challenges, we present GenView, a controllable framework that leverages pretrained generative models to increase the diversity of positive views while preserving semantics.

Contrastive Learning Data Augmentation

Explore In-Context Segmentation via Latent Diffusion Models

no code implementations • 14 Mar 2024 • Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan

In-context segmentation has drawn increasing attention since the introduction of vision foundation models.

Metric Learning Segmentation

Point Cloud Mamba: Point Cloud Learning via State Space Model

2 code implementations • 1 Mar 2024 • Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan

To enable more effective processing of 3-D point cloud data by Mamba, we propose a novel Consistent Traverse Serialization to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent.

Generalizable Entity Grounding via Assistance of Large Language Model

no code implementations • 4 Feb 2024 • Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang

In this work, we propose a novel approach to densely ground visual entities from a long caption.

Language Modelling Large Language Model

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task, language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

The Segment Anything Model (SAM) is a remarkable model capable of generalized segmentation.

Interactive Segmentation Panoptic Segmentation

OMG-Seg: Is One Model Good Enough For All Segmentation?

1 code implementation • 18 Jan 2024 • Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.

Interactive Segmentation Panoptic Segmentation

ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

1 code implementation • 16 Jan 2024 • Zhongbin Fang, Xia Li, Xiangtai Li, Shen Zhao, Mengyuan Liu

Through extensive experiments, we demonstrate that our PointMLS achieves state-of-the-art results on ModelNet-O and competitive results on regular datasets while remaining robust and effective.

3D Point Cloud Classification Point Cloud Classification

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

1 code implementation • 5 Jan 2024 • Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy

CLIP and the Segment Anything Model (SAM) are remarkable vision foundation models (VFMs).

Image Classification Interactive Segmentation

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

1 code implementation • 4 Jan 2024 • Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma

To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions without requiring structural modifications.

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

1 code implementation • 4 Jan 2024 • Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).

Ranked #1 on Described Object Detection on Description Detection Dataset

Described Object Detection Phrase Grounding

A Generalist FaceX via Learning Unified Facial Representation

1 code implementation • 31 Dec 2023 • Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong Liu, Xiaoming Liu, Ying Tai

This work presents the FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.

Facial Editing

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

1 code implementation • 12 Dec 2023 • Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang

Real-time multi-person pose estimation presents significant challenges in balancing speed and precision.

Ranked #1 on Multi-Person Pose Estimation on CrowdPose (using extra training data)

Multi-Person Pose Estimation

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao

In this spirit, this paper explores a plain ViT architecture for multi-class unsupervised anomaly detection (MUAD).

Unsupervised Anomaly Detection

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

1 code implementation • 11 Dec 2023 • Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai

EdgeSAM is also the first SAM variant that can run at over 30 FPS on an iPhone 14.

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

1 code implementation • 6 Dec 2023 • Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, Mengyuan Liu

Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning.

In-Context Learning motion prediction

Effective Adapter for Face Recognition in the Wild

no code implementations • 4 Dec 2023 • Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

Traditional heuristic approaches, which train models either directly on these degraded images or on their counterparts enhanced by face restoration techniques, have proven ineffective, primarily due to the degradation of facial features and the discrepancy between image domains.

Face Recognition

Panoptic Video Scene Graph Generation

3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

Panoptic video scene graph generation (PVSG) relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.

Graph Generation Panoptic Scene Graph Generation

Rethinking Evaluation Metrics of Open-Vocabulary Segmentation

1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods across three segmentation tasks.

Segmentation

OV-VG: A Benchmark for Open-Vocabulary Visual Grounding

1 code implementation • 22 Oct 2023 • Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao

While current foundation models excel at various vision-language tasks, there is a noticeable absence of models specifically tailored for open-vocabulary visual grounding.

Novel Concepts object-detection

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

1 code implementation • 2 Oct 2023 • Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy

We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.

Novel Object Detection Object

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.

Ranked #3 on Open Vocabulary Semantic Segmentation on PASCAL Context-59

Image Classification Image Segmentation

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

1 code implementation • 22 Sep 2023 • Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

Data Augmentation Instance Segmentation

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

2 code implementations • 3 Aug 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem

Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning have also been proposed to address data imbalance and data scarcity, respectively, which are common in real-world applications and further exacerbate the well-known problem of catastrophic forgetting.

Few-Shot Class-Incremental Learning Incremental Learning

Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision

1 code implementation • 23 Jul 2023 • Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao

The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets.

Visual Grounding

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

1 code implementation • 17 Jul 2023 • Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation Panoptic Scene Graph Generation

Towards Open Vocabulary Learning: A Survey

1 code implementation • 28 Jun 2023 • Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, DaCheng Tao

To our knowledge, this is the first comprehensive literature review of open vocabulary learning.

Open Set Learning Out-of-Distribution Detection

Explore In-Context Learning for 3D Point Cloud Understanding

1 code implementation • NeurIPS 2023 • Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu

With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks.

In-Context Learning

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

no code implementations • 9 May 2023 • Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang

We first introduce preliminary knowledge for the change detection task, such as the problem definition, datasets, evaluation metrics, and transformer basics, and provide in the methodology section a detailed taxonomy of existing algorithms from three perspectives: algorithm granularity, supervision modes, and learning frameworks.

Change Detection Change detection for remote sensing images

Transformer-Based Visual Segmentation: A Survey

2 code implementations • 19 Apr 2023 • Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

2 code implementations • ICCV 2023 • Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy

Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks.

Ranked #3 on Video Semantic Segmentation on VSPW

Contrastive Learning Segmentation

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning

1 code implementation • 6 Feb 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).

Few-Shot Class-Incremental Learning Incremental Learning

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning

1 code implementation • ICLR 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

Ranked #3 on Few-Shot Class-Incremental Learning on CUB-200-2011

Few-Shot Class-Incremental Learning Incremental Learning

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

1 code implementation • 3 Jan 2023 • Yue Han, Jiangning Zhang, Zhucun Xue, Chao Xu, Xintian Shen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li

In this work, we explore a simple yet unified solution for few-shot instance segmentation (FSIS) and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features based on a Transformer-like framework.

Benchmarking Few-Shot Object Detection

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++, which uses a new part-whole cross-attention scheme to further improve part segmentation quality.

Panoptic Segmentation Segmentation

Rethinking Mobile Block for Efficient Attention-based Models

1 code implementation • ICCV 2023 • Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang

This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.

Unity

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

2 code implementations • ICCV 2023 • Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

Experiments on the COCO dataset under two settings, Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS), demonstrate the superiority of the proposed Caption Grounding and Generation (CGG) framework.

Caption Generation Instance Segmentation

Convolution-enhanced Evolving Attention Networks

1 code implementation • 16 Dec 2022 • Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.

Image Classification Machine Translation

Towards Robust Referring Image Segmentation

1 code implementation • 20 Sep 2022 • Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, DaCheng Tao

It considers negative sentence inputs in addition to the regular positive text inputs.

Image Segmentation Segmentation

SFNet: Faster, Accurate, and Domain Agnostic Semantic Segmentation via Semantic Flow

1 code implementation • 10 Jul 2022 • Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, DaCheng Tao

In this paper, we focus on exploring effective methods for fast, accurate, and domain-agnostic semantic segmentation.

Real-Time Semantic Segmentation

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm

1 code implementation • 19 Jun 2022 • Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, DaCheng Tao

Motivated by biological evolution, this paper explains the rationale of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA) and shows that both share a consistent mathematical formulation.

Image Classification

Multi-Task Learning with Multi-Query Transformer for Dense Prediction

1 code implementation • 28 May 2022 • Yangyang Xu, Xiangtai Li, Haobo Yuan, Yibo Yang, Lefei Zhang

We first model each task with a task-relevant query.

Multi-Task Learning

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

1 code implementation • 10 Apr 2022 • Shilin Xu, Xiangtai Li, Jingbo Wang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

This work focuses on joint human fashion segmentation and attribute recognition.

Attribute Fashion Understanding

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

1 code implementation • 10 Apr 2022 • Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To the best of our knowledge, we are the first to solve the panoptic part segmentation (PPS) problem via a unified, end-to-end transformer model.

Ranked #2 on Part-aware Panoptic Segmentation on Pascal Panoptic Parts

Panoptic Segmentation Part-aware Panoptic Segmentation

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

1 code implementation • CVPR 2022 • Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy

We hope this simple, yet effective method can serve as a new, flexible baseline in unified video segmentation design.

Ranked #1 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)

Image Segmentation Instance Segmentation

Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

1 code implementation • 17 Mar 2022 • Yibo Yang, Shixiang Chen, Xiangtai Li, Liang Xie, Zhouchen Lin, DaCheng Tao

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class.

Ranked #26 on Long-tail Learning on CIFAR-10-LT (ρ=100)

Classification Image Classification

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

3 code implementations • 13 Jan 2022 • Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while achieving performance comparable to that of previous complex hand-crafted detectors.

Ranked #4 on Video Object Detection on ImageNet VID (using extra training data)

Object object-detection

PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation

1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao

Depth-aware Video Panoptic Segmentation (DVPS) is a new, challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.

Ranked #1 on Depth-aware Video Panoptic Segmentation on SemKITTI-DVPS

Depth-aware Video Panoptic Segmentation Depth Estimation

Improving Video Instance Segmentation via Temporal Pyramid Routing

1 code implementation • 28 Jul 2021 • Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.

Instance Segmentation Panoptic Segmentation

Global Aggregation then Local Distribution for Scene Parsing

1 code implementation • 28 Jul 2021 • Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.

Scene Parsing Segmentation

BoundarySqueeze: Image Segmentation as Boundary Squeezing

1 code implementation • 25 May 2021 • Hao He, Xiangtai Li, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lubin Weng, Zhouchen Lin, Shiming Xiang

This module is used to squeeze the object boundary from both inner and outer directions, which contributes to precise mask representation.

Image Segmentation Instance Segmentation

Fast and Accurate Scene Parsing via Bi-direction Alignment Networks

1 code implementation • 25 May 2021 • Yanran Wu, Xiangtai Li, Chen Shi, Yunhai Tong, Yang Hua, Tao Song, Ruhui Ma, Haibing Guan

Motivated by this, we propose a novel network that aligns the information of the two paths with each other through a learned flow field.

Scene Parsing

Dynamic Dual Sampling Module for Fine-Grained Semantic Segmentation

no code implementations • 25 May 2021 • Chen Shi, Xiangtai Li, Yanran Wu, Yunhai Tong, Yi Xu

Representing semantic context and local details is essential for building modern semantic segmentation models.

Segmentation Semantic Segmentation

End-to-End Video Object Detection with Spatial-Temporal Transformers

1 code implementation • 23 May 2021 • Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while achieving performance comparable to that of previous complex hand-crafted detectors.

Object object-detection

Enhanced Boundary Learning for Glass-like Object Segmentation

1 code implementation • ICCV 2021 • Hao He, Xiangtai Li, Guangliang Cheng, Jianping Shi, Yunhai Tong, Gaofeng Meng, Véronique Prinet, Lubin Weng

We use these two modules to design a decoder that generates accurate and clean segmentation results, especially on the object contours.

Ranked #20 on Thermal Image Segmentation on RGB-T-Glass-Segmentation

Object Robot Navigation

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

1 code implementation • CVPR 2021 • Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin

Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.

Image Segmentation Segmentation

Involution: Inverting the Inherence of Convolution for Visual Recognition

13 code implementations • CVPR 2021 • Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen

Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision.

Ranked #703 on Image Classification on ImageNet

Image Classification

Towards Efficient Scene Understanding via Squeeze Reasoning

1 code implementation • 6 Nov 2020 • Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin

Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.

Instance Segmentation object-detection

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

2 code implementations • ECCV 2020 • Xiangtai Li, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, Yunhai Tong

Our insight is that appealing semantic segmentation performance requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.

Object Segmentation

Semantic Flow for Fast and Accurate Scene Parsing

6 code implementations • ECCV 2020 • Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong

A common practice for improving performance is to obtain high-resolution feature maps with strong semantic representations.

Ranked #2 on Real-Time Semantic Segmentation on Cityscapes test

Optical Flow Estimation Real-Time Semantic Segmentation

Global Aggregation then Local Distribution in Fully Convolutional Networks

2 code implementations • 16 Sep 2019 • Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong

GALD is end-to-end trainable, can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.

Ranked #1 on Semantic Segmentation on PASCAL VOC 2007