Search Results for author: Xiangtai Li

Found 113 papers, 83 papers with code

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

no code implementations17 Apr 2025 Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, YuFei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, YuTing Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, Qirui Yang, Fangpu Zhang, Yunlong Lin, Sixiang Chen, Guoxi Huang, Ruirui Lin, Yan Zhang, Jingyu Yang, Huanjing Yue, Jiyuan Chen, Qiaosi Yi, Hongjun Wang, Chenxi Xie, Shuai Li, Yuhui Wu, Kaiyi Ma, Jiakui Hu, Juncheng Li, Liwen Pan, Guangwei Gao, Wenjie Li, Zhenyu Jin, Heng Guo, Zhanyu Ma, YuBo Wang, Jinghua Wang, Wangzhi Xing, Anjusree Karnavar, Diqi Chen, Mohammad Aminul Islam, Hao Yang, Ruikun Zhang, Liyuan Pan, Qianhao Luo, XinCao, Han Zhou, Yan Min, Wei Dong, Jun Chen, Taoyi Wu, Weijia Dou, Yu Wang, Shengjie Zhao, Yongcheng Huang, Xingyu Han, Anyan Huang, Hongtao Wu, Hong Wang, Yefeng Zheng, Abhijeet Kumar, Aman Kumar, Marcos V. Conde, Paula Garrido, Daniel Feijoo, Juan C. Benito, Guanglu Dong, Xin Lin, Siyuan Liu, Tianheng Zheng, Jiayu Zhong, Shouyi Wang, Xiangtai Li, Lanqing Guo, Lu Qi, Chao Ren, Shuaibo Wang, Shilong Zhang, Wanyu Zhou, Yunze Wu, Qinzhong Tan, Jieyuan Pei, Zhuoxuan Li, Jiayu Wang, Haoyu Bian, Haoran Sun, Subhajit Paul, Ni Tang, Junhao Huang, Zihan Cheng, Hongyun Zhu, Yuehan Wu, Kaixin Deng, Hang Ouyang, Tianxin Xiao, Fan Yang, Zhizun Luo, Zeyu Xiao, Zhuoyuan Li, Nguyen Pham Hoang Le, An Dinh Thien, Son T. Luu, Kiet Van Nguyen, Ronghua Xu, Xianmin Tian, Weijian Zhou, Jiacheng Zhang, Yuqian Chen, Yihang Duan, Yujie Wu, Suresh Raikwar, Arsh Garg, Kritika, Jianhua Zheng, Xiaoshan Ma, Ruolin Zhao, Yongyu Yang, Yongsheng Liang, Guiming Huang, Qiang Li, Hongbin Zhang, Xiangyu Zheng, A. N. Rajagopalan

This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images.

Raindrop Removal Rain Removal +1

DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency

1 code implementation16 Apr 2025 Mengshi Qi, Pengfei Zhu, Xiangtai Li, Xiaoyang Bi, Lu Qi, Huadong Ma, Ming-Hsuan Yang

In this work, we propose the Dual Consistency SAM (DC-SAM) method based on prompt-tuning to adapt SAM and SAM2 for in-context segmentation of both images and videos.

Few-Shot Learning Interactive Segmentation +6

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

no code implementations14 Apr 2025 Weixian Lei, Jiacong Wang, Haochen Wang, Xiangtai Li, Jun Hao Liew, Jiashi Feng, Zilong Huang

This paper introduces SAIL, a single transformer unified multimodal large language model (MLLM) that integrates raw pixel encoding and language decoding within a singular architecture.

Language Modeling Language Modelling +3

An Empirical Study of GPT-4o Image Generation Capabilities

1 code implementation8 Apr 2025 Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi

The landscape of image generation has rapidly evolved, from early GAN-based approaches to diffusion models and, most recently, to unified generative architectures that seek to bridge understanding and generation tasks.

Benchmarking Image Generation +3

Generative Classifier for Domain Generalization

no code implementations3 Apr 2025 Shaocong Long, Qianyu Zhou, Xiangtai Li, Chenhao Ying, Yunhai Tong, Lizhuang Ma, Yuan Luo, DaCheng Tao

To address these issues, we explore the theoretical implications of relying on domain invariance, revealing the crucial role of domain-specific information in mitigating the target risk for DG.

Blocking Domain Generalization +1

4th PVUW MeViS 3rd Place Report: Sa2VA

1 code implementation1 Apr 2025 Haobo Yuan, Tao Zhang, Xiangtai Li, Lu Qi, Zilong Huang, Shilin Xu, Jiashi Feng, Ming-Hsuan Yang

Referring video object segmentation (RVOS) is a challenging task that requires the model to segment the object in a video given the language description.

Language Modeling Language Modelling +5

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

1 code implementation21 Mar 2025 Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Xiangtai Li, Yunhai Tong

In contrast, state-of-the-art video Diffusion Transformers (DiT) models use 3D full attention, which does not explicitly separate temporal and spatial information.

Benchmarking Video Generation

Unified Dense Prediction of Video Diffusion

no code implementations12 Mar 2025 Lehan Yang, Lu Qi, Xiangtai Li, Sheng Li, Varun Jampani, Ming-Hsuan Yang

We present a unified network for simultaneously generating videos and their corresponding entity segmentation and depth maps from text prompts.

Prediction Video Generation

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

no code implementations18 Feb 2025 Jingtong Yue, Zhiwei Lin, Xin Lin, Xiaoyu Zhou, Xiangtai Li, Lu Qi, Yongtao Wang, Ming-Hsuan Yang

Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity.

3D Object Detection Object +2

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

1 code implementation8 Jan 2025 Yikang Zhou, Tao Zhang, Shilin Xu, Shihao Chen, Qianyu Zhou, Yunhai Tong, Shunping Ji, Jiangning Zhang, Xiangtai Li, Lu Qi

Recent advancements in multimodal models have shown a strong ability in visual perception, reasoning abilities, and vision-language understanding.

Contrastive Learning

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

no code implementations10 Dec 2024 Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong

Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models.

Language Modelling Large Language Model +3

EMOv2: Pushing 5M Vision Model Frontier

1 code implementation9 Dec 2024 Jiangning Zhang, Teng Hu, Haoyang He, Zhucun Xue, Yabiao Wang, Chengjie Wang, Yong liu, Xiangtai Li, DaCheng Tao

Our goal is to set up the new frontier of the 5M magnitude lightweight model on various downstream tasks.

Image Generation model +2

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

1 code implementation5 Dec 2024 Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.

RelationBooth: Towards Relation-Aware Customized Object Generation

no code implementations30 Oct 2024 Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li, Ming-Husan Yang

Instead, this work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts.

Image Generation Object +1

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

no code implementations20 Oct 2024 Yu Zhao, Hao Fei, Xiangtai Li, Libo Qin, Jiayi Ji, Hongyuan Zhu, Meishan Zhang, Min Zhang, Jianguo Wei

In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form.

Image to text

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

no code implementations14 Oct 2024 Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

However, when it comes to stereo audio generation, the soundscapes often have a complex scene of multiple objects and directions.

Audio Generation multimodal generation

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

1 code implementation10 Oct 2024 Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.

Feature Compression Image Generation

Video Prediction Transformers without Recurrence or Convolution

1 code implementation7 Oct 2024 Yujin Tang, Lu Qi, Fei Xie, Xiangtai Li, Chao Ma, Ming-Hsuan Yang

Video prediction has witnessed the emergence of RNN-based models led by ConvLSTM, and CNN-based models led by SimVP.

Decoder Prediction +1

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning

no code implementations23 Sep 2024 Yue Han, Junwei Zhu, Yuxiang Feng, Xiaozhong Ji, Keke He, Xiangtai Li, Zhucun Xue, Yong liu

Drawing from this analysis, we introduce a Motion-Identity Modulated Appearance Learning Module (MIA) that modulates CLIP features at both motion and identity levels.

PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

1 code implementation24 Aug 2024 Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan

Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains.

Denoising Domain Generalization +2

You Can't Ignore Either: Unifying Structure and Feature Denoising for Robust Graph Learning

1 code implementation1 Aug 2024 Tianmeng Yang, Jiahao Meng, Min Zhou, Yaming Yang, Yujing Wang, Xiangtai Li, Yunhai Tong

However, the noises and attacks may come from both structures and features in graphs, making the graph denoising a dilemma and challenging problem.

Denoising Graph Learning

LLAVADI: What Matters For Multimodal Large Language Models Distillation

no code implementations28 Jul 2024 Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang

The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs leads to substantial memory and computational demands that hinder their widespread deployment.

Knowledge Distillation

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

no code implementations18 Jul 2024 Huadai Liu, Jialei Wang, Xiangtai Li, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

To counteract these issues, we introduce the Disentangled Inversion technique to disentangle the diffusion process into triple branches, rectifying the deviated path of the source branch caused by DDIM inversion.

Audio Generation

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

no code implementations28 Jun 2024 Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

Diffusion models can generate realistic and diverse images, potentially facilitating data availability for data-intensive perception tasks.

Image Captioning

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

1 code implementation27 Jun 2024 Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

Based on the benchmark results, our RWKV-SAM achieves outstanding performance in efficiency and segmentation quality compared to transformers and other linear attention models.

Mamba Segmentation

MotionBooth: Motion-Aware Customized Text-to-Video Generation

no code implementations25 Jun 2024 Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.

Text-to-Video Generation Video Generation

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

1 code implementation25 Jun 2024 Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang

We propose the integration of an additional high-resolution visual encoder to capture fine-grained details, which are then fused with base visual features through a Conv-Gate fusion network.

Object Object Recognition +1

Towards Semantic Equivalence of Tokenization in Multimodal LLM

no code implementations7 Jun 2024 Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.

Visual Question Answering

BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

1 code implementation3 Jun 2024 Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task.

Dataset Distillation

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

1 code implementation30 May 2024 Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories.

Diversity Image Generation +2

Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

no code implementations27 May 2024 Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

In particular, we introduce a new attack method, namely Style-aligned Distribution Biasing (SDB), to improve the capacity of black-box attacks on both FR and FAS models.

Face Anti-Spoofing Face Recognition

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

1 code implementation27 May 2024 Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang

This foundational estimation facilitates a detailed, coarse-to-fine segmentation strategy that significantly enhances the precision of object identification and segmentation.

Decoder Language Modeling +5

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

no code implementations24 May 2024 Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Xiangtai Li, Yabiao Wang, Chengjie Wang

Transformers have revolutionized the point cloud learning task, but the quadratic complexity hinders its extension to long sequence and makes a burden on limited computational resources.

Mamba

CPT-Interp: Continuous sPatial and Temporal Motion Modeling for 4D Medical Image Interpolation

no code implementations24 May 2024 Xia Li, Runzhao Yang, Xiangtai Li, Antony Lomax, Ye Zhang, Joachim Buhmann

Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning and, thereby, enhances the capabilities of 3D image analysis.

Anatomy

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

no code implementations21 May 2024 Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong liu

We observe that both face reenactment/swapping tasks essentially involve combinations of target structure, ID and attribute.

Attribute Decoder +1

4D Panoptic Scene Graph Generation

3 code implementations NeurIPS 2023 Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.

4D Panoptic Segmentation Graph Generation +5

Point-In-Context: Understanding Point Cloud via In-Context Learning

1 code implementation18 Apr 2024 Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing.

In-Context Learning

VG4D: Vision-Language Model Goes 4D Video Recognition

1 code implementation17 Apr 2024 Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.

Action Recognition Autonomous Driving +4

Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

1 code implementation16 Apr 2024 Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong liu, Guansong Pang, DaCheng Tao

Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods.

Multi-class Anomaly Detection object-detection +2

DGMamba: Domain Generalization via Generalized State Space Model

1 code implementation11 Apr 2024 Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

SPR strives to encourage the model to concentrate more on objects rather than context, consisting of two designs: Prior-Free Scanning~(PFS), and Domain Context Interchange~(DCI).

Domain Generalization Mamba +1

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

3 code implementations29 Mar 2024 Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li

Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.

 Ranked #1 on Video Instance Segmentation on OVIS validation (using extra training data)

Object Video Instance Segmentation +2

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning

1 code implementation18 Mar 2024 Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang

To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics.

Contrastive Learning Data Augmentation +2

Point Cloud Mamba: Point Cloud Learning via State Space Model

2 code implementations1 Mar 2024 Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, Xiangtai Li

To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent.

Mamba State Space Models +1

OMG-Seg: Is One Model Good Enough For All Segmentation?

1 code implementation CVPR 2024 Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.

All Decoder +5

ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification

1 code implementation16 Jan 2024 Zhongbin Fang, Xia Li, Xiangtai Li, Shen Zhao, Mengyuan Liu

Through extensive experiments, we demonstrate that our PointMLS achieves state-of-the-art results on ModelNet-O and competitive results on regular datasets, and it is robust and effective.

3D Point Cloud Classification Point Cloud Classification

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

no code implementations CVPR 2024 Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma

To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for structure modifications.

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

2 code implementations4 Jan 2024 Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).

Described Object Detection Phrase Grounding +2

A Generalist FaceX via Learning Unified Facial Representation

1 code implementation31 Dec 2023 Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong liu, Xiaoming Liu, Ying Tai

This work presents FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.

Facial Editing

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

1 code implementation12 Dec 2023 Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao

\Eg, achieving 85. 4 mAD that surpasses UniAD by +3. 0 for the MVTec AD dataset, and it requires only 1. 1 hours and 2. 3G GPU memory to complete model training on a single V100 that can serve as a strong baseline to facilitate the development of future research.

Multi-class Anomaly Detection Unsupervised Anomaly Detection

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

1 code implementation11 Dec 2023 Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai

It is also the first SAM variant that can run at over 30 FPS on an iPhone 14.

Decoder

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

1 code implementation CVPR 2024 Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Mengyuan Liu

Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning.

In-Context Learning motion prediction +1

Effective Adapter for Face Recognition in the Wild

no code implementations4 Dec 2023 Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

Traditional heuristic approaches-either training models directly on these degraded images or their enhanced counterparts using face restoration techniques-have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains.

Face Recognition

Panoptic Video Scene Graph Generation

3 code implementations CVPR 2023 Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.

Graph Generation Panoptic Scene Graph Generation +5

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

1 code implementation6 Nov 2023 Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.

Segmentation

OV-VG: A Benchmark for Open-Vocabulary Visual Grounding

1 code implementation22 Oct 2023 Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao

While current foundational models excel at various visual language tasks, there's a noticeable absence of models specifically tailored for open-vocabulary visual grounding.

Novel Concepts object-detection +3

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

1 code implementation2 Oct 2023 Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy

We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, and re-training.

Novel Object Detection Object +6

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

1 code implementation2 Oct 2023 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.

Image Classification Image Segmentation +9

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

1 code implementation22 Sep 2023 Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

Data Augmentation Instance Segmentation +1

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

2 code implementations3 Aug 2023 Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem

Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting.

class-incremental learning Few-Shot Class-Incremental Learning +1

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

1 code implementation17 Jul 2023 Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation Panoptic Scene Graph Generation +2

Explore In-Context Learning for 3D Point Cloud Understanding

2 code implementations NeurIPS 2023 Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu

With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks.

In-Context Learning

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

no code implementations9 May 2023 Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang

We first introduce some preliminary knowledge for the change detection task, such as problem definition, datasets, evaluation metrics, and transformer basics, as well as provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks in the methodology section.

Change Detection Change detection for remote sensing images +1

Transformer-Based Visual Segmentation: A Survey

2 code implementations19 Apr 2023 Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation +2

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning

1 code implementation6 Feb 2023 Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).

class-incremental learning Few-Shot Class-Incremental Learning +1

Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning

1 code implementation ICLR 2023 Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao

In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).

class-incremental learning Few-Shot Class-Incremental Learning +1

Rethinking Mobile Block for Efficient Attention-based Models

1 code implementation ICCV 2023 Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang

This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.

Unity

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation3 Jan 2023 Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

2 code implementations ICCV 2023 Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

Experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS) demonstrate the superiority of the CGG.

Caption Generation Instance Segmentation +2

Convolution-enhanced Evolving Attention Networks

1 code implementation16 Dec 2022 Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.

Image Classification Machine Translation +3

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm

1 code implementation19 Jun 2022 Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong liu, DaCheng Tao

Motivated by biological evolution, this paper explains the rationality of Vision Transformer by analogy with the proven practical evolutionary algorithm (EA) and derives that both have consistent mathematical formulation.

Image Classification

TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers

3 code implementations13 Jan 2022 Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.

Ranked #6 on Video Object Detection on ImageNet VID (using extra training data)

Object object-detection +2

Improving Video Instance Segmentation via Temporal Pyramid Routing

1 code implementation28 Jul 2021 Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao

To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.

Instance Segmentation Panoptic Segmentation +2

Global Aggregation then Local Distribution for Scene Parsing

1 code implementation28 Jul 2021 Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.

Scene Parsing Segmentation +1

BoundarySqueeze: Image Segmentation as Boundary Squeezing

1 code implementation25 May 2021 Hao He, Xiangtai Li, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lubin Weng, Zhouchen Lin, Shiming Xiang

This module is used to squeeze the object boundary from both inner and outer directions, which contributes to precise mask representation.

Image Segmentation Instance Segmentation +2

Fast and Accurate Scene Parsing via Bi-direction Alignment Networks

1 code implementation25 May 2021 Yanran Wu, Xiangtai Li, Chen Shi, Yunhai Tong, Yang Hua, Tao Song, Ruhui Ma, Haibing Guan

Motivated by this, we propose a novel network by aligning two-path information into each other through a learned flow field.

Scene Parsing Semantic Segmentation

Dynamic Dual Sampling Module for Fine-Grained Semantic Segmentation

no code implementations25 May 2021 Chen Shi, Xiangtai Li, Yanran Wu, Yunhai Tong, Yi Xu

Representation of semantic context and local details is the essential issue for building modern semantic segmentation models.

Segmentation Semantic Segmentation

End-to-End Video Object Detection with Spatial-Temporal Transformers

1 code implementation23 May 2021 Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance as previous complex hand-crafted detectors.

Object object-detection +2

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

1 code implementation CVPR 2021 Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin

Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.

Image Segmentation Segmentation +1

Towards Efficient Scene Understanding via Squeeze Reasoning

1 code implementation6 Nov 2020 Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin

Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.

Instance Segmentation object-detection +4

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

2 code implementations ECCV 2020 Xiangtai Li, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, Yunhai Tong

Our insight is that appealing performance of semantic segmentation requires \textit{explicitly} modeling the object \textit{body} and \textit{edge}, which correspond to the high and low frequency of the image.

Object Segmentation +1

Global Aggregation then Local Distribution in Fully Convolutional Networks

2 code implementations16 Sep 2019 Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong

GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.

Instance Segmentation object-detection +4

Dual Graph Convolutional Network for Semantic Segmentation

6 code implementations13 Sep 2019 Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr

Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.

Semantic Segmentation

GFF: Gated Fully Fusion for Semantic Segmentation

2 code implementations3 Apr 2019 Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang

Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel.

Scene Understanding Segmentation +1

Cannot find the paper you are looking for? You can Submit a new open access paper.