no code implementations • 17 Apr 2025 • Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, YuFei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, YuTing Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, Qirui Yang, Fangpu Zhang, Yunlong Lin, Sixiang Chen, Guoxi Huang, Ruirui Lin, Yan Zhang, Jingyu Yang, Huanjing Yue, Jiyuan Chen, Qiaosi Yi, Hongjun Wang, Chenxi Xie, Shuai Li, Yuhui Wu, Kaiyi Ma, Jiakui Hu, Juncheng Li, Liwen Pan, Guangwei Gao, Wenjie Li, Zhenyu Jin, Heng Guo, Zhanyu Ma, YuBo Wang, Jinghua Wang, Wangzhi Xing, Anjusree Karnavar, Diqi Chen, Mohammad Aminul Islam, Hao Yang, Ruikun Zhang, Liyuan Pan, Qianhao Luo, Xin Cao, Han Zhou, Yan Min, Wei Dong, Jun Chen, Taoyi Wu, Weijia Dou, Yu Wang, Shengjie Zhao, Yongcheng Huang, Xingyu Han, Anyan Huang, Hongtao Wu, Hong Wang, Yefeng Zheng, Abhijeet Kumar, Aman Kumar, Marcos V. Conde, Paula Garrido, Daniel Feijoo, Juan C. Benito, Guanglu Dong, Xin Lin, Siyuan Liu, Tianheng Zheng, Jiayu Zhong, Shouyi Wang, Xiangtai Li, Lanqing Guo, Lu Qi, Chao Ren, Shuaibo Wang, Shilong Zhang, Wanyu Zhou, Yunze Wu, Qinzhong Tan, Jieyuan Pei, Zhuoxuan Li, Jiayu Wang, Haoyu Bian, Haoran Sun, Subhajit Paul, Ni Tang, Junhao Huang, Zihan Cheng, Hongyun Zhu, Yuehan Wu, Kaixin Deng, Hang Ouyang, Tianxin Xiao, Fan Yang, Zhizun Luo, Zeyu Xiao, Zhuoyuan Li, Nguyen Pham Hoang Le, An Dinh Thien, Son T. Luu, Kiet Van Nguyen, Ronghua Xu, Xianmin Tian, Weijian Zhou, Jiacheng Zhang, Yuqian Chen, Yihang Duan, Yujie Wu, Suresh Raikwar, Arsh Garg, Kritika, Jianhua Zheng, Xiaoshan Ma, Ruolin Zhao, Yongyu Yang, Yongsheng Liang, Guiming Huang, Qiang Li, Hongbin Zhang, Xiangyu Zheng, A. N. Rajagopalan
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images.
1 code implementation • 16 Apr 2025 • Mengshi Qi, Pengfei Zhu, Xiangtai Li, Xiaoyang Bi, Lu Qi, Huadong Ma, Ming-Hsuan Yang
In this work, we propose the Dual Consistency SAM (DC-SAM) method based on prompt-tuning to adapt SAM and SAM2 for in-context segmentation of both images and videos.
no code implementations • 15 Apr 2025 • Henghui Ding, Chang Liu, Nikhila Ravi, Shuting He, Yunchao Wei, Song Bai, Philip Torr, Kehuan Song, Xinglin Xie, Kexin Zhang, Licheng Jiao, Lingling Li, Shuyuan Yang, Xuqiang Cao, Linnan Zhao, Jiaxuan Zhao, Fang Liu, Mengjiao Wang, Junpei Zhang, Xu Liu, Yuting Yang, Mengru Ma, Hao Fang, Runmin Cong, Xiankai Lu, Zhiyang Che, Wei Zhan, Tianming Liang, Haichao Jiang, Wei-Shi Zheng, Jian-Fang Hu, Haobo Yuan, Xiangtai Li, Tao Zhang, Lu Qi, Ming-Hsuan Yang
This report provides a comprehensive overview of the 4th Pixel-level Video Understanding in the Wild (PVUW) Challenge, held in conjunction with CVPR 2025.
1 code implementation • 14 Apr 2025 • Tao Zhang, Xiangtai Li, Zilong Huang, Yanwei Li, Weixian Lei, Xueqing Deng, Shihao Chen, Shunping Ji, Jiashi Feng
We present Pixel-SAIL, a single transformer for pixel-wise MLLM tasks.
no code implementations • 14 Apr 2025 • Weixian Lei, Jiacong Wang, Haochen Wang, Xiangtai Li, Jun Hao Liew, Jiashi Feng, Zilong Huang
This paper introduces SAIL, a single-transformer unified multimodal large language model (MLLM) that integrates raw pixel encoding and language decoding within a single architecture.
1 code implementation • 8 Apr 2025 • Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi
The landscape of image generation has rapidly evolved, from early GAN-based approaches to diffusion models and, most recently, to unified generative architectures that seek to bridge understanding and generation tasks.
no code implementations • 3 Apr 2025 • Shaocong Long, Qianyu Zhou, Xiangtai Li, Chenhao Ying, Yunhai Tong, Lizhuang Ma, Yuan Luo, DaCheng Tao
To address these issues, we explore the theoretical implications of relying on domain invariance, revealing the crucial role of domain-specific information in mitigating the target risk for DG.
1 code implementation • 1 Apr 2025 • Haobo Yuan, Tao Zhang, Xiangtai Li, Lu Qi, Zilong Huang, Shilin Xu, Jiashi Feng, Ming-Hsuan Yang
Referring video object segmentation (RVOS) is a challenging task that requires the model to segment the object in a video given the language description.
1 code implementation • 21 Mar 2025 • Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Xiangtai Li, Yunhai Tong
In contrast, state-of-the-art video Diffusion Transformer (DiT) models use 3D full attention, which does not explicitly separate temporal and spatial information.
no code implementations • 12 Mar 2025 • Lehan Yang, Lu Qi, Xiangtai Li, Sheng Li, Varun Jampani, Ming-Hsuan Yang
We present a unified network for simultaneously generating videos and their corresponding entity segmentation and depth maps from text prompts.
no code implementations • 18 Feb 2025 • Jingtong Yue, Zhiwei Lin, Xin Lin, Xiaoyu Zhou, Xiangtai Li, Lu Qi, Yongtao Wang, Ming-Hsuan Yang
Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity.
1 code implementation • 8 Jan 2025 • Yikang Zhou, Tao Zhang, Shilin Xu, Shihao Chen, Qianyu Zhou, Yunhai Tong, Shunping Ji, Jiangning Zhang, Xiangtai Li, Lu Qi
Recent advancements in multimodal models have shown strong abilities in visual perception, reasoning, and vision-language understanding.
1 code implementation • 7 Jan 2025 • Haobo Yuan, Xiangtai Li, Tao Zhang, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, Ming-Hsuan Yang
This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.
Ranked #1 on Referring Video Object Segmentation on MeViS
no code implementations • 10 Dec 2024 • Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models.
1 code implementation • 9 Dec 2024 • Jiangning Zhang, Teng Hu, Haoyang He, Zhucun Xue, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, DaCheng Tao
Our goal is to establish a new frontier for 5M-magnitude lightweight models on various downstream tasks.
1 code implementation • 5 Dec 2024 • Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan
HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.
no code implementations • 5 Dec 2024 • Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, Guangliang Cheng
The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination.
no code implementations • 4 Dec 2024 • Qingdong He, Jinlong Peng, Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Yong Liu, Yabiao Wang, Chengjie Wang, Xiangtai Li, Jiangning Zhang
To enhance the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals to dictate image attributes.
no code implementations • 30 Oct 2024 • Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li, Ming-Hsuan Yang
Instead, this work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts.
no code implementations • 20 Oct 2024 • Yu Zhao, Hao Fei, Xiangtai Li, Libo Qin, Jiayi Ji, Hongyuan Zhu, Meishan Zhang, Min Zhang, Jianguo Wei
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form.
no code implementations • 14 Oct 2024 • Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo
However, when it comes to stereo audio generation, the soundscapes often contain complex scenes with multiple objects and directions.
1 code implementation • 10 Oct 2024 • Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.
1 code implementation • 7 Oct 2024 • Yujin Tang, Lu Qi, Fei Xie, Xiangtai Li, Chao Ma, Ming-Hsuan Yang
Video prediction has witnessed the emergence of RNN-based models led by ConvLSTM, and CNN-based models led by SimVP.
Ranked #1 on Video Prediction on Moving MNIST
no code implementations • 23 Sep 2024 • Yue Han, Junwei Zhu, Yuxiang Feng, Xiaozhong Ji, Keke He, Xiangtai Li, Zhucun Xue, Yong Liu
Drawing from this analysis, we introduce a Motion-Identity Modulated Appearance Learning Module (MIA) that modulates CLIP features at both motion and identity levels.
1 code implementation • 24 Aug 2024 • Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan
Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains.
1 code implementation • 1 Aug 2024 • Tianmeng Yang, Jiahao Meng, Min Zhou, Yaming Yang, Yujing Wang, Xiangtai Li, Yunhai Tong
However, noise and attacks may come from both the structures and the features of graphs, making graph denoising a dilemma and a challenging problem.
no code implementations • 28 Jul 2024 • Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang
The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs leads to substantial memory and computational demands that hinder their widespread deployment.
no code implementations • 18 Jul 2024 • Huadai Liu, Jialei Wang, Xiangtai Li, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao
To counteract these issues, we introduce the Disentangled Inversion technique to disentangle the diffusion process into triple branches, rectifying the deviated path of the source branch caused by DDIM inversion.
no code implementations • 28 Jun 2024 • Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen
Diffusion models can generate realistic and diverse images, potentially facilitating data availability for data-intensive perception tasks.
1 code implementation • 27 Jun 2024 • Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy
Based on the benchmark results, our RWKV-SAM achieves outstanding performance in efficiency and segmentation quality compared to transformers and other linear attention models.
1 code implementation • 27 Jun 2024 • Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan
Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding.
no code implementations • 25 Jun 2024 • Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.
1 code implementation • 25 Jun 2024 • Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang
We propose the integration of an additional high-resolution visual encoder to capture fine-grained details, which are then fused with base visual features through a Conv-Gate fusion network.
Ranked #75 on Visual Question Answering on MM-Vet
no code implementations • 7 Jun 2024 • Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan
The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.
Ranked #73 on Visual Question Answering on MM-Vet
1 code implementation • 3 Jun 2024 • Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao
Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task.
1 code implementation • 30 May 2024 • Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang
For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories.
no code implementations • 27 May 2024 • Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling
In particular, we introduce a new attack method, namely Style-aligned Distribution Biasing (SDB), to improve the capacity of black-box attacks on both FR and FAS models.
1 code implementation • 27 May 2024 • Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang
This foundational estimation facilitates a detailed, coarse-to-fine segmentation strategy that significantly enhances the precision of object identification and segmentation.
no code implementations • 24 May 2024 • Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Xiangtai Li, Yabiao Wang, Chengjie Wang
Transformers have revolutionized the point cloud learning task, but their quadratic complexity hinders their extension to long sequences and burdens limited computational resources.
no code implementations • 24 May 2024 • Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Xiangtai Li, Yabiao Wang, Yong Liu
This framework is designed to perform understanding tasks for any 3D scene without requiring prior knowledge of the scene.
no code implementations • 24 May 2024 • Xia Li, Runzhao Yang, Xiangtai Li, Antony Lomax, Ye Zhang, Joachim Buhmann
Motion information from 4D medical imaging offers critical insights into dynamic changes in patient anatomy for clinical assessments and radiotherapy planning and, thereby, enhances the capabilities of 3D image analysis.
no code implementations • 21 May 2024 • Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu
We observe that both the face reenactment and face swapping tasks essentially involve combinations of target structure, ID, and attribute.
3 code implementations • NeurIPS 2023 • Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu
To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.
1 code implementation • 18 Apr 2024 • Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy
With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing.
1 code implementation • 17 Apr 2024 • Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu
By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.
1 code implementation • 16 Apr 2024 • Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, DaCheng Tao
Moreover, current metrics such as AU-ROC have nearly reached saturation on simple datasets, which prevents a comprehensive evaluation of different methods.
1 code implementation • 11 Apr 2024 • Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan
SPR strives to encourage the model to concentrate more on objects rather than context, consisting of two designs: Prior-Free Scanning (PFS) and Domain Context Interchange (DCI).
3 code implementations • 9 Apr 2024 • Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie
Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches.
3 code implementations • 29 Mar 2024 • Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li
Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion.
Ranked #1 on Video Instance Segmentation on OVIS validation (using extra training data)
1 code implementation • 18 Mar 2024 • Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang
To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics.
no code implementations • 14 Mar 2024 • Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan
In-context segmentation has drawn more attention with the introduction of vision foundation models.
2 code implementations • 1 Mar 2024 • Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, Xiangtai Li
To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent.
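The idea of turning a 3-D point cloud into a 1-D sequence whose neighbors stay spatially adjacent can be illustrated with a simple snake (boustrophedon) scan over a voxel grid. This is only a sketch under stated assumptions, not the paper's Consistent Traverse Serialization: the function name `snake_order`, the `grid_size` parameter, and the assumption that coordinates are normalized to [0, 1) are all hypothetical choices made for this example.

```python
def snake_order(points, grid_size=4):
    """Return point indices sorted along a snake path through a voxel grid.

    Assumes every coordinate lies in [0, 1). Points falling in the same
    voxel keep their original relative order (sorted() is stable).
    """
    def cell(point):
        # Quantize each coordinate to an integer voxel index.
        return tuple(min(int(c * grid_size), grid_size - 1) for c in point)

    def key(index):
        x, y, z = cell(points[index])
        # Reverse the y direction on every other x slab.
        y_pos = y if x % 2 == 0 else grid_size - 1 - y
        # Reverse the z direction on every other (x, y) row, so the path
        # never jumps when it moves on to the next row or slab.
        row = x * grid_size + y_pos
        z_pos = z if row % 2 == 0 else grid_size - 1 - z
        return (x, y_pos, z_pos)

    return sorted(range(len(points)), key=key)


# Usage: serialize a tiny cloud of four points.
cloud = [(0.9, 0.1, 0.1), (0.1, 0.1, 0.1), (0.1, 0.1, 0.9), (0.1, 0.9, 0.9)]
sequence = [cloud[i] for i in snake_order(cloud, grid_size=2)]
```

When every voxel of the grid is occupied, consecutive entries of the resulting sequence always lie in voxels that share a face, which is the adjacency property the abstract describes.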
no code implementations • 4 Feb 2024 • Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang
In this work, we propose a novel approach to densely ground visual entities from a long caption.
1 code implementation • CVPR 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy
We introduce a new task, language-driven video inpainting, which uses natural language instructions to guide the inpainting process.
1 code implementation • CVPR 2024 • Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.
1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang
The Segment Anything Model (SAM) is a remarkable model that can achieve generalized segmentation.
1 code implementation • 16 Jan 2024 • Zhongbin Fang, Xia Li, Xiangtai Li, Shen Zhao, Mengyuan Liu
Through extensive experiments, we demonstrate that our PointMLS achieves state-of-the-art results on ModelNet-O and competitive results on regular datasets, and it is robust and effective.
Ranked #25 on 3D Point Cloud Classification on ModelNet40
1 code implementation • 5 Jan 2024 • Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy
CLIP and the Segment Anything Model (SAM) are remarkable vision foundation models (VFMs).
no code implementations • CVPR 2024 • Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma
To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for structure modifications.
2 code implementations • 4 Jan 2024 • Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang
Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).
no code implementations • CVPR 2024 • Chang Liu, Xiangtai Li, Henghui Ding
Significant advancements have been made in image editing with recent advances in diffusion models.
1 code implementation • 31 Dec 2023 • Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong Liu, Xiaoming Liu, Ying Tai
This work presents the FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.
1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao
E.g., it achieves 85.4 mAD on the MVTec AD dataset, surpassing UniAD by +3.0, and it requires only 1.1 hours and 2.3G of GPU memory to complete model training on a single V100, so it can serve as a strong baseline to facilitate future research.
Multi-class Anomaly Detection • Unsupervised Anomaly Detection
1 code implementation • CVPR 2024 • Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang
Real-time multi-person pose estimation presents significant challenges in balancing speed and precision.
Ranked #1 on Multi-Person Pose Estimation on CrowdPose (using extra training data)
1 code implementation • 11 Dec 2023 • Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
It is also the first SAM variant that can run at over 30 FPS on an iPhone 14.
1 code implementation • CVPR 2024 • Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Mengyuan Liu
Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning.
no code implementations • 4 Dec 2023 • Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
Traditional heuristic approaches (either training models directly on these degraded images or on their enhanced counterparts produced by face restoration techniques) have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains.
3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.
1 code implementation • 22 Oct 2023 • Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao
While current foundational models excel at various visual language tasks, there's a noticeable absence of models specifically tailored for open-vocabulary visual grounding.
1 code implementation • 2 Oct 2023 • Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.
1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
Ranked #4 on Open Vocabulary Panoptic Segmentation on ADE20K
1 code implementation • 22 Sep 2023 • Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
2 code implementations • 3 Aug 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, DaCheng Tao, Bernard Ghanem
Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting.
class-incremental learning • Few-Shot Class-Incremental Learning
1 code implementation • 23 Jul 2023 • Menghao Li, Chunlei Wang, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiangtai Li, Binghao Liu, Qi Zhao
The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets.
1 code implementation • 17 Jul 2023 • Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu
Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
1 code implementation • 28 Jun 2023 • Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, DaCheng Tao
To our knowledge, this is the first comprehensive literature review of open vocabulary learning.
2 code implementations • NeurIPS 2023 • Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu
With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks.
no code implementations • 9 May 2023 • Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang
We first introduce some preliminary knowledge for the change detection task, such as problem definition, datasets, evaluation metrics, and transformer basics, as well as provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks in the methodology section.
Change Detection • Change detection for remote sensing images
2 code implementations • 19 Apr 2023 • Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy
Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.
1 code implementation • ICCV 2023 • Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy
Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks.
1 code implementation • 6 Feb 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao
In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).
class-incremental learning • Few-Shot Class-Incremental Learning
1 code implementation • ICLR 2023 • Yibo Yang, Haobo Yuan, Xiangtai Li, Zhouchen Lin, Philip Torr, DaCheng Tao
In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF).
Ranked #5 on Few-Shot Class-Incremental Learning on mini-Imagenet
class-incremental learning • Few-Shot Class-Incremental Learning
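For readers unfamiliar with the simplex equiangular tight frame (ETF) mentioned in the abstract above, the following sketch builds one and exposes its defining geometry: K unit-norm prototypes whose pairwise cosine similarity is -1/(K-1). It uses the standard construction M = sqrt(K/(K-1)) (I - (1/K) 1 1^T) with the rotation matrix fixed to the identity, which is a simplifying assumption of this example. The helper names `simplex_etf` and `cosine` are hypothetical and do not come from the paper's code.

```python
import math


def simplex_etf(num_classes):
    """Return K prototype vectors (each of length K) forming a simplex ETF.

    Row i is sqrt(K / (K - 1)) * (e_i - (1/K) * ones), i.e. the special
    case of the general construction where the rotation is the identity
    (an assumption of this sketch).
    """
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Usage: prototypes for a 4-class problem.
prototypes = simplex_etf(4)
similarity = cosine(prototypes[0], prototypes[1])  # close to -1/3
```

Every pair of prototypes is equally and maximally separated, which is why such a frame can serve as a fixed set of classifier targets that does not need to move as new classes arrive.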
1 code implementation • ICCV 2023 • Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang
This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.
1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao
Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.
1 code implementation • 3 Jan 2023 • Yue Han, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Yong Liu, Lu Qi, Xiangtai Li, Ming-Hsuan Yang
Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples.
2 code implementations • ICCV 2023 • Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy
Experiments on the COCO dataset under two settings, Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS), demonstrate the superiority of CGG.
1 code implementation • 16 Dec 2022 • Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.
1 code implementation • 20 Sep 2022 • Jianzong Wu, Xiangtai Li, Xia Li, Henghui Ding, Yunhai Tong, DaCheng Tao
It considers the negative sentence inputs besides the regular positive text inputs.
1 code implementation • 10 Jul 2022 • Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, DaCheng Tao
In this paper, we focus on exploring effective methods for fast, accurate, and domain-agnostic semantic segmentation.
Ranked #12 on Semantic Segmentation on BDD100K val
1 code implementation • 19 Jun 2022 • Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, DaCheng Tao
Motivated by biological evolution, this paper explains the rationality of the Vision Transformer by analogy with the proven, practical evolutionary algorithm (EA) and shows that both share a consistent mathematical formulation.
1 code implementation • 28 May 2022 • Yangyang Xu, Xiangtai Li, Haobo Yuan, Yibo Yang, Lefei Zhang
We first model each task with a task-relevant query.
1 code implementation • CVPR 2022 • Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy
We hope this simple, yet effective method can serve as a new, flexible baseline in unified video segmentation design.
Ranked #1 on Video Panoptic Segmentation on KITTI-STEP (using extra training data)
1 code implementation • 10 Apr 2022 • Shilin Xu, Xiangtai Li, Jingbo Wang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
This work focuses on joint human fashion segmentation and attribute recognition.
1 code implementation • 10 Apr 2022 • Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
To the best of our knowledge, we are the first to solve the PPS problem via a unified and end-to-end transformer model.
1 code implementation • 17 Mar 2022 • Yibo Yang, Shixiang Chen, Xiangtai Li, Liang Xie, Zhouchen Lin, DaCheng Tao
Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class.
Ranked #28 on Long-tail Learning on CIFAR-10-LT (ρ=100)
3 code implementations • 13 Jan 2022 • Qianyu Zhou, Xiangtai Li, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, DaCheng Tao
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance as good as previous complex hand-crafted detectors.
Ranked #6 on
Video Object Detection
on ImageNet VID
(using extra training data)
1 code implementation • 5 Dec 2021 • Haobo Yuan, Xiangtai Li, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, DaCheng Tao
Depth-aware Video Panoptic Segmentation (DVPS) is a new, challenging vision problem that aims to predict panoptic segmentation and depth in a video simultaneously.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang
Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.
1 code implementation • 25 May 2021 • Hao He, Xiangtai Li, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lubin Weng, Zhouchen Lin, Shiming Xiang
This module is used to squeeze the object boundary from both inner and outer directions, which contributes to precise mask representation.
1 code implementation • 25 May 2021 • Yanran Wu, Xiangtai Li, Chen Shi, Yunhai Tong, Yang Hua, Tao Song, Ruhui Ma, Haibing Guan
Motivated by this, we propose a novel network by aligning two-path information into each other through a learned flow field.
Ranked #12 on
Semantic Segmentation
on BDD100K val
no code implementations • 25 May 2021 • Chen Shi, Xiangtai Li, Yanran Wu, Yunhai Tong, Yi Xu
Representation of semantic context and local details is the essential issue for building modern semantic segmentation models.
1 code implementation • 23 May 2021 • Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang
Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while performing as well as previous complex hand-crafted detectors.
1 code implementation • ICCV 2021 • Hao He, Xiangtai Li, Guangliang Cheng, Jianping Shi, Yunhai Tong, Gaofeng Meng, Véronique Prinet, Lubin Weng
We use these two modules to design a decoder that generates accurate and clean segmentation results, especially on the object contours.
Ranked #20 on
Thermal Image Segmentation
on RGB-T-Glass-Segmentation
1 code implementation • CVPR 2021 • Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin
Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.
13 code implementations • CVPR 2021 • Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision.
Ranked #768 on
Image Classification
on ImageNet
1 code implementation • 6 Nov 2020 • Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and then perform reasoning within that single vector, which significantly reduces the computation cost.
2 code implementations • ECCV 2020 • Xiangtai Li, Xia Li, Li Zhang, Guangliang Cheng, Jianping Shi, Zhouchen Lin, Shaohua Tan, Yunhai Tong
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.
6 code implementations • ECCV 2020 • Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong
A common practice to improve performance is to obtain high-resolution feature maps with strong semantic representation.
Ranked #2 on
Real-Time Semantic Segmentation
on Cityscapes test
2 code implementations • 16 Sep 2019 • Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong
GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.
Ranked #1 on
Semantic Segmentation
on PASCAL VOC 2007
6 code implementations • 13 Sep 2019 • Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr
Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.
Ranked #32 on
Semantic Segmentation
on Cityscapes test
2 code implementations • 3 Apr 2019 • Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang
Semantic segmentation generates a comprehensive understanding of scenes by densely predicting the category of each pixel.
Ranked #29 on
Semantic Segmentation
on Cityscapes test