no code implementations • 13 Dec 2024 • Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Yiqiang Chen
Segmentation of ultra-high resolution (UHR) images is a critical task with numerous applications, yet it poses significant challenges due to high spatial resolution and rich fine details.
no code implementations • 4 Nov 2024 • Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang
To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.
1 code implementation • 16 Jul 2024 • Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
Our dynamic tokens possess two crucial characteristics: (1) Representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and represent them using fine tokens.
1 code implementation • 9 Jun 2024 • Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy
To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.
1 code implementation • 30 Apr 2024 • Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo
In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework.
Ranked #1 on
Few-Shot Object Detection
on MS-COCO (5-shot)
1 code implementation • 18 Dec 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy
Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.
Ranked #9 on
Open Vocabulary Object Detection
on LVIS v1.0
1 code implementation • 8 Oct 2023 • Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang
Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches.
1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
1 code implementation • 28 Aug 2023 • Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.
Ranked #3 on
Multi-Label Classification
on PASCAL VOC 2007
1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Ranked #5 on
2D Human Pose Estimation
on COCO-WholeBody
1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #2 on
Category-Agnostic Pose Estimation
on MP100
4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Human pose estimation has achieved significant progress in recent years.
Ranked #23 on
Pose Estimation
on COCO test-dev
2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.
Ranked #12 on
2D Human Pose Estimation
on COCO-WholeBody