no code implementations • 5 Sep 2024 • Xi Chen, Haosen Yang, Sheng Jin, Xiatian Zhu, Hongxun Yao
To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a lightweight transformer decoder for mask proposal generation-the performance bottleneck.
no code implementations • 23 Jul 2024 • Kai Liu, Zhihang Fu, Sheng Jin, Ze Chen, Fan Zhou, Rongxin Jiang, Yaowu Chen, Jieping Ye
The resulting Efficient Small Object Detection (ESOD) approach is a generic framework, which can be applied to both CNN- and ViT-based detectors to save the computation and GPU memory costs.
no code implementations • 23 Jul 2024 • Kai Liu, Zhihang Fu, Sheng Jin, Chao Chen, Ze Chen, Rongxin Jiang, Fan Zhou, Yaowu Chen, Jieping Ye
Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to void unreliable predictions.
no code implementations • NeurIPS 2023 • Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye
The key to OOD detection has two aspects: generalized feature representation and precise category description.
1 code implementation • 16 Jul 2024 • Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
Our dynamic tokens possess two crucial characteristics: (1) Representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and represent them using fine tokens.
1 code implementation • 14 Jul 2024 • Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modality.
Ranked #1 on Object Detection on STCrowd
1 code implementation • 9 Jun 2024 • Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy
To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.
no code implementations • 13 May 2024 • Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu
With the proposed object occlusion and completion, MonoMAE learns enriched 3D representations that achieve superior monocular 3D detection performance qualitatively and quantitatively for both occluded and non-occluded objects.
1 code implementation • 30 Apr 2024 • Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo
In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework.
Ranked #1 on Few-Shot Object Detection on MS-COCO (5-shot)
no code implementations • CVPR 2024 • Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu
We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively without any 3D annotations or other training data.
no code implementations • 23 Feb 2024 • Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process.
no code implementations • 7 Feb 2024 • Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu
This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general.
1 code implementation • 18 Dec 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy
Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.
Ranked #8 on Open Vocabulary Object Detection on LVIS v1.0
no code implementations • 12 Dec 2023 • Xiaojie Fang, Xingguo Song, Xiangyin Meng, Xu Fang, Sheng Jin
The low-level spatial detail information and high-level semantic abstract information are both essential to the semantic segmentation task.
1 code implementation • 9 Dec 2023 • Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo
Human-centric perception (e. g. detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision.
1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
no code implementations • ICCV 2023 • Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu
Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model.
1 code implementation • 28 Aug 2023 • Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.
Ranked #3 on Multi-Label Classification on PASCAL VOC 2007
1 code implementation • ICCV 2023 • Kai Liu, Sheng Jin, Zhihang Fu, Ze Chen, Rongxin Jiang, Jieping Ye
The resulting accurate pseudo-tracklets boost learning the feature consistency.
no code implementations • 29 Jun 2023 • Jiaxing Huang, Jingyi Zhang, Han Qiu, Sheng Jin, Shijian Lu
Traditional domain adaptation assumes the same vocabulary across source and target domains, which often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.
1 code implementation • 3 Apr 2023 • Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm.
1 code implementation • CVPR 2023 • Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy
The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.
Ranked #8 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
no code implementations • 23 Nov 2022 • Haoqing Luo, Sheng Jin
The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces.
1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Ranked #5 on 2D Human Pose Estimation on COCO-WholeBody
1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu
Human pose estimation aims to accurately estimate a wide variety of human poses.
1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo
Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #1 on Category-Agnostic Pose Estimation on MP100
1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Vision transformers have achieved great successes in many computer vision tasks.
Ranked #7 on 2D Human Pose Estimation on COCO-WholeBody
no code implementations • ICLR 2022 • Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang
PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.
1 code implementation • 15 Dec 2021 • Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang xia, Hongxun Yao, Hujie Huang
To evaluate the confidence of proposals, the existing works typically predict action score of proposals that are supervised by the temporal Intersection-over-Union (tIoU) between proposal and the ground-truth.
1 code implementation • ICCV 2021 • Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang
Following the top-down paradigm, we decompose the task into two stages, i. e. person localization and pose estimation.
Ranked #2 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)
4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Human pose estimation has achieved significant progress in recent years.
Ranked #23 on Pose Estimation on COCO test-dev
1 code implementation • CVPR 2021 • Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo
However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.
no code implementations • 11 Feb 2021 • Sheng Jin
Then I fit two exponential decay functions of detection efficiency along with the increase of planetary orbital distance and the decrease of planetary radius.
Earth and Planetary Astrophysics Solar and Stellar Astrophysics
1 code implementation • NeurIPS 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang
An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.
no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
The modules of HGG can be trained end-to-end with the keypoint detection network and is able to supervise the grouping process in a hierarchical manner.
Ranked #3 on Keypoint Detection on OCHuman
2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.
Ranked #12 on 2D Human Pose Estimation on COCO-WholeBody
no code implementations • 8 Feb 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang
We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).
1 code implementation • 27 Dec 2019 • Chao Chen, Zhihang Fu, Zhihong Chen, Sheng Jin, Zhaowei Cheng, Xinyu Jin, Xian-Sheng Hua
In particular, our proposed HoMM can perform arbitrary-order moment tensor matching, we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL).
no code implementations • 20 Nov 2019 • Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xian-Sheng Hua
In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH to solve the above problems in a unified framework.
2 code implementations • ICCV 2019 • Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang
In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.
Conditional Image Generation Open-Ended Question Answering +1
no code implementations • CVPR 2019 • Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian
Our framework consists of two main components,~\ie~SpatialNet and TemporalNet.
1 code implementation • NeurIPS 2018 • Hu Liu, Sheng Jin, Chang-Shui Zhang
Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences.
no code implementations • 4 Jul 2018 • Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xian-Sheng Hua
As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images.
no code implementations • 19 Mar 2018 • Sheng Jin
In this paper, we propose a novel unsupervised deep hashing method for large-scale image retrieval.