Search Results for author: Sheng Jin

Found 45 papers, 25 papers with code

FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

no code implementations 5 Sep 2024 Xi Chen, Haosen Yang, Sheng Jin, Xiatian Zhu, Hongxun Yao

To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a lightweight transformer decoder for mask proposal generation -- the performance bottleneck.

Decoder Segmentation

ESOD: Efficient Small Object Detection on High-Resolution Images

no code implementations 23 Jul 2024 Kai Liu, Zhihang Fu, Sheng Jin, Ze Chen, Fan Zhou, Rongxin Jiang, Yaowu Chen, Jieping Ye

The resulting Efficient Small Object Detection (ESOD) approach is a generic framework, which can be applied to both CNN- and ViT-based detectors to save the computation and GPU memory costs.

Object object-detection +1

Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

no code implementations 23 Jul 2024 Kai Liu, Zhihang Fu, Sheng Jin, Chao Chen, Ze Chen, Rongxin Jiang, Fan Zhou, Yaowu Chen, Jieping Ye

Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to avoid unreliable predictions.

Out-of-Distribution Detection

TCFormer: Visual Recognition via Token Clustering Transformer

1 code implementation 16 Jul 2024 Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens.

Clustering Image Classification +4

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

1 code implementation 14 Jul 2024 Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modality.

3D Object Detection Multispectral Object Detection +1

F-LMM: Grounding Frozen Large Multimodal Models

1 code implementation 9 Jun 2024 Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.

General Knowledge Instruction Following +5

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

no code implementations 13 May 2024 Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

With the proposed object occlusion and completion, MonoMAE learns enriched 3D representations that achieve superior monocular 3D detection performance qualitatively and quantitatively for both occluded and non-occluded objects.

Monocular 3D Object Detection Object +1

UniFS: Universal Few-shot Instance Perception with Point Representations

1 code implementation 30 Apr 2024 Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework.

Few-Shot Learning Few-Shot Object Detection +4

Weakly Supervised Monocular 3D Detection with a Single-View Image

no code implementations CVPR 2024 Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu

We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D with a single-view image exclusively without any 3D annotations or other training data.

Object Localization Self-Knowledge Distillation +1

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

no code implementations 7 Feb 2024 Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu

This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general.

Image Classification object-detection +1

CLIM: Contrastive Language-Image Mosaic for Region Representation

1 code implementation 18 Dec 2023 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.

Object object-detection +1

MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation

no code implementations 12 Dec 2023 Xiaojie Fang, Xingguo Song, Xiangyin Meng, Xu Fang, Sheng Jin

The low-level spatial detail information and high-level semantic abstract information are both essential to the semantic segmentation task.

Real-Time Semantic Segmentation Segmentation

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

1 code implementation 9 Dec 2023 Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo

Human-centric perception (e.g., detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision.

Attribute Multi-Task Learning +1

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

1 code implementation 2 Oct 2023 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.

Image Classification Image Segmentation +7

Domain Generalization via Balancing Training Difficulty and Model Capability

no code implementations ICCV 2023 Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu

Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model.

Data Augmentation Domain Generalization

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

1 code implementation 28 Aug 2023 Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu

Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.

graph construction Multi-Label Classification +1

Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation

no code implementations 29 Jun 2023 Jiaxing Huang, Jingyi Zhang, Han Qiu, Sheng Jin, Shijian Lu

Traditional domain adaptation assumes the same vocabulary across source and target domains, which often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.

Unsupervised Domain Adaptation

Vision-Language Models for Vision Tasks: A Survey

1 code implementation 3 Apr 2023 Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm.

Benchmarking Knowledge Distillation +1

Aligning Bag of Regions for Open-Vocabulary Object Detection

1 code implementation CVPR 2023 Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy

The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.

Ranked #8 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object object-detection +2

Reinforcement learning for traffic signal control in hybrid action space

no code implementations 23 Nov 2022 Haoqing Luo, Sheng Jin

The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces.

Fairness reinforcement-learning +3

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation 23 Aug 2022 Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

2D Human Pose Estimation Neural Architecture Search +1

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

1 code implementation 22 Jul 2022 Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.

3D Interacting Hand Pose Estimation Hand Pose Estimation

Pose for Everything: Towards Category-Agnostic Pose Estimation

1 code implementation 21 Jul 2022 Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.

Category-Agnostic Pose Estimation Pose Estimation

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

no code implementations ICLR 2022 Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.

Temporal Action Proposal Generation with Background Constraint

1 code implementation 15 Dec 2021 Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, existing works typically predict action scores for proposals, supervised by the temporal Intersection-over-Union (tIoU) between each proposal and the ground truth.

Temporal Action Proposal Generation

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

1 code implementation CVPR 2021 Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo

However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.

Knowledge Distillation Pose Estimation

Relative occurrence rates of terrestrial planets orbiting FGK stars

no code implementations 11 Feb 2021 Sheng Jin

I then fit two exponential decay functions for the detection efficiency, one with increasing planetary orbital distance and one with decreasing planetary radius.

Earth and Planetary Astrophysics Solar and Stellar Astrophysics

When Counterpoint Meets Chinese Folk Melodies

1 code implementation NeurIPS 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations ECCV 2020 Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

2D Human Pose Estimation Clustering +5

Whole-Body Human Pose Estimation in the Wild

2 code implementations ECCV 2020 Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

2D Human Pose Estimation Facial Landmark Detection +2

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations 8 Feb 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Music Generation reinforcement-learning +2

HoMM: Higher-order Moment Matching for Unsupervised Domain Adaptation

1 code implementation 27 Dec 2019 Chao Chen, Zhihang Fu, Zhihong Chen, Sheng Jin, Zhaowei Cheng, Xinyu Jin, Xian-Sheng Hua

In particular, our proposed HoMM can perform arbitrary-order moment tensor matching; we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL).

Unsupervised Domain Adaptation

SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation

no code implementations 20 Nov 2019 Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xian-Sheng Hua

In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH, to solve the above problems in a unified framework.

Deep Hashing Generative Adversarial Network

TRB: A Novel Triplet Representation for Understanding 2D Human Body

2 code implementations ICCV 2019 Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang

In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.

Conditional Image Generation Open-Ended Question Answering +1

Connectionist Temporal Classification with Maximum Entropy Regularization

1 code implementation NeurIPS 2018 Hu Liu, Sheng Jin, Chang-Shui Zhang

Connectionist Temporal Classification (CTC) is an objective function for end-to-end sequence learning, which adopts dynamic programming algorithms to directly learn the mapping between sequences.

Classification General Classification +3

Deep Saliency Hashing

no code implementations 4 Jul 2018 Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xian-Sheng Hua

As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images.

Deep Hashing Quantization

Unsupervised Semantic Deep Hashing

no code implementations 19 Mar 2018 Sheng Jin

In this paper, we propose a novel unsupervised deep hashing method for large-scale image retrieval.

Deep Hashing Image Retrieval +1
