Search Results for author: Sanping Zhou

Found 52 papers, 20 papers with code

Moment Quantization for Video Temporal Grounding

no code implementations 3 Apr 2025 Xiaolong Sun, Le Wang, Sanping Zhou, Liushuai Shi, Kun Xia, Mengnan Liu, Yabing Wang, Gang Hua

In this paper, we propose a novel Moment-Quantization based Video Temporal Grounding method (MQVTG), which quantizes the input video into various discrete vectors to enhance the discrimination between relevant and irrelevant moments.

Quantization Video Understanding
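
The snippet does not describe MQVTG's quantizer in detail. As a generic illustration only, the core step of vector quantization (assigning each clip feature the index of its nearest codebook vector) can be sketched as follows; the function name, codebook and toy features are hypothetical.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Assign each feature vector the index of its nearest codebook vector (squared L2)."""
    indices = []
    for f in features:
        best_idx, best_dist = 0, float("inf")
        for k, c in enumerate(codebook):
            d = sum((a - b) ** 2 for a, b in zip(f, c))
            if d < best_dist:
                best_idx, best_dist = k, d
        indices.append(best_idx)
    return indices


# Hypothetical toy data: four clip features, two codewords.
clips = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.1, 0.8]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
print(quantize(clips, codebook))  # [0, 1, 0, 1]
```

Clips mapped to the same codeword share a discrete representation, which is one way discrete vectors can separate groups of moments.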

Versatile Multimodal Controls for Expressive Talking Human Animation

no code implementations 10 Mar 2025 Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang

AI-generated content faces similar requirements, where users not only need automatic generation of lip synchronization and basic gestures from audio input but also desire semantically accurate and expressive body movement that can be "directly guided" through text descriptions.

Human Animation

StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition

1 code implementation 9 Mar 2025 Yanqing Shen, Sanping Zhou, Jingwen Fu, Ruotong Wang, Shitao Chen, Nanning Zheng

Most end-to-end deep learning-based methods cannot extract global features with sufficient semantic information from RGB images.

Autonomous Driving Image Retrieval +3

REGNav: Room Expert Guided Image-Goal Navigation

no code implementations 15 Feb 2025 Pengna Li, Kangyi Wu, Jingwen Fu, Sanping Zhou

Most prior methods tackle this task by learning a navigation policy, which extracts visual features of goal and observation images, compares their similarity and predicts actions.

Referencing Where to Focus: Improving Visual Grounding with Referential Query

no code implementations 26 Dec 2024 Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

It consists of a query adaption module, which can be seamlessly integrated into CLIP and generates the referential query to provide prior context for the decoder, along with a task-specific decoder.

Decoder Visual Grounding

PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation

1 code implementation 8 Sep 2024 Ning Gao, Sanping Zhou, Le Wang, Nanning Zheng

In this paper, we propose a simple yet effective semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation, whose goal is to generate high-fidelity pseudo labels by learning robust and diverse features in the training process.

Image Segmentation Pseudo Label +3
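
PMT builds on the mean-teacher paradigm. The snippet does not give PMT's progressive update rule, so the sketch below shows only the standard mean-teacher step it starts from: the teacher's weights are an exponential moving average (EMA) of the student's. The dictionary-of-lists parameter format and the alpha value are assumptions chosen for illustration.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]], student: Dict[str, List[float]], alpha: float = 0.99) -> None:
    """In-place EMA update: teacher <- alpha * teacher + (1 - alpha) * student."""
    for name, s_params in student.items():
        t_params = teacher[name]
        for i, s in enumerate(s_params):
            t_params[i] = alpha * t_params[i] + (1.0 - alpha) * s


# Hypothetical one-tensor "model"; alpha = 0.5 only to make the drift visible.
teacher = {"w": [0.0, 0.0]}
student = {"w": [1.0, -1.0]}
for _ in range(3):
    ema_update(teacher, student, alpha=0.5)
print(teacher["w"])  # [0.875, -0.875]
```

In practice alpha is usually close to 1, so the teacher changes slowly and its pseudo labels are smoother than the student's predictions.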

Semantic-aware Representation Learning for Homography Estimation

1 code implementation 18 Jul 2024 YuHan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

In our work, we seek another way to use the semantic information, that is, a semantic-aware feature representation learning framework. Based on this, we propose SRMatcher, a new detector-free feature matching method, which encourages the network to learn integrated semantic feature representations. Specifically, to capture precise and rich semantics, we leverage the capabilities of recently popularized vision foundation models (VFMs) trained on extensive datasets.

Homography Estimation Representation Learning

Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding

1 code implementation 31 May 2024 Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua

To tackle this limitation, we present a Region-Guided TRansformer (RGTR) for temporal sentence grounding, which diversifies moment queries to eliminate overlapped and redundant predictions.

Attribute Moment Queries +2

Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

1 code implementation 3 May 2024 Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang

The success of knowledge distillation mainly relies on how to keep the feature discrepancy between the teacher and student model, in which it assumes that: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution.

Anomaly Detection Attribute +1
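
Under the two assumptions quoted above, anomalies show up where the student fails to reproduce the teacher's features. A common way to score this in teacher-student anomaly detection, not necessarily the exact score used in this paper, is the per-location cosine distance between the two feature maps; the function names and toy features below are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a small epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb + 1e-12)


def anomaly_map(teacher_feats: Sequence[Sequence[float]], student_feats: Sequence[Sequence[float]]) -> List[float]:
    """Per-location teacher/student discrepancy; larger values suggest an anomaly."""
    return [cosine_distance(t, s) for t, s in zip(teacher_feats, student_feats)]


teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[1.0, 0.0], [1.0, 0.0]]  # second location is poorly reconstructed
scores = anomaly_map(teacher, student)
print([round(x, 3) for x in scores])  # [0.0, 1.0]
```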

Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition

no code implementations 25 Apr 2024 Yu Wang, Sanping Zhou, Kun Xia, Le Wang

Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a small amount of labeled data in conjunction with a large amount of unlabeled data.

Action Recognition Contrastive Learning

Robust Noisy Label Learning via Two-Stream Sample Distillation

no code implementations 16 Apr 2024 Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng

Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning.

Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes

no code implementations 17 Mar 2024 Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

To this end, we first devise innovative strategies to adaptively select high-quality positive and negative classes from the label space, by modeling both the confidence and rank of a class in relation to those of the target class.

Temporal Action Localization

Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction

no code implementations 9 Mar 2024 Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun

Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model.

Domain Adaptation Pedestrian Trajectory Prediction +2

Molecule Design by Latent Prompt Transformer

no code implementations 27 Feb 2024 Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu

We propose the Latent Prompt Transformer (LPT), a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution modeled by a neural transformation of Gaussian white noise; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.

Property Prediction

Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking

no code implementations 17 Nov 2023 Yizhe Li, Sanping Zhou, Zheng Qin, Le Wang, Jinjun Wang, Nanning Zheng

In this paper, we propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets, so as to achieve robust data association in the tracking process.

Multi-Object Tracking

FFINet: Future Feedback Interaction Network for Motion Forecasting

no code implementations 8 Nov 2023 Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng

In this paper, we propose a novel Future Feedback Interaction Network (FFINet) to aggregate features of the current observations and potential future interactions for trajectory prediction.

Motion Forecasting Position +1

MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

no code implementations 18 Jul 2023 Zewei Lin, Yanqing Shen, Sanping Zhou, Shitao Chen, Nanning Zheng

In this paper, we propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection, which integrates both feature-level fusion and decision-level fusion to fully utilize the information in the image.

3D Object Detection Data Augmentation +1

Understanding the Overfitting of the Episodic Meta-training

no code implementations 29 Jun 2023 Siqi Hui, Sanping Zhou, Ye Deng, Jinjun Wang

Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model.

Knowledge Distillation
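
The restriction described above can be illustrated with a small sketch of the symmetric Kullback-Leibler divergence, here taken as SKL(p, q) = KL(p || q) + KL(q || p) (some works use half of this sum), between the teacher's and student's classifier output distributions. The logits are made up, and how the paper weights this term in its overall loss is not given in the snippet.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """SKL(p, q) = KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


# Hypothetical logits from the teacher's and student's linear classifiers.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_probs = softmax([1.0, 1.0, 0.0])
penalty = symmetric_kl(teacher_probs, student_probs)
print(round(penalty, 4))
```

Unlike plain KL, the symmetric form penalizes the student equally for missing mass where the teacher is confident and for placing mass where the teacher is not.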

T-former: An Efficient Transformer for Image Inpainting

2 code implementations 12 May 2023 Ye Deng, Siqi Hui, Sanping Zhou, Deyu Meng, Jinjun Wang

Based on this attention, a network called T-former is designed for image inpainting.

Image Inpainting Long-range modeling

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

no code implementations CVPR 2023 Zheng Qin, Sanping Zhou, Le Wang, Jinghai Duan, Gang Hua, Wei Tang

For dense crowds, we design a novel Interaction Module to learn interaction-aware motions from short-term trajectories, which can estimate the complex movement of each target.

motion prediction Multi-Object Tracking

Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization

1 code implementation ICCV 2023 Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

To this end, we propose a unified framework, termed Noisy Pseudo-Label Learning, to handle both location biases and category errors.

Pseudo Label Temporal Action Localization

Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition

no code implementations ICCV 2023 Xingyu Liu, Sanping Zhou, Le Wang, Gang Hua

Learning discriminative features from very few labeled samples to identify novel classes has received increasing attention in skeleton-based action recognition.

Action Recognition Few-Shot Skeleton-Based Action Recognition +1

Sparse Instance Conditioned Multimodal Trajectory Prediction

no code implementations ICCV 2023 Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua

Specifically, SICNet learns comprehensive sparse instances, i.e., representative points of the future trajectory, through a mask generated by a long short-term memory encoder and uses the memory mechanism to store and retrieve such sparse instances.

Future prediction Pedestrian Trajectory Prediction +2

StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition

no code implementations CVPR 2023 Yanqing Shen, Sanping Zhou, Jingwen Fu, Ruotong Wang, Shitao Chen, Nanning Zheng

In this paper, we propose StructVPR, a novel training architecture for VPR, to enhance structural knowledge in RGB global features and thus improve feature stability in a constantly changing environment.

Image Retrieval Knowledge Distillation +3

Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization

no code implementations CVPR 2022 Kun Xia, Le Wang, Sanping Zhou, Nanning Zheng, Wei Tang

The main challenge of Temporal Action Localization is to retrieve subtle human actions from various co-occurring ingredients, e.g., context and background, in an untrimmed video.

Temporal Action Localization

TransVPR: Transformer-based place recognition with multi-level attention aggregation

no code implementations CVPR 2022 Ruotong Wang, Yanqing Shen, Weiliang Zuo, Sanping Zhou, Nanning Zheng

In addition, the output tokens from Transformer layers filtered by the fused attention mask are considered as key-patch descriptors, which are used to perform spatial matching to re-rank the candidates retrieved by the global image features.

Autonomous Driving Visual Place Recognition

Auxiliary Loss Reweighting for Image Inpainting

1 code implementation 14 Nov 2021 Siqi Hui, Sanping Zhou, Ye Deng, Wenli Huang, Jinjun Wang

TPL and TSL are supersets of perceptual and style losses and release the auxiliary potential of standard perceptual and style losses.

Image Inpainting

Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

1 code implementation ICCV 2021 Fang Zheng, Le Wang, Sanping Zhou, Wei Tang, Zhenxing Niu, Nanning Zheng, Gang Hua

Specifically, the proposed unlimited neighborhood interaction module generates the fused features of all agents involved in an interaction simultaneously, which is adaptive to any number of agents and any range of interaction area.

Graph Attention Prediction +1

SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction

no code implementations CVPR 2021 Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua

Specifically, the SGCN explicitly models the sparse directed interaction with a sparse directed spatial graph to capture adaptive interactions among pedestrians.

Pedestrian Trajectory Prediction Prediction +1

SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction

4 code implementations 4 Apr 2021 Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua

Meanwhile, we use a sparse directed temporal graph to model the motion tendency, so as to facilitate prediction based on the observed direction.

Pedestrian Trajectory Prediction Prediction +1

Progressive Depth Learning for Single Image Dehazing

no code implementations 21 Feb 2021 Yudong Liang, Bin Wang, Jiaying Liu, Deyu Li, Sanping Zhou, Wenqi Ren

However, we note that the guidance of the depth information for transmission estimation could remedy the decreased visibility as distances increase.

Depth Estimation Depth Prediction +2

Teacher-Student Asynchronous Learning with Multi-Source Consistency for Facial Landmark Detection

1 code implementation 12 Dec 2020 Rongye Meng, Sanping Zhou, Xingyu Wan, Mengliu Li, Jinjun Wang

The radical student uses multi-source supervision signals from the same task to update parameters, while the calm teacher uses a single-source supervision signal to update parameters.

Facial Landmark Detection

End-to-End Multi-Object Tracking with Global Response Map

no code implementations 13 Jul 2020 Xingyu Wan, Jiakai Cao, Sanping Zhou, Jinjun Wang

Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection paradigm and the data association framework, where objects are first detected and then associated.

Multi-Object Tracking Object +2
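
As background to the Tracking-by-Detection paradigm described above (the conventional pipeline this paper contrasts itself with), the association step can be sketched as greedy matching of existing tracks to new detections by intersection-over-union (IoU). Greedy IoU matching, the 0.3 threshold and the helper names are illustrative assumptions; many trackers use the Hungarian algorithm and richer appearance or motion cues instead.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: List[Box], detections: List[Box], thresh: float = 0.3) -> Dict[int, int]:
    """Greedily match track index -> detection index in order of descending IoU."""
    pairs = sorted(
        ((iou(t, d), i, j) for i, t in enumerate(tracks) for j, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_t, used_d = set(), set()
    for score, i, j in pairs:
        if score < thresh:
            break  # remaining pairs overlap too little
        if i not in used_t and j not in used_d:
            matches[i] = j
            used_t.add(i)
            used_d.add(j)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(associate(tracks, detections))  # {1: 0, 0: 1}
```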

Meta Corrupted Pixels Mining for Medical Image Segmentation

no code implementations 7 Jul 2020 Jixin Wang, Sanping Zhou, Chaowei Fang, Le Wang, Jinjun Wang

However, training deep neural networks requires a large number of samples with high-quality annotations.

Image Segmentation Medical Image Analysis +3

Multiple Object Tracking by Flowing and Fusing

no code implementations 30 Jan 2020 Jimuyang Zhang, Sanping Zhou, Xin Chang, Fangbin Wan, Jinjun Wang, Yang Wu, Dong Huang

Most Multiple Object Tracking (MOT) approaches compute individual target features for two subtasks: estimating target-wise motions and conducting pair-wise Re-Identification (Re-ID).

Multiple Object Tracking Object +2

Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification

no code implementations ICCV 2019 Sanping Zhou, Fei Wang, Zeyi Huang, Jinjun Wang

In this paper, we propose a simple yet effective feedforward attention network to address the two mentioned problems, in which a novel consistent attention regularizer and an improved triplet loss are designed to learn foreground attentive features for person Re-ID.

Person Re-Identification Triplet
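
The paper's improved triplet loss is not spelled out in the snippet. For reference, the standard hinge-form triplet loss it modifies pulls an anchor toward a same-identity positive and pushes it away from a different-identity negative by at least a margin; the margin value and toy embeddings below are assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.3) -> float:
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))  # 0.0: the negative is already far enough
print(triplet_loss(anchor, positive, [0.0, 1.2]))  # about 0.1: negative violates the margin
```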

Frame-wise Motion and Appearance for Real-time Multiple Object Tracking

no code implementations 6 May 2019 Jimuyang Zhang, Sanping Zhou, Jinjun Wang, Dong Huang

The main challenge of Multiple Object Tracking (MOT) is efficiently associating an indefinite number of objects between video frames.

Multiple Object Tracking Object

Person-in-WiFi: Fine-grained Person Perception using WiFi

1 code implementation ICCV 2019 Fei Wang, Sanping Zhou, Stanislav Panev, Jinsong Han, Dong Huang

Fine-grained person perception such as body segmentation and pose estimation has been achieved with many 2D and 3D sensors such as RGB/depth cameras, radars (e.g., RF-Pose) and LiDARs.

RF-based Pose Estimation

SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection

1 code implementation 29 Mar 2019 Sanping Zhou, Jimuyang Zhang, Jinjun Wang, Fei Wang, Dong Huang

In this paper, we propose a simple yet effective Siamese Edge-Enhancement Network (SE2Net) to preserve the edge structure for salient object detection.

Object object-detection +2

Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting

3 code implementations NeurIPS 2019 Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng

Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance.

Ranked #24 on Image Classification on Clothing1M (using extra training data)

Image Classification Meta-Learning
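
Meta-Weight-Net's explicit mapping is a small network from a sample's training loss to a weight for that sample. The sketch below shows only the forward mapping and the resulting weighted loss, assuming a one-hidden-layer MLP with a sigmoid output; the parameter values are hand-picked for illustration, and the meta-learning step that actually trains the mapping on a small clean meta set is omitted.

```python
import math
from typing import Sequence, Tuple

Params = Tuple[Sequence[float], Sequence[float], Sequence[float], float]


def mlp_weight(loss: float, w1: Sequence[float], b1: Sequence[float],
               w2: Sequence[float], b2: float) -> float:
    """Loss -> ReLU hidden layer -> sigmoid, giving a sample weight in (0, 1)."""
    hidden = [max(0.0, wi * loss + bi) for wi, bi in zip(w1, b1)]
    z = sum(v * h for v, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def weighted_loss(losses: Sequence[float], params: Params) -> float:
    """Mean of per-sample losses, each scaled by its predicted weight."""
    weights = [mlp_weight(l, *params) for l in losses]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Hypothetical fixed parameters that down-weight large (possibly noisy-label) losses.
params: Params = ([1.0, 1.0], [0.0, -1.0], [-1.0, -2.0], 2.0)
per_sample = [0.1, 0.5, 3.0]
print([round(mlp_weight(l, *params), 3) for l in per_sample])
print(weighted_loss(per_sample, params))
```

With these parameters the weight shrinks as the loss grows, which is the behaviour one would want under corrupted labels; under class imbalance a learned mapping may instead increase with the loss.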

Discriminative Feature Learning with Foreground Attention for Person Re-Identification

no code implementations 4 Jul 2018 Sanping Zhou, Jinjun Wang, Deyu Meng, Yudong Liang, Yihong Gong, Nanning Zheng

Specifically, a novel foreground attentive subnetwork is designed to drive the network's attention, in which a decoder network is used to reconstruct the binary mask by using a novel local regression loss function, and an encoder network is regularized by the decoder network to focus its attention on the foreground persons.

Decoder Multi-Task Learning +2

Deep Self-Paced Learning for Person Re-Identification

no code implementations 7 Oct 2017 Sanping Zhou, Jinjun Wang, Deyu Meng, Xiaomeng Xin, Yubing Li, Yihong Gong, Nanning Zheng

In this paper, we propose a novel deep self-paced learning (DSPL) algorithm to alleviate this problem, in which we apply a self-paced constraint and symmetric regularization to help the relative distance metric train the deep neural network, so as to learn stable and discriminative features for person Re-ID.

Person Re-Identification Triplet
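
DSPL's self-paced constraint is not given in detail here. The classic hard self-paced scheme the name refers to trains on easy samples first: a sample is included only when its loss is below a pace parameter (called lam below), which is increased over training so that harder samples are admitted gradually. This is the generic scheme, not the paper's specific constraint; the losses and the schedule are made up.

```python
from typing import List, Sequence


def self_paced_weights(losses: Sequence[float], lam: float) -> List[float]:
    """Hard self-paced weighting: v_i = 1 if loss_i < lam else 0."""
    return [1.0 if l < lam else 0.0 for l in losses]


losses = [0.2, 0.9, 1.5, 0.4]
for lam in (0.5, 1.0, 2.0):  # growing pace parameter admits harder samples
    print(lam, self_paced_weights(losses, lam))
# 0.5 [1.0, 0.0, 0.0, 1.0]
# 1.0 [1.0, 1.0, 0.0, 1.0]
# 2.0 [1.0, 1.0, 1.0, 1.0]
```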

Large Margin Learning in Set to Set Similarity Comparison for Person Re-identification

no code implementations 18 Aug 2017 Sanping Zhou, Jinjun Wang, Rui Shi, Qiqi Hou, Yihong Gong, Nanning Zheng

The class-identity term keeps the intra-class samples within each camera view gathered together, the relative distance term maximizes the distance between the intra-class set and the inter-class set across different camera views, and the regularization term smooths the parameters of the deep convolutional neural network (CNN).

Person Re-Identification Retrieval

Deep Ranking Model by Large Adaptive Margin Learning for Person Re-identification

no code implementations 3 Jul 2017 Jiayun Wang, Sanping Zhou, Jinjun Wang, Qiqi Hou

In this paper, we present a novel deep ranking model with feature learning and fusion by learning a large adaptive margin between the intra-class distance and inter-class distance to solve the person re-identification problem.

Person Re-Identification

Point to Set Similarity Based Deep Feature Learning for Person Re-Identification

no code implementations CVPR 2017 Sanping Zhou, Jinjun Wang, Jiayun Wang, Yihong Gong, Nanning Zheng

One of the key issues for deep learning-based person Re-ID is the selection of proper similarity comparison criteria, and the performance of features learned with existing criteria based on pairwise similarity is still limited, because mostly only point-to-point (P2P) distances are considered.

Person Re-Identification
