no code implementations • 3 Apr 2025 • Xiaolong Sun, Le Wang, Sanping Zhou, Liushuai Shi, Kun Xia, Mengnan Liu, Yabing Wang, Gang Hua
In this paper, we propose a novel Moment-Quantization based Video Temporal Grounding method (MQVTG), which quantizes the input video into various discrete vectors to enhance the discrimination between relevant and irrelevant moments.
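As a rough illustration of what quantizing features into discrete vectors can look like, the sketch below assigns each clip feature to its nearest entry in a codebook. The excerpt does not describe how MQVTG builds or trains its codebook, so the function name `quantize`, the squared-Euclidean distance, and the toy values are all assumptions made for illustration.

```python
def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (hypothetical helper)."""
    indices = []
    for f in features:
        # Squared Euclidean distance from this feature to every codebook vector.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    # Return both the discrete codes and the quantized vectors.
    return indices, [codebook[i] for i in indices]


# Toy data: two clip features and a two-entry codebook (illustrative values only).
clip_features = [[0.1, 0.0], [0.9, 1.1]]
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
codes, quantized = quantize(clip_features, toy_codebook)
```

Features that fall near the same codebook entry share a code, which is one way discrete vectors can separate groups of moments.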
no code implementations • 10 Mar 2025 • Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang
AI-generated content faces similar requirements, where users not only need automatic generation of lip synchronization and basic gestures from audio input but also desire semantically accurate and expressive body movement that can be "directly guided" through text descriptions.
1 code implementation • 9 Mar 2025 • Yanqing Shen, Sanping Zhou, Jingwen Fu, Ruotong Wang, Shitao Chen, Nanning Zheng
Most end-to-end deep learning-based methods cannot extract global features with sufficient semantic information from RGB images.
no code implementations • 15 Feb 2025 • Pengna Li, Kangyi Wu, Jingwen Fu, Sanping Zhou
Most prior methods tackle this task by learning a navigation policy, which extracts visual features of goal and observation images, compares their similarity and predicts actions.
no code implementations • 26 Dec 2024 • Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang
It consists of a query adaption module, which can be seamlessly integrated into CLIP and generates the referential query that provides prior context for the decoder, along with a task-specific decoder.
1 code implementation • 8 Sep 2024 • Ning Gao, Sanping Zhou, Le Wang, Nanning Zheng
In this paper, we propose a simple yet effective semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation, whose goal is to generate high-fidelity pseudo labels by learning robust and diverse features in the training process.
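For readers unfamiliar with the mean-teacher idea that the name builds on, a minimal sketch of the usual exponential-moving-average (EMA) teacher update follows. This is the generic mean-teacher rule, not the progressive scheme of PMT, which the excerpt does not detail; `ema_update` and the `decay` value are assumptions.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Generic mean-teacher step: the teacher tracks an EMA of the student (hypothetical helper)."""
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


# Toy weights (illustrative values only).
teacher = [1.0, 2.0]
student = [0.0, 4.0]
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which is why its predictions are commonly used as pseudo labels for unlabeled data.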
1 code implementation • 18 Jul 2024 • YuHan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang
In our work, we seek another way to use semantic information, namely a semantic-aware feature representation learning framework. Based on this, we propose SRMatcher, a new detector-free feature matching method that encourages the network to learn integrated semantic feature representations. Specifically, to capture precise and rich semantics, we leverage the capabilities of recently popularized vision foundation models (VFMs) trained on extensive datasets.
1 code implementation • CVPR 2024 • Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang
Effective trackers should demonstrate a high degree of generalizability across diverse scenarios.
1 code implementation • 31 May 2024 • Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua
To tackle this limitation, we present a Region-Guided TRansformer (RGTR) for temporal sentence grounding, which diversifies moment queries to eliminate overlapped and redundant predictions.
1 code implementation • 3 May 2024 • Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang
The success of knowledge distillation relies mainly on maintaining the feature discrepancy between the teacher and student models, under the assumptions that (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution.
no code implementations • 25 Apr 2024 • Yu Wang, Sanping Zhou, Kun Xia, Le Wang
Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a few labeled data in conjunction with a large amount of unlabeled data.
no code implementations • 16 Apr 2024 • Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng
Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning.
1 code implementation • 25 Mar 2024 • Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang
Unsupervised person re-identification aims to retrieve images of a specified person without identity labels.
no code implementations • 17 Mar 2024 • Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang
To this end, we first devise innovative strategies to adaptively select high-quality positive and negative classes from the label space, by modeling both the confidence and rank of a class in relation to those of the target class.
no code implementations • 9 Mar 2024 • Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun
Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model.
no code implementations • 27 Feb 2024 • Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu
We propose the Latent Prompt Transformer (LPT), a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution modeled by a neural transformation of Gaussian white noise; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
no code implementations • 17 Nov 2023 • Yizhe Li, Sanping Zhou, Zheng Qin, Le Wang, Jinjun Wang, Nanning Zheng
In this paper, we propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets, so as to achieve robust data association in the tracking process.
no code implementations • 8 Nov 2023 • Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng
In this paper, we propose a novel Future Feedback Interaction Network (FFINet) to aggregate features of the current observations and potential future interactions for trajectory prediction.
no code implementations • 18 Jul 2023 • Zewei Lin, Yanqing Shen, Sanping Zhou, Shitao Chen, Nanning Zheng
In this paper, we propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection, which integrates both feature-level fusion and decision-level fusion to fully utilize the information in the image.
no code implementations • 29 Jun 2023 • Siqi Hui, Sanping Zhou, Ye Deng, Jinjun Wang
Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model.
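The symmetric Kullback-Leibler term mentioned above can be sketched as follows. The sketch assumes the common definition SKL(p, q) = KL(p || q) + KL(q || p) applied to softmax outputs; the excerpt does not state the exact form or weighting the authors use, and the logits here are made-up values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Assumed definition: the sum of the two directed KL divergences."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical classifier logits for one sample.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_probs = softmax([1.5, 0.7, -0.5])
skl = symmetric_kl(teacher_probs, student_probs)
```

Unlike a single directed KL term, this quantity is unchanged when teacher and student are swapped, and it is zero when the two output distributions coincide.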
2 code implementations • 12 May 2023 • Ye Deng, Siqi Hui, Sanping Zhou, Deyu Meng, Jinjun Wang
Based on this attention, a network called $T$-former is designed for image inpainting.
no code implementations • 6 Apr 2023 • Yuhao Huang, Sanping Zhou, Junjie Zhang, Jinpeng Dong, Nanning Zheng
Efficient representation of point clouds is fundamental for LiDAR-based 3D object detection.
no code implementations • CVPR 2023 • Zheng Qin, Sanping Zhou, Le Wang, Jinghai Duan, Gang Hua, Wei Tang
For dense crowds, we design a novel Interaction Module to learn interaction-aware motions from short-term trajectories, which can estimate the complex movement of each target.
1 code implementation • ICCV 2023 • Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang
To this end, we propose a unified framework, termed Noisy Pseudo-Label Learning, to handle both location biases and category errors.
1 code implementation • ICCV 2023 • Liushuai Shi, Le Wang, Sanping Zhou, Gang Hua
Pedestrian trajectory prediction is an essential link to understanding human behavior.
no code implementations • ICCV 2023 • Xingyu Liu, Sanping Zhou, Le Wang, Gang Hua
Learning discriminative features from very few labeled samples to identify novel classes has received increasing attention in skeleton-based action recognition.
Action Recognition • Few-Shot Skeleton-Based Action Recognition
no code implementations • ICCV 2023 • Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua
Specifically, SICNet learns comprehensive sparse instances, i.e., representative points of the future trajectory, through a mask generated by a long short-term memory encoder and uses the memory mechanism to store and retrieve such sparse instances.
no code implementations • CVPR 2023 • Yanqing Shen, Sanping Zhou, Jingwen Fu, Ruotong Wang, Shitao Chen, Nanning Zheng
In this paper, we propose StructVPR, a novel training architecture for VPR, to enhance structural knowledge in RGB global features and thus improve feature stability in a constantly changing environment.
no code implementations • CVPR 2022 • Kun Xia, Le Wang, Sanping Zhou, Nanning Zheng, Wei Tang
The main challenge of Temporal Action Localization is to retrieve subtle human actions from various co-occurring ingredients, e.g., context and background, in an untrimmed video.
1 code implementation • 26 May 2022 • Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Fang Zheng, Nanning Zheng, Gang Hua
Understanding the multiple socially-acceptable future behaviors is an essential task for many vision applications.
no code implementations • CVPR 2022 • Ruotong Wang, Yanqing Shen, Weiliang Zuo, Sanping Zhou, Nanning Zheng
In addition, the output tokens from Transformer layers filtered by the fused attention mask are considered as key-patch descriptors, which are used to perform spatial matching to re-rank the candidates retrieved by the global image features.
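The retrieve-then-re-rank pattern described here can be sketched in a few lines. The excerpt does not specify the spatial matching procedure, so this sketch swaps in a simple mutual-nearest-neighbor count between key-patch descriptors; `mutual_matches`, `rerank`, and the toy descriptors are assumptions, and descriptor lists are assumed to be non-empty.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(desc, pool):
    """Index of the descriptor in pool closest to desc."""
    return min(range(len(pool)), key=lambda j: sq_dist(desc, pool[j]))


def mutual_matches(query_desc, cand_desc):
    """Count query descriptors whose nearest neighbor also points back to them."""
    count = 0
    for i, d in enumerate(query_desc):
        j = nearest(d, cand_desc)
        if nearest(cand_desc[j], query_desc) == i:
            count += 1
    return count


def rerank(query_desc, candidates):
    """Re-order (name, descriptors) pairs, assumed pre-sorted by global similarity.

    sorted() is stable, so ties keep their global-retrieval order.
    """
    return sorted(candidates, key=lambda c: -mutual_matches(query_desc, c[1]))


# Toy example: candidate "B" ranks second globally but matches the query patches better.
query = [[0.0, 0.0], [1.0, 1.0]]
retrieved = [("A", [[5.0, 5.0]]), ("B", [[0.0, 0.1], [1.0, 0.9]])]
reranked = rerank(query, retrieved)
```

The global descriptor keeps the candidate list short, so the more expensive patch-level comparison only runs on a handful of images.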
1 code implementation • 14 Nov 2021 • Siqi Hui, Sanping Zhou, Ye Deng, Wenli Huang, Jinjun Wang
TPL and TSL are supersets of perceptual and style losses and release the auxiliary potential of standard perceptual and style losses.
1 code implementation • ICCV 2021 • Fang Zheng, Le Wang, Sanping Zhou, Wei Tang, Zhenxing Niu, Nanning Zheng, Gang Hua
Specifically, the proposed unlimited neighborhood interaction module generates the fused features of all agents involved in an interaction simultaneously, which is adaptive to any number of agents and any range of interaction area.
no code implementations • CVPR 2021 • Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua
Specifically, the SGCN explicitly models the sparse directed interaction with a sparse directed spatial graph to adaptively capture the interacting pedestrians.
4 code implementations • 4 Apr 2021 • Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua
Meanwhile, we use a sparse directed temporal graph to model the motion tendency, thereby facilitating prediction based on the observed direction.
no code implementations • 21 Feb 2021 • Yudong Liang, Bin Wang, Jiaying Liu, Deyu Li, Sanping Zhou, Wenqi Ren
However, we note that the guidance of the depth information for transmission estimation could remedy the decreased visibility as distances increase.
1 code implementation • ICCV 2021 • Haoxuanye Ji, Le Wang, Sanping Zhou, Wei Tang, Nanning Zheng, Gang Hua
Unsupervised person re-identification (Re-ID) remains challenging due to the lack of ground-truth labels.
1 code implementation • 12 Dec 2020 • Rongye Meng, Sanping Zhou, Xingyu Wan, Mengliu Li, Jinjun Wang
The radical student uses multi-source supervision signals from the same task to update parameters, while the calm teacher uses a single-source supervision signal to update parameters.
no code implementations • 13 Jul 2020 • Xingyu Wan, Jiakai Cao, Sanping Zhou, Jinjun Wang
Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection paradigm and the data association framework, where objects are first detected and then associated.
no code implementations • 7 Jul 2020 • Jixin Wang, Sanping Zhou, Chaowei Fang, Le Wang, Jinjun Wang
However, training deep neural networks requires a large number of samples with high-quality annotations.
no code implementations • 30 Jan 2020 • Jimuyang Zhang, Sanping Zhou, Xin Chang, Fangbin Wan, Jinjun Wang, Yang Wu, Dong Huang
Most Multiple Object Tracking (MOT) approaches compute individual target features for two subtasks: estimating target-wise motions and conducting pair-wise Re-Identification (Re-ID).
no code implementations • ICCV 2019 • Sanping Zhou, Fei Wang, Zeyi Huang, Jinjun Wang
In this paper, we propose a simple yet effective feedforward attention network to address the two mentioned problems, in which a novel consistent attention regularizer and an improved triplet loss are designed to learn foreground attentive features for person Re-ID.
no code implementations • 6 May 2019 • Jimuyang Zhang, Sanping Zhou, Jinjun Wang, Dong Huang
The main challenge of Multiple Object Tracking (MOT) is the efficiency in associating an indefinite number of objects between video frames.
1 code implementation • ICCV 2019 • Fei Wang, Sanping Zhou, Stanislav Panev, Jinsong Han, Dong Huang
Fine-grained person perception such as body segmentation and pose estimation has been achieved with many 2D and 3D sensors such as RGB/depth cameras, radars (e.g., RF-Pose) and LiDARs.
1 code implementation • 29 Mar 2019 • Sanping Zhou, Jimuyang Zhang, Jinjun Wang, Fei Wang, Dong Huang
In this paper, we propose a simple yet effective Siamese Edge-Enhancement Network (SE2Net) to preserve the edge structure for salient object detection.
3 code implementations • NeurIPS 2019 • Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng
Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance.
Ranked #24 on Image Classification on Clothing1M (using extra training data)
no code implementations • 4 Jul 2018 • Sanping Zhou, Jinjun Wang, Deyu Meng, Yudong Liang, Yihong Gong, Nanning Zheng
Specifically, a novel foreground attentive subnetwork is designed to drive the network's attention, in which a decoder network is used to reconstruct the binary mask by using a novel local regression loss function, and an encoder network is regularized by the decoder network to focus its attention on the foreground persons.
no code implementations • 7 Oct 2017 • Sanping Zhou, Jinjun Wang, Deyu Meng, Xiaomeng Xin, Yubing Li, Yihong Gong, Nanning Zheng
In this paper, we propose a novel deep self-paced learning (DSPL) algorithm to alleviate this problem, in which we apply a self-paced constraint and symmetric regularization to help the relative distance metric train the deep neural network, so as to learn stable and discriminative features for person Re-ID.
no code implementations • 18 Aug 2017 • Sanping Zhou, Jinjun Wang, Rui Shi, Qiqi Hou, Yihong Gong, Nanning Zheng
The class-identity term keeps the intra-class samples within each camera view gathered together, the relative distance term maximizes the distance between the intra-class set and the inter-class set across different camera views, and the regularization term smooths the parameters of the deep convolutional neural network (CNN).
no code implementations • 3 Jul 2017 • Jiayun Wang, Sanping Zhou, Jinjun Wang, Qiqi Hou
In this paper, we present a novel deep ranking model with feature learning and fusion by learning a large adaptive margin between the intra-class distance and inter-class distance to solve the person re-identification problem.
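The idea of enforcing a margin between intra-class and inter-class distances can be sketched with a standard hinge-style ranking loss. The model described above learns an adaptive margin; this sketch uses a fixed `margin` parameter instead, and the function name and toy embeddings are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the inter-class distance exceeds
    the intra-class distance by at least `margin` (fixed here, not adaptive)."""
    d_intra = sq_dist(anchor, positive)  # same identity
    d_inter = sq_dist(anchor, negative)  # different identity
    return max(0.0, margin + d_intra - d_inter)


# Toy embeddings (illustrative values only).
anchor = [0.0, 0.0]
same_person = [0.0, 1.0]
far_other = [3.0, 0.0]
near_other = [1.0, 0.0]

easy_loss = margin_ranking_loss(anchor, same_person, far_other)
hard_loss = margin_ranking_loss(anchor, same_person, near_other)
```

Triplets that already satisfy the margin contribute nothing, so training effort concentrates on identities that are still confused with one another.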
no code implementations • CVPR 2017 • Sanping Zhou, Jinjun Wang, Jiayun Wang, Yihong Gong, Nanning Zheng
One of the key issues for deep learning-based person Re-ID is the selection of proper similarity comparison criteria, and the performance of features learned using existing criteria based on pairwise similarity is still limited, because mostly only P2P distances are considered.
1 code implementation • CVPR 2016 • De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, Nanning Zheng
Person re-identification across cameras remains a very challenging problem, especially when there are no overlapping fields of view between cameras.