Search Results for author: Zhenyu He

Found 39 papers, 17 papers with code

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation 28 Mar 2024 Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Do Efficient Transformers Really Save Computation?

no code implementations 21 Feb 2024 Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, LiWei Wang

Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

no code implementations 29 Jan 2024 Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, LiWei Wang

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE).

Disentanglement Position

REST: Retrieval-Based Speculative Decoding

1 code implementation 14 Nov 2023 Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.

Language Modelling Retrieval +1

Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models

1 code implementation 27 Aug 2023 Kaiyuan Gao, Sunan He, Zhenyu He, Jiacheng Lin, Qizhi Pei, Jie Shao, Wei Zhang

Generative pre-trained transformer (GPT) models have revolutionized the field of natural language processing (NLP) with remarkable performance in various tasks and also extend their power to multimodal domains.

Channel and Spatial Relation-Propagation Network for RGB-Thermal Semantic Segmentation

no code implementations 24 Aug 2023 Zikun Zhou, Shukun Wu, Guoqing Zhu, Hongpeng Wang, Zhenyu He

In this paper, we propose a Channel and Spatial Relation-Propagation Network (CSRPNet) for RGB-T semantic segmentation, which propagates only modality-shared information across different modalities and alleviates the modality-specific information contamination issue.

Relation Segmentation +2

Cross-Modality Proposal-guided Feature Mining for Unregistered RGB-Thermal Pedestrian Detection

no code implementations 23 Aug 2023 Chao Tian, Zikun Zhou, Yuqing Huang, Gaojun Li, Zhenyu He

RGB-Thermal (RGB-T) pedestrian detection aims to locate the pedestrians in RGB-T image pairs to exploit the complementation between the two modalities for improving detection robustness in extreme conditions.

Data Augmentation Pedestrian Detection

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation ICCV 2023 Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

1 code implementation ICCV 2023 Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.

Hallucination Image Captioning +1

ZeroPose: CAD-Model-based Zero-Shot Pose Estimation

no code implementations 29 May 2023 Jianqiu Chen, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He

In this paper, we present a CAD model-based zero-shot pose estimation pipeline called ZeroPose.

Instance Segmentation Object +3

Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

1 code implementation 25 Mar 2023 Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, YaoWei Wang, Zhenyu He

To be specific, RHMNet first only uses the memory in the high-reliability level to locate the region with high reliability belonging to the target, which is highly similar to the initial target scribble.

Semantic Segmentation Video Object Segmentation +1

Joint Visual Grounding and Tracking with Natural Language Specification

1 code implementation CVPR 2023 Li Zhou, Zikun Zhou, Kaige Mao, Zhenyu He

Such a separated framework overlooks the link between visual grounding and tracking, which is that the natural language descriptions provide global semantic cues for localizing the target in both steps.

Visual Grounding Visual Tracking

Audio2Gestures: Generating Diverse Gestures from Audio

no code implementations 17 Jan 2023 Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.

Gesture Generation

Geo6D: Geometric Constraints Learning for 6D Pose Estimation

no code implementations 20 Oct 2022 Jianqiu Chen, Mingshan Sun, Ye Zheng, Tianpeng Bao, Zhenyu He, Donghai Li, Guoqiang Jin, Rui Zhao, Liwei Wu, Xiaoke Jiang

Numerous 6D pose estimation methods have been proposed that employ end-to-end regression to directly estimate the target pose parameters.

6D Pose Estimation object-detection +3

How Image Generation Helps Visible-to-Infrared Person Re-Identification?

no code implementations 4 Oct 2022 Honghu Pan, Yongyong Chen, Yunqi He, Xin Li, Zhenyu He

To this end, we propose Flow2Flow, a unified framework that could jointly achieve training sample expansion and cross-modality image generation for V2I person ReID.

Image Generation Person Re-Identification

Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network

no code implementations 23 Sep 2022 Honghu Pan, Qiao Liu, Yongyong Chen, Yunqi He, Yuan Zheng, Feng Zheng, Zhenyu He

Finally, we propose a dual-attention method consisting of node-attention and time-attention to obtain the temporal graph representation from the node embeddings, where the self-attention mechanism is employed to learn the importance of each node and each frame.

Retrieval Video-Based Person Re-Identification +1

Towards Complete-View and High-Level Pose-based Gait Recognition

no code implementations 23 Sep 2022 Honghu Pan, Yongyong Chen, Tingyang Xu, Yunqi He, Zhenyu He

Extensive experiments on two large gait recognition datasets, i.e., CASIA-B and OUMVLP-Pose, demonstrate that our method outperforms the baseline model and existing pose-based methods by a large margin.

Gait Recognition Generative Adversarial Network +1

Multi-Granularity Graph Pooling for Video-based Person Re-Identification

no code implementations 23 Sep 2022 Honghu Pan, Yongyong Chen, Zhenyu He

To downsample the graph, we propose a multi-head full attention graph pooling (MHFAPool) layer, which integrates the advantages of existing node clustering and node selection pooling methods.

Node Clustering Retrieval +2

SSORN: Self-Supervised Outlier Removal Network for Robust Homography Estimation

no code implementations 30 Aug 2022 Yi Li, Wenjie Pei, Zhenyu He

In this paper, we attempt to build a deep learning model that mimics all four steps in the traditional homography estimation pipeline.

Denoising Homography Estimation

Global Tracking via Ensemble of Local Trackers

1 code implementation CVPR 2022 Zikun Zhou, Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu He

While it can exploit the temporal context like historical appearances and locations of the target, a potential limitation of such strategy is that the local tracker tends to misidentify a nearby distractor as the target instead of activating the re-detector when the real target is out of view.

Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

1 code implementation 8 Mar 2022 Jingfei Xia, Mingchen Zhuge, Tiantian Geng, Shun Fan, Yuantai Wei, Zhenyu He, Feng Zheng

Figure skating scoring is challenging because it requires judging the technical moves of the players as well as their coordination with the background music.

Representation Learning

GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference

no code implementations 28 Dec 2021 Peng Tu, Yawen Huang, Feng Zheng, Zhenyu He, Liujun Cao, Ling Shao

In this paper, we propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net, by leveraging labeled information to guide the learning of unlabeled instances.

Segmentation Semi-Supervised Semantic Segmentation

Active Learning for Deep Visual Tracking

no code implementations 17 Oct 2021 Di Yuan, Xiaojun Chang, Yi Yang, Qiao Liu, Dehua Wang, Zhenyu He

In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model.

Active Learning Visual Tracking

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

no code implementations ICCV 2021 Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao

In order to overcome this problem, we propose a novel conditional variational autoencoder (VAE) that explicitly models one-to-many audio-to-motion mapping by splitting the cross-modal latent code into shared code and motion-specific code.

Gesture Generation

Saliency-Associated Object Tracking

1 code implementation ICCV 2021 Zikun Zhou, Wenjie Pei, Xin Li, Hongpeng Wang, Feng Zheng, Zhenyu He

A potential limitation of such trackers is that not all patches are equally informative for tracking.

Object Object Tracking

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations 21 Jun 2021 Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

TCDesc: Learning Topology Consistent Descriptors for Image Matching

no code implementations 13 Sep 2020 Honghu Pan, Fanyang Meng, Nana Fan, Zhenyu He

Our method has the following two advantages: (1) We are the first to consider neighborhood information of descriptors, while former works mainly focus on neighborhood consistency of feature points; (2) Our method can be applied in any former work of learning descriptors by triplet loss.

LSOTB-TIR: A Large-Scale High-Diversity Thermal Infrared Object Tracking Benchmark

1 code implementation 3 Aug 2020 Qiao Liu, Xin Li, Zhenyu He, Chenglong Li, Jun Li, Zikun Zhou, Di Yuan, Jing Li, Kai Yang, Nana Fan, Feng Zheng

We evaluate and analyze more than 30 trackers on LSOTB-TIR to provide a series of baselines, and the results show that deep trackers achieve promising performance.

Thermal Infrared Object Tracking Vocal Bursts Intensity Prediction

Accurate Bounding-box Regression with Distance-IoU Loss for Visual Tracking

no code implementations 3 Jul 2020 Di Yuan, Xiu Shu, Nana Fan, Xiaojun Chang, Qiao Liu, Zhenyu He

Moreover, we introduce a classification part that is trained online and optimized with a Conjugate-Gradient-based strategy to guarantee real-time tracking speed.

regression Visual Tracking

TCDesc: Learning Topology Consistent Descriptors

no code implementations 5 Jun 2020 Honghu Pan, Fanyang Meng, Zhenyu He, Yongsheng Liang, Wei Liu

Then we define topology distance between descriptors as the difference of their topology vectors.

Multi-Task Driven Feature Models for Thermal Infrared Tracking

1 code implementation 26 Nov 2019 Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, Wei Liu, Yongsheng Liang

These two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task.

Thermal Infrared Object Tracking

Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking

1 code implementation 9 Jun 2019 Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, Hongpeng Wang

These two similarities complement each other and hence enhance the discriminative capacity of the network for handling distractors.

Semantic Similarity Thermal Infrared Object Tracking

Target-Aware Deep Tracking

no code implementations CVPR 2019 Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang

Despite demonstrated successes for numerous vision tasks, the contributions of using pre-trained deep features for visual tracking are not as significant as that for object recognition.

Object Object Recognition +1

Region-filtering Correlation Tracking

no code implementations 23 Mar 2018 Nana Fan, Zhenyu He

The IRs in training samples from cyclic shifts of the base training sample severely degrade the quality of a tracking model.

Visual Tracking

PTB-TIR: A Thermal Infrared Pedestrian Tracking Benchmark

1 code implementation 18 Jan 2018 Qiao Liu, Zhenyu He, Xin Li, Yuan Zheng

The ability to evaluate the TIR pedestrian tracker fairly, on a benchmark dataset, is significant for the development of this field.

Attribute Thermal Infrared Object Tracking

Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking

1 code implementation 27 Nov 2017 Xin Li, Qiao Liu, Nana Fan, Zhenyu He, Hongzhi Wang

In this paper, we cast the TIR tracking problem as a similarity verification task, which is coupled well to the objective of the tracking task.

General Classification Thermal Infrared Object Tracking

Deep Convolutional Neural Networks for Thermal Infrared Object Tracking

1 code implementation Knowledge-Based Systems 2017 Qiao Liu, Xiaohuan Lu, Zhenyu He, Chunkai Zhang, WenSheng Chen

We observe that the features from the fully-connected layer are not suitable for thermal infrared tracking due to the lack of spatial information of the target, while the features from the convolution layers are.

Object Thermal Infrared Object Tracking +2
