no code implementations • CVPR 2023 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng
The static stream performs cross-modal understanding in a single frame and learns to attend to the target object spatially according to intra-frame visual cues like object appearances.
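The core idea of the static stream — scoring intra-frame region features against a sentence embedding and attending spatially — can be sketched as follows. This is an illustrative toy only: the function name, dot-product scoring, and softmax normalization are assumptions, not the paper's exact formulation.

```python
import numpy as np

def attend_to_regions(region_feats, text_feat):
    """Text-conditioned spatial attention over candidate regions in one frame.

    region_feats: (R, D) array of visual features for R candidate regions
    text_feat:    (D,)   sentence-level query embedding

    Returns softmax attention weights over regions and the attended feature.
    """
    # Scaled dot-product similarity between each region and the text query.
    scores = region_feats @ text_feat / np.sqrt(region_feats.shape[1])
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over regions
    attended = weights @ region_feats                 # attention-weighted feature
    return weights, attended
```

Regions with appearance features similar to the query receive higher weights, which is the mechanism by which intra-frame visual cues like object appearance drive the spatial localization.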
no code implementations • CVPR 2023 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai
Specifically, we develop a hierarchical encoder that encodes the multi-modal inputs into semantics-aligned representations at different levels.
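One simple way to picture multi-level encoding is pooling a feature sequence over progressively larger windows (e.g. word, phrase, sentence). The sketch below is a hedged toy illustration of that idea only — the paper's hierarchical encoder is a learned network, and the window sizes here are arbitrary assumptions.

```python
import numpy as np

def hierarchical_encode(token_feats, window_sizes=(1, 4, 16)):
    """Encode a (T, D) feature sequence at multiple semantic levels by
    average-pooling windows of increasing size.

    Returns one (ceil(T / w), D) array per level, coarser as w grows.
    """
    T, D = token_feats.shape
    levels = []
    for w in window_sizes:
        n = -(-T // w)                      # ceil(T / w) pooled positions
        pooled = np.zeros((n, D))
        for i in range(n):
            # Average the features falling inside this window.
            pooled[i] = token_feats[i * w:(i + 1) * w].mean(axis=0)
        levels.append(pooled)
    return levels
```

Each level summarizes a broader span of the input, giving coarse-to-fine representations that cross-modal alignment can then operate on level by level.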
no code implementations • 6 Jul 2022 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng
The static branch performs cross-modal understanding in a single frame and learns to localize the target object spatially according to intra-frame visual cues like object appearances.
Ranked #1 on Spatio-Temporal Video Grounding on HC-STVG2
no code implementations • NeurIPS 2021 • Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng
The ability to forecast future human motion is important for human-machine interaction systems to understand human behaviors and interact accordingly.
no code implementations • 20 Jun 2021 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Xiang Li, Wei-Shi Zheng
We propose an effective two-stage approach to tackle the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.
no code implementations • ICCV 2021 • Zihang Lin, Jiangxin Sun, Jian-Fang Hu, QiZhi Yu, Jian-Huang Lai, Wei-Shi Zheng
In the latent feature learned by the autoencoder, global structures are enhanced and local details are suppressed so that it is more predictive.
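The idea of forecasting in a compressed latent space can be illustrated with a toy linear stand-in: encode a frame into a low-dimensional code, apply a transition model in code space, and decode. All weights below are random placeholders for illustration, not the paper's learned autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 64, 8                                       # frame dim, latent dim
W_enc = rng.standard_normal((d, D)) / np.sqrt(D)   # assumed linear encoder
W_dec = np.linalg.pinv(W_enc)                      # decoder via pseudo-inverse
A = np.eye(d)                                      # assumed latent dynamics

def predict_next_frame(frame):
    z = W_enc @ frame     # encode: keep global structure, drop local detail
    z_next = A @ z        # predict in the low-dimensional latent space
    return W_dec @ z_next # decode the predicted code back to frame space
```

Because prediction happens on the `d`-dimensional code rather than the full `D`-dimensional frame, the dynamics model only has to capture the enhanced global structure, which is what makes the latent feature easier to predict.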
no code implementations • 19 Oct 2018 • Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng
Recently, segmentation neural networks have improved significantly, demonstrating very promising accuracies on public benchmarks.
no code implementations • ECCV 2018 • Jian-Fang Hu, Wei-Shi Zheng, Jia-Hui Pan, Jian-Huang Lai, Jian-Guo Zhang
In this paper, we focus on exploring modality-temporal mutual information for RGB-D action recognition.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2018 • Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jian-Guo Zhang
Our formulation of the soft regression framework 1) overcomes a usual assumption in existing early action prediction systems that the progress level of the on-going sequence is given at the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at the early performing stage.
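The "soft" idea — not assuming the true progress level is known at test time — can be sketched by weighting predictions made under several hypothesized progress levels and marginalizing. This is an illustrative assumption-laden toy, not the paper's soft regression formulation.

```python
import numpy as np

def soft_fused_prediction(per_level_probs, level_weights):
    """Fuse class predictions made under G hypothesized progress levels.

    per_level_probs: (G, C) class probabilities under each progress level
    level_weights:   (G,)   nonnegative soft weights over progress levels

    Rather than committing to one (unknown) progress level, marginalize the
    per-level predictions under the soft weights.
    """
    w = np.asarray(level_weights, dtype=float)
    w = w / w.sum()                          # normalize the level weights
    fused = w @ np.asarray(per_level_probs)  # (C,) marginalized prediction
    return fused
```

With uniform weights this reduces to averaging the per-level predictions; a learned weighting would instead emphasize the levels most consistent with the observed subsequence.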
Ranked #67 on Skeleton Based Action Recognition on NTU RGB+D 120
no code implementations • 20 Sep 2017 • Yongyi Tang, Peizhen Zhang, Jian-Fang Hu, Wei-Shi Zheng
Rather than simply recognizing each person's action individually, collective activity recognition aims to determine what a group of people is doing in a collective scene.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 39, Issue 11, Nov. 2017) 2016 • Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai, Jian-Guo Zhang
The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets.
Ranked #8 on Skeleton Based Action Recognition on SYSU 3D