Search Results for author: Jian-Fang Hu

Found 14 papers, 1 paper with code

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

no code implementations 21 Mar 2024 Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question.

Tasks: Multi-Label Classification, Question Answering, +1
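The snippet above casts open-ended VideoQA as answer selection over a large vocabulary, with a ranking signal distilled from a teacher. Below is a minimal PyTorch sketch of that general idea, not the paper's actual method; the function name, temperature, and all shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def ranking_distillation_loss(student_logits, teacher_logits, labels,
                              temperature=2.0, alpha=0.5):
    """Blend hard-label supervision with a soft ranking signal distilled
    from a teacher's scores over the full answer vocabulary (a sketch)."""
    # Hard supervision on the ground-truth answer indices.
    hard = F.cross_entropy(student_logits, labels)
    # Soft supervision: match the teacher's ranking distribution over answers.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

# Toy usage: batch of 4 questions, answer vocabulary of 4000 candidates.
student = torch.randn(4, 4000, requires_grad=True)
teacher = torch.randn(4, 4000)
labels = torch.randint(0, 4000, (4,))
loss = ranking_distillation_loss(student, teacher, labels)
loss.backward()
```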

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

no code implementations 18 Mar 2024 Chaolei Tan, Jian-Huang Lai, Wei-Shi Zheng, Jian-Fang Hu

Unlike previous weakly-supervised grounding frameworks that rely on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel Siamese learning framework that jointly learns cross-modal feature alignment and temporal coordinate regression without timestamp labels, achieving concise one-stage localization for WSVPG.

Tasks: Multiple Instance Learning
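The abstract describes coupling cross-modal alignment with timestamp-free coordinate regression across two Siamese branches. A minimal sketch of one plausible such objective, assuming two augmented views and (center, width) predictions; the paper's actual losses may differ:

```python
import torch
import torch.nn.functional as F

def siamese_grounding_loss(feat_a, feat_b, coords_a, coords_b, text_feat):
    """Jointly supervise cross-modal alignment and coordinate regression
    without timestamp labels, via two augmented views (a sketch)."""
    # Alignment: video-level features from each branch should match the paragraph.
    sim_a = F.cosine_similarity(feat_a, text_feat, dim=-1)
    sim_b = F.cosine_similarity(feat_b, text_feat, dim=-1)
    align = (1 - sim_a).mean() + (1 - sim_b).mean()
    # Consistency: both branches should regress the same normalized
    # (center, width) temporal coordinates for each sentence.
    consistency = F.smooth_l1_loss(coords_a, coords_b)
    return align + consistency

# Toy usage: 8 sentences, 256-d features, (center, width) per branch.
f_a, f_b, t = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
c_a = torch.rand(8, 2, requires_grad=True)
c_b = torch.rand(8, 2)
loss = siamese_grounding_loss(f_a, f_b, c_a, c_b, t)
loss.backward()
```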

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding

no code implementations CVPR 2023 Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static stream performs cross-modal understanding in a single frame and learns to attend to the target object spatially according to intra-frame visual cues like object appearances.

Tasks: Object, Spatio-Temporal Video Grounding, +1
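The static stream described above attends to the target object within a single frame conditioned on the query text. A minimal sketch of text-conditioned spatial attention in PyTorch; the class name and dimensions are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class StaticStreamAttention(nn.Module):
    """Attend over the spatial grid of one frame, conditioned on query text."""
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, frame_feat, text_feat):
        # frame_feat: (B, H*W, dim) flattened spatial grid of a single frame
        # text_feat:  (B, L, dim)   token features of the referring expression
        attended, weights = self.attn(query=frame_feat, key=text_feat, value=text_feat)
        return attended, weights  # weights hint where the target object is

# Toy usage: 2 frames with a 14x14 grid, 12-token query.
m = StaticStreamAttention()
out, w = m(torch.randn(2, 196, 256), torch.randn(2, 12, 256))
print(out.shape, w.shape)  # torch.Size([2, 196, 256]) torch.Size([2, 196, 12])
```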

Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding

no code implementations CVPR 2023 Chaolei Tan, Zihang Lin, Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai

Specifically, we develop a hierarchical encoder that encodes the multi-modal inputs into semantics-aligned representations at different levels.

Tasks: Sentence, Video Grounding
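The hierarchical encoder described above produces semantics-aligned representations at multiple levels. A minimal sketch of a multi-scale video encoder whose levels could be aligned with word/sentence/paragraph granularities; the layer choices and pooling scheme are assumptions:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Encode video features at several temporal scales, one output per
    level, so each level can be aligned with a matching text granularity."""
    def __init__(self, dim=256, levels=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(levels)
        )
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

    def forward(self, video_feat):
        # video_feat: (B, T, dim); each level halves the temporal resolution.
        outs, x = [], video_feat
        for block in self.blocks:
            x = block(x)
            outs.append(x)
            x = self.pool(x.transpose(1, 2)).transpose(1, 2)  # downsample time
        return outs

enc = HierarchicalEncoder()
levels = enc(torch.randn(2, 64, 256))
print([o.shape[1] for o in levels])  # [64, 32, 16]
```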

STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding

no code implementations 6 Jul 2022 Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static branch performs cross-modal understanding in a single frame and learns to localize the target object spatially according to intra-frame visual cues like object appearances.

Tasks: Spatio-Temporal Video Grounding, Video Grounding

Action-guided 3D Human Motion Prediction

no code implementations NeurIPS 2021 Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng

The ability to forecast future human motion is important for human-machine interaction systems to understand human behaviors and interact accordingly.

Tasks: Human Motion Prediction, Motion Prediction

Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding

no code implementations 20 Jun 2021 Chaolei Tan, Zihang Lin, Jian-Fang Hu, Xiang Li, Wei-Shi Zheng

We propose an effective two-stage approach to tackle the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.

Tasks: Spatio-Temporal Video Grounding, Video Grounding
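A two-stage "when, then where" decomposition for HC-STVG might look like the skeleton below: first localize the temporal segment for the sentence (the title suggests a 2D-TAN-style scorer), then localize the person spatially within that segment. Both models here are placeholders, not the paper's components:

```python
def two_stage_grounding(video, sentence, temporal_model, spatial_model):
    """Sketch of a generic two-stage spatio-temporal grounding pipeline."""
    # Stage 1: temporal grounding, e.g. a 2D-TAN-style proposal scorer.
    t_start, t_end = temporal_model(video, sentence)
    # Stage 2: per-frame spatial grounding restricted to the found segment.
    boxes = [spatial_model(frame, sentence) for frame in video[t_start:t_end]]
    return (t_start, t_end), boxes
```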

Predictive Feature Learning for Future Segmentation Prediction

no code implementations ICCV 2021 Zihang Lin, Jiangxin Sun, Jian-Fang Hu, QiZhi Yu, Jian-Huang Lai, Wei-Shi Zheng

In the latent feature learned by the autoencoder, global structures are enhanced and local details are suppressed so that it is more predictive.

Tasks: Segmentation
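The abstract describes an autoencoder whose latent keeps global structure, drops local detail, and is therefore easier to forecast. A minimal sketch of that pattern, with a bottleneck autoencoder plus a latent-space predictor; all sizes and module choices are assumptions:

```python
import torch
import torch.nn as nn

class PredictiveAutoencoder(nn.Module):
    """Autoencode segmentation features through a narrow bottleneck so the
    latent emphasizes global structure, then forecast in that latent space."""
    def __init__(self, dim=512, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, dim)
        self.predictor = nn.GRU(latent, latent, batch_first=True)

    def forward(self, past_feats):
        # past_feats: (B, T, dim) features of observed frames
        z = self.encoder(past_feats)       # (B, T, latent), detail-suppressed
        recon = self.decoder(z)            # reconstruction target for training
        _, h = self.predictor(z)           # forecast the next latent state
        future_feat = self.decoder(h[-1])  # decode predicted future features
        return recon, future_feat

m = PredictiveAutoencoder()
recon, fut = m(torch.randn(2, 4, 512))
print(recon.shape, fut.shape)  # torch.Size([2, 4, 512]) torch.Size([2, 512])
```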

Improving Fast Segmentation with Teacher-Student Learning

no code implementations 19 Oct 2018 Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng

Recently, segmentation neural networks have improved significantly, demonstrating very promising accuracies on public benchmarks.

Tasks: Segmentation
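Teacher-student learning for segmentation, as the title names it, is commonly realized as per-pixel knowledge distillation. A minimal sketch of that standard technique, not necessarily the paper's exact loss; the temperature and shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def pixel_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Match the student's per-pixel class distribution to the teacher's,
    a common way to transfer a large segmentation net into a fast one."""
    # Logits: (B, num_classes, H, W). Soften and compare per pixel.
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy usage: 19-class logits on a 64x64 map from both networks.
loss = pixel_distillation_loss(torch.randn(2, 19, 64, 64, requires_grad=True),
                               torch.randn(2, 19, 64, 64))
loss.backward()
```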

Early Action Prediction by Soft Regression

no code implementations IEEE Transactions on Pattern Analysis and Machine Intelligence 2018 Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jian-Guo Zhang

Our formulation of a soft regression framework 1) overcomes the usual assumption in existing early action prediction systems that the progress level of the ongoing sequence is given at the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at early performing stages.

Tasks: Early Action Prediction, Regression, +1
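One way to make the "soft" supervision of ambiguous early subsequences concrete is to blend the full video's label with a uniform distribution according to observation progress. This is an illustrative soft-label scheme, not the paper's actual soft regression formulation; all names and the blending rule are assumptions:

```python
import torch
import torch.nn.functional as F

def soft_regression_loss(scores, full_video_label, progress, num_classes=10):
    """Supervise partially observed subsequences with soft targets: the
    earlier the subsequence, the less confidently it is pushed toward the
    full video's action label, reflecting its inherent ambiguity."""
    # scores: (B, num_classes) predictions for subsequences observed up to
    # `progress` in (0, 1]; progress acts as the soft confidence weight.
    hard = F.one_hot(full_video_label, num_classes).float()
    uniform = torch.full_like(hard, 1.0 / num_classes)
    soft = progress.unsqueeze(1) * hard + (1 - progress).unsqueeze(1) * uniform
    return F.cross_entropy(scores, soft)  # probability targets (PyTorch >= 1.10)

# Toy usage: 4 subsequences at different progress levels.
scores = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
progress = torch.tensor([0.2, 0.5, 0.8, 1.0])
loss = soft_regression_loss(scores, labels, progress)
loss.backward()
```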

Latent Embeddings for Collective Activity Recognition

no code implementations 20 Sep 2017 Yongyi Tang, Peizhen Zhang, Jian-Fang Hu, Wei-Shi Zheng

Rather than simply recognizing the action of each person individually, collective activity recognition aims to determine what a group of people is doing in a collective scene.

Tasks: Activity Recognition

Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

no code implementations IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 39, Issue 11, Nov. 2017) Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai, Jian-Guo Zhang

The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets.

Tasks: Activity Recognition, Benchmarking, +3
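The abstract describes projecting heterogeneous feature channels (e.g. RGB, depth, skeleton) into shared subspaces while separating shared and feature-specific components. A minimal sketch of that shared/specific decomposition with an agreement penalty on the shared parts; the class, dimensions, and loss are illustrative assumptions, and the paper's i-transform machinery is not modeled:

```python
import torch
import torch.nn as nn

class SharedSpecificFusion(nn.Module):
    """Project each feature channel into a common-dimensional subspace and
    split it into a shared part, encouraged to agree across channels, and a
    channel-specific part."""
    def __init__(self, in_dims, subspace=128):
        super().__init__()
        self.shared = nn.ModuleList(nn.Linear(d, subspace) for d in in_dims)
        self.specific = nn.ModuleList(nn.Linear(d, subspace) for d in in_dims)

    def forward(self, feats):
        shared = [p(f) for p, f in zip(self.shared, feats)]
        specific = [p(f) for p, f in zip(self.specific, feats)]
        # Agreement loss: shared components should coincide across channels.
        mean = torch.stack(shared).mean(dim=0)
        agree = sum(((s - mean) ** 2).mean() for s in shared)
        fused = torch.cat(shared + specific, dim=-1)
        return fused, agree

# Toy usage: three channels with heterogeneous dimensionalities.
m = SharedSpecificFusion(in_dims=[2048, 1024, 300])
fused, agree = m([torch.randn(4, 2048), torch.randn(4, 1024), torch.randn(4, 300)])
print(fused.shape)  # torch.Size([4, 768]) -> six 128-d components
```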
