Search Results for author: Jian-Fang Hu

Found 14 papers, 1 paper with code

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

no code implementations 21 Mar 2024 Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question.

Tasks: Multi-Label Classification, Question Answering, +1
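The snippet above casts open-ended VideoQA as answer selection over a large vocabulary, with a ranking signal distilled from a teacher. Below is a minimal PyTorch sketch of that general idea, not the paper's actual method; the function name, temperature, and all shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def ranking_distillation_loss(student_logits, teacher_logits, labels,
                              temperature=2.0, alpha=0.5):
    """Blend hard-label supervision with a soft ranking signal distilled
    from a teacher's scores over the full answer vocabulary (a sketch)."""
    # Hard supervision on the ground-truth answer indices.
    hard = F.cross_entropy(student_logits, labels)
    # Soft supervision: match the teacher's ranking distribution over answers.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

# Toy usage: batch of 4 questions, answer vocabulary of 4000 candidates.
student = torch.randn(4, 4000, requires_grad=True)
teacher = torch.randn(4, 4000)
labels = torch.randint(0, 4000, (4,))
loss = ranking_distillation_loss(student, teacher, labels)
loss.backward()
```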

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

no code implementations 18 Mar 2024 Chaolei Tan, Jian-Huang Lai, Wei-Shi Zheng, Jian-Fang Hu

Unlike previous weakly-supervised grounding frameworks that rely on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel Siamese learning framework that jointly learns cross-modal feature alignment and temporal coordinate regression without timestamp labels, achieving concise one-stage localization for WSVPG.

Tasks: Multiple Instance Learning
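The abstract describes coupling cross-modal alignment with timestamp-free coordinate regression across two Siamese branches. A minimal sketch of one plausible such objective, assuming two augmented views and (center, width) predictions; the paper's actual losses may differ:

```python
import torch
import torch.nn.functional as F

def siamese_grounding_loss(feat_a, feat_b, coords_a, coords_b, text_feat):
    """Jointly supervise cross-modal alignment and coordinate regression
    without timestamp labels, via two augmented views (a sketch)."""
    # Alignment: video-level features from each branch should match the paragraph.
    sim_a = F.cosine_similarity(feat_a, text_feat, dim=-1)
    sim_b = F.cosine_similarity(feat_b, text_feat, dim=-1)
    align = (1 - sim_a).mean() + (1 - sim_b).mean()
    # Consistency: both branches should regress the same normalized
    # (center, width) temporal coordinates for each sentence.
    consistency = F.smooth_l1_loss(coords_a, coords_b)
    return align + consistency

# Toy usage: 8 sentences, 256-d features, (center, width) per branch.
f_a, f_b, t = torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256)
c_a = torch.rand(8, 2, requires_grad=True)
c_b = torch.rand(8, 2)
loss = siamese_grounding_loss(f_a, f_b, c_a, c_b, t)
loss.backward()
```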

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding

no code implementations CVPR 2023 Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static stream performs cross-modal understanding in a single frame and learns to attend to the target object spatially according to intra-frame visual cues like object appearances.

Tasks: Object, Spatio-Temporal Video Grounding, +1
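The static stream described above attends to the target object within a single frame conditioned on the query text. A minimal sketch of text-conditioned spatial attention in PyTorch; the class name and dimensions are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class StaticStreamAttention(nn.Module):
    """Attend over the spatial grid of one frame, conditioned on query text."""
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, frame_feat, text_feat):
        # frame_feat: (B, H*W, dim) flattened spatial grid of a single frame
        # text_feat:  (B, L, dim)   token features of the referring expression
        attended, weights = self.attn(query=frame_feat, key=text_feat, value=text_feat)
        return attended, weights  # weights hint where the target object is

# Toy usage: 2 frames with a 14x14 grid, 12-token query.
m = StaticStreamAttention()
out, w = m(torch.randn(2, 196, 256), torch.randn(2, 12, 256))
print(out.shape, w.shape)  # torch.Size([2, 196, 256]) torch.Size([2, 196, 12])
```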

Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding

no code implementations CVPR 2023 Chaolei Tan, Zihang Lin, Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai

Specifically, we develop a hierarchical encoder that encodes the multi-modal inputs into semantics-aligned representations at different levels.

Tasks: Sentence, Video Grounding
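The hierarchical encoder described above produces semantics-aligned representations at multiple levels. A minimal sketch of a multi-scale video encoder whose levels could be aligned with word/sentence/paragraph granularities; the layer choices and pooling scheme are assumptions:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Encode video features at several temporal scales, one output per
    level, so each level can be aligned with a matching text granularity."""
    def __init__(self, dim=256, levels=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(levels)
        )
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)

    def forward(self, video_feat):
        # video_feat: (B, T, dim); each level halves the temporal resolution.
        outs, x = [], video_feat
        for block in self.blocks:
            x = block(x)
            outs.append(x)
            x = self.pool(x.transpose(1, 2)).transpose(1, 2)  # downsample time
        return outs

enc = HierarchicalEncoder()
levels = enc(torch.randn(2, 64, 256))
print([o.shape[1] for o in levels])  # [64, 32, 16]
```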

STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding

no code implementations 6 Jul 2022 Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static branch performs cross-modal understanding in a single frame and learns to localize the target object spatially according to intra-frame visual cues like object appearances.

Tasks: Spatio-Temporal Video Grounding, Video Grounding

Action-guided 3D Human Motion Prediction

no code implementations NeurIPS 2021 Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng

The ability to forecast future human motion is important for human-machine interaction systems to understand human behaviors and interact accordingly.

Tasks: Human Motion Prediction, Motion Prediction

Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding

no code implementations 20 Jun 2021 Chaolei Tan, Zihang Lin, Jian-Fang Hu, Xiang Li, Wei-Shi Zheng

We propose an effective two-stage approach to tackle the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.

Tasks: Spatio-Temporal Video Grounding, Video Grounding
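A two-stage "when, then where" decomposition for HC-STVG might look like the skeleton below: first localize the temporal segment for the sentence (the title suggests a 2D-TAN-style scorer), then localize the person spatially within that segment. Both models here are placeholders, not the paper's components:

```python
def two_stage_grounding(video, sentence, temporal_model, spatial_model):
    """Sketch of a generic two-stage spatio-temporal grounding pipeline."""
    # Stage 1: temporal grounding, e.g. a 2D-TAN-style proposal scorer.
    t_start, t_end = temporal_model(video, sentence)
    # Stage 2: per-frame spatial grounding restricted to the found segment.
    boxes = [spatial_model(frame, sentence) for frame in video[t_start:t_end]]
    return (t_start, t_end), boxes
```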

Predictive Feature Learning for Future Segmentation Prediction

no code implementations ICCV 2021 Zihang Lin, Jiangxin Sun, Jian-Fang Hu, QiZhi Yu, Jian-Huang Lai, Wei-Shi Zheng

In the latent feature learned by the autoencoder, global structures are enhanced and local details are suppressed so that it is more predictive.

Tasks: Segmentation
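The abstract describes an autoencoder whose latent keeps global structure, drops local detail, and is therefore easier to forecast. A minimal sketch of that pattern, with a bottleneck autoencoder plus a latent-space predictor; all sizes and module choices are assumptions:

```python
import torch
import torch.nn as nn

class PredictiveAutoencoder(nn.Module):
    """Autoencode segmentation features through a narrow bottleneck so the
    latent emphasizes global structure, then forecast in that latent space."""
    def __init__(self, dim=512, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, dim)
        self.predictor = nn.GRU(latent, latent, batch_first=True)

    def forward(self, past_feats):
        # past_feats: (B, T, dim) features of observed frames
        z = self.encoder(past_feats)       # (B, T, latent), detail-suppressed
        recon = self.decoder(z)            # reconstruction target for training
        _, h = self.predictor(z)           # forecast the next latent state
        future_feat = self.decoder(h[-1])  # decode predicted future features
        return recon, future_feat

m = PredictiveAutoencoder()
recon, fut = m(torch.randn(2, 4, 512))
print(recon.shape, fut.shape)  # torch.Size([2, 4, 512]) torch.Size([2, 512])
```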

Improving Fast Segmentation with Teacher-Student Learning

no code implementations 19 Oct 2018 Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng

Recently, segmentation neural networks have improved significantly, demonstrating very promising accuracies on public benchmarks.

Tasks: Segmentation
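Teacher-student learning for segmentation, as the title names it, is commonly realized as per-pixel knowledge distillation. A minimal sketch of that standard technique, not necessarily the paper's exact loss; the temperature and shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def pixel_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Match the student's per-pixel class distribution to the teacher's,
    a common way to transfer a large segmentation net into a fast one."""
    # Logits: (B, num_classes, H, W). Soften and compare per pixel.
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy usage: 19-class logits on a 64x64 map from both networks.
loss = pixel_distillation_loss(torch.randn(2, 19, 64, 64, requires_grad=True),
                               torch.randn(2, 19, 64, 64))
loss.backward()
```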

Early Action Prediction by Soft Regression

no code implementations IEEE Transactions on Pattern Analysis and Machine Intelligence 2018 Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jian-Guo Zhang

Our formulation of a soft regression framework 1) overcomes the usual assumption in existing early action prediction systems that the progress level of the ongoing sequence is given at the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at early performing stages.

Tasks: Early Action Prediction, Regression, +1
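One way to make the "soft" supervision of ambiguous early subsequences concrete is to blend the full video's label with a uniform distribution according to observation progress. This is an illustrative soft-label scheme, not the paper's actual soft regression formulation; all names and the blending rule are assumptions:

```python
import torch
import torch.nn.functional as F

def soft_regression_loss(scores, full_video_label, progress, num_classes=10):
    """Supervise partially observed subsequences with soft targets: the
    earlier the subsequence, the less confidently it is pushed toward the
    full video's action label, reflecting its inherent ambiguity."""
    # scores: (B, num_classes) predictions for subsequences observed up to
    # `progress` in (0, 1]; progress acts as the soft confidence weight.
    hard = F.one_hot(full_video_label, num_classes).float()
    uniform = torch.full_like(hard, 1.0 / num_classes)
    soft = progress.unsqueeze(1) * hard + (1 - progress).unsqueeze(1) * uniform
    return F.cross_entropy(scores, soft)  # probability targets (PyTorch >= 1.10)

# Toy usage: 4 subsequences at different progress levels.
scores = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
progress = torch.tensor([0.2, 0.5, 0.8, 1.0])
loss = soft_regression_loss(scores, labels, progress)
loss.backward()
```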

Latent Embeddings for Collective Activity Recognition

no code implementations 20 Sep 2017 Yongyi Tang, Peizhen Zhang, Jian-Fang Hu, Wei-Shi Zheng

Rather than simply recognizing the action of each person individually, collective activity recognition aims to determine what a group of people is doing in a collective scene.

Tasks: Activity Recognition

Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

no code implementations IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 39, Issue 11, Nov. 2017) Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai, Jian-Guo Zhang

The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets.

Tasks: Activity Recognition, Benchmarking, +3
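The abstract describes projecting heterogeneous feature channels (e.g. RGB, depth, skeleton) into shared subspaces while separating shared and feature-specific components. A minimal sketch of that shared/specific decomposition with an agreement penalty on the shared parts; the class, dimensions, and loss are illustrative assumptions, and the paper's i-transform machinery is not modeled:

```python
import torch
import torch.nn as nn

class SharedSpecificFusion(nn.Module):
    """Project each feature channel into a common-dimensional subspace and
    split it into a shared part, encouraged to agree across channels, and a
    channel-specific part."""
    def __init__(self, in_dims, subspace=128):
        super().__init__()
        self.shared = nn.ModuleList(nn.Linear(d, subspace) for d in in_dims)
        self.specific = nn.ModuleList(nn.Linear(d, subspace) for d in in_dims)

    def forward(self, feats):
        shared = [p(f) for p, f in zip(self.shared, feats)]
        specific = [p(f) for p, f in zip(self.specific, feats)]
        # Agreement loss: shared components should coincide across channels.
        mean = torch.stack(shared).mean(dim=0)
        agree = sum(((s - mean) ** 2).mean() for s in shared)
        fused = torch.cat(shared + specific, dim=-1)
        return fused, agree

# Toy usage: three channels with heterogeneous dimensionalities.
m = SharedSpecificFusion(in_dims=[2048, 1024, 300])
fused, agree = m([torch.randn(4, 2048), torch.randn(4, 1024), torch.randn(4, 300)])
print(fused.shape)  # torch.Size([4, 768]) -> six 128-d components
```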
