1 code implementation • 17 Dec 2024 • Xing Liufu, Chaolei Tan, Xiaotong LIN, Yonggang Qi, Jinxuan Li, Jian-Fang Hu
Edge labels are typically at various granularity levels owing to the varying preferences of annotators; handling the subjectivity of per-pixel labels has therefore been a focal point for edge detection.
no code implementations • 26 Nov 2024 • Yuan-Ming Li, An-Lan Wang, Kun-Yu Lin, Yu-Ming Tang, Ling-An Zeng, Jian-Fang Hu, Wei-Shi Zheng
To bridge this gap, we investigate a new task termed Descriptive Action Coaching (DAC) which requires a model to provide detailed commentary on what is done well and what can be improved beyond a quality score from an action execution.
no code implementations • 3 Aug 2024 • Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu
Based on the dataset, we further introduce a more complex setting of video grounding dubbed Multi-Paragraph Video Grounding (MPVG), which takes as input multiple paragraphs and a long video for grounding each paragraph query to its temporal interval.
1 code implementation • 16 Jul 2024 • Xiaotong LIN, Tianming Liang, JianHuang Lai, Jian-Fang Hu
In the final stage, the model aims to address the entire future trajectory task by taking full advantage of the knowledge from previous stages.
Ranked #3 on Trajectory Prediction on Stanford Drone
no code implementations • CVPR 2024 • Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu
This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question.
no code implementations • CVPR 2024 • Chaolei Tan, JianHuang Lai, Wei-Shi Zheng, Jian-Fang Hu
Different from previous weakly-supervised grounding frameworks based on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel siamese learning framework that jointly learns the cross-modal feature alignment and temporal coordinate regression without timestamp labels to achieve concise one-stage localization for WSVPG.
1 code implementation • CVPR 2024 • Dian Zheng, Xiao-Ming Wu, Shuzhou Yang, Jian Zhang, Jian-Fang Hu, Wei-Shi Zheng
Universal image restoration is a practical computer vision task with strong potential for real-world applications.
no code implementations • CVPR 2023 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Wei-Shi Zheng, JianHuang Lai
Specifically, we develop a hierarchical encoder that encodes the multi-modal inputs into semantics-aligned representations at different levels.
no code implementations • CVPR 2023 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng
The static stream performs cross-modal understanding in a single frame and learns to attend to the target object spatially according to intra-frame visual cues like object appearances.
no code implementations • 6 Jul 2022 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng
The static branch performs cross-modal understanding in a single frame and learns to localize the target object spatially according to intra-frame visual cues like object appearances.
Ranked #2 on Spatio-Temporal Video Grounding on HC-STVG2
no code implementations • NeurIPS 2021 • Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng
The ability to forecast future human motion is important for human-machine interaction systems to understand human behaviors and interact accordingly.
no code implementations • 20 Jun 2021 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Xiang Li, Wei-Shi Zheng
We propose an effective two-stage approach to tackle the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.
no code implementations • ICCV 2021 • Zihang Lin, Jiangxin Sun, Jian-Fang Hu, QiZhi Yu, Jian-Huang Lai, Wei-Shi Zheng
In the latent feature learned by the autoencoder, global structures are enhanced and local details are suppressed so that it is more predictive.
no code implementations • 19 Oct 2018 • Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng
Recently, segmentation neural networks have improved significantly, demonstrating very promising accuracies on public benchmarks.
no code implementations • ECCV 2018 • Jian-Fang Hu, Wei-Shi Zheng, Jia-Hui Pan, Jian-Huang Lai, Jian-Guo Zhang
In this paper, we focus on exploring modality-temporal mutual information for RGB-D action recognition.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2018 • Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jian-Guo Zhang
Our formulation of the soft regression framework 1) overcomes a usual assumption in existing early action prediction systems that the progress level of the ongoing sequence is given in the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at the early performing stage.
Ranked #80 on Skeleton Based Action Recognition on NTU RGB+D 120
no code implementations • 20 Sep 2017 • Yongyi Tang, Peizhen Zhang, Jian-Fang Hu, Wei-Shi Zheng
Rather than simply recognizing the action of a person individually, collective activity recognition aims to find out what a group of people is doing in a collective scene.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 39, Issue: 11, Nov. 1 2017) 2016 • Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai, Jian-Guo Zhang
The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets.
Ranked #8 on Skeleton Based Action Recognition on SYSU 3D