Search Results for author: Jingjia Huang

Found 12 papers, 8 papers with code

Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding

1 code implementation 25 Nov 2023 Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li

Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge through web-scale image-text data.

Video Understanding

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

1 code implementation CVPR 2023 Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.

Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning, Retrieval, +3

Temporal Perceiving Video-Language Pre-training

no code implementations 18 Jan 2023 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization, which matches the subset of texts with the video features.

Contrastive Learning, Moment Retrieval, +7

Class Prototype-based Cleaner for Label Noise Learning

1 code implementation 21 Dec 2022 Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu

Semi-supervised learning based methods are the current SOTA solutions to the noisy-label learning problem; they rely on first learning an unsupervised label cleaner to divide the training samples into a labeled set of clean data and an unlabeled set of noisy data.

Image Classification

Clover: Towards A Unified Video-Language Alignment and Fusion Model

1 code implementation CVPR 2023 Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji

We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise.

Language Modelling, Question Answering, +10

AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism

1 code implementation ICCV 2019 Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li

Graph convolutional networks (GCNs) are potentially short of the ability to learn hierarchical representation for graph embedding, which holds them back in the graph classification task.

General Classification, Graph Classification, +1

ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network

1 code implementation 28 Jun 2019 Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li

In recent years, memory-augmented neural networks (MANNs) have shown promising power to enhance the memory ability of neural networks for sequential processing tasks.

SEQUENCE MODELLING WITH AUTO-ADDRESSING AND RECURRENT MEMORY INTEGRATING NETWORKS

no code implementations 27 Sep 2018 Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li

Processing sequential data with long-term dependencies and learning complex transitions are two major challenges in many deep learning applications.

A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning

1 code implementation 22 Jun 2017 Jingjia Huang, Nannan Li, Tao Zhang, Ge Li

Existing action detection algorithms usually generate action proposals through an extensive search over the video at multiple temporal scales, which brings about huge computational overhead and deviates from the human perception procedure.

Action Detection, Position, +2
