1 code implementation • 25 Nov 2023 • Ruyang Liu, Jingjia Huang, Wei Gao, Thomas H. Li, Ge Li
Large-scale image-language pretrained models, e.g., CLIP, have demonstrated remarkable proficiency in acquiring general multi-modal knowledge through web-scale image-text data.
no code implementations • 3 Apr 2023 • Yabo Zhang, ZiHao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi Feng, WangMeng Zuo
In this work, we investigate performing semantic segmentation solely through the training on image-sentence pairs.
1 code implementation • ICLR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Thomas H. Li
Visual attention does not always capture the essential object representation desired for robust predictions.
Ranked #1 on Multi-Label Image Classification on MSCOCO
1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li
In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transfer, which is the key to extending image-text pretrained models to the video domain.
Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts the start and end boundaries in a video given a text description, and text localization, which matches a subset of the texts with the video features.
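The moment-retrieval half of the description above can be sketched as a similarity search between a text embedding and per-frame video embeddings. This is a minimal illustrative sketch, not the paper's actual method: the `moment_retrieval` function, the threshold value, and the toy 2-d embeddings are all hypothetical.

```python
import numpy as np

def moment_retrieval(text_emb, frame_embs, threshold=0.5):
    """Hypothetical sketch: return (start, end) frame indices whose
    cosine similarity to the text query exceeds a threshold."""
    t = text_emb / np.linalg.norm(text_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    scores = f @ t                      # cosine similarity per frame
    hits = np.where(scores > threshold)[0]
    if hits.size == 0:
        return None                     # no frame matches the query
    return int(hits[0]), int(hits[-1])  # boundaries of the moment

# Toy example: frames 2-4 point in the query's direction.
frames = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 1.0],
    [0.0, 0.9],
    [1.0, 0.0],
])
query = np.array([0.0, 1.0])
span = moment_retrieval(query, frames)  # → (2, 4)
```

A real system would of course use learned video/text encoders and a trained boundary predictor rather than a fixed threshold.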
1 code implementation • 21 Dec 2022 • Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu
Semi-supervised-learning-based methods are the current SOTA solutions to the noisy-label learning problem; they first learn an unsupervised label cleaner to divide the training samples into a labeled set of clean data and an unlabeled set of noisy data.
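One common way such a "label cleaner" is instantiated is the small-loss criterion: samples the model fits with low loss are treated as clean, high-loss samples as noisy. The sketch below is a hypothetical, simplified version using a two-means split over per-sample losses (methods in the literature often fit a Gaussian mixture instead); the function name and toy losses are assumptions, not the paper's implementation.

```python
import numpy as np

def split_clean_noisy(losses, n_iter=20):
    """Hypothetical label cleaner: two-means clustering over
    per-sample losses; small-loss cluster -> clean (labeled),
    large-loss cluster -> noisy (unlabeled)."""
    losses = np.asarray(losses, dtype=float)
    lo, hi = losses.min(), losses.max()   # initial centroids
    for _ in range(n_iter):
        assign = np.abs(losses - lo) <= np.abs(losses - hi)
        lo = losses[assign].mean()
        if (~assign).any():
            hi = losses[~assign].mean()
    clean = np.where(assign)[0]
    noisy = np.where(~assign)[0]
    return clean, noisy

# Toy per-sample losses: the first four samples fit well.
losses = [0.1, 0.2, 0.15, 0.05, 2.0, 1.8]
clean_idx, noisy_idx = split_clean_noisy(losses)
# clean_idx → [0, 1, 2, 3], noisy_idx → [4, 5]
```

The clean set then keeps its labels for supervised training, while the noisy set is treated as unlabeled data for the semi-supervised learner.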
Ranked #3 on Image Classification on Clothing1M
no code implementations • 16 Jul 2022 • Jingjia Huang, Baixiang Yang
Attention-based relation parsing is a popular and effective strategy in human-object interaction (HOI) detection.
1 code implementation • CVPR 2023 • Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji
We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal video-language model that solves multiple video understanding tasks without compromising either performance or efficiency.
Ranked #1 on Video Question Answering on LSMDC-FiB
1 code implementation • ICCV 2019 • Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li
Graph convolutional networks (GCNs) potentially lack the ability to learn hierarchical representations for graph embedding, which holds them back in the graph classification task.
1 code implementation • 28 Jun 2019 • Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li
In recent years, memory-augmented neural networks (MANNs) have shown promise in enhancing the memory ability of neural networks for sequential processing tasks.
no code implementations • 27 Sep 2018 • Zhangheng Li, Jia-Xing Zhong, Jingjia Huang, Tao Zhang, Thomas Li, Ge Li
Processing sequential data with long-term dependencies and learning complex transitions are two major challenges in many deep learning applications.
1 code implementation • 22 Jun 2017 • Jingjia Huang, Nannan Li, Tao Zhang, Ge Li
Existing action detection algorithms usually generate action proposals through an exhaustive search over the video at multiple temporal scales, which incurs huge computational overhead and deviates from the human perception process.