Search Results for author: Weijie Kong

Found 9 papers, 6 papers with code

Global and Local Semantic Completion Learning for Vision-Language Pre-training

1 code implementation • 12 Jun 2023 • Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

MGSC promotes learning more representative global features, which have a great impact on the performance of downstream tasks, while MLTC reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.

Language Modelling, Masked Language Modeling, +5

Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning

1 code implementation • CVPR 2023 • Yatai Ji, RongCheng Tu, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities.

Ranked #8 on Zero-Shot Video Retrieval on LSMDC

Language Modelling, Masked Language Modeling, +6

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, RongCheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge.

Language Modelling, Multi-Instance Retrieval, +1

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modelling, Object State Change Classification

Egocentric Video-Language Pretraining

2 code implementations • 3 Jun 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Ranked #2 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition, Contrastive Learning, +11

Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations

no code implementations • 7 Apr 2022 • Jie Jiang, Shaobo Min, Weijie Kong, Dihong Gong, Hongfa Wang, Zhifeng Li, Wei Liu

With multi-level representations for video and text, hierarchical contrastive learning is designed to explore fine-grained cross-modal relationships, i.e., frame-word, clip-phrase, and video-sentence, which enables HCMI to achieve a comprehensive semantic comparison between video and text modalities.

Ranked #1 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Contrastive Learning, Denoising, +4

Graph Convolutional Label Noise Cleaner: Train a Plug-and-play Action Classifier for Anomaly Detection

1 code implementation • CVPR 2019 • Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, Ge Li

Remarkably, we obtain a frame-level AUC score of 82.12% on UCF-Crime.

Ranked #6 on Anomaly Detection In Surveillance Videos on UCSD Peds2

Anomaly Detection In Surveillance Videos, Multiple Instance Learning, +3

BLP -- Boundary Likelihood Pinpointing Networks for Accurate Temporal Action Localization

no code implementations • 6 Nov 2018 • Weijie Kong, Nannan Li, Shan Liu, Thomas Li, Ge Li

Despite tremendous progress in temporal action detection, state-of-the-art methods still suffer from sharp performance deterioration when localizing the starting and ending boundaries of temporal actions.

Action Detection, Regression, +1

Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector

no code implementations • 9 Jul 2018 • Jia-Xing Zhong, Nannan Li, Weijie Kong, Tao Zhang, Thomas H. Li, Ge Li

Weakly supervised temporal action detection is a Herculean task in understanding untrimmed videos, since no supervisory signal other than the video-level category label is available in the training data.

Action Detection, Temporal Localization
