Search Results for author: Fangxun Shu

Found 4 papers, 2 papers with code

Compress & Align: Curating Image-Text Data with Human Knowledge

no code implementations • 11 Dec 2023 • Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang, Cihang Xie

The massive growth of image-text data through web crawling inherently presents the challenge of variability in data quality.

Image Captioning, Text Retrieval

Audio-Visual LLM for Video Understanding

no code implementations • 11 Dec 2023 • Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie

This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.

AudioCaps, Language Modelling, +2

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

1 code implementation • 2 Dec 2022 • Fangxun Shu, Biaolong Chen, Yue Liao, Shuwen Xiao, Wenyu Sun, Xiaobo Li, Yousong Zhu, Jinqiao Wang, Si Liu

Our MAC aims to reduce video representation's spatial and temporal redundancy in the VidLP model by a mask sampling mechanism to improve pre-training efficiency.

Ranked #37 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval, Text Retrieval, +1
