1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.
Ranked #76 on Visual Question Answering on MM-Vet
no code implementations • 11 Dec 2023 • Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang, Cihang Xie
The massive growth of web-crawled image-text data inherently brings wide variability in data quality.
no code implementations • 11 Dec 2023 • Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.
1 code implementation • 2 Dec 2022 • Fangxun Shu, Biaolong Chen, Yue Liao, Shuwen Xiao, Wenyu Sun, Xiaobo Li, Yousong Zhu, Jinqiao Wang, Si Liu
Our MAC reduces the spatial and temporal redundancy of video representations in the VidLP model through a mask sampling mechanism, improving pre-training efficiency.
Ranked #37 on Video Retrieval on MSR-VTT-1kA (using extra training data)
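The mask sampling idea above can be illustrated with a minimal sketch: randomly discard a large fraction of spatio-temporal patch tokens so the encoder only processes the visible subset. This is an assumption-laden illustration of the general technique, not the paper's actual implementation; the function name, patch counts, and mask ratio below are all hypothetical.

```python
import random

def sample_visible_patches(num_frames, patches_per_frame, mask_ratio, seed=None):
    """Randomly mask a fraction of spatio-temporal patch tokens.

    Returns the indices of the visible (unmasked) patches. An encoder
    that processes only these tokens cuts pre-training compute roughly
    in proportion to mask_ratio. (Illustrative sketch, not the MAC code.)
    """
    rng = random.Random(seed)
    total = num_frames * patches_per_frame
    num_visible = int(total * (1 - mask_ratio))
    indices = list(range(total))
    rng.shuffle(indices)          # uniform random masking over all tokens
    return sorted(indices[:num_visible])

# Example: 8 frames of 14x14 = 196 patches each, masking 75% of tokens
visible = sample_visible_patches(8, 196, 0.75, seed=0)
```

With a 75% mask ratio, only 392 of the 1568 patch tokens reach the encoder, which is where the efficiency gain in this style of masked pre-training comes from.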