Search Results for author: Yunlong Tang

Found 9 papers, 5 papers with code

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

no code implementations • 18 Apr 2024 • Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).

DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

no code implementations • 25 Mar 2024 • Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng

The Style Generation module refreshes all styles at every training epoch, while the Style Removal module eliminates variations in the encoder's output features caused by input styles.

Source-free Domain Generalization

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

no code implementations • 24 Mar 2024 • Yunlong Tang, Daiki Shimada, Jing Bi, Chenliang Xu

In everyday communication, humans frequently use speech and gestures to refer to specific areas or objects, a process known as Referential Dialogue (RD).

Video Understanding

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

no code implementations • 1 Feb 2024 • Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

To address the above problems, we propose the Efficient Monocular Video Style Avatar (Emo-Avatar) through deferred neural rendering that enhances StyleGAN's capacity for producing dynamic, drivable portrait videos.

Contrastive Learning, Neural Rendering

Video Understanding with Large Language Models: A Survey

1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

614

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

1 code implementation • 7 Jul 2023 • Siting Xu, Yunlong Tang, Feng Zheng

To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically.

Language Modelling

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning

1 code implementation • 17 Jun 2023 • Yunlong Tang, Jinrui Zhang, Xiangchen Wang, Teng Wang, Feng Zheng

This paper proposes an effective model LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) We utilize a pretrained LLM for generating human-like captions with high quality.

Boundary Captioning, Language Modelling (+1 more)

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style.

Controllable Image Captioning, Instruction Following

1,598

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

1 code implementation • 25 Sep 2022 • Yunlong Tang, Siting Xu, Teng Wang, Qin Lin, Qinglin Lu, Feng Zheng

The existing method performs well at video segmentation stages but suffers from the problems of dependencies on extra cumbersome models and poor performance at the segment assemblage stage.

Video Editing, Video Segmentation (+1 more)
