Search Results for author: Yunlong Tang

Found 9 papers, 5 papers with code

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

no code implementations • 18 Apr 2024 • Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).

DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

no code implementations • 25 Mar 2024 • Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng

The Style Generation module refreshes all styles at every training epoch, while the Style Removal module eliminates variations in the encoder's output features caused by input styles.

Source-free Domain Generalization

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

no code implementations • 24 Mar 2024 • Yunlong Tang, Daiki Shimada, Jing Bi, Chenliang Xu

In everyday communication, humans frequently use speech and gestures to refer to specific areas or objects, a process known as Referential Dialogue (RD).

Video Understanding

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

no code implementations • 1 Feb 2024 • Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

To address the above problems, we propose the Efficient Monotonic Video Style Avatar (Emo-Avatar) through deferred neural rendering that enhances StyleGAN's capacity for producing dynamic, drivable portrait videos.

Contrastive Learning, Neural Rendering

Video Understanding with Large Language Models: A Survey

1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Video Understanding

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

1 code implementation • 7 Jul 2023 • Siting Xu, Yunlong Tang, Feng Zheng

To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically.

Language Modelling

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning

1 code implementation • 17 Jun 2023 • Yunlong Tang, Jinrui Zhang, Xiangchen Wang, Teng Wang, Feng Zheng

This paper proposes an effective model LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) We utilize a pretrained LLM for generating human-like captions with high quality.

Boundary Captioning, Language Modelling, +1

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style.

Controllable Image Captioning, Instruction Following

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

1 code implementation • 25 Sep 2022 • Yunlong Tang, Siting Xu, Teng Wang, Qin Lin, Qinglin Lu, Feng Zheng

The existing method performs well at video segmentation stages but suffers from the problems of dependencies on extra cumbersome models and poor performance at the segment assemblage stage.

Video Editing, Video Segmentation, +1
