Search Results for author: Yuchong Sun

Found 7 papers, 4 papers with code

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions

no code implementations · 11 Oct 2023 · Yuchong Sun, Che Liu, Jinwen Huang, Ruihua Song, Fuzheng Zhang, Di Zhang, Zhongyuan Wang, Kun Gai

In this paper, we address these challenges by introducing Parrot, a highly scalable solution designed to automatically generate high-quality instruction-tuning data, which are then used to enhance the effectiveness of chat models in multi-turn conversations.

Attribute Instruction Following

ViCo: Engaging Video Comment Generation with Human Preference Rewards

no code implementations · 22 Aug 2023 · Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.

Caption Generation · Comment Generation +1

TeViS: Translating Text Synopses to Video Storyboards

no code implementations · 31 Dec 2022 · Xu Gu, Yuchong Sun, Feiyue Ni, ShiZhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao

In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis.

Language Modelling · Quantization

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation · 12 Oct 2022 · Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Contrastive Learning · Question Answering +3

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation · CVPR 2022 · Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Retrieval · Super-Resolution +4
