no code implementations • 30 Sep 2024 • Kaisi Guan, Qian Cao, Yuchong Sun, Xiting Wang, Ruihua Song
Retrieval Augmented Generation (RAG) system is important in domains such as e-commerce, which has many long-tail entities and frequently updated information.
no code implementations • 11 Oct 2023 • Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai
In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs.
no code implementations • 22 Aug 2023 • Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu
Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.
1 code implementation • 31 Dec 2022 • Xu Gu, Yuchong Sun, Feiyue Ni, ShiZhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao
In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis.
1 code implementation • 12 Oct 2022 • Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu
Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.
Ranked #2 on
Video Retrieval
on QuerYD
(using extra training data)
1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo
and 2) how to mitigate the impact of these factors?
Ranked #2 on
Video Retrieval
on MSR-VTT-1kA
(using extra training data)
1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.
Ranked #17 on
Video Retrieval
on MSR-VTT
2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.
Ranked #1 on
Image Retrieval
on RUC-CAS-WenLan