1 code implementation • 15 Mar 2024 • Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao
Video-text Large Language Models (video-text LLMs) have shown remarkable performance in answering questions and holding conversations on simple videos.
Ranked #3 on Video Question Answering on MVBench
no code implementations • 12 Dec 2023 • Renlong Jie, Xiaojun Meng, Xin Jiang, Qun Liu
Unlike centrality-based ranking methods, our extractive scorer can be trained end-to-end and requires no positional assumptions.
no code implementations • 23 Aug 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks.
no code implementations • 22 May 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
This study proposes a multitask learning architecture for extractive summarization with coherence boosting.
no code implementations • 8 May 2023 • Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu
Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript.
no code implementations • 19 Dec 2022 • Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far.
no code implementations • 26 Nov 2022 • Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu
Then we present a novel lexicon-injected semantic parser, which collects the slot labels of the tree representation as a lexicon and injects lexical features into the span representations of the parser.
1 code implementation • Findings (ACL) 2022 • Fanchao Qi, Chuancheng Lv, Zhiyuan Liu, Xiaojun Meng, Maosong Sun, Hai-Tao Zheng
In this paper, we utilize the multilingual synonyms, multilingual glosses and images in BabelNet for SPBS.
no code implementations • 8 Mar 2022 • Zhengkun Zhang, Wenya Guo, Xiaojun Meng, Yasheng Wang, Yadao Wang, Xin Jiang, Qun Liu, Zhenglu Yang
In this paper, we design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks.
1 code implementation • 14 Feb 2022 • Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu
Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods.
Ranked #6 on Image Retrieval on MUGE Retrieval
no code implementations • 13 Sep 2021 • Zhengkun Zhang, Xiaojun Meng, Yasheng Wang, Xin Jiang, Qun Liu, Zhenglu Yang
Specifically, we adopt knowledge distillation from a vision-language pretrained model to improve image selection, which removes any dependence on the existence or quality of image captions.
no code implementations • 19 Sep 2020 • Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun
Adversarial attacking aims to fool deep neural networks with adversarial examples.