1 code implementation • Findings (ACL) 2022 • Jian Li, Jieming Zhu, Qiwei Bi, Guohao Cai, Lifeng Shang, Zhenhua Dong, Xin Jiang, Qun Liu
Accurately matching users’ interests with candidate news is the key to news recommendation.
no code implementations • Findings (ACL) 2022 • Xianghong Fang, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Dit-yan Yeung
While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability.
1 code implementation • Findings (ACL) 2022 • Qiwei Bi, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Hanfang Yang
With the adoption of large pre-trained models like BERT in news recommendation, the above way to incorporate multi-field information may encounter challenges: the shallow feature encoding to compress the category and entity information is not compatible with the deep BERT encoding.
no code implementations • 23 Aug 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
Recently, large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprisingly strong performance.
1 code implementation • 12 Aug 2023 • Siheng Li, Yichun Yin, Cheng Yang, Wangjie Jiang, Yiwei Li, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
In this paper, we propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news.
no code implementations • 12 Aug 2023 • Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
Information-seeking conversation, which aims to help users gather information through conversation, has made great progress in recent years.
1 code implementation • 24 Jul 2023 • YuFei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu
(2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment.
no code implementations • 22 May 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
This study proposes a multitask learning architecture for extractive summarization with coherence boosting.
no code implementations • 15 Dec 2022 • Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen
Disentangled representation learning remains challenging as ground truth factors of variation do not naturally exist.
1 code implementation • 7 Dec 2022 • Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu
Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora.
1 code implementation • 4 Dec 2022 • Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang
To detect such toxic generations, existing methods rely on templates, real-world data extraction, crowd workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations.
no code implementations • 21 Oct 2022 • Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
Recent large-scale video-language pre-trained models have shown appealing performance on various downstream tasks.
no code implementations • 20 Oct 2022 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu
Further experiments on question-answering datasets show that trying to learn a deterministic relationship with the proposed methods can also help other knowledge-intensive tasks.
1 code implementation • ICLR 2022 • Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu
A tiny version achieves $96.7\%$ of BERT-base performance with $1/48$ of the encoder parameters (i.e., fewer than 2M parameters excluding the embedding layer) and $2.7\times$ faster inference.
no code implementations • Findings (ACL) 2022 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Zhenhua Dong, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu
We examine words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurring.
1 code implementation • 31 Mar 2022 • Fei Mi, Yitong Li, Yulong Zeng, Jingyan Zhou, Yasheng Wang, Chuanfei Xu, Lifeng Shang, Xin Jiang, Shiqi Zhao, Qun Liu
We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety.
no code implementations • ACL 2022 • Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity and the varied distribution of weights.
1 code implementation • ACL 2022 • Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen
To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).
no code implementations • Findings (ACL) 2022 • Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.
no code implementations • Findings (ACL) 2022 • Dan Su, Xiaoguang Li, Jindi Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question.
Ranked #1 on Question Answering on KILT: ELI5
no code implementations • ACL 2022 • Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu
However, large language model pre-training requires intensive computational resources, and most models are trained from scratch without reusing existing pre-trained models, which is wasteful.
no code implementations • 30 Sep 2021 • Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
1 code implementation • EMNLP 2021 • Baojun Wang, Zhao Zhang, Kun Xu, Guang-Yuan Hao, Yuyang Zhang, Lifeng Shang, Linlin Li, Xiao Chen, Xin Jiang, Qun Liu
Incorporating lexical knowledge into deep learning models has proven very effective for sequence labeling tasks.
no code implementations • EMNLP 2021 • Chenyang Lyu, Lifeng Shang, Yvette Graham, Jennifer Foster, Xin Jiang, Qun Liu
Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer.
no code implementations • 7 Sep 2021 • Shaobo Li, Qun Liu, Xin Jiang, Yichun Yin, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Lifeng Shang
Human-designed rules are widely used to build industry applications.
no code implementations • Findings (EMNLP) 2021 • Jianhao Shen, Yichun Yin, Lin Li, Lifeng Shang, Xin Jiang, Ming Zhang, Qun Liu
Math word problem (MWP) is a challenging and critical task in natural language processing.
Ranked #1 on Math Word Problem Solving on Math23K
no code implementations • ACL 2021 • Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.
1 code implementation • ACL 2021 • Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient development way of tiny PLMs for various latency constraints.
1 code implementation • ACL 2021 • Zhihong Shao, Lifeng Shang, Qun Liu, Minlie Huang
This setting gives rise to the spurious solution problem: there may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance (e.g., producing wrong solutions or answers).
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
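For reference, the Wasserstein-1 distance between two distributions P and Q, the standard notion underlying the OOD definition mentioned here, is shown below, where Π(P, Q) denotes the set of couplings of P and Q (this is the textbook form; the paper's exact OOD formulation may differ):

```latex
% Standard Wasserstein-1 distance (textbook form; the paper's exact OOD
% definition may differ). \Pi(P, Q) is the set of couplings of P and Q.
W_1(P, Q) \;=\; \inf_{\gamma \in \Pi(P, Q)} \; \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big]
```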
no code implementations • 24 Apr 2021 • Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu
Task-agnostic knowledge distillation, a teacher-student framework, has proven effective for BERT compression.
no code implementations • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
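A minimal sketch of this reweighting idea, assuming a softmax-over-losses weighting with a `temperature` knob; the paper's exact closed-form solution is not reproduced here:

```python
import torch
import torch.nn.functional as F

def mmel_style_loss(model, augmented_batches, labels, temperature=1.0):
    """Weight per-augmentation losses so that harder (higher-loss) augmented
    samples get more attention. The softmax weighting and `temperature` are
    illustrative assumptions, not the paper's exact closed form."""
    per_aug_losses = []
    for x_aug in augmented_batches:  # one tensor per augmented view of the batch
        logits = model(x_aug)
        # unreduced per-sample cross-entropy so each sample can be reweighted
        per_aug_losses.append(F.cross_entropy(logits, labels, reduction="none"))
    losses = torch.stack(per_aug_losses, dim=0)                # [n_aug, batch]
    weights = F.softmax(losses.detach() / temperature, dim=0)  # larger loss -> larger weight
    return (weights * losses).sum(dim=0).mean()
```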
no code implementations • 11 Mar 2021 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
The multilingual pre-trained language models (e.g., mBERT, XLM, and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.
no code implementations • 5 Mar 2021 • Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, Lifeng Shang
How to leverage various types of information under the BERT framework remains an open question.
no code implementations • ICLR 2021 • Benyou Wang, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Grue Simonsen
Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order.
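As a reference point for what a position embedding is, here is the standard sinusoidal PE from the original Transformer; it is shown only for illustration, since the paper studies and compares PE variants rather than proposing this one:

```python
import numpy as np

def sinusoidal_position_embeddings(max_len: int, d_model: int) -> np.ndarray:
    """Classic sinusoidal position embeddings: even dimensions use sine,
    odd dimensions use cosine, with geometrically spaced frequencies."""
    positions = np.arange(max_len)[:, None]                 # [max_len, 1]
    dims = np.arange(d_model)[None, :]                      # [1, d_model]
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                        # [max_len, d_model]
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe
```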
no code implementations • 31 Dec 2020 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu
In this paper, we propose a new retrieval target, hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering.
Ranked #8 on Question Answering on HotpotQA
1 code implementation • ACL 2021 • Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization.
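A minimal sketch of what weight binarization means in this context: each weight is replaced by its sign times a shared scaling factor. BinaryBERT's actual training recipe (e.g., ternary weight splitting and fine-tuning) is not reproduced here:

```python
import torch

def binarize_weights(weight: torch.Tensor) -> torch.Tensor:
    """Row-wise 1-bit quantization sketch: each weight becomes +/- the mean
    absolute value of its row. Illustrative only; not the paper's full method."""
    scale = weight.abs().mean(dim=1, keepdim=True)  # one scale per output row
    return torch.sign(weight) * scale
```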
no code implementations • 11 Dec 2020 • Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
Comprehensive experiments on the evaluation benchmarks demonstrate that 1) the layer mapping strategy has a significant effect on task-agnostic BERT distillation, and different layer mappings can result in quite different performance; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.
no code implementations • 2 Oct 2020 • Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, Qun Liu
Term-based sparse representations dominate first-stage text retrieval in industrial applications due to their advantages in efficiency, interpretability, and exact term matching.
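A toy illustration of term-based sparse retrieval with exact term matching, using plain term frequencies in place of the TF-IDF/BM25 weighting used in practice (the function names here are hypothetical):

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to the documents containing it, with term frequencies."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, freq in Counter(text.lower().split()).items():
            index[term].append((doc_id, freq))
    return index

def retrieve(query, index):
    """Score documents by summed frequencies of exactly matching query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, freq in index.get(term, []):
            scores[doc_id] += freq
    return scores.most_common()
```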
3 code implementations • EMNLP 2020 • Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are expensive in both computation and memory, hindering their deployment on resource-constrained devices.
1 code implementation • AKBC 2020 • Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng, Lifeng Shang
Computational and cognitive studies suggest that the abstraction of eventualities (activities, states, and events) is crucial for humans to understand daily eventualities.
3 code implementations • NeurIPS 2020 • Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Pre-trained language models like BERT, though powerful in many natural language processing tasks, are expensive in both computation and memory.
1 code implementation • 25 Dec 2019 • Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, Lifeng Shang
In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms.
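For context, the quantity being learned can be computed exactly (at exponential cost) with an off-the-shelf VF2 matcher; the sketch below counts node-induced subgraph isomorphisms with networkx and serves as the ground truth a neural counter would approximate:

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def count_subgraph_isomorphisms(graph: nx.Graph, pattern: nx.Graph) -> int:
    """Exact count of node-induced subgraph isomorphisms via VF2
    (exponential in the worst case)."""
    matcher = GraphMatcher(graph, pattern)
    return sum(1 for _ in matcher.subgraph_isomorphisms_iter())

# Example: triangles in K4; each of the 4 triangles yields 6 mappings
# (one per automorphism of the pattern), so the count is 24.
print(count_subgraph_isomorphisms(nx.complete_graph(4), nx.cycle_graph(3)))
```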
7 code implementations • Findings of the Association for Computational Linguistics 2020 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.
Ranked #1 on Natural Language Inference on MultiNLI Dev
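A minimal sketch of the layer-wise Transformer distillation objective described in this entry: match attention maps and (projected) hidden states between paired student and teacher layers, plus a soft-label loss on the logits. The dictionary keys, the layer pairing, and the `proj` module (mapping student width to teacher width) are assumptions of this sketch, not the released implementation:

```python
import torch
import torch.nn.functional as F

def transformer_distillation_loss(student_out, teacher_out, proj, temperature=1.0):
    """student_out/teacher_out are assumed to be dicts holding lists of per-layer
    'attentions' and 'hidden_states' (already paired, with shapes matching after
    projection) plus 'logits'."""
    attn_loss = sum(F.mse_loss(s, t) for s, t in
                    zip(student_out["attentions"], teacher_out["attentions"]))
    hidden_loss = sum(F.mse_loss(proj(s), t) for s, t in
                      zip(student_out["hidden_states"], teacher_out["hidden_states"]))
    soft_targets = F.softmax(teacher_out["logits"] / temperature, dim=-1)
    logit_loss = F.kl_div(
        F.log_softmax(student_out["logits"] / temperature, dim=-1),
        soft_targets, reduction="batchmean") * temperature ** 2
    return attn_loss + hidden_loss + logit_loss
```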
no code implementations • 21 Aug 2019 • Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data.
no code implementations • ACL 2019 • Zichao Li, Xin Jiang, Lifeng Shang, Qun Liu
Paraphrasing exists at different granularity levels, such as the lexical, phrasal, and sentential levels.
1 code implementation • 26 Dec 2018 • Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li
In this paper, we investigate the feasibility of applying few-shot learning algorithms to a speech task.
no code implementations • EMNLP 2018 • Zichao Li, Xin Jiang, Lifeng Shang, Hang Li
The generator, built as a sequence-to-sequence learning model, can produce paraphrases given a sentence.
1 code implementation • 7 Nov 2016 • Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy.
1 code implementation • WS 2016 • Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li
Empirical study shows that the proposed model can effectively deal with variations in questions and answers, and generate correct and natural answers by referring to facts in the knowledge base.
3 code implementations • ICCV 2015 • Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li
In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence.
Ranked #15 on Image Retrieval on Flickr30K 1K test
4 code implementations • IJCNLP 2015 • Lifeng Shang, Zhengdong Lu, Hang Li
We propose Neural Responding Machine (NRM), a neural network-based response generator for Short-Text Conversation.
no code implementations • 25 Nov 2013 • Lifeng Shang, Antoni B. Chan
In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD.