1 code implementation • Findings (ACL) 2022 • Qiwei Bi, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Hanfang Yang
With the adoption of large pre-trained models like BERT in news recommendation, the above way of incorporating multi-field information may encounter challenges: the shallow feature encoding used to compress category and entity information is not compatible with the deep BERT encoding.
no code implementations • Findings (ACL) 2022 • Xianghong Fang, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Dit-yan Yeung
While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability.
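For context, a minimal sketch of two standard Gaussian-VAE ingredients that text VAEs build on, assuming the usual ELBO objective: the reparameterization trick and the closed-form KL term. Names and shapes below are illustrative, not taken from the paper's code.

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z ~ N(mu, sigma^2) differentiably via z = mu + sigma * eps."""
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

# Training minimizes: reconstruction loss + kl_to_standard_normal(...).mean()
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)
print(reparameterize(mu, logvar).shape)  # torch.Size([8, 32])
```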
1 code implementation • Findings (ACL) 2022 • Jian Li, Jieming Zhu, Qiwei Bi, Guohao Cai, Lifeng Shang, Zhenhua Dong, Xin Jiang, Qun Liu
Accurately matching users' interests with candidate news is the key to news recommendation.
1 code implementation • 12 Mar 2025 • Boyang Xue, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Hongling Xu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong
Current large language model (LLM) self-training methods consistently under-sample challenging queries, leading to inadequate learning on difficult problems that limits LLMs' abilities.
no code implementations • 26 Feb 2025 • Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li
Large Language Models (LLMs) are increasingly used for automated evaluation in various scenarios.
no code implementations • 25 Feb 2025 • Tianyi Zhuang, Chuqiao Kuang, Xiaoguang Li, Yihua Teng, Jihao Wu, Yasheng Wang, Lifeng Shang
We present DocPuzzle, a rigorously constructed benchmark for evaluating long-context reasoning capabilities in large language models (LLMs).
no code implementations • 17 Feb 2025 • Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, YuFei Wang, Yichun Yin, Yasheng Wang, Lifeng Shang, Qun Liu
Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation.
no code implementations • 17 Dec 2024 • Jiebin Zhang, Dawei Zhu, YiFan Song, Wenhao Wu, Chuqiao Kuang, Xiaoguang Li, Lifeng Shang, Qun Liu, Sujian Li
As large language models (LLMs) process increasing context windows, the memory usage of KV cache has become a critical bottleneck during inference.
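To see why, a back-of-the-envelope estimate of KV-cache size is useful; the configuration below is a hypothetical model, and the formula assumes fp16 keys and values.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys and values; fp16 -> 2 bytes per element
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# e.g. a 32-layer model with 32 KV heads of dim 128 at a 128k-token context:
gib = kv_cache_bytes(32, 32, 128, 128_000, 1) / 2**30
print(f"{gib:.1f} GiB per sequence")  # 62.5 GiB
```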
no code implementations • 24 Oct 2024 • Zezhong Wang, Xingshan Zeng, Weiwen Liu, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
Data quality assessments demonstrate improvements in the naturalness and coherence of our synthesized dialogues.
no code implementations • 9 Oct 2024 • Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li
In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
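A minimal sketch of the hard-pair construction idea: start from a correct solution, corrupt a small span of it, and keep the (correct, corrupted) pair for preference learning. The off-by-one number perturbation below is a hypothetical stand-in for the paper's predefined error types.

```python
import random
import re

def inject_subtle_error(solution: str, rng: random.Random) -> str:
    """Corrupt one number in an otherwise correct solution (illustrative rule)."""
    numbers = list(re.finditer(r"\d+", solution))
    if not numbers:
        return solution
    m = rng.choice(numbers)
    wrong = str(int(m.group()) + 1)  # off-by-one arithmetic slip
    return solution[:m.start()] + wrong + solution[m.end():]

rng = random.Random(0)
chosen = "12 + 7 = 19, so the answer is 19."
pair = {"chosen": chosen, "rejected": inject_subtle_error(chosen, rng)}
print(pair)
```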
no code implementations • 7 Oct 2024 • Qiyuan Zhang, YuFei Wang, Tiezheng Yu, Yuxin Jiang, Chuhan Wu, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Fuyuan Lyu, Chen Ma
With significant efforts in recent studies, LLM-as-a-Judge has become a cost-effective alternative to human evaluation for assessing the text generation quality in a wide range of tasks.
no code implementations • 22 Sep 2024 • Tao Li, Zhengbao He, YuJun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang
Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computational and memory costs.
no code implementations • 2 Sep 2024 • Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability.
1 code implementation • 14 Aug 2024 • Yuxin Jiang, Bo Huang, YuFei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang
Firstly, we increase the consistency and informativeness of the pairwise preference signals through targeted modifications, synthesizing a pseudo-winning response by improving the losing response with the winning response as a reference.
no code implementations • 23 Jun 2024 • Zezhong Wang, Xingshan Zeng, Weiwen Liu, YuFei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
To address these questions, we propose a method, namely Chain-of-Probe (CoP), to probe changes in the mind during the model's reasoning.
no code implementations • 29 May 2024 • Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong liu, Ruiming Tang
Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory.
no code implementations • 22 Feb 2024 • Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang
The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context.
1 code implementation • 19 Feb 2024 • Yuxin Jiang, YuFei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang
Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention.
1 code implementation • 30 Jan 2024 • Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, YuFei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
Large language models (LLMs) are increasingly relied upon for complex multi-turn conversations across diverse real-world applications.
1 code implementation • 30 Jan 2024 • Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruifeng Xu, Qun Liu
The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating, and using tools.
no code implementations • 28 Jan 2024 • Jianqiao Lu, Wanjun Zhong, YuFei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
With the teacher's guidance, the student learns to iteratively refine its answer with feedback, and forms a robust and comprehensive understanding of the posed questions.
1 code implementation • 26 Jan 2024 • Haochen Tan, Zhijiang Guo, Zhan Shi, Lu Xu, Zhili Liu, Yunlong Feng, Xiaoguang Li, Yasheng Wang, Lifeng Shang, Qun Liu, Linqi Song
Large Language Models (LLMs) have succeeded remarkably at understanding long-form content.
1 code implementation • 17 Jan 2024 • Yu Pan, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, Qun Liu
The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes.
1 code implementation • 4 Dec 2023 • Zige Wang, Wanjun Zhong, YuFei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu
This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various aspects of data management strategy design.
1 code implementation • 31 Oct 2023 • Yuxin Jiang, YuFei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, Wei Wang
To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs.
1 code implementation • 30 Oct 2023 • Wai-Chung Kwan, Xingshan Zeng, YuFei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong
In this paper, we propose M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation.
no code implementations • 16 Oct 2023 • Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.
1 code implementation • 12 Oct 2023 • Boyang Xue, Weichao Wang, Hongru Wang, Fei Mi, Rui Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
Inspired by previous work which identified that feed-forward networks (FFNs) within Transformers are responsible for factual knowledge expressions, we investigate two methods to efficiently improve the factual expression capability of FFNs by knowledge enhancement and alignment, respectively.
no code implementations • 8 Oct 2023 • Baojun Wang, Kun Xu, Lifeng Shang
Through carefully designed pretraining tasks, the character and pinyin representations are fused, which enhances tolerance to SSP errors.
no code implementations • 1 Oct 2023 • Jianqiao Lu, Wanjun Zhong, Wenyong Huang, YuFei Wang, Qi Zhu, Fei Mi, Baojun Wang, Weichao Wang, Xingshan Zeng, Lifeng Shang, Xin Jiang, Qun Liu
SELF initiates with a meta-skill learning process that equips the LLMs with capabilities for self-feedback and self-refinement.
no code implementations • 23 Aug 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks.
no code implementations • 12 Aug 2023 • Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years.
1 code implementation • 12 Aug 2023 • Siheng Li, Yichun Yin, Cheng Yang, Wangjie Jiang, Yiwei Li, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
In this paper, we propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news.
1 code implementation • 24 Jul 2023 • YuFei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu
(2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment.
1 code implementation • ACL 2023 • Guanhua Chen, Lu Hou, Yun Chen, Wenliang Dai, Lifeng Shang, Xin Jiang, Qun Liu, Jia Pan, Wenping Wang
Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization.
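A minimal sketch of the sentence-level contrastive piece, assuming the standard InfoNCE objective over translation pairs: aligned source/target sentences are positives and other in-batch sentences serve as negatives. The temperature and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(src_emb: torch.Tensor, tgt_emb: torch.Tensor, tau: float = 0.05):
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / tau          # (B, B) cosine-similarity matrix
    labels = torch.arange(src.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

print(info_nce(torch.randn(16, 256), torch.randn(16, 256)).item())
```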
no code implementations • 22 May 2023 • Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
This study proposes a multitask learning architecture for extractive summarization with coherence boosting.
no code implementations • 15 Dec 2022 • Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen
In light of this, we present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
1 code implementation • 7 Dec 2022 • Zhongwei Wan, Yichun Yin, Wei zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu
Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora.
1 code implementation • 4 Dec 2022 • Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang
In order to detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourcing workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations.
no code implementations • 21 Oct 2022 • Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
Recent large-scale video-language pre-trained models have shown appealing performance on various downstream tasks.
no code implementations • 20 Oct 2022 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu
Further experiments on question-answering datasets show that trying to learn a deterministic relationship with the proposed methods can also help other knowledge-intensive tasks.
1 code implementation • ICLR 2022 • Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu
A tiny version achieves $96.7\%$ of BERT-base's performance with $1/48$ of the encoder parameters (i.e., fewer than 2M parameters excluding the embedding layer) and $2.7\times$ faster inference.
no code implementations • Findings (ACL) 2022 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Zhenhua Dong, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu
We examine words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurring.
2 code implementations • 31 Mar 2022 • Fei Mi, Yitong Li, Yulong Zeng, Jingyan Zhou, Yasheng Wang, Chuanfei Xu, Lifeng Shang, Xin Jiang, Shiqi Zhao, Qun Liu
We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety.
no code implementations • ACL 2022 • Chaofan Tao, Lu Hou, Wei zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and the varied distribution of weights.
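For reference, a minimal sketch of symmetric per-tensor weight quantization, the baseline operation whose failure modes on generative tasks are at issue; the bit width and rounding scheme below are illustrative.

```python
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized weights used in the forward pass

w = torch.randn(512, 512)
for bits in (8, 4, 2):
    err = (w - quantize_dequantize(w, bits)).abs().mean().item()
    print(bits, round(err, 4))  # error grows as capacity shrinks
```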
1 code implementation • ACL 2022 • Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen
To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).
no code implementations • Findings (ACL) 2022 • Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.
no code implementations • Findings (ACL) 2022 • Dan Su, Xiaoguang Li, Jindi Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question.
Ranked #1 on Question Answering on KILT: ELI5
no code implementations • ACL 2022 • Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu
However, pre-training large language models consumes intensive computational resources, and most models are trained from scratch without reusing existing pre-trained models, which is wasteful.
no code implementations • 30 Sep 2021 • Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
1 code implementation • EMNLP 2021 • Baojun Wang, Zhao Zhang, Kun Xu, Guang-Yuan Hao, Yuyang Zhang, Lifeng Shang, Linlin Li, Xiao Chen, Xin Jiang, Qun Liu
Incorporating lexical knowledge into deep learning models has proven very effective for sequence labeling tasks.
no code implementations • EMNLP 2021 • Chenyang Lyu, Lifeng Shang, Yvette Graham, Jennifer Foster, Xin Jiang, Qun Liu
Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer.
no code implementations • 7 Sep 2021 • Shaobo Li, Qun Liu, Xin Jiang, Yichun Yin, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Lifeng Shang
Human-designed rules are widely used to build industry applications.
no code implementations • Findings (EMNLP) 2021 • Jianhao Shen, Yichun Yin, Lin Li, Lifeng Shang, Xin Jiang, Ming Zhang, Qun Liu
Math word problem (MWP) is a challenging and critical task in natural language processing.
Ranked #3 on Math Word Problem Solving on Math23K
no code implementations • ACL 2021 • Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.
1 code implementation • ACL 2021 • Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient development way of tiny PLMs for various latency constraints.
1 code implementation • ACL 2021 • Zhihong Shao, Lifeng Shang, Qun Liu, Minlie Huang
This setting gives rise to the spurious solution problem: there may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance (e.g., producing wrong solutions or answers).
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
no code implementations • 24 Apr 2021 • Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu
Task-agnostic knowledge distillation, a teacher-student framework, has proven effective for BERT compression.
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
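A minimal sketch of the resulting reweighting, assuming a softmax-over-losses form (a common exponential weighting; the paper derives its own closed-form weights): augmented copies of an example contribute in proportion to how hard they currently are.

```python
import torch

def reweighted_loss(per_aug_losses: torch.Tensor, temperature: float = 1.0):
    # per_aug_losses: losses of one example's augmented copies, shape (K,)
    weights = torch.softmax(per_aug_losses.detach() / temperature, dim=0)
    return (weights * per_aug_losses).sum()

losses = torch.tensor([0.2, 1.5, 0.7], requires_grad=True)
print(reweighted_loss(losses).item())  # the 1.5-loss copy dominates the update
```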
no code implementations • 11 Mar 2021 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
The multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.
no code implementations • 5 Mar 2021 • Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, Lifeng Shang
It is still an open question to leverage various types of information under the BERT framework.
no code implementations • ICLR 2021 • Benyou Wang, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Grue Simonsen
Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order.
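For reference, the fixed sinusoidal position embedding from the original Transformer, one of the PE families such a study compares against learned variants.

```python
import numpy as np

def sinusoidal_pe(max_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(max_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))  # (max_len, d_model)

print(sinusoidal_pe(128, 64).shape)  # (128, 64)
```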
1 code implementation • ACL 2021 • Haoli Bai, Wei zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization.
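A minimal sketch of the basic weight-binarization operator (the sign of the weights scaled by their mean magnitude, as in binary-weight networks); BinaryBERT builds on this operator with its own splitting-based initialization, not shown here.

```python
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    alpha = w.abs().mean()        # per-tensor scaling factor
    return alpha * torch.sign(w)  # weights constrained to {-alpha, +alpha}

w = torch.randn(768, 768)
print(torch.unique(binarize(w)))  # two distinct values replace 768*768 floats
```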
no code implementations • 31 Dec 2020 • Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu
In this paper, we propose a new retrieval target, hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering.
Ranked #6 on Question Answering on HotpotQA
no code implementations • 11 Dec 2020 • Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
Comprehensive experiments on the evaluation benchmarks demonstrate that 1) layer mapping strategy has a significant effect on task-agnostic BERT distillation and different layer mappings can result in quite different performances; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.
no code implementations • 2 Oct 2020 • Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, Qun Liu
Term-based sparse representations dominate first-stage text retrieval in industrial applications, due to their advantages in efficiency, interpretability, and exact term matching.
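A minimal sketch of that paradigm with a plain TF-IDF dot product (production systems typically use BM25): only exactly matching terms contribute to the score, which is what makes these representations sparse and interpretable.

```python
import math
from collections import Counter

docs = ["sparse term retrieval", "dense vector retrieval", "term matching"]
df = Counter(t for d in docs for t in set(d.split()))
idf = {t: math.log(len(docs) / df[t]) for t in df}

def score(query: str, doc: str) -> float:
    tf = Counter(doc.split())
    return sum(tf[t] * idf.get(t, 0.0) for t in query.split())  # exact matches only

print(sorted(docs, key=lambda d: -score("term retrieval", d)))
```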
5 code implementations • EMNLP 2020 • Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.
1 code implementation • AKBC 2020 • Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng, Lifeng Shang
Computational and cognitive studies suggest that the abstraction of eventualities (activities, states, and events) is crucial for humans to understand daily eventualities.
3 code implementations • NeurIPS 2020 • Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive.
1 code implementation • 25 Dec 2019 • Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, Lifeng Shang
In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms.
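For scale, the exact (exponential-time) computation that a learned counter approximates can be run with networkx on small graphs; this sketch counts mappings of a 3-node path into a 6-cycle.

```python
import networkx as nx
from networkx.algorithms import isomorphism

G = nx.cycle_graph(6)  # target graph
P = nx.path_graph(3)   # pattern to count
matcher = isomorphism.GraphMatcher(G, P)
print(sum(1 for _ in matcher.subgraph_isomorphisms_iter()))  # node-mapped matches
```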
11 code implementations • Findings of the Association for Computational Linguistics 2020 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.
Ranked #1 on Natural Language Inference on MultiNLI Dev
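A minimal sketch of a layer-wise Transformer distillation objective in this spirit: MSE between linearly projected student hidden states and teacher hidden states, plus MSE between attention maps. The dimensions below are illustrative.

```python
import torch
import torch.nn as nn

d_student, d_teacher = 312, 768
proj = nn.Linear(d_student, d_teacher)  # maps student states into teacher space
mse = nn.MSELoss()

def layer_distill_loss(h_s, h_t, attn_s, attn_t):
    return mse(proj(h_s), h_t) + mse(attn_s, attn_t)

h_s, h_t = torch.randn(2, 128, d_student), torch.randn(2, 128, d_teacher)
attn_s, attn_t = torch.rand(2, 12, 128, 128), torch.rand(2, 12, 128, 128)
print(layer_distill_loss(h_s, h_t, attn_s, attn_t).item())
```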
no code implementations • 21 Aug 2019 • Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data.
no code implementations • ACL 2019 • Zichao Li, Xin Jiang, Lifeng Shang, Qun Liu
Paraphrasing exists at different granularity levels, such as lexical level, phrasal level and sentential level.
1 code implementation • 26 Dec 2018 • Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li
In this paper, we investigate the feasibility of applying few-shot learning algorithms to a speech task.
no code implementations • EMNLP 2018 • Zichao Li, Xin Jiang, Lifeng Shang, Hang Li
The generator, built as a sequence-to-sequence learning model, can produce paraphrases given a sentence.
1 code implementation • 7 Nov 2016 • Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy.
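A minimal sketch of the coverage idea: keep a running sum of past attention over source positions so that later decoding steps can be discouraged from revisiting already-covered words. The linear penalty below is illustrative; the paper learns the coverage update.

```python
import torch

src_len, steps = 10, 5
coverage = torch.zeros(src_len)
for t in range(steps):
    scores = torch.randn(src_len) - 2.0 * coverage  # penalize covered positions
    attn = torch.softmax(scores, dim=0)
    coverage = coverage + attn  # accumulate attention mass per source word
print(coverage.sum().item())  # total mass equals the number of steps (5.0)
```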
1 code implementation • WS 2016 • Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li
Empirical study shows the proposed model can effectively deal with the variations of questions and answers, and generate right and natural answers by referring to the facts in the knowledge-base.
3 code implementations • ICCV 2015 • Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li
In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence.
Ranked #16 on Image Retrieval on Flickr30K 1K test
4 code implementations • IJCNLP 2015 • Lifeng Shang, Zhengdong Lu, Hang Li
We propose Neural Responding Machine (NRM), a neural network-based response generator for Short-Text Conversation.
no code implementations • 25 Nov 2013 • Lifeng Shang, Antoni B. Chan
In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD.