Search Results for author: Lifeng Shang

Found 43 papers, 15 papers with code

MTRec: Multi-Task Learning over BERT for News Recommendation

no code implementations Findings (ACL) 2022 Qiwei Bi, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Hanfang Yang

With the adoption of large pre-trained models like BERT in news recommendation, the above way to incorporate multi-field information may encounter challenges: the shallow feature encoding to compress the category and entity information is not compatible with the deep BERT encoding.

Multi-Task Learning News Recommendation

Controlled Text Generation Using Dictionary Prior in Variational Autoencoders

no code implementations Findings (ACL) 2022 Xianghong Fang, Jian Li, Lifeng Shang, Xin Jiang, Qun Liu, Dit-Yan Yeung

While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability.

Contrastive Learning Language Modelling +2

How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis

no code implementations Findings (ACL) 2022 Shaobo Li, Xiaoguang Li, Lifeng Shang, Zhenhua Dong, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu

We check the words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurred.

Compression of Generative Pre-trained Language Models via Quantization

no code implementations ACL 2022 Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong

We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and varied distribution of weights.

Model Compression Quantization +1

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

1 code implementation ACL 2022 Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen

To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).

Open-Domain Question Answering Passage Retrieval

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

no code implementations Findings (ACL) 2022 Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

Image Captioning Knowledge Distillation +3

bert2BERT: Towards Reusable Pretrained Language Models

no code implementations ACL 2022 Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

However, large language model pre-training costs intensive computational resources and most of the models are trained from scratch without reusing the existing pre-trained models, which is wasteful.

Language Modelling Pretrained Language Models

Towards Efficient Post-training Quantization of Pre-trained Language Models

no code implementations 30 Sep 2021 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.


Exploring extreme parameter compression for pre-trained language models

no code implementations ICLR 2022 Benyou Wang, Yuxin Ren, Lifeng Shang, Xin Jiang, Qun Liu

A tiny version achieves 96.7% of the performance of BERT-base with 1/48 of the encoder parameters (i.e., fewer than 2M parameters excluding the embedding layer) and is 2.7× faster at inference.

Knowledge Distillation Tensor Decomposition

Improving Unsupervised Question Answering via Summarization-Informed Question Generation

no code implementations EMNLP 2021 Chenyang Lyu, Lifeng Shang, Yvette Graham, Jennifer Foster, Xin Jiang, Qun Liu

Template-based QG uses linguistically-informed heuristics to transform declarative sentences into interrogatives, whereas supervised QG uses existing Question Answering (QA) datasets to train a system to generate a question given a passage and an answer.

Dependency Parsing Named Entity Recognition +3

GhostBERT: Generate More Features with Cheap Operations for BERT

no code implementations ACL 2021 Zhiqi Huang, Lu Hou, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Transformer-based pre-trained language models like BERT, though powerful in many tasks, are expensive in both memory and computation, due to their large number of parameters.

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

1 code implementation ACL 2021 Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient development way of tiny PLMs for various latency constraints.

Neural Architecture Search One-Shot Learning

A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering

1 code implementation ACL 2021 Zhihong Shao, Lifeng Shang, Qun Liu, Minlie Huang

This setting gives rise to the spurious solution problem: there may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance (e.g., producing wrong solutions or answers).

Question Answering

Improved OOD Generalization via Adversarial Training and Pre-training

no code implementations 24 May 2021 Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.

Image Classification Natural Language Understanding

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

no code implementations 24 Apr 2021 Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu

Task-agnostic knowledge distillation, a teacher-student framework, has been proved effective for BERT compression.

Knowledge Distillation

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

no code implementations ICLR 2021 Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).

Image Augmentation Image Classification +1

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation

no code implementations 11 Mar 2021 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

The multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.

Natural Language Understanding

On Position Embeddings in BERT

no code implementations ICLR 2021 Benyou Wang, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Grue Simonsen

Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order.

General Classification Translation

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

no code implementations 31 Dec 2020 Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu

In this paper, we propose a new retrieval target, hop, to collect the hidden reasoning evidence from Wikipedia for complex question answering.

Document Embedding Open-Domain Question Answering

Improving Task-Agnostic BERT Distillation with Layer Mapping Search

no code implementations 11 Dec 2020 Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Comprehensive experiments on the evaluation benchmarks demonstrate that 1) layer mapping strategy has a significant effect on task-agnostic BERT distillation and different layer mappings can result in quite different performances; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.

Knowledge Distillation

SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval

no code implementations 2 Oct 2020 Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, Qun Liu

Term-based sparse representations dominate the first-stage text retrieval in industrial applications, due to their advantages in efficiency, interpretability, and exact term matching.

Language Modelling

TernaryBERT: Distillation-aware Ultra-low Bit BERT

2 code implementations EMNLP 2020 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.

Knowledge Distillation Quantization

Enriching Large-Scale Eventuality Knowledge Graph with Entailment Relations

1 code implementation AKBC 2020 Changlong Yu, Hongming Zhang, Yangqiu Song, Wilfred Ng, Lifeng Shang

Computational and cognitive studies suggest that the abstraction of eventualities (activities, states, and events) is crucial for humans to understand daily eventualities.


DynaBERT: Dynamic BERT with Adaptive Width and Depth

3 code implementations NeurIPS 2020 Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive.

Language Modelling

Neural Subgraph Isomorphism Counting

1 code implementation 25 Dec 2019 Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, Lifeng Shang

In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms.

Domain Adaptation Graph Learning +4

TinyBERT: Distilling BERT for Natural Language Understanding

5 code implementations Findings of the Association for Computational Linguistics 2020 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.

Knowledge Distillation Language Modelling +6

Dialog State Tracking with Reinforced Data Augmentation

no code implementations 21 Aug 2019 Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data.

Data Augmentation reinforcement-learning

Decomposable Neural Paraphrase Generation

no code implementations ACL 2019 Zichao Li, Xin Jiang, Lifeng Shang, Qun Liu

Paraphrasing exists at different granularity levels, such as lexical level, phrasal level and sentential level.

Paraphrase Generation Unsupervised Domain Adaptation

Neural Machine Translation with Reconstruction

1 code implementation 7 Nov 2016 Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li

Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy.

Machine Translation Translation

Neural Generative Question Answering

1 code implementation WS 2016 Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li

Empirical study shows the proposed model can effectively deal with the variations of questions and answers, and generate right and natural answers by referring to the facts in the knowledge-base.

Generative Question Answering Text Generation

Multimodal Convolutional Neural Networks for Matching Image and Sentence

2 code implementations ICCV 2015 Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li

In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence.

Neural Responding Machine for Short-Text Conversation

5 code implementations IJCNLP 2015 Lifeng Shang, Zhengdong Lu, Hang Li

We propose Neural Responding Machine (NRM), a neural network-based response generator for Short-Text Conversation.

Short-Text Conversation

On Approximate Inference for Generalized Gaussian Process Models

no code implementations 25 Nov 2013 Lifeng Shang, Antoni B. Chan

In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD.
