Search Results for author: Yichun Yin

Found 15 papers, 3 papers with code

bert2BERT: Towards Reusable Pretrained Language Models

no code implementations ACL 2022 Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

However, large language model pre-training costs intensive computational resources and most of the models are trained from scratch without reusing the existing pre-trained models, which is wasteful.

Language Modelling Pretrained Language Models

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

1 code implementation ACL 2021 Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient development way of tiny PLMs for various latency constraints.

Neural Architecture Search One-Shot Learning

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

no code implementations24 Apr 2021 Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu

Task-agnostic knowledge distillation, a teacher-student framework, has been proved effective for BERT compression.

Knowledge Distillation

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation

no code implementations11 Mar 2021 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

The multilingual pre-trained language models (e. g, mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.

Natural Language Understanding

Improving Task-Agnostic BERT Distillation with Layer Mapping Search

no code implementations11 Dec 2020 Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Comprehensive experiments on the evaluation benchmarks demonstrate that 1) layer mapping strategy has a significant effect on task-agnostic BERT distillation and different layer mappings can result in quite different performances; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.

Knowledge Distillation

TernaryBERT: Distillation-aware Ultra-low Bit BERT

2 code implementations EMNLP 2020 Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.

Knowledge Distillation Quantization

PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction

no code implementations COLING 2020 Yichun Yin, Chenguang Wang, Ming Zhang

Dependency context-based word embedding jointly learns the representations of word and dependency context, and has been proved effective in aspect term extraction.

POS Term Extraction

TinyBERT: Distilling BERT for Natural Language Understanding

5 code implementations Findings of the Association for Computational Linguistics 2020 Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.

Knowledge Distillation Language Modelling +6

Dialog State Tracking with Reinforced Data Augmentation

no code implementations21 Aug 2019 Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data.

Data Augmentation reinforcement-learning

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

no code implementations25 May 2016 Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.

Term Extraction

Cannot find the paper you are looking for? You can Submit a new open access paper.