However, pre-training large language models demands intensive computational resources, and most models are trained from scratch without reusing existing pre-trained models, which is wasteful.
Human-designed rules are widely used to build industrial applications.
Specifically, we carefully design one-shot learning techniques and the search space to enable adaptive and efficient development of tiny PLMs under various latency constraints.
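To make the idea concrete, the sketch below organizes a latency-constrained search over candidate tiny-PLM architectures. It is a minimal illustration, not the method of any particular paper: the search-space dimensions and ranges, the `measure_latency` and `evaluate` callbacks, and the random-sampling strategy are all illustrative assumptions.

```python
import random

# Hypothetical search space over tiny PLM architectures; the dimensions
# and value ranges here are illustrative assumptions.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden_size": [128, 256, 384, 512],
    "ffn_size": [512, 1024, 2048],
    "num_heads": [4, 8],
}

def sample_architecture():
    """Draw one candidate sub-model configuration from the search space."""
    return {dim: random.choice(values) for dim, values in SEARCH_SPACE.items()}

def latency_constrained_search(budget_ms, measure_latency, evaluate, num_samples=100):
    """One-shot-style search: sample candidates, discard those over the
    latency budget, and return the best-scoring survivor. In a one-shot
    setup, `evaluate` would score a sub-model whose weights are inherited
    from a single trained super-model rather than trained from scratch."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_samples):
        arch = sample_architecture()
        if measure_latency(arch) > budget_ms:
            continue
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```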
Task-agnostic knowledge distillation, a teacher-student framework, has proven effective for BERT compression.
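The teacher-student idea is easiest to see in the classic soft-target objective. The sketch below is a generic distillation loss in PyTorch, not the exact objective of any model mentioned here; the temperature value is an illustrative assumption.

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KD loss: KL divergence between the temperature-softened
    teacher and student output distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2
```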
Multilingual pre-trained language models (e.g., mBERT, XLM, and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.
Comprehensive experiments on the evaluation benchmarks demonstrate that 1) the layer mapping strategy has a significant effect on task-agnostic BERT distillation, and different layer mappings can result in quite different performance; 2) the optimal layer mapping found by the proposed search process consistently outperforms heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.
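For concreteness, the sketch below shows what a layer mapping is: a choice of which teacher layer supervises each student layer, here with the common uniform heuristic and an MSE hidden-state loss. It only illustrates the notion; the searched mappings referred to above are found automatically and generally differ from this heuristic.

```python
import torch.nn.functional as F

def uniform_layer_mapping(num_student_layers, num_teacher_layers):
    """Uniform heuristic: spread student layers evenly over the teacher,
    e.g. a 4-layer student and a 12-layer teacher give [2, 5, 8, 11]
    (0-indexed teacher layers)."""
    step = num_teacher_layers // num_student_layers
    return [step * (i + 1) - 1 for i in range(num_student_layers)]

def mapped_hidden_loss(student_hiddens, teacher_hiddens, mapping):
    """MSE between each student layer's hidden states and those of its
    mapped teacher layer (hidden sizes assumed equal in this sketch)."""
    return sum(F.mse_loss(student_hiddens[s], teacher_hiddens[t])
               for s, t in enumerate(mapping))
```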
Transformer-based pre-trained models like BERT have achieved remarkable performance on many natural language processing tasks. However, these models are expensive in both computation and memory, hindering their deployment on resource-constrained devices.
Dependency context-based word embedding jointly learns representations of words and dependency contexts, and has proven effective for aspect term extraction.
To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models.
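A representative form of such a Transformer-to-Transformer objective matches intermediate attention matrices and hidden states at mapped layers. The sketch below is an illustrative simplification: the layer mapping, the single linear projection, and the equal loss weights are assumptions, not the paper's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class TransformerDistillLoss(nn.Module):
    """Match student attention matrices and projected hidden states to
    those of mapped teacher layers."""

    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        # Linear map from the student's hidden size to the teacher's.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, s_attns, t_attns, s_hiddens, t_hiddens, mapping):
        loss = 0.0
        for s, t in enumerate(mapping):
            # Attention transfer: (batch, heads, seq, seq) matrices.
            loss = loss + F.mse_loss(s_attns[s], t_attns[t])
            # Hidden-state transfer, with the student projected up.
            loss = loss + F.mse_loss(self.proj(s_hiddens[s]), t_hiddens[t])
        return loss
```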
Neural dialog state trackers are generally limited by the lack of quantity and diversity of annotated training data.
Document-level multi-aspect sentiment classification is an important task for customer relationship management.
In this paper, we propose a simple and effective ensemble method to further boost the performance of neural models.
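As a baseline illustration of ensembling (a plain probability average, which may differ from the specific method proposed in the paper):

```python
import torch

def ensemble_predict(models, inputs):
    """Average the predicted class distributions of independently trained
    models and take the argmax. Averaging probabilities rather than
    logits is one common, simple choice."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(inputs), dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```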
In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.
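A minimal sketch of the general idea of jointly embedding words and dependency paths follows; the mean-of-relations composition and the translation-style scoring are illustrative assumptions and likely differ from the composition used in the actual approach (e.g., a recurrent one). Triples extracted from parsed corpora could be trained with negative sampling, analogous to skip-gram.

```python
import torch
import torch.nn as nn

class WordPathEmbedder(nn.Module):
    """Jointly embed words and dependency relations; a path connecting two
    words is composed from its relation embeddings."""

    def __init__(self, vocab_size, num_relations, dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.rel_emb = nn.Embedding(num_relations, dim)

    def score(self, w1, relations, w2):
        """Score a (word, dependency path, word) triple; `relations` holds
        the indices of the relations along the path."""
        path = self.rel_emb(relations).mean(dim=0)  # simple path composition
        return ((self.word_emb(w1) + path) * self.word_emb(w2)).sum()
```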