no code implementations • 11 Dec 2020 • Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang
When multiple teacher models are available for distillation, state-of-the-art methods assign each teacher a fixed weight that remains constant throughout the distillation process.
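The fixed-weight baseline described above can be sketched as follows. This is a minimal illustration in pure Python, not the paper's actual method: the teacher logits, weights, and temperature value are hypothetical, and the soft targets are combined with a simple weighted average that never changes during training.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_teacher_soft_targets(teacher_logits, weights, T=2.0):
    # Combine each teacher's softened distribution with FIXED weights:
    # the weights are set once and never adapted during distillation,
    # which is the limitation the paper above addresses.
    assert abs(sum(weights) - 1.0) < 1e-9, "teacher weights should sum to 1"
    dists = [softmax(logits, T) for logits in teacher_logits]
    num_classes = len(dists[0])
    return [sum(w * d[i] for w, d in zip(weights, dists))
            for i in range(num_classes)]

def distillation_loss(student_logits, soft_targets, T=2.0):
    # Cross-entropy between the fixed mixture of teacher distributions
    # and the student's temperature-scaled distribution.
    student_dist = softmax(student_logits, T)
    return -sum(t * math.log(q) for t, q in zip(soft_targets, student_dist))

# Two hypothetical teachers, equal fixed weights.
targets = multi_teacher_soft_targets([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
                                     weights=[0.5, 0.5])
loss = distillation_loss([0.5, 1.0, 0.5], targets)
```

In practice the teacher mixture would feed a KL-divergence term alongside the hard-label loss; the point here is only that the weights `[0.5, 0.5]` are frozen for the whole run, regardless of how each teacher performs on individual examples.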
no code implementations • 18 Oct 2019 • Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang
The experimental results show that our method significantly outperforms the baseline methods and even achieves results comparable to the original teacher models, while substantially speeding up model inference.
no code implementations • 21 Apr 2019 • Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results on question answering tasks.
2 code implementations • IJCNLP 2019 • Ming Gong, Linjun Shou, Wutao Lin, Zhijie Sang, Quanjia Yan, Ze Yang, Feixiang Cheng, Daxin Jiang
Deep Neural Networks (DNNs) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.