Search Results for author: Yimeng Wu

Found 6 papers, 2 papers with code

Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation

no code implementations • EMNLP 2021 • Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi

Intermediate layer matching has been shown to be an effective approach for improving knowledge distillation (KD).


AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

no code implementations • 11 Jun 2023 • Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar

Developing monolingual large Pre-trained Language Models (PLMs) is shown to be very successful in handling different tasks in Natural Language Processing (NLP).

Few-Shot Learning


Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language.

Natural Language Understanding


JABER and SABER: Junior and Senior Arabic BERt

1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception.

Language Modelling, NER



ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

no code implementations • 27 Dec 2020 • Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Knowledge distillation is considered a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.

Knowledge Distillation


Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

2 code implementations • EMNLP 2020 • Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu

With the growth of computing power, neural machine translation (NMT) models also grow accordingly and become better.

Knowledge Distillation, Machine Translation, +2 more

