no code implementations • 7 Apr 2024 • Alexandre Muzio, Alex Sun, Churan He
The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoE) models, known for their dynamic allocation of computational resources based on input.
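The dynamic allocation described above is typically realized with a learned gating network that routes each token to only a few experts. The following is a minimal top-k gating sketch in NumPy; the function and parameter names (`moe_forward`, `gate_w`, `k`) are illustrative and not taken from the paper:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to its top-k experts.

    x: (d,) token representation
    gate_w: (d, n_experts) gating weights
    experts: list of callables, one per expert
    A generic top-k MoE sketch, not this paper's exact method.
    """
    logits = x @ gate_w                      # (n_experts,) gating scores
    top = np.argsort(logits)[-k:]            # indices of the k largest scores
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the chosen experts run, so per-token compute stays roughly
    # constant as the total number of experts grows.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
# Each "expert" here is just a small linear map, for illustration.
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), rng.normal(size=(d, n_experts)), experts)
```

Because only `k` of the `n_experts` expert functions are evaluated per token, parameter count and compute are decoupled, which is the property these models exploit.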
no code implementations • 28 May 2022 • Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their remarkable scaling capability, which enables dramatic increases in model size without significant increases in computational cost.
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
1 code implementation • 22 Sep 2021 • Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla
By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation, resulting in a substantial improvement in model accuracy.
1 code implementation • EMNLP 2021 • Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan Lee, Hany Hassan
At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients.
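One simple way to use a small direct-data batch at the gradient level is to blend its gradient into the main multilingual update as a regularizer. This is a hypothetical sketch of that idea, not the paper's exact rule; `regularized_grad` and `lam` are names introduced here for illustration:

```python
import numpy as np

def regularized_grad(main_grad, direct_grad, lam=0.1):
    """Blend the main multilingual gradient with a gradient computed on a
    small batch of direct (source -> target) sentence pairs.

    lam controls the regularization strength: lam=0 ignores the direct
    data, larger lam pulls the update toward the direct-data direction.
    """
    return main_grad + lam * direct_grad

# Toy example with 2-dimensional "gradients".
g = regularized_grad(np.array([1.0, -2.0]), np.array([0.5, 0.5]), lam=0.2)
```

Because the direct data is small (thousands of pairs), it only nudges the update direction rather than dominating training.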
no code implementations • Findings (EMNLP) 2021 • Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan
Then we cluster all the target languages into multiple groups, calling each group a representation sprachbund.
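Grouping languages by their representations can be sketched with plain k-means over per-language vectors. This is a generic clustering illustration assuming each language has a fixed representation vector; the paper's actual grouping procedure may differ, and `cluster_languages` is a name introduced here:

```python
import numpy as np

def cluster_languages(reps, n_groups=3, iters=20, seed=0):
    """Group language representation vectors with Lloyd's k-means.

    reps: dict mapping language code -> representation vector
    Returns a dict mapping language code -> group id in [0, n_groups).
    """
    langs = sorted(reps)
    X = np.stack([np.asarray(reps[l], dtype=float) for l in langs])
    rng = np.random.default_rng(seed)
    # Initialize centers at n_groups distinct language vectors.
    centers = X[rng.choice(len(X), n_groups, replace=False)]
    for _ in range(iters):
        # Assign each language to its nearest center.
        assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned languages.
        for g in range(n_groups):
            if (assign == g).any():
                centers[g] = X[assign == g].mean(0)
    return dict(zip(langs, assign.tolist()))

reps = {f"lang{i}": [float(i), 0.0] for i in range(5)}
groups = cluster_languages(reps, n_groups=3)
```

Each resulting group plays the role of a representation sprachbund: languages whose representations sit close together share one group id.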
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.