Search Results for author: Alexandre Muzio

Found 8 papers, 3 papers with code

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

no code implementations · 7 Apr 2024 · Alexandre Muzio, Alex Sun, Churan He

The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoEs) models, known for their dynamic allocation of computational resources based on input.
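The "dynamic allocation of computational resources based on input" mentioned in the abstract is typically realized with a learned gate that routes each token to a small subset of expert feed-forward networks. Below is a minimal top-k routing sketch in PyTorch; the layer sizes, expert count, and top_k value are illustrative assumptions and not taken from SEER-MoE.

```python
# Minimal sketch of top-k Mixture-of-Experts routing: a gate scores each
# token, only the top-k experts run on it, and their outputs are mixed.
# All hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                      # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)               # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                          # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```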

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

no code implementations · 28 May 2022 · Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla

Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability, which enables dramatic increases in model size without significant increases in computational cost.

Machine Translation
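A rough single-device sketch of the idea suggested by the Gating Dropout title: during training, some tokens ignore the gate's routing decision and stay with a locally hosted expert, which would skip the cross-device all-to-all exchange. The dropout probability, the argmax gate, and the choice of local expert below are assumptions, not the paper's exact formulation.

```python
# Sketch: with probability p during training, override the router's expert
# choice with a designated "local" expert, avoiding cross-device routing.
# Names and defaults are assumed for illustration only.
import torch

def route_with_gating_dropout(router_logits, local_expert, p=0.2, training=True):
    """router_logits: (tokens, num_experts); returns one expert index per token."""
    chosen = router_logits.argmax(dim=-1)          # top-1 routing decision
    if training and p > 0.0:
        drop = torch.rand(chosen.shape[0]) < p     # tokens that skip routing this step
        chosen = torch.where(drop, torch.full_like(chosen, local_expert), chosen)
    return chosen

logits = torch.randn(8, 4)
print(route_with_gating_dropout(logits, local_expert=0))
```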

Scalable and Efficient MoE Training for Multitask Multilingual Models

1 code implementation · 22 Sep 2021 · Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation, which results in a great improvement in model accuracy.

Machine Translation · Text Generation

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations · 25 Jun 2021 · Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization · Machine Translation · +5
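As the DeltaLM title indicates, the encoder-decoder model is built by reusing a pretrained multilingual encoder rather than training a generator from scratch. The sketch below shows one generic way such an initialization could look in PyTorch, copying the self-attention and feed-forward weights that encoder and decoder layers share; the sizes and the one-to-one copy are illustrative assumptions and do not reproduce DeltaLM's interleaved decoder.

```python
# Hedged sketch: initialize a decoder from a pretrained encoder so that only
# the cross-attention starts from scratch. Stand-in modules and sizes; this
# is not DeltaLM's exact scheme.
import copy
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6
pretrained_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads), n_layers
)  # stand-in for a pretrained multilingual encoder

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, n_heads), n_layers
)
for enc_layer, dec_layer in zip(pretrained_encoder.layers, decoder.layers):
    # Copy the weights the two layer types share; cross-attention
    # (dec_layer.multihead_attn) remains freshly initialized.
    dec_layer.self_attn = copy.deepcopy(enc_layer.self_attn)
    dec_layer.linear1 = copy.deepcopy(enc_layer.linear1)
    dec_layer.linear2 = copy.deepcopy(enc_layer.linear2)
```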
