no code implementations • 7 Apr 2024 • Alexandre Muzio, Alex Sun, Churan He
The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoE) models, known for their dynamic allocation of computational resources based on input.
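The dynamic allocation described above is typically realized with a learned gating network that routes each token to only a few experts. The following is a minimal top-k gating sketch in NumPy; the function and parameter names (`moe_forward`, `gate_w`, `k`) are illustrative and not taken from the paper:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to its top-k experts.

    x: (d,) token representation
    gate_w: (d, n_experts) gating weights
    experts: list of callables, one per expert
    A generic top-k MoE sketch, not this paper's exact method.
    """
    logits = x @ gate_w                      # (n_experts,) gating scores
    top = np.argsort(logits)[-k:]            # indices of the k largest scores
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the chosen experts run, so per-token compute stays roughly
    # constant as the total number of experts grows.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
# Each "expert" here is just a small linear map, for illustration.
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), rng.normal(size=(d, n_experts)), experts)
```

Because only `k` of the `n_experts` expert functions are evaluated per token, parameter count and compute are decoupled, which is the property these models exploit.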
no code implementations • 28 May 2022 • Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their remarkable scaling capability, which enables dramatic increases in model size without significant increases in computational cost.
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
1 code implementation • 22 Sep 2021 • Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla
By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation, resulting in a substantial improvement in model accuracy.
1 code implementation • EMNLP 2021 • Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan Lee, Hany Hassan
At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients.
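One simple way to use a small direct-data batch at the gradient level is to blend its gradient into the main multilingual update as a regularizer. This is a hypothetical sketch of that idea, not the paper's exact rule; `regularized_grad` and `lam` are names introduced here for illustration:

```python
import numpy as np

def regularized_grad(main_grad, direct_grad, lam=0.1):
    """Blend the main multilingual gradient with a gradient computed on a
    small batch of direct (source -> target) sentence pairs.

    lam controls the regularization strength: lam=0 ignores the direct
    data, larger lam pulls the update toward the direct-data direction.
    """
    return main_grad + lam * direct_grad

# Toy example with 2-dimensional "gradients".
g = regularized_grad(np.array([1.0, -2.0]), np.array([0.5, 0.5]), lam=0.2)
```

Because the direct data is small (thousands of pairs), it only nudges the update direction rather than dominating training.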
no code implementations • Findings (EMNLP) 2021 • Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan
Then we cluster all the target languages into multiple groups, calling each group a representation sprachbund.
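Grouping languages by their representations can be sketched with plain k-means over per-language vectors. This is a generic clustering illustration assuming each language has a fixed representation vector; the paper's actual grouping procedure may differ, and `cluster_languages` is a name introduced here:

```python
import numpy as np

def cluster_languages(reps, n_groups=3, iters=20, seed=0):
    """Group language representation vectors with Lloyd's k-means.

    reps: dict mapping language code -> representation vector
    Returns a dict mapping language code -> group id in [0, n_groups).
    """
    langs = sorted(reps)
    X = np.stack([np.asarray(reps[l], dtype=float) for l in langs])
    rng = np.random.default_rng(seed)
    # Initialize centers at n_groups distinct language vectors.
    centers = X[rng.choice(len(X), n_groups, replace=False)]
    for _ in range(iters):
        # Assign each language to its nearest center.
        assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned languages.
        for g in range(n_groups):
            if (assign == g).any():
                centers[g] = X[assign == g].mean(0)
    return dict(zip(langs, assign.tolist()))

reps = {f"lang{i}": [float(i), 0.0] for i in range(5)}
groups = cluster_languages(reps, n_groups=3)
```

Each resulting group plays the role of a representation sprachbund: languages whose representations sit close together share one group id.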
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.