Search Results for author: Shruti Bhosale

Found 18 papers, 8 papers with code

Facebook AI’s WMT21 News Translation Task Submission

1 code implementation WMT (EMNLP) 2021 Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan

We describe Facebook’s multilingual model submission to the WMT2021 shared task on news translation.


Effective Long-Context Scaling of Foundation Models

no code implementations 27 Sep 2023 Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.

Continual Pretraining, Language Modelling

Revisiting Machine Translation for Cross-lingual Classification

no code implementations 23 May 2023 Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer

Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or translating the training set into the target languages and finetuning a multilingual model (translate-train).

Classification, Cross-Lingual Transfer, +2

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

no code implementations 10 Mar 2023 Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin Lee

We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing.

Language Modelling, Machine Translation

Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation

no code implementations 15 Dec 2022 Maha Elbayad, Anna Sun, Shruti Bhosale

Sparsely gated Mixture of Experts (MoE) models have been shown to be a compute-efficient method to scale model capacity for multilingual machine translation.

Machine Translation, Translation

Causes and Cures for Interference in Multilingual Translation

no code implementations 14 Dec 2022 Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy, Shruti Bhosale

Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference.

Machine Translation, Translation

Multilingual Machine Translation with Hyper-Adapters

3 code implementations 22 May 2022 Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale

We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times fewer parameters.

Machine Translation, Translation

Data Selection Curriculum for Neural Machine Translation

no code implementations 25 Mar 2022 Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty

In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model.

Machine Translation, NMT, +1

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations 20 Dec 2021 Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Tricks for Training Sparse Translation Models

no code implementations NAACL 2022 Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan

Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks.

Machine Translation, Multi-Task Learning, +1

Facebook AI WMT21 News Translation Task Submission

no code implementations 6 Aug 2021 Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan

We describe Facebook's multilingual model submission to the WMT2021 shared task on news translation.


BASE Layers: Simplifying Training of Large, Sparse Models

1 code implementation 30 Mar 2021 Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer

Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters.

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

1 code implementation WMT (EMNLP) 2020 Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks.

 Ranked #1 on Machine Translation on WMT2016 Romanian-English (using extra training data)

Machine Translation, Translation

Beyond English-Centric Multilingual Machine Translation

7 code implementations 21 Oct 2020 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation, Translation
