Search Results for author: Mostofa Patwary

Found 20 papers, 10 papers with code

Evaluating Parameter Efficient Learning for Generation

no code implementations25 Oct 2022 Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.

Factuality Enhanced Language Models for Open-Ended Text Generation

3 code implementations9 Jun 2022 Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro

In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.

Misconceptions Sentence +2

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation9 Apr 2021 Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick Legresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.

Language Modelling

Local Knowledge Powered Conversational Agents

1 code implementation20 Oct 2020 Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.

Informativeness

BioMegatron: Larger Biomedical Domain Language Model

1 code implementation EMNLP 2020 Hoo-chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani

There has been an influx of biomedical domain-specific language models, showing language models pre-trained on biomedical text perform better on biomedical domain benchmarks than those trained on general domain text corpora such as Wikipedia and Books.

Language Modelling named-entity-recognition +4

Large Scale Multi-Actor Generative Dialog Modeling

no code implementations ACL 2020 Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

This work introduces the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona.

Goal-Oriented Dialog Language Modelling

Training Question Answering Models From Synthetic Data

no code implementations EMNLP 2020 Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

On the SQuAD1. 1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1. 1 training set questions alone.

Answer Generation Data Augmentation +1

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

10 code implementations17 Sep 2019 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick Legresley, Jared Casper, Bryan Catanzaro

To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8. 3 billion parameter transformer language model similar to GPT-2 and a 3. 9 billion parameter model similar to BERT.

LAMBADA Language Modelling +1

Coloring Big Graphs with AlphaGoZero

no code implementations26 Feb 2019 Jiayi Huang, Mostofa Patwary, Gregory Diamos

We show that recent innovations in deep reinforcement learning can effectively color very large graphs -- a well-known NP-hard problem with clear commercial applications.

Language Modeling at Scale

no code implementations23 Oct 2018 Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church

We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs.

Language Modelling Machine Translation +2

Cannot find the paper you are looking for? You can Submit a new open access paper.