no code implementations • 15 Oct 2024 • Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
The utility of synthetic data to enhance pretraining data quality and hence to improve downstream task accuracy has been widely explored in recent large language models (LLMs).
no code implementations • 8 Jul 2024 • Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on.
1 code implementation • 17 Jun 2024 • Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick Legresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.
1 code implementation • 17 Apr 2024 • Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell
The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
no code implementations • 26 Feb 2024 • Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick Legresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.
1 code implementation • 24 May 2023 • Yue Wu, Shrimai Prabhumoye, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li
Finally, we show the potential of games as a test bed for LLMs.
no code implementations • 3 May 2023 • Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye
We propose the Plan, Eliminate, and Track (PET) framework.
3 code implementations • NeurIPS 2023 • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
no code implementations • 14 Feb 2023 • Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, Roy Jiang, R. Michael Alvarez, Anima Anandkumar
We thus enable seamless open-ended social bias testing of PLMs by domain experts through an automatic large-scale generation of diverse test sentences for any combination of social categories and attributes.
no code implementations • 14 Feb 2023 • Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks.
no code implementations • 21 Nov 2022 • Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, R. Michael Alvarez, Anima Anandkumar
Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLMs' initial performance on the target-domain task.
no code implementations • 25 Oct 2022 • Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro
For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.
no code implementations • 12 Oct 2022 • Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge.
1 code implementation • Findings (ACL) 2022 • Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro
We propose a multi-stage prompting approach to generate knowledgeable responses from a single pretrained LM.
2 code implementations • 28 Jan 2022 • Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick Legresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro
Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe are key ingredients in the success of the model.
Ranked #35 on Language Modelling on LAMBADA
no code implementations • 15 Dec 2021 • Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro
We then provide the LM with an instruction that consists of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision.
1 code implementation • NAACL 2021 • Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov
Document grounded generation is the task of using the information provided in a document to improve text generation.
1 code implementation • CSRR (ACL) 2022 • Dheeraj Rajagopal, Aman Madaan, Niket Tandon, Yiming Yang, Shrimai Prabhumoye, Abhilasha Ravichander, Peter Clark, Eduard Hovy
Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth?
no code implementations • 22 Oct 2020 • Aman Madaan, Dheeraj Rajagopal, Yiming Yang, Abhilasha Ravichander, Eduard Hovy, Shrimai Prabhumoye
Reasoning about events and tracking their influences is fundamental to understanding processes.
no code implementations • NAACL 2021 • Shrimai Prabhumoye, Brendon Boldt, Ruslan Salakhutdinov, Alan W Black
Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices.
no code implementations • COLING 2020 • Shrimai Prabhumoye, Alan W. Black, Ruslan Salakhutdinov
In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules.
2 code implementations • ACL 2020 • Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W. Black
Sentence ordering is the task of arranging the sentences of a given text in the correct order.
1 code implementation • ACL 2020 • Aman Madaan, Amrith Setlur, Tanmay Parekh, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W. Black, Shrimai Prabhumoye
This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning.
no code implementations • 7 Feb 2020 • Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam
Dialogue research tends to distinguish between chit-chat and goal-oriented tasks.
no code implementations • 14 Jan 2020 • Rahul Radhakrishnan Iyer, Rohan Kohli, Shrimai Prabhumoye
With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping.
no code implementations • 20 Nov 2019 • Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston
We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.
no code implementations • WS 2019 • Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, Alan W. Black
There is a long record of research on equity in schools.
no code implementations • WS 2019 • Khyathi Chandu, Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W. Black
To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand.
no code implementations • WS 2019 • Shrimai Prabhumoye, Elijah Mayfield, Alan W. Black
We critique recent work on ethics in natural language processing.
no code implementations • 14 Jun 2019 • Shrimai Prabhumoye, Khyathi Raghavi Chandu, Ruslan Salakhutdinov, Alan W. Black
To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand.
no code implementations • NAACL 2019 • Shrimai Prabhumoye, Chris Quirk, Michel Galley
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness.
2 code implementations • 31 Jan 2019 • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston
We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots.
3 code implementations • EMNLP 2018 • Kangyan Zhou, Shrimai Prabhumoye, Alan W. Black
We define "Document Grounded Conversations" as conversations that are about the contents of a specified document.
no code implementations • 17 Sep 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Alan W. Black, Ruslan Salakhutdinov
Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content.
3 code implementations • ACL 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W. Black
We first learn a latent representation of the input sentence which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties.
Ranked #10 on Unsupervised Text Style Transfer on Yelp
no code implementations • WS 2017 • Shrimai Prabhumoye, Samridhi Choudhary, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rosé, Alan W. Black
There has been a long-standing interest in understanding 'Social Influence' both in Social Sciences and in Computational Linguistics.