Search Results for author: Shrimai Prabhumoye

Found 36 papers, 13 papers with code

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

no code implementations • 15 Oct 2024 • Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

The utility of synthetic data to enhance pretraining data quality and hence to improve downstream task accuracy has been widely explored in recent large language models (LLMs).

GSM8K · Math · +2

Data, Data Everywhere: A Guide for Pretraining Dataset Construction

no code implementations • 8 Jul 2024 • Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on.

Attribute

Nemotron-4 340B Technical Report

1 code implementation • 17 Jun 2024 • Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.

Synthetic Data Generation

AgentKit: Structured LLM Reasoning with Dynamic Graphs

1 code implementation • 17 Apr 2024 • Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
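
The "chains of nodes" idea can be pictured with a small self-contained sketch. This is not AgentKit's actual API; the Node class, run_graph helper, call_llm stub, and node names are hypothetical. Each node carries a subquestion prompt, depends on earlier nodes, and the graph is evaluated in dependency order, so the sequence of prompts enforces a structured thought process.

    # Minimal sketch of chaining prompt "nodes" into a dependency graph so the
    # LLM answers subquestions in a fixed, structured order. Not the AgentKit
    # API; Node, run_graph, and call_llm are illustrative stand-ins.

    def call_llm(prompt: str) -> str:
        """Placeholder for an actual LLM call (e.g., an API client)."""
        return f"<answer to: {prompt[:40]}...>"

    class Node:
        def __init__(self, name, prompt_template, depends_on=()):
            self.name = name
            self.prompt_template = prompt_template  # may reference parent outputs
            self.depends_on = list(depends_on)      # names of parent nodes
            self.output = None

    def run_graph(nodes):
        """Evaluate nodes in dependency order, feeding parent outputs forward."""
        by_name, done = {n.name: n for n in nodes}, set()
        def visit(node):
            if node.name in done:
                return
            for parent in node.depends_on:
                visit(by_name[parent])
            context = {p: by_name[p].output for p in node.depends_on}
            node.output = call_llm(node.prompt_template.format(**context))
            done.add(node.name)
        for node in nodes:
            visit(node)
        return {node.name: node.output for node in nodes}

    # A tiny three-node "thought process": observe -> plan -> act.
    nodes = [
        Node("observe", "Summarize the current game state."),
        Node("plan", "Given the observation: {observe}\nList the next subgoals.",
             depends_on=["observe"]),
        Node("act", "Observation: {observe}\nPlan: {plan}\nChoose one action.",
             depends_on=["observe", "plan"]),
    ]
    print(run_graph(nodes)["act"])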

BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models

no code implementations • 14 Feb 2023 • Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, Roy Jiang, R. Michael Alvarez, Anima Anandkumar

We thus enable seamless open-ended social bias testing of PLMs by domain experts through an automatic large-scale generation of diverse test sentences for any combination of social categories and attributes.

Sentence · Text Generation
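
A hedged sketch of what the open-ended test-sentence generation described in BiasTestGPT above could look like; this is not the paper's actual pipeline or prompt wording, and the generate stub, group names, and attribute lists are placeholders. A generator LLM is asked to produce sentences that pair each social group with each attribute, and the resulting sentences can then be used to probe a pretrained LM for bias.

    # Hedged sketch of open-ended bias test-sentence generation: prompt a
    # generator LLM for each (social group, attribute) combination and collect
    # diverse test sentences. Prompts, groups, and attributes are illustrative.

    from itertools import product

    def generate(prompt: str, n: int) -> list[str]:
        """Placeholder for a call to a generator LLM such as ChatGPT."""
        return [f"<sentence {i + 1} for: {prompt}>" for i in range(n)]

    social_groups = ["group A", "group B"]   # e.g., contrasting group terms
    attributes = ["career", "family"]        # stereotype-related attributes

    test_sentences = {}
    for group, attribute in product(social_groups, attributes):
        prompt = (
            f"Write natural sentences that mention '{group}' together with the "
            f"attribute '{attribute}'. Vary the wording and context."
        )
        test_sentences[(group, attribute)] = generate(prompt, n=3)

    for key, sents in test_sentences.items():
        print(key, len(sents), "sentences")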

Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

no code implementations • 21 Nov 2022 • Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, R. Michael Alvarez, Anima Anandkumar

Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLM's initial performance on the target-domain task.

Active Learning · Transfer Learning

Evaluating Parameter Efficient Learning for Generation

no code implementations • 25 Oct 2022 • Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.
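
A Houlsby-style adapter, the parameter-efficient method reported above as performing best, is a small bottleneck inserted into an otherwise frozen transformer layer. The PyTorch sketch below shows only the core module; the hidden size, bottleneck width, and placement are illustrative, not the paper's exact configuration.

    # Minimal PyTorch sketch of a Houlsby et al. (2019) adapter: a bottleneck
    # (down-projection, nonlinearity, up-projection) with a residual connection,
    # trained while the surrounding transformer weights stay frozen.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, hidden_size: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_size)
            self.act = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Only the adapter's parameters are trained; the residual path keeps
            # the frozen layer's output intact at initialization.
            return hidden_states + self.up(self.act(self.down(hidden_states)))

    # Usage: wrap each frozen transformer block's output with an adapter.
    hidden = torch.randn(2, 16, 768)      # (batch, seq_len, hidden_size)
    adapter = Adapter(hidden_size=768)
    print(adapter(hidden).shape)          # torch.Size([2, 16, 768])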

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

no code implementations • 15 Dec 2021 • Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro

We then provide the LM with an instruction that consists of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision.
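
The prompt construction described above amounts to string assembly; the sketch below is a minimal illustration in which the instruction wording, bias definition, and label set are placeholders rather than the prompts used in the paper.

    # Sketch of assembling a few-shot instruction prompt for bias detection:
    # labeled exemplars + a definition of bias + the query text, followed by a
    # request for a decision. Wording and labels are illustrative placeholders.

    def build_prompt(exemplars, query, definition):
        lines = [f"Definition: {definition}", ""]
        for text, label in exemplars:
            lines.append(f"Text: {text}\nLabel: {label}\n")
        lines.append(f"Text: {query}\nIs this text biased? Answer yes or no.")
        return "\n".join(lines)

    exemplars = [
        ("<labeled example of a biased statement>", "yes"),
        ("<labeled example of a neutral statement>", "no"),
    ]
    definition = "A statement is biased if it expresses a stereotype about a social group."
    prompt = build_prompt(exemplars, query="<text to classify>", definition=definition)
    print(prompt)   # send this string to the pretrained LM and read its decision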

Case Study: Deontological Ethics in NLP

no code implementations • NAACL 2021 • Shrimai Prabhumoye, Brendon Boldt, Ruslan Salakhutdinov, Alan W. Black

Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices.

Ethics

Exploring Controllable Text Generation Techniques

no code implementations • COLING 2020 • Shrimai Prabhumoye, Alan W. Black, Ruslan Salakhutdinov

In this work, we provide a new schema for the text generation pipeline by organizing the generation process into five modules.

Text Generation

Politeness Transfer: A Tag and Generate Approach

1 code implementation • ACL 2020 • Aman Madaan, Amrith Setlur, Tanmay Parekh, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W. Black, Shrimai Prabhumoye

This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning.

Sentence · Style Transfer · +1
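
The tag-and-generate approach named in the title can be pictured as a two-stage pipeline: first mark the style-carrying spans of the source sentence, then have a generator realize those positions with polite phrasing while keeping the rest intact. The toy sketch below uses hand-written phrase lists in place of the paper's learned tagger and generator.

    # Toy two-stage "tag and generate" sketch: stage 1 replaces style-carrying
    # spans with [TAG:...] placeholders, stage 2 fills each placeholder with
    # polite phrasing. The phrase table stands in for learned models.

    IMPOLITE_PHRASES = {"send me": "could you please send me",
                        "right now": "when you get a chance"}

    def tag(sentence: str) -> str:
        """Stage 1: mark style-carrying spans with placeholders."""
        tagged = sentence
        for phrase in IMPOLITE_PHRASES:
            tagged = tagged.replace(phrase, f"[TAG:{phrase}]")
        return tagged

    def generate(tagged: str) -> str:
        """Stage 2: realize each placeholder with a polite rewrite."""
        out = tagged
        for phrase, polite in IMPOLITE_PHRASES.items():
            out = out.replace(f"[TAG:{phrase}]", polite)
        return out

    src = "send me the report right now"
    print(tag(src))             # [TAG:send me] the report [TAG:right now]
    print(generate(tag(src)))   # could you please send me the report when you get a chance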

Modeling Product Search Relevance in e-Commerce

no code implementations • 14 Jan 2020 • Rahul Radhakrishnan Iyer, Rohan Kohli, Shrimai Prabhumoye

With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping.

Information Retrieval · Retrieval

Generating Interactive Worlds with Text

no code implementations • 20 Nov 2019 • Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston

We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators over other machine-learning-based world construction algorithms.

BIG-bench Machine Learning · Common Sense Reasoning

"My Way of Telling a Story": Persona based Grounded Story Generation

no code implementations • 14 Jun 2019 • Shrimai Prabhumoye, Khyathi Raghavi Chandu, Ruslan Salakhutdinov, Alan W. Black

To this end, we propose five models, each an incremental extension of the baseline model, to perform the task at hand.

Decoder · Visual Storytelling

Towards Content Transfer through Grounded Text Generation

no code implementations • NAACL 2019 • Shrimai Prabhumoye, Chris Quirk, Michel Galley

Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness.

Sentence · Text Generation

A Dataset for Document Grounded Conversations

3 code implementations • EMNLP 2018 • Kangyan Zhou, Shrimai Prabhumoye, Alan W. Black

We define "Document Grounded Conversations" as conversations that are about the contents of a specified document.

Style Transfer Through Multilingual and Feedback-Based Back-Translation

no code implementations • 17 Sep 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Alan W. Black, Ruslan Salakhutdinov

Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content.

Attribute · Sentence · +2

Style Transfer Through Back-Translation

3 code implementations • ACL 2018 • Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W. Black

We first learn a latent representation of the input sentence which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties.

Sentence · Style Transfer · +2
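
The pipeline described in the abstract above can be outlined as follows; all three model calls are placeholders standing in for the paper's trained translation, encoder, and style-specific decoder components.

    # Outline of style transfer through back-translation: translate the input to
    # a pivot language, encode the translation to obtain a style-reduced latent
    # representation grounded in the translation model, then decode with a
    # style-specific generator. All model calls below are placeholders.

    def translate_to_pivot(sentence: str) -> str:
        """Placeholder source -> pivot-language translation model."""
        return f"<pivot translation of: {sentence}>"

    def encode_latent(pivot_sentence: str):
        """Placeholder pivot -> source encoder; returns a meaning-preserving latent."""
        return ("latent", pivot_sentence)

    def decode_with_style(latent, style: str) -> str:
        """Placeholder style-specific decoder conditioned on the latent."""
        return f"<{style} rendering of {latent[1]}>"

    def transfer(sentence: str, target_style: str) -> str:
        pivot = translate_to_pivot(sentence)        # strips source-style surface cues
        z = encode_latent(pivot)                    # grounded, meaning-preserving latent
        return decode_with_style(z, target_style)   # regenerate in the target style

    print(transfer("this is a plain input sentence", target_style="formal"))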
