Search Results for author: Sharan Narang

Found 28 papers, 20 papers with code

Effective Long-Context Scaling of Foundation Models

1 code implementation • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.

Continual Pretraining • Language Modelling
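
A minimal sketch of the continual-pretraining recipe discussed above: start from an existing short-context checkpoint, enlarge the context window and the RoPE base frequency, and keep training on longer sequences instead of pretraining from scratch. The checkpoint name and the specific hyperparameter values below are illustrative placeholders, not the paper's exact settings.

```python
# Sketch only: continue pretraining an existing checkpoint at a longer
# context length. Model name, context length, and rope_theta are placeholders.
from transformers import AutoConfig, AutoModelForCausalLM

base = "meta-llama/Llama-2-7b-hf"        # any short-context causal LM checkpoint
config = AutoConfig.from_pretrained(base)
config.max_position_embeddings = 32768   # extend the context window
config.rope_theta = 500000.0             # larger RoPE base for long-range positions

model = AutoModelForCausalLM.from_pretrained(base, config=config)
# ...then resume the usual causal-LM training loop on 32k-token sequences,
# keeping the original data mix (abundant long text is not required).
```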

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

1 code implementation • 18 Apr 2023 • Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.
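
A rough sketch of the UniMax idea as described in the paper: spread the training character budget as evenly as possible across languages, but cap how many epochs any single language's corpus can be repeated, redistributing the freed-up budget to the remaining (larger) languages. The code below is a paraphrased simplification, not a line-for-line port of the paper's algorithm.

```python
def unimax_sampling_weights(char_counts, total_budget, max_epochs=4):
    """Allocate a character budget across languages (smallest first),
    giving each an equal share of what remains but never more than
    `max_epochs` passes over its own corpus; return sampling weights."""
    langs = sorted(char_counts, key=char_counts.get)  # low-resource first
    remaining = float(total_budget)
    alloc = {}
    for i, lang in enumerate(langs):
        equal_share = remaining / (len(langs) - i)
        alloc[lang] = min(equal_share, max_epochs * char_counts[lang])
        remaining -= alloc[lang]
    total = sum(alloc.values())
    return {lang: alloc[lang] / total for lang in langs}

# Toy example: three "languages" with very skewed corpus sizes (in characters).
print(unimax_sampling_weights({"yo": 1e6, "sw": 1e8, "en": 1e12}, total_budget=1e10))
```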

Character-Aware Models Improve Visual Text Rendering

1 code implementation • 20 Dec 2022 • Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell).

Image Generation
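
For intuition, a WikiSpell-style example asks a text model to reproduce a word's spelling character by character. The prompt wording below is a guess for illustration only, not the paper's exact format.

```python
# Toy construction of a spelling example; the instruction text is hypothetical.
def make_spelling_example(word):
    return {"input": f"Spell the word: {word}", "target": " ".join(word)}

print(make_spelling_example("kaleidoscope"))
# {'input': 'Spell the word: kaleidoscope', 'target': 'k a l e i d o s c o p e'}
```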

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

no code implementations • 24 Oct 2022 • Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

Large language models (LLMs) trained using the next-token-prediction objective, such as GPT-3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks.

Language Modelling • Natural Language Inference • +1
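
The "forgetful" part of the method refers to randomly dropping a fraction of past tokens from the causal attention context during next-token-prediction training, so the model cannot lean too heavily on nearby context. A hedged sketch of such a mask is below; the masking ratio and per-position details are illustrative, not the paper's exact recipe.

```python
import torch

def forgetful_causal_mask(seq_len, forget_ratio=0.15):
    """Boolean attention mask: True means 'may attend'. Starts from the
    standard causal (lower-triangular) mask, then randomly forgets some
    past positions. Illustrative only."""
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    forget = torch.rand(seq_len, seq_len) < forget_ratio
    forget.fill_diagonal_(False)   # never drop the current token
    return causal & ~forget

print(forgetful_causal_mask(6, forget_ratio=0.3))
```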

Understanding HTML with Large Language Models

no code implementations • 8 Oct 2022 • Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust

We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages.

Autonomous Web Navigation • Retrieval
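
As a concrete (and entirely hypothetical) illustration of the first task above, semantic classification of an HTML element can be framed as a text prompt to an LLM. The category names and prompt wording below are not taken from the paper.

```python
# Hypothetical prompt construction for classifying a salient HTML element.
CATEGORIES = ["email input", "password input", "submit button", "search box"]

def classification_prompt(html_snippet, target_element_id):
    return (
        f"Classify the role of the element with id '{target_element_id}' "
        "in the following HTML.\n"
        f"Possible roles: {', '.join(CATEGORIES)}.\n\n"
        f"{html_snippet}\n\nRole:"
    )

print(classification_prompt('<form><input id="q" type="text"></form>', "q"))
```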

Self-Consistency Improves Chain of Thought Reasoning in Language Models

1 code implementation • 21 Mar 2022 • Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.

Ranked #76 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning • GSM8K • +3
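
The decoding strategy itself is simple: sample several chain-of-thought completions at a nonzero temperature, extract each completion's final answer, and return the most common one, effectively marginalizing over reasoning paths. A minimal sketch follows; `sample_fn` stands in for a temperature-sampled model call, and the answer-extraction regex assumes a "The answer is X" style prompt, both of which are assumptions on my part.

```python
import re
from collections import Counter

def self_consistency(prompt, sample_fn, n_samples=20):
    """Majority-vote over final answers from several sampled reasoning paths.
    `sample_fn(prompt)` is a placeholder for a stochastic LLM call."""
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)
        match = re.search(r"answer is\s*(-?[\d][\d,\.]*)", completion, re.IGNORECASE)
        if match:
            answers.append(match.group(1).rstrip("."))
    return Counter(answers).most_common(1)[0][0] if answers else None
```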

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Do Transformer Modifications Transfer Across Implementations and Applications?

1 code implementation • EMNLP 2021 • Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.

On Task-Level Dialogue Composition of Generative Transformer Model

1 code implementation • EMNLP (insights) 2020 • Prasanna Parthasarathi, Arvind Neelakantan, Sharan Narang

In this work, we begin by studying the effect of training on human-human task-oriented dialogues towards improving the ability to compose multiple tasks on Transformer generative models.

Response Generation • Task-Oriented Dialogue Systems

WT5?! Training Text-to-Text Models to Explain their Predictions

2 code implementations • 30 Apr 2020 • Sharan Narang, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, Karishma Malkan

Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction.
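
WT5 reuses the text-to-text format: prepending "explain" to the task prefix asks the model to append a natural-language rationale after its label. The sketch below shows how such training examples might be laid out; the exact prefixes and the sample text are illustrative, patterned on the paper's description rather than copied from it.

```python
# Illustrative WT5-style input/target pairs for sentiment classification.
def wt5_example(review, label, explanation=None):
    if explanation is None:                      # ordinary supervised example
        return {"input": f"sentiment: {review}", "target": label}
    return {                                     # example with a rationale
        "input": f"explain sentiment: {review}",
        "target": f"{label} explanation: {explanation}",
    }

print(wt5_example("The acting was wonderful.", "positive",
                  "the reviewer praises the acting"))
```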

Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

1 code implementation • 31 Oct 2019 • Arvind Neelakantan, Semih Yavuz, Sharan Narang, Vishaal Prasad, Ben Goodrich, Daniel Duckworth, Chinnadhurai Sankar, Xifeng Yan

In this paper, we develop Neural Assistant: a single neural network model that takes conversation history and an external knowledge source as input and jointly produces both text response and action to be taken by the system as output.

Response Generation • Retrieval • +1

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

50 code implementations • arXiv 2019 • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

Answer Generation • Common Sense Reasoning • +11
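
The text-to-text framing is easy to try with the released checkpoints. Below is a quick sketch using the community-hosted T5 weights in Hugging Face `transformers`; the paper's own training codebase is separate, so treat this only as a convenient way to exercise the idea.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is plain text in, plain text out, selected by a task prefix.
batch = tokenizer(
    ["translate English to German: The house is wonderful.",
     "summarize: Transfer learning pre-trains a model on a data-rich task "
     "before fine-tuning it on a downstream task."],
    return_tensors="pt", padding=True,
)
outputs = model.generate(**batch, max_new_tokens=40)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```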

A Proposed Hierarchy of Deep Learning Tasks

no code implementations • 27 Sep 2018 • Joel Hestness, Sharan Narang, Newsha Ardalani, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou, Gregory Diamos, Kenneth Church

As the pace of deep learning innovation accelerates, it becomes increasingly important to organize the space of problems by relative difficulty.

Deep Learning Scaling is Predictable, Empirically

no code implementations • 1 Dec 2017 • Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.

Language Modelling • Machine Translation • +3
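
The paper's central empirical observation is that generalization error falls off as a power law in training-set size over the regions it studies, roughly error ≈ a · m^(−b). A tiny sketch of how one might fit such an exponent by log-log regression is below; the data points are made up purely for illustration.

```python
import numpy as np

# Made-up (training-set size, validation error) pairs, for illustration only.
sizes = np.array([1e5, 1e6, 1e7, 1e8])
errors = np.array([0.30, 0.21, 0.15, 0.105])

# error ≈ a * m**(-b): the slope of the log-log fit is -b.
slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
print(f"power-law exponent b ≈ {-slope:.2f}, prefactor a ≈ {np.exp(intercept):.2f}")
```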

Block-Sparse Recurrent Neural Networks

no code implementations • ICLR 2018 • Sharan Narang, Eric Undersander, Gregory Diamos

Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms.

Language Modelling • Machine Translation • +3
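
The key idea is to prune weights in contiguous blocks rather than individually, so hardware can skip whole tiles of work. A one-shot sketch of block-magnitude pruning is below; the paper itself grows sparsity gradually during training (and also explores group lasso regularization), so this is a deliberate simplification.

```python
import numpy as np

def block_prune(weight, block=(16, 16), sparsity=0.9):
    """Zero out entire blocks of a weight matrix, keeping the blocks with
    the largest total magnitude. One-shot simplification of block sparsity."""
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "matrix must tile evenly"
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))    # one score per block
    cutoff = np.quantile(scores, sparsity)      # drop the weakest blocks
    mask = (scores > cutoff)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

pruned = block_prune(np.random.randn(64, 64), block=(16, 16), sparsity=0.75)
print(f"fraction of zero weights: {np.mean(pruned == 0):.2f}")
```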

Exploring Sparsity in Recurrent Neural Networks

1 code implementation • 17 Apr 2017 • Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

Benchmarks show that, using our technique, model size can be reduced by 90% and the speed-up is around 2x to 7x.
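
The paper prunes weights gradually during training, using a magnitude threshold that ramps up over a window of iterations. The schedule below is a simplified stand-in (a linear ramp of the target sparsity) rather than the paper's exact threshold formula.

```python
def target_sparsity(step, start_step, end_step, final_sparsity=0.9):
    """Linearly ramp the fraction of weights to prune between start and end
    steps; a simplified stand-in for the paper's threshold schedule."""
    if step <= start_step:
        return 0.0
    if step >= end_step:
        return final_sparsity
    return final_sparsity * (step - start_step) / (end_step - start_step)

# At each step, weights whose magnitude falls below the current
# target-sparsity quantile would be zeroed and kept at zero thereafter.
for step in (0, 25_000, 50_000, 100_000):
    print(step, round(target_sparsity(step, start_step=20_000, end_step=80_000), 3))
```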
