Search Results for author: Srinivasan Iyer

Found 23 papers, 12 papers with code

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

no code implementations31 Jul 2024 Xi Victoria Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan

Under a 1-trillion-token training budget, the MoMa 1. 4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3. 7x overall, with 2. 6x for text and 5. 2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss.

Causal Inference Language Modelling

Instruction-tuned Language Models are Better Knowledge Learners

1 code implementation20 Feb 2024 Zhengbao Jiang, Zhiqing Sun, Weijia Shi, Pedro Rodriguez, Chunting Zhou, Graham Neubig, Xi Victoria Lin, Wen-tau Yih, Srinivasan Iyer

The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs.

Language Modelling Large Language Model

OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

1 code implementation22 Dec 2022 Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov

To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks.

Language Modelling Meta-Learning +2

Complementary Explanations for Effective In-Context Learning

1 code implementation25 Nov 2022 Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru

Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts, but there has been limited understanding of exactly how these explanations function or why they are effective.

In-Context Learning

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations20 Dec 2021 Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation26 Nov 2021 Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

RECONSIDER: Improved Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering

no code implementations NAACL 2021 Srinivasan Iyer, Sewon Min, Yashar Mehdad, Wen-tau Yih

State-of-the-art Machine Reading Comprehension (MRC) models for Open-domain Question Answering (QA) are typically trained for span selection using distantly supervised positive examples and heuristically retrieved negative examples.

Machine Reading Comprehension Natural Questions +3

EASE: Extractive-Abstractive Summarization with Explanations

no code implementations14 May 2021 Haoran Li, Arash Einolghozati, Srinivasan Iyer, Bhargavi Paranjape, Yashar Mehdad, Sonal Gupta, Marjan Ghazvininejad

Current abstractive summarization systems outperform their extractive counterparts, but their widespread adoption is inhibited by the inherent lack of interpretability.

Abstractive Text Summarization Document Summarization +1

FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation

no code implementations EMNLP 2021 Kushal Lakhotia, Bhargavi Paranjape, Asish Ghoshal, Wen-tau Yih, Yashar Mehdad, Srinivasan Iyer

Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for NLP tasks such as Question Answering (QA) and Fact Verification.

Decoder Fact Verification +2

Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA

no code implementations30 Dec 2020 Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad, Srinivasan Iyer

While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust.

RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering

1 code implementation21 Oct 2020 Srinivasan Iyer, Sewon Min, Yashar Mehdad, Wen-tau Yih

State-of-the-art Machine Reading Comprehension (MRC) models for Open-domain Question Answering (QA) are typically trained for span selection using distantly supervised positive examples and heuristically retrieved negative examples.

Machine Reading Comprehension Natural Questions +3

Efficient One-Pass End-to-End Entity Linking for Questions

3 code implementations EMNLP 2020 Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih

We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass.

Entity Linking Question Answering

DeLighT: Deep and Light-weight Transformer

2 code implementations ICLR 2021 Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi

We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters.

Language Modelling Machine Translation +1

JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

2 code implementations IJCNLP 2019 Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer

Interactive programming with interleaved code snippet cells and natural language markdown is recently gaining popularity in the form of Jupyter notebooks, which accelerate prototyping and collaboration.

Code Generation

Learning Programmatic Idioms for Scalable Semantic Parsing

no code implementations IJCNLP 2019 Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer

Programmers typically organize executable source code using high-level coding patterns or idiomatic structures such as nested loops, exception handlers and recursive blocks, rather than as individual code tokens.

Code Generation Semantic Parsing

Mapping Language to Code in Programmatic Context

1 code implementation EMNLP 2018 Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer

To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class.

Decoder

Neural Semantic Parsing

no code implementations ACL 2018 Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer

Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.

Code Generation Instruction Following +4

Learning to Map Context-Dependent Sentences to Executable Formal Queries

1 code implementation NAACL 2018 Alane Suhr, Srinivasan Iyer, Yoav Artzi

We propose a context-dependent model to map utterances within an interaction to executable formal queries.

Learning a Neural Semantic Parser from User Feedback

no code implementations ACL 2017 Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer

We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, and requires minimal intervention.

SQL Parsing

Cannot find the paper you are looking for? You can Submit a new open access paper.