Search Results for author: Armen Aghajanyan

Found 26 papers, 14 papers with code

DOMINO: A Dual-System for Multi-step Visual Language Reasoning

1 code implementation · 4 Oct 2023 · Peifang Wang, Olga Golovneva, Armen Aghajanyan, Xiang Ren, Muhao Chen, Asli Celikyilmaz, Maryam Fazel-Zarandi

By fine-tuning the System-2 module (LLaMA-2 70B) on only a small amount of multi-step reasoning data, the accuracy of our method is further improved and surpasses the best fully-supervised end-to-end approach by 5.7% and a pipeline approach with FlanPaLM (540B) by 7.5% on a challenging dataset with human-authored questions.

Arithmetic Reasoning, Language Modelling, +2 more

Jointly Training Large Autoregressive Multimodal Models

1 code implementation · 27 Sep 2023 · Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz

In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning.

Image Generation

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

no code implementations · NeurIPS 2023 · Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.

Density Estimation, Language Modelling

Scaling Laws for Generative Mixed-Modal Language Models

no code implementations · 10 Jan 2023 · Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.

BARTSmiles: Generative Masked Language Models for Molecular Representations

1 code implementation · 29 Nov 2022 · Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Zaven Navoyan, Hrant Khachatrian, Armen Aghajanyan

We then quantitatively show that when applied to the molecular domain, the BART objective learns representations that implicitly encode our downstream tasks of interest.

Retrieval-Augmented Multimodal Language Modeling

no code implementations · 22 Nov 2022 · Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).

Caption Generation, Image Captioning, +5 more
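The retrieval step described in this abstract (a retriever fetching relevant text and images from external memory for the generator) can be illustrated with a minimal sketch. It assumes a generic dense retriever that ranks memory items by cosine similarity between made-up embedding vectors; the `retrieve` helper, the memory entries and the vectors are hypothetical and do not reproduce the paper's actual retriever or generator.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Vector, memory: List[Tuple[str, Vector]], k: int = 2) -> List[str]:
    """Hypothetical helper: return the ids of the k memory items most similar to the query."""
    ranked = sorted(memory, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Toy external memory mixing text and image entries; the embeddings are made up.
memory = [
    ("text:eiffel_tower_article", [0.9, 0.1, 0.0]),
    ("image:eiffel_tower_photo", [0.8, 0.2, 0.1]),
    ("text:pasta_recipe", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.1, 0.0]

# A generator would then condition on these items, e.g. by prepending them to its input.
retrieved = retrieve(query, memory, k=2)
print(retrieved)
```

Ranking the whole memory with `sorted` keeps the sketch short; a real system over web-scale memory would use an approximate nearest-neighbour index instead.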

InCoder: A Generative Model for Code Infilling and Synthesis

3 code implementations · 12 Apr 2022 · Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.

Code Generation, Comment Generation, +1 more

CM3: A Causal Masked Multimodal Model of the Internet

no code implementations · 19 Jan 2022 · Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.

Entity Disambiguation, Entity Linking

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

2 code implementations · EMNLP 2021 · Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Ranked #1 on Temporal Action Localization on CrossTask (using extra training data)

Action Segmentation, Long Video Retrieval (Background Removed), +4 more
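The contrastive pre-training idea in the VideoCLIP abstract can be illustrated with a generic symmetric InfoNCE loss, a common form of contrastive objective: matching video-text pairs in a batch are pulled together and all other pairings act as negatives. This is a minimal sketch under stated assumptions (L2-normalized toy embeddings, a hypothetical `symmetric_info_nce` helper, an assumed temperature of 0.1); it does not reproduce how the paper chooses its positive and negative pairs.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Vector, idx: int) -> float:
    """log(softmax(logits)[idx]), computed stably by subtracting the max logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum_exp


def symmetric_info_nce(videos: List[Vector], texts: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical helper: pair i (videos[i], texts[i]) is the positive and every
    other item in the batch is a negative. Embeddings are assumed L2-normalized."""
    n = len(videos)
    sim = [[dot(v, t) / temperature for t in texts] for v in videos]
    # Video-to-text: for video i, pick out the matching text i among all texts (row i).
    v2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-video: for text j, pick out the matching video j among all videos (column j).
    t2v = -sum(log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (v2t + t2v) / 2


# Toy 2-D embeddings (made up): correctly aligned pairs vs. swapped pairs.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_info_nce(videos, aligned_texts)
swapped_loss = symmetric_info_nce(videos, swapped_texts)
print(aligned_loss, swapped_loss)  # the aligned batch gets the much lower loss
```

Averaging both directions makes the objective symmetric, so the video and text encoders are both trained to retrieve each other, which is what allows zero-shot retrieval in either direction.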

RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing

no code implementations · NLP4ConvAI (ACL) 2022 · Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov

While large pre-trained language models accumulate a lot of knowledge in their parameters, it has been demonstrated that augmenting them with non-parametric, retrieval-based memory brings a number of benefits, from accuracy improvements to data efficiency, for knowledge-focused tasks such as question answering.

Question Answering, Retrieval, +1 more

Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog

1 code implementation · NAACL 2021 · Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad

Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models.

Semantic Parsing

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

2 code implementations · ACL 2021 · Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta

Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime.

Ranked #1 on Transfer Learning on Amazon Review Polarity (Structure Aware Intrinsic Dimension metric)

Generalization Bounds, Language Modelling, +3 more

Conversational Semantic Parsing

no code implementations · EMNLP 2020 · Armen Aghajanyan, Jean Maillard, Akshat Shrivastava, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta

In this paper, we propose a semantic representation for such task-oriented conversational systems that can represent concepts such as co-reference and context carryover, enabling comprehensive understanding of queries in a session.

Dialog State Tracking, Semantic Parsing

Better Fine-Tuning by Reducing Representational Collapse

3 code implementations · ICLR 2021 · Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods.

Abstractive Text Summarization, Cross-Lingual Natural Language Inference

Pre-training via Paraphrasing

2 code implementations · NeurIPS 2020 · Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.

Document Summarization, Document Translation, +6 more

Towards Language Agnostic Universal Representations

no code implementations · ACL 2019 · Armen Aghajanyan, Xia Song, Saurabh Tiwary

When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language.


Convolution Aware Initialization

no code implementations · 21 Feb 2017 · Armen Aghajanyan

Initialization of parameters in deep neural networks has been shown to have a big impact on the performance of the networks (Mishkin & Matas, 2015).

Charged Point Normalization: An Efficient Solution to the Saddle Point Problem

no code implementations · 29 Sep 2016 · Armen Aghajanyan

Recently, the view that local minima are the main obstacle in very high-dimensional non-convex optimization has been challenged, and the problem of saddle points has been introduced in its place.

SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks

no code implementations · 21 Sep 2016 · Armen Aghajanyan

In this paper we introduce a new form of regularization that guides the learning problem in a way that reduces over-fitting without sacrificing the capacity of the model.

Gravitational Clustering

1 code implementation · 5 Sep 2015 · Armen Aghajanyan

The downfall of many supervised learning algorithms, such as neural networks, is the inherent need for a large amount of training data.

Clustering, General Classification
