Search Results for author: Anna Rumshisky

Found 59 papers, 11 papers with code

ReLoRA: High-Rank Training Through Low-Rank Updates

3 code implementations 11 Jul 2023 Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky

Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially.

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

1 code implementation 2 Aug 2022 Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.

Causal Language Modeling Common Sense Reasoning +8

Multi-Stream Transformers

1 code implementation 21 Jul 2021 Mikhail Burtsev, Anna Rumshisky

Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer.

Adversarial Decomposition of Text Representation

2 code implementations NAACL 2019 Alexey Romanov, Anna Rumshisky, Anna Rogers, David Donahue

We show that the proposed method is capable of fine-grained controlled change of these aspects of the input sentence.

Sentence

Triad-based Neural Network for Coreference Resolution

1 code implementation COLING 2018 Yuanliang Meng, Anna Rumshisky

We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution.

Clustering coreference-resolution

Injecting Hierarchy with U-Net Transformers

2 code implementations 16 Oct 2019 David Donahue, Vladislav Lialin, Anna Rumshisky

The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks.

Machine Translation

Life after BERT: What do Other Muppets Understand about Language?

1 code implementation ACL 2022 Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky

Existing pre-trained transformer analysis works usually focus only on one or two model families at a time, overlooking the variability of the architecture and pre-training objectives.

Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

1 code implementation ACL 2022 Saurabh Kulshreshtha, Olga Kovaleva, Namrata Shivagunde, Anna Rumshisky

Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle.

Natural Language Understanding Open-Domain Question Answering +1

Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning

1 code implementation 29 Mar 2023 Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky

Finally, we observe that while GPT3 has generated all the examples in ROLE-1500, it is only able to solve 24.6% of them during probing.

In-Context Learning Language Modelling +2

Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale

1 code implementation 26 May 2023 Vijeta Deshpande, Dan Pechi, Shree Thatte, Vladislav Lialin, Anna Rumshisky

The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question of when these abilities begin to emerge largely unanswered.

Language Modelling Masked Language Modeling

Here's My Point: Joint Pointer Architecture for Argument Mining

no code implementations EMNLP 2017 Peter Potash, Alexey Romanov, Anna Rumshisky

One of the major goals in automated argumentation mining is to uncover the argument structure present in argumentative text.

Argument Mining

Forced to Learn: Discovering Disentangled Representations Without Exhaustive Labels

no code implementations 1 May 2017 Alexey Romanov, Anna Rumshisky

Learning a better representation with neural networks is a challenging problem, which was tackled extensively from different perspectives in the past few years.

Clustering

#HashtagWars: Learning a Sense of Humor

no code implementations 9 Dec 2016 Peter Potash, Alexey Romanov, Anna Rumshisky

Our best supervised system achieved 63.7% accuracy, suggesting that this task is much more difficult than comparable humor detection tasks.

Humor Detection

Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting

no code implementations WS 2018 Peter Potash, Alexey Romanov, Anna Rumshisky

The goal of this paper is to develop evaluation methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions of this task.

Text Generation

Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives

no code implementations 16 Oct 2015 Weiyi Sun, Anna Rumshisky, Ozlem Uzuner

We analyze the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis.

Classification General Classification +4

Adversarial Text Generation Without Reinforcement Learning

no code implementations 11 Oct 2018 David Donahue, Anna Rumshisky

This is largely because sequences of text are discrete, and thus gradients cannot propagate from the discriminator to the generator.

Adversarial Text reinforcement-learning +3

SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor

no code implementations SEMEVAL 2017 Peter Potash, Alexey Romanov, Anna Rumshisky

This paper describes a new shared task for humor understanding that attempts to eschew the ubiquitous binary approach to humor detection and focus on comparative humor ranking instead.

Humor Detection

Towards Debate Automation: a Recurrent Model for Predicting Debate Winners

no code implementations EMNLP 2017 Peter Potash, Anna Rumshisky

In this paper we introduce a practical first step towards the creation of an automated debate agent: a state-of-the-art recurrent predictive model for predicting debate winners.

Text Generation

Tracking Bias in News Sources Using Social Media: the Russia-Ukraine Maidan Crisis of 2013-2014

no code implementations WS 2017 Peter Potash, Alexey Romanov, Anna Rumshisky, Mikhail Gronas

We show that on the task of predicting which side is likely to prefer a given article, a Naive Bayes classifier can record 90.3% accuracy looking only at domain names of the news sources.

RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

no code implementations COLING 2018 Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, Alex Gribov

This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages.

Active Learning General Classification +2

Forced Apart: Discovering Disentangled Representations Without Exhaustive Labels

no code implementations ICLR 2018 Alexey Romanov, Anna Rumshisky

Learning a better representation with neural networks is a challenging problem, which has been tackled from different perspectives in the past few years.

Clustering

Word Sense Inventories by Non-Experts.

no code implementations LREC 2012 Anna Rumshisky, Nick Botchan, Sophie Kushkuley, James Pustejovsky

In this paper, we explore different strategies for implementing a crowdsourcing methodology for a single-step construction of an empirically-derived sense inventory and the corresponding sense-annotated corpus.

Word Sense Disambiguation

What's in a Name? Reducing Bias in Bios without Access to Protected Attributes

no code implementations NAACL 2019 Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Tauman Kalai

In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name.

Word Embeddings

Revealing the Dark Secrets of BERT

no code implementations IJCNLP 2019 Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky

BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success.

Solving Math Word Problems with Double-Decoder Transformer

no code implementations 28 Aug 2019 Yuanliang Meng, Anna Rumshisky

This paper proposes a Transformer-based model to generate equations for math word problems.

Math reinforcement-learning +1

NarrativeTime: Dense Temporal Annotation on a Timeline

no code implementations 29 Aug 2019 Anna Rogers, Marzena Karpinska, Ankita Gupta, Vladislav Lialin, Gregory Smelkov, Anna Rumshisky

For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated.

Chunking

Memory-Augmented Recurrent Networks for Dialogue Coherence

no code implementations 16 Oct 2019 David Donahue, Yuanliang Meng, Anna Rumshisky

The first design features a sequence-to-sequence architecture with two separate NTM modules, one for each participant in the conversation.

Language Modelling

Calls to Action on Social Media: Detection, Social Impact, and Censorship Potential

no code implementations WS 2019 Anna Rogers, Olga Kovaleva, Anna Rumshisky

Calls to action on social media are known to be effective means of mobilization in social movements, and a frequent target of censorship.

A Primer in BERTology: What we know about how BERT works

no code implementations 27 Feb 2020 Anna Rogers, Olga Kovaleva, Anna Rumshisky

Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited.

When BERT Plays the Lottery, All Tickets Are Winning

no code implementations EMNLP 2020 Sai Prasanna, Anna Rogers, Anna Rumshisky

Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers.

Towards Visual Dialog for Radiology

no code implementations WS 2020 Olga Kovaleva, Chaitanya Shivade, Satyananda Kashyap, Karina Kanjaria, Joy Wu, Deddeh Ballah, Adam Coy, Alexandros Karargyris, Yufan Guo, David Beymer, Anna Rumshisky, Vandana Mukherjee

Using MIMIC-CXR, an openly available database of chest X-ray images, we construct both a synthetic and a real-world dataset and provide baseline scores achieved by state-of-the-art models.

Question Answering Visual Dialog +1

Update Frequently, Update Fast: Retraining Semantic Parsing Systems in a Fraction of Time

no code implementations 15 Oct 2020 Vladislav Lialin, Rahul Goel, Andrey Simanovsky, Anna Rumshisky, Rushin Shah

To reduce training time, one can fine-tune the previously trained model on each patch, but naive fine-tuning exhibits catastrophic forgetting: degradation of the model's performance on data not represented in the data patch.

Continual Learning Goal-Oriented Dialogue Systems +1

An Efficient DP-SGD Mechanism for Large Scale NLP Models

no code implementations 14 Jul 2021 Christophe Dupuy, Radhika Arava, Rahul Gupta, Anna Rumshisky

However, the data used to train NLU models may contain private information such as addresses or phone numbers, particularly when drawn from human subjects.

Natural Language Understanding Privacy Preserving

A guide to the dataset explosion in QA, NLI, and commonsense reasoning

no code implementations COLING 2020 Anna Rogers, Anna Rumshisky

Question answering, natural language inference and commonsense reasoning are increasingly popular as general NLP system benchmarks, driving both modeling and dataset work.

Natural Language Inference Question Answering

Federated Learning with Noisy User Feedback

no code implementations NAACL 2022 Rahul Sharma, Anil Ramakrishna, Ansel MacLaughlin, Anna Rumshisky, Jimit Majmudar, Clement Chung, Salman Avestimehr, Rahul Gupta

Federated learning (FL) has recently emerged as a method for training ML models on edge devices using sensitive user data and is seen as a way to mitigate concerns over data privacy.

Federated Learning text-classification +1

Controlled Data Generation via Insertion Operations for NLU

no code implementations NAACL (ACL) 2022 Manoj Kumar, Yuval Merhav, Haidar Khan, Rahul Gupta, Anna Rumshisky, Wael Hamza

Use of synthetic data is rapidly emerging as a realistic alternative to manually annotating live traffic for industry-scale model building.

intent-classification Intent Classification +4

On Task-Adaptive Pretraining for Dialogue Response Selection

no code implementations 8 Oct 2022 Tzu-Hsiang Lin, Ta-Chung Chi, Anna Rumshisky

Recent advancements in dialogue response selection (DRS) are based on the task-adaptive pre-training (TAP) approach: the model is first initialized with BERT (Devlin et al., 2019) and then adapted to dialogue data with dialogue-specific or fine-grained pre-training tasks.

Reasoning Circuits: Few-shot Multihop Question Generation with Structured Rationales

no code implementations 15 Nov 2022 Saurabh Kulshreshtha, Anna Rumshisky

Multi-hop Question Generation is the task of generating questions which require the reader to reason over and combine information spread across multiple passages using several reasoning steps.

Language Modelling Question Generation +1

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

no code implementations 28 Mar 2023 Vladislav Lialin, Vijeta Deshpande, Anna Rumshisky

This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023.

Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

no code implementations 4 Apr 2023 Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza

The currently popular video-text data mining approach via automatic speech recognition (ASR), used in HowTo100M, provides low-quality captions that often do not refer to the video content.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

no code implementations 14 Jun 2023 Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza

(2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training, we can match task performance of a from-scratch seq2seq model.

Language Modelling Masked Language Modeling

Let's Reinforce Step by Step

no code implementations 10 Nov 2023 Sarah Pan, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky

While recent advances have boosted LM proficiency in linguistic benchmarks, LMs consistently struggle to reason correctly on complex tasks like mathematics.

GSM8K Logical Reasoning +2

Prompt Perturbation Consistency Learning for Robust Language Models

no code implementations 24 Feb 2024 Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models.

Data Augmentation intent-classification +6
