Search Results for author: Jonathan May

Found 84 papers, 31 papers with code

Mega: Moving Average Equipped Gated Attention

5 code implementations • 21 Sep 2022 • Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.

Ranked #1 on Long-range modeling on LRA

Image Classification Inductive Bias +3

Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics

2 code implementations • 15 Dec 2021 • Hyundong Cho, Chinnadhurai Sankar, Christopher Lin, Kaushik Ram Sadagopan, Shahin Shayandeh, Asli Celikyilmaz, Jonathan May, Ahmad Beirami

Recent works that revealed the vulnerability of dialogue state tracking (DST) models to distributional shifts have made holistic comparisons on robustness and qualitative analyses increasingly important for understanding their relative performance.

Ranked #4 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.1 (using extra training data)

Dialogue State Tracking Multi-domain Dialogue State Tracking +1

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

Transfer Learning for Low-Resource Neural Machine Translation

1 code implementation • EMNLP 2016 • Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight

Ensembling and unknown word replacement add another 2 BLEU points, which brings the NMT performance on low-resource machine translation close to a strong syntax-based machine translation (SBMT) system, exceeding its performance on one language pair.

Low-Resource Neural Machine Translation NMT +2

Luna: Linear Unified Nested Attention

2 code implementations • NeurIPS 2021 • Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length.

Language Modelling Machine Translation +2

WARP: Word-level Adversarial ReProgramming

1 code implementation • ACL 2021 • Karen Hambardzumyan, Hrant Khachatrian, Jonathan May

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks.

Language Modelling Transfer Learning +1

Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation

1 code implementation • EMNLP 2021 • Mozhdeh Gheini, Xiang Ren, Jonathan May

We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch.

Machine Translation Transfer Learning +1

CaSiNo: A Corpus of Campsite Negotiation Dialogues for Automatic Negotiation Systems

1 code implementation • NAACL 2021 • Kushal Chawla, Jaysa Ramirez, Rene Clever, Gale Lucas, Jonathan May, Jonathan Gratch

Automated systems that negotiate with humans have broad applications in pedagogy and conversational AI.

Multi-Task Learning Persuasion Strategies

NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge

1 code implementation • 14 Jun 2022 • Alexander Spangher, Xiang Ren, Jonathan May, Nanyun Peng

News article revision histories provide clues to narrative and factual evolution in news articles.

NewsEdits: A News Article Revision Dataset and a Novel Document-Level Reasoning Challenge

1 code implementation • NAACL 2022 • Alexander Spangher, Xiang Ren, Jonathan May, Nanyun Peng

News article revision histories provide clues to narrative and factual evolution in news articles.

X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural Language Understanding and Question Answering

1 code implementation • NAACL 2021 • Meryem M'hamdi, Doo Soon Kim, Franck Dernoncourt, Trung Bui, Xiang Ren, Jonathan May

We extensively evaluate our framework on two challenging cross-lingual NLU tasks: multilingual task-oriented dialog and typologically diverse question answering.

Meta-Learning Natural Language Understanding +4

RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation

1 code implementation • 12 Jun 2023 • Shuai Liu, Hyundong J. Cho, Marjorie Freedman, Xuezhe Ma, Jonathan May

Endowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge.

Response Generation Retrieval

Grounding Conversations with Improvised Dialogues

1 code implementation • ACL 2020 • Hyundong Cho, Jonathan May

Effective dialogue involves grounding, the process of establishing mutual knowledge that is essential for communication between people.

Many-to-English Machine Translation Tools, Data, and Pretrained Models

2 code implementations • ACL 2021 • Thamme Gowda, Zhao Zhang, Chris A Mattmann, Jonathan May

While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages.

Machine Translation Transfer Learning +1

A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages

1 code implementation • NAACL 2019 • Ronald Cardenas, Ying Lin, Heng Ji, Jonathan May

We also show extrinsically that incorporating our POS tagger into a name tagger leads to state-of-the-art tagging performance in Sinhalese and Kinyarwanda, two languages with nearly no labeled POS data available.

Clustering Decipherment +4

WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models

1 code implementation • 26 Jun 2023 • Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May

We present WinoQueer: a benchmark specifically designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community.

Finding the Optimal Vocabulary Size for Neural Machine Translation

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Thamme Gowda, Jonathan May

We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both classification and autoregression components.

Classification General Classification +3

Opponent Modeling in Negotiation Dialogues by Related Data Adaptation

1 code implementation • Findings (NAACL) 2022 • Kushal Chawla, Gale M. Lucas, Jonathan May, Jonathan Gratch

A practical model for this task needs to infer these priorities of the opponent on the fly based on partial dialogues as input, without needing additional annotations for training.

Cross-lingual Lifelong Learning

1 code implementation • 23 May 2022 • Meryem M'hamdi, Xiang Ren, Jonathan May

The longstanding goal of multi-lingual learning has been to develop a universal cross-lingual model that can withstand the changes in multi-lingual data distributions.

Continual Learning Transfer Learning

Continual Dialogue State Tracking via Example-Guided Question Answering

1 code implementation • 23 May 2023 • Hyundong Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Raghavi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar

Dialogue systems are frequently updated to accommodate new services, but naively updating them by continually training with data for new services results in diminishing performance on previously learnt services.

Continual Learning Dialogue State Tracking +3

Comprehensible Context-driven Text Game Playing

2 code implementations • 6 May 2019 • Xusen Yin, Jonathan May

As such, an LSTM-based DQN can take tens of days to finish the training process.

Q-Learning

Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Bandit Feedback to Learn Families of Text-Based Adventure Games

1 code implementation • 13 Aug 2019 • Xusen Yin, Jonathan May

We consider the task of learning to play families of text-based computer adventure games, i.e., fully textual environments with a common theme (e.g., cooking) and goal (e.g., prepare a meal from a recipe) but with different specifics; new instances of such games are relatively straightforward for humans to master after a brief exposure to the genre but have been curiously difficult for computer agents to learn.

Common Sense Reasoning Q-Learning

Challenges in Context-Aware Neural Machine Translation

1 code implementation • 23 May 2023 • Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma

Context-aware neural machine translation involves leveraging information beyond sentence-level context to resolve inter-sentential discourse dependencies and improve document-level translation quality, and has given rise to a number of recent techniques.

Machine Translation Sentence +1

Experience Grounds Language

2 code implementations • EMNLP 2020 • Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.

Representation Learning

Macro-Average: Rare Types Are Important Too

1 code implementation • NAACL 2021 • Thamme Gowda, Weiqiu You, Constantine Lignos, Jonathan May

While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy.

Cross-Lingual Information Retrieval Machine Translation +2

"Don't quote me on that": Finding Mixtures of Sources in News Articles

1 code implementation • 19 Apr 2021 • Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

Journalists publish statements provided by people, or sources, to contextualize current events, help voters make informed decisions, and hold powerful individuals accountable.

Clustering

Identifying Informational Sources in News Articles

1 code implementation • 24 May 2023 • Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

News articles are driven by the informational sources journalists use in reporting.

Text Generation

Learning to Generalize for Sequential Decision Making

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Xusen Yin, Ralph Weischedel, Jonathan May

However, the large amount of computation necessary to adequately train and explore the search space of sequential decision making, under a reinforcement learning paradigm, precludes the inclusion of large contextualized language models, which might otherwise enable the desired generalization ability.

Imitation Learning Natural Language Understanding +2

Machine Translation Robustness to Natural Asemantic Variation

1 code implementation • 25 May 2022 • Jacob Bremerman, Xiang Ren, Jonathan May

We find that existing MT models fail when presented with NAV data, but we demonstrate strategies to improve performance on NAV by fine-tuning them with human-generated variations.

Machine Translation Translation

Authorship Style Transfer with Policy Optimization

1 code implementation • 12 Mar 2024 • Shuai Liu, Shantanu Agarwal, Jonathan May

Authorship style transfer aims to rewrite a given text into a specified target while preserving the original meaning in the source.

Style Transfer Transfer Learning

Recurrent Neural Networks as Weighted Language Recognizers

no code implementations • NAACL 2018 • Yining Chen, Sorcha Gilroy, Andreas Maletti, Jonathan May, Kevin Knight

We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages.

Building a Fine-Grained Entity Typing System Overnight for a New X (X = Language, Domain, Genre)

no code implementations • 10 Mar 2016 • Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji

Recent research has shown great progress on fine-grained entity typing.

Clustering Entity Typing

Using Syntax-Based Machine Translation to Parse English into Abstract Meaning Representation

no code implementations • 24 Apr 2015 • Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, Jonathan May

To make this work, we transform the AMR structure into a form suitable for the mechanics of SBMT and useful for modeling.

Language Modelling Machine Translation +1

Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words

no code implementations • 16 Aug 2018 • Nelson F. Liu, Jonathan May, Michael Pust, Kevin Knight

Most statistical machine translation systems cannot translate words that are unseen in the training data.

Machine Translation Translation

Out-of-the-box Universal Romanization Tool uroman

no code implementations • ACL 2018 • Ulf Hermjakob, Jonathan May, Kevin Knight

We present uroman, a tool for converting text in myriads of languages and scripts such as Chinese, Arabic and Cyrillic into a common Latin-script representation.

Machine Translation

Translating a Language You Don't Know In the Chinese Room

no code implementations • ACL 2018 • Ulf Hermjakob, Jonathan May, Michael Pust, Kevin Knight

In a corruption of John Searle's famous AI thought experiment, the Chinese Room (Searle, 1980), we twist its original intent by enabling humans to translate text, e.g., from Uyghur to English, even if they don't have any prior knowledge of the source language.

Domain Adaptation Language Modelling +3

Cross-lingual Name Tagging and Linking for 282 Languages

no code implementations • ACL 2017 • Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, Heng Ji

The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia.

Translation Word Translation

ELISA-EDL: A Cross-lingual Entity Extraction, Linking and Localization System

no code implementations • NAACL 2018 • Boliang Zhang, Ying Lin, Xiaoman Pan, Di Lu, Jonathan May, Kevin Knight, Heng Ji

We demonstrate ELISA-EDL, a state-of-the-art re-trainable system to extract entity mentions from low-resource languages, link them to external English knowledge bases, and visualize locations related to disaster topics on a world heatmap.

Entity Extraction using GAN Entity Linking +1

Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies

no code implementations • NAACL 2016 • Barret Zoph, Ashish Vaswani, Jonathan May, Kevin Knight

Language Modelling Machine Translation +2

SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation

no code implementations • SEMEVAL 2017 • Jonathan May, Jay Priyadarshi

In the generation subtask, participants were asked to generate English sentences given AMR graphs in the news/forum domain.

AMR Parsing Machine Translation

SemEval-2016 Task 8: Meaning Representation Parsing

no code implementations • SEMEVAL 2016 • Jonathan May

Towards Controllable Story Generation

no code implementations • WS 2018 • Nanyun Peng, Marjan Ghazvininejad, Jonathan May, Kevin Knight

We present a general framework of analyzing existing story corpora to generate controllable and creative new stories.

Story Generation

Models of Translation Competitions

no code implementations • ACL 2013 • Mark Hopkins, Jonathan May

Machine Translation Translation

Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation

no code implementations • EMNLP 2015 • Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, Jonathan May

Ranked #8 on AMR Parsing on LDC2014T12

AMR Parsing Language Modelling +2

High-Precision Abductive Mapping of Multilingual Metaphors

no code implementations • WS 2015 • Jonathan Gordon, Jerry Hobbs, Jonathan May, Fabrizio Morbini

Vocal Bursts Intensity Prediction

A Corpus of Rich Metaphor Annotation

no code implementations • WS 2015 • Jonathan Gordon, Jerry Hobbs, Jonathan May, Michael Mohler, Fabrizio Morbini, Bryan Rink, Marc Tomlinson, Suzanne Wertheim

An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output

no code implementations • LREC 2012 • Daniele Pighin, Lluís Màrquez, Jonathan May

We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net.

Machine Translation Translation

Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging

no code implementations • NAACL 2019 • Lifu Huang, Heng Ji, Jonathan May

We focus on improving name tagging for low-resource languages using annotations from related languages.

Cross-Lingual Transfer Sentence

Extracting Structured Scholarly Information from the Machine Translation Literature

no code implementations • LREC 2016 • Eunsol Choi, Matic Horvat, Jonathan May, Kevin Knight, Daniel Marcu

Understanding the experimental results of a scientific paper is crucial to understanding its contribution and to comparing it with related work.

Machine Translation Reading Comprehension +1

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

no code implementations • ACL 2019 • Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May

Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation.

Sentence Translation +2

SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage

no code implementations • ACL 2019 • Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, Constantine Lignos, Chester Palen-Michel, Michael Pust, Banriskhem Kayang Khonglah, Srikanth Madikeri, Jonathan May, Scott Miller

In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed.

Cross-Lingual Information Retrieval Machine Translation +2

What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis

no code implementations • IJCNLP 2019 • Xiaolei Huang, Jonathan May, Nanyun Peng

While recent work has shown promising results on cross-lingual transfer from high-resource languages to low-resource languages, it is unclear what knowledge is transferred.

Cross-Lingual NER named-entity-recognition +3

A Universal Parent Model for Low-Resource Neural Machine Translation Transfer

no code implementations • 14 Sep 2019 • Mozhdeh Gheini, Jonathan May

In this work, we present a 'universal' pre-trained neural parent model with constant vocabulary that can be used as a starting point for training practically any new low-resource language to a fixed target language.

Humanitarian Low-Resource Neural Machine Translation +2

Cross-lingual Joint Entity and Word Embedding to Improve Entity Linking and Parallel Sentence Mining

no code implementations • WS 2019 • Xiaoman Pan, Thamme Gowda, Heng Ji, Jonathan May, Scott Miller

Because this multilingual common space directly relates the semantics of contextual words in the source language to that of entities in the target language, we leverage it for unsupervised cross-lingual entity linking.

Cross-Lingual Entity Linking Entity Linking +1

Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources

no code implementations • CONLL 2019 • Meryem M'hamdi, Marjorie Freedman, Jonathan May

Our work is the first to experiment with two event architecture variants in a cross-lingual setting, to show the effectiveness of contextualized embeddings obtained using BERT, and to explore and analyze its performance on Arabic.

Event Extraction Transfer Learning

Cross-lingual Structure Transfer for Relation and Event Extraction

no code implementations • IJCNLP 2019 • Ananya Subburathinam, Di Lu, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare Voss

The identification of complex semantic structures such as events and entity relations, already a challenging Information Extraction task, is doubly difficult from sources written in under-resourced and under-annotated languages.

Event Extraction Relation +1

Do Nuclear Submarines Have Nuclear Captains? A Challenge Dataset for Commonsense Reasoning over Adjectives and Objects

no code implementations • IJCNLP 2019 • James Mullenbach, Jonathan Gordon, Nanyun Peng, Jonathan May

This provides evidence that the amount of commonsense knowledge encoded in these language models does not extend far beyond that already baked into the word embeddings.

Word Embeddings

Exploring Early Prediction of Buyer-Seller Negotiation Outcomes

no code implementations • 6 Apr 2020 • Kushal Chawla, Gale Lucas, Jonathan May, Jonathan Gratch

Agents that negotiate with humans find broad applications in pedagogy and conversational AI.

Language Modelling Sentence

Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics

no code implementations • 6 Apr 2020 • Xusen Yin, Jonathan May

Reinforcement learning algorithms such as Q-learning have shown great promise in training models to learn the optimal action to take for a given system state, a goal in applications with an exploratory or adversarial nature such as task-oriented dialogues or games.

Clustering Q-Learning +2

Cross-lingual Structure Transfer for Zero-resource Event Extraction

no code implementations • LREC 2020 • Di Lu, Ananya Subburathinam, Heng Ji, Jonathan May, Shih-Fu Chang, Avi Sil, Clare Voss

Most of the current cross-lingual transfer learning methods for Information Extraction (IE) have been only applied to name tagging.

Cross-Lingual Transfer Event Extraction +2

Summary-Oriented Question Generation for Informational Queries

no code implementations • ACL (dialdoc) 2021 • Xusen Yin, Li Zhou, Kevin Small, Jonathan May

Our model shows SOTA performance of SQ generation on the NQ dataset (20.1 BLEU-4).

Natural Questions Question Answering +2

Enabling Low-Resource Transfer Learning across COVID-19 Corpora by Combining Event-Extraction and Co-Training

no code implementations • ACL 2020 • Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

Event Extraction Transfer Learning

Connecting the Dots: Event Graph Schema Induction with Path Language Modeling

no code implementations • EMNLP 2020 • Manling Li, Qi Zeng, Ying Lin, Kyunghyun Cho, Heng Ji, Jonathan May, Nathanael Chambers, Clare Voss

Event schemas can guide our understanding and ability to make predictions with respect to what might happen next.

Language Modelling

Can Sequence-to-Sequence Models Crack Substitution Ciphers?

no code implementations • ACL 2021 • Nada Aldarrab, Jonathan May

Decipherment of historical ciphers is a challenging problem.

Decipherment Language Identification +1

Multitask Learning for Class-Imbalanced Discourse Classification

no code implementations • 2 Jan 2021 • Alexander Spangher, Jonathan May, Sz-Rung Shiang, Lingjia Deng

Small class-imbalanced datasets, common in many high-level semantic tasks like discourse analysis, present a particular challenge to current deep-learning architectures.

Classification General Classification +1

NewsEdits: A Dataset of Revision Histories for News Articles (Technical Report: Data Processing)

no code implementations • 19 Apr 2021 • Alexander Spangher, Jonathan May

In this work, we present, to our knowledge, the first publicly available dataset of news article revision histories, or NewsEdits.

Modeling "Newsworthiness" for Lead-Generation Across Corpora

no code implementations • 19 Apr 2021 • Alexander Spangher, Nanyun Peng, Jonathan May, Emilio Ferrara

Journalists obtain "leads", or story ideas, by reading large corpora of government records: court cases, proposed bills, etc.

StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning

no code implementations • 20 Apr 2021 • Alexander Spangher, Jonathan May

In this work, we create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text.

Viola: A Topic Agnostic Generate-and-Rank Dialogue System

no code implementations • 25 Aug 2021 • Hyundong Cho, Basel Shbita, Kartik Shenoy, Shuai Liu, Nikhil Patel, Hitesh Pindikanti, Jennifer Lee, Jonathan May

We present Viola, an open-domain dialogue system for spoken conversation that uses a topic-agnostic dialogue manager based on a simple generate-and-rank approach.

Salience-Aware Event Chain Modeling for Narrative Understanding

no code implementations • EMNLP 2021 • Xiyang Zhang, Muhao Chen, Jonathan May

Storytelling, whether via fables, news reports, documentaries, or memoirs, can be thought of as the communication of interesting and related events that, taken together, form a concrete process.

Question Answering

Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification

no code implementations • EMNLP 2021 • Alexander Spangher, Jonathan May, Sz-Rung Shiang, Lingjia Deng

As labeling schemas evolve over time, small differences can render datasets following older schemas unusable.

Ranked #1 on Text Classification on NewsDiscourse

Text Classification

Explaining Face Presentation Attack Detection Using Natural Language

no code implementations • 8 Nov 2021 • Hengameh Mirzaalian, Mohamed E. Hussein, Leonidas Spinoulas, Jonathan May, Wael Abd-Almageed

Due to the limited amount of annotated data in our study, we apply a light-weight LSTM network as our natural language generation model.

Face Presentation Attack Detection Language Modelling +2

Segmenting Numerical Substitution Ciphers

no code implementations • 25 May 2022 • Nada Aldarrab, Jonathan May

In this work, we propose the first automatic methods to segment those ciphers using Byte Pair Encoding (BPE) and unigram language models.

Language Modelling Segmentation

Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning

no code implementations • 25 May 2022 • Mozhdeh Gheini, Xuezhe Ma, Jonathan May

A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen.

Cross-Lingual NER Language Modelling +3

Investigating the Benefits of Free-Form Rationales

no code implementations • 25 May 2022 • Jiao Sun, Swabha Swayamdipta, Jonathan May, Xuezhe Ma

After controlling for instances where rationales leak the correct answer while not providing additional background knowledge, we find that incorporating only 5% of rationales during training can boost model performance by 47.22% for CoS-E and 57.14% for ECQA during inference.

Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models

no code implementations • 23 Jun 2022 • Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May

This paper presents exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT.

Bias Detection

Augmenting Training Data for Massive Semantic Matching Models in Low-Traffic E-commerce Stores

no code implementations • NAACL (ACL) 2022 • Ashutosh Joshi, Shankar Vishwanath, Choon Teo, Vaclav Petricek, Vishy Vishwanathan, Rahul Bhagat, Jonathan May

We use the Aggregated Label eXtreme Multi-label Classification (AL-XMC) system (Shen et al., 2020) as an example semantic matching model and show via crowd-sourced human judgments that, when the training data is augmented through query reformulations, the quality of AL-XMC improves over a baseline that does not use query reformulation.

Extreme Multi-Label Classification

Checks and Strategies for Enabling Code-Switched Machine Translation

no code implementations • 11 Oct 2022 • Thamme Gowda, Mozhdeh Gheini, Jonathan May

Code-switching is a common phenomenon among multilingual speakers, where alternation between two or more languages occurs within the context of a single conversation.

Data Augmentation Machine Translation +2

Anger Breeds Controversy: Analyzing Controversy and Emotions on Reddit

no code implementations • 1 Dec 2022 • Kai Chen, Zihao He, Rong-Ching Chang, Jonathan May, Kristina Lerman

We collect discussions from a wide variety of topical forums and use emotion detection to recognize a range of emotions from text, including anger, fear, joy, admiration, etc.

CPL-NoViD: Context-Aware Prompt-based Learning for Norm Violation Detection in Online Communities

1 code implementation • 16 May 2023 • Zihao He, Jonathan May, Kristina Lerman

Detecting norm violations in online communities is critical to maintaining healthy and safe spaces for online discussions.

Few-Shot Learning

Analyzing Norm Violations in Live-Stream Chat

no code implementations • 18 May 2023 • Jihyung Moon, Dong-Ho Lee, Hyundong Cho, Woojeong Jin, Chan Young Park, Minwoo Kim, Jonathan May, Jay Pujara, Sungjoon Park

Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter.

Multilingual Sentence-Level Semantic Search using Meta-Distillation Learning

no code implementations • 15 Sep 2023 • Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon

Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner.

Sentence

Tracking the Newsworthiness of Public Documents

no code implementations • 16 Nov 2023 • Alexander Spangher, Emilio Ferrara, Ben Welsh, Nanyun Peng, Serdar Tumgoren, Jonathan May

Journalists must find stories in huge amounts of textual data (e.g., leaks, bills, press releases) as part of their jobs: determining when and why text becomes news can help us understand coverage patterns and help us build assistive tools.

Retrieval

Can Language Model Moderators Improve the Health of Online Discourse?

no code implementations • 16 Nov 2023 • Hyundong Cho, Shuai Liu, Taiwei Shi, Darpan Jain, Basem Rizk, YuYang Huang, Zixun Lu, Nuan Wen, Jonathan Gratch, Emilio Ferrara, Jonathan May

Human moderation of online conversation is essential to maintaining civility and focus in a dialogue, but is challenging to scale and harmful to moderators.

Language Modelling Text Generation
