Search Results for author: Jonathan K. Kummerfeld

Found 41 papers, 21 papers with code

Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness

no code implementations • EMNLP 2020 • Stefan Larson, Anthony Zheng, Anish Mahendran, Rishi Tekriwal, Adrian Cheung, Eric Guldan, Kevin Leach, Jonathan K. Kummerfeld

Diverse data is crucial for training robust models, but crowdsourced text often lacks diversity as workers tend to write simple variations from prompts.

Intent Classification +3

Supporting Sensemaking of Large Language Model Outputs at Scale

no code implementations • 24 Jan 2024 • Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman

Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability.

Language Modelling Large Language Model

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

1 code implementation • 3 Jan 2024 • Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become "aligned", thus making it difficult to explain phenomena like jailbreaks.

Language Modelling

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

1 code implementation • 12 May 2023 • Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries.

Text-To-SQL

Augmenting Task-Oriented Dialogue Systems with Relation Extraction

no code implementations • 24 Oct 2022 • Andrew Lee, Zhenguo Chen, Kevin Leach, Jonathan K. Kummerfeld

The standard task-oriented dialogue pipeline uses intent classification and slot-filling to interpret user utterances.

Intent Classification +6

Using Paraphrases to Study Properties of Contextual Embeddings

no code implementations • NAACL 2022 • Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT.

Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks

no code implementations • EMNLP (NLP4ConvAI) 2021 • Janarthanan Rajendran, Jonathan K. Kummerfeld, Satinder Singh

For each goal-oriented dialog task of interest, large amounts of data need to be collected for end-to-end learning of a neural dialog system.

Goal-Oriented Dialog Meta-Learning

Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health

1 code implementation • Findings (EMNLP) 2021 • Andrew Lee, Jonathan K. Kummerfeld, Lawrence C. An, Rada Mihalcea

Many statistical models have high accuracy on test benchmarks, but are not explainable, struggle in low-resource scenarios, cannot be reused for multiple tasks, and cannot easily integrate domain expertise.

Classification

Exploring Self-Identified Counseling Expertise in Online Support Forums

1 code implementation • Findings (ACL) 2021 • Allison Lahnala, Yuntian Zhao, Charles Welch, Jonathan K. Kummerfeld, Lawrence An, Kenneth Resnicow, Rada Mihalcea, Verónica Pérez-Rosas

A growing number of people engage in online health forums, making it important to understand the quality of the advice they receive.

Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing

no code implementations • ACL 2021 • Jonathan K. Kummerfeld

Extensive work has argued in favour of paying crowd workers a wage that is at least equivalent to the U.S. federal minimum wage.

Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

no code implementations • 4 Feb 2021 • Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, Rada Mihalcea

In the first case study, we demonstrate that using chord embeddings in a next chord prediction task yields predictions that more closely match those by experienced musicians.

Attribute

Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods

no code implementations • COLING 2020 • Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, Jonathan K. Kummerfeld

Using three new noisy crowd-annotated datasets, we show that a wide range of inconsistencies occur and can impact system performance if not addressed.

Slot Filling

Exploring the Value of Personalized Word Embeddings

no code implementations • COLING 2020 • Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Our results show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users and that combining generic and personalized word embeddings yields the best performance, with a 4.7% relative reduction in perplexity.

Authorship Attribution Language Modelling +1

A Novel Workflow for Accurately and Efficiently Crowdsourcing Predicate Senses and Argument Labels

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Youxuan Jiang, Huaiyu Zhu, Jonathan K. Kummerfeld, Yunyao Li, Walter Lasecki

Resources for Semantic Role Labeling (SRL) are typically annotated by experts at great expense.

Semantic Role Labeling

Compositional Demographic Word Embeddings

1 code implementation • EMNLP 2020 • Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations.

Language Modelling Word Embeddings

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

1 code implementation • EMNLP 2020 • Charles Welch, Rada Mihalcea, Jonathan K. Kummerfeld

In the process, we show that the standard convention of tying input and output embeddings does not improve perplexity when initializing with embeddings trained on in-domain data.

Language Modelling

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

1 code implementation • EMNLP 2021 • Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages.

Word Embeddings

No-Press Diplomacy: Modeling Multi-Agent Gameplay

no code implementations • NeurIPS 2019 • Philip Paquette, Yuchen Lu, Seton Steven Bocco, Max Smith, Satya O.-G., Jonathan K. Kummerfeld, Joelle Pineau, Satinder Singh, Aaron C. Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.

Reinforcement Learning (RL)

The Eighth Dialog System Technology Challenge

no code implementations • 14 Nov 2019 • Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta

This paper introduces the Eighth Dialog System Technology Challenge.

Dialog State Tracking

No Press Diplomacy: Modeling Multi-Agent Gameplay

1 code implementation • 4 Sep 2019 • Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.

Reinforcement Learning (RL)

An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

5 code implementations • IJCNLP 2019 • Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars

We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries.

Benchmarking General Classification +3

DSTC7 Task 1: Noetic End-to-End Response Selection

no code implementations • WS 2019 • Chulaka Gunasekara, Jonathan K. Kummerfeld, Lazaros Polymenakos, Walter Lasecki

Goal-oriented dialogue in complex domains is an extremely challenging problem and there are relatively few datasets.

Conversational Response Selection Goal-Oriented Dialogue Systems

SLATE: A Super-Lightweight Annotation Tool for Experts

1 code implementation • ACL 2019 • Jonathan K. Kummerfeld

Many annotation tools have been developed, covering a wide variety of tasks and providing features like user management, pre-processing, and automatic labeling.

Management

Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

1 code implementation • 25 Apr 2019 • Charles Welch, Verónica Pérez-Rosas, Jonathan K. Kummerfeld, Rada Mihalcea

We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners.

Attribute

Outlier Detection for Improved Data Quality and Diversity in Dialog Systems

no code implementations • NAACL 2019 • Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars

We also present a novel data collection pipeline built atop our detection technique to automatically and iteratively mine unique data samples while discarding erroneous samples.

Intent Classification +6

Dialog System Technology Challenge 7

no code implementations • 11 Jan 2019 • Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D'Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan, Xiang Gao, Huda Alamari, Tim K. Marks, Devi Parikh, Dhruv Batra

This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems.

Sentence

A Large-Scale Corpus for Conversation Disentanglement

3 code implementations • ACL 2019 • Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros Polymenakos, Walter S. Lasecki

Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets.

Ranked #1 on Conversation Disentanglement on Linux IRC (Ch2 Kummerfeld)

Conversation Disentanglement Disentanglement

Improving Text-to-SQL Evaluation Methodology

1 code implementation • ACL 2018 • Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev

Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work.

Ranked #1 on SQL Parsing on IMDb

SQL Parsing Text-To-SQL

Data Collection for Dialogue System: A Startup Perspective

no code implementations • NAACL 2018 • Yiping Kang, Yunqi Zhang, Jonathan K. Kummerfeld, Lingjia Tang, Jason Mars

In this paper, we present a study of crowdsourcing methods for a user intent classification task in our deployed dialogue system.

General Classification Intent Classification +2

Effective Crowdsourcing for a New Type of Summarization Task

no code implementations • NAACL 2018 • Youxuan Jiang, Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Walter Lasecki

Most summarization research focuses on summarizing the entire given text, but in practice readers are often interested in only one aspect of the document or conversation.

Vocal Bursts Type Prediction

World Knowledge for Abstract Meaning Representation Parsing

no code implementations • LREC 2018 • Charles Welch, Jonathan K. Kummerfeld, Song Feng, Rada Mihalcea

AMR Parsing Named Entity Recognition (NER) +1

Factors Influencing the Surprising Instability of Word Embeddings

2 code implementations • NAACL 2018 • Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations.

Word Embeddings

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

1 code implementation • EMNLP 2017 • Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, Rebecca S. Portnoff, Sadia Afroz, Damon McCoy, Kirill Levchenko, Vern Paxson

One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data.

Domain Adaptation Named Entity Recognition +4

Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation

1 code implementation • 13 Jul 2017 • Jonathan K. Kummerfeld, Dan Klein

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons.

Ranked #2 on Missing Elements on Penn Treebank

Constituency Parsing Missing Elements

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

no code implementations • ACL 2017 • Youxuan Jiang, Jonathan K. Kummerfeld, Walter S. Lasecki

Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts.

Paraphrase Generation