Search Results for author: Jonathan K. Kummerfeld

Found 41 papers, 21 papers with code

Supporting Sensemaking of Large Language Model Outputs at Scale

no code implementations 24 Jan 2024 Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman

Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability.

Language Modelling Large Language Model

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

1 code implementation 3 Jan 2024 Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become "aligned", thus making it difficult to explain phenomena like jailbreaks.

Language Modelling

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

1 code implementation 12 May 2023 Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries.

Text-To-SQL

Augmenting Task-Oriented Dialogue Systems with Relation Extraction

no code implementations 24 Oct 2022 Andrew Lee, Zhenguo Chen, Kevin Leach, Jonathan K. Kummerfeld

The standard task-oriented dialogue pipeline uses intent classification and slot-filling to interpret user utterances.

Intent Classification

Using Paraphrases to Study Properties of Contextual Embeddings

no code implementations NAACL 2022 Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT.

Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health

1 code implementation Findings (EMNLP) 2021 Andrew Lee, Jonathan K. Kummerfeld, Lawrence C. An, Rada Mihalcea

Many statistical models have high accuracy on test benchmarks, but are not explainable, struggle in low-resource scenarios, cannot be reused for multiple tasks, and cannot easily integrate domain expertise.

Classification

Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing

no code implementations ACL 2021 Jonathan K. Kummerfeld

Extensive work has argued in favour of paying crowd workers a wage that is at least equivalent to the U.S. federal minimum wage.

Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

no code implementations 4 Feb 2021 Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, Rada Mihalcea

In the first case study, we demonstrate that using chord embeddings in a next chord prediction task yields predictions that more closely match those by experienced musicians.

Attribute

Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods

no code implementations COLING 2020 Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, Jonathan K. Kummerfeld

Using three new noisy crowd-annotated datasets, we show that a wide range of inconsistencies occur and can impact system performance if not addressed.

Slot Filling

Exploring the Value of Personalized Word Embeddings

no code implementations COLING 2020 Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Our results show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users and that combining generic and personalized word embeddings yields the best performance, with a 4.7% relative reduction in perplexity.

Language Modelling Word Embeddings

Compositional Demographic Word Embeddings

1 code implementation EMNLP 2020 Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations.

Language Modelling Word Embeddings

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

1 code implementation EMNLP 2020 Charles Welch, Rada Mihalcea, Jonathan K. Kummerfeld

In the process, we show that the standard convention of tying input and output embeddings does not improve perplexity when initializing with embeddings trained on in-domain data.

Language Modelling

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

1 code implementation EMNLP 2021 Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages.

Word Embeddings

No Press Diplomacy: Modeling Multi-Agent Gameplay

1 code implementation 4 Sep 2019 Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.

Reinforcement Learning (RL)

SLATE: A Super-Lightweight Annotation Tool for Experts

1 code implementation ACL 2019 Jonathan K. Kummerfeld

Many annotation tools have been developed, covering a wide variety of tasks and providing features like user management, pre-processing, and automatic labeling.

Management

Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

1 code implementation 25 Apr 2019 Charles Welch, Verónica Pérez-Rosas, Jonathan K. Kummerfeld, Rada Mihalcea

We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners.

Attribute

Improving Text-to-SQL Evaluation Methodology

1 code implementation ACL 2018 Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev

Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work.

SQL Parsing Text-To-SQL

Effective Crowdsourcing for a New Type of Summarization Task

no code implementations NAACL 2018 Youxuan Jiang, Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Walter Lasecki

Most summarization research focuses on summarizing the entire given text, but in practice readers are often interested in only one aspect of the document or conversation.

Vocal Bursts Type Prediction

Factors Influencing the Surprising Instability of Word Embeddings

2 code implementations NAACL 2018 Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations.

Word Embeddings

Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation

1 code implementation 13 Jul 2017 Jonathan K. Kummerfeld, Dan Klein

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons.

Constituency Parsing Missing Elements

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

no code implementations ACL 2017 Youxuan Jiang, Jonathan K. Kummerfeld, Walter S. Lasecki

Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts.

Paraphrase Generation

Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation

no code implementations TACL 2017 Jonathan K. Kummerfeld, Dan Klein

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons.

Question Answering
