Search Results for author: Jordan Boyd-Graber

Found 118 papers, 32 papers with code

Eliciting Bias in Question Answering Models through Ambiguity

1 code implementation EMNLP (MRQA) 2021 Andrew Mao, Naveen Raman, Matthew Shu, Eric Li, Franklin Yang, Jordan Boyd-Graber

We develop two question sets, for closed-domain and open-domain settings respectively, that use ambiguous questions to probe QA models for bias.

Question Answering

Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation

1 code implementation EMNLP 2021 Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III

This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training, thereby generating no additional annotation cost.

Open-Domain Question Answering Retrieval
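
The distant-supervision idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: the common heuristic of treating any passage that contains the answer string as pseudo-positive evidence, so no human evidence annotation is needed.

```python
# Minimal sketch (an assumption, not the paper's exact method): label a
# passage as positive evidence iff it contains the gold answer string.
def distant_labels(passages, answer):
    """Return 1 for passages containing the answer string, else 0."""
    answer = answer.lower()
    return [1 if answer in p.lower() else 0 for p in passages]

passages = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "K2 is the second-highest mountain on Earth.",
]
labels = distant_labels(passages, "Mount Everest")  # [1, 0]
```

Such noisy labels can then train a dense retriever without any annotated evidence spans.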

Evaluation Paradigms in Question Answering

no code implementations EMNLP 2021 Pedro Rodriguez, Jordan Boyd-Graber

Question answering (QA) primarily descends from two branches of research: (1) Alan Turing’s investigation of machine intelligence at Manchester University and (2) Cyril Cleverdon’s comparison of library card catalog indices at Cranfield University.

Position Question Answering

Human-Centered Evaluation of Explanations

no code implementations NAACL (ACL) 2022 Jordan Boyd-Graber, Samuel Carton, Shi Feng, Q. Vera Liao, Tania Lombrozo, Alison Smith-Renner, Chenhao Tan

The NLP community is increasingly interested in providing explanations for NLP models to help people make sense of model behavior and potentially improve human interaction with models.

Adapting Entities across Languages and Cultures

no code implementations Findings (EMNLP) 2021 Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser

Bill Gates is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.

Machine Translation Question Answering +1

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?

1 code implementation 20 Oct 2024 Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Boyd-Graber, Rachel Rudinger

Question answering (QA), producing correct answers for input questions, is popular, but we test a reverse question answering (RQA) task: given an input answer, generate a question with that answer.

Question Answering

Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

no code implementations 9 Oct 2024 Maharshi Gor, Hal Daumé III, Tianyi Zhou, Jordan Boyd-Graber

Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning.

Information Retrieval Question Answering +1

SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement

no code implementations 28 Sep 2024 Ishani Mondal, Zongxia Li, Yufang Hou, Anandhavelu Natarajan, Aparna Garimella, Jordan Boyd-Graber

Automating the creation of scientific diagrams from academic papers can significantly streamline the development of tutorials, presentations, and posters, saving time and effort.

Benchmarking Code Generation

KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students

no code implementations 19 Feb 2024 Matthew Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber

Flashcard schedulers rely on 1) student models to predict the flashcards a student knows; and 2) teaching policies to pick which cards to show next via these predictions.

Knowledge Tracing Retrieval +1
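
The two-part scheduler design the snippet describes can be sketched as follows. This is a hedged illustration, not KARL's actual components: the exponential-forgetting student model and the "closest to 50% recall" teaching policy are assumptions chosen for simplicity.

```python
# Toy flashcard scheduler: (1) a student model predicts recall probability,
# (2) a teaching policy uses those predictions to pick the next card.
# Both pieces are illustrative assumptions, not KARL's real models.
def predict_recall(elapsed_hours, half_life_hours):
    """Toy student model: exponential forgetting curve."""
    return 2.0 ** (-elapsed_hours / half_life_hours)

def pick_next_card(cards):
    """Toy teaching policy: show the card whose predicted recall is
    closest to 0.5 (neither trivially easy nor fully forgotten)."""
    return min(cards, key=lambda c: abs(predict_recall(c["elapsed"], c["half_life"]) - 0.5))

cards = [
    {"name": "easy", "elapsed": 1.0, "half_life": 100.0},    # recall near 1
    {"name": "medium", "elapsed": 24.0, "half_life": 24.0},  # recall = 0.5
    {"name": "hard", "elapsed": 72.0, "half_life": 6.0},     # recall near 0
]
next_card = pick_next_card(cards)  # the "medium" card
```

Any learned student model could be dropped in for `predict_recall` without changing the policy interface.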

Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

1 code implementation 29 Jan 2024 Zongxia Li, Andrew Mao, Daniel Stephens, Pranav Goel, Emily Walpole, Alden Dima, Juan Fung, Jordan Boyd-Graber

Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention.

Topic Models

CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering

no code implementations 24 Jan 2024 Zongxia Li, Ishani Mondal, Yijun Liang, Huy Nghiem, Jordan Boyd-Graber

Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics for answer equivalence (AE) often do not align with human judgments, particularly for more verbose, free-form answers from large language models (LLMs).

Open-Domain Question Answering

How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation

no code implementations 20 Jan 2024 Yoo yeon Sung, Ishani Mondal, Jordan Boyd-Graber

Dynamic adversarial question generation, where humans write examples to stump a model, aims to create examples that are realistic and informative.

Question Generation Question-Generation +1

Labeled Interactive Topic Models

no code implementations 15 Nov 2023 Kyle Seelman, Mozhi Zhang, Jordan Boyd-Graber

To facilitate user interaction with these neural topic models, we have developed an interactive interface.

Topic Models

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines

1 code implementation 20 Oct 2023 Yoo yeon Sung, Jordan Boyd-Graber, Naeemul Hassan

Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users, and while there has been a significant effort to detect false or misleading text, multimodal datasets have received considerably less attention.

Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong

no code implementations 19 Oct 2023 Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber

To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information - explain both why the claim is true and false, and then we present both sides of the explanation to users.

Fact Checking Information Retrieval

MegaWika: Millions of reports and their sources across 50 diverse languages

no code implementations 13 Jul 2023 Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme

To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials.

Cross-Lingual Question Answering Retrieval +1

Getting MoRE out of Mixture of Language Model Reasoning Experts

no code implementations 24 May 2023 Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber

Beyond generalizability, the interpretable design of MoRE improves selective question answering results compared to baselines without incorporating inter-expert agreement.

Answer Selection Language Modelling

Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain Question Answering

no code implementations 15 Nov 2022 Wanrong He, Andrew Mao, Jordan Boyd-Graber

For humans and computers, the first step in answering an open-domain question is retrieving a set of relevant documents from a large corpus.

Open-Domain Question Answering World Knowledge

Prompting GPT-3 To Be Reliable

1 code implementation 17 Oct 2022 Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang

While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.

Fairness Language Modelling

Improving Question Answering with Generation of NQ-like Questions

no code implementations 12 Oct 2022 Saptarashmi Bandyopadhyay, Shraman Pal, Hao Zou, Abhranil Chandra, Jordan Boyd-Graber

We demonstrate that in a low resource setting, using the generated data improves the QA performance over the baseline system on both NQ and QB data.

Natural Questions Question Answering

Re-Examining Calibration: The Case of Question Answering

1 code implementation 25 May 2022 Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber

Building on those observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions.

Open-Domain Question Answering
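
The property the snippet describes (low confidence on wrong predictions, high confidence on correct ones) can be turned into a metric sketch. The exact formula below is an assumption based only on the abstract's description, not necessarily MacroCE's real definition; consult the paper for that.

```python
# Hedged MacroCE-style sketch: average the confidence error separately over
# correct and incorrect predictions, then macro-average the two, so neither
# class of prediction can dominate the score.
def macro_ce(confidences, correct):
    pos = [1.0 - c for c, y in zip(confidences, correct) if y]      # error on correct answers
    neg = [c for c, y in zip(confidences, correct) if not y]        # error on wrong answers
    ce_pos = sum(pos) / len(pos) if pos else 0.0
    ce_neg = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (ce_pos + ce_neg)

# Confident-when-right, unsure-when-wrong scores well (low error) ...
good = macro_ce([0.9, 0.9, 0.1], [True, True, False])
# ... while uniform overconfidence scores poorly.
bad = macro_ce([0.9, 0.9, 0.9], [True, True, False])
```

Macro-averaging keeps the (typically rare) wrong predictions from being washed out, which is exactly the failure mode of plain expected calibration error.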

Automatic Song Translation for Tonal Languages

no code implementations Findings (ACL) 2022 Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, Jordan Boyd-Graber

This paper develops automatic song translation (AST) for tonal languages and addresses the unique challenge of aligning words' tones with the melody of a song in addition to conveying the original meaning.

Translation

What's in a Name? Answer Equivalence For Open-Domain Question Answering

1 code implementation 11 Sep 2021 Chenglei Si, Chen Zhao, Jordan Boyd-Graber

We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers.

Natural Questions Open-Domain Question Answering +2

Evaluation Examples are not Equally Informative: How should that change NLP Leaderboards?

1 code implementation ACL 2021 Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, Jordan Boyd-Graber

While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models).

Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language

1 code implementation 16 Jul 2021 Peter Jansen, Jordan Boyd-Graber

Tamarian, a fictional language introduced in the Star Trek episode Darmok, communicates meaning through utterances of metaphorical references, such as "Darmok and Jalad at Tanagra" instead of "We should work together."

Language Modelling Large Language Model +2

Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence

2 code implementations NeurIPS 2021 Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, Philip Resnik

To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets.

Topic Models

Fool Me Twice: Entailment from Wikipedia Gamification

1 code implementation NAACL 2021 Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, Jordan Boyd-Graber

We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game.

Retrieval

Complex Factoid Question Answering with a Free-Text Knowledge Graph

no code implementations 23 Mar 2021 Chen Zhao, Chenyan Xiong, Xin Qian, Jordan Boyd-Graber

DELFT's advantage comes from both the high coverage of its free-text knowledge graph (more than double that of DBpedia relations) and the novel graph neural network, which reasons over the rich but noisy free-text evidence.

Graph Neural Network Graph Question Answering +2

An Attentive Recurrent Model for Incremental Prediction of Sentence-final Verbs

no code implementations Findings of the Association for Computational Linguistics 2020 Wenyan Li, Alvin Grissom II, Jordan Boyd-Graber

Verb prediction is important for understanding human processing of verb-final languages, with practical applications to real-time simultaneous interpretation from verb-final to verb-medial languages.

Sentence

It Takes Two to Lie: One to Lie, and One to Listen

no code implementations ACL 2020 Denis Peskov, Benny Cheng, Ahmed Elgohary, Joe Barrow, Cristian Danescu-Niculescu-Mizil, Jordan Boyd-Graber

Trust is implicit in many online text conversations: striking up new friendships, or asking for tech support.

Meta Answering for Machine Reading

no code implementations 11 Nov 2019 Benjamin Borschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegu

We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment.

Natural Questions Question Answering +1

Interactive Refinement of Cross-Lingual Word Embeddings

1 code implementation EMNLP 2020 Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber

Cross-lingual word embeddings transfer knowledge between languages: models trained on high-resource languages can predict in low-resource languages.

Active Learning Cross-Lingual Word Embeddings +3

How Pre-trained Word Representations Capture Commonsense Physical Comparisons

no code implementations WS 2019 Pranav Goel, Shi Feng, Jordan Boyd-Graber

One type of common sense is how two objects compare on physical properties such as size and weight: e.g., 'is a house bigger than a person?'.

Common Sense Reasoning

Can You Unpack That? Learning to Rewrite Questions-in-Context

no code implementations IJCNLP 2019 Ahmed Elgohary, Denis Peskov, Jordan Boyd-Graber

Question answering is an AI-complete problem, but existing datasets lack key elements of language understanding such as coreference and ellipsis resolution.

Question Answering

What Question Answering can Learn from Trivia Nerds

no code implementations ACL 2020 Jordan Boyd-Graber, Benjamin Börschinger

In addition to the traditional task of getting machines to answer questions, a major research question in question answering is to create interesting, challenging questions that can help systems learn how to answer questions and also reveal which systems are the best at answering questions.

Question Answering

Mitigating Noisy Inputs for Question Answering

no code implementations 8 Aug 2019 Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan Boyd-Graber

We investigate and mitigate the effects of noise from Automatic Speech Recognition systems on two factoid Question Answering (QA) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6


Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

1 code implementation 4 Jun 2019 Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber

Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings.

Cross-Lingual Word Embeddings Translation +2
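
The Iterative Normalization procedure the title names can be sketched as alternating two simple steps until embeddings are simultaneously unit-length and zero-mean. The iteration count and stopping rule below are assumptions for illustration; see the paper for the exact conditions it targets.

```python
import numpy as np

# Sketch of Iterative Normalization: alternate (1) unit-length normalization
# of each embedding row and (2) mean centering of each dimension, so the
# embedding set approaches both properties at once.
def iterative_normalize(emb, iters=10):
    emb = emb.astype(float).copy()
    for _ in range(iters):
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # rows to unit length
        emb -= emb.mean(axis=0, keepdims=True)             # center each dimension
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)      # finish on the unit sphere
    return emb

rng = np.random.default_rng(0)
X = iterative_normalize(rng.normal(size=(100, 8)))
# After enough iterations: every row has norm 1, and the mean is near zero.
```

Making monolingual embedding sets satisfy these shared preconditions is what lets an orthogonal mapping align them despite non-isomorphic raw geometry.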

Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models

no code implementations ACL 2019 Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber

To address the lack of comparative evaluation of Human-in-the-Loop Topic Modeling (HLTM) systems, we implement and evaluate three contrasting HLTM modeling approaches using simulation experiments.

Topic Models

Automatic Evaluation of Local Topic Quality

no code implementations ACL 2019 Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, Kevin Seppi

Topic models are typically evaluated with respect to the global topic distributions that they generate, using metrics such as coherence, but without regard to local (token-level) topic assignments.

Topic Models

Misleading Failures of Partial-input Baselines

no code implementations ACL 2019 Shi Feng, Eric Wallace, Jordan Boyd-Graber

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA).

Natural Language Inference Visual Question Answering (VQA)
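
The partial-input baseline named above can be sketched with a toy. This is an assumption-laden illustration: the negation rule stands in for a trained hypothesis-only model, and real audits train an actual classifier on the partial input.

```python
# Hedged sketch of a partial-input baseline for NLI: classify from the
# hypothesis alone, never reading the premise. Accuracy well above chance
# signals annotation artifacts in the dataset.
def hypothesis_only_accuracy(examples):
    """examples: list of (premise, hypothesis, label); the premise is ignored."""
    correct = 0
    for _premise, hypothesis, label in examples:
        tokens = hypothesis.lower().split()
        # Toy artifact: negation words in the hypothesis correlate with
        # "contradiction" labels (a pattern crowdworkers often produce).
        guess = "contradiction" if ("not" in tokens or "no" in tokens) else "entailment"
        correct += guess == label
    return correct / len(examples)

examples = [
    ("A man plays guitar.", "A man is not playing music.", "contradiction"),
    ("A dog runs.", "An animal is moving.", "entailment"),
]
acc = hypothesis_only_accuracy(examples)  # 1.0 on this artifact-laden toy set
```

The paper's point is a caution about this diagnostic: a *failing* partial-input baseline does not prove the dataset is artifact-free.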

Quizbowl: The Case for Incremental Question Answering

no code implementations 9 Apr 2019 Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber

Throughout this paper, we show that collaborations with the vibrant trivia community have contributed to the quality of our dataset, spawned new research directions, and doubled as an exciting way to engage the public with research in machine learning and natural language processing.

BIG-bench Machine Learning Decision Making +2

What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play

no code implementations 23 Oct 2018 Shi Feng, Jordan Boyd-Graber

Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models.

BIG-bench Machine Learning Decision Making +1

A dataset and baselines for sequential open-domain question answering

no code implementations EMNLP 2018 Ahmed Elgohary, Chen Zhao, Jordan Boyd-Graber

Previous work on question-answering systems mainly focuses on answering individual questions, assuming they are independent and devoid of context.

Information Retrieval Open-Domain Question Answering +1

A Differentiable Self-disambiguated Sense Embedding Model via Scaled Gumbel Softmax

no code implementations 27 Sep 2018 Fenfei Guo, Mohit Iyyer, Leah Findlater, Jordan Boyd-Graber

We present a differentiable multi-prototype word representation model that disentangles senses of polysemous words and produces meaningful sense-specific embeddings without external resources.

Hard Attention Sentence +1
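
The scaled Gumbel softmax in the title can be sketched numerically: add Gumbel noise to per-sense logits and take a temperature-scaled softmax, giving a differentiable, nearly one-hot selection among a word's sense embeddings. The scale and temperature values here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hedged sketch of a scaled Gumbel softmax for sense selection.
def scaled_gumbel_softmax(logits, temperature=0.5, scale=1.0, rng=None):
    rng = rng or np.random.default_rng()
    # Sample Gumbel(0, 1) noise via the inverse-CDF trick.
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    z = (logits + scale * gumbel) / temperature
    z -= z.max()                      # subtract max for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

# Soft, differentiable weights over 3 candidate senses of a polysemous word:
weights = scaled_gumbel_softmax(np.array([2.0, 0.5, -1.0]), rng=np.random.default_rng(0))
```

Lowering the temperature pushes the output toward a hard one-hot sense choice while keeping gradients usable for end-to-end training.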

Automatic Estimation of Simultaneous Interpreter Performance

1 code implementation ACL 2018 Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, Graham Neubig

Simultaneous interpretation, translation of the spoken word in real-time, is both highly challenging and physically demanding.

Machine Translation Translation

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

no code implementations NAACL 2018 Shudong Hao, Jordan Boyd-Graber, Michael J. Paul

Multilingual topic models enable document analysis across languages through coherent multilingual summaries of the data.

Topic Models

Inducing and Embedding Senses with Scaled Gumbel Softmax

no code implementations 22 Apr 2018 Fenfei Guo, Mohit Iyyer, Jordan Boyd-Graber

Methods for learning word sense embeddings represent a single word with multiple sense-specific vectors.

Pathologies of Neural Models Make Interpretations Difficult

no code implementations EMNLP 2018 Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber

In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word.

Sentence
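
The input-perturbation scheme described in the snippet is easy to sketch. The model here is a hypothetical stand-in: any function returning the probability of the predicted class would slot in.

```python
# Sketch of leave-one-out word importance: a word's score is the drop in
# model confidence when that word is removed from the input.
def word_importance(words, model_confidence):
    base = model_confidence(words)
    scores = {}
    for i, w in enumerate(words):
        reduced = words[:i] + words[i + 1:]        # input with word i removed
        scores[w] = base - model_confidence(reduced)  # confidence drop
    return scores

# Toy model (an assumption for illustration): confidence depends only on
# whether the negation "not" is present.
toy = lambda ws: 0.9 if "not" in ws else 0.4
scores = word_importance(["the", "movie", "was", "not", "bad"], toy)
# "not" gets a large score; words that don't change the output get ~0.
```

The paper's pathology is visible even in this sketch: iteratively deleting the lowest-scoring words can leave a nonsensical input on which the model is still confident.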

Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

1 code implementation EMNLP 2017 Khanh Nguyen, Hal Daumé III, Jordan Boyd-Graber

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve.

Decoder Machine Translation +4

Tandem Anchoring: a Multiword Anchor Approach for Interactive Topic Modeling

no code implementations ACL 2017 Jeffrey Lund, Connor Cook, Kevin Seppi, Jordan Boyd-Graber

We propose combinations of words as anchors, going beyond existing single word anchor algorithms, an approach we call "Tandem Anchors".

Document Classification Information Retrieval +2

The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives

2 code implementations CVPR 2017 Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, Larry Davis

While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels.

Opponent Modeling in Deep Reinforcement Learning

1 code implementation 18 Sep 2016 He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III

Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact with each other and change.

Deep Reinforcement Learning reinforcement-learning +1

Online Adaptor Grammars with Hybrid Inference

no code implementations TACL 2014 Ke Zhai, Jordan Boyd-Graber, Shay B. Cohen

Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions.

Topic Models Variational Inference
