1 code implementation • EMNLP (MRQA) 2021 • Andrew Mao, Naveen Raman, Matthew Shu, Eric Li, Franklin Yang, Jordan Boyd-Graber
We develop two sets of ambiguous questions, for closed- and open-domain settings respectively, to probe QA models for bias.
no code implementations • EMNLP 2021 • Maharshi Gor, Kellie Webster, Jordan Boyd-Graber
The goal of question answering (QA) is to answer _any_ question.
1 code implementation • EMNLP 2021 • Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III
This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training, thereby incurring no additional annotation cost.
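A minimal sketch of the distant-supervision idea: a passage counts as positive evidence whenever it contains the gold answer string, so no evidence-level annotation is needed. The helper below is illustrative only, not the paper's actual training procedure.

```python
def distant_labels(passages: list[str], answer: str) -> list[int]:
    """Label a passage 1 if it contains the gold answer string, else 0.
    Noisy but free: the only supervision is the answer label itself."""
    needle = answer.lower().strip()
    return [int(needle in p.lower()) for p in passages]

passages = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "K2 is the second-highest mountain on Earth.",
]
print(distant_labels(passages, "Mount Everest"))  # [1, 0]
```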
no code implementations • EMNLP 2021 • Pedro Rodriguez, Jordan Boyd-Graber
Question answering (QA) primarily descends from two branches of research: (1) Alan Turing’s investigation of machine intelligence at Manchester University and (2) Cyril Cleverdon’s comparison of library card catalog indices at Cranfield University.
1 code implementation • EMNLP 2021 • Chenglei Si, Chen Zhao, Jordan Boyd-Graber
We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers.
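As a sketch of the evaluation side, exact match can be computed against a set of equivalent answers rather than a single gold string. `normalize` below is SQuAD-style answer normalization, and the alias set is a hypothetical example.

```python
import string

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation, squeeze spaces."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def em_with_aliases(prediction: str, answers: set[str]) -> bool:
    """Exact match succeeds if the prediction matches ANY equivalent answer."""
    return normalize(prediction) in {normalize(a) for a in answers}

# With only the canonical string, a correct prediction is scored wrong;
# adding an equivalent answer removes the false negative.
print(em_with_aliases("JFK", {"John F. Kennedy"}))         # False
print(em_with_aliases("JFK", {"John F. Kennedy", "JFK"}))  # True
```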
no code implementations • NAACL (ACL) 2022 • Jordan Boyd-Graber, Samuel Carton, Shi Feng, Q. Vera Liao, Tania Lombrozo, Alison Smith-Renner, Chenhao Tan
The NLP community is increasingly interested in providing explanations for NLP models to help people make sense of model behavior and potentially improve human interaction with models.
no code implementations • Findings (EMNLP) 2021 • Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser
Bill Gates is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.
1 code implementation • 20 Oct 2024 • Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Boyd-Graber, Rachel Rudinger
Question answering (QA), producing correct answers for input questions, is popular, but we test a reverse question answering (RQA) task: given an input answer, generate a question with that answer.
no code implementations • 9 Oct 2024 • Maharshi Gor, Hal Daumé III, Tianyi Zhou, Jordan Boyd-Graber
Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning.
no code implementations • 28 Sep 2024 • Ishani Mondal, Zongxia Li, Yufang Hou, Anandhavelu Natarajan, Aparna Garimella, Jordan Boyd-Graber
Automating the creation of scientific diagrams from academic papers can significantly streamline the development of tutorials, presentations, and posters, saving authors substantial time.
1 code implementation • 21 Jun 2024 • Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics.
no code implementations • 19 Feb 2024 • Matthew Shu, Nishant Balepur, Shi Feng, Jordan Boyd-Graber
Flashcard schedulers rely on 1) student models to predict which flashcards a student knows and 2) teaching policies to pick which cards to show next based on these predictions.
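A toy sketch of that two-part design, with a hypothetical `student_model` standing in for a trained recall predictor and a greedy threshold policy standing in for the teaching policies the paper studies.

```python
def student_model(card: str, history: dict) -> float:
    """Toy recall predictor: probability of remembering a card grows with
    the number of past reviews. (Hypothetical stand-in for a trained model.)"""
    return min(0.95, 0.3 + 0.15 * history.get(card, 0))

def teaching_policy(cards, history, threshold=0.6):
    """Greedy policy: show the unmastered card with the lowest predicted
    recall; return None once every card clears the mastery threshold."""
    below = [c for c in cards if student_model(c, history) < threshold]
    return min(below, key=lambda c: student_model(c, history)) if below else None

cards = ["la maison", "le chien", "l'arbre"]
history = {"la maison": 3, "le chien": 1}
print(teaching_policy(cards, history))  # "l'arbre" (never reviewed)
```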
1 code implementation • 29 Jan 2024 • Zongxia Li, Andrew Mao, Daniel Stephens, Pranav Goel, Emily Walpole, Alden Dima, Juan Fung, Jordan Boyd-Graber
Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention.
no code implementations • 24 Jan 2024 • Zongxia Li, Ishani Mondal, Yijun Liang, Huy Nghiem, Jordan Boyd-Graber
Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics for answer equivalence (AE) often do not align with human judgments, particularly for the more verbose, free-form answers produced by large language models (LLMs).
no code implementations • 20 Jan 2024 • Yoo yeon Sung, Ishani Mondal, Jordan Boyd-Graber
Dynamic adversarial question generation, where humans write examples to stump a model, aims to create examples that are realistic and informative.
no code implementations • 16 Nov 2023 • Neha Srikanth, Rupak Sarkar, Heran Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, Jordan Boyd-Graber
Questions posed by information-seeking users often contain implicit false or potentially harmful assumptions.
no code implementations • 15 Nov 2023 • Kyle Seelman, Mozhi Zhang, Jordan Boyd-Graber
To facilitate user interaction with these neural topic models, we have developed an interactive interface.
2 code implementations • 24 Oct 2023 • Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber
We also present a comprehensive taxonomical ontology of the types of adversarial prompts.
1 code implementation • 20 Oct 2023 • Yoo yeon Sung, Jordan Boyd-Graber, Naeemul Hassan
Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users. While there has been significant effort to detect false or misleading text, multimodal datasets have received considerably less attention.
no code implementations • 19 Oct 2023 • Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber
To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information: explanations of why the claim could be true and why it could be false. We then present both sides of the explanation to users.
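A minimal sketch of that prompting setup; `query_llm` is a hypothetical stand-in for whichever LLM API is available, and the prompt wording is ours, not the paper's.

```python
def contrastive_prompts(claim: str) -> dict[str, str]:
    """Build one prompt per side of the claim."""
    return {
        "support": f"Explain why the following claim could be TRUE: {claim}",
        "refute":  f"Explain why the following claim could be FALSE: {claim}",
    }

def contrastive_explanation(claim: str, query_llm) -> str:
    """Query both sides and present them together, so the user weighs
    evidence instead of deferring to one confident-sounding answer."""
    prompts = contrastive_prompts(claim)
    support = query_llm(prompts["support"])
    refute = query_llm(prompts["refute"])
    return f"Why it may be true:\n{support}\n\nWhy it may be false:\n{refute}"
```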
no code implementations • 13 Jul 2023 • Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme
To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials.
no code implementations • 24 May 2023 • Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber
Beyond generalizability, the interpretable design of MoRE improves selective question answering results compared to baselines without incorporating inter-expert agreement.
no code implementations • 24 May 2023 • Ishani Mondal, Michelle Yuan, Anandhavelu N, Aparna Garimella, Francis Ferraro, Andrew Blair-Stanek, Benjamin Van Durme, Jordan Boyd-Graber
Learning template-based information extraction from documents is a crucial yet difficult task.
no code implementations • 15 Nov 2022 • Wanrong He, Andrew Mao, Jordan Boyd-Graber
For humans and computers, the first step in answering an open-domain question is retrieving a set of relevant documents from a large corpus.
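A minimal retrieval sketch using TF-IDF and cosine similarity, a classical stand-in for the retrievers studied in this line of work; the three-document corpus is fabricated for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the tallest mountain.",
    "Paris is the capital of France.",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = vectorizer.transform([question])
    scores = cosine_similarity(q, doc_vectors).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

print(retrieve("What is the capital of France?"))
```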
1 code implementation • 17 Oct 2022 • Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang
While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.
no code implementations • 12 Oct 2022 • Saptarashmi Bandyopadhyay, Shraman Pal, Hao Zou, Abhranil Chandra, Jordan Boyd-Graber
We demonstrate that in a low-resource setting, using the generated data improves QA performance over the baseline system on both NQ and QB data.
1 code implementation • 25 May 2022 • Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber
Building on those observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions.
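A sketch of MacroCE as we read the description above: instance-level calibration error averaged separately over correct and incorrect predictions, then macro-averaged, so abundant easy-correct examples cannot hide overconfident mistakes. Consult the paper for the exact definition.

```python
def macro_ce(confidences: list[float], correct: list[bool]) -> float:
    """Macro-averaged instance-level calibration error: penalize low
    confidence on correct predictions and high confidence on wrong ones,
    weighting the two groups equally regardless of their sizes."""
    pos = [1 - c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    ice_pos = sum(pos) / len(pos)  # want high confidence when correct
    ice_neg = sum(neg) / len(neg)  # want low confidence when wrong
    return (ice_pos + ice_neg) / 2

# Confident on a correct answer, unsure on a wrong one: good score.
print(macro_ce([0.9, 0.2], [True, False]))  # (0.1 + 0.2)/2 = 0.15
# Equally confident on the wrong one: heavily penalized.
print(macro_ce([0.9, 0.9], [True, False]))  # (0.1 + 0.9)/2 = 0.5
```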
no code implementations • Findings (ACL) 2022 • Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, Jordan Boyd-Graber
This paper develops automatic song translation (AST) for tonal languages, addressing the unique challenge of aligning words' tones with the melody of a song in addition to conveying the original meaning.
no code implementations • ACL 2022 • Yoshinari Fujinuma, Jordan Boyd-Graber, Katharina Kann
(2) Does the answer to that question change with model adaptation?
1 code implementation • 10 Oct 2021 • Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III
Open-domain question answering systems answer a question based on evidence retrieved from a large corpus.
1 code implementation • 11 Sep 2021 • Chenglei Si, Chen Zhao, Jordan Boyd-Graber
We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers.
1 code implementation • ACL 2021 • Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, Jordan Boyd-Graber
While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models).
1 code implementation • 16 Jul 2021 • Peter Jansen, Jordan Boyd-Graber
Tamarian, a fictional language introduced in the Star Trek episode Darmok, communicates meaning through utterances of metaphorical references, such as "Darmok and Jalad at Tanagra" instead of "We should work together."
2 code implementations • NeurIPS 2021 • Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, Philip Resnik
To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets.
no code implementations • 15 Apr 2021 • Maharshi Gor, Kellie Webster, Jordan Boyd-Graber
The goal of question answering (QA) is to answer any question.
1 code implementation • ACL 2022 • Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, Jordan Boyd-Graber
Active learning mitigates this problem by sampling a small subset of data for annotators to label.
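For intuition, here is generic margin-based uncertainty sampling, the textbook version of the idea; the paper's actual sampling strategy for its task differs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sample(model, X_pool, k: int = 10) -> np.ndarray:
    """Pick the k pool examples with the smallest margin between the
    top-two predicted class probabilities (the most uncertain ones)."""
    probs = model.predict_proba(X_pool)
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:k]  # indices to send to annotators

# Synthetic demo: train on a tiny labeled set, query from a larger pool.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 2))
y_labeled = rng.integers(0, 2, size=20)
X_pool = rng.normal(size=(100, 2))

model = LogisticRegression().fit(X_labeled, y_labeled)
print(uncertainty_sample(model, X_pool, k=5))
```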
1 code implementation • NAACL 2021 • Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III
Complex question answering often requires finding a reasoning chain that consists of multiple evidence pieces.
1 code implementation • NAACL 2021 • Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, Jordan Boyd-Graber
We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game.
no code implementations • 23 Mar 2021 • Chen Zhao, Chenyan Xiong, Xin Qian, Jordan Boyd-Graber
DELFT's advantage comes from both the high coverage of its free-text knowledge graph (more than double that of DBpedia relations) and a novel graph neural network that reasons over the rich but noisy free-text evidence.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
no code implementations • 1 Dec 2020 • Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, Markus Leippold
We introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims.
no code implementations • 1 Dec 2020 • Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, Markus Leippold
Climate change communication in the mass media and other textual sources may affect and shape public perception.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Wenyan Li, Alvin Grissom II, Jordan Boyd-Graber
Verb prediction is important for understanding human processing of verb-final languages, with practical applications to real-time simultaneous interpretation from verb-final to verb-medial languages.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, Lillian Lee
Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches.
1 code implementation • EMNLP 2020 • Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber
Typically, the active learning strategy is contingent on the classification model.
no code implementations • ACL 2020 • Denis Peskov, Benny Cheng, Ahmed Elgohary, Joe Barrow, Cristian Danescu-Niculescu-Mizil, Jordan Boyd-Graber
Trust is implicit in many online text conversations: striking up new friendships, or asking for tech support.
no code implementations • ACL 2020 • Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber
Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI).
no code implementations • LREC 2020 • Jordan Boyd-Graber, Fenfei Guo, Leah Findlater, Mohit Iyyer
Text representations are critical for modern natural language processing.
no code implementations • 11 Nov 2019 • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegu
We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment.
1 code implementation • EMNLP 2020 • Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber
Cross-lingual word embeddings transfer knowledge between languages: models trained on high-resource languages can predict in low-resource languages.
no code implementations • WS 2019 • Pranav Goel, Shi Feng, Jordan Boyd-Graber
One type of common sense is how two objects compare on physical properties such as size and weight: e.g., "is a house bigger than a person?"
no code implementations • IJCNLP 2019 • Weiwei Yang, Jordan Boyd-Graber, Philip Resnik
Multilingual topic models (MTMs) learn topics on documents in multiple languages.
no code implementations • IJCNLP 2019 • Ahmed Elgohary, Denis Peskov, Jordan Boyd-Graber
Question answering is an AI-complete problem, but existing datasets lack key elements of language understanding such as coreference and ellipsis resolution.
no code implementations • ACL 2020 • Jordan Boyd-Graber, Benjamin Börschinger
In addition to the traditional task of getting machines to answer questions, a major research question in question answering is how to create interesting, challenging questions that both help systems learn to answer questions and reveal which systems answer them best.
no code implementations • 8 Aug 2019 • Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan Boyd-Graber
We investigate and mitigate the effects of noise from Automatic Speech Recognition systems on two factoid Question Answering (QA) tasks.
no code implementations • ACL 2019 • Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings.
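That orthogonal-mapping step has a well-known closed-form solution (orthogonal Procrustes via SVD). The sketch below shows the standard alignment step the abstract alludes to, not this paper's own contribution.

```python
import numpy as np

def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal map W minimizing ||X @ W - Y||_F, in closed form:
    if X.T @ Y = U S Vt, the minimizer is W = U @ Vt."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: recover a hidden rotation from "translation pairs".
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))                   # source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # hidden orthogonal map
Y = X @ Q                                        # target-language vectors
W = procrustes(X, Y)
print(np.allclose(W, Q))                         # True
```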
1 code implementation • ACL 2019 • Yoshinari Fujinuma, Jordan Boyd-Graber, Michael J. Paul
Cross-lingual word embeddings encode the meaning of words from different languages into a shared low-dimensional space.
1 code implementation • 4 Jun 2019 • Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings.
no code implementations • ACL 2019 • Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber
To address the lack of comparative evaluation of Human-in-the-Loop Topic Modeling (HLTM) systems, we implement and evaluate three contrasting HLTM modeling approaches using simulation experiments.
no code implementations • ACL 2019 • Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, Kevin Seppi
Topic models are typically evaluated with respect to the global topic distributions that they generate, using metrics such as coherence, but without regard to local (token-level) topic assignments.
no code implementations • ACL 2019 • Shi Feng, Eric Wallace, Jordan Boyd-Graber
Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA).
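A toy version of a partial-input baseline on NLI-style data: the classifier sees only the hypothesis, so any above-chance accuracy signals annotation artifacts. The four examples are fabricated for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (premise, hypothesis, label) triples; premises are deliberately unused.
data = [
    ("A man is outside.", "A man is sleeping.",    "contradiction"),
    ("A dog runs.",       "An animal is moving.",  "entailment"),
    ("A woman reads.",    "A woman is sleeping.",  "contradiction"),
    ("A cat sits.",       "An animal is resting.", "entailment"),
]
hypotheses = [h for _, h, _ in data]
labels = [y for _, _, y in data]

baseline = make_pipeline(CountVectorizer(), LogisticRegression())
baseline.fit(hypotheses, labels)  # the premise is never seen
# "sleeping" leaks the label here, mimicking a real annotation artifact.
print(baseline.predict(["A boy is sleeping."]))  # likely "contradiction"
```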
no code implementations • 9 Apr 2019 • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, Jordan Boyd-Graber
Throughout this paper, we show that collaborations with the vibrant trivia community have contributed to the quality of our dataset, spawned new research directions, and doubled as an exciting way to engage the public with research in machine learning and natural language processing.
no code implementations • 22 Dec 2018 • Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber
Text classification must sometimes be applied in a low-resource language with no labeled training data.
no code implementations • 23 Oct 2018 • Shi Feng, Jordan Boyd-Graber
Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models.
no code implementations • EMNLP 2018 • Ahmed Elgohary, Chen Zhao, Jordan Boyd-Graber
Previous work on question-answering systems mainly focuses on answering individual questions, assuming they are independent and devoid of context.
no code implementations • 27 Sep 2018 • Fenfei Guo, Mohit Iyyer, Leah Findlater, Jordan Boyd-Graber
We present a differentiable multi-prototype word representation model that disentangles senses of polysemous words and produces meaningful sense-specific embeddings without external resources.
1 code implementation • WS 2018 • Eric Wallace, Shi Feng, Jordan Boyd-Graber
However, the confidence of neural networks is not a robust measure of model uncertainty.
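One alternative in this spirit is to read off uncertainty from nearest neighbors in representation space rather than from the softmax. The sketch below is our simplification of that idea, not the paper's exact conformity score.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_conformity(train_reps, train_labels, query_rep, label, k=5):
    """Fraction of the query's k nearest training neighbors (in
    representation space) that share the candidate label: a simple
    alternative to softmax confidence, in the spirit of deep kNN."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_reps)
    _, idx = nn.kneighbors(query_rep.reshape(1, -1))
    return float(np.mean(train_labels[idx[0]] == label))

reps = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]])
labels = np.array([0, 0, 0, 1, 1])
print(knn_conformity(reps, labels, np.array([0.2, 0.3]), label=0, k=3))  # 1.0
```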
1 code implementation • TACL 2019 • Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber
We propose human-in-the-loop adversarial generation, where human authors are guided to break models.
no code implementations • COLING 2018 • Paul Felt, Eric Ringger, Jordan Boyd-Graber, Kevin Seppi
Annotated corpora enable supervised machine learning and data analysis.
no code implementations • ACL 2018 • Eric Wallace, Jordan Boyd-Graber
Modern question answering systems have been touted as approaching human performance.
1 code implementation • ACL 2018 • Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, Graham Neubig
Simultaneous interpretation, translation of the spoken word in real-time, is both highly challenging and physically demanding.
no code implementations • NAACL 2018 • Shudong Hao, Jordan Boyd-Graber, Michael J. Paul
Multilingual topic models enable document analysis across languages through coherent multilingual summaries of the data.
no code implementations • 22 Apr 2018 • Fenfei Guo, Mohit Iyyer, Jordan Boyd-Graber
Methods for learning word sense embeddings represent a single word with multiple sense-specific vectors.
no code implementations • EMNLP 2018 • Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber
In existing interpretation methods for NLP, a word's importance is determined either by input perturbation (measuring the decrease in model confidence when that word is removed) or by the gradient with respect to that word.
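The perturbation variant is easy to sketch: leave-one-out importance scores, with a toy stand-in for the model's confidence function.

```python
def word_importance(sentence: str, model_confidence) -> list[tuple[str, float]]:
    """Leave-one-out importance: the drop in model confidence when each
    word is removed. `model_confidence` is any callable mapping a string
    to the model's confidence in its original prediction."""
    words = sentence.split()
    base = model_confidence(sentence)
    scores = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((w, base - model_confidence(reduced)))
    return scores

# Toy stand-in model whose confidence hinges on the word "excellent".
toy = lambda s: 0.9 if "excellent" in s else 0.5
print(word_importance("an excellent film", toy))
# [('an', 0.0), ('excellent', 0.4), ('film', 0.0)]
```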
1 code implementation • NAACL 2018 • Varun Manjunatha, Mohit Iyyer, Jordan Boyd-Graber, Larry Davis
Automatic colorization is the process of adding color to greyscale images.
no code implementations • EMNLP 2017 • You Lu, Jeffrey Lund, Jordan Boyd-Graber
For online topic modeling, the magnitude of gradients is very large.
no code implementations • EMNLP 2017 • Weiwei Yang, Jordan Boyd-Graber, Philip Resnik
Models work best when they are optimized taking into account the evaluation criteria that people care about.
1 code implementation • EMNLP 2017 • Khanh Nguyen, Hal Daumé III, Jordan Boyd-Graber
Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve.
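A bandit-style toy of that loop: sample a candidate translation, observe only a noisy scalar rating, and take a REINFORCE step. Real systems use a seq2seq policy; a softmax over three fixed candidates keeps the update visible.

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = ["the house is red", "house red the is", "a red house"]
true_ratings = np.array([1.0, 0.1, 0.8])  # hidden "user" preferences
theta = np.zeros(3)                       # one logit per candidate

for _ in range(500):
    probs = np.exp(theta) / np.exp(theta).sum()
    i = rng.choice(3, p=probs)                       # sample a translation
    reward = true_ratings[i] + rng.normal(0, 0.1)    # quick, noisy rating
    grad = -probs                                    # grad of log pi(i):
    grad[i] += 1.0                                   # e_i - probs
    theta += 0.1 * reward * grad                     # REINFORCE step

print(candidates[int(theta.argmax())])  # converges toward the best-rated one
```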
no code implementations • ACL 2017 • Jeffrey Lund, Connor Cook, Kevin Seppi, Jordan Boyd-Graber
We propose combinations of words as anchors, going beyond existing single-word anchor algorithms, an approach we call "Tandem Anchors".
no code implementations • TACL 2017 • Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Niklas Elmqvist, Leah Findlater
Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes.
2 code implementations • CVPR 2017 • Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daumé III, Larry Davis
While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels.
1 code implementation • 18 Sep 2016 • He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III
Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact with each other and change.
no code implementations • IJCNLP 2015 • Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian Danescu-Niculescu-Mizil
Interpersonal relations are fickle, with close friendships often dissolving into enmity.
no code implementations • TACL 2014 • Ke Zhai, Jordan Boyd-Graber, Shay B. Cohen
Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions.