no code implementations • EMNLP (ACL) 2021 • Kai-Wei Chang, He He, Robin Jia, Sameer Singh
In particular, we will review recent studies on analyzing the weaknesses of NLP systems when facing adversarial inputs and data with a distribution shift.
no code implementations • 24 May 2023 • Wang Zhu, Jesse Thomason, Robin Jia
We train a language model (LM) to robustly answer multi-step questions by generating and answering sub-questions.
no code implementations • 24 May 2023 • Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia
We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations?
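The setup described above can be illustrated with a toy predictor: given records of past experiment configurations and their measured accuracies, predict the accuracy of a new configuration from its most similar past record. The records and distance function below are fabricated for illustration; the paper's actual methods are more sophisticated.

```python
# Toy illustration: predict LLM accuracy on a new experiment configuration
# from records of past experiments, via a 1-nearest-neighbor lookup.
# All records here are fabricated; this is not the paper's actual method.

import math

# Each record: (log10 of parameter count, number of in-context examples, accuracy).
past_records = [
    (8.0, 0, 0.42),   # ~100M params, zero-shot
    (8.0, 4, 0.51),   # ~100M params, 4-shot
    (10.0, 0, 0.58),  # ~10B params, zero-shot
    (10.0, 4, 0.69),  # ~10B params, 4-shot
]

def predict_accuracy(log_params, n_shots):
    """Return the accuracy of the most similar past configuration."""
    def dist(rec):
        return math.hypot(rec[0] - log_params, rec[1] - n_shots)
    nearest = min(past_records, key=dist)
    return nearest[2]

print(predict_accuracy(10.0, 3))  # nearest record is (10.0, 4) -> 0.69
```

Even this crude lookup captures the paper's premise: performance on new configurations is partly predictable from nearby points in configuration space.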
no code implementations • 24 May 2023 • Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia
In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled data for that task.
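A simple baseline for this kind of label-free estimation (a sketch only, not necessarily the paper's method) is to use the model's average confidence on the unlabeled examples as a proxy for its accuracy, which is reasonable only to the extent the model is well calibrated:

```python
# Sketch: estimate a model's accuracy on an unlabeled task from its own
# confidence scores. The "model" here is a stub returning a probability
# distribution over labels; in practice this would be an LLM's output.

def average_confidence(unlabeled_inputs, predict_proba):
    """Average max-probability over unlabeled inputs, used as an
    accuracy estimate for a (hopefully) calibrated model."""
    confidences = [max(predict_proba(x)) for x in unlabeled_inputs]
    return sum(confidences) / len(confidences)

# Hypothetical stub model: always 80% confident in its top label.
stub = lambda x: [0.8, 0.2]
print(average_confidence(["ex1", "ex2", "ex3"], stub))  # approximately 0.8
```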
no code implementations • 13 May 2023 • Deqing Fu, Ameya Godbole, Robin Jia
In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples.
1 code implementation • 20 Dec 2022 • Ting-Yun Chang, Robin Jia
Across five tasks and two LLMs, sampling from stable subsets selected by CondAcc and Datamodels improves average accuracy over sampling from the entire training set by 7.7% and 6.3%, respectively.
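The spirit of scoring individual training examples by how well prompts containing them perform can be sketched as follows; the trial data is fabricated and the real CondAcc and Datamodels methods involve more than this simple average.

```python
# Sketch: score each in-context example by the average accuracy of the
# prompts that contained it, then prefer high-scoring ("stable") examples.
# Trial data is fabricated for illustration.

from collections import defaultdict

# Each trial: (set of training-example ids used in the prompt, accuracy).
trials = [
    ({1, 2}, 0.70),
    ({1, 3}, 0.60),
    ({2, 3}, 0.50),
]

def score_examples(trials):
    totals, counts = defaultdict(float), defaultdict(int)
    for examples, acc in trials:
        for ex in examples:
            totals[ex] += acc
            counts[ex] += 1
    return {ex: totals[ex] / counts[ex] for ex in totals}

scores = score_examples(trials)
# Example 1 appears in trials with accuracies 0.70 and 0.60.
print(round(scores[1], 2))  # 0.65
```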
1 code implementation • 28 Nov 2022 • Albert Xu, Xiang Ren, Robin Jia
In many task settings, text classification models are likely to encounter examples from novel classes that they cannot predict correctly.
no code implementations • 26 Oct 2022 • Wang Zhu, Jesse Thomason, Robin Jia
For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance.
1 code implementation • 13 Oct 2022 • Ameya Godbole, Robin Jia
In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances.
no code implementations • 12 Oct 2022 • Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia
Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.
no code implementations • ACL 2022 • Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih
Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting.
1 code implementation • 22 Feb 2022 • Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, Andrew McCallum
Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed.
no code implementations • NAACL 2022 • Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela
We collect training datasets in twenty experimental settings and perform a detailed analysis of this approach for the task of extractive question answering (QA) for both standard and adversarial data collection.
1 code implementation • NAACL 2022 • Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren
We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?
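The perturbation studied here can be sketched minimally: replace an entity's name consistently in the passage, question, and gold answer, so a robust MRC model should produce the same (renamed) answer. The example names below are made up.

```python
# Minimal sketch of the entity-renaming perturbation: rename an entity
# consistently in passage, question, and answer. A robust reading
# comprehension model should be unaffected by the name change.

def rename_entity(passage, question, answer, old_name, new_name):
    sub = lambda text: text.replace(old_name, new_name)
    return sub(passage), sub(question), sub(answer)

passage = "Marie Curie won the Nobel Prize in 1903."
question = "Who won the Nobel Prize in 1903?"
p, q, a = rename_entity(passage, question, "Marie Curie",
                        "Marie Curie", "Jane Doe")
print(p)  # Jane Doe won the Nobel Prize in 1903.
print(a)  # Jane Doe
```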
1 code implementation • Findings (ACL) 2022 • Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela
To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena.
1 code implementation • ACL 2021 • Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, Jordan Boyd-Graber
While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models).
1 code implementation • Findings (ACL) 2022 • Robin Jia, Mike Lewis, Luke Zettlemoyer
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.
1 code implementation • NAACL 2021 • Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang
We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.
1 code implementation • ACL 2021 • Johnny Tian-Zheng Wei, Robin Jia
Our analysis compares the adjusted error of metrics to humans and a derived, perfect segment-level annotator, both of which are unbiased estimators dependent on the number of judgments collected.
no code implementations • NeurIPS 2021 • Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.
no code implementations • EMNLP 2021 • Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela
We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.
no code implementations • EMNLP 2021 • Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines.
no code implementations • NAACL 2021 • Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
no code implementations • 1 Feb 2021 • Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang
Do question answering (QA) modeling improvements (e.g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?
no code implementations • 30 Dec 2020 • Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad, Srinivasan Iyer
While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust.
no code implementations • EMNLP (BlackboxNLP) 2021 • Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams
Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.
2 code implementations • EMNLP 2020 • Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community.
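The definition in parentheses lends itself to a Monte Carlo sketch: simulate many replications of an experiment with a known real effect and count how often the difference is detected. The "test" below (does system A simply beat system B on the sample?) is deliberately simplistic and the accuracies are fabricated; the paper discusses proper significance tests.

```python
# Monte Carlo sketch of statistical power: how often does an experiment
# of a given size detect a real accuracy gap between two systems?
# Detection here is just "A's observed accuracy exceeds B's", which is
# cruder than a real hypothesis test.

import random

def estimate_power(p_a, p_b, n_examples, n_sims=500, seed=0):
    """Fraction of simulated experiments where system A beats system B."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        correct_a = sum(rng.random() < p_a for _ in range(n_examples))
        correct_b = sum(rng.random() < p_b for _ in range(n_examples))
        wins += correct_a > correct_b
    return wins / n_sims

# A real 2-point gap is often missed with a small test set, and almost
# always detected with a large one.
print(estimate_power(0.82, 0.80, 50))
print(estimate_power(0.82, 0.80, 2000))
```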
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Stephen Mussmann, Robin Jia, Percy Liang
Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., 99.99% of examples are negatives).
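The practical consequence of such imbalance is easy to demonstrate: a classifier that predicts "negative" for every pair achieves near-perfect accuracy while finding no positives at all. The counts below are illustrative.

```python
# Why extreme label imbalance makes raw accuracy misleading: the trivial
# all-negative classifier is 99.99% accurate yet has zero recall.
# Counts are illustrative.

def accuracy_and_recall(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# 1 positive per 10,000 pairs; predict "negative" everywhere:
acc, rec = accuracy_and_recall(tp=0, fp=0, tn=9999, fn=1)
print(acc, rec)  # 0.9999 0.0
```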
2 code implementations • ACL 2020 • Amita Kamath, Robin Jia, Percy Liang
In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.
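The core mechanics of selective prediction can be sketched with a confidence threshold: answer only when the model's confidence clears the threshold, and report coverage alongside accuracy on the answered subset. The confidence scores and labels below are fabricated, and the paper's calibrator is more involved than a raw threshold.

```python
# Sketch of selective question answering: abstain below a confidence
# threshold, then measure coverage (fraction answered) and accuracy on
# the answered questions. Data is fabricated.

def selective_metrics(predictions, threshold):
    """predictions: list of (confidence, is_correct) pairs.
    Returns (coverage, accuracy_on_answered)."""
    answered = [correct for conf, correct in predictions if conf >= threshold]
    if not answered:
        return 0.0, 0.0
    coverage = len(answered) / len(predictions)
    accuracy = sum(answered) / len(answered)
    return coverage, accuracy

preds = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
print(selective_metrics(preds, 0.0))  # (1.0, 0.6) -- answer everything
print(selective_metrics(preds, 0.7))  # (0.4, 1.0) -- abstain on the rest
```

Raising the threshold trades coverage for accuracy, which is exactly the trade-off the selective setting evaluates.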
1 code implementation • ACL 2020 • Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang
We instantiate RobEn to defend against a large family of adversarial typos.
1 code implementation • WS 2019 • Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems.
2 code implementations • IJCNLP 2019 • Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang
We train the first models that are provably robust to all word substitutions in this family.
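What "provably robust to all word substitutions" means can be illustrated by brute force: a prediction is certified if every sentence in the (exponentially large) substitution neighborhood receives the same label. This is feasible only for tiny examples; the paper instead trains with efficient certified bounds. The toy classifier and substitution sets below are made up.

```python
# Brute-force "certificate" sketch: a prediction is provably robust to a
# family of word substitutions if every candidate in the substitution
# neighborhood gets the same label. Exponential, so only viable for toy
# inputs; certified-training methods bound this efficiently instead.

from itertools import product

def is_certifiably_robust(words, substitutions, classify):
    """substitutions: word -> list of allowed replacements (incl. itself)."""
    options = [substitutions.get(w, [w]) for w in words]
    labels = {classify(" ".join(cand)) for cand in product(*options)}
    return len(labels) == 1

# Toy classifier: negative iff the sentence contains the word "not".
classify = lambda s: "neg" if "not" in s.split() else "pos"
subs = {"good": ["good", "great", "fine"]}
print(is_certifiably_robust("the movie was good".split(), subs, classify))  # True
```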
no code implementations • NAACL 2019 • Robin Jia, Cliff Wong, Hoifung Poon
Most information extraction methods focus on binary relations expressed within single sentences.
no code implementations • 4 Apr 2019 • Robin Jia, Cliff Wong, Hoifung Poon
Widening the system's purview to the entire document maximizes potential recall.
13 code implementations • ACL 2018 • Pranav Rajpurkar, Robin Jia, Percy Liang
Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.
6 code implementations • NAACL 2018 • Juncen Li, Robin Jia, He He, Percy Liang
We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing "screen is just the right size" to "screen is too small").
Ranked #1 on Unsupervised Text Style Transfer on Yelp2018
3 code implementations • EMNLP 2017 • Robin Jia, Percy Liang
Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.
1 code implementation • ACL 2016 • Robin Jia, Percy Liang
Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.