Search Results for author: Robin Jia

Found 45 papers, 24 papers with code

Robustness and Adversarial Examples in Natural Language Processing

no code implementations EMNLP (ACL) 2021 Kai-Wei Chang, He He, Robin Jia, Sameer Singh

In particular, we will review recent studies on analyzing the weakness of NLP systems when facing adversarial inputs and data with a distribution shift.

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

no code implementations 1 Apr 2024 Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger

Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.

Benchmarking Math

Proving membership in LLM pretraining data via data watermarks

no code implementations 16 Feb 2024 Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia

Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem.

Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?

no code implementations 28 Nov 2023 Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia, Jesse Thomason

Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy.

Data Augmentation Translation +1

Efficient End-to-End Visual Document Understanding with Rationale Distillation

no code implementations 16 Nov 2023 Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text.

document understanding Optical Character Recognition +1

Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks

2 code implementations 15 Nov 2023 Ting-Yun Chang, Jesse Thomason, Robin Jia

On the other hand, even successful methods identify neurons that are not specific to a single memorized sequence.

Benchmarking Network Pruning

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

no code implementations 26 Oct 2023 Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan

In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL.

In-Context Learning

Estimating Large Language Model Capabilities without Labeled Test Data

1 code implementation 24 May 2023 Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia

In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled test data for that task.

In-Context Learning Language Modelling +1

Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering

no code implementations 24 May 2023 Wang Zhu, Jesse Thomason, Robin Jia

We train a language model (LM) to robustly answer multistep questions by generating and answering sub-questions.

Language Modelling Question Answering

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

1 code implementation 24 May 2023 Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia

We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations?

Language Modelling Large Language Model

SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples

1 code implementation 13 May 2023 Deqing Fu, Ameya Godbole, Robin Jia

In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples.

Data Augmentation Natural Language Inference +2

Data Curation Alone Can Stabilize In-context Learning

1 code implementation 20 Dec 2022 Ting-Yun Chang, Robin Jia

Across five tasks and two LLMs, sampling from stable subsets selected by CondAcc and Datamodels improves average accuracy over sampling from the entire training set by 7.7% and 6.3%, respectively.

In-Context Learning Retrieval

Contrastive Novelty-Augmented Learning: Anticipating Outliers with Large Language Models

1 code implementation 28 Nov 2022 Albert Xu, Xiang Ren, Robin Jia

In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly.

Language Modelling Large Language Model +2

Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

no code implementations 26 Oct 2022 Wang Zhu, Jesse Thomason, Robin Jia

For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance.

Question Answering Visual Question Answering

Benchmarking Long-tail Generalization with Likelihood Splits

1 code implementation 13 Oct 2022 Ameya Godbole, Robin Jia

In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances.

Benchmarking Language Modelling +3

Are Sample-Efficient NLP Models More Robust?

no code implementations 12 Oct 2022 Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia

Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.

Extractive Question-Answering Image Classification +2

On Continual Model Refinement in Out-of-Distribution Data Streams

no code implementations ACL 2022 Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih

Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting.

Benchmarking Continual Learning

Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

1 code implementation 22 Feb 2022 Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, Andrew McCallum

Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed.

Knowledge Base Question Answering

Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

no code implementations NAACL 2022 Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela

We collect training datasets in twenty experimental settings and perform a detailed analysis of this approach for the task of extractive question answering (QA) for both standard and adversarial data collection.

Extractive Question-Answering Question Answering

Analyzing Dynamic Adversarial Training Data in the Limit

1 code implementation Findings (ACL) 2022 Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela

To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena.

On the Robustness of Reading Comprehension Models to Entity Renaming

1 code implementation NAACL 2022 Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?

Continual Pretraining Machine Reading Comprehension

Evaluation Examples are not Equally Informative: How should that change NLP Leaderboards?

1 code implementation ACL 2021 Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, Jordan Boyd-Graber

While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models).

Question Answering Infused Pre-training of General-Purpose Contextualized Representations

1 code implementation Findings (ACL) 2022 Robin Jia, Mike Lewis, Luke Zettlemoyer

We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.

named-entity-recognition Named Entity Recognition +3

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

1 code implementation NAACL 2021 Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang

We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.

The statistical advantage of automatic NLG metrics at the system level

1 code implementation ACL 2021 Johnny Tian-Zheng Wei, Robin Jia

Our analysis compares the adjusted error of metrics to humans and a derived, perfect segment-level annotator, both of which are unbiased estimators dependent on the number of judgments collected.

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

no code implementations NeurIPS 2021 Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela

We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.

Benchmarking

Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

no code implementations EMNLP 2021 Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela

We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.

Answer Selection Question Generation

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

no code implementations EMNLP 2021 Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela

A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines.

Language Modelling Masked Language Modeling

Do Question Answering Modeling Improvements Hold Across Benchmarks?

no code implementations 1 Feb 2021 Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang

Do question answering (QA) modeling improvements (e.g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?

Question Answering

Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA

no code implementations 30 Dec 2020 Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad, Srinivasan Iyer

While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust.

To what extent do human explanations of model behavior align with actual model behavior?

no code implementations EMNLP (BlackboxNLP) 2021 Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.

Natural Language Inference

With Little Power Comes Great Responsibility

2 code implementations EMNLP 2020 Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky

Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community.

Experimental Design Machine Translation +1

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

1 code implementation Findings of the Association for Computational Linguistics 2020 Stephen Mussmann, Robin Jia, Percy Liang

Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., $99.99\%$ of examples are negatives).

Active Learning Open-Domain Question Answering +1

Selective Question Answering under Domain Shift

2 code implementations ACL 2020 Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.

Question Answering

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

1 code implementation WS 2019 Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen

We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems.

Multi-Task Learning Question Answering +1

Know What You Don't Know: Unanswerable Questions for SQuAD

12 code implementations ACL 2018 Pranav Rajpurkar, Robin Jia, Percy Liang

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.

Natural Language Understanding Question Answering +1

Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer

6 code implementations NAACL 2018 Juncen Li, Robin Jia, He He, Percy Liang

We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing "screen is just the right size" to "screen is too small").

Attribute Image Captioning +4

Adversarial Examples for Evaluating Reading Comprehension Systems

3 code implementations EMNLP 2017 Robin Jia, Percy Liang

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.

Question Answering Reading Comprehension

Data Recombination for Neural Semantic Parsing

1 code implementation ACL 2016 Robin Jia, Percy Liang

Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.

Semantic Parsing
