Search Results for author: Robin Jia

Found 45 papers, 24 papers with code

Robustness and Adversarial Examples in Natural Language Processing

no code implementations EMNLP (ACL) 2021 Kai-Wei Chang, He He, Robin Jia, Sameer Singh

In particular, we will review recent studies on analyzing the weakness of NLP systems when facing adversarial inputs and data with a distribution shift.

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

no code implementations 1 Apr 2024 Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger

Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs.

Benchmarking Math

Proving membership in LLM pretraining data via data watermarks

no code implementations 16 Feb 2024 Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia

Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem.

Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?

no code implementations 28 Nov 2023 Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia, Jesse Thomason

Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy.

Data Augmentation Translation +1

Efficient End-to-End Visual Document Understanding with Rationale Distillation

no code implementations 16 Nov 2023 Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text.

document understanding Optical Character Recognition +1

Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks

2 code implementations 15 Nov 2023 Ting-Yun Chang, Jesse Thomason, Robin Jia

On the other hand, even successful methods identify neurons that are not specific to a single memorized sequence.

Benchmarking Network Pruning

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

no code implementations 26 Oct 2023 Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan

In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL.

In-Context Learning

Estimating Large Language Model Capabilities without Labeled Test Data

1 code implementation 24 May 2023 Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, Robin Jia

In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled test data for that task.

In-Context Learning Language Modelling +1

Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering

no code implementations 24 May 2023 Wang Zhu, Jesse Thomason, Robin Jia

We train a language model (LM) to robustly answer multistep questions by generating and answering sub-questions.

Language Modelling Question Answering

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

1 code implementation 24 May 2023 Qinyuan Ye, Harvey Yiyun Fu, Xiang Ren, Robin Jia

We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations?

Language Modelling Large Language Model

SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples

1 code implementation 13 May 2023 Deqing Fu, Ameya Godbole, Robin Jia

In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples.

Data Augmentation Natural Language Inference +2

Data Curation Alone Can Stabilize In-context Learning

1 code implementation 20 Dec 2022 Ting-Yun Chang, Robin Jia

Across five tasks and two LLMs, sampling from stable subsets selected by CondAcc and Datamodels improves average accuracy over sampling from the entire training set by 7.7% and 6.3%, respectively.

In-Context Learning Retrieval

Contrastive Novelty-Augmented Learning: Anticipating Outliers with Large Language Models

1 code implementation 28 Nov 2022 Albert Xu, Xiang Ren, Robin Jia

In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly.

Language Modelling Large Language Model +2

Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

no code implementations 26 Oct 2022 Wang Zhu, Jesse Thomason, Robin Jia

For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance.

Question Answering Visual Question Answering

Benchmarking Long-tail Generalization with Likelihood Splits

1 code implementation 13 Oct 2022 Ameya Godbole, Robin Jia

In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances.

Benchmarking Language Modelling +3

Are Sample-Efficient NLP Models More Robust?

no code implementations 12 Oct 2022 Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia

Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.

Extractive Question-Answering Image Classification +2

On Continual Model Refinement in Out-of-Distribution Data Streams

no code implementations ACL 2022 Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih

Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting.

Benchmarking Continual Learning

Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

1 code implementation 22 Feb 2022 Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, Andrew McCallum

Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed.

Knowledge Base Question Answering

Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

no code implementations NAACL 2022 Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela

We collect training datasets in twenty experimental settings and perform a detailed analysis of this approach for the task of extractive question answering (QA) for both standard and adversarial data collection.

Extractive Question-Answering Question Answering

Analyzing Dynamic Adversarial Training Data in the Limit

1 code implementation Findings (ACL) 2022 Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela

To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena.

On the Robustness of Reading Comprehension Models to Entity Renaming

1 code implementation NAACL 2022 Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren

We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?

Continual Pretraining Machine Reading Comprehension

Evaluation Examples are not Equally Informative: How should that change NLP Leaderboards?

1 code implementation ACL 2021 Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jia, Jordan Boyd-Graber

While leaderboards are a straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items (examples) and subjects (NLP models).

Question Answering Infused Pre-training of General-Purpose Contextualized Representations

1 code implementation Findings (ACL) 2022 Robin Jia, Mike Lewis, Luke Zettlemoyer

We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.

named-entity-recognition Named Entity Recognition +3

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

1 code implementation NAACL 2021 Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang

We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.

The statistical advantage of automatic NLG metrics at the system level

1 code implementation ACL 2021 Johnny Tian-Zheng Wei, Robin Jia

Our analysis compares the adjusted error of metrics to humans and a derived, perfect segment-level annotator, both of which are unbiased estimators dependent on the number of judgments collected.

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

no code implementations NeurIPS 2021 Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela

We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.

Benchmarking

Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

no code implementations EMNLP 2021 Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela

We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.

Answer Selection Question Generation

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

no code implementations EMNLP 2021 Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela

A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines.

Language Modelling Masked Language Modeling

Do Question Answering Modeling Improvements Hold Across Benchmarks?

no code implementations 1 Feb 2021 Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang

Do question answering (QA) modeling improvements (e.g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?

Question Answering

Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA

no code implementations 30 Dec 2020 Ana Valeria Gonzalez, Gagan Bansal, Angela Fan, Robin Jia, Yashar Mehdad, Srinivasan Iyer

While research on explaining predictions of open-domain QA systems (ODQA) to users is gaining momentum, most works have failed to evaluate the extent to which explanations improve user trust.

To what extent do human explanations of model behavior align with actual model behavior?

no code implementations EMNLP (BlackboxNLP) 2021 Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams

Given the increasingly prominent role NLP models (will) play in our lives, it is important for human expectations of model behavior to align with actual model behavior.

Natural Language Inference

With Little Power Comes Great Responsibility

2 code implementations EMNLP 2020 Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky

Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community.

Experimental Design Machine Translation +1

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

1 code implementation Findings of the Association for Computational Linguistics 2020 Stephen Mussmann, Robin Jia, Percy Liang

Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., $99.99\%$ of examples are negatives).

Active Learning Open-Domain Question Answering +1

Selective Question Answering under Domain Shift

2 code implementations ACL 2020 Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.

Question Answering

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

1 code implementation WS 2019 Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen

We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems.

Multi-Task Learning Question Answering +1

Know What You Don't Know: Unanswerable Questions for SQuAD

12 code implementations ACL 2018 Pranav Rajpurkar, Robin Jia, Percy Liang

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.

Natural Language Understanding Question Answering +1

Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer

6 code implementations NAACL 2018 Juncen Li, Robin Jia, He He, Percy Liang

We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing "screen is just the right size" to "screen is too small").

Attribute Image Captioning +4

Adversarial Examples for Evaluating Reading Comprehension Systems

3 code implementations EMNLP 2017 Robin Jia, Percy Liang

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.

Question Answering Reading Comprehension

Data Recombination for Neural Semantic Parsing

1 code implementation ACL 2016 Robin Jia, Percy Liang

Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.

Semantic Parsing
