Search Results for author: Percy Liang

Found 217 papers, 157 papers with code

Learning Semantic Correspondences with Less Supervision

1 code implementation 1 Aug 2009 Percy Liang, Michael Jordan, Dan Klein

A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state.

Language Acquisition

Spectral Experts for Estimating Mixtures of Linear Regressions

no code implementations 17 Jun 2013 Arun Tejasvi Chaganty, Percy Liang

Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima.

regression

Dropout Training as Adaptive Regularization

no code implementations NeurIPS 2013 Stefan Wager, Sida Wang, Percy Liang

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data.

Document Classification
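
To make the feature-noising idea concrete, here is a minimal sketch of training a logistic-regression model on artificially corrupted (dropped-out) inputs. This is illustrative only, not the paper's adaptive-regularization estimator, and all hyperparameters are made up:

```python
# Minimal sketch of dropout as feature noising for logistic regression.
import numpy as np

rng = np.random.default_rng(0)

def dropout_sgd(X, y, p=0.5, lr=0.1, epochs=20):
    """Train logistic regression on artificially corrupted (dropped-out) features."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            mask = rng.random(d) >= p          # drop each feature with prob. p
            x = X[i] * mask / (1.0 - p)        # inverted-dropout rescaling
            margin = y[i] * x @ w
            w -= lr * (-y[i] * x / (1.0 + np.exp(margin)))  # logistic loss gradient
    return w

# Toy data with labels in {-1, +1}
X = rng.normal(size=(200, 10))
y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=200))
w = dropout_sgd(X, y)
```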

Lambda Dependency-Based Compositional Semantics

no code implementations cs.AI 2013 Percy Liang

This short note presents a new formal language, lambda dependency-based compositional semantics (lambda DCS) for representing logical forms in semantic parsing.

Semantic Parsing

Relaxations for inference in restricted Boltzmann machines

no code implementations 21 Dec 2013 Sida I. Wang, Roy Frostig, Percy Liang, Christopher D. Manning

We propose a relaxation-based approximate inference algorithm that samples near-MAP configurations of a binary pairwise Markov random field.

Altitude Training: Strong Bounds for Single-Layer Dropout

no code implementations NeurIPS 2014 Stefan Wager, William Fithian, Sida Wang, Percy Liang

Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks.

The Statistics of Streaming Sparse Regression

no code implementations 13 Dec 2014 Jacob Steinhardt, Stefan Wager, Percy Liang

We present a sparse analogue to stochastic gradient descent that is guaranteed to perform well under similar conditions to the lasso.

regression

Imitation Learning of Agenda-based Semantic Parsers

1 code implementation TACL 2015 Jonathan Berant, Percy Liang

Semantic parsers conventionally construct logical forms bottom-up in a fixed order, resulting in the generation of many extraneous partial logical forms.

Imitation Learning Question Answering +1

Tensor Factorization via Matrix Factorization

1 code implementation 29 Jan 2015 Volodymyr Kuleshov, Arun Tejasvi Chaganty, Percy Liang

Tensor factorization arises in many machine learning applications, such as knowledge base modeling and parameter estimation in latent variable models.

Learning Fast-Mixing Models for Structured Prediction

1 code implementation 24 Feb 2015 Jacob Steinhardt, Percy Liang

Markov Chain Monte Carlo (MCMC) algorithms are often used for approximate inference inside learning, but their slow mixing can be difficult to diagnose and the approximations can seriously degrade learning.

Structured Prediction

Reified Context Models

1 code implementation 24 Feb 2015 Jacob Steinhardt, Percy Liang

A classic tension exists between exact inference in a simple model and approximate inference in a complex model.

Learning Where to Sample in Structured Prediction

1 code implementation 9 May 2015 Tianlin Shi, Jacob Steinhardt, Percy Liang

In structured prediction, most inference algorithms allocate a homogeneous amount of computation to all parts of the output, which can be wasteful when different parts vary widely in terms of difficulty.

Reinforcement Learning (RL) Structured Prediction

Traversing Knowledge Graphs in Vector Space

2 code implementations EMNLP 2015 Kelvin Guu, John Miller, Percy Liang

Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?".

Knowledge Base Completion Knowledge Graphs
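
The compositional idea behind answering such path queries can be sketched in a few lines. The paper compositionalizes several base knowledge-graph embedding models; the sketch below uses TransE-style additive composition, with random stand-in embeddings and hypothetical entity/relation names:

```python
# Sketch: answer a path query by composing relation operators in vector space.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = {e: rng.normal(size=dim) for e in ["lisbon", "joao", "portuguese"]}
relations = {r: rng.normal(size=dim) for r in ["lives_in_inv", "speaks"]}

def traverse(source, path):
    """Follow a relation path from a source entity in embedding space."""
    v = entities[source]
    for r in path:
        v = v + relations[r]   # each relation acts as a translation (TransE-style)
    return v

def answer(source, path, k=1):
    """Rank all entities by negative distance to the traversed vector."""
    v = traverse(source, path)
    scores = {e: -np.linalg.norm(v - x) for e, x in entities.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# "What languages are spoken by people living in Lisbon?"
print(answer("lisbon", ["lives_in_inv", "speaks"]))
```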

Compositional Semantic Parsing on Semi-Structured Tables

4 code implementations IJCNLP 2015 Panupong Pasupat, Percy Liang

Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and the depth of logical compositionality.

Question Answering Semantic Parsing

Data Augmentation via Levy Processes

1 code implementation 21 Mar 2016 Stefan Wager, William Fithian, Percy Liang

The framework imagines data as being drawn from a slice of a Levy process.

Image Augmentation

Learning Executable Semantic Parsers for Natural Language Understanding

no code implementations 22 Mar 2016 Percy Liang

For building question answering systems and natural language interfaces, semantic parsing has emerged as an important and powerful paradigm.

Natural Language Understanding Question Answering +1

Estimating Mixture Models via Mixtures of Polynomials

3 code implementations NeurIPS 2015 Sida I. Wang, Arun Tejasvi Chaganty, Percy Liang

This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation.

Learning Language Games through Interaction

3 code implementations ACL 2016 Sida I. Wang, Percy Liang, Christopher D. Manning

We introduce a new language learning setting relevant to building adaptive natural language interfaces.

Semantic Parsing

Data Recombination for Neural Semantic Parsing

1 code implementation ACL 2016 Robin Jia, Percy Liang

Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.

Semantic Parsing

SQuAD: 100,000+ Questions for Machine Comprehension of Text

19 code implementations EMNLP 2016 Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang

We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.

Question Answering Reading Comprehension +1
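
For reference, the dataset is distributed through the Hugging Face `datasets` library (a convenience shown here, not part of the original release); a quick look at one example and its fields:

```python
from datasets import load_dataset

squad = load_dataset("squad")
ex = squad["train"][0]
print(ex["question"])
print(ex["context"][:100])
print(ex["answers"])   # {'text': [...], 'answer_start': [...]}
```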

Simpler Context-Dependent Logical Forms via Model Projections

1 code implementation ACL 2016 Reginald Long, Panupong Pasupat, Percy Liang

With only denotations at training time, we must search over a combinatorially large space of logical forms, which is even larger with context-dependent utterances.

Semantic Parsing

Unsupervised Risk Estimation Using Only Conditional Independence Structure

no code implementations NeurIPS 2016 Jacob Steinhardt, Percy Liang

We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test.

Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings

1 code implementation 20 Jun 2016 Fereshte Khani, Martin Rinard, Percy Liang

Specifically, we introduce the unanimity principle: only predict when all models consistent with the training data predict the same output.

Semantic Parsing

Inferring Logical Forms From Denotations

2 code implementations ACL 2016 Panupong Pasupat, Percy Liang

A core problem in learning semantic parsers from denotations is picking out consistent logical forms--those that yield the correct denotation--from a combinatorially large space.

Synthesizing Program Input Grammars

1 code implementation 5 Aug 2016 Osbert Bastani, Rahul Sharma, Alex Aiken, Percy Liang

We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program.

Programming Languages

Estimation from Indirect Supervision with Linear Moments

1 code implementation 10 Aug 2016 Aditi Raghunathan, Roy Frostig, John Duchi, Percy Liang

In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation.

Structured Prediction

How Much is 131 Million Dollars? Putting Numbers in Perspective with Compositional Descriptions

1 code implementation ACL 2016 Arun Tejasvi Chaganty, Percy Liang

We then propose a system to generate these descriptions consisting of two steps: formula construction and description generation.

Convexified Convolutional Neural Networks

1 code implementation ICML 2017 Yuchen Zhang, Percy Liang, Martin J. Wainwright

For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.

Denoising

Prediction with a Short Memory

no code implementations 8 Dec 2016 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
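
The "trivial prediction algorithm" referenced above is just an empirical-frequency table over short windows of observations. A minimal sketch (window length and data are illustrative):

```python
# Predict the next symbol from empirical frequencies of the preceding window.
from collections import Counter, defaultdict

def fit_window_model(seq, l):
    counts = defaultdict(Counter)
    for i in range(l, len(seq)):
        counts[tuple(seq[i - l:i])][seq[i]] += 1
    return counts

def predict(counts, context):
    c = counts.get(tuple(context))
    return c.most_common(1)[0][0] if c else None

seq = list("abcabcabcabx" * 10)
model = fit_window_model(seq, l=2)
print(predict(model, list("ab")))  # most frequent successor of the window "ab"
```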

A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics

no code implementations 18 Feb 2017 Yuchen Zhang, Percy Liang, Moses Charikar

We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization.
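
For context, the SGLD update is plain stochastic gradient descent plus Gaussian noise whose scale is tied to the step size. A minimal sketch with illustrative hyperparameters and a toy non-convex objective:

```python
# One SGLD step: theta <- theta - eta * grad + sqrt(2 * eta / beta) * N(0, I).
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, grad_fn, eta=1e-3, beta=1.0):
    noise = rng.normal(size=theta.shape)
    return theta - eta * grad_fn(theta) + np.sqrt(2.0 * eta / beta) * noise

# Example: sample around the minima of the double-well objective (t^2 - 1)^2.
grad = lambda t: 4 * t**3 - 4 * t          # gradient of (t^2 - 1)^2
theta = np.array([2.0])
for _ in range(5000):
    theta = sgld_step(theta, grad)
```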

Naturalizing a Programming Language via Interactive Learning

1 code implementation ACL 2017 Sida I. Wang, Samuel Ginn, Percy Liang, Christopher D. Manning

Our goal is to create a convenient natural language interface for performing well-specified but complex actions such as analyzing data, manipulating text, and querying databases.

Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings

2 code implementations ACL 2017 He He, Anusha Balakrishnan, Mihail Eric, Percy Liang

To model both structured knowledge and unstructured language, we propose a neural model with dynamic knowledge graph embeddings that evolve as the dialogue progresses.

Knowledge Graph Embeddings

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

3 code implementations ACL 2017 Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang

Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself.

reinforcement-learning Reinforcement Learning (RL) +1

Certified Defenses for Data Poisoning Attacks

2 code implementations NeurIPS 2017 Jacob Steinhardt, Pang Wei Koh, Percy Liang

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model.

Data Poisoning

Developing Bug-Free Machine Learning Systems With Formal Mathematics

1 code implementation ICML 2017 Daniel Selsam, Percy Liang, David L. Dill

As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e., machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients.

BIG-bench Machine Learning

Adversarial Examples for Evaluating Reading Comprehension Systems

3 code implementations EMNLP 2017 Robin Jia, Percy Liang

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.

Question Answering Reading Comprehension

Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

2 code implementations EMNLP 2017 Yuchen Zhang, Panupong Pasupat, Percy Liang

To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations.

Semantic Parsing Sentence +1

World of Bits: An Open-Domain Platform for Web-Based Agents

no code implementations ICML 2017 Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang

While simulated game environments have greatly accelerated research in reinforcement learning, existing environments lack the open-domain realism of tasks in computer vision or natural language processing, which operate on artifacts created by humans in natural, organic settings.

reinforcement-learning Reinforcement Learning (RL)

Importance sampling for unbiased on-demand evaluation of knowledge base population

no code implementations EMNLP 2017 Arun Chaganty, Ashwin Paranjape, Percy Liang, Christopher D. Manning

Our first contribution is a new importance-sampling based evaluation which corrects for this bias by annotating a new system's predictions on-demand via crowdsourcing.

Information Retrieval Knowledge Base Population +1

Generating Sentences by Editing Prototypes

3 code implementations TACL 2018 Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence.

Language Modelling Sentence +1

Unsupervised Transformation Learning via Convex Relaxations

1 code implementation NeurIPS 2017 Tatsunori B. Hashimoto, John C. Duchi, Percy Liang

Our goal is to extract meaningful transformations from raw images, such as varying the thickness of lines in handwriting or the lighting in a portrait.

Learning Overcomplete HMMs

no code implementations NeurIPS 2017 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

Certified Defenses against Adversarial Examples

4 code implementations ICLR 2018 Aditi Raghunathan, Jacob Steinhardt, Percy Liang

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs.

Adversarial Attack Adversarial Defense +1

Learning a SAT Solver from Single-Bit Supervision

6 code implementations ICLR 2019 Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill

We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability.

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

4 code implementations ICLR 2018 Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang

Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates.

reinforcement-learning Reinforcement Learning (RL)

Generalized Binary Search For Split-Neighborly Problems

no code implementations 27 Feb 2018 Stephen Mussmann, Percy Liang

In sequential hypothesis testing, Generalized Binary Search (GBS) greedily chooses the test with the highest information gain at each step.

Two-sample testing

Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer

6 code implementations NAACL 2018 Juncen Li, Robin Jia, He He, Percy Liang

We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing "screen is just the right size" to "screen is too small").

Attribute Image Captioning +4

Training Classifiers with Natural Language Explanations

2 code implementations ACL 2018 Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification).

Binary Classification General Classification +1

Planning, Inference and Pragmatics in Sequential Language Games

1 code implementation TACL 2018 Fereshte Khani, Noah D. Goodman, Percy Liang

We study sequential language games in which two players, each with private information, communicate to achieve a common goal.

Know What You Don't Know: Unanswerable Questions for SQuAD

12 code implementations ACL 2018 Pranav Rajpurkar, Robin Jia, Percy Liang

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.

Natural Language Understanding Question Answering +1

On the Relationship between Data Efficiency and Error for Uncertainty Sampling

1 code implementation ICML 2018 Stephen Mussmann, Percy Liang

While active learning offers potential cost savings, the actual data efficiency---the reduction in amount of labeled data needed to obtain the same error rate---observed in practice is mixed.

Active Learning regression

Fairness Without Demographics in Repeated Loss Minimization

1 code implementation ICML 2018 Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity---minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss.

Fairness

The price of debiasing automatic metrics in natural language evaluation

no code implementations ACL 2018 Arun Chaganty, Stephen Mussmann, Percy Liang

For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.

Abstractive Text Summarization Image Captioning +1

The price of debiasing automatic metrics in natural language evaluation

1 code implementation 6 Jul 2018 Arun Tejasvi Chaganty, Stephen Mussmann, Percy Liang

For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.

Question Answering

Inferring Multidimensional Rates of Aging from Cross-Sectional Data

1 code implementation 12 Jul 2018 Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nicholas Eriksson, Percy Liang

Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data.

Human Aging Time Series +1

QuAC : Question Answering in Context

no code implementations 21 Aug 2018 Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).

Question Answering Reading Comprehension

Textual Analogy Parsing: What's Shared and What's Compared among Analogous Facts

2 code implementations EMNLP 2018 Matthew Lamm, Arun Tejasvi Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang

To understand a sentence like "whereas only 10% of White Americans live at or below the poverty line, 28% of African Americans do" it is important not only to identify individual facts, e.g., poverty rates of distinct demographic groups, but also the higher-order relations between them, e.g., the disparity between them.

Sentence Textual Analogy Parsing

QuAC: Question Answering in Context

no code implementations EMNLP 2018 Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).

Question Answering Reading Comprehension

Semidefinite relaxations for certifying robustness to adversarial examples

3 code implementations NeurIPS 2018 Aditi Raghunathan, Jacob Steinhardt, Percy Liang

One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family.

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

2 code implementations 2 Nov 2018 Pang Wei Koh, Jacob Steinhardt, Percy Liang

In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.

Data Poisoning Sentiment Analysis +2

FrAngel: Component-Based Synthesis with Control Structures

2 code implementations 13 Nov 2018 Kensen Shi, Jacob Steinhardt, Percy Liang

We present FrAngel, a new approach to component-based synthesis that can synthesize short Java functions with control structures when given a desired signature, a set of input-output examples, and a collection of libraries (without formal specifications).

Programming Languages

A Retrieve-and-Edit Framework for Predicting Structured Outputs

1 code implementation NeurIPS 2018 Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch.

Retrieval

Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss

1 code implementation NeurIPS 2018 Stephen Mussmann, Percy Liang

Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data.

Active Learning
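
A minimal margin-based uncertainty-sampling loop, to make the algorithm under analysis concrete. The model, toy data, and loop sizes are placeholders, not the paper's experimental setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

# Seed set containing both classes, then an unlabeled pool.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(1000) if i not in labeled]

for _ in range(20):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])[:, 1]
    idx = int(np.argmin(np.abs(probs - 0.5)))   # query the most uncertain point
    labeled.append(pool.pop(idx))
```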

Defending against Whitebox Adversarial Attacks via Randomized Discretization

1 code implementation 25 Mar 2019 Yuchen Zhang, Percy Liang

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers.

Adversarial Attack General Classification

Pun Generation with Surprise

2 code implementations NAACL 2019 He He, Nanyun Peng, Percy Liang

We tackle the problem of generating a pun sentence given a pair of homophones (e.g., "died" and "dyed").

Language Modelling Sentence +1

Select Via Proxy: Efficient Data Selection For Training Deep Networks

no code implementations ICLR 2019 Cody Coleman, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

In our approach, we first train a small proxy model quickly, which we then use to estimate the utility of individual training data points, and then select the most informative ones for training the large target model.

BIG-bench Machine Learning Image Classification +1

Learning Abstract Models for Long-Horizon Exploration

no code implementations ICLR 2019 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).

Atari Games

Strategies for Pre-training Graph Neural Networks

10 code implementations ICLR 2020 Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec

Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training.

Graph Classification Molecular Property Prediction +4

Unlabeled Data Improves Adversarial Robustness

4 code implementations NeurIPS 2019 Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.

Adversarial Robustness Robust classification

Maximum Weighted Loss Discrepancy

1 code implementation 8 Jun 2019 Fereshte Khani, Aditi Raghunathan, Percy Liang

To capture this inequality, we introduce and study a notion we call maximum weighted loss discrepancy (MWLD), the maximum (weighted) difference between the loss of a group and the loss of the population.

Fairness Generalization Bounds

SPoC: Search-based Pseudocode to Code

1 code implementation NeurIPS 2019 Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang

Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation.

Program Synthesis Translation

Adversarial Training Can Hurt Generalization

no code implementations ICML Workshop Deep_Phenomena 2019 Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary).

A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree

no code implementations 26 Jun 2019 Ray Li, Percy Liang, Stephen Mussmann

The greedy algorithm's $O(\log n)$ approximation ratio was the best known, but the largest approximation ratio known to be NP-hard is $4-\varepsilon$.

Active Learning

Selection via Proxy: Efficient Data Selection for Deep Learning

1 code implementation ICLR 2020 Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train.

Active Learning Computational Efficiency

Distributionally Robust Language Modeling

1 code implementation IJCNLP 2019 Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews).

Language Modelling

Designing and Interpreting Probes with Control Tasks

1 code implementation IJCNLP 2019 John Hewitt, Percy Liang

The selectivity of a probe puts linguistic task accuracy in context with the probe's capacity to memorize from word types.

Part-Of-Speech Tagging

Verified Uncertainty Calibration

3 code implementations NeurIPS 2019 Ananya Kumar, Percy Liang, Tengyu Ma

In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.

Weather Forecasting
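
For reference, the commonly used plugin estimator of expected calibration error (ECE) that the paper improves upon can be written in a few lines; this sketch uses equal-width bins and is not the paper's improved estimator:

```python
import numpy as np

def plugin_ece(confidences, correct, n_bins=10):
    """Binned ECE: bin-weighted |accuracy - average confidence| per bin."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

print(plugin_ece([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```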

When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It

no code implementations 25 Sep 2019 Sang Michael Xie*, Aditi Raghunathan*, Fanny Yang, John C. Duchi, Percy Liang

Empirically, data augmentation sometimes improves and sometimes hurts test error, even when only adding points with labels from the true conditional distribution that the hypothesis class is expressive enough to fit.

Data Augmentation regression

Shaping Visual Representations with Language for Few-shot Classification

2 code implementations ACL 2020 Jesse Mu, Percy Liang, Noah Goodman

By describing the features and abstractions of our world, language is a crucial tool for human learning and a promising source of supervision for machine learning models.

Classification General Classification +2

Learning Autocomplete Systems as a Communication Game

1 code implementation 16 Nov 2019 Mina Lee, Tatsunori B. Hashimoto, Percy Liang

We study textual autocomplete---the task of predicting a full sentence from a partial sentence---as a human-machine communication game.

Sentence

Feature Noise Induces Loss Discrepancy Across Groups

1 code implementation ICML 2020 Fereshte Khani, Percy Liang

Our main result is that even when there is no information deficiency specific to one group (e.g., both groups have infinite data), adding the same amount of feature noise to all individuals leads to loss discrepancy.

Attribute

Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

1 code implementation ICML 2020 Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang

In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error.

regression

Understanding Self-Training for Gradual Domain Adaptation

2 code implementations ICML 2020 Ananya Kumar, Tengyu Ma, Percy Liang

Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.

Unsupervised Domain Adaptation

Distributionally Robust Neural Networks

1 code implementation ICLR 2020 Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, Percy Liang

Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.

L2 Regularization Natural Language Inference +1
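
A minimal sketch of the worst-group objective at the heart of group DRO. The paper's algorithm uses an online reweighting of groups rather than this hard max, so treat this as a simplified PyTorch variant with illustrative names:

```python
import torch

def worst_group_loss(losses, group_ids, n_groups):
    """losses: per-example losses; group_ids: group index per example."""
    group_losses = []
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(losses[mask].mean())
    return torch.stack(group_losses).max()   # backprop through the worst group

# Usage inside a training step (criterion built with reduction='none'):
# loss = worst_group_loss(criterion(model(x), y), g, n_groups)
# loss.backward()
```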

ExpBERT: Representation Engineering with Natural Language Explanations

2 code implementations ACL 2020 Shikhar Murty, Pang Wei Koh, Percy Liang

Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text.

Inductive Bias Relation Extraction +1

An Investigation of Why Overparameterization Exacerbates Spurious Correlations

3 code implementations 9 May 2020 Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang

We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data.

Inductive Bias

Enabling Language Models to Fill in the Blanks

3 code implementations ACL 2020 Chris Donahue, Mina Lee, Percy Liang

We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.

Language Modelling Text Infilling
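
The data transformation behind infilling by language modeling is simple to sketch: blank out spans and move their contents to the end, so a standard left-to-right LM learns to fill the blanks. Token names follow the paper's scheme, but the details here are simplified:

```python
def make_infilling_example(tokens, spans):
    """spans: non-overlapping (start, end) index pairs to blank out."""
    masked, answers = [], []
    prev = 0
    for start, end in sorted(spans):
        masked += tokens[prev:start] + ["[blank]"]
        answers += tokens[start:end] + ["[answer]"]
        prev = end
    masked += tokens[prev:]
    return masked + ["[sep]"] + answers

tokens = "she ate leftover pasta for lunch".split()
print(make_infilling_example(tokens, [(2, 4)]))
# ['she', 'ate', '[blank]', 'for', 'lunch', '[sep]', 'leftover', 'pasta', '[answer]']
```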

Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

2 code implementations ICML 2020 Michihiro Yasunaga, Percy Liang

Second, we present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online to create a large amount of extra program repair examples, which we use to pre-train our models.

Code Generation Graph Learning +2

Explore then Execute: Adapting without Rewards via Factorized Meta-Reinforcement Learning

no code implementations ICML Workshop LifelongML 2020 Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn

In principle, meta-reinforcement learning approaches can exploit this shared structure, but in practice, they fail to adapt to new environments when adaptation requires targeted exploration (e.g., exploring the cabinets to find ingredients in a new kitchen).

Meta Reinforcement Learning reinforcement-learning +2

Selective Question Answering under Domain Shift

2 code implementations ACL 2020 Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.

Question Answering

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

2 code implementations 29 Jun 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).

Code Translation Denoising +2

Concept Bottleneck Models

4 code implementations ICML 2020 Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis?

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

1 code implementation 12 Jul 2020 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.

Model-based Reinforcement Learning Montezuma's Revenge +2

Robustness to Spurious Correlations via Human Annotations

1 code implementation ICML 2020 Megha Srivastava, Tatsunori Hashimoto, Percy Liang

The reliability of machine learning systems critically assumes that the associations between features and labels remain similar between training and test distributions.

Common Sense Reasoning

Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

2 code implementations 6 Aug 2020 Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn

Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task.

Meta Reinforcement Learning reinforcement-learning +2

Simplifying Models with Unlabeled Output Data

no code implementations 28 Sep 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g. a pseudocode-to-code translation task where the code must compile.

Code Translation Image Generation +2

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

1 code implementation Findings of the Association for Computational Linguistics 2020 Stephen Mussmann, Robin Jia, Percy Liang

Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., $99.99\%$ of examples are negatives).

Active Learning Open-Domain Question Answering +1

Learning Adaptive Language Interfaces through Decomposition

no code implementations EMNLP (intexsempar) 2020 Siddharth Karamcheti, Dorsa Sadigh, Percy Liang

Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings.

Semantic Parsing

The EOS Decision and Length Extrapolation

1 code implementation EMNLP (BlackboxNLP) 2020 Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning

Extrapolation to unseen sequence lengths is a challenge for neural generative models of language.

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

2 code implementations NeurIPS 2020 Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, (2) only requires a fixed number of forward/backward passes through the network per iteration.

Selective Classification Can Magnify Disparities Across Groups

1 code implementation ICLR 2021 Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations.

Classification General Classification

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

1 code implementation 16 Nov 2020 Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su

To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization.

Knowledge Base Question Answering

Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

1 code implementation 7 Dec 2020 Fereshte Khani, Percy Liang

The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population.

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

1 code implementation ICLR 2021 Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).

Time Series Time Series Analysis +1

Prefix-Tuning: Optimizing Continuous Prompts for Generation

10 code implementations ACL 2021 Xiang Lisa Li, Percy Liang

Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.

Language Modelling Table-to-Text Generation
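
A simplified sketch of the idea: freeze the pretrained LM and optimize only a short sequence of continuous prefix vectors. For brevity this version prepends the prefix at the embedding layer (closer to prompt tuning); the paper's method prepends trainable key/value prefixes at every attention layer. The `lm` argument is a hypothetical module that accepts input embeddings:

```python
import torch
import torch.nn as nn

class PrefixModel(nn.Module):
    def __init__(self, lm, prefix_len=10, dim=768):
        super().__init__()
        self.lm = lm
        for p in self.lm.parameters():
            p.requires_grad = False            # freeze the pretrained LM
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)

    def forward(self, input_embeds):           # input_embeds: (batch, seq, dim)
        batch = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.lm(torch.cat([prefix, input_embeds], dim=1))
```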

Do Question Answering Modeling Improvements Hold Across Benchmarks?

no code implementations 1 Feb 2021 Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang

Do question answering (QA) modeling improvements (e. g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?

Question Answering

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

4 code implementations NAACL 2021 Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, Jure Leskovec

The problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) presents two challenges: given a QA context (question and answer choice), methods need to (i) identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA context and KG.

Graph Representation Learning Knowledge Graphs +5

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

1 code implementation NAACL 2021 Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang

We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.

Break-It-Fix-It: Unsupervised Learning for Program Repair

1 code implementation 11 Jun 2021 Michihiro Yasunaga, Percy Liang

To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code.

C++ code Code Repair +4

Codified audio language modeling learns useful representations for music information retrieval

1 code implementation 12 Jul 2021 Rodrigo Castellon, Chris Donahue, Percy Liang

Relative to representations from conventional MIR models which are pre-trained on tagging, we find that using representations from Jukebox as input features yields 30% stronger performance on average across four MIR tasks: tagging, genre classification, emotion recognition, and key detection.

Emotion Recognition Genre classification +8

Just Train Twice: Improving Group Robustness without Training Group Information

1 code implementation 19 Jul 2021 Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn

Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label.

Image Classification Out-of-Distribution Generalization

On the Opportunities and Risks of Foundation Models

2 code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

2 code implementations EMNLP 2021 Michihiro Yasunaga, Jure Leskovec, Percy Liang

Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive.

Grammatical Error Correction Language Modelling +2

Conditional probing: measuring usable information beyond a baseline

1 code implementation EMNLP 2021 John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning

Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable.

Word Embeddings

GreaseLM: Graph REASoning Enhanced Language Models

no code implementations ICLR 2022 Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D Manning, Jure Leskovec

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Negation +2

Ensembles and Cocktails: Robust Finetuning for Natural Language Generation

no code implementations 29 Sep 2021 John Hewitt, Xiang Lisa Li, Sang Michael Xie, Benjamin Newman, Percy Liang

When finetuning a pretrained language model for natural language generation tasks, one is currently faced with a tradeoff.

Language Modelling Text Generation

How does Contrastive Pre-training Connect Disparate Domains?

no code implementations 29 Sep 2021 Kendrick Shen, Robbie Matthew Jones, Ananya Kumar, Sang Michael Xie, Percy Liang

We develop a conceptual model for contrastive learning under domain shifts, where data augmentations form connections between classes and domains that can be far apart.

Contrastive Learning Unsupervised Domain Adaptation

Calibrated ensembles - a simple way to mitigate ID-OOD accuracy tradeoffs

no code implementations 29 Sep 2021 Ananya Kumar, Aditi Raghunathan, Tengyu Ma, Percy Liang

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy.

Large Language Models Can Be Strong Differentially Private Learners

4 code implementations ICLR 2022 Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts at applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.

LILA: Language-Informed Latent Actions

1 code implementation 5 Nov 2021 Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh

We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration.

Imitation Learning

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation ICLR 2022 Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities

1 code implementation 18 Jan 2022 Mina Lee, Percy Liang, Qian Yang

Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design.

Language Modelling Text Generation

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

1 code implementation 21 Jan 2022 Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Negation +2

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

3 code implementations 21 Feb 2022 Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang

However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

no code implementations 1 Apr 2022 Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang

We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.

Contrastive Learning Unsupervised Domain Adaptation

Diffusion-LM Improves Controllable Text Generation

1 code implementation 27 May 2022 Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation.

Language Modelling Sentence +1

Decentralized Training of Foundation Models in Heterogeneous Environments

1 code implementation 2 Jun 2022 Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang

Our key technical contribution is a scheduling algorithm that allocates different computational "tasklets" in the training of foundation models to a group of decentralized GPU devices connected by a slow heterogeneous network.

Scheduling

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations 9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

Insights into Pre-training via Simpler Synthetic Tasks

1 code implementation 21 Jun 2022 Yuhuai Wu, Felix Li, Percy Liang

Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME.

Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning

no code implementations 15 Jul 2022 Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto

The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods.

Descriptive Representation Learning

Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

no code implementations 18 Jul 2022 Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

2 code implementations1 Aug 2022 Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant

To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
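A minimal sketch of the setup this describes, assuming a sequence model that consumes interleaved (input, target) pairs; shapes and names are illustrative.

```python
import torch

# Construct one in-context linear-regression prompt: a hidden weight
# vector w defines the function, the model sees (x_i, w^T x_i) pairs,
# and is scored on its prediction for a fresh query point.
def sample_prompt(n_points=10, dim=5):
    w = torch.randn(dim)                  # hidden linear function
    xs = torch.randn(n_points, dim)       # in-context examples
    ys = xs @ w                           # noiseless targets
    x_query = torch.randn(dim)
    return xs, ys, x_query, w @ x_query   # prompt plus ground-truth answer

xs, ys, x_query, y_true = sample_prompt()
# A trained sequence model would consume (x_1, y_1, ..., x_n, y_n, x_query)
# and be evaluated on squared error against y_true.
```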

In-Context Learning

Are Sample-Efficient NLP Models More Robust?

no code implementations12 Oct 2022 Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia

Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.

Extractive Question-Answering Image Classification +2

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

1 code implementation20 Oct 2022 Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task.

Transfer Learning

Contrastive Decoding: Open-ended Text Generation as Optimization

2 code implementations27 Oct 2022 Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis

We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint.
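A minimal sketch of one contrastive decoding step, assuming next-token distributions from a large "expert" LM and a small "amateur" LM; the alpha cutoff plays the role of the plausibility constraint.

```python
import numpy as np

def contrastive_step(p_expert, p_amateur, alpha=0.1):
    # Plausibility constraint: only consider tokens the expert itself
    # assigns at least alpha times its maximum probability.
    plausible = p_expert >= alpha * p_expert.max()
    # Contrastive objective: expert log-prob minus amateur log-prob,
    # masking out implausible tokens.
    score = np.where(
        plausible,
        np.log(p_expert + 1e-12) - np.log(p_amateur + 1e-12),
        -np.inf,
    )
    return int(score.argmax())  # greedy choice under the CD objective
```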

Language Modelling Text Generation

Truncation Sampling as Language Model Desmoothing

1 code implementation27 Oct 2022 John Hewitt, Christopher D. Manning, Percy Liang

In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution.
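A toy truncation step in this desmoothing view, assuming a fixed probability threshold as a simplified stand-in for the paper's eta-sampling rule:

```python
import numpy as np

def truncate(p, epsilon=1e-3):
    # Treat the LM distribution as a smoothed version of the true
    # distribution: keep only tokens whose probability clears the
    # threshold (the estimated support), then renormalize.
    kept = p * (p >= epsilon)
    if kept.sum() == 0:            # degenerate case: fall back to argmax
        kept = np.zeros_like(p)
        kept[p.argmax()] = 1.0
    return kept / kept.sum()
```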

Language Modelling

How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

no code implementations22 Nov 2022 Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews.

Retrieval-Augmented Multimodal Language Modeling

no code implementations22 Nov 2022 Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).

Caption Generation Image Captioning +5

Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?

no code implementations25 Nov 2022 Rishi Bommasani, Kathleen A. Creel, Ananya Kumar, Dan Jurafsky, Percy Liang

As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e.g., training data), are deployed by multiple decision-makers.

Fairness

Melody transcription via generative pre-training

1 code implementation4 Dec 2022 Chris Donahue, John Thickstun, Percy Liang

The combination of generative pre-training and a new dataset for this task results in $77\%$ stronger performance on melody transcription relative to the strongest available baseline.

Chord Recognition Information Retrieval +2

Evaluating Human-Language Model Interaction

1 code implementation19 Dec 2022 Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and dimensions to consider when designing evaluation metrics.

Language Modelling Question Answering

Trustworthy Social Bias Measurement

1 code implementation20 Dec 2022 Rishi Bommasani, Percy Liang

How do we design measures of social bias that we trust?

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

2 code implementations28 Dec 2022 Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia

Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM).
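A schematic of the demonstrate-search-predict control flow, where `lm` and `retrieve` are hypothetical stand-ins for the frozen LM and RM; the actual DSP library expresses such pipelines as richer prompting programs.

```python
def dsp_answer(question, demos, lm, retrieve, k=3):
    # DEMONSTRATE: show worked examples of the whole pipeline.
    demo_block = "\n\n".join(demos)
    # SEARCH: ask the LM for a search query, then retrieve passages.
    query = lm(f"{demo_block}\n\nQuestion: {question}\nSearch query:")
    passages = retrieve(query, k=k)
    # PREDICT: answer conditioned on the retrieved evidence.
    context = "\n".join(passages)
    return lm(f"{demo_block}\n\nContext: {context}\n"
              f"Question: {question}\nAnswer:")
```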

In-Context Learning Language Modelling +2

"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy

1 code implementation6 Jan 2023 Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh

Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot.

Instruction Following

Benchmarking Large Language Models for News Summarization

1 code implementation31 Jan 2023 Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto

Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their successes are poorly understood.

Benchmarking News Summarization

Evaluating Self-Supervised Learning via Risk Decomposition

1 code implementation6 Feb 2023 Yann Dubois, Tatsunori Hashimoto, Percy Liang

Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization.
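One way to read such a decomposition is as a telescoping sum: each component is a difference of risks between two successively weaker evaluation settings, so the four terms add up to the total risk. The sketch below assumes one already has those four risk measurements; the instrumentation is an assumption, not the paper's protocol.

```python
def decompose(risk_best_possible, risk_usable_rep, risk_finite_probe, risk_full):
    # Each component is a gap between two successive evaluation settings,
    # so the components telescope to the total (full) risk.
    components = {
        "approximation":          risk_best_possible,
        "representation_usability": risk_usable_rep - risk_best_possible,
        "probe_generalization":   risk_finite_probe - risk_usable_rep,
        "encoder_generalization": risk_full - risk_finite_probe,
    }
    assert abs(sum(components.values()) - risk_full) < 1e-9  # sanity check
    return components
```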

Representation Learning Self-Supervised Learning

Data Selection for Language Models via Importance Resampling

1 code implementation NeurIPS 2023 Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.
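A simplified sketch of the hashed n-gram machinery, assuming bag-of-hashed-bigram features and add-one smoothing (both illustrative simplifications): importance weights are log-likelihood ratios between the target and raw feature distributions, and pretraining data can then be resampled in proportion to them.

```python
import numpy as np
from collections import Counter

def hashed_ngrams(text, n=2, buckets=10_000):
    # Hash each n-gram into a fixed number of buckets (a rough stand-in
    # for the paper's hashed n-gram feature space).
    toks = text.split()
    grams = [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return Counter(hash(g) % buckets for g in grams)

def bucket_dist(texts, buckets=10_000):
    counts = np.ones(buckets)  # add-one smoothing
    for t in texts:
        for b, c in hashed_ngrams(t, buckets=buckets).items():
            counts[b] += c
    return counts / counts.sum()

def importance_weights(raw_texts, target_texts, buckets=10_000):
    p_tgt = bucket_dist(target_texts, buckets)
    p_raw = bucket_dist(raw_texts, buckets)
    log_ratio = np.log(p_tgt) - np.log(p_raw)
    weights = []
    for t in raw_texts:  # per-example log-likelihood ratio
        feats = hashed_ngrams(t, buckets=buckets)
        weights.append(sum(c * log_ratio[b] for b, c in feats.items()))
    return np.array(weights)  # resample raw data proportional to exp(weights)
```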

Out-of-Domain Robustness via Targeted Augmentations

1 code implementation23 Feb 2023 Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations.

Language-Driven Representation Learning for Robotics

2 code implementations24 Feb 2023 Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang

First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.

Contrastive Learning Imitation Learning +2

Improving Representational Continuity via Continued Pretraining

1 code implementation26 Feb 2023 Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang

We consider the continual representation learning setting: sequentially pretrain a model $M'$ on tasks $T_1, \ldots, T_T$, and then adapt $M'$ on a small amount of data from each task $T_i$ to check if it has forgotten information from old tasks.

Continual Learning Representation Learning +1

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation13 Mar 2023 Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling Large Language Model

Ecosystem Graphs: The Social Footprint of Foundation Models

no code implementations28 Mar 2023 Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang

Foundation models (e.g., ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention.

Whose Opinions Do Language Models Reflect?

1 code implementation30 Mar 2023 Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto

Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction and on shaping the views of society at large.

Generative Agents: Interactive Simulacra of Human Behavior

7 code implementations7 Apr 2023 Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.

Language Modelling Large Language Model

Evaluating Verifiability in Generative Search Engines

1 code implementation19 Apr 2023 Nelson F. Liu, Tianyi Zhang, Percy Liang

Generative search engines directly generate responses to user queries, along with in-line citations.

Sentence

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations NeurIPS 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
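A minimal sketch of a DoReMi-style mixture update, assuming per-domain losses from a small proxy model and a reference model: an exponentiated-gradient step upweights domains with large excess loss. The learning rate and uniform smoothing are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def update_mixture(weights, proxy_loss, reference_loss, lr=1.0, smooth=1e-3):
    # Excess loss: how much worse the proxy model is than the reference
    # on each domain; domains with larger excess loss get upweighted.
    excess = np.maximum(proxy_loss - reference_loss, 0.0)
    w = weights * np.exp(lr * excess)     # multiplicative (EG) update
    w = w / w.sum()
    return (1 - smooth) * w + smooth / len(w)  # mix in uniform for stability

domains = ["wikipedia", "books", "web"]
w = np.ones(3) / 3
w = update_mixture(w, proxy_loss=np.array([2.1, 2.6, 3.0]),
                   reference_loss=np.array([2.0, 2.4, 2.5]))
```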

Language Modelling

PRODIGY: Enabling In-context Learning Over Graphs

no code implementations NeurIPS 2023 Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters.

In-Context Learning Knowledge Graphs

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

2 code implementations NeurIPS 2023 Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003.

Instruction Following

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

3 code implementations23 May 2023 Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training.

Language Modelling Stochastic Optimization

Backpack Language Models

1 code implementation26 May 2023 John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang

We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways.
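A small sketch of that interpretation step, with illustrative shapes: projecting a sense vector through the output embedding matrix scores every vocabulary item, and the top-scoring words describe what the sense promotes.

```python
import torch

vocab, dim, n_senses = 10_000, 768, 16
E = torch.randn(vocab, dim)          # output (softmax) embedding matrix
senses = torch.randn(n_senses, dim)  # sense vectors for one word

# Non-contextual, linear projection of each sense onto the output space.
logits = senses @ E.T                # (n_senses, vocab)
top = logits[0].topk(5).indices      # words most promoted by sense 0
# Intervening on a sense (e.g., scaling it down) changes the model's
# behavior in a correspondingly predictable way.
```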

Language Modelling Text Generation +1

Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models

1 code implementation27 May 2023 Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung

Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data.

Negation Question Answering +1

Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

no code implementations5 Jun 2023 Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.

One-sided Matrix Completion from Two Observations Per Row

no code implementations6 Jun 2023 Steven Cao, Percy Liang, Gregory Valiant

We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.
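A toy estimator in the spirit of this result, assuming each row reveals a uniformly random pair of distinct columns: because the observed pair is chosen independently of the values, the average of observed products is an unbiased estimate of the mean product across rows, and rescaling by the number of rows imputes the corresponding entry of $X^TX$. This is an illustrative sketch, not the paper's algorithm.

```python
import numpy as np

def estimate_gram(rows, observed, d):
    """rows: full data rows (only the observed coordinates are used);
    observed: the (j, k) column pair revealed for each row."""
    sums = np.zeros((d, d))
    counts = np.zeros((d, d))
    for x, (j, k) in zip(rows, observed):
        # Each row contributes to the revealed off-diagonal entry (both
        # symmetric positions) and to the two revealed diagonal entries.
        for a, b in [(j, k), (k, j), (j, j), (k, k)]:
            sums[a, b] += x[a] * x[b]
            counts[a, b] += 1
    avg = np.divide(sums, counts, out=np.zeros((d, d)), where=counts > 0)
    return len(rows) * avg  # rescale the conditional mean to the full sum
```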

Matrix Completion

Anticipatory Music Transformer

no code implementations14 Jun 2023 John Thickstun, David Hall, Chris Donahue, Percy Liang

We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence.

Music Generation

Lost in the Middle: How Language Models Use Long Contexts

4 code implementations6 Jul 2023 Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

no code implementations NeurIPS 2023 Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang

In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments.

Robust Distortion-free Watermarks for Language Models

2 code implementations28 Jul 2023 Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

We generate watermarked text by mapping a sequence of random numbers -- which we compute using a randomized watermark key -- to a sample from the language model.
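A toy version of that keyed sampling step, assuming SHA-256 as the pseudorandom source: the secret key determines a sequence of uniforms $u_t$, and inverse-transform sampling with $u_t$ leaves each token's marginal distribution unchanged (hence distortion-free), while a detector holding the key can regenerate the $u_t$ and test for alignment with a given text. This is simplified relative to the paper's full scheme.

```python
import hashlib
import numpy as np

def keyed_uniform(key: bytes, t: int) -> float:
    # Derive a pseudorandom uniform in [0, 1) from the key and position.
    h = hashlib.sha256(key + t.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def watermarked_token(p: np.ndarray, key: bytes, t: int) -> int:
    # Inverse-transform sampling: the token's marginal distribution is
    # exactly p, so generation is distortion-free given a uniform u.
    u = keyed_uniform(key, t)
    return int(np.searchsorted(np.cumsum(p), u))
```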

Language Modelling
