Search Results for author: Percy Liang

Found 217 papers, 157 papers with code

Overparameterization hurts worst-group accuracy with spurious correlations

no code implementations • ICML 2020 • Shiori Sagawa, aditi raghunathan, Pang Wei Koh, Percy Liang

Increasing model capacity well beyond the point of zero training error has been observed to improve average test accuracy.

Paper
Add Code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

no code implementations • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0. 5 benchmark.

Paper
Add Code

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

1 code implementation • 6 Apr 2024 • Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto

Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics.

Chatbot counterfactual

1,079

Paper
Code

FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

1 code implementation • 2 Apr 2024 • Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M. Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher Manning

In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples.

Decision Making Legal Reasoning

Paper
Code

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

1 code implementation • 27 Mar 2024 • Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning

Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks.

Language Modelling Medical Genetics +3

578

Paper
Code

A Safe Harbor for AI Evaluation and Red Teaming

no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.

Paper
Add Code

On the Societal Impact of Open Foundation Models

no code implementations • 27 Feb 2024 • Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan

To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk.

Decision Making

Paper
Add Code

Foundation Model Transparency Reports

no code implementations • 26 Feb 2024 • Rishi Bommasani, Kevin Klyman, Shayne Longpre, Betty Xiong, Sayash Kapoor, Nestor Maslej, Arvind Narayanan, Percy Liang

Foundation models are critical digital technologies with sweeping societal impact that necessitates transparency.

Paper
Add Code

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

2 code implementations • 12 Feb 2024 • Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.

Hallucination Object Localization +3

177

Paper
Code

Model Editing with Canonical Examples

1 code implementation • 9 Feb 2024 • John Hewitt, Sarah Chen, Lanruo Lora Xie, Edward Adams, Percy Liang, Christopher D. Manning

The Backpack defines a large bank of sense vectors--a decomposition of the different uses of each word--which are weighted and summed to form the output logits of the model.

Language Modelling Model Editing

Paper
Code

On the Learnability of Watermarks for Language Models

1 code implementation • 7 Dec 2023 • Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto

Watermarking of language model outputs enables statistical detection of model-generated text, which has many applications in the responsible deployment of language models.

Language Modelling

Paper
Code

Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation

no code implementations • 15 Nov 2023 • Vaishnavi Shrivastava, Percy Liang, Ananya Kumar

To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.

Question Answering

Paper
Add Code

Holistic Evaluation of Text-To-Image Models

1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption.

Fairness

1,625

Paper
Code

The Foundation Model Transparency Index

1 code implementation • 19 Oct 2023 • Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, Percy Liang

We score 10 major foundation model developers (e. g. OpenAI, Google, Meta) against the 100 indicators to assess their transparency.

Paper
Code

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

1 code implementation • 5 Oct 2023 • Qian Huang, Jian Vora, Percy Liang, Jure Leskovec

A central aspect of machine learning research is experimentation, the process of designing and running experiments, analyzing the results, and iterating towards some positive outcome (e. g., improving accuracy).

Benchmarking Decision Making +1

177

Paper
Code

Large Language Models as Analogical Reasoners

no code implementations • 3 Oct 2023 • Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou

Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process.

Code Generation GSM8K +1

Paper
Add Code

Benchmarking and Improving Generator-Validator Consistency of Language Models

no code implementations • 3 Oct 2023 • Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang

To improve the consistency of LMs, we propose to finetune on the filtered generator and validator responses that are GV-consistent, and call this approach consistency fine-tuning.

Benchmarking Instruction Following +1

Paper
Add Code

MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

no code implementations • 27 Aug 2023 • Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, Jonathan H. Chen, Keith E. Morse, Emma P. Brunskill, Jason A. Fries, Nigam H. Shah

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care.

2k Instruction Following +2

Paper
Add Code

Robust Distortion-free Watermarks for Language Models

2 code implementations • 28 Jul 2023 • Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

We generate watermarked text by mapping a sequence of random numbers -- which we compute using a randomized watermark key -- to a sample from the language model.

Language Modelling

Paper
Code

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

no code implementations • NeurIPS 2023 • Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang

In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments.

Paper
Add Code

Lost in the Middle: How Language Models Use Long Contexts

4 code implementations • 6 Jul 2023 • Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

259

Paper
Code

Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

1 code implementation • 16 Jun 2023 • Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman

Language model training in distributed settings is limited by the communication cost of gradient exchanges.

Distributed Optimization Language Modelling

Paper
Code

Anticipatory Music Transformer

no code implementations • 14 Jun 2023 • John Thickstun, David Hall, Chris Donahue, Percy Liang

We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence.

Music Generation

Paper
Add Code

One-sided Matrix Completion from Two Observations Per Row

no code implementations • 6 Jun 2023 • Steven Cao, Percy Liang, Gregory Valiant

We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.

Matrix Completion

Paper
Add Code

Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

no code implementations • 5 Jun 2023 • Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.

Paper
Add Code

Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models

1 code implementation • 27 May 2023 • Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung

Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data.

Negation Question Answering +1

Paper
Code

Backpack Language Models

1 code implementation • 26 May 2023 • John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang

We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways.

Language Modelling Text Generation +1

Paper
Code

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

3 code implementations • 23 May 2023 • Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training.

Language Modelling Stochastic Optimization

881

Paper
Code

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

2 code implementations • NeurIPS 2023 • Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003.

Instruction Following

1,079

Paper
Code

PRODIGY: Enabling In-context Learning Over Graphs

no code implementations • NeurIPS 2023 • Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters.

In-Context Learning Knowledge Graphs

Paper
Add Code

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e. g., Wikipedia, books, web text) greatly affect language model (LM) performance.

Language Modelling

436

Paper
Code

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs

no code implementations • 3 May 2023 • Deepak Narayanan, Keshav Santhanam, Peter Henderson, Rishi Bommasani, Tony Lee, Percy Liang

Large language models (LLMs) power many state-of-the-art systems in natural language processing.

Text Generation

Paper
Add Code

Evaluating Verifiability in Generative Search Engines

1 code implementation • 19 Apr 2023 • Nelson F. Liu, Tianyi Zhang, Percy Liang

Generative search engines directly generate responses to user queries, along with in-line citations.

Sentence

Paper
Code

Generative Agents: Interactive Simulacra of Human Behavior

7 code implementations • 7 Apr 2023 • Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.

Language Modelling Large Language Model

15,042

Paper
Code

Whose Opinions Do Language Models Reflect?

1 code implementation • 30 Mar 2023 • Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto

Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large.

Paper
Code

Ecosystem Graphs: The Social Footprint of Foundation Models

no code implementations • 28 Mar 2023 • Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang

Foundation models (e. g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention.

Paper
Add Code

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.

Language Modelling Large Language Model

8,987

Paper
Code

Improving Representational Continuity via Continued Pretraining

1 code implementation • 26 Feb 2023 • Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang

We consider the continual representation learning setting: sequentially pretrain a model $M'$ on tasks $T_1, \ldots, T_T$, and then adapt $M'$ on a small amount of data from each task $T_i$ to check if it has forgotten information from old tasks.

Continual Learning Representation Learning +1

Paper
Code

Language-Driven Representation Learning for Robotics

2 code implementations • 24 Feb 2023 • Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang

First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.

Contrastive Learning Imitation Learning +2

172

Paper
Code

Out-of-Domain Robustness via Targeted Augmentations

1 code implementation • 23 Feb 2023 • Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

Models trained on one set of domains often suffer performance drops on unseen domains, e. g., when wildlife monitoring models are deployed in new camera locations.

Paper
Code

Evaluating Self-Supervised Learning via Risk Decomposition

1 code implementation • 6 Feb 2023 • Yann Dubois, Tatsunori Hashimoto, Percy Liang

Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization.

Representation Learning Self-Supervised Learning

Paper
Code

Data Selection for Language Models via Importance Resampling

1 code implementation • NeurIPS 2023 • Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.

185

Paper
Code

Benchmarking Large Language Models for News Summarization

1 code implementation • 31 Jan 2023 • Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto

Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.

Benchmarking News Summarization

Paper
Code

"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy

1 code implementation • 6 Jan 2023 • Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh

Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot.

Instruction Following

Paper
Code

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

2 code implementations • 28 Dec 2022 • Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia

Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM).

In-Context Learning Language Modelling +2

10,165

Paper
Code

Trustworthy Social Bias Measurement

1 code implementation • 20 Dec 2022 • Rishi Bommasani, Percy Liang

How do we design measures of social bias that we trust?

Paper
Code

Evaluating Human-Language Model Interaction

1 code implementation • 19 Dec 2022 • Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and dimensions to consider when designing evaluation metrics.

Language Modelling Question Answering

Paper
Code

Melody transcription via generative pre-training

1 code implementation • 4 Dec 2022 • Chris Donahue, John Thickstun, Percy Liang

The combination of generative pre-training and a new dataset for this task results in $77$% stronger performance on melody transcription relative to the strongest available baseline.

Chord Recognition Information Retrieval +2

242

Paper
Code

Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?

no code implementations • 25 Nov 2022 • Rishi Bommasani, Kathleen A. Creel, Ananya Kumar, Dan Jurafsky, Percy Liang

As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e. g. training data), are deployed by multiple decision-makers.

Fairness

Paper
Add Code

How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

no code implementations • 22 Nov 2022 • Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

In a top-tier computer science conference (NeurIPS 2021) with more than 23, 000 submitting authors and 9, 000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews.

Paper
Add Code

Retrieval-Augmented Multimodal Language Modeling

no code implementations • 22 Nov 2022 • Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e. g., documents on the web).

Ranked #7 on Image Captioning on MS COCO

Caption Generation Image Captioning +5

Paper
Add Code

Holistic Evaluation of Language Models

1 code implementation • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda

We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.

Fairness Question Answering

1,625

Paper
Code

Contrastive Decoding: Open-ended Text Generation as Optimization

2 code implementations • 27 Oct 2022 • Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis

We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint.

Language Modelling Text Generation

159

Paper
Code

Truncation Sampling as Language Model Desmoothing

1 code implementation • 27 Oct 2022 • John Hewitt, Christopher D. Manning, Percy Liang

In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution.

Language Modelling

Paper
Code

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

1 code implementation • 20 Oct 2022 • Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task.

Transfer Learning

Paper
Code

Deep Bidirectional Language-Knowledge Graph Pretraining

1 code implementation • 17 Oct 2022 • Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D Manning, Percy Liang, Jure Leskovec

Pretraining a language model (LM) on text has been shown to help various downstream NLP tasks.

Ranked #1 on Riddle Sense on RiddleSense

Knowledge Graphs Language Modelling +4

288

Paper
Code

Are Sample-Efficient NLP Models More Robust?

no code implementations • 12 Oct 2022 • Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia

Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.

Extractive Question-Answering Image Classification +2

Paper
Add Code

Improving Self-Supervised Learning by Characterizing Idealized Representations

1 code implementation • 13 Sep 2022 • Yann Dubois, Tatsunori Hashimoto, Stefano Ermon, Percy Liang

For non-contrastive learning, we use our framework to derive a simple and novel objective.

Contrastive Learning Self-Supervised Learning

Paper
Code

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

2 code implementations • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant

To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e. g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?

In-Context Learning

154

Paper
Code

Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

no code implementations • 18 Jul 2022 • Ananya Kumar, Tengyu Ma, Percy Liang, aditi raghunathan

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.

Paper
Add Code

Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning

no code implementations • 15 Jul 2022 • Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto

The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods.

Descriptive Representation Learning

Paper
Add Code

Insights into Pre-training via Simpler Synthetic Tasks

1 code implementation • 21 Jun 2022 • Yuhuai Wu, Felix Li, Percy Liang

Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME.

Paper
Code

Emergent Abilities of Large Language Models

no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.

Language Modelling

Paper
Add Code

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

2,647

Paper
Code

Decentralized Training of Foundation Models in Heterogeneous Environments

1 code implementation • 2 Jun 2022 • Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, Ce Zhang

Our key technical contribution is a scheduling algorithm that allocates different computational "tasklets" in the training of foundation models to a group of decentralized GPU devices connected by a slow heterogeneous network.

Scheduling

Paper
Code

Diffusion-LM Improves Controllable Text Generation

1 code implementation • 27 May 2022 • Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation.

Language Modelling Sentence +1

983

Paper
Code

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

no code implementations • 1 Apr 2022 • Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang

We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e. g., photographs) and unlabeled data from a target domain (e. g., sketches) are used to learn a classifier for the target domain.

Contrastive Learning Unsupervised Domain Adaptation

Paper
Add Code

LinkBERT: Pretraining Language Models with Document Links

1 code implementation • ACL 2022 • Michihiro Yasunaga, Jure Leskovec, Percy Liang

Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks.

Ranked #1 on Semantic Similarity on BIOSSES

Document Classification Language Modelling +9

394

Paper
Code

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

3 code implementations • 21 Feb 2022 • Ananya Kumar, aditi raghunathan, Robbie Jones, Tengyu Ma, Percy Liang

However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.

Paper
Code

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

1 code implementation • 21 Jan 2022 • Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Negation +2

219

Paper
Code

CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities

1 code implementation • 18 Jan 2022 • Mina Lee, Percy Liang, Qian Yang

Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design.

Language Modelling Text Generation

Paper
Code

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation • ICLR 2022 • Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

Paper
Code

LILA: Language-Informed Latent Actions

1 code implementation • 5 Nov 2021 • Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh

We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration.

Imitation Learning

Paper
Code

An Explanation of In-context Learning as Implicit Bayesian Inference

1 code implementation • ICLR 2022 • Sang Michael Xie, aditi raghunathan, Percy Liang, Tengyu Ma

At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt.

Few-Shot Learning In-Context Learning +1

Paper
Code

Large Language Models Can Be Strong Differentially Private Learners

4 code implementations • ICLR 2022 • Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts at applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.

138

Paper
Code

Calibrated ensembles - a simple way to mitigate ID-OOD accuracy tradeoffs

no code implementations • 29 Sep 2021 • Ananya Kumar, aditi raghunathan, Tengyu Ma, Percy Liang

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy.

Paper
Add Code

Fine-Tuning Distorts Pretrained Features and Underperforms Out-of-Distribution

no code implementations • ICLR 2022 • Ananya Kumar, aditi raghunathan, Robbie Matthew Jones, Tengyu Ma, Percy Liang

It is well known that fine-tuning leads to better accuracy in-distribution (ID).

Paper
Add Code

Ensembles and Cocktails: Robust Finetuning for Natural Language Generation

no code implementations • 29 Sep 2021 • John Hewitt, Xiang Lisa Li, Sang Michael Xie, Benjamin Newman, Percy Liang

When finetuning a pretrained language model for natural language generation tasks, one is currently faced with a tradeoff.

Language Modelling Text Generation

Paper
Add Code

GreaseLM: Graph REASoning Enhanced Language Models

no code implementations • ICLR 2022 • Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D Manning, Jure Leskovec

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Negation +2

Paper
Add Code

How does Contrastive Pre-training Connect Disparate Domains?

no code implementations • 29 Sep 2021 • Kendrick Shen, Robbie Matthew Jones, Ananya Kumar, Sang Michael Xie, Percy Liang

We develop a conceptual model for contrastive learning under domain shifts, where data augmentations form connections between classes and domains that can be far apart.

Contrastive Learning Unsupervised Domain Adaptation

Paper
Add Code

Conditional probing: measuring usable information beyond a baseline

1 code implementation • EMNLP 2021 • John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning

Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable.

Word Embeddings

Paper
Code

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

2 code implementations • EMNLP 2021 • Michihiro Yasunaga, Jure Leskovec, Percy Liang

Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive.

Ranked #2 on Grammatical Error Correction on Unrestricted

Grammatical Error Correction Language Modelling +2

863

Paper
Code

No True State-of-the-Art? OOD Detection Methods are Inconsistent across Datasets

1 code implementation • 12 Sep 2021 • Fahim Tajwar, Ananya Kumar, Sang Michael Xie, Percy Liang

Out-of-distribution detection is an important component of reliable ML systems.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Paper
Code

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

847

Paper
Code

Just Train Twice: Improving Group Robustness without Training Group Information

1 code implementation • 19 Jul 2021 • Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, aditi raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn

Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Image Classification Out-of-Distribution Generalization

Paper
Code

Codified audio language modeling learns useful representations for music information retrieval

1 code implementation • 12 Jul 2021 • Rodrigo Castellon, Chris Donahue, Percy Liang

Relative to representations from conventional MIR models which are pre-trained on tagging, we find that using representations from Jukebox as input features yields 30% stronger performance on average across four MIR tasks: tagging, genre classification, emotion recognition, and key detection.

Ranked #1 on Emotion Recognition on Emomusic

Emotion Recognition Genre classification +8

159

Paper
Code

Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

1 code implementation • 9 Jul 2021 • John Miller, Rohan Taori, aditi raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt

For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments.

Classification Domain Adaptation +1

112

Paper
Code

Break-It-Fix-It: Unsupervised Learning for Program Repair

1 code implementation • 11 Jun 2021 • Michihiro Yasunaga, Percy Liang

To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code.

Ranked #1 on Program Repair on DeepFix

C++ code Code Repair +4

107

Paper
Code

Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

1 code implementation • NAACL 2021 • Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang

We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.

Paper
Code

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

4 code implementations • NAACL 2021 • Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, Jure Leskovec

The problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) presents two challenges: given a QA context (question and answer choice), methods need to (i) identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA context and KG.

Ranked #2 on Riddle Sense on RiddleSense

Graph Representation Learning Knowledge Graphs +5

602

Paper
Code

Do Question Answering Modeling Improvements Hold Across Benchmarks?

no code implementations • 1 Feb 2021 • Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang

Do question answering (QA) modeling improvements (e. g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?

Question Answering

Paper
Add Code

Prefix-Tuning: Optimizing Continuous Prompts for Generation

10 code implementations • ACL 2021 • Xiang Lisa Li, Percy Liang

Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.

Language Modelling Table-to-Text Generation

5,437

Paper
Code

WILDS: A Benchmark of in-the-Wild Distribution Shifts

6 code implementations • 14 Dec 2020 • Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild.

1,328

Paper
Code

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

1 code implementation • ICLR 2021 • Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).

Time Series Time Series Analysis +1

Paper
Code

Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

1 code implementation • 7 Dec 2020 • Fereshte Khani, Percy Liang

The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population.

Paper
Code

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

1 code implementation • 16 Nov 2020 • Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su

To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64, 331 questions, GrailQA, and provide evaluation settings for all three levels of generalization.

Knowledge Base Question Answering

Paper
Code

Selective Classification Can Magnify Disparities Across Groups

1 code implementation • ICLR 2021 • Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations.

Classification General Classification

Paper
Code

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

2 code implementations • NeurIPS 2020 • Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, aditi raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, (2) only requires a fixed number of forward/backward passes through the network per iteration.

306

Paper
Code

RNNs can generate bounded hierarchical languages with optimal memory

2 code implementations • EMNLP 2020 • John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning

Recurrent neural networks empirically generate natural language with high syntactic fidelity.

Paper
Code

The EOS Decision and Length Extrapolation

1 code implementation • EMNLP (BlackboxNLP) 2020 • Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning

Extrapolation to unseen sequence lengths is a challenge for neural generative models of language.

Paper
Code

Learning Adaptive Language Interfaces through Decomposition

no code implementations • EMNLP (intexsempar) 2020 • Siddharth Karamcheti, Dorsa Sadigh, Percy Liang

Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings.

Semantic Parsing

Paper
Add Code

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Stephen Mussmann, Robin Jia, Percy Liang

Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e. g., $99. 99\%$ of examples are negatives).

Active Learning Open-Domain Question Answering +1

Paper
Code

Simplifying Models with Unlabeled Output Data

no code implementations • 28 Sep 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang

We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e. g. a pseudocode-to-code translation task where the code must compile.

Code Translation Image Generation +2

Paper
Add Code

Task-Oriented Dialogue as Dataflow Synthesis

1 code implementation • 24 Sep 2020 • Semantic Machines, Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang, Alexander Zotov

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph.

Paper
Code

Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

2 code implementations • 6 Aug 2020 • Evan Zheran Liu, aditi raghunathan, Percy Liang, Chelsea Finn

Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task.

Meta Reinforcement Learning reinforcement-learning +2

672

Paper
Code

Robustness to Spurious Correlations via Human Annotations

1 code implementation • ICML 2020 • Megha Srivastava, Tatsunori Hashimoto, Percy Liang

The reliability of machine learning systems critically assumes that the associations between features and labels remain similar between training and test distributions.

Common Sense Reasoning

Paper
Code

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

1 code implementation • 12 Jul 2020 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.

Model-based Reinforcement Learning Montezuma's Revenge +2

32,758

Paper
Code

Concept Bottleneck Models

4 code implementations • ICML 2020 • Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis?

155

Paper
Code

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

2 code implementations • 29 Jun 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang

Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).

Code Translation Denoising +2

Paper
Code

Selective Question Answering under Domain Shift

2 code implementations • ACL 2020 • Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i. e., not abstain on) as many questions as possible while maintaining high accuracy.

Question Answering

Paper
Code

Explore then Execute: Adapting without Rewards via Factorized Meta-Reinforcement Learning

no code implementations • ICML Workshop LifelongML 2020 • Evan Zheran Liu, aditi raghunathan, Percy Liang, Chelsea Finn

In principle, meta-reinforcement learning approaches can exploit this shared structure, but in practice, they fail to adapt to new environments when adaptation requires targeted exploration (e. g., exploring the cabinets to find ingredients in a new kitchen).

Meta Reinforcement Learning reinforcement-learning +2

Paper
Add Code

Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

2 code implementations • ICML 2020 • Michihiro Yasunaga, Percy Liang

Second, we present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online to create a large amount of extra program repair examples, which we use to pre-train our models.

Ranked #1 on Program Synthesis on SPoC TestW

Code Generation Graph Learning +2

188

Paper
Code

Enabling Language Models to Fill in the Blanks

3 code implementations • ACL 2020 • Chris Donahue, Mina Lee, Percy Liang

We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.

Language Modelling Text Infilling

193

Paper
Code

An Investigation of Why Overparameterization Exacerbates Spurious Correlations

3 code implementations • 9 May 2020 • Shiori Sagawa, aditi raghunathan, Pang Wei Koh, Percy Liang

We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data.

Inductive Bias

Paper
Code

ExpBERT: Representation Engineering with Natural Language Explanations

2 code implementations • ACL 2020 • Shikhar Murty, Pang Wei Koh, Percy Liang

Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text.

Inductive Bias Relation Extraction +1

Paper
Code

Robust Encodings: A Framework for Combating Adversarial Typos

1 code implementation • ACL 2020 • Erik Jones, Robin Jia, aditi raghunathan, Percy Liang

We instantiate RobEn to defend against a large family of adversarial typos.

Sentence

Paper
Code

Distributionally Robust Neural Networks

1 code implementation • ICLR 2020 • Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, Percy Liang

Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.

L2 Regularization Natural Language Inference +1

216

Paper
Code

Understanding Self-Training for Gradual Domain Adaptation

2 code implementations • ICML 2020 • Ananya Kumar, Tengyu Ma, Percy Liang

Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.

Ranked #1 on Unsupervised Domain Adaptation on Portraits (over time)

Unsupervised Domain Adaptation

Paper
Code

Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

1 code implementation • ICML 2020 • Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang

In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error.

regression

Paper
Code

Feature Noise Induces Loss Discrepancy Across Groups

1 code implementation • ICML 2020 • Fereshte Khani, Percy Liang

Our main result is that even when there is no information deficiency specific to one group (e. g., both groups have infinite data), adding the same amount of feature noise to all individuals leads to loss discrepancy.

Attribute

Paper
Code

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

8 code implementations • 20 Nov 2019 • Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.

Ranked #1 on Out-of-Distribution Generalization on UrbanCars

Domain Generalization Natural Language Inference +2

1,328

Paper
Code

Learning Autocomplete Systems as a Communication Game

1 code implementation • 16 Nov 2019 • Mina Lee, Tatsunori B. Hashimoto, Percy Liang

We study textual autocomplete---the task of predicting a full sentence from a partial sentence---as a human-machine communication game.

Sentence

Paper
Code

Shaping Visual Representations with Language for Few-shot Classification

2 code implementations • ACL 2020 • Jesse Mu, Percy Liang, Noah Goodman

By describing the features and abstractions of our world, language is a crucial tool for human learning and a promising source of supervision for machine learning models.

Classification General Classification +2

Paper
Code

When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It

no code implementations • 25 Sep 2019 • Sang Michael Xie*, Aditi Raghunathan*, Fanny Yang, John C. Duchi, Percy Liang

Empirically, data augmentation sometimes improves and sometimes hurts test error, even when only adding points with labels from the true conditional distribution that the hypothesis class is expressive enough to fit.

Data Augmentation regression

Paper
Add Code

Verified Uncertainty Calibration

3 code implementations • NeurIPS 2019 • Ananya Kumar, Percy Liang, Tengyu Ma

In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.

Weather Forecasting

130

Paper
Code

Designing and Interpreting Probes with Control Tasks

1 code implementation • IJCNLP 2019 • John Hewitt, Percy Liang

The selectivity of a probe puts linguistic task accuracy in context with the probe's capacity to memorize from word types.

Part-Of-Speech Tagging

Paper
Code

Distributionally Robust Language Modeling

1 code implementation • IJCNLP 2019 • Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

Language models are generally trained on data spanning a wide range of topics (e. g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e. g., restaurant reviews).

Language Modelling

Paper
Code

Certified Robustness to Adversarial Word Substitutions

2 code implementations • IJCNLP 2019 • Robin Jia, aditi raghunathan, Kerem Göksel, Percy Liang

We train the first models that are provably robust to all word substitutions in this family.

Data Augmentation Natural Language Inference +1

Paper
Code

Selection via Proxy: Efficient Data Selection for Deep Learning

1 code implementation • ICLR 2020 • Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train.

Active Learning Computational Efficiency

Paper
Code

A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree

no code implementations • 26 Jun 2019 • Ray Li, Percy Liang, Stephen Mussmann

The greedy algorithm's $O(\log n)$ approximation ratio was the best known, but the largest approximation ratio known to be NP-hard is $4-\varepsilon$.

Active Learning

Paper
Add Code

Adversarial Training Can Hurt Generalization

no code implementations • ICML Workshop Deep_Phenomen 2019 • Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary).

Paper
Add Code

SPoC: Search-based Pseudocode to Code

1 code implementation • NeurIPS 2019 • Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang

Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation.

Ranked #2 on Program Synthesis on SPoC TestP

Program Synthesis Translation

Paper
Code

Maximum Weighted Loss Discrepancy

1 code implementation • 8 Jun 2019 • Fereshte Khani, aditi raghunathan, Percy Liang

To capture this inequality, we introduce and study a notion we call maximum weighted loss discrepancy (MWLD), the maximum (weighted) difference between the loss of a group and the loss of the population.

Fairness Generalization Bounds

Paper
Code

Unlabeled Data Improves Adversarial Robustness

4 code implementations • NeurIPS 2019 • Yair Carmon, aditi raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.

Adversarial Robustness Robust classification

133

Paper
Code

On the Accuracy of Influence Functions for Measuring Group Effects

2 code implementations • NeurIPS 2019 • Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang

Influence functions estimate the effect of removing a training point on a model without the need to retrain.

Influence Approximation

Paper
Code

Strategies for Pre-training Graph Neural Networks

10 code implementations • ICLR 2020 • Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec

Many applications of machine learning require a model to make accurate pre-dictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training.

Ranked #3 on Molecular Property Prediction on ToxCast

Graph Classification Molecular Property Prediction +4

916

Paper
Code

Learning Abstract Models for Long-Horizon Exploration

no code implementations • ICLR 2019 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).

Atari Games

Paper
Add Code

Select Via Proxy: Efficient Data Selection For Training Deep Networks

no code implementations • ICLR 2019 • Cody Coleman, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

In our approach, we first train a small proxy model quickly, which we then use to estimate the utility of individual training data points, and then select the most informative ones for training the large target model.

BIG-bench Machine Learning Image Classification +1

Paper
Add Code

Pun Generation with Surprise

2 code implementations • NAACL 2019 • He He, Nanyun Peng, Percy Liang

We tackle the problem of generating a pun sentence given a pair of homophones (e. g., "died" and "dyed").

Language Modelling Sentence +1

Paper
Code

Unifying Human and Statistical Evaluation for Natural Language Generation

2 code implementations • NAACL 2019 • Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang

How can we measure whether a natural language generation system produces both high quality and diverse outputs?

Sentence Text Generation

Paper
Code

Defending against Whitebox Adversarial Attacks via Randomized Discretization

1 code implementation • 25 Mar 2019 • Yuchen Zhang, Percy Liang

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers.

Adversarial Attack General Classification

Paper
Code

Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss

1 code implementation • NeurIPS 2018 • Stephen Mussmann, Percy Liang

Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data.

Active Learning

Paper
Code

A Retrieve-and-Edit Framework for Predicting Structured Outputs

1 code implementation • NeurIPS 2018 • Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch.

Retrieval

Paper
Code

FrAngel: Component-Based Synthesis with Control Structures

2 code implementations • 13 Nov 2018 • Kensen Shi, Jacob Steinhardt, Percy Liang

We present FrAngel, a new approach to component-based synthesis that can synthesize short Java functions with control structures when given a desired signature, a set of input-output examples, and a collection of libraries (without formal specifications).

Programming Languages

Paper
Code

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

2 code implementations • 2 Nov 2018 • Pang Wei Koh, Jacob Steinhardt, Percy Liang

In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.

Data Poisoning Sentiment Analysis +2

Paper
Code

Semidefinite relaxations for certifying robustness to adversarial examples

3 code implementations • NeurIPS 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang

One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family.

Paper
Code

QuAC: Question Answering in Context

no code implementations • EMNLP 2018 • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).

Question Answering Reading Comprehension

Paper
Add Code

Transforming Question Answering Datasets Into Natural Language Inference Datasets

2 code implementations • 9 Sep 2018 • Dorottya Demszky, Kelvin Guu, Percy Liang

Existing datasets for natural language inference (NLI) have propelled research on language understanding.

Natural Language Inference Question Answering +1

Paper
Code

Textual Analogy Parsing: What's Shared and What's Compared among Analogous Facts

2 code implementations • EMNLP 2018 • Matthew Lamm, Arun Tejasvi Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang

To understand a sentence like "whereas only 10% of White Americans live at or below the poverty line, 28% of African Americans do" it is important not only to identify individual facts, e. g., poverty rates of distinct demographic groups, but also the higher-order relations between them, e. g., the disparity between them.

Sentence Textual Analogy Parsing

Paper
Code

Decoupling Strategy and Generation in Negotiation Dialogues

2 code implementations • EMNLP 2018 • He He, Derek Chen, Anusha Balakrishnan, Percy Liang

We consider negotiation settings in which two agents use natural language to bargain on goods.

reinforcement-learning Reinforcement Learning (RL) +1

155

Paper
Code

Mapping Natural Language Commands to Web Elements

2 code implementations • EMNLP 2018 • Panupong Pasupat, Tian-Shun Jiang, Evan Zheran Liu, Kelvin Guu, Percy Liang

The web provides a rich, open-domain environment with textual, structural, and spatial properties.

Relational Reasoning Visual Reasoning

Paper
Code

QuAC : Question Answering in Context

no code implementations • 21 Aug 2018 • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).

Question Answering Reading Comprehension

Paper
Add Code

Inferring Multidimensional Rates of Aging from Cross-Sectional Data

1 code implementation • 12 Jul 2018 • Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nicholas Eriksson, Percy Liang

Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data.

Human Aging Time Series +1

Paper
Code

The price of debiasing automatic metrics in natural language evaluation

1 code implementation • 6 Jul 2018 • Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang

For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.

Question Answering

Paper
Code

The price of debiasing automatic metrics in natural language evalaution

no code implementations • ACL 2018 • Arun Chaganty, Stephen Mussmann, Percy Liang

Abstractive Text Summarization Image Captioning +1

Paper
Add Code

Fairness Without Demographics in Repeated Loss Minimization

1 code implementation • ICML 2018 • Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

Machine learning models (e. g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity---minority groups (e. g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss.

Fairness

Paper
Code

On the Relationship between Data Efficiency and Error for Uncertainty Sampling

1 code implementation • ICML 2018 • Stephen Mussmann, Percy Liang

While active learning offers potential cost savings, the actual data efficiency---the reduction in amount of labeled data needed to obtain the same error rate---observed in practice is mixed.

Active Learning regression

Paper
Code

Know What You Don't Know: Unanswerable Questions for SQuAD

12 code implementations • ACL 2018 • Pranav Rajpurkar, Robin Jia, Percy Liang

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.

Natural Language Understanding Question Answering +1

Paper
Code

Planning, Inference and Pragmatics in Sequential Language Games

1 code implementation • TACL 2018 • Fereshte Khani, Noah D. Goodman, Percy Liang

We study sequential language games in which two players, each with private information, communicate to achieve a common goal.

Paper
Code

Training Classifiers with Natural Language Explanations

2 code implementations • ACL 2018 • Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification).

Binary Classification General Classification +1

Paper
Code

Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer

6 code implementations • NAACL 2018 • Juncen Li, Robin Jia, He He, Percy Liang

We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e. g., sentiment) while preserving its attribute-independent content (e. g., changing "screen is just the right size" to "screen is too small").

Ranked #1 on Unsupervised Text Style Transfer on Yelp2018

Attribute Image Captioning +4

467

Paper
Code

Generalized Binary Search For Split-Neighborly Problems

no code implementations • 27 Feb 2018 • Stephen Mussmann, Percy Liang

In sequential hypothesis testing, Generalized Binary Search (GBS) greedily chooses the test with the highest information gain at each step.

Two-sample testing

Paper
Add Code

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

4 code implementations • ICLR 2018 • Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang

Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates.

reinforcement-learning Reinforcement Learning (RL)

251

Paper
Code

Learning a SAT Solver from Single-Bit Supervision

6 code implementations • ICLR 2019 • Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill

We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability.

260

Paper
Code

Certified Defenses against Adversarial Examples

4 code implementations • ICLR 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang

While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs.

Adversarial Attack Adversarial Defense +1

113

Paper
Code

Learning Overcomplete HMMs

no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

Paper
Add Code

Unsupervised Transformation Learning via Convex Relaxations

1 code implementation • NeurIPS 2017 • Tatsunori B. Hashimoto, John C. Duchi, Percy Liang

Our goal is to extract meaningful transformations from raw images, such as varying the thickness of lines in handwriting or the lighting in a portrait.

Paper
Code

Generating Sentences by Editing Prototypes

3 code implementations • TACL 2018 • Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence.

Language Modelling Sentence +1

327

Paper
Code

Importance sampling for unbiased on-demand evaluation of knowledge base population

no code implementations • EMNLP 2017 • Arun Chaganty, Ashwin Paranjape, Percy Liang, Christopher D. Manning

Our first contribution is a new importance-sampling based evaluation which corrects for this bias by annotating a new system{'}s predictions on-demand via crowdsourcing.

Information Retrieval Knowledge Base Population +1

Paper
Add Code

World of Bits: An Open-Domain Platform for Web-Based Agents

no code implementations • ICML 2017 • Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang

While simulated game environments have greatly accelerated research in reinforcement learning, existing environments lack the open-domain realism of tasks in computer vision or natural language processing, which operate on artifacts created by humans in natural, organic settings.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

2 code implementations • EMNLP 2017 • Yuchen Zhang, Panupong Pasupat, Percy Liang

To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations.

Semantic Parsing Sentence +1

823

Paper
Code

Adversarial Examples for Evaluating Reading Comprehension Systems

3 code implementations • EMNLP 2017 • Robin Jia, Percy Liang

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.

Question Answering Reading Comprehension

4,291

Paper
Code

Developing Bug-Free Machine Learning Systems With Formal Mathematics

1 code implementation • ICML 2017 • Daniel Selsam, Percy Liang, David L. Dill

As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i. e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients.

BIG-bench Machine Learning

385

Paper
Code

Certified Defenses for Data Poisoning Attacks

2 code implementations • NeurIPS 2017 • Jacob Steinhardt, Pang Wei Koh, Percy Liang

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model.

Data Poisoning

Paper
Code

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood

3 code implementations • ACL 2017 • Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang

Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Code

Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings

2 code implementations • ACL 2017 • He He, Anusha Balakrishnan, Mihail Eric, Percy Liang

To model both structured knowledge and unstructured language, we propose a neural model with dynamic knowledge graph embeddings that evolve as the dialogue progresses.

Knowledge Graph Embeddings

155

Paper
Code

Naturalizing a Programming Language via Interactive Learning

1 code implementation • ACL 2017 • Sida I. Wang, Samuel Ginn, Percy Liang, Christoper D. Manning

Our goal is to create a convenient natural language interface for performing well-specified but complex actions such as analyzing data, manipulating text, and querying databases.

Paper
Code

Understanding Black-box Predictions via Influence Functions

19 code implementations • ICML 2017 • Pang Wei Koh, Percy Liang

How can we explain the predictions of a black-box model?

760

Paper
Code

A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics

no code implementations • 18 Feb 2017 • Yuchen Zhang, Percy Liang, Moses Charikar

We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization.

Paper
Add Code

Prediction with a Short Memory

no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.

Paper
Add Code

Convexified Convolutional Neural Networks

1 code implementation • ICML 2017 • Yuchen Zhang, Percy Liang, Martin J. Wainwright

For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.

Denoising

Paper
Code

How Much is 131 Million Dollars? Putting Numbers in Perspective with Compositional Descriptions

1 code implementation • ACL 2016 • Arun Tejasvi Chaganty, Percy Liang

We then propose a system to generate these descriptions consisting of two steps: formula construction and description generation.

Paper
Code

Estimation from Indirect Supervision with Linear Moments

1 code implementation • 10 Aug 2016 • Aditi Raghunathan, Roy Frostig, John Duchi, Percy Liang

In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation.

Structured Prediction

Paper
Code

Synthesizing Program Input Grammars

1 code implementation • 5 Aug 2016 • Osbert Bastani, Rahul Sharma, Alex Aiken, Percy Liang

We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program.

Programming Languages

Paper
Code

Unanimous Prediction for 100\% Precision with Application to Learning Semantic Mappings

no code implementations • ACL 2016 • Fereshte Khani, Martin Rinard, Percy Liang

Question Answering Semantic Parsing

Paper
Add Code

Inferring Logical Forms From Denotations

2 code implementations • ACL 2016 • Panupong Pasupat, Percy Liang

A core problem in learning semantic parsers from denotations is picking out consistent logical forms--those that yield the correct denotation--from a combinatorially large space.

823

Paper
Code

Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings

1 code implementation • 20 Jun 2016 • Fereshte Khani, Martin Rinard, Percy Liang

Specifically, we introduce the unanimity principle: only predict when all models consistent with the training data predict the same output.

Semantic Parsing

Paper
Code

Unsupervised Risk Estimation Using Only Conditional Independence Structure

no code implementations • NeurIPS 2016 • Jacob Steinhardt, Percy Liang

We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test.

Paper
Add Code

Simpler Context-Dependent Logical Forms via Model Projections

1 code implementation • ACL 2016 • Reginald Long, Panupong Pasupat, Percy Liang

With only denotations at training time, we must search over a combinatorially large space of logical forms, which is even larger with context-dependent utterances.

Semantic Parsing

Paper
Code

SQuAD: 100,000+ Questions for Machine Comprehension of Text

19 code implementations • EMNLP 2016 • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang

We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100, 000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.

Question Answering Reading Comprehension +1

1,374

Paper
Code

Data Recombination for Neural Semantic Parsing

1 code implementation • ACL 2016 • Robin Jia, Percy Liang

Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.

Semantic Parsing

Paper
Code

Learning Language Games through Interaction

3 code implementations • ACL 2016 • Sida I. Wang, Percy Liang, Christopher D. Manning

We introduce a new language learning setting relevant to building adaptive natural language interfaces.

Semantic Parsing

111

Paper
Code

Estimating Mixture Models via Mixtures of Polynomials

3 code implementations • NeurIPS 2015 • Sida I. Wang, Arun Tejasvi Chaganty, Percy Liang

This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation.