no code implementations • ICML 2020 • Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang
Increasing model capacity well beyond the point of zero training error has been observed to improve average test accuracy.
no code implementations • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.
1 code implementation • 6 Apr 2024 • Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto
Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics.
1 code implementation • 2 Apr 2024 • Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M. Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher Manning
In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples.
1 code implementation • 27 Mar 2024 • Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning
Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks.
no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.
no code implementations • 27 Feb 2024 • Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan
To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk.
no code implementations • 26 Feb 2024 • Rishi Bommasani, Kevin Klyman, Shayne Longpre, Betty Xiong, Sayash Kapoor, Nestor Maslej, Arvind Narayanan, Percy Liang
Foundation models are critical digital technologies with sweeping societal impact that necessitates transparency.
2 code implementations • 12 Feb 2024 • Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.
1 code implementation • 9 Feb 2024 • John Hewitt, Sarah Chen, Lanruo Lora Xie, Edward Adams, Percy Liang, Christopher D. Manning
The Backpack defines a large bank of sense vectors--a decomposition of the different uses of each word--which are weighted and summed to form the output logits of the model.
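To make the sense-vector mechanism concrete, here is a minimal numpy sketch of the weighted sum described above; all sizes and names are hypothetical, and the real Backpack predicts the weights with a Transformer rather than taking them as input:

    import numpy as np

    # Hypothetical sizes: vocabulary of 100 words, 4 senses per word, dimension 16.
    V, K, D = 100, 4, 16
    rng = np.random.default_rng(0)
    sense_vectors = rng.normal(size=(V, K, D))  # bank of K sense vectors per word
    E_out = rng.normal(size=(V, D))             # output embedding matrix

    def backpack_logits(context_word_ids, weights):
        # weights[i, k] is a non-negative weight on sense k of the i-th context
        # word; in the real model these come from a contextualization network.
        h = np.einsum("ik,ikd->d", weights, sense_vectors[context_word_ids])
        return E_out @ h                        # logits over the vocabulary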
1 code implementation • 7 Dec 2023 • Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto
Watermarking of language model outputs enables statistical detection of model-generated text, which has many applications in the responsible deployment of language models.
no code implementations • 15 Nov 2023 • Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.
1 code implementation • NeurIPS 2023 • Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang
The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption.
1 code implementation • 19 Oct 2023 • Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, Percy Liang
We score 10 major foundation model developers (e.g., OpenAI, Google, Meta) against the 100 indicators to assess their transparency.
1 code implementation • 5 Oct 2023 • Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
A central aspect of machine learning research is experimentation, the process of designing and running experiments, analyzing the results, and iterating towards some positive outcome (e.g., improving accuracy).
no code implementations • 3 Oct 2023 • Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process.
no code implementations • 3 Oct 2023 • Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang
To improve the consistency of LMs, we propose to finetune on the filtered generator and validator responses that are GV-consistent, and call this approach consistency fine-tuning.
no code implementations • 27 Aug 2023 • Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, Jonathan H. Chen, Keith E. Morse, Emma P. Brunskill, Jason A. Fries, Nigam H. Shah
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care.
2 code implementations • 28 Jul 2023 • Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang
We generate watermarked text by mapping a sequence of random numbers -- which we compute using a randomized watermark key -- to a sample from the language model.
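One simple way to realize the keyed-randomness idea (a sketch, not necessarily the paper's exact construction, which is designed to be distortion-free and robust to edits) is exponential-minimum sampling, where the per-token randomness is reproducible from the key:

    import numpy as np

    def watermarked_sample(probs, key, step):
        # A real implementation would use a stable keyed PRF here; Python's
        # hash() is process-salted and only stands in for one.
        rng = np.random.default_rng(hash((key, step)) % 2**32)
        u = rng.random(len(probs))
        # argmax of u_i ** (1 / p_i) is distributed as Categorical(p), but is
        # deterministic given the key, so a detector can re-derive it.
        return int(np.argmax(u ** (1.0 / np.maximum(probs, 1e-12))))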
no code implementations • NeurIPS 2023 • Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments.
4 code implementations • 6 Jul 2023 • Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang
While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.
1 code implementation • 16 Jun 2023 • Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
no code implementations • 14 Jun 2023 • John Thickstun, David Hall, Chris Donahue, Percy Liang
We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence.
no code implementations • 6 Jun 2023 • Steven Cao, Percy Liang, Gregory Valiant
We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.
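As a toy illustration of the imputation idea (ignoring the low-rank machinery that the paper's guarantee actually relies on), each row with two observed coordinates reveals one product $x_j x_k$, and rescaling by the sampling rate gives an unbiased estimate of each entry of $X^TX$:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 20000, 8
    X = rng.normal(size=(n, d))
    # Each row reveals exactly two coordinates, chosen uniformly at random.
    obs = np.array([rng.choice(d, size=2, replace=False) for _ in range(n)])

    G_hat, counts = np.zeros((d, d)), np.zeros((d, d))
    for i, (j, k) in enumerate(obs):
        for a, b in [(j, j), (k, k), (j, k), (k, j)]:
            G_hat[a, b] += X[i, a] * X[i, b]
            counts[a, b] += 1
    G_hat = n * G_hat / np.maximum(counts, 1)  # rescale by per-entry sampling rate

    print(np.linalg.norm(G_hat - X.T @ X) / np.linalg.norm(X.T @ X))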
no code implementations • 5 Jun 2023 • Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan
We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process.
1 code implementation • 27 May 2023 • Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung
Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data.
1 code implementation • 26 May 2023 • John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang
We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways.
3 code implementations • 23 May 2023 • Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training.
2 code implementations • NeurIPS 2023 • Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003.
no code implementations • NeurIPS 2023 • Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters.
2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
no code implementations • 3 May 2023 • Deepak Narayanan, Keshav Santhanam, Peter Henderson, Rishi Bommasani, Tony Lee, Percy Liang
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
1 code implementation • 19 Apr 2023 • Nelson F. Liu, Tianyi Zhang, Percy Liang
Generative search engines directly generate responses to user queries, along with in-line citations.
7 code implementations • 7 Apr 2023 • Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools.
1 code implementation • 30 Mar 2023 • Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large.
no code implementations • 28 Mar 2023 • Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
Foundation models (e.g., ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention.
1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.
1 code implementation • 26 Feb 2023 • Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang
We consider the continual representation learning setting: sequentially pretrain a model $M'$ on tasks $T_1, \ldots, T_T$, and then adapt $M'$ on a small amount of data from each task $T_i$ to check if it has forgotten information from old tasks.
2 code implementations • 24 Feb 2023 • Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.
1 code implementation • 23 Feb 2023 • Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang
Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations.
1 code implementation • 6 Feb 2023 • Yann Dubois, Tatsunori Hashimoto, Percy Liang
Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization.
1 code implementation • NeurIPS 2023 • Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang
To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.
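A hedged sketch of one plausible reading of this metric, with the hashing scheme and bucket count made up for illustration: featurize each corpus as a smoothed distribution over hashed n-gram buckets, then measure how much selection closes the KL gap to the target:

    import numpy as np

    B = 1024  # number of hash buckets (hypothetical)

    def hashed_ngram_dist(docs, n=2):
        counts = np.ones(B)  # add-one smoothing
        for doc in docs:
            toks = doc.lower().split()
            for i in range(len(toks) - n + 1):
                counts[hash(" ".join(toks[i:i + n])) % B] += 1
        return counts / counts.sum()

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    def kl_reduction(target_docs, raw_docs, selected_docs):
        pt = hashed_ngram_dist(target_docs)
        pr = hashed_ngram_dist(raw_docs)
        ps = hashed_ngram_dist(selected_docs)
        return kl(pt, pr) - kl(pt, ps)  # > 0: selected data is closer to target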
1 code implementation • 31 Jan 2023 • Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
1 code implementation • 6 Jan 2023 • Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot.
2 code implementations • 28 Dec 2022 • Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs).
1 code implementation • 20 Dec 2022 • Rishi Bommasani, Percy Liang
How do we design measures of social bias that we trust?
1 code implementation • 19 Dec 2022 • Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and dimensions to consider when designing evaluation metrics.
1 code implementation • 4 Dec 2022 • Chris Donahue, John Thickstun, Percy Liang
The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline.
no code implementations • 25 Nov 2022 • Rishi Bommasani, Kathleen A. Creel, Ananya Kumar, Dan Jurafsky, Percy Liang
As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e.g., training data), are deployed by multiple decision-makers.
no code implementations • 22 Nov 2022 • Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah
In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews.
no code implementations • 22 Nov 2022 • Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).
Ranked #7 on Image Captioning on MS COCO
1 code implementation • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
2 code implementations • 27 Oct 2022 • Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis
We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint.
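The abstract's objective-plus-constraint can be sketched in a few lines; the scoring below follows the paper's description (expert minus amateur log-probabilities, restricted to tokens the expert finds plausible), though the threshold form is simplified:

    import numpy as np

    def contrastive_decode_step(logits_expert, logits_amateur, alpha=0.1):
        p_exp = np.exp(logits_expert - logits_expert.max())
        p_exp /= p_exp.sum()
        # Plausibility constraint: keep only tokens within a factor alpha
        # of the expert's best token.
        plausible = p_exp >= alpha * p_exp.max()
        score = np.where(plausible, logits_expert - logits_amateur, -np.inf)
        return int(np.argmax(score))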
1 code implementation • 27 Oct 2022 • John Hewitt, Christopher D. Manning, Percy Liang
In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution.
1 code implementation • 20 Oct 2022 • Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn
A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task.
1 code implementation • 17 Oct 2022 • Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D Manning, Percy Liang, Jure Leskovec
Pretraining a language model (LM) on text has been shown to help various downstream NLP tasks.
Ranked #1 on Riddle Sense on RiddleSense
no code implementations • 12 Oct 2022 • Nelson F. Liu, Ananya Kumar, Percy Liang, Robin Jia
Recent results in image classification and extractive question answering have observed that pre-trained models trained on less in-distribution data have better out-of-distribution performance.
1 code implementation • 13 Sep 2022 • Yann Dubois, Tatsunori Hashimoto, Stefano Ermon, Percy Liang
For non-contrastive learning, we use our framework to derive a simple and novel objective.
2 code implementations • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
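A minimal sketch of how such training prompts are generated for the linear-function case (the transformer that consumes them is omitted; names are illustrative):

    import numpy as np

    def sample_linear_prompt(rng, d=8, n_points=20):
        w = rng.normal(size=d)             # a random function f(x) = w . x
        xs = rng.normal(size=(n_points, d))
        ys = xs @ w
        # The model is trained to predict y_i from the interleaved prefix
        # (x_1, y_1, ..., x_i) at every position i.
        return xs, ys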
no code implementations • 18 Jul 2022 • Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.
no code implementations • 15 Jul 2022 • Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto
The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods.
1 code implementation • 21 Jun 2022 • Yuhuai Wu, Felix Li, Percy Liang
Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME.
no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
1 code implementation • 2 Jun 2022 • Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang
Our key technical contribution is a scheduling algorithm that allocates different computational "tasklets" in the training of foundation models to a group of decentralized GPU devices connected by a slow heterogeneous network.
1 code implementation • 27 May 2022 • Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto
Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation.
no code implementations • 1 Apr 2022 • Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang
We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.
1 code implementation • ACL 2022 • Michihiro Yasunaga, Jure Leskovec, Percy Liang
Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks.
Ranked #1 on Semantic Similarity on BIOSSES
3 code implementations • 21 Feb 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.
1 code implementation • 21 Jan 2022 • Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec
Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.
1 code implementation • 18 Jan 2022 • Mina Lee, Percy Liang, Qian Yang
Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design.
1 code implementation • ICLR 2022 • Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang
Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.
1 code implementation • 5 Nov 2021 • Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh
We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration.
1 code implementation • ICLR 2022 • Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt.
4 code implementations • ICLR 2022 • Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto
Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts at applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.
no code implementations • ICLR 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, Percy Liang
It is well known that fine-tuning leads to better accuracy in-distribution (ID).
no code implementations • 29 Sep 2021 • John Hewitt, Xiang Lisa Li, Sang Michael Xie, Benjamin Newman, Percy Liang
When finetuning a pretrained language model for natural language generation tasks, one is currently faced with a tradeoff.
no code implementations • 29 Sep 2021 • Kendrick Shen, Robbie Matthew Jones, Ananya Kumar, Sang Michael Xie, Percy Liang
We develop a conceptual model for contrastive learning under domain shifts, where data augmentations form connections between classes and domains that can be far apart.
1 code implementation • EMNLP 2021 • John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning
Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable.
2 code implementations • EMNLP 2021 • Michihiro Yasunaga, Jure Leskovec, Percy Liang
Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive.
Ranked #2 on Grammatical Error Correction on Unrestricted
1 code implementation • 12 Sep 2021 • Fahim Tajwar, Ananya Kumar, Sang Michael Xie, Percy Liang
Out-of-distribution detection is an important component of reliable ML systems.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • 19 Jul 2021 • Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn
Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • 12 Jul 2021 • Rodrigo Castellon, Chris Donahue, Percy Liang
Relative to representations from conventional MIR models which are pre-trained on tagging, we find that using representations from Jukebox as input features yields 30% stronger performance on average across four MIR tasks: tagging, genre classification, emotion recognition, and key detection.
Ranked #1 on Emotion Recognition on Emomusic
1 code implementation • 9 Jul 2021 • John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt
For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments.
1 code implementation • 11 Jun 2021 • Michihiro Yasunaga, Percy Liang
To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code.
Ranked #1 on Program Repair on DeepFix
1 code implementation • NAACL 2021 • Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang
We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.
4 code implementations • NAACL 2021 • Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, Jure Leskovec
The problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) presents two challenges: given a QA context (question and answer choice), methods need to (i) identify relevant knowledge from large KGs, and (ii) perform joint reasoning over the QA context and KG.
Ranked #2 on Riddle Sense on RiddleSense
no code implementations • 1 Feb 2021 • Nelson F. Liu, Tony Lee, Robin Jia, Percy Liang
Do question answering (QA) modeling improvements (e.g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks?
10 code implementations • ACL 2021 • Xiang Lisa Li, Percy Liang
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
6 code implementations • 14 Dec 2020 • Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild.
1 code implementation • ICLR 2021 • Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang
To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).
1 code implementation • 7 Dec 2020 • Fereshte Khani, Percy Liang
The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population.
1 code implementation • 16 Nov 2020 • Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su
To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization.
1 code implementation • ICLR 2021 • Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang
In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations.
2 code implementations • NeurIPS 2020 • Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli
In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, (2) only requires a fixed number of forward/backward passes through the network per iteration.
2 code implementations • EMNLP 2020 • John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning
Recurrent neural networks empirically generate natural language with high syntactic fidelity.
1 code implementation • EMNLP (BlackboxNLP) 2020 • Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning
Extrapolation to unseen sequence lengths is a challenge for neural generative models of language.
no code implementations • EMNLP (intexsempar) 2020 • Siddharth Karamcheti, Dorsa Sadigh, Percy Liang
Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Stephen Mussmann, Robin Jia, Percy Liang
Many pairwise classification tasks, such as paraphrase detection and open-domain question answering, naturally have extreme label imbalance (e.g., $99.99\%$ of examples are negatives).
no code implementations • 28 Sep 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g., a pseudocode-to-code translation task where the code must compile.
1 code implementation • 24 Sep 2020 • Semantic Machines, Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang, Alexander Zotov
We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph.
2 code implementations • 6 Aug 2020 • Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task.
1 code implementation • ICML 2020 • Megha Srivastava, Tatsunori Hashimoto, Percy Liang
The reliability of machine learning systems critically assumes that the associations between features and labels remain similar between training and test distributions.
1 code implementation • 12 Jul 2020 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.
4 code implementations • ICML 2020 • Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis?
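A minimal PyTorch sketch of a concept bottleneck in this spirit: the input is mapped to human-interpretable concepts (e.g., "bone spur present"), the label is predicted only from those concepts, and a user can intervene on a concept at test time. All module names are hypothetical:

    import torch
    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, d_in, n_concepts, n_classes):
            super().__init__()
            self.to_concepts = nn.Linear(d_in, n_concepts)
            self.to_label = nn.Linear(n_concepts, n_classes)

        def forward(self, x, intervene=None):
            c = torch.sigmoid(self.to_concepts(x))
            if intervene is not None:       # e.g., {concept_index: 0.0}
                c = c.clone()
                for i, v in intervene.items():
                    c[:, i] = v
            return self.to_label(c), c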
2 code implementations • 29 Jun 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
2 code implementations • ACL 2020 • Amita Kamath, Robin Jia, Percy Liang
In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.
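In its simplest form, selective answering is a confidence threshold (the paper goes further and trains a calibrator to set it); model.predict below is a hypothetical API returning an answer and a confidence:

    def selective_answer(question, model, threshold=0.8):
        answer, confidence = model.predict(question)  # hypothetical API
        if confidence >= threshold:
            return answer
        return None  # abstain rather than risk a wrong answer out-of-domain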
no code implementations • ICML Workshop LifelongML 2020 • Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn
In principle, meta-reinforcement learning approaches can exploit this shared structure, but in practice, they fail to adapt to new environments when adaptation requires targeted exploration (e.g., exploring the cabinets to find ingredients in a new kitchen).
2 code implementations • ICML 2020 • Michihiro Yasunaga, Percy Liang
Second, we present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online to create a large amount of extra program repair examples, which we use to pre-train our models.
Ranked #1 on Program Synthesis on SPoC TestW
3 code implementations • ACL 2020 • Chris Donahue, Mina Lee, Percy Liang
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
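A sketch of the training-example format this implies, with the special-token strings ([blank], [sep], [answer]) treated as placeholders rather than the paper's exact vocabulary:

    def make_ilm_example(text, spans):
        # spans: (start, end) character ranges to blank out.
        masked, answers, prev = [], [], 0
        for s, e in sorted(spans):
            masked += [text[prev:s], " [blank] "]
            answers.append(text[s:e] + " [answer] ")
            prev = e
        masked.append(text[prev:])
        # The LM is trained on: masked text, a separator, then the answers.
        return "".join(masked) + " [sep] " + "".join(answers)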
3 code implementations • 9 May 2020 • Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang
We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data.
2 code implementations • ACL 2020 • Shikhar Murty, Pang Wei Koh, Percy Liang
Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text.
1 code implementation • ACL 2020 • Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang
We instantiate RobEn to defend against a large family of adversarial typos.
2 code implementations • ICML 2020 • Ananya Kumar, Tengyu Ma, Percy Liang
Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.
1 code implementation • ICML 2020 • Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang
In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error.
1 code implementation • ICML 2020 • Fereshte Khani, Percy Liang
Our main result is that even when there is no information deficiency specific to one group (e.g., both groups have infinite data), adding the same amount of feature noise to all individuals leads to loss discrepancy.
8 code implementations • 20 Nov 2019 • Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang
Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.
Ranked #1 on Out-of-Distribution Generalization on UrbanCars
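The core of the objective fits in a few lines; this sketch takes the worst group in a batch (it assumes every group is represented), whereas the paper's online algorithm maintains exponentiated-gradient weights over groups:

    import torch

    def group_dro_loss(per_example_loss, group_ids, n_groups):
        # Average the loss within each group, then optimize the worst group.
        group_losses = torch.stack([
            per_example_loss[group_ids == g].mean() for g in range(n_groups)
        ])
        return group_losses.max()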
1 code implementation • 16 Nov 2019 • Mina Lee, Tatsunori B. Hashimoto, Percy Liang
We study textual autocomplete---the task of predicting a full sentence from a partial sentence---as a human-machine communication game.
2 code implementations • ACL 2020 • Jesse Mu, Percy Liang, Noah Goodman
By describing the features and abstractions of our world, language is a crucial tool for human learning and a promising source of supervision for machine learning models.
no code implementations • 25 Sep 2019 • Sang Michael Xie*, Aditi Raghunathan*, Fanny Yang, John C. Duchi, Percy Liang
Empirically, data augmentation sometimes improves and sometimes hurts test error, even when only adding points with labels from the true conditional distribution that the hypothesis class is expressive enough to fit.
3 code implementations • NeurIPS 2019 • Ananya Kumar, Percy Liang, Tengyu Ma
In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.
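For reference, the commonly used binned "plugin" estimator the sentence alludes to looks like the sketch below; the paper's contribution is a less biased estimate, not this estimator itself:

    import numpy as np

    def ece_plugin(confidences, correct, n_bins=10):
        # |avg confidence - accuracy| per bin, weighted by bin mass.
        bins = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
        ece = 0.0
        for b in range(n_bins):
            m = bins == b
            if m.any():
                ece += m.mean() * abs(confidences[m].mean() - correct[m].mean())
        return ece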
1 code implementation • IJCNLP 2019 • John Hewitt, Percy Liang
The selectivity of a probe puts linguistic task accuracy in context with the probe's capacity to memorize from word types.
1 code implementation • IJCNLP 2019 • Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang
Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews).
2 code implementations • IJCNLP 2019 • Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang
We train the first models that are provably robust to all word substitutions in this family.
1 code implementation • ICLR 2020 • Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia
By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train.
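A hedged sketch of the selection step, assuming an sklearn-style proxy model and entropy as the informativeness score (the paper also considers other uncertainty measures):

    import numpy as np

    def select_via_proxy(proxy, X_unlabeled, k):
        # Rank points by the cheap proxy's predictive entropy and keep the
        # k most informative for training the large target model.
        p = proxy.predict_proba(X_unlabeled)
        entropy = -np.sum(p * np.log(p + 1e-12), axis=1)
        return np.argsort(-entropy)[:k]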
no code implementations • 26 Jun 2019 • Ray Li, Percy Liang, Stephen Mussmann
The greedy algorithm's $O(\log n)$ approximation ratio was the best known, but the largest approximation ratio known to be NP-hard is $4-\varepsilon$.
no code implementations • ICML Workshop Deep_Phenomen 2019 • Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang
While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary).
1 code implementation • NeurIPS 2019 • Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang
Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation.
Ranked #2 on Program Synthesis on SPoC TestP
1 code implementation • 8 Jun 2019 • Fereshte Khani, Aditi Raghunathan, Percy Liang
To capture this inequality, we introduce and study a notion we call maximum weighted loss discrepancy (MWLD), the maximum (weighted) difference between the loss of a group and the loss of the population.
4 code implementations • NeurIPS 2019 • Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi
We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning.
2 code implementations • NeurIPS 2019 • Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang
Influence functions estimate the effect of removing a training point on a model without the need to retrain.
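The classic first-order approximation behind this estimate, sketched with an explicit (damped) Hessian for a small model; for large models the paper and its predecessors use implicit Hessian-vector products instead:

    import numpy as np

    def influence(grad_test, grad_train, hessian, damping=0.01):
        # Removing training point z changes the test loss by roughly
        # (1/n) * g_test^T H^{-1} g_train(z).
        H = hessian + damping * np.eye(hessian.shape[0])
        return float(grad_test @ np.linalg.solve(H, grad_train))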
10 code implementations • ICLR 2020 • Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec
Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training.
Ranked #3 on Molecular Property Prediction on ToxCast
no code implementations • ICLR 2019 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).
no code implementations • ICLR 2019 • Cody Coleman, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia
In our approach, we first train a small proxy model quickly, which we then use to estimate the utility of individual training data points, and then select the most informative ones for training the large target model.
2 code implementations • NAACL 2019 • He He, Nanyun Peng, Percy Liang
We tackle the problem of generating a pun sentence given a pair of homophones (e.g., "died" and "dyed").
2 code implementations • NAACL 2019 • Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang
How can we measure whether a natural language generation system produces both high quality and diverse outputs?
1 code implementation • 25 Mar 2019 • Yuchen Zhang, Percy Liang
Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers.
1 code implementation • NeurIPS 2018 • Stephen Mussmann, Percy Liang
Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data.
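For concreteness, one step of binary uncertainty sampling, assuming an sklearn-style classifier (the paper analyzes this algorithm rather than proposing it):

    import numpy as np

    def uncertainty_sampling_step(model, X_pool):
        # Query the unlabeled point the classifier is least sure about.
        p = model.predict_proba(X_pool)[:, 1]
        return int(np.argmin(np.abs(p - 0.5)))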
1 code implementation • NeurIPS 2018 • Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang
For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch.
2 code implementations • 13 Nov 2018 • Kensen Shi, Jacob Steinhardt, Percy Liang
We present FrAngel, a new approach to component-based synthesis that can synthesize short Java functions with control structures when given a desired signature, a set of input-output examples, and a collection of libraries (without formal specifications).
2 code implementations • 2 Nov 2018 • Pang Wei Koh, Jacob Steinhardt, Percy Liang
In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
3 code implementations • NeurIPS 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang
One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family.
no code implementations • EMNLP 2018 • Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer
We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total).
2 code implementations • 9 Sep 2018 • Dorottya Demszky, Kelvin Guu, Percy Liang
Existing datasets for natural language inference (NLI) have propelled research on language understanding.
2 code implementations • EMNLP 2018 • Matthew Lamm, Arun Tejasvi Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang
To understand a sentence like "whereas only 10% of White Americans live at or below the poverty line, 28% of African Americans do" it is important not only to identify individual facts, e.g., poverty rates of distinct demographic groups, but also the higher-order relations between them, e.g., the disparity between them.
2 code implementations • EMNLP 2018 • He He, Derek Chen, Anusha Balakrishnan, Percy Liang
We consider negotiation settings in which two agents use natural language to bargain on goods.
2 code implementations • EMNLP 2018 • Panupong Pasupat, Tian-Shun Jiang, Evan Zheran Liu, Kelvin Guu, Percy Liang
The web provides a rich, open-domain environment with textual, structural, and spatial properties.
1 code implementation • 12 Jul 2018 • Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nicholas Eriksson, Percy Liang
Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data.
1 code implementation • 6 Jul 2018 • Arun Tejasvi Chaganty, Stephen Mussmann, Percy Liang
For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.
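The paper combines a cheap automatic metric with a small number of human judgments via control variates; the sketch below shows that general statistical idea on synthetic data, with the correlation strength, sample sizes, and variable names invented for illustration.

```python
# Control-variates estimator (illustrative sketch): use an automatic metric,
# known on all examples, to reduce the variance of a mean estimated from a
# small sample of human judgments.
import numpy as np

rng = np.random.default_rng(0)
n = 10000
auto = rng.normal(0.0, 1.0, n)                # automatic metric, all examples
human = 0.6 * auto + rng.normal(0.0, 0.5, n)  # correlated human judgment

m = 200                                       # only m human annotations
idx = rng.choice(n, size=m, replace=False)
h, a = human[idx], auto[idx]

# Control-variate coefficient, estimated on the labeled subsample.
alpha = np.cov(h, a)[0, 1] / np.var(a, ddof=1)
estimate = h.mean() - alpha * (a.mean() - auto.mean())

print(f"naive human-only estimate: {h.mean():.4f}")
print(f"control-variate estimate : {estimate:.4f}")
print(f"true mean human judgment : {human.mean():.4f}")
```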
1 code implementation • ICML 2018 • Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang
Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity: minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss.
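The paper optimizes a distributionally robust objective without access to group labels; as a simplified illustration of the robust idea only, the sketch below computes a worst-group loss under the extra assumption, which the paper does not make, that group identities are observed.

```python
# Worst-group loss (simplified illustration of distributional robustness):
# instead of the average loss, optimize the mean loss of the hardest group.
import torch

def worst_group_loss(losses: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """losses: per-example losses; groups: integer group id per example."""
    group_means = [losses[groups == g].mean() for g in groups.unique()]
    return torch.stack(group_means).max()

# Dummy per-example losses and group ids for demonstration:
losses = torch.tensor([0.2, 0.9, 0.1, 1.5])
groups = torch.tensor([0, 1, 0, 1])
print(worst_group_loss(losses, groups))  # mean loss of group 1: 1.2
```

In a training step one would compute per-example losses with `reduction='none'` and backpropagate through this maximum rather than through the average.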
1 code implementation • ICML 2018 • Stephen Mussmann, Percy Liang
While active learning offers potential cost savings, the actual data efficiency (the reduction in the amount of labeled data needed to obtain the same error rate) observed in practice is mixed.
12 code implementations • ACL 2018 • Pranav Rajpurkar, Robin Jia, Percy Liang
Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context.
1 code implementation • TACL 2018 • Fereshte Khani, Noah D. Goodman, Percy Liang
We study sequential language games in which two players, each with private information, communicate to achieve a common goal.
2 code implementations • ACL 2018 • Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré
Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification).
6 code implementations • NAACL 2018 • Juncen Li, Robin Jia, He He, Percy Liang
We consider the task of text attribute transfer: transforming a sentence to alter a specific attribute (e.g., sentiment) while preserving its attribute-independent content (e.g., changing "screen is just the right size" to "screen is too small").
no code implementations • 27 Feb 2018 • Stephen Mussmann, Percy Liang
In sequential hypothesis testing, Generalized Binary Search (GBS) greedily chooses the test with the highest information gain at each step.
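As a sketch of GBS in the simplest noiseless setting (the paper treats the noisy case), choosing the test with the highest information gain reduces to choosing the test that most evenly splits the hypotheses still consistent with past outcomes; the hypotheses, tests, and outcome table below are synthetic.

```python
# Generalized Binary Search over a finite hypothesis set with noiseless
# binary tests (illustrative): greedily pick the test whose outcome most
# evenly splits the surviving hypotheses, i.e., the max-information-gain test.
import numpy as np

rng = np.random.default_rng(0)
n_hyp, n_tests = 64, 16
outcomes = rng.integers(0, 2, size=(n_hyp, n_tests))  # outcomes[h, t]
truth = 17                                            # hidden true hypothesis

alive = np.ones(n_hyp, dtype=bool)
while alive.sum() > 1:
    # Fraction of surviving hypotheses answering 1 on each test.
    pos = outcomes[alive].mean(axis=0)
    if np.abs(pos - 0.5).min() == 0.5:
        break  # remaining hypotheses are indistinguishable by any test
    t = int(np.argmin(np.abs(pos - 0.5)))  # closest to a 50/50 split
    alive &= outcomes[:, t] == outcomes[truth, t]

print("candidates remaining:", np.flatnonzero(alive))
```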
4 code implementations • ICLR 2018 • Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang
Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates.
6 code implementations • ICLR 2019 • Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill
We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability.
4 code implementations • ICLR 2018 • Aditi Raghunathan, Jacob Steinhardt, Percy Liang
While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs.
no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.
1 code implementation • NeurIPS 2017 • Tatsunori B. Hashimoto, John C. Duchi, Percy Liang
Our goal is to extract meaningful transformations from raw images, such as varying the thickness of lines in handwriting or the lighting in a portrait.
3 code implementations • TACL 2018 • Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang
We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence.
no code implementations • EMNLP 2017 • Arun Chaganty, Ashwin Paranjape, Percy Liang, Christopher D. Manning
Our first contribution is a new importance-sampling-based evaluation which corrects for this bias by annotating a new system's predictions on-demand via crowdsourcing.
no code implementations • ICML 2017 • Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang
While simulated game environments have greatly accelerated research in reinforcement learning, existing environments lack the open-domain realism of tasks in computer vision or natural language processing, which operate on artifacts created by humans in natural, organic settings.
2 code implementations • EMNLP 2017 • Yuchen Zhang, Panupong Pasupat, Percy Liang
To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations.
3 code implementations • EMNLP 2017 • Robin Jia, Percy Liang
Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear.
1 code implementation • ICML 2017 • Daniel Selsam, Percy Liang, David L. Dill
As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e., machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients.
2 code implementations • NeurIPS 2017 • Jacob Steinhardt, Pang Wei Koh, Percy Liang
Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model.
3 code implementations • ACL 2017 • Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang
Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself.
2 code implementations • ACL 2017 • He He, Anusha Balakrishnan, Mihail Eric, Percy Liang
To model both structured knowledge and unstructured language, we propose a neural model with dynamic knowledge graph embeddings that evolve as the dialogue progresses.
1 code implementation • ACL 2017 • Sida I. Wang, Samuel Ginn, Percy Liang, Christopher D. Manning
Our goal is to create a convenient natural language interface for performing well-specified but complex actions such as analyzing data, manipulating text, and querying databases.
19 code implementations • ICML 2017 • Pang Wei Koh, Percy Liang
How can we explain the predictions of a black-box model?
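The paper's influence of a training point $z$ on a test point $z_{test}$ is $-\nabla L(z_{test})^\top H^{-1} \nabla L(z)$, with $H$ the Hessian of the training objective at the fitted parameters. The sketch below evaluates this in closed form for a small L2-regularized logistic regression, where the Hessian is cheap to build; the dataset and the choice to reuse a training point as the test point are for illustration only.

```python
# Influence functions for L2-regularized logistic regression (illustrative):
# influence(z_train, z_test) = -g_test^T H^{-1} g_train.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
lam = 1.0
clf = LogisticRegression(C=1.0 / lam, fit_intercept=False).fit(X, y)
theta = clf.coef_.ravel()

def grad(x, label):
    # Gradient of the logistic loss at a single example.
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return (p - label) * x

# Hessian of the mean training objective (loss term + L2 term).
p = 1.0 / (1.0 + np.exp(-X @ theta))
H = (X.T * (p * (1 - p))) @ X / len(X) + (lam / len(X)) * np.eye(X.shape[1])

x_test, y_test = X[0], y[0]  # a training point doubles as the test point
influences = np.array([
    -grad(x_test, y_test) @ np.linalg.solve(H, grad(X[i], y[i]))
    for i in range(len(X))
])
print("most helpful training point:", int(influences.argmin()))
print("most harmful training point:", int(influences.argmax()))
```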
no code implementations • 18 Feb 2017 • Yuchen Zhang, Percy Liang, Moses Charikar
We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization.
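For reference, the SGLD iterate is a gradient step plus Gaussian noise, $x_{t+1} = x_t - \eta \nabla f(x_t) + \sqrt{2\eta/\beta}\,\xi_t$, so the iterates approximately sample the Gibbs distribution at inverse temperature $\beta$. The one-dimensional double-well objective below is an illustrative stand-in for a non-convex loss.

```python
# SGLD on a toy non-convex objective f(x) = (x^2 - 1)^2 (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of f(x) = (x^2 - 1)^2, a double well with modes near +-1.
    return 4 * x * (x ** 2 - 1)

x, eta, beta = 2.0, 1e-3, 10.0  # step size eta, inverse temperature beta
samples = []
for t in range(20000):
    noise = rng.normal(0.0, np.sqrt(2 * eta / beta))
    x = x - eta * grad_f(x) + noise
    if t > 5000:  # discard burn-in
        samples.append(x)

print("mean |x| over samples (modes sit near +-1):",
      float(np.mean(np.abs(samples))))
```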
no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
For a Hidden Markov Model with $n$ hidden states, the mutual information $I$ between the past and future observations is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
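A toy version of that trivial window-based predictor, estimating $P(\text{next} \mid \text{last } k \text{ observations})$ from empirical window frequencies; the generating process and window length here are illustrative stand-ins, not HMM output.

```python
# Window-based prediction from empirical frequencies (illustrative sketch).
from collections import Counter, defaultdict
import random

random.seed(0)
alphabet = "abcd"
# Toy observation sequence: a sticky random walk over the alphabet.
seq = [random.choice(alphabet)]
for _ in range(50000):
    seq.append(seq[-1] if random.random() < 0.6 else random.choice(alphabet))

k = 3  # window length; the paper's analysis uses k = O(log n / epsilon)
train, test = seq[:45000], seq[45000:]

# Count length-(k+1) windows: context of k symbols -> next-symbol counts.
counts = defaultdict(Counter)
for i in range(len(train) - k):
    counts[tuple(train[i:i + k])][train[i + k]] += 1

def predict(context):
    """Most frequent next symbol after this length-k context."""
    c = counts[tuple(context)]
    return c.most_common(1)[0][0] if c else random.choice(alphabet)

hits = sum(predict(test[i:i + k]) == test[i + k] for i in range(len(test) - k))
print(f"window-{k} predictor accuracy on held-out data: "
      f"{hits / (len(test) - k):.3f}")
```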
1 code implementation • ICML 2017 • Yuchen Zhang, Percy Liang, Martin J. Wainwright
For learning two-layer convolutional neural networks, we prove that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.
1 code implementation • ACL 2016 • Arun Tejasvi Chaganty, Percy Liang
We then propose a system to generate these descriptions consisting of two steps: formula construction and description generation.
1 code implementation • 10 Aug 2016 • Aditi Raghunathan, Roy Frostig, John Duchi, Percy Liang
In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation.
1 code implementation • 5 Aug 2016 • Osbert Bastani, Rahul Sharma, Alex Aiken, Percy Liang
We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and black-box access to the program.
2 code implementations • ACL 2016 • Panupong Pasupat, Percy Liang
A core problem in learning semantic parsers from denotations is picking out consistent logical forms (those that yield the correct denotation) from a combinatorially large space.
1 code implementation • 20 Jun 2016 • Fereshte Khani, Martin Rinard, Percy Liang
Specifically, we introduce the unanimity principle: only predict when all models consistent with the training data predict the same output.
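A minimal instance of the unanimity principle on a toy hypothesis class of one-dimensional thresholds: the model abstains whenever two thresholds consistent with the training data would disagree. The class and data are illustrative, not the paper's general framework.

```python
# Unanimity principle with 1-D threshold classifiers (illustrative sketch):
# h_t(x) = 1 iff x >= t. Predict only when all consistent thresholds agree.
import numpy as np

# Training data: label = 1 iff x >= some unknown threshold.
X_train = np.array([0.1, 0.3, 0.7, 0.9])
y_train = np.array([0, 0, 1, 1])

# Every threshold in (max negative, min positive] is consistent.
lo = X_train[y_train == 0].max()   # 0.3
hi = X_train[y_train == 1].min()   # 0.7

def unanimous_predict(x):
    if x >= hi:
        return 1      # all consistent thresholds predict 1
    if x <= lo:
        return 0      # all consistent thresholds predict 0
    return None       # consistent models disagree: abstain

for x in [0.05, 0.4, 0.95]:
    print(x, "->", unanimous_predict(x))
```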
no code implementations • NeurIPS 2016 • Jacob Steinhardt, Percy Liang
We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test.
1 code implementation • ACL 2016 • Reginald Long, Panupong Pasupat, Percy Liang
With only denotations at training time, we must search over a combinatorially large space of logical forms, which is even larger with context-dependent utterances.
19 code implementations • EMNLP 2016 • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
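SQuAD systems are typically scored by exact match and F1 after normalizing answers; the snippet below approximates the exact-match half of that recipe. It is a simplified stand-in for, not a reimplementation of, the official evaluation script.

```python
# SQuAD-style exact-match scoring (simplified sketch): lowercase, drop
# punctuation and articles, collapse whitespace, then compare strings.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    # Credit is given if the prediction matches any reference answer.
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

print(exact_match("The Normans", ["Normans", "the normans"]))  # True
```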
1 code implementation • ACL 2016 • Robin Jia, Percy Liang
Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results.
3 code implementations • ACL 2016 • Sida I. Wang, Percy Liang, Christopher D. Manning
We introduce a new language learning setting relevant to building adaptive natural language interfaces.
3 code implementations • NeurIPS 2015 • Sida I. Wang, Arun Tejasvi Chaganty, Percy Liang
This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation.
no code implementations • 22 Mar 2016 • Percy Liang
For building question answering systems and natural language interfaces, semantic parsing has emerged as an important and powerful paradigm.
1 code implementation • 21 Mar 2016 • Stefan Wager, William Fithian, Percy Liang
The framework imagines data as being drawn from a slice of a Lévy process.
4 code implementations • IJCNLP 2015 • Panupong Pasupat, Percy Liang
Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and the depth of logical compositionality.
2 code implementations • NeurIPS 2015 • Keenon Werling, Arun Chaganty, Percy Liang, Chris Manning
Our goal is to deploy a high-accuracy system starting with zero training examples.
2 code implementations • EMNLP 2015 • Kelvin Guu, John Miller, Percy Liang
Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?".
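As a sketch of compositional path-query answering in the additive (TransE-style) scheme that the paper compositionalizes, among other embedding models: follow the path by adding relation vectors, then rank entities by distance to the traversed point. The entities, relations, and random vectors below are placeholders for trained embeddings.

```python
# Compositional path queries with additive embeddings (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
entities = ["lisbon", "portugal", "portuguese", "alice"]
relations = ["lives_in", "official_language"]
E = {e: rng.normal(size=8) for e in entities}    # placeholder entity vectors
R = {r: rng.normal(size=8) for r in relations}   # placeholder relation vectors

def answer_path_query(start, path):
    """Traverse a relation path from a start entity; rank candidate answers."""
    v = E[start].copy()
    for rel in path:
        v = v + R[rel]                 # compositional traversal by addition
    scores = {e: -np.linalg.norm(E[e] - v) for e in entities}
    return sorted(scores, key=scores.get, reverse=True)

# "What language is spoken where alice lives?"
print(answer_path_query("alice", ["lives_in", "official_language"]))
```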