Search Results for author: Sang Michael Xie

Found 21 papers, 14 papers with code

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

1 code implementation17 May 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.

Language Modelling

Reward Design with Language Models

1 code implementation27 Feb 2023 Minae Kwon, Sang Michael Xie, Kalesha Bullard, Dorsa Sadigh

During training, the LLM evaluates an RL agent's behavior against the desired behavior described by the prompt and outputs a corresponding reward signal.

Language Modelling Large Language Model +1

Data Selection for Language Models via Importance Resampling

no code implementations6 Feb 2023 Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

From this observation, we present Data Selection with Importance Resampling (DSIR), an efficient and scalable algorithm that estimates importance weights in a reduced feature space (e. g., n-gram features in our instantiation) and selects data with importance resampling according to these weights.

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models

no code implementations25 Oct 2022 Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma

Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss.

Language Modelling

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

no code implementations1 Apr 2022 Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang

We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e. g., photographs) and unlabeled data from a target domain (e. g., sketches) are used to learn a classifier for the target domain.

Contrastive Learning Unsupervised Domain Adaptation

Extending the WILDS Benchmark for Unsupervised Adaptation

1 code implementation ICLR 2022 Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well.

Ensembles and Cocktails: Robust Finetuning for Natural Language Generation

no code implementations29 Sep 2021 John Hewitt, Xiang Lisa Li, Sang Michael Xie, Benjamin Newman, Percy Liang

When finetuning a pretrained language model for natural language generation tasks, one is currently faced with a tradeoff.

Language Modelling Text Generation

How does Contrastive Pre-training Connect Disparate Domains?

no code implementations29 Sep 2021 Kendrick Shen, Robbie Matthew Jones, Ananya Kumar, Sang Michael Xie, Percy Liang

We develop a conceptual model for contrastive learning under domain shifts, where data augmentations form connections between classes and domains that can be far apart.

Contrastive Learning Unsupervised Domain Adaptation

On the Opportunities and Risks of Foundation Models

3 code implementations16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

1 code implementation NeurIPS 2021 Colin Wei, Sang Michael Xie, Tengyu Ma

The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

1 code implementation ICLR 2021 Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).

Time Series Time Series Analysis +1

Simplifying Models with Unlabeled Output Data

no code implementations28 Sep 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e. g. a pseudocode-to-code translation task where the code must compile.

Code Translation Image Generation +1

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

2 code implementations29 Jun 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).

Code Translation Denoising +2

Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

1 code implementation ICML 2020 Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang

In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error.


Adversarial Training Can Hurt Generalization

no code implementations ICML Workshop Deep_Phenomen 2019 Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary).

Reparameterizable Subset Sampling via Continuous Relaxations

1 code implementation29 Jan 2019 Sang Michael Xie, Stefano Ermon

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution.

feature selection Stochastic Optimization

Cannot find the paper you are looking for? You can Submit a new open access paper.