Search Results for author: Tengyu Ma

Found 117 papers, 41 papers with code

Entity and Evidence Guided Document-Level Relation Extraction

no code implementations ACL (RepL4NLP) 2021 Kevin Huang, Peng Qi, Guangtao Wang, Tengyu Ma, Jing Huang

In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that jointly extracts relations and the underlying evidence sentences by using a large pretrained language model (LM) as the input encoder.

Document-level Relation Extraction, Language Modelling, +1

Linguistic Calibration of Language Models

no code implementations 30 Mar 2024 Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto

Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.

Decision Making, Question Answering

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

no code implementations 20 Feb 2024 Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma

Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT.


One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

no code implementations 7 Jul 2023 Arvind Mahankali, Tatsunori B. Hashimoto, Tengyu Ma

Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of pre-conditioned GD.

In-Context Learning, regression

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

no code implementations 22 Jun 2023 Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.

Inductive Bias

Large Language Models as Tool Makers

1 code implementation 26 May 2023 Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou

Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks.

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

3 code implementations 23 May 2023 Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training.

Language Modelling, Stochastic Optimization

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations NeurIPS 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.

Language Modelling

Symbol tuning improves in-context learning in language models

no code implementations 15 May 2023 Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").

In-Context Learning

Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields

no code implementations 29 Apr 2023 Kefan Dong, Tengyu Ma

Our key technical novelty is to prove that the degree-$k$ spherical harmonics components of a function from a Gaussian random field cannot be spiky, in that their $L_\infty$/$L_2$ ratios are upper bounded by $O(d \sqrt{\ln k})$ with high probability.

Larger language models do in-context learning differently

no code implementations 7 Mar 2023 Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning

Data Selection for Language Models via Importance Resampling

1 code implementation NeurIPS 2023 Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang

To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.

What learning algorithm is in-context learning? Investigations with linear models

no code implementations 28 Nov 2022 Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.

In-Context Learning, regression

First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains

no code implementations 21 Nov 2022 Kefan Dong, Tengyu Ma

The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift.

How Does Sharpness-Aware Minimization Minimize Sharpness?

no code implementations 10 Nov 2022 Kaiyue Wen, Tengyu Ma, Zhiyuan Li

SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees.

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models

no code implementations 25 Oct 2022 Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma

Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss.

Language Modelling

Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

no code implementations 18 Jul 2022 Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

no code implementations 16 Jun 2022 Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma

Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails.

Generalization Bounds, Memorization

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

no code implementations 6 Jun 2022 Kefan Dong, Tengyu Ma

Past research on interactive decision making problems (bandits, reinforcement learning, etc.)

Decision Making, Multi-Armed Bandits, +2

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path

no code implementations 22 May 2022 Haoyuan Cai, Tengyu Ma, Simon Du

In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$.

Toward Fast, Flexible, and Robust Low-Light Image Enhancement

1 code implementation CVPR 2022 Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, Zhongxuan Luo

Most existing low-light image enhancement techniques not only struggle to balance visual quality and computational efficiency but also commonly fail in unknown complex scenarios.

Computational Efficiency, Face Detection, +2

Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations

no code implementations 6 Apr 2022 Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma

In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target domain accurately, even though the representations of the two domains are far from each other.

Contrastive Learning, Unsupervised Domain Adaptation

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

no code implementations 1 Apr 2022 Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang

We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.

Contrastive Learning, Unsupervised Domain Adaptation

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

3 code implementations 21 Feb 2022 Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang

However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.

Safe Reinforcement Learning by Imagining the Near Future

1 code implementation NeurIPS 2021 Garrett Thomas, Yuping Luo, Tengyu Ma

Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences.

Continuous Control, reinforcement-learning, +2

Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision

1 code implementation 9 Dec 2021 Risheng Liu, Long Ma, Tengyu Ma, Xin Fan, Zhongxuan Luo

To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that not only addresses the low-light enhancement task but also has the flexibility to handle other, more challenging downstream vision applications.

Rolling Shutter Correction

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

no code implementations ICLR 2022 Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.

Atari Games, D4RL, +3

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

1 code implementation 22 Nov 2021 Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu

Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.

Continuous Control, Multi-agent Reinforcement Learning, +3

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

1 code implementation 5 Nov 2021 Margalit Glasgow, Honglin Yuan, Tengyu Ma

In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable.

Federated Learning

Self-supervised Learning is More Robust to Dataset Imbalance

1 code implementation ICLR 2022 Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma

Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets with several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.

Long-tail Learning, Self-Supervised Learning

Calibrated ensembles - a simple way to mitigate ID-OOD accuracy tradeoffs

no code implementations 29 Sep 2021 Ananya Kumar, Aditi Raghunathan, Tengyu Ma, Percy Liang

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy.

Statistically Meaningful Approximation: a Theoretical Analysis for Approximating Turing Machines with Transformers

no code implementations 29 Sep 2021 Colin Wei, Yining Chen, Tengyu Ma

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.

On the Opportunities and Risks of Foundation Models

2 code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations

1 code implementation NeurIPS 2021 Yuping Luo, Tengyu Ma

This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy without any prior knowledge of the dynamics model and additional offline data.

reinforcement-learning, Reinforcement Learning (RL), +1

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

no code implementations 28 Jul 2021 Colin Wei, Yining Chen, Tengyu Ma

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.

Generalization Bounds

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration

no code implementations NeurIPS 2021 Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon

In this work, we introduce a new notion -- decision calibration -- that requires the predicted distribution and true distribution to be "indistinguishable" to a set of downstream decision-makers.

Decision Making

Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments

no code implementations 18 Jun 2021 Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski

Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.

Domain Generalization

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

1 code implementation NeurIPS 2021 Colin Wei, Sang Michael Xie, Tengyu Ma

The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.

Task 2

Label Noise SGD Provably Prefers Flat Global Minimizers

no code implementations NeurIPS 2021 Alex Damian, Tengyu Ma, Jason D. Lee

In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.

Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System

no code implementations 9 Jun 2021 Zichuan Lin, Jing Huang, Bowen Zhou, Xiaodong He, Tengyu Ma

Recent work (Takanobu et al., 2020) proposed the system-wise evaluation on dialog systems and found that improvement on individual components (e.g., NLU, policy) in prior work may not necessarily bring benefit to pipeline systems in system-wise evaluation.

Data Augmentation, Goal-Oriented Dialog

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

1 code implementation NeurIPS 2021 Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma

Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image).

Contrastive Learning, Generalization Bounds, +1

Why Do Local Methods Solve Nonconvex Problems?

no code implementations 24 Mar 2021 Tengyu Ma

Non-convex optimization is ubiquitous in modern machine learning.

BIG-bench Machine Learning

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

no code implementations 9 Feb 2021 Haike Xu, Tengyu Ma, Simon S. Du

We further show that for general MDPs, AMB suffers an additional $\frac{|Z_{mul}|}{\Delta_{min}}$ regret, where $Z_{mul}$ is the set of state-action pairs $(s, a)$ such that $a$ is a non-unique optimal action for $s$.

Multi-Armed Bandits

Improved Uncertainty Post-Calibration via Rank Preserving Transforms

no code implementations 1 Jan 2021 Yu Bai, Tengyu Ma, Huan Wang, Caiming Xiong

In this paper, we propose Neural Rank Preserving Transforms (NRPT), a new post-calibration method that adjusts the output probabilities of a trained classifier using a calibrator of higher capacity, while maintaining its prediction accuracy.

text-classification, Text Classification

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

1 code implementation ICLR 2021 Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).

Time Series, Time Series Analysis, +1

Meta-learning Transferable Representations with a Single Target Domain

no code implementations 3 Nov 2020 Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma

Recent works found that fine-tuning and joint training, two popular approaches for transfer learning, do not always improve accuracy on downstream tasks.

Meta-Learning, Representation Learning, +1

Beyond Lazy Training for Over-parameterized Tensor Decomposition

no code implementations NeurIPS 2020 Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.

Tensor Decomposition

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling

1 code implementation 21 Oct 2020 Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang

In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to solve the multi-label and multi-entity problems.

Document-level Relation Extraction, Multi-Label Classification, +2

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

no code implementations ICLR 2021 Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks.

Generalization Bounds, Unsupervised Domain Adaptation

Simplifying Models with Unlabeled Output Data

no code implementations 28 Sep 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g., a pseudocode-to-code translation task where the code must compile.

Code Translation, Image Generation, +2

Entity and Evidence Guided Relation Extraction for DocRED

no code implementations 27 Aug 2020 Kevin Huang, Guangtao Wang, Tengyu Ma, Jing Huang

Document-level relation extraction is a challenging task which requires reasoning over multiple sentences in order to predict relations in a document.

Document-level Relation Extraction, Language Modelling, +1

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

1 code implementation ICLR 2021 Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma

Real-world large-scale datasets are heteroskedastic and imbalanced -- labels have varying levels of uncertainty and label distributions are long-tailed.

Image Classification

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

2 code implementations 29 Jun 2020 Sang Michael Xie, Tengyu Ma, Percy Liang

Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).

Code Translation, Denoising, +2

Active Online Learning with Hidden Shifting Domains

no code implementations 25 Jun 2020 Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.

Domain Adaptation, regression

Individual Calibration with Randomized Forecasting

no code implementations ICML 2020 Shengjia Zhao, Tengyu Ma, Stefano Ermon

We show that calibration for individual samples is possible in the regression setup if the predictions are randomized, i.e., outputting randomized credible intervals.

Decision Making, Fairness, +1

Self-training Avoids Using Spurious Features Under Domain Shift

no code implementations NeurIPS 2020 Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma

In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close.

Unsupervised Domain Adaptation

Model-based Adversarial Meta-Reinforcement Learning

1 code implementation NeurIPS 2020 Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

When the test task distribution is different from the training task distribution, the performance may degrade significantly.

Continuous Control, Meta Reinforcement Learning, +2

Federated Accelerated Stochastic Gradient Descent

1 code implementation NeurIPS 2020 Honglin Yuan, Tengyu Ma

We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a principled acceleration of Federated Averaging (FedAvg, also known as Local SGD) for distributed optimization.

Distributed Optimization

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

1 code implementation 15 Jun 2020 Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.

Active Online Domain Adaptation

no code implementations ICML Workshop LifelongML 2020 Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.

Online Domain Adaptation, regression

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

no code implementations ICLR 2020 Colin Wei, Tengyu Ma

For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound – a large output margin implies good generalization.

Generalization Bounds, Robust classification

Robust and On-the-fly Dataset Denoising for Image Classification

no code implementations ECCV 2020 Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma

We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.

Classification, counterfactual, +4

Optimal Regularization Can Mitigate Double Descent

no code implementations ICLR 2021 Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma

Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.


The Implicit and Explicit Regularization Effects of Dropout

1 code implementation ICML 2020 Colin Wei, Sham Kakade, Tengyu Ma

This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent.

Understanding Self-Training for Gradual Domain Adaptation

2 code implementations ICML 2020 Ananya Kumar, Tengyu Ma, Percy Liang

Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.

Unsupervised Domain Adaptation

Variable-Viewpoint Representations for 3D Object Recognition

no code implementations 8 Feb 2020 Tengyu Ma, Joel Michelson, James Ainooson, Deepayan Sanyal, Xiaohan Wang, Maithilee Kunda

For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions.

3D Object Recognition, Object

On the Expressivity of Neural Networks for Deep Reinforcement Learning

1 code implementation ICML 2020 Kefan Dong, Yuping Luo, Tengyu Ma

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.

reinforcement-learning, Reinforcement Learning (RL)

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin

1 code implementation 9 Oct 2019 Colin Wei, Tengyu Ma

Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth.

General Classification, Generalization Bounds, +1

Bootstrapping the Expressivity with Model-based Planning

1 code implementation 25 Sep 2019 Kefan Dong, Yuping Luo, Tengyu Ma

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.

Verified Uncertainty Calibration

3 code implementations NeurIPS 2019 Ananya Kumar, Percy Liang, Tengyu Ma

In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.

Weather Forecasting

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

1 code implementation ICLR 2020 Yuping Luo, Huazhe Xu, Tengyu Ma

Imitation learning, followed by reinforcement learning algorithms, is a promising paradigm to solve complex control tasks sample-efficiently.

Imitation Learning, reinforcement-learning, +1

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

2 code implementations NeurIPS 2019 Yuanzhi Li, Colin Wei, Tengyu Ma

This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with a small initial learning rate, but ignored by a model with a large learning rate until after annealing.

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

7 code implementations NeurIPS 2019 Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class imbalance but the testing criterion requires good generalization on less frequent classes.

Long-tail learning with class descriptors

On the Performance of Thompson Sampling on Logistic Bandits

no code implementations 12 May 2019 Shi Dong, Tengyu Ma, Benjamin Van Roy

Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is $\tilde{O}(d\sqrt{T})$.

Thompson Sampling

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation

1 code implementation NeurIPS 2019 Colin Wei, Tengyu Ma

For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.

On the Margin Theory of Feedforward Neural Networks

no code implementations ICLR 2019 Colin Wei, Jason Lee, Qiang Liu, Tengyu Ma

We establish: 1) for multi-layer feedforward ReLU networks, the global minimizer of a weakly-regularized cross-entropy loss has the maximum normalized margin among all networks, 2) as a result, increasing the over-parametrization improves the normalized margin and generalization error bounds for deep networks.

Better Generalization with On-the-fly Dataset Denoising

no code implementations ICLR 2019 Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin

Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.

Denoising, Memorization

Explaining Adversarial Examples with Knowledge Representation

no code implementations ICLR 2019 Xingyu Zhou, Tengyu Ma, Huahong Zhang

This paper, in contrast, discusses the origin of adversarial examples from a more fundamental knowledge-representation point of view.

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

no code implementations NeurIPS 2019 Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.

Approximability of Discriminators Implies Diversity in GANs

no code implementations ICLR 2019 Yu Bai, Tengyu Ma, Andrej Risteski

Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.

The Toybox Dataset of Egocentric Visual Object Transformations

no code implementations 15 Jun 2018 Xiaohan Wang, Tengyu Ma, James Ainooson, Seunghwan Cha, Xiaotian Wang, Azhar Molla, Maithilee Kunda

In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles.

Object Object Recognition +1

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

1 code implementation ACL 2018 Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.

Document Classification Domain Adaptation +2

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

no code implementations 26 Dec 2017 Yuanzhi Li, Tengyu Ma, Hongyang Zhang

We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

On the Optimization Landscape of Tensor Decompositions

no code implementations NeurIPS 2017 Rong Ge, Tengyu Ma

The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.

Tensor Decomposition

Generalization and Equilibrium in Generative Adversarial Nets (GANs)

1 code implementation ICML 2017 Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang

We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.

Generative Adversarial Network

On the ability of neural nets to express distributions

no code implementations 22 Feb 2017 Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

Provable learning of Noisy-or Networks

no code implementations 28 Dec 2016 Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

Tensor Decomposition Topic Models

Identity Matters in Deep Learning

no code implementations 14 Nov 2016 Moritz Hardt, Tengyu Ma

An emerging design principle in deep learning is that each layer of a deep artificial neural network should be able to easily express the identity transformation.

Finding Approximate Local Minima Faster than Gradient Descent

1 code implementation 3 Nov 2016 Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma

We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.

BIG-bench Machine Learning

Polynomial-time Tensor Decompositions with Sum-of-Squares

no code implementations 6 Oct 2016 Tengyu Ma, Jonathan Shi, David Steurer

We give new algorithms based on the sum-of-squares method for tensor decomposition.

Tensor Decomposition

A Non-generative Framework and Convex Relaxations for Unsupervised Learning

no code implementations NeurIPS 2016 Elad Hazan, Tengyu Ma

We give a novel formal theoretical framework for unsupervised learning with two distinctive characteristics.

Gradient Descent Learns Linear Dynamical Systems

no code implementations 16 Sep 2016 Moritz Hardt, Tengyu Ma, Benjamin Recht

We prove that stochastic gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system.

Provable Algorithms for Inference in Topic Models

no code implementations 27 May 2016 Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra

But designing provable algorithms for inference has proven to be more challenging.

Topic Models

Matrix Completion has No Spurious Local Minimum

no code implementations NeurIPS 2016 Rong Ge, Jason D. Lee, Tengyu Ma

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.

Collaborative Filtering Matrix Completion +1

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

1 code implementation TACL 2018 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

Information Retrieval Retrieval +1

Why are deep nets reversible: A simple theory, with implications for training

no code implementations 18 Nov 2015 Sanjeev Arora, Yingyu Liang, Tengyu Ma

Under this assumption (which is experimentally tested on real-life nets like AlexNet), it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer.


Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity

no code implementations 27 Jul 2015 Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang

We also prove a lower bound on the number of rounds of communication for a broad class of distributed first-order methods, including the algorithms proposed in this paper.

Distributed Optimization

Sum-of-Squares Lower Bounds for Sparse PCA

no code implementations NeurIPS 2015 Tengyu Ma, Avi Wigderson

It was also known that this quadratic gap cannot be improved by the most basic semi-definite (SDP, aka spectral) relaxation, equivalent to a degree-2 SoS algorithm.

BIG-bench Machine Learning

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

no code implementations 24 Jun 2015 Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.

Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms

no code implementations 21 Apr 2015 Rong Ge, Tengyu Ma

We also give a polynomial-time algorithm for certifying the injective norm of random low-rank tensors.

Tensor Decomposition

Simple, Efficient, and Neural Algorithms for Sparse Coding

no code implementations 2 Mar 2015 Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra

Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.

A Latent Variable Model Approach to PMI-based Word Embeddings

4 code implementations TACL 2016 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.

Word Embeddings

On Communication Cost of Distributed Statistical Estimation and Dimensionality

no code implementations NeurIPS 2014 Ankit Garg, Tengyu Ma, Huy L. Nguyen

We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to a logarithmic factor.

More Algorithms for Provable Dictionary Learning

no code implementations 3 Jan 2014 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).

Dictionary Learning

Provable Bounds for Learning Some Deep Representations

no code implementations 23 Oct 2013 Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma

The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.
