Search Results for author: Yuping Luo

Found 14 papers, 8 papers with code

Safe Reinforcement Learning by Imagining the Near Future

1 code implementation NeurIPS 2021 Garrett Thomas, Yuping Luo, Tengyu Ma

Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences.

Continuous Control reinforcement-learning +1

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations

1 code implementation NeurIPS 2021 Yuping Luo, Tengyu Ma

This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy without any prior knowledge of the dynamics model and additional offline data.

reinforcement-learning Safe Reinforcement Learning
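A toy illustration of what a barrier certificate guarantees (an invented 1-D example, not the paper's algorithm): a function B with B(x) >= 0 on the safe set such that B cannot be driven negative along the dynamics, making the safe set forward invariant.

```python
import numpy as np

# Invented toy example (not the paper's method): B(x) >= 0 defines the
# safe set for the stable 1-D system x' = -x.  Along trajectories,
# dB/dt = -2*x*(-x) = 2*x**2 >= 0, so B never decreases and the safe
# set {x : B(x) >= 0} is forward invariant.
def B(x):
    return 1.0 - x ** 2        # safe set: |x| <= 1

def f(x):
    return -x                  # dynamics

def certificate_holds(starts, dt=1e-3, steps=1000):
    """Check numerically that B never goes negative along rollouts
    that start inside the safe set."""
    for x in starts:
        for _ in range(steps):
            x = x + dt * f(x)  # Euler step
            if B(x) < 0:
                return False
    return True

print(certificate_holds(np.linspace(-1.0, 1.0, 21)))  # prints: True
```

The paper's setting is harder (unknown dynamics, learned certificates); this sketch only shows the invariance property a certificate is meant to enforce.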

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

no code implementations ICLR 2021 Zhiyuan Li, Yuping Luo, Kaifeng Lyu

Matrix factorization is a simple and natural test-bed to investigate the implicit regularization of gradient descent.
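A minimal numerical sketch of that test-bed (sizes, step size, and setup invented for illustration): gradient descent on an unconstrained two-factor product, started near zero, completes a rank-1 matrix and lands on a solution whose spectrum is itself nearly rank-1, with no explicit rank penalty anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-completion instance (invented for illustration): recover a
# rank-1 matrix from roughly half its entries by running gradient
# descent on an unconstrained product W2 @ W1 with small initialization.
n = 10
M = rng.normal(size=(n, 1)) @ rng.normal(size=(1, n))  # rank-1 ground truth
mask = rng.random((n, n)) < 0.5                        # observed entries

W1 = 0.1 * rng.normal(size=(n, n))
W2 = 0.1 * rng.normal(size=(n, n))

def masked_loss():
    return float(np.sum(((W2 @ W1 - M) * mask) ** 2))

init_loss = masked_loss()
for _ in range(3000):
    R = (W2 @ W1 - M) * mask          # residual on observed entries only
    G1, G2 = W2.T @ R, R @ W1.T       # gradients of the factored loss
    W1 -= 0.05 * G1
    W2 -= 0.05 * G2
final_loss = masked_loss()

# Despite full freedom in W1 and W2, the learned product is close to
# rank-1: the second singular value is tiny relative to the first.
s = np.linalg.svd(W2 @ W1, compute_uv=False)
print(final_loss < 1e-2 * init_loss, s[1] / s[0] < 0.5)
```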

Provable Representation Learning for Imitation Learning via Bi-level Optimization

no code implementations ICML 2020 Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.

Imitation Learning Representation Learning
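A minimal linear sketch of that bi-level structure (the data, sizes, and alternating scheme below are invented for illustration, not the paper's algorithm): the "inner" problem fits a per-task head on top of the current representation in closed form, and the "outer" problem takes a gradient step on the shared representation with the heads held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear instance of the bi-level formulation: all tasks
# share a ground-truth representation U; we learn a representation W
# (outer) while solving each task-specific head in closed form (inner).
d, k, n_tasks, n = 20, 3, 10, 100
U, _ = np.linalg.qr(rng.normal(size=(d, k)))     # shared representation
tasks = []
for _ in range(n_tasks):
    X = rng.normal(size=(n, d))
    w = rng.normal(size=k)                        # task-specific head
    tasks.append((X, X @ U @ w))

W = rng.normal(size=(d, k))
losses = []
for _ in range(300):
    grad = np.zeros_like(W)
    total = 0.0
    for X, y in tasks:
        Z = X @ W
        w_t, *_ = np.linalg.lstsq(Z, y, rcond=None)  # inner: closed-form head
        r = Z @ w_t - y
        total += 0.5 * np.mean(r ** 2)
        grad += X.T @ np.outer(r, w_t) / n           # outer gradient, head fixed
    losses.append(total / n_tasks)
    W -= 0.1 * grad / n_tasks

print(losses[-1] < losses[0])                        # outer loss decreases
```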

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations NeurIPS 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating Q-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning

On the Expressivity of Neural Networks for Deep Reinforcement Learning

1 code implementation ICML 2020 Kefan Dong, Yuping Luo, Tengyu Ma

We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.

reinforcement-learning

Bootstrapping the Expressivity with Model-based Planning

1 code implementation 25 Sep 2019 Kefan Dong, Yuping Luo, Tengyu Ma

We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

1 code implementation ICLR 2020 Yuping Luo, Huazhe Xu, Tengyu Ma

Imitation learning, followed by reinforcement learning algorithms, is a promising paradigm to solve complex control tasks sample-efficiently.

Imitation Learning reinforcement-learning

Provably Efficient $Q$-learning with Function Approximation via Distribution Shift Error Checking Oracle

no code implementations14 Jun 2019 Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e., approximating $Q$-functions with linear functions, it is still an open problem how to design a provably efficient algorithm that learns a near-optimal policy.

Q-Learning reinforcement-learning

Implicit Regularization in Deep Matrix Factorization

1 code implementation NeurIPS 2019 Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo

Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity."

Matrix Completion

An online sequence-to-sequence model for noisy speech recognition

no code implementations 16 Jun 2017 Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition

Learning Online Alignments with Continuous Rewards Policy Gradient

no code implementations 3 Aug 2016 Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever

Though capable and easy to use, they require that the entirety of the input sequence is available at the beginning of inference, an assumption that is not valid for instantaneous translation and speech recognition.

Machine Translation Question Answering +2