Search Results for author: Enlu Zhou

Found 12 papers, 2 papers with code

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

no code implementations • 14 Feb 2018 • Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.

Bayesian Inference, Dimensionality Reduction, +1

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

no code implementations • NeurIPS 2018 • Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning.

Stochastic Optimization

Towards Understanding the Importance of Noise in Training Neural Networks

no code implementations • 7 Sep 2019 • Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

Ample empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.

Towards Understanding the Importance of Shortcut Connections in Residual Networks

no code implementations • NeurIPS 2019 • Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball.

Bayesian Optimization of Risk Measures

2 code implementations • NeurIPS 2020 • Sait Cakmak, Raul Astudillo, Peter Frazier, Enlu Zhou

We consider Bayesian optimization of objective functions of the form $\rho[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $\rho$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$.

Bayesian Optimization, Decision Making, +2

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

no code implementations • 24 Feb 2021 • Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Ample empirical evidence has corroborated the importance of noise in nonconvex optimization problems.

Bayesian Risk Markov Decision Processes

no code implementations • 4 Jun 2021 • Yifan Lin, Yuxuan Ren, Enlu Zhou

We consider finite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data.

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

no code implementations • 7 Feb 2022 • Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

We investigate the role of noise in optimization algorithms for learning over-parameterized models.

Robust Multi-Objective Bayesian Optimization Under Input Noise

1 code implementation • 15 Feb 2022 • Samuel Daulton, Sait Cakmak, Maximilian Balandat, Michael A. Osborne, Enlu Zhou, Eytan Bakshy

In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected.

Bayesian Optimization

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

no code implementations • 24 Jun 2022 • Yifan Lin, Yuhao Wang, Enlu Zhou

In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward.

Thompson Sampling

Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

no code implementations • 26 Jan 2023 • Yifan Lin, Enlu Zhou

We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data.

Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate

no code implementations • 1 Mar 2024 • Yifan Lin, Yuhao Wang, Enlu Zhou

The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization.

Policy Gradient Methods
