Search Results for author: Zeyu Zheng

Found 17 papers, 4 papers with code

When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment

no code implementations ICML 2020 Feng Zhu, Zeyu Zheng

Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem, and point out that the \textit{any-time} situation and the \textit{fixed-time} situation render the same optimal regret order in a simple form, in contrast to the dynamic pricing problem.

Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization

no code implementations 12 Sep 2022 Tianyi Lin, Zeyu Zheng, Michael I. Jordan

Nonsmooth nonconvex optimization problems broadly emerge in machine learning and business decision making, whereas two core challenges impede the development of efficient solution methods with finite-time convergence guarantee: the lack of computationally tractable optimality criterion and the lack of computationally powerful oracles.

Decision Making

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits

no code implementations 7 Jun 2022 David Simchi-Levi, Zeyu Zheng, Feng Zhu

With the aim to ensure safety against such heavy-tailed risk, starting from the two-armed bandit setting, we provide a simple policy design that (i) has the worst-case optimality for the expected regret at order $\tilde O(\sqrt{T})$ and (ii) has the worst-case tail probability of incurring a linear regret decay at an optimal exponential rate $\exp(-\Omega(\sqrt{T}))$.

Multi-Armed Bandits, online learning

GrASP: Gradient-Based Affordance Selection for Planning

no code implementations 8 Feb 2022 Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh

Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.

Selecting the Best Optimizing System

1 code implementation 9 Jan 2022 Nian Si, Zeyu Zheng

An SBOS problem compares different systems based on their expected performances under their own optimally chosen decision to select the best, without advance knowledge of expected performances of the systems nor the optimizing decision inside each system.

Stochastic $L^\natural$-convex Function Minimization

no code implementations NeurIPS 2021 Haixiang Zhang, Zeyu Zheng, Javad Lavaei

When applied to a stochastic submodular function, the computational complexity of the proposed algorithms is lower than that of the existing stochastic submodular minimization algorithms.

Offline Planning and Online Learning under Recovering Rewards

no code implementations 28 Jun 2021 David Simchi-Levi, Zeyu Zheng, Feng Zhu

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases.

online learning

Continuous Conditional Generative Adversarial Networks (cGAN) with Generator Regularization

no code implementations 27 Mar 2021 Yufeng Zheng, Yunkai Zhang, Zeyu Zheng

To partially alleviate this difficulty, we propose a simple generator regularization term on the GAN generator loss in the form of Lipschitz penalty.

Adaptive Pairwise Weights for Temporal Credit Assignment

no code implementations 9 Feb 2021 Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh

In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two.

Learning State Representations from Random Deep Action-conditional Predictions

1 code implementation NeurIPS 2021 Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems.

Atari Games, Representation Learning +1

Doubly Stochastic Generative Arrivals Modeling

no code implementations 27 Dec 2020 Yufeng Zheng, Zeyu Zheng

Numerical experiments suggest that, in terms of performance, the successful model estimation for DS-WGAN only requires a moderate size of representative data, which can be appealing in many contexts of operational management.

Management

On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification

no code implementations 22 Jun 2020 Tianyi Lin, Zeyu Zheng, Elynn Y. Chen, Marco Cuturi, Michael I. Jordan

Yet, the behavior of minimum Wasserstein estimators is poorly understood, notably in high-dimensional regimes or under model misspecification.

What Can Learned Intrinsic Rewards Capture?

no code implementations ICML 2020 Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.

On Learning Intrinsic Rewards for Policy Gradient Methods

1 code implementation NeurIPS 2018 Zeyu Zheng, Junhyuk Oh, Satinder Singh

In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.

Atari Games, Decision Making +1

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

no code implementations 11 Jun 2017 Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification.

Image Classification
