Search Results for author: Ping-Chun Hsieh

Found 15 papers, 2 papers with code

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

no code implementations ICML 2020 Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

We propose a new family of bandit algorithms, formulated in a general way around the Biased Maximum Likelihood Estimation (BMLE) method that originally appeared in the adaptive control literature.
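A minimal sketch of the reward-biasing idea described above, not the paper's exact index: each arm's empirical mean is inflated by a bias term that grows slowly with time, so under-explored arms keep a high index. The choice `alpha = log(t+1)` is an illustrative assumption, not the bias-growth rate from the paper.

```python
import math
import random

def bmle_style_bandit(true_means, horizon, seed=0):
    """Toy Bernoulli bandit using a reward-biased estimate: each arm is
    scored as if it had earned alpha extra reward over alpha extra pulls,
    which nudges the learner toward under-sampled arms."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    rewards = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        alpha = math.log(t + 1)  # illustrative bias-growth rate alpha(t)
        # biased estimate of each arm's mean reward
        indices = [(rewards[i] + alpha) / (pulls[i] + alpha) for i in range(k)]
        arm = max(range(k), key=indices.__getitem__)
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += r
        total += r
    return pulls, total
```

With means like `[0.2, 0.8]`, the biased index concentrates pulls on the better arm, while the growing bias keeps every arm's index from collapsing to its (possibly unlucky) empirical mean.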

Multi-Armed Bandits

Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees

no code implementations 10 Dec 2022 Hsin-En Su, Yen-ju Chen, Ping-Chun Hsieh, Xi Liu

In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient.
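A highly simplified tabular illustration of a coordinate-ascent-style policy update, assuming a logit table indexed by (state, action); this is a sketch of the "update one coordinate, no policy gradient" idea, not CAPO's actual update rule:

```python
def coordinate_ascent_update(logits, s, a, advantage, step=1.0):
    """Adjust only the logit of the sampled (state, action) coordinate,
    using just the sign of a critic's advantage estimate; the magnitude
    of the behavior policy's probability never enters the update."""
    logits[s][a] += step * (1.0 if advantage > 0 else -1.0)
    return logits
```

Because only the sampled coordinate moves and only the advantage's sign is used, the update is decoupled from the behavior policy's state distribution, which is the flavor of off-policy improvement the abstract describes.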

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

no code implementations 6 Dec 2022 Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu

Many real-world continuous control problems require weighing the pros and cons of competing objectives; multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives.

Continuous Control Multi-Objective Reinforcement Learning

Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits

no code implementations 8 Mar 2022 Yu-Heng Hung, Ping-Chun Hsieh

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.

Multi-Armed Bandits

Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective

no code implementations 26 Oct 2021 Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu

Policy optimization is a fundamental principle for designing reinforcement learning algorithms, and one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been popularly used in deep reinforcement learning due to its simplicity and effectiveness.
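The clipped surrogate objective mentioned above can be written per sample; this is the standard PPO-Clip surrogate (the paper's hinge-loss reinterpretation is not reproduced here):

```python
def ppo_clip_surrogate(ratio, advantage, eps=0.2):
    """PPO-Clip surrogate for one sample, where
    ratio = pi_new(a|s) / pi_old(a|s) and advantage is the
    estimated advantage A(s, a). Clipping caps how much a single
    update can exploit a large probability ratio."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, `ppo_clip_surrogate(1.5, 1.0)` returns `1.2` (the positive advantage is capped at ratio `1 + eps`), while `ppo_clip_surrogate(0.5, -1.0)` returns `-0.8` (the min keeps the more pessimistic clipped value).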

Reinforcement Learning (RL)

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

1 code implementation NeurIPS 2021 Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai

This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices.
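Once per-arm indices are available, the Whittle-index heuristic itself is simple: score every arm's current state and activate the top-M arms. A minimal sketch, where `index_fn` stands in for a learned index predictor such as the network NeurWIN trains (the toy identity index below is an assumption for illustration):

```python
def whittle_index_policy(states, index_fn, num_activate):
    """Whittle-index heuristic for restless bandits: compute an index
    for each arm's current state and activate the num_activate arms
    with the largest indices."""
    scores = [index_fn(s) for s in states]
    ranked = sorted(range(len(states)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:num_activate])
```

Usage with a toy identity index: `whittle_index_policy([0.1, 0.9, 0.5], lambda s: s, 2)` activates arms `{1, 2}`.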

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

no code implementations NeurIPS 2021 Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu

While it is natural to combine DQN with an existing few-shot learning method, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO due to the need for a versatile sampling policy.

Few-Shot Learning

Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization

no code implementations 22 Feb 2021 Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu

Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints.

Reinforcement Learning (RL) Scheduling

Rethinking Deep Policy Gradients via State-Wise Policy Improvement

no code implementations NeurIPS Workshop ICBINB 2020 Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu

Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.

Policy Gradient Methods Value prediction

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

no code implementations 8 Oct 2020 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems.

Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning

no code implementations 27 Jan 2020 Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen

With the explosive growth of online products and content, recommendation techniques have been considered as an effective tool to overcome information overload, improve user experience, and boost business revenue.

Knowledge Distillation Multi-Task Learning +2

Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits

no code implementations 2 Jul 2019 Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar

To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound that generally applies in both the Exponential Family as well as the sub-Gaussian/Exponential family bandits.

Multi-Armed Bandits

Streaming Network Embedding through Local Actions

no code implementations 14 Nov 2018 Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen

Thus the approach of adapting the existing methods to the streaming environment faces non-trivial technical challenges.

Multi-class Classification Network Embedding

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

1 code implementation 29 Oct 2018 Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.

Decision Making Multi-Armed Bandits
