Search Results for author: Ping-Chun Hsieh

Found 21 papers, 2 papers with code

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

no code implementations ICML 2020 Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

We propose a new family of bandit algorithms that are formulated in a general way based on the Biased Maximum Likelihood Estimation (BMLE) method, which originally appeared in the adaptive control literature.

Multi-Armed Bandits
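As context for the biasing idea, the sketch below shows one simple way a bias toward higher rewards can enter a Bernoulli MLE index: the bias acts like pseudo-successes that grow with a schedule alpha(t). The exact index and schedule in the paper may differ; this form is illustrative.

```python
import numpy as np

def bmle_index(successes, pulls, t, alpha_fn=np.log):
    """Biased-MLE index for Bernoulli arms (illustrative form, not
    necessarily the paper's exact index). The bias alpha(t) inflates
    the empirical mean, like adding alpha(t) pseudo-successes before
    maximizing the likelihood."""
    alpha = alpha_fn(t + 1.0)
    return (successes + alpha) / (pulls + alpha)

def choose_arm(successes, pulls, t):
    """Pull the arm with the largest biased index."""
    return int(np.argmax(bmle_index(successes, pulls, t)))
```

Because the bias shrinks relative to the counts as arms are pulled, the index converges back to the plain MLE, which is the explore-exploit mechanism the biasing principle relies on.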

Image Deraining via Self-supervised Reinforcement Learning

no code implementations 27 Mar 2024 He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai

The work aims to restore rain-degraded images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain).

Denoising Dictionary Learning +3

Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion

no code implementations 19 Mar 2024 Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng

(iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors.

Imitation Learning Inductive Bias +1
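The abstract notes that RallyNet uses Geometric Brownian Motion as an inductive bias. For context, a GBM path can be simulated exactly with the log-normal update; the sketch below is generic (function name and parameters are illustrative), not RallyNet's implementation.

```python
import numpy as np

def gbm_path(s0, mu, sigma, n_steps, dt=0.01, rng=None):
    """Simulate one Geometric Brownian Motion path via the exact update
    S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    where Z is standard normal. Returns n_steps + 1 points."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n_steps)
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.concatenate([[0.0], np.cumsum(increments)]))
```

GBM paths stay strictly positive and have multiplicative noise, which is one reason it is a convenient prior for modeling evolving interaction intensities.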

PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping

no code implementations 19 Dec 2023 Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu

Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation.

Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

no code implementations 18 Oct 2023 Yen-ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh

In response to this gap, we adapt Nesterov's celebrated accelerated gradient (NAG) method to policy optimization in RL, termed Accelerated Policy Gradient (APG).

Policy Gradient Methods reinforcement-learning +1
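To make the adaptation concrete, one Nesterov-style step on policy parameters looks as follows. This is a minimal sketch using a standard NAG momentum schedule, written as gradient ascent since RL maximizes expected return; the paper's exact schedule and update may differ.

```python
def apg_step(theta, theta_prev, grad_fn, eta, t):
    """One Nesterov-style accelerated ascent step (illustrative).
    theta, theta_prev: current and previous parameters.
    grad_fn: gradient of the objective to be maximized.
    eta: step size; t: iteration counter (1-indexed)."""
    # Extrapolate to a look-ahead point with momentum (t - 1) / (t + 2),
    # then take a gradient step from the look-ahead point.
    beta = (t - 1.0) / (t + 2.0)
    y = theta + beta * (theta - theta_prev)
    return y + eta * grad_fn(y)
```

On a simple concave objective this iteration drives the parameter toward the maximizer, which is the behavior the convergence-rate analysis quantifies.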

Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs

no code implementations 17 Oct 2023 Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar

We consider the infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping.

Model-based Reinforcement Learning
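The linear parameterization mentioned in the abstract is the standard linear-MDP assumption: transitions factor through a known low-dimensional feature map,

```latex
P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle,
```

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is the predefined feature mapping and $\mu$ is a vector of $d$ unknown (signed) measures over states, so learning the model reduces to estimating $\mu$.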

Towards Human-Like RL: Taming Non-Naturalistic Behavior in Deep RL via Adaptive Behavioral Costs in 3D Games

no code implementations 27 Sep 2023 Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu

In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength.

Decision Making FPS Games +2

Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees

no code implementations 10 Dec 2022 Hsin-En Su, Yen-ju Chen, Ping-Chun Hsieh, Xi Liu

In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient.

counterfactual

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

no code implementations 6 Dec 2022 Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu

Many real-world continuous control problems involve weighing the pros and cons of multiple objectives, and multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives.

Continuous Control Multi-Objective Reinforcement Learning

Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits

no code implementations 8 Mar 2022 Yu-Heng Hung, Ping-Chun Hsieh

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.

Multi-Armed Bandits

Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective

no code implementations 26 Oct 2021 Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu

Policy optimization is a fundamental principle for designing reinforcement learning algorithms, and one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been popularly used in deep reinforcement learning due to its simplicity and effectiveness.

reinforcement-learning Reinforcement Learning (RL)
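For reference, the clipped surrogate objective that PPO-Clip optimizes can be written in a few lines. This is the standard textbook form of the objective, not the paper's neural-function-approximation analysis.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO-Clip:
    L = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r is the new/old policy probability ratio and A the advantage.
    Clipping removes the incentive to move r outside [1 - eps, 1 + eps]."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))
```

The min with the clipped term is what makes the objective a pessimistic bound on the unclipped surrogate, which is the mechanism the paper's optimality analysis interprets through a hinge-loss lens.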

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

1 code implementation NeurIPS 2021 Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai

This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices.

Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

no code implementations NeurIPS 2021 Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu

While it serves as a natural idea to combine DQN and an existing few-shot learning method, we identify that such a direct combination does not perform well due to severe overfitting, which is particularly critical in BO due to the need for a versatile sampling policy.

Bayesian Optimization Few-Shot Learning

Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization

no code implementations 22 Feb 2021 Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu

Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints.

Reinforcement Learning (RL) Scheduling
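To illustrate the Frank-Wolfe update the title refers to, the sketch below performs one FW step over a box-shaped action set: solve a linear maximization over the constraint set (attained at a vertex), then take a convex step toward that vertex so the iterate stays feasible. The box constraint and function names are illustrative; the paper applies FW to policy optimization under general action constraints.

```python
import numpy as np

def frank_wolfe_step(a, grad, lo, hi, gamma):
    """One Frank-Wolfe update over the box [lo, hi]^d (illustrative).
    a: current feasible action; grad: gradient at a; gamma in [0, 1]."""
    # Linear oracle: argmax_s <grad, s> over the box is reached
    # coordinate-wise at hi where grad > 0 and at lo otherwise.
    s = np.where(grad > 0, hi, lo)
    # Convex combination of two feasible points remains feasible,
    # so no projection (and no zero-gradient projection issue) is needed.
    return (1.0 - gamma) * a + gamma * s
```

Staying feasible by construction, rather than projecting after an unconstrained step, is the property that motivates using FW in the action-constrained setting.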

Rethinking Deep Policy Gradients via State-Wise Policy Improvement

no code implementations NeurIPS Workshop ICBINB 2020 Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu

Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.

Policy Gradient Methods Value prediction

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

no code implementations 8 Oct 2020 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized linear bandits problems.

Computational Efficiency

Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning

no code implementations 27 Jan 2020 Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen

With the explosive growth of online products and content, recommendation techniques have been considered as an effective tool to overcome information overload, improve user experience, and boost business revenue.

Knowledge Distillation Multi-Task Learning +2

Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits

no code implementations 2 Jul 2019 Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar

To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound that generally applies in both the Exponential Family as well as the sub-Gaussian/Exponential family bandits.

Multi-Armed Bandits

Streaming Network Embedding through Local Actions

no code implementations 14 Nov 2018 Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen

Thus the approach of adapting the existing methods to the streaming environment faces non-trivial technical challenges.

Clustering Multi-class Classification +1

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

1 code implementation 29 Oct 2018 Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.

Decision Making Multi-Armed Bandits
