no code implementations • ICML 2020 • Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar
We propose a new family of bandit algorithms that are formulated in a general way based on the Biased Maximum Likelihood Estimation (BMLE) method originally appearing in the adaptive control literature.
no code implementations • 10 Dec 2022 • Hsin-En Su, Yen-ju Chen, Ping-Chun Hsieh, Xi Liu
In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient.
no code implementations • 6 Dec 2022 • Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu
Many real-world continuous control problems face the dilemma of weighing pros and cons across objectives; multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives.
no code implementations • 27 Sep 2022 • Yung-Han Ho, Chia-Hao Kao, Wen-Hsiao Peng, Ping-Chun Hsieh
Recently, a dual-critic design was proposed to update the actor by alternating between the rate and distortion critics.
no code implementations • 8 Mar 2022 • Yu-Heng Hung, Ping-Chun Hsieh
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.
no code implementations • 26 Oct 2021 • Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu
Policy optimization is a fundamental principle for designing reinforcement learning algorithms. One example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which is widely used in deep reinforcement learning due to its simplicity and effectiveness.
1 code implementation • NeurIPS 2021 • Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai
This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices.
no code implementations • NeurIPS 2021 • Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu
While combining DQN with an existing few-shot learning method is a natural idea, we find that such a direct combination does not perform well due to severe overfitting, which is particularly critical in Bayesian optimization (BO) because of the need for a versatile sampling policy.
no code implementations • 22 Feb 2021 • Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu
Action-constrained reinforcement learning (RL) is a widely used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints.
no code implementations • NeurIPS Workshop ICBINB 2020 • Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu
Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.
no code implementations • 8 Oct 2020 • Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems.
no code implementations • 27 Jan 2020 • Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen
With the explosive growth of online products and content, recommendation techniques have come to be considered an effective tool for overcoming information overload, improving user experience, and boosting business revenue.
no code implementations • 2 Jul 2019 • Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar
To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound, which generally applies to both Exponential Family bandits and sub-Gaussian/Exponential family bandits.
no code implementations • 14 Nov 2018 • Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen
Thus, adapting the existing methods to the streaming environment faces non-trivial technical challenges.
1 code implementation • 29 Oct 2018 • Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar
Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.