no code implementations • 17 Oct 2023 • Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar
We consider the infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping.
no code implementations • 8 Mar 2022 • Yu-Heng Hung, Ping-Chun Hsieh
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.
no code implementations • 8 Oct 2020 • Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized linear bandits problems.