1 code implementation • 6 Jan 2024 • Ava Pettet, Yunuo Zhang, Baiting Luo, Kyle Wray, Hendrik Baier, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay
In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
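As a rough illustration of what combining the two sources of action-value estimates could look like, the sketch below blends them with a convex weight. The abstract does not specify the combination rule, so the convex form, the weight `alpha`, and the function names `combine_action_values` and `select_action` are all assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, Hashable


def combine_action_values(
    policy_q: Dict[Hashable, float],
    search_q: Dict[Hashable, float],
    alpha: float = 0.5,
) -> Dict[Hashable, float]:
    """Blend two action-value estimates for a single state.

    policy_q: estimates from a (possibly out-of-date) learned policy.
    search_q: estimates from an online search on the current model.
    alpha:    hypothetical weight on the policy estimates (assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Only actions scored by both sources can be blended.
    actions = policy_q.keys() & search_q.keys()
    if not actions:
        raise ValueError("no actions are scored by both sources")
    return {
        a: alpha * policy_q[a] + (1.0 - alpha) * search_q[a]
        for a in actions
    }


def select_action(
    policy_q: Dict[Hashable, float],
    search_q: Dict[Hashable, float],
    alpha: float = 0.5,
) -> Hashable:
    """Return the action with the highest blended value."""
    combined = combine_action_values(policy_q, search_q, alpha)
    return max(combined, key=combined.get)


if __name__ == "__main__":
    stale_policy = {"left": 1.0, "right": 0.0}   # made-up values
    fresh_search = {"left": 0.0, "right": 1.0}   # made-up values
    # A low alpha trusts the online search more than the stale policy.
    print(select_action(stale_policy, fresh_search, alpha=0.25))
```

In this sketch, setting `alpha` to 0 relies entirely on the online search and setting it to 1 relies entirely on the stale policy; intermediate values trade the two off.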
1 code implementation • 3 Jan 2024 • Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay
However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts "safely" to account for the non-stationary evolution of the environment.