no code implementations • ICML 2020 • Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim
We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment.
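The batch (offline) setting described above can be illustrated with a minimal sketch: fitted Q-iteration run purely over a fixed set of logged transitions, with no further environment interaction. This is a generic illustration of batch RL, not the algorithm proposed in the paper; the toy MDP and variable names are assumptions.

```python
import numpy as np

# Fixed batch of logged (s, a, r, s') transitions for a toy
# 2-state, 2-action MDP -- the agent never queries the environment.
batch = [
    (0, 0, 0.0, 1), (0, 1, 1.0, 0),
    (1, 0, 2.0, 1), (1, 1, 0.0, 0),
]
gamma = 0.9
Q = np.zeros((2, 2))

# Repeated Bellman backups restricted to the transitions in the batch.
for _ in range(200):
    Q_new = Q.copy()
    for s, a, r, s_next in batch:
        Q_new[s, a] = r + gamma * Q[s_next].max()
    Q = Q_new

# Greedy policy derived entirely from the logged data.
policy = Q.argmax(axis=1)
print(policy)  # both states prefer action 0 in this toy MDP
```

Because the batch happens to cover every state-action pair here, fitted Q-iteration converges to the optimal values; the hard part of batch RL, which the paper addresses, is what to do when the batch covers the state-action space only partially.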
2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried
This paves the way for new research directions, e.g. investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.
Tasks: Model-based Reinforcement Learning · Reinforcement Learning · +2
1 code implementation • 9 Oct 2019 • Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim
These proxy objectives allow stable and low variance policy learning, but require small policy updates to ensure that the proxy objective remains an accurate approximation of the target policy value.
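The proxy objectives referred to here are typically importance-sampled surrogates of the standard trust-region form (this is the conventional formulation, stated here for context rather than the paper's exact objective):

```latex
L_{\pi_{\mathrm{old}}}(\pi)
  \;=\;
  \mathbb{E}_{s,a \sim \pi_{\mathrm{old}}}
  \!\left[
    \frac{\pi(a \mid s)}{\pi_{\mathrm{old}}(a \mid s)}\,
    A^{\pi_{\mathrm{old}}}(s, a)
  \right]
```

Since the expectation is taken under the old policy's state distribution, \(L_{\pi_{\mathrm{old}}}(\pi)\) approximates the true value of \(\pi\) well only when \(\pi\) stays close to \(\pi_{\mathrm{old}}\), which is why small policy updates are required.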
no code implementations • NeurIPS 2018 • Rasul Tutunov, Dongho Kim, Haitham Bou Ammar
Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large.
no code implementations • 13 Aug 2015 • Pei-Hao Su, David Vandyke, Milica Gasic, Dongho Kim, Nikola Mrksic, Tsung-Hsien Wen, Steve Young
The models are trained on dialogues generated by a simulated user, and the best model is then used to train a policy online, which is shown to perform at least as well as a baseline system that uses prior knowledge of the user's task.
no code implementations • WS 2015 • Tsung-Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, Pei-Hao Su, David Vandyke, Steve Young
The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on.
no code implementations • NeurIPS 2012 • Dongho Kim, Kee-Eung Kim, Pascal Poupart
In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected long-term total reward.
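The constrained exploration problem described above can be written as a constrained optimization of the expected discounted return subject to a budget on expected total cost (a standard constrained-MDP formulation, stated here for illustration; the symbols \(r_t\), \(c_t\), and budget \(C\) are notational assumptions):

```latex
\max_{\pi} \;\;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]
\quad \text{subject to} \quad
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} c_t \right] \;\le\; C
```

Under this formulation, exploration actions that would be informative but too expensive are ruled out by the cost constraint while the agent learns to maximize long-term reward.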