no code implementations • 6 Apr 2024 • Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On
In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching.
no code implementations • 10 Oct 2023 • DaeJin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) with modular approaches.
no code implementations • 23 May 2023 • Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, DaeJin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach.
1 code implementation • 11 Oct 2022 • DaeJin Jo, Sungwoong Kim, Daniel Wontae Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee
To resolve these issues, we propose a learnable hash-based episodic count, which we name LECO, that serves as an efficient task-specific intrinsic reward in hard-exploration problems.
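The general idea behind an episodic count bonus can be sketched as follows. This is a minimal illustration using a fixed random-projection (SimHash-style) hash, not the learnable hash that LECO itself introduces; the class name and the `1/sqrt(count)` bonus form are illustrative assumptions.

```python
import numpy as np

class EpisodicHashCount:
    """Sketch of hash-based episodic counting: map each observation to a
    short binary code, count code occurrences within the current episode,
    and emit an intrinsic reward that decays with that count.
    NOTE: LECO learns its hash; here the projection is fixed random."""

    def __init__(self, obs_dim, code_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((code_bits, obs_dim))
        self.counts = {}

    def reset(self):
        # Episodic: counts are cleared at the start of each episode.
        self.counts = {}

    def intrinsic_reward(self, obs):
        # Binary hash code via sign of a random projection.
        code = tuple((self.proj @ obs > 0).astype(int))
        n = self.counts.get(code, 0) + 1
        self.counts[code] = n
        # Novelty bonus shrinks as the same code is revisited.
        return 1.0 / np.sqrt(n)
```

A first visit to a state yields a bonus of 1.0; repeated visits to states hashing to the same code yield progressively smaller bonuses, encouraging exploration of unvisited regions.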
no code implementations • 24 May 2021 • Daniel Wontae Nam, Younghoon Kim, Chan Y. Park
In this paper, we devise a distributional framework for actor-critic as a solution to distributional instability, action-type restriction, and conflation between samples and statistics.
no code implementations • 1 Jan 2021 • Daniel Wontae Nam, Younghoon Kim, Chan Youn Park
Recent distributional reinforcement learning methods, despite their successes, still contain fundamental problems that can lead to inaccurate representations of value distributions, such as distributional instability, action type restriction, and biased approximation.
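For context on what a distributional method estimates, here is a minimal sketch of the quantile-regression objective common to many distributional RL approaches, where `theta[i]` estimates the `tau_i`-quantile of the return distribution. This illustrates the general family only, not the specific algorithm of the paper above; the function name is an assumption.

```python
import numpy as np

def quantile_regression_loss(theta, samples, taus):
    """Quantile regression loss (non-Huber variant, for clarity).
    theta:   (K,) quantile estimates of the return distribution
    samples: (N,) sampled target returns
    taus:    (K,) quantile levels in (0, 1)"""
    # Pairwise errors: each target sample minus each quantile estimate.
    u = samples[None, :] - theta[:, None]          # shape (K, N)
    # Asymmetric weights push theta[i] toward the tau_i-quantile:
    # under-estimates are weighted by tau, over-estimates by (1 - tau).
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return np.mean(weight * np.abs(u))
```

For `tau = 0.5` this reduces to (half) the absolute error, so the minimizing scalar is the sample median; other quantile levels recover the rest of the distribution.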