no code implementations • 24 Oct 2024 • Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee
The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens.
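To make the role of a static draft length concrete, here is a minimal Python sketch of one drafting round; the toy `draft_dist`/`target_dist` stand-ins and the simplified acceptance test (residual resampling omitted) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16
DRAFT_LEN = 4  # the static draft length: tokens drafted per round

def toy_model(seed):
    """Stand-in for a model's next-token distribution (hypothetical)."""
    def dist(context):
        g = np.random.default_rng((hash(tuple(context)) + seed) % 2**32)
        logits = g.standard_normal(VOCAB)
        p = np.exp(logits - logits.max())
        return p / p.sum()
    return dist

draft_dist, target_dist = toy_model(1), toy_model(2)  # small drafter, big target

def drafting_round(context):
    """Draft DRAFT_LEN tokens, then verify them against the target model."""
    ctx, drafted = list(context), []
    for _ in range(DRAFT_LEN):
        q = draft_dist(ctx)
        tok = int(rng.choice(VOCAB, p=q))
        drafted.append((tok, q[tok]))
        ctx.append(tok)

    ctx, accepted = list(context), []
    for tok, q_tok in drafted:
        p = target_dist(ctx)
        # accept a draft token with prob min(1, p_target / p_draft); the
        # residual-resampling step of full speculative sampling is omitted
        if rng.random() < min(1.0, p[tok] / q_tok):
            accepted.append(tok)
            ctx.append(tok)
        else:
            break  # the first rejection ends the round
    return accepted

print(drafting_round([0, 1, 2]))
```

With a static `DRAFT_LEN`, tokens drafted after the first rejection are wasted compute, which is why the choice is usually tuned to acceptance-rate statistics.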
no code implementations • 13 Apr 2024 • Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott
We show that a language-only model can serve as a good draft model for speculative decoding with LLaVA 7B, eliminating the need for image tokens and their associated processing components in the draft model.
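A minimal sketch of the interface difference this enables, using hypothetical `draft_inputs`/`target_inputs` helpers; real multimodal models prepend projected image embeddings rather than raw token IDs, so this only shows the shape of the idea:

```python
def draft_inputs(image_tokens, text_tokens):
    """Language-only draft: image tokens are dropped entirely, so no
    vision encoder or projection layers are needed on the draft side."""
    return list(text_tokens)

def target_inputs(image_tokens, text_tokens):
    """Multimodal target: verifies drafted tokens with the full context."""
    return list(image_tokens) + list(text_tokens)

print(draft_inputs([101, 102], [5, 6, 7]))   # [5, 6, 7]
print(target_inputs([101, 102], [5, 6, 7]))  # [101, 102, 5, 6, 7]
```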
no code implementations • 29 Feb 2024 • Raghavv Goel, Mukul Gagrani, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott
In this paper, we propose a simple draft model training framework for direct alignment to chat-capable target models.
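As a rough illustration of distribution-level alignment, the toy loop below distills a fixed "target" next-token table into a learnable draft table by descending the KL gradient; the tabular models and the KL objective are simplifying assumptions, not the paper's exact framework:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8  # toy vocabulary size

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fixed "target" next-token distributions over 4 toy contexts, and a
# learnable draft table; real models are transformers, and KL is one
# common alignment choice among several.
target_table = rng.dirichlet(np.ones(V), size=4)
draft_logits = np.zeros((4, V))

for _ in range(200):
    for c in range(4):
        p, q = target_table[c], softmax(draft_logits[c])
        draft_logits[c] -= 0.5 * (q - p)  # gradient of KL(p || q) w.r.t. logits

# The draft now closely tracks the target on these contexts.
print(float(np.abs(softmax(draft_logits[0]) - target_table[0]).max()))
```

The closer the draft's next-token distribution tracks the target's, the higher the acceptance rate during speculative decoding, which is the point of aligning the drafter in the first place.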
no code implementations • 21 Feb 2024 • Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, Christopher Lott
We empirically evaluate RSD with Llama 2 and OPT models, showing that RSD outperforms the baseline methods consistently for a fixed draft sequence length and, in most cases, for a fixed computational budget at the target LLM.
1 code implementation • 24 Oct 2022 • Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces.
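The core difficulty is that a deterministic target policy has zero density overlap with a stochastic behavior policy, so standard importance sampling is undefined; a kernel relaxes the point-mass indicator. The sketch below uses a fixed Gaussian kernel on synthetic data (the paper's contribution, learning the kernel metric, is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 5000, 0.2  # sample size and kernel bandwidth (assumed values)

# Logged bandit data: behavior policy N(0, 1); reward peaks at a = x.
x = rng.uniform(-1, 1, n)
a = rng.normal(0.0, 1.0, n)
r = -(a - x) ** 2 + rng.normal(0, 0.1, n)

pi = lambda s: s  # deterministic target policy to evaluate
behavior_pdf = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

# Kernel relaxation: the indicator 1{a == pi(x)} is almost surely zero for
# continuous actions, so a smoothing kernel K_h makes the importance-style
# estimator well-defined (at the cost of O(h^2) bias).
K = np.exp(-((a - pi(x)) / h) ** 2 / 2) / (np.sqrt(2 * np.pi) * h)
print("estimated value:", np.mean(K / behavior_pdf(a) * r))  # true value is 0.0
```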
no code implementations • 13 Jul 2022 • Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan
Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.
no code implementations • ICLR 2022 • Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, Kee-Eung Kim
We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstrations without further interaction with the environment.
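For context, the simplest instance of offline IL is behavioral cloning, sketched below on toy data; this baseline is for illustration only and is not the method proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert demonstrations: states in R^2, binary actions; the expert
# acts on the sign of s1 + s2.
S = rng.standard_normal((500, 2))
A = (S[:, 0] + S[:, 1] > 0).astype(float)

# Behavioral cloning = supervised learning on (state, action) pairs alone;
# no environment interaction is needed, which is what makes it offline.
w = np.zeros(2)
for _ in range(300):
    p = 1 / (1 + np.exp(-S @ w))           # logistic policy
    w += 0.1 * S.T @ (A - p) / len(S)      # maximum-likelihood gradient step

test = np.array([[1.0, 0.5], [-2.0, 0.3]])
print((test @ w > 0).astype(int))  # imitates the expert: [1 0]
```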
1 code implementation • 21 Jun 2021 • Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim
We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions.
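A minimal sketch of the setting, not of this paper's algorithm: Q-learning run as repeated sweeps over a fixed logged dataset, with no further environment calls. The toy chain MDP and uniform data coverage are assumptions made so the naive update stays well-behaved:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9

def step(s, a):  # toy chain MDP used only to generate the logged dataset
    s2 = min(s + 1, nS - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == nS - 1)

# Fixed dataset of (s, a, s', r) transitions; uniform coverage is assumed
# here (real offline RL must handle distribution shift, which this ignores).
data = [(s, a, *step(s, a))
        for s, a in zip(rng.integers(nS, size=2000), rng.integers(nA, size=2000))]

Q = np.zeros((nS, nA))
for _ in range(100):               # repeated sweeps over the same batch,
    for s, a, s2, r in data:       # with no new environment interaction
        Q[s, a] += 0.1 * (r + gamma * Q[s2].max() - Q[s, a])

print(Q.argmax(axis=1))  # greedy policy: move right in every state -> [1 1 1 1]
```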
no code implementations • ICLR 2021 • Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau
Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions.
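A drastically simplified feature-matching sketch of the IRL idea, in which reward weights are adjusted until a soft-greedy learner's state visitation matches the expert's; the tabular setup is illustrative and not this paper's formulation:

```python
import numpy as np

nS = 3
phi = np.eye(nS)                           # one-hot state features
expert_visits = np.array([0.1, 0.2, 0.7])  # expert spends most time in state 2

# Reward is linear in features, r(s) = w . phi(s); the weights are updated
# along the (max-ent) likelihood gradient until the learner matches the
# expert, at which point the reward "explains" the expert's decisions.
w = np.zeros(nS)
for _ in range(100):
    reward = phi @ w
    learner_visits = np.exp(reward) / np.exp(reward).sum()
    w += 0.5 * phi.T @ (expert_visits - learner_visits)

print(np.round(np.exp(phi @ w) / np.exp(phi @ w).sum(), 2))  # ~ expert visits
```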
3 code implementations • NeurIPS 2020 • Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai
Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart the expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator.
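The alternation can be sketched in a few lines on 1-D samples, with a logistic-regression discriminator and a reparameterized generator update standing in for the policy-gradient step that real AIL methods use (illustrative assumptions throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

expert = lambda n: rng.normal(2.0, 1.0, n)  # expert "trajectories": 1-D samples
mu = -2.0                                   # generator: N(mu, 1), mu learnable
a, b = 0.0, 0.0                             # discriminator: sigmoid(a*x + b)

for _ in range(500):
    xe, xg = expert(64), rng.normal(mu, 1.0, 64)
    pe, pg = sigmoid(a * xe + b), sigmoid(a * xg + b)
    # discriminator step: logistic regression, expert = 1 vs generated = 0
    a += 0.05 * (np.mean(xe * (1 - pe)) - np.mean(xg * pg))
    b += 0.05 * (np.mean(1 - pe) - np.mean(pg))
    # generator step: reparameterized gradient of E[log D] w.r.t. mu
    xg = rng.normal(mu, 1.0, 64)
    pg = sigmoid(a * xg + b)
    mu += 0.05 * np.mean(a * (1 - pg))

print(round(mu, 2))  # mu has moved toward the expert mean (about 2)
```

At equilibrium the discriminator can no longer separate the two sample sets, which is exactly the condition under which the generator's distribution matches the expert's.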
no code implementations • 24 Feb 2020 • Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior.
no code implementations • NeurIPS 2018 • Wonseok Jeon, Seokin Seo, Kee-Eung Kim
Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks.