Search Results for author: Wonseok Jeon

Found 11 papers, 3 papers with code

On Speculative Decoding for Multimodal Large Language Models

no code implementations13 Apr 2024 Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott

We show that a language-only model can serve as a good draft model for speculative decoding with LLaVA 7B, bypassing the need for image tokens and their associated processing components in the draft model.

Image Captioning Language Modelling +1
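
A minimal sketch of the draft-then-verify loop behind this result, assuming Hugging-Face-style models whose forward pass returns `.logits`; the `image_features` keyword and the greedy verification rule are illustrative assumptions, not the paper's exact interface:

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, input_ids, image_features, gamma=4):
    """Draft gamma tokens with a text-only model, then verify with the multimodal target."""
    # 1) Drafting: the language-only draft model never sees image tokens.
    draft_ids = input_ids.clone()
    for _ in range(gamma):
        logits = draft(draft_ids).logits[:, -1, :]
        next_id = logits.argmax(-1, keepdim=True)           # greedy draft for simplicity
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) Verification: one forward pass of the multimodal target over all drafted tokens.
    out = target(draft_ids, image_features=image_features)  # hypothetical multimodal API
    verified = out.logits[:, input_ids.shape[1] - 1:-1, :].argmax(-1)

    # 3) Accept the longest prefix on which draft and target agree.
    proposed = draft_ids[:, input_ids.shape[1]:]
    agree = (verified == proposed).long().cumprod(dim=-1)
    n_accept = int(agree.sum().item())
    return torch.cat([input_ids, proposed[:, :n_accept]], dim=-1)
```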

Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

no code implementations21 Feb 2024 Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, Christopher Lott

We empirically evaluate RSD with Llama 2 and OPT models, showing that RSD outperforms the baseline methods consistently for a fixed draft-sequence length and in most cases for a fixed computational budget at the target LLM.

Language Modelling
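
The "sampling without replacement" ingredient can be illustrated with the Gumbel top-k trick, which draws k distinct candidate tokens from the draft distribution and could seed the branches of a draft tree; this sketches the mechanism only, not RSD's tree construction or acceptance rule:

```python
import torch

def sample_without_replacement(logits, k):
    """Draw k distinct token ids from softmax(logits) via Gumbel top-k."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits).clamp_min(1e-9)))
    return torch.topk(logits + gumbel, k).indices  # k distinct candidates

# Example: 3 distinct draft candidates from a 10-token vocabulary.
candidates = sample_without_replacement(torch.randn(10), k=3)
```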

Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

1 code implementation24 Oct 2022 Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces.

Metric Learning Multi-Armed Bandits +1
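
For context, a basic kernel-based OPE estimator for a deterministic policy with continuous actions looks like the following; the paper's contribution is learning the (local) kernel metric, whereas this sketch assumes a fixed Gaussian kernel with bandwidth h:

```python
import numpy as np

def kernel_ope(contexts, actions, rewards, mu_density, policy, h=0.1):
    """Kernel-smoothed value estimate of deterministic `policy` from logged bandit data."""
    # Distance between each logged action and the action the target policy would take.
    u = (actions - np.array([policy(x) for x in contexts])) / h
    k = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)  # Gaussian kernel K_h
    return np.mean(k * rewards / mu_density)                 # mu_density[i] = mu(a_i | x_i)
```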

Neural Topological Ordering for Computation Graphs

no code implementations13 Jul 2022 Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.

BIG-bench Machine Learning +1

DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations

no code implementations ICLR 2022 Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, Kee-Eung Kim

We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstrations without further interaction with the environment.

Imitation Learning
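
The simplest instance of the offline IL setting described above is behavioral cloning, sketched below as a point of reference; this is the naive baseline, not DemoDICE's DICE-based objective for exploiting supplementary imperfect demonstrations:

```python
import torch
import torch.nn as nn

def behavioral_cloning(expert_states, expert_actions, n_actions, epochs=100):
    """Fit a discrete policy to expert (state, action) pairs; no environment interaction.

    expert_actions: LongTensor of action ids, one per state.
    """
    policy = nn.Sequential(nn.Linear(expert_states.shape[1], 64),
                           nn.ReLU(), nn.Linear(64, n_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(policy(expert_states), expert_actions)
        opt.zero_grad(); loss.backward(); opt.step()
    return policy  # act with policy(state).argmax(-1)
```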

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

1 code implementation21 Jun 2021 Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions.

Offline RL Reinforcement Learning (RL)
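
OptiDICE works with stationary distribution correction ratios w(s, a) = d^pi(s, a) / d^D(s, a). A toy sketch of the policy-extraction step, assuming the ratios `w` have already been estimated: weighted behavioral cloning that upweights transitions the optimized distribution visits often:

```python
import torch
import torch.nn as nn

def extract_policy(states, actions, w, n_actions, steps=200):
    """Weighted behavioral cloning given correction ratios w(s, a) (assumed estimated)."""
    policy = nn.Sequential(nn.Linear(states.shape[1], 64),
                           nn.ReLU(), nn.Linear(64, n_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(steps):
        logp = policy(states).log_softmax(-1).gather(1, actions[:, None]).squeeze(1)
        loss = -(w * logp).mean()  # upweight (s, a) pairs that d^pi visits often
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```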

Regularized Inverse Reinforcement Learning

no code implementations ICLR 2021 Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions.

reinforcement-learning Reinforcement Learning (RL)

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

3 code implementations NeurIPS 2020 Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai

Adversarial Imitation Learning alternates between learning a discriminator -- which tells the expert's demonstrations apart from generated ones -- and a generator policy that produces trajectories to fool this discriminator.

Imitation Learning reinforcement-learning +1
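
A sketch of the alternation described above, i.e. the standard adversarial-IL loop that ASAF is designed to simplify away (the paper removes the inner policy-optimization step; names here are illustrative):

```python
import torch
import torch.nn as nn

def discriminator_step(disc, opt, expert_sa, generated_sa):
    """One update: push D(expert) toward 1 and D(generated) toward 0."""
    bce = nn.functional.binary_cross_entropy_with_logits
    logits_e, logits_g = disc(expert_sa), disc(generated_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_g, torch.zeros_like(logits_g))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# The generator's policy is then updated in turn (e.g. with RL on a reward
# such as -log(1 - D)) to produce trajectories that fool this discriminator.
```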

Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic

no code implementations24 Feb 2020 Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau

Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems, where we seek to recover both the policies for our agents and the reward functions that promote expert-like behavior.

Open-Ended Question Answering reinforcement-learning +1

A Bayesian Approach to Generative Adversarial Imitation Learning

no code implementations NeurIPS 2018 Wonseok Jeon, Seokin Seo, Kee-Eung Kim

Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks.

Continuous Control Imitation Learning
