Search Results for author: Wonseok Jeon

Found 12 papers, 3 papers with code

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

no code implementations 24 Oct 2024 Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee

The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens.
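
The title indicates that AdaEDL replaces this static draft length with an entropy-based early-stopping rule for the drafting round. A minimal PyTorch sketch of such a stopping loop, assuming a hypothetical draft_model that returns next-token logits for a batch of size 1; the normalized-entropy confidence proxy and the threshold tau are illustrative assumptions rather than the paper's exact lower bound:

    import math
    import torch

    def draft_with_early_stopping(draft_model, prefix_ids, max_draft_len=8, tau=0.5):
        # Draft tokens greedily, but stop the round early when a normalized-entropy
        # confidence proxy for the next draft token falls below the threshold tau.
        ids = prefix_ids                                       # shape (1, seq_len)
        draft_ids = []
        for _ in range(max_draft_len):
            logits = draft_model(ids)[:, -1, :]                # (1, vocab) next-token logits
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
            max_entropy = math.log(probs.size(-1))
            confidence = 1.0 - entropy / max_entropy           # high entropy -> low expected acceptance
            if confidence.item() < tau:
                break                                          # end the drafting round early
            next_id = probs.argmax(dim=-1, keepdim=True)
            draft_ids.append(next_id)
            ids = torch.cat([ids, next_id], dim=-1)
        return draft_ids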

On Speculative Decoding for Multimodal Large Language Models

no code implementations 13 Apr 2024 Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott

We show that a language-only model can serve as a good draft model for speculative decoding with LLaVA 7B, bypassing the need for image tokens and their associated processing components from the draft model.

Image Captioning, Language Modelling, +1

Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

no code implementations 21 Feb 2024 Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, Christopher Lott

We empirically evaluate RSD with Llama 2 and OPT models, showing that RSD outperforms the baseline methods, consistently for a fixed draft sequence length and in most cases for a fixed computational budget at the target LLM.

Language Modelling
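
The title attributes the speed-up to sampling draft tokens without replacement. As one hedged illustration of that ingredient (not the full recursive procedure), k distinct candidate tokens can be drawn from a draft distribution with the Gumbel-top-k trick:

    import torch

    def sample_tokens_without_replacement(logits, k):
        # Gumbel-top-k: perturb the logits with Gumbel noise and take the top-k
        # indices, which is equivalent to sampling k distinct tokens without
        # replacement from the softmax distribution over the logits.
        u = torch.rand_like(logits).clamp(1e-9, 1.0 - 1e-9)
        gumbel = -torch.log(-torch.log(u))
        return (logits + gumbel).topk(k, dim=-1).indices

Drawing distinct candidates avoids spending draft positions on duplicate tokens, which is the usual motivation for sampling without replacement in this setting.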

Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

1 code implementation 24 Oct 2022 Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, Kee-Eung Kim

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces.

Metric Learning, Multi-Armed Bandits, +1
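
For context, kernel-based OPE of a deterministic policy with continuous actions typically replaces the indicator 1{a_i = pi(x_i)}, which is zero almost surely, with a kernel centered at the target action. A minimal NumPy sketch of that generic estimator with illustrative names; the paper's contribution of learning the local metric (the shape and bandwidth of the kernel) is not shown:

    import numpy as np

    def kernel_ope_value(contexts, actions, rewards, behavior_density,
                         target_policy, bandwidth=0.1):
        # Kernel-relaxed importance-sampling estimate of a deterministic target
        # policy's value from logged bandit data (x_i, a_i, r_i), 1-D actions.
        pi_actions = np.array([target_policy(x) for x in contexts])
        diff = actions - pi_actions
        kernel = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
        weights = kernel / behavior_density       # behavior_density[i] = p_b(a_i | x_i)
        return float(np.mean(weights * rewards))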

Neural Topological Ordering for Computation Graphs

no code implementations 13 Jul 2022 Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.

BIG-bench Machine Learning, +3

DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations

no code implementations ICLR 2022 Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, Kee-Eung Kim

We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstration without further interaction with the environment.

Imitation Learning

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

1 code implementation 21 Jun 2021 Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions.

Offline RL, Reinforcement Learning (RL)
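
The title points to stationary distribution correction estimation, i.e. reweighting the logged data by ratios w(s, a) = d_pi(s, a) / d_D(s, a). A small sketch of how such ratios would be used once estimated; estimating them is the core of DICE-style methods and is not shown here, and the names are illustrative:

    import numpy as np

    def corrected_value_estimate(rewards, correction_ratios, gamma=0.99):
        # Value of the target policy from logged transitions: reweight each logged
        # reward by w(s, a) = d_pi(s, a) / d_D(s, a), average over the dataset,
        # and rescale by 1 / (1 - gamma) for the discounted return.
        return float(np.mean(correction_ratios * rewards) / (1.0 - gamma))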

Regularized Inverse Reinforcement Learning

no code implementations ICLR 2021 Wonseok Jeon, Chen-Yang Su, Paul Barde, Thang Doan, Derek Nowrouzezahrai, Joelle Pineau

Inverse Reinforcement Learning (IRL) aims to facilitate a learner's ability to imitate expert behavior by acquiring reward functions that explain the expert's decisions.

Reinforcement Learning, +1

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

3 code implementations NeurIPS 2020 Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Christopher Pal, Derek Nowrouzezahrai

Adversarial Imitation Learning alternates between learning a discriminator -- which tells apart the expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator.

Imitation Learning, Reinforcement Learning, +2
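
For reference, the alternating scheme described above can be sketched as one generic adversarial-imitation iteration in PyTorch; discriminator, policy, and env_rollout are illustrative stand-ins rather than the paper's interfaces, and ASAF's stated contribution is precisely to do imitation without this policy-optimization step:

    import torch

    def adversarial_imitation_step(discriminator, policy, d_optim, p_optim,
                                   expert_batch, env_rollout):
        # (1) Train the discriminator to separate expert transitions from generated ones.
        gen_batch = env_rollout(policy)                  # on-policy (state, action) data
        d_expert = discriminator(expert_batch)           # probabilities in (0, 1)
        d_gen = discriminator(gen_batch)
        d_loss = -(torch.log(d_expert + 1e-8).mean()
                   + torch.log(1.0 - d_gen + 1e-8).mean())
        d_optim.zero_grad()
        d_loss.backward()
        d_optim.step()

        # (2) Update the generator's policy against a discriminator-derived reward
        #     (the step that ASAF is designed to remove).
        reward = torch.log(discriminator(gen_batch) + 1e-8).detach()
        p_loss = -(reward * policy.log_prob(gen_batch)).mean()
        p_optim.zero_grad()
        p_loss.backward()
        p_optim.step()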

Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic

no code implementations 24 Feb 2020 Wonseok Jeon, Paul Barde, Derek Nowrouzezahrai, Joelle Pineau

Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems where we seek to recover both policies for our agents and reward functions that promote expert-like behavior.

Open-Ended Question Answering, Reinforcement Learning, +2

A Bayesian Approach to Generative Adversarial Imitation Learning

no code implementations NeurIPS 2018 Wonseok Jeon, Seokin Seo, Kee-Eung Kim

Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks.

Continuous Control, +1
