Search Results for author: Wonyoung Kim

Found 9 papers, 2 papers with code

A Doubly Robust Approach to Sparse Reinforcement Learning

no code implementations • 23 Oct 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We propose a new regret minimization algorithm for the episodic sparse linear Markov decision process (SMDP), where the state-transition distribution is a linear function of observed features.

reinforcement-learning

Pareto Front Identification with Regret Minimization

no code implementations • 31 May 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity.

Active Learning

Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

no code implementations • 31 Jan 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible.

Management • Multi-Armed Bandits

Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

no code implementations • 15 Sep 2022 • Wonyoung Kim, Kyungbok Lee, Myunghee Cho Paik

We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards.

Multi-Armed Bandits • Thompson Sampling

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

no code implementations • 11 Jun 2022 • Wonyoung Kim, Myunghee Cho Paik, Min-hwan Oh

We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.

Multi-Armed Bandits

Doubly Robust Thompson Sampling with Linear Payoffs

no code implementations • NeurIPS 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling

Doubly robust Thompson sampling for linear payoffs

no code implementations • 1 Feb 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling

Principled learning method for Wasserstein distributionally robust optimization with local perturbations

1 code implementation • ICML 2020 • Yongchan Kwon, Wonyoung Kim, Joong-Ho Won, Myunghee Cho Paik

We show that our approximation and risk consistency results naturally extend to the cases when data are locally perturbed.

Image Classification

Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric

1 code implementation • 28 Jan 2019 • Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).

Hyperparameter Optimization

