Search Results for author: Wonyoung Kim

Found 9 papers, 2 papers with code

A Doubly Robust Approach to Sparse Reinforcement Learning

no code implementations • 23 Oct 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We propose a new regret minimization algorithm for the episodic sparse linear Markov decision process (SMDP), where the state-transition distribution is a linear function of observed features.

reinforcement-learning

Pareto Front Identification with Regret Minimization

no code implementations • 31 May 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity.

Active Learning

Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

no code implementations • 31 Jan 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible.

Management, Multi-Armed Bandits

Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

no code implementations • 15 Sep 2022 • Wonyoung Kim, Kyungbok Lee, Myunghee Cho Paik

We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards.

Multi-Armed Bandits, Thompson Sampling

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

no code implementations • 11 Jun 2022 • Wonyoung Kim, Myunghee Cho Paik, Min-hwan Oh

We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.

Multi-Armed Bandits

Doubly Robust Thompson Sampling with Linear Payoffs

no code implementations • NeurIPS 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling

Doubly robust Thompson sampling for linear payoffs

no code implementations • 1 Feb 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling

Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric

1 code implementation • 28 Jan 2019 • Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).

Hyperparameter Optimization
