no code implementations • 23 Oct 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
We propose a new regret minimization algorithm for episodic sparse linear Markov decision process (SMDP) where the state-transition distribution is a linear function of observed features.
no code implementations • 31 May 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity.
no code implementations • 31 Jan 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible.
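The packing constraint in LMMP can be illustrated with a minimal greedy sketch: accept items in value order as long as the running consumption vector stays below the budget vector. This is only a toy illustration of the feasibility constraint, not the bandit algorithm proposed in the paper, and all names here are hypothetical.

```python
import numpy as np

def pack_greedily(values, consumptions, budget):
    """Toy greedy packing: take items in decreasing value order while the
    total consumption vector remains componentwise within the budget."""
    total = np.zeros_like(budget, dtype=float)
    picked, total_value = [], 0.0
    for i in sorted(range(len(values)), key=lambda i: -values[i]):
        if np.all(total + consumptions[i] <= budget):
            total += consumptions[i]
            picked.append(i)
            total_value += values[i]
    return picked, total_value

# toy instance: 3 items, a 2-dimensional resource budget
picked, total_value = pack_greedily(
    [3.0, 2.0, 1.0],
    np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]),
    np.array([2.0, 1.0]),
)
```

In the contextual bandit version of the problem, the item values and consumptions are not known upfront but must be estimated from linearly structured feedback, which is what the paper addresses.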
no code implementations • 15 Sep 2022 • Wonyoung Kim, Kyungbok Lee, Myunghee Cho Paik
We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards.
no code implementations • 11 Jun 2022 • Wonyoung Kim, Myunghee Cho Paik, Min-hwan Oh
We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.
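For readers unfamiliar with the setting, a generic LinUCB-style round can be sketched as follows: choose the arm maximizing an optimistic score $\hat\theta^\top x + \alpha\sqrt{x^\top A^{-1} x}$, then update the design matrix with the observed reward. This is a standard illustration of linear contextual bandits, not the algorithm proposed in the paper; `alpha` and the update scheme are assumptions.

```python
import numpy as np

def linucb_choose(contexts, A_inv, b, alpha=1.0):
    """Pick the arm maximizing theta^T x + alpha * sqrt(x^T A^{-1} x)."""
    theta = A_inv @ b
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(scores))

d = 2
A_inv = np.eye(d)   # inverse regularized Gram matrix (identity prior)
b = np.zeros(d)     # running sum of reward-weighted contexts
contexts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

arm = linucb_choose(contexts, A_inv, b)

# rank-one update with the observed reward r (Sherman-Morrison)
x, r = contexts[arm], 1.0
Ax = A_inv @ x
A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
b += r * x
```

The regret bounds quoted in these abstracts come from controlling how fast the confidence width $\sqrt{x^\top A^{-1} x}$ shrinks as such updates accumulate.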
no code implementations • NeurIPS 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.
no code implementations • 1 Feb 2021 • Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.
1 code implementation • ICML 2020 • Yongchan Kwon, Wonyoung Kim, Joong-Ho Won, Myunghee Cho Paik
We show that our approximation and risk consistency results naturally extend to the cases when data are locally perturbed.
1 code implementation • 28 Jan 2019 • Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik
We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).
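The core idea of PU learning is that the negative-class risk can be expressed using only positive and unlabeled data. A minimal sketch of the standard unbiased PU risk estimator (in the style of du Plessis et al.; not necessarily the estimator used in this paper) follows, where `prior` is the class prior $\pi = P(y=+1)$:

```python
import numpy as np

def pu_risk(scores_p, scores_u, prior, loss=lambda z: np.maximum(0.0, 1.0 - z)):
    """Unbiased PU risk estimate:
    pi * E_p[l(g, +1)] - pi * E_p[l(g, -1)] + E_u[l(g, -1)],
    using hinge loss l(z) = max(0, 1 - z) by default."""
    r_p_pos = loss(scores_p).mean()    # positives scored as positive
    r_p_neg = loss(-scores_p).mean()   # positives scored as negative
    r_u_neg = loss(-scores_u).mean()   # unlabeled scored as negative
    return prior * r_p_pos - prior * r_p_neg + r_u_neg

# toy check: one confident positive, one confidently negative unlabeled point
risk = pu_risk(np.array([1.0]), np.array([-1.0]), prior=0.5)
```

Note that this estimator can go negative on finite samples (as in the toy check above), which is why non-negative corrections are commonly applied in practice.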