Search Results for author: Myunghee Cho Paik

Found 14 papers, 5 papers with code

Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy

no code implementations · 2 Apr 2024 · Kyungbok Lee, Myunghee Cho Paik

We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown.

Multi-Armed Bandits · Off-policy evaluation
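As a rough illustration of the doubly-robust idea with an estimated logging policy, here is a minimal single-step (bandit) sketch; the paper's DRUnknown estimator handles full Markov decision processes, which this toy example does not attempt, and all numbers and names below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified single-step sketch of doubly-robust OPE with an *estimated*
# logging policy; the paper's DRUnknown targets full MDPs.
K, n = 3, 5000
true_logging = np.array([0.5, 0.3, 0.2])   # unknown in practice
true_means = np.array([0.2, 0.5, 0.8])     # expected reward per arm

actions = rng.choice(K, size=n, p=true_logging)
rewards = rng.binomial(1, true_means[actions]).astype(float)

# Estimate the logging policy from the logged actions.
pi_hat = np.bincount(actions, minlength=K) / n

# Fit a simple reward model (per-arm empirical mean).
q_hat = np.array([rewards[actions == a].mean() for a in range(K)])

# Target policy to evaluate: always pull arm 2 (true value 0.8).
pi_target = np.array([0.0, 0.0, 1.0])

# Doubly-robust estimate: model term plus importance-weighted correction.
weights = pi_target[actions] / pi_hat[actions]
v_dr = pi_target @ q_hat + np.mean(weights * (rewards - q_hat[actions]))
# v_dr is close to the true target-policy value of 0.8
```

The estimate stays consistent if either the estimated logging policy or the reward model is accurate, which is the "doubly robust" property the abstract refers to.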

Wasserstein Geodesic Generator for Conditional Distributions

1 code implementation · 20 Aug 2023 · Young-geun Kim, Kyungbok Lee, Youngwon Choi, Joong-Ho Won, Myunghee Cho Paik

The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels.
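The geodesic idea can be checked in closed form in the simplest case, one-dimensional Gaussians, where the 2-Wasserstein geodesic just interpolates the mean and standard deviation linearly; this is background only and does not cover the paper's general conditional generator:

```python
# 2-Wasserstein geodesic between two 1-D Gaussians: both the mean and
# the standard deviation interpolate linearly along the path.
def gaussian_w2_geodesic(m0, s0, m1, s1, t):
    """Mean and std of the geodesic point at time t in [0, 1]."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1

# Midpoint between N(0, 1) and N(4, 9) is N(2, 4).
mid_mean, mid_std = gaussian_w2_geodesic(0.0, 1.0, 4.0, 3.0, 0.5)
```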

Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

no code implementations · 15 Sep 2022 · Wonyoung Kim, Kyungbok Lee, Myunghee Cho Paik

We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards.

Multi-Armed Bandits · Thompson Sampling
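One hedged reading of the two constants in the bound: $\phi$ as the minimum eigenvalue of the (empirical) context covariance and $\kappa$ as a lower bound on the reward variance. The context data and the Bernoulli success-probability range below are invented for illustration:

```python
import numpy as np

# Illustrative computation of the two bound constants (assumed reading of
# the abstract; data and probability range are made up).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))           # hypothetical context vectors
cov = X.T @ X / len(X)
phi = np.linalg.eigvalsh(cov).min()      # min eigenvalue of context covariance

p_min, p_max = 0.1, 0.9                  # assumed range of success probs
# Bernoulli variance p(1-p) is smallest at the extremes of the range.
kappa = min(p_min * (1 - p_min), p_max * (1 - p_max))   # 0.09
```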

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

no code implementations · 11 Jun 2022 · Wonyoung Kim, Myunghee Cho Paik, Min-hwan Oh

We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon.

Multi-Armed Bandits

Doubly Robust Thompson Sampling with Linear Payoffs

no code implementations · NeurIPS 2021 · Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling
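The missing-reward framing above can be sketched with a doubly-robust pseudo-reward: impute a value for every arm each round, using the observed reward only for the chosen arm, in a way that is unbiased for each arm's mean reward. This is a simplified illustration with made-up numbers, not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
probs = np.full(K, 1.0 / K)             # arm-sampling probabilities
means_hat = np.array([0.3, 0.5, 0.7])   # current reward-model predictions

def dr_pseudo_rewards(chosen, reward):
    ind = np.zeros(K)
    ind[chosen] = 1.0
    # Model prediction for every arm, plus an inverse-probability-weighted
    # correction on the chosen arm; unbiased for each arm's mean reward.
    w = ind / probs[chosen]
    return (1 - w) * means_hat + w * reward

true_means = np.array([0.2, 0.6, 0.9])
total = np.zeros(K)
n = 20000
for _ in range(n):
    c = rng.choice(K, p=probs)
    total += dr_pseudo_rewards(c, rng.binomial(1, true_means[c]))
avg = total / n   # approaches true_means even though means_hat is off
```

Averaged over rounds, the pseudo-rewards recover all arms' true means, which is what lets a doubly-robust bandit algorithm use "complete" data every round.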

Doubly robust Thompson sampling for linear payoffs

no code implementations · 1 Feb 2021 · Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing.

Thompson Sampling

Kernel-convoluted Deep Neural Networks with Data Augmentation

1 code implementation · 4 Dec 2020 · Minjin Kim, Young-geun Kim, Dongha Kim, Yongdai Kim, Myunghee Cho Paik

The Mixup method (Zhang et al. 2018), which uses linearly interpolated data, has emerged as an effective data augmentation tool to improve generalization performance and the robustness to adversarial examples.

Data Augmentation
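For reference, plain mixup (Zhang et al. 2018), the baseline this paper builds on, trains on convex combinations of input pairs and their labels; the kernel-convolution contribution of the paper itself is not shown here:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Return one mixup batch: convex combinations of random example pairs."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing weight
    perm = rng.permutation(len(x))          # random pairing of examples
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

x = np.arange(8, dtype=float).reshape(4, 2)
y = np.eye(2)[[0, 1, 0, 1]]                 # one-hot labels
xm, ym = mixup_batch(x, y, rng=np.random.default_rng(0))
# mixed labels are still valid probability vectors (rows sum to 1)
```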

Doubly-Robust Lasso Bandit

1 code implementation · NeurIPS 2019 · Gi-Soo Kim, Myunghee Cho Paik

Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news article recommendation systems, web page ad placement algorithms, and mobile health.

Multi-Armed Bandits · Recommendation Systems

Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model

no code implementations · 31 Jan 2019 · Gi-Soo Kim, Myunghee Cho Paik

We prove that the high-probability upper bound of the regret incurred by the proposed algorithm has the same order as the Thompson sampling algorithm for linear reward models.

Recommendation Systems · Thompson Sampling

Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric

1 code implementation · 28 Jan 2019 · Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).

Hyperparameter Optimization
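As background on the PU setting, the classical unbiased PU risk estimator (du Plessis et al.) recovers the negative-class risk from unlabeled and positive data alone; the paper's weighted integral probability metric classifier is a different, analytic construction not reproduced here, and the scores below are synthetic:

```python
import numpy as np

def pu_risk(scores_pos, scores_unl, prior, loss):
    """Classical unbiased PU risk: positive part plus a negative part
    reconstructed from unlabeled minus (prior-weighted) positive data."""
    r_pos = prior * loss(scores_pos, +1).mean()
    r_neg = loss(scores_unl, -1).mean() - prior * loss(scores_pos, -1).mean()
    return r_pos + r_neg

sq = lambda s, y: (s - y) ** 2              # squared loss, for illustration
rng = np.random.default_rng(3)
pos = rng.normal(1.0, 1.0, 4000)            # scores on labeled positives
unl = np.concatenate([rng.normal(1.0, 1.0, 2000),    # unlabeled mixture,
                      rng.normal(-1.0, 1.0, 2000)])  # class prior = 0.5
risk = pu_risk(pos, unl, 0.5, sq)           # near the supervised risk
```

With the class prior known, the estimate matches what a fully supervised risk would give on the same scores, despite never seeing negative labels.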
