Search Results for author: Junya Honda

Found 37 papers, 9 papers with code

Learning with Posterior Sampling for Revenue Management under Time-varying Demand

1 code implementation • 8 May 2024 • Kazuma Shimizu, Junya Honda, Shinji Ito, Shinji Nakadai

We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments.

Management

Paper
Code

Follow-the-Perturbed-Leader with Fréchet-type Tail Distributions: Optimality in Adversarial Bandits and Best-of-Both-Worlds

no code implementations • 8 Mar 2024 • Jongyeong Lee, Junya Honda, Shinji Ito, Min-hwan Oh

In this paper, we establish a sufficient condition for perturbations to achieve $\mathcal{O}(\sqrt{KT})$ regrets in the adversarial setting, which covers, e. g., Fr\'{e}chet, Pareto, and Student-$t$ distributions.

Paper
Add Code

Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds

no code implementations • 1 Mar 2024 • Shinji Ito, Taira Tsuchiya, Junya Honda

Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where appropriate choice of the learning rate is crucial for smaller regret.

Decision Making Multi-Armed Bandits

Paper
Add Code

Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

no code implementations • 13 Feb 2024 • Taira Tsuchiya, Shinji Ito, Junya Honda

This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieves nearly optimal bounds both in stochastic and adversarial environments.

Adversarial Robustness Decision Making

Paper
Add Code

Thompson Exploration with Best Challenger Rule in Best Arm Identification

no code implementations • 1 Oct 2023 • Jongyeong Lee, Junya Honda, Masashi Sugiyama

This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models.

Thompson Sampling

Paper
Add Code

Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

no code implementations • NeurIPS 2023 • Taira Tsuchiya, Shinji Ito, Junya Honda

With this result, we establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW).

Decision Making Single Particle Analysis

Paper
Add Code

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

no code implementations • 10 Mar 2023 • Dorian Baudry, Kazuya Suzuki, Junya Honda

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms.

Thompson Sampling

Paper
Add Code

Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

no code implementations • 3 Feb 2023 • Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama

In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models.

Thompson Sampling

Paper
Add Code

Best-of-Both-Worlds Algorithms for Partial Monitoring

no code implementations • 29 Jul 2022 • Taira Tsuchiya, Shinji Ito, Junya Honda

This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes.

Paper
Add Code

Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

no code implementations • 14 Jun 2022 • Shinji Ito, Taira Tsuchiya, Junya Honda

In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ for loss variance $\sigma_i^2$ of arm $i$.

Paper
Add Code

Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification

1 code implementation • 9 Jun 2022 • Junpei Komiyama, Taira Tsuchiya, Junya Honda

We introduce two rates, $R^{\mathrm{go}}$ and $R^{\mathrm{go}}_{\infty}$, corresponding to lower bounds on the probability of misidentification, each of which is associated with a proposed algorithm.

Paper
Code

The Survival Bandit Problem

no code implementations • 7 Jun 2022 • Charles Riou, Junya Honda, Masashi Sugiyama

For that purpose, we identify two key components in the survival regret: the regret given no ruin (which corresponds to the regret in the MAB), and the probability that the procedure is interrupted, called the probability of ruin.

Paper
Add Code

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

no code implementations • 2 Jun 2022 • Shinji Ito, Taira Tsuchiya, Junya Honda

As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of $\tilde{\Theta}( \alpha^{1/2} T^{1/2} )$, while weakly observable graphs induce minimax regret of $\tilde{\Theta}( \delta^{1/3} T^{2/3} )$, where $\alpha$ and $\delta$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph.

Open-Ended Question Answering

Paper
Add Code

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

1 code implementation • 23 Jul 2021 • Junpei Komiyama, Edouard Fouché, Junya Honda

We demonstrate that ADR-bandit has nearly optimal performance when abrupt or gradual changes occur in a coordinated manner that we call global changes.

Multi-Armed Bandits

Paper
Code

Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences

1 code implementation • 16 Jul 2021 • Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama

In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two datasets $S_X = \{(X_i, U_i)\}$ and $S_Y = \{(U'_j, Y'_j)\}$.

Paper
Code

Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation

no code implementations • 31 Dec 2020 • Yuko Kuroki, Junya Honda, Masashi Sugiyama

Combinatorial optimization is one of the fundamental research fields that has been extensively studied in theoretical computer science and operations research.

Combinatorial Optimization Multi-Armed Bandits +1

Paper
Add Code

Online Dense Subgraph Discovery via Blurred-Graph Feedback

no code implementations • ICML 2020 • Yuko Kuroki, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

Dense subgraph discovery aims to find a dense component in edge-weighted graphs.

Graph Mining valid

Paper
Add Code

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

no code implementations • NeurIPS 2020 • Taira Tsuchiya, Junya Honda, Masashi Sugiyama

We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback.

Decision Making Thompson Sampling

Paper
Add Code

Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

no code implementations • 10 Mar 2020 • Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama

If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework.

Bayesian Optimization Recommendation Systems

Paper
Add Code

Efficient Adaptive Experimental Design for Average Treatment Effect Estimation

no code implementations • 13 Feb 2020 • Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita

In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently.

Experimental Design

Paper
Add Code

Uncoupled Regression from Pairwise Comparison Data

1 code implementation • NeurIPS 2019 • Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama

We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable distributes uniformly.

Learning-To-Rank regression

Paper
Code

Learning from Positive and Unlabeled Data with a Selection Bias

no code implementations • ICLR 2019 • Masahiro Kato, Takeshi Teshima, Junya Honda

However, this assumption is unrealistic in many instances of PU learning because it fails to capture the existence of a selection bias in the labeling process.

Selection bias

Paper
Add Code

A Note on KL-UCB+ Policy for the Stochastic Bandit

no code implementations • 19 Mar 2019 • Junya Honda

A classic setting of the stochastic K-armed bandit problem is considered in this note.

Paper
Add Code

Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

no code implementations • 27 Feb 2019 • Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time.

Paper
Add Code

A Bad Arm Existence Checking Problem

no code implementations • 31 Jan 2019 • Koji Tabata, Atsuyoshi Nakamura, Junya Honda, Tamiki Komatsuzaki

We study a bad arm existing checking problem in which a player's task is to judge whether a positive arm exists or not among given K arms by drawing as small number of arms as possible.

Paper
Add Code

On the Calibration of Multiclass Classification with Rejection

1 code implementation • NeurIPS 2019 • Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama

First, we consider an approach based on simultaneous training of a classifier and a rejector, which achieves the state-of-the-art performance in the binary case.

Classification General Classification

Paper
Code

Dueling Bandits with Qualitative Feedback

no code implementations • 14 Sep 2018 • Liyuan Xu, Junya Honda, Masashi Sugiyama

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm.

Paper
Add Code

Unsupervised Domain Adaptation Based on Source-guided Discrepancy

no code implementations • 11 Sep 2018 • Seiichi Kuroki, Nontawat Charoenphakdee, Han Bao, Junya Honda, Issei Sato, Masashi Sugiyama

A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain.

Unsupervised Domain Adaptation

Paper
Add Code

Nonconvex Optimization for Regression with Fairness Constraints

1 code implementation • ICML 2018 • Junpei Komiyama, Akiko Takeda, Junya Honda, Hajime Shimao

However, a fairness level as a constraint induces a nonconvexity of the feasible region, which disables the use of an off-the-shelf convex optimizer.

Attribute Fairness +1

Paper
Code

Position-based Multiple-play Bandit Problem with Unknown Position Bias

no code implementations • NeurIPS 2017 • Junpei Komiyama, Junya Honda, Akiko Takeda

Motivated by online advertising, we study a multiple-play multi-armed bandit problem with position bias that involves several slots and the latter slots yield fewer rewards.

Position

Paper
Add Code

Good Arm Identification via Bandit Feedback

no code implementations • 17 Oct 2017 • Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, Masashi Sugiyama

We consider a novel stochastic multi-armed bandit problem called {\em good arm identification} (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold.

Paper
Add Code

Fully adaptive algorithm for pure exploration in linear bandits

no code implementations • 16 Oct 2017 • Liyuan Xu, Junya Honda, Masashi Sugiyama

We propose the first fully-adaptive algorithm for pure exploration in linear bandits---the task to find the arm with the largest expected reward, which depends on an unknown parameter linearly.

Paper
Add Code

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

no code implementations • 5 May 2016 • Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms.

Paper
Add Code

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

no code implementations • NeurIPS 2015 • Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge).

Paper
Add Code

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

1 code implementation • 8 Jun 2015 • Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms.

Paper
Code

Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

1 code implementation • 2 Jun 2015 • Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem.

Thompson Sampling

Paper
Code

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

no code implementations • 22 Apr 2015 • Wesley Cowan, Junya Honda, Michael N. Katehakis

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.