no code implementations • 29 Jul 2022 • Taira Tsuchiya, Shinji Ito, Junya Honda

To be more specific, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ and in the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions.

no code implementations • 14 Jun 2022 • Shinji Ito, Taira Tsuchiya, Junya Honda

In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ for loss variance $\sigma_i^2$ of arm $i$.
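
To make the shape of this bound concrete, the following minimal sketch evaluates the sum inside the $O(\cdot)$ for given gaps $\Delta_i$, variances $\sigma_i^2$, and horizon $T$. The function name `gap_variance_bound` and the example numbers are hypothetical, and the constant hidden by the $O(\cdot)$ notation is ignored, so the result only illustrates how the bound scales.

```python
import math


def gap_variance_bound(gaps, variances, horizon):
    """Evaluate sum over arms with gap > 0 of (variance / gap + 1) * log(horizon).

    Hypothetical helper: the constant hidden by the O(.) notation is ignored.
    """
    if len(gaps) != len(variances):
        raise ValueError("gaps and variances must have the same length")
    return sum(
        (var / gap + 1.0) * math.log(horizon)
        for gap, var in zip(gaps, variances)
        if gap > 0  # the optimal arm (gap 0) does not contribute
    )


# Illustrative values only: arm 0 is optimal, arms 1 and 2 are suboptimal.
example = gap_variance_bound([0.0, 0.5, 0.1], [1.0, 0.25, 0.01], 10_000)
```

Low-variance arms contribute close to $\log T$ each even when their gap is small, which is the improvement over a bound that scales with $1/\Delta_i$ alone.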

no code implementations • 2 Jun 2022 • Shinji Ito, Taira Tsuchiya, Junya Honda

As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: \textit{strongly observable} graphs yield minimax regret of $\tilde{\Theta}( \alpha^{1/2} T^{1/2} )$, while \textit{weakly observable} graphs induce minimax regret of $\tilde{\Theta}( \delta^{1/3} T^{2/3} )$, where $\alpha$ and $\delta$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph.

no code implementations • 15 Mar 2022 • Hanna Sumita, Shinji Ito, Kei Takemura, Daisuke Hatano, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

The key features of our problem are (1) an agent is reusable, i.e., an agent comes back to the market after completing the assigned task, (2) an agent may reject the assigned task to stay in the market, and (3) a task may accommodate multiple agents.

no code implementations • NeurIPS 2021 • Shinji Ito

This study aims to develop bandit algorithms that automatically exploit tendencies of certain environments to improve performance, without any prior knowledge regarding the environments.

no code implementations • NeurIPS 2021 • Shinji Ito

The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.

no code implementations • 20 Jan 2021 • Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of arms chosen in a round, and $\tilde{O}(\cdot)$ ignores logarithmic factors.

no code implementations • NeurIPS 2020 • Shinji Ito

Swap regret, a generic performance measure of online decision-making algorithms, plays an important role in the theory of repeated games, along with a close connection to correlated equilibria in strategic games.

no code implementations • NeurIPS 2020 • Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

This paper offers a nearly optimal algorithm for online linear optimization with delayed bandit feedback.

no code implementations • NeurIPS 2020 • Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits.

no code implementations • NeurIPS 2019 • Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

Our algorithm for non-stochastic settings has an oracle complexity of $\tilde{O}( T )$ and is the first algorithm that achieves both a regret bound of $\tilde{O}( \sqrt{T} )$ and an oracle complexity of $\tilde{O} ( \mathrm{poly} ( T ) )$, given only linear optimization oracles.

no code implementations • NeurIPS 2019 • Shinji Ito

This paper considers submodular function minimization with \textit{noisy evaluation oracles} that return the function value of a submodular objective with zero-mean additive noise.

no code implementations • NeurIPS 2019 • Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

\textit{Bandit combinatorial optimization} is a bandit framework in which a player chooses an action within a given finite set $\mathcal{A} \subseteq \{ 0, 1 \}^d$ and incurs a loss that is the inner product of the chosen action and an unobservable loss vector in $\mathbb{R} ^ d$ in each round.
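
As a rough illustration of one round of this protocol, the sketch below draws an action from a finite set of binary vectors and computes the incurred loss as an inner product with a hidden loss vector. The uniform-random choice, the names `play_round`, `actions`, and `hidden_loss`, and the example values are assumptions for illustration; they are not the algorithm studied in the paper.

```python
import random


def play_round(actions, loss_vector, rng):
    """Play one round: choose an action and return it with its incurred loss.

    The action is chosen uniformly at random (a placeholder policy).
    The player only sees the scalar loss, never loss_vector itself.
    """
    action = rng.choice(actions)
    loss = sum(a * l for a, l in zip(action, loss_vector))
    return action, loss


# Hypothetical instance with d = 3: each action is a vector in {0, 1}^3.
actions = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
hidden_loss = (0.2, 0.5, 0.1)  # unobservable to the player
rng = random.Random(0)
action, loss = play_round(actions, hidden_loss, rng)
```

Only the scalar `loss` is returned to the player as feedback, which is what distinguishes the bandit setting from full-information online optimization.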

no code implementations • 5 Sep 2019 • Kei Takemura, Shinji Ito

Our empirical evaluation with artificial and real-world datasets demonstrates that the proposed algorithms with the arm-wise randomization technique outperform the existing algorithms without this technique, especially for the clustered case.

no code implementations • NeurIPS 2018 • Shinji Ito, Daisuke Hatano, Hanna Sumita, Akihiro Yabe, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

Online portfolio selection is a sequential decision-making problem in which a learner repeatedly selects a portfolio over a set of assets, aiming to maximize long-term return.
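
A common way to formalize long-term return, assumed here rather than taken from the abstract, is cumulative log-wealth: in each round the learner's wealth is multiplied by the inner product of the portfolio weights and the assets' price relatives. The sketch below computes that quantity; the function name `cumulative_log_wealth` and the sample data are hypothetical.

```python
import math


def cumulative_log_wealth(portfolios, price_relatives):
    """Sum of log(<x_t, r_t>) over rounds.

    x_t: portfolio weights (non-negative, summing to 1).
    r_t: price relatives (closing price / opening price per asset).
    This is an assumed standard formulation, not the paper's own notation.
    """
    total = 0.0
    for weights, relatives in zip(portfolios, price_relatives):
        if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("portfolio weights must lie on the simplex")
        growth = sum(w * r for w, r in zip(weights, relatives))
        total += math.log(growth)
    return total


# Two assets, two rounds, uniform portfolio in both rounds (illustrative data).
log_wealth = cumulative_log_wealth(
    [(0.5, 0.5), (0.5, 0.5)],
    [(2.0, 0.5), (0.5, 2.0)],
)
```

Exponentiating the result gives the factor by which the initial wealth has grown over the rounds played.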

no code implementations • ICML 2018 • Shinji Ito, Akihiro Yabe, Ryohei Fujimaki

Predictive optimization, however, suffers from the problem that a calculated optimal solution is evaluated too optimistically, i.e., the value of the objective function is overestimated.

no code implementations • ICML 2018 • Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, Ken-ichi Kawarabayashi

In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout the causal graph.

no code implementations • NeurIPS 2017 • Shinji Ito, Daisuke Hatano, Hanna Sumita, Akihiro Yabe, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

Under these assumptions, we present polynomial-time sublinear-regret algorithms for online sparse linear regression.

no code implementations • NeurIPS 2016 • Shinji Ito, Ryohei Fujimaki

On the basis of this connection, we propose an efficient algorithm that employs network flow algorithms.

no code implementations • 18 May 2016 • Shinji Ito, Ryohei Fujimaki

This paper addresses a novel data science problem, prescriptive price optimization, which derives the optimal price strategy to maximize future profit/revenue on the basis of massive predictive formulas produced by machine learning.

Papers With Code is a free resource with all data licensed under CC-BY-SA.