no code implementations • 18 Nov 2024 • Masahiro Kato
Notably, this approach remains model-free as long as the original estimator and the conditional expected residual estimator satisfy the convergence rate condition.
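A speculative sketch of one residual-correction scheme consistent with this description; the cross-fitting split, estimator choices, and names below are illustrative assumptions, not the paper's construction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor

# Speculative illustration (not the paper's construction): correct an initial
# regressor f_hat by adding an estimate of the conditional expected residual
# g_hat(x) ~ E[Y - f_hat(X) | X = x], fit on a held-out fold.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 2000)

X1, y1 = X[:1000], y[:1000]    # fold for the original estimator
X2, y2 = X[1000:], y[1000:]    # fold for the residual estimator

f_hat = GradientBoostingRegressor().fit(X1, y1)
g_hat = KNeighborsRegressor(n_neighbors=50).fit(X2, y2 - f_hat.predict(X2))

def corrected(x):
    """Original prediction plus the estimated conditional expected residual."""
    return f_hat.predict(x) + g_hat.predict(x)
```

The point of the sentence above is that no parametric model is imposed on either component; only their convergence rates matter.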
no code implementations • 12 Nov 2024 • Masahiro Kato
A common approach in RD estimation is the use of nonparametric regression methods, such as local linear regression.
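A minimal sketch of that approach, assuming a scalar running variable with cutoff c and a triangular kernel; the bandwidth h and the simulated data are illustrative:

```python
import numpy as np

def local_linear_rd(x, y, c=0.0, h=0.5):
    """RD effect: jump between the two local linear fits at the cutoff c."""
    def boundary_value(mask):
        xs, ys = x[mask] - c, y[mask]
        w = np.maximum(0.0, 1.0 - np.abs(xs) / h)    # triangular kernel weights
        X = np.column_stack([np.ones_like(xs), xs])
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ ys)  # weighted least squares
        return beta[0]                               # intercept = value at c
    return boundary_value(x >= c) - boundary_value(x < c)

# Simulated data for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 2000)
y = 0.5 * x + 1.0 * (x >= 0) + rng.normal(0, 0.1, 2000)  # true jump = 1
print(local_linear_rd(x, y))  # close to 1.0
```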
no code implementations • 19 Oct 2024 • Masahiro Kato
This study explores portfolio selection using predictive models for portfolio returns.
no code implementations • 29 May 2024 • Masahiro Kato
This study investigates a local asymptotic minimax optimal strategy for fixed-budget best arm identification (BAI).
no code implementations • 6 Mar 2024 • Masahiro Kato, Akihiro Oga, Wataru Komatsubara, Ryo Inokuchi
First, we derive the efficient covariate density and propensity score, i.e., those minimizing the semiparametric efficiency bound, and find that optimizing both yields a smaller bound than optimizing the propensity score alone.
no code implementations • 5 Mar 2024 • Masahiro Kato
In high-dimensional linear regression, one typical approach is to assume sparsity.
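For context, the canonical sparsity-based estimator is the Lasso; a minimal example, shown as background only and not as this paper's proposal:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Background example (not the paper's proposal): L1-penalized regression
# recovers a sparse coefficient vector even when p >> n.
rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                    # n << p with an s-sparse coefficient
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.normal(0, 0.1, n)

model = Lasso(alpha=0.1).fit(X, y)       # L1 penalty induces sparsity
print(np.flatnonzero(model.coef_))       # roughly recovers the support {0,...,4}
```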
no code implementations • 5 Mar 2024 • Masahiro Kato, Shinji Ito
To address this issue, this study proposes an algorithm whose regret is $O(\log(T))$ when the suboptimality gap is bounded from below.
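For intuition about gap-dependent logarithmic regret, here is the classical UCB1 rule, used as a stand-in for illustration and not as the paper's algorithm:

```python
import numpy as np

def ucb1(means, T, rng):
    """Play UCB1 on Gaussian arms (illustration only); returns pull counts."""
    K = len(means)
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(T):
        if t < K:
            a = t                                   # pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(sums / counts + bonus))
        sums[a] += rng.normal(means[a], 1.0)
        counts[a] += 1
    return counts

rng = np.random.default_rng(0)
print(ucb1([0.5, 0.0], T=10_000, rng=rng))  # suboptimal pulls grow like log(T)
```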
1 code implementation • 31 Jan 2024 • Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs).
no code implementations • 8 Jan 2024 • Masahiro Kato, Kyohei Okumura, Takuya Ishihara, Toru Kitagawa
Setting the worst-case expected regret as the performance criterion for adaptive sampling and the recommended policies, we derive its asymptotic lower bound and propose a strategy, the Adaptive Sampling-Policy Learning strategy (PLAS), whose leading factor in the regret upper bound matches the lower bound as the number of experimental units increases.
no code implementations • 27 Dec 2023 • Masahiro Kato, Shinji Ito
The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
no code implementations • 20 Dec 2023 • Masahiro Kato
They also propose a strategy, assuming that the variances of rewards are known, and show that it is asymptotically optimal in the sense that its probability of misidentification matches the lower bound as the budget approaches infinity.
no code implementations • 30 Oct 2023 • Masahiro Kato
Because the information available in actual experiments is limited, we develop a lower bound that is valid when both the means and the identity of the best arm are unknown, which we refer to as the worst-case lower bound.
no code implementations • 25 Oct 2023 • Masahiro Kato, Kota Matsui, Ryo Inokuchi
For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio.
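A compact sketch of that recipe, estimating the density ratio with a probabilistic classifier (one common estimator; the cited studies use various DRE methods) and then reweighting the training loss:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative recipe; the cited works may use other density-ratio estimators.
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, (1000, 1))     # training covariates
X_te = rng.normal(1.0, 1.0, (1000, 1))     # shifted test covariates

# A classifier separating test (1) from train (0) yields the density ratio
# w(x) = p_test(x) / p_train(x) = P(test | x) / P(train | x) (balanced samples).
Z = np.vstack([X_tr, X_te])
d = np.r_[np.zeros(len(X_tr)), np.ones(len(X_te))]
clf = LogisticRegression().fit(Z, d)
p = clf.predict_proba(X_tr)[:, 1]
w = p / (1.0 - p)                          # density-ratio estimates on train

# Importance-weighted empirical risk minimization on the training data.
y_tr = (X_tr[:, 0] > 0.5).astype(int)
model = LogisticRegression().fit(X_tr, y_tr, sample_weight=w)
```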
no code implementations • 25 Oct 2023 • Masahiro Kato, Masaaki Imaizumi
This study assumes a linear regression model between the potential outcome and covariates for each of the two treatments, and defines the CATE as the difference between the two regression models.
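A minimal sketch of this setup (a "T-learner" with linear models; the simulated data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative simulation: one linear model per treatment arm, CATE = difference.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
D = rng.integers(0, 2, n)                             # binary treatment
y = X @ np.array([1.0, 0.5, 0.0]) \
    + D * (1.0 + X[:, 0]) + rng.normal(0, 0.1, n)     # true CATE(x) = 1 + x_0

m1 = LinearRegression().fit(X[D == 1], y[D == 1])     # treated-arm model
m0 = LinearRegression().fit(X[D == 0], y[D == 0])     # control-arm model
cate_hat = m1.predict(X) - m0.predict(X)              # difference of the models
```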
no code implementations • 20 Jul 2023 • Masahiro Kato, Akari Ohda, Masaaki Imaizumi
Based on this assumption, we estimate SC weights by matching the moments of treated outcomes with the weighted sum of moments of untreated outcomes.
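A simplified sketch of moment matching for SC weights, here matching only pre-treatment means via simplex-constrained least squares; the paper's moment conditions are more general, and all quantities below are simulated for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Simplified illustration; the paper matches more general moments.
rng = np.random.default_rng(0)
T0, J = 30, 10                                 # pre-treatment periods, donors
Y0 = rng.normal(size=(T0, J))                  # untreated units' outcomes
w_true = np.r_[0.6, 0.4, np.zeros(J - 2)]
y1 = Y0 @ w_true + rng.normal(0, 0.05, T0)     # treated unit's outcomes

def loss(w):
    return np.sum((y1 - Y0 @ w) ** 2)          # match weighted outcome moments

res = minimize(loss, np.full(J, 1.0 / J),
               bounds=[(0.0, 1.0)] * J,        # nonnegative weights
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print(np.round(res.x, 2))                      # approximately w_true
```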
no code implementations • 8 Mar 2023 • Masahiro Kato, Shuting Wu, Kodai Kureishi, Shota Yasui
Therefore, the positive labels that we observe are a combination of both the exposure and the labeling, which creates a selection bias problem for the observed positive samples.
no code implementations • 6 Feb 2023 • Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa
We evaluate the decision based on the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm.
no code implementations • 15 Sep 2022 • Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa
We then develop the "Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW)" strategy, which is asymptotically optimal in the sense that the probability of misidentification under the strategy matches the lower bound when the budget goes to infinity in the small-gap regime.
no code implementations • 10 Mar 2022 • Danielle Cabel, Shonosuke Sugasawa, Masahiro Kato, Kosaku Takanashi, Kenichiro McAlinn
Spatial data are characterized by their spatial dependence, which is often complex, non-linear, and difficult to capture with a single model.
no code implementations • 10 Feb 2022 • Masahiro Kato, Masaaki Imaizumi
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE), with linear regression models.
no code implementations • 31 Jan 2022 • Masahiro Kato, Masaaki Imaizumi, Kentaro Minami
This paper provides a unified view of the Kullback-Leibler (KL) divergence and the integral probability metrics (IPMs) through the lens of maximum likelihood density-ratio estimation (DRE).
no code implementations • 12 Jan 2022 • Masahiro Kato, Kaito Ariu, Masaaki Imaizumi, Masahiro Nomura, Chao Qin
We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small.
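In its simplest two-armed, known-variance form, the Neyman allocation rule assigns samples in proportion to the arms' standard deviations:

```python
import numpy as np

sigma = np.array([1.0, 2.0])            # known reward standard deviations
budget = 3000
proportions = sigma / sigma.sum()       # Neyman allocation: (1/3, 2/3)
pulls = np.round(proportions * budget).astype(int)
print(pulls)                            # [1000 2000]
```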
no code implementations • NeurIPS 2021 • Masahiro Kato, Kenichiro McAlinn, Shota Yasui
This paper proposes a DR estimator for dependent samples obtained from adaptive experiments.
1 code implementation • 18 Nov 2021 • Junpei Komiyama, Kaito Ariu, Masahiro Kato, Chao Qin
We consider best arm identification in the multi-armed bandit problem.
no code implementations • ICLR 2022 • Masahiro Kato, Masaaki Imaizumi, Kenichiro McAlinn, Shota Yasui, Haruo Kakehi
We consider learning causal relationships under conditional moment restrictions.
no code implementations • 16 Sep 2021 • Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin
We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design.
no code implementations • 3 Aug 2021 • Masahiro Kato, Masaaki Imaizumi, Kenichiro McAlinn, Haruo Kakehi, Shota Yasui
To address this issue, we propose a method that transforms conditional moment restrictions to unconditional moment restrictions through importance weighting, using a conditional density ratio estimator.
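Schematically, the reduction rests on the tower property: any integrable weight function of $X$ turns the conditional restriction into an unconditional one, and the paper's contribution is constructing useful weights from a conditional density ratio estimator:

```latex
% Schematic only; the specific importance weights are the paper's contribution.
\mathbb{E}\left[\psi(Y, X; \theta) \mid X\right] = 0 \ \text{a.s.}
\quad \Longrightarrow \quad
\mathbb{E}\left[w(X)\,\psi(Y, X; \theta)\right] = 0
\quad \text{for any integrable } w .
```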
1 code implementation • 26 Jun 2021 • Masahiro Kato, Kaito Ariu
We demonstrate that contextual information can be used to improve the efficiency of the identification of the best marginalized mean reward compared with the results of Garivier & Kaufmann (2016).
no code implementations • 11 May 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh
However, such methods have two main drawbacks, particularly in large-scale applications: (1) the pairwise approach is severely inefficient due to its quadratic computational cost; and (2) even recent model-based samplers (e.g., IRGAN) cannot achieve practical efficiency due to the training of an extra model.
no code implementations • 17 Feb 2021 • Masahiro Kato
To mitigate this limitation, we propose an alternative assumption, namely that the average logging policy converges to a time-invariant function, and show the asymptotic normality of the doubly robust (DR) estimator.
no code implementations • 19 Jan 2021 • Riku Togashi, Masahiro Kato, Mayu Otani, Shin'ichi Satoh
Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones.
no code implementations • 24 Oct 2020 • Masahiro Kato, Zhenghang Cui, Yoshihiro Fukuhara
In this paper, to obtain a classifier that is more reliable against adversarial attacks, we propose Adversarial Training with a Rejection Option (ATRO).
no code implementations • 23 Oct 2020 • Masahiro Kato, Kenshi Abe, Kaito Ariu, Shota Yasui
Based on the properties of the evaluation policy, we categorize OPE situations.
no code implementations • 23 Oct 2020 • Masahiro Kato, Yusuke Kaneko
The goal of off-policy evaluation (OPE) is to evaluate a new policy using historical data obtained via a behavior policy.
no code implementations • 8 Oct 2020 • Masahiro Kato, Shota Yasui, Kenichiro McAlinn
This paper proposes a DR estimator for dependent samples obtained from adaptive experiments.
no code implementations • 3 Oct 2020 • Masahiro Kato, Kei Nakagawa, Kenshi Abe, Tetsuro Morimura, Kentaro Baba
To this end, we propose a method that trains our policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained through our policy.
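Concretely, with a risk-aversion weight $\lambda > 0$, the expected quadratic utility objective can be written as below (a standard parameterization; the paper's exact weighting may differ):

```latex
% Standard quadratic-utility form; the paper's parameterization may differ.
\max_{\pi}\; \mathbb{E}^{\pi}[R] - \frac{\lambda}{2}\,\mathbb{E}^{\pi}[R^{2}],
\qquad
\mathbb{E}[R^{2}] = \operatorname{Var}(R) + \left(\mathbb{E}[R]\right)^{2},
```

so maximizing it rewards a high mean while penalizing variance, which is the mean-variance control at issue here.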
no code implementations • 28 Sep 2020 • Masahiro Kato, Kei Nakagawa
In this paper, we suggest expected quadratic utility maximization (EQUM) as a new framework for policy gradient style reinforcement learning (RL) algorithms with mean-variance control.
no code implementations • 28 Sep 2020 • Masahiro Kato, Shota Yasui
We consider training a binary classifier under delayed feedback (DF learning).
1 code implementation • 12 Jun 2020 • Masahiro Kato, Takeshi Teshima
Density ratio estimation (DRE) is at the core of various machine learning tasks such as anomaly detection and domain adaptation.
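For background, a compact least-squares density-ratio estimator in the uLSIF style with Gaussian basis functions, shown as general DRE machinery rather than as this paper's method:

```python
import numpy as np

# General background (uLSIF-style DRE), not necessarily this paper's method.
def ulsif(x_nu, x_de, centers, sigma=1.0, lam=1e-3):
    """Estimate r(x) = p_nu(x) / p_de(x) as a Gaussian-basis expansion."""
    def K(a, b):  # Gaussian kernel matrix between sample sets a and b
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    H = K(x_de, centers).T @ K(x_de, centers) / len(x_de)  # E_de[phi phi^T]
    h = K(x_nu, centers).mean(axis=0)                      # E_nu[phi]
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: np.maximum(K(x, centers) @ alpha, 0.0)

rng = np.random.default_rng(0)
x_nu = rng.normal(0.5, 1.0, (500, 1))      # numerator samples
x_de = rng.normal(0.0, 1.0, (500, 1))      # denominator samples
r_hat = ulsif(x_nu, x_de, centers=x_de[:100])
print(r_hat(np.array([[0.5], [-2.0]])))    # ratio higher near the numerator mode
```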
no code implementations • 12 Jun 2020 • Masahiro Kato
The goal of OPE is to evaluate a new policy using historical data obtained from behavior policies generated by the bandit algorithm.
1 code implementation • NeurIPS 2020 • Masahiro Kato, Masatoshi Uehara, Shota Yasui
Then, we propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using a nonparametric estimator of the density ratio between the historical and evaluation data distributions.
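For reference, a generic doubly robust (AIPW) value estimator for a target policy; the density-ratio reweighting for covariate shift that the paper adds on top is omitted here, and all argument names are illustrative:

```python
import numpy as np

def dr_value(actions, rewards, pi_e_probs, pi_b_probs, q_hat):
    """Generic AIPW sketch of the target policy's value; names illustrative.
    actions[i]: logged action; rewards[i]: logged reward;
    pi_e_probs[i, k]: target policy's probability of action k at context i;
    pi_b_probs[i]: behavior policy's probability of the logged action;
    q_hat[i, k]: estimated mean reward of action k at context i."""
    n = len(actions)
    direct = (pi_e_probs * q_hat).sum(axis=1)            # model-based term
    iw = pi_e_probs[np.arange(n), actions] / pi_b_probs  # importance weights
    residual = rewards - q_hat[np.arange(n), actions]
    return float(np.mean(direct + iw * residual))        # bias-corrected value
```

The estimate is consistent if either the reward model q_hat or the behavior probabilities are accurate, which is what "doubly robust" refers to.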
no code implementations • 13 Feb 2020 • Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita
In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently.
no code implementations • 2 Nov 2019 • Masahiro Kato, Hikaru Kawarazaki
By applying the proposed method, we can obtain a model that predicts labels for the unlabeled test data well without losing the model's interpretability.
no code implementations • 25 Sep 2019 • Masahiro Kato, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima
Our main idea is to apply a framework of learning with rejection and adversarial examples to assist in the decision making for such suspicious samples.
no code implementations • ICLR 2019 • Masahiro Kato, Takeshi Teshima, Junya Honda
However, this assumption is unrealistic in many instances of PU learning because it fails to capture the existence of a selection bias in the labeling process.
no code implementations • 15 Sep 2018 • Masahiro Kato, Liyuan Xu, Gang Niu, Masashi Sugiyama
In this paper, we propose a novel unified approach to estimating the class-prior and training a classifier alternately.