Search Results for author: Masatoshi Uehara

Found 41 papers, 13 papers with code

Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation

no code implementations ICML 2020 Nathan Kallus, Masatoshi Uehara

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.

Off-policy evaluation · reinforcement-learning

Regularized DeepIV with Model Selection

no code implementations 7 Mar 2024 Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.

Model Selection · regression

Feedback Efficient Online Fine-Tuning of Diffusion Models

no code implementations 26 Feb 2024 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

It is natural to frame this as a reinforcement learning (RL) problem, in which the objective is to fine-tune a diffusion model to maximize a reward function that corresponds to some property.

reinforcement-learning · Reinforcement Learning (RL)

Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

no code implementations 8 Jan 2024 Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set.

Source Condition Double Robust Inference on Functionals of Inverse Problems

no code implementations 25 Jul 2023 Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems.

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

1 code implementation 26 Jun 2023 Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior.

Off-policy evaluation
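The AIPS estimator itself is not reproduced here; as a point of reference, the minimal sketch below shows the plain slate-level inverse propensity scoring (IPS) baseline for ranking OPE that adaptive estimators of this kind refine. The helper name, array layout, and toy numbers are illustrative assumptions, not the paper's data or API.

```python
import numpy as np

def slate_ips_value(eval_probs, logging_probs, rewards):
    """Slate-level IPS estimate of a ranking policy's value.

    eval_probs    : probability of each logged ranking under the evaluation policy
    logging_probs : probability of the same ranking under the logging policy
    rewards       : observed reward (e.g., total clicks) for each logged ranking
    """
    weights = eval_probs / logging_probs        # importance weights over whole rankings
    return float(np.mean(weights * rewards))    # unbiased, but variance grows with slate size

# Toy example with four logged rankings (all numbers made up).
print(slate_ips_value(np.array([0.10, 0.05, 0.20, 0.02]),
                      np.array([0.08, 0.10, 0.15, 0.05]),
                      np.array([1.0, 0.0, 2.0, 1.0])))
```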

Provable Reward-Agnostic Preference-Based Reinforcement Learning

no code implementations 29 May 2023 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals.

reinforcement-learning

Provable Offline Preference-Based Reinforcement Learning

no code implementations 24 May 2023 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.

reinforcement-learning
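Step (1) above, reward estimation by MLE from pairwise preferences, is commonly instantiated with a Bradley-Terry-style likelihood. The sketch below illustrates only that step, under the assumption of a linear reward in trajectory features; the function class and the distributionally robust planning of step (2) are not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Bradley-Terry-style MLE for the implicit reward:
# P(a preferred over b) = sigmoid(theta @ (phi_a - phi_b)), with phi(.) a trajectory feature map.

def neg_log_likelihood(theta, phi_pref, phi_rej):
    logits = (phi_pref - phi_rej) @ theta
    return np.sum(np.logaddexp(0.0, -logits))   # sum of -log sigmoid(logits)

rng = np.random.default_rng(0)
d = 5
theta_true = rng.normal(size=d)
phi_a, phi_b = rng.normal(size=(200, d)), rng.normal(size=(200, d))
prefer_a = rng.random(200) < 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ theta_true))
phi_pref = np.where(prefer_a[:, None], phi_a, phi_b)   # features of the preferred trajectory
phi_rej = np.where(prefer_a[:, None], phi_b, phi_a)    # features of the rejected trajectory

theta_hat = minimize(neg_log_likelihood, np.zeros(d), args=(phi_pref, phi_rej)).x
```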

Distributional Offline Policy Evaluation with Predictive Error Guarantees

1 code implementation 19 Feb 2023 Runzhe Wu, Masatoshi Uehara, Wen Sun

Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.

A Review of Off-Policy Evaluation in Reinforcement Learning

no code implementations 13 Dec 2022 Masatoshi Uehara, Chengchun Shi, Nathan Kallus

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems.

Off-policy evaluation · reinforcement-learning

Inference on Strongly Identified Functionals of Weakly Identified Functions

no code implementations 17 Aug 2022 Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., an NPIV regression) defined by conditional moment restrictions.

Causal Inference · regression +1
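To make the phrase "conditional moment restrictions" concrete, the canonical NPIV instance (generic notation, not taken from the paper) identifies a nuisance function h_0 and a target functional theta_0 via

\mathbb{E}[\, Y - h_0(X) \mid Z \,] = 0, \qquad \theta_0 = \mathbb{E}[\, m(W; h_0) \,],

where Z is the instrument, W collects the observed variables, and m is a known linear functional of h_0 (e.g., one whose expectation yields an average causal effect).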

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

1 code implementation NeurIPS 2023 Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Off-policy evaluation

PAC Reinforcement Learning for Predictive State Representations

no code implementations 12 Jul 2022 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

reinforcement-learning · Reinforcement Learning (RL)

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

no code implementations 24 Jun 2022 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning · Reinforcement Learning (RL) +1

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

1 code implementation 12 Nov 2021 Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution.

Off-policy evaluation

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations ICLR 2022 Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of it, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner?

Offline RL · Representation Learning

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

no code implementations ICLR 2022 Masatoshi Uehara, Wen Sun

Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data.

Offline RL · reinforcement-learning +2

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation NeurIPS 2021 Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.

Continuous Control · Imitation Learning

Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

1 code implementation NeurIPS 2021 Jonathan Daniel Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy.

Continuous Control · Imitation Learning

Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

no code implementations 25 Mar 2021 Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

Previous work has relied on completeness conditions on these functions to identify the causal parameters, required uniqueness assumptions in estimation, and focused on parametric estimation of bridge functions.

Causal Inference

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations 5 Feb 2021 Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and q-functions when these are estimated using recent minimax methods.

Off-policy evaluation · reinforcement-learning
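As a point of reference for the estimators analyzed in this line of work, the sketch below evaluates the standard doubly robust combination of estimated marginal importance weights and q-functions in the infinite-horizon discounted setting (normalized value convention). The array names and inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def doubly_robust_value(w, q_sa, q_next_pi, q_init_pi, rewards, gamma):
    """Doubly robust OPE estimate from transitions (s, a, r, s').

    w          : estimated marginal importance weights w(s_i, a_i)
    q_sa       : estimated q(s_i, a_i)
    q_next_pi  : estimated E_{a' ~ pi}[q(s'_i, a')]
    q_init_pi  : estimated E_{s_0, a_0 ~ pi}[q(s_0, a_0)] (scalar)
    Returns the normalized value (1 - gamma) * E[sum_t gamma^t r_t].
    """
    bellman_residual = rewards + gamma * q_next_pi - q_sa   # weighted correction term
    return (1 - gamma) * q_init_pi + float(np.mean(w * bellman_residual))
```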

Fast Rates for the Regret of Offline Reinforcement Learning

no code implementations 31 Jan 2021 Yichun Hu, Nathan Kallus, Masatoshi Uehara

Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees.

Decision Making · reinforcement-learning +1
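FQI refers to standard fitted Q-iteration; a minimal offline sketch, assuming finite actions and a generic scikit-learn regressor (neither taken from the paper), looks as follows.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(S, A, R, S_next, n_actions, gamma, n_iters=30):
    """Offline fitted Q-iteration on transition tuples (s, a, r, s')."""
    X = np.column_stack([S, A])
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = R                                   # first fit: Q ~ immediate reward
        else:
            # Bellman backup: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
                for a in range(n_actions)
            ])
            targets = R + gamma * q_next.max(axis=1)
        q = GradientBoostingRegressor().fit(X, targets)   # regress onto backed-up targets
    return q
```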

Optimal Off-Policy Evaluation from Multiple Logging Policies

1 code implementation 21 Oct 2020 Nathan Kallus, Yuta Saito, Masatoshi Uehara

We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.

Off-policy evaluation

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

1 code implementation NeurIPS 2020 Nathan Kallus, Masatoshi Uehara

Targeting deterministic policies, for which the action is a deterministic function of the state, is crucial since optimal policies are always deterministic (up to ties).

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

no code implementations 6 Jun 2020 Nathan Kallus, Masatoshi Uehara

Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown.

Off-policy evaluation · reinforcement-learning

Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

1 code implementation NeurIPS 2020 Masahiro Kato, Masatoshi Uehara, Shota Yasui

Then, we propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using a nonparametric estimator of the density ratio between the historical and evaluation data distributions.

Off-policy evaluation
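The doubly robust construction described above can be sketched for a contextual bandit as follows: a nonparametric density-ratio weight reweights the historical covariates toward the evaluation distribution, and an outcome model supplies the direct term. All names and the exact weighting are illustrative assumptions, not the authors' estimator.

```python
import numpy as np

def dr_value_covariate_shift(q_hat_eval_pi, density_ratio, pi_e_prob, pi_b_prob,
                             rewards, q_hat_hist):
    """Doubly robust OPE under covariate shift (contextual bandit sketch).

    q_hat_eval_pi : q_hat(x, pi(x)) on covariates drawn from the evaluation distribution
    density_ratio : estimated p_eval(x) / p_hist(x) on the historical covariates
    pi_e_prob     : pi_e(a_i | x_i) for logged actions; pi_b_prob likewise for the behavior policy
    q_hat_hist    : q_hat(x_i, a_i) on the historical data
    """
    direct = np.mean(q_hat_eval_pi)                          # direct-method term
    weights = density_ratio * pi_e_prob / pi_b_prob          # covariate and action weights combined
    correction = np.mean(weights * (rewards - q_hat_hist))   # bias-correction term
    return float(direct + correction)
```

The direct term averages over covariates drawn from the evaluation distribution, while the correction term reweights the historical samples; that split mirrors the OPE/OPL setup described above.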

Statistically Efficient Off-Policy Policy Gradients

no code implementations ICML 2020 Nathan Kallus, Masatoshi Uehara

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value.

Policy Gradient Methods

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

1 code implementation 30 Dec 2019 Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated.

BIG-bench Machine Learning · Causal Inference

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

no code implementations ICML 2020 Masatoshi Uehara, Jiawei Huang, Nan Jiang

We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions.

Off-policy evaluation
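One representative form of such a minimax objective for the marginalized importance weight w (written generically here; the paper's discriminator classes and normalization may differ) is

\hat{w} \;=\; \arg\min_{w}\,\max_{f \in \mathcal{F}} \Big( \mathbb{E}_{(s,a,s') \sim d_b}\big[\, w(s,a)\,(\gamma f(s',\pi) - f(s,a)) \,\big] \;+\; (1-\gamma)\,\mathbb{E}_{s_0 \sim d_0}\big[\, f(s_0,\pi) \,\big] \Big)^2,

after which the (normalized) policy value is estimated as \hat{J}(\pi) = \mathbb{E}_{d_b}[\hat{w}(s,a)\, r]. An analogous minimax objective can be written for the value function, with weight functions playing the role of discriminators.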

Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

no code implementations 12 Sep 2019 Nathan Kallus, Masatoshi Uehara

This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar.

Off-policy evaluation · reinforcement-learning

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

1 code implementation 22 Aug 2019 Nathan Kallus, Masatoshi Uehara

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.

Off-policy evaluation · reinforcement-learning

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

1 code implementation NeurIPS 2019 Nathan Kallus, Masatoshi Uehara

We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS.

Multi-Armed Bandits · Off-policy evaluation +1
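The IS and SNIS baselines that the proposed empirical-likelihood estimators are compared against can be sketched as follows; the empirical-likelihood estimator itself is not reproduced here.

```python
import numpy as np

def is_and_snis(eval_prob, behavior_prob, rewards):
    """Importance sampling (IS) and self-normalized IS (SNIS) value estimates.

    eval_prob / behavior_prob : action (or trajectory) probabilities under the
    evaluation and behavior policies for each logged sample.
    """
    w = eval_prob / behavior_prob
    is_estimate = np.mean(w * rewards)                 # unbiased, but can have high variance
    snis_estimate = np.sum(w * rewards) / np.sum(w)    # biased but bounded and more stable
    return float(is_estimate), float(snis_estimate)
```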

Information criteria for non-normalized models

no code implementations 15 May 2019 Takeru Matsuda, Masatoshi Uehara, Aapo Hyvarinen

However, model selection methods for general non-normalized models have not been proposed so far.

Model Selection

Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance

no code implementations 24 Aug 2018 Masatoshi Uehara, Takeru Matsuda, Fumiyasu Komaki

First, we propose a method for reducing asymptotic variance by estimating the parameters of the auxiliary distribution.
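For context, standard noise-contrastive estimation fits an unnormalized model by logistic discrimination between data and samples from an auxiliary (noise) distribution; the paper's variance-reduction step of estimating the auxiliary distribution's parameters is not shown. A minimal sketch, assuming 1-D Gaussian data, a fixed N(0, 2^2) noise distribution, and a learnable log-normalizer c:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x_data = rng.normal(loc=1.0, scale=1.0, size=1000)    # observed data
x_noise = rng.normal(loc=0.0, scale=2.0, size=1000)   # auxiliary (noise) samples, ratio nu = 1

def log_model(x, params):
    mu, log_sigma, c = params                          # c absorbs the unknown normalizer
    return -0.5 * ((x - mu) / np.exp(log_sigma)) ** 2 + c

def nce_loss(params):
    g_data = log_model(x_data, params) - norm.logpdf(x_data, loc=0.0, scale=2.0)
    g_noise = log_model(x_noise, params) - norm.logpdf(x_noise, loc=0.0, scale=2.0)
    # Logistic classification loss: data labeled 1, noise labeled 0.
    return np.mean(np.logaddexp(0.0, -g_data)) + np.mean(np.logaddexp(0.0, g_noise))

params_hat = minimize(nce_loss, np.zeros(3)).x
```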
