Search Results for author: Masatoshi Uehara

Found 46 papers, 18 with code

Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation

no code implementations ICML 2020 Nathan Kallus, Masatoshi Uehara

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.

Off-policy evaluation, Reinforcement Learning +1

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

1 code implementation 17 Oct 2024 Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev

Finally, we demonstrate the effectiveness of DRAKES in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, important tasks for gene therapies and protein-based therapeutics.

Protein Design, Reinforcement Learning (RL)

Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review

1 code implementation 18 Jul 2024 Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, Sergey Levine

We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning, tailored specifically for fine-tuning diffusion models.

Reinforcement Learning (RL)
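As a rough illustration of one technique named above, reward-weighted MLE reweights the model's log-likelihood on its own samples by a function of the reward; a generic sketch (the exponential weight and the sampling distribution are illustrative assumptions, not the tutorial's exact formulation) is:

$$\max_{\theta}\; \mathbb{E}_{x \sim p_{\mathrm{ref}}}\!\left[\, w\!\left(r(x)\right)\, \log p_{\theta}(x) \,\right], \qquad \text{e.g. } w(r) = \exp\!\left(r/\alpha\right),$$

where $p_{\mathrm{ref}}$ is the pretrained (or current) diffusion model, $r$ the reward, and $\alpha$ a temperature.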

Adding Conditional Control to Diffusion Models with Reinforcement Learning

no code implementations 17 Jun 2024 Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali

This work presents a novel method based on reinforcement learning (RL) to add additional controls, leveraging an offline dataset comprising inputs and corresponding labels.

Reinforcement Learning +1

Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

1 code implementation 30 May 2024 Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökcen Eraslan, Avantika Lal, Sergey Levine, Tommaso Biancalani

To combine the strengths of both approaches, we adopt a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.

Regularized DeepIV with Model Selection

no code implementations 7 Mar 2024 Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.

Model Selection, Regression
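For context, nonparametric IV regression is usually posed as a conditional moment restriction; a standard formulation (stated here for orientation, not necessarily the paper's exact setup) is:

$$Y = h_0(X) + \varepsilon, \quad \mathbb{E}[\varepsilon \mid Z] = 0 \;\;\Longleftrightarrow\;\; \mathbb{E}\!\left[Y - h_0(X) \mid Z\right] = 0,$$

where $Z$ is the instrument and $h_0$ is the structural function to be estimated nonparametrically.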

Feedback Efficient Online Fine-Tuning of Diffusion Models

1 code implementation 26 Feb 2024 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

It is natural to frame this as a reinforcement learning (RL) problem, in which the objective is to fine-tune a diffusion model to maximize a reward function that corresponds to some property.

Reinforcement Learning +1
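A schematic version of that RL objective, with a KL regularizer toward the pretrained model added as a common (assumed) choice in this line of work:

$$\max_{\theta}\; \mathbb{E}_{x \sim p_{\theta}}\!\left[ r(x) \right] \;-\; \alpha\, \mathrm{KL}\!\left( p_{\theta} \,\|\, p_{\mathrm{pre}} \right),$$

where $p_{\theta}$ is the fine-tuned diffusion model, $r$ the (possibly learned) reward, and $p_{\mathrm{pre}}$ the pretrained model.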

Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

no code implementations 8 Jan 2024 Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set.

Source Condition Double Robust Inference on Functionals of Inverse Problems

no code implementations 25 Jul 2023 Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems.

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

1 code implementation 26 Jun 2023 Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior.

Off-policy evaluation

Provable Reward-Agnostic Preference-Based Reinforcement Learning

no code implementations 29 May 2023 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals.

Reinforcement Learning
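Pairwise preference feedback in PbRL is commonly modeled with a Bradley-Terry link between trajectory returns (shown as the standard assumption, not necessarily this paper's exact model):

$$\Pr\!\left(\tau^{1} \succ \tau^{2}\right) = \frac{\exp\!\big(R(\tau^{1})\big)}{\exp\!\big(R(\tau^{1})\big) + \exp\!\big(R(\tau^{2})\big)}, \qquad R(\tau) = \sum_{t} r(s_t, a_t),$$

where $r$ is a latent reward that is never observed directly.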

Provable Offline Preference-Based Reinforcement Learning

no code implementations 24 May 2023 Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE.

Reinforcement Learning
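In symbols, the two steps read roughly as follows (a schematic rendering of the description above; the construction of the confidence set $\mathcal{C}(\hat{r})$ is specified in the paper):

$$\hat{r} \in \arg\max_{r \in \mathcal{R}} \sum_{i=1}^{n} \log \Pr_{r}\!\left(o_i \mid \tau_i^{1}, \tau_i^{2}\right), \qquad \hat{\pi} \in \arg\max_{\pi}\; \min_{r \in \mathcal{C}(\hat{r})} V^{\pi}_{r},$$

where $o_i$ is the observed preference label for the trajectory pair $(\tau_i^{1}, \tau_i^{2})$ and $V^{\pi}_{r}$ is the value of $\pi$ under reward $r$.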

Distributional Offline Policy Evaluation with Predictive Error Guarantees

1 code implementation 19 Feb 2023 Runzhe Wu, Masatoshi Uehara, Wen Sun

Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.

A Review of Off-Policy Evaluation in Reinforcement Learning

no code implementations 13 Dec 2022 Masatoshi Uehara, Chengchun Shi, Nathan Kallus

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems.

Off-policy evaluation, Reinforcement Learning +1

Inference on Strongly Identified Functionals of Weakly Identified Functions

no code implementations 17 Aug 2022 Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., an NPIV regression) defined by conditional moment restrictions.

Causal Inference, Regression +1
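Schematically, the target is a linear functional of a nuisance defined by a conditional moment restriction; in illustrative notation (e.g., the NPIV case):

$$\mathbb{E}\!\left[ Y - h_0(X) \mid Z \right] = 0, \qquad \theta_0 = \mathbb{E}\!\left[ m(W; h_0) \right],$$

where $h_0$ is the possibly weakly identified nuisance function and $\theta_0$ is the continuous linear functional on which inference is sought.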

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

1 code implementation NeurIPS 2023 Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Off-policy evaluation

PAC Reinforcement Learning for Predictive State Representations

no code implementations 12 Jul 2022 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces.

Reinforcement Learning +1

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

no code implementations 24 Jun 2022 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space.

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

Reinforcement Learning (RL) +1

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

1 code implementation 12 Nov 2021 Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution.

Off-policy evaluation

Representation Learning for Online and Offline RL in Low-rank MDPs

no code implementations ICLR 2022 Masatoshi Uehara, Xuezhou Zhang, Wen Sun

This work studies the question of representation learning in RL: how can we learn a compact, low-dimensional representation such that, on top of it, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner?

Offline RL, Representation Learning
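For reference, the low-rank MDP structure underlying this setting factorizes the transition kernel through unknown features (the standard definition in this literature):

$$P(s' \mid s, a) = \left\langle \phi^{*}(s, a),\, \mu^{*}(s') \right\rangle, \qquad \phi^{*}(s, a),\, \mu^{*}(s') \in \mathbb{R}^{d},$$

where neither $\phi^{*}$ nor $\mu^{*}$ is known; representation learning aims to recover $\phi^{*}$ (or a sufficiently good surrogate) from data so that exploration and planning can operate on top of it.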

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

no code implementations ICLR 2022 Masatoshi Uehara, Wen Sun

Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data.

Offline RL, Reinforcement Learning +3
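The pessimism described here can be sketched as a max-min over a version space of models consistent with the offline data (a schematic of the principle, not the paper's precise algorithm):

$$\hat{\pi} \in \arg\max_{\pi}\; \min_{M \in \mathcal{M}_{\mathcal{D}}} V^{\pi}_{M},$$

where $\mathcal{M}_{\mathcal{D}}$ collects the models in the function class that fit the offline dataset $\mathcal{D}$ well and $V^{\pi}_{M}$ is the value of $\pi$ in model $M$.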

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

1 code implementation NeurIPS 2021 Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy.

Continuous Control +1

Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

no code implementations 25 Mar 2021 Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

Previous work has relied on completeness conditions on these functions to identify the causal parameters, required uniqueness assumptions in estimation, and focused on parametric estimation of bridge functions.

Causal Inference

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations 5 Feb 2021 Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.

Off-policy evaluation, Reinforcement Learning
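The two nuisances enter a doubly robust value estimate of roughly the following form (discounted-setting sketch; the paper analyzes estimators of this type when $w$ and $q$ are obtained by minimax methods):

$$\hat{V}^{\pi} = (1-\gamma)\, \mathbb{E}_{s_0 \sim d_0}\!\left[ q(s_0, \pi) \right] + \mathbb{E}_n\!\left[ w(s, a)\,\big( r + \gamma\, q(s', \pi) - q(s, a) \big) \right],$$

where $w$ is the marginal importance weight, $q$ the estimated $Q$-function, and $q(s, \pi) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}[q(s, a)]$.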

Fast Rates for the Regret of Offline Reinforcement Learning

no code implementations 31 Jan 2021 Yichun Hu, Nathan Kallus, Masatoshi Uehara

Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees.

Decision Making, Reinforcement Learning +2
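One of the analyzed procedures, fitted Q-iteration (FQI), repeatedly regresses onto empirical Bellman backups. Below is a minimal tabular sketch on synthetic data (illustrative only; the random MDP, uniform behavior policy, and hyperparameters are made up and are not the paper's setup):

```python
import numpy as np

# Minimal tabular fitted Q-iteration (FQI) on a synthetic offline dataset.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, n_samples = 5, 2, 0.9, 2000

# Random ground-truth MDP, used only to generate offline transitions.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = dist over s'
R = rng.uniform(size=(n_states, n_actions))                       # reward table

# Offline dataset of (s, a, r, s') tuples from a uniform behavior policy.
S = rng.integers(n_states, size=n_samples)
A = rng.integers(n_actions, size=n_samples)
Rew = R[S, A]
S_next = np.array([rng.choice(n_states, p=P[s, a]) for s, a in zip(S, A)])

# FQI: regress Q(s, a) onto the empirical backup r + gamma * max_a' Q(s', a').
Q = np.zeros((n_states, n_actions))
for _ in range(100):
    target = Rew + gamma * Q[S_next].max(axis=1)
    Q_new = np.zeros_like(Q)
    for s in range(n_states):
        for a in range(n_actions):
            mask = (S == s) & (A == a)
            if mask.any():
                Q_new[s, a] = target[mask].mean()  # tabular least-squares fit
    Q = Q_new

print("Greedy policy from FQI:", Q.argmax(axis=1))
```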

Optimal Off-Policy Evaluation from Multiple Logging Policies

1 code implementation 21 Oct 2020 Nathan Kallus, Yuta Saito, Masatoshi Uehara

We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.

Off-policy evaluation

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

1 code implementation NeurIPS 2020 Nathan Kallus, Masatoshi Uehara

Targeting deterministic policies, for which the action is a deterministic function of the state, is crucial since optimal policies are always deterministic (up to ties).

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

no code implementations 6 Jun 2020 Nathan Kallus, Masatoshi Uehara

Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown.

Off-policy evaluation, Reinforcement Learning +1

Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

1 code implementation NeurIPS 2020 Masahiro Kato, Masatoshi Uehara, Shota Yasui

Then, we propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using a nonparametric estimator of the density ratio between the historical and evaluation data distributions.

Off-policy evaluation
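A schematic doubly robust form consistent with that description, in contextual-bandit notation (a hedged rendering; the paper's exact estimator may differ):

$$\hat{V}(\pi) = \frac{1}{n}\sum_{i=1}^{n} \hat{\rho}(x_i)\, \frac{\pi(a_i \mid x_i)}{\pi_b(a_i \mid x_i)} \left( y_i - \hat{\mu}(x_i, a_i) \right) + \frac{1}{m}\sum_{j=1}^{m} \sum_{a} \pi\!\left(a \mid x^{\mathrm{eval}}_j\right) \hat{\mu}\!\left(x^{\mathrm{eval}}_j, a\right),$$

where the first sum runs over historical data and the second over evaluation covariates, $\hat{\rho}$ is the estimated density ratio between evaluation and historical covariate distributions, $\pi_b$ the logging policy, and $\hat{\mu}$ the outcome model.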

Statistically Efficient Off-Policy Policy Gradients

no code implementations ICML 2020 Nathan Kallus, Masatoshi Uehara

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value.

Policy Gradient Methods, Reinforcement Learning
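For reference, the quantity being estimated is the standard policy gradient (up to normalization); the paper's contribution is a statistically efficient off-policy estimator of it:

$$\nabla_{\theta} J(\pi_{\theta}) = \mathbb{E}_{s \sim d^{\pi_{\theta}},\, a \sim \pi_{\theta}(\cdot \mid s)}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s, a) \right],$$

where $d^{\pi_{\theta}}$ is the discounted state distribution induced by $\pi_{\theta}$.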

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

1 code implementation 30 Dec 2019 Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated.

BIG-bench Machine Learning, Causal Inference

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

no code implementations ICML 2020 Masatoshi Uehara, Jiawei Huang, Nan Jiang

We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions.

Off-policy evaluation, Reinforcement Learning
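The two dual representations of the policy value that these function approximators target can be written as follows (standard identities for the normalized discounted value, shown for orientation):

$$V(\pi) = \mathbb{E}_{(s, a, r) \sim d_b}\!\left[ w^{\pi}(s, a)\, r \right] \;\; \text{with} \;\; w^{\pi}(s, a) = \frac{d^{\pi}(s, a)}{d_b(s, a)}, \qquad V(\pi) = (1-\gamma)\, \mathbb{E}_{s_0 \sim d_0}\!\left[ Q^{\pi}(s_0, \pi) \right],$$

where $d_b$ is the discounted occupancy measure of the behavior data and $d^{\pi}$ that of the target policy; learning $w^{\pi}$ or $Q^{\pi}$ with function approximation yields the corresponding estimator.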

Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

no code implementations 12 Sep 2019 Nathan Kallus, Masatoshi Uehara

This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar.

Off-policy evaluation, Reinforcement Learning +1

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

1 code implementation 22 Aug 2019 Nathan Kallus, Masatoshi Uehara

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible.

Off-policy evaluation, Reinforcement Learning +1

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

1 code implementation NeurIPS 2019 Nathan Kallus, Masatoshi Uehara

We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS.

Multi-Armed Bandits, Off-policy evaluation +2
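For reference, with importance weights $w_i = \pi(a_i \mid x_i) / \pi_b(a_i \mid x_i)$ (or their trajectory-wise products in the sequential case), the baseline estimators being improved upon are:

$$\hat{V}_{\mathrm{IS}} = \frac{1}{n} \sum_{i=1}^{n} w_i\, y_i, \qquad \hat{V}_{\mathrm{SNIS}} = \frac{\sum_{i=1}^{n} w_i\, y_i}{\sum_{i=1}^{n} w_i},$$

the latter being self-normalized and therefore bounded by the range of the observed outcomes.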

Information criteria for non-normalized models

no code implementations 15 May 2019 Takeru Matsuda, Masatoshi Uehara, Aapo Hyvarinen

However, model selection methods for general non-normalized models have not been proposed so far.

Model Selection

Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance

no code implementations 24 Aug 2018 Masatoshi Uehara, Takeru Matsuda, Fumiyasu Komaki

First, we propose a method for reducing asymptotic variance by estimating the parameters of the auxiliary distribution.
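As background, noise-contrastive estimation fits an unnormalized model $p_{\theta}$ by logistic discrimination between data and samples from an auxiliary (noise) distribution $q$; in the standard formulation,

$$\max_{\theta}\;\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ \log \frac{p_{\theta}(x)}{p_{\theta}(x) + \nu\, q(x)} \right] + \nu\, \mathbb{E}_{y \sim q}\!\left[ \log \frac{\nu\, q(y)}{p_{\theta}(y) + \nu\, q(y)} \right],$$

where $\nu$ is the noise-to-data ratio; the paper studies how the choice (and estimation) of $q$ affects the asymptotic variance of the resulting estimator.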
