no code implementations • 20 Mar 2025 • Yoav Wald, Mark Goldstein, Yonathan Efroni, Wouter A. C. van Amsterdam, Rajesh Ranganath
Problems in fields such as healthcare, robotics, and finance require reasoning both about the value of what decision or action to take and about when to take it.
no code implementations • 19 Feb 2025 • Yonathan Efroni, Ben Kretzu, Daniel Jiang, Jalaj Bhandari, Zheqing Zhu, Karen Ullrich
To date, the multi-objective optimization literature has mainly focused on conflicting objectives, studying the Pareto front, or requiring users to balance tradeoffs.
no code implementations • 1 Oct 2024 • Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel R. Jiang, Yonathan Efroni
We study the problem of learning an approximate equilibrium in the offline multi-agent reinforcement learning (MARL) setting.
no code implementations • 3 Jun 2024 • Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni
Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.
no code implementations • 22 Apr 2024 • Lili Wu, Ben Evans, Riashat Islam, Raihan Seraj, Yonathan Efroni, Alex Lamb
In this work, we consider the problem of discovering the agent-centric state in the more challenging high-dimensional non-Markovian setting, when the state can be decoded from a sequence of past observations.
no code implementations • 11 Feb 2024 • Caner Hazirbas, Alicia Sun, Yonathan Efroni, Mark Ibrahim
Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness.
1 code implementation • 6 Dec 2023 • Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
Reinforcement learning (RL) is a versatile framework for optimizing long-term goals.
no code implementations • 6 Nov 2023 • Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb
Goal-conditioned planning benefits from learned low-dimensional representations of rich observations.
no code implementations • 11 Oct 2023 • Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.
2 code implementations • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex, time-dependent process, which is prevalent in practical applications.
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.
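To make the interaction protocol concrete, here is a minimal toy sketch of a single RMMDP episode; the helper callables (`policy`, `transition`, `reward_models`) are illustrative placeholders, not the paper's algorithm, which must additionally learn from such episodes without observing the latent index.

```python
import numpy as np

def run_rmmdp_episode(policy, transition, reward_models, start_state, H, rng):
    """One episode of a reward-mixing MDP: nature draws one of M latent
    reward models at the start; the agent interacts for H steps and only
    sees rewards generated by that hidden model."""
    latent = rng.integers(len(reward_models))   # hidden from the agent
    reward_fn = reward_models[latent]
    state, trajectory = start_state, []
    for t in range(H):
        action = policy(state, t)               # cannot condition on `latent`
        reward = reward_fn(state, action)
        trajectory.append((state, action, reward))
        state = transition(state, action, rng)
    return trajectory
```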
no code implementations • 17 Jul 2022 • Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.
no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford
In real-world reinforcement learning applications, the learner's observation space is typically high-dimensional, containing both relevant and irrelevant information about the task at hand.
no code implementations • 8 Feb 2022 • Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan, and make good decisions.
no code implementations • 30 Jan 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.
no code implementations • 17 Oct 2021 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.
no code implementations • 12 Oct 2021 • Yonathan Efroni, Sham Kakade, Akshay Krishnamurthy, Cyril Zhang
However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable.
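As a schematic (and hedged) way to picture such partial controllability, the state can be split into a controllable block and an exogenous block whose dynamics do not depend on the control input; the block structure below is illustrative notation rather than the paper's exact model.

```latex
% Partially controllable linear system: x^{exo} evolves independently of u_t.
\begin{bmatrix} x^{\mathrm{ctrl}}_{t+1} \\ x^{\mathrm{exo}}_{t+1} \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
\begin{bmatrix} x^{\mathrm{ctrl}}_{t} \\ x^{\mathrm{exo}}_{t} \end{bmatrix}
+
\begin{bmatrix} B \\ 0 \end{bmatrix} u_t + w_t .
```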
no code implementations • 12 Oct 2021 • Nadav Merlis, Yonathan Efroni, Shie Mannor
We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We study the problem of learning a near-optimal policy for two reward-mixing MDPs.
no code implementations • NeurIPS 2021 • Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg
In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor
We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.
no code implementations • 13 Aug 2020 • Yonathan Efroni, Nadav Merlis, Shie Mannor
The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.
no code implementations • 11 Jun 2020 • Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni
We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.
1 code implementation • ICLR 2022 • Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh
Overall, MDPO is derived from the MD principles, offers a unified approach to viewing a number of popular RL algorithms, and performs better than or on par with TRPO, PPO, and SAC in a number of continuous control tasks.
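For context, the mirror-descent update that such algorithms instantiate can be written schematically as maximizing the current advantage while penalizing KL divergence from the previous policy (the step size $t_k$ and state distribution $\rho_{\pi_k}$ below are generic notation, not necessarily the paper's exact formulation):

```latex
\pi_{k+1} \in \arg\max_{\pi}\;
\mathbb{E}_{s \sim \rho_{\pi_k}}\!\Big[
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[A^{\pi_k}(s, a)\big]
  - \tfrac{1}{t_k}\,\mathrm{KL}\big(\pi(\cdot \mid s)\,\|\,\pi_k(\cdot \mid s)\big)
\Big].
```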
no code implementations • 4 Mar 2020 • Yonathan Efroni, Shie Mannor, Matteo Pirotta
In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.
no code implementations • ICML 2020 • Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor
To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
no code implementations • ICML 2020 • Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO.
no code implementations • 25 Sep 2019 • Yonathan Efroni, Manan Tomar, Mohammad Ghavamzadeh
In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration.
no code implementations • NeurIPS 2020 • Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.
no code implementations • 6 Sep 2019 • Lior Shani, Yonathan Efroni, Shie Mannor
Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL), in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved.
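The standard TRPO surrogate, shown here schematically with $\delta$ denoting the trust-region radius, makes this 'closeness' restriction explicit:

```latex
\max_{\pi}\;
\mathbb{E}_{s \sim \rho_{\pi_{\mathrm{old}}},\, a \sim \pi_{\mathrm{old}}}\!
\left[\frac{\pi(a \mid s)}{\pi_{\mathrm{old}}(a \mid s)}\, A^{\pi_{\mathrm{old}}}(s, a)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \rho_{\pi_{\mathrm{old}}}}\!\big[\mathrm{KL}\big(\pi_{\mathrm{old}}(\cdot \mid s)\,\|\,\pi(\cdot \mid s)\big)\big] \le \delta .
```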
1 code implementation • NeurIPS 2019 • Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies -- acting by 1-step planning -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.
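As a rough illustration of what acting by 1-step planning means (a minimal sketch with made-up array names; the paper's algorithm and its optimism bonus are more involved), the agent simply picks the action maximizing an optimistic one-step lookahead under the empirical model:

```python
import numpy as np

def one_step_greedy_action(s, R_hat, P_hat, V_next, bonus):
    """Greedy 1-step planning at state s.

    R_hat:  [S, A] estimated mean rewards
    P_hat:  [S, A, S] estimated transition probabilities
    V_next: [S] value estimate for the next time step
    bonus:  [S, A] optimism bonus (e.g., shrinking with visit counts)
    """
    q = R_hat[s] + bonus[s] + P_hat[s] @ V_next   # optimistic Q-values, shape [A]
    return int(np.argmax(q))
```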
2 code implementations • 26 Jan 2019 • Chen Tessler, Yonathan Efroni, Shie Mannor
In this work we formalize two new criteria of robustness to action uncertainty.
1 code implementation • 13 Dec 2018 • Lior Shani, Yonathan Efroni, Shie Mannor
We further analyze properties of exploration-conscious optimal policies and characterize two general approaches to solving such criteria.
no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.
no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.
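For reference, a minimal tabular sketch of that alternation, assuming a known reward array R of shape [S, A] and transition array P of shape [S, A, S]:

```python
import numpy as np

def policy_iteration(R, P, gamma=0.95):
    """Alternate exact policy evaluation and greedy policy improvement
    until the policy stops changing."""
    S, A = R.shape
    pi = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(S), pi]          # [S, S]
        R_pi = R[np.arange(S), pi]          # [S]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead.
        new_pi = (R + gamma * P @ V).argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, V
        pi = new_pi
```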