Search Results for author: Yonathan Efroni

Found 36 papers, 6 papers with code

The Bias of Harmful Label Associations in Vision-Language Models

no code implementations 11 Feb 2024 Caner Hazirbas, Alicia Sun, Yonathan Efroni, Mark Ibrahim

Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness.

Fairness

PcLast: Discovering Plannable Continuous Latent States

no code implementations 6 Nov 2023 Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb

Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations.

Prospective Side Information for Latent MDPs

no code implementations 11 Oct 2023 Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.

Decision Making

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

1 code implementation 31 Oct 2022 Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications.

Offline RL · Reinforcement Learning (RL) +1

Tractable Optimality in Episodic Latent MABs

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.

Reward-Mixing MDPs with a Few Latent Contexts are Learnable

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.
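
To make the RMMDP interaction protocol concrete, here is a minimal sketch of an episode generator, assuming Bernoulli rewards and a small tabular MDP; all names (`P`, `R`, `rmmdp_episode`) and sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, M = 4, 2, 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # shared transition kernel
R = rng.uniform(0, 1, size=(M, S, A))        # M latent reward models

def rmmdp_episode(policy):
    """One RMMDP episode: nature draws a latent reward model m, which stays
    fixed for all H steps; the agent never observes m directly."""
    m = rng.integers(M)
    s, traj = 0, []
    for h in range(H):
        a = policy(s, h)
        r = rng.binomial(1, R[m, s, a])       # reward drawn from the latent model
        s_next = rng.choice(S, p=P[s, a])
        traj.append((s, a, r))
        s = s_next
    return traj

traj = rmmdp_episode(lambda s, h: rng.integers(A))
```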

Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

no code implementations 17 Jul 2022 Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford

In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.

Decision Making
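
A multi-step inverse model of the kind named in the title predicts the first action of a sub-trajectory from the current observation and an observation several steps ahead; solving that prediction task only requires control-endogenous information, which pushes the encoder to drop irrelevant detail. The PyTorch sketch below shows one way such a loss can be set up; the architecture, shapes, and names (`encoder`, `action_head`, `max_k`) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative multi-step inverse dynamics loss: given obs at time t and
# obs at time t+k, predict the FIRST action a_t of the sub-trajectory.
obs_dim, act_dim, latent_dim, max_k = 32, 4, 16, 8

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# The action head conditions on both latent states and the gap k (one-hot).
action_head = nn.Sequential(nn.Linear(2 * latent_dim + max_k, 64), nn.ReLU(),
                            nn.Linear(64, act_dim))

def multistep_inverse_loss(obs_t, obs_tk, a_t, k):
    """Cross-entropy loss for predicting a_t from (obs_t, obs_{t+k}, k)."""
    z_t, z_tk = encoder(obs_t), encoder(obs_tk)
    k_onehot = F.one_hot(k - 1, num_classes=max_k).float()
    logits = action_head(torch.cat([z_t, z_tk, k_onehot], dim=-1))
    return F.cross_entropy(logits, a_t)

# Dummy batch just to show the shapes involved.
B = 64
obs_t, obs_tk = torch.randn(B, obs_dim), torch.randn(B, obs_dim)
a_t = torch.randint(0, act_dim, (B,))
k = torch.randint(1, max_k + 1, (B,))
multistep_inverse_loss(obs_t, obs_tk, a_t, k).backward()
```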

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

no code implementations 9 Jun 2022 Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand.

reinforcement-learning · Reinforcement Learning (RL)

Provable Reinforcement Learning with a Short-Term Memory

no code implementations 8 Feb 2022 Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions.

Decision Making · reinforcement-learning +1

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

no code implementations 30 Jan 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.

Collaborative Filtering · Multi-Armed Bandits +1

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

no code implementations 17 Oct 2021 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) · Representation Learning

Sparsity in Partially Controllable Linear Systems

no code implementations 12 Oct 2021 Yonathan Efroni, Sham Kakade, Akshay Krishnamurthy, Cyril Zhang

However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable.

Query-Reward Tradeoffs in Multi-Armed Bandits

no code implementations 12 Oct 2021 Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

Multi-Armed Bandits
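
In this setting the learner earns a reward on every pull but only observes it when it pays for a query. The sketch below shows one plausible interaction loop with a naive "query until an arm has enough samples" rule; the rule, the cap, and the variable names are illustrative assumptions, not the tradeoff-optimal strategy analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7])   # unknown Bernoulli arm means
K, T = len(means), 5000

counts = np.zeros(K)   # number of OBSERVED rewards per arm
sums = np.zeros(K)     # sum of observed rewards per arm
queries = 0

for t in range(1, T + 1):
    # Optimistic arm choice (UCB computed from queried samples only).
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1)
                   + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    arm = int(np.argmax(ucb))
    reward = rng.binomial(1, means[arm])   # reward is earned regardless...

    # ...but it is only observed if we actively query it.
    if counts[arm] < 200:                  # naive illustrative query rule
        counts[arm] += 1
        sums[arm] += reward
        queries += 1

print(f"queries used: {queries} / {T} rounds")
```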

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics

no code implementations ICLR 2022 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) · Representation Learning

Minimax Regret for Stochastic Shortest Path

no code implementations NeurIPS 2021 Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.

RL for Latent MDPs: Regret Guarantees and a Lower Bound

no code implementations NeurIPS 2021 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations 5 Feb 2021 Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making · Decision Making Under Uncertainty +2

Reinforcement Learning with Trajectory Feedback

no code implementations 13 Aug 2020 Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

reinforcement-learning · Reinforcement Learning (RL) +1
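
Under trajectory feedback the learner sees only the cumulative reward of each episode. One natural way to recover per-state-action rewards from such aggregate signals is to regress trajectory returns on visitation counts; the least-squares sketch below illustrates that idea under assumed dimensions and is not necessarily the estimator used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sa = 6                              # number of (state, action) pairs, flattened
true_r = rng.uniform(0, 1, n_sa)      # unknown per-pair mean rewards

# Each episode yields a visitation-count vector but only the SUM of rewards.
episodes = 200
X = rng.integers(0, 4, size=(episodes, n_sa)).astype(float)  # visit counts
y = X @ true_r + rng.normal(0, 0.1, episodes)                # trajectory returns

# Ridge-regularized least-squares estimate of the per-pair rewards.
lam = 1e-3
r_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_sa), X.T @ y)

print("max abs error:", np.max(np.abs(r_hat - true_r)))
```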

Bandits with Partially Observable Confounded Data

no code implementations 11 Jun 2020 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits

Mirror Descent Policy Optimization

1 code implementation ICLR 2022 Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

Overall, MDPO is derived from mirror descent (MD) principles, offers a unified view of a number of popular RL algorithms, and performs better than or on par with TRPO, PPO, and SAC in a number of continuous control tasks.

Continuous Control · Reinforcement Learning (RL)
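
The mirror descent step underlying MDPO has a simple closed form in the tabular case: maximizing $\langle Q, \pi \rangle - \frac{1}{\eta}\,\mathrm{KL}(\pi \,\|\, \pi_k)$ per state gives $\pi_{k+1} \propto \pi_k \exp(\eta Q)$. The tabular sketch below shows this update; the deep-RL algorithm in the paper replaces it with sampled gradient steps, so the names and step size here are illustrative assumptions only.

```python
import numpy as np

def md_policy_step(pi, q, eta):
    """One KL-regularized (mirror descent) policy improvement step.

    pi : (S, A) current policy, rows sum to 1
    q  : (S, A) action-value estimates under pi
    eta: step size; per state the update solves
         max_p <q, p> - (1/eta) * KL(p || pi),
         whose closed form is p proportional to pi * exp(eta * q).
    """
    logits = np.log(pi + 1e-12) + eta * q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy usage: two states, three actions; mass concentrates on high-value actions.
pi = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
for _ in range(10):
    pi = md_policy_step(pi, q, eta=0.5)
print(np.round(pi, 3))
```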

Exploration-Exploitation in Constrained MDPs

no code implementations 4 Mar 2020 Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Optimistic Policy Optimization with Bandit Feedback

no code implementations ICML 2020 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Reinforcement Learning (RL)

Multi-step Greedy Reinforcement Learning Algorithms

no code implementations ICML 2020 Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO.

Continuous Control · Game of Go +3

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

no code implementations 25 Sep 2019 Yonathan Efroni, Manan Tomar, Mohammad Ghavamzadeh

In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration.

Continuous Control · Game of Go +3

Online Planning with Lookahead Policies

no code implementations NeurIPS 2020 Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

no code implementations 6 Sep 2019 Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, which restricts consecutive policies to be 'close' to one another, is iteratively solved.

Reinforcement Learning (RL)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state, finite-horizon MDP setting and establish that exploring with greedy policies, i.e., acting by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning · reinforcement-learning +1
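
The "1-step planning" idea can be illustrated with an optimistic, UCBVI-flavored loop in which the agent acts greedily with respect to its current value estimates and performs a single backup at each visited state, instead of replanning over the whole horizon every episode. The sketch below is an illustration of that idea under assumed bonus and initialization choices, not the paper's algorithm or its analyzed constants.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, K = 5, 3, 10, 500
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # true transitions (toy MDP)
R_true = rng.uniform(0, 1, size=(S, A))           # true mean rewards

N = np.zeros((S, A))                 # visit counts
Nsas = np.zeros((S, A, S))           # transition counts
Rsum = np.zeros((S, A))              # reward sums
V = np.full((H + 1, S), float(H))    # optimistic value estimates
V[H] = 0.0

for k in range(1, K + 1):
    s = 0
    for h in range(H):
        n = np.maximum(N[s], 1.0)
        r_hat = Rsum[s] / n
        p_hat = Nsas[s] / n[:, None]
        bonus = np.sqrt(2.0 * np.log(S * A * H * k) / n)
        q = r_hat + bonus + p_hat @ V[h + 1]
        q = np.where(N[s] == 0, float(H - h), q)  # unvisited pairs stay optimistic
        a = int(np.argmax(q))
        V[h, s] = min(float(H - h), q[a])         # single greedy backup, no full replanning
        # Act, observe, and update the empirical model.
        r = rng.binomial(1, R_true[s, a])
        s_next = rng.choice(S, p=P_true[s, a])
        N[s, a] += 1.0
        Nsas[s, a, s_next] += 1.0
        Rsum[s, a] += r
        s = s_next
```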

Exploration Conscious Reinforcement Learning Revisited

1 code implementation 13 Dec 2018 Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

reinforcement-learning · Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

no code implementations NeurIPS 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control · reinforcement-learning +1

How to Combine Tree-Search Methods in Reinforcement Learning

no code implementations 6 Sep 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

reinforcement-learning · Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

no code implementations 21 May 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control · reinforcement-learning +1
