Search Results for author: Yonathan Efroni

Found 36 papers, 6 papers with code

The Bias of Harmful Label Associations in Vision-Language Models

no code implementations 11 Feb 2024 Caner Hazirbas, Alicia Sun, Yonathan Efroni, Mark Ibrahim

Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness.

Fairness

PcLast: Discovering Plannable Continuous Latent States

no code implementations 6 Nov 2023 Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb

Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations.

Prospective Side Information for Latent MDPs

no code implementations 11 Oct 2023 Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.

Decision Making

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

1 code implementation 31 Oct 2022 Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications.

Offline RL · Reinforcement Learning (RL) +1

Tractable Optimality in Episodic Latent MABs

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.

Reward-Mixing MDPs with a Few Latent Contexts are Learnable

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.
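
To make the RMMDP interaction protocol concrete, here is a minimal sketch of an episode generator, assuming Bernoulli rewards and a small tabular MDP; all names (`P`, `R`, `rmmdp_episode`) and sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, M = 4, 2, 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # shared transition kernel
R = rng.uniform(0, 1, size=(M, S, A))        # M latent reward models

def rmmdp_episode(policy):
    """One RMMDP episode: nature draws a latent reward model m, which stays
    fixed for all H steps; the agent never observes m directly."""
    m = rng.integers(M)
    s, traj = 0, []
    for h in range(H):
        a = policy(s, h)
        r = rng.binomial(1, R[m, s, a])       # reward drawn from the latent model
        s_next = rng.choice(S, p=P[s, a])
        traj.append((s, a, r))
        s = s_next
    return traj

traj = rmmdp_episode(lambda s, h: rng.integers(A))
```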

Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

no code implementations 17 Jul 2022 Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford

In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.

Decision Making
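
A multi-step inverse model of the kind named in the title predicts the first action of a sub-trajectory from the current observation and an observation several steps ahead; solving that prediction task only requires control-endogenous information, which pushes the encoder to drop irrelevant detail. The PyTorch sketch below shows one way such a loss can be set up; the architecture, shapes, and names (`encoder`, `action_head`, `max_k`) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative multi-step inverse dynamics loss: given obs at time t and
# obs at time t+k, predict the FIRST action a_t of the sub-trajectory.
obs_dim, act_dim, latent_dim, max_k = 32, 4, 16, 8

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# The action head conditions on both latent states and the gap k (one-hot).
action_head = nn.Sequential(nn.Linear(2 * latent_dim + max_k, 64), nn.ReLU(),
                            nn.Linear(64, act_dim))

def multistep_inverse_loss(obs_t, obs_tk, a_t, k):
    """Cross-entropy loss for predicting a_t from (obs_t, obs_{t+k}, k)."""
    z_t, z_tk = encoder(obs_t), encoder(obs_tk)
    k_onehot = F.one_hot(k - 1, num_classes=max_k).float()
    logits = action_head(torch.cat([z_t, z_tk, k_onehot], dim=-1))
    return F.cross_entropy(logits, a_t)

# Dummy batch just to show the shapes involved.
B = 64
obs_t, obs_tk = torch.randn(B, obs_dim), torch.randn(B, obs_dim)
a_t = torch.randint(0, act_dim, (B,))
k = torch.randint(1, max_k + 1, (B,))
multistep_inverse_loss(obs_t, obs_tk, a_t, k).backward()
```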

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

no code implementations 9 Jun 2022 Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand.

reinforcement-learning · Reinforcement Learning (RL)

Provable Reinforcement Learning with a Short-Term Memory

no code implementations 8 Feb 2022 Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions.

Decision Making · reinforcement-learning +1

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

no code implementations 30 Jan 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.

Collaborative Filtering · Multi-Armed Bandits +1

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

no code implementations 17 Oct 2021 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) · Representation Learning

Sparsity in Partially Controllable Linear Systems

no code implementations 12 Oct 2021 Yonathan Efroni, Sham Kakade, Akshay Krishnamurthy, Cyril Zhang

However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable.

Query-Reward Tradeoffs in Multi-Armed Bandits

no code implementations 12 Oct 2021 Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

Multi-Armed Bandits
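
In this setting the learner earns a reward on every pull but only observes it when it pays for a query. The sketch below shows one plausible interaction loop with a naive "query until an arm has enough samples" rule; the rule, the cap, and the variable names are illustrative assumptions, not the tradeoff-optimal strategy analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7])   # unknown Bernoulli arm means
K, T = len(means), 5000

counts = np.zeros(K)   # number of OBSERVED rewards per arm
sums = np.zeros(K)     # sum of observed rewards per arm
queries = 0

for t in range(1, T + 1):
    # Optimistic arm choice (UCB computed from queried samples only).
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1)
                   + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    arm = int(np.argmax(ucb))
    reward = rng.binomial(1, means[arm])   # reward is earned regardless...

    # ...but it is only observed if we actively query it.
    if counts[arm] < 200:                  # naive illustrative query rule
        counts[arm] += 1
        sums[arm] += reward
        queries += 1

print(f"queries used: {queries} / {T} rounds")
```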

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics

no code implementations ICLR 2022 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) · Representation Learning

Minimax Regret for Stochastic Shortest Path

no code implementations NeurIPS 2021 Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space.

RL for Latent MDPs: Regret Guarantees and a Lower Bound

no code implementations NeurIPS 2021 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations 5 Feb 2021 Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making · Decision Making Under Uncertainty +2

Reinforcement Learning with Trajectory Feedback

no code implementations 13 Aug 2020 Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

reinforcement-learning · Reinforcement Learning (RL) +1
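
Under trajectory feedback the learner sees only the cumulative reward of each episode. One natural way to recover per-state-action rewards from such aggregate signals is to regress trajectory returns on visitation counts; the least-squares sketch below illustrates that idea under assumed dimensions and is not necessarily the estimator used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sa = 6                              # number of (state, action) pairs, flattened
true_r = rng.uniform(0, 1, n_sa)      # unknown per-pair mean rewards

# Each episode yields a visitation-count vector but only the SUM of rewards.
episodes = 200
X = rng.integers(0, 4, size=(episodes, n_sa)).astype(float)  # visit counts
y = X @ true_r + rng.normal(0, 0.1, episodes)                # trajectory returns

# Ridge-regularized least-squares estimate of the per-pair rewards.
lam = 1e-3
r_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_sa), X.T @ y)

print("max abs error:", np.max(np.abs(r_hat - true_r)))
```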

Bandits with Partially Observable Confounded Data

no code implementations 11 Jun 2020 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits

Mirror Descent Policy Optimization

1 code implementation ICLR 2022 Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

Overall, MDPO is derived from mirror descent (MD) principles, offers a unified view of a number of popular RL algorithms, and performs better than or on par with TRPO, PPO, and SAC in a number of continuous control tasks.

Continuous Control · Reinforcement Learning (RL)
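
The mirror descent step underlying MDPO has a simple closed form in the tabular case: maximizing $\langle Q, \pi \rangle - \frac{1}{\eta}\,\mathrm{KL}(\pi \,\|\, \pi_k)$ per state gives $\pi_{k+1} \propto \pi_k \exp(\eta Q)$. The tabular sketch below shows this update; the deep-RL algorithm in the paper replaces it with sampled gradient steps, so the names and step size here are illustrative assumptions only.

```python
import numpy as np

def md_policy_step(pi, q, eta):
    """One KL-regularized (mirror descent) policy improvement step.

    pi : (S, A) current policy, rows sum to 1
    q  : (S, A) action-value estimates under pi
    eta: step size; per state the update solves
         max_p <q, p> - (1/eta) * KL(p || pi),
         whose closed form is p proportional to pi * exp(eta * q).
    """
    logits = np.log(pi + 1e-12) + eta * q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy usage: two states, three actions; mass concentrates on high-value actions.
pi = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
for _ in range(10):
    pi = md_policy_step(pi, q, eta=0.5)
print(np.round(pi, 3))
```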

Exploration-Exploitation in Constrained MDPs

no code implementations 4 Mar 2020 Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Optimistic Policy Optimization with Bandit Feedback

no code implementations ICML 2020 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Reinforcement Learning (RL)

Multi-step Greedy Reinforcement Learning Algorithms

no code implementations ICML 2020 Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO.

Continuous Control · Game of Go +3

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

no code implementations 25 Sep 2019 Yonathan Efroni, Manan Tomar, Mohammad Ghavamzadeh

In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration.

Continuous Control · Game of Go +3

Online Planning with Lookahead Policies

no code implementations NeurIPS 2020 Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

no code implementations 6 Sep 2019 Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, which restricts consecutive policies to be 'close' to one another, is iteratively solved.

Reinforcement Learning (RL)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state, finite-horizon MDP setting and establish that exploring with greedy policies, i.e., acting by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning · reinforcement-learning +1
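
The "1-step planning" idea can be illustrated with an optimistic, UCBVI-flavored loop in which the agent acts greedily with respect to its current value estimates and performs a single backup at each visited state, instead of replanning over the whole horizon every episode. The sketch below is an illustration of that idea under assumed bonus and initialization choices, not the paper's algorithm or its analyzed constants.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, K = 5, 3, 10, 500
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # true transitions (toy MDP)
R_true = rng.uniform(0, 1, size=(S, A))           # true mean rewards

N = np.zeros((S, A))                 # visit counts
Nsas = np.zeros((S, A, S))           # transition counts
Rsum = np.zeros((S, A))              # reward sums
V = np.full((H + 1, S), float(H))    # optimistic value estimates
V[H] = 0.0

for k in range(1, K + 1):
    s = 0
    for h in range(H):
        n = np.maximum(N[s], 1.0)
        r_hat = Rsum[s] / n
        p_hat = Nsas[s] / n[:, None]
        bonus = np.sqrt(2.0 * np.log(S * A * H * k) / n)
        q = r_hat + bonus + p_hat @ V[h + 1]
        q = np.where(N[s] == 0, float(H - h), q)  # unvisited pairs stay optimistic
        a = int(np.argmax(q))
        V[h, s] = min(float(H - h), q[a])         # single greedy backup, no full replanning
        # Act, observe, and update the empirical model.
        r = rng.binomial(1, R_true[s, a])
        s_next = rng.choice(S, p=P_true[s, a])
        N[s, a] += 1.0
        Nsas[s, a, s_next] += 1.0
        Rsum[s, a] += r
        s = s_next
```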

Exploration Conscious Reinforcement Learning Revisited

1 code implementation 13 Dec 2018 Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

reinforcement-learning · Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

no code implementations NeurIPS 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control · reinforcement-learning +1

How to Combine Tree-Search Methods in Reinforcement Learning

no code implementations 6 Sep 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

reinforcement-learning · Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

no code implementations 21 May 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control · reinforcement-learning +1
