no code implementations • 18 Dec 2024 • Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust
Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs).
no code implementations • 10 Dec 2024 • Ofir Nabati, Guy Tennenholtz, Chih-Wei Hsu, MoonKyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier
We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions.
no code implementations • 30 Jun 2024 • Ori Linial, Guy Tennenholtz, Uri Shalit
In many reinforcement learning (RL) applications, one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples.
no code implementations • 24 May 2024 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier
Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space.
no code implementations • 25 Feb 2024 • Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.
1 code implementation • 29 Oct 2023 • Li Ding, Masrour Zoghi, Guy Tennenholtz, Maryam Karimzadehgan
We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol.
no code implementations • 9 Oct 2023 • Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.
no code implementations • 6 Oct 2023 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format.
no code implementations • 8 Sep 2023 • Craig Boutilier, Martin Mladenov, Guy Tennenholtz
Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors.
no code implementations • 2 Jun 2023 • Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh
We study how to make decisions that minimize Bayesian regret in offline linear bandits.
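For reference, the Bayesian regret minimized here can be written in standard (linear-bandit) notation; the symbols below are the conventional ones, not necessarily the paper's. For a prior $\rho$ over the unknown parameter $\theta$, action set $\mathcal{A}$, and an action $\hat{a}$ chosen from the offline data:

```latex
\mathrm{BayesRegret}(\hat{a})
\;=\;
\mathbb{E}_{\theta \sim \rho}\!\left[
  \max_{a \in \mathcal{A}} \langle \theta, a \rangle
  \;-\;
  \langle \theta, \hat{a} \rangle
\right]
```

That is, regret is measured against the best action in hindsight, averaged over the prior rather than for a single worst-case $\theta$.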
no code implementations • 1 Jun 2023 • Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz
A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes.
no code implementations • 31 May 2023 • Ofir Nabati, Guy Tennenholtz, Shie Mannor
We present a representation-driven framework for reinforcement learning.
no code implementations • 24 May 2023 • Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Robert L. Axtell, Craig Boutilier
We highlight the importance of exploration, not to eliminate popularity bias, but to mitigate its negative impact on welfare.
no code implementations • 4 Feb 2023 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.
1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit
We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.
no code implementations • 22 Sep 2021 • Roy Zohar, Shie Mannor, Guy Tennenholtz
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
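The scalability claim above is easy to make concrete: with $n$ agents each choosing from $|A|$ individual actions, the joint action space has $|A|^n$ elements. A minimal illustration (not from the paper):

```python
def joint_action_space_size(num_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions when each agent picks one of its own actions."""
    return actions_per_agent ** num_agents

# Even a modest setting is already infeasible to enumerate:
print(joint_action_space_size(10, 5))  # 9765625 joint actions
```

Methods that exploit structure across agents avoid ever enumerating this joint space.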
no code implementations • 18 Mar 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor
However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.
no code implementations • 22 Feb 2021 • Guy Tennenholtz, Shie Mannor
In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent space based metric.
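As one hedged illustration of the nonparametric half of such a combination, a common proxy is the distance from a query point to the nearest training point in latent space; the function below is an illustrative stand-in, not the paper's actual metric.

```python
import numpy as np

def latent_uncertainty(z: np.ndarray, train_latents: np.ndarray) -> float:
    """Nonparametric uncertainty proxy: distance from a query latent z to the
    nearest training latent. Points far from the data manifold score high."""
    return float(np.min(np.linalg.norm(train_latents - z, axis=1)))
```

A parametric component (e.g., a model's predictive variance) would typically be combined with such a distance-based score.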
no code implementations • 22 Feb 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor
Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.
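The MaxEnt objective referred to here is standard: return is augmented with the policy's entropy, weighted by a temperature $\alpha$:

```latex
J(\pi)
\;=\;
\mathbb{E}_{\pi}\!\left[
  \sum_{t} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)
\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big)
= -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]
```

The difficulty noted in the snippet is that for a mixture policy, $\log \pi(a \mid s)$ involves the log of a sum over components, which does not decompose per component.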
no code implementations • 11 Jun 2020 • Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni
We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.
no code implementations • 2 Oct 2019 • Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler
In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.
no code implementations • 2 Oct 2019 • Erez Schwartz, Guy Tennenholtz, Chen Tessler, Shie Mannor
Recent advances in reinforcement learning have shown its potential to tackle complex real-life tasks.
no code implementations • 9 Sep 2019 • Guy Tennenholtz, Shie Mannor, Uri Shalit
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.
3 code implementations • NeurIPS 2019 • Chen Tessler, Guy Tennenholtz, Shie Mannor
We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.
1 code implementation • 4 Feb 2019 • Guy Tennenholtz, Shie Mannor
We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.
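Act2Vec learns action embeddings from the contexts in which actions occur, in the spirit of word embeddings. As a minimal distributional sketch (the paper trains skip-gram-style embeddings; the co-occurrence-plus-SVD variant below is an illustrative stand-in, with made-up function names):

```python
import numpy as np

def cooccurrence(trajectories, vocab, window=2):
    """Count how often each pair of actions co-occurs within a context window."""
    idx = {a: i for i, a in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for traj in trajectories:
        for t, a in enumerate(traj):
            lo, hi = max(0, t - window), min(len(traj), t + window + 1)
            for u in range(lo, hi):
                if u != t:
                    C[idx[a], idx[traj[u]]] += 1.0
    return C

def embed(C, dim=2):
    """Embed actions via truncated SVD of the log-smoothed co-occurrence matrix."""
    U, S, _ = np.linalg.svd(np.log1p(C))
    return U[:, :dim] * S[:dim]
```

Actions that appear in similar contexts end up with nearby embeddings, which can then serve as input features for an RL agent.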
no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor
We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.
no code implementations • 22 Nov 2017 • Guy Tennenholtz, Constantine Caramanis, Shie Mannor
We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.
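The policy described above is simple enough to sketch directly: at each step, spend the budget vaccinating susceptible neighbors of currently infected nodes. The graph encoding and tie-breaking rule below are illustrative choices, not the paper's.

```python
def vaccinate_neighbors(graph, infected, vaccinated, budget):
    """Select up to `budget` susceptible neighbors of infected nodes.

    graph: dict mapping each node to the set of its neighbors.
    infected, vaccinated: sets of node ids.
    """
    candidates = set()
    for v in infected:
        candidates |= graph[v]
    candidates -= infected | vaccinated  # only susceptible nodes
    # Deterministic order for reproducibility; the policy only specifies
    # *which set* to draw from, not how ties are broken.
    return sorted(candidates)[:budget]
```

On a star graph with an infected center, for example, the policy vaccinates leaves until the budget is exhausted.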