Search Results for author: Guy Tennenholtz

Found 28 papers, 4 papers with code

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

no code implementations · 18 Dec 2024 · Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust

Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs).

HumanEval · Imitation Learning · +2
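The Best-of-N sampling scheme named in the title can be sketched in a few lines: draw N candidate responses and keep the one a verifier scores highest. The `generate` and `reward` functions below are hypothetical stand-ins, not the paper's models:

```python
# Best-of-N sampling sketch. `generate` and `reward` are illustrative
# stand-ins for an LLM sampling call and a learned reward model/verifier.
import random

def generate(prompt: str, seed: int) -> str:
    # Stand-in for sampling one candidate response from an LLM.
    random.seed(seed)
    return f"{prompt}-candidate-{random.randint(0, 999)}"

def reward(response: str) -> float:
    # Stand-in for scoring a response with a verifier / reward model.
    return sum(ord(c) for c in response) % 100

def best_of_n(prompt: str, n: int = 8) -> str:
    # Draw n candidates and return the highest-scoring one.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=reward)

print(best_of_n("2+2="))
```

The paper's point is that fine-tuning can be made aware of this selection step; the sketch only shows the inference-time procedure itself.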

Personalized and Sequential Text-to-Image Generation

no code implementations · 10 Dec 2024 · Ofir Nabati, Guy Tennenholtz, Chih-Wei Hsu, MoonKyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier

We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions.

Language Modeling · +2

Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators

no code implementations · 30 Jun 2024 · Ori Linial, Guy Tennenholtz, Uri Shalit

In many reinforcement learning (RL) applications one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples.

Autonomous Vehicles · Offline RL · +4

Embedding-Aligned Language Models

no code implementations · 24 May 2024 · Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier

Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w.r.t.

Reinforcement Learning (RL) · Text Generation

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

no code implementations · 25 Feb 2024 · Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier

We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.

Continuous Control · +1

Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation

1 code implementation · 29 Oct 2023 · Li Ding, Masrour Zoghi, Guy Tennenholtz, Maryam Karimzadehgan

We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol.

Diversity · Evolutionary Algorithms · +3

Factual and Personalized Recommendations using Language Models and Reinforcement Learning

no code implementations · 9 Oct 2023 · Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.

Language Modeling · +3

Demystifying Embedding Spaces using Large Language Models

no code implementations · 6 Oct 2023 · Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format.

Dimensionality Reduction · Recommendation Systems

Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models

no code implementations · 8 Sep 2023 · Craig Boutilier, Martin Mladenov, Guy Tennenholtz

Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors.

Recommendation Systems

Bayesian Regret Minimization in Offline Bandits

no code implementations · 2 Jun 2023 · Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh

We study how to make decisions that minimize Bayesian regret in offline linear bandits.

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

no code implementations · 1 Jun 2023 · Alizée Pace, Hugo Yèche, Bernhard Schölkopf, Gunnar Rätsch, Guy Tennenholtz

A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes.

Management · Offline RL · +3

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

no code implementations · 24 May 2023 · Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Robert L. Axtell, Craig Boutilier

We highlight the importance of exploration, not to eliminate popularity bias, but to mitigate its negative impact on welfare.

Reinforcement Learning with History-Dependent Dynamic Contexts

no code implementations · 4 Feb 2023 · Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.

Reinforcement Learning · +1

Reinforcement Learning with a Terminator

1 code implementation · 30 May 2022 · Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving · Reinforcement Learning · +2

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

no code implementations · ICLR 2022 · Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning · Recommendation Systems · +2

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

no code implementations · 22 Sep 2021 · Roy Zohar, Shie Mannor, Guy Tennenholtz

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.

Multi-agent Reinforcement Learning · Reinforcement Learning · +1

Maximum Entropy Reinforcement Learning with Mixture Policies

no code implementations · 18 Mar 2021 · Nir Baram, Guy Tennenholtz, Shie Mannor

However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.

Continuous Control · +3

Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning

no code implementations · 22 Feb 2021 · Guy Tennenholtz, Shie Mannor

In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent-space-based metric.

Autonomous Driving · Continuous Control · +6

Action Redundancy in Reinforcement Learning

no code implementations · 22 Feb 2021 · Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.

MuJoCo · Reinforcement Learning · +2
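A small identity underlying the MaxEnt objective mentioned in this entry may be worth spelling out: the soft (log-sum-exp) value of a state equals the expected Q-value plus the entropy of the corresponding softmax policy. A minimal numerical check, with illustrative names (`soft_value`, `q`) that are not taken from the paper:

```python
import numpy as np

def soft_value(q, temperature=1.0):
    # Soft value: temperature-scaled log-sum-exp of the Q-values.
    return temperature * np.log(np.sum(np.exp(q / temperature)))

q = np.array([1.0, 2.0, 0.5])
pi = np.exp(q) / np.exp(q).sum()        # softmax policy at temperature 1
entropy = -(pi * np.log(pi)).sum()
# Identity: log-sum-exp(q) == E_pi[q] + H(pi) for the softmax policy.
assert np.isclose(soft_value(q), (pi * q).sum() + entropy)
```

This is why MaxEnt methods can trade off return and entropy through a single soft value; the action-redundancy question the paper studies sits on top of this standard formulation.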

Bandits with Partially Observable Confounded Data

no code implementations · 11 Jun 2020 · Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits
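For readers unfamiliar with linear bandits of the kind this entry builds on, a standard LinUCB-style loop (not the paper's specific algorithm) keeps a ridge-regression estimate of the unknown reward vector and pulls the arm with the highest optimistic score:

```python
# Generic LinUCB sketch on fixed arm features; illustrative only,
# not the confounded-data algorithm from the paper.
import numpy as np

def linucb(contexts, true_theta, rounds=500, alpha=1.0, noise=0.1, seed=0):
    """Run LinUCB and return the average observed reward."""
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    A = np.eye(d)              # regularized Gram matrix
    b = np.zeros(d)            # accumulated reward-weighted features
    total = 0.0
    for _ in range(rounds):
        theta_hat = np.linalg.solve(A, b)      # ridge estimate of theta
        A_inv = np.linalg.inv(A)
        # Optimistic score: estimated reward + exploration bonus.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        x = contexts[np.argmax(contexts @ theta_hat + alpha * bonus)]
        r = x @ true_theta + noise * rng.standard_normal()
        A += np.outer(x, x)
        b += r * x
        total += r
    return total / rounds

arms = np.eye(3)               # three orthogonal arms
print(linucb(arms, true_theta=np.array([0.1, 0.9, 0.2])))
```

The paper's contribution is in how partially observable, confounded logged data is projected into such a linear structure; the loop above only shows the base algorithm that the projected information feeds into.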

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning

no code implementations · 2 Oct 2019 · Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.

Continuous Control · +3

Off-Policy Evaluation in Partially Observable Environments

no code implementations · 9 Sep 2019 · Guy Tennenholtz, Shie Mannor, Uri Shalit

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.

Off-Policy Evaluation · Reinforcement Learning

Distributional Policy Optimization: An Alternative Approach for Continuous Control

3 code implementations · NeurIPS 2019 · Chen Tessler, Guy Tennenholtz, Shie Mannor

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

Continuous Control · +1

The Natural Language of Actions

1 code implementation · 4 Feb 2019 · Guy Tennenholtz, Shie Mannor

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.

Reinforcement Learning · +3
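The idea behind context-based action representations, as in this entry, is that actions occurring in similar contexts should get similar embeddings. Act2Vec itself uses skip-gram-style training; the sketch below is a simpler count-based proxy (co-occurrence matrix plus truncated SVD), offered only to illustrate the principle:

```python
# Illustrative proxy for context-based action embeddings: build an action
# co-occurrence matrix from trajectories, then factorize it with SVD.
# (Act2Vec trains skip-gram embeddings; this count+SVD stand-in is simpler.)
import numpy as np

def action_embeddings(trajectories, n_actions, window=1, dim=2):
    cooc = np.zeros((n_actions, n_actions))
    for traj in trajectories:
        for i, a in enumerate(traj):
            lo, hi = max(0, i - window), min(len(traj), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[a, traj[j]] += 1
    # Log-scaled counts, then truncated SVD for low-dimensional embeddings.
    u, s, _ = np.linalg.svd(np.log1p(cooc))
    return u[:, :dim] * s[:dim]

# Actions 0 and 1 always co-occur; action 2 only appears with itself.
trajs = [[0, 1, 0, 1], [1, 0, 1, 0], [2, 2, 2]]
emb = action_embeddings(trajs, n_actions=3)
print(emb.shape)  # (3, 2)
```

Actions that share contexts (0 and 1 here) end up with large coordinates in the leading SVD directions, while the isolated action 2 does not.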

Train on Validation: Squeezing the Data Lemon

no code implementations · 16 Feb 2018 · Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define the notion of on-average-validation-stable algorithms as those for which using small portions of validation data for training does not overfit the model selection process.

Model Selection

The Stochastic Firefighter Problem

no code implementations · 22 Nov 2017 · Guy Tennenholtz, Constantine Caramanis, Shie Mannor

We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.
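The neighbor-vaccination policy described above can be sketched deterministically: each round, vaccinate up to `budget` unprotected neighbors of infected nodes, then the infection spreads to whatever neighbors remain. The paper's setting is stochastic; this fixed-spread sketch only illustrates the shape of the policy:

```python
# Deterministic sketch of the "vaccinate neighbors of infected nodes" policy.
# `adj` maps each node to its neighbor list; returns the final infected set.
def simulate(adj, source, budget):
    infected, vaccinated = {source}, set()
    while True:
        frontier = {v for u in infected for v in adj[u]} - infected - vaccinated
        if not frontier:
            return infected
        # Spend the budget on (arbitrarily ordered) frontier nodes first...
        vaccinated |= set(sorted(frontier)[:budget])
        # ...then the infection claims the unvaccinated remainder.
        infected |= frontier - vaccinated

# A path graph 0-1-2-3-4 with the infection starting at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(simulate(path, source=0, budget=1))  # {0}: the single neighbor is vaccinated
```

On a path, a budget of one always suffices to contain the spread, which matches the intuition that the policy is strong on trees with enough budget.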
