Search Results for author: Craig Boutilier

Found 47 papers, 9 papers with code

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

no code implementations 25 Feb 2024 Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier

We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.

Continuous Control, Meta Reinforcement Learning

Preference Elicitation with Soft Attributes in Interactive Recommendation

no code implementations 22 Oct 2023 Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation.

Attribute, Recommendation Systems

Factual and Personalized Recommendations using Language Models and Reinforcement Learning

no code implementations 9 Oct 2023 Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.

Language Modelling, Recommendation Systems +1

Demystifying Embedding Spaces using Large Language Models

no code implementations 6 Oct 2023 Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format.

Dimensionality Reduction, Recommendation Systems

Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models

no code implementations 8 Sep 2023 Craig Boutilier, Martin Mladenov, Guy Tennenholtz

Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors.

Recommendation Systems

Content Prompting: Modeling Content Provider Dynamics to Improve User Welfare in Recommender Ecosystems

no code implementations 2 Sep 2023 Siddharth Prasad, Martin Mladenov, Craig Boutilier

A prompting policy is a sequence of such prompts that is responsive to the dynamics of a provider's beliefs, skills and incentives.

Recommendation Systems

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

2 code implementations 25 May 2023 Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, MoonKyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward.

reinforcement-learning, Reinforcement Learning (RL)

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

no code implementations 24 May 2023 Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Robert L. Axtell, Craig Boutilier

We highlight the importance of exploration, not to eliminate popularity bias, but to mitigate its negative impact on welfare.

Aligning Text-to-Image Models using Human Feedback

no code implementations 23 Feb 2023 Kimin Lee, Hao Liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu

Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.

Image Generation

Reinforcement Learning with History-Dependent Dynamic Contexts

no code implementations 4 Feb 2023 Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.

reinforcement-learning, Reinforcement Learning (RL)

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

no code implementations 25 Jul 2022 Deborah Cohen, MoonKyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan

Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge.

Natural Language Understanding, reinforcement-learning +1

A Mixture-of-Expert Approach to RL-based Dialogue Management

no code implementations 31 May 2022 Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.

Attribute, Dialogue Management +3

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

2 code implementations 6 Feb 2022 Christina Göpfert, Alex Haig, Yinlam Chow, Chih-Wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier

Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings).

Recommendation Systems

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations 24 Jan 2022 Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Thompson Sampling with a Mixture Prior

no code implementations 10 Jun 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making, Multi-Task Learning +3

Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities

no code implementations 6 May 2021 Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen

We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all providers on the platform under some mild assumptions.

counterfactual, Recommendation Systems

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

1 code implementation 14 Mar 2021 Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier

The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem, has drawn increasing attention in recent years.

counterfactual, Probabilistic Programming +1

Differentiable Meta-Learning of Bandit Policies

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form.

Meta-Learning

Non-Stationary Latent Bandits

no code implementations 1 Dec 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems, Thompson Sampling

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

no code implementations ICML 2020 Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.

Fairness, Recommendation Systems

Latent Bandits Revisited

no code implementations NeurIPS 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.

Recommendation Systems, Thompson Sampling

Meta-Learning Bandit Policies by Gradient Ascent

no code implementations 9 Jun 2020 Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.

Meta-Learning, Multi-Armed Bandits

Differentiable Bandit Exploration

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.

Meta-Learning

BRPO: Batch Residual Policy Optimization

no code implementations 8 Feb 2020 Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.

reinforcement-learning, Reinforcement Learning (RL)

Gradient-based Optimization for Bayesian Preference Elicitation

no code implementations 20 Nov 2019 Ivan Vendrov, Tyler Lu, Qingqing Huang, Craig Boutilier

Effective techniques for eliciting user preferences have taken on added importance as recommender systems (RSs) become increasingly interactive and conversational.

Recommendation Systems

CAQL: Continuous Action Q-Learning

no code implementations ICLR 2020 Moonkyung Ryu, Yin-Lam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Value-based reinforcement learning (RL) methods like Q-learning have shown success in a variety of domains.

Continuous Control, Q-Learning +1

RecSim: A Configurable Simulation Platform for Recommender Systems

1 code implementation 11 Sep 2019 Eugene Ie, Chih-Wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier

We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users.

Recommendation Systems, reinforcement-learning +1

Randomized Exploration in Generalized Linear Bandits

no code implementations 21 Jun 2019 Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

Advantage Amplification in Slowly Evolving Latent-State Environments

no code implementations 29 May 2019 Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL).

Recommendation Systems, reinforcement-learning +1

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

no code implementations 26 Feb 2019 Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

Multi-Armed Bandits

Data center cooling using model-predictive control

1 code implementation NeurIPS 2018 Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, Mk Ryu, Greg Imwalle

Despite impressive recent advances in reinforcement learning (RL), its deployment in real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures.

Model Predictive Control, reinforcement-learning +1

Non-delusional Q-learning and value-iteration

no code implementations NeurIPS 2018 Tyler Lu, Dale Schuurmans, Craig Boutilier

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation.

Q-Learning

Safe Exploration for Identifying Linear Systems via Robust Optimization

no code implementations 30 Nov 2017 Tyler Lu, Martin Zinkevich, Craig Boutilier, Binz Roy, Dale Schuurmans

Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level.

Reinforcement Learning (RL), Safe Exploration

Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000)

no code implementations 13 Apr 2013 Craig Boutilier, Moises Goldszmidt

This is the Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, which was held in San Francisco, CA, June 30 - July 3, 2000.

Context-Specific Independence in Bayesian Networks

1 code implementation 13 Feb 2013 Craig Boutilier, Nir Friedman, Moises Goldszmidt, Daphne Koller

Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution.

Structured Reachability Analysis for Markov Decision Processes

no code implementations 30 Jan 2013 Craig Boutilier, Ronen I. Brafman, Christopher W. Geib

Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs.

Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets

no code implementations NeurIPS 2010 Paolo Viappiani, Craig Boutilier

We show that, under very general assumptions, the optimal choice query w.r.t. EVOI coincides with the optimal recommendation set, that is, a set maximizing the expected utility of the user selection.
