Search Results for author: Craig Boutilier

Found 41 papers, 8 papers with code

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

no code implementations 25 May 2023 Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, MoonKyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function.

reinforcement-learning Reinforcement Learning (RL)
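The two-stage recipe the snippet describes (learn a reward model from human feedback, then optimize against it) can be sketched in toy form. Everything below is an illustrative stand-in, not DPOK itself: a linear reward model fit to synthetic pairwise preference labels with the Bradley-Terry log-loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 in miniature: learn a reward function from pairwise preferences.
# The linear reward model and synthetic labels are assumptions for illustration.
d = 4
w_true = rng.normal(size=d)   # latent "human" reward direction
w = np.zeros(d)               # learned reward-model parameters

for _ in range(2000):
    xa, xb = rng.normal(size=d), rng.normal(size=d)
    y = float(xa @ w_true > xb @ w_true)          # simulated preference label
    p_a = 1.0 / (1.0 + np.exp(-(xa - xb) @ w))    # Bradley-Terry P(a preferred)
    w -= 0.1 * (p_a - y) * (xa - xb)              # SGD on the BT log-loss

# Stage 2 would then improve the generative model against r(x) = w @ x,
# e.g. with RL, rather than reading w off directly.
```

In a real pipeline the reward model is a neural network over images and prompts; the gradient `(p_a - y) * (xa - xb)` is the exact Bradley-Terry log-loss gradient for the linear case.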

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

no code implementations 24 May 2023 Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Craig Boutilier

While popularity bias is recognized to play a role in recommender (and other ranking-based) systems, detailed analyses of its impact on user welfare have largely been lacking.

Aligning Text-to-Image Models using Human Feedback

1 code implementation 23 Feb 2023 Kimin Lee, Hao Liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu

Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.

Image Generation

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

no code implementations 21 Feb 2023 Dhawal Gupta, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier

By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM.

Dialogue Management Management +2

Reinforcement Learning with History-Dependent Dynamic Contexts

no code implementations 4 Feb 2023 Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.

reinforcement-learning Reinforcement Learning (RL)

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

no code implementations 25 Jul 2022 Deborah Cohen, MoonKyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan

Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge.

Natural Language Understanding reinforcement-learning +1

A Mixture-of-Expert Approach to RL-based Dialogue Management

no code implementations 31 May 2022 Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.

Dialogue Management Language Modelling +2

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

1 code implementation 6 Feb 2022 Christina Göpfert, Yinlam Chow, Chih-Wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Craig Boutilier

Interactive recommender systems (RSs) allow users to express intent, preferences and contexts in a rich fashion, often using natural language.

Recommendation Systems

IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

no code implementations 24 Jan 2022 Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

This problem has been studied extensively in the setting of known objective functions.

Thompson Sampling with a Mixture Prior

no code implementations 10 Jun 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making Multi-Task Learning +3
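The mixture-prior setting above admits a compact sketch: sample a mixture component from its posterior weight, then run ordinary Thompson sampling under that component. The 2-armed Bernoulli bandit, Beta components, and all constants below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setting: prior over arm means is a mixture of two Beta components.
mix_weights = np.array([0.5, 0.5])            # prior mixture weights
alphas = np.array([[1.0, 8.0], [8.0, 1.0]])   # Beta alpha, per (component, arm)
betas = np.array([[8.0, 1.0], [1.0, 8.0]])    # Beta beta, per (component, arm)
true_means = np.array([0.7, 0.3])
log_w = np.log(mix_weights)

for _ in range(500):
    # Sample a component from the posterior mixture weights, then sample
    # arm means from that component and pull the greedy arm.
    probs = np.exp(log_w - np.logaddexp.reduce(log_w))
    k = rng.choice(2, p=probs)
    theta = rng.beta(alphas[k], betas[k])
    arm = int(np.argmax(theta))
    reward = float(rng.random() < true_means[arm])
    # Exact conjugate update: reweight components by their predictive
    # likelihood of the reward, then update every component's Beta posterior.
    pred = alphas[:, arm] / (alphas[:, arm] + betas[:, arm])
    log_w += np.log(pred if reward else 1.0 - pred)
    alphas[:, arm] += reward
    betas[:, arm] += 1.0 - reward
```

Because every component is conjugate, the mixture posterior stays a mixture: only the weights and the per-component Beta parameters change.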

Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities

no code implementations 6 May 2021 Ruohan Zhan, Konstantina Christakopoulou, Ya Le, Jayden Ooi, Martin Mladenov, Alex Beutel, Craig Boutilier, Ed H. Chi, Minmin Chen

We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all providers on the platform under some mild assumptions.

Recommendation Systems

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

1 code implementation 14 Mar 2021 Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier

The development of recommender systems that optimize multi-turn interaction with users and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem has drawn increasing attention in recent years.

Probabilistic Programming Recommendation Systems

Non-Stationary Latent Bandits

no code implementations 1 Dec 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems Thompson Sampling
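A minimal sketch of that recipe, assuming two hand-fixed "prototypical" Bernoulli reward models as the offline-learned user models and a made-up sticky transition kernel as a stand-in for the paper's non-stationary latent dynamics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline-learned "prototypical" models: Bernoulli reward means per arm.
models = np.array([[0.8, 0.2],   # latent state 0: prefers arm 0
                   [0.2, 0.8]])  # latent state 1: prefers arm 1
T = np.array([[0.98, 0.02],
              [0.02, 0.98]])     # assumed latent-state transition kernel
belief = np.array([0.5, 0.5])    # online posterior over the latent state
true_state = 1

for _ in range(100):
    s = rng.choice(2, p=belief)          # Thompson-style latent-state sample
    arm = int(np.argmax(models[s]))      # act greedily under that model
    reward = float(rng.random() < models[true_state, arm])
    lik = models[:, arm] if reward else 1.0 - models[:, arm]
    belief = belief * lik                # Bayes update from the observed reward
    belief = T @ (belief / belief.sum()) # predict step for a drifting state
```

The filter concentrates on the true state while the transition step keeps a floor of probability on the alternative, so the agent can re-identify the user if the state drifts.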

Differentiable Meta-Learning of Bandit Policies

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form.
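One way to make this concrete is a score-function (REINFORCE) sketch: tune a single exploration parameter of an epsilon-greedy policy by stochastic gradient ascent on average reward over bandit instances sampled from P. This is a toy stand-in, not the paper's differentiable estimator, and every constant below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

def run_episode(eps, mus, horizon=50):
    """Epsilon-greedy on a 2-armed Bernoulli bandit; returns total reward
    and the gradient of the trajectory log-probability w.r.t. eps."""
    counts, sums = np.ones(2), np.zeros(2)
    total, dlogp = 0.0, 0.0
    for _ in range(horizon):
        greedy = int(np.argmax(sums / counts))
        arm = int(rng.integers(2)) if rng.random() < eps else greedy
        # pi(arm) = eps/2 + (1 - eps) * 1[arm == greedy]
        p = eps / 2 + (1 - eps) * (arm == greedy)
        dlogp += (0.5 - (arm == greedy)) / p   # d log pi / d eps
        r = float(rng.random() < mus[arm])
        total += r
        sums[arm] += r
        counts[arm] += 1
    return total, dlogp

# Meta-learn eps by gradient ascent on expected reward over instances from P
# (here: arm means drawn uniformly, a stand-in for the unknown distribution).
eps = 0.5
for _ in range(300):
    mus = rng.beta(1, 1, size=2)         # sample a bandit instance from P
    total, dlogp = run_episode(eps, mus)
    eps += 1e-4 * total * dlogp          # REINFORCE gradient estimate
    eps = float(np.clip(eps, 0.01, 0.99))
```

The paper's contribution is a lower-variance, differentiable version of this kind of policy-gradient meta-learning; the raw REINFORCE estimator above is noisy and shown only to fix ideas.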


Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

no code implementations ICML 2020 Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.

Fairness Recommendation Systems

Latent Bandits Revisited

no code implementations NeurIPS 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.

Recommendation Systems Thompson Sampling

Meta-Learning Bandit Policies by Gradient Ascent

no code implementations 9 Jun 2020 Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.

Meta-Learning Multi-Armed Bandits

Differentiable Bandit Exploration

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.


BRPO: Batch Residual Policy Optimization

no code implementations 8 Feb 2020 Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.

reinforcement-learning Reinforcement Learning (RL)
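The "same maximum degree at each state" constraint that BRPO relaxes can be illustrated with a per-state projection onto a total-variation ball around the behavior policy. This formalization (TV distance, linear mixing) is an assumption chosen for illustration, not the paper's method.

```python
import numpy as np

def project_to_behavior(pi_learned, pi_behavior, eps):
    """Mix pi_learned toward pi_behavior until TV(pi, pi_b) <= eps.

    Linear mixing scales the TV distance linearly, so the mixing weight
    eps / tv lands exactly on the constraint boundary.
    """
    tv = 0.5 * np.abs(pi_learned - pi_behavior).sum()
    if tv <= eps:
        return pi_learned
    lam = eps / tv
    return lam * pi_learned + (1.0 - lam) * pi_behavior

pi_b = np.array([0.5, 0.3, 0.2])     # behavior policy at some state
pi = np.array([0.9, 0.05, 0.05])     # unconstrained learned policy
pi_c = project_to_behavior(pi, pi_b, eps=0.1)
```

BRPO's point is that a single `eps` for every state is overly conservative; the paper instead learns state-dependent deviations via a residual policy.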

Gradient-based Optimization for Bayesian Preference Elicitation

no code implementations 20 Nov 2019 Ivan Vendrov, Tyler Lu, Qingqing Huang, Craig Boutilier

Effective techniques for eliciting user preferences have taken on added importance as recommender systems (RSs) become increasingly interactive and conversational.

Recommendation Systems

CAQL: Continuous Action Q-Learning

no code implementations ICLR 2020 Moonkyung Ryu, Yin-Lam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Value-based reinforcement learning (RL) methods like Q-learning have shown success in a variety of domains.

Continuous Control Q-Learning +1

RecSim: A Configurable Simulation Platform for Recommender Systems

1 code implementation 11 Sep 2019 Eugene Ie, Chih-Wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier

We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users.

Recommendation Systems reinforcement-learning +1

Randomized Exploration in Generalized Linear Bandits

no code implementations 21 Jun 2019 Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

Advantage Amplification in Slowly Evolving Latent-State Environments

no code implementations 29 May 2019 Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL).

Recommendation Systems reinforcement-learning +1

Perturbed-History Exploration in Stochastic Linear Bandits

no code implementations 21 Mar 2019 Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Our algorithm, perturbed-history exploration in a linear bandit (LinPHE), estimates a linear model from its perturbed history and pulls the arm with the highest value under that model.
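The sentence above essentially specifies one round of the algorithm, which can be sketched directly. The ridge regression, Gaussian pseudo-noise, orthogonal arm features, and all constants are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)

d = 3
arms = np.eye(d)                          # three orthogonal arm features
theta_star = np.array([1.0, 0.2, -0.5])   # unknown true linear model

# A logged interaction history: arm features and noisy observed rewards.
X = np.vstack([arms[i % d] for i in range(30)])
y = X @ theta_star + 0.1 * rng.normal(size=30)

def linphe_choose(X, y, a=0.5):
    """One LinPHE-style round: perturb the reward history with pseudo-noise,
    fit ridge regression to the perturbed history, pull the greedy arm."""
    y_pert = y + a * rng.normal(size=y.shape)   # perturbed history
    A = X.T @ X + np.eye(X.shape[1])            # ridge Gram matrix
    theta_hat = np.linalg.solve(A, X.T @ y_pert)
    return int(np.argmax(arms @ theta_hat)), theta_hat

arm, theta_hat = linphe_choose(X, y)
```

Resampling the perturbation each round randomizes the greedy choice just enough to explore, which is the mechanism behind PHE-style algorithms.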

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

no code implementations 26 Feb 2019 Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier

Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

Multi-Armed Bandits

Non-delusional Q-learning and value-iteration

no code implementations NeurIPS 2018 Tyler Lu, Dale Schuurmans, Craig Boutilier

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation.


Data center cooling using model-predictive control

1 code implementation NeurIPS 2018 Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, Mk Ryu, Greg Imwalle

Despite impressive recent advances in reinforcement learning (RL), its deployment in real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures.

reinforcement-learning Reinforcement Learning (RL)

Safe Exploration for Identifying Linear Systems via Robust Optimization

no code implementations 30 Nov 2017 Tyler Lu, Martin Zinkevich, Craig Boutilier, Binz Roy, Dale Schuurmans

Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level.

Reinforcement Learning (RL) Safe Exploration

Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000)

no code implementations 13 Apr 2013 Craig Boutilier, Moises Goldszmidt

This is the Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, which was held in San Francisco, CA, June 30 - July 3, 2000.

Context-Specific Independence in Bayesian Networks

1 code implementation 13 Feb 2013 Craig Boutilier, Nir Friedman, Moises Goldszmidt, Daphne Koller

Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution.

Structured Reachability Analysis for Markov Decision Processes

no code implementations 30 Jan 2013 Craig Boutilier, Ronen I. Brafman, Christopher W. Geib

Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs.

Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets

no code implementations NeurIPS 2010 Paolo Viappiani, Craig Boutilier

We show that, under very general assumptions, the optimal choice query w.r.t. EVOI coincides with the optimal recommendation set, that is, a set maximizing expected utility of the user selection.
