no code implementations • 26 Sep 2024 • Chih-Wei Hsu, Martin Mladenov, Ofer Meshi, James Pine, Hubert Pham, Shane Li, Xujian Liang, Anton Polishko, Li Yang, Ben Scheetz, Craig Boutilier
In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments.
no code implementations • 24 May 2024 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier
Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w. r. t.
no code implementations • 25 Feb 2024 • Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.
no code implementations • 22 Oct 2023 • Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier
Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation.
no code implementations • 9 Oct 2023 • Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.
no code implementations • 6 Oct 2023 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format.
2 code implementations • 6 Feb 2022 • Christina Göpfert, Alex Haig, Yinlam Chow, Chih-Wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier
Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e. g., clicks, item consumption, ratings).
1 code implementation • 14 Mar 2021 • Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier
The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e. g., users, content providers, vendors) in the recommender ecosystem have drawn increasing attention in recent years.
no code implementations • 11 Feb 2021 • Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari
Efficient exploration in bandits is a fundamental online learning problem.
no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form.
no code implementations • 9 Jun 2020 • Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier
Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.
no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.
1 code implementation • 11 Sep 2019 • Eugene Ie, Chih-Wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier
We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users.
no code implementations • 4 Apr 2019 • Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari
In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution.