no code implementations • 18 Dec 2024 • Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust
Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs).
no code implementations • 10 Dec 2024 • Ofir Nabati, Guy Tennenholtz, Chih-Wei Hsu, MoonKyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier
We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions.
no code implementations • 24 May 2024 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier
Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w.r.t.
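The snippet above is truncated, but the loop it describes (iteratively nudging generation so that its embedding scores better under a given criterion) can be sketched abstractly. In the hypothetical stand-in below, "generations" are plain embedding vectors and the trained RL agent is replaced by greedy local search; the utility and all names are assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)  # hypothetical "optimal" region of embedding space

def utility(embedding):
    """Criterion over the latent embedding space (higher is better)."""
    return -np.linalg.norm(embedding - target)

# Stand-in for LLM generation: each candidate "generation" is an embedding,
# produced by locally perturbing the current one (a surrogate for prompt edits).
state = rng.normal(size=8)
for _ in range(50):
    candidates = state + 0.1 * rng.normal(size=(16, 8))
    best = max(candidates, key=utility)  # greedy steering step
    if utility(best) > utility(state):
        state = best
```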
no code implementations • 25 Feb 2024 • Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.
no code implementations • 22 Oct 2023 • Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier
Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item- and attribute-based preference elicitation.
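As a rough illustration of the concept-activation-vector idea (a linear direction in embedding space separating items that exhibit an attribute from those that do not, yielding a soft degree score per item), here is a minimal sketch; the embeddings, labels, and function names are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(embeddings, labels):
    """Fit a linear probe separating items with/without a soft attribute;
    the normalized weight vector acts as the concept activation vector (CAV)."""
    probe = LogisticRegression().fit(embeddings, labels)
    cav = probe.coef_[0]
    return cav / np.linalg.norm(cav)

# Hypothetical data: 16-d item embeddings with binary soft-attribute labels.
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 16))
labels = (items[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

cav = concept_activation_vector(items, labels)
attribute_degree = items @ cav  # how strongly each item expresses the attribute
```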
no code implementations • 9 Oct 2023 • Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences.
no code implementations • 6 Oct 2023 • Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format.
no code implementations • 25 Jul 2022 • Deborah Cohen, MoonKyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan
Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge.
no code implementations • 31 May 2022 • Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.
2 code implementations • 10 May 2022 • Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
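One standard risk measure is conditional value-at-risk (CVaR), the mean of the worst alpha-fraction of returns; optimizing it rather than the plain mean makes the policy risk-averse. A minimal sketch over hypothetical episode returns, illustrating the objective rather than the paper's training algorithm:

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Conditional value-at-risk: the mean of the worst alpha-fraction of returns."""
    returns = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Hypothetical episode returns collected from some policy.
returns = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=1000)
print("mean return :", returns.mean())
print("CVaR(5%)    :", cvar(returns, alpha=0.05))  # tail objective for risk-averse RL
```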
no code implementations • 10 Feb 2022 • Dylan Slack, Yinlam Chow, Bo Dai, Nevan Wichers
However, we identify that these techniques are not well equipped for safe policy learning because they ignore negative experiences (e.g., unsafe or unsuccessful ones) and focus only on positive experiences, which harms their ability to generalize to new tasks safely.
2 code implementations • 6 Feb 2022 • Christina Göpfert, Alex Haig, Yinlam Chow, Chih-Wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier
Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings).
no code implementations • 29 Sep 2021 • Dylan Z Slack, Yinlam Chow, Bo Dai, Nevan Wichers
Though many reinforcement learning (RL) problems involve learning policies in settings where safety constraints are difficult to specify and rewards are sparse, current methods struggle to acquire successful policies rapidly and safely.
no code implementations • 1 Dec 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier
The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from their interactions with the models.
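A minimal sketch of that recipe under strong simplifying assumptions: the offline "prototypical models" are reduced to a per-type table of mean arm rewards, and online inference is a Bayesian belief update combined with Thompson sampling over user types. All quantities are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prototypical models learned offline:
# expected reward of each arm under each latent user type.
proto_means = np.array([[0.8, 0.2, 0.4],   # latent type 0
                        [0.1, 0.9, 0.3]])  # latent type 1
noise_std = 0.1
belief = np.array([0.5, 0.5])  # prior over the latent user type
true_type = 1                  # unknown to the agent

for t in range(20):
    sampled_type = rng.choice(2, p=belief)           # Thompson sampling over types
    arm = int(np.argmax(proto_means[sampled_type]))  # best arm for the sampled type
    reward = proto_means[true_type, arm] + rng.normal(0.0, noise_std)
    # Bayesian update of the latent-state belief from the observed reward.
    lik = np.exp(-0.5 * ((reward - proto_means[:, arm]) / noise_std) ** 2)
    belief = belief * lik
    belief /= belief.sum()
```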
no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.
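For intuition about what such a confidence interval looks like, here is a generic percentile-bootstrap sketch over hypothetical per-episode value estimates. This is an illustration only: the paper derives its intervals differently, without requiring knowledge of the behavior policies.

```python
import numpy as np

def bootstrap_value_ci(per_episode_estimates, n_boot=2000, conf=0.90, seed=0):
    """Percentile-bootstrap confidence interval over per-episode value estimates
    (generic illustration, not the paper's behavior-agnostic construction)."""
    rng = np.random.default_rng(seed)
    est = np.asarray(per_episode_estimates)
    means = np.array([rng.choice(est, size=est.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(means, [100 * (1 - conf) / 2, 100 * (1 + conf) / 2])
    return lo, hi

# Hypothetical per-episode estimates of the target policy's value.
estimates = np.random.default_rng(1).normal(loc=5.0, scale=1.5, size=500)
print(bootstrap_value_ci(estimates))
```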
no code implementations • NeurIPS 2021 • Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan
We then develop an agent with a modular architecture that can interpret and adhere to such textual constraints while learning new tasks.
no code implementations • 25 Sep 2019 • Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
We study continuous-action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence.
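One common device in this setting is a safety layer that projects each proposed continuous action onto a linearized safety constraint before execution. A minimal sketch of such a projection, illustrative of the general idea rather than the paper's exact construction:

```python
import numpy as np

def project_to_safe(action, g, c):
    """Project a proposed action onto the half-space {a : g @ a + c <= 0},
    a linearized safety constraint. Safe proposals pass through unchanged;
    unsafe ones are replaced by the closest (L2) safe action."""
    violation = g @ action + c
    if violation <= 0:
        return action
    return action - (violation / (g @ g)) * g

a = np.array([1.0, 2.0])  # proposed (unsafe) action
g = np.array([1.0, 1.0])  # hypothetical constraint gradient
c = -1.0                  # constraint: a[0] + a[1] <= 1
print(project_to_safe(a, g, c))  # -> [0. 1.]
```

Because the projection is a closed-form half-space projection, it adds negligible cost per step while keeping every executed action inside the (linearized) safe set.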
no code implementations • 27 Sep 2018 • Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, Edgar Guzman-Duenez
In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to certain undesirable situations.