no code implementations • 23 Sep 2024 • Kaushal Paneri, Michael Munje, Kailash Singh Maurya, Adith Swaminathan, Yifan Shi
The growing scale of recommender systems requires extensive tuning to respond to market dynamics and system changes.
1 code implementation • 14 Aug 2024 • Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng
We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems.
1 code implementation • 23 Jun 2024 • Ching-An Cheng, Allen Nie, Adith Swaminathan
We investigate end-to-end generative optimization -- using generative models such as LLMs within the optimizer for automatic updating of general computational workflows.
no code implementations • 1 Jun 2024 • Christine Herlihy, Jennifer Neville, Tobias Schnabel, Adith Swaminathan
We explore the use of Large Language Model (LLM)-based chatbots to power recommender systems.
1 code implementation • 26 May 2024 • Allen Nie, Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
We study the potential of using large language models (LLMs) as an interactive optimizer for solving maximization problems in a text space using natural language and numerical feedback.
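A minimal sketch of the interaction loop this setting implies, with a hypothetical `ask_llm` stub standing in for a real LLM call (the paper's actual prompts and API may differ):

```python
# Sketch of an LLM-as-optimizer loop: the "LLM" proposes a candidate,
# receives numerical feedback on it, and proposes again.
# `ask_llm` is a hypothetical stub, not a real LLM API call.
import random

def ask_llm(history):
    """Stand-in for an LLM call: propose the next candidate x given the
    (candidate, score) history. Here: random local search around the best."""
    if not history:
        return 0.0
    best_x, _ = max(history, key=lambda h: h[1])
    return best_x + random.uniform(-1.0, 1.0)

def objective(x):            # black-box function to maximize
    return -(x - 3.0) ** 2   # optimum at x = 3

history = []
for step in range(50):
    x = ask_llm(history)       # "LLM" proposes a candidate in text space
    score = objective(x)       # numerical feedback on the proposal
    history.append((x, score)) # feedback is appended to the prompt/history

print("best found:", max(history, key=lambda h: h[1]))
```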
no code implementations • 2 Mar 2024 • Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, Zhou Li
Large language models (LLMs) have demonstrated impressive results on natural language tasks, and security researchers are beginning to employ them in both offensive and defensive systems.
1 code implementation • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.
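A toy agent-environment loop in the spirit of such a benchmark, where the environment returns natural-language feedback instead of a scalar reward; the interface names here are illustrative, not LLF-Bench's actual API:

```python
# Hypothetical language-feedback environment: the instruction and the
# feedback are sentences, and the agent must learn from them interactively.
class EchoEnv:
    """Toy task: the instruction asks the agent to say a target word."""
    def __init__(self, target="hello"):
        self.target = target

    def reset(self):
        return f"Please respond with the word '{self.target}'."

    def step(self, action):
        done = action.strip().lower() == self.target
        if done:
            feedback = "Correct, well done."
        else:
            feedback = f"Not quite; you said '{action}'. Try the word in the instruction."
        return feedback, done

env = EchoEnv()
instruction = env.reset()
action = "hi"          # a real agent would be an LLM conditioned on the feedback
for _ in range(3):
    feedback, done = env.step(action)
    print(feedback)
    if done:
        break
    action = "hello"   # the agent revises its action from the feedback
```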
no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng
A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.
1 code implementation • 13 Jul 2022 • Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker.
no code implementations • NeurIPS 2021 • Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
On the theoretical side, we characterize properties of a good heuristic and its impact on RL acceleration.
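One standard way a heuristic value estimate h(s) can accelerate RL is potential-based reward shaping, which preserves optimal policies; the paper's reshaping scheme differs in its details, so this is background rather than their exact method:

```python
# Potential-based shaping: r'(s, a, s') = r + gamma * h(s') - h(s).
# A good heuristic h makes progress toward the goal locally rewarding.
def shaped_reward(r, s, s_next, h, gamma=0.99):
    return r + gamma * h(s_next) - h(s)

# Toy 1-D chain: states 0..10, goal at 10, heuristic = negative distance to goal.
h = lambda s: -(10 - s)
print(shaped_reward(0.0, 3, 4, h))   # moving toward the goal: positive shaping
print(shaped_reward(0.0, 4, 3, h))   # moving away: negative shaping
```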
no code implementations • 1 Jun 2021 • Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan
Targeting immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric.
no code implementations • NeurIPS 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
1 code implementation • 16 Jul 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
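A minimal tabular sketch of one pessimism idea from this line of work: during batch Q-iteration, restrict the backup to actions with enough support in the logged data instead of maximizing over all actions. This is a simplified illustration, not the paper's exact algorithm:

```python
# Pessimistic batch Q-iteration on logged transitions (s, a, r, s_next):
# the max in the backup only ranges over actions seen often enough.
import collections

gamma, n_states, n_actions, min_count = 0.9, 4, 2, 2
batch = [(0, 0, 0.0, 1), (0, 0, 0.0, 1), (1, 1, 1.0, 2),
         (1, 1, 1.0, 2), (2, 0, 0.0, 3), (2, 0, 0.0, 3)]

counts = collections.Counter((s, a) for s, a, _, _ in batch)
supported = lambda s: [a for a in range(n_actions) if counts[(s, a)] >= min_count]

Q = [[0.0] * n_actions for _ in range(n_states)]
for _ in range(100):
    for s, a, r, s_next in batch:
        acts = supported(s_next)                        # pessimistic action set
        v_next = max((Q[s_next][b] for b in acts), default=0.0)
        Q[s][a] = r + gamma * v_next

print(Q[0][0], Q[1][1])  # value propagates only through well-supported actions
```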
1 code implementation • 26 Apr 2020 • Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang
Robustness against image perturbations bounded by an $\ell_p$ ball has been well studied in the recent literature.
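The standard threat model referenced here constrains perturbations $\delta$ to $\|\delta\|_\infty \le \epsilon$. A minimal PGD-style attack on a linear scorer (which has an analytic gradient), purely to illustrate the $\ell_p$-ball setting the paper moves beyond:

```python
# Projected gradient ascent on a linear score w.x within an l_inf ball.
import numpy as np

def pgd_linf(x, w, eps=0.1, step=0.02, iters=20):
    """Maximize the score w.x over perturbations with ||delta||_inf <= eps."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        delta += step * np.sign(w)           # ascent direction for a linear model
        delta = np.clip(delta, -eps, eps)    # project back onto the ball
    return x + delta

x = np.array([0.5, -0.2, 0.1])
w = np.array([1.0, -1.0, 2.0])
x_adv = pgd_linf(x, w)
print(w @ x, w @ x_adv)   # the perturbed score is strictly higher
```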
no code implementations • ICML 2020 • Ricky Loynd, Roland Fernandez, Asli Celikyilmaz, Adith Swaminathan, Matthew Hausknecht
Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences.
2 code implementations • ICML 2020 • Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht
We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously.
no code implementations • 12 May 2019 • Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.
no code implementations • 17 Apr 2019 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.
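As background for the setting, a sketch of the basic importance-weighted policy-gradient estimate from off-policy data (behavior policy mu, target policy pi); the paper develops a more refined off-policy gradient, so this shows only the standard correction:

```python
# Per-trajectory importance-weighted policy gradient:
# sum_t rho_t * grad log pi(a_t|s_t) * G_t, with rho_t the cumulative ratio.
import numpy as np

def is_policy_gradient(rewards, pi_probs, mu_probs, grad_log_pi):
    """All arguments are per-step arrays for one trajectory; grad_log_pi is
    scalar per step here, purely for illustration."""
    rho = np.cumprod(pi_probs / mu_probs)      # cumulative importance ratio
    returns = np.cumsum(rewards[::-1])[::-1]   # undiscounted reward-to-go
    return np.sum(rho * grad_log_pi * returns)

g = is_policy_gradient(
    rewards=np.array([0.0, 0.0, 1.0]),
    pi_probs=np.array([0.9, 0.8, 0.9]),
    mu_probs=np.array([0.5, 0.5, 0.5]),
    grad_log_pi=np.array([0.1, -0.2, 0.3]),
)
print(g)
```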
no code implementations • 5 Apr 2019 • Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine
Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates.
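The quantity this snippet describes is the advantage estimate $A_t = G_t - V(s_t)$: discounted reward-to-go combined with a value-function baseline. A short worked sketch:

```python
# Advantage estimation: discounted return-to-go minus the value baseline.
import numpy as np

def advantages(rewards, values, gamma=0.99):
    G, out = 0.0, []
    for r, v in zip(reversed(rewards), reversed(values)):
        G = r + gamma * G    # discounted return-to-go G_t
        out.append(G - v)    # subtract the baseline V(s_t)
    return np.array(out[::-1])

print(advantages([0.0, 0.0, 1.0], [0.5, 0.6, 0.8]))
```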
1 code implementation • 12 Feb 2019 • Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, Jason D. Williams
Interactive Fiction (IF) games are complex textual decision-making problems.
no code implementations • ICLR 2018 • Thorsten Joachims, Adith Swaminathan, Maarten de Rijke
We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.
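A hedged sketch of the core idea: train a softmax policy from logged bandit feedback with an inverse-propensity-scored (IPS) loss, so gradients flow through the network end to end. The paper's output layer additionally uses a baseline/translation term for variance control, which is not shown here:

```python
# IPS training objective on one logged interaction (x, a, loss, propensity):
# weight the logged loss by pi(a|x) / p0(a|x) and backpropagate through pi.
import torch

def ips_loss(logits, logged_action, logged_loss, logged_propensity):
    pi = torch.softmax(logits, dim=-1)[logged_action]   # pi(a | x)
    return logged_loss * pi / logged_propensity          # delta * pi / p0

logits = torch.zeros(5, requires_grad=True)   # stand-in for a network's output
loss = ips_loss(logits, logged_action=2, logged_loss=-1.0, logged_propensity=0.2)
loss.backward()    # gradients flow through pi(a|x), enabling end-to-end training
print(loss.item(), logits.grad)
```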
no code implementations • 1 Dec 2016 • Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke
The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.
no code implementations • 16 Aug 2016 • Thorsten Joachims, Adith Swaminathan, Tobias Schnabel
Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems.
1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context -- a common scenario in web search, ads, and recommendation.
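For context, the naive baseline this work improves on is whole-slate IPS, which reweights each logged reward by the ratio of slate probabilities; its combinatorial blow-up over slates motivates the paper's more sample-efficient estimator (not shown here):

```python
# Naive whole-slate IPS for off-policy evaluation of a ranking policy.
def slate_ips(logs, target_prob):
    """logs: list of (context, slate, reward, logging_prob)."""
    total = 0.0
    for x, slate, reward, mu in logs:
        total += reward * target_prob(x, slate) / mu
    return total / len(logs)

# Toy logs over slates of 2 items; uniform logging over 6 possible slates.
logs = [("q1", (0, 1), 1.0, 1 / 6), ("q1", (2, 0), 0.0, 1 / 6)]
uniformish = lambda x, slate: 1 / 6      # hypothetical target policy
print(slate_ips(logs, uniformish))       # -> 0.5
```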
no code implementations • 25 Apr 2016 • Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims
Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.
no code implementations • 17 Feb 2016 • Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.
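The debiasing idea in miniature: weight each observed rating's error by the inverse of its observation propensity, so the loss is an unbiased estimate of the error over all (user, item) pairs. A minimal sketch, not the paper's full propensity-scored matrix factorization:

```python
# IPS-weighted squared error over the *observed* entries of a rating matrix.
import numpy as np

def ips_mse(pred, obs_ratings, propensities, n_total):
    """pred/obs_ratings/propensities are arrays over observed entries;
    n_total is the total number of (user, item) pairs."""
    return np.sum((pred - obs_ratings) ** 2 / propensities) / n_total

pred = np.array([4.1, 2.0, 4.8])
obs = np.array([5.0, 2.0, 5.0])
prop = np.array([0.8, 0.1, 0.9])   # popular items are observed more often
print(ips_mse(pred, obs, prop, n_total=10))
```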
no code implementations • NeurIPS 2015 • Adith Swaminathan, Thorsten Joachims
This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.
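The proposed fix, in miniature: replace the vanilla IPS estimate $\frac{1}{n}\sum_i w_i r_i$ with the self-normalized estimate $\sum_i w_i r_i / \sum_i w_i$, which is invariant to reward translation and avoids the degenerate "propensity overfitting" behavior. Illustrative numbers only:

```python
# Vanilla IPS vs. self-normalized IPS on a few logged samples.
import numpy as np

w = np.array([0.5, 3.0, 0.2, 4.0])   # importance weights pi / p0
r = np.array([1.0, 0.0, 1.0, 1.0])   # logged rewards

ips = np.mean(w * r)
snips = np.sum(w * r) / np.sum(w)
print(ips, snips)   # SNIPS rescales by the realized weight mass
```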
no code implementations • 9 Feb 2015 • Adith Swaminathan, Thorsten Joachims
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.
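A sketch of the CRM objective in one expression: choose the policy minimizing the counterfactual risk estimate plus a variance-based confidence term, $\hat{R}(w) + \lambda \sqrt{\widehat{\mathrm{Var}}(w)/n}$. The toy code scores one candidate policy; the actual algorithm optimizes this objective over policy parameters:

```python
# Counterfactual Risk Minimization objective for one candidate policy,
# given per-sample importance weights and logged losses.
import numpy as np

def crm_objective(weights, losses, lam=0.5):
    u = weights * losses                 # per-sample IPS loss terms
    n = len(u)
    return u.mean() + lam * np.sqrt(u.var(ddof=1) / n)

w = np.array([0.5, 3.0, 0.2, 4.0])      # importance weights pi_w / p0
l = np.array([0.0, 1.0, 0.0, 1.0])      # logged losses
print(crm_objective(w, l))
```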