Search Results for author: Adith Swaminathan

Found 23 papers, 6 papers with code

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

no code implementations • 2 Mar 2024 • Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, Zhou Li

Large language models (LLMs) have demonstrated impressive results on natural language tasks, and security researchers are beginning to employ them in both offensive and defensive systems.

Computer Security Language Modelling +1


LLF-Bench: Benchmark for Interactive Learning from Language Feedback

no code implementations • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.

Information Retrieval OpenAI Gym


Interactive Robot Learning from Verbal Correction

no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future.

Language Modelling Large Language Model


Hindsight Learning for MDPs with Exogenous Inputs

1 code implementation • 13 Jul 2022 • Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker.

counterfactual Decision Making +3


Heuristic-Guided Reinforcement Learning

no code implementations • NeurIPS 2021 • Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

On the theoretical side, we characterize properties of a good heuristic and its impact on RL acceleration.

Decision Making reinforcement-learning +1


Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning

no code implementations • 1 Jun 2021 • Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan

Targeting immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric.

Offline RL reinforcement-learning +2


Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

no code implementations • NeurIPS 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning Reinforcement Learning (RL)


Provably Good Batch Reinforcement Learning Without Great Exploration

1 code implementation • 16 Jul 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

reinforcement-learning Reinforcement Learning (RL)


Improved Image Wasserstein Attacks and Defenses

1 code implementation • 26 Apr 2020 • Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang

Robustness against image perturbations bounded by an $\ell_p$ ball has been well studied in recent literature.


Working Memory Graphs

no code implementations • ICML 2020 • Ricky Loynd, Roland Fernandez, Asli Celikyilmaz, Adith Swaminathan, Matthew Hausknecht

Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences.

Decision Making


Learning Calibratable Policies using Programmatic Style-Consistency

2 code implementations • ICML 2020 • Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht

We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously.

Imitation Learning


Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

no code implementations • 12 May 2019 • Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.

Decision Making reinforcement-learning +1


Off-Policy Policy Gradient with State Distribution Correction

no code implementations • 17 Apr 2019 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.


Multi-Preference Actor Critic

no code implementations • 5 Apr 2019 • Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine

Policy gradient algorithms typically combine discounted future rewards with an estimated value function, to compute the direction and magnitude of parameter updates.

reinforcement-learning Reinforcement Learning (RL)
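
The combination of discounted future rewards with an estimated value function mentioned in the excerpt above can be illustrated with a small sketch. It assumes one common form, Monte Carlo discounted returns minus a value baseline, and is not taken from the paper; the function names `discounted_returns` and `advantages` are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, values, gamma):
    """Subtract an estimated value V(s_t) from each discounted return.

    The sign of each advantage gives the direction of the update for the
    action taken at step t, and its size gives the magnitude.
    Assumption: values[t] is some estimate of V(s_t) supplied by the caller.
    """
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    values = [1.0, 1.0, 1.0]  # hypothetical value estimates
    print(discounted_returns(rewards, 0.5))
    print(advantages(rewards, values, 0.5))
```

In a policy gradient update, each advantage would weight the gradient of the log-probability of the action taken at that step.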


NAIL: A General Interactive Fiction Agent

1 code implementation • 12 Feb 2019 • Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, Jason D. Williams

Interactive Fiction (IF) games are complex textual decision making problems.

Decision Making


Deep Learning with Logged Bandit Feedback

no code implementations • ICLR 2018 • Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.

counterfactual Object Recognition +1


Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

no code implementations • 1 Dec 2016 • Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.

counterfactual Off-policy evaluation +1


Unbiased Learning-to-Rank with Biased Feedback

no code implementations • 16 Aug 2016 • Thorsten Joachims, Adith Swaminathan, Tobias Schnabel

Implicit feedback (e.g., clicks, dwell times, etc.)

counterfactual Counterfactual Inference +2


Off-policy evaluation for slate recommendation

1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.

Learning-To-Rank Off-policy evaluation


Unbiased Comparative Evaluation of Ranking Functions

no code implementations • 25 Apr 2016 • Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.


Recommendations as Treatments: Debiasing Learning and Evaluation

no code implementations • 17 Feb 2016 • Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.

Causal Inference Recommendation Systems


The Self-Normalized Estimator for Counterfactual Learning

no code implementations • NeurIPS 2015 • Adith Swaminathan, Thorsten Joachims

This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback as in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.

counterfactual
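
To make the contrast between the two estimators in the excerpt above concrete, here is a minimal sketch. It assumes the usual estimator is inverse propensity scoring (IPS) and the alternative is its self-normalized form, which the excerpt itself does not spell out; the function names and the toy log are hypothetical, not the authors' code.

```python
def importance_weights(logging_probs, target_probs):
    """Weight each logged action by how much more (or less) likely the new
    policy is to take it than the historical policy was."""
    return [p_new / p_old for p_new, p_old in zip(target_probs, logging_probs)]


def ips_estimate(losses, logging_probs, target_probs):
    """Assumed standard estimator: average of weight * loss over the log."""
    w = importance_weights(logging_probs, target_probs)
    return sum(wi * li for wi, li in zip(w, losses)) / len(losses)


def self_normalized_estimate(losses, logging_probs, target_probs):
    """Self-normalized variant: divide by the sum of weights, not the count."""
    w = importance_weights(logging_probs, target_probs)
    return sum(wi * li for wi, li in zip(w, losses)) / sum(w)


if __name__ == "__main__":
    # Hypothetical log: observed loss and propensities for four logged actions.
    losses = [1.0, 0.0, 1.0, 0.0]
    logging_probs = [0.5, 0.5, 0.25, 0.25]
    target_probs = [0.25, 0.75, 0.5, 0.5]
    shifted = [l + 10.0 for l in losses]
    print(ips_estimate(losses, logging_probs, target_probs),
          ips_estimate(shifted, logging_probs, target_probs))
    print(self_normalized_estimate(losses, logging_probs, target_probs),
          self_normalized_estimate(shifted, logging_probs, target_probs))
```

Adding a constant to every loss moves the self-normalized estimate by exactly that constant, while the IPS estimate moves by the constant times the average weight. A learner minimizing the IPS estimate can therefore lower it by changing the weights alone, which is one way to see why the choice of estimator matters.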


Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

no code implementations • 9 Feb 2015 • Adith Swaminathan, Thorsten Joachims

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.

counterfactual Multi-Label Classification

