Search Results for author: Adith Swaminathan

Found 28 papers, 10 papers with code

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

1 code implementation · 14 Aug 2024 · Ying Fan, Jingling Li, Adith Swaminathan, Aditya Modi, Ching-An Cheng

We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems.

Data Augmentation

Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

1 code implementation · 23 Jun 2024 · Ching-An Cheng, Allen Nie, Adith Swaminathan

We investigate end-to-end generative optimization -- using generative models such as LLMs within the optimizer for automatic updating of general computational workflows.

The Importance of Directional Feedback for LLM-based Optimizers

1 code implementation · 26 May 2024 · Allen Nie, Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

We study the potential of using large language models (LLMs) as an interactive optimizer for solving maximization problems in a text space using natural language and numerical feedback.
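As a toy illustration of why directional feedback is useful, consider a numeric maximization problem where the only signal the optimizer receives is "increase" or "decrease". This is a hypothetical sketch of the feedback type studied here, not the paper's LLM-based setup; the function and parameter names are made up:

```python
def bisect_with_directional_feedback(feedback, lo=0.0, hi=1.0, steps=30):
    """Maximize an unknown unimodal objective over [lo, hi] using only
    directional feedback: `feedback(x)` answers "increase" or "decrease",
    never revealing the objective value itself."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if feedback(mid) == "increase":
            lo = mid  # the optimum lies to the right of mid
        else:
            hi = mid  # the optimum lies to the left of mid
    return (lo + hi) / 2.0

# Toy objective -(x - 0.7)^2: directional feedback is the sign of its gradient.
best = bisect_with_directional_feedback(
    lambda x: "increase" if x < 0.7 else "decrease"
)
```

After 30 halvings the search interval is below 1e-9 wide, so purely directional feedback already pins down the optimum; with scalar feedback alone, the optimizer would have to infer the direction itself.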

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

no code implementations · 2 Mar 2024 · Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, Zhou Li

Large language models (LLMs) have demonstrated impressive results on natural language tasks, and security researchers are beginning to employ them in both offensive and defensive systems.

Computer Security · Language Modelling +1

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

1 code implementation · 11 Dec 2023 · Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.

Information Retrieval · OpenAI Gym +1

Interactive Robot Learning from Verbal Correction

no code implementations · 26 Oct 2023 · Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on verbal feedback, so that the robot avoids repeating mistakes in the future.

Language Modelling · Large Language Model

Hindsight Learning for MDPs with Exogenous Inputs

1 code implementation · 13 Jul 2022 · Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker.

counterfactual · Decision Making +4

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

no code implementations · NeurIPS 2020 · Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the risk of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning · Reinforcement Learning +1

Provably Good Batch Reinforcement Learning Without Great Exploration

1 code implementation · 16 Jul 2020 · Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the risk of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning · Reinforcement Learning +1
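The failure mode described in the abstract (a new policy exploiting state-action pairs the batch barely covers) can be sketched in a tabular setting by masking out under-supported actions before acting greedily. This is a simplified illustration of pessimistic batch RL in general, not the paper's specific algorithm; the names and the count threshold are made up:

```python
import numpy as np

def pessimistic_greedy(q_values, visit_counts, min_count=5):
    """Greedy action selection restricted to actions with enough batch support.

    q_values, visit_counts: arrays of shape (n_states, n_actions).
    Actions seen fewer than `min_count` times in the batch are excluded,
    so overly optimistic Q estimates on unsupported actions cannot be
    exploited by the learned policy.
    """
    masked_q = np.where(visit_counts >= min_count, q_values, -np.inf)
    return masked_q.argmax(axis=1)
```

In the two-state example below, state 0's action 0 has a seductively high Q estimate (10.0) but was tried only once in the batch, so the pessimistic policy falls back to the well-supported action 1.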

Improved Image Wasserstein Attacks and Defenses

1 code implementation · 26 Apr 2020 · Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang

Robustness against image perturbations bounded by an $\ell_p$ ball has been well studied in the recent literature.

Working Memory Graphs

no code implementations · ICML 2020 · Ricky Loynd, Roland Fernandez, Asli Celikyilmaz, Adith Swaminathan, Matthew Hausknecht

Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences.

Decision Making · Sequential Decision Making +1

Learning Calibratable Policies using Programmatic Style-Consistency

2 code implementations · ICML 2020 · Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht

We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously.

Imitation Learning

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

no code implementations · 12 May 2019 · Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

We maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.

Decision Making · reinforcement-learning +2

Off-Policy Policy Gradient with State Distribution Correction

no code implementations · 17 Apr 2019 · Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.

Multi-Preference Actor Critic

no code implementations · 5 Apr 2019 · Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine

Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates.

reinforcement-learning · Reinforcement Learning +1
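The standard construction described in the abstract (discounted future rewards combined with a value baseline) can be sketched as follows. This is the generic advantage computation, not the paper's multi-preference variant:

```python
import numpy as np

def advantages(rewards, values, gamma=0.99):
    """Discounted returns minus a value baseline.

    The return G_t = r_t + gamma * G_{t+1} sets the magnitude and sign of
    the policy-gradient update; subtracting the estimated value V(s_t)
    reduces variance without biasing the gradient direction.
    """
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G - np.asarray(values)
```

For rewards [1, 1, 1] with gamma = 0.5 and a zero baseline, the returns are [1.75, 1.5, 1.0]; a well-fit baseline would shrink these advantages toward zero.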

Deep Learning with Logged Bandit Feedback

no code implementations · ICLR 2018 · Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.

counterfactual · Deep Learning +2

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

no code implementations · 1 Dec 2016 · Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.

counterfactual · Off-policy evaluation +1

Off-policy evaluation for slate recommendation

1 code implementation · NeurIPS 2017 · Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.

Learning-To-Rank · Off-policy evaluation

Unbiased Comparative Evaluation of Ranking Functions

no code implementations · 25 Apr 2016 · Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.

Recommendations as Treatments: Debiasing Learning and Evaluation

no code implementations · 17 Feb 2016 · Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.

Causal Inference · Recommendation Systems
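Selection bias of the kind described above is commonly corrected with inverse-propensity weighting: each observed entry's error is reweighted by the inverse of its probability of being observed. Below is a minimal generic sketch with made-up names, not the paper's full matrix-factorization estimator:

```python
import numpy as np

def ips_error_estimate(errors, propensities, n_total):
    """Estimate the average prediction error over ALL user-item pairs
    from only the observed (self-selected) entries.

    errors:       per-observed-entry loss, e.g. squared rating error
    propensities: P(entry was observed), one per observed entry
    n_total:      total number of user-item pairs, observed or not

    Entries that were unlikely to be observed (small propensity) count
    more, which cancels the selection bias in expectation.
    """
    return np.sum(errors / propensities) / n_total
```

As a sanity check, when every pair is observed with propensity 1, the estimate reduces to the plain mean error.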

The Self-Normalized Estimator for Counterfactual Learning

no code implementations · NeurIPS 2015 · Adith Swaminathan, Thorsten Joachims

This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.

counterfactual
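The estimator typically used in BLBF is vanilla inverse propensity scoring (IPS); the self-normalized alternative divides by the sum of importance weights rather than the sample count, which keeps the estimate bounded by the largest observed reward. A minimal numpy sketch of both (the function name is made up):

```python
import numpy as np

def ips_and_snips(rewards, target_probs, logging_probs):
    """Vanilla IPS vs. self-normalized IPS from logged bandit feedback.

    rewards:       observed rewards for the logged actions
    target_probs:  probability the candidate policy assigns to each action
    logging_probs: probability the historical (logging) policy assigned
    """
    w = target_probs / logging_probs          # importance weights
    ips = np.mean(rewards * w)                # unbiased but unbounded
    snips = np.sum(rewards * w) / np.sum(w)   # self-normalized variant
    return ips, snips
```

When the candidate policy equals the logging policy, all weights are 1 and both estimators reduce to the mean observed reward; they diverge as the policies drift apart, which is where self-normalization pays off.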

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

no code implementations · 9 Feb 2015 · Adith Swaminathan, Thorsten Joachims

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.

counterfactual · Multi-Label Classification
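The CRM principle referenced above minimizes the importance-weighted empirical risk plus a penalty on its empirical variance, discouraging policies whose risk estimate is unreliable. A minimal sketch of that objective for one fixed candidate policy (the function name and regularization constant `lam` are assumptions):

```python
import numpy as np

def crm_objective(losses, target_probs, logging_probs, lam=1.0):
    """Counterfactual Risk Minimization objective for a candidate policy.

    losses:        observed losses for the logged actions
    target_probs:  candidate-policy probabilities of those actions
    logging_probs: logging-policy probabilities of those actions
    """
    w = target_probs / logging_probs     # importance weights
    weighted = losses * w
    n = len(losses)
    risk = weighted.mean()               # importance-weighted empirical risk
    var = weighted.var(ddof=1)           # its empirical variance
    return risk + lam * np.sqrt(var / n)
```

If all weighted losses are identical the variance term vanishes and the objective equals the empirical risk; high-variance weights inflate the objective even when the mean risk looks good.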
