Search Results for author: Ishan Durugkar

Found 13 papers, 4 papers with code

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

no code implementations • 10 Oct 2023 • Siddhant Agarwal, Ishan Durugkar, Peter Stone

We further introduce an entropy-regularized policy optimization objective, which we call $state$-MaxEnt RL (or $s$-MaxEnt RL), as a special case of our objective.

Efficient Exploration, Policy Gradient Methods, +1
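
As a hedged sketch of what an entropy-regularized objective of this kind could look like (the state-visitation distribution $d^{\pi}$, the coefficient $\beta$, and the notation are illustrative assumptions, not taken from the paper):

$$J_{s\text{-MaxEnt}}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \;+\; \beta\, \mathcal{H}\!\left(d^{\pi}(s)\right)$$

Here $\mathcal{H}$ denotes Shannon entropy, so the bonus favors policies whose state-visitation distribution covers more of the state space, which is one way to read the "state"-MaxEnt naming.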

ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning

no code implementations • 8 Nov 2022 • Eddy Hudson, Ishan Durugkar, Garrett Warnell, Peter Stone

Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data.

Generative Adversarial Network, Imitation Learning
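
For context, here is a minimal sketch of the maximum-likelihood (behavioral cloning) baseline the abstract alludes to; the Gaussian policy head, network sizes, optimizer, and function names are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy for continuous actions (illustrative)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def log_prob(self, obs, act):
        dist = torch.distributions.Normal(self.mean_net(obs), self.log_std.exp())
        return dist.log_prob(act).sum(-1)

def behavioral_cloning(policy, expert_obs, expert_act, steps=1000, lr=3e-4):
    """Fit the policy by maximizing the likelihood of the expert's actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        loss = -policy.log_prob(expert_obs, expert_act).mean()  # negative log-likelihood
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```

Mode-seeking adversarial objectives, as in the paper's title, are often motivated as alternatives to exactly this kind of maximum-likelihood fit.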

DM$^2$: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching

1 code implementation • 1 Jun 2022 • Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone

The theoretical analysis shows that, under certain conditions, each agent minimizing its individual distribution mismatch allows convergence to the joint policy that generated the target distribution.

Multi-agent Reinforcement Learning, reinforcement-learning, +2
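
A hedged sketch of the per-agent objective this result suggests (the choice of divergence $D$ and the per-agent notation are assumptions made here for illustration):

$$\min_{\pi_i}\; D\!\left( d^{\pi_i} \,\Big\|\, d^{\ast}_{i} \right), \qquad i = 1, \dots, n,$$

where $d^{\pi_i}$ is agent $i$'s visitation distribution under its own policy and $d^{\ast}_{i}$ is the corresponding per-agent marginal of the target distribution; the stated result is that solving these $n$ decoupled problems can, under certain conditions, recover the joint policy that generated the target.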

Wasserstein Distance Maximizing Intrinsic Control

no code implementations • 28 Oct 2021 • Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih

This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.

Adversarial Intrinsic Motivation for Reinforcement Learning

1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

Multi-Goal Reinforcement Learning, reinforcement-learning, +1
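
For reference, the Wasserstein-1 distance in question is commonly written in its Kantorovich-Rubinstein dual form (standard definition, not a detail specific to the paper):

$$W_1\!\left(d^{\pi}, d^{\text{target}}\right) \;=\; \sup_{\|f\|_{L} \le 1}\; \mathbb{E}_{s \sim d^{\pi}}\big[f(s)\big] \;-\; \mathbb{E}_{s \sim d^{\text{target}}}\big[f(s)\big],$$

where the supremum is over 1-Lipschitz functions $f$; an adversarially trained approximation of $f$ can then act as an intrinsic reward, which is one natural way to turn such a distance into an RL objective.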

Reducing Sampling Error in Batch Temporal Difference Learning

no code implementations • ICML 2020 • Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone

In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy.
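
A toy numerical illustration of that weighting effect (purely illustrative; this is not the paper's correction method): suppose the evaluated policy picks each of two actions with probability 0.5, but one action happens to appear three times out of four in the batch.

```python
# Illustrative only: with a fixed batch, TD(0) weights each action by how
# often it occurs in the batch, not by its probability under the policy.
true_probs = {"a1": 0.5, "a2": 0.5}    # pi(a | s) for the evaluated policy
rewards    = {"a1": 1.0, "a2": 0.0}    # one-step (terminal) reward per action
batch      = ["a1", "a1", "a1", "a2"]  # actions that happened to be sampled

# Fixed point of batch TD(0) here: the empirical (count-weighted) mean reward.
batch_value = sum(rewards[a] for a in batch) / len(batch)           # 0.75

# True value of the state under the policy: probability-weighted mean reward.
true_value = sum(true_probs[a] * rewards[a] for a in true_probs)    # 0.5

print(f"batch TD(0) estimate: {batch_value:.2f}, true value: {true_value:.2f}")
```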

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

no code implementations • NeurIPS 2020 • Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.

Transfer Learning

HR-TD: A Regularized TD Method to Avoid Over-Generalization

no code implementations • ICLR 2019 • Ishan Durugkar, Bo Liu, Peter Stone

Temporal Difference learning with function approximation has been widely used recently and has led to several successful results.

Multi-Preference Actor Critic

no code implementations • 5 Apr 2019 • Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine

Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates.

reinforcement-learning, Reinforcement Learning (RL)
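
In the standard actor-critic form of that sentence (textbook formulation, not the multi-preference variant introduced in the paper), the score function gives the direction and the advantage scales the magnitude:

$$\nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\pi_{\theta}}\!\Big[ \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, \big( G_t - V_{\phi}(s_t) \big) \Big],$$

where $G_t$ is the discounted future return and $V_{\phi}$ is the estimated value function used as a baseline.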

TD Learning with Constrained Gradients

no code implementations • ICLR 2018 • Ishan Durugkar, Peter Stone

In this work we propose a constraint on the TD update that minimizes change to the target values.

Q-Learning
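
One way such a constraint could be realized, shown here only as a hedged sketch (the projection form is an assumption, not necessarily the paper's exact rule), is to remove from the TD update the component that lies along the gradient of the bootstrap target:

$$\Delta\theta \;=\; u \;-\; \frac{u^{\top} v}{\|v\|^{2}}\, v, \qquad u = \alpha\,\delta_t\,\nabla_{\theta} V_{\theta}(s_t), \quad v = \nabla_{\theta} V_{\theta}(s_{t+1}),$$

so that, to first order, applying the update does not move the target value $V_{\theta}(s_{t+1})$.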

Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

7 code implementations • ICLR 2018 • Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum

Knowledge bases (KB), both automatically and manually constructed, are often incomplete --- many valid facts can be inferred from the KB by synthesizing existing information.

Navigate, Relation, +1

Generative Multi-Adversarial Networks

1 code implementation • 5 Nov 2016 • Ishan Durugkar, Ian Gemp, Sridhar Mahadevan

Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game.

Ranked #67 on Image Generation on CIFAR-10 (Inception score metric)

Image Generation
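
The two-player minimax game referred to here is the standard GAN objective (textbook form; GMAN itself generalizes the setting to multiple discriminators):

$$\min_{G} \max_{D}\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big].$$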

Inverting Variational Autoencoders for Improved Generative Accuracy

no code implementations • 21 Aug 2016 • Ian Gemp, Ishan Durugkar, Mario Parente, M. Darby Dyar, Sridhar Mahadevan

Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$).
