no code implementations • 23 Feb 2023 • Vincent Liu, Yash Chandak, Philip Thomas, Martha White
In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting.
no code implementations • 6 Feb 2023 • Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar
We propose a first-order method for convex optimization, where instead of being restricted to the gradient from a single parameter, gradients from multiple parameters can be used during each step of gradient descent.
1 code implementation • 24 Jan 2023 • Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas
Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary.
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
1 code implementation • 16 Dec 2021 • Weihao Tan, David Koleczek, Siddhant Pradhan, Nicholas Perello, Vivek Chettiar, Vishal Rohra, Aaslesha Rajaram, Soundararajan Srinivasan, H M Sajjad Hossain, Yash Chandak
Shared autonomy refers to approaches for enabling an autonomous agent to collaborate with a human with the aim of improving human performance.
1 code implementation • NeurIPS 2021 • Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum
In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS.
1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.
no code implementations • 25 Jan 2021 • Yash Chandak, Shiv Shankar, Philip S. Thomas
Many sequential decision-making systems leverage data collected using prior policies to propose a new policy.
1 code implementation • NeurIPS 2020 • Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks.
no code implementations • 15 Sep 2020 • Georgios Theocharous, Yash Chandak, Philip S. Thomas, Frits de Nijs
Strategic recommendations (SR) refer to the problem where an intelligent agent observes the sequential behaviors and activities of users and decides when and how to interact with them to optimize some long-term objectives, both for the user and the business.
1 code implementation • ICML 2020 • Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas
Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning.
1 code implementation • ICML 2020 • Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.
no code implementations • 6 Jun 2019 • Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas
We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
1 code implementation • 5 Jun 2019 • Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas
The Markov decision process (MDP) formulation used to model many real-world sequential decision-making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic.
1 code implementation • 5 Jun 2019 • Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas
have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed.
no code implementations • 1 Feb 2019 • Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip S. Thomas
Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori.
1 code implementation • 31 May 2018 • Priyesh Vijayan, Yash Chandak, Mitesh M. Khapra, Srinivasan Parthasarathy, Balaraman Ravindran
Given a graph where every node has certain attributes associated with it and some nodes have labels associated with them, Collective Classification (CC) is the task of assigning labels to every unlabeled node using information from the node as well as its neighbors.
1 code implementation • 31 May 2018 • Priyesh Vijayan, Yash Chandak, Mitesh M. Khapra, Srinivasan Parthasarathy, Balaraman Ravindran
State-of-the-art models for node classification on such attributed graphs use differentiable recursive functions that enable aggregation and filtering of neighborhood information from multiple hops.
no code implementations • 24 Sep 2015 • Andreas Veit, Michael Wilber, Rajan Vaish, Serge Belongie, James Davis, Vishal Anand, Anshu Aviral, Prithvijit Chakrabarty, Yash Chandak, Sidharth Chaturvedi, Chinmaya Devaraj, Ankit Dhall, Utkarsh Dwivedi, Sanket Gupte, Sharath N. Sridhar, Karthik Paga, Anuj Pahuja, Aditya Raisinghani, Ayush Sharma, Shweta Sharma, Darpana Sinha, Nisarg Thakkar, K. Bala Vignesh, Utkarsh Verma, Kanniganti Abhishek, Amod Agrawal, Arya Aishwarya, Aurgho Bhattacharjee, Sarveshwaran Dhanasekar, Venkata Karthik Gullapalli, Shuchita Gupta, Chandana G, Kinjal Jain, Simran Kapur, Meghana Kasula, Shashi Kumar, Parth Kundaliya, Utkarsh Mathur, Alankrit Mishra, Aayush Mudgal, Aditya Nadimpalli, Munakala Sree Nihit, Akanksha Periwal, Ayush Sagar, Ayush Shah, Vikas Sharma, Yashovardhan Sharma, Faizal Siddiqui, Virender Singh, Abhinav S., Anurag. D. Yadav
When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers.