1 code implementation • 27 Aug 2023 • Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis
Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive.
no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas
However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.
2 code implementations • 6 May 2023 • Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories).
no code implementations • 30 Jun 2022 • Atanu R Sinha, Navita Goyal, Sunny Dhamnani, Tanay Asija, Raja K Dubey, M V Kaarthik Raja, Georgios Theocharous
The recognition of cognitive bias in computer science is largely in the domain of information retrieval, and bias is identified at an aggregate level with the help of annotated data.
no code implementations • 23 Apr 2022 • Kai Wang, Zhao Song, Georgios Theocharous, Sridhar Mahadevan
Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds.
no code implementations • 5 Mar 2022 • Jaron J. R. Lee, David Arbour, Georgios Theocharous
Second, many recommendation systems are not probabilistic and so having access to logging and target policy densities may not be feasible.
1 code implementation • 30 Dec 2021 • Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill
Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance.
no code implementations • 10 Dec 2021 • James E. Kostas, Philip S. Thomas, Georgios Theocharous
In this work, we build on asynchronous coagent policy gradient algorithms (Kostas et al., 2020) to propose a principled solution to this problem.
no code implementations • 19 Sep 2021 • Sridhar Mahadevan, Anup Rao, Georgios Theocharous, Jennifer Healey
Many real-world applications, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination, require aligning two temporal sequences.
1 code implementation • NeurIPS 2020 • Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks.
no code implementations • 15 Sep 2020 • Georgios Theocharous, Yash Chandak, Philip S. Thomas, Frits de Nijs
Strategic recommendations (SR) refer to the problem where an intelligent agent observes the sequential behaviors and activities of users and decides when and how to interact with them to optimize some long-term objectives, both for the user and the business.
1 code implementation • ICML 2020 • Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.
1 code implementation • 5 Jun 2019 • Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas
have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed.
1 code implementation • 5 Jun 2019 • Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas
The Markov decision process (MDP) formulation used to model many real-world sequential decision-making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic.
no code implementations • 1 Feb 2019 • Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip S. Thomas
Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori.
no code implementations • NeurIPS 2018 • Georgios Theocharous, Zheng Wen, Yasin Abbasi, Nikos Vlassis
Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity.
no code implementations • 21 Nov 2017 • Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis
Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity.
no code implementations • 6 Mar 2016 • Sougata Chaudhuri, Georgios Theocharous, Mohammad Ghavamzadeh
We study the problem of personalized advertisement recommendation (PAR), which consists of a user visiting a system (website) and the system displaying one of $K$ ads to the user.
no code implementations • 9 Feb 2016 • Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun
Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.
no code implementations • NeurIPS 2015 • Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris
The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths.