no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva
In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.
no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas
However, the coagent framework is not just an alternative to backpropagation-based deep learning (BDL); the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.
no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
Moreover, we prove that the alternative form still plays a role similar to that of the original form.
1 code implementation • NeurIPS 2020 • Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks.
1 code implementation • ICML 2020 • Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas
Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning.
no code implementations • 6 Jun 2019 • Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas
We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.