Search Results for author: Scott M. Jordan

Found 6 papers, 2 papers with code

From Past to Future: Rethinking Eligibility Traces

no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Coagent Networks: Generalized and Scaled

no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Robust Markov Decision Processes without Model Estimation

no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

Moreover, we prove that the alternative form still plays a role similar to that of the original form.

Towards Safe Policy Improvement for Non-Stationary MDPs

1 code implementation • NeurIPS 2020 • Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks.

Decision Making, reinforcement-learning

Evaluating the Performance of Reinforcement Learning Algorithms

1 code implementation • ICML 2020 • Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas

Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning.

Reinforcement Learning (RL)

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

no code implementations • 6 Jun 2019 • Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

