no code implementations • 18 Mar 2024 • Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha
In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution-poses a significant challenge for the global convergence of policy gradient methods.
no code implementations • 14 Mar 2024 • Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha
We address this concern by inferring results on two cases for object placement: one where the objects placed follow a routine or a path, and the other where they are placed at random.
no code implementations • 9 Jun 2023 • Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha
Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications.
no code implementations • 28 Jan 2023 • Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha
Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.
1 code implementation • 8 May 2020 • Caroline Wang, Bin Han, Bhrij Patel, Cynthia Rudin
We compared predictive performance and fairness of these models against two methods that are currently used in the justice system to predict pretrial recidivism: the Arnold PSA and COMPAS.