no code implementations • 1 Feb 2025 • Prashant Mehta, Sean Meyn
The broad goal of the research surveyed in this article is to develop methods for understanding the aggregate behavior of interconnected dynamical systems, as found in mathematical physics, neuroscience, economics, power systems and neural networks.
no code implementations • 28 May 2024 • Caio Kalil Lauand, Sean Meyn
In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense.
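The mechanism can be sketched on a toy scalar problem (all problem data below are illustrative, not from the paper): a vanishing gain $n^{-\rho}$ with $\rho \in (1/2, 1)$ drives the raw iterate, while a running average of the iterates recovers the $O(1/n)$ MSE rate.

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = -1.0, 2.0           # root-finding target: A*theta + b = 0, so theta* = 2
theta, theta_bar = 0.0, 0.0
rho = 0.7                  # step-size exponent in (1/2, 1)
n_iter = 50_000

for n in range(1, n_iter + 1):
    alpha = n ** (-rho)                       # slowly vanishing gain
    noise = rng.normal()
    theta += alpha * (A * theta + b + noise)  # raw SA iterate
    theta_bar += (theta - theta_bar) / n      # running Polyak-Ruppert average

# theta_bar approaches theta* = 2, with MSE of order O(1/n)
```

The raw iterate with $\rho < 1$ fluctuates more than a $1/n$ gain would allow, but averaging filters those fluctuations out.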
no code implementations • 10 Sep 2023 • Fan Lu, Sean Meyn
The main contributions first concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution and a significant relationship between the solution to the new convex program and the solution to standard Q-learning.
no code implementations • 6 Sep 2023 • Caio Kalil Lauand, Sean Meyn
The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within $O(\alpha)$ of optimal.
no code implementations • 5 Jul 2023 • Sean Meyn
The algorithm is a general approach to stochastic approximation that applies in particular to Q-learning with "oblivious" training, even with non-linear function approximation.
no code implementations • 10 Jan 2023 • Gian Paramo, Arturo Bretas, Sean Meyn
This technique holds several advantages over contemporary techniques: it utilizes technology that is already deployed in the field, it offers a significant degree of generality, and so far it has displayed a very high level of sensitivity without sacrificing accuracy.
no code implementations • 20 Dec 2022 • Austin Cooper, Arturo Bretas, Sean Meyn, Newton G. Bretas
This paper presents a model for detecting high-impedance faults (HIFs) using parameter error modeling and a two-step per-phase weighted least squares state estimation (SE) process.
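The weighted least squares step at the core of state estimation can be illustrated generically; the measurement matrix, weights, and state vector below are hypothetical, not the paper's distribution-system model.

```python
import numpy as np

# Generic WLS state estimation: measurements z = H x + e, weights W = R^{-1}
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x_true = np.array([1.0, 2.0])
z = H @ x_true + np.array([0.01, -0.02, 0.005])  # measurements with small errors
W = np.diag([100.0, 25.0, 100.0])                # inverse error variances

# Normal equations of WLS: x_hat = (H^T W H)^{-1} H^T W z
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

# Residuals r = z - H x_hat feed parameter-error (bad-data) analysis
residuals = z - H @ x_hat
```

In a parameter-error framework, anomalies such as HIFs show up as structured residuals rather than random measurement noise.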
no code implementations • 20 Dec 2022 • Austin Cooper, Arturo Bretas, Sean Meyn, Newton G. Bretas
Distribution systems of the future smart grid require enhancements to the reliability of distribution system state estimation (DSSE) in the face of low measurement redundancy, unsynchronized measurements, and dynamic load profiles.
no code implementations • 17 Oct 2022 • Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu
The main contributions follow: (i) The dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but it has a similar structure that reveals the need for regularization to avoid over-fitting.
no code implementations • 14 Oct 2022 • Fan Lu, Joel Mathias, Sean Meyn, Karanjit Kalsi
Convex Q-learning is a recent approach to reinforcement learning, motivated by the prospects of a firmer convergence theory and of exploiting greater a priori knowledge regarding policy or value function structure.
no code implementations • 27 Oct 2021 • Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ where $ \{ \Phi_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise.
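A minimal instance of this recursion, assuming (for illustration only) a two-state Markov disturbance $\{\Phi_n\}$ and $f(\theta, \Phi) = \Phi - \theta$, so that $\theta_n$ estimates the stationary mean of the chain:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two-state Markov chain Phi_n on {0, 1} with transition matrix P;
# its stationary distribution is pi = (2/3, 1/3), so E_pi[Phi] = 1/3
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
phi = 0
theta = 0.0

for n in range(1, 100_000 + 1):
    phi = rng.choice(2, p=P[phi])    # Markovian disturbance Phi_{n+1}
    alpha = 1.0 / n                  # vanishing step size alpha_{n+1}
    theta += alpha * (phi - theta)   # theta_{n+1} = theta_n + alpha * f(theta_n, Phi_{n+1})

# theta converges to the stationary mean E_pi[Phi] = 1/3
```

The disturbance here is Markovian rather than i.i.d., which is exactly the setting the paper's assumptions are designed to cover.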
no code implementations • 30 Sep 2020 • Shuhang Chen, Adithya Devraj, Andrey Bernstein, Sean Meyn
(ii) With gain $a_t = g/(1+t)$ the results are not as sharp: the rate of convergence $1/t$ holds only if $I + g A^*$ is Hurwitz.
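The Hurwitz condition is easy to check numerically; the matrix $A^*$ below is a hypothetical example showing how the same $A^*$ passes or fails the test for $I + g A^*$ depending on the gain $g$.

```python
import numpy as np

def hurwitz(M):
    """A matrix is Hurwitz if every eigenvalue has negative real part."""
    return bool(np.all(np.linalg.eigvals(M).real < 0))

A_star = np.array([[-2.0, 0.0],
                   [0.0, -0.5]])

# g = 1: I + g*A* = diag(-1.0, 0.5) is not Hurwitz, so the 1/t rate fails
print(hurwitz(np.eye(2) + 1.0 * A_star))   # False

# g = 3: I + g*A* = diag(-5.0, -0.5) is Hurwitz, so the 1/t rate holds
print(hurwitz(np.eye(2) + 3.0 * A_star))   # True
```

Equivalently, every eigenvalue of $A^*$ must satisfy $\mathrm{Re}\,\lambda < -1/g$, so a larger $g$ relaxes the condition.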
no code implementations • 7 Feb 2020 • Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn
This motivates the focus on mean-square error bounds for parameter estimates.
no code implementations • 17 Sep 2018 • Adithya M. Devraj, Ana Bušić, Sean Meyn
Two SA techniques are known to achieve optimal asymptotic variance: the Ruppert-Polyak averaging technique and stochastic Newton-Raphson (SNR).
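A scalar caricature of SNR on a linear problem (problem data here are illustrative; in practice the gain matrix is estimated from observations rather than fed the true slope):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical scalar linear SA problem: find theta* such that
# f(theta) = A*(theta - theta*) has a root, observed with additive noise
A_true, theta_star = -0.5, 1.0
theta = 0.0
A_hat = -1.0   # running estimate of the linearization A

for n in range(1, 100_000 + 1):
    f = A_true * (theta - theta_star) + rng.normal()
    alpha = 1.0 / n
    # In SNR the matrix gain is estimated from data; here it is simply
    # driven toward the true slope for illustration.
    A_hat += alpha * (A_true - A_hat)
    theta -= alpha * f / A_hat       # Newton-Raphson-style matrix gain

# theta approaches theta* with asymptotic variance sigma^2 / A^2 / n,
# the optimal value, matching what Ruppert-Polyak averaging achieves
```

The key difference from averaging is that SNR adapts the gain itself, at the cost of estimating (and inverting) the linearization at each step.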
no code implementations • NeurIPS 2017 • Adithya M. Devraj, Sean Meyn
The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and on recent competitors in several respects.
no code implementations • 6 Jul 2013 • Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, Adam Wierman
Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations.