1 code implementation • 19 Dec 2023 • Meshal Alharbi, Mardavij Roozbehani, Munther Dahleh
In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system.
no code implementations • 24 May 2022 • Bharadwaj Satchidanandan, Mardavij Roozbehani, Munther A. Dahleh
Moreover, the customers' optimal power consumption reductions depend on the costs they incur for curtailing consumption. These costs are in general the customers' private knowledge, and customers could strategically misreport them to improve their own utilities even if doing so worsens the overall system cost.
no code implementations • 30 Jun 2021 • Luis Lopez, Alvaro Gonzalez-Castellanos, David Pozo, Mardavij Roozbehani, Munther Dahleh
To unlock such capabilities, it is essential to understand the aggregated flexibility that can be harvested from the large population of new technologies located in distribution grids.