no code implementations • 9 Dec 2024 • Rohan Deb, Mohammad Ghavamzadeh, Arindam Banerjee
We show that the safety constraint is satisfied with high probability and that the regret of $\mathtt{C-SquareCB}$ is sub-linear in the horizon $T$, while the regret of $\mathtt{C-FastCB}$ is first-order, i.e., sub-linear in $L^*$, the cumulative loss of the optimal policy.
no code implementations • 28 Dec 2023 • Rohan Deb, Aadirupa Saha
We show that, due to the relative nature of the feedback, the problem is more difficult than its bandit counterpart, and that without further assumptions it is not learnable from a regret-minimization perspective.
no code implementations • 12 Dec 2023 • Rohan Deb, Yikun Ban, Shiliang Zuo, Jingrui He, Arindam Banerjee
Based on such a perturbed prediction, we show an $\mathcal{O}(\log T)$ regret for online regression with both the squared loss and the KL loss, and subsequently convert these, respectively, into $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret bounds for NeuCB, where $L^*$ is the loss of the best policy.
no code implementations • 7 Dec 2021 • Rohan Deb, Shalabh Bhatnagar
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of $N$-timescale stochastic approximation (SA) iterates for any $N\geq1$.
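To make the multi-timescale setting concrete, here is a minimal two-timescale ($N = 2$) SA sketch. The coupled iteration, step-size schedules, and noise model are illustrative assumptions, not the paper's exact conditions: the fast iterate $w_n$ uses the larger step size $b_n$ and tracks the slow iterate $\theta_n$, whose step size $a_n$ satisfies $a_n / b_n \to 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, w = 1.0, 1.0  # slow and fast iterates

for n in range(1, 20001):
    a_n = 1.0 / n          # slow timescale step size
    b_n = 1.0 / n ** 0.6   # fast timescale step size; a_n / b_n -> 0
    noise_w = 0.01 * rng.standard_normal()
    noise_t = 0.01 * rng.standard_normal()
    # Fast iterate tracks the current slow iterate: w -> theta.
    w = w + b_n * (theta - w + noise_w)
    # Slow iterate moves against the tracked quantity: theta -> 0.
    theta = theta + a_n * (-w + noise_t)

print(theta, w)  # both iterates settle near the joint equilibrium (0, 0)
```

Under the usual step-size conditions, the fast iterate effectively sees a frozen slow iterate, which is the mechanism the paper's stability and convergence conditions generalize to $N$ timescales.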
no code implementations • 23 Nov 2021 • Rohan Deb, Meet Gandhi, Shalabh Bhatnagar
However, the weights assigned to different $n$-step returns in TD($\lambda$), controlled by the parameter $\lambda$, decrease exponentially with increasing $n$.
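The exponential decay of the $n$-step-return weights can be seen directly from the standard TD($\lambda$) weighting $(1-\lambda)\lambda^{\,n-1}$; a quick numeric check:

```python
import numpy as np

lam = 0.9                     # the TD(lambda) trace-decay parameter
n = np.arange(1, 11)          # n-step returns, n = 1, ..., 10
weights = (1 - lam) * lam ** (n - 1)  # weight on the n-step return

print(weights)  # strictly decreasing, geometrically in n
```

Each weight is a constant factor $\lambda < 1$ times the previous one, so longer returns are exponentially down-weighted regardless of how informative they are.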
no code implementations • 22 Nov 2021 • Rohan Deb, Shalabh Bhatnagar
Here, we consider Gradient TD algorithms with an additional heavy-ball momentum term and provide choices of the step size and momentum parameter that ensure almost sure asymptotic convergence of these algorithms.
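For intuition, a generic heavy-ball update has the form $\theta_{k+1} = \theta_k - \alpha g_k + \beta(\theta_k - \theta_{k-1})$. The sketch below applies it to a toy deterministic quadratic rather than to the paper's Gradient TD iterates, and the step-size and momentum values are illustrative assumptions:

```python
def heavy_ball_step(theta, grad, buf, alpha=0.01, beta=0.9):
    """One heavy-ball update; buf stores the previous displacement theta_k - theta_{k-1}."""
    buf = beta * buf - alpha * grad
    return theta + buf, buf

# Toy objective f(theta) = 0.5 * theta**2, so grad = theta; minimum at 0.
theta, buf = 1.0, 0.0
for _ in range(1000):
    theta, buf = heavy_ball_step(theta, theta, buf)

print(theta)  # converges toward the minimizer 0
```

In the stochastic setting the papers study (SHB, and momentum-augmented Gradient TD), the gradient is replaced by a noisy estimate, and the analysis is about which $(\alpha, \beta)$ schedules still give almost sure convergence.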
no code implementations • 29 Oct 2021 • Swetha Ganesh, Rohan Deb, Gugan Thoppe, Amarjit Budhiraja
Stochastic Heavy Ball (SHB) and Nesterov's Accelerated Stochastic Gradient (ASG) are popular momentum methods in stochastic optimization.