no code implementations • 22 Oct 2024 • Amirhossein Afsharrad, Parisa Oftadeh, Ahmadreza Moradipari, Sanjay Lall
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\,\frac{\log^2(NT)}{\sqrt{N}}\sqrt{\frac{T}{\log(1/|\lambda_2|)}}\right)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix, and $\tau-c_0$ is the known cost gap of a feasible action.
no code implementations • 7 Nov 2023 • Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall
Recently, bandit optimization has received significant attention in real-world safety-critical systems that involve repeated interactions with humans.
no code implementations • NeurIPS 2023 • Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal
In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings.
no code implementations • 26 Jul 2023 • Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, HongSheng Lu, Amir Rahmani
Generative Adversarial Networks (GANs) have emerged as a powerful AI tool for generating realistic outputs from training data.
no code implementations • 12 May 2022 • Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh
We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round.
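The paper's algorithm is distributed; the per-agent building block, however, is the standard linear UCB rule. The following is a minimal single-agent LinUCB sketch for illustration only (the function names, the exploration parameter `alpha`, and the toy arms are assumptions, not the paper's exact algorithm or notation):

```python
import numpy as np

def linucb_round(A, b, arms, alpha=1.0):
    """One round of LinUCB arm selection (generic single-agent sketch,
    not the paper's distributed algorithm). A: d x d design matrix, b: d vector."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b  # ridge-regression estimate of the reward parameter
    # UCB score = estimated reward + exploration bonus from the confidence ellipsoid
    scores = [x @ theta_hat + alpha * np.sqrt(x @ A_inv @ x) for x in arms]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the design matrix after observing a reward for arm x."""
    return A + np.outer(x, x), b + reward * x

# Toy usage: the true parameter favors the second arm.
rng = np.random.default_rng(0)
d = 2
theta_star = np.array([0.1, 0.9])
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A, b = np.eye(d), np.zeros(d)
for _ in range(200):
    i = linucb_round(A, b, arms)
    r = arms[i] @ theta_star + 0.1 * rng.standard_normal()
    A, b = linucb_update(A, b, arms[i], r)
print(linucb_round(A, b, arms, alpha=0.0))  # greedy choice after learning
```

In the distributed setting studied in the paper, each communication round would additionally synchronize the sufficient statistics `(A, b)` across agents, which is where the linear-in-communication regret term arises.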
no code implementations • 12 May 2022 • Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh
In this work, we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments.
no code implementations • 9 Jun 2021 • Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh
In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.
no code implementations • NeurIPS 2020 • Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh
For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order $\mathcal{O}(\sqrt{T} \log^{3/2} T)$ and $\mathcal{O}(\sqrt{T} \log T)$, respectively.
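To make the idea concrete, here is a minimal sketch of one step of linear Thompson Sampling with a naive conservative fallback. This is an assumption-laden illustration, not SCLTS itself: the function name, the parameter `tau`, and the simple "fall back to the baseline" check stand in for the paper's more careful stage-wise criterion.

```python
import numpy as np

def lin_ts_step(A, b, arms, baseline_arm, tau, v=1.0, rng=None):
    """One step of linear Thompson Sampling with a naive conservative check
    (illustrative sketch only; SCLTS uses a stage-wise criterion instead).
    A: d x d design matrix, b: d vector of reward-weighted contexts."""
    rng = rng or np.random.default_rng()
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Sample a parameter from the (Gaussian) posterior and act greedily on it.
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * A_inv)
    i = int(np.argmax([x @ theta_tilde for x in arms]))
    # Conservative safeguard: if the chosen arm's estimated reward falls more
    # than a tau fraction below the baseline's, play the baseline instead.
    if arms[i] @ theta_hat < (1 - tau) * (baseline_arm @ theta_hat):
        return baseline_arm
    return arms[i]

# Toy usage with an uninformative prior.
rng = np.random.default_rng(1)
d = 2
A, b = np.eye(d), np.zeros(d)
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
baseline = arms[0]
chosen = lin_ts_step(A, b, arms, baseline, tau=0.1, rng=rng)
print(chosen)
```

The randomized posterior sample `theta_tilde` is what drives exploration in TS-style methods, in contrast to the deterministic confidence bonus used by UCB-style methods.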
1 code implementation • 28 Jan 2020 • Modjtaba Shokrian Zini, Mohammad Pedramfar, Matthew Riemer, Ahmadreza Moradipari, Miao Liu
Coagent networks formalize the concept of arbitrary networks of stochastic agents that collaborate to take actions in a reinforcement learning environment.
no code implementations • 6 Nov 2019 • Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to superior performance in expanding the set of safe actions the algorithm has access to at each round.
no code implementations • 21 Nov 2016 • Ahmadreza Moradipari, Sina Shahsavari, Ashkan Esmaeili, Farokh Marvasti
When the sparse models also suffer from MI, sparse recovery and inference of the missing models are handled simultaneously.