Search Results for author: Ahmadreza Moradipari

Found 11 papers, 1 paper with code

Cooperative Multi-Agent Constrained Stochastic Linear Bandits

no code implementations • 22 Oct 2024 • Amirhossein Afsharrad, Parisa Oftadeh, Ahmadreza Moradipari, Sanjay Lall

We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{\frac{T}{\log(1/|\lambda_2|)}}\right)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix, and $\tau-c_0$ is the known cost gap of a feasible action.

Convex Methods for Constrained Linear Bandits

no code implementations • 7 Nov 2023 • Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall

Recently, bandit optimization has received significant attention in real-world safety-critical systems that involve repeated interactions with humans.

Collaborative Multi-agent Stochastic Linear Bandits

no code implementations • 12 May 2022 • Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round.

Multi-Environment Meta-Learning in Stochastic Linear Bandits

no code implementations • 12 May 2022 • Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments.


Feature and Parameter Selection in Stochastic Linear Bandits

no code implementations • 9 Jun 2021 • Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.

Feature Selection, Model Selection

Stage-wise Conservative Linear Bandits

no code implementations • NeurIPS 2020 • Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order $\mathcal{O}(\sqrt{T} \log^{3/2} T)$ and $\mathcal{O}(\sqrt{T} \log T)$, respectively.

Thompson Sampling

Coagent Networks Revisited

1 code implementation • 28 Jan 2020 • Modjtaba Shokrian Zini, Mohammad Pedramfar, Matthew Riemer, Ahmadreza Moradipari, Miao Liu

Coagent networks formalize the concept of arbitrary networks of stochastic agents that collaborate to take actions in a reinforcement learning environment.

Hierarchical Reinforcement Learning, reinforcement-learning

Safe Linear Thompson Sampling with Side Information

no code implementations • 6 Nov 2019 • Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round.

Thompson Sampling
