Search Results for author: Sanae Amani

Found 10 papers, 0 papers with code

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

no code implementations 11 Jul 2023 Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang

Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$, where $N$ is the number of agents, $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards.

OpenAI Gym reinforcement-learning +1
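
The $M/N$ factor in the bound above implies a linear speedup in the number of agents. A minimal numeric sketch of that scaling, dropping the constants and log factors hidden by $\tilde{\mathcal{O}}$ (all parameter values below are illustrative, not from the paper):

```python
# Episode budget per agent, O(d^3 H^6 (eps^-2 + c_sep^-2) * M / N),
# with constants and log factors dropped. All values are hypothetical.
def episodes_per_agent(d, H, eps, c_sep, M, N):
    return d**3 * H**6 * (eps**-2 + c_sep**-2) * M / N

d, H, eps, c_sep, M = 5, 10, 0.1, 0.5, 20
for N in (1, 4, 16):
    print(f"N={N:2d}: {episodes_per_agent(d, H, eps, c_sep, M, N):.3g}")
# Doubling the number of agents halves each agent's episode budget.
```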

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

no code implementations 1 Jun 2022 Sanae Amani, Lin F. Yang, Ching-An Cheng

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.

4k reinforcement-learning +1

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

no code implementations 26 May 2022 Sanae Amani, Tor Lattimore, András György, Lin F. Yang

In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.
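
For intuition on why the communication cost can be nearly independent of $T$: if agents synchronize only at $O(\log T)$ phase boundaries, each synchronization exchanging one $d$-dimensional summary per agent, the total is $O(dN\log T)$ numbers rather than $O(dNT)$. A rough comparison in that spirit (a simplification, not DisBE-LUCB's actual protocol; all values are hypothetical):

```python
import math

# Total numbers transmitted network-wide under two schemes.
def naive_cost(d, N, T):      # every agent broadcasts a d-vector every round
    return d * N * T

def batched_cost(d, N, T):    # sync only at O(log T) phase boundaries
    return d * N * math.ceil(math.log2(T))

d, N, T = 20, 10, 100_000
print(naive_cost(d, N, T))    # 20,000,000: linear in T
print(batched_cost(d, N, T))  # 3,400: logarithmic in T
```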

UCB-based Algorithms for Multinomial Logistic Regression Bandits

no code implementations NeurIPS 2021 Sanae Amani, Christos Thrampoulidis

Out of the rich family of generalized linear bandits, perhaps the most well-studied ones are logistic bandits, which are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user who can select one of two possible outcomes (e.g., `click' vs `no-click').

regression
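
As a concrete instance of the binary-reward setting described above, here is a minimal logistic-UCB simulation sketch. It is not the paper's multinomial algorithm (MNL rewards generalize the binary case), and the exploration weight, Newton-step count, and problem sizes are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 10, 500
theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown parameter (simulation only)
arms = rng.normal(size=(K, d))                 # fixed action set

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X, y = [], []
V = np.eye(d)                                  # regularized design matrix
theta_hat = np.zeros(d)
alpha = 1.0                                    # hypothetical exploration weight

for t in range(T):
    # UCB score: estimated click probability plus an ellipsoidal bonus.
    Vinv = np.linalg.inv(V)
    bonus = alpha * np.sqrt(np.einsum("kd,de,ke->k", arms, Vinv, arms))
    scores = sigmoid(arms @ theta_hat) + bonus
    x = arms[np.argmax(scores)]

    r = rng.random() < sigmoid(x @ theta_star)  # binary 'click' / 'no-click' reward
    X.append(x); y.append(float(r)); V += np.outer(x, x)

    # A few Newton steps for the ridge-regularized logistic MLE.
    Xa, ya = np.asarray(X), np.asarray(y)
    for _ in range(5):
        p = sigmoid(Xa @ theta_hat)
        grad = Xa.T @ (p - ya) + theta_hat
        hess = Xa.T @ (Xa * (p * (1 - p))[:, None]) + np.eye(d)
        theta_hat -= np.linalg.solve(hess, grad)

print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```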

Decentralized Multi-Agent Linear Bandits with Safety Constraints

no code implementations 1 Dec 2020 Sanae Amani, Christos Thrampoulidis

For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
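
A stylized sketch of the decentralized idea behind DLUCB: each agent runs a LinUCB step on its local statistics and then folds in its neighbors' fresh observations over a ring network. This one-hop simplification omits the safety constraints and the multi-hop information delays that the actual algorithm handles; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T, N = 3, 8, 300, 4
theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown parameter (simulation only)
arms = rng.normal(size=(K, d))

# Ring network: each agent exchanges updates with its two neighbors.
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}

V = [np.eye(d) for _ in range(N)]    # per-agent Gram matrices
b = [np.zeros(d) for _ in range(N)]  # per-agent response vectors

for t in range(T):
    updates = []
    for i in range(N):
        Vinv = np.linalg.inv(V[i])
        theta = Vinv @ b[i]
        ucb = arms @ theta + np.sqrt(np.einsum("kd,de,ke->k", arms, Vinv, arms))
        x = arms[np.argmax(ucb)]
        r = x @ theta_star + 0.1 * rng.normal()
        updates.append((np.outer(x, x), r * x))
    # Information sharing: each agent folds in its own and its neighbors'
    # fresh observations (a one-hop simplification of DLUCB's relaying).
    for i in range(N):
        for j in [i] + neighbors[i]:
            V[i] += updates[j][0]
            b[i] += updates[j][1]

print("agent-0 estimate error:",
      np.linalg.norm(np.linalg.solve(V[0], b[0]) - theta_star))
```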

Regret Bound for Safe Gaussian Process Bandit Optimization

no code implementations L4DC 2020 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Many applications require a learner to make sequential decisions given uncertainty regarding both the system’s payoff function and safety constraints.

Gaussian Processes

Regret Bounds for Safe Gaussian Process Bandit Optimization

no code implementations 5 May 2020 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints.

Gaussian Processes

Safe Linear Thompson Sampling with Side Information

no code implementations 6 Nov 2019 Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to superior performance in expanding the set of safe actions the algorithm has access to at each round.

Thompson Sampling
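
A stylized sketch of the safe linear TS idea above, with reward and constraint sharing one unknown parameter (the paper allows a known mixing matrix in the constraint; the threshold, confidence width, and problem sizes here are illustrative). Actions whose upper confidence bound on the constraint stays below the threshold are certified safe, a Thompson sample picks among them, and otherwise the learner falls back on a known-safe seed set:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, c, beta = 3, 10, 400, 0.5, 1.0    # beta: hypothetical confidence width
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown parameter; simulation only
A = rng.normal(size=(K, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
arms = np.vstack([A, -A])                  # +/- pairs guarantee some safe arms
seed = np.flatnonzero(arms @ theta_star <= 0)  # safe seed set, given to learner

V, b = np.eye(d), np.zeros(d)
for t in range(T):
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ b
    width = beta * np.sqrt(np.einsum("kd,de,ke->k", arms, Vinv, arms))
    safe = (arms @ theta_hat + width) <= c     # certified safe (w.h.p.)
    if safe.any():
        # Thompson step: sample from the Gaussian posterior, then maximize
        # the sampled reward over the certified-safe actions only.
        theta_tilde = rng.multivariate_normal(theta_hat, Vinv)
        k = int(np.argmax(np.where(safe, arms @ theta_tilde, -np.inf)))
    else:
        k = int(rng.choice(seed))              # fall back on the safe seed set
    x = arms[k]
    r = x @ theta_star + 0.1 * rng.normal()    # noisy linear reward
    V += np.outer(x, x)
    b += r * x

print(f"certified safe at the end: {int(safe.sum())} of {len(arms)} arms")
```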

Linear Stochastic Bandits Under Safety Constraints

no code implementations NeurIPS 2019 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe set.

Safe Exploration
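
A minimal sketch of the pure-exploration phase described above, under the standard assumption that a small set of actions is known to be safe in advance (dimensions, phase length, and the confidence width are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, T0, c, beta = 3, 200, 0.5, 0.5          # beta: hypothetical confidence width
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown parameter; simulation only
A = rng.normal(size=(40, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
arms = np.vstack([A, -A])                  # +/- pairs guarantee some safe arms
seed = np.flatnonzero(arms @ theta_star <= 0)  # safe seed set, given to learner

# Pure exploration: sample uniformly at random inside the known safe set.
V, b = np.eye(d), np.zeros(d)
for t in range(T0):
    x = arms[rng.choice(seed)]
    r = x @ theta_star + 0.1 * rng.normal()
    V += np.outer(x, x)
    b += r * x

# The resulting estimate yields a conservative approximation of the safe set.
theta_hat = np.linalg.solve(V, b)
width = beta * np.sqrt(np.einsum("kd,de,ke->k", arms, np.linalg.inv(V), arms))
est_safe = (arms @ theta_hat + width) <= c
true_safe = arms @ theta_star <= c
print(f"estimated safe: {int(est_safe.sum())}, truly safe: {int(true_safe.sum())}")
```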
