Search Results for author: Anshuka Rangi

Found 9 papers, 0 papers with code

Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow

no code implementations · 22 Nov 2023 · Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal

Multi-objective optimization (MOO), which aims to optimize multiple, possibly conflicting objectives simultaneously, has widespread applications.
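
A quick way to picture population-based MOO is a particle system in which each particle descends its own convex combination of the objectives, so the population spreads along an approximation of the Pareto front. The NumPy sketch below is purely illustrative (toy objectives, invented weights and step size) and is a plain weighted-scalarization flow, not the paper's Wasserstein-Fisher-Rao gradient flow.

    import numpy as np

    # Two toy conflicting objectives on R^2 (hypothetical, for illustration).
    def f1(x): return np.sum((x - 1.0) ** 2, axis=-1)
    def f2(x): return np.sum((x + 1.0) ** 2, axis=-1)
    def grad_f1(x): return 2.0 * (x - 1.0)
    def grad_f2(x): return 2.0 * (x + 1.0)

    rng = np.random.default_rng(0)
    particles = rng.normal(size=(64, 2))  # a population of candidate solutions
    lam = rng.uniform(size=(64, 1))       # per-particle trade-off weight

    lr = 0.05
    for _ in range(200):
        # Each particle follows the gradient of its own scalarized objective,
        # so the population traces out (an approximation of) the Pareto front.
        particles -= lr * (lam * grad_f1(particles) + (1 - lam) * grad_f2(particles))

    front = np.stack([f1(particles), f2(particles)], axis=1)  # (f1, f2) pairs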

Selective Uncertainty Propagation in Offline RL

no code implementations · 1 Feb 2023 · Sanath Kumar Krishnamurthy, Shrey Modi, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi

We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step $h$ in dynamic programming (DP) algorithms.

Offline RL · Reinforcement Learning +1
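
The finite-horizon DP structure referenced above is standard backward induction: the policy at step $h$ is computed from the value estimates at step $h+1$. The toy tabular sketch below (invented transition and reward estimates) shows that recursion; the paper's selective propagation of uncertainty at each step is not reproduced here.

    import numpy as np

    # Toy tabular finite-horizon MDP (all quantities invented for illustration).
    S, A, H = 5, 3, 4
    rng = np.random.default_rng(1)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
    R = rng.uniform(size=(S, A))                # estimated rewards

    V = np.zeros((H + 1, S))                    # V[H] = 0: terminal values
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):                # backward induction over steps
        Q = R + P @ V[h + 1]                    # Q[s, a] at step h
        pi[h] = Q.argmax(axis=1)                # greedy policy at step h
        V[h] = Q.max(axis=1)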

Multi-Player Bandits Robust to Adversarial Collisions

no code implementations · 15 Nov 2022 · Shivakumar Mahesh, Anshuka Rangi, Haifeng Xu, Long Tran-Thanh

We provide RESYNC, the first decentralized and robust algorithm for the defenders, whose performance degrades gracefully as $\tilde{O}(C)$, where $C$ is the number of collisions caused by the attackers.

Multi-Armed Bandits
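
As a setting illustration only (not the RESYNC algorithm itself), the toy collision model below zeroes a defender's reward whenever another defender or an attacker occupies the same arm; the arm means are invented.

    import numpy as np

    # Toy collision model: a defender's pull pays off only if no other
    # defender and no attacker sits on the same arm.
    rng = np.random.default_rng(2)
    K = 6                                   # arms
    means = rng.uniform(0.2, 0.9, size=K)   # invented Bernoulli means

    def play_round(defender_arms, attacker_arms):
        rewards = np.zeros(len(defender_arms))
        for i, a in enumerate(defender_arms):
            collided = defender_arms.count(a) > 1 or a in attacker_arms
            if not collided:
                rewards[i] = rng.binomial(1, means[a])
        return rewards                      # each collision costs one reward

    # e.g. two defenders on arms 0 and 3, an attacker jamming arm 0:
    print(play_round([0, 3], {0}))          # defender 0 is blocked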

Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning

no code implementations · 29 Aug 2022 · Anshuka Rangi, Haifeng Xu, Long Tran-Thanh, Massimo Franceschetti

To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks that manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL, and examines the potential damage of two natural types of poisoning attack, i.e., the manipulation of \emph{reward} and \emph{action}.

Reinforcement Learning (RL)
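
For intuition, the hypothetical snippet below sketches one natural reward-poisoning rule: depress the observed reward whenever the learner deviates from the attacker's target policy, subject to a corruption budget. It is an invented toy, not the attack construction analyzed in the paper.

    # Toy reward-poisoning rule (illustrative only): make non-target actions
    # look worse, as long as corruption budget remains.
    def poison_reward(s, a, r, target_policy, budget, eps=0.5):
        if a != target_policy[s] and budget >= eps:
            return r - eps, budget - eps    # depress reward of deviations
        return r, budget                    # on-target or out of budget: untouched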

Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification

no code implementations · 15 Feb 2021 · Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti

In particular, for the case of unlimited verifications, we show that with an expected $O(\log T)$ number of verifications, a simple modified version of the ETC-type bandit algorithm can restore the order-optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker.

Data Poisoning
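
The idea can be pictured as follows: if the commit decision of an explore-then-commit (ETC) learner is based only on a handful of verified, uncontaminated samples, poisoning the regular feedback cannot steer which arm is committed to. The sketch below is a toy with invented means and constants, not the paper's exact algorithm.

    import numpy as np

    rng = np.random.default_rng(3)
    K, T = 3, 10_000
    means = np.array([0.4, 0.6, 0.5])        # invented arm means
    m = int(np.ceil(np.log(T)))              # O(log T) verified pulls per arm

    verified_means = np.array([
        rng.binomial(1, means[k], size=m).mean() for k in range(K)
    ])
    best = int(verified_means.argmax())      # chosen from clean data only
    # ...then commit to `best` for the remaining T - K * m rounds.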

Sequential Choice Bandits with Feedback for Personalizing users' experience

no code implementations · 5 Jan 2021 · Anshuka Rangi, Massimo Franceschetti, Long Tran-Thanh

We then propose bandit algorithms for the two feedback models and show that the upper and lower bounds on the regret are of order $\tilde{O}(N^{2/3})$ and $\tilde{\Omega}(N^{2/3})$, respectively, where $N$ is the total number of users.
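
Since the two bounds match up to polylogarithmic factors, the proposed algorithms are order optimal; schematically, writing $R_N$ for the regret after $N$ users (our notation), $\tilde{\Omega}(N^{2/3}) \le \inf_{\text{algorithms}} \sup_{\text{instances}} \mathbb{E}[R_N] \le \tilde{O}(N^{2/3})$.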

Learning-based attacks in Cyber-Physical Systems: Exploration, Detection, and Control Cost trade-offs

no code implementations · 21 Nov 2020 · Anshuka Rangi, Mohammad Javad Khojasteh, Massimo Franceschetti

We study the trade-offs between the information acquired by the attacker from observations, the detection capabilities of the controller, and the control cost.
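
One concrete face of this trade-off is residual-based detection: the tighter the controller's alarm threshold, the sooner an attack is flagged but the more false alarms (and control cost) are incurred. The detector below is a generic illustrative sketch, not the test studied in the paper.

    import numpy as np

    # Toy residual (innovation) detector: alarm when the normalized
    # prediction error exceeds a threshold. Lower thresholds detect
    # attacks sooner but raise false alarms. Purely illustrative.
    def detect_attack(y_obs, y_pred, noise_std, thresh=3.0):
        residual = np.abs(y_obs - y_pred) / noise_std
        return residual > thresh             # True => raise an alarm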

Online learning with feedback graphs and switching costs

no code implementations · 23 Oct 2018 · Anshuka Rangi, Massimo Franceschetti

For the two special cases of the symmetric partial-information (PI) setting and multi-armed bandits (MAB), the expected regret of both of these algorithms is order optimal in the duration of the learning process.

Multi-Armed Bandits
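
In the feedback-graph model, playing an arm also reveals the losses of its neighbors. A minimal EXP3-SET-style sketch is below, with each observed loss importance-weighted by its probability of being observed; the graph, losses, and step size are invented, and switching costs are omitted.

    import numpy as np

    rng = np.random.default_rng(4)
    K, T, eta = 4, 5000, 0.05
    G = np.array([[1, 1, 0, 0],              # G[i, j] = 1: playing i reveals j
                  [1, 1, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1]])
    w = np.ones(K)
    for t in range(T):
        p = w / w.sum()
        i = rng.choice(K, p=p)
        losses = rng.uniform(size=K)         # stand-in loss vector for round t
        obs_prob = p @ G                     # P(arm j's loss is observed)
        for j in range(K):
            if G[i, j]:                      # arm j was observed this round
                w[j] *= np.exp(-eta * losses[j] / obs_prob[j])
        w /= w.max()                         # keep weights numerically stable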

Distributed Chernoff Test: Optimal decision systems over networks

no code implementations · 12 Sep 2018 · Anshuka Rangi, Massimo Franceschetti, Stefano Marano

In the first case, the network nodes interact with each other through a central entity, which plays the role of a fusion center.

Decision Making · Quantization +1
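
The fusion-center architecture can be pictured as nodes forwarding local log-likelihood ratios that the center accumulates until a threshold is crossed. The sketch below is a plain sequential test with invented Gaussian hypotheses, far simpler than the distributed Chernoff test analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(5)
    n_nodes, thresh = 10, 8.0
    mu0, mu1 = 0.0, 1.0                      # Gaussian means under H0 / H1 (unit var)

    def node_llr(x):
        # log N(x; mu1, 1) - log N(x; mu0, 1)
        return (mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)

    S, t = 0.0, 0
    while abs(S) < thresh:
        x = rng.normal(mu1, 1.0, size=n_nodes)   # one sample per node (truth: H1)
        S += node_llr(x).sum()                   # fusion center aggregates reports
        t += 1
    print("decide", "H1" if S >= thresh else "H0", "after", t, "rounds")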
