no code implementations • 22 Nov 2023 • Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal
Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives and has widespread applications.
no code implementations • 1 Feb 2023 • Sanath Kumar Krishnamurthy, Shrey Modi, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms.
no code implementations • 15 Nov 2022 • Shivakumar Mahesh, Anshuka Rangi, Haifeng Xu, Long Tran-Thanh
We provide RESYNC, the first decentralized and robust algorithm for defenders, whose performance deteriorates gracefully as $\tilde{O}(C)$ as the number of collisions $C$ from the attackers increases.
no code implementations • 29 Aug 2022 • Anshuka Rangi, Haifeng Xu, Long Tran-Thanh, Massimo Franceschetti
To understand the security threats to reinforcement learning (RL) algorithms, this paper studies poisoning attacks to manipulate \emph{any} order-optimal learning algorithm towards a targeted policy in episodic RL and examines the potential damage of two natural types of poisoning attacks, i.e., the manipulation of \emph{reward} and \emph{action}.
no code implementations • 15 Feb 2021 • Anshuka Rangi, Long Tran-Thanh, Haifeng Xu, Massimo Franceschetti
In particular, for the case of unlimited verifications, we show that with an expected $O(\log T)$ number of verifications, a simple modified version of the ETC-type bandit algorithm can restore the order-optimal $O(\log T)$ regret irrespective of the amount of contamination used by the attacker.
no code implementations • 5 Jan 2021 • Anshuka Rangi, Massimo Franceschetti, Long Tran-Thanh
We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of $\tilde{O}(N^{2/3})$ and $\tilde\Omega(N^{2/3})$, respectively, where $N$ is the total number of users.
no code implementations • 21 Nov 2020 • Anshuka Rangi, Mohammad Javad Khojasteh, Massimo Franceschetti
We study the trade-offs between the information acquired by the attacker from observations, the detection capabilities of the controller, and the control cost.
no code implementations • 23 Oct 2018 • Anshuka Rangi, Massimo Franceschetti
For the two special cases of the symmetric PI setting and MAB, the expected regret of both of these algorithms is order-optimal in the duration of the learning process.
no code implementations • 12 Sep 2018 • Anshuka Rangi, Massimo Franceschetti, Stefano Marano
In the first case, the network nodes interact with each other through a central entity, which plays the role of a fusion center.