Search Results for author: Yinglun Xu

Found 6 papers, 1 paper with code

Reward Poisoning Attack Against Offline Reinforcement Learning

no code implementations • 15 Feb 2024 • Yinglun Xu, Rohan Gumaste, Gagandeep Singh

To the best of our knowledge, we propose the first black-box reward poisoning attack in the general offline RL setting.

Offline RL reinforcement-learning

Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback

no code implementations • 30 Dec 2023 • Yinglun Xu, Gagandeep Singh

Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency.

reinforcement-learning

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

no code implementations • 15 Jul 2023 • Yinglun Xu, Bhuvesh Kumar, Jacob Abernethy

Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions).

Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning

no code implementations • 18 May 2023 • Yinglun Xu, Gagandeep Singh

We leverage a general framework and identify conditions that ensure an efficient attack under a general assumption about the learning algorithm.

reinforcement-learning

Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning

1 code implementation • 30 May 2022 • Yinglun Xu, Qi Zeng, Gagandeep Singh

We study reward poisoning attacks on online deep reinforcement learning (DRL), where the attacker is oblivious to the learning algorithm used by the agent and the dynamics of the environment.

Data Poisoning reinforcement-learning +1
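The oblivious threat model described in the snippet above (an attacker who perturbs rewards without knowing the agent's algorithm or the environment dynamics) can be sketched in a few lines. This is an illustrative toy, not the paper's actual attack: the function name `poison_reward`, the per-step `budget`, and the target-action boosting rule are all assumptions for illustration.

```python
# Hedged sketch of an oblivious reward-poisoning attack: the attacker
# perturbs the observed reward within a per-step corruption budget,
# using only the current action -- no access to the agent's learning
# algorithm or the environment dynamics. All names are illustrative.

def poison_reward(true_reward, action, target_action, budget=1.0):
    """Boost the reward of the attacker's target action and suppress
    all others, staying within +/- budget of the true reward."""
    if action == target_action:
        return true_reward + budget
    return true_reward - budget

# The observed reward stream is biased toward target_action regardless
# of which learning algorithm later consumes it.
observed = [poison_reward(0.5, a, target_action=1) for a in (0, 1, 2)]
```

Because the perturbation depends only on the current action, the same attack applies unchanged to any DRL agent, which is the sense in which such attacks are "oblivious".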

Observation-Free Attacks on Stochastic Bandits

no code implementations • NeurIPS 2021 • Yinglun Xu, Bhuvesh Kumar, Jacob D. Abernethy

To the best of our knowledge, we develop the first data corruption attack on stochastic multi-armed bandit algorithms that works without observing the algorithm's realized behavior.

Thompson Sampling
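The "observation-free" property claimed above — corruption committed to in advance, with no access to the algorithm's realized arm pulls — can be sketched as below. The attack schedule and the zeroing rule are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch of observation-free reward corruption on a stochastic
# bandit: the attacker fixes a set of rounds up front and, in those
# rounds, zeroes the reward of every non-target arm. It never observes
# which arm the bandit algorithm actually pulled.

def corrupt(true_reward, arm, t, target_arm, attack_rounds):
    """Return the (possibly corrupted) reward the learner observes in
    round t when it pulls `arm`. `attack_rounds` is chosen before the
    interaction starts, so no realized behavior is needed."""
    if t in attack_rounds and arm != target_arm:
        return 0.0
    return true_reward

# Example schedule: corrupt the first 100 rounds to steer the learner
# (e.g. UCB or Thompson Sampling) toward target_arm.
schedule = set(range(100))
```

The key point the sketch illustrates is that `corrupt` is a function of `(arm, t)` only, so the attacker's strategy is fully specified before the bandit algorithm runs.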
