Search Results for author: Junlin Wu

Found 7 papers, 3 papers with code

Preference Poisoning Attacks on Reward Model Learning

no code implementations 2 Feb 2024 Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik

In addition, we observe that the simpler and more scalable rank-by-distance approaches are often competitive with the best, and on occasion significantly outperform gradient-based methods.

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models

no code implementations 16 Nov 2023 Jiongxiao Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao

Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLM alignment.

Backdoor Attack, Data Poisoning

Neural Lyapunov Control for Discrete-Time Systems

1 code implementation NeurIPS 2023 Junlin Wu, Andrew Clark, Yiannis Kantaros, Yevgeniy Vorobeychik

However, finding Lyapunov functions for general nonlinear systems is a challenging task.

Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

no code implementations 28 Dec 2022 Junlin Wu, Hussein Sibai, Yevgeniy Vorobeychik

Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.

reinforcement-learning, Reinforcement Learning (RL)

Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum

1 code implementation 21 Jun 2022 Junlin Wu, Yevgeniy Vorobeychik

Despite considerable advances in deep reinforcement learning, it has been shown to be highly vulnerable to adversarial perturbations to state observations.

Adversarial Robustness, reinforcement-learning

Learning Generative Deception Strategies in Combinatorial Masking Games

no code implementations 23 Sep 2021 Junlin Wu, Charles Kamhoua, Murat Kantarcioglu, Yevgeniy Vorobeychik

Next, we present a novel highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks.
