Search Results for author: Ronghui Mu

Found 8 papers, 5 papers with code

Towards Fairness-Aware Adversarial Learning

1 code implementation • 27 Feb 2024 • Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan

As a generalization of conventional adversarial training (AT), we redefine the problem of adversarial training as a min-max-max framework to ensure both robustness and fairness of the trained model.

Fairness
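
The snippet above does not spell out the objective, but a plausible reading of a min-max-max formulation, with theta the model parameters, delta a norm-bounded perturbation, and w class-wise weights on the simplex (the notation here is illustrative, not necessarily the paper's), is:

```latex
\min_{\theta} \; \max_{w \in \Delta} \; \sum_{k} w_k \,
  \mathbb{E}_{(x,y) \sim \mathcal{D}_k}
  \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta),\, y\big) \Big]
```

Read this way, the innermost max finds the worst-case perturbation (robustness), the middle max up-weights the worst-off class (fairness), and the outer min trains the model against both.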

Building Guardrails for Large Language Models

no code implementations • 2 Feb 2024 • Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang

As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies.

Reward Certification for Policy Smoothed Reinforcement Learning

no code implementations • 11 Dec 2023 • Ronghui Mu, Leandro Soriano Marcolino, Tianle Zhang, Yanghao Zhang, Xiaowei Huang, Wenjie Ruan

Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks.

Reinforcement Learning (RL)
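
Policy smoothing in this vein typically certifies behavior by randomizing the agent's observations, in the spirit of randomized smoothing. A minimal sketch of that idea, assuming Gaussian observation noise and a majority vote over a discrete action set (the function and parameter names are illustrative, not the paper's certification algorithm):

```python
import numpy as np

def smoothed_action(policy, obs, sigma=0.1, n_samples=100, seed=0):
    """Majority-vote action of a base policy under Gaussian observation noise.

    Illustrative sketch only: the paper's reward-certification procedure is
    more involved; `policy`, `sigma`, and `n_samples` are assumed names.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy_obs = obs + rng.normal(0.0, sigma, size=obs.shape)
        action = policy(noisy_obs)  # base policy: observation -> discrete action
        votes[action] = votes.get(action, 0) + 1
    return max(votes, key=votes.get)  # most frequently chosen action

# Example with a toy threshold policy on a 1-D observation:
# act = smoothed_action(lambda o: int(o[0] > 0), np.array([0.05]))
```

Averaging decisions over noisy observations is what makes the smoothed policy's behavior stable under small input perturbations, which is the property a certificate can then bound.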

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

no code implementations • 19 May 2023 • Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

Large Language Models (LLMs) have set off a new wave of AI interest for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains.

Randomized Adversarial Training via Taylor Expansion

1 code implementation • CVPR 2023 • Gaojie Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, Xiaowei Huang

The randomized weights enable our design of a novel adversarial training method via Taylor expansion under small Gaussian noise, and we show that the new adversarial training method can flatten the loss landscape and find flat minima.
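
One way to read the Taylor-expansion idea, as a sketch rather than the paper's exact derivation: perturb the weights w with small Gaussian noise epsilon ~ N(0, sigma^2 I) and expand the expected loss to second order,

```latex
\mathbb{E}_{\epsilon}\big[\mathcal{L}(w + \epsilon)\big]
  \approx \mathcal{L}(w)
  + \mathbb{E}[\epsilon]^{\top} \nabla \mathcal{L}(w)
  + \tfrac{1}{2}\, \mathbb{E}\big[\epsilon^{\top} \nabla^{2} \mathcal{L}(w)\, \epsilon\big]
  = \mathcal{L}(w) + \tfrac{\sigma^{2}}{2}\, \mathrm{tr}\big(\nabla^{2} \mathcal{L}(w)\big)
```

since E[epsilon] = 0. Minimizing the expected loss under weight noise therefore implicitly penalizes the Hessian trace, a standard proxy for loss-landscape flatness, which is consistent with the flat-minima claim above.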

3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models

1 code implementation • 15 Jul 2022 • Ronghui Mu, Wenjie Ruan, Leandro S. Marcolino, Qiang Ni

Thus, we propose an efficient verification framework, 3DVerifier, to tackle both challenges: it adopts a linear relaxation function to bound the multiplication layer and combines forward and backward propagation to compute certified bounds on the outputs of point cloud models.
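
The paper defines its own relaxation; as a standard illustration of what linearly bounding a product z = x*y over a box looks like, the McCormick envelope gives such planes (a sketch under that assumption, not 3DVerifier's exact bounds):

```python
def mccormick_product_bounds(x, y, xl, xu, yl, yu):
    """Linear lower/upper bounds on z = x*y for x in [xl, xu], y in [yl, yu].

    Standard McCormick envelope, shown only to illustrate linearly relaxing
    a multiplication layer; not 3DVerifier's exact relaxation function.
    """
    lower = max(xl * y + x * yl - xl * yl,   # plane tight at (xl, yl)
                xu * y + x * yu - xu * yu)   # plane tight at (xu, yu)
    upper = min(xu * y + x * yl - xu * yl,   # plane tight at (xu, yl)
                xl * y + x * yu - xl * yu)   # plane tight at (xl, yu)
    return lower, upper

# Example: bound z = x*y at x=0.5, y=-0.2 with x in [0, 1], y in [-1, 1]:
# lo, hi = mccormick_product_bounds(0.5, -0.2, 0.0, 1.0, -1.0, 1.0)
# then lo <= 0.5 * -0.2 <= hi
```

Because the bounds are linear in x and y, they compose with the usual forward/backward bound propagation used by CROWN-style verifiers, which is presumably why a linear relaxation of the multiplication layer makes certification of point cloud models tractable.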

Sparse Adversarial Video Attacks with Spatial Transformations

1 code implementation • 10 Nov 2021 • Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Qiang Ni

In recent years, a significant amount of research effort has concentrated on adversarial attacks on images, while adversarial attacks on videos have seldom been explored.

Adversarial Attack • Bayesian Optimisation +1
