Search Results for author: Ronghui Mu

Found 8 papers, 5 papers with code

Towards Fairness-Aware Adversarial Learning

1 code implementation • 27 Feb 2024 • Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan

As a generalization of conventional adversarial training (AT), we redefine the problem of adversarial training as a min-max-max framework to ensure both robustness and fairness of the trained model.

Fairness
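
The snippet above does not spell out the objective, but a plausible reading of a min-max-max formulation, with theta the model parameters, delta a norm-bounded perturbation, and w class-wise weights on the simplex (the notation here is illustrative, not necessarily the paper's), is:

```latex
\min_{\theta} \; \max_{w \in \Delta} \; \sum_{k} w_k \,
  \mathbb{E}_{(x,y) \sim \mathcal{D}_k}
  \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta),\, y\big) \Big]
```

Read this way, the innermost max finds the worst-case perturbation (robustness), the middle max up-weights the worst-off class (fairness), and the outer min trains the model against both.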

Building Guardrails for Large Language Models

no code implementations • 2 Feb 2024 • Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang

As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies.

Reward Certification for Policy Smoothed Reinforcement Learning

no code implementations • 11 Dec 2023 • Ronghui Mu, Leandro Soriano Marcolino, Tianle Zhang, Yanghao Zhang, Xiaowei Huang, Wenjie Ruan

Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks.

Reinforcement Learning (RL)
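
Policy smoothing in this vein typically certifies behavior by randomizing the agent's observations, in the spirit of randomized smoothing. A minimal sketch of that idea, assuming Gaussian observation noise and a majority vote over a discrete action set (the function and parameter names are illustrative, not the paper's certification algorithm):

```python
import numpy as np

def smoothed_action(policy, obs, sigma=0.1, n_samples=100, seed=0):
    """Majority-vote action of a base policy under Gaussian observation noise.

    Illustrative sketch only: the paper's reward-certification procedure is
    more involved; `policy`, `sigma`, and `n_samples` are assumed names.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy_obs = obs + rng.normal(0.0, sigma, size=obs.shape)
        action = policy(noisy_obs)  # base policy: observation -> discrete action
        votes[action] = votes.get(action, 0) + 1
    return max(votes, key=votes.get)  # most frequently chosen action

# Example with a toy threshold policy on a 1-D observation:
# act = smoothed_action(lambda o: int(o[0] > 0), np.array([0.05]))
```

Averaging decisions over noisy observations is what makes the smoothed policy's behavior stable under small input perturbations, which is the property a certificate can then bound.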

A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

no code implementations • 19 May 2023 • Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

Large Language Models (LLMs) have set off a new wave of AI interest for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains.

Randomized Adversarial Training via Taylor Expansion

1 code implementation • CVPR 2023 • Gaojie Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, Xiaowei Huang

The randomized weights enable our design of a novel adversarial training method via Taylor expansion under small Gaussian noise, and we show that the new adversarial training method can flatten the loss landscape and find flat minima.
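
One way to read the Taylor-expansion idea, as a sketch rather than the paper's exact derivation: perturb the weights w with small Gaussian noise epsilon ~ N(0, sigma^2 I) and expand the expected loss to second order,

```latex
\mathbb{E}_{\epsilon}\big[\mathcal{L}(w + \epsilon)\big]
  \approx \mathcal{L}(w)
  + \mathbb{E}[\epsilon]^{\top} \nabla \mathcal{L}(w)
  + \tfrac{1}{2}\, \mathbb{E}\big[\epsilon^{\top} \nabla^{2} \mathcal{L}(w)\, \epsilon\big]
  = \mathcal{L}(w) + \tfrac{\sigma^{2}}{2}\, \mathrm{tr}\big(\nabla^{2} \mathcal{L}(w)\big)
```

since E[epsilon] = 0. Minimizing the expected loss under weight noise therefore implicitly penalizes the Hessian trace, a standard proxy for loss-landscape flatness, which is consistent with the flat-minima claim above.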

3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models

1 code implementation • 15 Jul 2022 • Ronghui Mu, Wenjie Ruan, Leandro S. Marcolino, Qiang Ni

Thus, we propose an efficient verification framework, 3DVerifier, to tackle both challenges: it adopts a linear relaxation function to bound the multiplication layer and combines forward and backward propagation to compute certified bounds on the outputs of point cloud models.
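
The paper defines its own relaxation; as a standard illustration of what linearly bounding a product z = x*y over a box looks like, the McCormick envelope gives such planes (a sketch under that assumption, not 3DVerifier's exact bounds):

```python
def mccormick_product_bounds(x, y, xl, xu, yl, yu):
    """Linear lower/upper bounds on z = x*y for x in [xl, xu], y in [yl, yu].

    Standard McCormick envelope, shown only to illustrate linearly relaxing
    a multiplication layer; not 3DVerifier's exact relaxation function.
    """
    lower = max(xl * y + x * yl - xl * yl,   # plane tight at (xl, yl)
                xu * y + x * yu - xu * yu)   # plane tight at (xu, yu)
    upper = min(xu * y + x * yl - xu * yl,   # plane tight at (xu, yl)
                xl * y + x * yu - xl * yu)   # plane tight at (xl, yu)
    return lower, upper

# Example: bound z = x*y at x=0.5, y=-0.2 with x in [0, 1], y in [-1, 1]:
# lo, hi = mccormick_product_bounds(0.5, -0.2, 0.0, 1.0, -1.0, 1.0)
# then lo <= 0.5 * -0.2 <= hi
```

Because the bounds are linear in x and y, they compose with the usual forward/backward bound propagation used by CROWN-style verifiers, which is presumably why a linear relaxation of the multiplication layer makes certification of point cloud models tractable.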

Sparse Adversarial Video Attacks with Spatial Transformations

1 code implementation • 10 Nov 2021 • Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Qiang Ni

In recent years, a significant amount of research effort has concentrated on adversarial attacks on images, while adversarial attacks on videos have seldom been explored.

Adversarial Attack • Bayesian Optimisation +1
