Search Results for author: Ruixuan Huang

Found 2 papers, 1 paper with code

Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector

no code implementations • 18 Apr 2024 • Zhihao Xu, Ruixuan Huang, Xiting Wang, Fangzhao Wu, Jing Yao, Xing Xie

Even when successful, the harmfulness of their outputs cannot be guaranteed, leading to suspicions that these methods have not accurately identified the safety vulnerabilities of LLMs.
