no code implementations • 24 Oct 2024 • Hengxiang Zhang, Hongfu Gao, Qiang Hu, Guanhua Chen, Lili Yang, BingYi Jing, Hongxin Wei, Bing Wang, Haifeng Bai, Lei Yang
While previous works have introduced several benchmarks to evaluate the safety risks of LLMs, the community still has a limited understanding of how well current LLMs recognize illegal and unsafe content in Chinese contexts.
no code implementations • 9 Oct 2024 • Qiang Hu, Hengxiang Zhang, Hongxin Wei
Over-parameterized models are typically vulnerable to membership inference attacks, which aim to determine whether a specific sample was included in the training data of a given model.
no code implementations • 9 Oct 2024 • Hengxiang Zhang, Songxin Zhang, BingYi Jing, Hongxin Wei
In light of this, we introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection.
no code implementations • 21 Aug 2024 • Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu
Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning.