Search Results for author: Hengxiang Zhang

Found 4 papers, 0 papers with code

ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models

no code implementations • 24 Oct 2024 • Hengxiang Zhang, Hongfu Gao, Qiang Hu, Guanhua Chen, Lili Yang, BingYi Jing, Hongxin Wei, Bing Wang, Haifeng Bai, Lei Yang

While previous works have introduced several benchmarks to evaluate the safety risks of LLMs, the community still has a limited understanding of current LLMs' ability to recognize illegal and unsafe content in Chinese contexts.

Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning

no code implementations • 9 Oct 2024 • Qiang Hu, Hengxiang Zhang, Hongxin Wei

Over-parameterized models are typically vulnerable to membership inference attacks, which aim to determine whether a specific sample was included in the training data of a given model.
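To make the threat model concrete, here is a minimal sketch of a classic loss-thresholding membership inference attack — a generic baseline, not the Privacy-aware Sparsity Tuning defense this paper proposes. The threshold and toy loss values are illustrative assumptions.

```python
# Hedged sketch: loss-thresholding membership inference (a standard
# baseline attack). Over-parameterized models tend to fit training
# points tightly, so a low per-sample loss hints at membership.

def loss_threshold_mia(loss, threshold):
    """Guess 'member' when the per-sample loss falls below the threshold."""
    return loss < threshold

# Toy losses: members (seen in training) typically have lower loss.
member_losses = [0.05, 0.10, 0.08]
non_member_losses = [0.90, 1.20, 0.75]

threshold = 0.5  # hypothetical; in practice calibrated on held-out data
guesses = [loss_threshold_mia(l, threshold)
           for l in member_losses + non_member_losses]
print(guesses)
```

A defense such as the one studied here aims to shrink the gap between member and non-member losses so this kind of thresholding fails.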

Fine-tuning can Help Detect Pretraining Data from Large Language Models

no code implementations • 9 Oct 2024 • Hengxiang Zhang, Songxin Zhang, BingYi Jing, Hongxin Wei

In light of this, we introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection.
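A minimal sketch of the score-deviation idea as the snippet describes it: score each sample with a base model and again after fine-tuning, and use the difference as the detection signal. The function name, the stand-in scores, and the toy numbers are all assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of a score-deviation signal. `score_base` and
# `score_finetuned` are hypothetical stand-ins for a real scoring
# function (e.g. per-token log-likelihood under each model).

def fine_tuned_score_deviation(score_base, score_finetuned):
    """Deviation between the fine-tuned and base scores for one sample."""
    return score_finetuned - score_base

# Toy numbers: the intuition is that a non-member's score shifts more
# after fine-tuning on unseen data than a pretraining member's does,
# so the deviation separates the two groups better than the raw score.
member_dev = fine_tuned_score_deviation(-2.1, -2.0)
non_member_dev = fine_tuned_score_deviation(-3.0, -1.5)
print(member_dev, non_member_dev)
```

Thresholding the deviation rather than the raw score is the sense in which such a method could "improve current scoring functions" for pretraining data detection.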
