1 code implementation • 20 Apr 2023 • Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang
To further promote the safe deployment of LLMs, we develop a Chinese LLM safety assessment benchmark.
1 code implementation • 13 Sep 2023 • Zhexin Zhang, Leqi Lei, Lindong Wu, Rui Sun, Yongkang Huang, Chong Long, Xiao Liu, Xuanyu Lei, Jie Tang, Minlie Huang
Notably, SafetyBench also incorporates both Chinese and English data, facilitating the evaluation in both languages.
1 code implementation • 26 Feb 2024 • Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang
The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but a comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable, and explainable manner is still lacking.
1 code implementation • ACL 2021 • Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang
Automatic metrics are essential for developing natural language generation (NLG) models, particularly for open-ended language generation tasks such as story generation.
1 code implementation • 10 Jul 2023 • Zhexin Zhang, Jiaxin Wen, Minlie Huang
In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix.
1 code implementation • 22 Apr 2022 • Zhexin Zhang, Jiaxin Wen, Jian Guan, Minlie Huang
Endowing the protagonist with a specific personality is essential for writing an engaging story.
1 code implementation • NAACL 2022 • Zhexin Zhang, Jiaxin Wen, Jian Guan, Minlie Huang
In this paper, we aim to control the protagonist's persona in story generation, i.e., generating a story from a leading context and a persona description, where the protagonist should exhibit the specified personality through a coherent event sequence.
1 code implementation • Findings (ACL) 2022 • Zhexin Zhang, Yeshuang Zhu, Zhengcong Fei, Jinchao Zhang, Jie Zhou
With the increasing popularity of online chatting, stickers are becoming an important part of online communication.
1 code implementation • 29 Nov 2023 • Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang
While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting.
1 code implementation • 4 Dec 2022 • Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang
In order to detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourcing workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations.
1 code implementation • 21 Dec 2022 • Hao Sun, Zhexin Zhang, Fei Mi, Yasheng Wang, Wei Liu, Jianwei Cui, Bin Wang, Qun Liu, Minlie Huang
In this paper, we propose MoralDial, a framework to train and evaluate moral dialogue systems.
1 code implementation • 15 Nov 2023 • Zhexin Zhang, Junxiao Yang, Pei Ke, Minlie Huang
We hope our work can contribute to the comprehension of jailbreak attacks and defenses, and shed light on the relationship between LLMs' capability and safety.
no code implementations • 8 Aug 2018 • Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, Zhexin Zhang
The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection.
no code implementations • 18 Feb 2023 • Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang
This survey presents a framework for safety research pertaining to large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods.