Search Results for author: Zhexin Zhang

Found 14 papers, 12 papers with code

Safety Assessment of Chinese Large Language Models

1 code implementation • 20 Apr 2023 • Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang

To further promote the safe deployment of LLMs, we develop a Chinese LLM safety assessment benchmark.

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

1 code implementation • 26 Feb 2024 • Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but a comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable, and explainable manner is still lacking.

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics

1 code implementation • ACL 2021 • Jian Guan, Zhexin Zhang, Zhuoer Feng, Zitao Liu, Wenbiao Ding, Xiaoxi Mao, Changjie Fan, Minlie Huang

Automatic metrics are essential for developing natural language generation (NLG) models, particularly for open-ended language generation tasks such as story generation.

Story Generation

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

1 code implementation • 10 Jul 2023 • Zhexin Zhang, Jiaxin Wen, Minlie Huang

In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix.

Memorization

Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation

1 code implementation • NAACL 2022 • Zhexin Zhang, Jiaxin Wen, Jian Guan, Minlie Huang

In this paper, we aim to control the protagonist’s persona in story generation, i.e., generating a story from a leading context and a persona description, where the protagonist should exhibit the specified personality through a coherent event sequence.

Sentence · Story Generation

Selecting Stickers in Open-Domain Dialogue through Multitask Learning

1 code implementation • Findings (ACL) 2022 • Zhexin Zhang, Yeshuang Zhu, Zhengcong Fei, Jinchao Zhang, Jie Zhou

With the increasing popularity of online chatting, stickers are becoming important in our online communication.

Unveiling the Implicit Toxicity in Large Language Models

1 code implementation • 29 Nov 2023 • Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang

While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting.

Language Modelling · Reinforcement Learning (RL)

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

1 code implementation • 4 Dec 2022 • Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang

In order to detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourcing workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations.

Response Generation

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

1 code implementation • 15 Nov 2023 • Zhexin Zhang, Junxiao Yang, Pei Ke, Minlie Huang

We hope our work could contribute to the comprehension of jailbreaking attacks and defenses, and shed light on the relationship between LLMs' capability and safety.
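The listing gives only a one-sentence snippet of this paper, but its title names a concrete idea: defending against jailbreaks by instructing the model to prioritize safety over helpfulness. As a rough, hypothetical illustration of that general idea (not the paper's actual prompt or method — the template text below is invented), such a defense can be sketched as a prompt wrapper:

```python
# Hypothetical sketch of a goal-prioritization prompt wrapper.
# The instruction wording is illustrative only; the paper's real
# prompt design and training procedure are not reproduced here.

def prioritize_goals(user_query: str) -> str:
    """Prepend an instruction that ranks safety above helpfulness."""
    instruction = (
        "You are an assistant whose primary goal is safety and whose "
        "secondary goal is helpfulness. If fulfilling the request below "
        "would conflict with safety, refuse politely; otherwise, answer "
        "as helpfully as you can.\n\n"
    )
    return instruction + "[User request]\n" + user_query

# Example: the wrapped prompt is what would be sent to the LLM.
prompt = prioritize_goals("How do I reset my home router?")
print(prompt)
```

The wrapper leaves benign requests answerable while making the priority ordering explicit for every query, which is the intuition the title suggests.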

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

no code implementations • 8 Aug 2018 • Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, Zhexin Zhang

The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection.

Machine Translation · Translation

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

no code implementations • 18 Feb 2023 • Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang

This survey presents a framework for safety research pertaining to large models, delineating the landscape of safety risks as well as safety evaluation and improvement methods.

Adversarial Attack · Ethics
