Search Results for author: Ziran Yang

Found 2 papers, 0 papers with code

Panacea: Pareto Alignment via Preference Adaptation for LLMs

no code implementations • 3 Feb 2024 • Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Qingfu Zhang, Siyuan Qi, Yaodong Yang

Our work marks a step forward in effectively and efficiently aligning models to diverse and intricate human preferences in a controllable and Pareto-optimal manner.

Language Modelling, Large Language Model

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

no code implementations • 30 Sep 2023 • Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang

In this paper, we present Red-teaming Game (RTG), a general game-theoretic framework without manual annotation.

Language Modelling, Vulnerability Detection
