Search Results for author: Yuhang Lai

Found 5 papers, 3 papers with code

How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation

no code implementations 20 Feb 2025 Zhuohang Long, Siyuan Wang, Shujun Liu, Yuhang Lai, Xuanjing Huang, Zhongyu Wei

Jailbreak attacks, where harmful prompts bypass generative models' built-in safety, raise serious concerns about model vulnerability.

Binary Classification

HAF-RM: A Hybrid Alignment Framework for Reward Model Training

no code implementations 4 Jul 2024 Shujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue, Zengfeng Huang, Xuanjing Huang, Zhongyu Wei

By decoupling the reward modeling procedure and incorporating hybrid supervision, our HaF-RM framework offers a principled and effective approach to enhancing the performance and alignment of reward models, a critical component in the responsible development of powerful language models.

ALaRM: Align Language Models via Hierarchical Rewards Modeling

1 code implementation 11 Mar 2024 Yuhang Lai, Siyuan Wang, Shujun Liu, Xuanjing Huang, Zhongyu Wei

We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences.

Long Form Question Answering, Machine Translation

EVOR: Evolving Retrieval for Code Generation

1 code implementation 19 Feb 2024 Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu

Recently, retrieval-augmented generation (RAG) has been successfully applied to code generation.

Code Generation, RAG

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

2 code implementations 18 Nov 2022 Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas.

Code Generation, Memorization
