Search Results for author: Josef Dai

Found 4 papers, 2 papers with code

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

1 code implementation • 20 Jun 2024 • Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values.

Safety Alignment • Text-to-Video Generation • +2
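
A minimal sketch of how one might start exploring the SafeSora preference data, assuming it is published on the Hugging Face Hub; the repository id "PKU-Alignment/SafeSora" and any field names are assumptions, not confirmed by this listing:

```python
# Sketch: inspect the SafeSora text-to-video preference data.
# The Hub repo id, config, and field names are assumptions and may
# differ from the released dataset; check the dataset card first.
from datasets import load_dataset

dataset = load_dataset("PKU-Alignment/SafeSora", split="train")

example = dataset[0]
print(example.keys())  # discover the actual schema before relying on field names
```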

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference

no code implementations • 20 Jun 2024 • Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs).

Question Answering • Safety Alignment
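
As a rough illustration of how such a preference dataset could feed safety-alignment research, the sketch below filters preference pairs by per-response safety labels. The Hub id "PKU-Alignment/PKU-SafeRLHF" and the column names are assumptions based on earlier releases, not confirmed by this listing:

```python
# Sketch: keep only preference pairs where both responses are labeled safe.
# The repo id and column names below are assumptions; consult the released
# dataset card for the authoritative multi-level safety schema.
from datasets import load_dataset

ds = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")

def both_responses_safe(example):
    # Hypothetical boolean safety labels attached to each response.
    return example["is_response_0_safe"] and example["is_response_1_safe"]

safe_pairs = ds.filter(both_responses_safe)
print(f"{len(safe_pairs)} / {len(ds)} pairs have two safe responses")
```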

Reward Generalization in RLHF: A Topological Perspective

no code implementations • 15 Feb 2024 • Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

As a solution, we introduce a theoretical framework for investigating reward generalization in reinforcement learning from human feedback (RLHF), focusing on the topology of information flow at both macro and micro levels.

Generalization Bounds • Language Modelling • +1

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation • 19 Oct 2023 • Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

The inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.

reinforcement-learning • Reinforcement Learning • +1
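
One common way to ease that helpfulness/harmlessness tension is to treat harmlessness as a constraint and optimize a Lagrangian-weighted objective. The sketch below shows the dual-variable update in that spirit; it is a simplified illustration under assumed names (reward, cost, cost_limit), not the paper's implementation:

```python
# Sketch: Lagrangian-style balancing of a helpfulness reward against a
# harmlessness cost, in the spirit of constrained (safe) RL.
# All names and hyperparameters here are illustrative placeholders.
import torch

lagrange_multiplier = torch.tensor(1.0)
cost_limit = 0.0       # target: expected harmlessness cost <= cost_limit
multiplier_lr = 0.05   # step size for the dual update

def combined_objective(reward, cost):
    """Helpfulness reward penalized by the weighted constraint violation."""
    return reward - lagrange_multiplier * (cost - cost_limit)

def update_multiplier(mean_cost):
    """Dual ascent: raise the penalty when the cost constraint is violated."""
    global lagrange_multiplier
    lagrange_multiplier = torch.clamp(
        lagrange_multiplier + multiplier_lr * (mean_cost - cost_limit),
        min=0.0,
    )
```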
