Search Results for author: Josef Dai

Found 6 papers, 3 papers with code

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

1 code implementation · 20 Dec 2024 · Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, WeiPeng Chen, Jun Song, Bo Zheng, Yaodong Yang

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose input and output can be any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.

Instruction Following
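The entry above describes fine-tuning on human preference data. As general background, here is a minimal sketch of a pairwise preference loss (a DPO-style objective) over preferred/rejected response pairs; this is a generic illustration, not the paper's all-modality training recipe, and the function and tensor names are hypothetical.

```python
# Generic pairwise preference loss (DPO-style), shown only as background for
# preference-based fine-tuning; NOT the Align Anything training recipe.
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """Each tensor holds summed sequence log-probs under the trainable policy
    or a frozen reference model; 'chosen' is the human-preferred response."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Increase the relative likelihood of preferred responses vs. rejected ones.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```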

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

1 code implementation · 20 Jun 2024 · Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values.

Safety Alignment · Text-to-Video Generation +2

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference

no code implementations · 20 Jun 2024 · Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs).

Question Answering · Safety Alignment
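A minimal sketch of how a safety-preference dataset like this might be inspected, assuming a Hugging Face Hub release under the ID PKU-Alignment/PKU-SafeRLHF; the hub ID, split name, and field layout are assumptions and may differ from the actual release.

```python
# Hedged sketch: load and inspect a safety-preference dataset.
# Hub ID, split, and field names are assumptions, not confirmed facts.
from datasets import load_dataset

dataset = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")  # assumed ID/split
example = dataset[0]
# Entries are expected to pair one prompt with two responses plus preference
# labels for helpfulness ("better") and harmlessness ("safer").
print(sorted(example.keys()))
```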

Language Models Resist Alignment: Evidence From Data Compression

no code implementations · 10 Jun 2024 · Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang

Empirically, we demonstrate the elasticity of post-alignment models, i.e., the tendency to revert to the behavior distribution formed during the pre-training phase upon further fine-tuning.

Data Compression
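The "data compression" evidence above leans on a standard identity between likelihood and code length, sketched here as textbook background rather than the paper's elasticity result: modeling data drawn from distribution p with model q costs, in expected code length, the entropy of p plus the KL divergence from p to q.

```latex
% Background identity (standard information theory, not the paper's result):
\mathbb{E}_{x \sim p}\left[-\log q(x)\right] \;=\; H(p) \;+\; D_{\mathrm{KL}}\!\left(p \,\|\, q\right)
```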

Reward Generalization in RLHF: A Topological Perspective

no code implementations · 15 Feb 2024 · Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

As a solution, we introduce a theoretical framework for investigating reward generalization in reinforcement learning from human feedback (RLHF), focusing on the topology of information flow at both macro and micro levels.

Generalization Bounds · Language Modelling +1
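For context on the reward-modeling step that this framework studies, below is the standard Bradley–Terry objective commonly used to fit RLHF reward models from preference pairs; this is general background, not the paper's topological analysis, with r_\phi the learned reward and y_w, y_l the preferred and rejected responses.

```latex
% Standard Bradley–Terry reward-model loss used in RLHF (background only):
\mathcal{L}(\phi) \;=\; -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\big( r_\phi(x, y_w) - r_\phi(x, y_l) \big) \right]
```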

Safe RLHF: Safe Reinforcement Learning from Human Feedback

1 code implementation · 19 Oct 2023 · Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training.

reinforcement-learning · Reinforcement Learning +1
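The helpfulness–harmlessness tension noted above is naturally cast as constrained policy optimization; the sketch below gives the generic constrained-RLHF form and its Lagrangian relaxation, written as background consistent with the abstract rather than the paper's exact objective (R is a learned helpfulness reward, C a learned harmfulness cost, d a safety budget).

```latex
% Constrained RLHF sketch (background form): maximize helpfulness reward
% subject to a harm budget.
\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[R(x,y)\big]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[C(x,y)\big] \le d

% The usual Lagrangian relaxation, solved as a min–max problem:
\min_{\lambda \ge 0}\; \max_{\pi_\theta}\;
\mathbb{E}\big[R(x,y)\big] \;-\; \lambda\,\big(\mathbb{E}\big[C(x,y)\big] - d\big)
```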
