Search Results for author: Huaijie Wang

Found 2 papers, 2 papers with code

Offline Reinforcement Learning for LLM Multi-Step Reasoning

2 code implementations, 20 Dec 2024. Authors: Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu

While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily available for multi-step reasoning tasks, and (2) it treats all tokens uniformly, making it ineffective for credit assignment in multi-step reasoning tasks, which often come with sparse reward.
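Both limitations are visible in the standard DPO objective (Rafailov et al., 2023). The sketch below is that generic loss, not the method proposed in this paper. Each training example must be a chosen/rejected pair (limitation 1). A sequence's score is the plain sum of its per-token log-probabilities, so every token carries equal weight regardless of which reasoning step actually mattered (limitation 2). The helper names `sequence_logprob` and `dpo_loss` and the default `beta=0.1` are illustrative assumptions.

```python
import math


def sequence_logprob(token_logprobs):
    # Sequence log-probability = plain sum of per-token log-probabilities.
    # Every token gets weight 1, whether it is a pivotal reasoning step or filler.
    return sum(token_logprobs)


def _neg_log_sigmoid(x):
    # Numerically stable -log(sigmoid(x)) = log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for ONE preference pair (illustrative sketch).

    Each argument is a list of per-token log-probabilities of the chosen
    (preferred) or rejected response under the trainable policy or the
    frozen reference model. A single unpaired response cannot be scored.
    """
    chosen_margin = sequence_logprob(policy_chosen) - sequence_logprob(ref_chosen)
    rejected_margin = sequence_logprob(policy_rejected) - sequence_logprob(ref_rejected)
    return _neg_log_sigmoid(beta * (chosen_margin - rejected_margin))
```

Swapping which tokens carry high or low log-probability inside a response leaves the loss unchanged, because only the sums enter it. This is why a sparse, outcome-level reward gives DPO no way to single out the step that made a reasoning chain succeed or fail.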

Tags: GSM8K, Math (+5 more)

BitNet: Scaling 1-bit Transformers for Large Language Models

2 code implementations, 17 Oct 2023. Authors: Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Tags: Language Modeling, Language Modelling (+1 more)
