no code implementations • 22 Jan 2025 • Hanning Zhang, Juntong Song, Juno Zhu, Yuanhao Wu, Tong Zhang, Cheng Niu
Using \textbf{RAG-Reward}, we train reward models and apply reinforcement learning with human feedback (RLHF) to improve LLMs' effectiveness in RAG.
1 code implementation • 15 Dec 2024 • Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang
Our theoretical analysis shows that we could derive the optimal reward model from the initial policy sampling.
no code implementations • 27 Nov 2023 • Chi Han, Jialiang Xu, Manling Li, Hanning Zhang, Tarek Abdelzaher, Heng Ji
Social media play a significant role in shaping public opinion and influencing ideological communities through information propagation.
1 code implementation • 16 Nov 2023 • Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang
This approach is formalized by first identifying the disparity in knowledge encompassed by pre-trained parameters compared to that of instruction tuning data.
1 code implementation • 12 Sep 2023 • Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang
Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different alignment-forgetting trade-offs, we propose Heterogeneous Model Averaging (HMA) to Heterogeneously find various combination ratios of model layers.
no code implementations • 23 Dec 2020 • Huayu Li, Haiyu Wu, Xiwen Chen, Hanning Zhang, Abolfazl Razi
We equip the SPA blocks on a network designed for real image denoising.