1 code implementation • 8 Feb 2024 • Huayu Chen, Guande He, Hang Su, Jun Zhu
Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where rewards are implicitly defined rather than explicitly given.
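The implicit reward in DPO comes from the log-probability ratio between the policy and a frozen reference model; a minimal sketch of the pairwise DPO loss (standard formulation, not this paper's extension — the function name and plain-Python setting are illustrative):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (y_w preferred over y_l).

    Inputs are sequence log-probabilities under the trained policy
    and the frozen reference model. The implicitly defined reward of
    a response is beta * (policy log-prob - reference log-prob).
    """
    reward_w = beta * (logp_w - ref_logp_w)
    reward_l = beta * (logp_l - ref_logp_l)
    # Bradley-Terry negative log-likelihood of the observed preference
    return -math.log(1.0 / (1.0 + math.exp(-(reward_w - reward_l))))
```

When the policy matches the reference on both responses, both implicit rewards are zero and the loss is log 2; widening the margin for the preferred response drives the loss down.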
no code implementations • 6 Dec 2023 • Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu
Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrödinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process.
no code implementations • 18 Oct 2023 • Guande He, Peng Cui, Jianfei Chen, WenBo Hu, Jun Zhu
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in their output answers compared to the corresponding pre-trained LMs.
1 code implementation • 30 May 2023 • Guande He, Jianfei Chen, Jun Zhu
In light of these observations, we evaluate the calibration of several methods that preserve pre-trained features and show that preserving pre-trained features can improve the calibration of fine-tuned language models.
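Calibration here refers to how well a model's stated confidence matches its empirical accuracy; a common summary is the expected calibration error (ECE). A minimal equal-width-binning sketch (the function name and pure-Python setting are assumptions for illustration, not this paper's implementation):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins: the size-weighted average
    of |accuracy - mean confidence| over bins.

    confidences: list of floats in [0, 1]; correct: list of 0/1 labels.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # bin membership: (lo, hi], with the first bin closed at 0
        idx = [i for i, c in enumerate(confidences)
               if (c > lo or (b == 0 and c == lo)) and c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(accuracy - avg_conf)
    return ece
```

An overconfident model shows a large gap in the high-confidence bins: answers asserted at 0.95 confidence that are right far less often than 95% of the time.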