no code implementations • 7 Mar 2025 • Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, Vasant G Honavar
We establish a close theoretical connection between reinforcement learning from human feedback (RLHF) and imitation learning (IL), revealing that RLHF implicitly performs imitation learning on the preference data distribution.
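For context, the standard KL-regularized RLHF objective admits a well-known closed-form optimum, which is the usual starting point for such distribution-matching readings (a textbook derivation, not the paper's full argument):

```latex
% KL-regularized RLHF objective and its closed-form optimum
\max_{\pi}\;
\mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\!\big[ r(x, y) \big]
- \beta\, \mathrm{KL}\!\big( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)
\;\Longrightarrow\;
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\, e^{\, r(x, y)/\beta}
```

Driving the policy toward this reward-tilted reference distribution is a distribution-matching problem, which is the sense in which RLHF can be read as imitation learning.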
1 code implementation • 28 Feb 2025 • Xueyun Tian, Wei Li, Bingbing Xu, Yige Yuan, Yuanzhuo Wang, HuaWei Shen
Experiments show that MIGE excels in both subject-driven generation and instruction-based editing while setting the state of the art on the new task of instruction-based subject-driven editing.
1 code implementation • 2 Feb 2025 • Teng Xiao, Yige Yuan, Zhengyu Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, Vasant G Honavar
Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models.
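As an illustration of the extra hyperparameters in question, a reference-free objective such as SimPO carries both a scaling term β and a target margin γ that typically need per-task tuning; this is a generic sketch of that existing objective, not the method proposed in the paper:

```python
import torch.nn.functional as F

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO-style preference loss over length-normalized log-likelihoods.
    beta (reward scale) and gamma (target margin) are the kind of extra
    hyperparameters that must be tuned for good performance."""
    reward_chosen = beta * logp_chosen / len_chosen      # avg log-prob, scaled
    reward_rejected = beta * logp_rejected / len_rejected
    return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()
```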
1 code implementation • 19 Dec 2024 • Teng Xiao, Yige Yuan, Huaisheng Zhu, Mingxiao Li, Vasant G Honavar
Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy.
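The implicit reward referred to here is, in DPO-style objectives, the scaled log-ratio between the policy and a reference model; a minimal sketch of that standard formulation (the β scale is the usual hyperparameter, not a value from this paper):

```python
import torch.nn.functional as F

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    # DPO-style implicit reward: r(x, y) = beta * log(pi(y|x) / pi_ref(y|x))
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_pol_w, logp_ref_w, logp_pol_l, logp_ref_l, beta=0.1):
    # Bradley-Terry loss over implicit rewards of chosen (w) vs. rejected (l)
    r_w = implicit_reward(logp_pol_w, logp_ref_w, beta)
    r_l = implicit_reward(logp_pol_l, logp_ref_l, beta)
    return -F.logsigmoid(r_w - r_l).mean()
```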
1 code implementation • 20 Nov 2024 • Yige Yuan, Bingbing Xu, Hexiang Tan, Fei Sun, Teng Xiao, Wei Li, HuaWei Shen, Xueqi Cheng
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual accuracy of their responses, enables them to self-evaluate the correctness of their outputs.
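Calibration in this sense is commonly measured with the expected calibration error (ECE); a minimal sketch of the standard binned estimator (the usual metric, not the paper's method):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: average |accuracy - confidence| per confidence bin,
    weighted by the fraction of samples falling in that bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```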
1 code implementation • 14 Oct 2024 • Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, Vasant G Honavar
This paper introduces a novel generalized self-imitation learning ($\textbf{GSIL}$) framework, which effectively and efficiently aligns large language models with offline demonstration data.
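GSIL generalizes beyond plain behavior cloning; as a point of reference, the simplest imitation objective on offline demonstration data is token-level maximum likelihood (a baseline sketch, not the GSIL objective itself):

```python
import torch.nn.functional as F

def imitation_nll(logits, target_ids, pad_id=-100):
    """Behavior-cloning baseline: maximize the log-likelihood of
    demonstration tokens. logits: (batch, seq, vocab); target_ids:
    (batch, seq), with pad positions set to pad_id."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_ids.reshape(-1),
        ignore_index=pad_id,
    )
```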
no code implementations • 12 Oct 2024 • Yige Yuan, Bingbing Xu, Teng Xiao, Liang Hou, Fei Sun, HuaWei Shen, Xueqi Cheng
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
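A canonical TTA recipe is entropy minimization on unlabeled test batches, usually updating only normalization parameters (a Tent-style sketch shown for context, not this paper's method):

```python
def entropy_minimization_step(model, x, optimizer):
    """One Tent-style adaptation step: minimize prediction entropy on a
    test batch; typically only BatchNorm affine parameters are trained."""
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```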
no code implementations • 25 May 2024 • Zixu Wang, Bingbing Xu, Yige Yuan, HuaWei Shen, Xueqi Cheng
Graph contrastive learning (GCL), the dominant paradigm in graph pre-training, has yielded considerable progress.
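GCL typically contrasts node embeddings from two augmented views of the same graph with an InfoNCE objective; a minimal sketch of that generic setup (not this paper's contribution):

```python
import torch
import torch.nn.functional as F

def infonce(z1, z2, tau=0.5):
    """InfoNCE over two views: row i of z1 and z2 embed the same node.
    z1, z2: (num_nodes, dim) embeddings from two graph augmentations."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / tau                       # pairwise similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)          # positives on the diagonal
```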
1 code implementation • CVPR 2024 • Yige Yuan, Bingbing Xu, Liang Hou, Fei Sun, HuaWei Shen, Xueqi Cheng
To address this, we propose a novel energy-based perspective, enhancing the model's perception of target data distributions without requiring access to training data or processes.
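Energy-based views of a classifier commonly use the free energy, the negative log-sum-exp of the logits, as an unnormalized density score on target data; a minimal sketch of that standard quantity (not the paper's full method):

```python
import torch

def free_energy(logits, temperature=1.0):
    # E(x) = -T * logsumexp(f(x) / T); lower energy <=> higher model density
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)
```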
1 code implementation • 25 May 2023 • Yige Yuan, Bingbing Xu, Bo Lin, Liang Hou, Fei Sun, HuaWei Shen, Xueqi Cheng
The generalization of neural networks is a central challenge in machine learning, especially concerning performance under distributions that differ from the training distribution.
no code implementations • 20 Nov 2022 • Yige Yuan, Bingbing Xu, HuaWei Shen, Qi Cao, Keting Cen, Wen Zheng, Xueqi Cheng
Guided by the bound, we design a GCL framework named InfoAdv with enhanced generalization ability, which jointly optimizes the generalization metric and InfoMax to strike the right balance between pretext-task fitting and generalization to downstream tasks.
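A joint objective of this shape can be sketched as InfoMax (InfoNCE between two views) plus a weighted generalization term; the weight λ and the penalty term here are illustrative assumptions, not InfoAdv's exact formulation:

```python
import torch
import torch.nn.functional as F

def joint_gcl_loss(z1, z2, generalization_penalty, lam=0.1, tau=0.5):
    """Sketch of a joint objective: InfoMax via InfoNCE plus a weighted
    generalization metric; lam trades off pretext-task fitting against
    downstream generalization."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    infomax = F.cross_entropy(logits, targets)
    return infomax + lam * generalization_penalty
```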
no code implementations • 16 Nov 2022 • Yang Li, Bingbing Xu, Qi Cao, Yige Yuan, HuaWei Shen
Because previous studies either lack variance analysis or focus only on a particular sampling paradigm, we first propose a unified node sampling variance analysis framework and analyze the core challenge of "circular dependency" in deriving the minimum-variance sampler, i.e., sampling probability depends on node embeddings, while node embeddings cannot be calculated until sampling is finished.
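The circular dependency shows up in the standard importance-sampled neighbor aggregation: the minimum-variance probabilities are proportional to embedding norms, yet those embeddings are only available after sampling. A minimal sketch of the unbiased estimator (generic importance sampling, with hypothetical argument names):

```python
import numpy as np

def sampled_aggregate(neighbor_feats, probs, num_samples):
    """Unbiased importance-sampled estimate of sum_j h_j.
    Minimum variance needs probs proportional to ||h_j||, but the h_j are
    embeddings known only after sampling -- the circular dependency.
    neighbor_feats: (n, d) embeddings h_j; probs: (n,) sampling distribution."""
    idx = np.random.choice(len(probs), size=num_samples, p=probs)
    weights = 1.0 / (num_samples * probs[idx])          # importance weights
    return (weights[:, None] * neighbor_feats[idx]).sum(axis=0)
```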
1 code implementation • NeurIPS 2023 • Liang Hou, Qi Cao, Yige Yuan, Songtao Zhao, Chongyang Ma, Siyuan Pan, Pengfei Wan, Zhongyuan Wang, HuaWei Shen, Xueqi Cheng
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.
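A common remedy for discriminator overfitting under limited data is to apply the same differentiable augmentation to both real and generated images before the discriminator (a DiffAugment-style sketch shown for context, not necessarily this paper's approach):

```python
import torch
import torch.nn.functional as F

def diffaug(x):
    # A simple differentiable augmentation: random brightness jitter
    return x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)

def d_loss(discriminator, real, fake):
    """Non-saturating discriminator loss with the same augmentation applied
    to BOTH real and fake batches, so it never leaks into generated images."""
    logits_real = discriminator(diffaug(real))
    logits_fake = discriminator(diffaug(fake.detach()))
    return (F.softplus(-logits_real).mean() + F.softplus(logits_fake).mean())
```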