1 code implementation • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng
The only assumption needed is to parameterize the outcome reward as the log-likelihood ratio of the policy and reference models, which can be optimized regardless of the specific choice of loss objective.
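As a minimal sketch of this parameterization (names and the `beta` scaling coefficient are illustrative assumptions, not the paper's exact notation): the implicit reward of a response is the scaled difference of its log-probabilities under the policy and reference models, and token-level rewards fall out of the same ratio computed per step.

```python
def implicit_reward(logp_policy, logp_ref, beta=0.05):
    """Outcome reward parameterized as the scaled log-likelihood ratio
    of the policy and reference models. logp_* are the summed token
    log-probabilities of the full response; beta is a hypothetical
    scaling coefficient."""
    return beta * (logp_policy - logp_ref)

def token_rewards(logps_policy, logps_ref, beta=0.05):
    """Per-token rewards from the same parameterization: the scaled
    log-ratio at each generation step."""
    return [beta * (lp - lr) for lp, lr in zip(logps_policy, logps_ref)]
```

Because the reward is defined purely by the two models' log-probabilities, any loss that fits rewards to annotations can be used to train it.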
1 code implementation • 12 Oct 2024 • Huayu Chen, Hang Su, Peize Sun, Jun Zhu
Motivated by language model alignment methods, we propose \textit{Condition Contrastive Alignment} (CCA) to facilitate guidance-free AR visual generation with high performance and analyze its theoretical connection with guided sampling methods.
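For context, a sketch of the guided sampling that CCA aims to make unnecessary (a standard classifier-free-guidance-style logit combination; the function name and guidance scale `s` are illustrative assumptions): guided sampling targets a distribution proportional to p(x|c)·(p(x|c)/p(x))^s, which requires an extra unconditional forward pass per step, whereas CCA fine-tunes the model so that plain sampling from it matches a similar target directly.

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, s=1.5):
    """Guidance-style combination of conditional and unconditional
    logits for AR visual tokens. CCA instead bakes the target into the
    model weights, removing the second forward pass at sampling time."""
    return cond_logits + s * (cond_logits - uncond_logits)

cond = np.array([2.0, 0.0, -1.0])
uncond = np.array([1.0, 0.0, 0.0])
guided = cfg_logits(cond, uncond, s=1.0)  # -> [3., 0., -2.]
```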
no code implementations • 10 Oct 2024 • Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, Jun Zhu
Bimanual manipulation is essential in robotics, yet developing foundation models is extremely challenging due to the inherent complexity of coordinating two robot arms (leading to multi-modal action distributions) and the scarcity of training data.
1 code implementation • 12 Jul 2024 • Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu
Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values.
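The two-stage structure can be sketched as follows (a simplified illustration, not the paper's exact objective: stage 1 is plain behavior cloning, and stage 2 is shown here as an exp(Q/beta)-weighted likelihood in the style of advantage-weighted regression; the function names and `beta` are assumptions).

```python
import numpy as np

def pretrain_loss(policy_logp):
    """Stage 1: pretrain an expressive generative policy on reward-free
    behavior data by maximizing log-likelihood (minimizing NLL)."""
    return -np.mean(policy_logp)

def align_loss(policy_logp, q_values, beta=1.0):
    """Stage 2 (sketch): fine-tune the pretrained policy toward
    task-specific annotations by up-weighting dataset actions with
    higher Q-values, via normalized exp(Q/beta) weights."""
    weights = np.exp(q_values / beta)
    weights /= weights.sum()
    return -np.sum(weights * policy_logp)
```

With uniform Q-values the alignment loss reduces to the behavior-cloning loss, which is consistent with the idea that fine-tuning only reshapes the pretrained policy where the annotations disagree with the data.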
no code implementations • 26 Feb 2024 • Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu
Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator.
3 code implementations • 8 Feb 2024 • Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu
We evaluate our methods in both reward and preference settings with Mistral-8x7B and 7B models.
1 code implementation • 11 Oct 2023 • Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies.
3 code implementations • 25 Apr 2023 • Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate.
1 code implementation • 29 Sep 2022 • Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, Jun Zhu
To address this problem, we adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
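The decoupling can be sketched as sample-then-select (a toy illustration under stated assumptions: the paper's behavior model is an expressive generative model such as a diffusion model, stood in for here by a Gaussian sampler, and `q_value` is a hypothetical action-evaluation model).

```python
import numpy as np

rng = np.random.default_rng(0)

def behavior_model(state, n):
    """Stand-in for the expressive generative behavior model:
    draws n candidate actions conditioned on the state."""
    return rng.normal(loc=state, scale=1.0, size=(n, 1))

def q_value(state, actions):
    """Stand-in action-evaluation model; here it prefers actions
    near state + 1 purely for illustration."""
    return -((actions - (state + 1.0)) ** 2).sum(axis=1)

def select_action(state, n_candidates=32):
    """Decoupled policy: sample candidates from the behavior model,
    then execute the one the evaluation model scores highest."""
    candidates = behavior_model(state, n_candidates)
    return candidates[np.argmax(q_value(state, candidates))]
```

Keeping generation and evaluation separate lets the behavior model stay faithful to the dataset's heterogeneous action distribution while the evaluation model alone encodes task preference.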
no code implementations • 13 Sep 2022 • Huayu Chen, Huanhuan He, Jing Zhu, Shuting Sun, Jianxiu Li, Xuexiao Shao, Junxiang Li, Xiaowei Li, Bin Hu
Cross-dataset emotion recognition, an extremely challenging task in EEG-based affective computing, is influenced by many factors, which makes universal models yield unsatisfactory results.
1 code implementation • 29 Jul 2021 • Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu
In this paper, we present Tianshou, a highly modularized Python library for deep reinforcement learning (DRL) that uses PyTorch as its backend.
no code implementations • 23 Feb 2020 • Shuting Sun, Jianxiu Li, Huayu Chen, Tao Gong, Xiaowei Li, Bin Hu
Results: the functional connectivity feature PLI outperforms both the linear and nonlinear features.