no code implementations • 11 Jan 2024 • Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization.
no code implementations • 30 Dec 2023 • Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs).
no code implementations • 21 Nov 2022 • Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan
In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications.
no code implementations • 24 Aug 2022 • Zijian Gao, Yiying Li, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Curiosity is aroused when the memorized information cannot handle the current state; the information gap between the dual learners is formulated as an intrinsic reward for the agents, and the new state information is then consolidated into the dynamic memory.
no code implementations • 24 Aug 2022 • Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards.
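The nuclear-norm idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each saved snapshot produces a fixed-size prediction vector for the current state, stacks them into a matrix, and uses the nuclear norm (sum of singular values) as the intrinsic reward. Snapshots that agree yield a near-rank-1 matrix and a small norm; disagreeing snapshots yield a larger norm.

```python
import numpy as np

def nuclear_norm_reward(snapshot_preds):
    """Intrinsic reward as the nuclear norm of stacked snapshot predictions.

    snapshot_preds: list of 1-D arrays, one per saved model snapshot,
    each being that snapshot's prediction for the current state
    (a hypothetical interface for illustration). A larger nuclear norm
    indicates higher temporal inconsistency between snapshots, i.e. the
    prediction model has not yet settled on this state.
    """
    P = np.stack(snapshot_preds)                # shape: (num_snapshots, pred_dim)
    return float(np.linalg.norm(P, ord="nuc"))  # sum of singular values

# Agreeing snapshots -> rank-1 stack, small norm; disagreeing -> larger norm.
r_same = nuclear_norm_reward([np.ones(4), np.ones(4)])
r_diff = nuclear_norm_reward([np.ones(4), np.array([1.0, -1.0, 1.0, -1.0])])
```

In the agreeing case the stacked matrix is rank 1, so the nuclear norm equals the Frobenius norm; orthogonal snapshot predictions double the norm, which is why the reward separates settled from unsettled states.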
no code implementations • 12 Jul 2022 • Ming Feng, Kele Xu, Nanhui Wu, Weiquan Huang, Yan Bai, Changjian Wang, Huaimin Wang
Leveraging the Vision Transformer as the backbone for multiple branches, our framework can jointly model classification, estimate the uncertainty at each microscope magnification, and integrate the evidence from the different magnifications.
no code implementations • 21 May 2022 • Chao Chen, Zijian Gao, Kele Xu, Sen Yang, Yiying Li, Bo Ding, Dawei Feng, Huaimin Wang
To handle the sparsity of extrinsic rewards in reinforcement learning, researchers have proposed intrinsic rewards, which enable the agent to learn skills that might come in handy for pursuing rewards in the future, for example by encouraging the agent to visit novel states.
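A generic example of such a novelty-seeking intrinsic reward is a count-based bonus. The sketch below is illustrative only (it is not the method of the paper above): it assigns each state a bonus of `beta / sqrt(N(s))`, where `N(s)` is the visit count and `beta` is a hypothetical scaling parameter, so rarely visited states yield larger rewards.

```python
import math
from collections import defaultdict

class CountBonus:
    """Minimal count-based novelty bonus: reward = beta / sqrt(N(s)).

    `beta` is a hypothetical scale hyperparameter; states are assumed
    hashable (e.g. discretized observations).
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state):
        # Increment the visit count, then return a bonus that decays
        # as the state becomes familiar.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

bonus = CountBonus(beta=1.0)
first = bonus("s0")   # novel state -> full bonus
second = bonus("s0")  # revisit -> smaller bonus
```

Added to the sparse extrinsic reward, this term pushes the agent toward under-explored parts of the state space.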
1 code implementation • 28 Apr 2022 • Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng
We present an approach to learn voice-face representations from the talking face videos, without any identity labels.
1 code implementation • 13 Jan 2022 • Da Li, Sen Yang, Kele Xu, Ming Yi, Yukai He, Huaimin Wang
To demonstrate the effectiveness of our method, we conduct extensive experiments on three widely-used datasets, WN18RR, FB15k-237, and UMLS.
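Link prediction on WN18RR, FB15k-237, and UMLS is conventionally scored with MRR and Hits@k computed from the rank of the true entity among all candidates. A minimal sketch of that evaluation (a generic scorer, not the paper's code; the `ranks` interface is assumed for illustration):

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR and Hits@k from 1-based ranks of the correct entity.

    ranks: iterable of 1-based positions at which the true entity was
    ranked among all candidate entities, one per test triple.
    """
    ranks = list(ranks)
    # Mean reciprocal rank: average of 1/rank over all test triples.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    # Hits@k: fraction of triples whose true entity ranks within top k.
    hits = {f"hits@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return {"mrr": mrr, **hits}

metrics = link_prediction_metrics([1, 2, 10])
```

Perfect ranking gives MRR = Hits@1 = 1.0, which is why a #1 result on a small benchmark like UMLS typically corresponds to near-saturated metrics.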
Ranked #1 on Link Prediction on UMLS
no code implementations • 25 May 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we present KnowSR, an adaptation method applicable to the majority of multi-agent reinforcement learning (MARL) algorithms, which takes advantage of the differences in learning between agents.
Knowledge Distillation • Multi-agent Reinforcement Learning • +2
no code implementations • 27 Mar 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we propose a method, named "KnowRU" for knowledge reusing which can be easily deployed in the majority of the multi-agent reinforcement learning algorithms without complicated hand-coded design.
Knowledge Distillation • Multi-agent Reinforcement Learning • +2
no code implementations • 27 Jan 2021 • Yiying Li, Wei Zhou, Huaimin Wang, Haibo Mi, Timothy M. Hospedales
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
no code implementations • 16 Jul 2020 • Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng
Yet, it is labor-intensive to accurately annotate large amounts of audio data, and the dataset may contain noisy labels in practical settings.
1 code implementation • NeurIPS 2020 • Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks.
no code implementations • 5 Oct 2019 • Mingyang Geng, Kele Xu, Yiying Li, Shuqi Liu, Bo Ding, Huaimin Wang
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents.
Multi-agent Reinforcement Learning • reinforcement-learning • +1
1 code implementation • 19 Feb 2019 • Chaojie Zhao, Peng Zhang, Jian Zhu, Chengrui Wu, Huaimin Wang, Kele Xu
A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements.
no code implementations • 22 Jan 2019 • Mingyang Geng, Suning Shang, Bo Ding, Huaimin Wang, Pengfei Zhang, Lei Zhang
Furthermore, we successfully exploit our unsupervised learning framework to assist the traditional ORB-SLAM system when the initialization module of the ORB-SLAM method cannot match enough features.
no code implementations • 12 Nov 2018 • Mingyang Geng, Kele Xu, Bo Ding, Huaimin Wang, Lei Zhang
AutoAugment searches for augmentation policies in a discrete search space, which may lead to a sub-optimal solution.
2 code implementations • 30 Oct 2018 • Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang
Audio tagging is challenging due to the limited size of data and noisy labels.
no code implementations • 16 Oct 2018 • Kele Xu, Haibo Mi, Dawei Feng, Huaimin Wang, Chuan Chen, Zibin Zheng, Xu Lan
Valuable training data is often owned by independent organizations and located in multiple data centers.