no code implementations • 30 Dec 2023 • Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs).
no code implementations • 5 Dec 2023 • Bo Ding, Zhenfeng Fan, Shuang Yang, Shihong Xia
We incorporate a personalized prior from a monocular video and a morphable prior from the 3D face morphable space to generate personalized details under novel controllable parameters.
no code implementations • 30 May 2023 • Junpeng Wang, Mengke Ge, Bo Ding, Qi Xu, Song Chen, Yi Kang
As one of the feasible processing-in-memory (PIM) architectures, the 3D-stacked-DRAM-based PIM (DRAM-PIM) architecture enables large-capacity memory and low-cost memory access, making it a promising solution for DNN accelerators with better performance and energy efficiency.
no code implementations • 11 Dec 2022 • Bo Ding, Jinglei Huang, Junpeng Wang, Qi Xu, Song Chen, Yi Kang
To better solve the problems in the automation process of FPGA-PDRS and narrow the gap between algorithm and application, in this paper we propose a complete workflow comprising three parts: pre-processing, which generates the list of candidate shapes for task modules according to their resource requirements; an exploration process, which searches for a solution to task-module partitioning, scheduling, and floorplanning; and post-optimization, which improves the success rate of the floorplan.
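The three-stage workflow above can be sketched with toy stand-ins. The function names and the shape-enumeration/greedy-selection heuristics below are illustrative assumptions for clarity, not the paper's actual algorithms.

```python
# Hypothetical sketch of the three-stage FPGA-PDRS workflow.
# All heuristics here are illustrative stand-ins, not the paper's method.

def candidate_shapes(resource_req: int, max_dim: int = 8):
    """Pre-processing: enumerate rectangular (width, height) shapes
    whose area covers a task module's resource requirement."""
    shapes = []
    for w in range(1, max_dim + 1):
        for h in range(w, max_dim + 1):
            if w * h >= resource_req:          # shape is large enough
                shapes.append((w, h))
    return shapes

def explore(tasks):
    """Exploration: greedily pick the smallest-area candidate shape per
    task (stand-in for the partitioning/scheduling/floorplanning search)."""
    return {t: min(candidate_shapes(req), key=lambda s: s[0] * s[1])
            for t, req in tasks.items()}

def post_optimize(plan):
    """Post-optimization: report total placed area as a toy proxy for
    improving the floorplan success rate."""
    return sum(w * h for w, h in plan.values())

plan = explore({"A": 6, "B": 10})
print(plan, post_optimize(plan))
```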
no code implementations • 24 Aug 2022 • Zijian Gao, Yiying Li, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Curiosity is aroused when the memorized information cannot deal with the current state; the information gap between the dual learners is formulated as the intrinsic reward for agents, and the state information is then consolidated into the dynamic memory.
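The dual-learner information gap can be illustrated with a minimal sketch: two independently initialized predictors map the same state, and their disagreement serves as the intrinsic reward. This is a generic toy stand-in, assuming linear predictors, not the paper's model.

```python
# Toy sketch of a dual-learner curiosity signal: the disagreement
# ("information gap") between two randomly initialized linear predictors
# acts as the intrinsic reward. Illustrative assumption, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # learner 1 (random linear features)
W2 = rng.normal(size=(4, 8))   # learner 2

def intrinsic_reward(state: np.ndarray) -> float:
    p1, p2 = state @ W1, state @ W2
    return float(np.mean((p1 - p2) ** 2))   # information gap between learners

state = rng.normal(size=4)
print(intrinsic_reward(state))   # disagreement -> curiosity signal
```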
no code implementations • 24 Aug 2022 • Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards.
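The nuclear-norm reward described above can be sketched directly: stack the predictions of K saved snapshots for one state into a matrix and take its nuclear norm as the temporal-inconsistency score. Shapes and data below are illustrative.

```python
# Sketch of the nuclear-norm intrinsic reward: rows are the predictions of
# K model snapshots for the same state; the nuclear norm (sum of singular
# values) of that matrix measures temporal inconsistency across snapshots.
import numpy as np

def nuclear_norm_reward(snapshot_preds: np.ndarray) -> float:
    """snapshot_preds: (K, d) matrix, row k = prediction of snapshot k."""
    return float(np.linalg.norm(snapshot_preds, ord="nuc"))

# Identical predictions across snapshots -> rank-1 matrix (low norm for
# its magnitude); disagreeing predictions -> higher effective rank.
consistent = np.tile(np.ones(5), (4, 1))
inconsistent = np.random.default_rng(0).normal(size=(4, 5))
print(nuclear_norm_reward(consistent), nuclear_norm_reward(inconsistent))
```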
no code implementations • 21 May 2022 • Chao Chen, Zijian Gao, Kele Xu, Sen yang, Yiying Li, Bo Ding, Dawei Feng, Huaimin Wang
To handle the sparsity of extrinsic rewards in reinforcement learning, researchers have proposed intrinsic rewards, which enable the agent to learn skills that might come in handy for pursuing rewards in the future, such as encouraging the agent to visit novel states.
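A common instantiation of such a novelty-seeking intrinsic reward is a count-based bonus; the sketch below is a generic illustration of the idea, not this paper's specific method.

```python
# Count-based novelty bonus: rarely visited states yield a larger intrinsic
# reward, encouraging exploration. Generic illustration only.
from collections import Counter

visit_counts = Counter()

def intrinsic_reward(state) -> float:
    visit_counts[state] += 1
    return 1.0 / visit_counts[state] ** 0.5   # r_int = 1 / sqrt(N(s))

print(intrinsic_reward("s0"))  # first visit  -> 1.0
print(intrinsic_reward("s0"))  # second visit -> ~0.707
print(intrinsic_reward("s1"))  # novel state  -> 1.0
```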
no code implementations • 25 May 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we present KnowSR, an adaptation method for the majority of multi-agent reinforcement learning (MARL) algorithms that takes advantage of the differences in learning between agents.
Knowledge Distillation, Multi-agent Reinforcement Learning, +2
no code implementations • 27 Mar 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we propose a method named "KnowRU" for knowledge reuse that can be easily deployed in the majority of multi-agent reinforcement learning algorithms without complicated hand-coded design.
Knowledge Distillation, Multi-agent Reinforcement Learning, +2
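Both KnowRU and KnowSR are tagged with knowledge distillation; the core mechanism can be sketched as one agent (student) imitating another agent's (teacher's) action distribution via a KL divergence loss. This is a hedged illustration of policy distillation in general, not the authors' exact objective.

```python
# Hedged sketch of knowledge reuse via policy distillation between agents:
# a student minimizes the KL divergence from a teacher's action
# distribution. Illustrative of the general idea, not the papers' loss.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def kl(p, q):
    """KL(p || q) for strictly positive distributions p, q."""
    return float(np.sum(p * np.log(p / q)))

teacher = softmax(np.array([2.0, 0.5, 0.1]))   # experienced agent's policy
student = softmax(np.array([0.3, 0.2, 0.1]))   # learner's current policy
distill_loss = kl(teacher, student)            # minimized during training
print(distill_loss)
```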
no code implementations • 5 Oct 2019 • Mingyang Geng, Kele Xu, Yiying Li, Shuqi Liu, Bo Ding, Huaimin Wang
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents.
Multi-agent Reinforcement Learning, reinforcement-learning, +1
no code implementations • 22 Jan 2019 • Mingyang Geng, Suning Shang, Bo Ding, Huaimin Wang, Pengfei Zhang, Lei Zhang
Furthermore, we successfully exploit our unsupervised learning framework to assist the traditional ORB-SLAM system when the initialization module of the ORB-SLAM method cannot match enough features.
no code implementations • 12 Nov 2018 • Mingyang Geng, Kele Xu, Bo Ding, Huaimin Wang, Lei Zhang
AutoAugment searches for augmentation policies in a discrete search space, which may lead to a sub-optimal solution.
2 code implementations • 30 Oct 2018 • Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang
Audio tagging is challenging due to the limited size of data and noisy labels.