no code implementations • 11 Jan 2024 • Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization.
no code implementations • 30 Dec 2023 • Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs).
no code implementations • 21 Nov 2022 • Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan
In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications.
no code implementations • 24 Aug 2022 • Zijian Gao, Yiying Li, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Curiosity is aroused when the memorized information cannot handle the current state; the information gap between the dual learners is formulated as an intrinsic reward for the agents, and the new state information is then consolidated into the dynamic memory.
no code implementations • 24 Aug 2022 • Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as intrinsic rewards.
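The nuclear-norm idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each saved snapshot produces a fixed-size prediction vector for the current state, stacks them into a matrix, and uses the nuclear norm (sum of singular values) as the intrinsic reward. Snapshots that agree yield a near-rank-1 matrix and a small norm; disagreeing snapshots yield a larger norm.

```python
import numpy as np

def nuclear_norm_reward(snapshot_preds):
    """Intrinsic reward as the nuclear norm of stacked snapshot predictions.

    snapshot_preds: list of 1-D arrays, one per saved model snapshot,
    each being that snapshot's prediction for the current state
    (a hypothetical interface for illustration). A larger nuclear norm
    indicates higher temporal inconsistency between snapshots, i.e. the
    prediction model has not yet settled on this state.
    """
    P = np.stack(snapshot_preds)                # shape: (num_snapshots, pred_dim)
    return float(np.linalg.norm(P, ord="nuc"))  # sum of singular values

# Agreeing snapshots -> rank-1 stack, small norm; disagreeing -> larger norm.
r_same = nuclear_norm_reward([np.ones(4), np.ones(4)])
r_diff = nuclear_norm_reward([np.ones(4), np.array([1.0, -1.0, 1.0, -1.0])])
```

In the agreeing case the stacked matrix is rank 1, so the nuclear norm equals the Frobenius norm; orthogonal snapshot predictions double the norm, which is why the reward separates settled from unsettled states.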
no code implementations • 12 Jul 2022 • Ming Feng, Kele Xu, Nanhui Wu, Weiquan Huang, Yan Bai, Changjian Wang, Huaimin Wang
Leveraging the Vision Transformer as the backbone for multiple branches, our framework can jointly model classification, estimate the uncertainty at each microscope magnification, and integrate the evidence from the different magnifications.
no code implementations • 21 May 2022 • Chao Chen, Zijian Gao, Kele Xu, Sen Yang, Yiying Li, Bo Ding, Dawei Feng, Huaimin Wang
To handle the sparsity of extrinsic rewards in reinforcement learning, researchers have proposed intrinsic rewards, which enable the agent to learn skills that might come in handy for pursuing rewards in the future, for example by encouraging the agent to visit novel states.
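A generic example of such a novelty-seeking intrinsic reward is a count-based bonus. The sketch below is illustrative only (it is not the method of the paper above): it assigns each state a bonus of `beta / sqrt(N(s))`, where `N(s)` is the visit count and `beta` is a hypothetical scaling parameter, so rarely visited states yield larger rewards.

```python
import math
from collections import defaultdict

class CountBonus:
    """Minimal count-based novelty bonus: reward = beta / sqrt(N(s)).

    `beta` is a hypothetical scale hyperparameter; states are assumed
    hashable (e.g. discretized observations).
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state):
        # Increment the visit count, then return a bonus that decays
        # as the state becomes familiar.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

bonus = CountBonus(beta=1.0)
first = bonus("s0")   # novel state -> full bonus
second = bonus("s0")  # revisit -> smaller bonus
```

Added to the sparse extrinsic reward, this term pushes the agent toward under-explored parts of the state space.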
1 code implementation • 28 Apr 2022 • Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng
We present an approach to learn voice-face representations from the talking face videos, without any identity labels.
1 code implementation • 13 Jan 2022 • Da Li, Sen Yang, Kele Xu, Ming Yi, Yukai He, Huaimin Wang
To demonstrate the effectiveness of our method, we conduct extensive experiments on three widely-used datasets, WN18RR, FB15k-237, and UMLS.
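Link prediction on WN18RR, FB15k-237, and UMLS is conventionally scored with MRR and Hits@k computed from the rank of the true entity among all candidates. A minimal sketch of that evaluation (a generic scorer, not the paper's code; the `ranks` interface is assumed for illustration):

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR and Hits@k from 1-based ranks of the correct entity.

    ranks: iterable of 1-based positions at which the true entity was
    ranked among all candidate entities, one per test triple.
    """
    ranks = list(ranks)
    # Mean reciprocal rank: average of 1/rank over all test triples.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    # Hits@k: fraction of triples whose true entity ranks within top k.
    hits = {f"hits@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return {"mrr": mrr, **hits}

metrics = link_prediction_metrics([1, 2, 10])
```

Perfect ranking gives MRR = Hits@1 = 1.0, which is why a #1 result on a small benchmark like UMLS typically corresponds to near-saturated metrics.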
Ranked #1 on Link Prediction on UMLS
no code implementations • 25 May 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we present KnowSR, an adaptation method applicable to the majority of multi-agent reinforcement learning (MARL) algorithms, which takes advantage of the differences in learning between agents.
Knowledge Distillation • Multi-agent Reinforcement Learning • +2
no code implementations • 27 Mar 2021 • Zijian Gao, Kele Xu, Bo Ding, Huaimin Wang, Yiying Li, Hongda Jia
In this paper, we propose a method, named "KnowRU" for knowledge reusing which can be easily deployed in the majority of the multi-agent reinforcement learning algorithms without complicated hand-coded design.
Knowledge Distillation • Multi-agent Reinforcement Learning • +2
no code implementations • 27 Jan 2021 • Yiying Li, Wei Zhou, Huaimin Wang, Haibo Mi, Timothy M. Hospedales
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
no code implementations • 16 Jul 2020 • Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng
Yet, it is labor-intensive to accurately annotate large amounts of audio data, and the dataset may contain noisy labels in practical settings.
1 code implementation • NeurIPS 2020 • Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks.
no code implementations • 5 Oct 2019 • Mingyang Geng, Kele Xu, Yiying Li, Shuqi Liu, Bo Ding, Huaimin Wang
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents.
Multi-agent Reinforcement Learning • reinforcement-learning • +1
1 code implementation • 19 Feb 2019 • Chaojie Zhao, Peng Zhang, Jian Zhu, Chengrui Wu, Huaimin Wang, Kele Xu
A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements.
no code implementations • 22 Jan 2019 • Mingyang Geng, Suning Shang, Bo Ding, Huaimin Wang, Pengfei Zhang, Lei Zhang
Furthermore, we successfully exploit our unsupervised learning framework to assist the traditional ORB-SLAM system when the initialization module of the ORB-SLAM method cannot match enough features.
no code implementations • 12 Nov 2018 • Mingyang Geng, Kele Xu, Bo Ding, Huaimin Wang, Lei Zhang
AutoAugment searches for augmentation policies in a discrete search space, which may lead to a sub-optimal solution.
2 code implementations • 30 Oct 2018 • Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang
Audio tagging is challenging due to the limited size of data and noisy labels.
no code implementations • 16 Oct 2018 • Kele Xu, Haibo Mi, Dawei Feng, Huaimin Wang, Chuan Chen, Zibin Zheng, Xu Lan
Valuable training data is often owned by independent organizations and located in multiple data centers.