Search Results for author: Chao Yu

Found 48 papers, 17 papers with code

ESCM$^2$: Entire Space Counterfactual Multi-Task Model for Post-Click Conversion Rate Estimation

1 code implementation • 3 Apr 2022 • Hao Wang, Tai-Wei Chang, Tianqiao Liu, Jianmin Huang, Zhichao Chen, Chao Yu, Ruopeng Li, Wei Chu

In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB), where the estimated CVR of ESMM is inherently higher than the ground truth; (2) Potential Independence Priority (PIP) for CTCVR estimation, where there is a risk that the ESMM overlooks the causality from click to conversion.

counterfactual Recommendation Systems +1

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

15 code implementations • 2 Mar 2021 • Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems.

Multi-agent Reinforcement Learning reinforcement-learning +3

DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments

2 code implementations • 22 Sep 2018 • Chao Yu, Zuxin Liu, Xinjun Liu, Fugui Xie, Yi Yang, Qi Wei, Qiao Fei

It is one of the state-of-the-art SLAM systems in high-dynamic environments.

Robotics

OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

1 code implementation • 22 Sep 2023 • Botian Xu, Feng Gao, Chao Yu, Ruize Zhang, Yi Wu, Yu Wang

In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim.

reinforcement-learning

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

2 code implementations • ICLR 2021 • Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Du, Yu Wang, Yi Wu

We propose a simple, general, and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games.

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

1 code implementation • 23 Dec 2023 • Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination.

Code Generation

Coordinated Proximal Policy Optimization

1 code implementation • NeurIPS 2021 • Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.

Starcraft Starcraft II

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

1 code implementation • 1 Jun 2023 • Qian Lin, Bo Tang, Zifan Wu, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang

Aiming at promoting the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years.

reinforcement-learning Reinforcement Learning (RL)

Automatic Truss Design with Reinforcement Learning

1 code implementation • 27 Jun 2023 • Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu

Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.

Combinatorial Optimization Layout Design +3

Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning

1 code implementation • 20 Jan 2023 • Zifan Wu, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo

In Model-based Reinforcement Learning (MBRL), model learning is critical since an inaccurate model can bias policy learning via generating misleading samples.

Decision Making Model-based Reinforcement Learning

Off-Policy Primal-Dual Safe Reinforcement Learning

1 code implementation • 26 Jan 2024 • Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang

Results on benchmark tasks show that our method not only achieves asymptotic performance comparable to state-of-the-art on-policy methods while using far fewer samples, but also significantly reduces constraint violation during training.

reinforcement-learning Safe Reinforcement Learning

Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation

1 code implementation • 8 Feb 2023 • Xinyi Yang, Shiyu Huang, Yiwen Sun, Yuxiang Yang, Chao Yu, Wei-Wei Tu, Huazhong Yang, Yu Wang

Goal-conditioned hierarchical reinforcement learning (HRL) provides a promising direction to tackle this challenge by introducing a hierarchical structure to decompose the search space, where the low-level policy predicts primitive actions under the guidance of the goals derived from the high-level policy.

Hierarchical Reinforcement Learning Multi-agent Reinforcement Learning +2

Policy-regularized Offline Multi-objective Reinforcement Learning

1 code implementation • 4 Jan 2024 • Qian Lin, Chao Yu, Zongkai Liu, Zifan Wu

In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL.

Multi-Objective Reinforcement Learning Offline RL +1

The Price of Governance: A Middle Ground Solution to Coordination in Organizational Control

no code implementations • 9 Nov 2018 • Chao Yu

We then propose a hierarchical supervision framework to explicitly model the PoG, and define step by step how to realize the core principle of the framework and compute the optimal PoG for a control problem.

Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning

no code implementations • 10 Nov 2018 • Chao Yu, Tianpei Yang, Wenxuan Zhu, Dongxu Wang, Guangliang Li

Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning.

reinforcement-learning Reinforcement Learning (RL)

Reinforcement Learning in Healthcare: A Survey

no code implementations • 22 Aug 2019 • Chao Yu, Jiming Liu, Shamim Nemati

As a subfield of machine learning, reinforcement learning (RL) aims at empowering one's capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback.

Decision Making Medical Diagnosis +3

Symmetrical Gaussian Error Linear Units (SGELUs)

no code implementations • 10 Nov 2019 • Chao Yu, Zhiguo Su

In this paper, a novel neural network activation function, called Symmetrical Gaussian Error Linear Unit (SGELU), is proposed to obtain high performance.

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms

no code implementations • 1 Jan 2021 • Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu

We benchmark commonly used multi-agent deep reinforcement learning (MARL) algorithms on a variety of cooperative multi-agent games.

Benchmarking reinforcement-learning +2

Deep Learning-based Modulation Detection for NOMA Systems

no code implementations • 24 May 2020 • Wenwu Xie, Jian Xiao, Jinxia Yang, Xin Peng, Chao Yu, Peng Zhu

Since the signal with strong power should be demodulated first for successive interference cancellation (SIC) demodulation in non-orthogonal multiple access (NOMA) systems, the base station (BS) should inform the near user terminal (UT), which has allocated higher power, of the modulation mode of the far user terminal.

Denoising

A Joint Training Dual-MRC Framework for Aspect Based Sentiment Analysis

no code implementations • 4 Jan 2021 • Yue Mao, Yi Shen, Chao Yu, Longjun Cai

Some recent work focused on solving a combination of two subtasks, e.g., extracting aspect terms along with sentiment polarities or extracting the aspect and opinion terms pairwise.

Aspect-Based Sentiment Analysis Aspect-oriented Opinion Extraction +6

Single-photon imaging over 200 km

no code implementations • 10 Mar 2021 • Zheng-Ping Li, Jun-Tian Ye, Xin Huang, Peng-Yu Jiang, Yuan Cao, Yu Hong, Chao Yu, Jun Zhang, Qiang Zhang, Cheng-Zhi Peng, Feihu Xu, Jian-Wei Pan

Long-range active imaging has widespread applications in remote sensing and target recognition.

Reinforcement Learning with Expert Trajectory For Quantitative Trading

no code implementations • 9 May 2021 • Sihang Chen, Weiqi Luo, Chao Yu

In recent years, quantitative investment methods combined with artificial intelligence have attracted more and more attention from investors and researchers.

Q-Learning reinforcement-learning +1

Learning Efficient Multi-Agent Cooperative Visual Exploration

no code implementations • 12 Oct 2021 • Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu

In this paper, we extend the state-of-the-art single-agent visual navigation method, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based planning module, Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attentions.

Reinforcement Learning (RL) Visual Navigation

Lifelong Reinforcement Learning with Temporal Logic Formulas and Reward Machines

no code implementations • 18 Nov 2021 • Xuejing Zheng, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo

In this paper, we propose Lifelong reinforcement learning with Sequential linear temporal logic formulas and Reward Machines (LSRM), which enables an agent to leverage previously learned knowledge to speed up learning of logically specified tasks.

reinforcement-learning Reinforcement Learning (RL) +1

Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward

no code implementations • 12 Dec 2021 • Weilin Liu, Ye Mu, Chao Yu, Xuefei Ning, Zhong Cao, Yi Wu, Shuang Liang, Huazhong Yang, Yu Wang

These scenarios indeed correspond to the vulnerabilities of the under-test driving policies, thus are meaningful for their further improvements.

Autonomous Driving Multi-agent Reinforcement Learning

Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning

no code implementations • 18 Dec 2021 • Hankz Hankui Zhuo, Shuting Deng, Mu Jin, Zhihao Ma, Kebing Jin, Chen Chen, Chao Yu

Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues, i.e., data efficiency, lack of interpretability, and transferability.

Montezuma's Revenge reinforcement-learning +1

Passive Motion Detection via mmWave Communication System

no code implementations • 28 Mar 2022 • Jie Li, Chao Yu, Yan Luo, Yifei Sun, Rui Wang

Relying on the passive sensing system, a dataset of received signals, where three types of hand gestures are sensed, is collected by using Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) paths as the reference channel respectively.

Hand Gesture Recognition Hand-Gesture Recognition +1

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

no code implementations • 2 Apr 2022 • Chao Yu, Yi Shen, Yue Mao, Longjun Cai

Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy.

Multi-Label Classification text-classification +1

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

no code implementations • 15 Jun 2022 • Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu

Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.

Multi-agent Reinforcement Learning reinforcement-learning +2

Causal Deep Reinforcement Learning Using Observational Data

no code implementations • 28 Nov 2022 • Wenxuan Zhu, Chao Yu, Qiang Zhang

Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world.

Autonomous Driving Causal Inference +3

mmAlert: mmWave Link Blockage Prediction via Passive Sensing

no code implementations • 22 Feb 2023 • Chao Yu, Yifei Sun, Yan Luo, Rui Wang

It is demonstrated via experiments that the mmAlert system can always detect the motions of the walking person close to the LoS path, and predict 90% of the LoS blockage with sensing time of 1.4 seconds.

Reinforcement Learning with Knowledge Representation and Reasoning: A Brief Survey

no code implementations • 24 Apr 2023 • Chao Yu, Xuejing Zheng, Hankz Hankui Zhuo, Hai Wan, Weilin Luo

Reinforcement Learning (RL) has achieved tremendous development in recent years, but still faces significant obstacles in addressing complex real-life problems due to the issues of poor system generalization, low sample efficiency, as well as safety and interpretability concerns.

reinforcement-learning Reinforcement Learning (RL)

SIFTER: A Task-specific Alignment Strategy for Enhancing Sentence Embeddings

no code implementations • 21 Jun 2023 • Chao Yu, Wenhao Zhu, Chaoming Liu, XiaoYu Zhang, Qiuhong zhai

This indicates that different downstream tasks have different levels of sensitivity to sentence components.

Sentence Sentence Embeddings +1

AlphaZero Gomoku

no code implementations • 4 Sep 2023 • Wen Liang, Chao Yu, Brian Whiteaker, Inyoung Huh, Hua Shao, Youzhi Liang

In the past few years, AlphaZero's exceptional capability in mastering intricate board games has garnered considerable interest.

Game of Go

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

no code implementations • 5 Oct 2023 • Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu

Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t.

Multi-agent Reinforcement Learning

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

no code implementations • 29 Oct 2023 • Zelai Xu, Chao Yu, Fei Fang, Yu Wang, Yi Wu

To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.

Decision Making Reinforcement Learning (RL)

Active Neural Topological Mapping for Multi-Agent Exploration

no code implementations • 1 Nov 2023 • Xinyi Yang, Yuxiang Yang, Chao Yu, Jiayu Chen, Jingchen Yu, Haibing Ren, Huazhong Yang, Yu Wang

In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve exploration efficiency and generalization for multi-agent exploration tasks.

Passive Handwriting Tracking via Weak mmWave Communication Signals

no code implementations • 3 Nov 2023 • Chao Yu, Yan Luo, Renqi Chen, Rui Wang

In this letter, a cooperative sensing framework based on millimeter wave (mmWave) communication systems is proposed to detect tiny motions with a millimeter-level resolution.

MASP: Scalable GNN-based Planning for Multi-Agent Navigation

no code implementations • 5 Dec 2023 • Xinyi Yang, Xinting Yang, Chao Yu, Jiayu Chen, Huazhong Yang, Yu Wang

Besides, to enhance generalization capabilities in scenarios with unseen team sizes, we divide agents into multiple groups, each with a previously trained number of agents.

Reinforcement Learning (RL) Zero-shot Generalization

TaskFlex Solver for Multi-Agent Pursuit via Automatic Curriculum Learning

no code implementations • 19 Dec 2023 • Jiayu Chen, Guosheng Li, Chao Yu, Xinyi Yang, Botian Xu, Huazhong Yang, Yu Wang

In this work, we combine RL and curriculum learning to introduce a flexible solver for multi-agent pursuit problems, named TaskFlex Solver (TFS), which is capable of solving multi-agent pursuit problems with diverse and dynamically changing task conditions in both 2-dimensional and 3-dimensional scenarios.

Reinforcement Learning (RL)

Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines

no code implementations • 8 Mar 2024 • Xuejing Zheng, Chao Yu

In this paper, we study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs) to specify the reward functions, such that prior knowledge of high-level events in a task can be leveraged to improve learning efficiency.

Multi-agent Reinforcement Learning reinforcement-learning

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation
