In the proposed CASTER simulator, however, the training dataset can be simulated from existing videos.
In this letter, a cooperative sensing framework based on millimeter wave (mmWave) communication systems is proposed to detect tiny motions with a millimeter-level resolution.
In this paper, we propose Multi-Agent Neural Topological Mapping (MANTM) to improve exploration efficiency and generalization for multi-agent exploration tasks.
Agents built with large language models (LLMs) have recently achieved great advancements.
In this work, we present a novel subgame curriculum learning framework for zero-sum games.
Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t.
In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim.
Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.
This indicates that different downstream tasks have different levels of sensitivity to sentence components.
Exploring sparse reward multi-agent reinforcement learning (MARL) environments with traps in a collaborative manner is a complex task.
Aiming to promote the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years.
Recently, Multi-Agent Reinforcement Learning (MARL) has been applied to a large number of scenarios and has shown promising performance.
Reinforcement Learning (RL) has achieved tremendous development in recent years, but still faces significant obstacles in addressing complex real-life problems due to poor system generalization, low sample efficiency, and safety and interpretability concerns.
It is demonstrated via experiments that the mmAlert system can always detect the motions of a walking person close to the LoS path, and predict 90% of the LoS blockages with a sensing time of 1.4 seconds.
Goal-conditioned hierarchical reinforcement learning (HRL) provides a promising direction to tackle this challenge by introducing a hierarchical structure to decompose the search space, where the low-level policy predicts primitive actions in the guidance of the goals derived from the high-level policy.
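The two-level structure described above can be sketched in a toy, hypothetical form: a high-level policy emits a subgoal every k steps, and a low-level policy maps the current state and goal to primitive actions. Both policies here are illustrative stand-ins for learned networks, not any specific method's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_level_policy(state):
    """Toy high-level policy: proposes a nearby subgoal.
    (Hypothetical stand-in for a learned goal generator.)"""
    return state + rng.integers(-2, 3, size=state.shape)

def low_level_policy(state, goal):
    """Toy low-level policy: greedy primitive action toward the goal."""
    return np.sign(goal - state)  # primitive action in {-1, 0, 1} per dimension

state = np.zeros(2)
k = 5  # goal horizon: the high level acts once every k primitive steps
for t in range(20):
    if t % k == 0:
        goal = high_level_policy(state)       # high level decomposes the search space
    action = low_level_policy(state, goal)    # low level acts under the goal's guidance
    state = state + action                    # deterministic toy environment transition
```

The decomposition is the key point: the high level searches over goals at a coarse timescale, while the low level only needs to solve short-horizon goal-reaching problems.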
A crucial limitation of this framework is that every policy in the pool is optimized w.r.t.
In Model-based Reinforcement Learning (MBRL), model learning is critical since an inaccurate model can bias policy learning via generating misleading samples.
Simply waiting for every robot to be ready for the next action can be particularly time-inefficient.
Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world.
Despite all these advantages, we revisit the two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB), where the estimated CVR of ESMM is inherently higher than the ground truth; (2) Potential Independence Priority (PIP) for CTCVR estimation, where there is a risk that the ESMM overlooks the causality from click to conversion.
Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy.
Relying on the passive sensing system, a dataset of received signals, in which three types of hand gestures are sensed, is collected by using Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) paths as the reference channel, respectively.
Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues, i.e., low data efficiency, lack of interpretability, and poor transferability.
These scenarios indeed correspond to vulnerabilities of the driving policies under test, and are thus meaningful for their further improvement.
In this paper, we propose Lifelong reinforcement learning with Sequential linear temporal logic formulas and Reward Machines (LSRM), which enables an agent to leverage previously learned knowledge to accelerate the learning of logically specified tasks.
We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.
In this paper, we extend the state-of-the-art single-agent visual navigation method, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based planning module, Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attentions.
In recent years, quantitative investment methods combined with artificial intelligence have attracted increasing attention from investors and researchers.
Long-range active imaging has widespread applications in remote sensing and target recognition.
We propose a simple, general and effective technique, Reward Randomization for discovering diverse strategic policies in complex multi-agent games.
This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems.
Some recent work has focused on solving a combination of two subtasks, e.g., extracting aspect terms along with their sentiment polarities, or extracting aspect and opinion terms in a pairwise manner.
We benchmark commonly used multi-agent deep reinforcement learning (MARL) algorithms on a variety of cooperative multi-agent games.
Since the signal with the stronger power must be demodulated first in successive interference cancellation (SIC) demodulation in non-orthogonal multiple access (NOMA) systems, the base station (BS) should inform the near user terminal (UT) of the modulation mode of the far user terminal, which is allocated the higher power.
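The SIC ordering described above can be illustrated with a minimal, hypothetical sketch: the near UT first detects the far user's higher-power symbol (which is why it must know the far user's modulation mode, i.e., its constellation), cancels it from the received signal, and then detects its own symbol. The power split, channel model, and BPSK constellation below are illustrative assumptions, not the letter's actual system parameters.

```python
import numpy as np

def sic_receive(y, h_far, h_near, far_constellation, near_constellation):
    """Successive interference cancellation at the near UT:
    detect the stronger (far-user) signal first, subtract it,
    then detect the near user's own signal."""
    # Step 1: detect the far user's symbol (higher allocated power);
    # this requires knowing the far user's constellation (modulation mode).
    far_est = min(far_constellation, key=lambda s: abs(y - h_far * s))
    # Step 2: cancel the far user's contribution and detect the own symbol.
    residual = y - h_far * far_est
    near_est = min(near_constellation, key=lambda s: abs(residual - h_near * s))
    return far_est, near_est

# Toy superposed NOMA downlink signal: far user gets 0.8 of the power, near user 0.2.
bpsk = [-1.0, 1.0]
h_far, h_near = np.sqrt(0.8), np.sqrt(0.2)
x_far, x_near = 1.0, -1.0
y = h_far * x_far + h_near * x_near  # noiseless for illustration
print(sic_receive(y, h_far, h_near, bpsk, bpsk))  # → (1.0, -1.0)
```

Note that step 1 fails if the near UT assumes the wrong far-user constellation, which is precisely why the BS must signal the far user's modulation mode.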
As a subfield of machine learning, reinforcement learning (RL) aims to improve an agent's behavioural decision-making capabilities by using interaction experience with the world and evaluative feedback.
Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning.
We then propose a hierarchical supervision framework to explicitly model the PoG, and define step by step how to realize the core principle of the framework and compute the optimal PoG for a control problem.