no code implementations • 13 May 2022 • Nicolas Kardous, Amaury Hayat, Sean T. McQuade, Xiaoqian Gong, Sydney Truong, Tinhinane Mezair, Paige Arnold, Ryan Delorenzo, Alexandre Bayen, Benedetto Piccoli
The choice of these parameters in the lane-change mechanism is critical to modeling traffic accurately, because different parameter values can lead to drastically different traffic behaviors.
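As a concrete illustration of such parameters, the widely used MOBIL incentive criterion (shown here as a sketch; it is not necessarily the mechanism studied in this work, and the parameter values are illustrative) accepts a lane change only when the ego vehicle's acceleration gain outweighs the politeness-weighted losses imposed on surrounding drivers by more than a switching threshold:

```python
def mobil_lane_change(ego_gain, follower_loss_new, follower_loss_old,
                      politeness=0.5, threshold=0.1):
    """MOBIL-style lane-change incentive test (illustrative values).

    ego_gain:          ego acceleration gain from changing lanes
    follower_loss_*:   acceleration losses the change imposes on the
                       followers in the new and old lanes
    politeness:        weight on other drivers' losses (0 = selfish)
    threshold:         minimum net advantage required to switch lanes
    """
    incentive = ego_gain - politeness * (follower_loss_new + follower_loss_old)
    return incentive > threshold
```

Small changes in `politeness` or `threshold` flip the decision for the same traffic state, which is one way different parameter values produce drastically different lane-change behavior.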
This work in progress considers reachability-based safety analysis in the domain of autonomous driving in multi-agent systems.
The proposed scheme does not require pre-computation and can improve the amortized running time of the composed MPC with a well-trained neural network.
no code implementations • 2 Apr 2021 • Saleh Albeaik, Alexandre Bayen, Maria Teresa Chiri, Xiaoqian Gong, Amaury Hayat, Nicolas Kardous, Alexander Keimer, Sean T. McQuade, Benedetto Piccoli, Yiling You
First, it is shown that, for a specific class of initial data, the vehicles' velocities become negative or even diverge to $-\infty$ in finite time, both undesirable properties for a car-following model.
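The general failure mode is easy to reproduce in a toy model (this is NOT the paper's model, only a naive spring-like car-following law used for illustration): with $\dot v = \beta\,(h - h^*)$ and nothing constraining the sign of $v$, a follower that starts too close to its leader is driven to negative speed.

```python
def simulate(x_follow=0.0, v_follow=0.0, x_lead=2.0, v_lead=1.0,
             beta=1.0, desired_headway=5.0, dt=0.01, steps=200):
    """Forward-Euler simulation of a naive car-following law.

    dv/dt = beta * (headway - desired_headway); since the model has no
    mechanism keeping velocity non-negative, starting with a headway
    below desired_headway pushes the follower's velocity below zero.
    Returns the minimum follower velocity over the horizon.
    """
    velocities = []
    for _ in range(steps):
        headway = x_lead - x_follow
        v_follow += dt * beta * (headway - desired_headway)
        x_follow += dt * v_follow
        x_lead += dt * v_lead
        velocities.append(v_follow)
    return min(velocities)
```

With the default initial data (headway 2 m against a desired 5 m), the follower's velocity dips below zero, illustrating why well-posedness of the initial data matters for such models.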
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings.
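At the core of PPO is the clipped surrogate objective (a minimal sketch; `eps=0.2` is a common default, and the helper name is ours):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO.

    A policy-gradient step ascends this quantity; clipping the
    importance-sampling ratio to [1 - eps, 1 + eps] discourages
    destructively large policy updates.
    """
    ratio = np.exp(logp_new - logp_old)                    # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))         # pessimistic bound
```

Taking the elementwise minimum makes the objective a pessimistic bound on the unclipped surrogate: for a ratio of 1.5 and advantage 1.0, the contribution is capped at 1.2.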
We benchmark commonly used multi-agent deep reinforcement learning (MARL) algorithms on a variety of cooperative multi-agent games.
We fill this gap by enhancing a deep learning approach, Diffusion Convolutional Recurrent Neural Network, with spatial information generated from signal timing plans at targeted intersections.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
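The signal driving the environment designer in PAIRED can be sketched as follows (an empirical estimate of regret; the function name and episode-aggregation details are our assumptions): the designer is rewarded with the gap between the antagonist's and the protagonist's returns on the same generated environment, steering it toward environments that are solvable yet hard for the protagonist.

```python
def paired_regret(antagonist_returns, protagonist_returns):
    """Approximate regret used as the environment designer's reward.

    The antagonist's score is the max over its sampled episodes
    (an optimistic estimate of achievable return), while the
    protagonist's is the mean over its episodes.
    """
    return max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)
```

A high value means the environment is feasible for some policy but currently unsolved by the protagonist, which is exactly the curriculum signal the designer should maximize.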
We apply multi-agent reinforcement learning algorithms to this problem and demonstrate that significant improvements in bottleneck throughput, from 20% at a 5% penetration rate to 33% at a 40% penetration rate, can be achieved.
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.
We then directly transfer this policy without any tuning to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles.
These dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate $\eta(t)$, and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights $w(t)$.
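In symbols, the coupled dynamics can be sketched as follows (a reading consistent with the sentence above; $g(\tau)$ for the observed gradient and $\nabla\psi^*$ for the mirror map are notational assumptions, and the paper's normalization may differ):

```latex
\begin{aligned}
  \dot z(t) &= \eta(t)\, g(t), \\
  x(t) &= \frac{\int_0^t w(\tau)\, \nabla\psi^*\!\bigl(z(\tau)\bigr)\, d\tau}{\int_0^t w(\tau)\, d\tau},
\end{aligned}
```

where $z(t)$ is the dual variable accumulating gradients at rate $\eta(t)$ and $x(t)$ is the primal variable, a $w$-weighted average of the mirrored dual trajectory.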
We study a general adversarial online learning problem, in which we are given a decision set $X'$ in a reflexive Banach space $X$ and a sequence of reward vectors in the dual space of $X$.
Under the assumption of uniformly continuous rewards, we obtain explicit anytime regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space $S$ whose Radon-Nikodym derivatives are elements of $L^p(S)$ for some $p > 1$.
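In this setting, the regret against a reward sequence $(u_t)$ takes the standard form (an assumption about notation; the paper's exact normalization may differ):

```latex
R(T) \;=\; \sup_{x \in X'} \sum_{t=1}^{T} \langle u_t, x \rangle \;-\; \sum_{t=1}^{T} \langle u_t, x_t \rangle,
```

where $x_t \in X'$ is the decision played at round $t$ and $\langle \cdot, \cdot \rangle$ is the duality pairing between $X^*$ and $X$; an anytime bound controls $R(T)$ simultaneously for all horizons $T$.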