Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints.
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Multimedia summarization with multimodal output can play an essential role in real-world applications, i. e., automatically generating cover images and titles for news articles or providing introductions to online videos.
In this work, we present Deep Importance Sampling (Deep IS) framework that utilizes a deep neural network to obtain an efficient IS that is on par with the state-of-the-art, capable of reducing the required sample size 43 times smaller than the naive sampling method to achieve 10% relative error and while producing an estimate that is much less conservative.
The proposed method learns an individual-specific predictive model from heterogeneous observations, and enables estimation of an optimal transport map that yields a push forward operation onto the demographic features for each task.
We leverage COPA to certify three RL environments trained with different algorithms and conclude: (1) The proposed robust aggregation protocols such as temporal aggregation can significantly improve the certifications; (2) Our certification for both per-state action stability and cumulative reward bound are efficient and tight; (3) The certification for different training algorithms and environments are different, implying their intrinsic robustness properties.
In this paper, we introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack - to formalize the sequential nature and provide extra flexibility for robust training.
Full-field traffic state information (i. e., flow, speed, and density) is critical for the successful operation of Intelligent Transportation Systems (ITS) on freeways.
Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications.
In this paper, we focus on a new method of data augmentation to solve the data imbalance problem within imbalanced ECG datasets to improve the robustness and accuracy of heart disease detection.
This paper studies an intelligent reflecting surface (IRS)-assisted integrated sensing and communication (ISAC) system, in which one IRS is deployed to not only assist the wireless communication from a multi-antenna base station (BS) to a single-antenna communication user (CU), but also create virtual line-of-sight (LoS) links for sensing targets at areas with LoS links blocked.
Rare-event simulation techniques, such as importance sampling (IS), constitute powerful tools to speed up challenging estimation of rare catastrophic events.
Generating safety-critical scenarios, which are crucial yet difficult to collect, provides an effective way to evaluate the robustness of autonomous driving systems.
The evaluation of rare but high-stakes events remains one of the main difficulties in obtaining reliable policies from intelligent agents, especially in large or continuous state/action spaces where limited scalability enforces the use of a prohibitively large number of testing iterations.
We then develop a local smoothing algorithm for policies derived from Q-functions to guarantee the robustness of actions taken along the trajectory; we also develop a global smoothing algorithm for certifying the lower bound of a finite-horizon cumulative reward, as well as a novel local smoothing algorithm to perform adaptive search in order to obtain tighter reward certification.
Generating adversarial scenarios, which have the potential to fail autonomous driving systems, provides an effective way to improve the robustness.
To this end, we introduce an easy-to-compute information-theoretic surrogate metric to quantitatively and fast evaluate LiDAR placement for 3D detection of different types of objects.
In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary.
We introduce a formulation of optimal transport problem for distributions on function spaces, where the stochastic map between functional domains can be partially represented in terms of an (infinite-dimensional) Hilbert-Schmidt operator mapping a Hilbert space of functions to another.
The algorithm is evaluated in realistic safety-critical environments with non-stationary disturbances.
Different environments pose a great challenge on the outdoor robust visual perception for long-term autonomous driving and the generalization of learning-based algorithms on different environmental effects is still an open problem.
We propose a model-based approach to enable RL agents to effectively explore the environment with unknown system dynamics and environment constraints given a significantly small number of violation budgets.
We study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests.
Existing neural network-based autonomous systems are shown to be vulnerable against adversarial attacks, therefore sophisticated evaluation on their robustness is of great importance.
Multi-agent navigation in dynamic environments is of great industrial value when deploying a large scale fleet of robot to real-world applications.
Evaluating the reliability of intelligent physical systems against rare safety-critical events poses a huge testing burden for real-world applications.
We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference.
These distance metrics can serve as an objective for assessing the stability of an interaction learning algorithm.
In automatic speech recognition (ASR), model pruning is a widely adopted technique that reduces model size and latency to deploy neural network models on edge devices with resource constraints.
We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles to show the practical value of delay-awareness.
Action delays degrade the performance of reinforcement learning in many real-world systems.
no code implementations • 28 Mar 2020 • Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i. e., word error rate (WER), and latency, i. e., the time the hypothesis is finalized after the user stops speaking.
We then train the generative model as an agent (or a generator) to investigate the risky distribution parameters for a given driving algorithm being evaluated.
Constructed by incorporating NPs with recurrent neural networks (RNNs), the ARNP model predicts the distribution of a target vehicle trajectory conditioned on the observed long-term sequential data of all surrounding vehicles.
Neural processes (NPs) learn stochastic processes and predict the distribution of target output adaptively conditioned on a context set of observed input-output pairs.
The experimental results demonstrate that our proposed method can generate a bunch of human-like multi-vehicle interaction trajectories that can fit different road conditions remaining the key interaction patterns of agents in the provided scenarios, which is import to the development of autonomous vehicles.
However, most of the data is collected in safe scenarios leading to the duplication of trajectories which are easy to be handled by currently developed algorithms.
One typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution.
Semantic learning and understanding of multi-vehicle interaction patterns in a cluttered driving environment are essential but challenging for autonomous vehicles to make proper decisions.
We then use this model to reproduce the high-dimensional driving scenarios in a finitely tractable form.
2 code implementations • 15 Nov 2018 • Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition.
Generating multi-vehicle trajectories from existing limited data can provide rich resources for autonomous vehicle development and testing.
Our approach, which we re- fer to as Contextual Listen, Attend and Spell (CLAS) jointly- optimizes the ASR components along with embeddings of the context n-grams.
Semantically understanding complex drivers' encountering behavior, wherein two or multiple vehicles are spatially close to each other, does potentially benefit autonomous car's decision-making design.
A multitude of publicly-available driving datasets and data platforms have been raised for autonomous vehicles (AV).
Various automobile and mobility companies, for instance Ford, Uber and Waymo, are currently testing their pre-produced autonomous vehicle (AV) fleets on the public roads.
Learning knowledge from driving encounters could help self-driving cars make appropriate decisions when driving in complex settings with nearby vehicles engaged.
To overcome this problem, we extract naturalistic V2V encounters data from the database, and then separate the primary vehicle encounter category by clustering.
A learning-based inference method, using onboard data from CAN-Bus, radar and cameras as explanatory variables, is introduced to infer drivers' braking decisions by combining a Gaussian mixture model (GMM) with a hidden Markov model (HMM).
The distribution used in sampling is pivotal to the performance of the method, but building a suitable distribution requires case-by-case analysis.
Developing an automated vehicle, that can handle complicated driving scenarios and appropriately interact with other road users, requires the ability to semantically learn and understand driving environment, oftentimes, based on analyzing massive amounts of naturalistic driving data.
In order to achieve this, first, a Bayesian nonparametric learning method based on a hidden semi-Markov model (HSMM) is introduced to extract primitive driving patterns from time series driving data without prior knowledge of the number of these patterns.
For projects that cost millions of dollars, it is critical to determine the right amount of data needed.
It has been shown, in our previous work, that the GNSS error can be reduced from several meters to sub-meter level by matching the biased GNSS positioning to a digital map with road constraints.
Systems and Control
Cooperative map matching (CMM) uses the Global Navigation Satellite System (GNSS) positioning of a group of vehicles to improve the standalone localization accuracy.
Systems and Control
A Rao-Blackwellized particle filter (RBPF) is used to jointly estimate the common biases of the pseudo-ranges and the vehicle positions.
Systems and Control
Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB.