This approach significantly reduces computational costs in comparison with training each DNN backbone individually.
In this paper, we study the problem of unsupervised object detection from 3D point clouds in self-driving scenes.
Recent advances in high-fidelity simulators have enabled closed-loop training of autonomous driving agents, potentially solving the distribution shift in training v. s.
Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks.
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps.
Real world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution.
As data collection is often significantly cheaper than labeling in this domain, the decision of which subset of examples to label can have a profound impact on model performance.
Yet, there have been limited studies on the adversarial robustness of multi-modal models that fuse LiDAR features with image features.
Recent work on hyperparameters optimization (HPO) has shown the possibility of training certain hyperparameters together with regular parameters.
Growing at a fast pace, modern autonomous systems will soon be deployed at scale, opening up the possibility for cooperative multi-agent systems.
Motivated by this ability, we present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes with many moving objects.
Existing methods typically insert actors into the scene according to a set of hand-crafted heuristics and are limited in their ability to model the true complexity and diversity of real traffic scenes, thus inducing a content gap between synthesized traffic scenes versus real ones.
Importantly, by simulating directly from sensor data, we obtain adversarial scenarios that are safety-critical for the full autonomy stack.
In this work, we consider a realistic setting where the relationship between examples can change from episode to episode depending on the task context, which is not given to the learner.
Despite impressive progress in deep learning, generalizing far beyond the training distribution is an important open challenge.
Learned communication makes multi-agent systems more effective by aggregating distributed information.
In this paper, we propose an end-to-end self-driving network featuring a sparse attention module that learns to automatically attend to important regions of the input.
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly.
We show that our approach can outperform the state-of-the-art on both datasets.
In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations.
Deep neural nets typically perform end-to-end backpropagation to learn the weights, a procedure that creates synchronization constraints in the weight update step across layers and is not biologically plausible.
We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting.
Modern autonomous driving systems rely heavily on deep learning models to process point cloud sensory data; meanwhile, deep models have been shown to be susceptible to adversarial attacks with visually imperceptible perturbations.
In the past few years, we have seen great progress in perception algorithms, particular through the use of deep learning.
Recent studies on catastrophic forgetting during sequential learning typically focus on fixing the accuracy of the predictions for a previously learned task.
The motion planners used in self-driving vehicles need to generate trajectories that are safe, comfortable, and obey the traffic rules.
This paper addresses this problem, incremental few-shot learning, where a regular classification network has already been trained to recognize a set of base classes, and several extra novel classes are being considered, each with only a few labeled examples.
Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs.
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns.
To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes.
Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications.
Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider.
On the other hand, layer normalization normalizes the activations across all activities within a layer.
While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene.