We also argue that it is necessary for DNNs to exploit GO to overcome shortcut learning.
Speech Activity Detection (SAD), i.e., locating speech segments within an audio recording, is a core component of most speech technology applications.
We conclude that the data augmentation induced by style variation accounts for the improved corruption robustness, while the increased shape bias is only a byproduct.
Single-view 3D object reconstruction has seen much progress, yet methods still struggle to generalize to novel shapes unseen during training.
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations.
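The equivariance property being claimed can be illustrated independently of the actual attention mechanism. The sketch below is a hypothetical toy layer (not the SE(3)-Transformer itself): it displaces each point along directions to the other points, weighted by a function of pairwise distances, which is invariant, so the layer commutes with any 3D rotation and translation.

```python
import numpy as np

def toy_equivariant_layer(points):
    """Toy layer equivariant under 3D roto-translations.

    Displaces each point along relative directions to the other points,
    weighted by an invariant function of pairwise distances. This mimics
    the structural constraint the SE(3)-Transformer enforces, not its
    actual attention mechanism.
    """
    diffs = points[:, None, :] - points[None, :, :]        # (N, N, 3) relative vectors
    dists = np.linalg.norm(diffs, axis=-1, keepdims=True)  # (N, N, 1) invariants
    weights = np.exp(-dists)                               # invariant weights
    displacement = (weights * diffs).sum(axis=1)           # rotates with the input
    return points + 0.1 * displacement

def random_rotation(seed=0):
    # Sample a proper rotation matrix via QR decomposition.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))
R, t = random_rotation(), np.array([1.0, -2.0, 0.5])

# Equivariance: transforming the input transforms the output identically.
out_then_transform = toy_equivariant_layer(x) @ R.T + t
transform_then_out = toy_equivariant_layer(x @ R.T + t)
print(np.allclose(out_then_transform, transform_then_out))  # prints True
```

Because the weights depend only on distances, translating the input leaves the displacements unchanged, and rotating it rotates them by the same matrix.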
Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions.
Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world.
Most artificial deep neural networks are organized as a directed graph of connected modules or layers, and the layers themselves consist of elemental building blocks, such as single units.
We show empirically that there exist barely perceptible universal noise patterns which result in nearly the same predicted segmentation for arbitrary inputs.
Machine learning methods in general, and deep neural networks in particular, have been shown to be vulnerable to adversarial perturbations.
In this work, we propose to augment deep neural networks with a small "detector" subnetwork which is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations.
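The idea of a detector trained to distinguish genuine from perturbed inputs can be sketched in miniature. Everything below is a hypothetical stand-in, not the paper's architecture: smooth signals play the role of genuine data, added high-frequency noise plays the role of adversarial perturbations, and a logistic regression on a hand-crafted roughness feature plays the role of the detector subnetwork.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n, perturbed, eps=0.2, length=32):
    """Smooth 'genuine' signals, optionally plus a small high-frequency
    perturbation standing in for adversarial noise."""
    phases = rng.uniform(0, 2 * np.pi, size=(n, 1))
    t = np.linspace(0, 2 * np.pi, length)
    x = np.sin(t + phases)
    if perturbed:
        x = x + rng.uniform(-eps, eps, size=x.shape)
    return x

def roughness(x):
    # Detector input feature: mean absolute first difference,
    # which small high-frequency perturbations inflate.
    return np.abs(np.diff(x, axis=1)).mean(axis=1)

# Training data: label 0 = genuine, 1 = perturbed.
feats = np.concatenate([roughness(make_batch(200, False)),
                        roughness(make_batch(200, True))])
labels = np.concatenate([np.zeros(200), np.ones(200)])

# Minimal "detector": logistic regression trained by gradient descent.
w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * feats + b)))
    grad = p - labels
    w -= lr * (grad * feats).mean()
    b -= lr * grad.mean()

# Evaluate on held-out batches.
test_feats = np.concatenate([roughness(make_batch(100, False)),
                             roughness(make_batch(100, True))])
truth = np.concatenate([np.zeros(100), np.ones(100)])
pred = (1.0 / (1.0 + np.exp(-(w * test_feats + b))) > 0.5).astype(float)
accuracy = (pred == truth).mean()
print("detector accuracy:", accuracy)
```

Real adversarial perturbations are optimized against the classifier rather than random, so a practical detector operates on learned network features, but the training setup (binary classification of clean vs. perturbed inputs) is the same.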