Our algorithm, Search on the Replay Buffer (SoRB), enables agents to solve sparse-reward tasks spanning hundreds of steps, and generalizes substantially better than standard RL algorithms.
We present a general framework for solving a large class of learning problems with non-linear functions of classification rates.
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
From this observation, we propose extended rank and sort operators by considering optimal transport (OT) problems (the natural relaxation for assignments) where the auxiliary measure can be any weighted measure supported on $m$ increasing values, where $m \ne n$.
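As a rough illustration of this relaxation, the sketch below solves an entropic OT problem between the $n$ input values and $m$ increasing targets via Sinkhorn iterations, then reads off soft ranks and soft-sorted values from the transport plan. The uniform target measure on `linspace(0, 1, m)`, the regularization `eps`, and the helper names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sinkhorn_plan(x, y, a, b, eps=0.1, iters=200):
    # Entropic OT between values x (weights a) and targets y (weights b).
    C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost matrix
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):                    # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]        # transport plan T (n x m)

def soft_ops(x, m=None, eps=0.1):
    # Relaxed rank and sort via OT; m may differ from n, as in the text.
    n = len(x)
    m = m or n
    y = np.linspace(0.0, 1.0, m)              # m increasing target values (assumed)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    T = sinkhorn_plan(x, y, a, b, eps)
    soft_rank = (T @ np.cumsum(b)) / a        # barycentric rank estimate in (0, 1]
    soft_sort = (T.T @ x) / b                 # m soft-sorted values
    return soft_rank, soft_sort
```

Because every step is a smooth function of `x`, both outputs are differentiable, which is the point of the relaxation; setting `m < n` yields a quantile-like summary rather than a full sort.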
While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting.
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.
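To make the estimation problem concrete, here is a deliberately naive plug-in estimator: discretize both samples on a shared grid, estimate the densities by histograms, and average the f-divergence generator under one of them. The bin count, the smoothing constant, and the function names are illustrative assumptions; practical estimators are considerably more sophisticated.

```python
import numpy as np

def f_kl(t):
    # Generator of the KL divergence: f(t) = t * log(t).
    return t * np.log(t)

def plugin_f_divergence(p_samples, q_samples, f, bins=20):
    # Naive plug-in estimate of D_f(P || Q) = E_Q[ f(p/q) ] from samples.
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)     # shared histogram grid
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = (p + 1e-9) / (p + 1e-9).sum()         # smooth to avoid empty bins
    q = (q + 1e-9) / (q + 1e-9).sum()
    return np.sum(q * f(p / q))
```

With `f_kl` this reduces to the familiar $\sum_i p_i \log(p_i / q_i)$; swapping in another convex generator with $f(1) = 0$ (e.g. $f(t) = (\sqrt{t} - 1)^2$ for squared Hellinger) estimates a different member of the family.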
We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation.
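One widely used index of this kind is linear centered kernel alignment (CKA), which compares the $n \times n$ Gram matrices of two representations; treating it as the index the sentence refers to is an assumption. A minimal sketch:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between representations X, Y (n examples x features).
    X = X - X.mean(axis=0)                    # center each feature
    Y = Y - Y.mean(axis=0)
    # Equivalent to comparing the centered Gram matrices XX^T and YY^T.
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

The index lies in [0, 1], equals 1 when the two representations agree up to an orthogonal transform and isotropic scaling, and is well defined even when the two feature dimensions differ.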