ICLR 2020

ProtoAttend: Attention-Based Prototypical Learning

ICLR 2020 · google-research/google-research

We propose a novel inherently interpretable machine learning method that bases decisions on a few relevant examples that we call prototypes.

DECISION MAKING · INTERPRETABLE MACHINE LEARNING
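
The mechanism behind this is an attention layer over a database of candidate examples: the input's query is matched against candidate keys, and the highest-weight candidates become the prototypes that explain the prediction. A minimal sketch, assuming a toy linear encoder and a random candidate database (all names here are illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Stand-in encoder: a single linear projection.
    return x @ W

d_in, d_key, n_candidates, n_classes = 16, 8, 50, 3
W_q = rng.normal(size=(d_in, d_key))   # query projection for the input
W_k = rng.normal(size=(d_in, d_key))   # key projection for candidates

candidate_x = rng.normal(size=(n_candidates, d_in))  # candidate database
candidate_y = np.eye(n_classes)[rng.integers(0, n_classes, n_candidates)]  # one-hot labels

x = rng.normal(size=(d_in,))           # input to classify

q = encode(x, W_q)                     # (d_key,)
K = encode(candidate_x, W_k)           # (n_candidates, d_key)

scores = K @ q / np.sqrt(d_key)
weights = np.exp(scores - scores.max())
weights /= weights.sum()               # attention weights = prototype relevance

# The decision is a convex combination of the candidates' labels; the
# candidates with the largest weights act as the prototypes that
# explain the prediction.
probs = weights @ candidate_y
top = np.argsort(-weights)[:3]
print("prediction:", probs.argmax(), "prototypes:", top, "weights:", weights[top])
```

The paper additionally encourages sparse attention weights, so that the mass concentrates on only a few candidates and the prototypes form a compact explanation.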

On Mutual Information Maximization for Representation Learning

ICLR 2020 · google-research/google-research

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data.

REPRESENTATION LEARNING · SELF-SUPERVISED IMAGE CLASSIFICATION
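
The MI estimate these methods maximize is usually a contrastive lower bound such as InfoNCE, computed between embeddings of two views of the same batch; this paper analyzes when maximizing such bounds actually yields good representations. A minimal sketch of the estimator, assuming L2-normalized embeddings (a simplified illustration, not the paper's code):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (batch, dim) embeddings of two views of the same examples,
    # assumed L2-normalized. Matching rows are positives; all other rows
    # in the batch serve as negatives.
    logits = z1 @ z2.T / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))        # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 64))
z1 = z / np.linalg.norm(z, axis=1, keepdims=True)    # view 1
z2 = z + 0.1 * rng.normal(size=z.shape)              # noisy view 2
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
print("InfoNCE loss:", info_nce(z1, z2))
```

Minimizing this loss maximizes a lower bound on the MI between the two views; notably, the bound saturates at log(batch size), one of the limitations this line of analysis examines.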

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

ICLR 2020 · taki0112/UGATIT

We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner.

UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION
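
The new normalization function, AdaLIN, learns a per-channel ratio rho that interpolates between instance-norm and layer-norm statistics, letting the model control how much shape versus texture to change. A minimal sketch with illustrative shapes and fixed gamma/beta (in U-GAT-IT these are produced from the attention features; this is not the official implementation):

```python
import numpy as np

def adalin(x, rho, gamma, beta, eps=1e-5):
    # x: (N, C, H, W) feature map
    mu_in = x.mean(axis=(2, 3), keepdims=True)       # per-channel (instance) stats
    var_in = x.var(axis=(2, 3), keepdims=True)
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)    # per-sample (layer) stats
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    x_in = (x - mu_in) / np.sqrt(var_in + eps)
    x_ln = (x - mu_ln) / np.sqrt(var_ln + eps)
    x_hat = rho * x_in + (1.0 - rho) * x_ln          # learned mix, rho in [0, 1]
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
rho = np.full((1, 4, 1, 1), 0.9)                     # learnable parameter in practice
out = adalin(x, rho, gamma=np.ones((1, 4, 1, 1)), beta=np.zeros((1, 4, 1, 1)))
print(out.shape)
```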

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

ICLR 2020 · facebookresearch/LASER

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

SENTENCE EMBEDDINGS
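
The extraction step scores candidate pairs with a margin criterion: the cosine similarity between two sentence embeddings, normalized by the average similarity each sentence has to its nearest neighbors in the other language. A brute-force toy sketch (mining at Wikipedia scale runs over approximate nearest-neighbor indexes; the names here are illustrative):

```python
import numpy as np

def margin_scores(X, Y, k=4):
    # X: (n, d) source-language embeddings, Y: (m, d) target-language
    # embeddings, both L2-normalized, so a dot product is cosine similarity.
    sim = X @ Y.T                                        # (n, m) cosine similarities
    # Average similarity of each point to its k nearest neighbors on the other side.
    knn_x = np.sort(sim, axis=1)[:, -k:].mean(axis=1)    # (n,)
    knn_y = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # (m,)
    # Ratio margin: similarity relative to each side's neighborhood density,
    # which filters out "hub" sentences that are close to everything.
    return sim / (0.5 * (knn_x[:, None] + knn_y[None, :]))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 32)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = X + 0.05 * rng.normal(size=X.shape)                  # noisy "translations"
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
scores = margin_scores(X, Y)
print("best match per source sentence:", scores.argmax(axis=1))  # expect 0..5
```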

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

ICLR 2020 · microsoft/DeepSpeed

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.

#9 best model for Question Answering on SQuAD1.1 dev (F1 metric)

QUESTION ANSWERING · STOCHASTIC OPTIMIZATION
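
The layerwise strategy, LAMB, rescales an Adam-style update per layer by a trust ratio ||w|| / ||u||, so every layer moves a comparable relative distance even at very large batch sizes. A single-step toy sketch (simplified: no bias correction or learning-rate schedule; not the paper's full algorithm):

```python
import numpy as np

def lamb_step(w, g, m, v, lr=1e-2, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    m = b1 * m + (1 - b1) * g                 # first moment
    v = b2 * v + (1 - b2) * g * g             # second moment
    u = m / (np.sqrt(v) + eps) + wd * w       # Adam direction + decoupled weight decay
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(u)
    # Layerwise trust ratio: normalize the step by the layer's own scale.
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust * u, m, v

rng = np.random.default_rng(0)
w = rng.normal(size=100)
m = np.zeros_like(w); v = np.zeros_like(w)
g = 2 * w                                     # gradient of ||w||^2
w, m, v = lamb_step(w, g, m, v)
print("new norm:", np.linalg.norm(w))
```

In practice the step runs once per layer (each weight matrix gets its own trust ratio), which is what keeps large-batch training stable across layers of very different scale.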

Recurrent Independent Mechanisms

ICLR 2020 · maximecb/gym-minigrid

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.

ATARI GAMES

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

ICLR 2020 · JiahuiYu/slimmable_networks

Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs).

NEURAL ARCHITECTURE SEARCH

Decentralized Distributed PPO: Mastering PointGoal Navigation

ICLR 2020 · facebookresearch/habitat-api

We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

AUTONOMOUS NAVIGATION · POINTGOAL NAVIGATION
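
A quick back-of-the-envelope check makes the compute figures concrete (our arithmetic, not the paper's):

```python
# 64 GPUs running for ~3 days of wall-clock time accumulate roughly half
# a year of GPU-time, consistent with "over 6 months in under 3 days".
gpu_days = 64 * 3                 # 192 GPU-days
print(gpu_days / 30.4, "months")  # ~6.3 months of GPU-time
```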

Generative Models for Effective ML on Private, Decentralized Datasets

ICLR 2020 · tensorflow/gan

To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact.

Sparse Networks from Scratch: Faster Training without Losing Performance

ICLR 2020 · TimDettmers/sparse_learning

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels.

IMAGE CLASSIFICATION · SPARSE LEARNING
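
A typical way to realize this, in the spirit of the paper's sparse momentum algorithm, is to hold the number of nonzero weights fixed and periodically prune the smallest-magnitude weights while regrowing the same number where the momentum magnitude is largest. A toy sketch of one prune/regrow step (illustrative, not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, density, k = 1000, 0.1, 20        # k weights pruned and regrown per step

w = rng.normal(size=n)
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, int(density * n), replace=False)] = True
w *= mask                            # weights are sparse from the start

momentum = rng.normal(size=n)        # stand-in for accumulated gradients

# Prune: remove the k active weights with the smallest magnitude.
active = np.flatnonzero(mask)
drop = active[np.argsort(np.abs(w[active]))[:k]]
mask[drop] = False; w[drop] = 0.0

# Regrow: activate the k inactive positions with the largest momentum;
# new weights start at zero and are learned from there.
inactive = np.flatnonzero(~mask)
grow = inactive[np.argsort(-np.abs(momentum[inactive]))[:k]]
mask[grow] = True

print("density:", mask.mean())       # unchanged: still ~0.1
```

Because the mask never exceeds the density budget, both the forward and backward passes can exploit sparsity throughout training, which is where the claimed acceleration comes from.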