ICLR 2019

Meta-Learning Update Rules for Unsupervised Representation Learning

ICLR 2019 tensorflow/models

We target semi-supervised classification performance, and we meta-learn an algorithm (an unsupervised weight update rule) that produces representations useful for this task.

META-LEARNING · UNSUPERVISED REPRESENTATION LEARNING
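
The update rule described above is meta-learned: an inner loop applies an unsupervised weight update to an encoder on unlabeled data, and an outer loop scores the resulting representation with a supervised probe and improves the rule. The paper meta-trains a neural update rule with gradients across many tasks; the toy sketch below only mirrors that bilevel structure, using a two-parameter Hebbian-style rule, a least-squares probe, and random search for the outer loop (all names and choices here are illustrative, not the paper's method).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n=200, d=20):
    """Toy task: two Gaussian clusters; labels are used only in the outer loop."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, d)) + 3.0 * y[:, None] * rng.normal(size=d)
    return x, y

def unsupervised_update(W, x, theta):
    """Hypothetical learned update rule: a Hebbian-style step whose
    gain and decay are the meta-parameters theta = (gain, decay)."""
    gain, decay = theta
    h = np.tanh(x @ W)                        # current representation
    dW = gain * (x.T @ h) / len(x) - decay * W
    return W + dW

def linear_probe_accuracy(W, x, y):
    """Outer-loop score: least-squares probe on the frozen representation."""
    h = np.tanh(x @ W)
    h1 = np.hstack([h, np.ones((len(h), 1))])
    w, *_ = np.linalg.lstsq(h1, 2.0 * y - 1.0, rcond=None)
    return np.mean((h1 @ w > 0) == y)

def evaluate_rule(theta, inner_steps=20):
    """Apply a candidate update rule to a fresh encoder on a fresh task."""
    x, y = make_task()
    W = rng.normal(scale=0.1, size=(x.shape[1], 8))
    for _ in range(inner_steps):
        W = unsupervised_update(W, x, theta)
    return linear_probe_accuracy(W, x, y)

# Outer loop: random search over the rule's meta-parameters
# (the paper uses meta-gradients; this is only a structural stand-in).
best = max(((evaluate_rule(t), t) for t in rng.uniform(0, 1, size=(50, 2))),
           key=lambda p: p[0])
print("best probe accuracy %.2f with meta-parameters %s" % best)
```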

GANSynth: Adversarial Neural Audio Synthesis

ICLR 2019 tensorflow/magenta

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

AUDIO GENERATION

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

ICLR 2019 huggingface/pytorch-pretrained-BERT

Transformer networks have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.

LANGUAGE MODELLING
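
Transformer-XL works past the fixed-length context by caching the previous segment's hidden states and letting the current segment attend to them, with gradients stopped at the segment boundary. The sketch below shows only that recurrence for a single attention layer (PyTorch; the class name is mine, and the relative positional encodings and causal masking of the real model are omitted).

```python
import torch
import torch.nn as nn

class SegmentRecurrentAttention(nn.Module):
    """One attention layer with a memory of the previous segment's states.

    Sketch only: the real model stacks many layers, adds relative
    positional encodings, and applies a causal attention mask."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # Keys/values cover the cached segment plus the current one;
        # the cache is detached so gradients stop at the segment boundary.
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(x, context, context, need_weights=False)
        new_memory = x.detach()          # becomes the cache for the next segment
        return out, new_memory

layer = SegmentRecurrentAttention()
stream = torch.randn(2, 96, 64)          # batch of long sequences
memory, seg_len = None, 32
for start in range(0, stream.size(1), seg_len):
    segment = stream[:, start:start + seg_len]
    out, memory = layer(segment, memory)  # each segment attends to the previous one
    print(out.shape)
```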

Pay Less Attention with Lightweight and Dynamic Convolutions

ICLR 2019 pytorch/fairseq

We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.

ABSTRACTIVE TEXT SUMMARIZATION · LANGUAGE MODELLING · MACHINE TRANSLATION
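
A dynamic convolution predicts the kernel for each position from that position's input alone, normalizes it with a softmax, and applies it over a small causal window. A single-head sketch of that computation is below (PyTorch; the class and variable names are mine, and the paper's multi-head, depthwise, weight-shared version adds more structure).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Single-head sketch: each position predicts its own kernel over the
    previous k time steps from the current input vector alone."""

    def __init__(self, d_model=64, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.kernel_proj = nn.Linear(d_model, kernel_size)   # x_t -> kernel

    def forward(self, x):                       # x: (batch, time, d_model)
        # Softmax-normalized kernel per position, predicted from x_t only.
        kernels = F.softmax(self.kernel_proj(x), dim=-1)      # (b, t, k)
        # Gather a causal window of the previous k inputs for every position.
        padded = F.pad(x, (0, 0, self.k - 1, 0))              # left-pad the time dim
        windows = padded.unfold(1, self.k, 1)                 # (b, t, d, k)
        # Weight each window by its predicted kernel and sum over the window.
        return torch.einsum('btdk,btk->btd', windows, kernels)

layer = DynamicConv1d()
y = layer(torch.randn(2, 10, 64))
print(y.shape)   # torch.Size([2, 10, 64])
```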

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

ICLR 2019 Luolc/AdaBound

Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates.

STOCHASTIC OPTIMIZATION
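
AdaBound keeps Adam's element-wise step sizes but clips them between a lower and an upper bound that both converge to a single final learning rate, so training starts adaptive and gradually behaves like SGD. Below is a simplified single-step sketch (NumPy; function and variable names are mine, and details such as where bias correction enters follow the reference implementation only loosely).

```python
import numpy as np

def adabound_step(w, g, m, v, t, lr=1e-3, final_lr=0.1,
                  beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """One simplified AdaBound update for a parameter array w with gradient g.

    m, v are the usual Adam moment estimates; t is the step count (>= 1).
    The element-wise step size lr / sqrt(v_hat) is clipped into a band
    around final_lr that narrows as t grows."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    lower = final_lr * (1 - 1 / (gamma * t + 1))   # rises toward final_lr
    upper = final_lr * (1 + 1 / (gamma * t))       # falls toward final_lr
    step_size = np.clip(lr / (np.sqrt(v_hat) + eps), lower, upper)

    w = w - step_size * m_hat
    return w, m, v

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w = np.ones(3)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adabound_step(w, g=w, m=m, v=v, t=t)
print(w)   # close to zero
```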

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

ICLR 2019 CSAILVision/gandissect

We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.

IMAGE GENERATION
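
The interventions mentioned above ablate (zero out) or insert a unit's feature maps inside the generator and measure how much a target object class in the output changes, typically with a segmentation network. The sketch below shows only the ablation side with a forward hook; the tiny generator and the tree_pixels object-score function are hypothetical placeholders, not the paper's models.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins; in the paper these are a trained GAN generator
# and a semantic segmentation network.
generator = nn.Sequential(
    nn.ConvTranspose2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)
target_layer = generator[0]

def tree_pixels(image):
    """Hypothetical proxy for 'how much of the target object is present';
    the paper counts object pixels with a segmentation network instead."""
    return image.clamp(min=0).sum().item()

def ablate_units(units):
    """Return a forward hook that zeroes the chosen channels of a layer."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, units] = 0.0
        return output
    return hook

z = torch.randn(8, 16, 8, 8)
with torch.no_grad():
    baseline = tree_pixels(generator(z))
    handle = target_layer.register_forward_hook(ablate_units([3, 7]))
    ablated = tree_pixels(generator(z))
    handle.remove()

# Causal effect of the ablated units: the drop in the object score.
print("object score: %.1f -> %.1f" % (baseline, ablated))
```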

Rethinking the Value of Network Pruning

ICLR 2019 Eric-mingjie/rethinking-network-pruning

Our observations are consistent across multiple network architectures, datasets, and tasks, and imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model; 2) the learned "important" weights of the large model are typically not useful for the small pruned model; and 3) the pruned architecture itself, rather than a set of inherited "important" weights, is what matters most for the efficiency of the final model. This suggests that in some cases pruning can be useful as an architecture search paradigm.

ARCHITECTURE SEARCH · NETWORK PRUNING
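
Point 3) above implies a simple recipe: use pruning only to discover the smaller architecture, then discard the inherited weights and train that architecture from scratch. The sketch below illustrates that recipe on a toy MLP (PyTorch; build_mlp and prune_widths are hypothetical stand-ins for a real pruning criterion and training loop).

```python
import torch.nn as nn

def build_mlp(widths):
    """The architecture is fully described by its layer widths; weights are
    freshly (randomly) initialized every time this is called."""
    layers = []
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(w_in, w_out), nn.ReLU()]
    return nn.Sequential(*layers[:-1])           # drop the trailing ReLU

def prune_widths(model, keep=0.5):
    """Hypothetical pruning criterion: keep a fraction of the units in each
    hidden layer and return only the resulting layer widths."""
    linears = [m for m in model if isinstance(m, nn.Linear)]
    widths = [linears[0].in_features]
    widths += [max(1, int(keep * l.out_features)) for l in linears[:-1]]
    widths.append(linears[-1].out_features)
    return widths

big = build_mlp([32, 256, 256, 10])          # large, over-parameterized model
# ... train `big` on the target task here ...
small_widths = prune_widths(big, keep=0.25)  # architecture found by pruning

# Instead of fine-tuning the surviving weights of `big`, reinitialize the
# pruned architecture and train it from scratch.
scratch = build_mlp(small_widths)
print(small_widths)
print(scratch)
```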

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

ICLR 2019 jsalt18-sentence-repl/jiant

Syntactic constituents form a strict hierarchy: when a larger constituent ends, all of the smaller constituents nested within it must also be closed.

LANGUAGE MODELLING
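
In ON-LSTM this nesting constraint comes from master gates built with a cumulative softmax (cumax): the gate values are monotonically non-decreasing across the hidden dimension, so if a high-level neuron is erased at a constituent boundary, every lower-level neuron beneath it is erased as well. A minimal sketch of cumax and a master forget gate is below (PyTorch; names are mine).

```python
import torch
import torch.nn.functional as F

def cumax(x, dim=-1):
    """Cumulative softmax: values in [0, 1], monotonically non-decreasing
    along `dim`."""
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

# Toy master forget gate over a hidden state of 8 "ordered" neurons.
logits = torch.randn(1, 8)
master_forget = cumax(logits)   # ~0 at low indices, ~1 at high indices
print(master_forget)

# Neurons whose gate value is near 0 are erased (short-term, fine-grained);
# neurons near 1 are preserved (long-term, high-level). Because the gate is
# monotone, the erased neurons always form a prefix of the ordering: erasing
# a high-level neuron necessarily erases every lower-level neuron nested
# beneath it, which is the strict hierarchy described above. In the full
# ON-LSTM this master gate is combined with the standard LSTM gates.
erased = (master_forget < 0.5).squeeze(0)
print("erased units:", torch.nonzero(erased).squeeze(-1).tolist())
```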

Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling

ICLR 2019 jsalt18-sentence-repl/jiant

Work on the problem of contextualized word representation (the development of reusable neural network components for sentence understanding) has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling, with methods like ELMo.

LANGUAGE MODELLING

Convolutional CRFs for Semantic Segmentation

ICLR 2019 MarvinTeichmann/ConvCRF

For the challenging semantic image segmentation task, the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs.

SEMANTIC SEGMENTATION