ICML 2018

The most popular implementations from this conference
1
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented.
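The paper's main tool for circumventing such defenses is the Backward Pass Differentiable Approximation (BPDA). A minimal PyTorch sketch of the idea, with `quantize` standing in for a hypothetical non-differentiable preprocessing defense (an illustration, not the authors' code):

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """Run the non-differentiable defense in the forward pass, but treat it
    as the identity in the backward pass so attack gradients still flow."""

    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimate: gradient of the identity, None for preprocess
        return grad_output, None

def quantize(x, levels=8):
    # Hypothetical defense: bit-depth reduction (gradient is zero almost everywhere)
    return torch.round(x * (levels - 1)) / (levels - 1)

x = torch.rand(1, 3, 32, 32, requires_grad=True)
BPDAIdentity.apply(x, quantize).sum().backward()
print(x.grad.abs().mean())  # nonzero: the defense no longer masks gradients
```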
2
Neural Relational Inference for Interacting Systems
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts.
3
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
We propose Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks.
4
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
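The central mechanism of TD3, the algorithm this paper introduces, is a clipped double-Q target with target-policy smoothing. A minimal PyTorch sketch, with `actor_t`, `q1_t`, `q2_t` as hypothetical stand-ins for the target networks:

```python
import torch

def td3_target(r, s_next, done, actor_t, q1_t, q2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target: perturb the target policy's action, then take
    the minimum of two target critics to curb value overestimation."""
    with torch.no_grad():
        a = actor_t(s_next)
        noise = (noise_std * torch.randn_like(a)).clamp(-noise_clip, noise_clip)
        a = (a + noise).clamp(-max_action, max_action)
        q = torch.min(q1_t(s_next, a), q2_t(s_next, a))
        return r + gamma * (1.0 - done) * q

# Toy usage with stand-in networks
s2 = torch.randn(4, 3)
y = td3_target(torch.zeros(4, 1), s2, torch.zeros(4, 1),
               actor_t=lambda s: torch.tanh(s[:, :1]),
               q1_t=lambda s, a: s.sum(1, keepdim=True) + a,
               q2_t=lambda s, a: s.mean(1, keepdim=True) + a)
print(y.shape)  # torch.Size([4, 1])
```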
5
Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
6
Gated Path Planning Networks
Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems.
7
Learning Representations and Generative Models for 3D Point Clouds
Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds.
8
The Mechanics of n-Player Differentiable Games
The key result is a decomposition of game dynamics into two components. The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games.
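A toy numpy illustration of SGA on the simplest Hamiltonian game, f(x, y) = x * y, where player one descends in x and player two descends in y on -f; the Jacobian of the simultaneous gradient is constant and worked out by hand here. Plain simultaneous descent spirals outward on this game, while the adjusted update converges:

```python
import numpy as np

def xi(w):
    # Simultaneous gradient of the zero-sum game f(x, y) = x * y
    x, y = w
    return np.array([y, -x])

def sga_step(w, lr=0.1, lam=1.0):
    """Symplectic Gradient Adjustment: follow xi + lam * A^T xi, where A is
    the antisymmetric part of the Jacobian of xi."""
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # Jacobian of xi (constant here)
    A = 0.5 * (J - J.T)
    return w - lr * (xi(w) + lam * A.T @ xi(w))

w = np.array([1.0, 1.0])
for _ in range(200):
    w = sga_step(w)
print(w)  # spirals in toward the stable fixed point (0, 0)
```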
9
Attention-based Deep Multiple Instance Learning
Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks.
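A minimal PyTorch sketch of attention-based MIL pooling in this spirit; the dimensions and the non-gated attention form are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Bag probability from a weighted average of instance embeddings, with
    weights produced by a small attention network."""

    def __init__(self, dim=64, attn_dim=32):
        super().__init__()
        self.V = nn.Linear(dim, attn_dim)    # tanh branch of the attention
        self.w = nn.Linear(attn_dim, 1)      # scalar attention score
        self.classifier = nn.Linear(dim, 1)  # Bernoulli bag-label parameter

    def forward(self, h):                        # h: (num_instances, dim)
        scores = self.w(torch.tanh(self.V(h)))   # (num_instances, 1)
        a = torch.softmax(scores, dim=0)         # attention over the bag
        z = (a * h).sum(dim=0)                   # bag embedding
        return torch.sigmoid(self.classifier(z))

bag = torch.randn(12, 64)          # a bag of 12 instance embeddings
print(AttentionMILPooling()(bag))  # P(bag label = 1)
```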
10
Semi-Amortized Variational Autoencoders
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters.
11
Path-Level Network Transformation for Efficient Architecture Search
We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves the sample efficiency of the search.
12
Geometry Score: A Method For Comparing Generative Adversarial Networks
One of the biggest challenges in research on generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of GAN performance by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation.
13
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.
14
Efficient end-to-end learning for quantizable representations
Embedding representation learning via neural networks is at the core of modern similarity-based search. To this end, we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end. The resulting hash table not only provides a significant reduction in the number of data points searched, but also achieves state-of-the-art search accuracy, outperforming previous deep metric learning methods.
15
Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods.
16
Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. We present the Neural Kernel Network (NKN), an architecture based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
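The composition rules in question are the classical closure properties: non-negative weighted sums and products of valid kernels are again valid kernels. A small numpy illustration of one such "unit" (the specific kernels and weights are arbitrary choices, not the paper's architecture):

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def linear(X, Y):
    return X @ Y.T

def unit(X, Y):
    # "Linear layer" with non-negative weights, then a "product layer":
    # both operations preserve positive semi-definiteness.
    k_sum = 0.3 * rbf(X, Y) + 0.7 * linear(X, Y)
    return k_sum * rbf(X, Y, ls=2.0)

X = np.random.randn(5, 2)
K = unit(X, X)
print(np.all(np.linalg.eigvalsh(K) > -1e-9))  # True: still a valid kernel
```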
17
Celer: a Fast Solver for the Lasso with Dual Extrapolation
Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules.
18
Overcoming catastrophic forgetting with hard attention to the task
In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities.
19
Mean Field Multi-Agent Reinforcement Learning
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. As the number of agents grows, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions.
20
Investigating Human Priors for Playing Video Games
What makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making.
21
Towards Binary-Valued Gates for Robust LSTM Training
Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal.
22
Black-box Adversarial Attacks with Limited Queries and Information
Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective.
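The query-limited attack in the paper builds on a natural evolution strategies (NES) gradient estimate computed from function evaluations alone. An illustrative numpy sketch on a stand-in objective:

```python
import numpy as np

def nes_gradient(f, x, sigma=0.001, n=50):
    """Estimate grad f(x) from queries only, via antithetic Gaussian sampling."""
    g = np.zeros_like(x)
    for _ in range(n):
        u = np.random.randn(*x.shape)
        g += (f(x + sigma * u) - f(x - sigma * u)) * u
    return g / (2 * sigma * n)

f = lambda x: -np.sum((x - 1.0) ** 2)  # stand-in for a black-box loss
x = np.zeros(10)
for _ in range(100):
    x += 0.05 * nes_gradient(f, x)     # ascend the estimated gradient
print(np.round(x, 2))                  # approaches the maximizer at 1.0
```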
23
Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning. We here present a novel method to embed directed acyclic graphs.
24
Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example.
25
Disentangling by Factorising
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions.
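FactorVAE's total-correlation discriminator is trained against a factorised surrogate obtained by independently permuting each latent dimension across the batch. That permutation trick in PyTorch:

```python
import torch

def permute_dims(z):
    """Shuffle each latent dimension independently across the batch: the
    marginals are preserved but dependencies between dimensions are broken."""
    B, D = z.shape
    return torch.stack([z[torch.randperm(B), d] for d in range(D)], dim=1)

z = torch.randn(4, 3)
print(permute_dims(z).shape)  # (4, 3): same marginals, scrambled joint
```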
26
First Order Generative Adversarial Networks
GANs excel at learning high dimensional distributions, but they can update generator parameters in directions that do not correspond to the steepest descent direction of the objective. To formally describe an optimal update direction, we introduce a theoretical framework which allows the derivation of requirements on both the divergence and corresponding method for determining an update direction, with these requirements guaranteeing unbiased mini-batch updates in the direction of steepest descent.
27
Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent
Coherent uncertainty quantification is a key strength of Bayesian methods. But modern algorithms for approximate Bayesian posterior inference often sacrifice accurate posterior uncertainty estimation in the pursuit of scalability.
28
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient descent.
29
Semi-Implicit Variational Inference
Semi-implicit variational inference (SIVI) is introduced to expand the commonly used analytic variational distribution family, by mixing the variational parameter with a flexible distribution. This mixing distribution can assume any density function, explicit or not, as long as independent random samples can be generated via reparameterization.
30
Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data. Such seedings act as more stable and expressive cluster-cores than the singleton modes found by Quick Shift.
31
Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors
Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection---even choosing the number of nodes---remains an open question.
32
One-Shot Segmentation in Clutter
We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$.
33
A Spectral Approach to Gradient Estimation for Implicit Distributions
Recently there has been increasing interest in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein's identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nystr\"om method.
34
Adversarial Attack on Graph Structured Data
Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the extensive body of work on adversarial attacks and defenses for images and text.
35
Anonymous Walk Embeddings
The task of representing entire graphs has seen a surge of prominent results, mainly due to learning convolutional neural networks (CNNs) on graph-structured data. While CNNs demonstrate state-of-the-art performance on the graph classification task, such methods are supervised and therefore steer away from the original problem of network representation in a task-agnostic manner.
36
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-based meta-learning methods leverage gradient descent to learn the commonalities among various tasks. Our primary contribution is the {\em MT-net}, which enables the meta-learner to learn on each layer's activation space a subspace that the task-specific learner performs gradient descent on.
37
JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, which learn only conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains).
38
Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible.
39
Optimization, fast and slow: optimally switching between local and Bayesian optimization
We develop the first Bayesian Optimization algorithm, BLOSSOM, which selects between multiple alternative acquisition functions and traditional local optimization at each step. This is combined with a novel stopping condition based on expected regret.
40
Learning Dynamics of Linear Denoising Autoencoders
Here we develop theory for how noise influences learning in DAEs. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.
41
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks.
42
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. The conditions for such a scheme require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving.
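The initialization the paper proposes is delta-orthogonal: every spatial tap of the kernel is zero except the centre one, which holds an orthogonal matrix. A numpy sketch for the equal-channel case:

```python
import numpy as np

def delta_orthogonal(k, c):
    """k x k convolution kernel over c channels that acts as an orthogonal,
    norm-preserving map at initialization."""
    H = np.zeros((k, k, c, c))
    q, _ = np.linalg.qr(np.random.randn(c, c))  # random orthogonal matrix
    H[k // 2, k // 2] = q
    return H

W = delta_orthogonal(3, 64)
print(np.allclose(W[1, 1] @ W[1, 1].T, np.eye(64)))  # True
```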
43
Fast Information-theoretic Bayesian Optimisation
Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective.
44
Decoupled Parallel Backpropagation with Convergence Guarantee
The backpropagation algorithm is indispensable for training feedforward neural networks. However, its backward locking constrains us from updating network layers in parallel and fully leveraging available computing resources.
45
Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity.
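The regression objective underneath is the standard quantile (pinball) loss, minimised exactly at the tau-quantile of the target distribution. A self-contained PyTorch illustration (the setup is a toy, not the paper's network):

```python
import torch

def quantile_loss(pred, target, tau):
    """Pinball loss: tau * u for u >= 0 and (tau - 1) * u for u < 0, with u = target - pred."""
    u = target - pred
    return torch.mean(torch.maximum(tau * u, (tau - 1) * u))

target = torch.randn(10000)
q = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([q], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    quantile_loss(q, target, tau=0.9).backward()
    opt.step()
print(q.item())  # close to the empirical 0.9-quantile (about 1.28 for N(0, 1))
```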
46
Classification from Pairwise Similarity and Unlabeled Data
Supervised learning needs a huge amount of labeled data, which can be a major bottleneck when privacy is a concern or labeling costs are high. To overcome this problem, we propose a new weakly-supervised learning setting, SU classification, in which only similar (S) data pairs (two examples belonging to the same class) and unlabeled (U) data points are needed instead of fully labeled data.
47
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy.
48
Learning the Reward Function for a Misspecified Model
A flawed dynamics model may generate states that would never occur in the real environment, and it is not clear a priori what value the reward function should assign to such states. Empirically, this approach to reward learning can yield dramatic improvements in control performance when the dynamics model is flawed.
49
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax."
50
Differentially Private Database Release via Kernel Mean Embeddings
The proposed framework rests on two main ideas. First, releasing (an estimate of) the kernel mean embedding of the data generating random variable instead of the database itself still allows third-parties to construct consistent estimators of a wide class of population statistics.
51
Inference Suboptimality in Variational Autoencoders
We find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
52
Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care
Patients in the intensive care unit (ICU) require constant and close supervision. To assist clinical staff in this task, hospitals use monitoring systems that trigger audiovisual alarms if their algorithms indicate that a patient's condition may be worsening.
53
Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations.
54
Selecting Representative Examples for Program Synthesis
Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, mapping the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver.
55
On the Spectrum of Random Features Maps of High Dimensional Data
Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper, we leverage the "concentration" phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size.
56
DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
Distributed model training is vulnerable to byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging.
57
Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Motivated by safety-critical applications, test-time attacks on classifiers via adversarial examples have recently received a great deal of attention. We study the robustness of the k-nearest neighbor classifier in this setting. Our analysis shows that its robustness properties depend critically on the value of k: the classifier may be inherently non-robust for small k, but its robustness approaches that of the Bayes optimal classifier for fast-growing k. We propose a novel modified 1-nearest neighbor classifier and guarantee its robustness in the large sample limit.
58
Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models, including the Dawid-Skene (DS) model as a special case.
59
Lipschitz Continuity in Model-based Reinforcement Learning
We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Lipschitz.
60
Stochastic Wasserstein Barycenters
We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration.
61
BOCK : Bayesian Optimization with Cylindrical Kernels
A major challenge in Bayesian Optimization is the boundary issue (Swersky, 2017) where an algorithm spends too many evaluations near the boundary of its search space. In this paper, we propose BOCK, Bayesian Optimization with Cylindrical Kernels, whose basic idea is to transform the ball geometry of the search space using a cylindrical transformation.
62
Tropical Geometry of Deep Neural Networks
We establish, for the first time, connections between feedforward neural networks with ReLU activation and tropical geometry --- we show that the family of such neural networks is equivalent to the family of tropical rational maps. Among other things, we deduce that feedforward ReLU neural networks with one hidden layer can be characterized by zonotopes, which serve as building blocks for deeper networks; we relate decision boundaries of such neural networks to tropical hypersurfaces, a major object of study in tropical geometry; and we prove that linear regions of such neural networks correspond to vertices of polytopes associated with tropical rational functions.
63
Black-box Variational Inference for Stochastic Differential Equations
Parameter inference for stochastic differential equations is challenging due to the presence of a latent diffusion process. Working with an Euler-Maruyama discretisation for the diffusion, we use variational inference to jointly learn the parameters and the diffusion paths.
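The Euler-Maruyama discretisation the paper builds on is one line per step. A numpy sketch on an Ornstein-Uhlenbeck stand-in process:

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, T=1.0, n=1000):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = np.sqrt(dt) * np.random.randn()
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * dw
    return x

# Ornstein-Uhlenbeck: mean-reverting drift, constant diffusion
path = euler_maruyama(lambda x: -2.0 * x, lambda x: 0.5, x0=1.0)
print(path[-1])
```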
64
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a $k$ hidden node shallow network with quadratic activation and $n$ training data points, we show as long as $ k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a \emph{globally} optimal solution for general smooth and convex loss functions.
65
Self-Imitation Learning
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration.
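A sketch of the two self-imitation losses as described in the paper: only transitions whose observed return R beat the current value estimate V(s) contribute (the reduction and coefficients here are illustrative):

```python
import torch

def sil_losses(log_prob, value, ret):
    """Policy loss imitates past good actions weighted by (R - V)+; the value
    loss pulls V up toward returns it underestimated."""
    adv = (ret - value).clamp(min=0.0)              # (R - V)+
    policy_loss = -(log_prob * adv.detach()).mean()
    value_loss = 0.5 * (adv ** 2).mean()
    return policy_loss, value_loss

lp = torch.log(torch.tensor([0.4, 0.7]))
pl, vl = sil_losses(lp, value=torch.tensor([1.0, 2.0]), ret=torch.tensor([3.0, 1.0]))
print(pl, vl)  # only the first transition (R > V) contributes
```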
66
Efficient First-Order Algorithms for Adaptive Signal Denoising
We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy.
67
The Weighted Kendall and High-order Kernels for Permutations
We propose new positive definite kernels for permutations. First we introduce a weighted version of the Kendall kernel, which allows the contributions of different item pairs in the permutations to be weighted unequally depending on their ranks.
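The unweighted Kendall kernel is a sum of pairwise agreements between two rankings; the weighted version inserts a rank-dependent factor per pair. An O(n^2) numpy sketch with an arbitrary illustrative weight; which weight families keep the kernel positive definite is worked out in the paper:

```python
import numpy as np
from itertools import combinations

def weighted_kendall(sigma, tau, weight=lambda i, j: 1.0):
    """Sum of weighted pair agreements between rank vectors sigma and tau;
    weight = 1 recovers the (unnormalised) Kendall kernel."""
    total = 0.0
    for i, j in combinations(range(len(sigma)), 2):
        agree = np.sign(sigma[i] - sigma[j]) * np.sign(tau[i] - tau[j])
        total += weight(sigma[i], sigma[j]) * agree
    return total

s, t = np.array([0, 1, 2, 3]), np.array([1, 0, 2, 3])
print(weighted_kendall(s, t))                                      # plain Kendall
print(weighted_kendall(s, t, lambda i, j: 1.0 / (1 + min(i, j))))  # emphasise top ranks
```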
68
A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, which arises often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale up to many tasks (large $K$) under a high-dimensional (large $p$) situation.
69
Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry
We are concerned with the discovery of hierarchical relationships from large-scale unstructured similarity scores. For this purpose, we study different models of hyperbolic space and find that learning embeddings in the Lorentz model is substantially more efficient than in the Poincar\'e-ball model.
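Distances in the Lorentz (hyperboloid) model have a simple closed form via the Minkowski inner product, which is part of what makes it efficient to optimise. A numpy sketch:

```python
import numpy as np

def lorentz_distance(x, y):
    """d(x, y) = arcosh(-<x, y>_L), with <x, y>_L = -x0*y0 + sum_i xi*yi."""
    inner = -x[0] * y[0] + np.dot(x[1:], y[1:])
    return np.arccosh(np.maximum(-inner, 1.0))  # clamp for numerical safety

def lift(v):
    # Embed a Euclidean point v onto the hyperboloid: x0 = sqrt(1 + |v|^2)
    return np.concatenate([[np.sqrt(1.0 + v @ v)], v])

print(lorentz_distance(lift(np.zeros(2)), lift(np.array([0.3, 0.4]))))
```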
70
Deep Models of Interactions Across Sets
We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings, protein-drug bindings, or ternary user-item-tag interactions. In experiments, our models achieved surprisingly good generalization performance on this matrix extrapolation task, both within domains (e.g., new users and new movies drawn from the same distribution used for training) and even across domains (e.g., predicting music ratings after training on movies).
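The building block behind such models is a permutation-equivariant layer that mixes each matrix entry with its row, column, and overall means: shuffling rows or columns of the input shuffles the output the same way. A single-channel PyTorch sketch (the paper's layers carry multiple channels):

```python
import torch
import torch.nn as nn

class ExchangeableMatrixLayer(nn.Module):
    """Permutation-equivariant layer for matrix data such as user-movie ratings."""

    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(4))

    def forward(self, X):  # X: (rows, cols)
        t = self.theta
        return torch.relu(
            t[0] * X
            + t[1] * X.mean(dim=1, keepdim=True)  # row means
            + t[2] * X.mean(dim=0, keepdim=True)  # column means
            + t[3] * X.mean()                     # overall mean
        )

X = torch.randn(5, 7)
print(ExchangeableMatrixLayer()(X).shape)  # torch.Size([5, 7])
```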
71
Stein Points
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. To this end, we present `Stein Points'.
72
Accelerating Natural Gradient with Higher-Order Invariance
An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. We define the order of invariance of a numerical method to be its convergence order to an invariant solution.
73
Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection
Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models.
74
Automatic Goal Generation for Reinforcement Learning Agents
An agent trained using reinforcement learning is only capable of achieving the single task specified via its reward function. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing.
75
Differentiable plasticity: training plastic neural networks with backpropagation
How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning.
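The plastic connection trained in the paper combines a fixed weight with a running Hebbian trace gated by a learned per-connection coefficient. A minimal PyTorch sketch of one forward step:

```python
import torch

def plastic_step(x, w, alpha, hebb, eta=0.1):
    """y = tanh(x @ (w + alpha * Hebb)); the Hebbian trace then decays toward
    the outer product of the current activations."""
    y = torch.tanh(x @ (w + alpha * hebb))
    hebb = (1 - eta) * hebb + eta * torch.outer(x, y)
    return y, hebb

n_in, n_out = 8, 4
w, alpha = torch.randn(n_in, n_out), torch.randn(n_in, n_out)
hebb = torch.zeros(n_in, n_out)
y, hebb = plastic_step(torch.randn(n_in), w, alpha, hebb)
print(y.shape, hebb.abs().max())  # trace is now nonzero and keeps adapting
```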