# ICML 2018

The most popular implementations from this conference
##### Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented.
541
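This paper's central circumvention technique is Backward Pass Differentiable Approximation (BPDA). It can be illustrated with a toy example. The sketch below makes two assumptions: a hypothetical quantizing defense (`quantize`) and a toy quadratic loss (`loss`) stand in for a real defended classifier. Finite differences taken through the quantizer return a zero gradient almost everywhere. BPDA instead runs the quantizer on the forward pass but treats it as the identity on the backward pass.

```python
def quantize(x, step=0.1):
    """Hypothetical non-differentiable defense: snap each coordinate to a grid."""
    return [round(v / step) * step for v in x]


def loss(z):
    """Hypothetical differentiable classifier loss f(z) = sum of squares."""
    return sum(v * v for v in z)


def loss_grad(z):
    """Analytic gradient of the toy loss: df/dz_i = 2 * z_i."""
    return [2.0 * v for v in z]


def naive_grad(x, step=0.1, eps=1e-4):
    """Finite differences straight through quantize(): zero almost everywhere."""
    base = loss(quantize(x, step))
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        grads.append((loss(quantize(bumped, step)) - base) / eps)
    return grads


def bpda_grad(x, step=0.1):
    """BPDA: run quantize() forward, but treat it as the identity when differentiating."""
    return loss_grad(quantize(x, step))


x = [0.33, -0.71]
print(naive_grad(x))  # masked gradient: no signal for an iterative attack
print(bpda_grad(x))   # usable gradient, roughly [0.6, -1.4]
```

In a real attack the identity would be replaced by whatever differentiable function best approximates the defense's preprocessing.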
##### Neural Relational Inference for Interacting Systems
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts.
296
##### Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
158
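These mechanisms are widely known as TD3 (Twin Delayed DDPG). Below is a minimal sketch of the critic target. It assumes a scalar action, and plain Python callables stand in for the two target critics.

- **Clipped double-Q learning** bootstraps from the smaller of the two critic estimates.
- **Target policy smoothing** adds clipped Gaussian noise to the target action.

The default hyperparameters are illustrative assumptions. Delayed actor updates, the remaining ingredient, are not shown.

```python
import random


def td3_target(reward, done, next_state, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Critic target y for one transition with a scalar action.

    next_action is the target actor's action at next_state; q1_target and
    q2_target are the two target critics, called as q(state, action).
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(-max_action, min(max_action, next_action + noise))
    # Clipped double-Q: use the smaller of the two critic estimates.
    min_q = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - done) * min_q


def q1(state, action):  # hypothetical target critic
    return 5.0


def q2(state, action):  # hypothetical target critic
    return 3.0


y = td3_target(reward=1.0, done=0.0, next_state=None, next_action=0.3,
               q1_target=q1, q2_target=q2, noise_std=0.0)
print(y)  # 1.0 + 0.99 * min(5.0, 3.0)
```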
##### Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
147
##### Learning Representations and Generative Models for 3D Point Clouds
Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds.
146
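This paper measures reconstruction quality with permutation-invariant distances between point sets, including the Chamfer distance. The brute-force sketch below assumes the summed squared-distance form. Implementations differ on whether they sum or average, and the quadratic cost is only acceptable for small clouds.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(cloud_a, cloud_b):
    """Sum of squared nearest-neighbour distances, taken in both directions."""
    a_to_b = sum(min(squared_distance(p, q) for q in cloud_b) for p in cloud_a)
    b_to_a = sum(min(squared_distance(q, p) for p in cloud_a) for q in cloud_b)
    return a_to_b + b_to_a


a = [(0.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(chamfer_distance(a, b))  # 1 (a -> b) + 1 + 4 (b -> a) = 6.0
```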
##### Autoregressive Convolutional Neural Networks for Asynchronous Time Series
We propose the Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks.
144
##### Gated Path Planning Networks
Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems.
121
##### Attention-based Deep Multiple Instance Learning
Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks.
114
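The attention-based MIL pooling operator can be sketched in plain Python. The sketch assumes the instance embeddings are already computed, and `V` and `w` are given rather than learned. Each instance receives a softmax weight computed from w^T tanh(V h_k), and the bag embedding is the weighted sum of the instances. A classifier on that embedding would then output the Bernoulli bag-label probability. The paper's gated variant is not shown.

```python
import math


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into one bag embedding.

    instances: K embeddings h_k, each of length M
    V: L x M projection matrix; w: length-L scoring vector
    """
    # Score each instance: w^T tanh(V h_k)
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h)))
              for h in instances]
    # Softmax over the instances of the bag (shifted for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding z = sum_k a_k h_k
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


bag = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, -1.0]]  # hypothetical 1 x 2 projection
w = [2.0]          # hypothetical scoring vector
z, a = attention_mil_pool(bag, V, w)
print(a)  # the first instance receives the largest weight
print(z)
```

Because the weights are a softmax over the bag, they also indicate which instances drove the bag-level decision.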
##### The Mechanics of n-Player Differentiable Games
The key result decomposes the second-order dynamics of a game into two components. The first is related to potential games, which reduce to gradient descent on an implicit function. The second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games.
105
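Below is a minimal sketch of the SGA update on the bilinear game with losses `L1 = x*y` and `L2 = -x*y`. This is the simplest game in which plain simultaneous gradient descent fails. SGA adds `lam * A^T xi` to the simultaneous gradient `xi`, where `A` is the antisymmetric part of the Jacobian of `xi`.

The sketch makes two simplifying assumptions:

- the Jacobian is written by hand, since it is constant for this game;
- `lam` is fixed at a positive value, whereas the paper also discusses how to choose its sign.

```python
import math


def simultaneous_grad(x, y):
    """xi = (dL1/dx, dL2/dy) for the game L1 = x*y, L2 = -x*y."""
    return (y, -x)


def jacobian(x, y):
    """Jacobian of xi with respect to (x, y); constant for this bilinear game."""
    return ((0.0, 1.0), (-1.0, 0.0))


def sga_direction(x, y, lam=1.0):
    """Symplectic gradient adjustment: xi + lam * A^T xi."""
    xi = simultaneous_grad(x, y)
    h = jacobian(x, y)
    # Antisymmetric part of the Jacobian: A = (H - H^T) / 2
    a = [[(h[i][j] - h[j][i]) / 2.0 for j in range(2)] for i in range(2)]
    # (A^T xi)_i = sum_j A[j][i] * xi[j]
    at_xi = [sum(a[j][i] * xi[j] for j in range(2)) for i in range(2)]
    return (xi[0] + lam * at_xi[0], xi[1] + lam * at_xi[1])


def run(direction, steps=200, lr=0.1, start=(1.0, 1.0)):
    """Both players step simultaneously along -direction."""
    x, y = start
    for _ in range(steps):
        dx, dy = direction(x, y)
        x, y = x - lr * dx, y - lr * dy
    return x, y


print(math.hypot(*run(simultaneous_grad)))  # grows: plain updates spiral outward
print(math.hypot(*run(sga_direction)))      # shrinks toward the fixed point (0, 0)
```

For this game the plain update rotates around the origin and drifts outward. The adjustment term points back toward the fixed point.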
##### Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
96
##### Semi-Amortized Variational Autoencoders
Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters.
86
##### GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models
Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph.
75
##### Path-Level Network Transformation for Efficient Architecture Search
We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency.
74
##### Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.
71
##### Geometry Score: A Method For Comparing Generative Adversarial Networks
One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation.
66
##### Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam
Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods.
50
##### Efficient end-to-end learning for quantizable representations
Embedding representation learning via neural networks is a core foundation of modern similarity-based search. To this end, we consider the problem of directly learning, end-to-end, a quantizable embedding representation and the sparse binary hash code. The hash code can be used to construct an efficient hash table that significantly reduces the number of data points searched. It also achieves state-of-the-art search accuracy, outperforming previous state-of-the-art deep metric learning methods.
50
##### Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning. We here present a novel method to embed directed acyclic graphs.
47
##### Black-box Adversarial Attacks with Limited Queries and Information
Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective.
46
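In the query-limited setting, this paper estimates gradients from queries alone, using natural evolution strategies (NES). Below is a minimal sketch of an NES-style estimator with antithetic Gaussian samples. It assumes `f` is a hypothetical black box that returns a scalar score (for example a class probability), and that each call costs one query. The parameter values are illustrative.

```python
import random


def nes_gradient(f, x, sigma=0.001, n_pairs=50, rng=random):
    """Antithetic NES estimate of grad f(x) using 2 * n_pairs queries to f."""
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        # Antithetic pair: central difference along the random direction u
        scale = (plus - minus) / (2.0 * sigma * n_pairs)
        for i in range(dim):
            grad[i] += scale * u[i]
    return grad


w = [1.0, -2.0, 0.5]


def black_box(z):
    """Hypothetical black box: a linear score whose true gradient is w."""
    return sum(wi * zi for wi, zi in zip(w, z))


print(nes_gradient(black_box, [0.0, 0.0, 0.0], n_pairs=4000, rng=random.Random(0)))
```

The estimate converges to the true gradient as the number of pairs grows. An attacker would trade that accuracy against the query budget.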
##### Overcoming catastrophic forgetting with hard attention to the task
In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities.
46
##### Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. The proposed Neural Kernel Network (NKN) architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
45
##### Mean Field Multi-Agent Reinforcement Learning
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions.
44
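The core approximation replaces an agent's interactions with all of its neighbours by a single interaction with their mean action. The Q-function therefore takes (state, own action, mean neighbour action) however many neighbours there are. Below is a minimal sketch of the mean-action computation, assuming discrete actions encoded as one-hot vectors. Returning a zero vector for an agent without neighbours is an assumption of this sketch.

```python
def one_hot(action, n_actions):
    vec = [0.0] * n_actions
    vec[action] = 1.0
    return vec


def mean_action(neighbor_actions, n_actions):
    """Average of the neighbours' one-hot actions: a fixed-size summary of any neighbourhood."""
    if not neighbor_actions:
        return [0.0] * n_actions  # assumption: an isolated agent gets a zero vector
    total = [0.0] * n_actions
    for action in neighbor_actions:
        for i, v in enumerate(one_hot(action, n_actions)):
            total[i] += v
    return [t / len(neighbor_actions) for t in total]


print(mean_action([0, 2, 2, 1], n_actions=3))  # [0.25, 0.25, 0.5]
```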
##### Celer: a Fast Solver for the Lasso with Dual Extrapolation
Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules.
44
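Gap Safe screening rules and working-set strategies depend on the Lasso duality gap, which requires a feasible dual point. Below is a minimal sketch of the baseline construction, which rescales the residual. It uses the primal $P(\beta) = \frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$ and the dual $D(\theta) = \frac{1}{2}\|y\|^2 - \frac{\lambda^2}{2}\|y/\lambda - \theta\|^2$ subject to $\|X^\top\theta\|_\infty \le 1$. Celer's dual extrapolation builds better dual points from a sequence of residuals, which this sketch does not implement.

```python
def lasso_primal(X, y, beta, lam):
    """P(beta) = 0.5 * ||y - X beta||^2 + lam * ||beta||_1; also returns the residual."""
    residual = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
    value = 0.5 * sum(r * r for r in residual) + lam * sum(abs(b) for b in beta)
    return value, residual


def rescaled_dual_point(X, residual, lam):
    """theta = r / max(lam, ||X^T r||_inf), which satisfies ||X^T theta||_inf <= 1."""
    n_samples, n_features = len(X), len(X[0])
    xtr = [sum(X[i][j] * residual[i] for i in range(n_samples)) for j in range(n_features)]
    scale = max(lam, max(abs(v) for v in xtr))
    return [r / scale for r in residual]


def lasso_dual(y, theta, lam):
    """D(theta) = 0.5 * ||y||^2 - 0.5 * lam^2 * ||y / lam - theta||^2."""
    return 0.5 * sum(yi * yi for yi in y) - 0.5 * lam ** 2 * sum(
        (yi / lam - ti) ** 2 for yi, ti in zip(y, theta))


def duality_gap(X, y, beta, lam):
    primal, residual = lasso_primal(X, y, beta, lam)
    theta = rescaled_dual_point(X, residual, lam)
    return primal - lasso_dual(y, theta, lam)


X = [[1.0, 0.0], [0.0, 1.0]]  # orthonormal design: the solution is soft-thresholding of y
y = [3.0, 0.5]
print(duality_gap(X, y, [0.0, 0.0], lam=1.0))  # positive: beta = 0 is not optimal
print(duality_gap(X, y, [2.0, 0.0], lam=1.0))  # 0.0 at the exact solution
```

By weak duality the gap is never negative. When it falls below a tolerance, the solver can stop with a certificate of near-optimality.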
##### Towards Binary-Valued Gates for Robust LSTM Training
Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal.
42
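One way to push gate values toward 0 or 1 while keeping training differentiable is a Gumbel-sigmoid ("binary concrete") relaxation with a temperature. Below is a minimal sketch of a single noisy, sharpened gate. The scalar-logit interface and the temperature values are illustrative assumptions, and the full LSTM cell and the paper's exact estimator are not reproduced. Lower temperatures push the gate closer to 0 or 1.

```python
import math
import random


def stable_sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid_gate(logit, temperature=0.5, rng=random):
    """Noisy, temperature-sharpened gate value in [0, 1]."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    logistic_noise = math.log(u) - math.log(1.0 - u)  # difference of two Gumbel(0, 1) samples
    return stable_sigmoid((logit + logistic_noise) / temperature)


rng = random.Random(0)
print([round(gumbel_sigmoid_gate(0.5, temperature=1.0, rng=rng), 3) for _ in range(5)])
print([round(gumbel_sigmoid_gate(0.5, temperature=0.1, rng=rng), 3) for _ in range(5)])  # pushed toward 0 or 1
```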
##### Disentangling by Factorising
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions.
41
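FactorVAE penalizes the total correlation of the latent code. It does so with a discriminator that separates samples of q(z) from samples of the product of its marginals. The marginal samples are obtained by permuting each latent dimension independently across the batch. Below is a minimal sketch of that permutation step, assuming a batch is a list of equal-length latent vectors. The discriminator itself is not shown.

```python
import random


def permute_dims(batch, rng=random):
    """Shuffle each latent dimension independently across the batch.

    Every marginal distribution is kept, but dependencies between dimensions
    are broken, approximating samples from the product of marginals.
    """
    n, d = len(batch), len(batch[0])
    columns = []
    for j in range(d):
        column = [batch[i][j] for i in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[j][i] for j in range(d)] for i in range(n)]


z = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
print(permute_dims(z, rng=random.Random(0)))
```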
##### Investigating Human Priors for Playing Video Games
What makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making.
38
##### Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example.
37
##### Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
29
##### Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent
Coherent uncertainty quantification is a key strength of Bayesian methods. But modern algorithms for approximate Bayesian posterior inference often sacrifice accurate posterior uncertainty estimation in the pursuit of scalability.
27
##### Attention-based Deep Multiple Instance Learning
Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks.
26
##### First Order Generative Adversarial Networks
GANs excel at learning high dimensional distributions, but they can update generator parameters in directions that do not correspond to the steepest descent direction of the objective. To formally describe an optimal update direction, we introduce a theoretical framework which allows the derivation of requirements on both the divergence and corresponding method for determining an update direction, with these requirements guaranteeing unbiased mini-batch updates in the direction of steepest descent.
25
##### Anonymous Walk Embeddings
The task of representing entire graphs has seen a surge of prominent results, mainly due to learning convolutional neural networks (CNNs) on graph-structured data. While CNNs demonstrate state-of-the-art performance in graph classification tasks, such methods are supervised and therefore steer away from the original problem of network representation in a task-agnostic manner.
22
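An anonymous walk replaces each node in a random walk by the position at which that node first appeared. Walks with the same shape therefore map to the same pattern, whatever the node identities. The distribution of these patterns can then serve as a task-agnostic graph representation. Below is a minimal sketch, assuming the walks have already been sampled as lists of node labels.

```python
from collections import Counter


def anonymize(walk):
    """Map each node to the (1-based) order in which it first appears in the walk."""
    first_seen = {}
    pattern = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        pattern.append(first_seen[node])
    return tuple(pattern)


def anonymous_walk_distribution(walks):
    """Empirical distribution over anonymous-walk patterns."""
    counts = Counter(anonymize(walk) for walk in walks)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


walks = [["a", "b", "a"], ["c", "d", "c"], ["a", "b", "c"]]
print(anonymous_walk_distribution(walks))  # (1, 2, 1) with probability 2/3, (1, 2, 3) with 1/3
```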
##### GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient descent.
21
##### Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors
Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection (even choosing the number of nodes) remains an open question.
20
##### Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. The proposed Neural Kernel Network (NKN) architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
20
##### One-Shot Segmentation in Clutter
We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call *cluttered Omniglot*.
19
##### Adversarial Attack on Graph Structured Data
Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the large body of research on adversarial attacks and defenses for images and text.
19
##### Differentiable Compositional Kernel Learning for Gaussian Processes
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. The proposed Neural Kernel Network (NKN) architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
17
##### A Spectral Approach to Gradient Estimation for Implicit Distributions
Recently, there has been increasing interest in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein's identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nyström method.
17
##### Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
15
##### JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, which only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains).
15
##### Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. The conditions for dynamical isometry that underlie this scheme require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving.
13
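This paper's delta-orthogonal initialization sets every spatial tap of a convolution kernel to zero except the centre tap, which holds a random orthogonal matrix. At initialization the convolution then acts as a norm-preserving map on each position. Below is a minimal 1-D sketch, assuming a kernel indexed `[tap][out_channel][in_channel]` with equal input and output channels and no gain or variance scaling.

```python
import math
import random


def random_orthogonal(n, rng=random):
    """Rows form a random orthonormal basis (modified Gram-Schmidt on Gaussian vectors)."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # resample in the (unlikely) degenerate case
            basis.append([vi / norm for vi in v])
    return basis


def delta_orthogonal_kernel(width, channels, rng=random):
    """1-D conv kernel indexed [tap][out_channel][in_channel]: zero except the centre tap."""
    kernel = [[[0.0] * channels for _ in range(channels)] for _ in range(width)]
    kernel[width // 2] = random_orthogonal(channels, rng)
    return kernel


def apply_tap(matrix, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


rng = random.Random(0)
kernel = delta_orthogonal_kernel(width=3, channels=4, rng=rng)
x = [1.0, -2.0, 0.5, 3.0]
y = apply_tap(kernel[1], x)  # only the centre tap is non-zero
print(math.sqrt(sum(v * v for v in x)), math.sqrt(sum(v * v for v in y)))  # equal up to rounding
```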
##### Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
Gradient-based meta-learning methods leverage gradient descent to learn the commonalities among various tasks. Our primary contribution is the *MT-net*, which enables the meta-learner to learn, on each layer's activation space, a subspace on which the task-specific learner performs gradient descent.
12
##### Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks.
12
##### Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible.
11
##### Classification from Pairwise Similarity and Unlabeled Data
Supervised learning needs a huge amount of labeled data, which can be a major bottleneck when there are privacy concerns or labeling costs are high. To overcome this problem, we propose a new weakly supervised learning setting, called SU classification. It needs only similar (S) data pairs (two examples that belong to the same class) and unlabeled (U) data points instead of fully labeled data.
10
##### Learning Dynamics of Linear Denoising Autoencoders
Here we develop theory for how noise influences learning in denoising autoencoders (DAEs). We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.
9
##### Fast Information-theoretic Bayesian Optimisation
Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective.
8
##### Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity.
7
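AIQN builds on quantile regression. Below is a minimal sketch of the quantile ("pinball") loss, assuming scalar targets. Minimizing its average over a dataset recovers an empirical τ-quantile, which the usage lines check on the integers 1 to 100. The network that maps τ to an output, and any smoothed variant of the loss used in practice, are not shown.

```python
def quantile_loss(target, prediction, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1[u < 0]) with u = target - prediction."""
    u = target - prediction
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_quantile_loss(data, prediction, tau):
    return sum(quantile_loss(y, prediction, tau) for y in data) / len(data)


data = list(range(1, 101))
best = min(range(1, 101), key=lambda q: mean_quantile_loss(data, q, 0.9))
print(best)  # 90 or 91: the loss is flat between them, both are empirical 0.9-quantiles
```

The asymmetric slopes (τ above the prediction, 1 - τ below it) balance exactly when a fraction τ of the data lies below the prediction.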
##### Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax."
7
##### Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy.
6
##### Learning the Reward Function for a Misspecified Model
It is not clear a priori what value the reward function should assign to such states. Empirically, this approach to reward learning can yield dramatic improvements in control performance when the dynamics model is flawed.
6
##### Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible.
5
##### Differentially Private Database Release via Kernel Mean Embeddings
The proposed framework rests on two main ideas. First, releasing (an estimate of) the kernel mean embedding of the data generating random variable instead of the database itself still allows third-parties to construct consistent estimators of a wide class of population statistics.
4
##### Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care
Patients in the intensive care unit (ICU) require constant and close supervision. To assist clinical staff in this task, hospitals use monitoring systems that trigger audiovisual alarms if their algorithms indicate that a patient's condition may be worsening.
3
##### Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible.
3
##### Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible.
3
##### Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
3
##### Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper, we revisit the Gaussian mechanism and show that the original analysis has several important limitations.
3
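Below is a minimal sketch of analytic calibration. It assumes the exact characterization that adding $\mathcal{N}(0, \sigma^2)$ noise to a query with L2 sensitivity $\Delta$ is $(\epsilon, \delta)$-DP if and only if $\Phi\left(\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta}\right) - e^{\epsilon}\,\Phi\left(-\frac{\Delta}{2\sigma} - \frac{\epsilon\sigma}{\Delta}\right) \le \delta$. The left-hand side decreases in $\sigma$, so bisection finds the smallest valid $\sigma$. The classical formula is included for comparison. Treat this as a sketch to be checked against the paper, not a vetted privacy library.

```python
import math


def std_normal_cdf(x):
    # erfc keeps precision in the far left tail, where small deltas live
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def exact_delta(sigma, epsilon, sensitivity=1.0):
    """Smallest delta for which N(0, sigma^2) noise is (epsilon, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(epsilon) * std_normal_cdf(-a - b)


def analytic_sigma(epsilon, delta, sensitivity=1.0):
    """Smallest sigma with exact_delta(sigma) <= delta, by bisection."""
    lo, hi = 0.0, sensitivity
    while exact_delta(hi, epsilon, sensitivity) > delta:
        lo, hi = hi, 2.0 * hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exact_delta(mid, epsilon, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def classical_sigma(epsilon, delta, sensitivity=1.0):
    """Classical calibration, stated for epsilon < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


eps, dlt = 0.5, 1e-5
print(classical_sigma(eps, dlt), analytic_sigma(eps, dlt))  # analytic value is smaller
```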
##### Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection
Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models.
3
##### Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models that includes the Dawid-Skene (DS) model as a special case.
2
##### Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
2
##### Learning to Reweight Examples for Robust Deep Learning
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise.
2
##### Lipschitz Continuity in Model-based Reinforcement Learning
We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Lipschitz.
2
##### Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.
2
##### Selecting Representative Examples for Program Synthesis
Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, mapping the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver.
2
##### BOCK : Bayesian Optimization with Cylindrical Kernels
A major challenge in Bayesian Optimization is the boundary issue (Swersky, 2017) where an algorithm spends too many evaluations near the boundary of its search space. In this paper, we propose BOCK, Bayesian Optimization with Cylindrical Kernels, whose basic idea is to transform the ball geometry of the search space using a cylindrical transformation.
2
##### On the Spectrum of Random Features Maps of High Dimensional Data
Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper, we leverage the "concentration" phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size.
2
##### Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example.
1
##### Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Motivated by safety-critical applications, test-time attacks on classifiers via adversarial examples have recently received a great deal of attention. Our analysis of the k-nearest neighbor classifier shows that its robustness properties depend critically on the value of k. The classifier may be inherently non-robust for small k, but its robustness approaches that of the Bayes Optimal classifier for fast-growing k. We propose a novel modified 1-nearest neighbor classifier and guarantee its robustness in the large sample limit.
1
##### Representation Learning on Graphs with Jumping Knowledge Networks
Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. Furthermore, combining the jumping knowledge (JK) framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models' performance.
1
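Jumping knowledge networks let each node combine its representations from every layer instead of using only the last one. The paper considers concatenation, max-pooling and LSTM-attention aggregators. Below is a minimal sketch of the first two for a single node, assuming per-layer representations of equal length. The representation values are made up for illustration.

```python
def jk_concat(layer_reps):
    """Concatenate one node's representations from all layers."""
    return [value for rep in layer_reps for value in rep]


def jk_max_pool(layer_reps):
    """Element-wise max over layers: different features can come from different depths."""
    return [max(values) for values in zip(*layer_reps)]


# One node's hypothetical representations after layers 1, 2 and 3.
reps = [[0.1, 0.9, 0.0], [0.4, 0.2, 0.3], [0.2, 0.5, 0.8]]
print(jk_concat(reps))    # 9 features
print(jk_max_pool(reps))  # [0.4, 0.9, 0.8]
```

Max-pooling keeps the output size fixed regardless of depth, whereas concatenation grows with the number of layers.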
##### Deep Models of Interactions Across Sets
We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings, protein-drug bindings, or ternary user-item-tag interactions. In experiments, our models achieved surprisingly good generalization performance on this matrix extrapolation task, both within domains (e.g., new users and new movies drawn from the same distribution used for training) and even across domains (e.g., predicting music ratings after training on movies).
1
##### Accelerating Natural Gradient with Higher-Order Invariance
An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. We define the order of invariance of a numerical method to be its convergence order to an invariant solution.
1
##### Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.
1
##### Stochastic Wasserstein Barycenters
We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration.
1
##### On the Power of Over-parametrization in Neural Networks with Quadratic Activation
We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a $k$ hidden node shallow network with quadratic activation and $n$ training data points, we show as long as $k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a *globally* optimal solution for general smooth and convex loss functions.
0
##### Self-Imitation Learning
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration.
0
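The self-imitation objective learns only from past returns that exceeded the current value estimate. Below is a minimal sketch of the per-sample loss. It assumes a stored Monte Carlo return `ret`, the current value estimate `value`, and the policy's log-probability of the stored action. `value_coef` is an assumed weight for illustration. In an autodiff framework, the clipped advantage would be treated as a constant in the policy term.

```python
import math


def sil_loss(log_prob, ret, value, value_coef=0.01):
    """Self-imitation loss for one stored (state, action, return) sample."""
    advantage = max(ret - value, 0.0)  # (R - V(s))_+: ignore returns below the estimate
    policy_loss = -log_prob * advantage  # imitate actions that did better than expected
    value_loss = 0.5 * advantage ** 2    # pull V(s) up toward the better return
    return policy_loss + value_coef * value_loss


print(sil_loss(math.log(0.5), ret=2.0, value=1.0))  # log(2) + 0.01 * 0.5
print(sil_loss(math.log(0.5), ret=0.5, value=1.0))  # 0.0: worse than expected, no update
```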
##### Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. The conditions for dynamical isometry that underlie this scheme require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving.
0
##### Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example.
0
##### Efficient First-Order Algorithms for Adaptive Signal Denoising
We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy.
0
##### The Weighted Kendall and High-order Kernels for Permutations
We propose new positive definite kernels for permutations. First, we introduce a weighted version of the Kendall kernel, which allows the contributions of different item pairs in the permutations to be weighted unequally depending on their ranks.
0
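The weighted kernel starts from the Kendall kernel. That kernel scores two permutations by their concordant pairs minus their discordant pairs, normalized by the number of pairs (Kendall's tau). Below is a minimal sketch of the unweighted base kernel only, assuming permutations are given as rank vectors without ties. The paper's rank-dependent weights are not reproduced here.

```python
from itertools import combinations


def kendall_kernel(sigma, sigma_prime):
    """(concordant - discordant pairs) / number of pairs, for rank vectors without ties."""
    n = len(sigma)
    score = 0
    for i, j in combinations(range(n), 2):
        same_order = (sigma[i] - sigma[j]) * (sigma_prime[i] - sigma_prime[j]) > 0
        score += 1 if same_order else -1
    return score / (n * (n - 1) / 2)


print(kendall_kernel([1, 2, 3], [1, 2, 3]))  # 1.0
print(kendall_kernel([1, 2, 3], [3, 2, 1]))  # -1.0
print(kendall_kernel([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant: 1/3
```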
##### The Weighted Kendall and High-order Kernels for Permutations
We propose new positive definite kernels for permutations. First, we introduce a weighted version of the Kendall kernel, which allows the contributions of different item pairs in the permutations to be weighted unequally depending on their ranks.
0
##### Which Training Methods for GANs do actually Converge?
In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distribution lie on lower dimensional manifolds.
0
##### A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, arising often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale up to many tasks (large $K$) in a high-dimensional (large $p$) setting.
0
##### Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry
We are concerned with the discovery of hierarchical relationships from large-scale unstructured similarity scores. For this purpose, we study different models of hyperbolic space and find that learning embeddings in the Lorentz model is substantially more efficient than in the Poincaré-ball model.
0
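Part of what makes the Lorentz model convenient is that its geodesic distance needs only a Lorentzian inner product and an `arccosh`. The sketch below shows that distance. It also shows the standard isometry from the Poincaré ball, which lets us check that both models give the same distances. The function names are our own and are not taken from the paper's code.

```python
import numpy as np

def lorentz_inner(u, v):
    # Lorentzian (Minkowski) inner product: -u0*v0 + sum_i ui*vi.
    return -u[..., 0] * v[..., 0] + np.sum(u[..., 1:] * v[..., 1:], axis=-1)

def lorentz_distance(u, v):
    # Geodesic distance on the hyperboloid <x, x>_L = -1, x0 > 0.
    # Clip at 1 so rounding cannot push the arccosh argument below its domain.
    return np.arccosh(np.maximum(-lorentz_inner(u, v), 1.0))

def poincare_to_lorentz(p):
    # Isometry from the Poincare ball onto the hyperboloid.
    sq = np.sum(p * p, axis=-1, keepdims=True)
    return np.concatenate([1 + sq, 2 * p], axis=-1) / (1 - sq)

def poincare_distance(p, q):
    # Closed-form distance in the Poincare ball, for comparison.
    num = 2 * np.sum((p - q) ** 2, axis=-1)
    den = (1 - np.sum(p * p, axis=-1)) * (1 - np.sum(q * q, axis=-1))
    return np.arccosh(1 + num / den)
```

Mapping two points of the ball with `poincare_to_lorentz` and applying `lorentz_distance` reproduces `poincare_distance` up to rounding. Each mapped point satisfies `lorentz_inner(x, x) == -1`.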
##### Provable defenses against adversarial examples via the convex outer adversarial polytope
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well.
0
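The "convex outer" approximation rests on replacing each ReLU by a convex set. As we understand the construction, a hidden unit with pre-activation $\hat z \in [\ell, u]$ and output $z = \max(0, \hat z)$ is relaxed to the convex hull of its graph over that interval. For units that are always active or always inactive, the ReLU is linear and needs no relaxation:

```latex
\begin{aligned}
&\ell \ge 0: && z = \hat z,\\
&u \le 0: && z = 0,\\
&\ell < 0 < u: && z \ge 0,\quad z \ge \hat z,\quad z \le \frac{u\,(\hat z - \ell)}{u - \ell}.
\end{aligned}
```

The upper line passes through $(\ell, 0)$ and $(u, u)$, so the triangle contains every feasible $(\hat z, z)$ pair. Optimizing over this relaxation therefore gives a bound that every true adversarial perturbation in the norm ball must respect.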
##### Stein Points
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. To this end, we present "Stein Points".
0
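The sketch below illustrates how Stein points can be built greedily. It needs only the score $\nabla \log p$, not the normalizing constant. Each new point minimizes $k_0(x, x)/2 + \sum_{i<j} k_0(x_i, x)$, where $k_0$ is a Stein kernel built from a base kernel.

Several choices here are assumptions made for a compact 1-D example:

- an inverse multiquadric (IMQ) base kernel with `c = 1` and `beta = -1/2`;
- a standard normal target with score `-x`;
- a brute-force search over a fixed grid instead of a continuous optimizer.

The helper names are hypothetical.

```python
import numpy as np

def imq_stein_kernel(x, y, score, c=1.0, beta=-0.5):
    """1-D Langevin Stein kernel built on the IMQ base kernel k = (c**2 + (x - y)**2) ** beta.

    k0 = d2k/dxdy + s(x) dk/dy + s(y) dk/dx + s(x) s(y) k, with s = d/dx log p.
    """
    r = x - y
    q = c**2 + r**2
    k = q**beta
    dk_dx = 2 * beta * r * q ** (beta - 1)
    dk_dy = -dk_dx
    d2k = -2 * beta * q ** (beta - 1) - 4 * beta * (beta - 1) * r**2 * q ** (beta - 2)
    sx, sy = score(x), score(y)
    return d2k + sx * dk_dy + sy * dk_dx + sx * sy * k

def greedy_stein_points(n, score, grid):
    """Greedy selection over a candidate grid:
    x_j = argmin_x  k0(x, x) / 2 + sum_{i<j} k0(x_i, x)."""
    points = []
    objective = 0.5 * imq_stein_kernel(grid, grid, score)
    for _ in range(n):
        best = grid[np.argmin(objective)]
        points.append(best)
        # Accumulate the cross terms with the newly chosen point.
        objective = objective + imq_stein_kernel(best, grid, score)
    return np.array(points)
```

For example, `greedy_stein_points(10, lambda x: -x, np.linspace(-4, 4, 801))` first selects the mode at 0. Later points then spread out around it.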
##### Classification from Pairwise Similarity and Unlabeled Data
Supervised learning needs a huge amount of labeled data, which can be a major bottleneck when privacy is a concern or labeling is costly. To overcome this problem, we propose a new weakly-supervised learning setting, called SU classification, where only similar (S) data pairs (two examples that belong to the same class) and unlabeled (U) data points are needed instead of fully labeled data.
0
##### JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
A new generative adversarial network is developed for joint distribution matching. Unlike most existing approaches, which only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains).
0
##### Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
0
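One mechanism aimed at the critic is to build the learning target from the smaller of two target critics. A second is to smooth the target policy with clipped noise. To our understanding, both are part of this paper's method. The sketch below computes such a target for a batch.

The function signature, the callables `actor_target`, `q1_target` and `q2_target`, and the default hyperparameters are assumptions for illustration. Delayed actor updates and the critic and actor losses are not shown.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Clipped double-Q target with target policy smoothing, for a batch of transitions."""
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise, then respect the action bounds.
    noise = np.clip(rng.normal(0.0, noise_std, size=next_action.shape), -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q: the smaller of two target critics counteracts overestimation.
    q_next = np.minimum(q1_target(next_state, next_action), q2_target(next_state, next_action))
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_next
```

Taking the minimum biases the target downward rather than upward. The idea is to trade a mild underestimate for protection against the overestimation described in the abstract.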
##### Attention-based Deep Multiple Instance Learning
Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we formulate the MIL problem as learning the Bernoulli distribution of the bag label, where the bag label probability is fully parameterized by neural networks.
0
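The bag-level probability needs a permutation-invariant pooling of instance embeddings. The sketch below shows attention-based pooling of the form $a_k = \mathrm{softmax}_k(w^\top \tanh(V h_k))$, $z = \sum_k a_k h_k$. The names `H`, `V` and `w` are our own. In a full model, `z` would feed a classifier with a sigmoid output for the bag label. The paper's gated attention variant is not shown.

```python
import numpy as np

def attention_mil_pooling(H, V, w):
    """Attention pooling over one bag of instance embeddings.

    H: (K, M) embeddings of the K instances in the bag.
    V: (L, M) and w: (L,) are the attention parameters.
    Returns the bag embedding z of shape (M,) and the attention weights a of shape (K,).
    """
    scores = np.tanh(H @ V.T) @ w        # one scalar score per instance
    scores = scores - scores.max()       # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a
```

Because the weights form a softmax over instances, reordering the instances does not change `z`. The weights `a` also indicate which instances drove the bag-level decision.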
##### Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.
0
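A concept activation vector (CAV) is a direction in a layer's activation space associated with a high-level concept. A TCAV-style score then asks how often moving along that direction increases a class logit. The sketch below assumes the per-input gradients of the class logit with respect to the layer activations are already available.

As a simplification, `cav_from_means` uses a difference of class means instead of training a linear classifier. It is a hypothetical stand-in, as is the function naming.

```python
import numpy as np

def cav_from_means(concept_acts, random_acts):
    # Hypothetical stand-in for a trained linear classifier: the unit vector from the
    # mean of random-example activations to the mean of concept-example activations.
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def tcav_score(logit_grads, cav):
    """Fraction of class inputs whose directional derivative along the CAV is positive.

    logit_grads: (N, D) gradients of the class logit w.r.t. the layer activations.
    """
    directional = logit_grads @ cav
    return float(np.mean(directional > 0))
```

The score is a fraction between 0 and 1 over the inputs of one class. It is therefore a quantitative statement about the concept, rather than an attribution to individual low-level features.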
##### Stochastic Wasserstein Barycenters
We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration.
0
##### Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax."
0
##### Disentangling by Factorising
We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions.
0
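Encouraging a factorial distribution requires samples from the product of the marginals of the representation distribution. To our understanding, FactorVAE obtains these by shuffling each latent dimension independently across a batch. A discriminator is then trained to tell shuffled samples from original ones, and its output is used to estimate the total correlation. The sketch shows only the shuffling step. The function name and array layout are assumptions.

```python
import numpy as np

def permute_dims(z, rng):
    """Approximate samples from the product of marginals by shuffling each
    latent dimension independently across the batch.

    z: (B, D) latent samples. Every column keeps its values, but the pairing of
    values across dimensions is broken, which removes dependence between them.
    """
    z_perm = np.empty_like(z)
    for d in range(z.shape[1]):
        z_perm[:, d] = z[rng.permutation(z.shape[0]), d]
    return z_perm
```

If two latent dimensions are perfectly correlated in `z`, they become nearly uncorrelated after `permute_dims`. Each dimension's marginal values are left unchanged.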
##### Neural Relational Inference for Interacting Systems
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts.
0
##### Automatic Goal Generation for Reinforcement Learning Agents
An agent trained using reinforcement learning is only capable of achieving the single task specified by its reward function. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing.
0
##### Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression
To scale standard Gaussian process (GP) regression to large datasets, aggregation models employ a factorized training process and then combine the predictions of distributed experts. The state-of-the-art aggregation models, however, either provide inconsistent predictions or require a time-consuming aggregation process.
0