At the final update, each client computes the joint gradient over both client-specific and common weights and returns the gradient of common parameters to the server.
Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems.
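As a rough illustration of what such an estimator looks like, the sketch below uses the score-function (REINFORCE) identity, which is only one of several possible approaches and is not necessarily the one studied in the work above. The Gaussian distribution, the test function f(z) = z^2, and the helper name `score_function_grad` are assumptions chosen for illustration; for this choice the exact gradient with respect to the mean is 2 * mu.

```python
import random


def score_function_grad(mu, sigma, n_samples=50000, seed=0):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] with f(z) = z^2.

    Uses the identity grad E[f(z)] = E[f(z) * d/dmu log N(z; mu, sigma^2)],
    where d/dmu log N(z; mu, sigma^2) = (z - mu) / sigma^2.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu, sigma)        # sample from the distribution itself
        score = (z - mu) / sigma ** 2   # gradient of the log-density w.r.t. mu
        total += (z ** 2) * score       # f(z) times the score
    return total / n_samples


# Illustrative values only: the exact gradient here is 2 * mu = 3.0.
estimate = score_function_grad(mu=1.5, sigma=0.5)
print(estimate)
```

The estimate is unbiased but noisy, which is why variance reduction is usually a concern for estimators of this kind.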
We introduce a framework for online changepoint detection and simultaneous model learning which is applicable to highly parametrized models, such as deep neural networks.
The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood.
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods.
Generative adversarial networks (GANs) are a powerful approach to unsupervised learning.
Ranked #2 on Image Generation on Stacked MNIST
We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches.
We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network.
The resulting method is flexible and can be easily incorporated into any standard off-policy or on-policy algorithm, such as those based on temporal differences and policy gradients.
We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family.
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of handling a large number of training instances together with high-dimensional input data.
It maximizes a lower bound on the marginal likelihood of the data.
We introduce a new algorithm for approximate inference that combines reparametrization, Markov chain Monte Carlo and variational methods.
Bayesian inference for factorial hidden Markov models is challenging due to the exponentially sized latent variable space.
Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low-rank binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns.
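To make the Boolean product concrete, here is a minimal sketch in plain Python. The matrices `U` (which patterns each observation uses) and `V` (the patterns themselves) are hypothetical toy values, and the function only evaluates the product; it does not perform the factorisation.

```python
def boolean_product(U, V):
    """Boolean matrix product: out[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    inner, cols = len(V), len(V[0])
    return [
        [int(any(row[k] and V[k][j] for k in range(inner))) for j in range(cols)]
        for row in U
    ]


# Hypothetical toy factors with rank 2.
U = [[1, 0],
     [0, 1],
     [1, 1]]        # observation 3 combines both patterns
V = [[1, 1, 0],
     [0, 1, 1]]     # two binary patterns over three features

X = boolean_product(U, V)
print(X)  # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
```

Unlike the ordinary matrix product, overlapping patterns do not add up: the middle entry of the last row is 1 rather than 2, because 1 OR 1 = 1.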
The reparameterization gradient has become a widely used method to obtain Monte Carlo gradients to optimize the variational objective.
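A minimal sketch of the reparameterization gradient for a one-dimensional Gaussian follows. The objective f(z) = z^2 stands in for the variational objective and is an assumption made purely for illustration, as is the helper name `reparam_grad`. Writing z = mu + sigma * eps with eps ~ N(0, 1) moves the parameters inside the expectation, so the gradient can be taken through the sample.

```python
import random


def reparam_grad(mu, sigma, n_samples=20000, seed=0):
    """Estimate gradients of E_{z ~ N(mu, sigma^2)}[z^2] w.r.t. mu and sigma."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on the parameters
        z = mu + sigma * eps        # reparameterized sample
        df_dz = 2.0 * z             # derivative of f(z) = z^2
        g_mu += df_dz * 1.0         # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps      # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


# Illustrative values only: exact gradients are 2 * mu = 3.0 and 2 * sigma = 1.0.
g_mu, g_sigma = reparam_grad(mu=1.5, sigma=0.5)
print(g_mu, g_sigma)
```

For this toy objective the estimates settle near the exact values with far fewer samples than a score-function estimator would typically need.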
The softmax representation of probabilities for categorical variables plays a prominent role in modern machine learning, with numerous applications in areas such as large-scale classification, neural language modeling, and recommendation systems.
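For reference, the softmax maps a vector of real-valued scores to a categorical distribution by exponentiating and normalizing. The sketch below is a standard numerically stable version in plain Python; the scores are made-up values. Note that the normalizer sums over every class, which is the cost that grows with the number of categories.

```python
import math


def softmax(logits):
    """Map real-valued scores to probabilities that sum to one."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)                          # normalizer: one term per class
    return [e / total for e in exps]


# Made-up scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Subtracting the maximum leaves the result unchanged mathematically but avoids overflow for large scores, for example `softmax([1000.0, 1000.0])`.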
Instead of taking samples from the variational distribution, we use importance sampling to take samples from an overdispersed distribution in the same exponential family as the variational approximation.
DPPs possess desirable properties, such as exact sampling or analyticity of the moments, but learning the parameters of kernel $K$ through likelihood-based inference is not straightforward.
We introduce local expectation gradients, a general-purpose stochastic variational inference algorithm for constructing stochastic gradients through sampling from the variational distribution.
The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied.
We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models.
Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space.
We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model.