no code implementations • 5 Feb 2022 • Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton

We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected posterior features, based on regression from kernel or neural net features of the observations.

1 code implementation • 2 Feb 2022 • Antonin Schrab, Benjamin Guedj, Arthur Gretton

KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels.

no code implementations • 1 Feb 2022 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

no code implementations • 19 Nov 2021 • Oscar Key, Tamara Fernandez, Arthur Gretton, François-Xavier Briol

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of inference methods which directly account for this issue.

no code implementations • 6 Nov 2021 • Rahul Singh, Liyuan Xu, Arthur Gretton

We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects over short horizons.

2 code implementations • 28 Oct 2021 • Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton

We propose a novel nonparametric two-sample test based on the Maximum Mean Discrepancy (MMD), which is constructed by aggregating tests with different kernel bandwidths.

no code implementations • NeurIPS 2021 • Pierre Glaser, Michael Arbel, Arthur Gretton

We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution.

1 code implementation • NeurIPS 2021 • Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).

no code implementations • NeurIPS 2021 • Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder.

no code implementations • 6 Jun 2021 • Zhu Li, Zhi-Hua Zhou, Arthur Gretton

Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory.

1 code implementation • 21 May 2021 • Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

1 code implementation • 10 May 2021 • Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet

In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting.

no code implementations • 14 Dec 2020 • Mihaela Rosca, Theophane Weber, Arthur Gretton, Shakir Mohamed

How sensitive should machine learning models be to input changes?

no code implementations • NeurIPS 2020 • Tamara Fernández, Wenkai Xu, Marc Ditzhaus, Arthur Gretton

We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial.

no code implementations • 4 Nov 2020 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

We propose a greedy strategy to spectrally train a deep network for multi-class classification.

no code implementations • 27 Oct 2020 • Alexander Marx, Arthur Gretton, Joris M. Mooij

One of the core assumptions in causal discovery is the faithfulness assumption, i.e., assuming that independencies found in the data are due to separations in the true causal graph.

no code implementations • NeurIPS Workshop ICBINB 2020 • Mihaela Rosca, Theophane Weber, Arthur Gretton, Shakir Mohamed

How sensitive should machine learning models be to input changes?

1 code implementation • ICLR 2021 • Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton

We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.

1 code implementation • ICLR 2021 • Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).

no code implementations • 10 Oct 2020 • Rahul Singh, Liyuan Xu, Arthur Gretton

For treatment effects, we prove $\sqrt{n}$ consistency, Gaussian approximation, and semiparametric efficiency with a new double spectral robustness property.

no code implementations • ICML 2020 • Tamara Fernandez, Nicolas Rivera, Wenkai Xu, Arthur Gretton

Survival Analysis and Reliability Theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest such as death from a particular disease or failure of a component in a mechanical system.

no code implementations • NeurIPS 2020 • Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.
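As a rough illustration of the particle update that SVGD performs, here is a minimal one-dimensional sketch using only the Python standard library. It is not taken from the paper: the function name `svgd_step`, the fixed Gaussian kernel bandwidth, the step size, and the choice of target $V(x) = x^2/2$ (a standard normal) are all assumptions made for the example.

```python
import math

def svgd_step(particles, grad_v, step_size=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles targeting pi proportional to exp(-V)."""
    n = len(particles)
    updated = []
    for x_i in particles:
        direction = 0.0
        for x_j in particles:
            k = math.exp(-(x_j - x_i) ** 2 / (2.0 * bandwidth ** 2))
            # Attraction: kernel-weighted score at x_j, where score = -grad V.
            attraction = -k * grad_v(x_j)
            # Repulsion: derivative of k(x_j, x_i) with respect to x_j.
            repulsion = -k * (x_j - x_i) / bandwidth ** 2
            direction += attraction + repulsion
        updated.append(x_i + step_size * direction / n)
    return updated

# Hypothetical setup: 20 particles started far from the target's mass.
particles = [4.0 + 2.0 * i / 19 for i in range(20)]
for _ in range(1000):
    particles = svgd_step(particles, grad_v=lambda x: x)  # V(x) = x^2 / 2

particle_mean = sum(particles) / len(particles)
particle_spread = max(particles) - min(particles)
```

The attraction term pulls particles toward high-density regions of the target, while the repulsion term keeps them from collapsing onto a single point.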

no code implementations • 15 Jun 2020 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

1 code implementation • ICML Workshop LifelongML 2020 • Iryna Korshunova, Jonas Degrave, Joni Dambre, Arthur Gretton, Ferenc Huszar

One recent approach to meta reinforcement learning (meta-RL) is to integrate models for task inference with models for control.

1 code implementation • ICLR 2021 • Michael Arbel, Liang Zhou, Arthur Gretton

We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base.

1 code implementation • ICML 2020 • Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, Danica J. Sutherland

We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.

Ranked #1 on Two-sample testing on HIGGS Data Set

1 code implementation • 8 Dec 2019 • Tamara Fernandez, Arthur Gretton, David Rindt, Dino Sejdinovic

We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate.

1 code implementation • ICLR 2020 • Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions.

no code implementations • 20 Aug 2019 • Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton

The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations.

no code implementations • 1 Jul 2019 • Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.

1 code implementation • NeurIPS 2019 • Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.

1 code implementation • NeurIPS 2019 • Rahul Singh, Maneesh Sahani, Arthur Gretton

Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data.

1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

1 code implementation • 20 Nov 2018 • Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton

The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.

1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

3 code implementations • NeurIPS 2018 • Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton

Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models.

no code implementations • 1 Jul 2018 • Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

We also present a novel variance reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel.

1 code implementation • NeurIPS 2018 • Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).

Ranked #74 on Image Generation on CIFAR-10

3 code implementations • NeurIPS 2018 • Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre

We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations.

4 code implementations • ICLR 2018 • Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs.

1 code implementation • 15 Nov 2017 • Michael Arbel, Arthur Gretton

A nonparametric family of conditional distributions is introduced, which generalizes conditional exponential families using functional parameters in a suitable RKHS.

1 code implementation • 23 May 2017 • Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.

4 code implementations • NeurIPS 2017 • Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples.

no code implementations • 17 Nov 2016 • Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

The second test, called the relative test of similarity, is used to determine which of two samples from arbitrary distributions is significantly closer to a reference sample of interest; the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).

1 code implementation • 14 Nov 2016 • Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.

1 code implementation • ICML 2017 • Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton

The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features).

1 code implementation • 25 Jun 2016 • Qinyi Zhang, Sarah Filippi, Arthur Gretton, Dino Sejdinovic

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions.

1 code implementation • NeurIPS 2016 • Wittawat Jitkrittum, Zoltan Szabo, Kacper Chwialkowski, Arthur Gretton

Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features).

1 code implementation • 2 May 2016 • Sebastian Weichwald, Arthur Gretton, Bernhard Schölkopf, Moritz Grosse-Wentrup

Causal inference concerns the identification of cause-effect relationships between variables.

no code implementations • 2 Mar 2016 • Paul K. Rubenstein, Kacper P. Chwialkowski, Arthur Gretton

The main contributions of this paper are twofold: first, we prove that the Lancaster statistic satisfies the conditions required to estimate the quantiles of the null distribution using the wild bootstrap; second, the manner in which this is proved is novel, simpler than existing methods, and can further be applied to other statistics.

1 code implementation • 9 Feb 2016 • Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton

Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel.

1 code implementation • 3 Dec 2015 • Sebastian Weichwald, Moritz Grosse-Wentrup, Arthur Gretton

Causal inference concerns the identification of cause-effect relationships between variables, e.g., establishing whether a stimulus affects activity in a certain brain region.

1 code implementation • 14 Nov 2015 • Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton

Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.

1 code implementation • NeurIPS 2015 • Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton

The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.

2 code implementations • NeurIPS 2015 • Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton

We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC).

1 code implementation • 9 Mar 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó

We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output.

no code implementations • 25 Jan 2015 • Arthur Gretton

The HSIC is defined as the distance between the embedding of the joint distribution, and the embedding of the product of the marginals, in a Reproducing Kernel Hilbert Space (RKHS).
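To make the definition concrete, the following is a minimal sketch of the standard biased empirical HSIC estimate, $\frac{1}{n^2}\operatorname{tr}(KHLH)$, for scalar data. It is an illustration under stated assumptions rather than code from the paper: the Gaussian kernel, the bandwidth of 1, the helper names (`gram_matrix`, `center`, `hsic_biased`), and the synthetic data are all chosen for the example.

```python
import math
import random

def gram_matrix(values, bandwidth=1.0):
    """Gaussian kernel Gram matrix for a list of scalars."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]

def center(gram):
    """Return H K H for a symmetric Gram matrix K, with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased HSIC estimate: sum of elementwise products of centered Gram matrices / n^2."""
    n = len(xs)
    k_centered = center(gram_matrix(xs, bandwidth))
    l_centered = center(gram_matrix(ys, bandwidth))
    total = sum(k_centered[i][j] * l_centered[i][j]
                for i in range(n) for j in range(n))
    return total / n ** 2

# Hypothetical data: a nonlinear dependence versus an independent draw.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys_dependent = [x ** 2 + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
ys_independent = [rng.gauss(0.0, 1.0) for _ in xs]
hsic_dependent = hsic_biased(xs, ys_dependent)
hsic_independent = hsic_biased(xs, ys_independent)
```

The dependent pair has zero linear correlation in expectation, so it is the kind of relationship a kernel dependence measure is meant to pick up where a correlation coefficient would not.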

no code implementations • 2 Jan 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess

We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message.

no code implementations • 10 Dec 2014 • Jacquelyn A. Shelton, Jan Gasthaus, Zhenwen Dai, Joerg Luecke, Arthur Gretton

We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large.

1 code implementation • 8 Nov 2014 • Zoltan Szabo, Bharath Sriperumbudur, Barnabas Poczos, Arthur Gretton

In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions to a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs.

no code implementations • 18 Sep 2014 • Yu Nishiyama, Motonobu Kanagawa, Arthur Gretton, Kenji Fukumizu

Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference.

1 code implementation • NeurIPS 2014 • Kacper Chwialkowski, Dino Sejdinovic, Arthur Gretton

A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed.

1 code implementation • 15 Jun 2014 • Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko

Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second.

no code implementations • 21 May 2014 • Krikamol Muandet, Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is central to kernel methods in that it is used by many classical algorithms such as kernel principal component analysis, and it also forms the core inference step of modern kernel methods that rely on embedding probability distributions in RKHSs.

1 code implementation • 18 Feb 2014 • Kacper Chwialkowski, Arthur Gretton

A new nonparametric approach to the problem of testing the independence of two random processes is developed.

no code implementations • 7 Feb 2014 • Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur

To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean.

no code implementations • 17 Dec 2013 • Motonobu Kanagawa, Yu Nishiyama, Arthur Gretton, Kenji Fukumizu

In particular, the sampling and resampling procedures are novel in being expressed using kernel mean embeddings, so we theoretically analyze their behaviors.

1 code implementation • 12 Dec 2013 • Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar

When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$.

no code implementations • NeurIPS 2013 • Wojciech Zaremba, Arthur Gretton, Matthew Blaschko

We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent.

no code implementations • 26 Sep 2013 • Byron Boots, Geoffrey Gordon, Arthur Gretton

The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation.

1 code implementation • 19 Jul 2013 • Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton

A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support.

1 code implementation • 8 Jul 2013 • Wojciech Zaremba, Arthur Gretton, Matthew Blaschko

A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced.

no code implementations • NeurIPS 2013 • Dino Sejdinovic, Arthur Gretton, Wicher Bergsma

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space.

no code implementations • 4 Jun 2013 • Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schölkopf

A mean function in reproducing kernel Hilbert space, or a kernel mean, is an important part of many applications ranging from kernel principal component analysis to Hilbert-space embedding of distributions.

no code implementations • NeurIPS 2012 • Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

A means of parameter selection for the two-sample test based on the MMD is proposed.

no code implementations • 25 Jul 2012 • Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, Kenji Fukumizu

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning.

no code implementations • 18 Jun 2012 • Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation.

no code implementations • NeurIPS 2011 • Kenji Fukumizu, Le Song, Arthur Gretton

A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces.

no code implementations • NeurIPS 2009 • Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur

A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide.
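The distance between embeddings described here is the maximum mean discrepancy (MMD). A minimal sketch of its biased (V-statistic) sample estimate follows, using only the Python standard library. The Gaussian kernel, the bandwidth of 1, the function names, and the synthetic samples are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Hypothetical samples: P = N(0, I) and Q = N((1, 1), I) in two dimensions.
rng = random.Random(0)
p_sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
q_sample = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(100)]
mmd2_same = mmd2_biased(p_sample, p_sample)
mmd2_diff = mmd2_biased(p_sample, q_sample)
```

Comparing a sample with itself gives zero, while the shifted sample gives a clearly positive value. A two-sample test would additionally need a threshold under the null hypothesis, which this sketch does not compute.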

no code implementations • NeurIPS 2009 • Arthur Gretton, Peter Spirtes, Robert E. Tillman

This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible.

no code implementations • 30 Jul 2009 • Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R. G. Lanckriet

First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted characteristic kernels.

no code implementations • 18 Jan 2009 • Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet

First, to understand the relation between IPMs and $\phi$-divergences, the necessary and sufficient conditions under which these classes intersect are derived: the total variation distance is shown to be the only non-trivial $\phi$-divergence that is also an IPM.

Information Theory

no code implementations • NeurIPS 2008 • Matthew Blaschko, Arthur Gretton

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters.

no code implementations • NeurIPS 2008 • Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur

Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments.

no code implementations • NeurIPS 2008 • Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola

Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion.

Papers With Code is a free resource with all data licensed under CC-BY-SA.