no code implementations • 4 Feb 2025 • Valentin De Bortoli, Alexandre Galashov, J. Swaroop Guntupalli, Guangyao Zhou, Kevin Murphy, Arthur Gretton, Arnaud Doucet
The corresponding reverse process progressively "denoises" a Gaussian sample into a sample from the data distribution.
no code implementations • 9 Jan 2025 • Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet
Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model's distribution.
no code implementations • 9 Jan 2025 • Juno Kim, Dimitri Meunier, Arthur Gretton, Taiji Suzuki, Zhu Li
We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space.
no code implementations • 18 Dec 2024 • Eleni Sgouritsa, Virginia Aglietti, Yee Whye Teh, Arnaud Doucet, Arthur Gretton, Silvia Chiappa
The reasoning abilities of Large Language Models (LLMs) are attracting increasing attention.
no code implementations • 29 Nov 2024 • Dimitri Meunier, Zhu Li, Tim Christensen, Arthur Gretton
We study the kernel instrumental variable algorithm of \citet{singh2019kernel}, a nonparametric two-stage least squares (2SLS) procedure which has demonstrated strong empirical performance.
no code implementations • 18 Oct 2024 • Hugh Dance, Peter Orbanz, Arthur Gretton
Accurate uncertainty quantification for causal effects is essential for robust decision making in complex systems, but remains challenging in non-parametric settings.
1 code implementation • 16 Oct 2024 • Siu Lun Chau, Antonin Schrab, Arthur Gretton, Dino Sejdinovic, Krikamol Muandet
We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets -- convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty that arises from the modeller's partial ignorance.
1 code implementation • 23 Sep 2024 • Zonghao Chen, Aratrika Mustafi, Pierre Glaser, Anna Korba, Arthur Gretton, Bharath K. Sriperumbudur
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow.
no code implementations • 31 Aug 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Mark Rowland
In reinforcement learning (RL), the consideration of multivariate reward signals has led to fundamental advancements in multi-objective decision-making, transfer learning, and representation learning.
1 code implementation • 15 Jul 2024 • Tongzheng Ren, Haotian Sun, Antoine Moulin, Arthur Gretton, Bo Dai
We address the problem of causal effect estimation where hidden confounders are present, with a focus on two settings: instrumental variable regression with additional observed confounders, and proxy causal learning.
no code implementations • 25 Jun 2024 • Jessica Schrouff, Alexis Bellot, Amal Rannen-Triki, Alan Malek, Isabela Albuquerque, Arthur Gretton, Alexander D'Amour, Silvia Chiappa
Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation.
1 code implementation • 24 Jun 2024 • Zonghao Chen, Masha Naslidnyk, Arthur Gretton, François-Xavier Briol
We propose a novel approach for estimating conditional or parametric expectations in the setting where obtaining samples or evaluating integrands is costly.
no code implementations • 23 May 2024 • Dimitri Meunier, Zikai Shen, Mattes Mollenhauer, Arthur Gretton, Zhu Li
First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by deriving a novel lower bound on learning rates; this bound is shown to be suboptimal when the smoothness of the regression function exceeds a certain level.
no code implementations • 10 May 2024 • Alexandre Galashov, Valentin De Bortoli, Arthur Gretton
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD).
no code implementations • 12 Mar 2024 • Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton
We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels.
1 code implementation • 20 Feb 2024 • Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, Arthur Gretton
We describe a data-efficient, kernel-based approach to statistical testing of conditional independence.
1 code implementation • 13 Feb 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.
Distributional Reinforcement Learning
Model-based Reinforcement Learning
+2
no code implementations • 12 Dec 2023 • Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton
We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space.
1 code implementation • 9 Dec 2023 • Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
1 code implementation • 8 Aug 2023 • Liyuan Xu, Arthur Gretton
We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder.
no code implementations • 20 Jul 2023 • Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe
Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task.
no code implementations • 23 Jun 2023 • William I. Walker, Arthur Gretton, Maneesh Sahani
We introduce a new approach to prediction in graphical models with latent-shift adaptation, i. e., where source and target environments differ in the distribution of an unobserved confounding latent variable.
1 code implementation • NeurIPS 2023 • Felix Biggs, Antonin Schrab, Arthur Gretton
We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it.
1 code implementation • 8 Mar 2023 • Lisa M. Koch, Christian M. Schürch, Christian F. Baumgartner, Arthur Gretton, Philipp Berens
We formulate subgroup shift detection in the framework of statistical hypothesis testing and show that recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data.
no code implementations • 21 Dec 2022 • Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai
We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target.
1 code implementation • 16 Dec 2022 • Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch, Arthur Gretton
The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance.
no code implementations • 10 Nov 2022 • Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.
1 code implementation • 26 Oct 2022 • Pierre Glaser, Michael Arbel, Samo Hromadka, Arnaud Doucet, Arthur Gretton
We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available.
1 code implementation • 19 Oct 2022 • Jerome Baum, Heishiro Kanagawa, Arthur Gretton
We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences.
no code implementations • 12 Oct 2022 • Liyuan Xu, Arthur Gretton
We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment.
no code implementations • 2 Aug 2022 • Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton
We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$.
no code implementations • 22 Jun 2022 • Antonin Schrab, Wittawat Jitkrittum, Zoltán Szabó, Dino Sejdinovic, Arthur Gretton
We discuss how MultiFIT, the Multiscale Fisher's Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC).
4 code implementations • 18 Jun 2022 • Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton
We derive non-asymptotic uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: this result is novel for tests based on incomplete $U$-statistics, to our knowledge.
no code implementations • 18 Jun 2022 • Yuchen Zhu, Limor Gultchin, Arthur Gretton, Matt Kusner, Ricardo Silva
We propose a kernel-based nonparametric estimator for the causal effect when the cause is corrupted by error.
no code implementations • 5 Feb 2022 • Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton
We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected kernel posterior features, based on regression from learned neural net or kernel features of the observations.
2 code implementations • 2 Feb 2022 • Antonin Schrab, Benjamin Guedj, Arthur Gretton
KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels.
no code implementations • 1 Feb 2022 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy
There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).
1 code implementation • 19 Nov 2021 • Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue.
no code implementations • 6 Nov 2021 • Rahul Singh, Liyuan Xu, Arthur Gretton
We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression.
3 code implementations • NeurIPS 2023 • Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton
In practice, this parameter is unknown and, hence, the optimal MMD test with this particular kernel cannot be used.
1 code implementation • NeurIPS 2021 • Pierre Glaser, Michael Arbel, Arthur Gretton
We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution.
1 code implementation • NeurIPS 2021 • Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton
We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).
1 code implementation • NeurIPS 2021 • Liyuan Xu, Heishiro Kanagawa, Arthur Gretton
Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder.
no code implementations • 6 Jun 2021 • Zhu Li, Zhi-Hua Zhou, Arthur Gretton
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory.
1 code implementation • 21 May 2021 • Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet
By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.
2 code implementations • 10 May 2021 • Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet
In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting.
no code implementations • 14 Dec 2020 • Mihaela Rosca, Theophane Weber, Arthur Gretton, Shakir Mohamed
How sensitive should machine learning models be to input changes?
no code implementations • NeurIPS 2020 • Tamara Fernández, Wenkai Xu, Marc Ditzhaus, Arthur Gretton
We consider settings in which the data of interest correspond to pairs of ordered times, e. g, the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial.
no code implementations • 4 Nov 2020 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy
We propose a greedy strategy to spectrally train a deep network for multi-class classification.
no code implementations • 27 Oct 2020 • Alexander Marx, Arthur Gretton, Joris M. Mooij
One of the core assumptions in causal discovery is the faithfulness assumption, i. e., assuming that independencies found in the data are due to separations in the true causal graph.
no code implementations • NeurIPS Workshop ICBINB 2020 • Mihaela Rosca, Theophane Weber, Arthur Gretton, Shakir Mohamed
How sensitive should machine learning models be to input changes?
1 code implementation • ICLR 2021 • Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton
We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.
1 code implementation • ICLR 2021 • Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).
no code implementations • 10 Oct 2020 • Rahul Singh, Liyuan Xu, Arthur Gretton
We propose estimators based on kernel ridge regression for nonparametric causal functions such as dose, heterogeneous, and incremental response curves.
no code implementations • ICML 2020 • Tamara Fernandez, Nicolas Rivera, Wenkai Xu, Arthur Gretton
Survival Analysis and Reliability Theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest such as death from a particular disease or failure of a component in a mechanical system.
no code implementations • NeurIPS 2020 • Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton
We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.
no code implementations • 15 Jun 2020 • Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy
There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).
1 code implementation • ICML Workshop LifelongML 2020 • Iryna Korshunova, Jonas Degrave, Joni Dambre, Arthur Gretton, Ferenc Huszar
One recent approach to meta reinforcement learning (meta-RL) is to integrate models for task inference with models for control.
1 code implementation • ICLR 2021 • Michael Arbel, Liang Zhou, Arthur Gretton
We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base.
1 code implementation • ICML 2020 • Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, Danica J. Sutherland
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Ranked #1 on
Two-sample testing
on HIGGS Data Set
1 code implementation • 8 Dec 2019 • Tamara Fernandez, Arthur Gretton, David Rindt, Dino Sejdinovic
We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate.
1 code implementation • ICLR 2020 • Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar
Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions.
no code implementations • 20 Aug 2019 • Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton
The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations.
1 code implementation • 1 Jul 2019 • Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton
We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.
1 code implementation • NeurIPS 2019 • Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
1 code implementation • NeurIPS 2019 • Rahul Singh, Maneesh Sahani, Arthur Gretton
Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data.
1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.
1 code implementation • 20 Nov 2018 • Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton
The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.
1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He
We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.
3 code implementations • NeurIPS 2018 • Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton
Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models.
no code implementations • 1 Jul 2018 • Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani
We also present a novel variance reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel.
1 code implementation • NeurIPS 2018 • Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).
Ranked #53 on
Image Generation
on CIFAR-10
3 code implementations • NeurIPS 2018 • Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre
We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations.
7 code implementations • ICLR 2018 • Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton
We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs.
1 code implementation • 15 Nov 2017 • Michael Arbel, Arthur Gretton
A nonparametric family of conditional distributions is introduced, which generalizes conditional exponential families using functional parameters in a suitable RKHS.
1 code implementation • 23 May 2017 • Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton
We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.
4 code implementations • NeurIPS 2017 • Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton
We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples.
no code implementations • 17 Nov 2016 • Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blashcko
The second test, called the relative test of similarity, is use to determine which of the two samples from arbitrary distributions is significantly closer to a reference sample of interest and the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).
1 code implementation • 14 Nov 2016 • Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.
1 code implementation • ICML 2017 • Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton
The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features).
1 code implementation • 25 Jun 2016 • Qinyi Zhang, Sarah Filippi, Arthur Gretton, Dino Sejdinovic
Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions.
1 code implementation • NeurIPS 2016 • Wittawat Jitkrittum, Zoltan Szabo, Kacper Chwialkowski, Arthur Gretton
Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i. e, features).
1 code implementation • 2 May 2016 • Sebastian Weichwald, Arthur Gretton, Bernhard Schölkopf, Moritz Grosse-Wentrup
Causal inference concerns the identification of cause-effect relationships between variables.
no code implementations • 2 Mar 2016 • Paul K. Rubenstein, Kacper P. Chwialkowski, Arthur Gretton
The main contributions of this paper are twofold: first, we prove that the Lancaster statistic satisfies the conditions required to estimate the quantiles of the null distribution using the wild bootstrap; second, the manner in which this is proved is novel, simpler than existing methods, and can further be applied to other statistics.
1 code implementation • 9 Feb 2016 • Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton
Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel.
1 code implementation • 3 Dec 2015 • Sebastian Weichwald, Moritz Grosse-Wentrup, Arthur Gretton
Causal inference concerns the identification of cause-effect relationships between variables, e. g. establishing whether a stimulus affects activity in a certain brain region.
1 code implementation • 14 Nov 2015 • Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton
Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.
1 code implementation • NeurIPS 2015 • Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton
The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.
2 code implementations • NeurIPS 2015 • Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, Arthur Gretton
We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC).
1 code implementation • 9 Mar 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó
We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output.
no code implementations • 25 Jan 2015 • Arthur Gretton
The HSIC is defined as the distance between the embedding of the joint distribution, and the embedding of the product of the marginals, in a Reproducing Kernel Hilbert Space (RKHS).
no code implementations • 2 Jan 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess
We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message.
no code implementations • 10 Dec 2014 • Jacquelyn A. Shelton, Jan Gasthaus, Zhenwen Dai, Joerg Luecke, Arthur Gretton
We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large.
1 code implementation • 8 Nov 2014 • Zoltan Szabo, Bharath Sriperumbudur, Barnabas Poczos, Arthur Gretton
In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions to a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs.
no code implementations • 18 Sep 2014 • Yu Nishiyama, Motonobu Kanagawa, Arthur Gretton, Kenji Fukumizu
Our contribution in this paper is to introduce a novel approach, termed the {\em model-based kernel sum rule} (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference.
1 code implementation • NeurIPS 2014 • Kacper Chwialkowski, Dino Sejdinovic, Arthur Gretton
A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed.
1 code implementation • 15 Jun 2014 • Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko
Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second.
no code implementations • 21 May 2014 • Krikamol Muandet, Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf
A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is central to kernel methods in that it is used by many classical algorithms such as kernel principal component analysis, and it also forms the core inference step of modern kernel methods that rely on embedding probability distributions in RKHSs.
1 code implementation • 18 Feb 2014 • Kacper Chwialkowski, Arthur Gretton
A new non parametric approach to the problem of testing the independence of two random process is developed.
no code implementations • 7 Feb 2014 • Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur
To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean.
no code implementations • 17 Dec 2013 • Motonobu Kanagawa, Yu Nishiyama, Arthur Gretton, Kenji Fukumizu
In particular, the sampling and resampling procedures are novel in being expressed using kernel mean embeddings, so we theoretically analyze their behaviors.
1 code implementation • 12 Dec 2013 • Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar
When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$.
no code implementations • NeurIPS 2013 • Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent.
no code implementations • 26 Sep 2013 • Byron Boots, Geoffrey Gordon, Arthur Gretton
The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation.
1 code implementation • 19 Jul 2013 • Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton
A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support.
1 code implementation • 8 Jul 2013 • Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced.
no code implementations • NeurIPS 2013 • Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space.
no code implementations • 4 Jun 2013 • Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schölkopf
A mean function in reproducing kernel Hilbert space, or a kernel mean, is an important part of many applications ranging from kernel principal component analysis to Hilbert-space embedding of distributions.
no code implementations • NeurIPS 2012 • Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur
A means of parameter selection for the two-sample test based on the MMD is proposed.
no code implementations • 25 Jul 2012 • Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, Kenji Fukumizu
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning.
no code implementations • 18 Jun 2012 • Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton
For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation.
no code implementations • NeurIPS 2011 • Kenji Fukumizu, Le Song, Arthur Gretton
A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces.
no code implementations • NeurIPS 2009 • Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide.
no code implementations • NeurIPS 2009 • Arthur Gretton, Peter Spirtes, Robert E. Tillman
This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible.
no code implementations • 30 Jul 2009 • Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R. G. Lanckriet
First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted {\em characteristic kernels}.
no code implementations • 18 Jan 2009 • Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet
First, to understand the relation between IPMs and $\phi$-divergences, the necessary and sufficient conditions under which these classes intersect are derived: the total variation distance is shown to be the only non-trivial $\phi$-divergence that is also an IPM.
Information Theory Information Theory
no code implementations • NeurIPS 2008 • Matthew Blaschko, Arthur Gretton
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters.
no code implementations • NeurIPS 2008 • Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments.
no code implementations • NeurIPS 2008 • Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola
Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion.