Search Results for author: Arthur Gretton

Found 105 papers, 55 papers with code

Demystifying MMD GANs

7 code implementations ICLR 2018 Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs.
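
As a concrete reference point, here is a minimal NumPy sketch of the unbiased quadratic-time MMD² estimate with a Gaussian kernel — the quantity an MMD critic is built on. The kernel choice is an illustrative assumption; in an MMD GAN the kernel is applied to learned critic features rather than raw samples, and this is not the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased quadratic-time estimate of MMD^2(P, Q) from samples X ~ P, Y ~ Q;
    # the within-sample diagonal terms are excluded to remove the bias.
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())
```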

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

1 code implementation 14 Nov 2016 Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.

A Linear-Time Kernel Goodness-of-Fit Test

4 code implementations NeurIPS 2017 Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples.

Interpretable Distribution Features with Maximum Testing Power

1 code implementation NeurIPS 2016 Wittawat Jitkrittum, Zoltan Szabo, Kacper Chwialkowski, Arthur Gretton

Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features).
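
A rough NumPy sketch of this idea, assuming a Gaussian kernel and equal sample sizes: evaluate the difference of the two empirical mean embeddings at J chosen locations and normalize by the feature covariance. The paper additionally optimizes the locations for test power, which this sketch omits.

```python
import numpy as np

def me_statistic(X, Y, V, sigma=1.0, reg=1e-5):
    # Difference of mean embeddings of X and Y at J test locations V,
    # combined into a single Hotelling-style statistic.
    def feats(A):
        sq = np.sum(A**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * A @ V.T
        return np.exp(-sq / (2 * sigma**2))      # n x J matrix of k(a_i, v_j)
    diff = feats(X) - feats(Y)                   # assumes len(X) == len(Y)
    z = diff.mean(0)                             # witness values at the locations
    S = np.cov(diff.T) + reg * np.eye(len(V))    # covariance across features
    return len(X) * z @ np.linalg.solve(S, z)
```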

Generalized Energy Based Models

1 code implementation ICLR 2021 Michael Arbel, Liang Zhou, Arthur Gretton

We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base.

Image Generation

BRUNO: A Deep Recurrent Model for Exchangeable Data

3 code implementations NeurIPS 2018 Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre

We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations.

Anomaly Detection Bayesian Inference +2

Self-Supervised Learning with Kernel Dependence Maximization

1 code implementation NeurIPS 2021 Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).
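
For reference, a minimal sketch of the classical (biased) empirical HSIC that this line of work builds on; SSL-HSIC itself uses a differentiable estimator inside the training loss, so treat this purely as an illustration of the dependence measure.

```python
import numpy as np

def hsic_biased(K, L):
    # Biased empirical HSIC from kernel matrices K (on X) and L (on Y):
    # HSIC = trace(K H L H) / (n - 1)^2, with H the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2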

Depth Estimation Object Recognition +2

On gradient regularizers for MMD GANs

1 code implementation NeurIPS 2018 Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).

Image Generation

Kernel Adaptive Metropolis-Hastings

1 code implementation 19 Jul 2013 Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton

A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support.
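
To fix ideas, a plain random-walk Metropolis-Hastings step is sketched below; the kernel-adaptive algorithm replaces the fixed proposal covariance with one derived from a kernel embedding of the chain history, which this baseline sketch does not implement.

```python
import numpy as np

def random_walk_mh(log_target, x0, n_steps, prop_cov, seed=0):
    # Baseline random-walk Metropolis-Hastings with a fixed Gaussian proposal.
    # KAMH adapts prop_cov to the (possibly strongly nonlinear) target support.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    lp = log_target(x)
    L = np.linalg.cholesky(prop_cov)
    chain = []
    for _ in range(n_steps):
        prop = x + L @ rng.standard_normal(x.size)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        chain.append(x.copy())
    return np.array(chain)
```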

A Test of Relative Similarity For Model Selection in Generative Models

1 code implementation 14 Nov 2015 Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton

Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.

Model Selection

Learning deep kernels for exponential family densities

1 code implementation 20 Nov 2018 Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton

The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.

A Kernel Test of Goodness of Fit

1 code implementation 9 Feb 2016 Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton

Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel.
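
A minimal NumPy sketch of such a V-statistic, assuming a Gaussian base kernel and a user-supplied score function ∇ log p (e.g. `score = lambda X: -X` for a standard normal target); the bandwidth choice and the bootstrap threshold from the paper are omitted.

```python
import numpy as np

def ksd_vstat(X, score, sigma=1.0):
    # V-statistic estimate of the kernel Stein discrepancy: the average of the
    # Stein kernel u_p(x, y) over all sample pairs, where u_p combines the
    # target's log-gradients with a Gaussian kernel k and its derivatives.
    n, d = X.shape
    S = score(X)                                   # n x d: grad log p at each sample
    diff = X[:, None, :] - X[None, :, :]           # pairwise x_i - x_j
    sq = np.sum(diff**2, -1)
    K = np.exp(-sq / (2 * sigma**2))
    gx = -diff / sigma**2 * K[..., None]           # grad_x k(x, y)
    term1 = (S @ S.T) * K                          # s(x)^T s(y) k(x, y)
    term2 = np.einsum('id,ijd->ij', S, -gx)        # s(x)^T grad_y k  (= -grad_x k)
    term3 = np.einsum('ijd,jd->ij', gx, S)         # grad_x k^T s(y)
    term4 = K * (d / sigma**2 - sq / sigma**4)     # trace(grad_x grad_y k)
    return (term1 + term2 + term3 + term4).mean()
```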

Density Estimation

Efficient Conditionally Invariant Representation Learning

1 code implementation 16 Dec 2022 Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch, Arthur Gretton

The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance.
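
A minimal sketch of that precomputation step under illustrative assumptions (a kernel matrix `K_Y` on Y and finite-dimensional features `Phi_Z` of Z, both hypothetical names): one regularized solve, done before representation learning starts.

```python
import numpy as np

def precompute_ridge(K_Y, Phi_Z, lam=1e-3):
    # Kernel ridge regression from Y to features of Z, solved once in advance;
    # alpha lets us predict E[phi(Z) | y] as k_Y(y, Y_train) @ alpha.
    n = K_Y.shape[0]
    return np.linalg.solve(K_Y + n * lam * np.eye(n), Phi_Z)
```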

Fairness regression +1

Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages

1 code implementation 9 Mar 2015 Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó

We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output.

regression

Informative Features for Model Comparison

3 code implementations NeurIPS 2018 Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton

Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models.

Density Estimation in Infinite Dimensional Exponential Families

1 code implementation 12 Dec 2013 Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar

When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$.

Density Estimation

An Adaptive Test of Independence with Analytic Kernel Embeddings

1 code implementation ICML 2017 Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton

The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features).

A Kernel Independence Test for Random Processes

1 code implementation 18 Feb 2014 Kacper Chwialkowski, Arthur Gretton

A new nonparametric approach to the problem of testing the independence of two random processes is developed.

Large-Scale Kernel Methods for Independence Testing

1 code implementation 25 Jun 2016 Qinyi Zhang, Sarah Filippi, Arthur Gretton, Dino Sejdinovic

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions.

Computational Efficiency

Kernelized Wasserstein Natural Gradient

1 code implementation ICLR 2020 Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions.

Exponential Family Estimation via Adversarial Dynamics Embedding

1 code implementation NeurIPS 2019 Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

Learning Deep Features in Instrumental Variable Regression

1 code implementation ICLR 2021 Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton

We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.
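
For orientation, here is the classical linear two-stage procedure that DFIV generalizes, sketched with ridge-regularized least squares; DFIV's contribution is to replace the fixed linear features in both stages with adaptively learned deep features, which this sketch does not do.

```python
import numpy as np

def two_stage_least_squares(Z, T, Y, lam=1e-3):
    # Z: n x dz instruments, T: n x dt treatments, Y: n (or n x dy) outcomes.
    # Stage 1: regress treatment T on instrument Z.
    W1 = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ T)
    T_hat = Z @ W1                        # instrument-predicted treatment
    # Stage 2: regress outcome Y on the stage-1 prediction of T.
    W2 = np.linalg.solve(T_hat.T @ T_hat + lam * np.eye(T.shape[1]),
                         T_hat.T @ Y)
    return W2                             # estimated structural coefficients
```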

regression

Efficient Wasserstein Natural Gradients for Reinforcement Learning

1 code implementation ICLR 2021 Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).

Policy Gradient Methods reinforcement-learning +1

MERLiN: Mixture Effect Recovery in Linear Networks

1 code implementation 3 Dec 2015 Sebastian Weichwald, Moritz Grosse-Wentrup, Arthur Gretton

Causal inference concerns the identification of cause-effect relationships between variables, e.g., establishing whether a stimulus affects activity in a certain brain region.

Causal Inference EEG

Kernel Instrumental Variable Regression

1 code implementation NeurIPS 2019 Rahul Singh, Maneesh Sahani, Arthur Gretton

Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data.

regression

Fast Two-Sample Testing with Analytic Representations of Probability Measures

1 code implementation NeurIPS 2015 Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton

The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.

Two-sample testing Vocal Bursts Valence Prediction

Maximum Mean Discrepancy Gradient Flow

1 code implementation NeurIPS 2019 Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
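
A minimal sketch of one explicit Euler step of this flow with a Gaussian kernel: the source particles X descend the gradient of MMD²(X, Y) while the target sample Y stays fixed. The paper's convergence analysis and its noise-injected variant are not reproduced here.

```python
import numpy as np

def mmd_flow_step(X, Y, sigma=1.0, step=0.1):
    # One gradient-descent step on MMD^2(X, Y) with respect to the particles X.
    n, m = len(X), len(Y)
    def grad_sum(A, B):
        # Row i holds sum_j grad_{a_i} k(a_i, b_j) for a Gaussian kernel.
        diff = A[:, None, :] - B[None, :, :]
        K = np.exp(-np.sum(diff**2, -1) / (2 * sigma**2))
        return -(diff * K[..., None]).sum(1) / sigma**2
    grad = 2 * grad_sum(X, X) / n**2 - 2 * grad_sum(X, Y) / (n * m)
    return X - step * grad
```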

KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support

1 code implementation NeurIPS 2021 Pierre Glaser, Michael Arbel, Arthur Gretton

We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution.

Kernel Exponential Family Estimation via Doubly Dual Embedding

1 code implementation 6 Nov 2018 Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

KSD Aggregated Goodness-of-fit Test

2 code implementations 2 Feb 2022 Antonin Schrab, Benjamin Guedj, Arthur Gretton

KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels.

A Distributional Analogue to the Successor Representation

1 code implementation 13 Feb 2024 Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.

Distributional Reinforcement Learning Model-based Reinforcement Learning +1

Efficient Aggregated Kernel Tests using Incomplete $U$-statistics

4 code implementations 18 Jun 2022 Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton

We derive non-asymptotic uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: this result is novel for tests based on incomplete $U$-statistics, to our knowledge.

Computational Efficiency

Practical Kernel Tests of Conditional Independence

1 code implementation 20 Feb 2024 Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, Arthur Gretton

We describe a data-efficient, kernel-based approach to statistical testing of conditional independence.

A Wild Bootstrap for Degenerate Kernel Tests

1 code implementation NeurIPS 2014 Kacper Chwialkowski, Dino Sejdinovic, Arthur Gretton

A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed.

Benchmarking Time Series +1

Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction

2 code implementations 10 May 2021 Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet

In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting.

Vocal Bursts Valence Prediction

A kernel Stein test of goodness of fit for sequential models

1 code implementation 19 Oct 2022 Jerome Baum, Heishiro Kanagawa, Arthur Gretton

We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences.

Kernel Conditional Exponential Family

1 code implementation 15 Nov 2017 Michael Arbel, Arthur Gretton

A nonparametric family of conditional distributions is introduced, which generalizes conditional exponential families using functional parameters in a suitable RKHS.

A low variance consistent test of relative dependency

1 code implementation 15 Jun 2014 Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko

Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second.

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation 21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

regression Reinforcement Learning (RL)

Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

1 code implementation NeurIPS 2021 Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder.

Off-policy evaluation

Efficient and principled score estimation with Nyström kernel exponential families

1 code implementation 23 May 2017 Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.

Computational Efficiency Denoising +1

A kernel log-rank test of independence for right-censored data

1 code implementation 8 Dec 2019 Tamara Fernandez, Arthur Gretton, David Rindt, Dino Sejdinovic

We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate.

Survival Analysis

Composite Goodness-of-fit Tests with Kernels

1 code implementation 19 Nov 2021 Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue.

Distributional Bellman Operators over Mean Embeddings

1 code implementation 9 Dec 2023 Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.

Atari Games Distributional Reinforcement Learning +1

MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting

1 code implementation NeurIPS 2023 Felix Biggs, Antonin Schrab, Arthur Gretton

We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it.

Two-sample testing

Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models

no code implementations 18 Sep 2014 Yu Nishiyama, Motonobu Kanagawa, Arthur Gretton, Kenji Fukumizu

Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference.

Bayesian Inference

Fast Non-Parametric Tests of Relative Dependency and Similarity

no code implementations 17 Nov 2016 Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

The second test, called the relative test of similarity, is used to determine which of two samples from arbitrary distributions is significantly closer to a reference sample of interest; the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).

Learning Theory for Distribution Regression

1 code implementation 8 Nov 2014 Zoltan Szabo, Bharath Sriperumbudur, Barnabas Poczos, Arthur Gretton

In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions to a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs.
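
A small sketch of that pipeline under illustrative assumptions, with random Fourier features standing in for the exact RKHS embedding analyzed in the paper: each bag of samples becomes one mean-embedding vector, and a single ridge regression maps embeddings to outputs.

```python
import numpy as np

def fit_distribution_regression(bags, y, D=200, sigma=1.0, lam=1e-3, seed=0):
    # bags: list of (n_i x d) sample sets; y: one real-valued label per bag.
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    omega = rng.standard_normal((d, D)) / sigma     # RFF frequencies
    b = rng.uniform(0, 2 * np.pi, D)                # RFF phases
    # Approximate the kernel mean embedding of each bag by averaging features.
    M = np.stack([np.cos(X @ omega + b).mean(0) for X in bags])
    w = np.linalg.solve(M.T @ M + lam * np.eye(D), M.T @ y)
    return w, omega, b   # predict: cos(X_new @ omega + b).mean(0) @ w
```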

Density Estimation Learning Theory +2

GP-select: Accelerating EM using adaptive subspace preselection

no code implementations 10 Dec 2014 Jacquelyn A. Shelton, Jan Gasthaus, Zhenwen Dai, Joerg Luecke, Arthur Gretton

We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large.

Object Localization

A Kernel Test for Three-Variable Interactions with Random Processes

no code implementations 2 Mar 2016 Paul K. Rubenstein, Kacper P. Chwialkowski, Arthur Gretton

The main contributions of this paper are twofold: first, we prove that the Lancaster statistic satisfies the conditions required to estimate the quantiles of the null distribution using the wild bootstrap; second, the manner in which this is proved is novel, simpler than existing methods, and can further be applied to other statistics.

Kernel Mean Shrinkage Estimators

no code implementations 21 May 2014 Krikamol Muandet, Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is central to kernel methods in that it is used by many classical algorithms such as kernel principal component analysis, and it also forms the core inference step of modern kernel methods that rely on embedding probability distributions in RKHSs.

Filtering with State-Observation Examples via Kernel Monte Carlo Filter

no code implementations 17 Dec 2013 Motonobu Kanagawa, Yu Nishiyama, Arthur Gretton, Kenji Fukumizu

In particular, the sampling and resampling procedures are novel in being expressed using kernel mean embeddings, so we theoretically analyze their behaviors.

Two-stage Sampled Learning Theory on Distributions

no code implementations 7 Feb 2014 Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur

To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean.

Density Estimation Learning Theory +3

A simpler condition for consistency of a kernel independence test

no code implementations 25 Jan 2015 Arthur Gretton

The HSIC is defined as the distance between the embedding of the joint distribution, and the embedding of the product of the marginals, in a Reproducing Kernel Hilbert Space (RKHS).

Passing Expectation Propagation Messages with Kernel Methods

no code implementations 2 Jan 2015 Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess

We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message.

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

no code implementations 25 Jul 2012 Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, Kenji Fukumizu

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning.

Two-sample testing

Hilbert Space Embeddings of Predictive State Representations

no code implementations 26 Sep 2013 Byron Boots, Geoffrey Gordon, Arthur Gretton

The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation.

A Kernel Test for Three-Variable Interactions

no code implementations NeurIPS 2013 Dino Sejdinovic, Arthur Gretton, Wicher Bergsma

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space.

Kernel Mean Estimation and Stein's Effect

no code implementations 4 Jun 2013 Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schölkopf

A mean function in reproducing kernel Hilbert space, or a kernel mean, is an important part of many applications ranging from kernel principal component analysis to Hilbert-space embedding of distributions.

Hilbert space embeddings and metrics on probability measures

no code implementations 30 Jul 2009 Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R. G. Lanckriet

First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted characteristic kernels.

Dimensionality Reduction

Antithetic and Monte Carlo kernel estimators for partial rankings

no code implementations 1 Jul 2018 Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

We also present a novel variance reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel.

Multi-Object Tracking Recommendation Systems

On integral probability metrics, φ-divergences and binary classification

no code implementations 18 Jan 2009 Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet

First, to understand the relation between IPMs and $\phi$-divergences, the necessary and sufficient conditions under which these classes intersect are derived: the total variation distance is shown to be the only non-trivial $\phi$-divergence that is also an IPM.

Information Theory

B-test: A Non-parametric, Low Variance Kernel Two-sample Test

no code implementations NeurIPS 2013 Wojciech Zaremba, Arthur Gretton, Matthew Blaschko

We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent.
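
A minimal sketch of the block idea, reusing the `mmd2_unbiased` function sketched under the Demystifying MMD GANs entry above: average unbiased MMD² estimates over disjoint blocks, interpolating between the high-variance linear-time test and the costly quadratic-time one. The block size here is an illustrative choice.

```python
import numpy as np

def b_test(X, Y, block_size=50, sigma=1.0):
    # Average per-block unbiased MMD^2 estimates; under the null the average
    # is asymptotically normal, so a z-score gives a simple test.
    stats = [mmd2_unbiased(X[s:s + block_size], Y[s:s + block_size], sigma)
             for s in range(0, len(X) - block_size + 1, block_size)]
    stats = np.asarray(stats)
    return stats.mean(), stats.std(ddof=1) / np.sqrt(len(stats))
```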

Vocal Bursts Valence Prediction

Kernel Bayes' Rule

no code implementations NeurIPS 2011 Kenji Fukumizu, Le Song, Arthur Gretton

A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces.

Bayesian Inference

A Fast, Consistent Kernel Two-Sample Test

no code implementations NeurIPS 2009 Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur

A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide.

Vocal Bursts Valence Prediction

Nonlinear directed acyclic structure learning with weakly additive noise models

no code implementations NeurIPS 2009 Arthur Gretton, Peter Spirtes, Robert E. Tillman

This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible.

Learning Taxonomies by Dependence Maximization

no code implementations NeurIPS 2008 Matthew Blaschko, Arthur Gretton

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters.

Clustering

Characteristic Kernels on Groups and Semigroups

no code implementations NeurIPS 2008 Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur

Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments.

Kernel Measures of Independence for non-iid Data

no code implementations NeurIPS 2008 Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola

Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion.

BIG-bench Machine Learning Clustering

A Kernel Stein Test for Comparing Latent Variable Models

1 code implementation 1 Jul 2019 Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.

Counterfactual Distribution Regression for Structured Inference

no code implementations 20 Aug 2019 Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton

The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations.

counterfactual regression

Modelling transition dynamics in MDPs with RKHS embeddings

no code implementations 18 Jun 2012 Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation.

Deep Layer-wise Networks Have Closed-Form Weights

no code implementations 15 Jun 2020 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

Multi-class Classification

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

no code implementations NeurIPS 2020 Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.
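
For reference, one SVGD update with a Gaussian kernel, sketched in NumPy; the paper analyzes this existing algorithm rather than proposing it. Here `score` returns ∇ log π, e.g. `lambda X: -X` for a standard normal target.

```python
import numpy as np

def svgd_step(X, score, sigma=1.0, step=0.1):
    # Each particle follows a kernel-weighted average of the scores (drift
    # toward high density) plus a repulsive term that spreads particles out.
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]                  # x_i - x_j
    K = np.exp(-np.sum(diff**2, -1) / (2 * sigma**2))
    repulsion = (diff * K[..., None]).sum(1) / sigma**2   # sum_j grad_{x_j} k(x_j, x_i)
    phi = (K @ score(X) + repulsion) / n
    return X + step * phi
```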

LEMMA

Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data

no code implementations ICML 2020 Tamara Fernandez, Nicolas Rivera, Wenkai Xu, Arthur Gretton

Survival Analysis and Reliability Theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest such as death from a particular disease or failure of a component in a mechanical system.

Survival Analysis

Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves

no code implementations 10 Oct 2020 Rahul Singh, Liyuan Xu, Arthur Gretton

We propose estimators based on kernel ridge regression for nonparametric causal functions such as dose, heterogeneous, and incremental response curves.

counterfactual regression

A Weaker Faithfulness Assumption based on Triple Interactions

no code implementations 27 Oct 2020 Alexander Marx, Arthur Gretton, Joris M. Mooij

One of the core assumptions in causal discovery is the faithfulness assumption, i.e., assuming that independencies found in the data are due to separations in the true causal graph.

Causal Discovery

Kernel Dependence Network

no code implementations 4 Nov 2020 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

We propose a greedy strategy to spectrally train a deep network for multi-class classification.

Multi-class Classification

A kernel test for quasi-independence

no code implementations NeurIPS 2020 Tamara Fernández, Wenkai Xu, Marc Ditzhaus, Arthur Gretton

We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial.

Towards an Understanding of Benign Overfitting in Neural Networks

no code implementations 6 Jun 2021 Zhu Li, Zhi-Hua Zhou, Arthur Gretton

Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory.

Learning Theory

Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves

no code implementations 6 Nov 2021 Rahul Singh, Liyuan Xu, Arthur Gretton

We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression.

Causal Inference counterfactual

Deep Layer-wise Networks Have Closed-Form Weights

no code implementations 1 Feb 2022 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

Importance Weighting Approach in Kernel Bayes' Rule

no code implementations 5 Feb 2022 Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton

We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected kernel posterior features, based on regression from learned neural net or kernel features of the observations.

Discussion of `Multiscale Fisher's Independence Test for Multivariate Dependence'

no code implementations 22 Jun 2022 Antonin Schrab, Wittawat Jitkrittum, Zoltán Szabó, Dino Sejdinovic, Arthur Gretton

We discuss how MultiFIT, the Multiscale Fisher's Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC).

Optimal Rates for Regularized Conditional Mean Embedding Learning

no code implementations 2 Aug 2022 Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton

We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$.

Bayesian Inference

A Neural Mean Embedding Approach for Back-door and Front-door Adjustment

no code implementations 12 Oct 2022 Liyuan Xu, Arthur Gretton

We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment.

counterfactual Density Estimation +1

Maximum Likelihood Learning of Unnormalized Models for Simulation-Based Inference

1 code implementation 26 Oct 2022 Pierre Glaser, Michael Arbel, Samo Hromadka, Arnaud Doucet, Arthur Gretton

We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available.

Controlling Moments with Kernel Stein Discrepancies

no code implementations 10 Nov 2022 Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.

Adapting to Latent Subgroup Shifts via Concepts and Proxies

no code implementations 21 Dec 2022 Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai

We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target.

Unsupervised Domain Adaptation

Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images

1 code implementation 8 Mar 2023 Lisa M. Koch, Christian M. Schürch, Christian F. Baumgartner, Arthur Gretton, Philipp Berens

We formulate subgroup shift detection in the framework of statistical hypothesis testing and show that recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data.

Prediction under Latent Subgroup Shifts with High-Dimensional Observations

no code implementations 23 Jun 2023 William I. Walker, Arthur Gretton, Maneesh Sahani

We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable.

Nonlinear Meta-Learning Can Guarantee Faster Rates

no code implementations 20 Jul 2023 Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe

Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task.

Meta-Learning regression

Kernel Single Proxy Control for Deterministic Confounding

no code implementations 8 Aug 2023 Liyuan Xu, Arthur Gretton

We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder.

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

no code implementations 12 Dec 2023 Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton

We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space.

regression

Proxy Methods for Domain Adaptation

no code implementations 12 Mar 2024 Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton

We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels.

Domain Adaptation
