Search Results for author: Arthur Gretton

Found 105 papers, 55 papers with code

Demystifying MMD GANs

7 code implementations ICLR 2018 Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs.
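
As a concrete reference point, here is a minimal NumPy sketch of the unbiased quadratic-time MMD² estimate with a Gaussian kernel — the quantity an MMD critic is built on. The kernel choice is an illustrative assumption; in an MMD GAN the kernel is applied to learned critic features rather than raw samples, and this is not the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased quadratic-time estimate of MMD^2(P, Q) from samples X ~ P, Y ~ Q;
    # the within-sample diagonal terms are excluded to remove the bias.
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())
```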

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

1 code implementation 14 Nov 2016 Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.

A Linear-Time Kernel Goodness-of-Fit Test

4 code implementations NeurIPS 2017 Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples.

Interpretable Distribution Features with Maximum Testing Power

1 code implementation NeurIPS 2016 Wittawat Jitkrittum, Zoltan Szabo, Kacper Chwialkowski, Arthur Gretton

Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features).
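
A rough NumPy sketch of this idea, assuming a Gaussian kernel and equal sample sizes: evaluate the difference of the two empirical mean embeddings at J chosen locations and normalize by the feature covariance. The paper additionally optimizes the locations for test power, which this sketch omits.

```python
import numpy as np

def me_statistic(X, Y, V, sigma=1.0, reg=1e-5):
    # Difference of mean embeddings of X and Y at J test locations V,
    # combined into a single Hotelling-style statistic.
    def feats(A):
        sq = np.sum(A**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * A @ V.T
        return np.exp(-sq / (2 * sigma**2))      # n x J matrix of k(a_i, v_j)
    diff = feats(X) - feats(Y)                   # assumes len(X) == len(Y)
    z = diff.mean(0)                             # witness values at the locations
    S = np.cov(diff.T) + reg * np.eye(len(V))    # covariance across features
    return len(X) * z @ np.linalg.solve(S, z)
```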

Generalized Energy Based Models

1 code implementation ICLR 2021 Michael Arbel, Liang Zhou, Arthur Gretton

We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base.

Image Generation

BRUNO: A Deep Recurrent Model for Exchangeable Data

3 code implementations NeurIPS 2018 Iryna Korshunova, Jonas Degrave, Ferenc Huszár, Yarin Gal, Arthur Gretton, Joni Dambre

We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations.

Anomaly Detection Bayesian Inference +2

Self-Supervised Learning with Kernel Dependence Maximization

1 code implementation NeurIPS 2021 Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).
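
For reference, a minimal sketch of the classical (biased) empirical HSIC that this line of work builds on; SSL-HSIC itself uses a differentiable estimator inside the training loss, so treat this purely as an illustration of the dependence measure.

```python
import numpy as np

def hsic_biased(K, L):
    # Biased empirical HSIC from kernel matrices K (on X) and L (on Y):
    # HSIC = trace(K H L H) / (n - 1)^2, with H the centering matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2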

Depth Estimation Object Recognition +2

On gradient regularizers for MMD GANs

1 code implementation NeurIPS 2018 Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).

Image Generation

Kernel Adaptive Metropolis-Hastings

1 code implementation 19 Jul 2013 Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton

A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support.
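
To fix ideas, a plain random-walk Metropolis-Hastings step is sketched below; the kernel-adaptive algorithm replaces the fixed proposal covariance with one derived from a kernel embedding of the chain history, which this baseline sketch does not implement.

```python
import numpy as np

def random_walk_mh(log_target, x0, n_steps, prop_cov, seed=0):
    # Baseline random-walk Metropolis-Hastings with a fixed Gaussian proposal.
    # KAMH adapts prop_cov to the (possibly strongly nonlinear) target support.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    lp = log_target(x)
    L = np.linalg.cholesky(prop_cov)
    chain = []
    for _ in range(n_steps):
        prop = x + L @ rng.standard_normal(x.size)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        chain.append(x.copy())
    return np.array(chain)
```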

A Test of Relative Similarity For Model Selection in Generative Models

1 code implementation 14 Nov 2015 Wacha Bounliphone, Eugene Belilovsky, Matthew B. Blaschko, Ioannis Antonoglou, Arthur Gretton

Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches.

Model Selection

Learning deep kernels for exponential family densities

1 code implementation 20 Nov 2018 Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton

The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.

A Kernel Test of Goodness of Fit

1 code implementation 9 Feb 2016 Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton

Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel.
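
A minimal NumPy sketch of such a V-statistic, assuming a Gaussian base kernel and a user-supplied score function ∇ log p (e.g. `score = lambda X: -X` for a standard normal target); the bandwidth choice and the bootstrap threshold from the paper are omitted.

```python
import numpy as np

def ksd_vstat(X, score, sigma=1.0):
    # V-statistic estimate of the kernel Stein discrepancy: the average of the
    # Stein kernel u_p(x, y) over all sample pairs, where u_p combines the
    # target's log-gradients with a Gaussian kernel k and its derivatives.
    n, d = X.shape
    S = score(X)                                   # n x d: grad log p at each sample
    diff = X[:, None, :] - X[None, :, :]           # pairwise x_i - x_j
    sq = np.sum(diff**2, -1)
    K = np.exp(-sq / (2 * sigma**2))
    gx = -diff / sigma**2 * K[..., None]           # grad_x k(x, y)
    term1 = (S @ S.T) * K                          # s(x)^T s(y) k(x, y)
    term2 = np.einsum('id,ijd->ij', S, -gx)        # s(x)^T grad_y k  (= -grad_x k)
    term3 = np.einsum('ijd,jd->ij', gx, S)         # grad_x k^T s(y)
    term4 = K * (d / sigma**2 - sq / sigma**4)     # trace(grad_x grad_y k)
    return (term1 + term2 + term3 + term4).mean()
```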

Density Estimation

Efficient Conditionally Invariant Representation Learning

1 code implementation 16 Dec 2022 Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch, Arthur Gretton

The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance.
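
A minimal sketch of that precomputation step under illustrative assumptions (a kernel matrix `K_Y` on Y and finite-dimensional features `Phi_Z` of Z, both hypothetical names): one regularized solve, done before representation learning starts.

```python
import numpy as np

def precompute_ridge(K_Y, Phi_Z, lam=1e-3):
    # Kernel ridge regression from Y to features of Z, solved once in advance;
    # alpha lets us predict E[phi(Z) | y] as k_Y(y, Y_train) @ alpha.
    n = K_Y.shape[0]
    return np.linalg.solve(K_Y + n * lam * np.eye(n), Phi_Z)
```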

Fairness regression +1

Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages

1 code implementation 9 Mar 2015 Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó

We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output.

regression

Informative Features for Model Comparison

3 code implementations NeurIPS 2018 Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton

Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models.

Density Estimation in Infinite Dimensional Exponential Families

1 code implementation 12 Dec 2013 Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar

When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$.

Density Estimation

An Adaptive Test of Independence with Analytic Kernel Embeddings

1 code implementation ICML 2017 Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton

The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features).

A Kernel Independence Test for Random Processes

1 code implementation 18 Feb 2014 Kacper Chwialkowski, Arthur Gretton

A new nonparametric approach to the problem of testing the independence of two random processes is developed.

Large-Scale Kernel Methods for Independence Testing

1 code implementation 25 Jun 2016 Qinyi Zhang, Sarah Filippi, Arthur Gretton, Dino Sejdinovic

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions.

Computational Efficiency

Kernelized Wasserstein Natural Gradient

1 code implementation ICLR 2020 Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions.

Exponential Family Estimation via Adversarial Dynamics Embedding

1 code implementation NeurIPS 2019 Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

Learning Deep Features in Instrumental Variable Regression

1 code implementation ICLR 2021 Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton

We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.
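
For orientation, here is the classical linear two-stage procedure that DFIV generalizes, sketched with ridge-regularized least squares; DFIV's contribution is to replace the fixed linear features in both stages with adaptively learned deep features, which this sketch does not do.

```python
import numpy as np

def two_stage_least_squares(Z, T, Y, lam=1e-3):
    # Z: n x dz instruments, T: n x dt treatments, Y: n (or n x dy) outcomes.
    # Stage 1: regress treatment T on instrument Z.
    W1 = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ T)
    T_hat = Z @ W1                        # instrument-predicted treatment
    # Stage 2: regress outcome Y on the stage-1 prediction of T.
    W2 = np.linalg.solve(T_hat.T @ T_hat + lam * np.eye(T.shape[1]),
                         T_hat.T @ Y)
    return W2                             # estimated structural coefficients
```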

regression

Efficient Wasserstein Natural Gradients for Reinforcement Learning

1 code implementation ICLR 2021 Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).

Policy Gradient Methods reinforcement-learning +1

MERLiN: Mixture Effect Recovery in Linear Networks

1 code implementation 3 Dec 2015 Sebastian Weichwald, Moritz Grosse-Wentrup, Arthur Gretton

Causal inference concerns the identification of cause-effect relationships between variables, e.g., establishing whether a stimulus affects activity in a certain brain region.

Causal Inference EEG

Kernel Instrumental Variable Regression

1 code implementation NeurIPS 2019 Rahul Singh, Maneesh Sahani, Arthur Gretton

Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data.

regression

Fast Two-Sample Testing with Analytic Representations of Probability Measures

1 code implementation NeurIPS 2015 Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton

The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.

Two-sample testing Vocal Bursts Valence Prediction

Maximum Mean Discrepancy Gradient Flow

1 code implementation NeurIPS 2019 Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
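
A minimal sketch of one explicit Euler step of this flow with a Gaussian kernel: the source particles X descend the gradient of MMD²(X, Y) while the target sample Y stays fixed. The paper's convergence analysis and its noise-injected variant are not reproduced here.

```python
import numpy as np

def mmd_flow_step(X, Y, sigma=1.0, step=0.1):
    # One gradient-descent step on MMD^2(X, Y) with respect to the particles X.
    n, m = len(X), len(Y)
    def grad_sum(A, B):
        # Row i holds sum_j grad_{a_i} k(a_i, b_j) for a Gaussian kernel.
        diff = A[:, None, :] - B[None, :, :]
        K = np.exp(-np.sum(diff**2, -1) / (2 * sigma**2))
        return -(diff * K[..., None]).sum(1) / sigma**2
    grad = 2 * grad_sum(X, X) / n**2 - 2 * grad_sum(X, Y) / (n * m)
    return X - step * grad
```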

KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support

1 code implementation NeurIPS 2021 Pierre Glaser, Michael Arbel, Arthur Gretton

We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution.

Kernel Exponential Family Estimation via Doubly Dual Embedding

1 code implementation 6 Nov 2018 Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

KSD Aggregated Goodness-of-fit Test

2 code implementations 2 Feb 2022 Antonin Schrab, Benjamin Guedj, Arthur Gretton

KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels.

A Distributional Analogue to the Successor Representation

1 code implementation 13 Feb 2024 Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.

Distributional Reinforcement Learning Model-based Reinforcement Learning +1

Efficient Aggregated Kernel Tests using Incomplete $U$-statistics

4 code implementations 18 Jun 2022 Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton

We derive non-asymptotic uniform separation rates for MMDAggInc and HSICAggInc, and quantify exactly the trade-off between computational efficiency and the attainable rates: this result is novel for tests based on incomplete $U$-statistics, to our knowledge.

Computational Efficiency

Practical Kernel Tests of Conditional Independence

1 code implementation 20 Feb 2024 Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, Arthur Gretton

We describe a data-efficient, kernel-based approach to statistical testing of conditional independence.

A Wild Bootstrap for Degenerate Kernel Tests

1 code implementation NeurIPS 2014 Kacper Chwialkowski, Dino Sejdinovic, Arthur Gretton

A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed.

Benchmarking Time Series +1

Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction

2 code implementations 10 May 2021 Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet

In particular, we provide a unifying view of two-stage and moment restriction approaches for solving this problem in a nonlinear setting.

Vocal Bursts Valence Prediction

A kernel Stein test of goodness of fit for sequential models

1 code implementation 19 Oct 2022 Jerome Baum, Heishiro Kanagawa, Arthur Gretton

We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences.

Kernel Conditional Exponential Family

1 code implementation 15 Nov 2017 Michael Arbel, Arthur Gretton

A nonparametric family of conditional distributions is introduced, which generalizes conditional exponential families using functional parameters in a suitable RKHS.

A low variance consistent test of relative dependency

1 code implementation 15 Jun 2014 Wacha Bounliphone, Arthur Gretton, Arthur Tenenhaus, Matthew Blaschko

Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second.

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation 21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

regression Reinforcement Learning (RL)

Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

1 code implementation NeurIPS 2021 Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder.

Off-policy evaluation

Efficient and principled score estimation with Nyström kernel exponential families

1 code implementation 23 May 2017 Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.

Computational Efficiency Denoising +1

A kernel log-rank test of independence for right-censored data

1 code implementation 8 Dec 2019 Tamara Fernandez, Arthur Gretton, David Rindt, Dino Sejdinovic

We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate.

Survival Analysis

Composite Goodness-of-fit Tests with Kernels

1 code implementation 19 Nov 2021 Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue.

Distributional Bellman Operators over Mean Embeddings

1 code implementation 9 Dec 2023 Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.

Atari Games Distributional Reinforcement Learning +1

MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting

1 code implementation NeurIPS 2023 Felix Biggs, Antonin Schrab, Arthur Gretton

We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it.

Two-sample testing

Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models

no code implementations 18 Sep 2014 Yu Nishiyama, Motonobu Kanagawa, Arthur Gretton, Kenji Fukumizu

Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference.

Bayesian Inference

Fast Non-Parametric Tests of Relative Dependency and Similarity

no code implementations 17 Nov 2016 Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

The second test, called the relative test of similarity, is used to determine which of two samples from arbitrary distributions is significantly closer to a reference sample of interest; the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD).

Learning Theory for Distribution Regression

1 code implementation 8 Nov 2014 Zoltan Szabo, Bharath Sriperumbudur, Barnabas Poczos, Arthur Gretton

In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions to a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs.
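
A small sketch of that pipeline under illustrative assumptions, with random Fourier features standing in for the exact RKHS embedding analyzed in the paper: each bag of samples becomes one mean-embedding vector, and a single ridge regression maps embeddings to outputs.

```python
import numpy as np

def fit_distribution_regression(bags, y, D=200, sigma=1.0, lam=1e-3, seed=0):
    # bags: list of (n_i x d) sample sets; y: one real-valued label per bag.
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    omega = rng.standard_normal((d, D)) / sigma     # RFF frequencies
    b = rng.uniform(0, 2 * np.pi, D)                # RFF phases
    # Approximate the kernel mean embedding of each bag by averaging features.
    M = np.stack([np.cos(X @ omega + b).mean(0) for X in bags])
    w = np.linalg.solve(M.T @ M + lam * np.eye(D), M.T @ y)
    return w, omega, b   # predict: cos(X_new @ omega + b).mean(0) @ w
```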

Density Estimation Learning Theory +2

GP-select: Accelerating EM using adaptive subspace preselection

no code implementations 10 Dec 2014 Jacquelyn A. Shelton, Jan Gasthaus, Zhenwen Dai, Joerg Luecke, Arthur Gretton

We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large.

Object Localization

A Kernel Test for Three-Variable Interactions with Random Processes

no code implementations 2 Mar 2016 Paul K. Rubenstein, Kacper P. Chwialkowski, Arthur Gretton

The main contributions of this paper are twofold: first, we prove that the Lancaster statistic satisfies the conditions required to estimate the quantiles of the null distribution using the wild bootstrap; second, the manner in which this is proved is novel, simpler than existing methods, and can further be applied to other statistics.

Kernel Mean Shrinkage Estimators

no code implementations 21 May 2014 Krikamol Muandet, Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is central to kernel methods in that it is used by many classical algorithms such as kernel principal component analysis, and it also forms the core inference step of modern kernel methods that rely on embedding probability distributions in RKHSs.

Filtering with State-Observation Examples via Kernel Monte Carlo Filter

no code implementations 17 Dec 2013 Motonobu Kanagawa, Yu Nishiyama, Arthur Gretton, Kenji Fukumizu

In particular, the sampling and resampling procedures are novel in being expressed using kernel mean embeddings, so we theoretically analyze their behaviors.

Two-stage Sampled Learning Theory on Distributions

no code implementations 7 Feb 2014 Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur

To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean.

Density Estimation Learning Theory +3

A simpler condition for consistency of a kernel independence test

no code implementations 25 Jan 2015 Arthur Gretton

The HSIC is defined as the distance between the embedding of the joint distribution, and the embedding of the product of the marginals, in a Reproducing Kernel Hilbert Space (RKHS).

Passing Expectation Propagation Messages with Kernel Methods

no code implementations 2 Jan 2015 Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess

We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message.

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

no code implementations 25 Jul 2012 Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, Kenji Fukumizu

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning.

Two-sample testing

Hilbert Space Embeddings of Predictive State Representations

no code implementations 26 Sep 2013 Byron Boots, Geoffrey Gordon, Arthur Gretton

The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation.

A Kernel Test for Three-Variable Interactions

no code implementations NeurIPS 2013 Dino Sejdinovic, Arthur Gretton, Wicher Bergsma

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space.

Kernel Mean Estimation and Stein's Effect

no code implementations 4 Jun 2013 Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schölkopf

A mean function in reproducing kernel Hilbert space, or a kernel mean, is an important part of many applications ranging from kernel principal component analysis to Hilbert-space embedding of distributions.

Hilbert space embeddings and metrics on probability measures

no code implementations 30 Jul 2009 Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R. G. Lanckriet

First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted characteristic kernels.

Dimensionality Reduction

Antithetic and Monte Carlo kernel estimators for partial rankings

no code implementations 1 Jul 2018 Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

We also present a novel variance reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel.

Multi-Object Tracking Recommendation Systems

On integral probability metrics, φ-divergences and binary classification

no code implementations 18 Jan 2009 Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet

First, to understand the relation between IPMs and $\phi$-divergences, the necessary and sufficient conditions under which these classes intersect are derived: the total variation distance is shown to be the only non-trivial $\phi$-divergence that is also an IPM.

Information Theory

B-test: A Non-parametric, Low Variance Kernel Two-sample Test

no code implementations NeurIPS 2013 Wojciech Zaremba, Arthur Gretton, Matthew Blaschko

We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent.
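
A minimal sketch of the block idea, reusing the `mmd2_unbiased` function sketched under the Demystifying MMD GANs entry above: average unbiased MMD² estimates over disjoint blocks, interpolating between the high-variance linear-time test and the costly quadratic-time one. The block size here is an illustrative choice.

```python
import numpy as np

def b_test(X, Y, block_size=50, sigma=1.0):
    # Average per-block unbiased MMD^2 estimates; under the null the average
    # is asymptotically normal, so a z-score gives a simple test.
    stats = [mmd2_unbiased(X[s:s + block_size], Y[s:s + block_size], sigma)
             for s in range(0, len(X) - block_size + 1, block_size)]
    stats = np.asarray(stats)
    return stats.mean(), stats.std(ddof=1) / np.sqrt(len(stats))
```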

Vocal Bursts Valence Prediction

Kernel Bayes' Rule

no code implementations NeurIPS 2011 Kenji Fukumizu, Le Song, Arthur Gretton

A nonparametric kernel-based method for realizing Bayes' rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces.

Bayesian Inference

A Fast, Consistent Kernel Two-Sample Test

no code implementations NeurIPS 2009 Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur

A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide.

Vocal Bursts Valence Prediction

Nonlinear directed acyclic structure learning with weakly additive noise models

no code implementations NeurIPS 2009 Arthur Gretton, Peter Spirtes, Robert E. Tillman

This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible.

Learning Taxonomies by Dependence Maximization

no code implementations NeurIPS 2008 Matthew Blaschko, Arthur Gretton

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters.

Clustering

Characteristic Kernels on Groups and Semigroups

no code implementations NeurIPS 2008 Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur

Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments.

Kernel Measures of Independence for non-iid Data

no code implementations NeurIPS 2008 Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola

Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion.

BIG-bench Machine Learning Clustering

A Kernel Stein Test for Comparing Latent Variable Models

1 code implementation 1 Jul 2019 Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.

Counterfactual Distribution Regression for Structured Inference

no code implementations 20 Aug 2019 Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton

The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations.

counterfactual regression

Modelling transition dynamics in MDPs with RKHS embeddings

no code implementations 18 Jun 2012 Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton

For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation.

Deep Layer-wise Networks Have Closed-Form Weights

no code implementations 15 Jun 2020 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

Multi-class Classification

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

no code implementations NeurIPS 2020 Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$.
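
For reference, one SVGD update with a Gaussian kernel, sketched in NumPy; the paper analyzes this existing algorithm rather than proposing it. Here `score` returns ∇ log π, e.g. `lambda X: -X` for a standard normal target.

```python
import numpy as np

def svgd_step(X, score, sigma=1.0, step=0.1):
    # Each particle follows a kernel-weighted average of the scores (drift
    # toward high density) plus a repulsive term that spreads particles out.
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]                  # x_i - x_j
    K = np.exp(-np.sum(diff**2, -1) / (2 * sigma**2))
    repulsion = (diff * K[..., None]).sum(1) / sigma**2   # sum_j grad_{x_j} k(x_j, x_i)
    phi = (K @ score(X) + repulsion) / n
    return X + step * phi
```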

LEMMA

Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data

no code implementations ICML 2020 Tamara Fernandez, Nicolas Rivera, Wenkai Xu, Arthur Gretton

Survival Analysis and Reliability Theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest such as death from a particular disease or failure of a component in a mechanical system.

Survival Analysis

Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves

no code implementations 10 Oct 2020 Rahul Singh, Liyuan Xu, Arthur Gretton

We propose estimators based on kernel ridge regression for nonparametric causal functions such as dose, heterogeneous, and incremental response curves.

counterfactual regression

A Weaker Faithfulness Assumption based on Triple Interactions

no code implementations 27 Oct 2020 Alexander Marx, Arthur Gretton, Joris M. Mooij

One of the core assumptions in causal discovery is the faithfulness assumption, i.e., assuming that independencies found in the data are due to separations in the true causal graph.

Causal Discovery

Kernel Dependence Network

no code implementations 4 Nov 2020 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

We propose a greedy strategy to spectrally train a deep network for multi-class classification.

Multi-class Classification

A kernel test for quasi-independence

no code implementations NeurIPS 2020 Tamara Fernández, Wenkai Xu, Marc Ditzhaus, Arthur Gretton

We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial.

Towards an Understanding of Benign Overfitting in Neural Networks

no code implementations 6 Jun 2021 Zhu Li, Zhi-Hua Zhou, Arthur Gretton

Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory.

Learning Theory

Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves

no code implementations 6 Nov 2021 Rahul Singh, Liyuan Xu, Arthur Gretton

We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression.

Causal Inference counterfactual

Deep Layer-wise Networks Have Closed-Form Weights

no code implementations 1 Feb 2022 Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP).

Importance Weighting Approach in Kernel Bayes' Rule

no code implementations 5 Feb 2022 Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton

We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected kernel posterior features, based on regression from learned neural net or kernel features of the observations.

Discussion of `Multiscale Fisher's Independence Test for Multivariate Dependence'

no code implementations 22 Jun 2022 Antonin Schrab, Wittawat Jitkrittum, Zoltán Szabó, Dino Sejdinovic, Arthur Gretton

We discuss how MultiFIT, the Multiscale Fisher's Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC).

Optimal Rates for Regularized Conditional Mean Embedding Learning

no code implementations 2 Aug 2022 Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton

We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$.

Bayesian Inference

A Neural Mean Embedding Approach for Back-door and Front-door Adjustment

no code implementations 12 Oct 2022 Liyuan Xu, Arthur Gretton

We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment.

counterfactual Density Estimation +1

Maximum Likelihood Learning of Unnormalized Models for Simulation-Based Inference

1 code implementation 26 Oct 2022 Pierre Glaser, Michael Arbel, Samo Hromadka, Arnaud Doucet, Arthur Gretton

We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available.

Controlling Moments with Kernel Stein Discrepancies

no code implementations 10 Nov 2022 Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.

Adapting to Latent Subgroup Shifts via Concepts and Proxies

no code implementations 21 Dec 2022 Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai

We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target.

Unsupervised Domain Adaptation

Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images

1 code implementation 8 Mar 2023 Lisa M. Koch, Christian M. Schürch, Christian F. Baumgartner, Arthur Gretton, Philipp Berens

We formulate subgroup shift detection in the framework of statistical hypothesis testing and show that recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data.

Prediction under Latent Subgroup Shifts with High-Dimensional Observations

no code implementations 23 Jun 2023 William I. Walker, Arthur Gretton, Maneesh Sahani

We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable.

Nonlinear Meta-Learning Can Guarantee Faster Rates

no code implementations 20 Jul 2023 Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe

Many recent theoretical works on meta-learning aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task.

Meta-Learning regression

Kernel Single Proxy Control for Deterministic Confounding

no code implementations 8 Aug 2023 Liyuan Xu, Arthur Gretton

We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder.

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

no code implementations 12 Dec 2023 Zhu Li, Dimitri Meunier, Mattes Mollenhauer, Arthur Gretton

We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space.

regression

Proxy Methods for Domain Adaptation

no code implementations 12 Mar 2024 Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton

We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels.

Domain Adaptation
