Two-sample testing

76 papers with code • 5 benchmarks • 1 dataset

In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently drawn from a different population. The purpose of the test is to determine whether the difference between the two populations is statistically significant, i.e. whether the samples can plausibly have come from the same distribution. The statistics used in two-sample tests arise in many machine learning problems, such as domain adaptation, covariate shift detection, and the training and evaluation of generative adversarial networks.
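
As a concrete illustration, below is a minimal sketch of a generic two-sample test calibrated by permutation; the statistic (an absolute difference of means) and all names are illustrative, and any other statistic, such as the kernel MMD used by several papers below, can be substituted.

```python
import numpy as np

def diff_in_means(x, y):
    return abs(x.mean() - y.mean())

def permutation_test(x, y, stat=diff_in_means, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # resample under the null
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)            # permutation p-value

rng = np.random.default_rng(1)
p = permutation_test(rng.normal(0.0, 1, 200), rng.normal(0.3, 1, 200))
print(f"p-value: {p:.3f}")  # small p-value: reject "same distribution"
```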

Most implemented papers

A Differentially Private Kernel Two-Sample Test

antoninschrab/dpkernel-paper 1 Aug 2018

As a result, a simple chi-squared test is obtained, where the test statistic depends on the mean and covariance of the empirical differences between the samples, which we perturb for a privacy guarantee.
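
A rough sketch of that construction, assuming 1-D samples of equal size: kernel features at J random test locations yield per-point differences whose mean and covariance are perturbed with Gaussian noise before forming the statistic. The kernel, locations, and noise scales below are placeholders, not the paper's sensitivity-calibrated values.

```python
import numpy as np
from scipy.stats import chi2

def dp_me_test(x, y, locations, gamma=1.0, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Gaussian-kernel features of each sample at the J test locations.
    feat = lambda s: np.exp(-((s[:, None] - locations[None, :]) ** 2) / (2 * gamma**2))
    d = feat(x) - feat(y)               # empirical differences (len(x) == len(y))
    n, J = d.shape
    mean = d.mean(axis=0) + rng.normal(0, noise_scale, J)      # privatized mean
    noise = rng.normal(0, noise_scale, (J, J))
    cov = np.cov(d, rowvar=False) + (noise + noise.T) / 2      # privatized covariance
    cov += 1e-3 * np.eye(J)                                    # keep it invertible
    stat = n * mean @ np.linalg.solve(cov, mean)               # chi-squared-type statistic
    return stat, chi2.sf(stat, df=J)    # asymptotically chi-squared(J) under the null
```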

Scalable and Efficient Hypothesis Testing with Random Forests

tim-coleman/SURFTest 16 Apr 2019

Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods.

Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing

meyerscetbon/l1_two_sample_test 19 Sep 2019

Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well suited to detecting differences between distributions, as they metrize weak convergence.
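
The statistic itself is easy to write down. A minimal sketch, assuming 1-D samples and a Gaussian kernel: evaluate each sample's empirical mean embedding at J random locations and take the $\ell_1$ rather than the usual $\ell_2$ distance between them; the null can then be calibrated by permutation, as in the sketch after the introduction.

```python
import numpy as np

def l1_embedding_stat(x, y, locations, gamma=1.0):
    # Empirical Gaussian-kernel mean embedding, evaluated at the J locations.
    embed = lambda s: np.exp(-((s[:, None] - locations[None, :]) ** 2)
                             / (2 * gamma**2)).mean(axis=0)
    return np.abs(embed(x) - embed(y)).sum()   # l1 distance between embeddings
```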

Decision-Making with Auto-Encoding Variational Bayes

PierreBoyeau/sbVAE NeurIPS 2020

To make decisions based on a model fit with auto-encoding variational Bayes (AEVB), practitioners often let the variational distribution serve as a surrogate for the posterior distribution.
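
A toy illustration of that practice, in which everything (the Gaussian surrogate, the quadratic loss, the decide helper) is a placeholder rather than the paper's method: a decision is made by minimizing Monte Carlo expected loss under q(z|x) in place of the intractable posterior.

```python
import numpy as np

def decide(q_mean, q_std, actions, loss, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(q_mean, q_std, n_samples)      # draws from the surrogate q(z|x)
    risks = [loss(a, z).mean() for a in actions]  # Monte Carlo expected loss
    return actions[int(np.argmin(risks))]

best = decide(q_mean=0.2, q_std=0.5, actions=np.linspace(-1, 1, 21),
              loss=lambda a, z: (a - z) ** 2)
print(best)  # quadratic loss: the decision sits near the surrogate posterior mean
```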

Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting

Mr8ND/ACORE-LFI ICML 2020

In this paper, we present $\texttt{ACORE}$ (Approximate Computation via Odds Ratio Estimation), a frequentist approach to LFI that first formulates the classical likelihood ratio test (LRT) as a parametrized classification problem, and then uses the equivalence of tests and confidence sets to build confidence regions for parameters of interest.
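
The classification step can be sketched in a few lines. Here, assuming a toy 1-D simulator and a fixed reference distribution, a logistic-regression classifier's predicted odds approximate the likelihood ratio between the simulator at theta and the reference; all names and distributions are illustrative stand-ins for the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_odds(theta, simulate, reference, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    x_sim = simulate(theta, n, rng)              # label 1: simulator at theta
    x_ref = reference(n, rng)                    # label 0: reference distribution
    X = np.concatenate([x_sim, x_ref]).reshape(-1, 1)
    y = np.concatenate([np.ones(n), np.zeros(n)])
    clf = LogisticRegression().fit(X, y)
    # decision_function returns log-odds, so exp(.) estimates the odds ratio.
    return lambda x: np.exp(clf.decision_function(np.atleast_1d(x).reshape(-1, 1)))

odds = estimate_odds(theta=0.5,
                     simulate=lambda t, n, rng: rng.normal(t, 1, n),
                     reference=lambda n, rng: rng.normal(0, 2, n))
print(odds(np.array([0.4, 2.0])))   # higher odds where the simulator is more likely
```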

AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity

SJ001/AI-Feynman NeurIPS 2020

We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal, in the sense of having the best accuracy for a given complexity.
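
The Pareto-optimality criterion itself is straightforward: among candidate formulas scored by (complexity, error), keep only those not dominated by a candidate that is both simpler and at least as accurate. A small illustration with made-up candidates:

```python
def pareto_front(candidates):
    """candidates: list of (formula, complexity, error) tuples."""
    front = []
    for name, c, e in candidates:
        dominated = any(c2 <= c and e2 <= e and (c2, e2) != (c, e)
                        for _, c2, e2 in candidates)
        if not dominated:
            front.append((name, c, e))
    return sorted(front, key=lambda t: t[1])

formulas = [("x", 1, 0.90), ("a*x", 3, 0.40), ("a*x+b", 5, 0.38),
            ("a*sin(b*x)", 7, 0.05), ("a*sin(b*x)+c*x**2", 12, 0.05)]
print(pareto_front(formulas))   # the last formula is dropped: same error, more complex
```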

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

asonabend/ESRL NeurIPS 2020

However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context, and lack measures of uncertainty for the learned policy value and its decisions.

Quantifying Statistical Significance of Neural Network-based Image Segmentation by Selective Inference

vonguyenleduy/dnn_segmentation_selective_inference 5 Oct 2020

To overcome this difficulty, we introduce a conditional selective inference (SI) framework -- a new statistical inference framework for data-driven hypotheses that has recently received considerable attention -- to compute exact (non-asymptotic) valid p-values for the segmentation results.
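
The core computation in conditional SI is a truncated-Gaussian tail probability: conditioning on the selection event restricts the Gaussian null of the test statistic to an interval, and the valid p-value is computed on that truncated distribution. A minimal sketch, with the truncation interval given as a placeholder rather than derived from an actual segmentation event:

```python
from scipy.stats import norm

def selective_p_value(z, sigma, lo, hi):
    # P(Z >= z | Z in [lo, hi]) for Z ~ N(0, sigma^2) truncated by selection.
    a, b, t = norm.cdf(lo / sigma), norm.cdf(hi / sigma), norm.cdf(z / sigma)
    return (b - t) / (b - a)

# Observed statistic 1.8; selection forced Z >= 1.5, so the naive p-value
# of about 0.036 becomes a much larger, selection-adjusted 0.54.
print(selective_p_value(z=1.8, sigma=1.0, lo=1.5, hi=float("inf")))
```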

B-tests: Low Variance Kernel Two-Sample Tests

wojzaremba/btest 8 Jul 2013

A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced.
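
The construction averages cheap within-block MMD estimates, which makes the null distribution asymptotically normal and the statistic low-variance. A minimal sketch for 1-D samples of equal length, with an illustrative Gaussian kernel and block size:

```python
import numpy as np
from scipy.stats import norm

def gauss_kernel(a, b, gamma=1.0):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * gamma**2))

def mmd2_unbiased(x, y, gamma=1.0):
    m = len(x)
    kxx, kyy = gauss_kernel(x, x, gamma), gauss_kernel(y, y, gamma)
    np.fill_diagonal(kxx, 0.0)   # drop diagonal terms for an unbiased estimate
    np.fill_diagonal(kyy, 0.0)
    return (kxx.sum() + kyy.sum()) / (m * (m - 1)) - 2 * gauss_kernel(x, y, gamma).mean()

def b_test(x, y, block_size=50, gamma=1.0):
    n_blocks = len(x) // block_size
    stats = np.array([mmd2_unbiased(x[i*block_size:(i+1)*block_size],
                                    y[i*block_size:(i+1)*block_size], gamma)
                      for i in range(n_blocks)])
    z = stats.mean() / (stats.std(ddof=1) / np.sqrt(n_blocks))  # CLT over blocks
    return z, norm.sf(z)   # one-sided: large MMD suggests the distributions differ

rng = np.random.default_rng(0)
z, p = b_test(rng.normal(0.0, 1, 2000), rng.normal(0.2, 1, 2000))
print(f"z = {z:.2f}, p = {p:.4f}")
```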

Short-term plasticity as cause-effect hypothesis testing in distal reward learning

soltoggio/Short-term-plasticity-as-hypothesis-testing-in-distal-reward-learning 4 Feb 2014

Asynchrony, overlaps and delays in sensory-motor signals introduce ambiguity as to which stimuli, actions, and rewards are causally related.