Two-sample testing
76 papers with code • 5 benchmarks • 1 dataset
In statistical hypothesis testing, a two-sample test is performed on data from two random samples, each obtained independently from a different population. Its purpose is to determine whether the difference between the two populations is statistically significant. The statistics used in two-sample tests apply to many machine learning problems, such as domain adaptation, covariate-shift detection, and the evaluation of generative adversarial networks.
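A simple way to see the idea in code is a permutation test: under the null hypothesis that both samples come from the same distribution, the sample labels are exchangeable, so shuffling them gives a null distribution for any test statistic. This is a minimal illustrative sketch using the difference of means as the statistic (real two-sample tests, such as those in the papers below, use richer statistics like MMD or energy distance):

```python
import numpy as np

def permutation_two_sample_test(x, y, n_perm=2000, seed=0):
    """Permutation two-sample test using the absolute difference of
    sample means as an illustrative test statistic."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel under the null: samples are exchangeable
        stat = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += stat >= observed
    # Add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
p_same = permutation_two_sample_test(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
p_diff = permutation_two_sample_test(rng.normal(0, 1, 200), rng.normal(1, 1, 200))
```

With a mean shift of one standard deviation and 200 points per sample, `p_diff` comes out far below conventional significance levels, while `p_same` is a draw from an approximately uniform null distribution.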
Most implemented papers
A Meta-Analysis of the Anomaly Detection Problem
The intended contributions of this article are many; in addition to providing a large publicly available corpus of anomaly detection benchmarks, we provide an ontology for describing anomaly detection contexts, a methodology for controlling various aspects of benchmark creation, guidelines for future experimental design, and a discussion of the many potential pitfalls of trying to measure success in this field.
Sequential Nonparametric Testing with the Law of the Iterated Logarithm
It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required: its stopping time adapts to the unknown difficulty of the problem.
Fast Two-Sample Testing with Analytic Representations of Probability Measures
The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.
On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests
In this work, our central object is the Wasserstein distance, as we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel-based maximum mean discrepancy.
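Two of the univariate statistics mentioned above are available off the shelf in SciPy. The following sketch (illustrative only; the paper's multivariate connections go well beyond these one-dimensional routines) compares two shifted Gaussian samples with the Kolmogorov-Smirnov test and the energy distance:

```python
import numpy as np
from scipy.stats import ks_2samp, energy_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.5, 1.0, 500)

# Kolmogorov-Smirnov: maximum gap between the two empirical CDFs,
# with an associated p-value
ks_stat, p_value = ks_2samp(x, y)

# Energy distance: a univariate member of the energy-statistics family
# (zero if and only if the two distributions coincide)
ed = energy_distance(x, y)
```

For these shifted samples both statistics clearly detect the difference: the KS p-value is far below 0.05 and the energy distance is bounded away from zero.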
Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification
In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness.
A U-statistic Approach to Hypothesis Testing for Structure Discovery in Undirected Graphical Models
For some class of probability distributions, an edge between two variables is present if and only if the corresponding entry in the precision matrix is non-zero.
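For jointly Gaussian variables this precision-matrix characterization is easy to verify numerically. A minimal sketch (assuming a three-variable chain X0 → X1 → X2, so X0 and X2 are dependent but conditionally independent given X1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Chain structure: X0 -> X1 -> X2
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
X = np.stack([x0, x1, x2], axis=1)

# Precision matrix = inverse of the (sample) covariance matrix
precision = np.linalg.inv(np.cov(X, rowvar=False))

# Entry (0, 2) is near zero: X0 and X2 are conditionally independent
# given X1, so the graphical model has no edge between them.
# Entry (0, 1) is far from zero: the edge X0 - X1 is present.
```

In the population, the precision matrix of a Markov chain is exactly tridiagonal; with a large sample the off-diagonal entry `precision[0, 2]` is close to zero while `precision[0, 1]` is not.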
Efficient Nonparametric Smoothness Estimation
Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators.
Statistical comparison of classifiers through Bayesian hierarchical modelling
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (NHST).
Priv'IT: Private and Sample Efficient Identity Testing
We develop differentially private hypothesis testing methods for the small sample regime.
Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses.