Two-sample testing
76 papers with code • 5 benchmarks • 1 dataset
In statistical hypothesis testing, a two-sample test is performed on data from two random samples, each obtained independently from a different population. Its purpose is to determine whether the difference between the two populations is statistically significant. The statistics used in two-sample tests apply to many machine learning problems, such as domain adaptation, covariate-shift detection, and the evaluation of generative adversarial networks.
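A simple way to see the idea in code is a permutation test: under the null hypothesis that both samples come from the same distribution, the sample labels are exchangeable, so shuffling them gives a null distribution for any test statistic. This is a minimal illustrative sketch using the difference of means as the statistic (real two-sample tests, such as those in the papers below, use richer statistics like MMD or energy distance):

```python
import numpy as np

def permutation_two_sample_test(x, y, n_perm=2000, seed=0):
    """Permutation two-sample test using the absolute difference of
    sample means as an illustrative test statistic."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel under the null: samples are exchangeable
        stat = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += stat >= observed
    # Add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
p_same = permutation_two_sample_test(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
p_diff = permutation_two_sample_test(rng.normal(0, 1, 200), rng.normal(1, 1, 200))
```

With a mean shift of one standard deviation and 200 points per sample, `p_diff` comes out far below conventional significance levels, while `p_same` is a draw from an approximately uniform null distribution.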
Most implemented papers
A Meta-Analysis of the Anomaly Detection Problem
The intended contributions of this article are many; in addition to providing a large publicly available corpus of anomaly detection benchmarks, we provide an ontology for describing anomaly detection contexts, a methodology for controlling various aspects of benchmark creation, guidelines for future experimental design, and a discussion of the many potential pitfalls of trying to measure success in this field.
Sequential Nonparametric Testing with the Law of the Iterated Logarithm
It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required: its stopping time adapts to the unknown difficulty of the problem.
Fast Two-Sample Testing with Analytic Representations of Probability Measures
The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests.
On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests
In this work, our central object is the Wasserstein distance, as we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel-based maximum mean discrepancy.
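Two of the univariate statistics mentioned above are available off the shelf in SciPy. The following sketch (illustrative only; the paper's multivariate connections go well beyond these one-dimensional routines) compares two shifted Gaussian samples with the Kolmogorov-Smirnov test and the energy distance:

```python
import numpy as np
from scipy.stats import ks_2samp, energy_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.5, 1.0, 500)

# Kolmogorov-Smirnov: maximum gap between the two empirical CDFs,
# with an associated p-value
ks_stat, p_value = ks_2samp(x, y)

# Energy distance: a univariate member of the energy-statistics family
# (zero if and only if the two distributions coincide)
ed = energy_distance(x, y)
```

For these shifted samples both statistics clearly detect the difference: the KS p-value is far below 0.05 and the energy distance is bounded away from zero.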
Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification
In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness.
A U-statistic Approach to Hypothesis Testing for Structure Discovery in Undirected Graphical Models
For some class of probability distributions, an edge between two variables is present if and only if the corresponding entry in the precision matrix is non-zero.
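For jointly Gaussian variables this precision-matrix characterization is easy to verify numerically. A minimal sketch (assuming a three-variable chain X0 → X1 → X2, so X0 and X2 are dependent but conditionally independent given X1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Chain structure: X0 -> X1 -> X2
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
X = np.stack([x0, x1, x2], axis=1)

# Precision matrix = inverse of the (sample) covariance matrix
precision = np.linalg.inv(np.cov(X, rowvar=False))

# Entry (0, 2) is near zero: X0 and X2 are conditionally independent
# given X1, so the graphical model has no edge between them.
# Entry (0, 1) is far from zero: the edge X0 - X1 is present.
```

In the population, the precision matrix of a Markov chain is exactly tridiagonal; with a large sample the off-diagonal entry `precision[0, 2]` is close to zero while `precision[0, 1]` is not.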
Efficient Nonparametric Smoothness Estimation
Sobolev quantities (norms, inner products, and distances) of probability density functions are important in the theory of nonparametric statistics, but have rarely been used in practice, partly due to a lack of practical estimators.
Statistical comparison of classifiers through Bayesian hierarchical modelling
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (NHST).
Priv'IT: Private and Sample Efficient Identity Testing
We develop differentially private hypothesis testing methods for the small sample regime.
Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses.