5 code implementations • 14 Dec 2020 • Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild.
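As a rough synthetic illustration of this claim (not the WILDS benchmark itself; the shift magnitudes, sample sizes, and scikit-learn model are assumptions), the sketch below trains a classifier on one distribution and evaluates it on progressively shifted test data:

```python
# Minimal synthetic illustration of accuracy loss under distribution shift.
# All parameters (shift size, sample counts, model choice) are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two Gaussian classes; `shift` moves the test-time feature distribution."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0, size=(n, 2))
    return x, y

x_train, y_train = sample(2000, shift=0.0)
clf = LogisticRegression().fit(x_train, y_train)

for shift in [0.0, 1.0, 2.0]:
    x_test, y_test = sample(2000, shift=shift)
    print(f"shift={shift:.1f}  accuracy={clf.score(x_test, y_test):.3f}")
```

Accuracy drops as the test distribution moves away from the training distribution, even though the labeling rule itself never changes.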
A pervasive issue in statistical hypothesis testing is that the reported $p$-values are biased downward by data "peeking" -- the practice of reporting only progressively extreme values of the test statistic as more data samples are collected.
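A small simulation (an illustrative assumption-laden sketch, not the paper's correction) shows how reporting the smallest $p$-value seen while repeatedly peeking at a growing sample inflates the false-positive rate under the null:

```python
# Simulate "peeking": under H0, repeatedly test as data accumulate and
# report the smallest p-value seen so far.  Parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_max, peek_every, n_trials = 0.05, 1000, 50, 2000

reject_final, reject_peeking = 0, 0
for _ in range(n_trials):
    x = rng.normal(size=n_max)                 # H0 is true: zero-mean data
    min_p = 1.0
    for n in range(peek_every, n_max + 1, peek_every):
        min_p = min(min_p, stats.ttest_1samp(x[:n], 0.0).pvalue)
    reject_final += stats.ttest_1samp(x, 0.0).pvalue < alpha
    reject_peeking += min_p < alpha

print("false-positive rate, single look :", reject_final / n_trials)
print("false-positive rate, with peeking:", reject_peeking / n_trials)
```

The single-look rate stays near the nominal $\alpha = 0.05$, while the peeking rate is substantially higher.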
We show an extension of Sanov's theorem on large deviations, controlling the tail probabilities of i.i.d.
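For context, the classical finite-alphabet Sanov theorem that this line of work extends can be stated as follows; the constants are the textbook method-of-types bound, not the paper's sharpened finite-sample version.

```latex
% Classical Sanov bound (method of types): for i.i.d. samples from P on a
% finite alphabet \mathcal{X}, with \hat{P}_n the empirical distribution,
\Pr\bigl(\hat{P}_n \in A\bigr) \;\le\; (n+1)^{|\mathcal{X}|}\,
  \exp\Bigl(-\,n \inf_{Q \in A} \mathrm{KL}(Q \,\|\, P)\Bigr)
```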
Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related.
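As a toy sketch of why the choice of cost matters (using a one-to-one assignment, i.e. discrete OT with uniform weights and equal sample sizes, rather than the paper's learned cost; the two cost functions below are arbitrary assumptions), different costs can produce different alignments of the same two datasets:

```python
# Discrete OT special case: optimal one-to-one matching between two equal-size
# point sets under a user-chosen cost.  Different costs can change the alignment.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
source = rng.normal(size=(5, 2))
target = rng.normal(loc=3.0, size=(5, 2))

for name, metric in [("squared Euclidean", "sqeuclidean"), ("Manhattan", "cityblock")]:
    cost = cdist(source, target, metric=metric)   # cost[i, j] = c(source_i, target_j)
    rows, cols = linear_sum_assignment(cost)      # minimize total transport cost
    print(name, "matching:", list(zip(rows.tolist(), cols.tolist())),
          "total cost:", round(float(cost[rows, cols].sum()), 3))
```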
We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter.
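A rough sketch of the per-query idea (the stopping rule and its constant $A$ are assumptions here, not the paper's exact criterion): grow $k$ for each query until the neighbors' majority label wins by a margin comparable to $\sqrt{A/k}$.

```python
# Adaptive-k nearest neighbor: for each query, increase k until the majority
# label among the k nearest neighbors exceeds a margin ~ sqrt(A/k).
# The specific rule and constant A are illustrative assumptions.
import numpy as np

def adaptive_knn_predict(X_train, y_train, x_query, A=4.0):
    order = np.argsort(np.linalg.norm(X_train - x_query, axis=1))
    labels = np.where(y_train[order] == 1, 1, -1)        # map labels to +/-1
    for k in range(1, len(order) + 1):
        bias = labels[:k].mean()                         # signed margin of the majority label
        if abs(bias) > np.sqrt(A / k):                   # confident enough: stop growing k
            return int(bias > 0), k
    return int(labels.mean() > 0), len(order)            # fall back to all neighbors

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
pred, k_used = adaptive_knn_predict(X, y, np.array([0.1, 0.0]))
print("prediction:", pred, "k chosen:", k_used)
```

Queries deep inside one class stop at small $k$, while ambiguous queries near the boundary keep growing $k$ before committing.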
In this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an ($f$-)divergence between real and generated samples.
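The standard identity behind this observation (for the original GAN objective, with $p$ the real and $q$ the generated distribution) is that the optimal discriminator's value is an affine function of the Jensen-Shannon divergence:

```latex
% With V(D) = E_{x\sim p}[\log D(x)] + E_{x\sim q}[\log(1 - D(x))],
% the optimal discriminator D^*(x) = p(x) / (p(x) + q(x)) attains
\max_D V(D) \;=\; -\log 4 \;+\; 2\,\mathrm{JS}(p \,\|\, q)
```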
Corresponding samples from the real dataset consist of two distinct photographs of the same subject.
We formulate learning of a binary autoencoder as a biconvex optimization problem which learns from the pairwise correlations between encoded and decoded bits.
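As a generic illustration of the alternating structure of a biconvex problem (this is plain real-valued matrix factorization fit by alternating least squares, ignoring the binary constraint and the paper's pairwise-correlation objective), each step below is an ordinary convex least-squares problem in one factor with the other held fixed:

```python
# Alternating minimization of the biconvex objective min_{U,V} ||X - U V||_F^2:
# convex in U for fixed V, and convex in V for fixed U.  This is only a sketch
# of biconvexity, not the paper's binary-autoencoder formulation.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
k = 5
U = rng.normal(size=(100, k))        # "codes"
V = rng.normal(size=(k, 20))         # "decoder"

for it in range(25):
    U = np.linalg.lstsq(V.T, X.T, rcond=None)[0].T   # convex step in U given V
    V = np.linalg.lstsq(U, X, rcond=None)[0]         # convex step in V given U

print("reconstruction error:", round(float(np.linalg.norm(X - U @ V)), 3))
```

Each alternation can only decrease the objective, which is the property the biconvex formulation exploits.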
We explore the problem of binary classification in machine learning, with a twist -- the classifier is allowed to abstain on any datum, professing ignorance about the true class label without committing to any prediction.
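One simple way to realize abstention (a Chow-style confidence threshold, much cruder than the paper's approach; the threshold and the scikit-learn model are assumptions for illustration) is to predict only when the classifier is sufficiently confident:

```python
# Confidence-threshold abstention: predict only when the classifier's
# confidence exceeds a threshold, otherwise abstain.  Threshold is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
clf = LogisticRegression().fit(X[:800], y[:800])

proba = clf.predict_proba(X[800:])
confidence = proba.max(axis=1)
pred = np.where(confidence >= 0.8, proba.argmax(axis=1), -1)   # -1 = abstain

covered = pred != -1
print("abstention rate:", round(1 - covered.mean(), 3))
print("accuracy on covered points:", round((pred[covered] == y[800:][covered]).mean(), 3))
```

Accuracy on the covered points is typically higher than overall accuracy, at the price of abstaining on the ambiguous ones.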
We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data.
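A hedged stand-in for the setting (a simple error-bound-weighted vote, not the paper's minimax aggregation game over the unlabeled data): predictors with tighter known loss bounds receive larger weights.

```python
# Aggregate binary (+/-1) predictions on unlabeled data by weighting each
# predictor by its known error bound b_i via log-odds weights.  This is a
# heuristic stand-in, not the paper's minimax-optimal aggregation rule.
import numpy as np

rng = np.random.default_rng(6)
n_unlabeled = 1000
error_bounds = np.array([0.10, 0.20, 0.30, 0.40, 0.45])    # assumed known bounds b_i

truth = rng.choice([-1, 1], size=n_unlabeled)
# Simulate each predictor flipping the true label with probability b_i.
preds = np.array([np.where(rng.random(n_unlabeled) < b, -truth, truth)
                  for b in error_bounds])

weights = np.log((1 - error_bounds) / error_bounds)         # log-odds weights
aggregate = np.sign(weights @ preds)

print("best single predictor accuracy:", round((preds[0] == truth).mean(), 3))
print("weighted-vote accuracy        :", round((aggregate == truth).mean(), 3))
```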
We give tight concentration bounds for mixtures of martingales that are simultaneously uniform over (a) mixture distributions, in a PAC-Bayes sense; and (b) all finite times.
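A crude finite-horizon illustration of time-uniformity (a per-time Hoeffding bound combined with a union bound over $t \le T$, which is much looser than the horizon-free, PAC-Bayes-uniform bounds in the paper): with probability at least $1-\delta$, a $\pm 1$ random walk stays inside the stated envelope at every time simultaneously.

```python
# Finite-horizon, union-bound version of a time-uniform martingale bound:
# for a +/-1 random walk M_t, with prob >= 1 - delta,
# |M_t| <= sqrt(2 t ln(2T/delta)) for ALL t <= T simultaneously.
import numpy as np

rng = np.random.default_rng(7)
T, delta, n_runs = 10_000, 0.05, 500
t = np.arange(1, T + 1)
envelope = np.sqrt(2 * t * np.log(2 * T / delta))

violations = 0
for _ in range(n_runs):
    walk = np.cumsum(rng.choice([-1, 1], size=T))
    violations += np.any(np.abs(walk) > envelope)

print(f"empirical violation rate: {violations / n_runs:.3f}  (guarantee: <= {delta})")
```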
It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required -- its stopping time adapts to the unknown difficulty of the problem.
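A minimal sketch of a sequential test with these properties in a toy setting (testing whether a $\pm 1$ stream has mean zero), using a fixed-$\lambda$ exponential supermartingale boundary via Ville's inequality; the constant $\lambda$, the boundary, and the data-generating bias are assumptions, and this boundary is far weaker than the law-of-the-iterated-logarithm boundary the paper uses.

```python
# Anytime-valid sequential test of H0: the +/-1 stream has mean 0, using a
# fixed-lambda exponential supermartingale boundary (Ville's inequality).
# Constant space (one running sum), one pass over the data, adaptive stopping.
import itertools
import numpy as np

def sequential_test(stream, alpha=0.05, lam=0.2, n_max=20_000):
    m = 0.0                                            # running sum M_t
    for t, x in enumerate(stream, start=1):
        m += x
        boundary = lam * t / 2 + np.log(2 / alpha) / lam
        if abs(m) >= boundary:
            return "reject H0", t                      # stop as soon as the boundary is hit
        if t >= n_max:
            break
    return "fail to reject", t

rng = np.random.default_rng(8)
fair   = (rng.choice([-1, 1]) for _ in itertools.count())                 # mean 0
biased = (rng.choice([-1, 1], p=[0.3, 0.7]) for _ in itertools.count())   # mean 0.4

print("fair coin  :", sequential_test(fair))
print("biased coin:", sequential_test(biased))
```

On the biased stream the test stops after a few dozen samples, while on the fair stream it keeps sampling up to the cap, illustrating the adaptive stopping time.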
We give concentration bounds for martingales that are uniform over finite times and extend classical Hoeffding and Bernstein inequalities.
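For reference, the fixed-time Azuma-Hoeffding inequality that these uniform bounds extend (stated for a martingale with increments bounded by $c_i$; the time-uniform versions hold over all $t$ simultaneously, typically at the cost of iterated-logarithm factors):

```latex
% Fixed-time Azuma-Hoeffding: if (M_t) is a martingale with |M_i - M_{i-1}| <= c_i,
\Pr\bigl(|M_t - M_0| \ge \epsilon\bigr) \;\le\;
  2\exp\!\Bigl(-\frac{\epsilon^2}{2\sum_{i=1}^{t} c_i^2}\Bigr)
```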