We solve this problem in a principled manner, by introducing a combinatorial dimension called VCL that characterizes the best $d'$ for which $d'/n$ is a strong minimax lower bound.
We introduce a generalization to the lottery ticket hypothesis in which the notion of "sparsity" is relaxed by choosing an arbitrary basis in the space of parameters.
46 code implementations • • Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
Convolutional Neural Networks (CNNs) are the go-to model for computer vision.
Ranked #18 on Image Classification on OmniBenchmark
We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch even after accounting for simple effects, such as weight scaling.
Furthermore, the predictors are able to rank networks trained on different, unobserved datasets and with different architectures.
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.
We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training.
A common assumption in causal modeling posits that the data is generated by a set of independent mechanisms, and algorithms should aim to recover this structure.
First, releasing (an estimate of) the kernel mean embedding of the data generating random variable instead of the database itself still allows third-parties to construct consistent estimators of a wide class of population statistics.
We consider the problem of learning the functions computing children from parents in a Structural Causal Model once the underlying causal graph has been identified.
We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$.
Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images.
We provide a theoretical foundation for non-parametric estimation of functions of random variables using kernel mean embeddings.
Transductive learning considers a training set of $m$ labeled samples and a test set of $u$ unlabeled samples, with the goal of best labeling that particular test set.
This paper introduces a new complexity measure for transductive learning called Permutational Rademacher Complexity (PRC) and studies its properties.
We pose causal inference as the problem of learning to classify probability distributions.
We show two novel concentration inequalities for suprema of empirical processes when sampling without replacement, which both take the variance of the functions into account.