Although neural networks are conventionally optimized towards zero training loss, it has recently been shown that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test-time generalization.
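For concreteness, the flooding trick amounts to a one-line change to the training objective. A minimal PyTorch sketch, where the flood level `b` is a hyperparameter and the default value below is just a placeholder:

```python
import torch

def flooded_loss(loss: torch.Tensor, b: float = 0.05) -> torch.Tensor:
    # Flooding (Ishida et al. 2020): |L - b| + b equals L whenever
    # L > b, but flips the gradient sign once the training loss
    # would sink below the flood level b, keeping it hovering near b.
    return (loss - b).abs() + b
```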
Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided.
We present Queer in AI as a case study for community-led participatory design in AI.
We show that incorporating Exphormer into the recently proposed GraphGPS framework produces models with competitive empirical results on a wide variety of graph datasets, including state-of-the-art results on three datasets.
Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once; this private summary can then be reused throughout generator training without further privacy loss.
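A minimal sketch of the release-once idea, assuming feature vectors with $\|\phi(x)\|_2 \le 1$ so the mean embedding has bounded sensitivity; calibrating `sigma` to a desired $(\varepsilon, \delta)$ via the Gaussian mechanism is omitted, and all names here are illustrative:

```python
import numpy as np

def private_mean_embedding(Phi: np.ndarray, sigma: float, seed: int = 0):
    # Phi: (n, d) array of feature vectors with ||phi(x)||_2 <= 1.
    # Release the noisy mean embedding once; by the post-processing
    # property of differential privacy, any amount of generator
    # training against it incurs no further privacy loss.
    rng = np.random.default_rng(seed)
    mu = Phi.mean(axis=0)  # empirical mean embedding
    return mu + rng.normal(scale=sigma, size=mu.shape)
```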
The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance.
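A sketch of that one-off step under simple assumptions, with random features standing in for the kernelized features of $Z$ (all names here are hypothetical): the ridge solution has a closed form and can be cached before training begins.

```python
import numpy as np

def fit_ridge_Y_to_phiZ(Y: np.ndarray, Phi_Z: np.ndarray, lam: float = 1e-3):
    # Closed-form ridge regression from Y (n, p) to the feature
    # matrix Phi_Z (n, d): W = (Y^T Y + n*lam*I)^{-1} Y^T Phi_Z.
    n, p = Y.shape
    W = np.linalg.solve(Y.T @ Y + n * lam * np.eye(p), Y.T @ Phi_Z)
    return W  # computed once, in advance; predictions are Y_new @ W
```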
We prove a new generalization bound that shows for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$.
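For reference, the Moreau envelope of a loss $\ell$ at smoothing parameter $\lambda > 0$ is the standard infimal convolution

$$ \ell_\lambda(t) \;=\; \inf_{s \in \mathbb{R}} \left\{ \ell(s) + \frac{1}{2\lambda}\,(t - s)^2 \right\}, $$

a smoothed lower bound on $\ell$ that recovers $\ell$ as $\lambda \to 0$.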
Weakly Supervised Object Detection (WSOD) is a task that detects objects in an image using a model trained only on image-level annotations.
Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute, and more broadly applicable, than infinite-width NTKs.
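Concretely, the eNTK at parameters $\theta$ is $\hat\Theta_\theta(x, x') = \nabla_\theta f_\theta(x)\, \nabla_\theta f_\theta(x')^\top$. A naive sketch for a scalar-output PyTorch model, purely for orientation (it ignores the efficiency considerations such work depends on):

```python
import torch

def entk_entry(model: torch.nn.Module, x1: torch.Tensor, x2: torch.Tensor):
    # Empirical NTK entry for a scalar-output model: the inner
    # product of parameter gradients at x1 and x2.
    def grad_vec(x):
        out = model(x).sum()
        grads = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)
```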
We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically labeled candidate data points.
A wide range of graph generative models have been proposed, necessitating effective methods to evaluate their quality.
Training even moderately sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult: the level of noise required for reasonable levels of privacy is simply too large.
Semi-weakly supervised semantic segmentation (SWSSS) aims to train a model to identify objects in images based on a small number of images with pixel-level labels, and many more images with only image-level labels.
Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks.
We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data.
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width.
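Here the Gaussian width of a set $\mathcal{K} \subseteq \mathbb{R}^d$ is the usual quantity

$$ w(\mathcal{K}) \;=\; \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \Big[ \sup_{x \in \mathcal{K}} \langle g, x \rangle \Big], $$

which measures the effective size of the hypothesis class as seen from a random Gaussian direction.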
We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).
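The dependence measure underlying SSL-HSIC is the Hilbert-Schmidt Independence Criterion. Its standard biased empirical estimator, from kernel matrices $K$ and $L$ computed on the two views, is only a few lines; this sketches the criterion itself, not the full SSL-HSIC method:

```python
import numpy as np

def hsic_biased(K: np.ndarray, L: np.ndarray) -> float:
    # Biased HSIC estimate trace(K H L H) / (n - 1)^2, with H the
    # centering matrix (Gretton et al. 2005).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```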
In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions.
We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM.
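The practical "linear" form in question is the IRMv1 penalty of Arjovsky et al. (2019), which penalizes the squared gradient of each environment's risk with respect to a frozen scalar classifier. A minimal binary-classification sketch:

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # IRMv1: gradient of the environment risk w.r.t. a fixed scalar
    # "classifier" w = 1.0 placed on top of the learned features.
    w = torch.ones(1, requires_grad=True, device=logits.device)
    risk = F.binary_cross_entropy_with_logits(logits * w, y)
    (g,) = torch.autograd.grad(risk, [w], create_graph=True)
    return (g ** 2).sum()
```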
But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
The maximum mean discrepancy (MMD) is a kernel-based distance between probability distributions useful in many applications (Gretton et al. 2012), bearing a simple estimator with pleasing computational and statistical properties.
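That estimator is simple enough to state in full. A sketch of the unbiased squared-MMD estimate with a Gaussian RBF kernel (the bandwidth parameter `gamma` is illustrative):

```python
import numpy as np

def mmd2_unbiased(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0):
    # Unbiased estimator of MMD^2 (Gretton et al. 2012): average
    # within-sample kernel values (off-diagonal terms only) minus
    # twice the cross-sample average.
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2.0 * Kxy.mean())
```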
The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.
We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).
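For orientation, gradient-based critic regularization generally takes the shape of the penalty below, sketched here in the style of the gradient penalty of Gulrajani et al. (2017); the paper's principled regularizer differs in its details.

```python
import torch

def gradient_penalty(critic, real: torch.Tensor, fake: torch.Tensor):
    # Penalize the critic's gradient norm on random interpolates
    # between real and fake samples (shown for (B, d) inputs).
    eps = torch.rand(real.size(0), 1, device=real.device)
    x = (eps * real + (1 - eps) * fake).requires_grad_(True)
    (g,) = torch.autograd.grad(critic(x).sum(), [x], create_graph=True)
    return ((g.norm(2, dim=1) - 1) ** 2).mean()
```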
We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.
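The model class in question is the kernel exponential family: densities of the form

$$ p_f(x) \;=\; \exp\!\big(f(x) - A(f)\big)\, q_0(x), \qquad f \in \mathcal{H}, $$

where $\mathcal{H}$ is a reproducing kernel Hilbert space (so the natural parameter $f$ may be infinite-dimensional), $q_0$ is a base density, and $A(f)$ is the log-normalizer; score matching sidesteps the intractable $A(f)$.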
Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level.
In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples or on features of the samples; and second, to evaluate a generative model, by testing its samples against a reference data set.
We combine fine-grained spatially referenced census data with the vote outcomes from the 2016 US presidential election.
This work develops the first random features for probability density functions whose dot products approximate kernels based on non-Euclidean metrics between densities, allowing estimators using such kernels to scale to large datasets by working in a primal space, without computing large Gram matrices.
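The primal-space mechanism is easiest to see in the classic Euclidean case: random Fourier features (Rahimi & Recht 2007), whose dot products approximate a Gaussian kernel; the paper's features for non-Euclidean metrics on densities play the same role.

```python
import numpy as np

def rff(X: np.ndarray, D: int = 512, gamma: float = 1.0, seed: int = 0):
    # z(x)^T z(y) ~= exp(-gamma * ||x - y||^2): learning with the
    # (n, D) feature matrix avoids the (n, n) Gram matrix entirely.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```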
Most machine learning algorithms, such as those for classification or regression, treat the individual data point as the object of interest.