Search Results for author: Danica J. Sutherland

Found 38 papers, 20 papers with code

Language Model Evolution: An Iterated Learning Perspective

2 code implementations • 4 Apr 2024 • Yi Ren, Shangmin Guo, Linlu Qiu, Bailin Wang, Danica J. Sutherland

With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase.

Language Modelling

Practical Kernel Tests of Conditional Independence

1 code implementation • 20 Feb 2024 • Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, Arthur Gretton

We describe a data-efficient, kernel-based approach to statistical testing of conditional independence.

Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling

no code implementations • 6 Nov 2023 • Wonho Bae, Jing Wang, Danica J. Sutherland

Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided.

Active Learning • Meta-Learning

AdaFlood: Adaptive Flood Regularization

no code implementations • 6 Nov 2023 • Wonho Bae, Yi Ren, Mohamad Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira

Although neural networks are conventionally optimized towards zero training loss, it has been recently learned that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test time generalization.
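
AdaFlood's contribution is an adaptive, per-sample flood level; the underlying flooding objective it builds on is simple enough to sketch. Below is a minimal illustration of the basic constant-level flooding loss of Ishida et al. (2020), not the AdaFlood method itself; the function name and signature are mine.

```python
def flooded_loss(loss, flood_level):
    """Basic flooding regularizer: |L - b| + b (Ishida et al., 2020).

    Once the training loss L drops below the flood level b, the sign of the
    gradient flips, so optimization keeps L hovering near b instead of
    driving it to zero. AdaFlood replaces the single constant b with an
    adaptive, per-sample flood level (not shown here).
    """
    return abs(loss - flood_level) + flood_level
```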

Exphormer: Sparse Transformers for Graphs

1 code implementation • 10 Mar 2023 • Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J. Sutherland, Ali Kemal Sinop

We show that incorporating Exphormer into the recently-proposed GraphGPS framework produces models with competitive empirical results on a wide variety of graph datasets, including state-of-the-art results on three datasets.

Graph Classification • Graph Learning +3

Differentially Private Neural Tangent Kernels for Privacy-Preserving Data Generation

no code implementations • 3 Mar 2023 • Yilin Yang, Kamil Adamczewski, Danica J. Sutherland, Xiaoxiao Li, Mijung Park

Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features it allows us to summarize and privatize the data distribution once, which we can repeatedly use during generator training without further privacy loss.

Privacy Preserving
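
A minimal sketch of the general recipe the abstract describes: compute a finite-dimensional mean embedding of the data once, privatize it with the Gaussian mechanism, and reuse it as a fixed MMD target during generator training. The feature map here is a generic stand-in with unit-norm features, not the paper's neural-tangent-kernel features, and all names are illustrative.

```python
import numpy as np

def dp_mean_embedding(features, epsilon, delta, rng=None):
    # features: (n, d) array of feature vectors phi(x_i) with ||phi(x)||_2 <= 1.
    # Privatize the empirical mean embedding once; by post-processing, it can
    # be reused throughout generator training at no extra privacy cost.
    rng = rng or np.random.default_rng(0)
    n, d = features.shape
    mean = features.mean(axis=0)
    sensitivity = 2.0 / n  # L2 sensitivity of the mean under unit-norm features
    # Classical Gaussian-mechanism calibration for (epsilon, delta)-DP, epsilon <= 1.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean + rng.normal(scale=sigma, size=d)

def mmd2_to_private_target(gen_features, private_mean):
    # Squared MMD between generated samples and the privatized data embedding,
    # computed entirely in the finite-dimensional feature space.
    diff = gen_features.mean(axis=0) - private_mean
    return float(diff @ diff)
```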

How to prepare your task head for finetuning

no code implementations • 11 Feb 2023 • Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland

We identify a significant trend in the effect of changes in this initial energy on the resulting features after fine-tuning.

Efficient Conditionally Invariant Representation Learning

1 code implementation • 16 Dec 2022 • Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch, Arthur Gretton

The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance.

Fairness • regression +1
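
A minimal sketch of the step the abstract highlights: ridge-regress kernelized features of Z onto features of Y and keep the residuals, a computation that depends only on (Y, Z) pairs and so can be done in advance. How these residuals then enter the paper's conditional-invariance regularizer is not reproduced here; the function name and the explicit feature maps are assumptions.

```python
import numpy as np

def residual_z_features(phi_z, psi_y, reg=1e-3):
    # phi_z: (n, d_z) kernelized features of Z; psi_y: (n, d_y) features of Y.
    # Solve the ridge regression from Y-features to Z-features once, in advance,
    # and return the residual part of phi_z that Y does not explain.
    n, d_y = psi_y.shape
    A = psi_y.T @ psi_y + reg * n * np.eye(d_y)
    W = np.linalg.solve(A, psi_y.T @ phi_z)   # (d_y, d_z) ridge coefficients
    return phi_z - psi_y @ W                  # (n, d_z) residual features
```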

MMD-B-Fair: Learning Fair Representations with Statistical Testing

1 code implementation • 15 Nov 2022 • Namrata Deka, Danica J. Sutherland

We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing.

Representation Learning • Two-sample testing

A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models

1 code implementation • 21 Oct 2022 • Lijia Zhou, Frederic Koehler, Pragya Sur, Danica J. Sutherland, Nathan Srebro

We prove a new generalization bound that shows for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$.

LEMMA
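
For reference, the Moreau envelope appearing in this bound is the standard infimal-convolution smoothing of a loss; the sketch below uses one common convention and my own notation, not the paper's exact statement.

```latex
% Moreau envelope of the loss \ell(\cdot, y) at smoothing parameter \lambda > 0:
\ell_\lambda(\hat y, y) \;=\; \inf_{u \in \mathbb{R}}
  \Big\{ \ell(u, y) \;+\; \tfrac{1}{2\lambda}\,(u - \hat y)^2 \Big\}.
% It lower-bounds \ell, recovers \ell as \lambda \to 0, and is smooth in \hat y.
```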

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection

1 code implementation • 16 Aug 2022 • Jinhwan Seo, Wonho Bae, Danica J. Sutherland, Junhyug Noh, Daijin Kim

Weakly Supervised Object Detection (WSOD) is a task that detects objects in an image using a model trained only on image-level annotations.

Contrastive Learning • Object +2

A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel

no code implementations • 25 Jun 2022 • Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland

Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute and applicable more broadly than infinite width NTKs.

Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels

no code implementations • 25 Jun 2022 • Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland

We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points.

Active Learning

Evaluating Graph Generative Models with Contrastively Learned Features

1 code implementation • 13 Jun 2022 • Hamed Shirzad, Kaveh Hassani, Danica J. Sutherland

A wide range of graph generative models have been proposed, necessitating effective methods to evaluate their quality.

Subgraph Counting

Pre-trained Perceptual Features Improve Differentially Private Image Generation

1 code implementation • 25 May 2022 • Fredrik Harder, Milad Jalali Asadabadi, Danica J. Sutherland, Mijung Park

Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large.

Image Generation • Transfer Learning

One Weird Trick to Improve Your Semi-Weakly Supervised Semantic Segmentation Model

no code implementations • 2 May 2022 • Wonho Bae, Junhyug Noh, Milad Jalali Asadabadi, Danica J. Sutherland

Semi-weakly supervised semantic segmentation (SWSSS) aims to train a model to identify objects in images based on a small number of images with pixel-level labels, and many more images with only image-level labels.

Pseudo Label • Segmentation +2

Better Supervisory Signals by Observing Learning Paths

1 code implementation • ICLR 2022 • Yi Ren, Shangmin Guo, Danica J. Sutherland

Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks.

Knowledge Distillation

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

no code implementations • 8 Dec 2021 • Lijia Zhou, Frederic Koehler, Danica J. Sutherland, Nathan Srebro

We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data.

regression

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting

no code implementations • NeurIPS 2021 • Frederic Koehler, Lijia Zhou, Danica J. Sutherland, Nathan Srebro

We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width.

Generalization Bounds • regression

Self-Supervised Learning with Kernel Dependence Maximization

1 code implementation • NeurIPS 2021 • Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC).

Depth Estimation • Object Recognition +2
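
The Hilbert-Schmidt Independence Criterion at the core of SSL-HSIC has a simple plug-in estimator; below is a minimal sketch with plain RBF kernels. The paper's actual training objective (unbiased estimation, random-feature approximations, the self-supervised loss itself) is not reproduced, and the function names are mine.

```python
import numpy as np

def rbf_gram(x, bandwidth=1.0):
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic_biased(x, y, bandwidth=1.0):
    # Biased HSIC estimator (Gretton et al., 2005): (1/n^2) * tr(K H L H),
    # where H = I - (1/n) 1 1^T centers the Gram matrices K and L.
    n = x.shape[0]
    K, L = rbf_gram(x, bandwidth), rbf_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2
```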

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

1 code implementation • NeurIPS 2021 • Feng Liu, Wenkai Xu, Jie Lu, Danica J. Sutherland

In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions.

Two-sample testing • Vocal Bursts Valence Prediction

Does Invariant Risk Minimization Capture Invariance?

no code implementations • 4 Jan 2021 • Pritish Kamath, Akilesh Tangella, Danica J. Sutherland, Nathan Srebro

We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM.

On Uniform Convergence and Low-Norm Interpolation Learning

no code implementations • NeurIPS 2020 • Lijia Zhou, Danica J. Sutherland, Nathan Srebro

But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.
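
For context, the minimal-norm interpolator discussed here is the least Euclidean-norm solution of the interpolation constraints; a sketch of the standard definition in the overparameterized regime, in my notation:

```latex
% Minimum-norm interpolator for data (X, y), X \in \mathbb{R}^{n \times d}, d > n:
\hat w_{\mathrm{MN}} \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d :\ Xw = y} \|w\|_2
\;=\; X^\top (X X^\top)^{-1} y .
% The uniform-convergence argument is over zero-error predictors in a norm ball,
% roughly \{\, w : Xw = y,\ \|w\|_2 \le B \,\} with B comparable to \|\hat w_{\mathrm{MN}}\|_2.
```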

Unbiased estimators for the variance of MMD estimators

no code implementations • 5 Jun 2019 • Danica J. Sutherland, Namrata Deka

The maximum mean discrepancy (MMD) is a kernel-based distance between probability distributions useful in many applications (Gretton et al. 2012), bearing a simple estimator with pleasing computational and statistical properties.

Two-sample testing
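
The "simple estimator" referred to is the standard unbiased estimator of the squared MMD from Gretton et al. (2012); a minimal sketch with an RBF kernel. The paper's own contribution, unbiased estimators of this statistic's variance, is not reproduced here.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimator of MMD^2 (Gretton et al., 2012): the diagonals of the
    # within-sample Gram matrices are excluded, making each term a U-statistic.
    m, n = x.shape[0], y.shape[0]
    Kxx = rbf_kernel(x, x, bandwidth)
    Kyy = rbf_kernel(y, y, bandwidth)
    Kxy = rbf_kernel(x, y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * Kxy.mean())
```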

Learning deep kernels for exponential family densities

1 code implementation • 20 Nov 2018 • Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton

The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching.
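
A sketch of the model class and fitting objective the abstract refers to, in one common formulation and my own notation; the paper's precise estimator and guarantees are not reproduced.

```latex
% Kernel exponential family: the unnormalized log-density is an RKHS function f
% added to a base density q_0; the normalizer of p_f is intractable in general.
p_f(x) \;\propto\; q_0(x)\,\exp\!\big(f(x)\big), \qquad f \in \mathcal{H}_k .
% Score matching fits f without the normalizer, by matching score functions:
J(f) \;=\; \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}
  \big\| \nabla_x \log p_f(x) - \nabla_x \log p_{\mathrm{data}}(x) \big\|_2^2 .
```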

On gradient regularizers for MMD GANs

1 code implementation • NeurIPS 2018 • Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD).

Image Generation

Demystifying MMD GANs

7 code implementations • ICLR 2018 • Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs.

Efficient and principled score estimation with Nyström kernel exponential families

1 code implementation • 23 May 2017 • Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.

Computational Efficiency • Denoising +1

Bayesian Approaches to Distribution Regression

1 code implementation • 11 May 2017 • Ho Chung Leon Law, Danica J. Sutherland, Dino Sejdinovic, Seth Flaxman

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level.

regression

Fixing an error in Caponnetto and de Vito (2007)

no code implementations • 9 Feb 2017 • Danica J. Sutherland

The seminal paper of Caponnetto and de Vito (2007) provides minimax-optimal rates for kernel ridge regression in a very general setting.

regression

Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

1 code implementation • 14 Nov 2016 • Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.

Deep Mean Maps

no code implementations • 13 Nov 2015 • Junier B. Oliva, Danica J. Sutherland, Barnabás Póczos, Jeff Schneider

The use of distributions and high-level features from deep architectures has become commonplace in modern computer vision.

Linear-time Learning on Distributions with Approximate Kernel Embeddings

no code implementations • 24 Sep 2015 • Danica J. Sutherland, Junier B. Oliva, Barnabás Póczos, Jeff Schneider

This work develops the first random features for pdfs whose dot product approximates kernels using these non-Euclidean metrics, allowing estimators using such kernels to scale to large datasets by working in a primal space, without computing large Gram matrices.

BIG-bench Machine Learning
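
The primal-space idea the abstract describes is in the spirit of random Fourier features (Rahimi and Recht, 2007); below is a minimal sketch for an ordinary Gaussian kernel on vectors. The paper's actual features, which approximate kernels built on information-theoretic distances between pdfs, are more involved and not reproduced here.

```python
import numpy as np

def random_fourier_features(x, n_features=256, bandwidth=1.0, seed=0):
    # Returns z(x) such that z(x) @ z(x') approximates the Gaussian kernel
    # exp(-||x - x'||^2 / (2 * bandwidth^2)). Estimators can then work with
    # explicit (primal) features and never form an n x n Gram matrix.
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)
```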

Kernels on Sample Sets via Nonparametric Divergence Estimates

no code implementations • 1 Feb 2012 • Danica J. Sutherland, Liang Xiong, Barnabás Póczos, Jeff Schneider

Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest.

Anomaly Detection • BIG-bench Machine Learning +2
