Search Results for author: Lester Mackey

Found 64 papers, 35 papers with code

Debiased Distribution Compression

no code implementations • 18 Apr 2024 • Lingxiao Li, Raaz Dwivedi, Lester Mackey

Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d.

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

1 code implementation • 28 Nov 2023 • Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks.

Contrastive Learning

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

1 code implementation • 3 Oct 2023 • Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai

In this work, we use a language-model-infused scaffolding program to improve itself.

Code Generation • Language Modelling

Reflections from the Workshop on AI-Assisted Decision Making for Conservation

no code implementations • 17 Jul 2023 • Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe

In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022.

Decision Making

Do Language Models Know When They're Hallucinating References?

no code implementations • 29 May 2023 • Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai

In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hallucination research, due to their frequent and easy-to-discern nature.

Hallucination • Language Modelling • +1

Learning Rate Free Sampling in Constrained Domains

1 code implementation • 24 May 2023 • Louis Sharrock, Lester Mackey, Christopher Nemeth

We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free.

Fairness

Compress Then Test: Powerful Kernel Testing in Near-linear Time

1 code implementation • 14 Jan 2023 • Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey

To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression.

Two-sample testing

Controlling Moments with Kernel Stein Discrepancies

no code implementations • 10 Nov 2022 • Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.

Budget-Constrained Bounds for Mini-Batch Estimation of Optimal Transport

no code implementations • 24 Oct 2022 • David Alvarez-Melis, Nicolò Fusi, Lester Mackey, Tal Wagner

Optimal Transport (OT) is a fundamental tool for comparing probability distributions, but its exact computation remains prohibitive for large datasets.

Targeted Separation and Convergence with Kernel Discrepancies

no code implementations • 26 Sep 2022 • Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference.

Variational Inference

Adaptive Bias Correction for Improved Subseasonal Forecasting

1 code implementation • 21 Sep 2022 • Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Judah Cohen, Miruna Oprescu, Ernest Fraenkel, Lester Mackey

Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation.

Management • Precipitation Forecasting

Scalable Spike-and-Slab

2 code implementations • 4 Apr 2022 • Niloy Biswas, Lester Mackey, Xiao-Li Meng

Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties.

Variable Selection

Gradient Estimation with Discrete Stein Operators

1 code implementation • 19 Feb 2022 • Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems.

Bounding Wasserstein distance with couplings

1 code implementation • AABI Symposium 2022 • Niloy Biswas, Lester Mackey

Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity.

regression

Distribution Compression in Near-linear Time

1 code implementation • ICLR 2022 • Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$.

Generalized Kernel Thinning

1 code implementation • ICLR 2022 • Raaz Dwivedi, Lester Mackey

Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT.

SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking

2 code implementations • NeurIPS 2023 • Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey

To streamline this process and accelerate future development, we introduce SubseasonalClimateUSA, a curated dataset for training and benchmarking subseasonal forecasting models in the United States.

Benchmarking

Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

no code implementations • 25 Aug 2021 • Myra Cheng, Maria De-Arteaga, Lester Mackey, Adam Tauman Kalai

Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race.

Attribute • Decision Making • +1

Near-optimal inference in adaptive linear regression

no code implementations • 5 Jul 2021 • Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.

Active Learning • regression • +2

Sampling with Mirrored Stein Operators

2 code implementations • ICLR 2022 • Jiaxin Shi, Chang Liu, Lester Mackey

We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries.

valid

Online Learning with Optimism and Delay

1 code implementation • 13 Jun 2021 • Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.

Benchmarking • Weather Forecasting

Kernel Thinning

1 code implementation • 12 May 2021 • Raaz Dwivedi, Lester Mackey

The maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$.

Initialization and Regularization of Factorized Neural Layers

1 code implementation • ICLR 2021 • Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi

In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks; analogs of the schemes also improve the performance of tensor decomposition techniques.

Knowledge Distillation • Model Compression • +2

Knowledge Distillation as Semiparametric Inference

1 code implementation • ICLR 2021 • Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model.

Knowledge Distillation • Model Compression

Model-specific Data Subsampling with Influence Functions

no code implementations • 20 Oct 2020 • Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances.

BIG-bench Machine Learning • Model Selection

Independent versus truncated finite approximations for Bayesian nonparametric inference

no code implementations • NeurIPS Workshop ICBINB 2020 • Tin D. Nguyen, Jonathan H. Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick

Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown.

Image Denoising

Independent finite approximations for Bayesian nonparametric inference

no code implementations • 22 Sep 2020 • Tin D. Nguyen, Jonathan Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick

We call our construction the automated independent finite approximation (AIFA).

Image Denoising

Cross-validation Confidence Intervals for Test Error

1 code implementation • NeurIPS 2020 • Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm.

valid

Stochastic Stein Discrepancies

1 code implementation • NeurIPS 2020 • Jackson Gorham, Anant Raj, Lester Mackey

Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable.

Open-Ended Question Answering

Metrizing Weak Convergence with Maximum Mean Discrepancies

no code implementations • 16 Jun 2020 • Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey

More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of probability measures if and only if k is continuous and integrally strictly positive definite (i.s.p.d.)

Minimax Estimation of Conditional Moment Models

1 code implementation • NeurIPS 2020 • Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression.

Optimal Thinning of MCMC Output

3 code implementations • AABI Symposium 2021 • Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris J. Oates

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced.

Weighted Meta-Learning

no code implementations • 20 Mar 2020 • Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi

Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples.

Meta-Learning

Approximate Cross-validation: Guarantees for Model Assessment and Selection

1 code implementation • 2 Mar 2020 • Ashia Wilson, Maximilian Kasy, Lester Mackey

Cross-validation (CV) is a popular approach for assessing and selecting predictive models.

Model Selection

Importance Sampling via Local Sensitivity

no code implementations • 4 Nov 2019 • Anant Raj, Cameron Musco, Lester Mackey

Unfortunately, sensitivity sampling is difficult to apply since (1) it is unclear how to efficiently compute the sensitivity scores and (2) the sample size required is often impractically large.

Single Point Transductive Prediction

1 code implementation • ICML 2020 • Nilesh Tripuraneni, Lester Mackey

Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter.

A Kernel Stein Test for Comparing Latent Variable Models

1 code implementation • 1 Jul 2019 • Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.

Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu

In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.

Numerical Integration

Minimum Stein Discrepancy Estimators

no code implementations • NeurIPS 2019 • Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths.

Stein Point Markov Chain Monte Carlo

1 code implementation • 9 May 2019 • Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris J. Oates

Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point.

Bayesian Inference

Model Compression with Generative Adversarial Networks

no code implementations • ICLR 2019 • Ruishan Liu, Nicolo Fusi, Lester Mackey

Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets.

Generative Adversarial Network • Image Classification • +1

Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

1 code implementation • NeurIPS 2019 • Ashia Wilson, Lester Mackey, Andre Wibisono

We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth -- a natural generalization of the standard smoothness assumption on the objective function.

Optimization and Control

Teacher-Student Compression with Generative Adversarial Networks

1 code implementation • ICLR 2019 • Ruishan Liu, Nicolo Fusi, Lester Mackey

Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets.

Generative Adversarial Network • Image Classification • +1

Global Non-convex Optimization with Discretized Diffusions

no code implementations • NeurIPS 2018 • Murat A. Erdogdu, Lester Mackey, Ohad Shamir

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.

Improving Subseasonal Forecasting in the Western U.S. with Machine Learning

2 code implementations • 19 Sep 2018 • Jessica Hwang, Paulo Orenstein, Judah Cohen, Karl Pfeiffer, Lester Mackey

We hope that both our dataset and our methods will help to advance the state of the art in subseasonal forecasting.

BIG-bench Machine Learning • Model Selection • +1

Random Feature Stein Discrepancies

1 code implementation • NeurIPS 2018 • Jonathan H. Huggins, Lester Mackey

Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing.

Bayesian Inference

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

1 code implementation • 31 May 2018 • Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson

We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions.

Classification • General Classification • +1

Stein Points

1 code implementation • ICML 2018 • Wilson Ye Chen, Lester Mackey, Jackson Gorham, François-Xavier Briol, Chris J. Oates

An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$.

Expert identification of visual primitives used by CNNs during mammogram classification

1 code implementation • 13 Mar 2018 • Jimmy Wu, Diondra Peck, Scott Hsieh, Vandana Dialani, Constance D. Lehman, Bolei Zhou, Vasilis Syrgkanis, Lester Mackey, Genevieve Patterson

This work interprets the internal representations of deep neural networks trained for classification of diseased tissue in 2D mammograms.

Classification • General Classification

Accurate Inference for Adaptive Linear Models

1 code implementation • ICML 2018 • Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy

Estimators computed from adaptively collected data do not behave like their non-adaptive brethren.

Time Series • Time Series Analysis

Orthogonal Machine Learning: Power and Limitations

1 code implementation • ICML 2018 • Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate.

2k • BIG-bench Machine Learning • +2

Improving Gibbs Sampler Scan Quality with DoGS

no code implementations • ICML 2017 • Ioannis Mitliagkas, Lester Mackey

The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling.

Image Segmentation • Object Recognition • +2

Measuring Sample Quality with Kernels

no code implementations • ICML 2017 • Jackson Gorham, Lester Mackey

We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.

Two-sample testing

Measuring Sample Quality with Diffusions

no code implementations • 21 Nov 2016 • Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation.

Jet-Images -- Deep Learning Edition

1 code implementation • 16 Nov 2015 • Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman

Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons.

Jet Tagging

Fuzzy Jets

no code implementations • 7 Sep 2015 • Lester Mackey, Benjamin Nachman, Ariel Schwartzman, Conrad Stansbury

Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets.

Clustering • Jet Tagging

Measuring Sample Quality with Stein's Method

no code implementations • NeurIPS 2015 • Jackson Gorham, Lester Mackey

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed.

Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge

no code implementations • 9 Sep 2014 • Lester Mackey, Jordan Bryan, Man Yue Mo

We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics.

BIG-bench Machine Learning • Binary Classification • +2

Corrupted Sensing: Novel Guarantees for Separating Structured Signals

no code implementations • 11 May 2013 • Rina Foygel, Lester Mackey

While an arbitrary signal cannot be recovered in the face of arbitrary corruption, tractable recovery is possible when both signal and corruption are suitably structured.

Distributed Low-rank Subspace Segmentation

no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Clustering • Event Detection • +4

The asymptotics of ranking algorithms

no code implementations • 7 Apr 2012 • John C. Duchi, Lester Mackey, Michael I. Jordan

With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures.

Combinatorial clustering and the beta negative binomial process

no code implementations • 8 Nov 2011 • Tamara Broderick, Lester Mackey, John Paisley, Michael I. Jordan

We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP).

Clustering • Image Segmentation • +2

Distributed Matrix Completion and Robust Factorization

no code implementations • 5 Jul 2011 • Lester Mackey, Ameet Talwalkar, Michael I. Jordan

If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing.

Collaborative Filtering • Distributed Computing • +1

Feature-Weighted Linear Stacking

3 code implementations • 3 Nov 2009 • Joseph Sill, Gabor Takacs, Lester Mackey, David Lin

Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models.

Collaborative Filtering
