Search Results for author: Lester Mackey

Found 64 papers, 35 papers with code

Debiased Distribution Compression

no code implementations 18 Apr 2024 Lingxiao Li, Raaz Dwivedi, Lester Mackey

Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d.

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

1 code implementation 28 Nov 2023 Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks.

Contrastive Learning

Reflections from the Workshop on AI-Assisted Decision Making for Conservation

no code implementations 17 Jul 2023 Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe

In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022.

Decision Making

Do Language Models Know When They're Hallucinating References?

no code implementations 29 May 2023 Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai

In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hallucination research, due to their frequent and easy-to-discern nature.

Hallucination Language Modelling +1

Learning Rate Free Sampling in Constrained Domains

1 code implementation 24 May 2023 Louis Sharrock, Lester Mackey, Christopher Nemeth

We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free.


Compress Then Test: Powerful Kernel Testing in Near-linear Time

1 code implementation 14 Jan 2023 Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey

To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression.

Two-sample testing

Controlling Moments with Kernel Stein Discrepancies

no code implementations 10 Nov 2022 Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.
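For intuition about the quantity being measured, here is a minimal sketch of a V-statistic KSD for a 1-d standard normal target, using the Langevin Stein operator with an inverse multiquadric base kernel. Only the score function (not the normalizing constant) enters the computation. The function name `ksd_imq` and the kernel choice are illustrative, not the paper's estimator:

```python
import numpy as np

def ksd_imq(x, score):
    """V-statistic kernel Stein discrepancy for 1-d samples, using the
    Langevin Stein operator and IMQ kernel k(a,b) = (1 + (a-b)^2)^(-1/2)."""
    x = np.asarray(x, float)
    s = score(x)                       # target score at each sample point
    d = x[:, None] - x[None, :]        # pairwise differences
    q = 1.0 + d ** 2
    k = q ** -0.5                      # kernel values
    dk_dx = -d * q ** -1.5             # d k / d x
    dk_dy = d * q ** -1.5              # d k / d y
    dk_dxdy = q ** -1.5 - 3 * d ** 2 * q ** -2.5
    u = (s[:, None] * s[None, :] * k   # Stein kernel u_p(x_i, x_j)
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + dk_dxdy)
    return np.sqrt(u.mean())

rng = np.random.default_rng(0)
good = rng.normal(size=500)            # matches the N(0,1) target
bad = rng.normal(loc=2.0, size=500)    # mislocated sample
score = lambda x: -x                   # grad log density of N(0,1)
print(ksd_imq(good, score) < ksd_imq(bad, score))  # True: better sample, smaller KSD
```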

Budget-Constrained Bounds for Mini-Batch Estimation of Optimal Transport

no code implementations 24 Oct 2022 David Alvarez-Melis, Nicolò Fusi, Lester Mackey, Tal Wagner

Optimal Transport (OT) is a fundamental tool for comparing probability distributions, but its exact computation remains prohibitive for large datasets.
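The common workaround the paper studies is a mini-batch estimator: average exact OT costs over small random batches. A hedged sketch under assumed names (`exact_ot`, `minibatch_ot`) and 1-d squared-Euclidean cost; the paper bounds the error of such estimators, while this sketch only shows the estimator itself:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def exact_ot(x, y):
    """Exact squared-Euclidean OT cost between two equal-size point sets,
    solved as an assignment problem."""
    cost = (x[:, None] - y[None, :]) ** 2
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

def minibatch_ot(x, y, batch=64, reps=20, rng=None):
    """Average exact OT cost over random mini-batches (biased upward
    relative to the full-sample OT cost)."""
    rng = rng or np.random.default_rng(0)
    vals = [exact_ot(rng.choice(x, batch, replace=False),
                     rng.choice(y, batch, replace=False))
            for _ in range(reps)]
    return float(np.mean(vals))

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = rng.normal(loc=1.0, size=2000)
print(minibatch_ot(x, y))  # roughly the squared mean shift, plus mini-batch bias
```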

Targeted Separation and Convergence with Kernel Discrepancies

no code implementations 26 Sep 2022 Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference.

Variational Inference

Adaptive Bias Correction for Improved Subseasonal Forecasting

1 code implementation 21 Sep 2022 Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Judah Cohen, Miruna Oprescu, Ernest Fraenkel, Lester Mackey

Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation.

Management Precipitation Forecasting

Scalable Spike-and-Slab

2 code implementations 4 Apr 2022 Niloy Biswas, Lester Mackey, Xiao-Li Meng

Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties.

Variable Selection

Gradient Estimation with Discrete Stein Operators

1 code implementation 19 Feb 2022 Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems.
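The baseline that this line of work improves on is the vanilla score-function (REINFORCE) estimator, sketched below for a Bernoulli parameter; the paper's contribution is control variates built from discrete Stein operators that reduce this estimator's variance, which the sketch does not include:

```python
import numpy as np

def reinforce_grad(f, p, n=100000, rng=None):
    """Score-function (REINFORCE) estimate of d/dp E_{x~Bernoulli(p)}[f(x)]."""
    rng = rng or np.random.default_rng(0)
    x = (rng.random(n) < p).astype(float)
    score = x / p - (1 - x) / (1 - p)   # d/dp log of the Bernoulli pmf
    return float(np.mean(f(x) * score))

# For f(x) = x^2 on {0,1}, E[f(x)] = p, so the exact gradient is 1.
g = reinforce_grad(lambda x: x ** 2, p=0.3)
print(round(g, 2))
```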

Bounding Wasserstein distance with couplings

1 code implementation AABI Symposium 2022 Niloy Biswas, Lester Mackey

Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity.


Distribution Compression in Near-linear Time

1 code implementation ICLR 2022 Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$.
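As a point of reference for what "thinning to $\sqrt{n}$ points" means, here is a simple greedy, herding-style selection that picks a small coreset matching the full sample's kernel mean embedding. This is an $\mathcal{O}(n^2)$ illustrative stand-in, not the paper's near-linear-time procedure; `greedy_thin` and the bandwidth are assumptions:

```python
import numpy as np

def greedy_thin(x, m, bw=1.0):
    """Greedily select m of n points whose empirical measure approximates
    the full sample under a Gaussian kernel (kernel-herding style)."""
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bw ** 2))
    target = K.mean(axis=1)        # kernel mean embedding of the full sample
    chosen = []
    running = np.zeros(n)          # summed kernels to already-chosen points
    for t in range(m):
        obj = target - running / (t + 1)
        i = int(np.argmax(obj))    # point that best reduces embedding error
        chosen.append(i)
        running += K[:, i]
    return x[np.asarray(chosen)]

rng = np.random.default_rng(0)
x = rng.normal(size=400)
coreset = greedy_thin(x, m=20)
print(abs(coreset.mean() - x.mean()))  # coreset tracks the sample mean
```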

Generalized Kernel Thinning

1 code implementation ICLR 2022 Raaz Dwivedi, Lester Mackey

Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT.

SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking

2 code implementations NeurIPS 2023 Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey

To streamline this process and accelerate future development, we introduce SubseasonalClimateUSA, a curated dataset for training and benchmarking subseasonal forecasting models in the United States.


Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

no code implementations 25 Aug 2021 Myra Cheng, Maria De-Arteaga, Lester Mackey, Adam Tauman Kalai

Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race.

Attribute Decision Making +1

Near-optimal inference in adaptive linear regression

no code implementations 5 Jul 2021 Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.

Active Learning regression +2

Sampling with Mirrored Stein Operators

2 code implementations ICLR 2022 Jiaxin Shi, Chang Liu, Lester Mackey

We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries.


Online Learning with Optimism and Delay

1 code implementation 13 Jun 2021 Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.

Benchmarking Weather Forecasting

Kernel Thinning

1 code implementation 12 May 2021 Raaz Dwivedi, Lester Mackey

The maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$.

Initialization and Regularization of Factorized Neural Layers

1 code implementation ICLR 2021 Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi

In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks; analogs of the schemes also improve the performance of tensor decomposition techniques.

Knowledge Distillation Model Compression +2

Knowledge Distillation as Semiparametric Inference

1 code implementation ICLR 2021 Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model.

Knowledge Distillation Model Compression
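The objective being analyzed is the usual temperature-smoothed cross-entropy between teacher and student class probabilities. A minimal sketch of that loss (the paper's contribution is the semiparametric analysis and improved estimators, not this formula itself):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-T softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy from the teacher's soft labels to the student's,
    both taken at temperature T."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    return float(-(p_t * log_p_s).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.0]])
aligned = np.array([[4.0, 1.0, 0.0]])       # student matches the teacher
misaligned = np.array([[0.0, 1.0, 4.0]])    # student disagrees
print(distillation_loss(aligned, teacher)
      < distillation_loss(misaligned, teacher))  # True
```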

Model-specific Data Subsampling with Influence Functions

no code implementations 20 Oct 2020 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances.

BIG-bench Machine Learning Model Selection

Independent versus truncated finite approximations for Bayesian nonparametric inference

no code implementations NeurIPS Workshop ICBINB 2020 Tin D. Nguyen, Jonathan H. Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick

Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown.

Image Denoising

Cross-validation Confidence Intervals for Test Error

1 code implementation NeurIPS 2020 Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm.
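A hedged sketch of the kind of interval the CLT licenses: run k-fold CV, then form a normal-approximation interval from the held-out pointwise losses. This uses a naive plug-in standard error, not the paper's consistent variance estimator, and the helper names are assumptions:

```python
import numpy as np

def cv_error_ci(X, y, fit, loss, k=5, rng=None):
    """k-fold CV test-error estimate plus a 95% normal-approximation CI
    built from the held-out pointwise losses."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    losses = np.empty(n)
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(X[train], y[train])
        losses[f] = loss(model, X[f], y[f])   # held-out pointwise losses
    m = losses.mean()
    se = losses.std(ddof=1) / np.sqrt(n)
    return m, (m - 1.96 * se, m + 1.96 * se)

# toy linear regression: least-squares fit, squared-error loss
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
loss = lambda w, X, y: (X @ w - y) ** 2
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=300)
m, (lo, hi) = cv_error_ci(X, y, fit, loss)
print(m, (lo, hi))  # estimate should sit near the noise variance 0.25
```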


Stochastic Stein Discrepancies

1 code implementation NeurIPS 2020 Jackson Gorham, Anant Raj, Lester Mackey

Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable.

Open-Ended Question Answering

Metrizing Weak Convergence with Maximum Mean Discrepancies

no code implementations 16 Jun 2020 Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey

More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of probability measures if and only if k is continuous and integrally strictly positive definite (i.s.p.d.).
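For concreteness, a minimal V-statistic estimate of squared MMD under a Gaussian kernel (which is bounded, continuous, i.s.p.d., and vanishing at infinity, so it falls in the class the theorem covers); the function name and bandwidth are illustrative:

```python
import numpy as np

def mmd2(x, y, bw=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-d
    samples under a Gaussian kernel with bandwidth bw."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bw ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
same = rng.normal(size=500)            # drawn from the same distribution
shifted = rng.normal(loc=1.5, size=500)
print(mmd2(x, same) < mmd2(x, shifted))  # True: MMD detects the shift
```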

Minimax Estimation of Conditional Moment Models

1 code implementation NeurIPS 2020 Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression.

Optimal Thinning of MCMC Output

3 code implementations AABI Symposium 2021 Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris J. Oates

The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced.

Weighted Meta-Learning

no code implementations 20 Mar 2020 Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi

Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples.


Approximate Cross-validation: Guarantees for Model Assessment and Selection

1 code implementation 2 Mar 2020 Ashia Wilson, Maximilian Kasy, Lester Mackey

Cross-validation (CV) is a popular approach for assessing and selecting predictive models.

Model Selection

Importance Sampling via Local Sensitivity

no code implementations 4 Nov 2019 Anant Raj, Cameron Musco, Lester Mackey

Unfortunately, sensitivity sampling is difficult to apply since (1) it is unclear how to efficiently compute the sensitivity scores and (2) the sample size required is often impractically large.

Single Point Transductive Prediction

1 code implementation ICML 2020 Nilesh Tripuraneni, Lester Mackey

Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter.

A Kernel Stein Test for Comparing Latent Variable Models

1 code implementation 1 Jul 2019 Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.

Minimum Stein Discrepancy Estimators

no code implementations NeurIPS 2019 Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths.

Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

no code implementations NeurIPS 2019 Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu

In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.

Numerical Integration

Stein Point Markov Chain Monte Carlo

1 code implementation 9 May 2019 Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris J. Oates

Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point.

Bayesian Inference

Model Compression with Generative Adversarial Networks

no code implementations ICLR 2019 Ruishan Liu, Nicolo Fusi, Lester Mackey

Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets.

Generative Adversarial Network Image Classification +1

Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions

1 code implementation NeurIPS 2019 Ashia Wilson, Lester Mackey, Andre Wibisono

We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth -- a natural generalization of the standard smoothness assumption on the objective function.

Optimization and Control

Teacher-Student Compression with Generative Adversarial Networks

1 code implementation ICLR 2019 Ruishan Liu, Nicolo Fusi, Lester Mackey

Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets.

Generative Adversarial Network Image Classification +1

Global Non-convex Optimization with Discretized Diffusions

no code implementations NeurIPS 2018 Murat A. Erdogdu, Lester Mackey, Ohad Shamir

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.

Random Feature Stein Discrepancies

1 code implementation NeurIPS 2018 Jonathan H. Huggins, Lester Mackey

Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing.

Bayesian Inference

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

1 code implementation 31 May 2018 Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson

We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions.

Classification General Classification +1

Stein Points

1 code implementation ICML 2018 Wilson Ye Chen, Lester Mackey, Jackson Gorham, François-Xavier Briol, Chris J. Oates

An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$.

Orthogonal Machine Learning: Power and Limitations

1 code implementation ICML 2018 Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate.

BIG-bench Machine Learning +2
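A hedged sketch of the double/debiased machine learning recipe in its simplest form: cross-fitted, residual-on-residual (Neyman-orthogonal) estimation of the treatment effect in a partially linear model. The quadratic-feature nuisance learners and the toy data below are assumptions for illustration, not the paper's general setting:

```python
import numpy as np

def dml_theta(x, d, y, k=2, rng=None):
    """Cross-fitted estimate of theta in y = theta*d + g(x) + noise.
    Nuisances E[y|x] and E[d|x] are fit by quadratic-feature least
    squares on held-out folds; theta comes from regressing the outcome
    residuals on the treatment residuals."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    feats = lambda v: np.column_stack([np.ones_like(v), v, v ** 2])
    num = den = 0.0
    for f in folds:
        tr = np.setdiff1d(idx, f)
        # nuisance fits on the complementary folds
        wy = np.linalg.lstsq(feats(x[tr]), y[tr], rcond=None)[0]
        wd = np.linalg.lstsq(feats(x[tr]), d[tr], rcond=None)[0]
        ry = y[f] - feats(x[f]) @ wy   # outcome residual
        rd = d[f] - feats(x[f]) @ wd   # treatment residual
        num += rd @ ry
        den += rd @ rd
    return num / den

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
d = x ** 2 + rng.normal(size=n)                  # treatment confounded by x
y = 2.0 * d + 0.5 * x ** 2 + rng.normal(size=n)  # true effect theta = 2
theta = dml_theta(x, d, y)
print(theta)  # close to 2 despite the confounding
```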

Improving Gibbs Sampler Scan Quality with DoGS

no code implementations ICML 2017 Ioannis Mitliagkas, Lester Mackey

The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling.

Image Segmentation Object Recognition +2

Measuring Sample Quality with Kernels

no code implementations ICML 2017 Jackson Gorham, Lester Mackey

We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.

Two-sample testing

Measuring Sample Quality with Diffusions

no code implementations 21 Nov 2016 Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation.

Jet-Images -- Deep Learning Edition

1 code implementation 16 Nov 2015 Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman

Building on the notion of a particle physics detector as a camera, with the collimated streams of high energy particles, or jets, that it measures serving as images, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons.

Jet Tagging

Fuzzy Jets

no code implementations 7 Sep 2015 Lester Mackey, Benjamin Nachman, Ariel Schwartzman, Conrad Stansbury

Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets.

Clustering Jet Tagging

Measuring Sample Quality with Stein's Method

no code implementations NeurIPS 2015 Jackson Gorham, Lester Mackey

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed.

Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge

no code implementations 9 Sep 2014 Lester Mackey, Jordan Bryan, Man Yue Mo

We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics.

BIG-bench Machine Learning Binary Classification +2

Corrupted Sensing: Novel Guarantees for Separating Structured Signals

no code implementations 11 May 2013 Rina Foygel, Lester Mackey

While an arbitrary signal cannot be recovered in the face of arbitrary corruption, tractable recovery is possible when both signal and corruption are suitably structured.

Distributed Low-rank Subspace Segmentation

no code implementations 20 Apr 2013 Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.

Clustering Event Detection +4

The asymptotics of ranking algorithms

no code implementations 7 Apr 2012 John C. Duchi, Lester Mackey, Michael I. Jordan

With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures.

Combinatorial clustering and the beta negative binomial process

no code implementations 8 Nov 2011 Tamara Broderick, Lester Mackey, John Paisley, Michael I. Jordan

We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP).

Clustering Image Segmentation +2

Distributed Matrix Completion and Robust Factorization

no code implementations 5 Jul 2011 Lester Mackey, Ameet Talwalkar, Michael I. Jordan

If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing.

Collaborative Filtering Distributed Computing +1

Feature-Weighted Linear Stacking

3 code implementations 3 Nov 2009 Joseph Sill, Gabor Takacs, Lester Mackey, David Lin

Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models.

Collaborative Filtering
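A minimal sketch of plain linear stacking, the baseline that feature-weighted linear stacking (FWLS) generalizes by letting the blend weights vary as linear functions of meta-features. The helper name `fit_stack` and the toy base models are assumptions:

```python
import numpy as np

def fit_stack(preds, y):
    """Least-squares blend weights over base-model predictions.
    FWLS would additionally multiply each prediction by meta-features,
    making the effective weights feature-dependent."""
    P = np.column_stack(preds)
    return np.linalg.lstsq(P, y, rcond=None)[0]

rng = np.random.default_rng(0)
truth = rng.normal(size=500)
model_a = truth + rng.normal(scale=1.0, size=500)   # two noisy base models
model_b = truth + rng.normal(scale=1.0, size=500)   # with independent errors
w = fit_stack([model_a, model_b], truth)
blend = w[0] * model_a + w[1] * model_b
mse = lambda p: ((p - truth) ** 2).mean()
print(mse(blend) < min(mse(model_a), mse(model_b)))  # True: blend beats both
```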
