no code implementations • 18 Apr 2024 • Lingxiao Li, Raaz Dwivedi, Lester Mackey
Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling.
1 code implementation • 28 Nov 2023 • Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm
The resulting SatCLIP location encoder efficiently summarizes the characteristics of any given location for convenient use in downstream tasks.
1 code implementation • 3 Oct 2023 • Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
In this work, we use a language-model-infused scaffolding program to improve itself.
no code implementations • 17 Jul 2023 • Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe
In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022.
no code implementations • 29 May 2023 • Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai
In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hallucination research, due to their frequent and easy-to-discern nature.
1 code implementation • 24 May 2023 • Louis Sharrock, Lester Mackey, Christopher Nemeth
We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free.
1 code implementation • 14 Jan 2023 • Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey
To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression.
no code implementations • 10 Nov 2022 • Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant.
no code implementations • 24 Oct 2022 • David Alvarez-Melis, Nicolò Fusi, Lester Mackey, Tal Wagner
Optimal Transport (OT) is a fundamental tool for comparing probability distributions, but its exact computation remains prohibitive for large datasets.
no code implementations • 26 Sep 2022 • Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey
Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference.
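The MMD underlying these applications admits a simple plug-in estimate. As a minimal illustration (not code from any paper listed here; the function names and the scalar Gaussian kernel are illustrative choices), the following Python sketch computes a biased V-statistic estimate of the squared MMD between two samples:

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Identical samples yield an MMD of zero, while well-separated samples yield a large value; the unbiased U-statistic variant simply drops the diagonal terms of the within-sample sums.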
1 code implementation • 21 Sep 2022 • Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Judah Cohen, Miruna Oprescu, Ernest Fraenkel, Lester Mackey
Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation.
2 code implementations • 4 Apr 2022 • Niloy Biswas, Lester Mackey, Xiao-Li Meng
Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties.
1 code implementation • 19 Feb 2022 • Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey
Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems.
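One classic instance of such an estimator is the score-function (REINFORCE) estimator, which rewrites $\nabla_\theta \mathbb{E}_{p_\theta}[f(x)]$ as $\mathbb{E}_{p_\theta}[f(x)\,\nabla_\theta \log p_\theta(x)]$. A minimal Python sketch for a Gaussian location parameter, where $\nabla_\mu \log \mathcal{N}(x;\mu,\sigma^2) = (x-\mu)/\sigma^2$ (the setup and names are illustrative, not from the paper):

```python
import random

def score_function_gradient(f, mu, sigma=1.0, n_samples=100000, seed=0):
    """Monte Carlo score-function (REINFORCE) estimate of
    d/dmu E_{x ~ N(mu, sigma^2)}[f(x)] = E[f(x) * (x - mu) / sigma^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)
        total += f(x) * (x - mu) / sigma ** 2
    return total / n_samples

# Sanity check: for f(x) = x^2 and x ~ N(mu, 1), E[f(x)] = mu^2 + 1,
# so the true gradient with respect to mu is 2 * mu.
```

The estimator is unbiased but often high-variance, which is exactly the kind of shortcoming variance-reduced gradient estimators aim to address.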
1 code implementation • AABI Symposium 2022 • Niloy Biswas, Lester Mackey
Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity.
1 code implementation • ICLR 2022 • Abhishek Shetty, Raaz Dwivedi, Lester Mackey
Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$.
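The kernel thinning machinery behind these guarantees is involved, but the flavor of compressing $n$ points down to a small, low-discrepancy subset can be conveyed with a much simpler greedy herding-style baseline. This is explicitly not the paper's algorithm; the function names and scalar RBF kernel are illustrative:

```python
import math

def rbf(x, y, bw=1.0):
    """Scalar Gaussian (RBF) kernel."""
    return math.exp(-(x - y) ** 2 / (2 * bw ** 2))

def greedy_thin(points, m, kernel=rbf):
    """Greedily pick m of n points to approximate the full sample:
    at each step, add the point maximizing the herding objective
    (mean kernel affinity to the sample, penalized by affinity to
    points already chosen)."""
    n = len(points)
    # mean kernel value of each candidate against the full sample
    mean_k = [sum(kernel(x, y) for y in points) / n for x in points]
    chosen = []
    sums = [0.0] * n  # running sum of kernel values to chosen points
    for t in range(m):
        best = max(range(n), key=lambda i: mean_k[i] - sums[i] / (t + 1))
        chosen.append(points[best])
        for i in range(n):
            sums[i] += kernel(points[i], points[best])
    return chosen
```

On a sample with a tight cluster plus an outlying point, the sketch first picks a cluster center, then diversifies toward the outlier rather than duplicating the cluster.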
1 code implementation • ICLR 2022 • Raaz Dwivedi, Lester Mackey
Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT.
2 code implementations • NeurIPS 2023 • Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey
To streamline this process and accelerate future development, we introduce SubseasonalClimateUSA, a curated dataset for training and benchmarking subseasonal forecasting models in the United States.
no code implementations • 25 Aug 2021 • Myra Cheng, Maria De-Arteaga, Lester Mackey, Adam Tauman Kalai
Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race.
no code implementations • 5 Jul 2021 • Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
2 code implementations • ICLR 2022 • Jiaxin Shi, Chang Liu, Lester Mackey
We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries.
1 code implementation • 13 Jun 2021 • Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey
Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.
1 code implementation • 12 May 2021 • Raaz Dwivedi, Lester Mackey
The maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-1/2} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$.
1 code implementation • ICLR 2021 • Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi
In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks; analogs of the schemes also improve the performance of tensor decomposition techniques.
1 code implementation • ICLR 2021 • Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey
A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model.
no code implementations • 20 Oct 2020 • Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi
Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances.
no code implementations • NeurIPS Workshop ICBINB 2020 • Tin D. Nguyen, Jonathan H. Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick
Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown.
no code implementations • 22 Sep 2020 • Tin D. Nguyen, Jonathan Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick
We call our construction the automated independent finite approximation (AIFA).
1 code implementation • NeurIPS 2020 • Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey
This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm.
1 code implementation • NeurIPS 2020 • Jackson Gorham, Anant Raj, Lester Mackey
Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable.
no code implementations • 16 Jun 2020 • Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey
More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of probability measures if and only if k is continuous and integrally strictly positive definite (i.s.p.d.).
1 code implementation • NeurIPS 2020 • Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis
We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression.
3 code implementations • AABI Symposium 2021 • Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris J. Oates
The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced.
no code implementations • 20 Mar 2020 • Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi
Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples.
1 code implementation • 2 Mar 2020 • Ashia Wilson, Maximilian Kasy, Lester Mackey
Cross-validation (CV) is a popular approach for assessing and selecting predictive models.
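For concreteness, the basic K-fold CV loop can be sketched in a few lines of Python. This is a generic illustrative implementation, not the paper's procedure; `fit`, `predict`, and `loss` are user-supplied callables:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_score(fit, predict, loss, xs, ys, k=5):
    """Average held-out loss over k folds: fit on k-1 folds, score on the rest."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        scores.append(sum(loss(predict(model, xs[i]), ys[i]) for i in fold) / len(fold))
    return sum(scores) / k
```

For example, a constant mean predictor on constant targets achieves zero held-out squared error under this scheme.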
no code implementations • 4 Nov 2019 • Anant Raj, Cameron Musco, Lester Mackey
Unfortunately, sensitivity sampling is difficult to apply since (1) it is unclear how to efficiently compute the sensitivity scores and (2) the sample size required is often impractically large.
1 code implementation • ICML 2020 • Nilesh Tripuraneni, Lester Mackey
Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter.
1 code implementation • 1 Jul 2019 • Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton
We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable.
no code implementations • NeurIPS 2019 • Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey
We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths.
no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu
In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
1 code implementation • 9 May 2019 • Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris J. Oates
Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point.
no code implementations • ICLR 2019 • Ruishan Liu, Nicolo Fusi, Lester Mackey
Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets.
1 code implementation • NeurIPS 2019 • Ashia Wilson, Lester Mackey, Andre Wibisono
We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth -- a natural generalization of the standard smoothness assumption on the objective function.
1 code implementation • ICLR 2019 • Ruishan Liu, Nicolo Fusi, Lester Mackey
Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets.
no code implementations • NeurIPS 2018 • Murat A. Erdogdu, Lester Mackey, Ohad Shamir
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.
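The discretization in question is the unadjusted Langevin algorithm: a gradient descent step plus Gaussian noise with standard deviation $\sqrt{2\eta}$ for step size $\eta$. A minimal Python sketch (the step size, iteration count, and names are illustrative choices, not the paper's settings):

```python
import random

def langevin_step(x, grad_f, step, rng):
    """One Euler (unadjusted Langevin) step: gradient descent on f
    plus Gaussian noise scaled by sqrt(2 * step)."""
    return x - step * grad_f(x) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)

def run_langevin(x0, grad_f, step=0.01, n_steps=20000, seed=0):
    """Run the chain and return the full trajectory."""
    rng = random.Random(seed)
    x = x0
    traj = []
    for _ in range(n_steps):
        x = langevin_step(x, grad_f, step, rng)
        traj.append(x)
    return traj
```

With $f(x) = x^2/2$ (so `grad_f` is the identity), the target is a standard Gaussian, and after burn-in the trajectory's mean and variance are close to 0 and 1 respectively.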
2 code implementations • 19 Sep 2018 • Jessica Hwang, Paulo Orenstein, Judah Cohen, Karl Pfeiffer, Lester Mackey
We hope that both our dataset and our methods will help to advance the state of the art in subseasonal forecasting.
1 code implementation • NeurIPS 2018 • Jonathan H. Huggins, Lester Mackey
Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing.
1 code implementation • 31 May 2018 • Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson
We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions.
1 code implementation • ICML 2018 • Wilson Ye Chen, Lester Mackey, Jackson Gorham, François-Xavier Briol, Chris J. Oates
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$.
1 code implementation • 13 Mar 2018 • Jimmy Wu, Diondra Peck, Scott Hsieh, Vandana Dialani, Constance D. Lehman, Bolei Zhou, Vasilis Syrgkanis, Lester Mackey, Genevieve Patterson
This work interprets the internal representations of deep neural networks trained for classification of diseased tissue in 2D mammograms.
1 code implementation • ICML 2018 • Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy
Estimators computed from adaptively collected data do not behave like their non-adaptive brethren.
1 code implementation • ICML 2018 • Lester Mackey, Vasilis Syrgkanis, Ilias Zadik
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate.
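The core orthogonalization idea can be illustrated with Robinson-style partialling out: residualize both the outcome and the treatment on the confounders, then regress residual on residual. The Python sketch below uses a simulated binary confounder with simple group-mean nuisance estimates, and omits the cross-fitting used in full double machine learning; all names and the simulation are illustrative:

```python
import random

def residual_on_residual(theta=2.0, n=5000, seed=0):
    """Orthogonalized (partialling-out) estimate of a treatment effect:
    residualize outcome Y and treatment D on the confounder X, then
    regress the Y-residuals on the D-residuals."""
    rng = random.Random(seed)
    xs = [rng.random() < 0.5 for _ in range(n)]        # binary confounder
    ds = [float(x) + rng.gauss(0, 1) for x in xs]      # treatment depends on X
    ys = [theta * d + 2.0 * x + rng.gauss(0, 1) for d, x in zip(ds, xs)]

    # nuisance estimates: within-group means of D and Y given X
    def group_mean(vals, group):
        sel = [v for v, x in zip(vals, xs) if x == group]
        return sum(sel) / len(sel)
    d_hat = {g: group_mean(ds, g) for g in (False, True)}
    y_hat = {g: group_mean(ys, g) for g in (False, True)}

    rd = [d - d_hat[x] for d, x in zip(ds, xs)]
    ry = [y - y_hat[x] for y, x in zip(ys, xs)]
    return sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)
```

A naive regression of Y on D alone would be confounded by X; residualizing both variables first removes that bias, recovering the true effect.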
no code implementations • ICML 2017 • Ioannis Mitliagkas, Lester Mackey
The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling.
no code implementations • ICML 2017 • Jackson Gorham, Lester Mackey
We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.
no code implementations • 21 Nov 2016 • Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey
Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation.
1 code implementation • 16 Nov 2015 • Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman
Building on the notion of a particle physics detector as a camera, with the collimated streams of high-energy particles, or jets, that it measures serving as images, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons.
no code implementations • 7 Sep 2015 • Lester Mackey, Benjamin Nachman, Ariel Schwartzman, Conrad Stansbury
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets.
no code implementations • NeurIPS 2015 • Jackson Gorham, Lester Mackey
To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed.
no code implementations • 9 Sep 2014 • Lester Mackey, Jordan Bryan, Man Yue Mo
We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics.
no code implementations • 11 May 2013 • Rina Foygel, Lester Mackey
While an arbitrary signal cannot be recovered in the face of arbitrary corruption, tractable recovery is possible when both signal and corruption are suitably structured.
no code implementations • 20 Apr 2013 • Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data.
no code implementations • 7 Apr 2012 • John C. Duchi, Lester Mackey, Michael I. Jordan
With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures.
no code implementations • 8 Nov 2011 • Tamara Broderick, Lester Mackey, John Paisley, Michael I. Jordan
We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP).
no code implementations • 5 Jul 2011 • Lester Mackey, Ameet Talwalkar, Michael I. Jordan
If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing.
3 code implementations • 3 Nov 2009 • Joseph Sill, Gabor Takacs, Lester Mackey, David Lin
Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models.
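In its simplest form, stacking learns blending weights on held-out predictions. A one-parameter Python sketch (illustrative names; real stackers fit richer meta-models over many base learners):

```python
def blend_weight(preds_a, preds_b, targets, grid=101):
    """Pick the convex combination w*a + (1-w)*b minimizing squared
    error on held-out data (a minimal one-parameter stacker)."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(grid):
        w = i / (grid - 1)
        loss = sum((w * a + (1 - w) * b - t) ** 2
                   for a, b, t in zip(preds_a, preds_b, targets))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

def blend(preds_a, preds_b, w):
    """Apply the learned blend to new base-model predictions."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
```

If one base model is perfect on the holdout, the learned weight concentrates entirely on it; otherwise the blend can outperform either model alone.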