Search Results for author: Larry Wasserman

Found 68 papers, 18 papers with code

Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes

no code implementations 14 Apr 2024 Jin-Hong Du, Zhenghao Zeng, Edward H. Kennedy, Larry Wasserman, Kathryn Roeder

In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual setting of multiple outcomes when the response of each unit is available.

Causal Inference

Double Cross-fit Doubly Robust Estimators: Beyond Series Regression

1 code implementation 22 Mar 2024 Alec McClean, Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman

Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime.

Causal Inference regression

Semi-Supervised U-statistics

no code implementations 29 Feb 2024 Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov

Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming.

Simultaneous inference for generalized linear models with unmeasured confounders

1 code implementation 13 Sep 2023 Jin-Hong Du, Larry Wasserman, Kathryn Roeder

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes.

The Fundamental Limits of Structure-Agnostic Functional Estimation

no code implementations 6 May 2023 Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman

These first-order methods are, however, provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Hölder-type function spaces.

Causal Inference

Feature Importance: A Closer Look at Shapley Values and LOCO

no code implementations 10 Mar 2023 Isabella Verdinelli, Larry Wasserman

We are particularly interested in the effect of correlation between features, which can obscure interpretability.

Feature Correlation Feature Importance

Data fission: splitting a single data point

no code implementations 21 Dec 2021 James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas

Rasines and Young (2022) offer an alternative approach that uses additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian.

Additive models Bayesian Inference

Decorrelated Variable Importance

no code implementations 21 Nov 2021 Isabella Verdinelli, Larry Wasserman

We propose a method for mitigating the effect of correlation by defining a modified version of LOCO.

Universal Inference Meets Random Projections: A Scalable Test for Log-concavity

2 code implementations 17 Nov 2021 Robin Dunn, Aditya Gangrade, Larry Wasserman, Aaditya Ramdas

Shape constraints yield flexible middle grounds between fully nonparametric and fully parametric approaches to modeling distributions of data.

valid

Plugin Estimation of Smooth Optimal Transport Maps

1 code implementation 26 Jul 2021 Tudor Manole, Sivaraman Balakrishnan, Jonathan Niles-Weed, Larry Wasserman

Our work also provides new bounds on the risk of corresponding plugin estimators for the quadratic Wasserstein distance, and we show how this problem relates to that of estimating optimal transport maps using stability arguments for smooth and strongly convex Brenier potentials.

Forest Guided Smoothing

no code implementations 8 Mar 2021 Isabella Verdinelli, Larry Wasserman

We use the output of a random forest to define a family of local smoothers with spatially adaptive bandwidth matrices.

Model-Independent Detection of New Physics Signals Using Interpretable Semi-Supervised Classifier Tests

no code implementations 15 Feb 2021 Purvasha Chakravarti, Mikael Kuusela, Jing Lei, Larry Wasserman

Here we instead investigate a model-independent method that does not make any assumptions about the signal and uses a semi-supervised classifier to detect the presence of the signal in the experimental data.

Applications High Energy Physics - Phenomenology Data Analysis, Statistics and Probability

PLLay: Efficient Topological Layer based on Persistent Landscapes

1 code implementation NeurIPS 2020 Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frederic Chazal, Larry Wasserman

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure.

The huge Package for High-dimensional Undirected Graph Estimation in R

no code implementations 26 Jun 2020 Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

We describe an R package named huge, which provides easy-to-use functions for estimating high-dimensional undirected graphs from data.

Model Selection Vocal Bursts Intensity Prediction

Familywise Error Rate Control by Interactive Unmasking

1 code implementation ICML 2020 Boyan Duan, Aaditya Ramdas, Larry Wasserman

We propose a method for multiple hypothesis testing with familywise error rate (FWER) control, called the i-FWER test.

Methodology

PLLay: Efficient Topological Layer based on Persistence Landscapes

2 code implementations NeurIPS 2020 Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frederic Chazal, Larry Wasserman

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure.

Trend Filtering -- II. Denoising Astronomical Signals with Varying Degrees of Smoothness

4 code implementations 10 Jan 2020 Collin A. Politsch, Jessi Cisewski-Kehe, Rupert A. C. Croft, Larry Wasserman

The remaining studies share broad themes of: (1) estimating observable parameters of light curves and spectra; and (2) constructing observational spectral/light-curve templates.

Instrumentation and Methods for Astrophysics Cosmology and Nongalactic Astrophysics Earth and Planetary Astrophysics Solar and Stellar Astrophysics Applications

Universal Inference

no code implementations 24 Dec 2019 Larry Wasserman, Aaditya Ramdas, Sivaraman Balakrishnan

Constructing tests and confidence sets for such models is notoriously difficult.

valid

Minimax Confidence Intervals for the Sliced Wasserstein Distance

2 code implementations 17 Sep 2019 Tudor Manole, Sivaraman Balakrishnan, Larry Wasserman

To motivate the choice of these classes, we also study minimax rates of estimating a distribution under the Sliced Wasserstein distance.

Uncertainty Quantification

Trend Filtering: A Modern Statistical Tool for Time-Domain Astronomy and Astronomical Spectroscopy

2 code implementations 20 Aug 2019 Collin A. Politsch, Jessi Cisewski-Kehe, Rupert A. C. Croft, Larry Wasserman

The problem of denoising a one-dimensional signal possessing varying degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy.

Instrumentation and Methods for Astrophysics Cosmology and Nongalactic Astrophysics Applications

Cautious Deep Learning

no code implementations 24 May 2018 Yotam Hechtlinger, Barnabás Póczos, Larry Wasserman

Our construction is based on $p(x|y)$ rather than $p(y|x)$ which results in a classifier that is very cautious: it outputs the null set --- meaning "I don't know" --- when the object does not resemble the training examples.

Conformal Prediction

Hypothesis Testing for High-Dimensional Multinomials: A Selective Review

no code implementations 17 Dec 2017 Sivaraman Balakrishnan, Larry Wasserman

The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson.

Two-sample testing Vocal Bursts Intensity Prediction

Hypothesis Testing For Densities and High-Dimensional Multinomials: Sharp Local Minimax Rates

no code implementations 30 Jun 2017 Sivaraman Balakrishnan, Larry Wasserman

In contrast to existing results, we show that the minimax rate and critical testing radius in these settings depend strongly, and in a precise way, on the null distribution being tested; this motivates the study of the (local) minimax rate as a function of the null distribution.

Two-sample testing

Topological Data Analysis

1 code implementation 27 Sep 2016 Larry Wasserman

Topological Data Analysis (TDA) can broadly be described as a collection of data analysis methods that find structure in data.

Methodology

Least Ambiguous Set-Valued Classifiers with Bounded Error Levels

no code implementations 2 Sep 2016 Mauricio Sadinle, Jing Lei, Larry Wasserman

In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label.

General Classification

Finding Singular Features

no code implementations 1 Jun 2016 Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

We present a method for finding high density, low-dimensional structures in noisy point clouds.

Clustering

Statistical Inference for Cluster Trees

no code implementations NeurIPS 2016 Jisu Kim, Yen-Chi Chen, Sivaraman Balakrishnan, Alessandro Rinaldo, Larry Wasserman

A cluster tree provides a highly-interpretable summary of a density function by representing the hierarchy of its high-density clusters.

Distribution-Free Predictive Inference For Regression

5 code implementations 14 Apr 2016 Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman

In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.

Computational Efficiency Prediction Intervals +2

Classification accuracy as a proxy for two sample testing

no code implementations 6 Feb 2016 Ilmun Kim, Aaditya Ramdas, Aarti Singh, Larry Wasserman

We prove two results that hold for all classifiers in any dimension: if a classifier's true error remains $\epsilon$-better than chance for some $\epsilon>0$ as $d, n \to \infty$, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent.

Classification General Classification +2

Minimax Lower Bounds for Linear Independence Testing

no code implementations 23 Jan 2016 Aaditya Ramdas, David Isenberg, Aarti Singh, Larry Wasserman

Linear independence testing is a fundamental information-theoretic and statistical problem that can be posed as follows: given $n$ points $\{(X_i, Y_i)\}^n_{i=1}$ from a $(p+q)$-dimensional multivariate distribution where $X_i \in \mathbb{R}^p$ and $Y_i \in\mathbb{R}^q$, determine whether or not $a^T X$ and $b^T Y$ are uncorrelated for every $a \in \mathbb{R}^p, b\in \mathbb{R}^q$.

Two-sample testing

Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations

no code implementations NeurIPS 2015 Kirthevasan Kandasamy, Akshay Krishnamurthy, Barnabas Poczos, Larry Wasserman, James M. Robins

We propose and analyse estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.

Statistical Analysis of Persistence Intensity Functions

no code implementations 8 Oct 2015 Yen-Chi Chen, Daren Wang, Alessandro Rinaldo, Larry Wasserman

Persistence diagrams are two-dimensional plots that summarize the topological features of functions and are an important part of topological data analysis.

Clustering Topological Data Analysis

Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

no code implementations 4 Aug 2015 Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

We formally characterize the power of popular tests for GDA like the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED) in the high-dimensional MDA regime.

Two-sample testing

Statistical Inference using the Morse-Smale Complex

1 code implementation 29 Jun 2015 Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman

The Morse-Smale complex of a function $f$ decomposes the sample space into cells where $f$ is increasing or decreasing.

Clustering Density Estimation +1

Optimal Ridge Detection using Coverage Risk

no code implementations NeurIPS 2015 Yen-Chi Chen, Christopher R. Genovese, Shirley Ho, Larry Wasserman

We introduce the concept of coverage risk as an error measure for density ridge estimation.

An Analysis of Active Learning With Uniform Feature Noise

no code implementations 15 May 2015 Aaditya Ramdas, Barnabas Poczos, Aarti Singh, Larry Wasserman

For larger $\sigma$, the \textit{unflattening} of the regression function on convolution with uniform noise, along with its local antisymmetry around the threshold, together yield a behaviour where noise \textit{appears} to be beneficial.

Active Learning Binary Classification +1

Robust Topological Inference: Distance To a Measure and Kernel Distance

2 code implementations 22 Dec 2014 Frédéric Chazal, Brittany T. Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman

However, the empirical distance function is highly non-robust to noise and outliers.

Statistics Theory Computational Geometry Algebraic Topology

Nonparametric modal regression

no code implementations 4 Dec 2014 Yen-Chi Chen, Christopher R. Genovese, Ryan J. Tibshirani, Larry Wasserman

Modal regression estimates the local modes of the distribution of $Y$ given $X=x$, instead of the mean, as in the usual regression sense, and can hence reveal important structure missed by usual regression methods.

regression

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

no code implementations 23 Nov 2014 Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

The current literature is split into two kinds of tests: those which are consistent without any assumptions about how the distributions may differ (\textit{general} alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (\textit{mean-shift} alternatives).

Two-sample testing

On Estimating $L_2^2$ Divergence

no code implementations 30 Oct 2014 Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman

We give a comprehensive theoretical characterization of a nonparametric estimator for the $L_2^2$ divergence between two continuous distributions.

The functional mean-shift algorithm for mode hunting and clustering in infinite dimensions

no code implementations 6 Aug 2014 Mattia Ciollaro, Christopher Genovese, Jing Lei, Larry Wasserman

We introduce the functional mean-shift algorithm, an iterative algorithm for estimating the local modes of a surrogate density from functional data.

Clustering Spike Sorting

Estimating the distribution of Galaxy Morphologies on a continuous space

no code implementations 29 Jun 2014 Giuseppe Vinci, Peter Freeman, Jeffrey Newman, Larry Wasserman, Christopher Genovese

The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without causing a possibly large loss of information.

Dictionary Learning

Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

no code implementations 9 Jun 2014 Martin Azizyan, Aarti Singh, Larry Wasserman

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions.

Clustering Vocal Bursts Intensity Prediction

Subsampling Methods for Persistent Homology

no code implementations 7 Jun 2014 Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman

Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets.

Algebraic Topology Computational Geometry Applications

A Comprehensive Approach to Mode Clustering

no code implementations 6 Jun 2014 Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman

Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes.

Clustering Denoising

Nonparametric Estimation of Renyi Divergence and Friends

no code implementations 12 Feb 2014 Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman

We consider nonparametric estimation of $L_2$, Renyi-$\alpha$ and Tsallis-$\alpha$ divergences between continuous distributions.

Nonparametric Inference For Density Modes

no code implementations 29 Dec 2013 Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

We derive nonparametric confidence intervals for the eigenvalues of the Hessian at modes of a density estimate.

valid

Stochastic Convergence of Persistence Landscapes and Silhouettes

no code implementations 2 Dec 2013 Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman

Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram.

Statistics Theory Computational Geometry Algebraic Topology

On the Bootstrap for Persistence Diagrams and Landscapes

1 code implementation 2 Nov 2013 Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, Larry Wasserman

Persistent homology probes topological properties from point clouds and functions.

Algebraic Topology Computational Geometry Applications

Estimating Undirected Graphs Under Weak Assumptions

no code implementations 26 Sep 2013 Larry Wasserman, Mladen Kolar, Alessandro Rinaldo

In particular, we consider: cluster graphs, restricted partial correlation graphs and correlation graphs.

valid

Tight Lower Bounds for Homology Inference

no code implementations 29 Jul 2013 Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman

In this note we use a different construction based on the direct analysis of the likelihood ratio test to show that the upper bound of Niyogi, Smale and Weinberger is in fact tight, thus establishing rate-optimal asymptotic minimax bounds for the problem.

LEMMA

Cluster Trees on Manifolds

no code implementations NeurIPS 2013 Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman

In this paper we investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$.

Clustering

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

no code implementations NeurIPS 2013 Martin Azizyan, Aarti Singh, Larry Wasserman

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood.

Clustering feature selection +1

Confidence sets for persistence diagrams

no code implementations 28 Mar 2013 Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, Aarti Singh

Persistent homology is a method for probing topological properties of point clouds and functions.

A Conformal Prediction Approach to Explore Functional Data

no code implementations 26 Feb 2013 Jing Lei, Alessandro Rinaldo, Larry Wasserman

This paper applies conformal prediction techniques to compute simultaneous prediction bands and clustering trees for functional data.

Clustering Conformal Prediction

Nonparametric ridge estimation

no code implementations 20 Dec 2012 Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density.

Density-sensitive semisupervised inference

no code implementations 7 Apr 2012 Martin Azizyan, Aarti Singh, Larry Wasserman

Semisupervised methods are techniques for using labeled data $(X_1, Y_1),\ldots,(X_n, Y_n)$ together with unlabeled data $X_{n+1},\ldots, X_N$ to make predictions.

Graph-Valued Regression

no code implementations NeurIPS 2010 Han Liu, Xi Chen, Larry Wasserman, John D. Lafferty

In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph.

regression

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

2 code implementations NeurIPS 2010 Han Liu, Kathryn Roeder, Larry Wasserman

In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs.

Model Selection Vocal Bursts Intensity Prediction

Nonparametric regression and classification with joint sparsity constraints

no code implementations NeurIPS 2008 Han Liu, Larry Wasserman, John D. Lafferty

We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints.

Additive models Classification +2

Compressed Regression

no code implementations NeurIPS 2007 Shuheng Zhou, Larry Wasserman, John D. Lafferty

Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data.

regression

Treelets--An adaptive multi-scale basis for sparse unordered data

3 code implementations 3 Jul 2007 Ann B. Lee, Boaz Nadler, Larry Wasserman

In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables.

Methodology
