Search Results for author: Cencheng Shen

Found 19 papers, 8 papers with code

Graph Encoder Embedding

1 code implementation • 27 Sep 2021 • Cencheng Shen, Qizhe Wang, Carey E. Priebe

In this paper, we propose a lightning-fast graph embedding method called graph encoder embedding.

Graph Embedding, Stochastic Block Model

High-Dimensional Independence Testing and Maximum Marginal Correlation

no code implementations • 4 Jan 2020 • Cencheng Shen

A number of universally consistent dependence measures have recently been proposed for testing independence, such as distance correlation, kernel correlation, and multiscale graph correlation.

The Chi-Square Test of Distance Correlation

1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.

Nonpar MANOVA via Independence Testing

no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

The $k$-sample testing problem tests whether or not $k$ groups of data points are sampled from the same distribution.

Two-sample testing

Independence Testing for Multivariate Time Series

no code implementations • 18 Aug 2019 • Ronak Mehta, Jaewon Chung, Cencheng Shen, Ting Xu, Joshua T. Vogelstein

The proposed nonparametric procedure is valid and consistent, building upon prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low sample size, nonlinear settings.

Time Series, Time Series Analysis

hyppo: A Multivariate Hypothesis Testing Python Package

4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.

Two-sample testing

Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities

1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.

Sparse Representation Classification via Screening for Graphs

no code implementations • 4 Jun 2019 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey Priebe

The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption.

Classification, Classification Consistency

Learning Interpretable Characteristic Kernels via Decision Forests

no code implementations • 30 Nov 2018 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

It has been demonstrated that these proximity matrices can be thought of as kernels, connecting the decision forest literature to the extensive kernel machine literature.

Feature Importance, General Classification

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.

Two-sample testing

From Distance Correlation to Multiscale Graph Correlation

1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.

Discovering and Deciphering Relationships Across Disparate Data Modalities

4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.

Sparse Projection Oblique Randomer Forests

2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.

Sparse Representation Classification Beyond L1 Minimization and the Subspace Assumption

no code implementations • 4 Feb 2015 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey E. Priebe

The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance and is significantly faster.

Classification, Classification Consistency

Manifold Matching using Shortest-Path Distance and Joint Neighborhood Selection

1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe

Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.

Robust Vertex Classification

no code implementations • 23 Nov 2013 • Li Chen, Cencheng Shen, Joshua Vogelstein, Carey Priebe

For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension.

Classification, General Classification

Generalized Canonical Correlation Analysis for Classification

no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe

For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.

Classification, General Classification

On the Incommensurability Phenomenon

no code implementations • 9 Jan 2013 • Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe

Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality.
