1 code implementation • 27 Sep 2021 • Cencheng Shen, Qizhe Wang, Carey E. Priebe

In this paper we propose a lightning-fast graph embedding method called graph encoder embedding.

no code implementations • 25 Oct 2020 • Carey E. Priebe, Cencheng Shen, Ningyuan Huang, Tianyi Chen

Neural networks have achieved remarkable successes in machine learning tasks.

no code implementations • 4 Jan 2020 • Cencheng Shen

A number of universally consistent dependence measures have been recently proposed for testing independence, such as distance correlation, kernel correlation, multiscale graph correlation, etc.

1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.
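As a rough illustration of why the permutation step is expensive, the sketch below computes the standard biased sample distance correlation for one-dimensional data and estimates a p-value by shuffling one sample. It is a minimal sketch, not the authors' implementation: the function names, the restriction to 1-D inputs with absolute-difference distances, and the defaults `n_perm=200` and `seed=0` are assumptions made for illustration.

```python
import math
import random


def centered_distances(values):
    """Return the double-centered pairwise distance matrix of a 1-D sample."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def dcov_sq(a, b):
    """Biased squared sample distance covariance from two centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Biased sample distance correlation between two 1-D samples."""
    a, b = centered_distances(x), centered_distances(y)
    denom = math.sqrt(dcov_sq(a, a) * dcov_sq(b, b))
    if denom == 0:
        return 0.0
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(dcov_sq(a, b), 0.0) / denom)


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Estimate the null by shuffling y; each shuffle costs O(n^2) work."""
    rng = random.Random(seed)
    observed = distance_correlation(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if distance_correlation(x, y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)


x = [i / 10 for i in range(30)]
y = [v * v for v in x]  # nonlinear but fully dependent on x
stat, pvalue = permutation_pvalue(x, y)
```

Every permutation recomputes an n-by-n distance matrix, so the total cost grows as the number of permutations times n squared, which is the bottleneck the abstract refers to.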

no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

The $k$-sample testing problem tests whether or not $k$ groups of data points are sampled from the same distribution.

no code implementations • 18 Aug 2019 • Ronak Mehta, Jaewon Chung, Cencheng Shen, Ting Xu, Joshua T. Vogelstein

The proposed nonparametric procedure is valid and consistent, building upon prior work by characterizing the geometry of the relationship, estimating the time lag at which dependence is maximized, avoiding the need for multiple testing, and exhibiting superior power in high-dimensional, low sample size, nonlinear settings.

4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.

1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.

no code implementations • 4 Jun 2019 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey Priebe

The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption.

no code implementations • 30 Nov 2018 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

It has been demonstrated that these proximity matrices can be thought of as kernels, connecting the decision forest literature to the extensive kernel machine literature.

no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.

1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.

4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.

2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.

no code implementations • 4 Feb 2015 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey E. Priebe

The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance and is significantly faster.

1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe

Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.

no code implementations • 23 Nov 2013 • Li Chen, Cencheng Shen, Joshua Vogelstein, Carey Priebe

For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension.

no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe

For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.

no code implementations • 9 Jan 2013 • Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe

Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality.

