Search Results for author: Cencheng Shen

Found 24 papers, 13 papers with code

Discovering and Deciphering Relationships Across Disparate Data Modalities

4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.

Computational Efficiency

hyppo: A Multivariate Hypothesis Testing Python Package

4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.

Two-sample testing

Sparse Projection Oblique Randomer Forests

2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.

Computational Efficiency

One-Hot Graph Encoder Embedding

3 code implementations • 27 Sep 2021 • Cencheng Shen, Qizhe Wang, Carey E. Priebe

In this paper we propose a lightning fast graph embedding method called one-hot graph encoder embedding.

Clustering, Graph Embedding, +1

Graph Encoder Ensemble for Simultaneous Vertex Embedding and Community Detection

1 code implementation • 18 Jan 2023 • Cencheng Shen, Youngser Park, Carey E. Priebe

In this paper, we introduce a novel and computationally efficient method for vertex embedding, community detection, and community size determination.

Community Detection

Synergistic Graph Fusion via Encoder Embedding

1 code implementation • 31 Mar 2023 • Cencheng Shen, Carey E. Priebe, Jonathan Larson, Ha Trinh

In this paper, we introduce a novel approach called graph fusion embedding, designed for multi-graph embedding with shared vertex sets.

Classification, Graph Embedding, +1

Discovering Communication Pattern Shifts in Large-Scale Labeled Networks using Encoder Embedding and Vertex Dynamics

1 code implementation • 3 May 2023 • Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe

Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends.

Time Series

Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities

1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein

Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.

From Distance Correlation to Multiscale Graph Correlation

1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.

The Chi-Square Test of Distance Correlation

1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein

One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.

valid
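
The permutation procedure mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's chi-square test and not the hyppo API; the function names `distance_correlation` and `permutation_pvalue` are hypothetical, the samples are assumed to be one-dimensional, and the biased sample statistic is used for simplicity.

```python
import math
import random


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Average of the entrywise product of two n-by-n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = _mean_product(a, b)
    dvar_x, dvar_y = _mean_product(a, a), _mean_product(b, b)
    if dvar_x <= 0 or dvar_y <= 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def permutation_pvalue(x, y, n_permutations=200, seed=0):
    """Estimate the null by shuffling y; returns (statistic, p-value)."""
    rng = random.Random(seed)
    observed = distance_correlation(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # breaks any pairing between x and y
        if distance_correlation(x, y_perm) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_permutations + 1)
```

Each permutation recomputes an O(n^2) statistic, so the total cost grows as the number of permutations times n^2, which is the expense the abstract describes for large samples.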

Edge-Parallel Graph Encoder Embedding

1 code implementation • 6 Feb 2024 • Ariel Lubonja, Cencheng Shen, Carey Priebe, Randal Burns

New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations.

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.

Two-sample testing

Sparse Representation Classification Beyond L1 Minimization and the Subspace Assumption

no code implementations • 4 Feb 2015 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey E. Priebe

The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance and is significantly faster.

Classification, Classification Consistency, +1

Manifold Matching using Shortest-Path Distance and Joint Neighborhood Selection

1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe

Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.

Robust Vertex Classification

no code implementations • 23 Nov 2013 • Li Chen, Cencheng Shen, Joshua Vogelstein, Carey Priebe

For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension.

Classification, General Classification, +1

On the Incommensurability Phenomenon

no code implementations • 9 Jan 2013 • Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe

Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality.

Generalized Canonical Correlation Analysis for Classification

no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe

For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.

Classification, General Classification

Sparse Representation Classification via Screening for Graphs

no code implementations • 4 Jun 2019 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey Priebe

The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption.

Classification, Classification Consistency, +1

Independence Testing for Temporal Data

no code implementations • 18 Aug 2019 • Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T. Vogelstein

While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test.

Time Series, Time Series Analysis, +1

High-dimensional and universally consistent k-sample tests

no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein

The evaluation included several popular independence statistics and covered a comprehensive set of simulations.

Two-sample testing

High-Dimensional Independence Testing via Maximum and Average Distance Correlations

no code implementations • 4 Jan 2020 • Cencheng Shen, Yuexiao Dong

This paper introduces and investigates the utilization of maximum and average distance correlations for multivariate independence testing.

valid, Vocal Bursts Intensity Prediction
