no code implementations • 24 Jan 2025 • Cencheng Shen, Darren Edge, Jonathan Larson, Carey E. Priebe
This graph covariance quantifies temporal changes in dependence structures within categorical data and is established as a consistent dependence measure under the Bernoulli distribution.
no code implementations • 24 Jan 2025 • Cencheng Shen, Yuexiao Dong, Carey E. Priebe, Jonathan Larson, Ha Trinh, Youngser Park
We prove that the population principal graph encoder embedding preserves the conditional density of the vertex labels and that the population community score successfully distinguishes the principal communities.
no code implementations • 6 Jun 2024 • Xihan Qin, Cencheng Shen
Graphs are a ubiquitous representation of data in various research fields, and graph embedding is a prevalent machine learning technique for capturing key features and generating fixed-size attributes.
no code implementations • 4 Jun 2024 • Cencheng Shen
This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques.
no code implementations • 24 May 2024 • Cencheng Shen
Graph encoder embedding, a recent technique for graph data, offers speed and scalability in producing vertex-level representations from binary graphs.
no code implementations • 21 May 2024 • Cencheng Shen, Jonathan Larson, Ha Trinh, Carey E. Priebe
We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities via stochastic block models, and how the refinement method leads to improved vertex embedding and better decision boundaries for subsequent vertex classification.
1 code implementation • 6 Feb 2024 • Ariel Lubonja, Cencheng Shen, Carey Priebe, Randal Burns
New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations.
1 code implementation • 26 Jul 2023 • Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein
Causal inference studies whether the presence of a variable influences an observed outcome.
1 code implementation • 3 May 2023 • Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe
Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends.
1 code implementation • 31 Mar 2023 • Cencheng Shen, Carey E. Priebe, Jonathan Larson, Ha Trinh
In this paper, we introduce a method called graph fusion embedding, designed for multi-graph embedding with shared vertex sets.
1 code implementation • 18 Jan 2023 • Cencheng Shen, Youngser Park, Carey E. Priebe
In this paper, we introduce a novel and computationally efficient method for vertex embedding, community detection, and community size determination.
3 code implementations • 27 Sep 2021 • Cencheng Shen, Qizhe Wang, Carey E. Priebe
In this paper, we propose a lightning-fast graph embedding method called one-hot graph encoder embedding.
no code implementations • 25 Oct 2020 • Carey E. Priebe, Cencheng Shen, Ningyuan Huang, Tianyi Chen
Neural networks have achieved remarkable successes in machine learning tasks.
no code implementations • 4 Jan 2020 • Cencheng Shen, Yuexiao Dong
This paper introduces and investigates the utilization of maximum and average distance correlations for multivariate independence testing.
1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein
One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.
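As a rough illustration of why the permutation test mentioned above is costly, the following is a minimal sketch and not the paper's method. It assumes the biased sample distance correlation with Euclidean distances; the function names and the `n_perm` and `seed` parameters are hypothetical choices for this example. Each permutation recomputes the statistic over an n-by-n matrix, so the cost grows with both the sample size and the number of permutations.

```python
import numpy as np

def centered_distances(x):
    """Double-centered Euclidean distance matrix of the samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(a, b):
    """Biased sample distance correlation from two double-centered matrices."""
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Estimate the null by permuting the sample order of y (assumed setup)."""
    rng = np.random.default_rng(seed)
    a, b = centered_distances(x), centered_distances(y)
    observed = distance_correlation(a, b)
    n = a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting rows and columns together equals re-centering permuted y.
        if distance_correlation(a, b[np.ix_(p, p)]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

The smallest p-value this sketch can report is 1 / (n_perm + 1), which is one reason many permutations are needed when a small significance level is required.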
no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein
The K-sample testing problem involves determining whether K groups of data points are each drawn from the same distribution.
no code implementations • 18 Aug 2019 • Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T. Vogelstein
While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test.
4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein
We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.
1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein
Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.
no code implementations • 4 Jun 2019 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey Priebe
The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption.
no code implementations • 30 Nov 2018 • Sambit Panda, Cencheng Shen, Joshua T. Vogelstein
Decision forests are widely used for classification and regression tasks.
no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein
Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.
1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein
Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.
4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen
Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.
2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein
Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.
no code implementations • 4 Feb 2015 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey E. Priebe
The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance while running significantly faster.
1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe
Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.
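The shortest-path and embedding steps can be sketched for a single modality as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it builds an ordinary k-nearest-neighbor graph rather than the joint neighborhood graph, uses classical multidimensional scaling for the embedding, and omits the matching step. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

def knn_graph(x, k):
    """Symmetric k-nearest-neighbor graph; missing edges have infinite weight."""
    n = x.shape[0]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    g = np.full((n, n), np.inf)
    np.fill_diagonal(g, 0.0)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]  # skip column 0 (the point itself)
    rows = np.repeat(np.arange(n), k)
    g[rows, idx.ravel()] = d[rows, idx.ravel()]
    return np.minimum(g, g.T)

def shortest_paths(g):
    """All-pairs shortest-path distances via Floyd-Warshall."""
    dist = g.copy()
    for m in range(dist.shape[0]):
        dist = np.minimum(dist, dist[:, [m]] + dist[[m], :])
    return dist

def classical_mds(dist, dim):
    """Embed a distance matrix into `dim` Euclidean dimensions."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

The sketch assumes the neighborhood graph is connected; a disconnected graph leaves infinite distances that classical MDS cannot handle.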
no code implementations • 23 Nov 2013 • Li Chen, Cencheng Shen, Joshua Vogelstein, Carey Priebe
For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension.
no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe
For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.
no code implementations • 9 Jan 2013 • Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe
Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal component analysis is performed separately on the data sets to reduce their dimensionality.