1 code implementation • 6 Feb 2024 • Ariel Lubonja, Cencheng Shen, Carey Priebe, Randal Burns
New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations.
1 code implementation • 26 Jul 2023 • Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein
Causal inference studies whether the presence of a variable influences an observed outcome.
1 code implementation • 3 May 2023 • Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe
Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends.
1 code implementation • 31 Mar 2023 • Cencheng Shen, Carey E. Priebe, Jonathan Larson, Ha Trinh
In this paper, we introduce a novel approach called graph fusion embedding, designed for multi-graph embedding with shared vertex sets.
1 code implementation • 18 Jan 2023 • Cencheng Shen, Youngser Park, Carey E. Priebe
In this paper, we introduce a novel and computationally efficient method for vertex embedding, community detection, and community size determination.
3 code implementations • 27 Sep 2021 • Cencheng Shen, Qizhe Wang, Carey E. Priebe
In this paper, we propose a lightning-fast graph embedding method called one-hot graph encoder embedding.
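The following is a rough illustration of the idea behind one-hot graph encoder embedding. The sketch multiplies the adjacency matrix by a class-normalized one-hot matrix. It assumes every vertex has a known label, and it sets the weight W[i, k] = 1/n_k when vertex i belongs to class k, where n_k is the size of class k. The function name `one_hot_encoder_embedding` is hypothetical. A dense NumPy matrix is used for readability, whereas a scalable version would work on a sparse edge list.

```python
import numpy as np

def one_hot_encoder_embedding(adj, labels, n_classes):
    """Hypothetical sketch: Z = A @ W with class-size-normalized one-hot weights.

    adj:       (n, n) adjacency matrix (dense here, for clarity only)
    labels:    length-n integer labels in [0, n_classes)
    n_classes: number of classes K
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    # n_k = number of vertices carrying label k
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # W[i, k] = 1 / n_k if vertex i is in class k, else 0
    W = np.zeros((n, n_classes))
    W[np.arange(n), labels] = 1.0 / counts[labels]
    # A single matrix product yields the (n, K) embedding
    return adj @ W
```

Each entry Z[i, k] equals the number of neighbors of vertex i in class k, divided by n_k. The whole embedding therefore costs one matrix product, which helps explain why this family of methods is fast.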
no code implementations • 25 Oct 2020 • Carey E. Priebe, Cencheng Shen, Ningyuan Huang, Tianyi Chen
Neural networks have achieved remarkable successes in machine learning tasks.
no code implementations • 4 Jan 2020 • Cencheng Shen, Yuexiao Dong
This paper introduces and investigates the utilization of maximum and average distance correlations for multivariate independence testing.
1 code implementation • 27 Dec 2019 • Cencheng Shen, Sambit Panda, Joshua T. Vogelstein
One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data.
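The sketch below makes the described cost concrete. It implements the standard permutation test for the (squared, biased) sample distance correlation with Euclidean distance. This is the baseline procedure the abstract identifies as the bottleneck, not the paper's proposed alternative. The helper names are hypothetical. Every replicate recomputes an n-by-n statistic, so `reps` permutations cost roughly `reps` times O(n^2).

```python
import numpy as np

def _double_centered_distances(x):
    # Euclidean pairwise distance matrix, then double-centered
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcorr(x, y):
    """Squared sample distance correlation (biased V-statistic version)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else dcov_xy / denom

def permutation_pvalue(x, y, reps=500, seed=0):
    """Estimate the null by permuting y; each replicate is another O(n^2) pass."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = dcorr(x, y)
    null = np.array([dcorr(x, y[rng.permutation(len(y))]) for _ in range(reps)])
    # Add-one correction keeps the estimated p-value strictly positive
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue
```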
no code implementations • 20 Oct 2019 • Sambit Panda, Cencheng Shen, Ronan Perry, Jelle Zorn, Antoine Lutz, Carey E. Priebe, Joshua T. Vogelstein
The evaluation included several popular independence statistics and covered a comprehensive set of simulations.
no code implementations • 18 Aug 2019 • Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T. Vogelstein
While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test.
4 code implementations • 3 Jul 2019 • Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein
We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing.
1 code implementation • 30 Jun 2019 • Ronan Perry, Ronak Mehta, Richard Guo, Eva Yezerets, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein
Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty.
no code implementations • 4 Jun 2019 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey Priebe
The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption.
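The following sketch shows how the classifier can work, assuming training samples are stored as the columns of `X_train`. Sparse representation classification is often formulated with an l1-minimization step. This sketch instead swaps in orthogonal matching pursuit (OMP), a greedy sparse solver, so that it stays self-contained. The test sample is assigned to the class whose training columns reconstruct it with the smallest residual. All function names and the sparsity level `k` are illustrative assumptions.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily select up to k columns of D to approximate x."""
    residual = x.copy()
    support, coef = [], np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-10:  # x already reconstructed
            break
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def src_predict(X_train, y_train, x, k=5):
    """Assign x to the class whose columns give the smallest reconstruction residual."""
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    D = X_train / np.linalg.norm(X_train, axis=0, keepdims=True)  # unit-norm columns
    coef = omp(D, x, k)
    classes = np.unique(y_train)
    residuals = [np.linalg.norm(x - D[:, y_train == c] @ coef[y_train == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```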
no code implementations • 30 Nov 2018 • Sambit Panda, Cencheng Shen, Joshua T. Vogelstein
Decision forests are widely used for classification and regression tasks.
no code implementations • 14 Jun 2018 • Cencheng Shen, Joshua T. Vogelstein
Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community.
1 code implementation • 26 Oct 2017 • Cencheng Shen, Carey E. Priebe, Joshua T. Vogelstein
Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age.
4 code implementations • 16 Sep 2016 • Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen
Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets.
2 code implementations • 10 Jun 2015 • Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein
Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency.
no code implementations • 4 Feb 2015 • Cencheng Shen, Li Chen, Yuexiao Dong, Carey E. Priebe
The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance while running significantly faster.
1 code implementation • 12 Dec 2014 • Cencheng Shen, Joshua T. Vogelstein, Carey E. Priebe
Then the shortest-path distance within each modality is calculated from the joint neighborhood graph, followed by embedding into and matching in a common low-dimensional Euclidean space.
no code implementations • 23 Nov 2013 • Li Chen, Cencheng Shen, Joshua Vogelstein, Carey Priebe
For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension.
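The following is a minimal sketch of adjacency spectral embedding, assuming a symmetric adjacency matrix. It keeps the d eigenpairs of largest magnitude and scales each eigenvector by the square root of its eigenvalue's magnitude. The embedding dimension `d` must be supplied explicitly, which is the model-dimension dependence noted above. The function name is hypothetical.

```python
import numpy as np

def adjacency_spectral_embedding(adj, d):
    """Embed each vertex in R^d from the top-d eigenpairs (by magnitude) of a symmetric matrix."""
    adj = np.asarray(adj, dtype=float)
    vals, vecs = np.linalg.eigh(adj)              # eigenvalues in ascending order
    top = np.argsort(np.abs(vals))[::-1][:d]      # indices of the d largest |eigenvalues|
    # Scale eigenvectors by sqrt(|lambda|); if all kept eigenvalues are positive,
    # X @ X.T is the best rank-d approximation of adj
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))
```

With a two-block stochastic blockmodel probability matrix, choosing d = 2 recovers the matrix exactly. Choosing too small a `d` would discard block structure, and choosing too large a `d` would add noise dimensions when the input is a sampled adjacency matrix.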
no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe
For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected datasets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.
no code implementations • 9 Jan 2013 • Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe
Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality.