no code implementations • 30 Apr 2024 • Yin-Jen Chen, Minh Tang
We study the matrix-variate regression problem $Y_i = \sum_{k} \beta_{1k} X_i \beta_{2k}^{\top} + E_i$ for $i = 1, 2, \dots, n$ in the high-dimensional regime wherein the responses $Y_i$ are matrices whose dimensions $p_{1}\times p_{2}$ outgrow both the sample size $n$ and the dimensions $q_{1}\times q_{2}$ of the predictor variables $X_i$, i.e., $q_{1}, q_{2} \ll n \ll p_{1}, p_{2}$.
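As a minimal sketch of the model above (dimensions and coefficient shapes here are illustrative, not the paper's), one can simulate draws of $Y_i = \sum_{k} \beta_{1k} X_i \beta_{2k}^{\top} + E_i$ in a few lines:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy draw from the matrix-variate regression model
# Y_i = sum_k beta_1k @ X_i @ beta_2k.T + E_i.
# Dimensions chosen so that q1, q2 < n < p1, p2, mimicking the regime studied.
n, q1, q2, p1, p2, K = 20, 3, 4, 30, 40, 2
beta1 = rng.standard_normal((K, p1, q1))   # beta_1k maps rows of X_i up to p1
beta2 = rng.standard_normal((K, p2, q2))   # beta_2k maps columns of X_i up to p2

Xs, Ys = [], []
for _ in range(n):
    X = rng.standard_normal((q1, q2))      # predictor matrix
    E = rng.standard_normal((p1, p2))      # matrix-valued noise
    Y = sum(beta1[k] @ X @ beta2[k].T for k in range(K)) + E
    Xs.append(X)
    Ys.append(Y)
```

Each summand $\beta_{1k} X_i \beta_{2k}^{\top}$ has shape $p_1 \times p_2$, so the responses live in the larger ambient dimension while the predictors stay small.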
no code implementations • 20 Aug 2022 • Sheyda Peyman, Minh Tang, Vince Lyzinski
Here, a common suite of methods relies on spectral graph embeddings, which have been shown to provide both good algorithmic performance and flexible settings in which regularization techniques can be implemented to help mitigate the effect of an adversary.
no code implementations • 19 Mar 2022 • Yichi Zhang, Minh Tang
We first derive upper bounds for the $\ell_2$ (spectral norm) and $\ell_{2\to\infty}$ (maximum row-wise $\ell_2$ norm) distances between the approximate singular vectors of $\hat{\mathbf{M}}$ and the true singular vectors of the signal matrix $\mathbf{M}$.
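The two distances named above can be illustrated numerically; the setup below (a low-rank signal plus Gaussian noise, with a Procrustes alignment before comparison) is a hypothetical toy, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank signal M plus noise gives the observed M_hat (toy setup).
n, m, r = 200, 150, 3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((m, r)))[0]
M = U @ np.diag([100.0, 80.0, 60.0]) @ V.T
M_hat = M + rng.standard_normal((n, m))

# Leading r left singular vectors of the signal and of the observation.
U_true = np.linalg.svd(M, full_matrices=False)[0][:, :r]
U_hat = np.linalg.svd(M_hat, full_matrices=False)[0][:, :r]

# Singular vectors are only identified up to rotation, so align via
# an orthogonal Procrustes transformation before measuring distances.
W_u, _, W_vt = np.linalg.svd(U_hat.T @ U_true)
D = U_hat @ (W_u @ W_vt) - U_true

dist_2 = np.linalg.norm(D, 2)                   # spectral-norm (l2) distance
dist_2toinf = np.linalg.norm(D, axis=1).max()   # max row-wise l2 norm
```

Since each row is a unit-vector projection of the difference matrix, the $\ell_{2\to\infty}$ distance is always bounded by the $\ell_2$ distance, which is why row-wise bounds are the finer statement.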
no code implementations • 5 Oct 2021 • Yin-Jen Chen, Minh Tang
We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the whitened mean vectors, is sparse with sparsity at most $s$.
1 code implementation • 9 Sep 2021 • John Koo, Minh Tang, Michael W. Trosset
We connect two random graph models, the Popularity Adjusted Block Model (PABM) and the Generalized Random Dot Product Graph (GRDPG), by demonstrating that the PABM is a special case of the GRDPG in which communities correspond to mutually orthogonal subspaces of latent vectors.
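A minimal sketch of the GRDPG side of this correspondence: latent positions in $\mathbb{R}^{p+q}$, edge probabilities given by the indefinite inner product $x_i^{\top} I_{p,q} x_j$, and an adjacency matrix sampled from those probabilities. The dimensions and the clipping step are assumptions of this toy, not part of the papers' construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical GRDPG sketch: n latent positions in R^{p+q}, with edge
# probability x_i^T I_{p,q} x_j under the signature matrix I_{p,q}.
n, p, q = 100, 2, 1
X = rng.uniform(0.2, 0.6, size=(n, p + q))
I_pq = np.diag([1.0] * p + [-1.0] * q)

P = X @ I_pq @ X.T
P = np.clip(P, 0.0, 1.0)     # keep valid probabilities in this toy sketch
np.fill_diagonal(P, 0.0)     # no self-loops

A = rng.uniform(size=(n, n)) < P
A = np.triu(A, 1)
A = (A | A.T).astype(int)    # symmetric, hollow adjacency sample
```

In the PABM-as-GRDPG result, communities correspond to mutually orthogonal subspaces of these latent vectors, which is what makes subspace-based clustering of the embedding natural.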
no code implementations • 23 May 2021 • Xinjie Du, Minh Tang
Special cases of this hypothesis test include testing whether two vertices in a stochastic block model or degree-corrected stochastic block model graph have the same block membership vectors, or testing whether two vertices in a popularity adjusted block model have the same community assignment.
no code implementations • 18 Jan 2021 • Yichi Zhang, Minh Tang
Random-walk based network embedding algorithms like DeepWalk and node2vec are widely used to obtain Euclidean representation of the nodes in a network prior to performing downstream inference tasks.
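The walk-generation step these algorithms share can be sketched as follows; this is a plain uniform walk in the DeepWalk style (node2vec additionally biases the transition probabilities), and the function name and toy graph are illustrative:

```python
import random

def random_walks(adj, num_walks=10, walk_length=5, seed=0):
    """Uniform random walks from every node; adj maps node -> neighbor list."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break                     # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy 4-cycle graph
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walks = random_walks(adj)
```

The resulting walk sequences are then fed to a word2vec-style model, treating nodes as words and walks as sentences, to produce the Euclidean node representations.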
no code implementations • 17 Dec 2020 • Joshua Agterberg, Minh Tang, Carey Priebe
We propose a nonparametric two-sample test statistic for low-rank, conditionally independent edge random graphs whose edge probability matrices have negative eigenvalues and arbitrarily close eigenvalues.
no code implementations • 15 Apr 2020 • Michael W. Trosset, Mingyue Gao, Minh Tang, Carey E. Priebe
We submit that techniques for manifold learning can be used to learn the unknown submanifold well enough to realize benefit from restricted inference.
no code implementations • 31 Mar 2020 • Joshua Agterberg, Minh Tang, Carey E. Priebe
Two separate and distinct sources of nonidentifiability arise naturally in the context of latent position random graph models, though neither is unique to this setting.
no code implementations • 29 Sep 2019 • Keith Levin, Fred Roosta, Minh Tang, Michael W. Mahoney, Carey E. Priebe
In both cases, we prove that when the underlying graph is generated according to a latent space model called the random dot product graph, which includes the popular stochastic block model as a special case, an out-of-sample extension based on a least-squares objective obeys a central limit theorem about the true latent position of the out-of-sample vertex.
no code implementations • 23 Aug 2018 • Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, John M. Conroy, Vince Lyzinski, Minh Tang, Avanti Athreya, Joshua Cape, Eric Bridgeford
Clustering is concerned with coherently grouping observations without any explicit concept of true groupings.
no code implementations • 30 Mar 2018 • Minh Tang
We derive the limiting distribution for the largest eigenvalues of the adjacency matrix for a stochastic blockmodel graph when the number of vertices tends to infinity.
no code implementations • 16 Sep 2017 • Patrick Rubin-Delanchy, Joshua Cape, Minh Tang, Carey E. Priebe
Spectral embedding is a procedure which can be used to obtain vector representations of the nodes of a graph.
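One common form of the procedure, adjacency spectral embedding, scales the leading eigenvectors of the adjacency matrix by the square roots of the corresponding eigenvalue magnitudes. The sketch below applies it to a two-block stochastic blockmodel sample; the block probabilities and sizes are illustrative:

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed nodes into R^d using the d largest-|eigenvalue| eigenpairs
    of the symmetric adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Two-block toy graph sampled from a stochastic blockmodel.
rng = np.random.default_rng(2)
z = np.repeat([0, 1], 50)                     # block memberships
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])                    # block probability matrix
P = B[z][:, z]
A = rng.uniform(size=P.shape) < P
A = np.triu(A, 1)
A = (A | A.T).astype(float)

Xhat = adjacency_spectral_embedding(A, 2)     # one 2-d vector per node
```

For graphs like this one, the embedded points concentrate around one location per block, which is what makes downstream clustering of the rows effective.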
no code implementations • 16 Sep 2017 • Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe
In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices.
1 code implementation • 5 Sep 2017 • Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Christopher Douville, Randal Burns, Mauro Maggioni
To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences.
1 code implementation • 9 May 2017 • Carey E. Priebe, Youngser Park, Minh Tang, Avanti Athreya, Vince Lyzinski, Joshua T. Vogelstein, Yichen Qin, Ben Cocanougher, Katharina Eichler, Marta Zlatic, Albert Cardona
We present semiparametric spectral modeling of the complete larval Drosophila mushroom body connectome.
no code implementations • 28 Jul 2016 • Minh Tang, Carey E. Priebe
As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and furthermore the mean and the covariance matrix of each row are functions of the associated vertex's block membership.
no code implementations • 7 Mar 2015 • Vince Lyzinski, Minh Tang, Avanti Athreya, Youngser Park, Carey E. Priebe
We propose a robust, scalable, integrated methodology for community detection and community comparison in graphs.
no code implementations • 23 May 2014 • Shakira Suwan, Dominic S. Lee, Runze Tang, Daniel L. Sussman, Minh Tang, Carey E. Priebe
Inference for the stochastic blockmodel is currently of burgeoning interest in the statistical community, as well as in application domains as diverse as social networks, citation networks, and brain connectivity networks (connectomics).
no code implementations • 2 Oct 2013 • Vince Lyzinski, Daniel Sussman, Minh Tang, Avanti Athreya, Carey Priebe
Vertex clustering in a stochastic blockmodel graph has wide applicability and has been the subject of extensive research.
no code implementations • 31 May 2013 • Avanti Athreya, Vince Lyzinski, David J. Marchette, Carey E. Priebe, Daniel L. Sussman, Minh Tang
We prove a central limit theorem for the components of the largest eigenvectors of the adjacency matrix of a finite-dimensional random dot product graph whose true latent positions are unknown.
no code implementations • 21 May 2013 • Minh Tang, Youngser Park, Carey E. Priebe
We show that, under the latent position graph model and for sufficiently large $n$, the mapping of the out-of-sample vertices is close to its true latent position.
no code implementations • 30 Apr 2013 • Cencheng Shen, Ming Sun, Minh Tang, Carey E. Priebe
For multiple multivariate data sets, we derive conditions under which Generalized Canonical Correlation Analysis (GCCA) improves classification performance of the projected data sets, compared to standard Canonical Correlation Analysis (CCA) using only two data sets.
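For the two-data-set baseline, the first canonical correlation can be computed via an SVD of the product of orthonormal bases of the two centered data matrices (a standard numerical route; the simulated data with a shared latent signal is an assumption of this sketch):

```python
import numpy as np

def cca_first_correlation(X, Y):
    """First canonical correlation between data sets X and Y: center each,
    take orthonormal bases of their column spaces, and read off the largest
    singular value of the product of those bases."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

rng = np.random.default_rng(3)
Z = rng.standard_normal((500, 1))            # shared latent signal
X = np.hstack([Z + 0.1 * rng.standard_normal((500, 1)),
               rng.standard_normal((500, 2))])
Y = np.hstack([Z + 0.1 * rng.standard_normal((500, 1)),
               rng.standard_normal((500, 2))])
rho = cca_first_correlation(X, Y)
```

Because the first columns of `X` and `Y` share the latent signal `Z` with small noise, the leading canonical correlation comes out close to 1; GCCA generalizes this pairwise construction to more than two data sets.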
no code implementations • 5 Dec 2012 • Minh Tang, Daniel L. Sussman, Carey E. Priebe
In this work we show that, using the eigen-decomposition of the adjacency matrix, we can consistently estimate feature maps for latent position graphs with positive definite link function $\kappa$, provided that the latent positions are i.i.d.
no code implementations • 15 Nov 2012 • Carey E. Priebe, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein
Thus we errorfully observe $G$ when we observe the graph $\widetilde{G} = (V,\widetilde{E})$ as the edges in $\widetilde{E}$ arise from the classifications of the "edge-features", and are expected to be errorful.
no code implementations • 26 May 2012 • Nam H. Lee, Jordan Yoder, Minh Tang, Carey E. Priebe
Each of the message-exchanging actors is modeled as a process in a latent space.