
# Spectral Gap Rewiring Layer

Introduced by Arnaiz-Rodriguez et al. in DiffWire: Inductive Graph Rewiring via the Lovász Bound

TL;DR: GAP-Layer is a GNN layer that rewires a graph in an inductive and parameter-free way by optimizing the spectral gap (minimizing or maximizing the bottleneck size), learning a differentiable way to compute the Fiedler vector and the Fiedler value of the graph.

## Summary

GAP-Layer is a rewiring layer based on minimizing or maximizing the spectral gap (or graph bottleneck size) in an inductive way. Depending on the mining task we want to perform on our graph, we may want to maximize or minimize the size of the bottleneck, aiming for more connected or more separated communities.

## GAP-Layer: Spectral Gap Rewiring

#### Loss and derivatives using $\mathbf{L}$ or $\mathbf{\cal L}$

For this explanation, suppose we want to minimize the spectral gap, i.e. make the graph bottleneck size smaller. To minimize the spectral gap we minimize this loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

The gradients of this cost function w.r.t. each element of $\mathbf{A}$ are not trivial. Depending on whether we use the Laplacian, $\mathbf{L}$, or the normalized Laplacian, $\cal L$, the derivatives are different. In the former case ($\mathbf{L}$), we use the derivatives presented in Kang et al. 2019. In the latter case ($\cal L$), we present the Spectral Gradients: derivatives of the spectral gap w.r.t. the normalized Laplacian. Whichever option we choose, $\lambda_2$ can be seen as a function of $\tilde{\mathbf{A}}$ and, hence, $\nabla_{\tilde{\mathbf{A}}}\lambda_2$, the gradient of $\lambda_2$ w.r.t. each component of $\tilde{\mathbf{A}}$ (how does the bottleneck change with each change in our graph?), comes from the chain rule of the matrix derivative: $Tr\left[\left(\nabla_{\tilde{\mathbf{L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathbf{L}}\right]$ if using the Laplacian, or $Tr\left[\left(\nabla_{\tilde{\mathbf{\cal L}}}\lambda_2\right)^T\cdot\nabla_{\tilde{\mathbf{A}}}\tilde{\mathbf{\cal L}}\right]$ if using the normalized Laplacian. Both of these derivatives rely on the Fiedler vector (the 2nd eigenvector: $\mathbf{f}_2$ if we use $\mathbf{L}$ and $\mathbf{g}_2$ if we use $\mathbf{\cal L}$). The full derivations are omitted here for the sake of simplicity; for more details, I suggest going to the original paper.
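As a rough illustration of what $\nabla_{\tilde{\mathbf{A}}}\lambda_2$ looks like in the unnormalized case, the sketch below uses the standard first-order perturbation result for a simple eigenvalue of a symmetric matrix, $\nabla_{\mathbf{L}}\lambda_2 = \mathbf{f}_2\mathbf{f}_2^T$ with unit-norm $\mathbf{f}_2$, together with $\mathbf{L}=\mathbf{D}-\mathbf{A}$. This is an assumption made for illustration and not necessarily the exact form used in Kang et al. 2019 or in the paper. The toy graph (two triangles joined by one edge) and the helper names are hypothetical, and the Fiedler vector is computed exactly with `numpy.linalg.eigh` only so that the formula can be checked against a finite difference.

```python
import numpy as np

def laplacian(A):
    """Unnormalized Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def fiedler(A):
    """Exact Fiedler value and unit-norm Fiedler vector (O(n^3), for checking only)."""
    vals, vecs = np.linalg.eigh(laplacian(A))  # eigenvalues in ascending order
    return vals[1], vecs[:, 1]

def grad_lambda2(A):
    """d(lambda_2)/dA_ij = f_i^2 - f_i * f_j, assuming lambda_2 is a simple eigenvalue."""
    _, f = fiedler(A)
    ff = np.outer(f, f)               # d(lambda_2)/dL from first-order perturbation
    return np.diag(ff)[:, None] - ff  # chain rule through L = D - A

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

G = grad_lambda2(A)

# Central finite difference on the bridge edge, perturbed symmetrically
# so that the Laplacian stays symmetric.
eps = 1e-6
E = np.zeros_like(A)
E[2, 3] = E[3, 2] = 1.0
fd = (fiedler(A + eps * E)[0] - fiedler(A - eps * E)[0]) / (2 * eps)
analytic = G[2, 3] + G[3, 2]          # equals (f_2 - f_3)^2 for a symmetric change
print(fd, analytic)
```

In this sketch, pairs of nodes with equal Fiedler entries (such as nodes 0 and 1) get a zero gradient, while pairs on opposite sides of the bottleneck get a positive one.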

#### Differentiable approximation of $\mathbf{f}_2$ and $\lambda_2$

Once we have those derivatives, the problem is still not trivial. Note that our cost function $L_{Fiedler}$ relies on an eigenvalue, $\lambda_2$. In addition, the derivatives also depend on the Fiedler vector $\mathbf{f}_2$ or $\mathbf{g}_2$, which is the eigenvector corresponding to that eigenvalue. However, we DO NOT COMPUTE IT SPECTRALLY, as its computation has a complexity of $O(n^3)$ and it would need to be recomputed in every learning iteration. Instead, we learn an approximation of $\mathbf{f}_2$ and use its Dirichlet energy ${\cal E}(\mathbf{f}_2)$ to approximate $\lambda_2$:

$$\mathbf{f}_2(u) = \left\{\begin{array}{cl} +1/\sqrt{n} & \text{if}\;\; u\;\; \text{belongs to the first cluster} \\ -1/\sqrt{n} & \text{if}\;\; u\;\; \text{belongs to the second cluster} \end{array}\right.$$

In addition, if using $\mathbf{\cal L}$, since $\mathbf{g}_2=\mathbf{D}^{1/2}\mathbf{f}_2$, we first approximate $\mathbf{g}_2$ and then approximate $\lambda_2$ from ${\cal E}(\mathbf{g}_2)$. With this approximation, we can easily compute which cluster each node belongs to with a simple MLP. Moreover, since the Fiedler vector must satisfy orthogonality and normality, the corresponding constraints must be added to that MLP clustering.
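To make the approximation concrete, here is a minimal sketch that builds the two-valued $\mathbf{f}_2$ from a cluster assignment and evaluates its Dirichlet energy $\mathbf{f}_2^T\mathbf{L}\mathbf{f}_2$ on a hypothetical toy graph (two triangles joined by one edge). The cluster labels are hand-picked here rather than produced by an MLP, and the exact eigenvalue is computed only as a reference value, which the layer itself is meant to avoid.

```python
import numpy as np

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

n = A.shape[0]
L = np.diag(A.sum(axis=1)) - A            # unnormalized Laplacian

# Hand-picked cluster labels (in the layer these would come from the MLP).
cluster = np.array([0, 0, 0, 1, 1, 1])

# f_2(u) = +1/sqrt(n) in the first cluster, -1/sqrt(n) in the second.
f2_approx = np.where(cluster == 0, 1.0, -1.0) / np.sqrt(n)

# Dirichlet energy: sum over edges of (f(u) - f(v))^2, written as f^T L f.
energy = f2_approx @ L @ f2_approx

# Reference only: exact lambda_2 from a full eigendecomposition.
lam2_exact = np.linalg.eigvalsh(L)[1]
print(energy, lam2_exact)
```

On this graph the energy of the cluster-indicator vector is $4 \cdot 1/6$ (one cut edge, six nodes). It is an upper bound on the exact $\lambda_2$, because the vector has unit norm and, for balanced clusters, is orthogonal to the constant vector.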

### GAP-Layer

To sum up, GAP-Layer can be defined as follows. Given the matrix $\mathbf{X}_{n\times F}$ encoding the features of the nodes after any message passing (MP) layer, $\mathbf{S}_{n\times 2}=\textrm{Softmax}(\textrm{MLP}(\mathbf{X}))$ learns the association $\mathbf{X}\rightarrow \mathbf{S}$ while $\mathbf{S}$ is optimized according to the loss:

$$L_{Cut} = -\frac{Tr[\mathbf{S}^T\mathbf{A}\mathbf{S}]}{Tr[\mathbf{S}^T\mathbf{D}\mathbf{S}]} + \left\|\frac{\mathbf{S}^T\mathbf{S}}{\|\mathbf{S}^T\mathbf{S}\|_F} - \frac{\mathbf{I}_2}{\sqrt{2}}\right\|_F$$

Then $\mathbf{f}_2$ is approximated from $\mathbf{S}$ using the $\mathbf{f}_2(u)$ equation. Once $\mathbf{f}_2$ and $\lambda_2$ have been computed, we consider the loss:

$$L_{Fiedler} = \|\tilde{\mathbf{A}}-\mathbf{A}\|_F + \alpha(\lambda_2)^2$$

$$\mathbf{\tilde{A}} = \mathbf{A} - \mu \nabla_\mathbf{\tilde{A}}\lambda_2$$

returning $\tilde{\mathbf{A}}$. Then the GAP diffusion $\mathbf{T}^{GAP} = \tilde{\mathbf{A}}(\mathbf{S}) \odot \mathbf{A}$ results from minimizing

$$L_{GAP}= L_{Cut} + L_{Fiedler}$$
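The pieces above can be put together in a small NumPy sketch of one GAP-Layer pass on a hypothetical toy graph. Several things here are assumptions for illustration and not the authors' implementation: the assignment $\mathbf{S}$ is hand-picked instead of coming from $\textrm{Softmax}(\textrm{MLP}(\mathbf{X}))$; $\mathbf{f}_2$ is read off $\mathbf{S}$ as $(\mathbf{S}_{:,1}-\mathbf{S}_{:,2})/\sqrt{n}$, which matches the two-valued definition only for hard assignments; the gradient uses the first-order form $f_i^2 - f_if_j$ for the unnormalized Laplacian; a single update step is taken with $\lambda_2$ and its gradient evaluated at $\mathbf{A}$; the orthogonality term uses the $2\times 2$ identity, since $\mathbf{S}^T\mathbf{S}$ is $2\times 2$; and the values of `mu` and `alpha` are arbitrary.

```python
import numpy as np

def cut_loss(S, A):
    """L_Cut: normalized cut term plus orthogonality term on S (n x k)."""
    D = np.diag(A.sum(axis=1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    k = S.shape[1]
    # np.linalg.norm on a matrix defaults to the Frobenius norm.
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(k) / np.sqrt(k))
    return cut + ortho

def gap_step(A, S, mu=0.5, alpha=1.0):
    """One illustrative rewiring step; returns T_GAP and L_Fiedler."""
    n = A.shape[0]
    f2 = (S[:, 0] - S[:, 1]) / np.sqrt(n)   # assumed relaxation of f_2(u)
    L = np.diag(A.sum(axis=1)) - A
    lam2 = f2 @ L @ f2                      # Dirichlet energy as lambda_2 estimate
    ff = np.outer(f2, f2)
    grad = np.diag(ff)[:, None] - ff        # assumed form: f_i^2 - f_i * f_j
    A_tilde = A - mu * grad                 # step that decreases lambda_2
    fiedler_loss = np.linalg.norm(A_tilde - A) + alpha * lam2 ** 2
    T_gap = A_tilde * A                     # Hadamard mask keeps existing edges only
    return T_gap, fiedler_loss

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

S = np.eye(2)[[0, 0, 0, 1, 1, 1]]           # hard assignment: one triangle per cluster

loss_cut = cut_loss(S, A)
T_gap, loss_fiedler = gap_step(A, S)
loss_gap = loss_cut + loss_fiedler
print(loss_cut, loss_fiedler, T_gap[2, 3])
```

With this assignment, the only edge whose weight changes is the bridge between the two triangles, which is down-weighted; this is the expected behaviour when the goal is to minimize the spectral gap. Flipping the sign of the update step would instead reinforce that edge.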

#### References

(Kang et al. 2019) Kang, J., & Tong, H. (2019, November). N2N: Network derivative mining. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 861-870).
