 Graph Embeddings

# Commute Times Layer

Introduced by Arnaiz-Rodriguez et al. in DiffWire: Inductive Graph Rewiring via the Lovász Bound

TL;DR: CT-Layer is a GNN Layer which is able to rewire a graph in an inductive an parameter-free way according to the commute times distance (or effective resistance). We address it learning a differentiable way to compute the CT-embedding of the graph.

### Summary

CT-Layer is able to Learn the Commute Times distance between nodes (i.e. effective resistance distance) in a differentiable way, instead of the common spectral version, and in a parameter free manner, which is not the cased of the heat kernel. This approach allow to solve it as an optimization problem inside a GNN, leading to have a new layer which is able to learn how rewire a given graph in an optimal, and inductive way.

In addition, CT-Layer also is able to learn Commute Times embeddings, and then calculate it for any graph in an inductive way. The Commute Times embedding is also related with the eigenvalues and eigenvectors of the Laplacian of the graph, because CT embedding is just the eigenvectors scaled. Therefore, CT-Layer is also able to learn hot to calculate the spectrum of the Laplacian in a differentiable way. Therefore, this embedding must satisfy orthogonality and normality.

Finally, recent connections has been found between commute times distance and curvature (which is non-differentiable), establishing equivalences between them. Therefore, CT-Layer can also be seen as the differentiable version of the curvature rewiring.

We are going through a quick overview of the layer, but I suggest go to the paper for a detailed explanation.

### Spectral CT- Embedding downsides

CT-embedding $\mathbf{Z}$ is computed spectrally in the literature (until the proposal of this method) or it is approximated using the heat kernel (very dependent on hyperparameter $t$). This fact does not allow us to propose differentiable methods using that measure: $$\mathbf{Z}=\sqrt{vol(G)}\mathbf{\Lambda}^\frac{1}{2}\mathbf{F}^T \textrm{ given } \mathbf{L}=\mathbf{F}\mathbf{\Lambda}\mathbf{F}^T$$

Then, CT-distance is given by the Euclidean distances between the embeddings $CT_{uv} = ||\mathbf{z_u}-\mathbf{z_v}||^2$. The spectral form is:

$$\frac{CT_{uv}}{vol(G)} = \sum_{i=2}^n \frac{1}{\lambda_i} (\mathbf{f}(u)-\mathbf{f}(v))^2$$ being $\mathbf{f}$ the eigenvectors of the graph Laplacian.

This embedding and distances gives us desirable properties of the graph, such an understanding of the structure, or an embedding based on the spectrum which minimizes Dirichlet energies. However, the spectral computation is not differentiable.

### CT-Layer as an optimization problem: Differentiable, learnable and inductive CT-Layer

Giving that $\mathbf{Z}$ minimizes Dirichlet energies s.t. being orthogonal and normalized, we can formulate this problem as constraining neighboring nodes to have a similar embeddings s.t. $\mathbf{Z}\mathbf{Z}^T=\mathbf{I}$.

$$\mathbf{Z} = \arg\min_{\mathbf{Z}^T\mathbf{Z}=\mathbf{I}} \frac{\sum_{u,v} ||\mathbf{z_u}-\mathbf{z_v}||^2\mathbf{A}_{uv}}{\sum_{u,v} \mathbf{Z}^2_{uv} d_u}=\frac{Tr[\mathbf{Z}^T\mathbf{L}\mathbf{Z}]}{Tr[\mathbf{Z}^T\mathbf{D}\mathbf{Z}]}$$

With the above elements we have a definition of CT-Layer, our rewiring layer: Given the matrix $\mathbf{X}_{n\times F}$ encoding the features of the nodes after any message passing (MP) layer, $\mathbf{Z}_{n\times O(n)}=\tanh(\textrm{MLP}(\mathbf{X}))$ learns the association $\mathbf{X}\rightarrow \mathbf{Z}$ while $\mathbf{Z}$ is optimized according to the loss $$L_{CT} = \frac{Tr[\mathbf{Z}^T\mathbf{L}\mathbf{Z}]}{Tr[\mathbf{Z}^T\mathbf{D}\mathbf{Z}]} + \left|\frac{\mathbf{Z}^T\mathbf{Z}}{|\mathbf{Z}^T\mathbf{Z}|_F} - \mathbf{I}_n\right|_F$$ This results in the following resistance diffusion $\mathbf{T}^{CT} = \mathbf{R}(\mathbf{S})\odot \mathbf{A}$ (Hadamard product between the resistance distance and the adjacency) which provides as input to the subsequent MP layer a learnt convolution matrix.

As explained before, $\mathbf{Z}$ is the commute times embedding matrix and the pairwise euclidian distance of that learned embeddings are the commute times distances or resistance distances. Therefore, once trained this layer, it will be able to calculate the commute times embedding for a new graph, and rewire that new and unseen graph in a principled way based on the commute times distance.

## Preservation of Structure

Does this rewiring preserve the original structure? Let $G' = \textrm{Sparsify}(G, q)$ be a sampling algorithm of graph $G = (V, E)$, where edges $e \in E$ are sampled with probability $q\propto R_e$ (proportional to the effective resistance, i.e. commute times). Then, for $n = |V|$ sufficiently large and $1/\sqrt{n}< \epsilon\le 1$, we need O(n\log n/\epsilon^2)$samples to satisfy: $$\forall \mathbf{x}\in\mathbb{R}^n:\; (1-\epsilon)\mathbf{x}^T\mathbf{L}_G\mathbf{x}\le\mathbf{x}^T\mathbf{L}_{G'}\mathbf{x}\le (1+\epsilon)\mathbf{x}^T\mathbf{L}_G\mathbf{x}$$ The intuitions behind is that Dirichlet energies in$G'$are bounded in$(1\pm \epsilon)$of the Dirichlet energies of the original graph$G\$.

#### Papers

Paper Code Results Date Stars