An invertible transform for efficient string matching in labeled digraphs

9 May 2019  ·  Abhinav Nellore, Austin Nguyen, Reid F. Thompson ·

Let $G = (V, E)$ be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet $\Omega$, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of $G$ into a weakly connected digraph $G' = (V', E')$ that enables solving the decision problem of whether there is a walk in $G$ matching an arbitrarily long query string $q$ in time linear in $|q|$ and independent of $|E|$ and $|V|$. We show $G$ is uniquely determined by $G'$ when for every $v_\ell \in V$, there is some distinct string $s_\ell$ on $\Omega$ such that $v_\ell$ is the origin of a closed walk in $G$ matching $s_\ell$, and no other walk in $G$ matches $s_\ell$ unless it starts and ends at $v_\ell$. We then exploit this invertibility condition to strategically alter any $G$ so its transform $G'$ enables retrieval of all $t$ terminal vertices of walks in the unaltered $G$ matching $q$ in $O(|q| + t \log |V|)$ time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.

PDF Abstract
No code implementations yet. Submit your code now

Categories


Data Structures and Algorithms

Datasets


  Add Datasets introduced or used in this paper