Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance

24 Jul 2020 · Tomasz Kociumaka, Barna Saha

In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give a greedy algorithm that distinguishes in time $\tilde{O}(\frac{n}{k}+k^2)$ between length-$n$ input strings with edit distance at most $k$ and those with edit distance more than $4k^2$. This is an improvement and a simplification upon the main result of [Goldenberg, Krauthgamer, Saha, FOCS 2019], where the $k$ vs $\Theta(k^2)$ gap edit distance problem is solved in $\tilde{O}(\frac{n}{k}+k^3)$ time. We further generalize our result to solve the $k$ vs $\alpha k$ gap edit distance problem in time $\tilde{O}(\frac{n}{\alpha}+k^2+ \frac{k}{\alpha}\sqrt{nk})$, strictly improving upon the previously known bound $\tilde{O}(\frac{n}{\alpha}+k^3)$. Finally, we show that if the input strings do not have long highly periodic substrings, then the gap edit distance problem can be solved in sublinear time within any factor $\alpha>1$. We further give the first sublinear-time algorithm for the probabilistic embedding of edit distance to Hamming distance. Our $\tilde{O}(\frac{n}{p})$-time procedure yields an embedding with distortion $k^2p$, where $k$ is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between $\frac{k}{p}$ and $k^2$ with good probability. This generalizes the linear-time embedding of [Chakraborty, Goldenberg, Koucky, STOC 2016], where the resultant Hamming distance is between $k$ and $k^2$. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
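For context, the linear-time embedding of [Chakraborty, Goldenberg, Koucky, STOC 2016] that the result above generalizes can be pictured as a random walk over the input string driven by shared random bits. The sketch below illustrates only that baseline idea, not the sublinear-time procedure of this paper. The names `cgk_embed` and `hamming`, the padding symbol, the explicit `out_len` parameter, and the use of a common seed to share randomness are all assumptions made for illustration.

```python
import random


def cgk_embed(s, alphabet, seed, out_len, pad="\0"):
    """Illustrative random-walk embedding of s into a string of length out_len.

    At each step the current symbol is copied to the output, and the read
    position advances by a random bit that depends on the step and on the
    symbol. Two strings embedded with the same seed see the same bits.
    """
    rng = random.Random(seed)
    symbols = sorted(set(alphabet))  # fixed order so the bits are reproducible
    out = []
    i = 0  # current read position in s
    for _ in range(out_len):
        # Draw one bit per symbol for this step, even when it is not used,
        # so the random stream stays aligned across different inputs.
        bits = {c: rng.getrandbits(1) for c in symbols}
        if i < len(s):
            out.append(s[i])
            i += bits[s[i]]
        else:
            out.append(pad)  # the walk has run past the end of s
    return "".join(out)


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


if __name__ == "__main__":
    x = "abracadabra"
    y = "abracadabrx"
    out_len = 3 * max(len(x), len(y))  # output length used in this sketch
    ex = cgk_embed(x, "abcdrx", seed=7, out_len=out_len)
    ey = cgk_embed(y, "abcdrx", seed=7, out_len=out_len)
    print(hamming(ex, ey))
```

Sharing the seed is what makes the comparison meaningful: while the two walks are aligned on matching symbols they advance in lockstep, and they can drift apart only after reaching a position where the strings differ.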
