Randomized Algorithms for Comparison-based Search

This paper addresses the problem of finding the nearest neighbor (or one of the $R$-nearest neighbors) of a query object $q$ in a database of $n$ objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: \emph{combinatorial disorder} $D$ which defines approximate triangle inequalities on ranks. We present a lower bound of $\Omega(D\log \frac{n}{D}+D^2)$ average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of $D$ for worst case behavior. We develop a randomized scheme for NN retrieval in $O(D^3\log^2 n+ D\log^2 n \log\log n^{D^3})$ questions. The learning requires asking $O(n D^3\log^2 n+ D \log^2 n \log\log n^{D^3})$ questions and $O(n\log^2n/\log(2D))$ bits to store.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here