Experimental Analysis of Machine Learning Techniques for Finding Search Radius in Locality Sensitive Hashing

16 Nov 2022 · Omid Jafari, Parth Nagarkar

Finding similar data in high-dimensional spaces is an important task in multimedia applications. Exact search techniques often use tree-based index structures, which are known to suffer from the curse of dimensionality, limiting their performance. Approximate search techniques trade accuracy for performance: they return good-enough results while running faster. Locality Sensitive Hashing (LSH) is one of the most popular approximate nearest neighbor search techniques for high-dimensional spaces. One of the most time-consuming processes in LSH is finding the neighboring points in the projected spaces. An improved LSH-based index structure, called radius-optimized Locality Sensitive Hashing (roLSH), has been proposed that uses Machine Learning to find these neighboring points efficiently, thereby further improving the overall performance of LSH. In this paper, we extend roLSH by experimentally studying the effect of different well-known Machine Learning techniques on its overall performance. We compare ten regression techniques on four real-world datasets and show that Neural Network-based techniques are the best fit for roLSH, because they offer the best trade-off between accuracy and performance among the techniques compared.
