Large scale deduplication based on fingerprints

13 Jan 2021  ·  Jean Aymar Biyiha Nlend, Ibrahim Moukouop Nguena, Thomas Bouetou Bouetou

In fingerprint-based systems, database size grows considerably with the population. In developing countries, because a central system is difficult to use when enlisting voters, several regional voter databases are often created and then merged to form a central database... A deduplication process then removes duplicates so that each voter appears only once. Until now, companies specializing in biometrics have used several costly computing servers to perform large-scale deduplication based on fingerprints. Their algorithms take considerable time because their complexity is O(n²), where n is the size of the database. This article presents an algorithm that performs the same operation in O(2n) on a single computer. It relies on an index computed from a 5 × 5 matrix applied to each fingerprint. This index makes it possible to build clusters of O(1) size within which fingerprints are compared. The approach has been evaluated on close to 114 000 fingerprints, and the results show a penetration rate of less than 1%, almost O(1) identification, and O(n) deduplication. A database of 10 000 000 fingerprints can be deduplicated on a single computer in less than two hours, whereas the usual tools need several days and several servers. Keywords: fingerprint, cluster, index, deduplication.
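
The abstract does not spell out how the index is computed, so the following is only a minimal sketch of the general idea it describes: derive a key from each fingerprint, group records by key, and run the expensive comparison only inside each group. The cell-count index over a 5 × 5 grid, the `same_finger` matcher, and the representation of a fingerprint as normalized (x, y) points are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations

GRID = 5  # 5 x 5 grid echoing the abstract; the index built on it is an assumption


def grid_index(points):
    """Toy index: how many points fall in each cell of a GRID x GRID grid."""
    counts = [0] * (GRID * GRID)
    for x, y in points:  # coordinates assumed normalized to [0, 1)
        row = min(int(y * GRID), GRID - 1)
        col = min(int(x * GRID), GRID - 1)
        counts[row * GRID + col] += 1
    return tuple(counts)


def same_finger(a, b, tol=0.01):
    """Placeholder matcher (hypothetical): equal point counts, each pair within tol."""
    if len(a) != len(b):
        return False
    return all(
        abs(xa - xb) <= tol and abs(ya - yb) <= tol
        for (xa, ya), (xb, yb) in zip(sorted(a), sorted(b))
    )


def deduplicate(records):
    """records maps an id to its list of points; returns the duplicate id pairs."""
    buckets = defaultdict(list)
    for rid, pts in records.items():  # one pass over the database to build clusters
        buckets[grid_index(pts)].append(rid)
    pairs = []
    for ids in buckets.values():  # pairwise comparison only inside each cluster
        for i, j in combinations(ids, 2):
            if same_finger(records[i], records[j]):
                pairs.append((i, j))
    return pairs


records = {
    "a": [(0.1, 0.1), (0.5, 0.5)],
    "b": [(0.1, 0.1), (0.5, 0.5)],
    "c": [(0.9, 0.9), (0.3, 0.7)],
}
print(deduplicate(records))
```

If every cluster stays bounded in size, the total work is one indexing pass plus a constant amount of comparison per record, which is consistent with the linear cost claimed in the abstract. A real index would also have to tolerate rotation, translation, and sensor noise, which this toy key does not.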
