FPScreen: A Rapid Similarity Search Tool for Massive Molecular Library Based on Molecular Fingerprint Comparison

13 Jun 2019 · Lijun Wang, Jianbing Gong, Yingxia Zhang, Tianmou Liu, Junhui Gao

We designed FPScreen, a fast similarity search engine for large molecular libraries. We downloaded the structure files (SDF format) of 100 million molecules from PubChem, then used the cheminformatics toolkit RDKit to convert each structure into a single line of text holding its MACCS fingerprint, and stored these lines in one text file that serves as our molecule library. The search engine computes similarity while traversing the 166-bit strings in the library file line by line. FPScreen can complete a similarity search over all 100 million entries in our library within one hour, which is very fast for a biological computation tool. Additionally, we divided the library into several strides for parallel processing. FPScreen was developed as a web application.
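The abstract names the pipeline but not the similarity metric, the line layout, or how strides are formed. The sketch below shows one way a line-by-line scan with stride-level parallelism could look. It is not the authors' code. It assumes that each library line is `<molecule id>\t<bit string>`, that similarity is the Tanimoto coefficient, and that a stride is a contiguous block of lines. All function names are hypothetical. With RDKit, such a bit string could come from `MACCSkeys.GenMACCSKeys(mol).ToBitString()`, which returns 167 characters because RDKit leaves bit 0 unused.

```python
"""Sketch of a line-by-line MACCS fingerprint scan split into parallel strides.

Assumptions (not stated in the paper): each library line is
"<molecule id>\t<bit string>", similarity is the Tanimoto coefficient,
and a "stride" is a contiguous block of lines handled by one worker.
"""
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable, List, Tuple


def bits_to_int(bits: str) -> int:
    """Pack a '0'/'1' string into a Python int so AND/OR are cheap."""
    return int(bits, 2)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either vector."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return bin(a & b).count("1") / union


def scan_lines(query_bits: str, lines: Iterable[str],
               threshold: float) -> List[Tuple[str, float]]:
    """Traverse library lines one at a time, keeping hits >= threshold."""
    query = bits_to_int(query_bits)
    hits = []
    for line in lines:  # works on a list or on an open text file
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mol_id, bits = line.split("\t")
        score = tanimoto(query, bits_to_int(bits))
        if score >= threshold:
            hits.append((mol_id, score))
    return hits


def split_into_strides(lines: List[str], n: int) -> List[List[str]]:
    """Cut the library into at most n contiguous blocks of near-equal size."""
    size = max(1, -(-len(lines) // n))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def search_parallel(query_bits: str, lines: List[str], threshold: float,
                    workers: int = 4) -> List[Tuple[str, float]]:
    """Scan each stride in its own process, then merge hits best-first."""
    hits: List[Tuple[str, float]] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_lines, query_bits, stride, threshold)
                   for stride in split_into_strides(lines, workers)]
        for future in futures:
            hits.extend(future.result())
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

Call `search_parallel` from inside an `if __name__ == "__main__":` guard, because platforms that spawn worker processes re-import the main module. Packing each bit string into an integer reduces every comparison to one AND, one OR, and two popcounts. At the scale of 100 million lines, a more practical design would have each worker read its own byte range of the file rather than hold the whole library in memory.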
