Improving Hit-finding: Multilabel Neural Architecture with DEL

Data from DNA-Encoded Libraries (hereafter DEL), which often comprise millions of data points, enable large deep learning models to make real contributions to the drug discovery process (e.g., hit-finding). The current state-of-the-art method for modeling DEL data, the GCNN multiclass model, requires domain experts to create mutually exclusive classification labels from the multiple selection readouts of DEL data, an assumption that is not always ideal for formulating the problem. In this work, we designed a GCNN multilabel architecture that directly models each selection readout, eliminating this dependency on human expertise. We chose key modeling components, such as the label reduction scheme, based on in silico evaluation. To assess the model's performance in real-world drug discovery settings, we further carried out prospective wet-lab testing, in which the multilabel model showed consistent improvement in hit-rate (the percentage of hits in a proposed list of molecules) over the current state-of-the-art multiclass model.
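The difference between the two formulations can be sketched in plain Python. This is a hypothetical illustration, not the paper's method. The GCNN encoder is omitted, so the logits are assumed to be given. The readouts are assumed to be already binarized per selection. The expert rule in `multiclass_label` is made up for the example, and the paper's actual label reduction scheme is not described here. In the multiclass case, one expert-defined class per molecule is scored with a softmax. In the multilabel case, each selection readout is an independent binary target scored with sigmoid binary cross-entropy.

```python
import math
from typing import List

# Hypothetical binarized readouts: one entry per selection experiment
# (1 = molecule enriched in that selection, 0 = not enriched).
Readouts = List[int]


def multiclass_label(readouts: Readouts) -> int:
    """Hypothetical expert rule collapsing several readouts into ONE class.

    Class 0: enriched nowhere; class 1: enriched only in the first selection
    (e.g. the target); class 2: enriched in any other selection.
    """
    if not any(readouts):
        return 0
    if readouts[0] == 1 and not any(readouts[1:]):
        return 1
    return 2


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Multiclass loss: classes compete because softmax probabilities sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def sigmoid_bce(logits: List[float], targets: Readouts) -> float:
    """Multilabel loss: one independent binary task per selection readout."""
    loss = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        loss += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return loss / len(targets)  # mean over selections


readouts = [1, 0, 1]
print(multiclass_label(readouts))                   # 2: both readouts collapsed into one class
print(sigmoid_bce([0.0, 0.0, 0.0], readouts))       # ln 2, about 0.6931
print(softmax_cross_entropy([0.0, 0.0, 0.0], 2))    # ln 3, about 1.0986
```

In the multiclass version, the molecule enriched in both the first and third selections is reduced to a single class, so the information that it was enriched in the first selection is lost. The multilabel loss keeps one target per readout, which avoids the hand-crafted, mutually exclusive labeling.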
