DNA-encoded libraries (DELs) are pooled, combinatorial compound collections in which each member is tagged with a unique DNA barcode. DELs are used in drug discovery for early hit finding against protein targets. Recently, several groups have proposed building machine learning models on quantities derived from DEL datasets. However, DEL datasets have a low signal-to-noise ratio, which makes modeling them challenging. To address this, we propose a novel graph neural network (GNN)-based regression model that directly predicts enrichment scores from raw sequencing counts while accounting for multiple sources of technical variation and intrinsic assay noise. We show that our GNN regression model quantitatively outperforms standard classification approaches and can be used to find diverse sets of molecules in external virtual libraries.
