Papyrus comprises around 60 million data points. It combines multiple large publicly available datasets, such as ChEMBL and ExCAPE-DB, with several smaller datasets containing high-quality data. The aggregated data has been standardized and normalized in a manner suitable for machine learning.
For details, see the accompanying paper: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00672-x