Topical: Learning Repository Embeddings from Source Code using Attention

19 Aug 2022 · Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran

This paper presents Topical, a novel deep neural network for computing repository-level embeddings. Topical uses an attention mechanism to generate repository-level representations from source code, full dependency graphs, and script-level textual data. This approach outperforms existing methods, which rely on natural language documentation or naive aggregation techniques. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines on tasks such as repository auto-tagging, highlighting the advantage of attention over traditional aggregation methods. Topical is also scalable and efficient, making it a practical tool for computing repository-level representations. To support further research, the accompanying tools, code, and training dataset are available at: https://github.com/jpmorganchase/topical.
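The abstract contrasts naive aggregation with attention-based aggregation of script-level information. The following is a generic illustration of that contrast, not Topical's actual architecture. It assumes each script in a repository has already been encoded as a fixed-length vector, and it uses a single query vector (`query`, hypothetical here) that a real model would learn during training. Mean pooling weights every script equally. Dot-product attention pooling instead assigns each script a softmax weight, so scripts that score higher against the query contribute more to the repository embedding.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_pool(script_embeddings: List[Vector]) -> Vector:
    """Naive aggregation baseline: element-wise average of script embeddings."""
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(e[d] for e in script_embeddings) / n for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attention_pool(script_embeddings: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Attention aggregation (illustrative only; `query` is a hypothetical learned vector).

    Each script is scored by its dot product with `query`; the softmax of those
    scores gives per-script weights, and the repository embedding is the
    weighted sum of the script embeddings.
    """
    scores = [sum(q * x for q, x in zip(query, e)) for e in script_embeddings]
    weights = softmax(scores)
    dim = len(script_embeddings[0])
    pooled = [sum(w * e[d] for w, e in zip(weights, script_embeddings)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Toy 2-D "script embeddings" for a three-script repository (made-up values).
    scripts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    query = [2.0, 0.0]  # hypothetical; a trained model would learn this
    repo_vec, weights = attention_pool(scripts, query)
    print("mean pooling:     ", mean_pool(scripts))
    print("attention pooling:", repo_vec, "weights:", weights)
```

In this toy example the second script scores lowest against the query, so attention pooling down-weights it, whereas mean pooling gives it the same one-third share as the others. In a trained model, the query (or a richer attention module) would be fitted to a downstream objective such as repository auto-tagging.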
