Graph4Code builds well-structured knowledge graphs over program code to support diverse applications such as code search, code understanding, refactoring, bug detection, and code automation.
Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks.
Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability assessment, code search, and program repair.
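As an illustrative sketch of how token embeddings can support code search: encode the query and each snippet as the normalized mean of their token vectors, then rank snippets by cosine similarity. The random per-token vectors here are a stand-in for trained embeddings, and all function names (`embed`, `encode`, `search`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 32
_table: dict[str, np.ndarray] = {}

def embed(token: str) -> np.ndarray:
    # Random vectors stand in for embeddings learned from a code corpus.
    if token not in _table:
        _table[token] = rng.normal(size=DIM)
    return _table[token]

def encode(text: str) -> np.ndarray:
    # Mean-pool token vectors, then L2-normalize so the dot
    # product below is cosine similarity.
    vecs = np.array([embed(t) for t in text.split()])
    v = vecs.mean(axis=0)
    return v / np.linalg.norm(v)

def search(query: str, snippets: list[str]) -> list[tuple[float, str]]:
    # Rank snippets by cosine similarity to the query embedding.
    q = encode(query)
    return sorted(((float(encode(s) @ q), s) for s in snippets),
                  reverse=True)

hits = search("read file lines",
              ["open path read lines", "sort list numbers"])
```

With trained embeddings, semantically related tokens (e.g. `read` and `open`) land near each other, which is what lets such a search tolerate vocabulary mismatch between queries and code.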
With the recent explosion in the size and complexity of source codebases and software projects, the need for efficient source code search engines has increased dramatically.
We use Stack Overflow code snippets and their tags to train a language-agnostic, deep convolutional neural network to automatically predict semantic labels for source code documents.
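A minimal sketch of the idea of tagging source code with a convolutional network: hash tokens into a fixed vocabulary (which is what keeps the model language-agnostic), embed them, apply a width-3 1-D convolution with max-pooling over positions, and score each tag with an independent sigmoid. The weights are randomly initialized rather than trained, and the tag set and all names (`predict_tags`, `VOCAB`, `FILTERS`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMBED, FILTERS, WIDTH = 1000, 16, 8, 3
TAGS = ["python", "java", "sql"]  # hypothetical label set

# Randomly initialized parameters stand in for trained weights.
E = rng.normal(size=(VOCAB, EMBED))            # token embedding table
W = rng.normal(size=(FILTERS, WIDTH * EMBED))  # conv filters
V = rng.normal(size=(len(TAGS), FILTERS))      # output layer

def predict_tags(snippet: str) -> dict[str, float]:
    """Score each tag for a code snippet with a 1-D conv + max-pool."""
    ids = [hash(t) % VOCAB for t in snippet.split()]  # language-agnostic hashing
    X = E[ids]                                        # (n_tokens, EMBED)
    # Slide a width-3 window over the token embedding sequence.
    windows = np.array([X[i:i + WIDTH].ravel()
                        for i in range(len(ids) - WIDTH + 1)])
    H = np.maximum(0, windows @ W.T)  # ReLU feature maps, one row per position
    pooled = H.max(axis=0)            # max-pool over positions
    probs = 1 / (1 + np.exp(-(V @ pooled)))  # independent sigmoid per tag
    return dict(zip(TAGS, probs))

scores = predict_tags("def add(a, b): return a + b")
```

Independent sigmoids (rather than a softmax) reflect that Stack Overflow posts carry multiple tags, so the labels are not mutually exclusive.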
Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
Because programming concepts often do not match their syntactic representations, code search is a tedious task.