The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.
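As a rough illustration of the retrieval task described above, the following minimal sketch ranks code fragments against a natural-language query by bag-of-words cosine similarity. It is a toy baseline written for this listing, not the method of any paper cited here. The names `tokenize`, `cosine`, `search`, and the three-entry `CORPUS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep alphabetic runs only, so `read_file` yields "read" and "file".
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def cosine(a, b):
    # Cosine similarity between two token-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, corpus, top_k=1):
    # Score every fragment against the query and return the best names.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(code))), name) for name, code in corpus.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy corpus: name -> code fragment.
CORPUS = {
    "read_file": "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "sort_list": "def sort_list(items):\n    return sorted(items)",
    "http_get": "def http_get(url):\n    import urllib.request\n    return urllib.request.urlopen(url).read()",
}

print(search("read a file from a path", CORPUS))
```

Lexical overlap of this kind fails when the query and the code use different vocabulary, which is the gap that learned approaches to code search aim to close.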
To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from the CodeSearchNet Corpus.
In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy.
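The general coarse-to-fine idea can be sketched as a two-pass Hamming-distance search: a cheap pass over short binary codes selects a shortlist, and the long codes rerank only that shortlist. This is an illustrative sketch under assumptions, not the CtF method itself. It assumes each item already has a short and a long code, and the function names, `GALLERY`, `QUERY`, and the `shortlist` parameter are hypothetical.

```python
def hamming(a, b):
    # Number of differing bits between two binary codes stored as ints.
    return bin(a ^ b).count("1")


def coarse_to_fine_search(query, gallery, shortlist=2):
    q_short, q_long = query
    # Coarse pass: rank all items by distance on the short codes, keep a shortlist.
    candidates = sorted(gallery, key=lambda name: hamming(q_short, gallery[name][0]))
    candidates = candidates[:shortlist]
    # Fine pass: rerank only the shortlist using the long codes.
    return sorted(candidates, key=lambda name: hamming(q_long, gallery[name][1]))


# Hypothetical gallery: name -> (short code, long code).
GALLERY = {
    "a": (0b1010, 0b1010110011110000),
    "b": (0b1011, 0b1010110011110011),
    "c": (0b0101, 0b0101001100001111),
}
QUERY = (0b1010, 0b1010110011110011)

print(coarse_to_fine_search(QUERY, GALLERY))
```

In this toy data the short codes place `a` ahead of `b`, while the long codes reverse that order within the shortlist, and `c` is never compared at full length.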
Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks.
The graph uses generic techniques to capture the semantics of Python code: the key nodes in the graph are classes, functions and methods in popular Python modules.
CODE SEARCH · IMAGE CLASSIFICATION · KNOWLEDGE GRAPHS · NATURAL LANGUAGE UNDERSTANDING
In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries.
Ranked #1 on Annotated Code Search on PACS-StaQC-py
ANNOTATED CODE SEARCH · INFORMATION RETRIEVAL · TRANSFER LEARNING
This study examines the effectiveness of graph neural networks in estimating program similarity by analysing the associated control flow graphs.
We evaluate the approach against several baselines on a real-world dataset comprising over 1 million queries mined from the Bing web search engine and show that the CNN-based model can achieve an accuracy of 77% and 76% for C# and Java, respectively.
We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval (CoNCRA).
Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
With the rapid increase in the number of public code repositories, developers increasingly want to retrieve precise code snippets using natural language.