The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.
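As a rough illustration of the retrieval setup, the sketch below embeds a natural-language query and every code fragment in a shared vector space and ranks fragments by similarity. It uses TF-IDF vectors purely as a lexical baseline; the corpus, query, and ranking loop are invented for the example, and real systems replace the vectorizer with a learned neural encoder.

```python
# Minimal lexical code-search baseline: rank code snippets against a
# natural-language query by TF-IDF cosine similarity. Neural systems
# swap the vectorizer for a learned encoder, but the retrieval loop
# is the same.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "def read_file(path): return open(path).read()",
    "def parse_json(text): import json; return json.loads(text)",
    "def sort_by_key(items, key): return sorted(items, key=key)",
]
query = "load the contents of a file from disk"

vectorizer = TfidfVectorizer()
code_vecs = vectorizer.fit_transform(corpus)         # one row per snippet
query_vec = vectorizer.transform([query])            # query in the same space
scores = cosine_similarity(query_vec, code_vecs)[0]  # similarity to each snippet

# Print snippets ranked by relevance to the query.
for score, snippet in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {snippet}")
```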
To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from the CodeSearchNet Corpus.
Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks.
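As a hedged sketch of how such a model might be applied to retrieval, the example below embeds a query and a candidate snippet with the publicly released microsoft/codebert-base checkpoint and compares them by cosine similarity. Mean-pooling raw hidden states is an assumption made here for brevity; the paper itself fine-tunes the model for the retrieval task rather than using raw embeddings.

```python
# Assumed usage sketch: score a query/code pair with the public
# microsoft/codebert-base checkpoint via Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Encode text and mean-pool token states into a single vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

query = embed("check if a string is a palindrome")
code = embed("def is_palindrome(s): return s == s[::-1]")
print(f"cosine similarity: {torch.cosine_similarity(query, code, dim=0):.3f}")
```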
Our evaluation shows that: (1) adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; (2) simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; (3) while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models.
Furthermore, we propose to utilize multi-modal contents to learn representations of code fragments with contrastive learning, and then align representations among programming languages using a cross-modal generation task.
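A minimal sketch of the contrastive part of that idea is an in-batch InfoNCE objective that pulls each code fragment toward its paired description and pushes it away from the other pairs in the batch. The function below is illustrative only; the embedding dimension, temperature, and symmetric two-direction loss are assumptions, not the paper's exact setup.

```python
# Illustrative in-batch InfoNCE loss for aligning code and text
# embeddings; the diagonal of the similarity matrix holds the
# positive (matched) pairs.
import torch
import torch.nn.functional as F

def info_nce(code_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """code_emb, text_emb: (batch, dim) embeddings of matched pairs."""
    code_emb = F.normalize(code_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = code_emb @ text_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0))
    # Symmetric loss: code->text and text->code retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```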
Analyzing this dataset, we observe that code-related searching often requires more effort (e.g., time, result clicks, and query modifications) than general non-code search, which indicates that code search with a general-purpose search engine is less effective.