Code Search

49 papers with code • 5 benchmarks • 10 datasets

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search

Benchmarks

Add a Result

These leaderboards are used to track progress in Code Search

Dataset	Best Model	Compare
CodeSearchNet	cpt-code M	See all
CoDesc	Self-attention	See all
CodeXGLUE - AdvTest	CodeT5+ 770M	See all
CodeSearchNet - Ruby	Uni-SBT	See all
CodeXGLUE - WebQueryTest	CodeBERT	See all

Libraries

Use these libraries to find Code Search models and implementations

microsoft/CodeBERT

5 papers

1,973

facebookresearch/CodeGen

2 papers

672

Datasets

Subtasks

Annotated Code Search

Most implemented papers

Most implemented Social Latest No code

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

github/CodeSearchNet • • 20 Sep 2019

To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from CodeSearchNet Corpus.

Paper
Code

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

microsoft/CodeBERT • • Findings of the Association for Computational Linguistics 2020

Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks.

Paper
Code

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE • • 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

Paper
Code

When Deep Learning Met Code Search

facebookresearch/Neural-Code-Search-Evaluation-Dataset • 9 May 2019

Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective that more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.

Paper
Code

CoNCRA: A Convolutional Neural Network Code Retrieval Approach

mrezende/concra • • 3 Sep 2020

We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval (CoNCRA).

Paper
Code

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

facebookresearch/CodeGen • NeurIPS 2021

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.

Paper
Code

Memorization and Generalization in Neural Code Intelligence Models

uh-serg/ci-memorization • • 16 Jun 2021

The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models.

Paper
Code

UniXcoder: Unified Cross-Modal Pre-training for Code Representation

microsoft/CodeBERT • • ACL 2022

Furthermore, we propose to utilize multi-modal contents to learn representation of code fragment with contrastive learning, and then align representations among programming languages using a cross-modal generation task.

Paper
Code

Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval

masud99r/code-intent • 22 Mar 2018

Analyzing this dataset, we observe that code related searching often requires more effort (e. g., time, result clicks, and query modifications) than general non-code search, which indicates code search performance with a general search engine is less effective.

Paper
Code

A Toolkit for Generating Code Knowledge Graphs

wala/graph4code • • 21 Feb 2020

We make the toolkit to build such graphs as well as the sample extraction of the 2 billion triples graph publicly available to the community for use.

Paper
Code

Code Search

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Most implemented papers

Content

Benchmarks

Add a Result