Code Search

18 papers with code • 5 benchmarks • 9 datasets

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search
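
As a rough illustration of the task defined above, the following sketch ranks a tiny in-memory corpus of snippets against a natural-language query using bag-of-words cosine similarity. It is a toy lexical baseline for orientation only, not the method of any paper listed here; the function names (`tokenize`, `cosine`, `search`) and the three-snippet corpus are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumerics (so read_file -> read, file)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, corpus, top_k=1):
    """Return the top_k snippets whose tokens best match the query tokens."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(code))), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:top_k]]


# Hypothetical corpus; the snippets are only tokenized as text, never executed.
corpus = [
    "def read_file(path):\n    return open(path).read()",
    "def sort_list(items):\n    return sorted(items)",
    "def http_get(url):\n    return requests.get(url).text",
]
best = search("read a file from a path", corpus)
```

Here `best[0]` is the `read_file` snippet, because it is the only one sharing tokens with the query. Learned approaches such as those listed below replace the token-count vectors with neural embeddings of the query and the code.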

Greatest papers with code

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

github/CodeSearchNet 20 Sep 2019

To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from CodeSearchNet Corpus.

Code Search Information Retrieval

GraphCodeBERT: Pre-training Code Representations with Data Flow

microsoft/CodeBERT ICLR 2021

Instead of taking syntactic-level structure of code like abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.

Clone Detection Code Completion +5

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

Clone Detection Cloze Test +9

Faster Person Re-Identification

wangguanan/light-reid ECCV 2020

In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy.

Code Search Person Re-Identification
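
The coarse-to-fine idea in the abstract above (short hash codes for speed, long hash codes for accuracy) can be sketched as a two-stage Hamming-distance search. This is a minimal sketch under stated assumptions, not the authors' implementation: codes are plain Python integers treated as bit strings, the coarse stage keeps a fixed number of candidates (`keep`, a hypothetical parameter), and the gallery data is made up.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def coarse_to_fine_search(query, gallery, keep=2):
    """Rank gallery items for a query using short codes first, then long codes.

    query:   (short_code, long_code)
    gallery: list of (item_id, short_code, long_code)
    keep:    assumed number of candidates that survive the coarse stage
    """
    q_short, q_long = query
    # Coarse stage: cheap comparison on short codes, keep the closest candidates.
    coarse = sorted(gallery, key=lambda g: hamming(q_short, g[1]))[:keep]
    # Fine stage: re-rank only the survivors with the more precise long codes.
    fine = sorted(coarse, key=lambda g: hamming(q_long, g[2]))
    return [g[0] for g in fine]


# Hypothetical gallery with 4-bit short codes and 16-bit long codes.
gallery = [
    ("a", 0b1010, 0b1010101011110000),
    ("b", 0b1011, 0b1010101011111111),
    ("c", 0b0101, 0b0101010100001111),
]
query = (0b1010, 0b1010101011110001)
ranked = coarse_to_fine_search(query, gallery)
```

With this data, item "c" is discarded in the coarse stage and the long codes order the remaining candidates as "a" then "b". The saving comes from running long-code comparisons on only `keep` items rather than the whole gallery.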

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

facebookresearch/CodeGen NeurIPS 2021

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.

Code Search Code Translation +4

When Deep Learning Met Code Search

facebookresearch/Neural-Code-Search-Evaluation-Dataset 9 May 2019

Our evaluation shows that: 1. adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; 2. simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; 3. while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.

Code Search

A Toolkit for Generating Code Knowledge Graphs

wala/graph4code 21 Feb 2020

We make the toolkit to build such graphs as well as the sample extraction of the 2 billion triples graph publicly available to the community for use.

Code Search Image Classification +3

Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent

nokia/codesearch 27 Aug 2020

In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries.

Annotated Code Search Information Retrieval +1

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Jun-jie-Huang/CoCLR ACL 2021

Finding codes given natural language query is beneficial to the productivity of software developers.

Code Search Contrastive Learning +1