Code Search

49 papers with code • 5 benchmarks • 10 datasets

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search
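
The retrieval task defined above can be made concrete with a deliberately simple lexical baseline. This is a minimal sketch, not a neural code search model. It assumes a small hypothetical corpus, splits camelCase and snake_case identifiers into words, and ranks snippets by bag-of-words cosine similarity to the query. The function names `tokenize`, `cosine` and `search` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; camelCase and snake_case identifiers are split into words."""
    # Insert a space at lowercase/digit -> uppercase boundaries, e.g. "sortList" -> "sort List".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Underscores and punctuation are not in the class, so "http_get" -> ["http", "get"].
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k snippet names whose code is most similar to the natural-language query."""
    q = Counter(tokenize(query))
    scored = [(name, cosine(q, Counter(tokenize(code)))) for name, code in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy corpus: snippet name -> source code.
corpus = {
    "read_file": "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "sortList": "def sortList(items):\n    return sorted(items)",
    "http_get": "def http_get(url):\n    return urllib.request.urlopen(url).read()",
}
print(search("read the contents of a file", corpus, k=1))
```

Lexical matching like this fails whenever the query and the code use different vocabulary. That vocabulary gap is what the learned representations in the papers below try to close.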

Most implemented papers

funcGNN: A Graph Neural Network Approach to Program Similarity

aravi11/funcGNN 26 Jul 2020

This study intends to examine the effectiveness of graph neural networks to estimate program similarity, by analysing the associated control flow graphs.
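
funcGNN learns a graph neural network over control flow graphs (CFGs). As a rough, non-learned illustration of comparing CFGs by neighbourhood structure, the sketch below swaps in Weisfeiler-Lehman (WL) label refinement, which resembles GNN message passing. It assumes CFGs are given as hypothetical adjacency lists of basic blocks and uses each block's out-degree as its initial label. It is not the funcGNN model.

```python
from collections import Counter

# A CFG as an adjacency list: basic-block id -> successor ids (every successor must be a key).
CFG = dict[str, list[str]]


def wl_features(cfg: CFG, iterations: int = 2) -> Counter:
    """Multiset of Weisfeiler-Lehman labels gathered over all refinement rounds."""
    # Initial label: out-degree of each block (a stand-in for instruction-level features).
    labels = {n: str(len(succ)) for n, succ in cfg.items()}
    features = Counter(labels.values())
    for _ in range(iterations):
        # Refine each label with the sorted labels of its successors (computed from the old labels).
        labels = {
            n: labels[n] + "(" + ",".join(sorted(labels[s] for s in cfg[n])) + ")"
            for n in cfg
        }
        features.update(labels.values())
    return features


def similarity(a: CFG, b: CFG) -> float:
    """Weighted Jaccard similarity of the two WL label multisets (1.0 = indistinguishable)."""
    fa, fb = wl_features(a), wl_features(b)
    inter = sum((fa & fb).values())  # elementwise minimum of counts
    union = sum((fa | fb).values())  # elementwise maximum of counts
    return inter / union if union else 1.0


diamond = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
renamed = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
loop = {"entry": ["body"], "body": ["body", "exit"], "exit": []}
print(similarity(diamond, renamed))  # 1.0: same structure, different block names
print(similarity(diamond, loop))
```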

Faster Person Re-Identification

wangguanan/light-reid ECCV 2020

In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy.
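
The coarse-to-fine idea can be sketched as a two-stage Hamming-distance search. Cheap short codes shortlist candidates from the whole gallery, and long codes re-rank only that shortlist. In this sketch the codes are random integers rather than learned hashes. Taking the short code as the top 16 bits of the 64-bit long code is our assumption, not the paper's construction. `Item` and `ctf_search` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    short_code: int  # 16-bit code: cheap, coarse comparison
    long_code: int   # 64-bit code: more discriminative, used only for re-ranking


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def ctf_search(q_short: int, q_long: int, gallery: list[Item],
               shortlist_size: int, k: int) -> list[int]:
    # Coarse stage: rank the whole gallery by short-code distance and keep a shortlist.
    coarse = sorted(gallery, key=lambda it: hamming(q_short, it.short_code))[:shortlist_size]
    # Fine stage: re-rank only the shortlist by long-code distance.
    fine = sorted(coarse, key=lambda it: hamming(q_long, it.long_code))
    return [it.item_id for it in fine[:k]]


rng = random.Random(0)
gallery = []
for i in range(1000):
    code = rng.getrandbits(64)
    # Assumption: short code = top 16 bits of the long code.
    gallery.append(Item(i, code >> 48, code))

# Query: item 42's long code with three low-order bits flipped (simulated noise).
q_long = gallery[42].long_code ^ 0b1000100001
print(ctf_search(q_long >> 48, q_long, gallery, shortlist_size=50, k=3))
```

The speed gain comes from computing long-code distances for only `shortlist_size` items instead of the whole gallery.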

Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent

nokia/codesearch 27 Aug 2020

In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries.

GraphCodeBERT: Pre-training Code Representations with Data Flow

microsoft/CodeBERT ICLR 2021

Instead of taking the syntactic-level structure of code, such as the abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
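
A toy version of the "where-the-value-comes-from" relation can be extracted with Python's standard `ast` module. This sketch is not GraphCodeBERT's extractor. It handles only straight-line code (no branches or loops) and links each variable read to the line of that variable's most recent assignment. `data_flow_edges` is a hypothetical helper.

```python
import ast


def data_flow_edges(source: str) -> list[tuple[str, int, int]]:
    """(variable, use_line, def_line) edges for straight-line code: each read of a
    variable points to the line of its most recent assignment. Branches and loops are ignored."""
    last_def: dict[str, int] = {}
    edges = []
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads are resolved first, so in `x = y * x` the right-hand x refers to the older definition.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                edges.append((n.id, n.lineno, last_def[n.id]))
        # Then record the writes made by this statement.
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = n.lineno
    return edges


src = "x = 1\ny = x + 2\nx = y * x\nz = x\n"
for var, use, definition in data_flow_edges(src):
    print(f"{var} at line {use} comes from line {definition}")
```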

Search4Code: Code Search Intent Classification Using Weak Supervision

microsoft/Search4Code 24 Nov 2020

We evaluate the approach against several baselines on a real-world dataset comprising over 1 million queries mined from the Bing web search engine and show that the CNN-based model can achieve an accuracy of 77% and 76% for C# and Java, respectively.

PalmTree: Learning an Assembly Language Model for Instruction Embedding

palmtreemodel/palmtree 21 Jan 2021

Deep learning has demonstrated its strengths in numerous binary analysis tasks, including function boundary detection, binary code search, function prototype inference, value set analysis, etc.

deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

degraphcs/DeGraphCS 24 Mar 2021

With the rapid growth in the number of public code repositories, developers increasingly want to retrieve precise code snippets using natural language.

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Jun-jie-Huang/CoCLR ACL 2021

Finding code given a natural language query is beneficial to the productivity of software developers.

CoDesc: A Large Code-Description Parallel Dataset

csebuetnlp/CoDesc 29 May 2021

In this study, we present CoDesc, a large parallel dataset composed of 4.2 million Java methods and natural language descriptions.

Multimodal Representation for Neural Code Search

jianguda/mrncs 2 Jul 2021

In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data.
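
Tree serialization turns an AST into a token sequence that a sequence encoder can consume. The sketch below is only an illustration using Python's `ast` module, not the paper's serialization methods. Its "simplification" is our assumption: it drops expression-context nodes (`Load`/`Store`/`Del`), emits node type names in pre-order, and keeps identifier values at `Name` and `arg` leaves.

```python
import ast

# Assumed simplification: expression-context nodes carry no structure, so they are dropped.
SKIP = (ast.expr_context,)


def serialize(node: ast.AST) -> list[str]:
    """Pre-order traversal emitting node type names plus identifiers at Name/arg leaves."""
    if isinstance(node, SKIP):
        return []
    tokens = [type(node).__name__]
    if isinstance(node, ast.Name):
        tokens.append(node.id)
    elif isinstance(node, ast.arg):
        tokens.append(node.arg)
    for child in ast.iter_child_nodes(node):
        tokens.extend(serialize(child))
    return tokens


tree = ast.parse("def add(a, b):\n    return a + b")
print(serialize(tree))
```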