Code Search
49 papers with code • 5 benchmarks • 10 datasets
The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.
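To make the retrieval setting concrete, here is a minimal lexical baseline. It assumes a small in-memory corpus of snippets and uses TF-IDF cosine similarity as the matching function. The names `tokenize`, `tfidf_vectors`, `cosine` and `search` are hypothetical. Neural approaches such as those listed below typically replace this hand-built matching with learned representations of queries and code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split camelCase, then keep alphabetic runs, so "readFile" and "read_file" both give ["read", "file"].
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(corpus):
    # Term frequency per snippet, weighted by a smoothed inverse document frequency.
    counts = [Counter(tokenize(doc)) for doc in corpus]
    df = Counter(term for c in counts for term in c)
    n = len(corpus)
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return idf, vecs


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, corpus, k=3):
    idf, vecs = tfidf_vectors(corpus)
    # Query terms never seen in the corpus carry no signal and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [corpus[i] for i in ranked[:k]]


corpus = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def http_get(url): return urllib.request.urlopen(url).read()",
]
print(search("read a file from disk", corpus, k=1))
```

The query shares "read" and "file" with the first snippet but only "read" with the third, so the first snippet ranks highest. Purely lexical matching like this misses synonyms and paraphrases, which is the gap learned models aim to close.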
Most implemented papers
funcGNN: A Graph Neural Network Approach to Program Similarity
This study intends to examine the effectiveness of graph neural networks to estimate program similarity, by analysing the associated control flow graphs.
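As a rough illustration of comparing programs through their control flow graphs, the sketch below embeds a CFG by untrained neighbour aggregation over one-hot node-type features. It then compares two graphs by cosine similarity. This is not funcGNN. The node types, the sum aggregation, and the helper names are assumptions chosen for brevity, and a trained model would learn its aggregation weights from labelled program pairs.

```python
from math import sqrt

# Hypothetical coarse node categories for basic blocks in a CFG.
NODE_TYPES = ["entry", "assign", "branch", "loop", "call", "return"]


def one_hot(kind):
    return [1.0 if kind == t else 0.0 for t in NODE_TYPES]


def embed_cfg(node_kinds, edges, rounds=2):
    """node_kinds: node type per CFG node; edges: (src, dst) index pairs."""
    neighbors = {i: set() for i in range(len(node_kinds))}
    for s, d in edges:
        neighbors[s].add(d)
        neighbors[d].add(s)  # treat the CFG as undirected for aggregation
    h = [one_hot(k) for k in node_kinds]
    for _ in range(rounds):
        # Each node's new state is its own state plus the sum of its neighbours' states.
        h = [
            [h[v][j] + sum(h[u][j] for u in neighbors[v]) for j in range(len(NODE_TYPES))]
            for v in range(len(h))
        ]
    # Sum-pool node states into one graph-level vector (the readout step).
    return [sum(col) for col in zip(*h)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


loop_kinds = ["entry", "assign", "loop", "assign", "return"]
loop_edges = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 4)]
loop_a = embed_cfg(loop_kinds, loop_edges)
loop_b = embed_cfg(loop_kinds, loop_edges)  # same loop shape from a different function
branch_c = embed_cfg(
    ["entry", "branch", "call", "call", "return"],
    [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)],
)
print(cosine(loop_a, loop_b), cosine(loop_a, branch_c))
```

Two functions with identical control flow get identical embeddings regardless of variable names, while a branch-and-call structure scores lower.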
Faster Person Re-Identification
In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy.
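A simplified two-stage version of coarse-to-fine hashing search can be sketched as follows. It assumes each gallery item already has a short and a long binary code stored as Python integers. Short codes shortlist candidates cheaply by Hamming distance, and long codes re-rank only that shortlist. How the codes are learned, and any further refinements in the paper's CtF strategy, are outside this sketch, and the function names are hypothetical.

```python
def hamming(a, b):
    # Number of differing bits between two integer-encoded binary codes.
    return bin(a ^ b).count("1")


def coarse_to_fine(query_short, query_long, gallery, shortlist=2, k=1):
    """gallery: list of (item_id, short_code, long_code) tuples."""
    # Coarse stage: rank everything by the cheap short-code distance, keep a shortlist.
    coarse = sorted(gallery, key=lambda g: hamming(query_short, g[1]))[:shortlist]
    # Fine stage: re-rank only the shortlist with the more discriminative long codes.
    fine = sorted(coarse, key=lambda g: hamming(query_long, g[2]))
    return [item_id for item_id, _, _ in fine[:k]]


gallery = [
    ("a", 0b1011, 0b01010100),
    ("b", 0b1010, 0b10101011),
    ("c", 0b0100, 0b10101011),
]
print(coarse_to_fine(0b1011, 0b10101011, gallery))  # ['b']
```

Item `a` wins the coarse stage but loses after long-code re-ranking. Item `c` has a perfect long-code match but is pruned at the coarse stage, which is the speed/recall trade-off that the shortlist size controls.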
Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent
In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries.
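To show why pairing snippets with intent descriptions can help, the sketch below scores a query against both the description and the code of each pair using token-set Jaccard overlap. This is a hypothetical lexical baseline, not the paper's neural model. The weight `alpha` and the helper names are assumptions.

```python
import re


def tokens(text):
    return {t.lower() for t in re.findall(r"[A-Za-z]+", text)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def annotated_search(query, snippets, alpha=0.7):
    """snippets: list of (description, code) pairs; alpha weights the description match."""
    q = tokens(query)
    scored = [
        (alpha * jaccard(q, tokens(desc)) + (1 - alpha) * jaccard(q, tokens(code)), i)
        for i, (desc, code) in enumerate(snippets)
    ]
    best_index = max(scored)[1]  # highest combined score
    return snippets[best_index]


snippets = [
    ("Reverse a string", "s[::-1]"),
    ("Read lines from a text file", "open(p).readlines()"),
]
print(annotated_search("how to reverse a string in python", snippets))
```

Here the code `s[::-1]` shares no words with the query, so the match comes entirely from the description.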
GraphCodeBERT: Pre-training Code Representations with Data Flow
Instead of taking the syntactic-level structure of code, such as the abstract syntax tree (AST), we use data flow in the pre-training stage; data flow is a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
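The "where-the-value-comes-from" relation can be approximated for straight-line Python with the standard `ast` module, as in the sketch below. It is a deliberate simplification. Edges connect variable names rather than individual variable occurrences, and only simple and augmented assignments are handled. Function names such as `len` are also not filtered out. GraphCodeBERT's own extraction is not reproduced here, and `data_flow_edges` is a hypothetical name.

```python
import ast


def data_flow_edges(source):
    """Return (from_var, to_var) edges: to_var's value comes from from_var."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            # Every name read on the right-hand side flows into each simple target.
            reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges.extend((r, target.id) for r in reads)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # "x += y" reads both the old x and y.
            reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
            edges.extend((r, node.target.id) for r in [node.target.id] + reads)
    return edges


print(data_flow_edges("x = a + b\ny = x * 2\nx += y"))
# [('a', 'x'), ('b', 'x'), ('x', 'y'), ('x', 'x'), ('y', 'x')]
```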
Search4Code: Code Search Intent Classification Using Weak Supervision
We evaluate the approach against several baselines on a real-world dataset comprising over 1 million queries mined from the Bing web search engine, and show that the CNN-based model can achieve an accuracy of 77% and 76% for C# and Java, respectively.
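Weak supervision of this kind typically involves labelling data with cheap heuristic rules and then training a classifier on the resulting noisy labels. The sketch below shows only the labelling step, using a majority vote over rule outputs that may abstain. The keyword rules and function names are hypothetical examples, not the rules used in Search4Code.

```python
ABSTAIN, NOT_CODE, CODE = -1, 0, 1


def lf_code_words(query):
    # Hypothetical rule: explicit requests for code.
    words = query.split()
    return CODE if any(w in words for w in ("code", "example", "snippet", "sample")) else ABSTAIN


def lf_how_to(query):
    # Hypothetical rule: "how to ..." queries often ask for an implementation.
    return CODE if query.startswith("how to") else ABSTAIN


def lf_non_code_words(query):
    # Hypothetical rule: downloads and job searches are rarely code-search intent.
    words = query.split()
    return NOT_CODE if any(w in words for w in ("download", "jobs", "salary")) else ABSTAIN


LABELING_FUNCTIONS = [lf_code_words, lf_how_to, lf_non_code_words]


def weak_label(query, lfs=LABELING_FUNCTIONS):
    votes = [lf(query.lower()) for lf in lfs]
    code, not_code = votes.count(CODE), votes.count(NOT_CODE)
    if code > not_code:
        return CODE
    if not_code > code:
        return NOT_CODE
    return ABSTAIN  # no rule fired, or the rules disagree evenly


print(weak_label("How to read a file in C# example"))  # 1 (CODE)
```

Queries that end up as `ABSTAIN` would simply be left out of the training set for the downstream classifier.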
PalmTree: Learning an Assembly Language Model for Instruction Embedding
Deep learning has demonstrated its strengths in numerous binary analysis tasks, including function boundary detection, binary code search, function prototype inference, value set analysis, etc.
deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search
With the rapid increase in the number of public code repositories, developers increasingly want to retrieve precise code snippets using natural language.
CoSQA: 20,000+ Web Queries for Code Search and Question Answering
Finding code given a natural language query is beneficial to the productivity of software developers.
CoDesc: A Large Code-Description Parallel Dataset
In this study, we present CoDesc, a large parallel dataset composed of 4.2 million Java methods and natural language descriptions.
Multimodal Representation for Neural Code Search
In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data.
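Tree serialization can be illustrated with Python's `ast` module. The sketch below simplifies the tree by dropping `Load`/`Store` context nodes, then emits a bracketed pre-order token sequence that a sequence encoder could consume. The simplification rule, the bracket format, and the name `serialize` are assumptions for illustration, not the paper's exact tree-serialization methods.

```python
import ast

# Drop Load/Store/Del markers: they add nodes without adding much structure.
SKIP = (ast.expr_context,)


def serialize(node):
    children = [c for c in ast.iter_child_nodes(node) if not isinstance(c, SKIP)]
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += ":" + node.id  # keep identifiers on leaf variables
    if not children:
        return [label]
    # Pre-order traversal; brackets mark where each subtree starts and ends.
    out = ["(", label]
    for child in children:
        out.extend(serialize(child))
    out.append(")")
    return out


tree = ast.parse("y = x + 1").body[0]
print(" ".join(serialize(tree)))
# ( Assign Name:y ( BinOp Name:x Add Constant ) )
```

The brackets keep the sequence reversible to the simplified tree, so subtree boundaries are not lost when the AST is flattened.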