Code Search

47 papers with code • 5 benchmarks • 10 datasets

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search
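The retrieval step in the definition above can be sketched as follows. Each snippet in a toy corpus is scored against a natural-language query by cosine similarity, and the best matches are returned. This is a minimal sketch that assumes a bag-of-words count vector as a stand-in for the learned encoder a neural code search model would use. The names `tokenize`, `embed`, `cosine` and `search`, and the example corpus, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split identifiers such as "readFile" or "read_file" into lowercase words.
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)]


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a neural encoder: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two sparse count vectors (0.0 if either is empty).
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    # Embed the query once, score every snippet, return the k best (name, score) pairs.
    q = embed(query)
    scored = [(name, cosine(q, embed(code))) for name, code in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


corpus = {
    "read_file": "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "sort_list": "def sort_list(items):\n    return sorted(items)",
    "http_get": "def http_get(url):\n    return urllib.request.urlopen(url).read()",
}

for name, score in search("read the contents of a file", corpus):
    print(f"{name}: {score:.3f}")
```

If `embed` is replaced with a trained text/code encoder, the ranking loop stays the same and only the vectors change. This is why dense code search systems can precompute and index the snippet embeddings once.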

Source Code Clone Detection Using Unsupervised Similarity Measures

jorge-martinez-gil/codesim 18 Jan 2024

Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection and code search and recommendation.
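One of the simplest unsupervised similarity measures is the Jaccard index over token sets. It is the number of shared tokens divided by the number of distinct tokens in either snippet. The sketch below is a hypothetical illustration and is not code from the codesim repository. The tokenizer regex, the function names and the example snippets are assumptions.

```python
import re


def code_tokens(src: str) -> set[str]:
    # Identifiers/keywords, integer literals, and single punctuation characters.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", src))


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the two token sets; 1.0 means identical sets.
    ta, tb = code_tokens(a), code_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


original = "def add(x, y):\n    return x + y"
renamed = "def add(a, b):\n    return a + b"
unrelated = "for i in range(10):\n    print(i)"

print(round(jaccard(original, renamed), 3))
print(round(jaccard(original, unrelated), 3))
```

Renaming only the parameters already lowers the score from 1.0 to 8/12 (about 0.67), while the unrelated loop scores 3/16. Sensitivity to identifier names is one reason token-based clone detectors commonly normalize identifiers before comparing.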

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation

iamfaith/transformcode 10 Nov 2023

The main reason for this is that encoding each code token would cause model parameter inflation, resulting in a lot of parameters storing information that we are not very concerned about.

Language Models are Universal Embedders

izhx/uni-rep 12 Oct 2023

As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario.

Rethinking Negative Pairs in Code Search

Alex-HaochenLi/Soft-InfoNCE 12 Oct 2023

In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE.
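To make the special-case relationship concrete, the sketch below computes a contrastive loss in which each negative pair (query i, code j with j ≠ i) is multiplied by a weight w_ij inside the softmax denominator. When all weights are 1, the loss reduces to vanilla InfoNCE. This generic weighted form is an illustrative assumption. It is not the exact Soft-InfoNCE objective or any of the paper's three weight estimators. The function name `weighted_info_nce`, the default temperature of 0.05, and the example scores and weights are hypothetical.

```python
import math


def weighted_info_nce(scores, weights=None, temperature=0.05):
    """Mean contrastive loss over a batch of (query, code) pairs.

    scores[i][j] is the similarity of query i and code j; the positive pair
    for query i is assumed to sit on the diagonal (j == i). weights[i][j]
    scales each negative term; weights=None means every weight is 1.0,
    which is vanilla InfoNCE. The diagonal of weights is ignored.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in scores[i]]
        m = max(logits)  # subtract the max so exp() cannot overflow
        pos = math.exp(logits[i] - m)
        neg = 0.0
        for j in range(n):
            if j != i:
                w = 1.0 if weights is None else weights[i][j]
                neg += w * math.exp(logits[j] - m)
        # Equals -log(pos / (pos + neg)), since log(pos) is logits[i] - m.
        total += math.log(pos + neg) - (logits[i] - m)
    return total / n


scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.0, 0.5, 0.7],
]
# Hypothetical weights: treat code 2 as a likely false negative for query 1.
soft = [[1.0, 1.0, 1.0], [1.0, 1.0, 0.2], [1.0, 1.0, 1.0]]
print(weighted_info_nce(scores))
print(weighted_info_nce(scores, soft))
```

Down-weighting a negative that may actually be relevant shrinks its share of the denominator. The loss for that query drops, so training pushes that pair apart less strongly.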

MELT: Mining Effective Lightweight Transformations from Pull Requests

squareslab/melt 28 Aug 2023

By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace.
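Comby match templates contain holes written as `:[name]` that bind arbitrary code, and the rewrite template reuses the same holes. The sketch below approximates that idea with Python regular expressions so the match-and-rewrite step is visible. It is not Comby's implementation. Unlike Comby, it does not respect balanced parentheses, so an argument that contains a comma, such as `f(a, b)`, would be split incorrectly. The function names, the JUnit-to-AssertJ rule, and the assumption that hole names are valid Python identifiers used at most once per template are all hypothetical.

```python
import re

HOLE = re.compile(r":\[(\w+)\]")  # a Comby-style hole such as :[a]


def template_to_regex(template: str) -> re.Pattern:
    # Escape the literal text and turn each hole into a lazy named group.
    parts, last = [], 0
    for hole in HOLE.finditer(template):
        parts.append(re.escape(template[last:hole.start()]))
        parts.append(f"(?P<{hole.group(1)}>.+?)")
        last = hole.end()
    parts.append(re.escape(template[last:]))
    return re.compile("".join(parts))


def rewrite(source: str, match_template: str, rewrite_template: str) -> str:
    pattern = template_to_regex(match_template)

    def substitute(match: re.Match) -> str:
        # Fill each hole in the rewrite template with the text it bound.
        return HOLE.sub(lambda hole: match.group(hole.group(1)), rewrite_template)

    return pattern.sub(substitute, source)


before = "assertEquals(x, 1)\nassertEquals(foo(a), b)"
after = rewrite(before, "assertEquals(:[a], :[b])", "assertThat(:[a]).isEqualTo(:[b])")
print(after)
```

A rule expressed this way is a pair of templates rather than a program. This makes it easy to infer such rules from example code and to read them back.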

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

ynklab/xcodesearchnet 27 Jun 2023

Code search is the task of finding code that semantically matches given natural language queries.

Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data

openmatch/openmatch 31 May 2023

SANTA proposes two pretraining methods to make language models structure-aware and learn effective representations for structured data: 1) Structured Data Alignment, which utilizes the natural alignment relations between structured data and unstructured data for structure-aware pretraining.

Backdooring Neural Code Search

wssun/badcode 27 May 2023

Neural code search models are hence behind many such engines.

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

salesforce/codet5 13 May 2023

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

fsoft-ai4code/thevault 9 May 2023

We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code.