Clone Detection
25 papers with code • 2 benchmarks • 1 dataset
Most implemented papers
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers.
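Part of CodeT5's identifier-aware pre-training is predicting masked identifier names from the surrounding code. Below is a minimal sketch of what such a masked-identifier example could look like for a Python snippet using the standard ast module; the T5-style sentinel tokens and the restriction to ast.Name identifiers are simplifying assumptions, not the paper's exact data pipeline.

```python
# Minimal sketch of a masked-identifier-prediction example (not the official
# CodeT5 pipeline). Sentinel naming and the ast.Name-only identifier set are
# illustrative assumptions.
import ast
import re

def mask_identifiers(code: str):
    """Replace each distinct identifier with a sentinel; return (input, target)."""
    tree = ast.parse(code)
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    masked, target = code, []
    for i, name in enumerate(names):
        sentinel = f"<extra_id_{i}>"                              # T5-style sentinel (assumed)
        masked = re.sub(rf"\b{re.escape(name)}\b", sentinel, masked)
        target.append(f"{sentinel} {name}")
    return masked, " ".join(target)

print(mask_identifiers("def add(a, b):\n    return a + b"))
# ('def add(<extra_id_0>, <extra_id_1>):\n    return <extra_id_0> + <extra_id_1>',
#  '<extra_id_0> a <extra_id_1> b')
```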
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
Unified Pre-training for Program Understanding and Generation
Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Auditing code developed using LLMs is therefore challenging: it is difficult to reliably determine whether an LLM used during development was trained on specific copyrighted code, since we do not have access to these models' training datasets.
Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree
To the best of our knowledge, we are the first to apply graph neural networks to code clone detection.
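The approach embeds each flow-augmented AST with a graph neural network and scores a candidate pair by the similarity of the two graph embeddings. The snippet below is a toy message-passing sketch of that idea in plain PyTorch, not the paper's FA-AST construction or GGNN/GMN models; node features, adjacency, and dimensions are illustrative assumptions.

```python
# Toy message-passing sketch (not the paper's FA-AST/GGNN): embed two small code
# graphs with one shared GNN and score the pair by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGNN(nn.Module):
    def __init__(self, dim: int = 32, layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node features; adj: (n, n) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        for lin in self.layers:
            x = F.relu(lin(adj @ x / deg))   # mean-aggregate neighbours, then transform
        return x.mean(dim=0)                 # mean-pool nodes into one graph embedding

def clone_score(gnn, g1, g2):
    return F.cosine_similarity(gnn(*g1), gnn(*g2), dim=0)

gnn = TinyGNN()
g = (torch.randn(5, 32), torch.eye(5))       # stand-in for a flow-augmented AST
print(clone_score(gnn, g, g).item())         # identical graphs -> similarity ~1.0
```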
Contrastive Code Representation Learning
Recent work learns contextual representations of source code by reconstructing tokens from their context.
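In contrast, contrastive code representation learning pulls together embeddings of semantically equivalent views of a program and pushes apart embeddings of different programs. Below is a minimal InfoNCE sketch of that training signal; the encoder, augmentations, and temperature are assumptions for illustration, not the ContraCode implementation.

```python
# Minimal InfoNCE sketch (not the ContraCode implementation): two augmented views
# of the same program are positives (the diagonal); other programs in the batch
# are negatives.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.07):
    """z1, z2: (batch, dim) embeddings of two views of the same programs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

batch, dim = 8, 128
loss = info_nce(torch.randn(batch, dim), torch.randn(batch, dim))
print(loss.item())
```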
GraphCodeBERT: Pre-training Code Representations with Data Flow
Instead of taking the syntactic-level structure of code, such as the abstract syntax tree (AST), we use data flow in the pre-training stage, a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
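Such a data-flow graph links each variable use back to the assignment its value comes from. The snippet below is a simplified, Python-only illustration of extracting those edges with the standard ast module; GraphCodeBERT itself builds data flow over multiple languages with a different toolchain, so this is an assumption-laden sketch rather than its actual preprocessing.

```python
# Simplified illustration (not GraphCodeBERT's preprocessing): collect
# "where-the-value-comes-from" edges by linking each variable read to its most
# recent assignment, walking Name nodes in source order.
import ast

def data_flow_edges(code: str):
    nodes = sorted(
        (n for n in ast.walk(ast.parse(code)) if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    last_def = {}   # variable name -> line of its latest assignment
    edges = []      # (variable, use_line, def_line)
    for node in nodes:
        if isinstance(node.ctx, ast.Store):
            last_def[node.id] = node.lineno
        elif isinstance(node.ctx, ast.Load) and node.id in last_def:
            edges.append((node.id, node.lineno, last_def[node.id]))
    return edges

print(data_flow_edges("x = 1\ny = x + 2\nz = y * x"))
# [('x', 2, 1), ('y', 3, 2), ('x', 3, 1)]
```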
Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding
In this paper, we propose an approach to bridge pre-trained models and code-related tasks.
Learning Program Semantics with Code Representations: An Empirical Study
However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing.
On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules
Although adapters are known to ease adaptation to many downstream tasks compared to full fine-tuning, which requires retraining all of a model's parameters -- owing to their plug-and-play nature and parameter efficiency -- their use in software engineering has not been explored.
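For reference, a typical bottleneck adapter is a small down-project/up-project block with a residual connection, inserted into an otherwise frozen pre-trained transformer. The sketch below follows that Houlsby-style design in general terms; the hidden and bottleneck sizes are illustrative assumptions, not the paper's configuration.

```python
# Minimal bottleneck-adapter sketch (Houlsby-style, not the paper's exact setup):
# only the adapter parameters (plus, typically, layer norms and the task head)
# are trained while the pre-trained backbone stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))   # residual keeps the original signal

adapter = Adapter()
hidden_states = torch.randn(2, 16, 768)               # (batch, seq_len, hidden)
print(adapter(hidden_states).shape)                    # torch.Size([2, 16, 768])
```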