Code summarization is the task of comprehending code and automatically generating natural-language descriptions directly from source code.
Source: Improving Automatic Source Code Summarization via Deep Reinforcement Learning
The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval.
Generating a readable summary that describes the functionality of a program is known as source code summarization.
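As a concrete (hypothetical) illustration of what a training example for this task looks like, a model is given a tokenized code snippet as input and a short natural-language summary as the target; the snippet, summary, and tokenizer below are illustrative assumptions, not drawn from any specific dataset.

```python
import re

# A hypothetical (code, summary) training pair for code summarization:
# the model learns to map the source-token sequence to the summary sequence.
source_code = "def add(a, b):\n    return a + b"
summary = "add two numbers and return the result"

def tokenize(text):
    # Naive tokenization: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

code_tokens = tokenize(source_code)
summary_tokens = tokenize(summary)
```

Real systems use learned subword vocabularies rather than this regex split, but the input/output shape of the task is the same.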
We propose Contrastive Code Representation Learning (ContraCode), a self-supervised algorithm for learning task-agnostic semantic representations of programs via contrastive learning.
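Contrastive objectives of this kind are typically instances of the InfoNCE / NT-Xent loss: embeddings of a program and of a semantically equivalent transformed variant are pulled together, while other programs in the batch serve as negatives. The sketch below is a generic NumPy version of that loss, not ContraCode's actual implementation; the batch layout (positives share the anchor's row index) is an assumption.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE over a batch: anchors[i] and positives[i] are embeddings of
    two views of the same program; every other row acts as a negative."""
    # L2-normalize so dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the matching pair) as the correct class.
    return -np.mean(np.diag(log_probs))
```

When each anchor is most similar to its own positive the loss is near zero; mismatched pairs drive it up, which is what pushes representations of equivalent programs together.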
Ranked #1 on method name prediction on CodeSearchNet.
Summarization of long sequences into a concise statement is a core problem in natural language processing, requiring non-trivial understanding of the input.
The first approaches to use structural information flattened the abstract syntax tree (AST) into a sequence.
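Such flattening is usually a pre-order traversal that emits node-type tokens in visit order, producing a sequence a standard encoder can consume. A minimal sketch for Python source, using the standard-library `ast` module:

```python
import ast

def flatten_ast(code):
    """Flatten a Python AST into a token sequence of node-type names via
    pre-order traversal -- the simple linearization early structure-aware
    summarization models fed to sequence encoders."""
    tokens = []
    def visit(node):
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)
    visit(ast.parse(code))
    return tokens
```

Note that this linearization discards identifier names and exact parent-child boundaries; later work added bracket tokens or structure-based traversals to make the tree recoverable from the sequence.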
Neural machine translation models can be used to automatically generate documentation from source code, since the problem can be framed as a translation task from a programming language to natural language.
Code summarization (CS) and code generation (CG) are two crucial tasks in the field of automatic software development.
To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework that encodes the code into a hidden space and then decodes it into natural language. This framework suffers from two major drawbacks: (a) the encoders consider only the sequential content of the code, ignoring its tree structure, which is also critical for code summarization; and (b) the decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth words.
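Drawback (b) is the standard teacher-forcing objective. A minimal sketch of that training loss, assuming the decoder's per-step logits are given: at each step the loss is the negative log-probability of the ground-truth word, conditioned on the ground-truth prefix. At test time the decoder must instead condition on its own (possibly wrong) predictions, which is the train/test mismatch (exposure bias) the passage alludes to.

```python
import numpy as np

def teacher_forced_nll(step_logits, target_ids):
    """Maximum-likelihood ("teacher forcing") decoder loss: step_logits[t]
    are the decoder's vocabulary scores at step t, computed while conditioning
    on the ground-truth words y_1..y_{t-1}; target_ids[t] is the index of the
    ground-truth word y_t.  Returns the mean negative log-likelihood."""
    nll = 0.0
    for logits, target in zip(step_logits, target_ids):
        logits = logits - logits.max()                      # stability
        log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
        nll -= log_probs[target]                            # -log p(y_t | y_<t)
    return nll / len(target_ids)
```

Reinforcement-learning approaches to summarization replace this per-word likelihood with a sequence-level reward, so the model is trained on its own sampled outputs rather than only on ground-truth prefixes.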