Retrieval Augmented Code Generation and Summarization

Software developers write a lot of source code and documentation during software development. While implementing or documenting software, developers often recall parts of source code or code summaries that they wrote in the past. To mimic this code and summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has two distinctive features. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal instances (only code or only natural language descriptions) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results demonstrate the effectiveness of the proposed retrieval augmented framework.
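The retrieve-then-augment flow described above can be illustrated with a small Python sketch. It rests on stated assumptions and is not REDCODER's implementation. `embed` is a toy hashed bag-of-tokens encoder standing in for a trained dense encoder. `Entry`, `retrieve`, `augment`, and the `[SEP]` separator are hypothetical names. Appending the paired side of bimodal entries is one plausible way to use code-description pairs, not necessarily the paper's exact input format. For code generation the query is a description and `target="code"`; for summarization the query is code and `target="summary"`.

```python
import math
import re
import zlib
from dataclasses import dataclass
from typing import List, Optional

DIM = 1024       # hypothetical embedding size for the toy encoder
SEP = " [SEP] "  # hypothetical separator between query and retrieved items


@dataclass
class Entry:
    """One retrieval-database instance: unimodal (one side None) or bimodal (both set)."""
    code: Optional[str] = None
    summary: Optional[str] = None


def embed(text: str) -> List[float]:
    """Toy stand-in for a trained dense encoder: hashed bag of tokens, L2-normalized."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, db: List[Entry], target: str, k: int = 2) -> List[Entry]:
    """Return the k entries whose `target` side ("code" or "summary") has the highest
    inner product with the query embedding; entries lacking that side are skipped."""
    q = embed(query)
    scored = []
    for entry in db:
        text = getattr(entry, target)
        if text is None:
            continue
        score = sum(a * b for a, b in zip(q, embed(text)))
        scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


def augment(query: str, retrieved: List[Entry], target: str) -> str:
    """Concatenate the query with the retrieved items; a bimodal entry also
    contributes its paired side (e.g. the description attached to retrieved code)."""
    other = "summary" if target == "code" else "code"
    parts = [query]
    for entry in retrieved:
        parts.append(getattr(entry, target))
        paired = getattr(entry, other)
        if paired is not None:
            parts.append(paired)
    return SEP.join(parts)


if __name__ == "__main__":
    db = [
        Entry(code="def sort_list(xs): return sorted(xs)", summary="sort a list"),  # bimodal
        Entry(code="def read_file(path): return open(path).read()"),               # code only
        Entry(summary="reverse a string"),                                         # NL only
    ]
    query = "sort the list of numbers"
    hits = retrieve(query, db, target="code", k=1)
    # The augmented string is what a generation model would receive as input.
    print(augment(query, hits, target="code"))
```

For brevity, the sketch re-embeds every database entry on each query. A practical dense retriever would precompute the database embeddings once and search them with an inner-product index.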

Findings (EMNLP) 2021

Results from the Paper


 Ranked #1 on Code Generation on CodeXGLUE - CodeSearchNet (using extra training data)

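Task                 Dataset                     Model          Metric           Value   Global Rank
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Java/EM          10.21   #1
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Python/EM        9.61    #1
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Java/BLEU        28.98   #1
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Python/BLEU      24.43   #1
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Java/CodeBLEU    33.18   #1
Code Generation      CodeXGLUE - CodeSearchNet   Redcoder-ext   Python/CodeBLEU  30.21   #1
Code Summarization   CodeXGLUE - CodeSearchNet   Redcoder       Python           21.01   #1
Code Summarization   CodeXGLUE - CodeSearchNet   Redcoder       Java             22.94   #1
Code Generation      CONCODE                     Redcoder-ext   Exact Match      23.4    #1
Code Generation      CONCODE                     Redcoder-ext   BLEU             42.5    #1
Code Generation      CONCODE                     Redcoder-ext   CodeBLEU         43.4    #2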
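The EM and Exact Match values in the table are percentages: the share of generated outputs that are identical to the reference. Below is a minimal sketch of that metric. It assumes whitespace-only normalization, which may differ from the official CodeXGLUE and CONCODE evaluators, since those can tokenize or normalize differently. The helper name `exact_match` is hypothetical.

```python
from typing import List


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions identical to their reference.

    Assumption: runs of whitespace are collapsed before comparing; official
    benchmark scripts may tokenize or normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        " ".join(pred.split()) == " ".join(ref.split())
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["return  a + b", "return a - b"]
    refs = ["return a + b", "return a * b"]
    print(exact_match(preds, refs))  # 50.0
```

Because EM gives no credit for near misses, the table also reports BLEU and CodeBLEU alongside it.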
