CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
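The baselines referenced above are released as ordinary Transformer checkpoints. As a minimal illustrative sketch (not the paper's official pipeline), the BERT-style baseline, CodeBERT, can be loaded from the Hugging Face Hub and used to embed a code snippet; the checkpoint name "microsoft/codebert-base" is the publicly released CodeBERT model, and the snippet, pooling choice, and variable names are assumptions for demonstration only.

```python
# Minimal sketch: embedding a code snippet with the BERT-style baseline (CodeBERT).
# Assumes the `transformers` and `torch` packages and the public
# "microsoft/codebert-base" checkpoint; illustrative, not the paper's
# official evaluation pipeline.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def max(a, b): return a if a > b else b"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden states into a single vector, a common way to get
# a fixed-size representation for retrieval-style tasks such as code search.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```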

Datasets

Introduced in the Paper: CodeXGLUE

Used in the Paper: ImageNet, GLUE, CodeSearchNet, CONCODE, XGLUE

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Code Search | CodeXGLUE - AdvTest | CodeBERT | MRR | 27.19 | # 3 |
| Clone Detection | CodeXGLUE - BigCloneBench | CodeBERT | F1 | 96.5 | # 2 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | BLEU (small) | 77.42 | # 1 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | Accuracy (small) | 0.164 | # 1 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | CodeBLEU (small) | 75.58 | # 1 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | BLEU (medium) | 91.07 | # 1 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | Accuracy (medium) | 0.052 | # 1 |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBERT | CodeBLEU (medium) | 87.52 | # 1 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (Ruby) | 12.16 | # 4 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (JavaScript) | 14.9 | # 4 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (Go) | 19.06 | # 3 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (Python) | 18.07 | # 5 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (Java) | 17.65 | # 5 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeBERT | BLEU (PHP) | 25.16 | # 4 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | BLEU (Java→C#) | 79.92 | # 2 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | Accuracy (Java→C#) | 59 | # 2 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | CodeBLEU (Java→C#) | 85.1 | # 1 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | BLEU (C#→Java) | 72.14 | # 2 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | Accuracy (C#→Java) | 58 | # 2 |
| Code Translation | CodeXGLUE - CodeTrans | CodeBERT | CodeBLEU (C#→Java) | 79.41 | # 1 |
| Text-to-Code Generation | CodeXGLUE - CONCODE | CodeGPT-adapted | EM | 20.1 | # 2 |
| Text-to-Code Generation | CodeXGLUE - CONCODE | CodeGPT-adapted | BLEU | 32.79 | # 2 |
| Text-to-Code Generation | CodeXGLUE - CONCODE | CodeGPT-adapted | CodeBLEU | 27.74 | # 2 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (Ruby) | 80.17 | # 1 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (JavaScript) | 81.77 | # 1 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (Go) | 83.31 | # 1 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (Python) | 87.21 | # 1 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (Java) | 80.63 | # 1 |
| Cloze Test | CodeXGLUE - CT-all | CodeBERT (MLM) | Accuracy (PHP) | 85.05 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (Ruby) | 86.84 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (JavaScript) | 86.4 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (Go) | 90.79 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (Python) | 82.2 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (Java) | 90.46 | # 1 |
| Cloze Test | CodeXGLUE - CT-maxmin | CodeBERT (MLM) | Accuracy (PHP) | 88.21 | # 1 |
| Defect Detection | CodeXGLUE - Devign | CodeBERT | Accuracy | 62.08 | # 2 |
| Code Completion | CodeXGLUE - GitHub Java Corpus | CodeGPT-adapted | Accuracy (token-level) | 77.13 | # 1 |
| Code Completion | CodeXGLUE - GitHub Java Corpus | CodeGPT-adapted | EM (line-level) | 26.43 | # 3 |
| Code Completion | CodeXGLUE - GitHub Java Corpus | CodeGPT-adapted | Edit Sim (line-level) | 63.03 | # 3 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (EN→DA) | 67.09 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (EN→LA) | 51.92 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (EN→NO) | 68.0 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (EN→ZH) | 70.6 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (DA→EN) | 67.02 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (LA→EN) | 68.3 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (NO→EN) | 71.84 | # 1 |
| Document Translation | CodeXGLUE - Microsoft Docs | Pretrained Transformer | BLEU (ZH→EN) | 64.47 | # 1 |
| Clone Detection | CodeXGLUE - POJ-104 | CodeBERT | MAP | 84.29 | # 1 |
| Code Completion | CodeXGLUE - PY150 | CodeGPT-adapted | Accuracy (token-level) | 75.11 | # 1 |
| Code Completion | CodeXGLUE - PY150 | CodeGPT-adapted | EM (line-level) | 39.65 | # 3 |
| Code Completion | CodeXGLUE - PY150 | CodeGPT-adapted | Edit Sim (line-level) | 69.84 | # 3 |
| Code Search | CodeXGLUE - WebQueryTest | CodeBERT | F1 | 58.95 | # 1 |
| Code Search | CodeXGLUE - WebQueryTest | CodeBERT | Accuracy | 47.8 | # 1 |
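
Some of the metrics reported above are straightforward to reproduce. The sketch below shows, under their usual definitions (an assumption, since the table does not define them), how MRR for code search and the Levenshtein-based edit similarity reported for line-level code completion can be computed; the function and variable names are illustrative, not part of the CodeXGLUE evaluation scripts.

```python
# Illustrative sketch of two metrics from the table, assuming their standard
# definitions: MRR for code search and Levenshtein-based edit similarity
# (reported as a percentage) for line-level code completion.
from typing import List


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """ranks[i] is the 1-based rank of the correct code snippet for query i."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def edit_similarity(pred: str, target: str) -> float:
    """100 * (1 - Levenshtein(pred, target) / max(len(pred), len(target)))."""
    m, n = len(pred), len(target)
    if max(m, n) == 0:
        return 100.0
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return 100.0 * (1.0 - prev[n] / max(m, n))


print(mean_reciprocal_rank([1, 2, 4]))                   # ~0.583
print(edit_similarity("return a + b", "return a - b"))   # ~92.3
```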
