Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Method name prediction	CodeSearchNet	ContraCode	F1	17.24	#1
Source Code Summarization	CodeSearchNet	ContraCode	F1	17.24	#1
Type prediction	DeepTyper	ContraCode	Accuracy@5	84.60	#1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-code-representation-learning/method-name-prediction-on-codesearchnet)](https://paperswithcode.com/sota/method-name-prediction-on-codesearchnet?p=contrastive-code-representation-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-code-representation-learning/code-summarization-on-codesearchnet)](https://paperswithcode.com/sota/code-summarization-on-codesearchnet?p=contrastive-code-representation-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-code-representation-learning/type-prediction-on-deeptyper)](https://paperswithcode.com/sota/type-prediction-on-deeptyper?p=contrastive-code-representation-learning)`

Contrastive Code Representation Learning

EMNLP 2021 · Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica

Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like summarizing code in English, these representations should ideally capture program functionality. However, we show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. ContraCode pre-trains a neural network to identify functionally similar variants of a program among many non-equivalent distractors. We scalably generate these variants using an automated source-to-source compiler as a form of data augmentation. Contrastive pre-training improves JavaScript summarization and TypeScript type inference accuracy by 2% to 13%. We also propose a new zero-shot JavaScript code clone detection dataset, showing that ContraCode is both more robust and semantically meaningful. On it, we outperform RoBERTa by 39% AUROC in an adversarial setting and up to 5% on natural code.
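The pre-training task described in the abstract, picking a functionally equivalent variant of a program out of many non-equivalent distractors, can be illustrated with an InfoNCE-style contrastive loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the exact objective, the function `info_nce_loss` is hypothetical, the temperature of 0.07 is an assumed value, and the three-dimensional vectors stand in for the output of a real code encoder.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive, distractors, temperature=0.07):
    """Cross-entropy of selecting `positive` among all candidates for `query`.

    `query` and `positive` play the role of embeddings of two equivalent
    variants of one program; `distractors` are embeddings of other programs.
    The temperature is an assumed hyperparameter, not taken from the source.
    """
    q = normalize(query)
    candidates = [normalize(positive)] + [normalize(d) for d in distractors]
    logits = [dot(q, c) / temperature for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    # Negative log-probability assigned to the positive (index 0).
    return log_denominator - logits[0]


# Hypothetical toy embeddings: `variant` points almost the same way as `query`.
query = [1.0, 0.1, 0.0]
variant = [0.9, 0.2, 0.0]
others = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

aligned = info_nce_loss(query, variant, others)
misaligned = info_nce_loss(query, others[0], [variant, others[1]])
print(aligned < misaligned)
```

The loss is small when the embedding of the equivalent variant is closer to the query than every distractor, and large when a distractor is closer. Minimizing it therefore pushes an encoder toward representations that stay stable under semantics-preserving edits.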

Code

parasj/contracode (official) · 165 stars

Tasks

Clone Detection

Contrastive Learning

Data Augmentation

Method name prediction

Representation Learning

Source Code Summarization

Type prediction

Datasets

CodeSearchNet

Results from the Paper

Ranked #1 on Method name prediction on CodeSearchNet

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • RoBERTa • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
