CodeT5

CodeT5 is a Transformer-based encoder-decoder model for code understanding and generation, built on the T5 architecture. It uses an identifier-aware pre-training objective that exploits a crucial kind of token type information in code: the identifiers assigned by developers. Specifically, the denoising Seq2Seq objective of T5 is extended with two tasks, identifier tagging and masked identifier prediction, so that the model better leverages token type information from programming languages. To improve the alignment between natural language and programming language, a bimodal dual generation objective additionally trains the model on bidirectional conversion between the two.

Source: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
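
As a concrete illustration of the denoising objective, the following sketch runs masked-span infilling with the released pre-trained checkpoint. It assumes the Hugging Face transformers library and follows the usage pattern from the Salesforce/codet5-base model card; the input string and output are illustrative.

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 pairs a RoBERTa-style BPE tokenizer trained on code
# with the T5 encoder-decoder architecture.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# <extra_id_0> marks a masked span for the model to fill in,
# mirroring the span-denoising objective used in pre-training.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# e.g. "{user.username}"
```

Fine-tuned variants reuse the same encoder-decoder interface for the NL-PL directions targeted by the bimodal dual generation objective, e.g. generating a docstring summary from a function body, or code from a natural language description.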

Tasks

Task Papers Share
Code Generation 5 17.86%
Program Repair 4 14.29%
Language Modelling 3 10.71%
Code Translation 3 10.71%
Translation 2 7.14%
Reinforcement Learning (RL) 1 3.57%
Machine Translation 1 3.57%
Synthetic Data Generation 1 3.57%
Retrieval 1 3.57%

Components

Component Type
T5 Transformers

Categories

Code Generation Transformers